Text-Independent Algorithm for Source Printer Identification Based on Ensemble Learning

Naglaa F. El Abady; Mohamed Taha; Hala H. Zayed

doi:10.32604/cmc.2022.028044

Open Access

ARTICLE

Naglaa F. El Abady^1,*, Mohamed Taha^1, Hala H. Zayed^1,2

1 Department of Computer Science, Faculty of Computers and Artificial Intelligence, Benha University, 13518, Egypt
2 School of Information Technology and Computer Science (ITCS), Nile University, 12677, Egypt

* Corresponding Author: Naglaa F. El Abady.

Computers, Materials & Continua 2022, 73(1), 1417-1436. https://doi.org/10.32604/cmc.2022.028044

Received 31 January 2022; Accepted 12 April 2022; Issue published 18 May 2022

Abstract

Because low-cost printers and scanners are widely available, document forgery has become widespread. Watermarks or signatures are used to protect important papers such as certificates, passports, and identification cards. Identifying the origin of a printed document is helpful for criminal investigations and for authenticating digital versions of a document. Source printer identification (SPI) is increasingly used to detect fraud in printed documents. This paper proposes an algorithm that identifies the source printer and assigns the questioned document to one of the printer classes. The algorithm is based on global features such as the Histogram of Oriented Gradients (HOG) and local features such as Local Binary Pattern (LBP) descriptors. For classification, Decision Trees (DT), k-Nearest Neighbors (k-NN), Random Forests, bootstrap aggregating (bagging), adaptive boosting (AdaBoost), Support Vector Machines (SVM), and mixtures of these classifiers are employed. The algorithm is evaluated on a dataset of 1200 documents from 20 distinct printers (13 laser and 7 inkjet) and classifies the questioned documents into their appropriate printer classes accurately. The adaptive boosting classifier attained an accuracy of 96%. Compared with four recently published algorithms that used the same dataset, the proposed algorithm gives better classification accuracy.
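To make the two building blocks named in the abstract more concrete, the following is a minimal sketch of a basic LBP histogram and of one possible way to combine several classifiers. It is an illustration only, not the authors' implementation: the abstract does not state which LBP variant or which combination rule is used, so the 8-neighbor, radius-1 LBP and the majority vote shown here are assumptions, and the function names `lbp_code`, `lbp_histogram`, and `majority_vote` are hypothetical.

```python
from collections import Counter

# Offsets of the 8 neighbors, clockwise from the top-left pixel.
# Assumption: basic radius-1 LBP; the paper's exact variant is not given here.
NEIGHBORS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
             (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(img, r, c):
    """Return the 8-bit LBP code of the interior pixel at (r, c).

    A neighbor contributes a 1 bit when its gray level is greater
    than or equal to the center pixel's gray level.
    """
    center = img[r][c]
    code = 0
    for bit, (dr, dc) in enumerate(NEIGHBORS):
        if img[r + dr][c + dc] >= center:
            code |= 1 << bit
    return code


def lbp_histogram(img):
    """Return a normalized 256-bin histogram of LBP codes.

    `img` is a list of equal-length rows of gray levels. Border pixels
    are skipped because they lack a full 3x3 neighborhood.
    """
    rows, cols = len(img), len(img[0])
    hist = [0] * 256
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            hist[lbp_code(img, r, c)] += 1
    total = sum(hist)
    return [h / total for h in hist] if total else hist


def majority_vote(predictions):
    """Combine per-classifier printer labels by simple majority vote.

    Assumption: this is only one possible combination rule.
    """
    return Counter(predictions).most_common(1)[0][0]
```

In a full pipeline, a histogram like this would be computed for each scanned document (possibly alongside HOG features) and passed as a feature vector to the classifiers listed in the abstract.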

Keywords

Document forensics; source printer identification (SPI); HOG; LBP; principal component analysis (PCA); bagging; AdaBoost

Cite This Article

APA Style

El Abady, N.F., Taha, M., Zayed, H.H. (2022). Text-Independent Algorithm for Source Printer Identification Based on Ensemble Learning. Computers, Materials & Continua, 73(1), 1417–1436. https://doi.org/10.32604/cmc.2022.028044

Vancouver Style

El Abady NF, Taha M, Zayed HH. Text-Independent Algorithm for Source Printer Identification Based on Ensemble Learning. Comput Mater Contin. 2022;73(1):1417–1436. https://doi.org/10.32604/cmc.2022.028044

IEEE Style

N. F. El Abady, M. Taha, and H. H. Zayed, “Text-Independent Algorithm for Source Printer Identification Based on Ensemble Learning,” Comput. Mater. Contin., vol. 73, no. 1, pp. 1417–1436, 2022. https://doi.org/10.32604/cmc.2022.028044

Copyright © 2022 The Author(s). Published by Tech Science Press.
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
