Journal of Cyber Security
Deep Learning Based Image Forgery Detection Methods
1Engineering Research Center of Digital Forensics, Ministry of Education, Nanjing University of Information Science and Technology
2School of Computer Science, Nanjing University of Information Science & Technology, Nanjing, 210044, China
*Corresponding Author: Liang Xiu-jian. Email: firstname.lastname@example.org
Received: 01 May 2022; Accepted: 01 June 2022
Abstract: Increasingly advanced image processing technology has made digital image editing easier and easier. With image processing software at one’s fingertips, one can easily alter the content of an image, and the altered image is so realistic that the change is undetectable to the naked eye. These tampered images pose a serious threat to personal privacy, social order, and national security. Therefore, detecting and locating tampered areas in images has important practical significance, and has become an important research topic in the field of multimedia information security. In recent years, deep learning technology has been widely used in image tampering localization, and the achieved performance has significantly surpassed traditional tampering forensics methods. This paper reviews the relevant knowledge and latest methods in the field of image tampering detection based on deep learning. The methods are described separately for the two types of deep learning based tampering detection tasks, and the problems and future prospects in this field are discussed. This paper differs from existing work in three respects: 1) It focuses on the problem of image tampering detection, so it does not elaborate on other forensic methods. 2) It focuses on image tampering detection methods based on deep learning. 3) It is driven by the needs of tampering targets, so it pays more attention to organizing methods by tampering detection task.
Keywords: Digital image forensics; image tampering detection; deep learning; image splicing detection; copy-move detection
1 Introduction
Digital images are an indispensable part of news reporting, medical imaging, diplomacy and justice, scientific research, and other fields in the information age. People take digital images with digital devices and expect the photos to faithfully record scenes from real life. However, with the development of multimedia and image processing technology, it has become easier to process digital images. Hidden dangers to image security follow, which bring negative effects to all aspects of society and can result in a serious crisis of trust. In 2018, in an interview with the National Broadcasting Company (NBC), Russian President Vladimir Putin officially responded to the “bear riding image” that was widely circulated on the Internet: as shown in Fig. 1, the fake image of Putin riding a bear (left) was obtained by tampering with the real image of him riding a horse (right). It is worth noting that, as technology develops, image tampering for the purpose of spreading false information has become more and more imperceptible, and tampered images can even pass for genuine ones. Therefore, important application fields of digital images, such as news media, security inspection, and judicial forensics, should strengthen the verification of image authenticity and detect traces of tampering in time to ensure that digital images can be trusted.
AlexNet was introduced in 2012. Following the successful application of deep learning in computer vision, since 2016 more and more researchers have tried to apply deep learning methods to image forensics [2–4]. Compared with conventional computer vision tasks, however, there are major differences: 1) The recognition target is different: image forensics requires the model to recognize the tampered area of the image rather than its semantic content; 2) The statistical features are different: image forensics must pay attention to subtle changes along the tampering boundary; 3) The effect of post-processing is different: image post-processing greatly damages the tampering clues in the image.
So far, many image forgery detection methods based on deep learning have emerged, and the performance of forensic methods has improved greatly. In general, digital image forensics based on deep learning has two main tasks: 1) Tampering method detection: identifying how the image content was tampered with, mainly including splicing, copy-move, computer generation, and multiple (combined) tampering operations. 2) Tampered area localization: locating the tampered area in the forged image. The localization result is output in one of two forms, either as a bounding box or as a binary mask.
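The two output forms mentioned above are related: a bounding box can always be derived from a binary mask, but not the other way round. The following is a minimal sketch in plain Python, assuming a mask is a list of rows in which 1 marks a tampered pixel; the names `mask_to_bbox` and `bbox_to_mask` are illustrative and do not come from any of the surveyed methods.

```python
from typing import List, Optional, Tuple

Mask = List[List[int]]  # assumed convention: 1 = tampered pixel, 0 = authentic pixel
BBox = Tuple[int, int, int, int]  # (top, left, bottom, right), all inclusive


def mask_to_bbox(mask: Mask) -> Optional[BBox]:
    """Return the tightest box enclosing all tampered pixels, or None if there are none."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    cols = [c for row in mask for c, value in enumerate(row) if value]
    if not rows:
        return None
    return min(rows), min(cols), max(rows), max(cols)


def bbox_to_mask(bbox: BBox, height: int, width: int) -> Mask:
    """Fill the whole box with 1s; this is coarser than a pixel-level mask."""
    top, left, bottom, right = bbox
    return [
        [1 if top <= r <= bottom and left <= c <= right else 0 for c in range(width)]
        for r in range(height)
    ]


if __name__ == "__main__":
    mask = [
        [0, 0, 0, 0],
        [0, 1, 1, 0],
        [0, 0, 1, 0],
        [0, 0, 0, 0],
    ]
    box = mask_to_bbox(mask)
    print(box)
    print(bbox_to_mask(box, 4, 4))
```

In this toy case the mask marks three pixels, while the box derived from it covers four, which illustrates why a binary mask is the more precise of the two output forms.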
In contrast to some existing reviews on digital forensics [6–8], the classification perspective and focus of this paper differ in three respects: 1) This paper focuses on the image tampering detection problem, and thus does not dwell on the full range of forensic methods, such as image traceability forensics and image tampering localization. 2) This paper focuses on image tampering detection methods based on deep learning, and does not cover traditional tampering detection methods in depth. 3) This paper is driven by the demands of tampering targets, and therefore organizes methods by tampering detection task rather than by deep network architecture.
2 Relevant Knowledge
With the rapid development of network communication and multimedia technology, the security risks of digital images are becoming more and more serious. Therefore, forensic research on digital image tampering has become very important. At present, three terms are mainly used to describe the modification of digital images. Image manipulation, also known as image editing, refers to all operations performed on digital images through computer software. Image forgery, a subset of image manipulation, refers to modifying images in order to convey deceptive information. Image tampering changes part of an image in order to hide an object in the scene or to add a new object. The relationship between the three image modification methods is shown in Fig. 2.
2.1 Image Tampering Technology
To investigate the tampering of digital images forensically, we first need to understand how digital images can be tampered with. Linna made a fairly comprehensive summary and classification of the tampering methods for digital images. In this paper, we take into account the increasingly innovative tampering methods of recent years and classify the specific tampering methods into eight major categories. A brief description of each category is given below.
(1) Composited: a compositing operation combines parts of several images into a single image in order to create a false visual impression for the viewer.
(2) Re-touched: this mainly refers to using image editing tools to beautify, stretch, or smooth (e.g., skin retouching) the content of the image, in order to hide important details of the image or to repair damaged images. It is widely supported by image editing tools such as Photoshop.
(3) Computer Generated: these are images generated in specialized software using computer code. With the progress of science and technology, computer-generated pictures can already pass for real photographs.
(4) Morphed: morphing gradually changes one image into another. The feature points shared by the two images are found first, and the two images are then superimposed with different weights to obtain different intermediate images, yielding a tampered image with the features of both.
(5) Enhanced: this mainly changes the brightness, lighting, contrast, and color levels of the image in order to highlight certain areas. It usually does not change the content of the image, but only improves its overall visual appeal.
(6) Painted: this is an image drawn with drawing software (such as Photoshop and CAD) or other drawing tools. Tamperers often use such images in commercial promotion, which can mislead people in daily life.
(7) Rebroadcast: this refers to using an image acquisition device to re-capture an image that is needed but difficult to obtain, producing a new digital image. The re-captured images can deceive people and be used for improper purposes.
(8) Stego Image: an image or text that needs to be transmitted or concealed is hidden in a carrier image, so that an intermediary or observer cannot judge from the carrier image alone whether it contains hidden information, thus achieving secure transmission of secret information.
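The weighted superposition described for morphing in item (4) can be illustrated with a simple cross-dissolve. This is only a sketch under simplifying assumptions: both images are grayscale, have the same size, and are already aligned, so the feature-point matching and warping steps are omitted. The function name `cross_dissolve` and the weight parameter `t` are illustrative.

```python
from typing import List

Image = List[List[float]]  # grayscale image as rows of intensities


def cross_dissolve(a: Image, b: Image, t: float) -> Image:
    """Blend two aligned images: t = 0 returns a, t = 1 returns b."""
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")
    return [
        [(1.0 - t) * pa + t * pb for pa, pb in zip(row_a, row_b)]
        for row_a, row_b in zip(a, b)
    ]


if __name__ == "__main__":
    first = [[0.0, 100.0]]
    second = [[200.0, 100.0]]
    # A sequence of weights yields the intermediate images of the morph.
    for weight in (0.0, 0.25, 0.5, 1.0):
        print(weight, cross_dissolve(first, second, weight))
```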
2.2 Image Forensic Technology
Forensic technology for digital image tampering identifies the authenticity, integrity, and origin of an image by analyzing its characteristics. In other words, it judges whether the content of the image is still genuine after it was generated by the imaging device, whether the image has been tampered with, and what kind of device generated it. According to existing work, forensic technology for digital image tampering is mainly divided into active forensics and passive forensics. The specific classification is shown in Fig. 3. The following subsections introduce the classification and methods of image forensics in detail.
2.2.1 Image Active Forensics Technology
The main feature of active image forensics is that secret information must be embedded in the image beforehand. After receiving the image, the receiver extracts the watermark or digital signature and judges from its condition whether the image has been tampered with.
In 1994, Schyndel defined the concept of “digital watermark” for the first time and proposed a technique that embeds cryptographic information in images in a way that is invisible to the human eye. Digital watermarking contains two main parts: a watermark embedding module, and a watermark extraction and verification module, as shown in Fig. 4.
The principle of digital signature technology is similar to that of digital watermarking. Active forensics based on digital signatures computes a digest of the image and encrypts it to form a digital signature. To verify the authenticity of an image, the digest is extracted from the received image and a digital signature is generated again; comparing this signature with the original one determines whether the image has been tampered with. The specific process is shown in Fig. 5.
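The digest-and-compare process described above can be sketched with the Python standard library. Note the assumptions: the image is represented as raw bytes, SHA-256 is chosen as the digest, and a keyed HMAC stands in for the asymmetric digital signature that a real system would use (the standard library has no public-key signing). The function names are illustrative.

```python
import hashlib
import hmac


def image_digest(pixels: bytes) -> bytes:
    """Compute a fixed-length digest of the raw image bytes."""
    return hashlib.sha256(pixels).digest()


def sign_image(pixels: bytes, key: bytes) -> bytes:
    """Protect the digest with a secret key (HMAC as a stand-in for a signature)."""
    return hmac.new(key, image_digest(pixels), hashlib.sha256).digest()


def verify_image(pixels: bytes, signature: bytes, key: bytes) -> bool:
    """Recompute the signature from the received image and compare in constant time."""
    return hmac.compare_digest(sign_image(pixels, key), signature)


if __name__ == "__main__":
    key = b"demo-key"                   # hypothetical shared secret
    original = bytes([10, 20, 30, 40])  # toy 4-pixel grayscale image
    signature = sign_image(original, key)

    tampered = bytes([10, 20, 31, 40])  # a single pixel changed
    print(verify_image(original, signature, key))
    print(verify_image(tampered, signature, key))
```

Because the digest depends on every byte, changing a single pixel invalidates the signature. This is also why such schemes only work when the signature was created before distribution, which motivates passive forensics.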
However, in practical applications, due to various reasons, the watermark may not be embedded in the image in time, which will affect the use of the image. Therefore, the passive forensics technology of digital image tampering has become a research direction with great research value at home and abroad.
2.2.2 Image Passive Forensics Technology
Passive image forensics is also known as digital image blind forensics, where “blind” means that forensics can be performed directly on the image without a pre-embedded digital watermark or signature. It is therefore more widely applicable than active forensics.
Image traceability forensics refers to identifying the acquisition device of an unknown image. Lukas et al. were the first to propose a pattern noise approach to identify the source of an image; in their paper, source identification was performed for nine different devices with a recognition rate of 100%. Swaminathan et al. [16,17] used CFA (Color Filter Array) interpolation features and an SVM to classify images generated by 19 different devices, reaching an accuracy of 85%. Choi et al. extracted the lens distortion in images to identify their source, and reached an accuracy of 92% when classifying images from three devices. Images generated by artificial intelligence are now prevalent, and the main identification methods proposed so far are based on imaging devices [19,20], geometric features, and statistical features [22,23].
Deep learning methods can treat passive image forensics as an object detection problem or an anomaly detection problem. In recent years, more scholars have sought to exploit the adaptability of deep learning so that models extract effective features automatically. However, the available datasets are small and tampering methods are diverse, which remains the main obstacle for deep learning methods in image forensics. Most current image tampering modifies the image content, for example by splicing [24,25], copy-move [26–28], or image restoration (inpainting) [29,30]. The following subsection describes the datasets and evaluation indicators.
2.3 Data Set and Evaluation Indicators
2.3.1 Data Set
Collecting and constructing image datasets suitable for tampering localization is not easy. In tasks such as image manipulation detection, large amounts of data can often be generated by programs that batch process images, but it is difficult to obtain a high-quality dataset of tampered images in a similar way. This is because the images in the dataset are supposed to reflect actual tampering objectively, which requires that the modifications to the original image really distort its semantics and that the resulting image contains no obvious visual anomalies. At the same time, to support the training of the classifier, pixel-level labels must be provided for each tampered image. This in turn makes it difficult to use the potentially large number of tampered images on the Internet directly as a dataset. In summary, a more desirable way to construct tampered image datasets is to generate tampered images manually under controlled conditions. Some tampered image datasets are currently publicly available, and the relevant information is summarized in Tab. 1.
It is worth noting that deep learning based approaches require large amounts of data, typically tens of thousands of images. Therefore, some deep learning based image tampering localization work uses other datasets to generate a large number of tampered images automatically as training data [47,48], but the quality of such automatically generated tampered images is not high. According to the available literature, datasets such as MSCOCO, Dresden, ImageNet, MITPlaces, and SUN are commonly used as raw material for automatically generated tampered images.
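To make the idea of automatically generated training data concrete, the sketch below pastes a rectangular patch from a donor image into a source image and emits the matching pixel-level label mask. It is an illustrative toy, not the procedure of [47,48]: images are plain lists of grayscale rows, the patch is rectangular, and no blending or post-processing is applied, which is one reason such samples look less realistic than manually crafted ones. The name `splice` and its parameters are hypothetical.

```python
from typing import List, Tuple

Image = List[List[int]]


def splice(
    source: Image,
    donor: Image,
    box: Tuple[int, int, int, int],
    dest: Tuple[int, int],
) -> Tuple[Image, Image]:
    """Paste donor[top:top+h, left:left+w] into a copy of source at dest.

    Returns (tampered_image, mask); mask is 1 where pixels were replaced.
    """
    top, left, height, width = box
    dest_row, dest_col = dest
    tampered = [row[:] for row in source]        # leave the source untouched
    mask = [[0] * len(row) for row in source]
    for r in range(height):
        for c in range(width):
            tampered[dest_row + r][dest_col + c] = donor[top + r][left + c]
            mask[dest_row + r][dest_col + c] = 1
    return tampered, mask


if __name__ == "__main__":
    src = [[0] * 4 for _ in range(4)]
    don = [[9] * 4 for _ in range(4)]
    image, label = splice(src, don, box=(0, 0, 2, 2), dest=(1, 1))
    for image_row, label_row in zip(image, label):
        print(image_row, label_row)
```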
2.3.2 Evaluation Indicators
As mentioned earlier, image tampering localization is essentially a pixel-level binary classification problem. Therefore, the performance of a tampering localization model can be measured with commonly used classification metrics, including Accuracy (ACC), F1-score, Area Under the Curve (AUC), Matthews Correlation Coefficient (MCC), and Intersection over Union (IoU).
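As an illustration of how these metrics follow from the pixel-level confusion counts, the sketch below computes ACC, F1-score, MCC, and IoU for a predicted binary mask against a ground-truth mask, using their standard definitions. AUC is omitted because it needs per-pixel scores rather than hard decisions. The function names and the convention of returning 0.0 for an undefined ratio are assumptions of this sketch.

```python
import math
from typing import Dict, List, Tuple

Mask = List[List[int]]  # 1 = tampered, 0 = authentic


def confusion_counts(pred: Mask, truth: Mask) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn) counted over all pixels."""
    tp = fp = tn = fn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1
            elif p and not t:
                fp += 1
            elif not p and t:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def localization_metrics(pred: Mask, truth: Mask) -> Dict[str, float]:
    tp, fp, tn, fn = confusion_counts(pred, truth)
    total = tp + fp + tn + fn

    def ratio(num: float, den: float) -> float:
        return num / den if den else 0.0  # assumed convention for empty cases

    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "acc": ratio(tp + tn, total),
        "f1": ratio(2 * tp, 2 * tp + fp + fn),
        "iou": ratio(tp, tp + fp + fn),
        "mcc": ratio(tp * tn - fp * fn, mcc_den),
    }


if __name__ == "__main__":
    truth = [[1, 1, 0, 0], [0, 0, 0, 0]]
    pred = [[1, 0, 1, 0], [0, 0, 0, 0]]
    print(localization_metrics(pred, truth))
```

In this toy example accuracy is 0.75 even though only half of the tampered pixels are found, because authentic pixels dominate. This is why F1-score, MCC, and IoU are usually reported alongside accuracy for tampering localization.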
3 Splicing Tampering Detection Based on Deep Learning
3.1 Splicing Detection Based on Single Tampering
Image splicing refers to pasting a part of a donor image into a source image to generate a new tampered image. Compared with the detection of other kinds of image content tampering, splicing detection is simpler: because different images carry different feature information, the contrast between the spliced area and the authentic area is usually obvious, and relatively more features can be exploited.
In 2016, Zhang et al. applied deep learning to passive image forensics for the first time and proposed a deep learning image forensics method based on Daubechies wavelet features. However, the tampered region was only coarsely localized and the recognition accuracy was low. In 2015, Long et al. had proposed a fully convolutional network for semantic segmentation, which achieved pixel-level classification. Inspired by this, in 2017 Salloum et al. proposed a multi-task fully convolutional network (MFCN) with edge enhancement, a passive forensics framework for pixel-level segmentation of tampered regions.
Since image splicing joins regions from two different images, distinguishing the source of the donor region is a key issue. In 2021, Niu et al. proposed an end-to-end system for splicing detection and localization in double-JPEG images, which can also distinguish regions that come from different donor images. The proposed method works in a wide variety of settings, including aligned and unaligned double JPEG compression, and outperforms baseline methods working under similar conditions.
Almost all of the above methods use deep network models, which require densely labeled, pixel-level image data for training. On the one hand, it is impractical to construct a training set that represents the myriad of tampering possibilities. On the other hand, such methods are often limited on social media platforms or in commercial applications. In 2022, Agrawal et al. proposed Self-Supervised Image Signature Learning (SISL), which trains a splicing detection and localization model from the frequency transform of the image, as shown in Fig. 6. Experiments demonstrate that the model performs similarly to or better than multiple existing methods on standard datasets without relying on labels or metadata.
3.2 Splicing Detection Based on Constraint Image
Due to the complexity and diversity of current tampering techniques, it is difficult to extract effective generic features from a single image for learning. Wu et al. therefore extended the splicing detection task: the original task of detecting a single tampered image was extended to a similarity matching task between source and donor images, called the constrained image splicing detection (CISD) task.
Yue Wu et al. proposed a structure called the deep matching and validation network (DMVN). It is worth mentioning that the authors introduce an attention mechanism at the end of the framework: based on the obtained mask, the features of the tampered region are extracted again for visual consistency verification, which further improves the accuracy of region segmentation. Although the accuracy of detection methods based on Convolutional Neural Networks (CNN) has been improving, the performance of existing detection methods is still unsatisfactory. In 2019, Bi et al. proposed the Ringed Residual U-Net (RRU-Net) based on the existing U-Net, as shown in Fig. 7. Residual propagation recalls the feature information of the source and donor images to solve the gradient degradation problem in deep networks; residual feedback consolidates the input feature information so that the differences in image attributes between the tampered and untampered regions become more obvious. RRU-Net reaches an F1-score of 84.1% on CASIA v2 and 91.5% on COLUMB.
Inspired by the DMVN of Yue Wu et al., in 2019 Liu et al. proposed a Deep Matching network based on Atrous Convolution (DMAC), which also takes a tampered image and a donor image as input. The model is based on the GAN framework and is divided into the DMAC network, a category detection network, and an area localization network. Compared with DMVN, the model further improves recognition accuracy.
4 Copy-move Tampering Detection Based on Deep Learning
An image copy-move operation copies an area of an image and pastes it into the same image. Copy-move operations are often used to cover an area in an image so that real and fake content are hard to tell apart. Like splicing, this method tampers with the content of the image, but it is much harder to detect. Because copy-move is an operation within a single image, the authentic area and the tampered area have very similar statistical properties, so the inherent properties of the imaging device and most image statistics cannot be used. Currently, image copy-move detection techniques can be divided into two categories: 1) those based on region boundary artifacts, and 2) those based on region similarity.
4.1 Copy-move Detection Based on Region Boundary
An image that has undergone a copy-move operation usually has artifacts along the boundary between the tampered area and the authentic area, which distinguishes it from a real image. Detection methods based on boundary artifacts use a convolutional network to extract image boundary information and then classify it with a machine learning classifier.
In 2016, Rao et al. applied deep learning to copy-move detection for the first time. Experiments show that this method can effectively learn boundary artifact features, capture abnormal boundary information, and achieve high classification accuracy. In 2017, Ouyang et al. proposed a copy-move detection method based on a convolutional neural network. Since copy-move datasets are too small, the model is first pre-trained on ImageNet, the network parameters are then fine-tuned on the smaller set of copy-move training samples, and the final model classifies images as real or fake. In 2020, Kumar et al. used deep semantic image inpainting and copy-move forgery algorithms to create a synthetic forgery dataset. They then used an unsupervised domain adaptation network to detect copy-move forgery in new domains by mapping the feature space of the synthesized dataset, improving the F1-scores on the CASIA and CoMoFoD datasets to 80.3% and 78.8%, respectively.
Although methods based on region boundary artifacts are more in line with human visual habits, it is very difficult for deep learning networks to extract such subtle boundary artifact information. Therefore, recognition models built this way can only classify whole images as real or fake, and cannot achieve pixel-level region segmentation.
4.2 Copy-move Detection Based on Regional Similarity
The essence of the copy-move operation is to copy an area of an image and paste it into the same image. An image generated this way must therefore contain two highly similar regions, so researchers have proposed detection methods based on regional similarity, which closely resemble the Constrained Image Splicing Detection (CISD) task in the splicing detection problem.
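The observation that a copy-move image contains two matching regions can be illustrated with a classical exact block-matching search. This is deliberately not one of the deep learning methods surveyed here: it is a toy baseline that assumes the copied region was pasted without any rescaling, rotation, or compression, so identical pixel blocks reveal the duplication. The function name `find_duplicate_blocks` and the block size are illustrative.

```python
from collections import defaultdict
from typing import DefaultDict, List, Tuple

Image = List[List[int]]
Position = Tuple[int, int]  # (row, col) of a block's top-left corner


def find_duplicate_blocks(image: Image, block: int = 2) -> List[Tuple[Position, Position]]:
    """Return pairs of non-overlapping block positions with identical pixels."""
    height, width = len(image), len(image[0])
    seen: DefaultDict[tuple, List[Position]] = defaultdict(list)
    for r in range(height - block + 1):
        for c in range(width - block + 1):
            key = tuple(tuple(image[r + i][c:c + block]) for i in range(block))
            seen[key].append((r, c))

    pairs = []
    for positions in seen.values():
        for i in range(len(positions)):
            for j in range(i + 1, len(positions)):
                a, b = positions[i], positions[j]
                # Keep only pairs whose blocks do not overlap.
                if abs(a[0] - b[0]) >= block or abs(a[1] - b[1]) >= block:
                    pairs.append((a, b))
    return pairs


if __name__ == "__main__":
    size = 6
    img = [[r * size + c for c in range(size)] for r in range(size)]  # all pixels unique
    for i in range(2):          # copy the 2x2 block at (0, 0) to (3, 3)
        for j in range(2):
            img[3 + i][3 + j] = img[i][j]
    print(find_duplicate_blocks(img, block=2))
```

Exact matching breaks as soon as the pasted region is post-processed, which is one motivation for learning similarity features with a network instead.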
In 2018, Wu et al. proposed an image copy-move forgery detection framework. This method achieved pixel-level copy-move detection for the first time, and its recognition accuracy exceeds that of traditional detection methods. Soon afterwards, Wu et al. extended this framework by combining the strengths of boundary artifact based methods and region similarity methods, and proposed BusterNet, which can detect both the source target and the tampered target, as shown in Fig. 8. The model is divided into a Mani-Det branch and a Simi-Det branch. It is worth noting that the model fuses the features of the two branches and then classifies the two similar regions at the pixel level to accurately predict the source target and the tampered target. Experiments show that the method is robust and achieves the best results on multiple datasets.
However, the above parallel deep neural network approach still has two limitations: 1) both branches must locate regions correctly; 2) the Simi-Det branch uses only VGG16 with four pooling layers, and therefore extracts single-level, low-resolution features. In 2021, Chen et al. proposed a serial deep neural network approach built from two successive sub-networks, the copy-move similarity detection network (CMSDNet) and the source/target region differentiation network (STRDNet), which addresses the first limitation. In CMSDNet, the fourth (last) pooling layer of VGG16 is removed, and atrous convolution is introduced to preserve the field of view of the filters after this removal. STRDNet takes the similar regions detected by CMSDNet directly and classifies them as tampered or untampered regions at the image level.
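The role of atrous (dilated) convolution in preserving the field of view can be seen in a one-dimensional toy example. The sketch below is not the CMSDNet implementation; it only shows that a kernel of fixed size covers a wider input span when its taps are spaced `dilation` samples apart, which is how resolution can be kept after a pooling layer is removed without shrinking the receptive field. Function names are illustrative.

```python
from typing import List


def receptive_field(kernel_size: int, dilation: int) -> int:
    """Input span covered by one output sample of a dilated kernel."""
    return (kernel_size - 1) * dilation + 1


def dilated_conv1d(signal: List[float], kernel: List[float], dilation: int = 1) -> List[float]:
    """Valid (no padding) 1-D correlation with taps spaced `dilation` apart."""
    span = receptive_field(len(kernel), dilation)
    return [
        sum(k * signal[start + i * dilation] for i, k in enumerate(kernel))
        for start in range(len(signal) - span + 1)
    ]


if __name__ == "__main__":
    x = [1, 2, 3, 4, 5, 6, 7]
    w = [1, 1, 1]
    print(receptive_field(3, 1), dilated_conv1d(x, w, dilation=1))
    print(receptive_field(3, 2), dilated_conv1d(x, w, dilation=2))
```

With the same three weights, a dilation of 2 widens the span seen by each output from 3 to 5 input samples, without adding parameters or downsampling the input.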
This paper summarizes image forgery localization methods based on deep learning. In particular, we organize these methods by the network architecture they use. Different network architectures have their own characteristics and advantages, which provide a variety of choices when designing tampering localization methods for specific problems. Deep learning technology is still developing, which brings many challenges and opportunities to image forgery localization. This paper also introduces the datasets and performance evaluation metrics commonly used in image forgery localization, and discusses current issues and some possible research directions in this field. This helps readers to grasp the research trends in image forgery localization.
Looking to the future, we will continue to explore the long road of fighting image tampering and forgery, and keep enriching the toolbox of digital image forensics to safeguard the security of multimedia information.
Funding Statement: This work is supported by Key Projects of Innovation and Entrepreneurship Training Program for College Students in Jiangsu Province of China (202210300028Z).
Conflicts of Interest: The authors declare that they have no conflicts of interest to report regarding the present study.
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.