As one of the most popular digital image manipulations, contrast enhancement (CE) is frequently applied to improve the visual quality of forged images and to conceal traces of forgery; detecting it can therefore provide evidence of tampering when verifying the authenticity of digital images. CE forensics has long drawn significant attention from the image forensics community. Although most approaches achieve effective detection results, existing CE forensic methods perform poorly on enhanced images stored in the JPEG format, and detecting contrast adjustment in the presence of JPEG post-processing remains a challenging task. In this paper, we propose a new CE forensic method based on a convolutional neural network (CNN) that is robust to JPEG compression. The proposed network relies on an Xception-based CNN with two preprocessing strategies. First, unlike conventional CNNs, which accept the original image as input, we feed the CNN with the gray-level co-occurrence matrix (GLCM) of the image, which contains CE fingerprints. Second, a constrained convolutional layer extracts high-frequency details from the GLCMs under JPEG compression. The output of the constrained convolutional layer then becomes the input of Xception, which extracts multiple features for classification. Experimental results show that the proposed detector achieves the best performance for CE forensics under JPEG post-processing compared with existing methods.
With the rapid development of image editing techniques and media processing software, malicious users can easily generate forged images with powerful editing tools such as Photoshop. In addition, social networking has accelerated the dissemination of forged images, which may have a detrimental effect on society. Hence, verifying the originality and integrity of images has become increasingly important. To determine whether an image has been modified and to understand what happened during tampering, many forensic techniques have been developed for detecting different types of image manipulation.
Contrast enhancement (CE), one of the most popular and efficient image processing operations, is frequently used by malicious image attackers to conceal traces of forgery. CE can improve the visual quality of forged images by eliminating inconsistent brightness in tampered images. As a consequence, verifying the authenticity and integrity of digital images through CE forensics has always been of great interest to the image forensics community.
Many techniques have been proposed in recent years to detect contrast-enhanced images. Early CE forensic methods are based on the image histogram. Stamm et al. [
With the development of deep learning techniques, many deep learning-based methods [
However, all of these methods fail to detect CE in images that have been contrast enhanced and then JPEG compressed. Because JPEG is the most common image format, contrast-enhanced images are in practice distributed in compressed form to reduce storage and network traffic overheads. JPEG compression alters the CE fingerprints that are effective in uncompressed images, so classifying CE images becomes much more difficult, and handcrafted feature-based approaches struggle to capture effective fingerprints. Within a data-driven learning framework, Barni et al. [
To improve the JPEG robustness of CE forensics over a range of quality factors (QFs), we propose a novel Xception-based CNN detector. The network is fed directly with the GLCM of the image instead of its pixels, which suppresses the interference of image content while retaining traceable features for CE forensics. Because JPEG compression weakens the CE artifacts in the GLCM and makes CE more difficult to detect, we add a constrained convolutional (Cons-conv) layer [
The remainder of the paper is organized as follows. Section 2 describes the proposed CNN-based network architecture. Section 3 presents the experimental settings and results. Section 4 concludes the paper.
In traditional computer vision research, classification tasks tend to learn features from image content. In manipulation forensics, by contrast, the task is to extract the traces left by image operations rather than the content itself, so image content becomes redundant information. A traditional CNN therefore cannot be applied directly to image forensics. To solve this problem, preprocessing layers are usually added to the network.
As presented in
Xception is a deep convolutional neural network architecture inspired by Inception [
The network is described and analyzed in detail below.
As mentioned earlier, a traditional CNN tends to learn features related to image content. When such a CNN is used directly for image manipulation forensics, the classifier learns to identify the scene content associated with the training data. CE changes only the brightness and contrast of an image, not its content. Therefore, we use the GLCM as a preprocessing step instead of feeding the image directly.
A gray-level co-occurrence matrix [
where
When CE is applied, the brightness range of the image contracts or expands, so previously adjacent pixel values are either mapped to the same value or mapped further apart. This produces peaks and empty rows and columns in the GLCM of the CE image. The GLCM is a second-order statistic of an image, so it carries more information than the histogram, and it always has the same size regardless of the resolution of the input image. As shown in
Hence, we use the GLCM of the image as the input of the network, which suppresses the interference caused by image content and also contains traceable fingerprints left by CE.
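The effect described above can be illustrated with a minimal sketch. It is not the authors' implementation: the horizontal offset `(dx, dy) = (1, 0)`, the use of gamma correction as the CE operation, the value `gamma = 0.6`, and the random test image are all assumptions made for illustration only.

```python
import numpy as np

def glcm(image: np.ndarray, dx: int = 1, dy: int = 0, levels: int = 256) -> np.ndarray:
    """Count co-occurrences of gray levels (i, j) at a non-negative pixel offset (dx, dy)."""
    h, w = image.shape
    # Reference pixels and their neighbours shifted by dy rows and dx columns.
    ref = image[: h - dy, : w - dx].ravel()
    nbr = image[dy:, dx:].ravel()
    counts = np.zeros((levels, levels), dtype=np.int64)
    # np.add.at accumulates correctly even when the same (i, j) pair repeats.
    np.add.at(counts, (ref, nbr), 1)
    return counts

def gamma_correct(image: np.ndarray, gamma: float) -> np.ndarray:
    """Gamma correction, used here as one example of contrast enhancement."""
    scaled = 255.0 * (image / 255.0) ** gamma
    return np.round(scaled).astype(np.uint8)

# Hypothetical 8-bit test image; a real experiment would use natural images.
rng = np.random.default_rng(0)
original = rng.integers(0, 256, size=(128, 128), dtype=np.uint8)
enhanced = gamma_correct(original, gamma=0.6)

g_org = glcm(original)
g_ce = glcm(enhanced)

# Gray levels that never occur as a reference pixel give all-zero GLCM rows.
empty_rows_org = int((g_org.sum(axis=1) == 0).sum())
empty_rows_ce = int((g_ce.sum(axis=1) == 0).sum())
print(g_org.shape, empty_rows_org, empty_rows_ce)
```

The matrix is 256 × 256 whatever the image resolution, and the enhanced image shows more empty rows than the original because the gamma mapping skips some output gray levels.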
With JPEG post-processing, feeding the GLCM directly into a traditional CNN does not give the expected performance. A likely reason is that JPEG compression reduces the gaps and peaks in the GLCM of a CE image, which causes a tremendous loss of CE fingerprints in the GLCM, so the difference between the CE images and the CE images with JPEG post-processing (CE-JPEG) is narrowed. As shown in
The constrained convolutional layer [
To verify that the constrained convolutional layer actually captures high-frequency components from the input GLCM, the feature maps of the Cons-conv layer are displayed in
The constrained convolutional layer can adaptively extract high-frequency components and amplify the details of the GLCM, which enlarges the difference between the two kinds of images. The corresponding experimental results are shown in Section 3.2.
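As a rough illustration of why such a layer responds to high-frequency detail, the sketch below assumes the constraint commonly used for constrained convolutional layers in manipulation detection: the centre weight is fixed to -1 and the remaining weights are normalized to sum to 1. The 5 × 5 kernel size, the helper names `apply_constraint` and `filter_valid`, and the toy inputs are assumptions, not details taken from this paper.

```python
import numpy as np

def apply_constraint(kernel: np.ndarray) -> np.ndarray:
    """Project a square filter so that the centre is -1 and the other weights sum to 1."""
    w = np.asarray(kernel, dtype=np.float64).copy()
    c = w.shape[0] // 2
    w[c, c] = 0.0
    # Assumes the off-centre weights do not sum to zero.
    w /= w.sum()
    w[c, c] = -1.0
    return w

def filter_valid(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Valid-mode cross-correlation, which is what a CNN convolutional layer computes."""
    k = w.shape[0]
    windows = np.lib.stride_tricks.sliding_window_view(x, (k, k))
    return (windows * w).sum(axis=(-2, -1))

rng = np.random.default_rng(0)
# Positive random weights stand in for learned weights (hypothetical values).
w = apply_constraint(rng.uniform(0.1, 1.0, size=(5, 5)))

flat = np.full((16, 16), 7.0)   # constant region: no high-frequency content
spike = flat.copy()
spike[8, 8] += 10.0             # isolated peak, similar to a GLCM peak

resp_flat = filter_valid(flat, w)
resp_spike = filter_valid(spike, w)
print(np.abs(resp_flat).max(), resp_spike[6, 6])
```

Because the constrained weights sum to zero, constant regions give a near-zero response while isolated peaks and gaps are passed through. In the formulation assumed here, the projection is re-applied after each weight update during training, so the filter stays a prediction-error (high-pass) filter while its coefficients remain learnable.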
In the experiments, 10000 raw images were obtained from BossBase v1.01 [
To obtain the JPEG-compressed datasets, we compressed ORG with QF = {50, 70, 90, 95} to get ORG-50, ORG-70, ORG-90, and ORG-95, and applied the same processing to ORG-CE to get ORG-CE-50, ORG-CE-70, ORG-CE-90, and ORG-CE-95. We also compressed ORG and ORG-CE with all three QFs 50, 70, and 90 to get ORG-Mix and ORG-CE-Mix. This yields one positive-negative set of experimental data for each compression setting.
For each positive-negative set, we randomly selected 16000 ORG-JPEG images and 16000 ORG-CE-JPEG images as the training set and 2000 ORG-JPEG images and 2000 ORG-CE-JPEG images as the validation set, and assigned the remaining 2000 pairs to the test set. Each grayscale GLCM, of size 256 × 256 (the maximum gray level is 255), is fed into the network. The final fully connected layer has 2048 input neurons and 2 output neurons and is followed by a softmax layer.
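The split described above can be sketched as follows. This is a minimal illustration: splitting by pair index, so that an image and its enhanced counterpart land in the same subset, is an assumption, as are the helper name `split_pairs` and the fixed seed.

```python
import random

def split_pairs(pair_ids, n_train=16000, n_val=2000, seed=0):
    """Shuffle pair indices and cut them into train, validation, and test subsets."""
    ids = list(pair_ids)
    random.Random(seed).shuffle(ids)
    train = ids[:n_train]
    val = ids[n_train:n_train + n_val]
    test = ids[n_train + n_val:]   # whatever remains after train and validation
    return train, val, test

# 20000 hypothetical pair indices, each standing for one ORG-JPEG / ORG-CE-JPEG pair.
train_ids, val_ids, test_ids = split_pairs(range(20000))
print(len(train_ids), len(val_ids), len(test_ids))
```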
All experiments were run on a machine equipped with an Nvidia GeForce RTX 2080 GPU. We set the batch size for training and testing to 32 and the maximum number of training epochs to 30. The parameters of the Adam solver were set as follows: the momentum is 0.99, and the learning rate is initialized to 0.001 and multiplied by 0.1 after 10 epochs, so that it decreases gradually as training proceeds.
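One way to read this schedule is as a step decay. The sketch below assumes that the factor 0.1 is applied every 10 epochs across the 30 training epochs; the text does not state this explicitly, and the function name `learning_rate` is hypothetical.

```python
def learning_rate(epoch: int, base_lr: float = 1e-3, factor: float = 0.1, step: int = 10) -> float:
    """Step decay: multiply the base rate by `factor` once per completed block of `step` epochs."""
    return base_lr * factor ** (epoch // step)

# Learning rate for each of the 30 epochs (epochs are counted from 0).
schedule = [learning_rate(epoch) for epoch in range(30)]
print(schedule[0], schedule[10], schedule[20])
```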
In this section, we evaluate the performance of the proposed method for CE forensics under JPEG compression. We also compare it with other CNN-based detectors, MISLnet [
The detection results of the experiments are presented in
Method | Q = 50, 70, 90 | Q = 50 | Q = 70 | Q = 90 | Q = 95 |
---|---|---|---|---|---|
SRM [ | 0.5892 | 0.5675 | 0.5680 | 0.6148 | 0.6648 |
MISLnet [ | 0.4952 | 0.4905 | 0.4848 | 0.4700 | 0.5080 |
SunNet [ | 0.6205 | 0.6315 | 0.6410 | 0.7053 | 0.8547 |
BarniNet [ | 0.5003 | 0.5523 | 0.5668 | 0.6577 | 0.7076 |
Proposed |
In this section, to verify the contribution of the GLCM and the Cons-conv layer to feature extraction and classification, we test the performance of the proposed model with only one of the two preprocessing steps, that is, using either the GLCM or the Cons-conv layer alone.
The results of the experiments are shown in
Principal component analysis (PCA) is a helpful algorithm in statistical signal processing because it can reduce the dimensionality of datasets for purposes such as data interpretation.
Method | Q = 50, 70, 90 | Q = 50 | Q = 70 | Q = 90 | Q = 95 |
---|---|---|---|---|---|
Without GLCM | 0.5305 | 0.5332 | 0.5340 | 0.5450 | 0.6076 |
Without Cons-conv | 0.6617 | 0.6552 | 0.6572 | 0.7308 | 0.8693 |
Proposed |
In this paper, we propose a novel Xception-based network for the challenging task of detecting contrast-adjusted images in the presence of JPEG post-processing. The overall framework is designed to make CE detection more robust when JPEG post-processing exists. We use the GLCM of the image as the input of the network, because the GLCM suppresses the interference of image content and retains the traces of CE. A constrained convolutional layer placed in front of Xception extracts the high-frequency components of the GLCMs under JPEG compression and feeds them into Xception. As a powerful CNN, Xception then extracts higher-level manipulation detection features for classification. Experimental results show that the proposed method performs considerably better than existing methods and that GLCM-based preprocessing plays an important role in improving CE forensics under JPEG post-processing.