Image Denoising with GAN Based Model

Abstract: Image denoising is often used as a preprocessing step in computer vision tasks, where it can improve the accuracy of image processing models. Because imaging systems, transmission media and recording equipment are imperfect, digital images are often contaminated with various kinds of noise during their formation, which degrades their visual quality and can even hinder normal recognition. Noise directly affects edge detection, feature extraction, pattern recognition and similar processing steps, and this bottleneck is difficult to overcome by modifying the downstream model alone. Many traditional filtering methods perform poorly because they do not adapt to specific images. Meanwhile, deep learning opens up new possibilities for image denoising. In this paper, we propose a novel neural network for image denoising based on generative adversarial networks. Inspired by U-net, our method employs a symmetrical encoder-decoder generator network. The encoder uses convolutional layers to extract features, while the decoder uses deconvolutional layers to output the noise in the image. Specifically, shortcuts are added between designated layers to preserve image texture details and prevent gradient explosions. In addition, to improve the training stability of the model, we add the Wasserstein distance to the loss function. We evaluate our model with the peak signal-to-noise ratio (PSNR), and the experimental results demonstrate its effectiveness. Compared with state-of-the-art approaches, our method achieves competitive performance.


Introduction
Noise often appears in an image as isolated pixels or pixel blocks that cause strong visual artifacts. Generally, the noise signal is unrelated to the object being studied. It is useless information that disturbs the observable content of the image and is a major obstacle to image processing. Image denoising is therefore a very promising task in graphics, whose purpose is to restore the original signal from the noisy image while preserving details.
In recent decades, various image denoising methods have been proposed, based either on traditional techniques or on machine learning. Traditional algorithms can be roughly divided into two categories: spatial-domain noise reduction and frequency-domain noise reduction. Assuming that random noise sums to zero in the spatial domain, the former computes a new center pixel from the adjacent gray values within a receptive field of a certain size, thereby offsetting the image noise. In contrast, the latter first transforms the image from the spatial domain to the frequency domain and then divides the noise into high-, medium- and low-frequency components, so that noise of different frequencies can be separated. However, many traditional filtering methods are poorly targeted: they do not adapt to specific images. Moreover, details and edge information are often lost during noise reduction.
Recent studies suggest that deep learning has revolutionized the computer vision and pattern recognition community. In particular, Convolutional Neural Networks (CNN) have demonstrated excellent performance in image processing tasks, especially image recognition, object detection, lossy image compression, image super-resolution and image denoising. In addition, Generative Adversarial Networks (GAN) [1] have been widely applied to image processing since they were proposed. Many summaries of training techniques and articles on improving the model itself were published in the second half of 2015 and early 2016, which greatly stimulated the subsequent development of GAN, especially in data generation and related problems.
Inspired by the ideas above, we study GAN and its variants and propose a novel model that generates denoised images from noisy inputs. The model constructs generative adversarial networks, consisting of a generator and a discriminator, that learn how to offset image noise. Specifically, the architecture of our proposed generator consists of two parts: an encoder and a decoder [2]. The encoder uses convolutional layers to extract features, while the decoder uses deconvolutional layers to output the noise in the image. Shortcuts of the kind used in ResNet [3] are added between designated layers to preserve image texture details and prevent gradient explosions. We add the Wasserstein distance to the loss function to improve the training stability of the model. We validate our method using 100 screenshots and show that it achieves competitive performance compared with state-of-the-art approaches.
The remainder of this article is structured as follows. In Section 2, we review current research on image denoising with neural networks. In Section 3, we present the proposed GAN-based model. Finally, we evaluate the experimental results with the peak signal-to-noise ratio (PSNR) and draw conclusions in Section 4.

Related Work
This section briefly describes the existing research in image denoising and related deep learning techniques.

Image Denoising
One can model the observed noisy image as the superposition of two images: one corresponding to the noise and the other to the clean background image. Hence, the input image can be expressed as $I = S + N$, where $I$ is the observed noisy image, $S$ is the non-noisy signal, and $N$ is the noise signal.
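The additive model above can be illustrated with a minimal NumPy sketch. It assumes zero-mean Gaussian noise (the noise type used later in the experiments); the function name `add_gaussian_noise`, the synthetic gradient image and the value `sigma=25.0` are hypothetical choices made for illustration only.

```python
import numpy as np

def add_gaussian_noise(clean, sigma, seed=0):
    """Return (noisy, noise) where noisy = clean + noise and noise ~ N(0, sigma^2)."""
    rng = np.random.default_rng(seed)
    noise = rng.normal(0.0, sigma, size=clean.shape)
    noisy = clean.astype(np.float64) + noise
    return noisy, noise

# A synthetic 64x64 gradient stands in for a clean image (illustrative only).
clean = np.linspace(0.0, 255.0, 64 * 64).reshape(64, 64)
noisy, noise = add_gaussian_noise(clean, sigma=25.0)

# If the noise were known exactly, subtracting it would recover the clean signal.
recovered = noisy - noise
```

The sketch does not clip the noisy image to the valid pixel range. A denoiser that predicts the noise, as the decoder described later does, aims to approximate `noise` so that the same subtraction yields an estimate of the clean image.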
At present, Non-Local Means (NLM) [4] and Block-Matching and 3D filtering (BM3D) [5] are widely used. The NLM algorithm estimates the value of the center pixel of a reference block by taking a weighted average over blocks with a similar structure, thereby reducing noise. The most serious drawback of NLM is that details in the image are removed along with the noise. BM3D exploits the similarity between image blocks, achieving a higher signal-to-noise ratio and excellent visual quality. Its shortcoming is high time complexity.
Additionally, Gu et al. [6] proposed using a set of similar patches to evaluate whether there is noise in the image. Elad et al. [7] proposed K-SVD, a method that removes noise with a sparse dictionary. Liu et al. [8] proposed a novel parallel method for denoising and deduplicating massive web documents.

CNN Based Image Processing
The first complete convolutional neural network model was proposed by LeCun et al. [9]. This multilevel deep learning model has the basic construction of CNNs. Szegedy et al. presented a new, more complex CNN [10]. They proposed a new structure named Inception, which uses dense matrices to approximate sparse matrices. In addition, inspired by Network in Network [11], they use 1x1 convolutions before traditional convolutions to reduce the number of parameters. Taigman et al. presented a CNN method for face recognition that uses face alignment and convolution layers without weight sharing [12]. Because the statistical features of different face areas are not the same, they use local convolutions in the higher layers instead of traditional convolutions.
In recent years, advances in computer hardware have provided fertile soil for the development of deep learning, and many researchers have applied convolutional neural networks to image processing. One class of methods uses CNNs for blind image denoising. Lerenhan et al. [13] proposed a data-driven prior denoising method. Nithish et al. [14] proposed a novel CNN-based model that consists of a multi-scale feature extraction layer, an $l_p$ regularizer, and a three-step training approach.

Generative Adversarial Networks
Goodfellow et al. [1] proposed a novel adversarial network for image generation, which has had a profound impact on deep learning and provides a new idea for image processing. Scott et al. [15] proposed a novel model built on generative adversarial networks. The authors believe that the excellent performance of GANs in image processing can be applied to text modeling. Liu et al. [16] applied multi-scale and multi-class conditions to GANs to generate handwritten characters.
At present, CNNs are the dominant models for image processing applications in deep learning. DCGAN [17] is one of the best attempts at combining CNN with GAN, and Yan et al. use DCGAN for image super-resolution, denoising and deconvolution [18]. However, although DCGAN has an excellent architecture, it does not solve the root cause of the training instability of GAN. During DCGAN training, it is still necessary to carefully balance the generator G and the discriminator D.
Unlike DCGAN, WGAN [19] makes an important contribution to the training stability of GANs through the loss function, and it can achieve good performance even with fully connected layers. WGAN proposes using the Wasserstein distance as the optimization objective for training. To meet the conditions for using the Wasserstein distance, the network clips its weights to a fixed range to enforce Lipschitz continuity. WGAN-GP [20] further enhances the stability of GANs.
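The weight clipping step mentioned above can be sketched in a few lines of NumPy. This is only an illustration of the idea: the helper name `clip_weights`, the randomly initialized weight arrays and the clipping bound `c=0.01` are assumptions, not values taken from this paper.

```python
import numpy as np

def clip_weights(weights, c=0.01):
    """Clamp every parameter array of the critic to the interval [-c, c]."""
    return [np.clip(w, -c, c) for w in weights]

# Hypothetical critic parameters: one weight matrix and one bias vector.
rng = np.random.default_rng(0)
critic_weights = [rng.normal(0.0, 0.1, (4, 4)), rng.normal(0.0, 0.1, (4,))]

# In WGAN-style training, clipping is applied after each critic update.
clipped = clip_weights(critic_weights, c=0.01)
```

Bounding the weights bounds how steeply the critic can change with its input, which is a crude way to enforce the Lipschitz condition; the gradient penalty of WGAN-GP replaces this step.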
Research on denoising with neural networks is ongoing. In this paper, we propose a novel image denoising model that achieves competitive results.

Methodology
In this section, we first give an overview of the proposed model and then describe its structure and loss functions in detail.

Overview of the Proposed Model
The proposed model consists of a generator network and a discriminator network. Fig. 1 shows the architecture of the proposed model: the upper part shows the generator, and the lower part shows the discriminator.
The generator network consists of 12 blocks. The second and fourth convolutional blocks are connected to the 10th and 8th deconvolution blocks, respectively; that is, the intermediate results computed by the second and fourth convolutional layers are passed to the 10th and 8th deconvolution layers. Finally, the feature map produced by the 12th layer is subtracted from the noisy input image to obtain the output. The discriminator network is composed of 6 convolution units and takes the image processed by the generator as input. It scores the image and outputs the score.

Structure of Generator Network
Inspired by the symmetrical structure of U-net, we propose a novel symmetrical encoder-decoder generator network. The network consists of 12 blocks, divided into an encoder and a decoder. In Fig. 1, the encoder consists of the 6 blue blocks, while the decoder consists of the 6 gray blocks. The inputs of the generator network are noisy images and the outputs are denoised images.
The encoder extracts features from the noisy images, and the decoder then outputs the noise in the images according to these features. Instead of adding fully connected layers to the encoder and decoder, we use max pooling to reduce the dimensions of the feature maps in the encoder and upsampling to restore them in the decoder. To preserve as much image texture detail as possible, shortcuts are added between the encoder and the decoder.
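As a rough illustration of the dimension handling described above, the following NumPy sketch applies max pooling, upsampling and a shortcut to a single feature map. The 2x2 pooling window, nearest-neighbour upsampling and the additive (ResNet-style) shortcut are assumptions made for illustration; the text does not specify them, and the function names are hypothetical.

```python
import numpy as np

def max_pool_2x2(x):
    """Halve height and width by taking the maximum of each 2x2 block."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

def upsample_2x(x):
    """Double height and width by repeating each value (nearest neighbour)."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

# A 4x4 encoder feature map with values 0..15 (illustrative only).
feature = np.arange(16, dtype=np.float64).reshape(4, 4)

pooled = max_pool_2x2(feature)    # encoder side: shape (2, 2)
restored = upsample_2x(pooled)    # decoder side: back to shape (4, 4)

# Shortcut: the encoder feature map is added to the upsampled decoder map,
# so fine detail lost by pooling is available again to the decoder.
merged = restored + feature
```

Pooling keeps only the largest value of each block, so `restored` is a blocky version of `feature`; the shortcut is what reintroduces the per-pixel detail.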
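The final objective of the U-net prediction is to approximate the labeled color map. The loss function uses the pixel-wise soft-max [7] over the whole image to describe the deviation of each pixel:

$$p_k(\mathbf{x}) = \frac{\exp(a_k(\mathbf{x}))}{\sum_{k'=1}^{K} \exp(a_{k'}(\mathbf{x}))}$$

where $a_k(\mathbf{x})$ is the activation of feature channel $k$ at pixel position $\mathbf{x}$ and $K$ is the number of classes. According to this deviation, the loss function uses cross entropy to penalize the network:

$$E = -\sum_{\mathbf{x} \in \Omega} w(\mathbf{x}) \log p_{\ell(\mathbf{x})}(\mathbf{x})$$

where $\ell(\mathbf{x})$ is the true label of pixel $\mathbf{x}$ and $w(\mathbf{x})$ is a per-pixel weight map.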

Objective Function
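A traditional GAN consists of a generator G and a discriminator D. GAN uses a mini-max objective function to guide the adversarial training between G and D:

$$\min_G \max_D \; \mathbb{E}_{x \sim P_r}[\log D(x)] + \mathbb{E}_{\tilde{x} \sim P_g}[\log(1 - D(\tilde{x}))]$$

where $P_g$ is the distribution of the data generated by G, and $P_r$ is the distribution of the real data. $P_g$ is defined by $\tilde{x} = G(z)$, $z \sim p(z)$. In the original GAN, the input $z$ is random noise; in our setting, the input is a noisy image.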
One of the most prominent problems of GAN is that it is more difficult to train than other neural networks. To overcome this shortcoming, Wasserstein GAN (WGAN) was proposed. The most important innovation of WGAN is the modified objective function [21]:

$$\min_G \max_{D \in \mathcal{D}} \; \mathbb{E}_{x \sim P_r}[D(x)] - \mathbb{E}_{\tilde{x} \sim P_g}[D(\tilde{x})]$$

where $\mathcal{D}$ is the set of 1-Lipschitz functions. Gulrajani et al. [20] further improved WGAN by replacing its weight clipping with a gradient penalty (GP). The gradient penalty reduces the risk of failure to converge and other undesired behaviors of GAN. To ensure reliability and stability when training the model, the GP term is introduced into the objective function:

$$\min_G \max_D \; \mathbb{E}_{x \sim P_r}[D(x)] - \mathbb{E}_{\tilde{x} \sim P_g}[D(\tilde{x})] - \lambda \, \mathbb{E}_{\hat{x} \sim P_{\hat{x}}}\left[\left(\lVert \nabla_{\hat{x}} D(\hat{x}) \rVert_2 - 1\right)^2\right]$$

where $\hat{x}$ is sampled uniformly along straight lines between pairs of points drawn from $P_r$ and $P_g$, and $\lambda$ is the penalty coefficient. During training, the generator minimizes the value of the above formula, which pushes its output toward non-noisy images, while the discriminator maximizes it in order to predict the authenticity of its input more and more accurately.
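A minimal NumPy sketch of a Wasserstein critic loss with a gradient penalty, in the form given by Gulrajani et al. [20], may make the objective more concrete. It is written as a quantity the critic minimizes, which is the negation of the quantity the discriminator maximizes in the text above. Everything here is a toy under stated assumptions: the critic is an arbitrary scalar function of a vector, gradients are taken by central differences instead of automatic differentiation, and the names `numerical_grad`, `critic_loss_with_gp` and the coefficient `lam=10.0` are illustrative.

```python
import numpy as np

def numerical_grad(f, x, h=1e-5):
    """Central-difference gradient of a scalar function f at the vector x."""
    grad = np.zeros_like(x)
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = h
        grad[i] = (f(x + step) - f(x - step)) / (2.0 * h)
    return grad

def critic_loss_with_gp(critic, real, fake, lam=10.0, seed=0):
    """E[D(fake)] - E[D(real)] + lam * E[(||grad D(x_hat)||_2 - 1)^2]."""
    rng = np.random.default_rng(seed)
    # x_hat lies on the straight line between each real/fake pair.
    eps = rng.uniform(0.0, 1.0, size=(real.shape[0], 1))
    x_hat = eps * real + (1.0 - eps) * fake
    norms = np.array([np.linalg.norm(numerical_grad(critic, x)) for x in x_hat])
    penalty = lam * np.mean((norms - 1.0) ** 2)
    wasserstein = np.mean([critic(x) for x in fake]) - np.mean([critic(x) for x in real])
    return wasserstein + penalty

# Toy linear critic whose gradient has unit norm, so the penalty vanishes.
w = np.array([0.6, 0.8])
linear_critic = lambda x: float(w @ x)
real = np.array([[1.0, 1.0], [2.0, 0.0]])
fake = np.array([[0.0, 0.0], [1.0, -1.0]])
loss = critic_loss_with_gp(linear_critic, real, fake)
```

With the unit-norm critic only the Wasserstein term remains. Doubling the critic's weights doubles the gradient norm, and the penalty then grows with the squared deviation from 1, which is how the term discourages critics that are too steep.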

Denoising with GAN
The flow charts of the two kinds of GANs are shown in Fig. 2. In our model, G receives a noisy image x, learns the characteristics of the noise and suppresses it. Finally, G generates an image G(x) without noise. The output D(x) represents the confidence that x was not generated by G: the higher D(x) is, the higher the confidence that the image is real; conversely, the lower D(x) is, the less likely x is to be a real image. During training, the clean images and the denoised images generated by the generator are used as inputs to the discriminator network, and the discriminant results for the two types of images are each added to the objective function. We truncate the discriminant results and compute the overall mean of the truncated matrix to prevent numerical divergence:

$$loss_{real} = \frac{1}{mn} \sum_{i=1}^{m} \sum_{j=1}^{n} a_{i,j}, \qquad loss_{fake} = \frac{1}{mn} \sum_{i=1}^{m} \sum_{j=1}^{n} b_{i,j}$$

where $loss_{real}$ is obtained from clean images, $loss_{fake}$ is obtained from denoised images, $a_{i,j}$ and $b_{i,j}$ are the truncated elements of the corresponding discriminant result matrices, and $m \times n$ is the size of those matrices.
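The truncation and averaging step described above can be sketched as follows. The truncation bounds of -1 and 1, the helper name `truncated_mean` and the example score matrices are assumptions chosen for illustration; the text does not state the actual bounds.

```python
import numpy as np

def truncated_mean(scores, low=-1.0, high=1.0):
    """Clip every discriminator score to [low, high], then average the matrix."""
    return float(np.clip(scores, low, high).mean())

# Hypothetical discriminator outputs for clean and for denoised images.
d_real = np.array([[0.5, 3.0], [0.2, -0.1]])
d_fake = np.array([[-4.0, 0.3], [0.1, -0.2]])

loss_real = truncated_mean(d_real)  # the outlier 3.0 is truncated to 1.0
loss_fake = truncated_mean(d_fake)  # the outlier -4.0 is truncated to -1.0
```

Because single extreme scores are capped before averaging, neither term can grow without bound, which is the divergence the truncation is meant to prevent.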
To prevent vanishing and exploding gradients, we also add a gradient penalty to the loss function of D: where G(x) is the image generated by G, input is the clean image corresponding to G(x), μ is a parameter in the constructed model, and A is the result obtained by passing M through the discriminator network.
Built on the structure of DCGAN, we implement the G and D networks as CNNs, since convolutional layers are highly effective for both image feature extraction and image generation. Convolutional layers with weight sharing reduce the number of parameters to be trained and yield powerful image feature maps. In the G network, we first process the image data through a set of convolutional layers to learn image features; the G network then uses a set of deconvolution layers to reconstruct images. In the D network, multiple convolution layers learn to distinguish images generated by the G network from real ones. The generator network can be defined as follows: where CBL(K) refers to a block that consists of a convolutional layer with K kernels, a BN layer and an LReLU layer [22]. In RBL(K), the convolutional layer is replaced with a residual layer.
The discriminator network can be defined as follows: where CL(K2) refers to a block that consists of a convolutional layer with K2 kernels and an LReLU layer. CBR(K2) refers to a block composed of a convolutional layer with K2 kernels, a BN layer and a ReLU layer.
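The LReLU used in the G network can be defined as follows:

$$y = \begin{cases} x, & x \ge 0 \\ a x, & x < 0 \end{cases} \tag{13}$$

where $a$ is a small positive slope. In the D network, in addition to LReLU, we also use ReLU as an activation function:

$$y = \max(0, x)$$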
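The two activation functions can be written in a few lines of NumPy. The slope `a=0.2` is an assumed value for illustration; the slope actually used is not stated in the text.

```python
import numpy as np

def relu(x):
    """ReLU: negative inputs are set to zero."""
    return np.maximum(x, 0.0)

def leaky_relu(x, a=0.2):
    """LReLU: negative inputs are scaled by a small slope a instead of zeroed."""
    return np.where(x > 0, x, a * x)

x = np.array([-2.0, 0.0, 3.0])
relu_out = relu(x)          # negative entry becomes 0
lrelu_out = leaky_relu(x)   # negative entry becomes a * x
```

Unlike ReLU, LReLU keeps a nonzero gradient for negative inputs, which is one common reason for preferring it in GAN training.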

Experiment Results and Analysis
We use different training sets and validation sets to evaluate the model's noise reduction capabilities in the same experimental environment. In the experiment, we take 100 screenshots from two documentaries (marine and grassland) as training sets. We then process both training sets in the same way, adding Gaussian noise with a constant standard deviation. For validation, we use 10 images completely distinct from the training set and add Gaussian noise. For testing, we use both images with added Gaussian noise and real noisy images.
We use the peak signal-to-noise ratio (PSNR) to measure the similarity between the outputs of our method and the original images. Our code uses the TensorFlow library v1.6, and training was performed on an HP Z840 desktop workstation with an NVIDIA K80 GPU. Each task took 12 hours to train the model for 10K iterations. The GAN parameters used in the experiments are as follows: the Adversarial Loss Factor is 0.5, the Pixel Loss Factor is 1.0, the Feature Loss Factor is 1.0 and the Smoothness Loss Factor is 0.0001. Fig. 3 shows the results obtained when we apply the model to images with similar pixel thresholds. From left to right are the image with added Gaussian noise, the denoising result and the original image. We can see that our model achieves effective results.
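One plausible reading of the loss factors listed above is that the generator objective is a weighted sum of four terms. The sketch below assumes exactly that; the function name `total_generator_loss` and the example loss values are hypothetical, and the individual loss terms themselves are not defined here.

```python
def total_generator_loss(adversarial, pixel, feature, smoothness,
                         w_adv=0.5, w_pixel=1.0, w_feature=1.0, w_smooth=0.0001):
    """Weighted sum of four scalar loss terms using the factors from the experiments."""
    return (w_adv * adversarial
            + w_pixel * pixel
            + w_feature * feature
            + w_smooth * smoothness)

# Made-up per-term values, only to show how the factors scale each term.
total = total_generator_loss(adversarial=2.0, pixel=0.1, feature=0.3, smoothness=100.0)
```

Under this reading, the very small smoothness factor means that term acts as a mild regularizer, while the pixel and feature terms dominate the objective.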
A higher PSNR value indicates higher signal quality, that is, less distortion relative to the original image.
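For reference, PSNR is commonly computed as 10 log10(MAX^2 / MSE), where MSE is the mean squared error between the two images and MAX is the largest possible pixel value. The NumPy sketch below follows that standard definition; it assumes 8-bit images (MAX = 255), and the function name `psnr` and the test images are illustrative rather than taken from our code.

```python
import numpy as np

def psnr(reference, test, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two images of equal shape."""
    diff = reference.astype(np.float64) - test.astype(np.float64)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")  # identical images: no distortion at all
    return float(10.0 * np.log10(max_value ** 2 / mse))

reference = np.zeros((8, 8), dtype=np.uint8)
off_by_one = np.ones((8, 8), dtype=np.uint8)        # every pixel differs by 1
inverted = np.full((8, 8), 255, dtype=np.uint8)     # every pixel differs by 255

psnr_small_error = psnr(reference, off_by_one)
psnr_large_error = psnr(reference, inverted)
```

The small error yields a much higher PSNR than the large one, matching the rule that a greater PSNR means less distortion. Converting to floating point before subtracting avoids wrap-around in unsigned 8-bit arithmetic.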
Tab. 1 shows the PSNR measurements for the denoising results, comparing our proposed model with other denoising models. Our model denoises better than the traditional GAN, but its PSNR is still lower than that of the classic filtering algorithms. We speculate that the noisy image and the original image are relatively close in pixel distribution, so in the later stage of training the discriminator loses its guiding effect, making further optimization difficult.

Conclusion
In this paper, we propose a novel GAN-based model for image denoising. With its novel generator network, the proposed model provides a more efficient and general image denoising method and can be easily trained to learn how to offset image noise. The experimental results show the effectiveness of our proposed model. We believe there is still room to further improve the experimental results.
At present, the color saturation loss caused by our model is somewhat high. We are also considering whether GAN-based noise reduction methods can have wider applications than CNN-based or traditional methods, which may be a direction for our future research.

Conflicts of Interest:
We declare that we do not have any commercial or associative interest that represents a conflict of interest in connection with the work submitted.