Algorithm Development of Cloud Removal from Solar Images Based on Pix2Pix Network

Abstract: Clouds significantly affect solar observations. Their shadows obscure the details of solar features in the observed images, and cloud-covered solar images are difficult to use for further research without pre-processing. In this paper, cloud removal from solar images is treated as an image-to-image translation problem and solved with the Pixel to Pixel Network (Pix2Pix), which generates a cloudless solar image without relying on the physical scattering model. Pix2Pix consists of a generator and a discriminator. The generator is a U-Net. The discriminator uses the PatchGAN structure to improve the details of the generated solar image, which guides the generator to create a realistic-looking solar image. The image generation model and the training process are optimized, and the generator is trained jointly with the discriminator, so that a model which stably generates cloudless solar images is obtained. Extensive experiments on the dataset of the Huairou Solar Observing Station, National Astronomical Observatories, Chinese Academy of Sciences (HSOS, NAOC, CAS) show that Pix2Pix is superior to the traditional methods based on physical prior knowledge in peak signal-to-noise ratio (PSNR), structural similarity (SSIM), perceptual index (PI), and subjective visual effect. The PSNR, SSIM and PI are 27.2121 dB, 0.8601 and 3.3341, respectively. Compared with the other methods, the model learns more effective features and synthesizes clearer cloudless images by using the data set of the National Astronomical Observatories.

Observatories have accumulated a large number of solar images by monitoring solar activity. However, researchers face many difficulties in analyzing these images, because many of them are obscured by clouds. For example, at the Big Bear Solar Observatory (BBSO) in California, USA, the days on which observations were affected by clouds (from just a few images to a large portion of the images taken during the day) account for 55% of all days, based on the site survey of the Global Oscillation Network Group (GONG) [1]. At other observatories in the world, the percentage may be higher. On these days the instruments work normally; however, if clouds cover the Sun, their shadows degrade the observed images. Therefore, removing cloud contamination from full-disk solar images is of great significance for all ground-based observation stations [2], and it has become an urgent problem.
As a key station in the world, the Huairou Solar Observing Station (HSOS) takes Halpha images of the solar chromosphere that provide ideal information for the detailed study of the fine structure of the Sun. Fig. 1 shows some cloud-covered and normal full-disk images observed by HSOS. In this work, a deep learning method, the Pixel to Pixel network, is applied to these data to obtain cloudless full-disk Halpha images from cloud-covered full-disk Halpha images. The contributions of our work are as follows:
New method for cloud removal. Pix2Pix [3], an image-to-image translation method proposed in 2017, is used for image cloud removal; it does not rely on the physical scattering model.
U-Net Generator. Inspired by the global-first property of visual perception, the embedded U-Net Generator is designed to produce images with more details.
A joint training scheme. A joint training scheme is developed for updating the U-Net Generator by combining two kinds of loss functions.
The Perceptual Index (PI). PI is introduced for quantitative evaluation from the perceptual perspective.
In addition, extensive experiments on the solar dataset indicate that Pix2Pix performs favorably against the state-of-the-art methods. In particular, the results are outstanding in visual perception.
The rest of the paper is structured as follows. Section 2 discusses related work on processing solar images with traditional methods and with deep learning. Section 3 explains the image generation algorithm based on the Pix2Pix network, describes the steps of the overall framework, and gives the network structure diagram and the loss function. Section 4 first explains how the training set and test set are generated, and then compares and analyzes the effects of the different modules of the algorithm on model performance. A large number of comparative experiments are then carried out to evaluate the effectiveness of the proposed algorithm from two aspects: subjective visual effect and objective evaluation indexes. Finally, the conclusion of the article is presented.

The Traditional Method-DCP
Early cloud removal methods are mostly prior-based, and they can achieve a good cloud removal effect to a certain extent. The dark channel prior (DCP) method [4] estimates the transmission map by exploiting the dark channel prior. Hu [5] proposed a two-stage haze and thin cloud removal method based on homomorphic filtering and a sphere-model-improved dark channel prior. Xu [6] proposed a fast haze removal algorithm based on fast bilateral filtering combined with the dark colors prior. Xu [7] proposed a method based on signal transmission and spectral mixture analysis for pixel correction. HoanN [8] proposed a cloud removal algorithm for remote sensing images based on multi-output support vector regression. Most of these approaches depend on the physical scattering model [9], which is formulated as Eq. (1):
$$I(z) = J(z)\,t(z) + A\,\bigl(1 - t(z)\bigr) \tag{1}$$
where I is the observed hazy image, J is the scene radiance, t is the transmission map, A is the atmospheric light and z is the pixel location. The solution for the cloudless image depends on the estimation of the atmospheric light and the transmission map.
The DCP method produces a cloudless image with rich details for Fig. 2a, but not for Fig. 2c. The prior may therefore be easily violated in practice, which leads to an inaccurate estimate of the transmission map, so that the quality of the cloud-removed image is not satisfactory.
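The dependence of prior-based methods on the transmission map and the atmospheric light can be illustrated with a minimal NumPy sketch. It assumes the standard form of the scattering model, I(z) = J(z) t(z) + A (1 - t(z)); the function names, the toy arrays and the `t_min` safeguard are illustrative assumptions and are not taken from the paper or from any cited method.

```python
import numpy as np

def apply_scattering(J, t, A):
    """Forward model: I(z) = J(z) * t(z) + A * (1 - t(z))."""
    return J * t + A * (1.0 - t)

def invert_scattering(I, t, A, t_min=0.1):
    """Recover the scene radiance J from I, given estimates of t and A.

    t_min is a hypothetical safeguard that keeps the division stable
    where the estimated transmission is close to zero.
    """
    t_safe = np.maximum(t, t_min)
    return (I - A) / t_safe + A

# Toy data: a 2x2 scene, a transmission map and a scalar atmospheric light.
J_true = np.array([[0.2, 0.4], [0.6, 0.8]])
t_map = np.array([[0.9, 0.7], [0.5, 0.3]])
A_light = 1.0

I_obs = apply_scattering(J_true, t_map, A_light)

# With the exact t and A the scene is recovered; with a biased t it is not.
J_rec = invert_scattering(I_obs, t_map, A_light)
J_biased = invert_scattering(I_obs, 0.8 * t_map, A_light)
```

In this toy case `J_rec` matches `J_true`, while `J_biased` does not, which is the failure mode described above when the prior yields an inaccurate transmission map.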

Deep Learning-Generative Adversarial Network
In recent years, researchers in the field of solar physics have gradually begun to explore the use of deep learning [10] to analyze and process solar activity observation data. The study of the Generative Adversarial Network (GAN) [11] has made great progress. GAN is widely used in computer vision and has achieved particularly good results in image generation.
The Densely Connected Pyramid Dehazing Network (DCPDN) [12] applies GAN to image cloud removal; it learns the transmission map and the atmospheric light simultaneously in the generators by optimizing the final cloud removal performance. Zhang et al. [13] proposed a real-time image processing system for detecting and removing cloud shadows on Halpha full-disk solar images. Luo [14] identified heavily cloud-covered images by calculating the ratio of the major and minor axes of a fitted ellipse. Sun [15] proposed the Cloud-Aware Generative Network, which consists of two stages: the first is a recurrent convolution network for potential cloud region detection, and the second is an auto-encoder for cloud removal. Yang [16] proposed the disentangled cloud removal network, which uses unpaired supervision and contains three generators: one for the cloudless image, one for the atmospheric light, and one for the transmission map. Zi [17] proposed a method to remove thin cloud from multispectral images that combines the traditional method with deep learning. First, a Convolutional Neural Network is used to estimate the thickness of the thin cloud in different bands. Then, according to the traditional thin cloud imaging model, the thin cloud thickness image is subtracted from the cloudy image to obtain a clear image. DehazeGAN [18] draws on differential programming to use GAN for simultaneous estimation of the atmospheric light and the transmission map. The use of GAN in the image cloud removal task is still at an early stage, and the current GAN-based cloud removal methods all depend on the physical scattering model.
Research on the development of GAN shows that CycleGAN realizes image transformation with unpaired data. Because its training set does not need paired data, it is widely used. However, since the training pairs do not match, the network cannot be trained on an exact mapping relationship and can only predict an approximation of the real mapping. The learned mapping therefore deviates, which leads to unexpected transformation styles in the training results. Unpaired training takes two groups of completely unrelated data as input and can only achieve arbitrary image style conversion.
Until now, few papers have discussed how to perform image cloud removal independently of the physical scattering model. As discussed in the Introduction, it is meaningful to investigate a model-free cloud removal method via GAN.

Cloud Removal Method Based on Pix2Pix
GAN is an unsupervised learning method and cannot by itself realize a pixel-to-pixel conversion between images. Some researchers therefore proposed using a Conditional Generative Adversarial Network (CGAN) for image-to-image translation, which is called Pix2Pix. It realizes the transformation of paired image data.
The Pix2Pix network adopts a fully supervised method (Fig. 3): the model is trained with fully matched input and output images, and the trained model then generates the target image of the specified task from the input image. The algorithm is divided into two stages, training and testing. In the training phase, the Pix2Pix network is trained with paired solar images, and the network parameters are optimized by alternately updating the generator network and the discriminator network. In the test phase, the cloud-covered image is input into the trained Pix2Pix network to obtain the cloudless image. The generator adopts the U-Net [19] architecture (Fig. 4a), in which eight convolution layers and eight deconvolution layers are used to generate pseudo samples. The discriminator (Fig. 4b) adopts a structure of 5 convolution layers with Leaky ReLU layers. In the generator G, the training Halpha images are input and passed through the 8 convolution layers. The activation function used in this part is the Leaky ReLU function, and batch normalization [20] is used in each layer to improve the convergence of the model.

Generator
An 8-level U-Net architecture is employed for pixel-level feature learning. The architecture comprises three parts: an encoding network, a decoding network, and a bridge that connects the two (Tab. 1). The complete network is constructed using 4 × 4 convolution layers with a stride of 2 for down-sampling and up-sampling.
The U-Net architecture is used as the Generator. It consists of two paths, a contracting path and an expansive path, which are symmetric to each other and give the architecture its U shape (Fig. 5), hence the name U-Net. The network on the left side is the contracting path, which resembles a traditional CNN with convolutions and activations. On the right side of Fig. 5 is the expansive path, which consists of up-sampling layers, each combined with the corresponding convolutional layer on the left side. The two paths are merged in this way to compensate for the loss of information. As a result, the output of the architecture has the same resolution as the image at the input layer.
Figure 5: The structure of Generator
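The symmetry of the two paths can be checked with a short calculation of the feature-map sizes. The sketch below assumes a 512 × 512 input, 4 × 4 kernels, a stride of 2 and a padding of 1 pixel per side; the padding value and the function names are assumptions made for illustration, since the text does not state them.

```python
def conv_out(size, kernel=4, stride=2, pad=1):
    """Output length of a strided convolution along one spatial axis."""
    return (size + 2 * pad - kernel) // stride + 1

def deconv_out(size, kernel=4, stride=2, pad=1):
    """Output length of a transposed convolution along one spatial axis."""
    return (size - 1) * stride - 2 * pad + kernel

def unet_sizes(input_size=512, levels=8):
    """Feature-map side lengths along the contracting and expansive paths."""
    down = [input_size]
    for _ in range(levels):
        down.append(conv_out(down[-1]))
    up = [down[-1]]
    for _ in range(levels):
        up.append(deconv_out(up[-1]))
    return down, up

down_sizes, up_sizes = unet_sizes()

# Skip connections join encoder and decoder feature maps of equal size.
skip_pairs = list(zip(down_sizes[1:-1], reversed(up_sizes[1:-1])))
```

Under these assumptions the contracting path halves the side length eight times (512 down to 2), the expansive path doubles it back to 512, and every pair in `skip_pairs` has matching sizes, which is what allows the two paths to be merged.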

Discriminator
The discriminator uses PatchGAN (Fig. 6), which makes the model more efficient and improves the rendering of details. The discriminator concatenates the image generated by the generator with the condition image. The number of convolution kernels in the first down-sampling layer is set to 64, the number in each subsequent layer is twice that of the previous layer, and the number in the last layer is set to 1. The first three down-sampling layers have a stride of 2, so each of them halves the length and width of the image. The last two layers have a stride of 1, keeping the image size unchanged. Finally, the results are obtained through the Sigmoid layer (Fig. 4).
The deeper the subsampling convolution layer is, the more accurate the extracted features will be. The size of the input image is 512 × 512, and each time the image is subsampled, its length and width are halved. The feature image extracted by the convolution layers is used as the criterion of the discriminator. Depending on its size, the discriminator judges at the scale of a single pixel, a patch, or the full image. Fig. 7 indicates that the patch-based results are the best, so PatchGAN is selected as the discriminator.
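The patch that each discriminator output actually sees can be estimated with a small receptive-field calculation. The sketch assumes five layers with strides 2, 2, 2, 1, 1 and channel counts 64, 128, 256, 512, 1 as described above, plus two assumptions that the text does not state: 4 × 4 kernels in the discriminator and "same"-style padding, so that a stride-1 layer keeps the spatial size.

```python
def patchgan_summary(input_size=512, kernel=4,
                     strides=(2, 2, 2, 1, 1),
                     channels=(64, 128, 256, 512, 1)):
    """Trace spatial size and receptive field through the discriminator."""
    size, rf, jump = input_size, 1, 1
    rows = []
    for stride, ch in zip(strides, channels):
        size = -(-size // stride)      # ceil division: "same" padding (assumed)
        rf += (kernel - 1) * jump      # field grows by (k - 1) * current jump
        jump *= stride                 # spacing of outputs, in input pixels
        rows.append((ch, stride, size, rf))
    return rows

layers = patchgan_summary()
out_size = layers[-1][2]          # side length of the final decision map
receptive_field = layers[-1][3]   # input pixels seen by one output value
```

With these assumed settings each value of the 64 × 64 decision map depends on a 70 × 70 patch of the 512 × 512 input, so the discriminator scores local texture rather than the image as a whole.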

Loss Function
The loss function of the Pix2Pix network is given in Eq. (2):
$$L_{CGAN}(G, D) = \mathbb{E}_{x,y}\bigl[\log D(x, y)\bigr] + \mathbb{E}_{x,z}\bigl[\log\bigl(1 - D(x, G(x, z))\bigr)\bigr] \tag{2}$$
where D(x, y) is the output of the discriminator D for a real matched pair of input x and output y, D(x, G(x, z)) is the output of the discriminator for the image G(x, z) produced by the generator from the input x and the random noise z, L_CGAN(G, D) is the loss function of CGAN, and E denotes the expectation.
In addition to the above objective, Pix2Pix optimizes the network by adding Loss L1 [21] as a traditional loss function (Eq. (3)):
$$L_{L1}(G) = \mathbb{E}_{x,y,z}\bigl[\lVert y - G(x, z) \rVert_1\bigr] \tag{3}$$
The final objective for the generator is shown in Eq. (4):
$$G^{*} = \arg\min_{G}\max_{D}\; L_{CGAN}(G, D) + \lambda\, L_{L1}(G) \tag{4}$$
where λ is a hyperparameter that can be adjusted as appropriate. When λ = 0, Loss L1 is not used.
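How the adversarial term and Loss L1 combine can be sketched numerically with NumPy. This is an illustration only, not the training code of the paper: the function names and toy arrays are hypothetical, the L1 term is averaged over pixels, and the generator's adversarial term is written in the commonly used non-saturating form (maximizing log D(x, G(x, z))) rather than the literal minimax form.

```python
import numpy as np

def bce(pred, target, eps=1e-12):
    """Binary cross-entropy averaged over all patch predictions."""
    pred = np.clip(pred, eps, 1.0 - eps)
    return float(-np.mean(target * np.log(pred)
                          + (1.0 - target) * np.log(1.0 - pred)))

def discriminator_loss(d_real, d_fake):
    """D is pushed towards D(x, y) = 1 and D(x, G(x, z)) = 0."""
    return bce(d_real, np.ones_like(d_real)) + bce(d_fake, np.zeros_like(d_fake))

def generator_loss(d_fake, fake, target, lam=100.0):
    """Adversarial term plus lam times the mean absolute error (Loss L1)."""
    gan_term = bce(d_fake, np.ones_like(d_fake))   # non-saturating form (assumed)
    l1_term = float(np.mean(np.abs(target - fake)))
    return gan_term + lam * l1_term, gan_term, l1_term

# Toy data: the generated image is uniformly 0.01 away from the target,
# and the discriminator is undecided (0.5) on every generated patch.
target = np.linspace(0.0, 1.0, 64).reshape(8, 8)
fake = target + 0.01
d_fake = np.full((4, 4), 0.5)
d_real = np.full((4, 4), 0.9)

total, gan, l1 = generator_loss(d_fake, fake, target, lam=100.0)
total_no_l1, _, _ = generator_loss(d_fake, fake, target, lam=0.0)
d_loss = discriminator_loss(d_real, d_fake)
```

In this toy case the pixel-averaged L1 term (0.01) is far smaller than the adversarial term (about 0.69), so a weight such as λ = 100 is what brings the two terms to a comparable magnitude.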

Experimental Results
In this section, a data set is constructed to demonstrate the effectiveness of the proposed method. Four methods are compared with the proposed method: the dark channel prior (DCP), CGAN (the Pix2Pix model without Loss L1), the L1 model (the Pix2Pix model without Loss GAN), and CycleGAN.

Data Preprocessing
The data set is constructed from images that are partly provided by HSOS. It includes two parts: full-disk Halpha solar images obscured by cloud and cloudless full-disk Halpha solar images. 82 typical pairs of Halpha images are selected as the original training set, and 15 pairs are chosen to test the performance of the proposed method. Data augmentation is critical for network invariance and robustness when only a small number of training images are available. Flipping, shearing, zooming, and rotation are the main methods (Tab. 2). In order to obtain a usable model, only images with distinct features are selected as training samples. Finally, 2731 samples are generated from the 97 high-quality Halpha full-disk solar images. The proposed method takes 2566 pairs as inputs, which are resized to 512 × 512 and sent to the enhanced Pix2Pix architecture to generate an appropriate weight model, which is then used to remove cloud from Halpha images.
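Because the network is trained on matched pairs, each geometric transformation has to be applied identically to the cloud-covered image and to its cloudless counterpart. A minimal NumPy sketch of this idea follows; it covers only flipping and rotation by multiples of 90 degrees (shearing and zooming are omitted), and the function name, the toy arrays and the random seed are illustrative assumptions rather than the augmentation pipeline of the paper.

```python
import numpy as np

def augment_pair(cloudy, clear, rng):
    """Apply one random flip/rotation identically to both images of a pair."""
    k = int(rng.integers(0, 4))        # number of 90-degree rotations
    flip = bool(rng.integers(0, 2))    # whether to flip left-right
    out = []
    for img in (cloudy, clear):
        img = np.rot90(img, k)
        if flip:
            img = np.fliplr(img)
        out.append(img.copy())
    return out[0], out[1]

rng = np.random.default_rng(seed=0)
clear = np.arange(16, dtype=np.float32).reshape(4, 4)
cloudy = clear * 0.5   # stand-in for a cloud-attenuated observation

aug_cloudy, aug_clear = augment_pair(cloudy, clear, rng)
```

Since both images receive the same transformation, the pixel-wise correspondence between `aug_cloudy` and `aug_clear` is preserved, which is what a paired loss such as Loss L1 requires.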

Quality Measures
To better evaluate the performance of the proposed method, three image quality evaluation metrics are used: the Peak Signal to Noise Ratio (PSNR) [22], the Structural Similarity (SSIM) [22] and the Perceptual Index (PI) [23].

Peak Signal to Noise Ratio (PSNR). The PSNR is a full-reference image quality evaluation indicator. Let MSE denote the mean square error between the current image X and the reference image Y, let H and W denote the image height and width, respectively, and let i and j denote the position of a pixel. Let n denote the number of bits per pixel (generally 8), meaning that the number of possible gray levels of a pixel is 256. The PSNR is expressed in units of dB. The larger the PSNR is, the smaller the distortion. The MSE (Eq. (5)) and PSNR (Eq. (6)) are calculated as follows:
$$MSE = \frac{1}{H W}\sum_{i=1}^{H}\sum_{j=1}^{W}\bigl(X(i, j) - Y(i, j)\bigr)^{2} \tag{5}$$
$$PSNR = 10 \log_{10}\frac{(2^{n} - 1)^{2}}{MSE} \tag{6}$$
Structural Similarity (SSIM). The SSIM is another full-reference image quality evaluation metric that measures image similarity from three perspectives: brightness, contrast, and structure. SSIM values closer to 1 indicate greater similarity between the original image block X and the reconstructed image block Y and represent a better reconstruction effect. The SSIM is calculated as in Eq. (7):
$$SSIM(X, Y) = \frac{(2\mu_{x}\mu_{y} + C_{1})(2\sigma_{xy} + C_{2})}{(\mu_{x}^{2} + \mu_{y}^{2} + C_{1})(\sigma_{x}^{2} + \sigma_{y}^{2} + C_{2})} \tag{7}$$
where μ_x and μ_y are the means of image blocks X and Y, respectively; σ_x² and σ_y² are their variances; σ_xy is their covariance; and C_1 and C_2 are constants.
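The two full-reference metrics can be sketched in a few lines of NumPy. The sketch makes two simplifying assumptions that are not stated in the text: the SSIM is computed once over the whole block instead of over sliding windows, and the constants are set to the conventional values C1 = (0.01 L)² and C2 = (0.03 L)² with L = 2ⁿ - 1. The function names are illustrative.

```python
import numpy as np

def mse(x, y):
    """Mean square error between two images of equal shape."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    return float(np.mean((x - y) ** 2))

def psnr(x, y, bits=8):
    """Peak signal-to-noise ratio in dB for images with `bits` bits per pixel."""
    err = mse(x, y)
    if err == 0.0:
        return float("inf")
    peak = 2 ** bits - 1
    return float(10.0 * np.log10(peak ** 2 / err))

def ssim_global(x, y, bits=8, k1=0.01, k2=0.03):
    """SSIM evaluated once over the whole block (no sliding window)."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    peak = 2 ** bits - 1
    c1, c2 = (k1 * peak) ** 2, (k2 * peak) ** 2   # conventional constants (assumed)
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = np.mean((x - mu_x) * (y - mu_y))
    return float(((2 * mu_x * mu_y + c1) * (2 * cov_xy + c2))
                 / ((mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)))

# Toy example: the test image is the reference shifted by 5 gray levels.
reference = np.arange(16, dtype=np.float64).reshape(4, 4) * 10.0
shifted = reference + 5.0

psnr_value = psnr(shifted, reference)          # MSE is 25 here
ssim_same = ssim_global(reference, reference)
ssim_shifted = ssim_global(shifted, reference)
```

A constant brightness shift leaves the variances and the covariance unchanged, so only the luminance factor of the SSIM drops below 1, while the PSNR falls to 10 log10(255² / 25) dB.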
Perceptual Index (PI). The PSNR and SSIM are the most commonly used objective indicators for image evaluation. The PSNR is an error-sensitive quality measure based on the error between corresponding pixels. The SSIM is a full-reference image quality index that measures image similarity in terms of brightness, contrast and structure. Since these measures do not consider the visual characteristics of the human eye, their results are often inconsistent with human subjective perception. PI (Eq. (8)) is a newer criterion that bridges the visual effect with a computable index, and it has been recognized as effective in image super-resolution [24]. In the experiment, PI is used to evaluate the performance of image cloud removal. The higher the image quality is, the lower the PI is.
$$PI = \frac{1}{2}\bigl((10 - Ma) + NIQE\bigr) \tag{8}$$
where Ma and NIQE are two image quality indexes which are detailed in [25].
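The way PI combines the two scores can be shown with a short worked example. It assumes the usual definition PI = ((10 - Ma) + NIQE) / 2; the function name is illustrative, and the Ma and NIQE values below are hypothetical numbers chosen only to show the arithmetic, not measurements from the paper.

```python
def perceptual_index(ma_score, niqe_score):
    """PI = 0.5 * ((10 - Ma) + NIQE); lower values mean better perceptual quality."""
    return 0.5 * ((10.0 - ma_score) + niqe_score)

# Hypothetical scores: a sharp image (high Ma, low NIQE) and a blurry one.
pi_sharp = perceptual_index(ma_score=8.0, niqe_score=3.0)    # 2.5
pi_blurry = perceptual_index(ma_score=5.0, niqe_score=6.0)   # 5.5
```

A higher Ma score and a lower NIQE score both reduce PI, which is why a lower PI corresponds to a higher perceived image quality.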

Training Details

The Number of Iterations
The loss value is recorded every 1000 iterations (Fig. 8). The trend shows that a local minimum is reached at around 1,000,000 iterations, after which the discriminator continues to be trained to make its judgment more accurate. After 1,000,000 iterations, the generated image is close to the target image and tends to be stable, so the network trained for 1,026,400 iterations is selected.
The parameter λ is adjusted to balance the weights of the two loss functions. The aim is to make the detail information of the generated image closer to the real image, that is, to make the generator more accurate. First, λ is set to 100 and to 0. When λ = 0, the loss function contains only Loss GAN and not Loss L1. In this case a block effect appears: uniform block-like stripes show up in the image and cause image distortion (Fig. 9). When the loss function is defined as Loss = Loss GAN + Loss L1, the block effect is significantly weakened or even disappears, which improves the texture clarity of the image. The experimental comparison shows that Loss GAN and Loss L1 are not of the same order of magnitude, so the influence of Loss L1 should be increased appropriately without impairing the function of Loss GAN (Fig. 8). Therefore, the weight is increased to make the generated image closer to the target image. The values 1, 10, 30, 50, 70, 90 and 100 were tested for λ (Tab. 3), and a subjective comparison was conducted on the results after 1,026,400 iterations. As the value of λ increases, the PSNR and SSIM increase significantly and the overall perceived quality improves. At λ = 100, the PSNR is 27.2121 dB, which is 0.5 dB higher than the other results, and the SSIM is 0.8601, which is also higher than the other results. The experimental results also show that the PI decreases significantly as λ increases (Fig. 10). Therefore, according to all three image quality metrics, λ = 100 is the best choice.
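The selected iteration count can be related to the size of the training set with a short calculation. The sketch assumes that one iteration processes one batch and uses the 2566 training pairs and the batch size of 1 reported elsewhere in the paper; the function name is illustrative.

```python
def passes_over_training_set(iterations, num_pairs, batch_size=1):
    """Number of full passes over the training set implied by an iteration count."""
    return iterations * batch_size / num_pairs

epochs = passes_over_training_set(iterations=1_026_400, num_pairs=2566)

# Loss values are recorded every 1000 iterations.
logged_points = 1_026_400 // 1000
```

Under this assumption, 1,026,400 iterations correspond to 400 passes over the 2566 training pairs, and the loss curve contains 1026 recorded values.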

Results Analysis
The method is implemented in the TensorFlow framework and run on an NVIDIA GeForce RTX 2080 Ti GPU. During training, the Adam optimizer [26] is adopted with a batch size of 1 and a learning rate of 0.0002. The hyperparameter of the loss function is set to λ = 100. The discriminator tended to converge after about 1,026,400 iterations on the training set. Fig. 11 shows the results on the test set in 2011. From left to right are the cloud-covered Halpha images, the generated cloudless images and the real cloudless images. It can be seen that the model generates good results on the test set. The dark thread-like features are filaments, and the bright patches surrounding sunspots are plages, which are also associated with concentrations of magnetic fields. These characteristic areas are restored in the generated images and the cloud is removed. Fig. 12 compares the visual effect of the different methods on the proposed data set. The DCP method suffers from color distortion: its results are usually darker than the ground truth images, and some cloud remains in the images. CGAN suffers from color distortion and fails to restore details. With the L1 loss model, the generated picture also loses detail information and contains noise. Compared with CycleGAN, Pix2Pix makes the synthesized image look more like the ground truth image. Pix2Pix clearly outperforms the above-mentioned methods in detail recovery, and it improves the cloud removal results both qualitatively and quantitatively. The SSIM value obtained by Pix2Pix is 0.8601, and the PSNR value is 27.2121 dB (Tab. 4). Both are higher than those of the other four models. According to these two indicators, the cloud removal effect of the Pix2Pix model is better than that of the other models, which is consistent with the subjective perception of the target image. By the definition of the PI index, the lower the PI value, the higher the image quality. The PI value of the DCP model is the smallest (Fig. 12). The image has been sharpened by the DCP processing, but the goal of image cloud removal has not been achieved. If the DCP model is excluded, the Pix2Pix model performs the best. These three objective indicators indirectly show that the algorithm in this paper generates the details and features of the image better, makes the generated image closer to the real image, and removes the influence of cloud on the full-disk solar image data set. The results show that the proposed model is the most effective among all the models used in this investigation.

Limitation
The proposed method is not very robust for heavily clouded scenes (Fig. 13): the details of a solar image under heavy cloud cannot be recovered naturally. The data set is not large enough, so the core details of such images cannot be learned. The limitation might be addressed by applying enhancing blocks in the network.

Conclusion
Image analysis and processing are critical components of solar physics. It is difficult for researchers to analyze solar activity in images obscured by clouds, so removing cloud from the images is of great significance for all ground-based observations. This paper uses an image translation algorithm based on the Pixel to Pixel network for image cloud removal. In this network, a U-Net generator and a PatchGAN discriminator are used to improve the details of the generated image, so that the visual effect of the generated cloudless image is more realistic and the goal of image cloud removal is achieved. The method is compared with the traditional method and with mainstream deep learning image generation algorithms. In these experiments, it is found that the model can learn more effective solar image features and synthesize clearer cloudless images by using the data set of the National Astronomical Observatories. The results of this study will be deployed at HSOS to improve the image quality of the full-disk solar image data set. The proposed method will also be used, together with image processor technology, to develop corresponding processing software for astronomical observation researchers. Future work will focus on creating a larger data set of solar images that covers various solar activity conditions, improving the generalization capability of the network, and studying additional quantitative performance indicators to evaluate our method.