Intelligent Fusion of Infrared and Visible Image Data Based on Convolutional Sparse Representation and Improved Pulse-coupled Neural Network

Multi-source information can be obtained by fusing infrared and visible light images, which carry complementary information. However, existing fusion methods suffer from disadvantages such as blurred edges, low contrast, and loss of details. This paper proposes an image fusion algorithm based on convolutional sparse representation and an improved pulse-coupled neural network. The source images are first decomposed into high-frequency and low-frequency subbands by the non-subsampled Shearlet transform (NSST). The low-frequency subbands are then fused by convolutional sparse representation (CSR), and the high-frequency subbands are fused by an improved pulse-coupled neural network (IPCNN). The IPCNN alleviates the difficulty of setting parameters in the traditional PCNN algorithm, and the CSR improves the ability of sparse representation to preserve details. The results show that the proposed method has advantages over existing mainstream fusion algorithms in terms of visual effects and objective indicators.


Introduction
Infrared imaging sensors image a target scene through its thermal radiation and therefore identify low-illuminance or camouflaged targets well, but their imaging clarity is relatively low. Visible light imaging sensors, in contrast, image the target scene through reflection and offer higher spatial resolution and clarity, but their imaging quality is easily affected by factors such as harsh environments [1]. The fusion of infrared and visible light images is expected to make full use of the advantages of both, which can benefit applications in many fields such as military operations, resource detection, and security monitoring [2]. Many fusion methods are built on the framework of multi-scale transform (MST). Methods of image conversion [3] and fusion techniques for the separated sub-bands are two significant research topics in the field of MST-based fusion methods. A large body of literature has shown that the performance of MST-based fusion methods can be significantly improved by selecting an appropriate conversion method and designing an effective fusion technique. Specifically, the NSST algorithm proposed by Xia et al. [4] is noted for its good local time-domain characteristics, multi-directionality and translation invariance, which allow it to preserve and extract details from source images effectively. Ding et al. [5] fused high-frequency subbands by PCNN, successfully combining important details from different source images. In addition, Singh et al. [6] suggested adding PCNN to the NSST framework to extract the key details of source images, but did not address the complexity of setting the large number of parameters in PCNN. Yang et al. [7] proposed an image fusion method based on sparse representation (SR), which improves the efficiency of image fusion; however, it performs sparse representation on the low-frequency coefficients in only four directions and does not fully represent the characteristics and details of the source images.
Furthermore, another SR-based denoising method, proposed by Liu et al. [8], improves the performance of the traditional MST-based method. However, this method still has two shortcomings, namely a limited ability to preserve details and a high sensitivity to registration errors. Subsequently, Liu et al. [9] developed a CSR algorithm that addresses these two problems and achieves image fusion by computing the sparse representation of the entire image. Chen et al. [10] proposed an image segmentation method based on a simplified pulse-coupled neural network (SPCNN) model, which sets its free parameters automatically for better segmentation results. By optimising the SPCNN model, Ma [11] obtained an IPCNN for image fusion.
Based on the analysis of existing image fusion algorithms, especially IPCNN and CSR, this study proposes a new image fusion algorithm, namely NSST-IPCNN-CSR, which can retain the global features of source images and characteristics of each pixel point, and highlight the edges of the images to achieve a better visual perception of fusion results.

Non-subsampled Shearlet Transform (NSST)
Easley et al. [12] proposed NSST on the basis of the Shearlet transform, which lacks translation invariance. NSST achieves the multi-scale decomposition of an image through the non-subsampled pyramid (NSP) and then applies the shear filter bank (SF) for directional decomposition, thereby obtaining the coefficients of the image at different scales and in different directions [13]. The NSP generates k + 1 sub-images, consisting of one low-frequency sub-image and k high-frequency sub-images, each with the same size as the source image, where k represents the decomposition order. No up-sampling or down-sampling is performed on the image during decomposition and reconstruction. NSST therefore has not only good frequency-domain localisation characteristics and multi-directionality but also translation invariance, which suppresses the pseudo-Gibbs phenomenon. Fig. 1 shows the schematic of NSST multi-scale multi-directional decomposition.
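The size-preserving property of the NSP stage can be illustrated with a minimal sketch. It is not the NSST filter bank itself: it assumes a simple "à trous" pyramid with a [1, 2, 1]/4 kernel and circular boundaries, and it omits the shear (directional) filtering entirely. The function names are hypothetical. The sketch only shows that k decomposition levels yield k + 1 sub-images of the source size and that, without any resampling, the sub-images sum back to the source.

```python
import numpy as np

def atrous_lowpass(img, level):
    """Smooth with a [1, 2, 1]/4 kernel whose taps are 2**level pixels apart.

    np.roll gives circular boundaries, so the output keeps the input size.
    """
    step = 2 ** level
    out = img.astype(float)
    for axis in (0, 1):
        out = (np.roll(out, step, axis=axis) + 2.0 * out
               + np.roll(out, -step, axis=axis)) / 4.0
    return out

def nonsubsampled_pyramid(img, k):
    """Return [low, high_1, ..., high_k]; no down- or up-sampling is applied."""
    current = img.astype(float)
    highs = []
    for level in range(k):
        smooth = atrous_lowpass(current, level)
        highs.append(current - smooth)  # detail removed by smoothing at this scale
        current = smooth
    return [current] + highs

def reconstruct(subbands):
    """Invert the decomposition by summing all sub-images."""
    return np.sum(subbands, axis=0)
```

For a 16 × 16 input and k = 3, `nonsubsampled_pyramid` returns four 16 × 16 arrays, and `reconstruct` recovers the input up to floating-point error.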

Improved PCNN (IPCNN)
Setting free parameters, such as the connection strength, is a significant problem in the traditional PCNN model. To overcome the difficulty of setting these parameters manually, Ma [11] recently proposed an improved PCNN model (IPCNN) together with an automatic parameter setting method for image segmentation. We believe that the model is equally effective for image fusion, so it is reasonable to apply the IPCNN model to the fusion of the high-frequency coefficients obtained by multi-scale transform (MST).
The IPCNN model is described as follows. The improved PCNN model uses the ignition condition of the standard PCNN model, $U_{ij}(n) > E_{ij}(n)$, and retains the feedback input and link input modes of the standard PCNN model as well as the connection strength coefficient $\beta$, which reflects the degree of influence between neighbouring neurons.
The working mechanism of the IPCNN model is improved compared with that of the traditional PCNN model. In the IPCNN model, the dynamic threshold $E_{ij}(n)$ is determined by the combination of its previous state $E_{ij}(n-1)$ and the previous output $Y_{ij}(n-1)$. When the internal activity item of the current state, $U_{ij}(n)$, exceeds the dynamic threshold of the current state, $E_{ij}(n)$, the neuron ignites and the output is $Y_{ij}[n] = 1$. The source image is defined as $S$. In the IPCNN model, the decay time constant of the feedback input, $\alpha_f$, is replaced by the decay time constant of the internal activity item, $\alpha_u$, which expresses the meaning of each item in the model more accurately and further unifies the expression of the model.
The IPCNN model has four tuneable parameters, i.e., $\alpha_u$, $\alpha_e$, $\beta$ and $V_E$, and a synaptic connection matrix $W$. $\alpha_u$ and $\alpha_e$ are the exponential decay constants of the internal activity term and the dynamic threshold, respectively; $\beta$ is the connection strength; and $V_E$ is the magnitude of the dynamic threshold. The choice of these parameters greatly affects the image processing result. The above model is mainly used in image segmentation and is also effective for image fusion. Parameter $\beta$ indicates the connection strength of neighbouring neurons in the IPCNN model, and its magnitude indicates the degree of interaction between them: the larger the value, the stronger the interaction between adjacent neurons and the more volatile the internal activity items, and vice versa. At present, $\beta$ is usually set manually to a constant empirical value in image processing applications. However, according to human visual characteristics, the response to regions with significant features should be stronger than that to regions with insignificant features, so assigning $\beta$ a constant value is unreasonable. To fuse infrared and visible images better, we construct a local-direction contrast model and use it to determine the value of $\beta$, as given by Eq. (4). In Eq. (4), $X$ indicates the source image to be fused, the sub-band term indicates the coefficient at position $(i, j)$ in the $r$-th sub-band at the $K$-th NSP decomposition level, and $X_K^0(i, j)$ indicates the local average of the low-frequency coefficients of image $X$ at the $K$-th decomposition level, as given by Eq. (5). In Eq. (5), $M$ is generally assumed to be equal to $N$, and $M \times N$ is the size of the neighbourhood centred at $(i, j)$.
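A possible way to compute a spatially varying connection strength from the quantities described above is sketched below. The exact form of the local-direction contrast is not reproduced in this text, so the ratio used here (absolute sub-band coefficient divided by the local low-frequency average) is an assumption, as are the function names, the edge-replicated border handling and the small constant `eps` that guards against division by zero. Only the local average over an M × N neighbourhood with M = N follows directly from the description.

```python
import numpy as np

def local_mean(band, size=3):
    """Average over a size x size neighbourhood centred on each pixel.

    `size` is assumed odd; borders are handled by edge replication.
    """
    pad = size // 2
    padded = np.pad(band.astype(float), pad, mode="edge")
    rows, cols = band.shape
    acc = np.zeros((rows, cols), dtype=float)
    for di in range(size):
        for dj in range(size):
            acc += padded[di:di + rows, dj:dj + cols]
    return acc / (size * size)

def local_direction_contrast(high_band, low_band, size=3, eps=1e-6):
    """ASSUMED form: |sub-band coefficient| / local average of low-frequency band."""
    return np.abs(high_band) / (np.abs(local_mean(low_band, size)) + eps)
```

Under this assumption, pixels whose directional detail is large relative to the local background receive a larger connection strength, which matches the stated goal of responding more strongly to regions with significant features.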

Convolutional Sparse Representation (CSR)
CSR is a convolutional form of SR [9], i.e., the sum of convolutions between the filter dictionary and the representative responses is used instead of the product of a redundant dictionary and sparse coefficients, so that the image is sparsely encoded as a whole. The CSR model can be expressed as

$$\arg\min_{\{x_m\}} \frac{1}{2} \left\| \sum_{m=1}^{M} d_m * x_m - s \right\|_2^2 + \lambda \sum_{m=1}^{M} \left\| x_m \right\|_1 \tag{6}$$

In Eq. (6), $\{d_m\}$ represents an $M$-dimensional convolution dictionary, $*$ is the symbol of the convolution operation, $\{x_m\}$ represents the representative responses, $s$ represents the source image, and $\lambda$ is the regularisation parameter. The alternating direction method of multipliers (ADMM) is a dual convex optimisation algorithm that can solve convex programming problems with a separable structure by alternately solving several sub-problems. Considering that the ADMM algorithm solves the basis pursuit denoising problem well, a Fourier-domain ADMM algorithm for solving the convolutional sparse model was proposed in [14]. In this algorithm, dictionary learning is defined as the optimisation problem of Eq. (7).
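To make the CSR model concrete, the following sketch minimises the usual convolutional sparse coding objective, 0.5‖Σ d_m * x_m − s‖² + λ Σ ‖x_m‖₁, for a fixed dictionary. It deliberately swaps the Fourier-domain ADMM solver of [14] for plain ISTA (a gradient step followed by soft thresholding), because ISTA fits in a few lines. Circular convolution via the FFT, the function names and the default values of `lam` and `n_iter` are assumptions made for illustration only.

```python
import numpy as np

def csr_objective(D_hat, X, s, lam):
    """0.5 * ||sum_m d_m * x_m - s||_2^2 + lam * sum_m ||x_m||_1 (circular conv.)."""
    recon = np.real(np.fft.ifft2(np.sum(D_hat * np.fft.fft2(X), axis=0)))
    return 0.5 * np.sum((recon - s) ** 2) + lam * np.sum(np.abs(X))

def csr_ista(d, s, lam=0.05, n_iter=50):
    """d: (M, h, w) filters, s: (H, W) image. Returns maps X (M, H, W) and history."""
    D_hat = np.fft.fft2(d, s=s.shape)   # zero-pad each filter to the image size
    S_hat = np.fft.fft2(s)
    # Step size 1/L, where L bounds the curvature of the data-fidelity term.
    step = 1.0 / np.max(np.sum(np.abs(D_hat) ** 2, axis=0))
    X = np.zeros((d.shape[0],) + s.shape)
    history = []
    for _ in range(n_iter):
        R_hat = np.sum(D_hat * np.fft.fft2(X), axis=0) - S_hat   # residual spectrum
        grad = np.real(np.fft.ifft2(np.conj(D_hat) * R_hat))     # data-term gradient
        Z = X - step * grad
        X = np.sign(Z) * np.maximum(np.abs(Z) - step * lam, 0.0)  # soft threshold
        history.append(csr_objective(D_hat, X, s, lam))
    return X, history
```

With this step size the objective does not increase from one iteration to the next, which gives a simple sanity check. ADMM typically converges in far fewer iterations, which is one reason it is preferred in practice.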
In [9], CSR was applied to image fusion for the first time; CSR can be regarded as an improved form of SR that sparsely represents the entire image. The CSR algorithm addresses two weak points of traditional sparse representation, namely limited detail preservation capacity and high sensitivity to registration errors. The algorithm is also useful in low-frequency sub-band fusion. The low-frequency sub-bands obtained by NSST decomposition give an approximate description of the image and contain a large number of coefficients with values close to 0, so the low-frequency details of the image can be represented sparsely. Thus, we introduce the CSR model for low-frequency sub-band fusion. Fig. 2 shows the specific fusion framework of this work. The framework is divided into four steps: NSST decomposition, fusion of high-frequency sub-bands, fusion of low-frequency sub-bands and NSST reconstruction.

NSST-IPCNN-CSR Algorithm
Step 1 NSST Decomposition
Source images A and B are decomposed by the L-level NSST model to obtain their low-frequency coefficients $\{L_{K,A}, L_{K,B}\}$ and high-frequency coefficients $\{H^{l,k}_{A}, H^{l,k}_{B}\}$, where $K$ is the multiscale decomposition level, $l$ is the decomposition order, $k$ is the decomposition direction, and $1 \le k \le K$.
Step 2 Fusion of High-frequency Sub-bands
The IPCNN model is used to fuse the high-frequency sub-bands [15]. The normalised high-frequency coefficients $H^{l,k}_{A}$ and $H^{l,k}_{B}$ are taken as the feeding input, and the value of the local-direction contrast model in Eq. (4) is taken as the connection strength. Over the entire iterative process, the total firing time is used to measure the activity level of the high-frequency coefficients. In line with the IPCNN model described by Eq. (1)-Eq. (3), the firing time is accumulated at the end of each iteration, so that the firing time of each neuron after all iterations is $T_{ij}[N]$, where $N$ is the total number of iterations. For the high-frequency sub-bands $A^{l,k}$ and $B^{l,k}$, the IPCNN firing times are $T^{l,k}_{ij,A}[N]$ and $T^{l,k}_{ij,B}[N]$, respectively. The fusion coefficient of the high-frequency sub-bands, $H^{l,k}_{ij,F}$, is obtained by comparing these firing times. If $T^{l,k}_{ij,A}[N]$ is larger than $T^{l,k}_{ij,B}[N]$, then the pixel at $(i, j)$ in image A has more obvious characteristics than the pixel at the same position in image B; therefore, the former is chosen as the pixel in the fused image, and vice versa.
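The firing-time comparison described in Step 2 can be sketched as follows. The update equations in the code are a common simplified PCNN form chosen to match the verbal description (threshold driven by its previous state and the previous output, ignition when the internal activity exceeds the threshold). They are an assumption, not a transcription of Eq. (1)-Eq. (3). The weight matrix `W`, the default parameter values, the function names and the choice of image A on ties are likewise hypothetical.

```python
import numpy as np

# Assumed 3 x 3 synaptic weights; the centre neuron does not link to itself.
W = np.array([[0.5, 1.0, 0.5],
              [1.0, 0.0, 1.0],
              [0.5, 1.0, 0.5]])

def neighbour_sum(Y, weights):
    """Weighted sum of neighbouring outputs, with zero padding at the border."""
    padded = np.pad(Y, 1)
    rows, cols = Y.shape
    return sum(weights[a, b] * padded[a:a + rows, b:b + cols]
               for a in range(3) for b in range(3))

def firing_times(S, beta, alpha_u=0.5, alpha_e=0.7, V_E=20.0, n_iter=50):
    """Accumulated firing count T[N] for one sub-band (assumed update rules)."""
    S = np.abs(S) / (np.abs(S).max() + 1e-12)      # normalised feeding input
    U = np.zeros_like(S)                           # internal activity
    E = np.ones_like(S)                            # dynamic threshold
    Y = np.zeros_like(S)                           # output (1 = ignition)
    T = np.zeros_like(S)                           # accumulated firing time
    for _ in range(n_iter):
        L = neighbour_sum(Y, W)                    # linking input from Y[n-1]
        U = np.exp(-alpha_u) * U + S * (1.0 + beta * L)
        E = np.exp(-alpha_e) * E + V_E * Y         # uses E[n-1] and Y[n-1]
        Y = (U > E).astype(float)                  # ignition condition
        T += Y                                     # accumulate at end of iteration
    return T

def fuse_high(HA, HB, beta_A, beta_B, **params):
    """Keep, per pixel, the coefficient whose neuron fired more often (ties -> A)."""
    TA = firing_times(HA, beta_A, **params)
    TB = firing_times(HB, beta_B, **params)
    return np.where(TA >= TB, HA, HB)
```

Here `beta_A` and `beta_B` may be scalars or arrays of the sub-band size, so a per-pixel connection strength can be passed in directly.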

Step 3 Fusion of Low-frequency Sub-bands
The low-frequency sub-band fusion strategy also has a significant impact on the final fusion result. In [16], low-frequency sub-bands were fused with a convolutional sparse representation. Assume that the low-frequency sub-bands obtained after decomposing $K$ source images are $L_k$, $k \in \{1, \dots, K\}$, and that a group of dictionary filters $d_m$, $m \in \{1, \dots, M\}$, is given. Fig. 3 shows the low-frequency sub-band fusion based on CSR.

Objective Evaluation Index
The evaluation of image fusion quality is divided into subjective visual evaluation and objective index evaluation. Objective index evaluation selects relevant indices that approximate how the human visual system perceives image quality. To quantitatively evaluate the performance of different methods, six accepted objective fusion evaluation indices were selected in the experiment, i.e., entropy (EN), edge information retention ($Q^{AB/F}$), mutual information (MI), average gradient (AG), spatial frequency (SF) and standard deviation (SD). Entropy characterises the amount of information available in the source and fused images. Edge information retention characterises the amount of edge detail information from the source images that is injected into the fused image. Mutual information measures how much information from the source images is contained in the fused image. Average gradient represents the sharpness of the image; the larger the value, the clearer the image. Spatial frequency reflects the overall activity of the image in the spatial domain; a larger value indicates a better fusion result. Standard deviation reflects the dispersion of the pixel values around the mean value of the image; the greater the deviation, the better the quality of the image. In general, the larger the six objective indices are, the higher the quality of the fused image and the clearer the image will be.
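Four of the six indices depend only on the fused image and are short enough to sketch directly. The definitions below follow one common convention (8-bit grey levels for EN, first-order forward differences for AG and SF); normalisation details differ slightly between papers, so the values are not guaranteed to match those of any particular toolbox. MI and $Q^{AB/F}$ also need the source images and are omitted. The function names are hypothetical.

```python
import numpy as np

def entropy(img, bins=256):
    """Shannon entropy (bits) of the grey-level histogram, assuming values in [0, 255]."""
    hist, _ = np.histogram(img, bins=bins, range=(0, 256))
    p = hist[hist > 0] / hist.sum()
    return float(-np.sum(p * np.log2(p)))

def standard_deviation(img):
    """Dispersion of the pixel values around the image mean."""
    return float(np.std(img.astype(float)))

def spatial_frequency(img):
    """sqrt(RF^2 + CF^2), with RF and CF the RMS horizontal and vertical differences."""
    f = img.astype(float)
    rf = np.sqrt(np.mean((f[:, 1:] - f[:, :-1]) ** 2))
    cf = np.sqrt(np.mean((f[1:, :] - f[:-1, :]) ** 2))
    return float(np.sqrt(rf ** 2 + cf ** 2))

def average_gradient(img):
    """Mean of sqrt((gx^2 + gy^2) / 2) over forward differences."""
    f = img.astype(float)
    gx = f[:-1, 1:] - f[:-1, :-1]
    gy = f[1:, :-1] - f[:-1, :-1]
    return float(np.mean(np.sqrt((gx ** 2 + gy ** 2) / 2.0)))
```

As a quick check, a constant image scores zero on all four indices, while an image split evenly between two grey levels has an entropy of exactly one bit.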

Experimental Settings
To verify the effectiveness of the proposed method, three pairs of multi-focus images were used in our experiments. These source images were collected from the Lytro Multi-focus Dataset and have the same spatial resolution of 256 × 256 pixels. The source images in each pair have been accurately registered. The experiments were conducted on a PC equipped with an Intel(R) Core(TM) i7-6700K CPU (4.00 GHz) and 32 GB of RAM. MATLAB R2013b was installed on a Windows 7 64-bit operating system.

Image Fusion Experiment
To verify the effectiveness of NSST-IPCNN-CSR, three sets of pre-registered infrared and visible images of size 256 × 256 were selected as the experimental data. The three sets of source images have different complexities in terms of target and scene expressions. Group 1 represents complex objects under a single scene; Group 2 represents a single object under a single scene; Group 3 represents complex objects under a complex scene. The fusion results of Group 1 are shown in Fig. 4, and the objective evaluation indices are listed in Tab. 1.
The fusion results in Figs. 4(c)-4(h) clearly indicate that these compared methods can effectively complete complementary information fusion from source images, but their abilities to capture feature information vary. In Algorithm 1, the fusion results indicate that the contrast between the character and the scene is low, the texture of the open space in the upper right image is unclear, and the details of the visible image cannot be displayed well. In Algorithm 2, the road, the railing, the house and the tree are enhanced in layering, but plaque shadows and unclear textures can be observed in the shrub. In Algorithm 3, the details of the road, the railing and the house are seriously lost, resulting in the unclear road and railing edges and blurred house edges. In Algorithm 4, the bushes are not textured clearly, and the house exhibits a clear 'block effect' phenomenon. The details of the house and the road in the fusion results of Algorithm 5 are prominent, but 'discontinuity' can be observed in the space. In the fusion result of the NSST-IPCNN-CSR algorithm, the grey level, the brightness and the sharpness optimally match those of the source images, and the overall texture structure is obvious, with a desirable visual effect on human eyes.
The performance indicators in Tab. 1 show that, except for the MI value, for which Algorithm 3 is slightly better than the proposed algorithm, the evaluation indicators of the proposed algorithm are better than those of the comparison algorithms. Together with the subjective visual evaluation of Fig. 4, this indicates that, when a single source image is targeted under a complex scene, the proposed algorithm preserves the useful parts of the source images well and has a good visual effect.
The fusion results of Group 2 are shown in Fig. 5, and Tab. 2 shows the objective quality evaluation indicators.  Figs. 5(c)-5(h) clearly indicate that the above methods can effectively fuse the target and scene information, but the visual qualities of the fusion images vary. In the fusion results of Algorithm 1, the target vessel information is prominent, but the scene information is ambiguous. In Algorithm 2, the target vessel information is prominent, but a serious 'plaque effect' phenomenon can be observed. In Algorithm 3, the target vessel contour is blurred, and a slight plaque can be observed on the fused image. In Algorithm 4, the detailed information of the infrared light is displayed poorly, and the scene information is too smooth. In Algorithm 5, the detailed features of the image are prominent, and a 'discontinuity' phenomenon can be observed in the space, with poor visual effects. In the NSST-IPCNN-CSR algorithm, the target vessel information is prominent, and the background information structure features are obvious, with a great visual effect.
The performance indicators in Tab. 2 indicate that the proposed algorithm is inferior to Algorithm 6 in terms of edge information retention and better than the other comparison algorithms in terms of the other five evaluation indicators. The proposed algorithm has good visual effects and performance indicators according to both subjective visual and objective quality evaluations. Furthermore, its fusion result has good recognisability.
The fusion results of Group 3 are shown in Fig. 6, and Tab. 3 shows the objective quality evaluation indicators.
Figs. 6(c)-6(h) clearly show that these methods can effectively complete the fusion of infrared and visible images, but their detail expression capabilities vary. In the fusion results of Algorithm 1, the contrast of the image is enhanced, although the target feature information is seriously lost, and the text information on the billboard is nearly impossible to distinguish. In Algorithm 2, the contrast of the image is weak, and a slight 'discontinuity' phenomenon can be observed in the space. In Algorithm 3, the image contrast is enhanced, but the details are displayed poorly. In Algorithm 4, the detailed information of the infrared images cannot be displayed well, and the scene information is too smooth. In Algorithm 5, the overall image contrast is low, the edge of the target object is blurred, and the loss of detail feature information is serious. In the fusion results of the NSST-IPCNN-CSR algorithm, the layering of the image is enhanced, the continuity is good, the edge contour texture is clear, and the text information on the billboard can be recognised well.

Conclusion
We propose the application of a new algorithm called NSST-IPCNN-CSR to infrared and visible image fusion. The novelty of the proposed algorithm is primarily reflected in two aspects. First, the IPCNN model was used for the first time in high-frequency sub-band fusion, in which all required parameters could be calculated adaptively in line with the input high-frequency sub-bands. Moreover, a local-direction contrast model was used to adjust the connection strength, so that the IPCNN model functions better in image fusion. Second, we fused low-frequency sub-bands by convolutional sparse representation, which addresses two problems of sparse representation, i.e., the insufficient ability to preserve detail texture information and the high sensitivity to registration errors, and performs better in low-frequency sub-band fusion. Three sets of source images and five comparison algorithms were used for the experiments. The results demonstrate that the NSST-IPCNN-CSR algorithm can effectively express the details in the image, i.e., it presents the required details clearly with smooth edges; retains the useful information of the source images and captures their geometric structure at a deeper level; and performs well in visual perception and objective evaluation.