Within driving assistance systems, detecting the driver’s facial features in the cab across a wide range of luminosities is mission critical. One method that addresses this concern is infrared and visible image fusion, whose purpose is to generate a composite image that illustrates scene details clearly and consistently under varied lighting conditions. Our study introduces a novel approach to this method with marked improvements. We use the non-subsampled shearlet transform (NSST) to obtain the low- and high-frequency sub-bands of infrared and visible imagery. For low-frequency sub-band fusion, we combine the local average energy and the local standard deviation. For the high-frequency sub-bands, a residual dense network performs multiscale feature extraction to generate high-frequency feature maps, and the maximum weighted average algorithm achieves the fusion. Finally, we reconstruct the fused low- and high-frequency sub-bands by the inverse NSST. The results of experiments and of an application to real-world driving scenarios show that this method performs excellently on objective indexes compared with other contemporary, industry-standard algorithms. In particular, in the subjective visual evaluation, the fine texture and scene are fully expressed, the target’s edges are distinct and pronounced, and the detailed information of the source images is comprehensively preserved.
An infrared image is a radiation image that distinguishes the target from the background according to differences in thermal radiation. It can capture the thermal radiation of an object, but it cannot reflect the real scene information. A visible image is a reflection image with abundant high-frequency components, which can show scene details under adequate lighting conditions. However, it is difficult for a visible image to capture all the useful information under low light intensity or heavy fog at night. The two modalities are therefore strongly complementary in reflecting scene information.
Image fusion analyzes and extracts the complementary information from two or more images of the same target to generate a composite image that accurately describes the scene. Image fusion is of great significance for image recognition and understanding, and it has been widely used in computer vision, remote sensing [
Image fusion algorithms fall into two directions: traditional model-driven fusion algorithms and data-driven fusion algorithms based on neural networks.
The traditional fusion algorithms are based on the models such as multi-scale decomposition [
Son [
Multi-scale fusion [
Based on the model of the retinex, Jung et al. [
Jee et al. [
Inspired by the nonlocal mean de-noising filter, Zhang et al. [
Zhang et al. [
Li et al. [
In recent years, deep learning has shown remarkable effects in image processing applications. Due to the lack of high-quality training data sets, however, there has not been much research on the fusion of visible and infrared images based on deep learning [
Jung et al. [
On the other hand, the neural-network-based fusion of visible and near-infrared images requires high registration accuracy, and most of the existing data sets are unable to meet the requirements. Using scale invariant feature transform (SIFT) method, Brown et al. [
Tang et al. [
Li et al. [
Li et al. [
In the past research, several open-source datasets were used for training and testing, such as TNO [
We built a system based on non-subsampled shearlet transform (NSST) [
NSST is an improvement of the shearlet transform. It inherits the characteristics of the shearlet transform while avoiding the pseudo-Gibbs phenomenon. Compared with the wavelet transform, it has the advantages of low complexity and high efficiency, and it has been widely used in image segmentation, edge detection, recognition, and other fields.
RDN is a combination of residual network (ResNet) [
The steps are as follows.
Step 1. NSST
Step 1.1. The non-subsampled Laplacian pyramid filter bank (NSLP) was used to achieve multi-scale decomposition, which ensures translation invariance and suppresses the pseudo-Gibbs phenomenon [
Step 1.2. Shear filter (SF) was used to achieve direction localization. After N-level decomposition of the source image, one low-frequency sub-band image and N high-frequency sub-band images with the same size but different scales were obtained.
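The multi-scale stage of Step 1 can be sketched in Python. The sketch below substitutes a plain Gaussian-based, non-subsampled Laplacian pyramid for the full NSLP and shear-filter machinery (the function names and parameters are illustrative, not the paper’s implementation); because no subsampling occurs, the decomposition is shift-invariant and exactly invertible:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def multiscale_decompose(img, levels=3, sigma=2.0):
    """Decompose an image into one low-frequency band and `levels`
    high-frequency detail bands (a simplified, shift-invariant stand-in
    for the non-subsampled Laplacian pyramid used by NSST)."""
    highs = []
    current = img.astype(np.float64)
    for _ in range(levels):
        low = gaussian_filter(current, sigma)
        highs.append(current - low)   # band-pass detail at this scale
        current = low                 # coarser approximation
    return current, highs             # low-frequency band + detail bands

def multiscale_reconstruct(low, highs):
    """Inverse transform: sum the low band and all detail bands."""
    return low + sum(highs)

rng = np.random.default_rng(0)
img = rng.random((64, 64))
low, highs = multiscale_decompose(img)
rec = multiscale_reconstruct(low, highs)
print(np.allclose(rec, img))  # True: the sum telescopes back to the input
```

All sub-bands keep the size of the source image, matching the property stated in Step 1.2 for the non-subsampled transform.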
Step 2. The average energy in the neighborhood of low frequency sub-band coefficients was calculated. The regional energy eigenvalues of each sub-region were calculated. The low-frequency sub-band coefficients were calculated by the weighted strategy based on the regional energy eigenvalues.
Step 3. The local features were extracted by dense convolution layers. Residual dense block (RDB) allows the states of the previous RDB to be directly connected to all layers of the current RDB, forming a continuous memory (CM) mechanism.
Step 4. Local feature fusion was used to adaptively learn more effective features from previous and current local features, and stabilize the training of larger networks.
Step 5. The fused image is obtained by inverse NSST transform.
This section introduces the details of the fusion algorithm in this paper.
Shearlet theory combines geometry and multiresolution analysis through the classical affine system [
If A_a = (a, 0; 0, a^{1/2}) denotes the anisotropic dilation matrix and S_s = (1, s; 0, 1) the shear matrix, then the system can be generated by
ψ_{a,s,t}(x) = a^{−3/4} ψ(A_a^{−1} S_s^{−1}(x − t)),  a > 0, s ∈ ℝ, t ∈ ℝ²,
in which a is the scale parameter, s is the shear (direction) parameter, and t is the translation parameter.
The shearlet transform of a function f ∈ L²(ℝ²) is
SH_ψ f(a, s, t) = ⟨f, ψ_{a,s,t}⟩.
It can be seen that the shearlet transform is a function of scale (a), direction (s), and translation (t).
NSST is an improvement of shearlet transform. By using non-subsampled Laplacian filter banks, NSST obtains the multiscale decomposition of the original image. Then, the shear filter combination is used to decompose the sub-band images of different scales. The sub-band images with different scales and directions are obtained.
The low-frequency sub-band reflects the outline and basic information of the image. Commonly used fusion methods for low-frequency sub-bands include the weighted average method, the absolute value method, and the standard deviation selection method. In this paper, a method combining local average energy with local standard deviation was applied [
Step 1. The average of low frequency coefficient (
The size of the window is
Step 2. According to
In
Step 3. The strategy for low-frequency sub-band fusion is:
in which
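A minimal sketch of this low-frequency rule, assuming a 3 × 3 window and a weight map proportional to the combined local average energy and local standard deviation (the paper’s exact weighting may differ; all names here are illustrative):

```python
import numpy as np
from scipy.ndimage import uniform_filter

def fuse_low_frequency(lf_ir, lf_vis, win=3, eps=1e-12):
    """Weighted fusion of two low-frequency sub-bands using local
    average energy and local standard deviation (illustrative
    activity measure and weighting)."""
    def activity(band):
        mean = uniform_filter(band, win)          # local average
        energy = uniform_filter(band ** 2, win)   # local average energy
        std = np.sqrt(np.maximum(energy - mean ** 2, 0.0))
        return energy + std                       # combined activity
    a_ir, a_vis = activity(lf_ir), activity(lf_vis)
    w_ir = a_ir / (a_ir + a_vis + eps)            # normalised weight map
    return w_ir * lf_ir + (1.0 - w_ir) * lf_vis   # convex combination

rng = np.random.default_rng(1)
ir, vis = rng.random((32, 32)), rng.random((32, 32))
fused = fuse_low_frequency(ir, vis)
```

Because the weights are normalised to [0, 1], each fused coefficient lies between the two source coefficients, so the rule cannot introduce values outside the source range.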
RDN is based on deep learning, and was proposed for image super-resolution [
The main innovation of RDN is the structure of residual dense block (RDB), combining ResNet and DenseNet, as
There are three parts in an RDB.
Contiguous memory. The state of the previous RDB is passed to all the layers of the current RDB. For instance, for the adjacent blocks RDB-c and RDB-d, the output of RDB-c is fed, through dense connections, into every convolutional layer of RDB-d.
Local feature fusion. The direct connections between the preceding RDB and the layers of the current RDB increase the number of feature maps, so a 1 × 1 convolutional layer is applied to adaptively reduce and fuse them, which also stabilizes the training of larger networks.
Local residual learning. On the basis of the above, the output of the current RDB is obtained by adding the block’s input to the locally fused features, which further improves the information flow.
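The three RDB components can be illustrated structurally. The sketch below replaces real convolutions with 1 × 1 channel mixes over random weights, which is enough to show the dense connections (contiguous memory), the 1 × 1 fusion layer, and the residual skip; it is a structural illustration, not a trained network:

```python
import numpy as np

def residual_dense_block(x, growth=8, num_layers=3, rng=None):
    """Structural sketch of an RDB: dense connections grow the channel
    count, a 1x1 fusion layer reduces it back, and a residual skip adds
    the block's input (real convolutions replaced by channel mixes)."""
    rng = rng or np.random.default_rng(0)
    feats = [x]                                   # contiguous memory: all prior states
    channels = x.shape[-1]
    for _ in range(num_layers):
        inp = np.concatenate(feats, axis=-1)      # dense connection to every layer
        w = rng.normal(size=(inp.shape[-1], growth)) * 0.1
        feats.append(np.maximum(inp @ w, 0.0))    # "conv" + ReLU
    concat = np.concatenate(feats, axis=-1)
    w_fuse = rng.normal(size=(concat.shape[-1], channels)) * 0.1
    fused = concat @ w_fuse                       # local feature fusion (1x1)
    return x + fused                              # local residual learning

x = np.random.default_rng(2).random((16, 16, 8))  # H x W x C feature map
y = residual_dense_block(x)
```

Note that the output keeps the input’s channel count, which is what allows RDBs to be chained and their states passed forward.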
High frequency sub-band images mainly contain the edge features and texture details. Therefore, the fusion rules of high frequency sub-band directly affect the resolution and clarity of the fused image.
In this paper, the high-frequency sub-band images of the infrared and visible images were input into the trained RDN model to extract deep features and generate high-frequency sub-band feature maps. The weight map is then obtained using the maximum weighted average fusion strategy.
The formula is:
in which
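A hedged sketch of the maximum weighted average rule: the feature maps below stand in for the RDN outputs, local activity is measured by a box-filtered absolute value, and the locally dominant source wins (the paper’s exact weight construction is not reproduced here; all names are illustrative):

```python
import numpy as np
from scipy.ndimage import uniform_filter

def fuse_high_frequency(hf_ir, hf_vis, feat_ir, feat_vis, win=3):
    """Maximum-weighted-average fusion of high-frequency sub-bands.
    `feat_ir`/`feat_vis` stand in for RDN feature maps; the binary
    weight map follows the locally dominant activity."""
    act_ir = uniform_filter(np.abs(feat_ir), win)    # local activity, IR
    act_vis = uniform_filter(np.abs(feat_vis), win)  # local activity, visible
    w = (act_ir >= act_vis).astype(np.float64)       # winner-take-all weight map
    return w * hf_ir + (1.0 - w) * hf_vis

# If the IR feature map dominates everywhere, the IR band is selected.
hf_ir, hf_vis = np.ones((8, 8)), np.zeros((8, 8))
fused = fuse_high_frequency(hf_ir, hf_vis,
                            2 * np.ones((8, 8)), np.ones((8, 8)))
```

The box filter keeps the selection locally smooth, which limits blocking artifacts at region boundaries.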
According to the fusion rules of low frequency sub-band and high frequency sub-band mentioned above, the basic part
In order to verify the effectiveness of the proposed method, the UN-Camp images were taken as an example, and the results were compared with the following six methods: NSCT-PCNN [
There are two main aspects of image fusion quality evaluation, namely subjective evaluation, and objective index evaluation.
Subjective evaluation refers to the performance of the fused image details through visual perception.
The fusion results of the UN-Camp are as shown in
In order to assess the performance of the algorithms more objectively, five indexes were selected to evaluate the quality of fused images: spatial frequency (SF), sharpness (SP), structural similarity (SSIM), Xydeas-Petrovic (
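Of these indexes, spatial frequency has a particularly simple closed form; a minimal implementation of the standard definition:

```python
import numpy as np

def spatial_frequency(img):
    """Spatial frequency (SF): overall activity of an image, combining
    row-wise and column-wise gradient energy (standard definition)."""
    img = img.astype(np.float64)
    rf = np.sqrt(np.mean((img[:, 1:] - img[:, :-1]) ** 2))  # row frequency
    cf = np.sqrt(np.mean((img[1:, :] - img[:-1, :]) ** 2))  # column frequency
    return np.sqrt(rf ** 2 + cf ** 2)

flat = np.full((16, 16), 0.5)                     # constant image: SF = 0
checker = np.indices((16, 16)).sum(axis=0) % 2    # maximally busy pattern
print(spatial_frequency(flat) < spatial_frequency(checker))  # True
```

Higher SF indicates a busier, more detailed fused image, which is why it is commonly paired with structural metrics such as SSIM that instead reward fidelity to the sources.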
As shown in
A visible image can reflect the texture and details of the scene. An infrared image is generated according to the thermal radiation of the object and is not affected by external illumination. Therefore, visible and infrared image fusion technology is particularly suitable for all-weather image processing and computer vision.
As an important part of a driving assistance system, the detection of the driver’s facial features is a critical issue. We took the vehicle cab as the application scenario and applied our method to the fusion of the driver’s facial images, capturing them in both daytime and nighttime. A camera with a color lens and an infrared lens was mounted above the dashboard [
We captured the driver’s infrared and visible images in an actual driving scene with low illumination intensity and compared the results of different image fusion algorithms, similarly to Section 4.
As can be seen from
In this paper, on the basis of existing image processing algorithms, the infrared and visible images were decomposed into low-frequency and high-frequency sub-bands by the NSST. Low-frequency sub-band fusion was performed by combining local average energy with local standard deviation. An RDN model was used to extract multi-scale features from the high-frequency sub-bands and generate feature maps, and the maximum weighted average strategy was applied for high-frequency sub-band fusion. Finally, the fused image was obtained by the inverse NSST. The results of the experiment and of the application in the driver’s cab showed that the fused image generated by our method has better visual effects; moreover, it generally retains the largest amount of information from the source images.