Open Access

ARTICLE

RetinexWT: Retinex-Based Low-Light Enhancement Method Combining Wavelet Transform

Hongji Chen, Jianxun Zhang*, Tianze Yu, Yingzhu Zeng, Huan Zeng

College of Computer Science and Engineering, Chongqing University of Technology, Chongqing, 400054, China

* Corresponding Author: Jianxun Zhang.

(This article belongs to the Special Issue: Computer Vision and Image Processing: Feature Selection, Image Enhancement and Recognition)

Computers, Materials & Continua 2026, 86(2), 1-20. https://doi.org/10.32604/cmc.2025.067041

Abstract

Low-light image enhancement aims to improve the visibility of severely degraded images captured under insufficient illumination, alleviating the adverse effects of illumination degradation on image quality. Traditional Retinex-based approaches, inspired by human visual perception of brightness and color, decompose an image into illumination and reflectance components to restore fine details. However, their limited capacity for handling noise and complex lighting conditions often leads to distortions and artifacts in the enhanced results, particularly under extreme low-light scenarios. Although deep learning methods built upon Retinex theory have recently advanced the field, most still suffer from insufficient interpretability and sub-optimal enhancement performance. This paper presents RetinexWT, a novel framework that tightly integrates classical Retinex theory with modern deep learning. Following Retinex principles, RetinexWT employs wavelet transforms to estimate illumination maps for brightness adjustment. A detail-recovery module that synergistically combines Vision Transformer (ViT) and wavelet transforms is then introduced to guide the restoration of lost details, thereby improving overall image quality. Within the framework, wavelet decomposition splits input features into high-frequency and low-frequency components, enabling scale-specific processing of global illumination/color cues and fine textures. Furthermore, a gating mechanism selectively fuses down-sampled and up-sampled features, while an attention-based fusion strategy enhances model interpretability. Extensive experiments on the LOL dataset demonstrate that RetinexWT surpasses existing Retinex-oriented deep-learning methods, achieving an average Peak Signal-to-Noise Ratio (PSNR) improvement of 0.22 dB over the current State Of The Art (SOTA), thereby confirming its superiority in low-light image enhancement. Code is available at https://github.com/CHEN-hJ516/RetinexWT (accessed on 14 October 2025).

Keywords

Low-light image enhancement; retinex algorithm; wavelet transform; vision transformer

1  Introduction

As technology progresses, the need for clear, accurate information extraction from images continues to expand. Environmental factors, however, often obstruct the capture of high-quality images. Nighttime or low-light images typically suffer from issues such as noise, uneven illumination, and blurred details, all of which degrade visual quality and result in incomplete information. These limitations not only impair visual perception but also have cascading effects on advanced computer vision tasks, such as object recognition, autonomous driving, and image classification, ultimately reducing the accuracy of downstream processing tasks.

To address these challenges, a variety of low-light image enhancement algorithms have been proposed, including basic methods like gamma correction and histogram equalization. However, these approaches frequently lead to over-enhancement and image distortion, as their success depends heavily on the accuracy of manually set priors. In real-world scenarios, the complexity of lighting conditions complicates the determination of low-light factors.

Traditional cognitive methods, like the Retinex theory [1], inspired by the human visual system, decompose an image into two components—illumination and reflectance—to emulate how human perception interprets color and brightness across varying lighting conditions. According to Retinex theory, the objective of low-light enhancement is to mitigate illumination effects, amplify the reflectance component, and recover details and color. However, traditional Retinex approaches require complex parameter tuning and often introduce significant noise and artifacts, especially under low-light conditions.

With advancements in deep learning, convolutional neural networks (CNNs) and Transformer-based models have set new benchmarks for low-light image enhancement. CNNs effectively capture local image features, such as edges and textures, which are essential for detail restoration and noise suppression in low-light settings. However, under low-light conditions, image details and textures are frequently lost, while accurate separation of illumination information remains essential. Conventional convolution operations struggle to balance detail preservation with precise illumination estimation. Furthermore, traditional convolutional upsampling and downsampling mechanisms can degrade details, as texture and edge occlusion or blurring is common in low-light images. Direct convolutional downsampling can exacerbate this issue by conflating noise with detail, especially in dim and degraded regions, where noise is often erroneously amplified.

In feature fusion during upsampling and downsampling, standard channel-wise concatenation is typically used. This technique, which concatenates features along the channel dimension, lacks the ability to selectively emphasize relevant features, often leading to an accumulation of redundant information that dilutes key details and hampers the model's ability to capture critical features effectively. Additionally, because CNNs primarily capture local features, relying on these features alone may be insufficient to address global illumination deficiencies in low-light images. Transformer models, with their self-attention mechanism, provide a global perspective and model long-range dependencies more effectively, thereby enhancing detail and structure restoration in low-light images. However, applying the original Transformer architecture is computationally intensive and involves a complex training process, making it challenging to adopt in real-time low-light image enhancement tasks.

To overcome these limitations, we propose RetinexWT, a unified low-light image enhancement framework that tightly integrates physical Retinex modeling with modern deep architectures in both spatial and frequency domains. Building upon the traditional Retinex theory, we augment the reflectance and illumination components with perturbation terms to explicitly characterize complex low-light degradations. The framework comprises two main modules: an Illumination Estimator and a Corruption Restorer. The Illumination Estimator, enhanced by a Wavelet Transform Feature Decomposer, exploits low-frequency illumination priors and high-frequency structural details to generate accurate illumination maps while adaptively attenuating noise. The Corruption Restorer employs an Illumination-Guided Transformer with Wavelet (IGTW), in which self-attention is explicitly guided by illumination features and reinforced through a Gated Fusion Mechanism, enabling selective feature integration from downsampling and upsampling paths. This design not only preserves fine details and suppresses noise but also mitigates overexposure artifacts and reduces the computational burden commonly observed in vanilla Transformers. Collectively, RetinexWT forms a cohesive and interpretable enhancement pipeline that achieves robust brightness enhancement, effective degradation suppression, and superior visual fidelity across diverse low-light conditions.

Our main contributions can be summarized as follows:

•   We propose a Transformer-based hybrid attention network for low-light image enhancement, ensuring effective modeling of long-range dependencies during the enhancement process. Our method leverages frequency domain information obtained from wavelet transform in combination with the Retinex model to obtain more accurate illumination information.

•   We avoid traditional downsampling and instead introduce a Haar wavelet decomposer to preserve information. The input features are decomposed into high-frequency and low-frequency components through wavelet transform, allowing the structural and detail information of the image to be retained and separated during downsampling, thereby improving enhancement performance. Additionally, by providing frequency domain features, the model can utilize this rich information during reconstruction to perform more targeted noise suppression and edge refinement based on frequency domain characteristics.

•   We introduce a gating mechanism to selectively fuse the features from upsampling and downsampling. Compared to direct channel concatenation, this mechanism can adaptively determine which features are more important. By controlling the contribution of different channel features, it better balances noise suppression and detail preservation. During feature fusion, convolution and activation operations are applied to further refine and enhance information flow, reducing the risk of information loss during upsampling and downsampling.

•   Qualitative and quantitative experiments demonstrate that our RetinexWT outperforms all previous Retinex-based deep learning methods and achieves superior results over state-of-the-art (SOTA) methods across multiple datasets.

In conclusion, our innovative use of a Haar wavelet decomposer for downsampling ensures the retention of important structural and textural features across multiple frequency components, providing superior enhancement performance. Additionally, the introduction of a gating mechanism for feature fusion offers an adaptive method for balancing noise suppression and detail preservation, a crucial improvement over traditional concatenation methods. By incorporating the Transformer's self-attention mechanism, we also enhance the model's ability to capture global dependencies, further improving the quality of the enhanced images.

2  Related Work

2.1 Low-Light Image Enhancement

Traditional methods: Traditional methods generally employ mathematical models to enhance low-light images, such as Histogram Equalization (HE) [2–5] and Gamma Correction (GC) [6–8]. The core idea of these methods is to map the distribution of low-light input images by enhancing smaller values (usually representing darker regions) to achieve an enhancement effect. These methods are intuitive, easy to use, and computationally efficient. However, methods based on HE and GC often struggle to effectively handle color information, which may result in color distortion or artifacts in the image. Additionally, in certain cases (such as extremely dark or bright images), it is challenging to obtain reliable information from the environment, leading to information loss or poor enhancement results. Unlike the aforementioned methods, the Retinex theory [9] is a framework used to explain and simulate how the human visual system perceives object colors. This theory decomposes low-light images into reflectance and illumination components, thereby recovering an underlying normally lit image. It enables a better balance between brightness enhancement and noise suppression. Jobson [10,11] and others conducted studies based on the Retinex model, gradually recognizing through exploratory research that estimating the illumination layer is key to achieving brightness enhancement. However, as these methods [12–14] rely on manually designed priors, they often require meticulous parameter tuning. Inaccurate priors or regularization can lead to artifacts and color shifts in the enhanced images, significantly limiting their generalization capability, and the optimization process is usually time-consuming. Additionally, these studies often overlook the presence of noise, resulting in noise retention or amplification in the enhanced images.

Deep Learning Methods: In recent years, deep learning has achieved widespread success across various computer vision tasks, such as object detection, scene segmentation, and low-light image enhancement. Based on Retinex theory, many methods utilize convolutional neural networks (CNNs) [15–18] for low-light image enhancement. For example, RetinexNet, proposed by Wei et al. [19], combines Retinex theory with deep convolutional networks to estimate and adjust illumination maps, achieving image contrast enhancement, and uses BM3D [20] denoising as a post-processing step. Zhang et al. [21] developed a self-supervised CNN to address low-light enhancement tasks. RetinexDIP [22] leverages the implicit prior information inherent in neural network structures to transform the low-light image enhancement problem into a generation problem. Reference [23] combines Retinex decomposition with deep learning. However, many deep learning methods [24,25] involve cumbersome multi-stage training pipelines and perform poorly when addressing image contamination factors, often resulting in amplified noise and color distortion. Moreover, CNN-based methods have limitations in capturing long-range dependencies across different regions of the image. STAR [26] applied the Transformer architecture to the low-light enhancement domain, successfully addressing the challenge of capturing long-range dependencies. Later, reference [27] proposed the IAGC model, designing a novel Transformer block that fully models pixel dependencies through a hierarchical attention mechanism from local to global, allowing poorly illuminated regions to effectively utilize information from distant regions. However, Transformer-based models frequently encounter issues with overexposure in illuminated areas. In addition, the self-attention mechanism of Transformers introduces significant computational burden and complexity when handling long sequences, which remains a major challenge.

2.2 Wavelet Transforms in Image Processing

Wavelet Transforms have recently gained significant attention in image processing as a technique that provides multi-resolution representation while capturing both low-frequency and high-frequency components of signals or images, which helps in better understanding their structure and characteristics (Fig. 1). Given that wavelet transforms are reversible and capable of preserving all information, they have been utilized within CNN architectures to improve performance across various computer vision tasks. For instance, Bae et al. [28] demonstrated that learning CNN representations on wavelet subbands could benefit image restoration tasks. DWSR [29] uses low-resolution wavelet subbands as input to recover lost details in image super-resolution tasks. Wavelet-SRNet [30], a wavelet-based CNN, was introduced for multi-scale face super-resolution. Haar wavelet transforms are integrated with multi-resolution analysis within [31] for texture classification and image annotation. In low-light image enhancement tasks, the application of Wavelet Transforms enables targeted brightness enhancement and detail improvement by separating low-frequency luminance information from high-frequency detail information. For example, reference [32] decomposes an image into low-frequency and high-frequency components, enhancing the low-frequency part to improve brightness while utilizing low-frequency information to adjust high-frequency details, ensuring accuracy in details. Another approach [33] uses low-frequency restoration and high-frequency reconstruction subnetworks to enhance brightness and detail information separately, achieving a more natural low-light enhanced image. These examples show that incorporating wavelet transforms into low-light enhancement tasks holds great potential for future applications.


Figure 1: A schematic diagram of feature decomposition using Haar wavelet transform. Here, A(x) represents applying a low-pass filter to the original data to obtain low-frequency approximation coefficients, and D(x) represents applying a high-pass filter to the original data to obtain high-frequency detail coefficients. Each decomposition reduces the size of the features by half
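To make the averaging and differencing filters in Fig. 1 concrete, the short sketch below applies one Haar level to a toy 1-D signal with NumPy; the 1/2 normalization follows the convention used later in Eq. (9) and is otherwise an illustrative choice.

```python
import numpy as np

# Toy 1-D signal with an even number of samples.
x = np.array([4.0, 6.0, 10.0, 2.0, 7.0, 7.0, 1.0, 5.0])

# One Haar level: pairwise averages (low-pass, A) and pairwise
# differences (high-pass, D). Each output is half as long as the input.
pairs = x.reshape(-1, 2)
A = (pairs[:, 0] + pairs[:, 1]) / 2.0   # low-frequency approximation coefficients
D = (pairs[:, 0] - pairs[:, 1]) / 2.0   # high-frequency detail coefficients

print("A:", A)  # [5. 6. 7. 3.]
print("D:", D)  # [-1.  4.  0. -2.]

# The transform is invertible: interleaving (A + D, A - D) recovers x exactly.
recon = np.stack([A + D, A - D], axis=1).reshape(-1)
assert np.allclose(recon, x)
```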

3  Proposed Method

Fig. 2 illustrates the overall architecture of our proposed RetinexWT method. As shown in Fig. 2a, RetinexWT primarily consists of two components: the Illumination Estimator and the Corruption Restorer. The Illumination Estimator is inspired by the traditional Retinex model, with enhancements that introduce perturbation terms combined with frequency-domain information. To further refine the illumination feature estimation, the design integrates wavelet transform. The Corruption Restorer is based on the Illumination Guided Transformer with Wavelet (IGTW), as shown in Fig. 2b. The core unit of IGTW is the Enhanced Illumination-Guided Attention Model (EIGAM), which comprises the Illumination-Guided Attention (IGA), Nonlinear Activation-Free (NAF) Block, Layer Normalization (LN), and Feed-Forward Network (FFN). The details of IGTW are depicted in Fig. 3a.


Figure 2: Overview of our method. RetinexWT consists of an Illumination Estimator (a) and an Illumination Guided Transformer with Wavelet (IGTW) (b)


Figure 3: (a) The Enhanced Illumination-Guided Attention Model (EIGAM) incorporates illumination guidance through the IGA mechanism, followed by a NAF module to reduce computational complexity. LN and an FFN are applied with residual connections to enhance feature extraction and maintain the integrity of original details. (b) In the Illumination-Guided Attention (IGA) module, the illumination feature captured by the IE is used as the query Q, while the input vectors are treated as key-value pairs (K, V) to compute attention scores

3.1 Retinex-Based Framework

In the domain of low-light image enhancement, Retinex theory is commonly applied to simulate the human visual system's perception of brightness and color. Traditional Retinex algorithms decompose an input image $I \in \mathbb{R}^{H \times W \times 3}$ into a reflectance component $R \in \mathbb{R}^{H \times W \times 3}$ and an illumination component $L \in \mathbb{R}^{H \times W}$, represented by the formula:

$I = R \odot L$ (1)

where $\odot$ denotes element-wise multiplication. This approach effectively addresses illumination variation and color degradation within images. However, under low-light conditions, traditional Retinex methods lack mechanisms for handling noise and artifacts, potentially leading to significant degradation during enhancement. To address these limitations, we adopt a perturbation model, as proposed in [34]. Unlike the original Retinex formula, which assumes an undistorted image $I$, this model introduces perturbation terms to the reflectance $R$ and illumination $L$ components to better simulate degradation under low-light conditions, as follows:

$I = (R + \hat{R}) \odot (L + \hat{L}) = R \odot L + R \odot \hat{L} + \hat{R} \odot L + \hat{R} \odot \hat{L}$ (2)

Here, $\hat{R} \in \mathbb{R}^{H \times W \times 3}$ and $\hat{L} \in \mathbb{R}^{H \times W}$ represent the degradation factors, with $R$ ideally representing a well-lit image. Subsequent feature extraction is performed via convolution to obtain the enhanced illumination component $\bar{L}$. This component $\bar{L}$ is applied through element-wise multiplication to brighten the low-light image $I$, where $\bar{L} \odot L = 1$, as described by the formula:

$I \odot \bar{L} = (R + \hat{R}) \odot (L + \hat{L}) \odot \bar{L} = R \odot (L \odot \bar{L}) + R \odot (\hat{L} \odot \bar{L}) + \hat{R} \odot (L \odot \bar{L}) + \hat{R} \odot (\hat{L} \odot \bar{L}) = R + R \odot (\hat{L} \odot \bar{L}) + (\hat{R} \odot (L + \hat{L})) \odot \bar{L}$ (3)

which can be simplified as:

$I_{lu} = I \odot \bar{L} = R + R \odot (\hat{L} \odot \bar{L}) + (\hat{R} \odot (L + \hat{L})) \odot \bar{L} = R + C$ (4)

where $I_{lu} \in \mathbb{R}^{H \times W \times 3}$ represents the brightened image, and $C \in \mathbb{R}^{H \times W \times 3}$ is the cumulative degradation term. This term accounts for various sources of degradation, including noise amplified by $\hat{L}$ and artifacts introduced during the enhancement process. Thus, our RetinexWT model can be formulated as:

$(I_{lu}, F_{lu}) = \mathrm{IE}(I, L_p), \quad I_{en} = \mathrm{CR}(I_{lu}, F_{lu})$ (5)

In this formulation, IE represents the Illumination Estimator, and CR denotes the Corruption Restorer. The Illumination Estimator (IE) takes $I$ and $L_p \in \mathbb{R}^{H \times W}$ as inputs and outputs both the brightened image $I_{lu}$ and illumination feature $F_{lu}$. The parameter $L_p = \mathrm{mean}_c(I)$, where $\mathrm{mean}_c$ denotes the channel-wise mean computation, serves as a metric for assessing the overall brightness level of the image. Subsequently, the Corruption Restorer (CR) uses $I_{lu}$ and $F_{lu}$ to address noise and artifacts in the enhanced image, producing the final restored image $I_{en} \in \mathbb{R}^{H \times W \times 3}$. To enhance the model's expressiveness, we employ a convolutional neural network (CNN) to extract illumination features and process them in conjunction with illumination guidance. By integrating contextual image information, our approach provides robust enhancement for images captured under complex low-light conditions.
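The two-stage formulation in Eq. (5) can be expressed directly as a forward pass. The sketch below is illustrative only: `IlluminationEstimator` and `CorruptionRestorer` stand for the IE and CR modules described in the following subsections, and the wrapper class name and interface are our own, not the released code.

```python
import torch
import torch.nn as nn

class RetinexWTPipeline(nn.Module):
    """Illustrative wrapper for Eq. (5): (I_lu, F_lu) = IE(I, L_p); I_en = CR(I_lu, F_lu)."""

    def __init__(self, illumination_estimator: nn.Module, corruption_restorer: nn.Module):
        super().__init__()
        self.ie = illumination_estimator   # hypothetical IE module
        self.cr = corruption_restorer      # hypothetical CR (IGTW) module

    def forward(self, img: torch.Tensor) -> torch.Tensor:
        # Illumination prior L_p: mean over the colour channels, shape (B, 1, H, W).
        lp = img.mean(dim=1, keepdim=True)
        i_lu, f_lu = self.ie(img, lp)      # brightened image and illumination feature
        i_en = self.cr(i_lu, f_lu)         # degradation-suppressed final output
        return i_en
```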

3.2 Illumination Estimator

As shown in Fig. 2a, the Illumination Estimator (IE) combines the original low-light image I with the illumination prior Lp, which is obtained by calculating the mean pixel values across the channel dimension of I. This is followed by three convolutions to extract features. First, a conv 1×1 is applied to fuse I and Lp, thereby projecting the illumination prior onto the low-light image to enrich its illumination representation. Subsequently, a depth-wise separable conv 5×5 is employed to upsample the input, allowing for additional feature extraction and producing the initial illumination feature map Flu.

It is well-known that the low-frequency components in wavelet transforms predominantly capture the overall illumination and structural information of an image. The illumination prior Lp, which disregards color detail, primarily emphasizes the overall brightness and illumination information of the image. As a result, the low-frequency component of Lp more accurately represents the essence of illumination, focusing specifically on illumination-related information. To construct a more precise illumination feature that maintains robustness under low-light and complex lighting conditions, we apply a wavelet transform to Lp. The low-frequency component obtained from this transformation is then combined with the initial illumination feature to generate the final illumination feature Flu, with the feature dimension nfeat set to 40 to balance detail and computational efficiency.

Finally, another conv 1×1 layer is used for downsampling to restore a 3-channel illumination map L¯, which is then element-wise multiplied with the original low-light image I to obtain the enhanced image Ilu.
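A minimal PyTorch sketch of the estimator described above is given below, assuming the layer ordering stated in the text (1×1 fusion conv, depth-wise 5×5 conv, injection of the low-frequency Haar band of Lp, and a final 1×1 conv producing the 3-channel map). The exact way the low-frequency band is resized and fused, and the feature width of 40, are reconstructions from the description rather than the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class IlluminationEstimator(nn.Module):
    """Sketch of the IE; layer choices are illustrative, nfeat = 40 as stated in the text."""

    def __init__(self, nfeat: int = 40):
        super().__init__()
        self.fuse = nn.Conv2d(3 + 1, nfeat, kernel_size=1)             # fuse I and L_p
        self.dwconv = nn.Conv2d(nfeat, nfeat, kernel_size=5,
                                padding=2, groups=nfeat)               # depth-wise 5x5 conv
        self.lp_proj = nn.Conv2d(1, nfeat, kernel_size=1)              # project LL band of L_p
        self.to_map = nn.Conv2d(nfeat, 3, kernel_size=1)               # 3-channel illumination map

    def forward(self, img, lp):
        # Initial illumination feature from the fused input.
        feat = self.dwconv(self.fuse(torch.cat([img, lp], dim=1)))
        # Haar low-frequency band of L_p via 2x2 averaging; resized back (fusion scheme assumed).
        ll = F.avg_pool2d(lp, kernel_size=2)
        ll = F.interpolate(ll, size=lp.shape[-2:], mode="bilinear", align_corners=False)
        f_lu = feat + self.lp_proj(ll)                                 # final illumination feature
        l_bar = self.to_map(f_lu)                                      # illumination map
        i_lu = img * l_bar                                             # element-wise brightening
        return i_lu, f_lu
```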

3.3 Illumination Guided Transformer with Wavelet

In the RetinexWT framework, the Corruption Restorer (IGTW) module consists of an encoder and a decoder, both based on an Illumination-Guided Transformer with Wavelet architecture and incorporating a Gated Fusion Mechanism. The encoder handles the downsampling process, while the decoder manages upsampling, as illustrated in Fig. 2b. Both the downsampling and upsampling processes are divided into two stages. Initially, the Illumination Estimator (IE) generates an illumination map Ilu, which is then downsampled through a conv 3×3 (stride = 2). This operation is intended to match the dimensions of the illumination feature Flu for subsequent processing. Following this, two levels of downsampling are performed to extract progressively deeper features, with each downsampling level employing an Enhanced Illumination-Guided Attention Model along with a Wavelet Transform Feature Decomposer Downsampling (WTFDown) module. The WTFDown module effectively downscales the input while preserving high-frequency information, enhancing IGTW’s capability for fine-grained restoration and suppressing noise during the enhancement process. With each application of WTFDown, the width and height of the image are halved, while the feature dimension is doubled. Given an initial feature dimension of C, the first downsampling stage increases this dimension to 2C, and the second stage expands it to 4C. This downsampling approach enhances feature extraction in preparation for the upsampling phase. The upsampling process mirrors the structure of downsampling, with each level comprising a deconv 2×2 (stride = 2), a conv 1×1, and an EIGAM module. Each deconvolution layer operation increases the spatial dimensions of the feature map while reducing its depth. Outputs from each deconvolution stage are then fused through the Gated Fusion Module (GFM), which aims to preserve contextual information during upsampling, restore connections across different feature levels, and incrementally suppress noise, thereby enhancing detail and information integration. Finally, the feature map undergoes a conv 3×3 (stride = 2) to further reduce its dimensionality, transforming it into a three-channel RGB format. The restored image is then combined with the illumination map Ilu to produce the final enhanced image Ien.
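As a quick sanity check of the encoder-decoder geometry described above (spatial size halved and channels doubled at each WTFDown stage, then mirrored by the deconvolutions), the following shape walk-through uses dummy tensors; the two-stage depth and base width follow the text, while the concrete numbers are only illustrative.

```python
import torch

B, C, H, W = 1, 40, 128, 128
x = torch.zeros(B, C, H, W)          # feature entering the first encoder stage

# Encoder: two WTFDown stages, each halving H and W and doubling the channels.
enc_shapes = [x.shape]
for _ in range(2):
    b, c, h, w = x.shape
    x = torch.zeros(b, 2 * c, h // 2, w // 2)
    enc_shapes.append(x.shape)

print([tuple(s) for s in enc_shapes])
# [(1, 40, 128, 128), (1, 80, 64, 64), (1, 160, 32, 32)]

# Decoder: two stride-2 deconvolution stages restore the spatial size and halve the
# channels, so each level can be fused with the matching encoder feature by the GFM.
for level in (1, 0):
    b, c, h, w = x.shape
    x = torch.zeros(b, c // 2, h * 2, w * 2)
    assert x.shape == enc_shapes[level]
```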

EIGAM. The structure of the Enhanced Illumination-Guided Attention Model (EIGAM) is depicted in Fig. 3a. Within EIGAM, the input feature $F_{in}$ first undergoes processing by the Illumination-Guided Attention (IGA) module. The IGA utilizes the illumination feature $F_{lu}$, generated by the Illumination Estimator (IE), to guide the attention computation effectively. Subsequently, a Nonlinear Activation Free (NAF) module is applied to reduce computational complexity, which simplifies the model architecture and decreases the resource requirements. To mitigate gradient vanishing and retain original detail information, residual connections are employed following both the IGA and NAF modules. Finally, the output feature $F_{out}$ is obtained through Layer Normalization (LN) and a Feed-Forward Network (FFN). The IGA module is specifically designed to process illumination features and guide the multi-head self-attention calculation, as illustrated in Fig. 3b. To address the high computational cost associated with global multi-head self-attention in Transformers, the IGA module reshapes the input features into tokens $X \in \mathbb{R}^{H \times W \times C}$ and then divides them into $k$ individual heads $X_i \in \mathbb{R}^{H \times W \times d_k}$, where $d_k$ denotes the dimensionality of each head, $d_k = C/k$, and $C$ represents the dimensionality of the input features. Therefore, we treat $F_{lu}$ as $Q \in \mathbb{R}^{H \times W \times d_k}$ and the input $X$ as $K, V \in \mathbb{R}^{H \times W \times d_k}$, fusing the two input features and enabling $F_{lu}$ to guide the self-attention calculation of $X$.

$Q_i = X_i W_{Q,i}^{T}, \quad K_i = X_i W_{K,i}^{T}, \quad V_i = X_i W_{V,i}^{T}$ (6)

Here, $W_{Q,i}$, $W_{K,i}$, and $W_{V,i} \in \mathbb{R}^{d_k \times d_k}$ represent the learnable parameters of the fully connected (fc) layers, with $T$ indicating matrix transposition. The illumination feature $F_{lu}$ is then utilized to encode illumination information, providing a global illumination context for the model. This feature is reshaped into tokens, denoted by $Y \in \mathbb{R}^{H \times W \times C}$, and subsequently divided into $k$ individual heads $Y_i \in \mathbb{R}^{H \times W \times d_k}$, which serve as guidance for computing self-attention within each head.

$\mathrm{Attention}(Q_i, K_i, Y_i, V_i) = (Y_i \odot V_i)\,\mathrm{softmax}\!\left(\dfrac{K_i^{T} Q_i}{\alpha_i}\right)$ (7)

where $\alpha_i \in \mathbb{R}^{1}$ is a learnable parameter that serves as a scaling factor to adjust the attention scores, thereby controlling the sharpness of the attention weights. After computing attention across the $k$ heads, these heads are reshaped back into the standard image format $(B, C, H, W)$ and subsequently aggregated through a convolutional layer, producing an output with dimensions that align with the original input.
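The following sketch condenses Eqs. (6) and (7) into a single PyTorch module, treating each spatial position as a token, projecting Q, K, and V from the input, and letting the illumination tokens Y modulate V before the transposed (channel-wise) attention; the reshaping details and per-head learnable scale are assumptions consistent with the text rather than the authors' code.

```python
import torch
import torch.nn as nn

class IlluminationGuidedAttention(nn.Module):
    """Sketch of IGA (Eqs. 6-7): channel-wise attention guided by the illumination feature."""

    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.heads = heads
        self.to_q = nn.Linear(dim, dim, bias=False)
        self.to_k = nn.Linear(dim, dim, bias=False)
        self.to_v = nn.Linear(dim, dim, bias=False)
        self.alpha = nn.Parameter(torch.ones(heads, 1, 1))   # learnable scale per head
        self.proj = nn.Linear(dim, dim)

    def _split(self, t):
        b, n, c = t.shape
        return t.reshape(b, n, self.heads, c // self.heads).transpose(1, 2)  # (B, k, N, d_k)

    def forward(self, x, y):
        # x: input tokens (B, H*W, C); y: illumination feature tokens of the same shape.
        q, k, v = self._split(self.to_q(x)), self._split(self.to_k(x)), self._split(self.to_v(x))
        y = self._split(y)
        attn = torch.softmax(k.transpose(-2, -1) @ q / self.alpha, dim=-1)   # (B, k, d_k, d_k)
        out = (y * v) @ attn                                                 # illumination-modulated values
        out = out.transpose(1, 2).reshape(x.shape)                           # merge heads
        return self.proj(out)
```

Here `x` and `y` would be obtained by flattening the (B, C, H, W) feature map and the illumination feature to (B, H·W, C) before the call and reshaping the result back afterwards.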

GFM. The Gated Fusion Module (GFM) in the Illumination Guided Transformer with Wavelet (IGTW) is designed to fuse features from the downsampling and upsampling paths, as shown in Fig. 4a. GFM first concatenates the upsampling feature $F_u$ and downsampling feature $F_d$ along the channel dimension to integrate the information from both sources. This concatenated feature is then processed by a point-wise conv, which mixes channel-wise information to enhance feature integration. Subsequently, a depth-wise conv is applied to capture local spatial information, preserving essential spatial structures, particularly edges and fine details. After the convolutional operations, the combined feature is split along the channel dimension into two parts, $F_{gate}$ and $F_{content}$, where the channel splitting operation is defined as:

$[F_{gate}, F_{content}] = \mathrm{Split}(\mathrm{DConv}(\mathrm{PConv}(\mathrm{Concat}(F_u, F_d))))$ (8)


Figure 4: (a) Gated Fusion Module (GFM), (b) Nonlinear Activation Free (NAF)

The $F_{gate}$ component is activated by GELU to produce a non-linear gating signal that dynamically modulates information flow. This gating signal is then multiplied element-wise (Hadamard product) with $F_{content}$, enabling adaptive feature filtering that emphasizes critical details while suppressing noise. The modulated feature is combined with the original input via a residual connection to form the final output. Compared to direct channel concatenation, GFM enhances information flow and strengthens feature extraction through convolutional and activation operations, providing more sophisticated feature integration during the fusion of the downsampling and upsampling paths. This approach not only mitigates information loss during downsampling and upsampling but also maintains feature consistency, enhancing the representational power of the features. As a result, the model can more effectively learn and represent the brightness and color distribution in low-light images, leading to improved enhancement quality.
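A minimal sketch of Eq. (8) and the gating step described above is shown below; which input the residual connection reuses is not fully specified in the text, so the upsampled feature is used here as an assumption.

```python
import torch
import torch.nn as nn

class GatedFusionModule(nn.Module):
    """Sketch of the GFM (Eq. 8) fusing upsampling and downsampling features."""

    def __init__(self, dim: int):
        super().__init__()
        self.pconv = nn.Conv2d(2 * dim, 2 * dim, kernel_size=1)            # point-wise channel mixing
        self.dconv = nn.Conv2d(2 * dim, 2 * dim, kernel_size=3, padding=1,
                               groups=2 * dim)                             # depth-wise spatial mixing
        self.act = nn.GELU()

    def forward(self, f_up, f_down):
        fused = self.dconv(self.pconv(torch.cat([f_up, f_down], dim=1)))
        f_gate, f_content = fused.chunk(2, dim=1)        # split along the channel dimension
        out = self.act(f_gate) * f_content               # GELU gate modulates the content branch
        return out + f_up                                # residual connection (assumed to use f_up)
```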

NAF. We employ a module called the Nonlinear Activation Free (NAF) block, which eliminates traditional nonlinear activation functions to reduce computational complexity and enhance performance. As illustrated in Fig. 4b, the NAF module begins by applying a conv 1×1 to adjust the input feature's channel count, enabling a linear transformation along the channel dimension. This is followed by a depth-wise conv 3×3 that performs independent convolutions on each channel, allowing the module to better capture spatial features such as edges and fine details. After the depth-wise conv, the feature map is processed through Simplified Channel Attention (SCA) and SimpleGate, followed by another conv 1×1 that restores the original channel count. This processed output is then combined with the input feature via a residual connection to produce the first output. The first output subsequently undergoes additional convolution and gating operations, generating a second output, which is then combined with the first output through another residual connection, resulting in the final output. The core characteristic of the NAF module is the absence of traditional nonlinear activation functions (e.g., ReLU, Sigmoid), instead replacing conventional channel attention and GELU with Simplified Channel Attention (SCA) and SimpleGate, respectively. By removing nonlinear activations and adopting simple element-wise operations, the NAF module achieves both reduced computational complexity and preservation of feature information, making it an efficient network design.
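The sketch below illustrates the two ingredients named above, SimpleGate and Simplified Channel Attention, inside an abbreviated NAF-style branch; the exact ordering and the second branch are omitted, so this should be read as an illustration of the idea rather than the authors' block.

```python
import torch
import torch.nn as nn

class SimpleGate(nn.Module):
    """Replace a nonlinearity by splitting the channels in half and multiplying the halves."""
    def forward(self, x):
        a, b = x.chunk(2, dim=1)
        return a * b

class SimplifiedChannelAttention(nn.Module):
    """Global average pooling + 1x1 conv produce per-channel weights (no activation)."""
    def __init__(self, dim: int):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Conv2d(dim, dim, kernel_size=1)

    def forward(self, x):
        return x * self.fc(self.pool(x))

class NAFBlockSketch(nn.Module):
    """Abbreviated first branch of the NAF module described in the text."""
    def __init__(self, dim: int):
        super().__init__()
        self.expand = nn.Conv2d(dim, 2 * dim, kernel_size=1)
        self.dwconv = nn.Conv2d(2 * dim, 2 * dim, kernel_size=3, padding=1, groups=2 * dim)
        self.gate = SimpleGate()                      # halves the channels back to dim
        self.sca = SimplifiedChannelAttention(dim)
        self.reduce = nn.Conv2d(dim, dim, kernel_size=1)

    def forward(self, x):
        y = self.gate(self.dwconv(self.expand(x)))
        y = self.reduce(self.sca(y))
        return x + y                                  # residual connection
```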

3.4 Wavelet Transform Feature Decomposer Downsampling

Wavelet Transform is a signal processing technique that decomposes a signal into sub-signals of different frequency bands, enabling simultaneous analysis in the time and frequency domains. In the context of image processing, Wavelet Transform is employed to decompose an image into multiple sub-bands, consisting of low-frequency and high-frequency components. The low-frequency components capture the overall brightness and structural information of the image, making them suitable for global brightness enhancement. Conversely, the high-frequency components encode details such as edges and textures, while also containing noise. In low-light image enhancement tasks, separately processing the low- and high-frequency components can improve image brightness while preserving fine details, such as textures and edges. However, traditional approaches often directly apply Wavelet Transform in the downsampling layers of neural networks, substituting spatial domain features with frequency domain features. While effective in certain aspects, this substitution can lead to the loss of critical spatial information, resulting in blurry or distorted enhanced images. To address this limitation, we propose a method that integrates both frequency and spatial domain information: the Wavelet Transform Feature Decomposer Downsampling (WTFDown). This approach incorporates frequency domain feature mapping into the downsampling process within low-light enhancement networks. By combining the strengths of frequency and spatial domain representations, WTFDown aims to preserve both structural integrity and fine details in the enhanced images. The architecture of WTFDown is illustrated in Fig. 5.


Figure 5: The WTFDown module introduces frequency domain feature mapping into the network for downsampling. This structure uses the Haar wavelet transform to decompose the original features into high-frequency and low-frequency components, processes them separately, and then adds them together to obtain a composite of global and local features. Finally, it combines these with spatial features, allowing the model to consider features in a new representation domain

In WTFDown, the input features are first passed through a conv 1×1 to enhance nonlinearity. Subsequently, a Haar wavelet transform is applied to decompose the spatial features into four frequency-domain components: one low-frequency component (A) and three high-frequency components, which are further categorized as horizontal high-frequency (H), vertical high-frequency (V), and diagonal high-frequency (D) components. The low-frequency features are processed through a convolutional layer to learn feature representations and extract global structural information. For the high-frequency components, the horizontal, vertical, and diagonal features are concatenated and subjected to a point-wise conv (conv 1×1) followed by Batch Normalization. This operation reduces dimensionality and enhances nonlinearity, effectively strengthening the representation of edges and local details within the high-frequency features, which is critical for preserving fine details in low-light images. Finally, the processed low-frequency and high-frequency features are combined through element-wise addition to produce a composite feature representation. This fusion integrates global and local information, preserving the overall structure while enhancing edges and fine details. By balancing the contributions of different frequency components, this approach improves the representation quality and enhances the visual detail in the enhanced image. The WTFDown method leverages the Haar wavelet transform to efficiently decompose image signals, introducing frequency information into the network for a more comprehensive feature representation. As shown in Fig. 1, the Haar wavelet transform involves a series of filtering steps. The low-pass filter, denoted by A(x), is applied to the original data to capture the low-frequency components, while the high-pass filter, denoted by D(x), extracts the high-frequency components. Specifically, cA represents the low-frequency component, while cH, cV, and cD correspond to the horizontal, vertical, and diagonal high-frequency components, respectively. With each successive filtering operation, the size of the features is halved. This process ensures a more detailed and multi-scale representation of the image's characteristics, enabling the network to effectively utilize both low and high-frequency information for enhanced performance.

Specifically, the input feature $X \in \mathbb{R}^{H \times W \times C}$ first undergoes a conv 1×1 to enhance non-linearity. It is then passed through the Haar wavelet transform, which converts the spatial features into four frequency domain components: one low-frequency component $A$, and three high-frequency components. The high-frequency components are further divided into horizontal high-frequency component $H$, vertical high-frequency component $V$, and diagonal high-frequency component $D$. For each channel $X_c \in \mathbb{R}^{H \times W}$, the wavelet transform is computed as follows:

$A_c(i,j) = \dfrac{X_c(i, 2j-1) + X_c(i, 2j)}{2}, \quad D_c(i,j) = \dfrac{X_c(i, 2j-1) - X_c(i, 2j)}{2}$ (9)

where $A_c(i,j)$ and $D_c(i,j)$ represent the low-frequency approximation coefficients and high-frequency detail coefficients of channel $c$, respectively. The variable $i$ denotes the row index, ranging from 1 to $H$, while $j$ denotes the column index, ranging from 1 to $W/2$. Subsequently, the Haar wavelet transform is applied to the approximation coefficients and detail coefficients along each column, expressed as:

$\begin{cases} A_c = AA_c(i,j) = \dfrac{A_c(i, 2j-1) + A_c(i, 2j)}{2} \\ H_c = AD_c(i,j) = \dfrac{D_c(i, 2j-1) - D_c(i, 2j)}{2} \\ V_c = DA_c(i,j) = A_c(i, 2j-1) - A_c(i, 2j) \\ D_c = DD_c(i,j) = D_c(i, 2j-1) - D_c(i, 2j) \end{cases}$ (10)

Here, $A_c$ represents the low-frequency approximation coefficients of a single channel, while $H_c$, $V_c$, and $D_c$ correspond to the high-frequency detail coefficients in the horizontal, vertical, and diagonal directions, respectively. Subsequently, the three high-frequency components are concatenated, and a point-wise conv is applied to reduce dimensionality. This operation removes unimportant information (such as noise) while retaining critical features (such as edges and detail information). As a result, the final high-frequency feature $F_h$ and low-frequency feature $F_l$ are obtained as follows:

$F_l, F_h = f_w(X) = \left(\phi\left(\mathrm{conv}_{1\times 1}(A)\right),\ \phi\left(\mathrm{conv}_{1\times 1}(\mathrm{Cat}(H, V, D))\right)\right)$ (11)

where $\phi(\cdot)$ represents batch normalization. Finally, the obtained high-frequency and low-frequency features are added together to provide a downsampled feature map containing frequency domain information.
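Putting Eqs. (9)-(11) together, the following sketch implements one WTFDown step: a 1×1 conv, a single-level Haar decomposition computed directly from 2×2 neighbourhoods (with a uniform 1/4 normalization for simplicity), separate conv + BatchNorm branches for the low-frequency band and the concatenated high-frequency bands, and an element-wise sum. The channel doubling from C to 2C matches the description in Section 3.3; the per-branch kernel sizes are assumptions.

```python
import torch
import torch.nn as nn

def haar_decompose(x):
    """Single-level 2-D Haar transform via 2x2 averaging/differencing (1/4 normalization)."""
    x00, x01 = x[..., 0::2, 0::2], x[..., 0::2, 1::2]
    x10, x11 = x[..., 1::2, 0::2], x[..., 1::2, 1::2]
    a = (x00 + x01 + x10 + x11) / 4          # low-frequency approximation (A)
    h = (x00 - x01 + x10 - x11) / 4          # high-frequency detail band (H)
    v = (x00 + x01 - x10 - x11) / 4          # high-frequency detail band (V)
    d = (x00 - x01 - x10 + x11) / 4          # high-frequency detail band (D)
    return a, h, v, d

class WTFDown(nn.Module):
    """Sketch of Wavelet Transform Feature Decomposer Downsampling (C -> 2C, H,W -> H/2,W/2)."""

    def __init__(self, dim: int):
        super().__init__()
        self.pre = nn.Conv2d(dim, dim, kernel_size=1)
        self.low_branch = nn.Sequential(nn.Conv2d(dim, 2 * dim, kernel_size=3, padding=1),
                                        nn.BatchNorm2d(2 * dim))
        self.high_branch = nn.Sequential(nn.Conv2d(3 * dim, 2 * dim, kernel_size=1),
                                         nn.BatchNorm2d(2 * dim))

    def forward(self, x):
        x = self.pre(x)
        a, h, v, d = haar_decompose(x)
        f_low = self.low_branch(a)                             # global structure / illumination
        f_high = self.high_branch(torch.cat([h, v, d], dim=1)) # edges and fine details
        return f_low + f_high                                  # composite frequency-aware feature
```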

4  Experiment

This section begins by outlining the implementation details, datasets, and evaluation metrics used in the study. Subsequently, it presents both quantitative and qualitative results, comparing the proposed approach with state-of-the-art methods for low-light image enhancement. Finally, an ablation study is conducted to evaluate the contribution and effectiveness of each component within the proposed model.

4.1 Experimental Settings

Implementation Detail. Our RetinexWT is implemented in PyTorch and trained on an NVIDIA RTX A6000 GPU (CUDA 11.8, Python 3.9, PyTorch 2.0) using the Adam optimizer [35] with $\beta_1 = 0.9$, $\beta_2 = 0.999$ for $2.5 \times 10^{5}$ iterations. The learning rate starts at $2 \times 10^{-4}$ and decays to $1 \times 10^{-6}$ via a cosine annealing schedule [36]. Training uses 128×128 cropped paired patches, batch size 8, with random rotation and flipping for augmentation, and MAE loss for optimization. The wavelet kernel is fixed as the Haar basis for its computational efficiency and orthogonality, the Transformer operates on single-channel tokens with a global attention window to capture full-image dependencies, and feature dimensions are set considering a trade-off between performance and Graphics Processing Unit (GPU) memory constraints. The choice of hyperparameters was guided by both prior work and empirical validation: the initial learning rate ensures stable convergence without oscillation, the batch size balances gradient diversity with GPU memory feasibility, the iteration number corresponds to the point where validation performance plateaued, and the patch size allows global illumination and local details to be captured efficiently. These settings collectively provide a balance between convergence stability, computational efficiency, and final enhancement quality.
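A skeleton of the training loop under these settings is sketched below; `model` and `train_loader` are placeholders for the RetinexWT network and the paired low-light/normal-light data pipeline, and the loop structure itself is generic rather than the authors' script.

```python
import torch
import torch.nn as nn

def train(model, train_loader, num_iters=250_000, device="cuda"):
    model = model.to(device)
    optimizer = torch.optim.Adam(model.parameters(), lr=2e-4, betas=(0.9, 0.999))
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=num_iters, eta_min=1e-6)
    criterion = nn.L1Loss()                       # MAE loss between output and reference

    data_iter = iter(train_loader)
    for step in range(num_iters):
        try:
            low, ref = next(data_iter)            # 128x128 paired patches, batch size 8
        except StopIteration:
            data_iter = iter(train_loader)
            low, ref = next(data_iter)
        low, ref = low.to(device), ref.to(device)

        optimizer.zero_grad()
        loss = criterion(model(low), ref)
        loss.backward()
        optimizer.step()
        scheduler.step()                          # cosine decay from 2e-4 down to 1e-6
```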

Datasets and Metrics. We evaluated our method on the LOL dataset, which is divided into two versions: LOL-v1 [19] and LOL-v2 [37]. In LOL-v1, the dataset contains 485 training image pairs and 15 test image pairs, with each pair comprising a low-light input image and a corresponding high-quality reference image. The LOL-v2 dataset is further categorized into two subsets: LOL-v2-real and LOL-v2-synthetic. The training-to-test data ratios in LOL-v2-real and LOL-v2-synthetic are 689:100 and 900:100, respectively. The distribution of training and test pairs within these subsets mirrors the structure of LOL-v1. In addition to the paired LOL datasets, we utilized several unpaired datasets to evaluate the model's generalization capabilities. These datasets include LIME [38] (10 images), DICM [39] (64 images), MEF [40] (17 images), VV [41] (24 images), and ExDark [42] (7363 images). These datasets provide diverse visual conditions, facilitating a comprehensive assessment of the model. For performance evaluation, we employed two full-reference metrics, PSNR and SSIM [43], to quantitatively assess the quality of the enhanced images compared to the reference images in the paired datasets.
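For reference, the two paired metrics can be computed with scikit-image as follows; this is the standard formulation and not necessarily the exact evaluation script used in the paper.

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate_pair(enhanced: np.ndarray, reference: np.ndarray):
    """enhanced, reference: HxWx3 uint8 images; returns (PSNR in dB, SSIM)."""
    psnr = peak_signal_noise_ratio(reference, enhanced, data_range=255)
    ssim = structural_similarity(reference, enhanced, channel_axis=-1, data_range=255)
    return psnr, ssim
```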

4.2 Comparison with the State-of-the-Art

We compared the proposed method with various deep learning-based SOTA methods listed in Table 1, including SID [44], 3DLUT [45], DeepUPE [46], SCI [47], RetinexNet [19], and others. The datasets used for comparison include LOL-v1 [19] and both the real and synthetic subsets of LOL-v2 [37]. For a fair comparison, we utilized the official pre-trained models of each method and their publicly available code to obtain the quantitative results.


Quantitative analysis. The evaluation metrics used for comparison are Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity Index Measure (SSIM). PSNR reflects the overall enhancement quality, with higher values indicating better performance, while SSIM measures the preservation of high-frequency details and structural information, with higher values signifying superior retention of image content. Our proposed RetinexWT method demonstrates a significant performance advantage over the aforementioned SOTA methods on the LOL dataset.

Specifically, when compared to other Retinex-based deep learning SOTA methods, including SID [44], DeepUPE [46], SCI [47], LIME [38], RetinexNet [19], RUAS [50], FIDE [51], KinD [21], and Retinexformer [34], our method achieves notable improvements in both PSNR and SSIM across the LOL-v1 and LOL-v2 datasets. In terms of PSNR, RetinexWT achieves enhancements of 0.27 dB, 0.07 dB, and 0.35 dB on the LOL-v1, LOL-v2-real, and LOL-v2-synthetic datasets, respectively. Similarly, SSIM improvements of 0.007, 0.02, and 0.001 are observed on the same datasets, underscoring the superior capability of our method to balance enhancement quality and detail preservation. These results, summarized in Table 1, further highlight the efficacy of our approach. Furthermore, RetinexWT attains these improvements with a computational cost of only 16.83 G FLOPs and 2.11 M parameters, which is among the lowest of all compared methods. Notably, when compared to models of similar complexity, such as Retinexformer, our approach achieves higher PSNR and SSIM scores across all datasets, underscoring its favorable performance-efficiency trade-off.

Qualitative analysis. To provide a more comprehensive and intuitive comparison, we conducted a visual evaluation of our RetinexWT method against other state-of-the-art (SOTA) approaches. Fig. 6 and Fig. 7 are drawn from the LOL dataset (LOL-v1 [19] and LOL-v2 [37]), where the input consists of severely degraded low-light images. The results reveal several limitations of existing methods: noise amplification (e.g., RetinexNet in Fig. 7), underexposure or overexposure (e.g., LEDNet and RUAS in Fig. 6), color distortion (e.g., Restormer in Fig. 7), and the introduction of black spots and artifacts (e.g., EnGAN and KinD in Fig. 7).


Figure 6: The qualitative experimental results of our method and SOTA approaches on the LOL-v1 [19] and LOL-v2 [37] datasets demonstrate clear advantages. Upon closer examination with magnification, our method achieves superior visual effects


Figure 7: The qualitative experimental results of our method and SOTA approaches on the LOL-v2 [37] dataset demonstrate clear advantages. Upon closer examination with magnification, our method achieves superior visual effects

In contrast, our method demonstrates notable improvements in both global and local enhancement. It effectively manages exposure levels, significantly improves visibility and contrast, and minimizes noise, thereby delivering superior visual quality. Additionally, we present unpaired benchmark results in Fig. 8, where our method exhibits precise exposure control and vibrant color restoration. In particular, Fig. 8 deliberately includes challenging cases such as extremely low-light, high-noise, and overexposure-prone scenes, demonstrating not only the strong generalization capability of our approach across diverse low-light conditions, but also its robustness against severe degradations.


Figure 8: Visual results on LIME [38], DICM [39], MEF [40], VV [41], and ExDark [42]. We selected one "hard-sample" image from each dataset and compared it against the other methods; our method achieves superior visual effects

4.3 Low-Light Object Detection

In this section, we examine the effect of various preprocessing methods on the efficiency of object detection under low-light conditions. Specifically, we perform experiments on the ExDark [42] dataset, which comprises 7363 real-world nighttime images spanning 12 object categories. To evaluate the impact of preprocessing, we first apply our proposed RetinexWT method along with several comparative approaches, and subsequently employ YOLO-v3 [55] as the object detection model to assess the detection performance.
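The enhance-then-detect protocol is straightforward to express in code. In the sketch below, `enhancer` stands for a trained RetinexWT model and `run_yolov3` is a purely hypothetical placeholder for whichever YOLO-v3 implementation is used; only the preprocessing step is shown.

```python
import torch
from torchvision.io import read_image
from torchvision.utils import save_image
from pathlib import Path

@torch.no_grad()
def enhance_folder(enhancer, in_dir: str, out_dir: str, device: str = "cuda"):
    """Enhance every image in in_dir and write the results to out_dir for the detector."""
    enhancer = enhancer.to(device).eval()
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for path in sorted(Path(in_dir).glob("*.jpg")):
        img = read_image(str(path)).float().div(255.0).unsqueeze(0).to(device)  # (1, 3, H, W)
        enhanced = enhancer(img).clamp(0.0, 1.0)
        save_image(enhanced, out / path.name)

# Detection is then run on the enhanced copies, e.g.:
# detections = run_yolov3(image_dir="exdark_enhanced/")   # run_yolov3 is hypothetical
```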

Quantitative analysis. In Table 2, we present the average precision (AP) scores achieved by different methods used as preprocessing steps for object detection. RetinexWT stands out by not only achieving the highest overall average AP score but also securing the highest AP in five specific categories: Bicycle, Cat, Dog, People, and Table. Furthermore, it demonstrates strong performance by achieving the second-highest AP in the categories of Boat, Bus, and Motor. These results highlight the effectiveness and robustness of RetinexWT in enhancing object detection performance across diverse categories.


Qualitative analysis. We performed a visual comparison of object detection results on the original low-light images and those enhanced by RetinexWT, as illustrated in Fig. 9. In the detection results on the original low-light images, some categories were missed, and the overall detection accuracy was significantly limited. In contrast, the enhanced images processed by RetinexWT demonstrated comprehensive category detection with noticeable improvements in detection accuracy.


Figure 9: Visual comparison of the impact of our method on low-light object detection

4.4 Ablation Study

In this section, we conduct an ablation study by progressively integrating the proposed components into the framework and analyzing their coupling mechanisms. All experiments are performed on the LOL-v1 and LOL-v2-syn datasets. As shown in Table 3, when the IGA, WTFDown, GFM, and NAF components are excluded, the model achieves PSNR values of 22.748 and 24.572 dB, and SSIM values of 0.823 and 0.887, respectively, indicating the limitations of the baseline framework.


The improvements brought by each component are closely tied to their roles and interactions in the processing pipeline. Specifically, the IGA exploits illumination priors from the Illumination Estimator to guide global context modeling, increasing the PSNR by 1.032 dB on both datasets. The WTFDown enhances this process by providing frequency-domain decomposition, enabling the network to jointly utilize low-frequency illumination and high-frequency detail information, contributing an additional 0.196 and 0.238 dB. When combined, IGA and WTFDown exhibit a synergistic effect, raising the PSNR gains to 1.222 and 1.232 dB, as the attention mechanism can better leverage the decomposed multi-scale features.

The GFM further strengthens this synergy by selectively integrating upsampled and downsampled features, thereby preserving edge and texture details while suppressing noise, yielding improvements of 0.100 dB and 0.162 dB. The NAF block reduces computational complexity while maintaining feature integrity, providing an additional 0.080 and 0.110 dB gain. When IGA, WTFDown, and GFM are combined, the PSNR is further improved by 0.018 and 0.067 dB compared to the preceding configuration.

Ultimately, the complete RetinexWT framework, integrating all four components, achieves the highest PSNR and SSIM values among all configurations, with improvements of 1.328 and 1.338 dB in PSNR, and 0.015 and 0.045 in SSIM, compared to the baseline. This confirms that the proposed components are not independent “stacked” modules, but rather interdependent mechanisms in which illumination guidance, frequency-domain decomposition, gated spatial fusion, and efficient feature refinement operate in a cohesive manner to enhance low-light image restoration performance.

5  Conclusions

This paper presented a novel deep learning model for low-light image enhancement, which integrated Retinex theory with wavelet transform. The model introduced a perturbation term into the traditional Retinex framework and leveraged wavelet transform to estimate illumination information for initial enhancement. To address detail loss, a Corruption Restorer module was developed, which combined wavelet transform with an illumination-guided transformer to effectively recover lost details. Extensive experiments validated the superiority of the proposed approach in handling low-light image enhancement tasks. Both quantitative evaluations and qualitative visual comparisons demonstrated that the method consistently outperformed state-of-the-art techniques, achieving remarkable improvements in both metrics and visual fidelity.

Our future research will focus on three key aspects: computational efficiency, robustness in extreme scenarios, and cross-domain generalization. First, we will reduce inference latency through model compression, quantization, and operator-level optimization to achieve real-time enhancement, particularly in application domains such as autonomous driving and security monitoring where low-latency and high reliability are critical. Second, for extremely dark and high-noise conditions, we will introduce learnable noise modeling and degradation-consistent regularization to further improve detail recovery and noise suppression capabilities. Additionally, leveraging large-scale unlabeled data, we will explore self-supervised paradigms based on contrastive learning or masked reconstruction to reduce reliance on paired training data and enhance cross-scene generalization. Finally, we will jointly optimize the enhancement network with downstream tasks such as object detection and semantic segmentation, enabling mutual reinforcement between low-light enhancement and high-level perception objectives. This approach aims to achieve simultaneous improvements in perceptual quality, task accuracy, and computational feasibility under hardware-constrained scenarios.

Acknowledgement: The authors thank their teachers and friends for their support during the preparation of this work.

Funding Statement: This work is supported in part by the National Natural Science Foundation of China [Grant number 62471075] and the Major Science and Technology Project Grant of the Chongqing Municipal Education Commission [Grant number KJZD-M202301901].

Author Contributions: The authors confirm contribution to the paper as follows: Methodologies, coding, and thesis writing, Hongji Chen; experimental guidance, thesis writing revision, Jianxun Zhang; dataset processing, Tianze Yu and Yingzhu Zeng; experimental data organization, Huan Zeng. All authors reviewed the results and approved the final version of the manuscript.

Availability of Data and Materials: Not applicable.

Ethics Approval: Not applicable.

Conflicts of Interest: The authors declare no conflicts of interest to report regarding the present study.

References

1. Land EH, McCann JJ. Lightness and retinex theory. J Opt Soc Am. 1971;61(1):1–11. doi:10.1364/josa.61.000001. [Google Scholar] [PubMed] [CrossRef]

2. Roy S, Bhalla K, Patel R. Mathematical analysis of histogram equalization techniques for medical image enhancement: a tutorial from the perspective of data loss. Multimed Tools Appl. 2024;83(5):14363–92. doi:10.1007/s11042-023-15799-8. [Google Scholar] [CrossRef]

3. Dhal KG, Das A, Ray S, Gálvez J, Das S. Histogram equalization variants as optimization problems: a review. Arch Comput Methods Eng. 2021;28(3):1471–96. doi:10.1007/s11831-020-09425-1. [Google Scholar] [CrossRef]

4. Dyke RM, Hormann K. Histogram equalization using a selective filter. Vis Comput. 2023;39(12):6221–35. doi:10.1007/s00371-022-02723-8. [Google Scholar] [PubMed] [CrossRef]

5. Jha K, Sakhare A, Chavhan N, Lokulwar PP. A review on image enhancement techniques using histogram equalization.grenze. Int J Eng Technol (GIJET). 2024;10(1):923–8. [Google Scholar]

6. Sun X, Fang H, Yang Y, Zhu D, Wang L, Liu J, et al. Robust retinal vessel segmentation from a data augmentation perspective. In: Ophthalmic Medical Image Analysis: 8th International Workshop, OMIA 2021. Cham, Switzerland: Springer International Publishing; 2021. p. 189–98. [Google Scholar]

7. Zhu Z, Wei H, Hu G, Li Y, Qi G, Mazur N. A novel fast single image dehazing algorithm based on artificial multiexposure image fusion. IEEE Trans Instrum Meas. 2020;70:1–23. doi:10.1109/tim.2020.3024335. [Google Scholar] [CrossRef]

8. Rahman S, Rahman MM, Abdullah-Al-Wadud M, Al-Quaderi GD, Shoyaib M. An adaptive gamma correction for image enhancement. EURASIP J Image Video Process. 2016;2016(1):35. doi:10.1186/s13640-016-0138-1. [Google Scholar] [CrossRef]

9. Provenzi E, De Carli L, Rizzi A, Marini D. Mathematical definition and analysis of the Retinex algorithm. J Opt Soc Am A. 2005;22(12):2613–21. doi:10.1364/josaa.22.002613. [Google Scholar] [PubMed] [CrossRef]

10. Jobson DJ, Rahman ZU, Woodell GA. A multiscale retinex for bridging the gap between color images and the human observation of scenes. IEEE Trans Image Process. 1997;6(7):965–76. doi:10.1109/83.597272. [Google Scholar] [PubMed] [CrossRef]

11. Jobson DJ, Rahman ZU, Woodell GA. Properties and performance of a center/surround retinex. IEEE Trans Image Process. 1997;6(3):451–62. doi:10.1109/83.557356. [Google Scholar] [PubMed] [CrossRef]

12. Rahman ZU, Jobson DJ, Woodell GA. Retinex processing for automatic image enhancement. J Electron Imaging. 2004;13(1):100–10. doi:10.1117/1.1636183. [Google Scholar] [CrossRef]

13. Wang S, Zheng J, Hu HM, Li B. Naturalness preserved enhancement algorithm for non-uniform illumination images. IEEE Trans Image Process. 2013;22(9):3538–48. doi:10.1109/tip.2013.2261309. [Google Scholar] [PubMed] [CrossRef]

14. Wu W, Weng J, Zhang P, Wang X, Yang W, Jiang J. Uretinex-net: retinex-based deep unfolding network for low-light image enhancement. In: Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); 2022 Jun 18–24; New Orleans, LA, USA. [Google Scholar]

15. Zamir SW, Arora A, Khan S, Hayat M, Khan FS, Yang MH, et al. Learning enriched features for real image restoration and enhancement. In: Computer Vision—ECCV 2020 (ECCV 2020). Cham, Switzerland: Springer; 2020. p. 492–511. [Google Scholar]

16. Lv F, Lu F, Wu J, Lim C. Mbllen: low-light image/video enhancement using CNNs. In: British Machine Vision Conference (BMVC); 2018 Sep 2–6; Newcastle, UK. [Google Scholar]

17. Lore KG, Akintayo A, Sarkar S. Llnet: a deep autoencoder approach to natural low-light image enhancement. Pattern Recognit. 2017;61(6):482–95. doi:10.1016/j.patcog.2016.06.008. [Google Scholar] [CrossRef]

18. Xu X, Wang R, Fu CW, Jia J. SNR-aware low-light image enhancement. In: Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); 2022 Jun 18–24; New Orleans, LA, USA. p. 17714–24. [Google Scholar]

19. Wei C, Wang W, Yang W, Liu J. Deep retinex decomposition for low-light enhancement. In: Proceedings of the British Machine Vision Conference (BMVC); 2018 Sep 2–6; Newcastle, UK. [Google Scholar]

20. Dabov K, Foi A, Katkovnik V, Egiazarian K. Image denoising by sparse 3-D transform-domain collaborative filtering. IEEE Trans Image Process. 2007;16(8):2080–95. doi:10.1109/tip.2007.901238. [Google Scholar] [PubMed] [CrossRef]

21. Zhang Y, Zhang J, Guo X. Kindling the darkness: a practical low-light image enhancer. In: MM ’19: Proceedings of the 27th ACM International Conference on Multimedia; 2019 Oct 21–25; Nice France. p. 1632–40. [Google Scholar]

22. Zhao Z, Xiong B, Wang L, Ou Q, Yu L, Kuang F. RetinexDIP: a unified deep framework for low-light image enhancement. IEEE Trans Circuits Syst Video Technol. 2022;32(3):1076–88. doi:10.1109/tcsvt.2021.3073371. [Google Scholar] [CrossRef]

23. Zhang Y, Guo X, Ma J, Liu W, Zhang J. Beyond brightening low-light images. Int J Comput Vis. 2021;129(4):1013–37. doi:10.1007/s11263-020-01407-x. [Google Scholar] [CrossRef]

24. Jiang Y, Gong X, Liu D, Cheng Y, Fang C, Shen X, et al. EnlightenGAN: deep light enhancement without paired supervision. IEEE Trans Image Process. 2021;30:2340–9. doi:10.1109/tip.2021.3051462. [Google Scholar] [PubMed] [CrossRef]

25. Mi A, Luo W, Qiao Y, Huo Z. Rethinking Zero-DCE for low-light image enhancement. Neural Process Lett. 2024;56(2):93. doi:10.1007/s11063-024-11565-5. [Google Scholar] [CrossRef]

26. Zhang Z, Jiang Y, Jiang J, Wang X, Luo P, Gu J. STAR: a structure-aware lightweight transformer for real-time image enhancement. In: Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision; 2021 Oct 11–17; Montreal, QC, Canada. p. 4106–15. [Google Scholar]

27. Wang Y, Liu Z, Liu J, Xu S, Liu S. Low-light image enhancement with illumination-aware gamma correction and complete image modelling network. In: Proceedings of the 2023 IEEE/CVF International Conference on Computer Vision; 2023 Oct 1–6; Paris, France. p. 13128–37. [Google Scholar]

28. Bae W, Yoo J, Ye JC. Beyond deep residual learning for image restoration: persistent homology-guided manifold simplification. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW); 2017 Jul 21–26; Honolulu, HI, USA. p. 1141–9. [Google Scholar]

29. Guo T, Mousavi HS, Vu TH, Monga V. Deep wavelet prediction for image super-resolution. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops; 2017 Jul 21–26; Honolulu, HI, USA. p. 1100–9. [Google Scholar]

30. Huang H, He R, Sun Z, Tan T. Wavelet-SRNet: a wavelet-based CNN for multi-scale face super resolution. In: Proceedings of the 2017 IEEE International Conference on Computer Vision; 2017 Oct 22–29; Venice, Italy. p. 1689–97. [Google Scholar]

31. Fujieda S, Takayama K, Hachisuka T. Wavelet convolutional neural networks. arXiv:1805.08620. 2018. [Google Scholar]

32. Zou W, Gao H, Yang W, Liu T. Wave-Mamba: wavelet state space model for ultra-high-definition low-light image enhancement. In: Proceedings of the 32nd ACM International Conference on Multimedia; 2024 Oct 28–Nov 1; Melbourne, VIC, Australia. p. 1534–43. [Google Scholar]

33. Xiang Y, Hu G, Chen M, Emam M. WMANet: wavelet-based multi-scale attention network for low-light image enhancement. IEEE Access. 2024;12(6):105674–85. doi:10.1109/access.2024.3434531. [Google Scholar] [CrossRef]

34. Cai Y, Bian H, Lin J, Wang H, Timofte R, Zhang Y. Retinexformer: one-stage retinex-based transformer for low-light image enhancement. In: Proceedings of the 2023 IEEE/CVF International Conference on Computer Vision; 2023 Oct 1–6; Paris, France. p. 12504–13. [Google Scholar]

35. Kingma DP, Ba J. Adam: a method for stochastic optimization. arXiv:1412.6980. 2014. [Google Scholar]

36. Loshchilov I, Hutter F. SGDR: stochastic gradient descent with warm restarts. arXiv:1608.03983. 2016. [Google Scholar]

37. Yang W, Wang W, Huang H, Wang S, Liu J. Sparse gradient regularized deep retinex network for robust low-light image enhancement. IEEE Trans Image Process. 2021;30:2072–86. doi:10.1109/tip.2021.3050850. [Google Scholar] [PubMed] [CrossRef]

38. Guo X, Li Y, Ling H. LIME: low-light image enhancement via illumination map estimation. IEEE Trans Image Process. 2017;26(2):982–93. doi:10.1109/tip.2016.2639450. [Google Scholar] [PubMed] [CrossRef]

39. Lee C, Lee C, Kim CS. Contrast enhancement based on layered difference representation. In: Proceedings of the 19th IEEE International Conference on Image Processing (ICIP); 2012 Sep 30–Oct 3; Orlando, FL, USA. p. 965–8. [Google Scholar]

40. Ma K, Zeng K, Wang Z. Perceptual quality assessment for multi-exposure image fusion. IEEE Trans Image Process. 2015;24(11):3345–56. doi:10.1109/tip.2015.2442920. [Google Scholar] [PubMed] [CrossRef]

41. Vonikakis V, Kouskouridas R, Gasteratos A. On the evaluation of illumination compensation algorithms. Multimed Tools Appl. 2018;77(7):9211–33. doi:10.1007/s11042-017-4783-x. [Google Scholar] [CrossRef]

42. Loh YP, Chan CS. Getting to know low-light images with the exclusively dark dataset. Comput Vis Image Underst. 2019;178:30–42. doi:10.1016/j.cviu.2018.10.010. [Google Scholar] [CrossRef]

43. Wang Z, Bovik AC, Sheikh HR, Simoncelli EP. Image quality assessment: from error visibility to structural similarity. IEEE Trans Image Process. 2004;13(4):600–12. doi:10.1109/tip.2003.819861. [Google Scholar] [PubMed] [CrossRef]

44. Chen C, Chen Q, Do MN, Koltun V. Seeing motion in the dark. In: 2019 IEEE/CVF International Conference on Computer Vision (ICCV); 2019 Oct 27–Nov 2; Seoul, Republic of Korea. p. 3184–93. [Google Scholar]

45. Zeng H, Cai J, Li L, Cao Z, Zhang L. Learning image-adaptive 3D lookup tables for high performance photo enhancement in real-time. IEEE Trans Pattern Anal Mach Intell. 2020;42(12):3158–72. doi:10.1109/tpami.2020.3026740. [Google Scholar] [PubMed] [CrossRef]

46. Wang R, Zhang Q, Fu CW, Shen X, Zheng WS, Jia J. Underexposed photo enhancement using deep illumination estimation. In: 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); 2019 Jun 15–20; Long Beach, CA, USA. p. 6842–50. [Google Scholar]

47. Ma L, Ma T, Liu R, Fan X, Luo Z. Toward fast, flexible, and robust low-light image enhancement. arXiv:2203.07911. 2022. [Google Scholar]

48. Moran S, Marza P, McDonagh S, Parisot S, Slabaugh G. DeepLPF: deep local parametric filters for image enhancement. In: Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); 2020 Jun 13–19; Seattle, WA, USA. p. 12826–35. [Google Scholar]

49. Guo C, Li C, Guo J, Loy CC, Hou J, Kwong S, et al. Zero-reference deep curve estimation for low-light image enhancement. In: Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition; 2020 Jun 13–19; Seattle, WA, USA. p. 1780–9. [Google Scholar]

50. Liu R, Ma L, Zhang J, Fan X, Luo Z. Retinex-inspired unrolling with cooperative prior architecture search for low-light image enhancement. In: 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); 2021 Jun 20–25; Nashville, TN, USA. p. 10556–65. [Google Scholar]

51. Xu K, Yang X, Yin B, Lau RWH. Learning to restore low-light images via decomposition-and-enhancement. In: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); 2020 Jun 14–19; Seattle, WA, USA. p. 2278–87. [Google Scholar]

52. Zhou S, Li C, Loy CC. LEDNet: joint low-light enhancement and deblurring in the dark. In: Computer Vision—ECCV 2022: 17th European Conference. Cham, Switzerland: Springer; 2022. p. 573–89. [Google Scholar]

53. Zamir SW, Arora A, Khan S, Hayat M, Khan FS, Yang MH. Restormer: efficient transformer for high-resolution image restoration. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); 2022 Jun 18–24; New Orleans, LA, USA. p. 5718–29. [Google Scholar]

54. Wang T, Zhang K, Shen T, Luo W, Stenger B, Lu T. Ultra-high-definition low-light image enhancement: a benchmark and transformer-based method. In: AAAI’23/IAAI’23/EAAI’23: Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence and Thirty-Fifth Conference on Innovative Applications of Artificial Intelligence and Thirteenth Symposium on Educational Advances in Artificial Intelligence. Washington, DC, USA: AAAI Press; 2023. p. 2654–62. [Google Scholar]

55. Zhao L, Li S. Object detection algorithm based on improved YOLOv3. Electronics. 2020;9(3):537. doi:10.3390/electronics9030537. [Google Scholar] [CrossRef]


Cite This Article

APA Style
Chen, H., Zhang, J., Yu, T., Zeng, Y., Zeng, H. (2026). RetinexWT: Retinex-Based Low-Light Enhancement Method Combining Wavelet Transform. Computers, Materials & Continua, 86(2), 1–20. https://doi.org/10.32604/cmc.2025.067041
Vancouver Style
Chen H, Zhang J, Yu T, Zeng Y, Zeng H. RetinexWT: Retinex-Based Low-Light Enhancement Method Combining Wavelet Transform. Comput Mater Contin. 2026;86(2):1–20. https://doi.org/10.32604/cmc.2025.067041
IEEE Style
H. Chen, J. Zhang, T. Yu, Y. Zeng, and H. Zeng, “RetinexWT: Retinex-Based Low-Light Enhancement Method Combining Wavelet Transform,” Comput. Mater. Contin., vol. 86, no. 2, pp. 1–20, 2026. https://doi.org/10.32604/cmc.2025.067041


Copyright © 2026 The Author(s). Published by Tech Science Press.
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.