
ARTICLE


FENet: Underwater Image Enhancement via Frequency Domain Enhancement and Edge-Guided Refinement

Xinwei Zhu, Jianxun Zhang*, Huan Zeng

Department of Computer Science and Engineering, Chongqing University of Technology, Chongqing, 400054, China

* Corresponding Author: Jianxun Zhang

Computers, Materials & Continua 2026, 86(2), 1-25. https://doi.org/10.32604/cmc.2025.068578

Abstract

Underwater images often suffer from light scattering, color distortion, and detail blurring, which limit the effectiveness of underwater visual tasks and their application performance. Existing underwater image enhancement methods can improve image quality to some extent, but they often introduce problems such as detail loss and edge blurring. To address these problems, we propose FENet, an efficient underwater image enhancement method. FENet first obtains images at three different scales by downsampling and then transforms them into the frequency domain to extract the low-frequency and high-frequency spectra. A distance mask and a mean mask, constructed from the radial distance and the magnitude mean respectively, are then used to enhance the high-frequency part, improving image details, while noise in the low-frequency part is suppressed. Because of light scattering in underwater scenes, some details are still lost if the result is transformed directly back to the spatial domain after the frequency domain operation. For this reason, we propose a multi-stage residual feature aggregation module, which focuses on detail extraction and effectively avoids the information loss caused by global enhancement. Finally, we combine an edge guidance strategy to further enhance the edge details of the image. Experimental results indicate that FENet outperforms current state-of-the-art underwater image enhancement methods in quantitative and qualitative evaluations on multiple publicly available datasets.

Keywords

Detail extraction; frequency domain operation; edge guidance; image enhancement

1  Introduction

In recent years, underwater images have played an increasingly important role in scientific research and industry, including marine defense, marine environmental protection, and marine engineering, with important applications in marine research [1] and underwater robotics [2]. Observing marine organisms, carrying out underwater operations, and other marine activities [3] all require clear images. However, refraction by the water and scattering by suspended particles [4] cause substantial attenuation of light, and the diversity of water bodies introduces further biases, so that the final underwater images are often of unsatisfactory quality. Many visual tasks fail on such low-quality underwater images, which makes underwater image enhancement a key technology for underwater observation and operation.

Unlike imaging in atmospheric media, the water medium is more complex, so the quality of underwater images is usually severely compromised. Light propagation in water is affected by absorption and scattering, resulting in low contrast, color cast, blurring, and noise [5] in underwater images, which reduces the accuracy of subsequent tasks (e.g., target detection [6], image segmentation [7], autonomous underwater vehicles [3]). These problems are especially serious in turbid waters. Compared with single-degradation restoration tasks such as low-light enhancement [8] and defogging [9], underwater image enhancement must deal with more influencing factors: not only insufficient light, but also the bluish or greenish color cast caused by the absorption of light by water. In addition, the optical properties of different waters vary greatly, which challenges the generalization ability of underwater image enhancement methods across scenes.

In order to improve the quality of underwater images, researchers have proposed a variety of image enhancement methods, including physical model-driven methods, methods based on traditional image processing, and methods based on deep learning. Physical model-based methods try to simulate the underwater light propagation process to recover clear images; such methods were first proposed in the 1980s and 1990s, when researchers considered the effects of light absorption and scattering through a physical model of underwater optics [10] and established a mathematical model describing the underwater imaging process. Traditional image processing-based methods (e.g., histogram equalization [11], Retinex theory [12], etc.) improve the visual effect by adjusting contrast and brightness; nowadays, these methods are also combined with deep learning for image enhancement. Bai et al. [13] proposed a new image enhancement algorithm based on Retinex theory, RetinexMamba, which enhances the details of low-light images and improves image visibility more effectively by combining Retinex with the Mamba architecture. Traditional algorithms such as physical models or histogram equalization alone have significant limitations: their model assumptions are often too simple, they struggle with noise and details, and they adapt poorly to complex scenes. It is for this reason that current deep learning work combines these methods so that they complement one another.

Nowadays, researchers usually combine the above methods with deep learning. Cong et al. [14] proposed a method for underwater image enhancement that combines physical modeling and a Generative Adversarial Network (GAN). The model designs a parameter estimation network to learn the parameters of the physical model and uses the generated color-enhanced images as auxiliary information. In addition, a dual discriminator is used to impose style-content adversarial constraints on the results, enhancing the realism and visual aesthetics of the images.

With the rapid development of deep learning, more and more researchers have begun to adopt end-to-end data-driven approaches for underwater image enhancement, which have achieved remarkable results. However, existing deep learning-based methods still have non-negligible problems. 1) GAN models have strong expressive ability, but due to training instability, their enhancement results may be inconsistent across scenes; moreover, to deceive the discriminator, GANs sometimes generate unrealistic details or over-sharpened regions, i.e., artifacts. 2) The commonly used downsampling operations reduce the size of the feature map while discarding some details, especially important texture, edge, and other structural information, which decreases the clarity of the enhanced image and blurs its edges. 3) Compared with low-light enhancement and defogging, underwater images are more complex, and the many floating particles in the water mean that noise may be amplified during enhancement, especially in highly turbid underwater scenes. To address these issues, we propose an image enhancement method based on frequency domain enhancement and edge guidance, aiming to improve the visual quality of underwater images. The main contributions and innovations are reflected in the following three aspects.

•   In this paper, we propose an innovative underwater image enhancement framework, FENet, which combines frequency domain enhancement and multi-task learning strategies. FENet first downsamples the input image at multiple scales; in the frequency domain operation, a mask based on the amplitude mean and a mask based on distance are used to enhance the detail information in the high-frequency part, while low-frequency noise is suppressed to avoid its negative effect on the image.

•   In order to solve the common detail loss problem in underwater images, FENet introduces a multi-stage residual feature aggregation module into the network structure. Specifically, the network effectively preserves the detail information of the input image by stacking deep residual modules, combined with skip connections and a deep feature fusion mechanism. The residual learning module avoids the loss of image details after frequency domain enhancement by gradually extracting the residual information between clear and blurred features in the image. In particular, the output of the network is combined with the original input by residual fusion, which significantly improves the clarity and detail of the image. This strategy effectively avoids the information loss common in traditional global enhancement methods.

•   In order to cope with the edge-blurring problem prevalent in underwater images, FENet introduces an edge-guided enhancement mechanism in the post-processing stage. By extracting the edge features of the image with Sobel convolutions during training, FENet is able to accurately recover edge details in the post-processing stage. In addition, the network combines the edge-enhanced features with the initial input image, further extracts high-level semantic features through cascaded convolutions, and reinforces the edge information in the fusion stage, resulting in clearer boundaries. The method effectively improves edge and detail recovery, showing higher discrimination ability and better visual effect especially in complex edge regions.

2  Related Work

2.1 Traditional Underwater Image Enhancement

Underwater image enhancement techniques have undergone many years of research and development and can now be broadly categorized into two main categories: traditional image processing methods and deep learning methods. Traditional methods can be further divided into physical model-based methods and traditional model-free methods.

Physical model-based methods try to reconstruct the real underwater imaging process and invert it for correction; representative works include the DCP (Dark Channel Prior) method [15], the UDCP (Underwater Dark Channel Prior) method [16], and the RCP (Red Channel Prior) [17]. Although physical methods have achieved remarkable results in underwater image enhancement, they usually rely on a priori knowledge of the underwater environment (e.g., attenuation coefficient, depth information, etc.), which is difficult to obtain accurately in practical applications. Moreover, physical models often assume that the underwater environment is homogeneous, while the real underwater environment is usually complex and variable, resulting in limited generalization to real scenes. In addition, the computational complexity of physical methods is high, making it difficult to meet the demands of real-time processing. Therefore, although physical methods provide an important theoretical basis for underwater image enhancement, their application in complex underwater environments still faces many challenges.

Another class of reference-free underwater image enhancement methods relies mainly on the statistical properties or visual a priori of the image itself and achieves enhancement by directly processing the color distribution, luminance information, or local contrast of the image. These methods do not rely on reference images or physical modeling and have the advantages of simple implementation, high computational efficiency, and no special dependence on hardware, and thus have been widely used in early underwater image processing research.

Typical methods include histogram equalization, Retinex theory, white balance correction, and so on. The histogram equalization method [18] improves the overall contrast by homogenizing the grayscale distribution of the image; the Retinex method [19] simulates the separation mechanism of brightness and reflection in human visual perception to improve the image details and color realism; and the white balance method [20] tries to correct the energy distribution between different color channels to alleviate the color deviation phenomenon that exists commonly in underwater images.

Although these methods can achieve some enhancement results in specific scenes, they are usually more dependent on the content characteristics of the image itself and are difficult to adaptively deal with different degrees of underwater degradation, especially when confronted with images with strong haze, low light, or severe color imbalance, where enhancement is easily limited. For example, the Retinex algorithm may introduce noise in overly dark regions.

In recent years, researchers have also attempted to introduce more sophisticated strategies based on traditional methods to improve their robustness, such as integrating multiple enhancement results weighted by image fusion strategies or using polarization filtering to remove part of the underwater scattered light to improve image clarity. However, although these improvements have enhanced the image quality to a certain extent, they still have the problems of poor adaptability to complex scenes, easy oversaturation of the enhancement results, color distortion, or loss of structural details in general, which make it difficult to meet the demands for image quality in diversified practical underwater applications.

2.2 Deep Learning for Underwater Image Enhancement

With the development of deep learning technology, more and more researchers have begun to adopt end-to-end data-driven approaches for underwater image enhancement, which have achieved remarkable results. Early works such as DCNN and UWCNN achieved relatively stable enhancement by designing specific network structures and using a large amount of paired data for supervised training. DCNN [21] proposes an end-to-end underwater image enhancement method that automatically learns the mapping from low-quality underwater images to clear images and restores their details and colors. UWCNN [22] adopts a shallow CNN structure, emphasizing light weight and generalization ability. In addition, another class of methods introduces Generative Adversarial Networks (GANs) into underwater image enhancement, such as FUnIE-GAN [23] and UIEFP GAN [24]. By constructing an adversarial mechanism between the generator and the discriminator, these methods make the enhanced image more natural and realistic and effectively address color drift and texture loss. Although GAN models have strong expressive ability, their enhancement results may be inconsistent across scenes due to training instability. Later, scholars applied Transformer-based methods to image enhancement tasks, such as Uformer [25], UDAformer [26], and Swin Transformer [27], using long-range dependency modeling to improve the overall perceptual consistency of images. Meanwhile, strategies such as multi-task learning, multi-scale fusion, and attention mechanisms have been widely introduced to improve the model's ability to represent the complex features of underwater images.

Similar to underwater images, the field of medical image processing faces the challenges of low contrast and noise, and the U-Net architecture is widely used in this field. Zhang et al. propose an improved U-Net variant, U-KAN [28], which introduces nonlinear learnable activation functions based on Kolmogorov-Arnold Networks (KANs) and significantly improves the accuracy of medical image segmentation while reducing the computational cost. U-KAN also demonstrates potential for generative tasks in a diffusion model, which provides inspiration for generative methods in underwater image enhancement. Wang et al. propose a pure visual state space model (PV-SSM) [29], which efficiently captures long-range dependencies in high-dimensional medical images through a parallel state space mechanism and learnable parameterized positional encoding, achieving linear computational complexity. These methods provide directions that our work on underwater images can draw from; of course, the differences in physical properties between medical and underwater images require adapting them accordingly.

In summary, traditional methods have certain advantages in theoretical modeling and engineering implementation, but they generalize weakly to diverse underwater environments; deep learning methods, with their strong data-driven capabilities, show greater enhancement potential and better adaptability and scalability in complex scenes. Therefore, how to further integrate the advantages of traditional a priori models and deep feature modeling remains one of the important directions in current underwater image enhancement research.

3  Method

3.1 Network Architecture

Our proposed network architecture is a multi-module synergistic underwater image enhancement system designed to improve the image quality by means of frequency domain processing, feature extraction and fusion, and edge enhancement. The overall structure of the network is described in Fig. 1.


Figure 1: Overview of the FENet network architecture. Our proposed FENet network consists of multi-scale frequency domain enhancement, feature extraction, multi-stage residual feature aggregation, and edge enhancement modules. First, the input image is fed into the backbone network after the high-frequency details are enhanced by the multiscale frequency domain enhancement module. The backbone consists of a main feature extraction branch and a multi-stage residual feature aggregation branch: the main branch adopts a modified U-Net architecture, including 3-layer downsampling and 3-layer upsampling, and each layer is equipped with a CBAM attention module to enhance the feature representation in key regions. Multiple residual blocks are stacked at feature bottlenecks to enhance semantic modeling capabilities, and cascaded fusion is used for cross-layer connectivity and feature alignment; the multi-stage residual feature aggregation branch consists of multiple deep residual attention modules for supplementing texture and local detail information from the original image, which is ultimately fused to the output of the main branch and the input image by global residual summation. The fused features are then fed into the EDEN edge enhancement module to further improve the image edge quality, and the final enhancement results are generated through light post-processing. The overall structure takes into account semantic restoration, detail preservation, and edge structure restoration at the same time

The network first preprocesses the input image with Multiscale Frequency Domain Enhancement (MSFDE), which utilizes the Fourier transform to separate frequency components, suppress low-frequency scattering noise, and amplify high-frequency details, laying the foundation for subsequent feature extraction. Next, the features enter the backbone network, which adopts a U-Net-based encoding-decoding structure. The encoding path gradually compresses the spatial resolution through three downsampling operations and feature extraction blocks to extract deep features; the decoding path restores the resolution through three upsampling operations and refinement blocks and fuses the downsampled features via weighted skip connections to enhance detail reconstruction. Meanwhile, a multi-stage residual feature aggregation path is designed in parallel, where the input, after an initial convolution, passes through five deep residual attention blocks to refine the feature representation and is summed with the initial features to generate the residual output.

Eventually, the U-Net output, the residual output, and the original input are summed to form a global residual connection that further enhances the overall consistency of the image. Subsequently, the result is enriched with edge information by the Edge Enhancement Module (EDEN), which utilizes multi-directional gradient detection based on the Sobel operator to improve structural clarity. Finally, after post-processing (PPro) and value clipping, the output range is restricted to [0, 1] to generate the enhanced image. Through the combination of frequency-domain preprocessing, the U-Net structure, and residual learning, the network effectively copes with scattering noise and detail loss in underwater images, and the edge enhancement strategy mitigates the edge blurring problem. The method demonstrates significant enhancement effects on several underwater datasets and provides an efficient solution for underwater image processing tasks.
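
For readers who prefer code, the overall data flow can be summarized in a brief, simplified sketch (module names such as MSFDE, the U-Net backbone, MRFA, and EDEN stand for the components described in the following subsections; this is an illustrative outline, not the released implementation):

```python
import torch
import torch.nn as nn

class FENetSketch(nn.Module):
    """Simplified FENet data flow; each sub-module defaults to identity so the
    skeleton runs on its own and would be replaced by the real components."""
    def __init__(self, msfde=None, backbone=None, mrfa=None, eden=None, postprocess=None):
        super().__init__()
        self.msfde = msfde or nn.Identity()            # multi-scale frequency-domain enhancement
        self.backbone = backbone or nn.Identity()      # U-Net encoder-decoder with CBAM
        self.mrfa = mrfa or nn.Identity()              # multi-stage residual feature aggregation
        self.eden = eden or nn.Identity()              # Sobel-guided edge enhancement
        self.postprocess = postprocess or nn.Identity()

    def forward(self, x):
        x_freq = self.msfde(x)                                   # suppress low-freq scatter, boost details
        fused = self.backbone(x_freq) + self.mrfa(x_freq) + x    # global residual connection
        out = self.postprocess(self.eden(fused))
        return torch.clamp(out, 0.0, 1.0)                        # restrict output to [0, 1]
```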

3.2 Multi-Scale Frequency Domain

In underwater environments, images often suffer from color distortion, low contrast, and blurred details due to light scattering and absorption effects [30]. These degradation phenomena seriously affect the visual quality of images and the subsequent processing effects. Specifically, these problems are manifested in the spatial domain as the coupling of global signals (e.g., overall color deviation) and local signals (e.g., edge texture blurring), while frequency domain analysis can effectively decouple these mixed degradation modes by separating different frequency components [31]. This provides a more accurate pathway for noise suppression and detail enhancement.

In recent years, Xiao et al. [32] proposed capturing long-range dependencies through frequency assistance, which provides support for our strategy of using the frequency domain to enhance high-frequency details. Jiang et al. [33] achieved image rain removal through frequency cross correction, which further proves that frequency domain correction can effectively handle noise. These works indicate that frequency domain operations can significantly improve detail retention in enhancement tasks. Inspired by these works, we propose a multiscale frequency-domain enhancement (MSFDE) method (shown in Fig. 2) that extends its application to underwater images. The method is specifically designed to suppress low-frequency scattering noise and enhance high-frequency details through frequency-domain operations, thereby improving the clarity and detail representation of underwater images.


Figure 2: Multi-scale frequency domain enhancement (MSFDE)

3.2.1 Input Processing and Multiscale Design

The MSFDE module is integrated in the front-end of the network architecture and processes the input image (RGB, 256 × 256) before it enters the initial convolutional layer. To preserve the frequency characteristics of each chroma channel, each RGB channel is processed independently to avoid cross-channel operations interfering with color specificity. The input image is normalized to the [0, 1] interval before frequency domain conversion to ensure the numerical stability of the Fourier operation. The module performs no additional color space transformations, preserving the original color information and simplifying the process. MSFDE operates on multiple scales (scales = [1.0, 0.5, 0.25]) to capture the multi-level degradation characteristics of underwater images. The high-resolution scale captures fine local details, while the low-resolution scale focuses on the global structure and noise distribution, ensuring that the enhancement process is adaptive to features of different granularity. After frequency domain processing, up-sampling to the original resolution is performed. The final result is fused with learnable weights, which are softmax-normalized to optimize the contribution of each scale and provide high-quality feature inputs for subsequent convolutional layers.
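
A minimal sketch of this multi-scale scheme is given below; the per-scale frequency operation is left as a placeholder for the masking procedure of Section 3.2.2, and only the scale values and the softmax-normalized learnable fusion weights follow the description above:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleFrequencySketch(nn.Module):
    def __init__(self, scales=(1.0, 0.5, 0.25)):
        super().__init__()
        self.scales = scales
        # one learnable fusion weight per scale, initialized to 1.0
        self.weights = nn.Parameter(torch.ones(len(scales)))

    def enhance_in_frequency_domain(self, x):
        # placeholder for the distance/dynamic masking of Section 3.2.2
        return x

    def forward(self, x):
        h, w = x.shape[-2:]
        outputs = []
        for s in self.scales:
            xs = F.interpolate(x, scale_factor=s, mode='bilinear',
                               align_corners=False) if s != 1.0 else x
            ys = self.enhance_in_frequency_domain(xs)
            if s != 1.0:  # return every scale to the original resolution
                ys = F.interpolate(ys, size=(h, w), mode='bilinear',
                                   align_corners=False)
            outputs.append(ys)
        w_norm = torch.softmax(self.weights, dim=0)   # normalize scale contributions
        return sum(wi * yi for wi, yi in zip(w_norm, outputs))
```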

3.2.2 Mask Construction and Spectrum Manipulation

For each scale of the input $F_{in}$, MSFDE converts the image to the frequency domain by the Fast Fourier Transform (FFT). In the frequency domain, the image is decomposed into low-frequency components (representing global structure and smooth regions) and high-frequency components (capturing subtle features such as edges and textures). Liu et al. [34] proposed a spatial-frequency domain correlation method using a distance-based mask to separate low-frequency illumination components from high-frequency details for noise suppression and image matching enhancement. Huo et al. [35] proposed an underwater image denoising method using a dynamic mask based on amplitude averaging to separate high-frequency mutation noise, incorporating a hybrid attention mechanism. On this basis, our module employs two complementary masks—a distance-based spatial mask and a magnitude-averaged dynamic mask—to selectively enhance high-frequency details and suppress noise, thereby improving detail preservation and overall image quality.

The distance mask is constructed based on the radial distance from the center of the spectrum, with low frequencies concentrated in the center and high frequencies radiating outward. The low-frequency portion of the mask is defined as follows:

$$\mathcal{M}_l(i,j)=\begin{cases}1, & \text{if } \sqrt{(i-c_y)^2+(j-c_x)^2}<r\\[2pt]0, & \text{otherwise}\end{cases}\tag{1}$$

where $c_x$ and $c_y$ denote the column and row coordinates of the spectrum center, and $r$ denotes a fixed radius. The high-frequency mask is then $\mathcal{M}_h = 1 - \mathcal{M}_l$. The composite mask is constructed as:

$$\mathcal{M}=\gamma\,\mathcal{M}_l+\lambda\,\mathcal{M}_h\tag{2}$$

where the initial values of the learnable parameters are γ = 0.5 (low-frequency suppression) and λ = 2.0 (high-frequency enhancement). This mask selection is based on the inherent structure of the Fourier spectrum: suppression of the central low-frequency region reduces smoothing or artifacts caused by global variations, and enhancement of the peripheral high-frequency region amplifies details.

The dynamic mask is based on the amplitude spectral mean value for adaptive adjustment, so we need to calculate the amplitude spectral mean value first to build the dynamic mask. The specific method is as follows:

$$A_{\text{mean}}=\frac{1}{HW}\sum_{i,j}A(i,j)\tag{3}$$

where $H$ and $W$ are the height and width of the spectrum and $A(i,j)$ is the amplitude spectrum. With the amplitude mean $A_{\text{mean}}$ computed, the low-frequency part of the dynamic mask is defined as follows:

$$\mathcal{W}_l(i,j)=\begin{cases}1, & \text{if } |F(i,j)|>A_{\text{mean}}\\[2pt]0, & \text{otherwise}\end{cases}\tag{4}$$

The high-frequency dynamic mask is $\mathcal{W}_h = 1 - \mathcal{W}_l$. The integrated dynamic mask is:

$$\mathcal{W}=\gamma\,\mathcal{W}_l+\lambda\,\mathcal{W}_h\tag{5}$$

where γ and λ are as above. This mask is chosen for its data-driven characteristics: regions above the average amplitude usually correspond to significant high-frequency details (e.g., structured edges), and detail preservation is achieved through enhancement; regions below the average amplitude often contain random noise or minor variations, and artifacts are reduced through suppression.

After obtaining the distance mask and the dynamic mask, both are applied to the amplitude spectrum, so that frequency components above the average amplitude are prioritized for enhancement, further highlighting details and mitigating random noise interference. The enhanced amplitude is obtained as:

$$A'=A\odot\mathcal{M}\odot\mathcal{W}\tag{6}$$

where $\odot$ denotes element-wise multiplication.

The result is transformed back to the spatial domain by the inverse Fourier transform, and the multiscale enhancement results are then fused with per-scale weights learned during training. Each weight is initialized to 1.0, and the contributions of the different scales are adaptively weighted.

In underwater image processing, the low-frequency component often contains a smooth background caused by light scattering, while the high-frequency component corresponds to the fine structure of the target object. The distance mask reduces the impact of background blurring on the overall contrast by globally suppressing the low-frequency component, while the dynamic mask further focuses on the significant high-frequency component to prevent noise from being mistakenly enhanced (e.g., random high-frequency interference caused by underwater particles). The combination of the two ensures accurate enhancement of high-frequency details, double filtering of low-frequency scattering and high-frequency low-amplitude noise, and synergistically prioritizes the enhancement of salient high frequencies and restoration of the low-resolution image texture, thus improving image clarity and structural integrity.
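
The two masks and their application to the amplitude spectrum of a single channel can be sketched as follows. This is an illustrative reading of Eqs. (1)-(6): the radius r and the way the enhanced amplitude is recombined with the phase are assumptions rather than the exact released implementation:

```python
import torch

def frequency_mask_enhance(x, r=16, gamma=0.5, lam=2.0):
    """x: (H, W) single-channel image in [0, 1]. Suppresses low frequencies
    and amplifies high-frequency details via the two masks."""
    H, W = x.shape
    spec = torch.fft.fftshift(torch.fft.fft2(x))      # centered spectrum
    amp, phase = torch.abs(spec), torch.angle(spec)

    # distance mask: 1 inside a radius-r disc around the spectrum center, Eq. (1)
    cy, cx = H // 2, W // 2
    yy, xx = torch.meshgrid(torch.arange(H), torch.arange(W), indexing='ij')
    dist = torch.sqrt((yy - cy).float() ** 2 + (xx - cx).float() ** 2)
    m_low = (dist < r).float()
    m = gamma * m_low + lam * (1.0 - m_low)           # composite distance mask, Eq. (2)

    # dynamic mask: split by the amplitude mean, Eqs. (3)-(5)
    a_mean = amp.mean()
    w_low = (amp > a_mean).float()
    w = gamma * w_low + lam * (1.0 - w_low)

    amp_enh = amp * m * w                             # combined modulation (cf. Eq. (6))
    spec_enh = torch.polar(amp_enh, phase)            # rebuild complex spectrum
    out = torch.fft.ifft2(torch.fft.ifftshift(spec_enh)).real
    return out.clamp(0.0, 1.0)
```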

3.3 Multi-Stage Residual Feature Aggregation

In underwater image enhancement tasks, downsampling and upsampling operations often lead to the loss of image details, a problem that can have a significant negative impact on the visual effect of the final enhanced image. Meanwhile, restoring back to the original image after frequency domain processing can lead to problems such as loss of high-frequency details, blurring of details, and structural distortion. For this reason, we propose a multi-stage residual feature aggregation module (MRFA). It aims to effectively avoid information loss through deep residual modeling, attention mechanisms, and global residual linking, while enhancing detail retention and structural stability.

MRFA consists of two convolutional layers and five deep residual attention blocks (DRAB). Among them, the convolutional unit is responsible for the transformation between low-dimensional features and high-dimensional features, while the DRAB focuses on extracting and reconstructing detailed features. The local residual connection of each DRAB block stabilizes the deep feature extraction, alleviates the gradient vanishing problem, and enhances the representation of complex textures.

First, as shown in Fig. 1, the MRFA receives the input tensor $F_{in}\in\mathbb{R}^{B\times C\times H\times W}$. The input tensor is passed through a set of convolutional layers for initial feature extraction to generate the initial feature map $F_{init}$. This step extracts the initial feature representation and lays the foundation for subsequent deep processing. The features then enter a detail-restoration path consisting of five stacked DRABs.

$$F_{\text{final}}=\text{Conv}_{\text{out}}\big(F_{\text{init}}+\text{DRAB}_5(\text{DRAB}_4(\cdots\text{DRAB}_1(F_{\text{init}})))\big)\tag{7}$$

Five DRABs are stacked in series to progressively refine the feature representation. After the fifth DRAB, its output is fused with the initial feature map $F_{init}$ via a long skip connection, in which the shallow features from the initial convolution are added directly to the deep output to preserve low-frequency information (e.g., global color and structure) and to compensate for high-frequency details that may be lost in the frequency domain enhancement. The fused feature map is mapped to the target number of channels through the output convolution layer, keeping the spatial resolution constant, to produce the structured output feature map. This output is then fused with the feature extraction path and the original input via a global residual connection to generate the enhanced image.

As shown in Fig. 3, DRAB consists of a convolutional module and a Convolutional Block Attention Module (CBAM). CBAM enhances high-frequency details (e.g., edges and textures) and suppresses background noise through channel attention and spatial attention, which improves the feature selectivity of DRAB blocks and works synergistically with residual learning to balance detail preservation and deep feature representation, thus optimizing underwater image enhancement performance. DRAB's local residual connection ensures feature stability and mitigates the gradient vanishing problem in deep networks.


Figure 3: Deep Residual Attention Block (DRAB) architecture

By adding the output $F_{\text{final}}$ of the MRFA, the output of the feature extraction path, and the original input $F_{in}$, the low-frequency information is effectively preserved while compensating for the high-frequency details that may be lost in the frequency domain enhancement. The feature extraction path captures global and local information through multi-scale feature extraction, the MRFA enhances the detail representation through deep residual learning and CBAM, and the addition of the original input further ensures the stability of the global structure. This multi-source feature fusion strategy significantly improves the model's detail retention and structural stability through the complementary effects of each branch.
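
A condensed PyTorch sketch of this branch is shown below; the CBAM block is assumed to be supplied externally (any standard channel + spatial attention implementation), and the channel widths are illustrative:

```python
import torch.nn as nn

class DRAB(nn.Module):
    """Deep Residual Attention Block: convolutions, CBAM attention,
    and a local residual connection (Fig. 3)."""
    def __init__(self, channels, cbam):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.PReLU(),
            nn.Conv2d(channels, channels, 3, padding=1),
            cbam,                        # channel + spatial attention
        )

    def forward(self, x):
        return x + self.body(x)          # local residual keeps features stable

class MRFA(nn.Module):
    """Multi-stage residual feature aggregation: initial conv, five stacked
    DRABs, a long skip from the initial features, and an output conv (Eq. 7)."""
    def __init__(self, in_ch=3, feat_ch=64, out_ch=3, make_cbam=None, n_blocks=5):
        super().__init__()
        make_cbam = make_cbam or (lambda c: nn.Identity())   # placeholder attention
        self.head = nn.Conv2d(in_ch, feat_ch, 3, padding=1)
        self.blocks = nn.Sequential(*[DRAB(feat_ch, make_cbam(feat_ch))
                                      for _ in range(n_blocks)])
        self.tail = nn.Conv2d(feat_ch, out_ch, 3, padding=1)

    def forward(self, x):
        f_init = self.head(x)
        f = self.blocks(f_init) + f_init   # long skip: preserve low-frequency content
        return self.tail(f)
```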

3.4 Edge Enhancement

Due to the light scattering and medium absorption characteristics of underwater environments, image edge information often becomes inconspicuous because of blurring and noise interference, which adversely affects subsequent tasks such as target detection and detail recovery [36]. To this end, we design an edge enhancement strategy, EDEN, incorporating Sobel edge guidance, which aims to enhance the edge details of underwater images through multi-directional edge detection and feature fusion, thereby improving the clarity and structural information of the images. The method is built on the Sobel operator [37], exploiting its sensitivity to image gradients to extract multi-directional edge features, and is combined with a deep convolutional network to further optimize the edge enhancement effect.

Edge features in multiple directions are first extracted from the input image by SobelConv2d to capture fine structural information. Different from the traditional Sobel edge detection method, we design horizontal, vertical, and two diagonal Sobel convolution kernels, as shown in Fig. 4, to extract edge features in four directions for finer edge detection. The SobelConv2d sub-module extracts gradient features in each of the four directions—horizontal, vertical, 45° diagonal, and 135° diagonal—generating 32 channels per direction for a total of 128 channels of edge features. The Sobel kernel for each direction has adjustable weights that control how strongly the edges in that direction are emphasized, helping to better capture the complicated and uneven texture details of underwater images.


Figure 4: The Sobel convolution kernels in four directions

The edge feature enhancement network further fuses and refines the edge features and image features on this basis. First, the edge feature map is concatenated with the original input image along the channel dimension to form an initial feature representation. Subsequently, the features are processed by four successive groups of convolutional modules, each consisting of a 1×1 convolution and a 3×3 convolution. The 1×1 convolution layer achieves channel compression and feature fusion, while the 3×3 convolution layer focuses on local structure modeling and context-aware enhancement. The input to each group of modules is concatenated with the initial Sobel edge map, which ensures that multi-level edge information is continuously involved in the network computation and effectively improves the edge sensitivity of the feature representation. The nonlinear activation function is PReLU, which introduces a learnable response on the negative half-axis and further improves the network's ability to adapt to complex edge patterns. Compared with the traditional ReLU (which retains only positive values), PReLU adapts to complex edge patterns (e.g., non-uniform edges in underwater images) and enhances sensitivity to weak edges by learning the negative half-axis response; it allows negative-valued features to pass through in a controlled manner, which promotes the retention and enhancement of edge details.

After the four groups of convolutional units, the final output features are concatenated with the original image again and fused by a 1×1 convolution into the final output, ensuring that the output maintains the same spatial resolution and channel structure as the input. This design not only enhances responsiveness to edge details but also improves the end-to-end representation quality of the network in tasks such as image enhancement. By organically combining traditional edge operators with deep convolutional networks, the module preserves both the structural integrity and the semantic expressiveness of the image. Compared with traditional edge detection methods, the module shows stronger adaptability and detail preservation in complex degraded scenes. Experimental validation shows that the module improves the edge clarity and structural legibility of the image and provides higher-quality feature inputs for subsequent network layers, making it particularly suitable for detail-sensitive computer vision tasks such as edge enhancement and structure recovery.
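
The four directional kernels of Fig. 4 can be realized as a grouped convolution with fixed kernels and a learnable per-direction scale, roughly as follows (the per-direction scaling and the 1×1 expansion to 32 channels per direction are our reading of the description above, not the authors' exact code):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SobelConv2dSketch(nn.Module):
    """Extracts gradient features in four directions (horizontal, vertical,
    and the two diagonals) and expands each direction to `per_dir` channels."""
    def __init__(self, in_channels=3, per_dir=32):
        super().__init__()
        k_h = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]])
        k_v = k_h.t()
        k_d1 = torch.tensor([[0., 1., 2.], [-1., 0., 1.], [-2., -1., 0.]])   # 45-degree diagonal
        k_d2 = torch.tensor([[-2., -1., 0.], [-1., 0., 1.], [0., 1., 2.]])   # 135-degree diagonal
        kernels = torch.stack([k_h, k_v, k_d1, k_d2])            # (4, 3, 3)
        self.register_buffer('kernels', kernels.unsqueeze(1))    # (4, 1, 3, 3)
        self.scale = nn.Parameter(torch.ones(4))                 # adjustable strength per direction
        self.expand = nn.Conv2d(4 * in_channels, 4 * per_dir, 1)

    def forward(self, x):
        c = x.shape[1]
        weight = self.scale.view(4, 1, 1, 1) * self.kernels      # scaled directional kernels
        weight = weight.repeat(c, 1, 1, 1)                       # one set per input channel
        edges = F.conv2d(x, weight, padding=1, groups=c)         # (B, 4*C, H, W) depthwise gradients
        return self.expand(edges)                                # -> 128 channels for per_dir=32
```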

3.5 Loss Function

In the proposed underwater image enhancement framework, we design a composite loss function that jointly considers multiple objectives such as pixel accuracy, edge clarity, perceptual consistency, and semantic discrimination, aiming to simultaneously address the color distortion, low contrast, and detail degradation commonly found in underwater images. The composite loss function contains six components: L1 loss, L2 loss, perceptual loss, SSIM loss, contrastive loss, and edge-aware loss. The total optimization loss $\mathcal{L}_{\text{total}}$ is denoted as:

$$\mathcal{L}_{\text{total}}=\lambda_1\mathcal{L}_{L1}+\lambda_2\mathcal{L}_{L2}+\lambda_3\mathcal{L}_{\text{perc}}+\lambda_4\mathcal{L}_{\text{ssim}}+\lambda_5\mathcal{L}_{\text{ucr}}+\lambda_6\mathcal{L}_{\text{edge}}\tag{8}$$

First, the L1 loss (mean absolute error) encourages the model to generate outputs that are highly consistent with the real image by calculating the pixel-level absolute difference between the predicted image and the real image. To balance its contribution, the weight λ1 is linearly decayed from 0.7 to 0.4 during training to gradually reduce the reliance on pixel-level accurate matching. The L2 loss (mean squared error) further enhances the pixel-level accuracy of the model by measuring the pixel-level squared difference between the predicted image and the real image, which is particularly sensitive to large pixel differences. Its weight λ2 is fixed at 0.6 to strike a balance between smoothness and accuracy.

To further enhance the subjective visual quality of the images, a perceptual loss is introduced to compare the difference between the predicted image and the real image in a high-level feature space. The relu1_2, relu2_2, and relu3_3 layers of VGG16 are selected and assigned weights of 1.0, 0.8, and 0.5, respectively, to emphasize the contribution of the low-level features to the details. The perceptual loss is defined as:

$$\mathcal{L}_{\text{perc}}=\sum_i w_i\,L_1\big(V_i(\hat{I}),V_i(I)\big)\tag{9}$$

where $V_i(\cdot)$ denotes the feature map of layer $i$ in VGG16 (the selected layers relu1_2, relu2_2, and relu3_3), and $w_i$ is a hierarchical weighting factor (1.0, 0.8, and 0.5, respectively) that balances the consistency of low-level texture with high-level semantics. The weight $\lambda_3$ is linearly increased from 0.3 to 0.8; these weights were obtained by experimental tuning. Since the perceptual loss emphasizes high-level semantics and texture, early training prioritizes pixel-level fidelity, and the weight is increased later to enhance visual quality and detail recovery, adapting to the complex degradation of underwater images.
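
A sketch of this VGG16-based perceptual loss is given below; the indices 3, 8, and 15 correspond to relu1_2, relu2_2, and relu3_3 in torchvision's VGG16 feature stack, and the layer weights follow the values stated above:

```python
import torch.nn as nn
import torchvision

class PerceptualLossSketch(nn.Module):
    """Weighted L1 distance between VGG16 features of prediction and target (Eq. 9)."""
    def __init__(self, layer_ids=(3, 8, 15), layer_weights=(1.0, 0.8, 0.5)):
        super().__init__()
        vgg = torchvision.models.vgg16(
            weights=torchvision.models.VGG16_Weights.DEFAULT).features
        self.slices = nn.ModuleList()
        prev = 0
        for idx in layer_ids:                        # split VGG into sequential slices
            self.slices.append(nn.Sequential(*list(vgg.children())[prev:idx + 1]))
            prev = idx + 1
        for p in self.parameters():
            p.requires_grad_(False)                  # frozen feature extractor
        self.layer_weights = layer_weights
        self.l1 = nn.L1Loss()

    def forward(self, pred, target):
        loss, x, y = 0.0, pred, target
        for w, block in zip(self.layer_weights, self.slices):
            x, y = block(x), block(y)
            loss = loss + w * self.l1(x, y)
        return loss
```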

We also introduce the SSIM loss, based on the structural similarity index, to optimally predict the perceived quality of an image by integrating the brightness, contrast, and structural information of the image. A 5×5 sliding window is used to calculate the local similarity.

In addition, to cope with the noise and unstructured information common in underwater images, a contrastive loss is designed to strengthen the model's ability to distinguish image content through the principle of contrastive learning. The loss takes the enhanced image as the anchor, the real image as the positive sample, and a mosaic-transformed image as the negative sample. For each feature layer, the L1 distance $d_{ap}$ between the anchor and the positive sample and the L1 distance $d_{an}$ between the anchor and the negative sample are calculated. The loss is defined as follows:

$$\mathcal{L}_{\text{ucr}}=\sum_i w_i\,\frac{d_{ap}}{d_{an}+\epsilon}\tag{10}$$

where $\epsilon$ is a small constant used to prevent the denominator from being zero, and $w_i$ is a hierarchical weighting factor (increasing from 1/32 to 1.0). This loss strengthens the network's ability to discriminate the semantic and detailed structure of the image by pulling the enhanced image closer to the positive samples while pushing it away from the negative samples. The edge-aware loss aims to enhance the model's ability to represent image edges and structural details by combining a pixel-level L1 loss and an edge loss based on the Sobel operator. It is defined as:

$$\mathcal{L}_{\text{edge}}=L_1(\hat{y},y)+\lambda_e\,\text{MSE}\big(\text{Sobel}(\hat{y}),\text{Sobel}(y)\big)\tag{11}$$

In this equation, the Sobel operator extracts the edge features of the predicted image $\hat{y}$ and the real image $y$ in four directions, and MSE denotes the mean squared error used to compare the differences in the edge features. This loss ensures that the model produces images with accurate structural details by balancing pixel-level similarity and edge clarity. In summary, the composite loss function constrains and optimizes the underwater image enhancement process at the pixel, structural, semantic, and contrastive feature levels. The synergy of the loss terms makes the model not only superior in objective metrics but also consistent and realistic in subjective visual perception. The experimental results show that this loss design significantly improves the edge clarity, color reproduction, and overall visual quality of the enhanced image.
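
As an illustration, the edge-aware term of Eq. (11) can be written as follows (a sketch using only the horizontal and vertical Sobel kernels; the two diagonal kernels of Fig. 4 would be added analogously, and the default value of λ_e is an assumption since the text does not state it):

```python
import torch
import torch.nn.functional as F

def edge_aware_loss(pred, target, lambda_e=1.0):
    """Eq. (11): L1(pred, target) + lambda_e * MSE(Sobel(pred), Sobel(target))."""
    def sobel(x):
        kx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]],
                          device=x.device).view(1, 1, 3, 3)
        ky = kx.transpose(2, 3)
        c = x.shape[1]
        kx, ky = kx.repeat(c, 1, 1, 1), ky.repeat(c, 1, 1, 1)
        gx = F.conv2d(x, kx, padding=1, groups=c)     # per-channel horizontal gradients
        gy = F.conv2d(x, ky, padding=1, groups=c)     # per-channel vertical gradients
        return torch.cat([gx, gy], dim=1)

    return F.l1_loss(pred, target) + lambda_e * F.mse_loss(sobel(pred), sobel(target))
```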

4  Experiment and Analysis

4.1 Experimental Settings

Implementation details. We implemented the proposed FENet model based on the PyTorch framework and utilized the PyTorch Lightning library to streamline the training process and ensure the reproducibility of the experiments. Training was performed on NVIDIA RTX A6000 GPUs. The model was trained for a total of 400 epochs with a batch size of 16. In order to enhance data diversity and model generalization, we randomly cropped the input images into patches of 256×256 pixels. This crop size strikes a balance between computational efficiency and retaining sufficient contextual information for effective feature extraction. The optimization process uses the AdamW optimizer with the initial learning rate set to $1\times10^{-4}$. To improve convergence speed and model performance, we introduce the CyclicLR learning rate scheduling strategy so that the learning rate varies dynamically within the interval $[\text{init}_{lr}, 1.5\times\text{init}_{lr}]$ in each cycle. This strategy facilitates exploration of the loss landscape early in training and finer convergence in later stages. The weight decay parameter is set to $5\times10^{-4}$ to regularize the model and mitigate the risk of overfitting.
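
The optimizer and learning-rate schedule described above can be configured roughly as follows; only AdamW, the 1e-4 base learning rate, the [init_lr, 1.5 × init_lr] cycle, and the 5e-4 weight decay come from the text, while the step size and cycle mode are assumptions:

```python
import torch
import torch.nn as nn

model = nn.Conv2d(3, 3, 3, padding=1)   # stand-in for the FENet model
init_lr = 1e-4

optimizer = torch.optim.AdamW(model.parameters(), lr=init_lr, weight_decay=5e-4)
scheduler = torch.optim.lr_scheduler.CyclicLR(
    optimizer,
    base_lr=init_lr,
    max_lr=1.5 * init_lr,      # learning rate cycles in [init_lr, 1.5 * init_lr]
    step_size_up=2000,         # assumed half-cycle length (iterations); not given in the text
    mode='triangular',
    cycle_momentum=False,      # required for Adam-family optimizers
)

# per training iteration:
#   loss.backward(); optimizer.step(); scheduler.step(); optimizer.zero_grad()
```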

Underwater image datasets. In this study, a systematic evaluation is performed on several mainstream underwater image enhancement datasets to validate the performance of the proposed method. The LSUI dataset [38], which contains real degraded images and their corresponding reference images, is used in the training phase. Specifically, the training data consist of 3897 images from the LSUI dataset, of which 3497 are used for training and the remaining 400 for validation. In the testing phase, we constructed the Test-L382 test set by selecting 382 images from LSUI. To further verify the generalization ability of the method on real underwater images, we conduct further experiments on the EUVP [39] and UFO [40] datasets, using the EUVP test data to construct Test-E515 and the UFO test data to construct Test-U120 as two test subsets. These datasets provide diverse underwater scenarios, which help to evaluate the robustness of the model in real-world environments.

Comparison with state-of-the-art (SOTA) methods. We compared the proposed method with eight current state-of-the-art underwater image enhancement methods: PUIE-Net [41], TCTL [42], U-shape [38], TUDA [43], Semi-UIR [39], TDM [44], SMDR-IS [45], and HCLR-Net [46]. We chose baselines that have performed well for underwater image enhancement in recent years and have publicly available code and pre-trained models. These approaches cover a wide variety of deep learning techniques, including CNN-based methods, Transformer-based methods, contrastive learning frameworks, domain adaptation techniques, multiscale supervised methods, and uncertainty-driven methods. This diversity ensures that our method is thoroughly evaluated against different techniques, and the availability of public code and pre-trained models ensures the reproducibility and fairness of the comparison.

To ensure the comprehensiveness of the evaluation, the experiments use two types of evaluation metrics: full-reference metrics and no-reference metrics. On tests with reference images, we use the Peak Signal-to-Noise Ratio (PSNR) and the Structural Similarity Index (SSIM) as full-reference metrics to measure the content similarity between the generated image and the reference image; higher PSNR and SSIM values indicate that the generated image is closer to the reference. The no-reference metrics are the Underwater Color Image Quality Evaluation (UCIQE [45]) and the Underwater Image Quality Measurement (UIQM [46]): UCIQE evaluates the color cast, saturation, and contrast of the image, while UIQM integrates color, sharpness, and contrast in line with the characteristics of the human visual system. These two metrics are designed specifically for underwater imaging environments and do not require a reference image; they are computed directly from the image's own characteristics, with particular emphasis on color saturation, brightness, and contrast.
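
For reference, the two full-reference metrics can be computed per image with standard scikit-image routines (a sketch; data_range=1.0 assumes images normalized to [0, 1]):

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def full_reference_metrics(enhanced, reference):
    """enhanced, reference: float arrays of shape (H, W, 3) in [0, 1]."""
    psnr = peak_signal_noise_ratio(reference, enhanced, data_range=1.0)
    ssim = structural_similarity(reference, enhanced, data_range=1.0,
                                 channel_axis=2)   # color image: channels last
    return psnr, ssim

# example with random stand-in images
ref = np.random.rand(256, 256, 3)
enh = np.clip(ref + 0.05 * np.random.randn(256, 256, 3), 0, 1)
print(full_reference_metrics(enh, ref))
```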

4.2 Qualitative Evaluation with Naturalness

In order to evaluate the enhancement effect of the proposed method on visual perception, we conducted qualitative comparisons with current mainstream underwater image enhancement methods on three typical underwater image datasets (Test-L382, Test-E515, and Test-U120). Figs. 5–7 show the enhancement results of different methods on these three datasets. As can be observed from the figures, although most of the methods alleviate the color cast and low contrast of the original images to some extent, differences remain in detail retention, color naturalness, and overall sharpness. Compared to other methods, our method performs more consistently in terms of color reproduction, luminance balance, and structure preservation, and the enhanced image is closer to the real visual experience. For example, in the image shown in Fig. 7, traditional methods suffer from color distortion or over-enhancement, while our method more accurately restores the true color of the object while maintaining a natural appearance. In the scene in Fig. 6, our method significantly enhances the local contrast and texture details of the image, making the target area clearer. Overall, from the perspective of visual perception, the proposed method produces more natural, clear, and realistic enhancement results across samples, verifying its robustness and generalization ability on different types of underwater images.


Figure 5: Qualitative results of test images from the full reference underwater benchmark Test-L382. From left to right: input image, HCLR-Net [46], PUIE-MC [41], PUIE-MP [41], Semi-UIR [39], TCTL [42], TDM [44], TUDA [43], U-Shape [38], SMDR-IS [45], our FENet, and clear reference


Figure 6: Qualitative results of test images from the full reference underwater benchmark Test-E515. From left to right: input image, HCLR-Net [46], PUIE-MC [41], PUIE-MP [41], Semi-UIR [39], TCTL [42], TDM [44], TUDA [43], U-Shape [38], SMDR-IS [45], our FENet, and clear reference


Figure 7: Qualitative results of test images from the full reference underwater benchmark Test-U120. From left to right: input image, HCLR-Net [46], PUIE-MC [41], PUIE-MP [41], Semi-UIR [39], TCTL [42], TDM [44], TUDA [43], U-Shape [38], SMDR-IS [45], our FENet, and clear reference

4.3 Quantitative Evaluation with Metrics

The quantitative evaluation results on Test-L382, Test-E515, and Test-U120 are shown in Tables 1–3. Since subjective evaluation involves some individual variability, we further adopt two commonly used full-reference image quality metrics—the structural similarity index (SSIM) and the peak signal-to-noise ratio (PSNR)—to quantitatively evaluate the enhancement results of each method. As shown in Tables 1–3, our method obtains the best SSIM and PSNR scores on both the Test-L382 and Test-E515 datasets, showing excellent structural restoration and signal-to-noise performance; the TDM method achieves the second-best performance on these two datasets. On the Test-U120 dataset, our method achieves the highest PSNR, followed closely by the TDM method; on SSIM, U-Shape achieves the best result, followed by the TDM method. Combining the subjective and objective evaluation results on multiple reference datasets, our proposed FENet outperforms existing state-of-the-art underwater image enhancement methods in most scenarios, which verifies its image restoration capability and generalization performance in complex underwater environments.


4.4 Analysis of Confidence Intervals for Evaluation Indicators

In order to verify the stability of the experimental results, we analyzed the confidence intervals of the evaluation metrics on different datasets. The confidence intervals of PSNR, SSIM, UIQM, and UCIQE were calculated at the 95% confidence level, and the results are shown in Table 4. As can be seen from the table, the confidence intervals of the metrics on the different datasets are narrow, indicating that the proposed method has good stability and consistency.
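
Such intervals can be computed from the per-image metric values in the usual way, for example with a t-based interval (the paper does not state whether a normal or t-based interval was used, so this choice is an assumption):

```python
import numpy as np
from scipy import stats

def confidence_interval(values, confidence=0.95):
    """Mean and half-width of a t-based confidence interval for per-image
    metric values (e.g., PSNR or SSIM over a test set)."""
    values = np.asarray(values, dtype=np.float64)
    mean = values.mean()
    sem = stats.sem(values)                          # standard error of the mean
    half = sem * stats.t.ppf((1 + confidence) / 2, df=len(values) - 1)
    return mean, half

# example: psnr_scores holds per-image PSNR values on a test set
psnr_scores = np.random.normal(25.0, 2.0, size=382)
m, h = confidence_interval(psnr_scores)
print(f"PSNR = {m:.2f} ± {h:.2f}")
```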


4.5 Computational Complexity and Experimental Configuration Selection

In this study, we evaluate the GFLOPs and the number of parameters of the proposed model and compare it with eight mainstream deep learning methods on 256 × 256 resolution images. As shown in Table 5, our method has higher GFLOPs and a larger number of parameters. This is mainly because the model integrates multiple modules and attention mechanisms; although our model is relatively large, it has clear advantages over the other models in terms of performance. In the future, we plan to reduce the computational complexity while maintaining performance through strategies such as model pruning, quantization, reducing the number of residual blocks, and adopting a lightweight attention mechanism, in order to enhance the practicality of the model in resource-constrained environments. In order to verify the effect of the number of DRABs in the MRFA on underwater image enhancement performance, we performed ablation experiments, testing configurations with 3, 5, and 7 DRABs. All experiments keep the other modules unchanged and only change the number of DRABs in the MRFA. As shown in Table 6, five DRABs is the optimal configuration, striking a balance between detail preservation and computational efficiency. With five DRABs, the CBAM attention mechanism and local residual connections together ensure the extraction of high-frequency details and the stability of the global structure, which is better suited to underwater image enhancement. Meanwhile, in Section 3.4 we mentioned the use of the nonlinear activation function PReLU; we also compared it experimentally with the traditional ReLU, and the results in Table 7 show that PReLU is more effective.


4.6 Loss Function Ablation Experiments

In order to evaluate the contribution of each component in the composite loss function, we performed ablation experiments on the Test-L382 dataset, removing each component one by one and measuring its impact on PSNR and SSIM metrics. See Table 8. The experiments aim to validate the necessity of each component and quantify its contribution to the underwater image enhancement performance. The results show that each component contributes significantly to the performance, validating the necessity and effectiveness of our composite loss function design.


4.7 Ablation Study

In order to further validate the contribution of each key module in the proposed method to the overall performance, we conducted systematic ablation experiments on the LSUI dataset, specifically comparing the performance changes of the different modules in the case of removal or replacement. Table 9 and Fig. 8 present quantitative and qualitative evaluation results based on metrics such as SSIM and PSNR under different settings.


Figure 8: Contributions of components in ablation studies. PSNR scores are shown in the upper left corner. The complete method produces vivid colors and sharp details

Our base model is a simplified neural network, i.e., the feature extraction path in Fig. 1, which achieves pixel-level mapping through basic convolutions only. As can be seen from the table, adding MRFA or MSFDE alone already improves the results, and the two modules together yield a more significant improvement, suggesting that they play an important role in maintaining image structure and detail. However, we also found that using either of these two modules individually leads to a slightly lower SSIM; we hope to explore the exact cause in subsequent studies so that accuracy can be optimized while structure is preserved. The introduction of the EDEN module further improves the results, with noticeable gains in both PSNR and SSIM. In addition, the overall performance of the model is highest when all modules work together, further validating the synergy between the modules.

The experimental results show that the algorithm in this paper effectively enhances the details and structural features of underwater images, thus achieving excellent performance in underwater image enhancement, as shown in Table 9. Our experimental results fully demonstrate that our proposed sub-modules are well-designed and can effectively improve the overall enhancement quality of underwater images.

4.8 Robustness and Generalization Verification

To rigorously evaluate the adaptability and generalization ability of the proposed FENet model under different image degradation types, we conducted cross-task image enhancement experiments. The model was trained only on the LSUI underwater image dataset without exposure to other degradation types (e.g., low-light or haze scenes), thus ensuring robust testing of unseen degradation domains.

First, we applied the pre-trained FENet model directly to the low-light image enhancement task using the LOL dataset [47]. Fig. 9 shows the qualitative results for typical low-light images. Despite the differences in imaging mechanisms between underwater and low-light images, FENet still performs reasonably well in enhancing brightness, contrast, and details. To quantify the performance, we calculated the peak signal-to-noise ratio (PSNR) and structural similarity index (SSIM) on the LOL dataset and compared them with several methods. As shown in Table 10, the PSNR and SSIM of FENet are lower than those of the specifically optimized methods. This indicates the limited generalization ability of the model, possibly because the training data are concentrated on underwater images.


Figure 9: Low-light image enhancement results on the LOL dataset [47]. The top row shows degraded low-light images from three scenes, and the bottom row shows the corresponding enhancement results produced by our FENet


Second, we evaluated the robustness of FENet in the image dehazing task using the RESIDE dataset [48]. Haze degradation differs from underwater degradation, mainly involving contrast reduction, detail blurring, and global low-frequency interference. Fig. 10 shows the qualitative results: FENet can enhance contrast, mitigate haze interference, and restore image clarity and natural colors to a certain extent without obvious artifacts or over-enhancement. As shown in Table 10, compared to the baseline methods, FENet achieves a PSNR of 21.69 and an SSIM of 0.85, indicating that its generalization ability is limited by the absence of haze data during training.


Figure 10: Dehazing results on hazy images from the RESIDE dataset [48]. The top row shows four images with different haze densities; the bottom row shows the corresponding dehazed results produced by the pre-trained FENet

In all of the above tasks, FENet used fixed parameters trained on the LSUI dataset without task-specific fine-tuning. This strict generalization setup highlights the model’s ability to adapt to the unseen domain while also revealing the challenges posed by domain differences.

In summary, cross-task experiments show that FENet has some generalization ability in low-light enhancement and image defogging tasks. However, its performance is limited compared to task-specific methods, reflecting the challenges across different degradation types. The limitation may stem from the fact that the training data focuses on underwater images, which differ significantly from the degradation mechanisms of low-light and haze scenes. Future work will also explore the introduction of multi-domain training data and adaptive frequency domain strategies to improve generalization performance.

5  Conclusion

In this paper, we propose a novel multi-module fusion framework, FENet, for underwater image enhancement. The network integrates several effective components, including frequency-domain enhancement, attention guidance, residual learning, and edge structure preservation, aiming to address the multi-source composite degradation common in underwater images, namely color cast, low contrast, detail loss, and edge blurring. Specifically, the main body of FENet adopts a symmetric multi-scale structure, introduces a CBAM-based channel-spatial attention mechanism to strengthen the response of key regions, integrates a frequency-domain enhancement module to reinforce high-frequency texture details, and designs residual paths together with the EDEN edge-guidance module to recover image structure and detail information. Finally, a lightweight post-processing module further refines the visual quality and outputs more natural and realistic enhancement results.
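As a reference for the attention component mentioned above, a compact sketch of the standard CBAM channel-spatial attention is given below; the reduction ratio and spatial kernel size are the commonly used defaults and may differ from the configuration adopted inside FENet.

```python
import torch
import torch.nn as nn

class CBAM(nn.Module):
    """Standard CBAM-style channel-spatial attention (Woo et al., 2018),
    shown for reference only; FENet's internal configuration may differ."""
    def __init__(self, channels, reduction=16, spatial_kernel=7):
        super().__init__()
        # Channel attention: shared MLP applied to average- and max-pooled descriptors.
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1, bias=False),
        )
        # Spatial attention: convolution over channel-wise average and max maps.
        self.spatial = nn.Conv2d(2, 1, spatial_kernel, padding=spatial_kernel // 2, bias=False)

    def forward(self, x):
        avg = self.mlp(torch.mean(x, dim=(2, 3), keepdim=True))
        mx = self.mlp(torch.amax(x, dim=(2, 3), keepdim=True))
        x = x * torch.sigmoid(avg + mx)                        # channel re-weighting
        avg_map = torch.mean(x, dim=1, keepdim=True)
        max_map, _ = torch.max(x, dim=1, keepdim=True)
        attn = torch.sigmoid(self.spatial(torch.cat([avg_map, max_map], dim=1)))
        return x * attn                                        # spatial re-weighting
```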

Comprehensive experimental validation on several standard full-reference underwater datasets (Test-L382, Test-E515, and Test-U120) shows that the proposed FENet outperforms current state-of-the-art methods in mainstream image quality metrics such as SSIM and PSNR, and also exhibits better color reproduction and structure preservation in terms of visual perception quality. In addition, we conducted cross-task generalization experiments, applying the model directly to low-light image enhancement and image dehazing tasks that were not involved in training, and still obtained reasonable enhancement results, demonstrating the robustness and adaptability of the proposed method across different image degradation scenarios.

In conclusion, FENet provides an effective and generalizable image enhancement paradigm that performs well on typical underwater degradation types and shows reasonable generalization under unseen degradation conditions, giving it strong potential for practical applications. Although the proposed model performs well on underwater image enhancement, the ablation experiments revealed a slight decrease in SSIM when the MRFA and MSFDE modules are used alone. This may be because the deep stacking of residual blocks over-smooths features, the CBAM attention mechanism over-emphasizes high-frequency features and thereby weakens structural similarity, and excessive low-frequency suppression in the frequency-domain enhancement disturbs the global structural information of the image. In the complete model, the modules work synergistically and largely alleviate this problem; in future work, we will further investigate the cause of this phenomenon and optimize the modules to improve their standalone performance and structure preservation.

Currently, our model is based on a fully supervised strategy that relies on high-quality paired samples for training. However, in real-world environments, high-quality reference images are often difficult to obtain, which limits the scalability of fully supervised approaches. Therefore, future work will focus on exploring weakly supervised or semi-supervised learning strategies to reduce the dependence on labeled data and further improve the model’s adaptability and scalability in real complex scenarios. We hope to enhance the model’s adaptive modeling ability for different image degradation types in our next work so as to push the underwater image enhancement methods towards being more efficient, intelligent, and general.

Acknowledgement: We would like to thank ChatGPT, DEEPL, and Grammarly for their assistance in refining the language of this paper.

Funding Statement: This work is supported in part by the National Natural Science Foundation of China [Grant number 62471075], the Major Science and Technology Project Grant of the Chongqing Municipal Education Commission [Grant number KJZD-M202301901].

Author Contributions: The authors confirm contribution to the paper as follows: Xinwei Zhu: methodology design, network module design, coding, and manuscript writing. Jianxun Zhang: supervision of the work and theoretical analysis of the modules. Huan Zeng: dataset analysis, inference result uploading, and collection of recent papers. All authors reviewed the results and approved the final version of the manuscript.

Availability of Data and Materials: The datasets used in this study are publicly available datasets. The methods of this paper are available upon reasonable request from the corresponding author.

Ethics Approval: This study does not involve any experiments on humans or animals and uses only publicly available underwater image datasets.

Conflicts of Interest: The authors declare no conflicts of interest to report regarding the present study.

References

1. Cicek K, Demirci SME, Sengul D. A hybrid failure analysis model design for marine engineering systems: a case study on alternative propulsion system. Eng Fail Anal. 2025;167:108929. doi:10.1016/j.engfailanal.2024.108929. [Google Scholar] [CrossRef]

2. Zhou J, Si Y, Chen Y. A review of subsea AUV technology. J Mar Sci Eng. 2023;11(6):1119. [Google Scholar]

3. Cai L, McGuire NE, Hanlon R, Mooney TA, Girdhar Y. Semi-supervised visual tracking of marine animals using autonomous underwater vehicles. Int J Comput Vis. 2023;131(6):1406–27. doi:10.1007/s11263-023-01762-5. [Google Scholar] [CrossRef]

4. Zhou J, Liu Q, Jiang Q, Ren W, Lam KM, Zhang W. Underwater camera: improving visual perception via adaptive dark pixel prior and color correction. Int J Comput Vis. 2023;27(1):379. doi:10.1007/s11263-023-01853-3. [Google Scholar] [CrossRef]

5. Zhou J, Sun J, Zhang W, Lin Z. Multi-view underwater image enhancement method via embedded fusion mechanism. Eng Appl Artif Intell. 2023;121:105946. doi:10.1016/j.engappai.2023.105946. [Google Scholar] [CrossRef]

6. Lei F, Tang F, Li S. Underwater target detection algorithm based on improved YOLOv5. J Mar Sci Eng. 2022;10(3):310. doi:10.3390/jmse10030310. [Google Scholar] [CrossRef]

7. Zhang P, Yan T, Liu Y, Lu H. Fantastic animals and where to find them: segment any marine animal with dual sam. In: Proceedings of the 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition; 2024 Jun 16–22; Seattle, WA, USA. p. 2578–87. [Google Scholar]

8. Brateanu A, Balmez R, Avram A, Orhei C, Ancuti C. LYT-NET: lightweight YUV transformer-based network for low-light image enhancement. IEEE Sig Process Lett. 2025;32:2065–9. doi:10.1109/lsp.2025.3563125. [Google Scholar] [CrossRef]

9. Yu H, Li CY, Liu ZH, Zhou SP, Guo YR. Remote sensing image dehazing algorithm based on adaptive SLIC. Nat Remote Sens Bullet. 2024;28(12):3158–72. doi:10.11834/jrs.20242532. [Google Scholar] [CrossRef]

10. Jaffe JS. Computer modeling and the design of optimal underwater imaging systems. IEEE J Oceanic Eng. 1990;15(2):101–11. doi:10.1109/48.50695. [Google Scholar] [CrossRef]

11. Yuan Q, Dai S. Adaptive histogram equalization with visual perception consistency. Inf Sci. 2024;668:120525. doi:10.1016/j.ins.2024.120525. [Google Scholar] [CrossRef]

12. Jiang K, Wang Q, An Z, Wang Z, Zhang C, Lin CW. Mutual retinex: combining transformer and CNN for image enhancement. IEEE Transact Emerg Top Computat Intell. 2024;8(3):2240–52. doi:10.1109/tetci.2024.3369321. [Google Scholar] [CrossRef]

13. Bai J, Yin Y, He Q, Li Y, Zhang X. Retinexmamba: retinex-based mamba for low-light image enhancement. arXiv:2405.03349. 2024. [Google Scholar]

14. Cong R, Yang W, Zhang W, Li C, Guo CL, Huang Q, et al. Pugan: physical model-guided underwater image enhancement using gan with dual-discriminators. IEEE Transact Image Process. 2023;32:4472–85. doi:10.1109/tip.2023.3286263. [Google Scholar] [PubMed] [CrossRef]

15. Weng SE, Miaou SG, Christanto R. A lightweight low-light image enhancement network via channel prior and gamma correction. arXiv:2402.18147. 2024. [Google Scholar]

16. Tan DG, Sheen NN, Suhartono D, Lucky H. Enhancement of underwater images using DCP and CycleGAN. In: 2024 International Conference on Information Technology Research and Innovation (ICITRI); 2024 Sep 5–6; Jakarta, Indonesia. p. 359–64. [Google Scholar]

17. Zhang T, Su H, Fan B, Yang N, Zhong S, Yin J. Underwater image enhancement based on red channel correction and improved multiscale fusion. IEEE Trans Geosci Remote Sens. 2024;62:4205120. doi:10.1109/tgrs.2024.3388157. [Google Scholar] [CrossRef]

18. Chen Y, Liang Y. Underwater images enhancement using contrast limited adaptive parameter settings histogram equalization. Multimed Tools Appl. 2025;84(23):26703–17. doi:10.1007/s11042-024-20210-1. [Google Scholar] [CrossRef]

19. Zhang Y, Chandler DM, Leszczuk M. Retinex-based underwater image enhancement via adaptive color correction and hierarchical U-shape transformer. Optics Express. 2024;32(14):24018–40. doi:10.1364/oe.523951. [Google Scholar] [PubMed] [CrossRef]

20. Xu H, Mu P, Liu Z, Cheng S. Underwater image enhancement via color conversion and white balance-based fusion. Visual Comput. 2024;40(10):7185–200. doi:10.1007/s00371-024-03421-3. [Google Scholar] [CrossRef]

21. Wang Y, Zhang J, Cao Y, Wang Z. A deep CNN method for underwater image enhancement. In: 2017 IEEE International Conference on Image Processing (ICIP); 2017 Sep 17–20; Beijing, China. p. 1382–6. [Google Scholar]

22. Li C, Anwar S, Porikli F. Underwater scene prior inspired deep underwater image and video enhancement. Pattern Recognit. 2020;98(1):107038. doi:10.1016/j.patcog.2019.107038. [Google Scholar] [CrossRef]

23. Islam MJ, Xia Y, Sattar J. Fast underwater image enhancement for improved visual perception. IEEE Robot Automat Lett. 2020;5(2):3227–34. doi:10.1109/lra.2020.2974710. [Google Scholar] [CrossRef]

24. Bhat A, Narang Y, Goyal Y. Underwater image enhancement with feature preservation using generative adversarial networks (UIEFP GAN). In: 2022 6th International Conference on Intelligent Computing and Control Systems (ICICCS); 2022 May 25–27; Madurai, India: IEEE. p. 1023–9. [Google Scholar]

25. Wang Z, Cun X, Bao J, Zhou W, Liu J, Li H. Uformer: a general u-shaped transformer for image restoration. In: Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition; 2022 Jun 18–24; New Orleans, LA, USA. p. 17683–93. [Google Scholar]

26. Shen Z, Xu H, Luo T, Song Y, He Z. UDAformer: underwater image enhancement based on dual attention transformer. Comput Graph. 2023;111:77–88. [Google Scholar]

27. Wang R, Zhang Y, Zhang J. An efficient swin transformer-based method for underwater image enhancement. Multimed Tools Appl. 2023;82(12):18691–708. doi:10.1007/s11042-022-14228-6. [Google Scholar] [CrossRef]

28. Li C, Liu X, Li W, Wang C, Liu H, Liu Y, et al. U-kan makes strong backbone for medical image segmentation and generation. In: Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 39. New York, NY, USA: ACM; 2025. p. 4652–60. [Google Scholar]

29. Wang C, Liu X, Li C, Liu Y, Yuan Y. PV-SSM: exploring pure visual state space model for high-dimensional medical data analysis. In: 2024 IEEE International Conference on Bioinformatics and Biomedicine (BIBM); 2024 Dec 3–6; Lisbon, Portugal. p. 2542–9. [Google Scholar]

30. Deluxni N, Sudhakaran P, Ndiaye MF. A review on image enhancement and restoration techniques for underwater optical imaging applications. IEEE Access. 2023;11:111715–37. doi:10.1109/access.2023.3322153. [Google Scholar] [CrossRef]

31. Kou K, Gao X, Zhang G, Xiong Y, Nie F, Bai H, et al. Efficient blind image deblurring network based on frequency decomposition. IEEE Sens J. 2024;24(14):23212–23. doi:10.1109/jsen.2024.3404964. [Google Scholar] [CrossRef]

32. Xiao Y, Yuan Q, Jiang K, Chen Y, Zhang Q, Lin CW. Frequency-assisted mamba for remote sensing image super-resolution. IEEE Trans Multimedia. 2025;27:1783–96. doi:10.1109/tmm.2024.3521798. [Google Scholar] [CrossRef]

33. Jiang K, Jiang J, Liu X, Xu X, Ma X. FMRNet: image deraining via frequency mutual revision. In: Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 38. New York, NY, USA: ACM; 2024. p. 12892–900. [Google Scholar]

34. Liu C, Jia S, Wu H, Zeng D, Cheng F, Zhang S. A spatial-frequency domain associated image-optimization method for illumination-robust image matching. Sensors. 2020;20(22):6489. doi:10.3390/s20226489. [Google Scholar] [PubMed] [CrossRef]

35. Huo C, Zhang D, Yang H. An underwater image denoising method based on high-frequency abrupt signal separation and hybrid attention mechanism. Sensors. 2024;24(14):4578. doi:10.3390/s24144578. [Google Scholar] [PubMed] [CrossRef]

36. Song W, Liu Y, Huang D, Zhang B, Shen Z, Xu H. From shallow sea to deep sea: research progress in underwater image restoration. Front Mar Sci. 2023;10:1163831. doi:10.3389/fmars.2023.1163831. [Google Scholar] [CrossRef]

37. Lv K, Wang W, Zhou Z, Wang X. An improved watershed algorithm on multi-directional edge detection for road extraction in remote images. Int J Innov Comput Inf Control. 2022;18:851–66. [Google Scholar]

38. Peng L, Zhu C, Bian L. U-shape transformer for underwater image enhancement. IEEE Transact Image Process. 2023;32:3066–79. doi:10.1109/tip.2023.3276332. [Google Scholar] [PubMed] [CrossRef]

39. Huang S, Wang K, Liu H, Chen J, Li Y. Contrastive semi-supervised learning for underwater image restoration via reliable bank. In: Proceedings of the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition; 2023 Jun 17–24; Vancouver, BC, Canada. p. 18145–55. [Google Scholar]

40. Islam MJ, Luo P, Sattar J. Simultaneous enhancement and super-resolution of underwater imagery for improved visual perception. arXiv:2002.01155. 2020. [Google Scholar]

41. Fu Z, Wang W, Huang Y, Ding X, Ma KK. Uncertainty inspired underwater image enhancement. In: European Conference on Computer Vision. Cham, Switzerland: Springer; 2022. p. 465–82. [Google Scholar]

42. Li K, Fan H, Qi Q, Yan C, Sun K, Wu QJ. TCTL-Net: template-free color transfer learning for self-attention driven underwater image enhancement. IEEE Transact Circ Syst Video Technol. 2023;34(6):4682–97. doi:10.1109/tcsvt.2023.3328272. [Google Scholar] [CrossRef]

43. Wang Z, Shen L, Xu M, Yu M, Wang K, Lin Y. Domain adaptation for underwater image enhancement. IEEE Transact Image Process. 2023;32:1442–57. doi:10.1109/tip.2023.3244647. [Google Scholar] [PubMed] [CrossRef]

44. Tang Y, Kawasaki H, Iwaguchi T. Underwater image enhancement by transformer-based diffusion model with non-uniform sampling for skip strategy. In: Proceedings of the 31st ACM International Conference on Multimedia; 2023 Oct 29–Nov 3; Ottawa, ON, Canada. p. 5419–27. [Google Scholar]

45. Zhang D, Zhou J, Guo C, Zhang W, Li C. Synergistic multiscale detail refinement via intrinsic supervision for underwater image enhancement. In: Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 38. New York, NY, USA: ACM; 2024. p. 7033–41. [Google Scholar]

46. Zhou J, Sun J, Li C, Jiang Q, Zhou M, Lam KM, et al. HCLR-Net: hybrid contrastive learning regularization with locally randomized perturbation for underwater image enhancement. Int J Comput Vis. 2024;132(10):4132–56. doi:10.1007/s11263-024-02131-6. [Google Scholar] [CrossRef]

47. Wei C, Wang W, Yang W, Liu J. Deep retinex decomposition for low-light enhancement. arXiv:1808.04560. 2018. [Google Scholar]

48. Li B, Ren W, Fu D, Tao D, Feng D, Zeng W, et al. Benchmarking single-image dehazing and beyond. IEEE Transact Image Process. 2018;28(1):492–505. doi:10.1109/tip.2018.2867951. [Google Scholar] [PubMed] [CrossRef]

49. Chen Z, He Z, Lu ZM. DEA-Net: single image dehazing based on detail-enhanced convolution and content-guided attention. IEEE Transact Image Process. 2024;33:1002–15. doi:10.1109/tip.2024.3354108. [Google Scholar] [PubMed] [CrossRef]

50. Zhou S, Li C, Change Loy C. Lednet: joint low-light enhancement and deblurring in the dark. In: European Conference on Computer Vision. Cham, Switzerland: Springer; 2022. p. 573–89. [Google Scholar]

51. Hong M, Liu J, Li C, Qu Y. Uncertainty-driven dehazing network. In: Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 36. New York, NY, USA: ACM; 2022. p. 906–13. [Google Scholar]

52. Feijoo D, Benito JC, Garcia A, Conde MV. Darkir: robust low-light image restoration. In: Proceedings of the 2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition; 2025 Jun 11–15; Nashville, TN, USA. p. 10879–89. [Google Scholar]

53. Guo CL, Yan Q, Anwar S, Cong R, Ren W, Li C. Image dehazing transformer with transmission-aware 3D position embedding. In: Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition; 2022 Jun 18–24; New Orleans, LA, USA. p. 5812–20. [Google Scholar]

54. Yang S, Ding M, Wu Y, Li Z, Zhang J. Implicit neural representation for cooperative low-light image enhancement. In: Proceedings of the 2023 IEEE/CVF International Conference on Computer Vision; 2023 Oct 1–6; Paris, France. p. 12918–27. [Google Scholar]

55. Yan Q, Feng Y, Zhang C, Wang P, Wu P, Dong W, et al. You only need one color space: an efficient network for low-light image enhancement. arXiv:2402.05809. 2024. [Google Scholar]




Copyright © 2026 The Author(s). Published by Tech Science Press. This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.