Open Access
ARTICLE
CLF-YOLOv8: Lightweight Multi-Scale Fusion with Focal Geometric Loss for Real-Time Night Maritime Detection
1 Department of Ship and Port Engineering, Shandong Jiaotong University, Weihai, 264209, China
2 Department of Intelligent Shipping, Weihai Institute of Marine Information Science and Technology, Weihai, 264200, China
3 Department of Naval Architecture and Ocean Engineering, Weihai Ocean Vocational College, Weihai, 264209, China
4 Department of Shipping, Wuhan University of Technology, Wuhan, 430063, China
* Corresponding Author: Xin Liu. Email:
Computers, Materials & Continua 2026, 86(2), 1-23. https://doi.org/10.32604/cmc.2025.071813
Received 12 August 2025; Accepted 06 October 2025; Issue published 09 December 2025
Abstract
To address critical challenges in nighttime ship detection—high small-target missed detection (over 20%), insufficient lightweighting, and limited generalization due to scarce, low-quality datasets—this study proposes a systematic solution. First, a high-quality Night-Ships dataset is constructed via CycleGAN-based day-night transfer, combined with a dual-threshold cleaning strategy (Laplacian variance sharpness filtering and brightness-color deviation screening). Second, a Cross-stage Lightweight Fusion-You Only Look Once version 8 (CLF-YOLOv8) is proposed with key improvements: the Neck network is reconstructed by replacing Cross Stage Partial (CSP) structure with the Cross Stage Partial Multi-Scale Convolutional Block (CSP-MSCB) and integrating Bidirectional Feature Pyramid Network (BiFPN) for weighted multi-scale fusion to enhance small-target detection; a Lightweight Shared Convolutional and Separated Batch Normalization Detection-Head (LSCSBD-Head) with shared convolutions and layer-wise Batch Normalization (BN) reduces parameters to 1.8 M (42% fewer than YOLOv8n); and the Focal Minimum Point Distance Intersection over Union (Focal-MPDIoU) loss combines Minimum Point Distance Intersection over Union (MPDIoU) geometric constraints and Focal weighting to optimize low-overlap targets. Experiments show CLF-YOLOv8 achieves 97.6% mAP@0.5 (0.7% higher than YOLOv8n) with 1.8 M parameters, outperforming mainstream models in small-target detection, overlapping target discrimination, and adaptability to complex lighting.
Safe navigation of ships underpins the global maritime economy, which carries trillions of dollars in annual trade, yet faces severe threats at night. Reduced visibility significantly hinders target recognition, elevating the risk of collisions and groundings. Thus, real-time and accurate nighttime ship detection is critical to intelligent shipping and maritime safety. However, achieving this goal confronts three long-standing, interconnected challenges: (1) Severe scarcity of high-quality nighttime ship data impedes model generalization; (2) Poor detection robustness for small and overlapping targets under low-light conditions; (3) Difficulty in balancing high accuracy and real-time performance during deployment on resource-constrained edge devices. Current methods have notable limitations in overcoming these obstacles.
Nighttime ship detection is a typical object detection task. While deep learning-based methods have been widely applied, they exhibit significant limitations when applied to nighttime scenarios. In terms of algorithm types, two-stage detection algorithms—represented by Fast R-CNN [1] and Faster R-CNN [2]—achieve high accuracy through staged processing of candidate region generation and classification. However, their complex workflows result in slow detection speeds, failing to meet the demands of real-time nighttime monitoring. Single-stage algorithms such as YOLO [3], SSD [4], and RetinaNet [5] improve speed via end-to-end prediction, making them more practical for ship detection; yet, their feature extraction mechanisms show poor adaptability to low-light nighttime environments. For instance, the YOLOv8 detector operates at high speed but struggles with the prevalent feature degradation issues in nighttime conditions. Its feature extraction mechanism (e.g., the CSP structure) is prone to information loss when processing weak features of nighttime ships, leading to a small-target miss rate exceeding 20%. Attention mechanisms [6,7] enhance target responses through weighted features. Subsequent developments [8–11] have further improved accuracy, though often at the cost of increased computational complexity, which limits their practicality for real-time deployment. In strategies involving backbone network replacement, Zheng et al. [12] adopted MobileNetV3-Small as the feature extraction network, reducing model complexity; however, depthwise separable convolutions weaken the ability to capture detailed features of nighttime ships, causing detection accuracy to decrease by 3%–5%. Regarding structural simplification, reducing network depth (e.g., YOLOv2-reduced [13]) increases the false alarm rate by approximately 8% in complex nighttime noisy environments. Critically, mainstream loss functions (e.g., CIoU [14]) perform poorly when handling low-overlap targets common in nighttime scenarios. Their reliance on intersection-over-union (IoU) and center distance lacks fine-grained geometric constraints, leading to a 12% drop in localization accuracy when overlap is below 30% [15]. This pursuit of lightweight design remains an active challenge. Prevailing trends in recent YOLO variants focus on singular strategies like architectural pruning [16] or neural architecture search (NAS) [17] to reduce parameters, often at the cost of robustness in complex environments like low-light scenes. This limitation is further observed in related maritime detection research; while studies have demonstrated the efficacy of YOLOv9 for satellite-based ship detection [18] and other YOLO variants for SAR imagery [19] in open waters, their performance notably degrades in nearshore, cluttered environments—a challenge analogous to nighttime optical detection. In stark contrast to these approaches, our work introduces a synergistic co-design paradigm. We posit that robust efficiency for nighttime maritime detection necessitates concurrent innovations across the feature fusion network, the detection head, and the loss function, specifically tailored to overcome the distinct challenges of low-light optical imagery, thereby ensuring that lightweighting does not compromise performance on critical low-visibility scenarios.
Beyond these algorithmic limitations, the issue of data scarcity is even more acute. Acquiring real-world nighttime ship images is costly and hazardous, resulting in datasets with limited class diversity and severe image degradation—characterized by noise, low contrast, blurriness, and unstable lighting (overexposure/underexposure). Existing data augmentation techniques struggle to address these challenges effectively. General augmentation methods (e.g., geometric transformations [20], AutoAugment [21], MiAMix [22]) primarily optimize model robustness under existing data distributions. They fail to generate new images with authentic nighttime visual characteristics, yielding minimal gains in expanding the class coverage of nighttime ships. More critically, these methods cannot simulate or repair complex nighttime-specific lighting effects and severe image degradation; indiscriminate application (e.g., brightness adjustment or mixing images with varying lighting) may even introduce unnatural artifacts or further reduce image usability. Active learning strategies [23] improve annotation efficiency but require a sufficiently large and diverse initial unlabeled nighttime dataset, limiting their utility when such initial data is extremely scarce and lacks class diversity. Generative Adversarial Networks (GANs) [24], particularly image-to-image translation techniques like CycleGAN, offer a viable pathway to generate new images simulating nighttime scenarios. However, direct application of existing GAN models to generate nighttime ship images still results in low-quality outputs, often accompanied by noticeable blurriness, structural distortions, and physically inconsistent lighting artifacts or color deviations. Existing methods lack effective mechanisms to clean such generated images, making it difficult to ensure the authenticity and usability of the constructed datasets.
To address the aforementioned issues, this study presents a systematic solution as follows:
(1) Construction of the high-quality Night-Ships dataset: We introduce a novel dual-threshold cleaning strategy, which combines Laplacian variance sharpness filtering with brightness-color deviation screening, and applies it to images generated by the CycleGAN-based day-night conversion framework. This strategy can specifically and efficiently eliminate the prevalent blurriness and unrealistic lighting/color distortions in GAN outputs, ensuring the authenticity and quality of the dataset.
(2) Architectural innovation balancing accuracy and efficiency: We performed a fundamental redesign of the YOLOv8 architecture, replacing PANet with BiFPN to enable weighted multi-scale feature fusion, thereby enhancing small target detection performance; introducing a novel CSP-MSCB module that integrates multi-scale convolutions and cross-stage features to overcome the limitations of the original CSP structure in processing weak nighttime features; and proposing a lightweight LSCSBD-Head that incorporates shared convolutions and layer-wise batch normalization, significantly reducing parameters (by 87%) while preserving the discriminative capability of each feature layer.
(3) Optimized learning for low-overlap targets: We designed the Focal-MPDIoU loss, which integrates the minimum point distance (MPD) geometric constraints of MPDIoU with the Focal dynamic weighting mechanism. This effectively alleviates the gradient vanishing problem in non-overlapping targets and strengthens the learning of hard samples (low-IoU samples) common in nighttime scenarios.
This technology can be directly applied to autonomous navigation systems and maritime collision avoidance platforms, providing real-time ship perception under low-light conditions.
This study takes YOLOv8 as the base model and improves it in three aspects—feature fusion, the lightweight detection head, and the loss function—so as to address the weak features of small targets, model redundancy, and regression deviation of low-overlap targets in nighttime ship detection. The following sections elaborate on these methods and their structures.
This study addresses three issues faced in nighttime ship detection and implements three major improvements to the benchmark framework. Firstly, aiming at the problem of weak features of small targets in nighttime ship detection, an innovative multi-scale feature fusion optimization is proposed to enhance the feature expression of small targets in complex backgrounds. Secondly, to tackle the problem of detection model redundancy, the LSCSBD-Head lightweight detection head is proposed, which significantly reduces parameters while maintaining detection accuracy. Finally, regarding the issue of regression deviation for low-overlap targets in nighttime ship detection, the Focal-MPDIoU loss function is proposed to optimize the bounding box regression of low-overlap targets. The network structure diagram of the CLF-YOLOv8 model improved based on YOLOv8 is shown in Fig. 1.

Figure 1: CLF-YOLO architecture
2.2 Multi-Scale Feature Fusion Optimization
We enhance multi-scale feature interaction and reduce computational costs through weighted fusion and bidirectional information flow. The CSP-MSCB module is proposed, which integrates multi-scale convolution and cross-stage feature fusion to strengthen the extraction of small-target features and improve computational efficiency. To address up-sampling redundancy, we introduce the Efficient Up-Convolution Block (EUCB) module, which combines bilinear interpolation with depth-wise separable convolution and reduces computational complexity while optimizing feature expression through feature compression and enhancement operations.
2.2.1 BiFPN Weighted Feature Fusion
As shown in Fig. 2a, the PANet used in YOLOv8 has two weaknesses: nodes with only a single input path cause computational redundancy and feature loss, and equal-weight fusion tends to dilute key features. To this end, this study introduces BiFPN [25] to replace PANet, as shown in Fig. 2b. BiFPN retains effective features through a dynamic weighted fusion mechanism, adopts depth-wise separable convolution to simplify the structure, removes redundant nodes to reduce computational load, and adds skip connections to strengthen the interaction between deep and shallow features, improving small-target detection.

Figure 2: PANet and BiFPN. (a) PANet. (b) BiFPN
The core of BiFPN lies in its weighted fusion of multi-scale feature maps and bidirectional information flow that enables comprehensive cross-level feature interaction. For each output feature layer, the fused feature is computed by fast normalized fusion:

$$O = \sum_{i} \frac{w_i}{\epsilon + \sum_{j} w_j} \cdot I_i$$

Here, $I_i$ denotes the $i$-th input feature map at the fusion node, $w_i \geq 0$ is its learnable fusion weight (kept non-negative via ReLU), and $\epsilon$ is a small constant ensuring numerical stability. Each input feature map is thus scaled by its normalized learned weight before summation, so the network can emphasize informative scales and suppress redundant ones during fusion.
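For illustration, a minimal PyTorch sketch of this fast normalized fusion follows; the module name `WeightedFusion` and the assumption that the inputs have already been resized to a common resolution and channel width are ours, not taken from the paper's released code.

```python
import torch
import torch.nn as nn

class WeightedFusion(nn.Module):
    """Fast normalized fusion of same-shape feature maps (BiFPN-style)."""
    def __init__(self, num_inputs: int, eps: float = 1e-4):
        super().__init__()
        # one learnable, non-negative weight per input branch
        self.weights = nn.Parameter(torch.ones(num_inputs))
        self.eps = eps

    def forward(self, features):
        w = torch.relu(self.weights)          # keep weights non-negative
        w = w / (w.sum() + self.eps)          # normalize: w_i / (eps + sum_j w_j)
        return sum(wi * fi for wi, fi in zip(w, features))

# Example: fuse a lateral feature with an upsampled deeper feature
a = torch.randn(1, 128, 40, 40)
b = torch.randn(1, 128, 40, 40)
fused = WeightedFusion(num_inputs=2)([a, b])   # -> [1, 128, 40, 40]
```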
2.2.2 CSP-MSCB Multi-Scale Convolution Module
Aiming at the problems of the C2F module in the neck network of the original YOLOv8, such as poor handling of small targets and low computational efficiency, this paper proposes the Cross Stage Partial Multi-Scale Convolutional Block (CSP-MSCB) module. This module integrates the cross-stage feature fusion idea of CSPNet with multi-scale convolution technology, aiming to enhance the model’s feature extraction capability for multi-scale targets, especially to improve the detection accuracy of small targets, while optimizing computational efficiency.
The structure of CSP-MSCB is shown in Fig. 3. The input feature map is divided into a main path and a branch path. The branch path, representing the core innovation of this module, handles multi-scale feature extraction. It primarily consists of a Multi-Scale Depth-wise Convolution (MSDC) block, which simultaneously applies depth-wise separable convolution kernels of different sizes (such as 3 × 3, 5 × 5, 7 × 7) for parallel convolution operations. This parallel structure enables the module to simultaneously capture features across different receptive fields, significantly enhancing the ability to model the diversity of target sizes and contextual information. Subsequently, the features processed by the branch path are concatenated (Concat) with the features of the main path, and finally fused and channel-adjusted through a 1 × 1 convolution layer (PWC3) to output the final feature map. This structural design effectively combines cross-stage feature reuse and efficient multi-scale feature extraction. While enhancing the ability to detect small targets, it controls the computational complexity through depth-wise separable convolution and feature segmentation strategies.
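Based on this structural description, the following PyTorch sketch illustrates one plausible realization of CSP-MSCB; the channel split ratio, activation choice, and class names (`MSDC`, `CSPMSCB`) are illustrative assumptions rather than the authors' exact implementation.

```python
import torch
import torch.nn as nn

class MSDC(nn.Module):
    """Multi-Scale Depth-wise Convolution: parallel depth-wise convs with
    different kernel sizes, merged by a point-wise (1x1) projection."""
    def __init__(self, channels, kernel_sizes=(3, 5, 7)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(channels, channels, k, padding=k // 2, groups=channels)
            for k in kernel_sizes
        )
        self.pw = nn.Conv2d(channels * len(kernel_sizes), channels, 1)
        self.act = nn.SiLU()

    def forward(self, x):
        return self.act(self.pw(torch.cat([b(x) for b in self.branches], dim=1)))

class CSPMSCB(nn.Module):
    """Sketch of CSP-MSCB: split channels into a main path (identity) and a
    branch path processed by MSDC, then concatenate and fuse with a 1x1 conv."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        mid = out_ch // 2
        self.adjust = nn.Conv2d(in_ch, out_ch, 1)   # channel adjustment
        self.msdc = MSDC(mid)                        # branch path
        self.fuse = nn.Conv2d(out_ch, out_ch, 1)     # final 1x1 fusion layer

    def forward(self, x):
        x = self.adjust(x)
        main, branch = x.chunk(2, dim=1)             # cross-stage split
        return self.fuse(torch.cat([main, self.msdc(branch)], dim=1))

# Example
y = CSPMSCB(256, 256)(torch.randn(1, 256, 40, 40))  # -> [1, 256, 40, 40]
```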

Figure 3: CSP-MSCB module architecture
2.2.3 EUCB Efficient Up-Sampling Module
To address the computational redundancy and limited feature expression of traditional up-sampling operations, this study introduces the Efficient Up-Convolution Block (EUCB) module [26]. The core idea of EUCB is to progressively enhance the feature expression after up-sampling while remaining lightweight; its structure is shown in Fig. 4. The module first uses bilinear up-sampling instead of the traditional deconvolution operation, effectively avoiding checkerboard artifacts [27]. Lightweight depth-wise convolution is then applied to the up-sampled features, reducing the computational complexity to O(k²⋅H⋅W⋅C). Finally, a 1 × 1 convolution compresses the channels to adjust the feature dimension, further improving the module's lightweight performance while preserving its feature expression capability.
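A minimal sketch of an EUCB-style block follows, assuming a 3 × 3 depth-wise kernel and a 2× scale factor; the layer ordering mirrors the description above, while the exact normalization and activation choices are illustrative.

```python
import torch
import torch.nn as nn

class EUCB(nn.Module):
    """Efficient Up-Convolution Block (sketch): bilinear upsampling avoids
    deconvolution checkerboard artifacts; a depth-wise 3x3 convolution refines
    the upsampled features at low cost; a 1x1 convolution compresses channels."""
    def __init__(self, in_ch, out_ch, scale=2):
        super().__init__()
        self.up = nn.Upsample(scale_factor=scale, mode="bilinear", align_corners=False)
        self.dw = nn.Sequential(
            nn.Conv2d(in_ch, in_ch, 3, padding=1, groups=in_ch),
            nn.BatchNorm2d(in_ch),
            nn.ReLU(inplace=True),
        )
        self.pw = nn.Conv2d(in_ch, out_ch, 1)   # channel compression

    def forward(self, x):
        return self.pw(self.dw(self.up(x)))

# Example: upsample a 20x20 P5 map to 40x40 while halving channels
out = EUCB(256, 128)(torch.randn(1, 256, 20, 20))   # -> [1, 128, 40, 40]
```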

Figure 4: EUCB module design
2.3 Lightweight Detection Head
To address the high parameter count (0.898 M) and computational load (3.64 GFLOPs) of YOLOv8n’s multi-scale decoupled detection head, this study proposes a lightweight shared detection head (LSCSBD-Head), as shown in Fig. 5. Its key innovations comprise a cross-feature-level convolutional weight-sharing mechanism and a hierarchical BN strategy. This approach significantly reduces parameters while maintaining accuracy, achieving lightweight optimization.

Figure 5: LSCSBD-Head network structure
The input feature maps from the Neck network (P3, P4, P5) have spatial resolutions of 80 × 80, 40 × 40, and 20 × 20 pixels, respectively, each with 256 channels. The shared 1 × 1 convolution first reduces the channel dimension to 128, producing tensors of shape [80, 80, 128], [40, 40, 128], and [20, 20, 128] for P3, P4, and P5, respectively. This is followed by a shared 3 × 3 convolution that operates on these compressed features. Despite sharing convolutional weights, each level (P3, P4, P5) employs independent Batch Normalization (BN) layers to compute level-specific mean (μ) and variance (σ²) statistics, preserving the distinct feature distribution of each scale.
Specifically, the detection head shares the weights of its 1 × 1 and 3 × 3 convolutional layers across the P3-P5 feature maps. A 1 × 1 convolution first adjusts channel dimensions to enhance cross-channel interaction, followed by a 3 × 3 convolution that aggregates local spatial features. This shared design eliminates parameter redundancy from repetitive convolutional layers in traditional detection heads. However, feature statistics (mean and variance) differ significantly across P3-P5 levels, and a unified BN layer would cause distribution shifts. Therefore, LSCSBD-Head employs hierarchical independent BN layers, each calculating unique mean (μ) and variance (σ²) statistics for its corresponding feature level:

$$\hat{x} = \frac{x - \mu}{\sqrt{\sigma^{2} + \epsilon}}, \qquad y = \gamma \hat{x} + \beta$$
Here, μ and σ² represent feature mean and variance per level, while γ and β denote learnable scaling and shifting parameters. The smoothing factor ε prevents division by zero. This adapts normalization to each level’s feature distribution, thereby avoiding performance degradation from conflicting cross-level statistics. Replacing YOLOv8’s detection head with LSCSBD-Head reduced computational complexity (FLOPs) by 59% and parameters by 87% vs. the original design, as shown in Table 1.
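The shared-convolution, separate-BN idea can be sketched as follows in PyTorch; the channel widths follow the 256→128 reduction described above, while the class name and the number of shared stages are illustrative assumptions.

```python
import torch
import torch.nn as nn

class SharedConvSeparateBN(nn.Module):
    """Sketch of the LSCSBD-Head idea: convolution weights are shared across
    the P3/P4/P5 levels, while each level keeps its own BatchNorm statistics."""
    def __init__(self, in_ch=256, mid_ch=128, num_levels=3):
        super().__init__()
        self.reduce = nn.Conv2d(in_ch, mid_ch, 1, bias=False)             # shared 1x1
        self.conv = nn.Conv2d(mid_ch, mid_ch, 3, padding=1, bias=False)   # shared 3x3
        # independent BN per feature level (separate mean/var and gamma/beta)
        self.bn1 = nn.ModuleList(nn.BatchNorm2d(mid_ch) for _ in range(num_levels))
        self.bn2 = nn.ModuleList(nn.BatchNorm2d(mid_ch) for _ in range(num_levels))
        self.act = nn.SiLU()

    def forward(self, feats):
        outs = []
        for i, f in enumerate(feats):            # i indexes P3, P4, P5
            f = self.act(self.bn1[i](self.reduce(f)))
            f = self.act(self.bn2[i](self.conv(f)))
            outs.append(f)                       # fed to the cls/reg branches
        return outs

# Example with the P3/P4/P5 shapes given above
p3, p4, p5 = (torch.randn(1, 256, s, s) for s in (80, 40, 20))
o3, o4, o5 = SharedConvSeparateBN()([p3, p4, p5])
```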

2.4 Focal-MPDIoU Loss Function
The CIoU loss function of the YOLOv8 model suffers from gradient vanishing in regression for low-overlap targets (such as partially occluded ships), resulting in low localization accuracy. To address this, this study proposes the Focal-MPDIoU loss function, which optimizes bounding box regression by integrating geometric distance constraints and a dynamic focusing mechanism. It effectively alleviates the gradient vanishing problem for non-overlapping targets and enhances hard sample learning. This design consists of two core components: a geometric constraint term and a dynamic focusing mechanism.
In the MPDIoU geometric constraint term, c denotes the diagonal length of the minimum bounding rectangle enclosing both the ground-truth and predicted boxes (serving as a distance normalization factor), dGT represents the minimum Euclidean distance from the ground-truth box center to the predicted box boundary, and dPred indicates the minimum Euclidean distance from the predicted box center to the ground-truth box boundary.
The Focal-MPDIoU loss couples this geometric constraint term with a Focal dynamic weighting factor applied to the bounding box regression loss; the roles of the two components are as follows.
The MPDIoU geometric constraint term constructs a minimum symmetric distance metric through min (dGT, dPred). On the one hand, it solves the problem of gradient disappearance for non-overlapping targets (traditional loss gradients are prone to failure at low IoU); on the other hand, it strengthens the boundary alignment accuracy between the predicted box and the ground truth box;
The core of the Focal dynamic focusing mechanism lies in its IoU-dependent weight term, which dynamically amplifies the loss contribution of hard, low-overlap samples while down-weighting easy, well-aligned ones, thereby concentrating the gradient on the difficult targets (small, occluded, or dimly lit ships) that dominate nighttime scenes.
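For concreteness, the sketch below shows one plausible Focal-MPDIoU formulation consistent with the description above: the geometric term normalizes min(dGT, dPred) by the enclosing-box diagonal c, and the Focal weight is taken as (1 − IoU)^γ to emphasize low-overlap samples. Both the exact combination and the exponent form are assumptions, not the authors' published definition.

```python
import math

def _iou(a, b):
    """IoU of two boxes in (x1, y1, x2, y2) format."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / max(union, 1e-9)

def _center_to_boundary(px, py, box):
    """Minimum Euclidean distance from point (px, py) to the boundary of `box`."""
    x1, y1, x2, y2 = box
    if x1 <= px <= x2 and y1 <= py <= y2:        # point inside: nearest edge
        return min(px - x1, x2 - px, py - y1, y2 - py)
    dx = max(x1 - px, 0.0, px - x2)              # point outside: clamp to box
    dy = max(y1 - py, 0.0, py - y2)
    return math.hypot(dx, dy)

def focal_mpdiou_loss(pred, gt, gamma=0.5):
    """Assumed form: (1 - IoU)^gamma * (1 - IoU + min(d_GT, d_Pred) / c)."""
    iou = _iou(pred, gt)
    # diagonal of the minimum enclosing rectangle (normalization factor c)
    c = math.hypot(max(pred[2], gt[2]) - min(pred[0], gt[0]),
                   max(pred[3], gt[3]) - min(pred[1], gt[1]))
    d_pred = _center_to_boundary((pred[0] + pred[2]) / 2, (pred[1] + pred[3]) / 2, gt)
    d_gt = _center_to_boundary((gt[0] + gt[2]) / 2, (gt[1] + gt[3]) / 2, pred)
    geometric = 1.0 - iou + min(d_gt, d_pred) / max(c, 1e-9)
    return (1.0 - iou) ** gamma * geometric      # Focal weight emphasizes low IoU

# A low-overlap prediction yields a larger loss than a well-aligned one
print(focal_mpdiou_loss((0, 0, 10, 10), (8, 8, 20, 20)))
print(focal_mpdiou_loss((0, 0, 10, 10), (1, 1, 11, 11)))
```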
3 Nighttime Ship Dataset Construction
In the field of ship detection, the diversity and richness of datasets are crucial for training high-performance models. However, existing nighttime datasets typically cover only a narrow range of ship types, which cannot support multi-class detection, and their image quality under low-light conditions is poor, easily leading to insufficient generalization when used for training. To this end, this study integrates existing techniques and constructs a ship dataset suited to nighttime scenarios through day-night conversion and data cleaning, addressing the problems of data scarcity and poor quality.
In the basic set integration stage, this study combines self-collected nighttime wharf images with the public Sea-Ships dataset [28] (covering 6 types of ships: ore carriers, general cargo ships, bulk carriers, container ships, fishing boats, and passenger ships, with more than 7000 daytime images). Labelme was used to annotate all self-collected and web-collected images, yielding a combined set of 8521 ship images.
To address the scarcity of nighttime samples and class imbalance, this study applies CycleGAN-based day-night style transfer and introduces the BDD100K driving-scene dataset as auxiliary training data, enabling the network to learn the transfer rules of day-night lighting while retaining the key semantic information of ships. The 8521 images integrated in the basic stage are fed into the trained model to generate about 7000 simulated nighttime images covering typical low-light conditions, supplementing nighttime samples of diverse ship types. The process of generating nighttime images with the CycleGAN network is shown in Fig. 6.

Figure 6: Night image generation process by CycleGAN network
The training hyperparameters for the CycleGAN model are summarized in Table 2. The model was trained for 200 epochs using the Adam optimizer with a base learning rate of 0.002. We employed the LSGAN adversarial loss together with a cycle-consistency loss whose weight is listed in Table 2.

The day-night transfer network (DTN) was trained using a composite dataset. The daytime domain (A) included images from BDD100k and our collected daytime ship images, while the nighttime domain (B) consisted of BDD100k night scenes. The full objective function combined the adversarial losses of both translation directions with the weighted cycle-consistency loss; the corresponding training loss curves are shown in Fig. 7.
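For reference, a sketch of the standard CycleGAN objective under this setup (LSGAN adversarial terms and weighted cycle-consistency; any identity term is omitted here because the paper does not mention one):

```latex
% G: A (day) -> B (night), F: B -> A; D_A, D_B: domain discriminators;
% \lambda: cycle-consistency weight (Table 2).
\mathcal{L}(G, F, D_A, D_B) =
      \mathcal{L}_{\mathrm{GAN}}(G, D_B, A, B)
    + \mathcal{L}_{\mathrm{GAN}}(F, D_A, B, A)
    + \lambda \, \mathcal{L}_{\mathrm{cyc}}(G, F),
\quad
\mathcal{L}_{\mathrm{cyc}}(G, F) =
      \mathbb{E}_{x \sim A}\big[\lVert F(G(x)) - x \rVert_1\big]
    + \mathbb{E}_{y \sim B}\big[\lVert G(F(y)) - y \rVert_1\big].
```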

Figure 7: CycleGAN loss function loss curve
As shown in Fig. 8, the generated images may have problems of blurriness and inconsistency with actual scenes. Therefore, this study designs a two-stage cleaning strategy:

Figure 8: Generated nighttime ship image samples
To eliminate blurry images and retain valid samples with clear edges, the study introduces Laplacian variance sharpness filtering. To determine a reasonable threshold, four independent random samplings (200 images each) were drawn from the approximately 7000 synthetic images. Statistics showed that the variance of clear samples is mainly concentrated in the range of 100–500, while most blurry samples fall below 100. As shown in Fig. 9, the statistical distribution confirms that clear samples predominantly concentrate in the 100–500 variance range. Based on this distribution, 150 was selected as the threshold, which covers more than 80% of clear samples while effectively filtering the blurry and artifact-laden samples produced by the GAN.

Figure 9: Laplace variance distribution of random sampling. (a) First random sampling. (b) Second random sampling. (c) Third random sampling. (d) Fourth random sampling
Based on this sampling, the estimated false deletion rate (usable images erroneously removed) was below 20% and the estimated false retention rate (blurred or unrealistic images kept) was below 5%. To validate these estimates, we additionally performed a manual quality check on a random subset of 100 retained and 100 discarded images; the audit was consistent with the statistical inference, confirming both error bounds. Although an exhaustive manual audit of the full set was not conducted, this sharpness-filtering stage proved effective in automating the curation of a high-quality nighttime dataset.
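A minimal sketch of the sharpness filter, assuming OpenCV; the threshold constant mirrors the value chosen above, while the function names are illustrative.

```python
import cv2

SHARPNESS_THRESHOLD = 150.0  # Laplacian-variance cutoff chosen above

def laplacian_variance(image_path: str) -> float:
    """Sharpness score: variance of the Laplacian of the grayscale image."""
    gray = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    return float(cv2.Laplacian(gray, cv2.CV_64F).var())

def is_sharp(image_path: str) -> bool:
    """Keep only generated images whose sharpness exceeds the threshold."""
    return laplacian_variance(image_path) > SHARPNESS_THRESHOLD
```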
To make the illumination and color distribution of synthetic images close to the real nighttime environment, this study introduces joint screening of brightness and color deviation. The determination of the threshold at this stage is crucial. Based on the real nighttime images of the BDD100k dataset, 3 independent random samplings (200 images each time) were conducted. Statistics showed that the minimum brightness is concentrated in 14.83–15.64, the maximum brightness in 63.15–67.44, the minimum color deviation in 16.20–18.81, and the maximum color deviation in 45.86–48.25.
To ensure the objectivity and reproducibility of the screening criteria, we explicitly define the two metrics. Brightness is calculated as the arithmetic mean of the V channel in the HSV color space, which directly reflects the overall illumination intensity of a nighttime image. Color deviation is derived from the average standard deviation of pixel values across the R, G, and B channels, which quantifies color distribution uniformity, a key characteristic of low-saturation nighttime scenes. Averaging the sampled ranges yields a brightness threshold of 15.30–64.58 and a color deviation threshold of 17.48–47.41. These ranges (visualized in Fig. 10) encapsulate over 90% of the authentic nighttime image characteristics observed in BDD100k, ensuring the retained images align with real-world nighttime scenarios.
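The two screening metrics can be computed as sketched below, assuming OpenCV/NumPy; note that "average standard deviation across the R, G, and B channels" is interpreted here as the per-pixel standard deviation over the three channels averaged over the image, which is one reasonable reading of the definition above.

```python
import cv2
import numpy as np

BRIGHTNESS_RANGE = (15.30, 64.58)   # mean of the HSV V channel
COLOR_DEV_RANGE = (17.48, 47.41)    # mean per-pixel std over R, G, B

def night_statistics(image_bgr: np.ndarray):
    """Return (brightness, color_deviation) for a BGR image."""
    v = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2HSV)[:, :, 2]
    brightness = float(v.mean())
    # per-pixel standard deviation across the three color channels, averaged
    color_dev = float(image_bgr.astype(np.float32).std(axis=2).mean())
    return brightness, color_dev

def passes_night_screening(image_bgr: np.ndarray) -> bool:
    b, c = night_statistics(image_bgr)
    return (BRIGHTNESS_RANGE[0] <= b <= BRIGHTNESS_RANGE[1]
            and COLOR_DEV_RANGE[0] <= c <= COLOR_DEV_RANGE[1])
```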

Figure 10: Brightness and color deviation distribution of night images. (a) Brightness distribution of night images. (b) Color deviation distribution of night images
While quantitative metrics such as Fréchet Inception Distance (FID) or Learned Perceptual Image Patch Similarity (LPIPS) could provide another perspective on the quality of the generated images, they are primarily designed for and calibrated on natural image domains (e.g., ImageNet) and may not perfectly align with the specific perceptual goals of nighttime maritime detection. Our focus was on ensuring the perceptual authenticity and usability for the detection task. The proposed dual-threshold cleaning strategy, validated by its final performance gains in downstream detection tasks, serves as a more direct and task-specific quality filter. A quantitative analysis using these general metrics will be explored in future studies dedicated to image generation.
Finally, through the above methods, the dataset was effectively expanded, and the Night-Ships nighttime ship dataset was reconstructed. The specific distribution of different ship categories is shown in Table 3. It covers diverse nighttime scenarios such as low light and reflection, and the category balance is significantly improved compared with the original nighttime collected data, which can effectively support the training of the model’s detection performance in complex nighttime environments.

The final Night-Ships dataset was randomly split into training (70%), validation (20%), and testing (10%) sets, ensuring a balanced distribution of ship classes across all splits. The test set contained only real nighttime images to assess generalization beyond CycleGAN-generated data.
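A minimal sketch of a class-balanced 70/20/10 split is shown below; it is illustrative only, and the constraint that the test set contains only real nighttime images would be enforced by splitting the real and synthetic pools separately before applying it.

```python
import random
from collections import defaultdict

def stratified_split(samples, ratios=(0.7, 0.2, 0.1), seed=42):
    """Class-balanced split of (image_path, class_label) pairs into
    train/val/test subsets according to `ratios`."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for path, label in samples:
        by_class[label].append((path, label))
    train, val, test = [], [], []
    for items in by_class.values():
        rng.shuffle(items)
        n_train = int(len(items) * ratios[0])
        n_val = int(len(items) * ratios[1])
        train += items[:n_train]
        val += items[n_train:n_train + n_val]
        test += items[n_train + n_val:]
    return train, val, test

# Example with dummy entries for 6 ship classes
data = [(f"img_{i}.jpg", i % 6) for i in range(60)]
tr, va, te = stratified_split(data)
print(len(tr), len(va), len(te))   # roughly 42 / 12 / 6
```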
In this section, we systematically evaluate the performance of the CLF-YOLO model on the self-built Night-Ships dataset and specify the evaluation metrics and experimental environment. We conduct comparative experiments against the baseline and mainstream models to verify the effectiveness of the architectural optimization; perform ablation experiments to quantify the independent and synergistic contributions of the five components (BiFPN, CSP-MSCB, EUCB, LSCSBD-Head, and Focal-MPDIoU); carry out targeted component comparisons, covering lightweight-structure efficiency (multiple lightweight variants) and loss-function optimization (multiple bounding box regression losses); and conduct visual analysis to verify the practical detection performance of the model. Together, these experiments comprehensively verify the model's accuracy, efficiency, and generalization ability.
All model training, validation, and testing were conducted on uniformly configured servers. Hardware: Intel Xeon Platinum 8280M CPU with an NVIDIA RTX 2080 SUPER GPU (8 GB VRAM). Software: Linux OS, PyTorch 2.0.0, Python 3.9, and CUDA 12.1 for GPU acceleration.
To determine the optimal learning rate and number of training iterations for the YOLOv8 model training, this study conducts ablation experiments on different learning rates (0.001, 0.005, 0.01, 0.05, and 0.1) and different training iterations (100, 200, and 300 epochs). Mean Average Precision (mAP) and Loss (loss function value) are used as evaluation metrics. As shown in Table 4, the initial learning rate of 0.05 and 200 epochs are selected as the experimental hyperparameters.
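Assuming the Ultralytics YOLOv8 implementation (the paper does not state which training framework was used), the selected hyperparameters would be applied roughly as follows; the dataset configuration file name is hypothetical.

```python
from ultralytics import YOLO

# Assumed training setup; "night_ships.yaml" is a hypothetical dataset config,
# and a modified CLF-YOLOv8 model yaml would replace the baseline one.
model = YOLO("yolov8n.yaml")
model.train(
    data="night_ships.yaml",
    epochs=200,    # chosen via the Table 4 study
    lr0=0.05,      # chosen initial learning rate
    imgsz=640,
)
```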

The computational complexity (FLOPs) and parameter count of all models were profiled using the thop (Torch-OpCounter) library (version 0.1.1-post2207132038) with an input size of 640 × 640. These metrics serve as hardware-agnostic indicators of model efficiency. While the significant reduction in FLOPs and parameters strongly indicates real-time potential, direct measurement of inference latency (e.g., frames per second) on specific edge deployment hardware (e.g., NVIDIA Jetson devices) was not conducted due to constraints in hardware availability. This empirical validation remains an important aspect for future work focused on deployment.
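A minimal profiling sketch with thop follows; the placeholder network stands in for the detector under test, and note that thop reports multiply-accumulate counts, which different papers convert to FLOPs under different conventions.

```python
import torch
import torch.nn as nn
from thop import profile

# Placeholder network; in practice the detector under evaluation is passed in.
model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 16, 3, padding=1),
)

dummy = torch.randn(1, 3, 640, 640)        # 640 x 640 input, as in the paper
macs, params = profile(model, inputs=(dummy,))
print(f"MACs = {macs / 1e9:.2f} G, Params = {params / 1e6:.2f} M")
```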
We rigorously evaluate model performance using the following metrics, focusing on detection accuracy, model complexity, and computational efficiency:
Specifically, Precision measures the proportion of correct detections among all positive predictions, and Recall measures the proportion of actual positives successfully detected. The mAP, particularly mAP@0.5 and mAP@0.5:0.95, serves as the primary holistic accuracy metric, evaluating performance across classes and varying localization strictness (crucial for low-light targets with ambiguous contours). FLOPs estimates computational cost, and the parameter count θ measures model size and storage footprint. The core metrics are calculated as follows:

$$\mathrm{Precision} = \frac{TP}{TP + FP}, \qquad \mathrm{Recall} = \frac{TP}{TP + FN}$$

$$\mathrm{AP} = \int_{0}^{1} P(R)\,dR, \qquad \mathrm{mAP} = \frac{1}{N}\sum_{i=1}^{N} \mathrm{AP}_i$$

where TP, FP, and FN denote true positives, false positives, and false negatives, respectively, and N is the number of ship categories.
4.3.1 Benchmark Model Comparison
To comprehensively evaluate the object detection performance of CLF-YOLO in general scenarios, we compare it with representative detectors, including two-stage (Faster R-CNN), single-stage (SSD), and YOLO series models (YOLOv3, YOLOv5s, YOLOv7-tiny, YOLOv8n). All compared detectors were retrained from scratch on the Night-Ships dataset under identical settings: input size 640 × 640, 200 epochs, and the same data augmentation techniques to ensure a fair comparison. Results are detailed in Table 5.

CLF-YOLO achieves a leading mAP@0.5 of 97.6%, outperforming most mainstream detectors: 3.2% higher than SSD, 2.4% higher than Faster R-CNN, 2.7% higher than YOLOv3, 0.2% higher than YOLOv7-tiny, and 0.7% higher than the baseline YOLOv8n. Even compared to YOLOv5s (which has a 0.1% higher mAP@0.5), CLF-YOLO shows stronger robustness in complex scenarios, as evidenced by its highest mAP@0.5:0.95 (76.0%).
In terms of parameter efficiency, CLF-YOLO’s 1.8 M parameters are significantly lower than mainstream models (e.g., 7.0 M for YOLOv5s, 6.0 M for YOLOv7-tiny), demonstrating its potential for lightweight deployment. These results confirm CLF-YOLO’s superiority in balancing accuracy and generalization for nighttime maritime detection.
4.3.2 Ablation Experiments and Analysis
An ablation study is performed to validate the effectiveness of individual improvement components: The model with only the enhanced Neck module is denoted as CM (Model 1). The model with only the enhanced detection head is denoted as LSCSBD (Model 2). The model with only the improved loss function is denoted as FM (Model 3). The fully improved model is denoted as CLF-YOLOv8 (Model 4). By incrementally adding improvement modules, the study compares the impact of each module individually and in combination on object detection performance, analyzing the contribution of each module to performance gains. Results are presented in Table 6.

Results of the ablation study (Table 6) demonstrate that the incremental introduction of the CM, LSCSBD, and FM modules significantly enhances model performance. Model 1 (CM) shows comprehensive gains over the baseline in Precision, Recall, and mAP@0.5. Model 2 (LSCSBD) achieves further optimization, reaching 77.6% mAP@0.5:0.95, while Model 3 (FM) attains an even higher 78.1% on this metric, clearly outperforming the baseline. The full model, CLF-YOLOv8, achieves the highest Precision (95.8%) and Recall (92.8%), validating the effectiveness of the integrated modules.
A noteworthy observation from Table 6 is that Model 3 (equipped only with the Focal-MPDIoU loss) achieved the highest mAP@0.5:0.95 (78.1%). This result is actually consistent with the design purpose of the Focal-MPDIoU loss, which is specifically optimized for improving regression accuracy, a core aspect measured by the stringent mAP@0.5:0.95 metric. The slight decrease in mAP@0.5:0.95 for the full CLF-YOLOv8 model (76.0%) compared to Model 3 (78.1%) is a known trade-off in lightweight design. The introduction of the highly efficient LSCSBD-Head, while reducing parameters by 87%, may slightly compromise the feature representation capacity for the most challenging cases that heavily influence the strict mAP@0.5:0.95 metric. However, the full model achieves the best overall balance, delivering superior performance on the primary mAP@0.5 metric (97.6%) and precision (95.8%) with the lowest parameter count. This balance is crucial for practical deployment where both efficiency and accuracy are paramount.
To evaluate the impact of different improvement strategies on model performance, this study plots Precision-Recall (P-R) curves for various models. The curves illustrate the performance of YOLOv8 and different improved versions across multiple ship categories. Specifically, Fig. 11a–e presents the category-specific P-R curves and mean Average Precision (mAP) of the following models on the dataset: YOLOv8, the proposed CLF-YOLOv8, Model 1, Model 2, and Model 3, respectively.

Figure 11: P-R curves of each model in ablation experiments. (a) YOLOv8. (b) CLF-YOLOv8. (c) Model 1. (d) Model 2. (e) Model 3
In summary, the ablation experiment results demonstrate that the integration of the CM, LSCSBD, and FM modules plays a critical role in enhancing the performance of the CLF-YOLOv8 model. Notably, significant improvements are achieved in metrics such as Precision and Recall, which validates the effectiveness of these proposed optimization methods.
4.3.3 Comparative Experiments on Lightweight Structures and Loss Functions
To fully validate the superiority of CLF-YOLOv8 in nighttime ship detection, this section conducts two aspects of performance evaluation: comparisons with lightweight variants (to verify efficiency) and ablation studies on loss functions (to confirm the effectiveness of optimization).
While the comparison with mainstream detectors verifies overall performance, lightweight deployment is crucial for real-time nighttime maritime target detection. Therefore, we further compare CLF-YOLOv8 with state-of-the-art lightweight variants (e.g., YOLOv8-MobileNetv3, YOLOv8-Ghost) to verify its efficiency advantages. The results are presented in Table 7.

CLF-YOLOv8 stands out in the lightweight comparison: it achieves the highest mAP@0.5 (97.6%) with the lowest parameter count (1.8 M)—58.1% of YOLOv8n (3.1 M) and 94.7% of YOLOv8-Ghost (1.9 M). In computational efficiency, its 6.1 G FLOPs and 4.2 MB model size are comparable to dedicated lightweight variants (e.g., 5.7 G FLOPs for YOLOv8-Ghost) while maintaining superior accuracy. This confirms CLF-YOLOv8’s unique advantage in co-optimizing precision and lightweight design, meeting deployment requirements for resource-constrained maritime edge devices.
Additionally, to validate the effectiveness of the proposed Focal-MPDIoU loss in improving bounding box regression, we compare it with mainstream loss functions (CIoU, GIoU, DIoU, EIoU, SIoU) based on the YOLOv8 framework. Results are shown in Table 8.

The Focal-MPDIoU loss yields the highest precision (94.8%), recall (93.7%), and mAP@0.5 (97.2%) among all variants. Compared to the original CIoU loss, it improves P by 2.8%, R by 3.1%, and mAP@0.5 by 0.3%, outperforming other advanced losses (e.g., DIoU, SIoU). This validates that Focal-MPDIoU effectively addresses regression bias for low-overlap targets and enhances hard sample learning—key for accurate localization of small/overlapping ships in low-light conditions.
To visually evaluate nighttime ship detection performance, Grad-CAM [29] was utilized to generate heatmaps comparing the attention regions of YOLOv5, YOLOv8, and CLF-YOLOv8, as illustrated in Fig. 12.

Figure 12: Heatmap detection comparison. (a) Original image. (b) CLF-YOLOv8. (c) YOLOv5. (d) YOLOv8
Experimental results reveal that CLF-YOLOv8 fully activates ship regions in small-object scenarios (Fig. 12a,b), eliminating missed detections, whereas baseline models exhibit significant activation gaps. For overlapping objects (Fig. 12c), its heatmaps precisely cover core target areas with substantially lower bounding box localization errors than the blurred boundary responses of YOLOv5/YOLOv8. Under strong interference (Fig. 12d), CLF-YOLOv8 suppresses false activation of onshore structures and accurately distinguishes ships from background clutter, while baseline models generate intense false activations in non-target areas, causing false alarms. This visualization confirms significant improvements in small-object recognition, overlapping-target separation, and robustness in complex backgrounds.
Visual inspection of the heatmaps (Fig. 12) further shows that CLF-YOLOv8 activates ship regions with more complete and precise attention, particularly for small and overlapping targets, while significantly reducing false activations in cluttered backgrounds such as shoreline structures and water reflections. This qualitative analysis confirms the model's enhanced ability to distinguish target features from nighttime noise compared with baseline detectors.
We further compare CLF-YOLOv8 with several mainstream object detection algorithms, including the traditional YOLO series, Faster R-CNN, and SSD, and present their detection results below.
Figs. 13–15 illustrate the following observations: (1) For low-light small objects (Fig. 13), all vessels were accurately located by CLF-YOLOv8 (Fig. 13b). This performance surpassed Faster R-CNN (severe misses, Fig. 13c) and SSD (detection failures, Fig. 13d). Hybrid missed/false detections in YOLOv5 and YOLOv8 (Fig. 13e,f) were also avoided. (2) For dense overlapping targets (Fig. 14), precise target separation was achieved. This contrasts with Faster R-CNN’s target merging errors (Fig. 14c). This demonstrates the collaborative effect of CSP-MSCB’s multi-scale feature extraction and Focal-MPDIoU’s boundary optimization. (3) Under strong glare interference (Fig. 15), illuminated vessels were accurately detected by CLF-YOLOv8 (Fig. 15b). Performance exceeded YOLOv5/YOLOv8 (50% miss rate, Fig. 15e,f), SSD (target mis-merging, Fig. 15d), and Faster R-CNN (separation failure, Fig. 15c). This validates LSCSBD-Head’s lightweight prediction efficiency and CSP-MSCB’s anti-interference feature extraction. Combined with the 97.6% mAP@0.5, these results confirm the core components (CSP-MSCB, LSCSBD-Head, Focal-MPDIoU) enable high-precision nighttime ship detection. Enhanced small-target sensitivity, overlapping object resolution, and extreme-environment adaptability are achieved.

Figure 13: Detection results: Group 1. (a) Original image. (b) CLF-YOLOv8. (c) Faster-RCNN. (d) SSD. (e) YOLOv5. (f) YOLOv8

Figure 14: Detection results: Group 2. (a) Original image. (b) CLF-YOLOv8. (c) Faster-RCNN. (d) SSD. (e) YOLOv5. (f) YOLOv8

Figure 15: Detection results: Group 3. (a) Original image. (b) CLF-YOLOv8. (c) Faster-RCNN. (d) SSD. (e) YOLOv5. (f) YOLOv8
The core value of this study lies in addressing the bottleneck of balancing “lightweight” and “high accuracy” in nighttime ship detection. The performance of CLF-YOLOv8—achieving 97.6% mAP@0.5 with only 1.8 M parameters on the Night-Ships dataset—is not the result of isolated improvements, but rather the inevitable outcome of the synergistic effect of the three innovative mechanisms:
From a technical mechanism perspective: BiFPN’s weighted multi-scale fusion resolves the issue of insufficient sensitivity of traditional PANet to weak features (e.g., blurred nighttime ship contours). Its dynamic weight allocation enhances the feature expression intensity of small targets (e.g., distant fishing boats), which is the key reason for its superiority over baseline models in mAP@0.5:0.95. The LSCSBD-Head, through layer-wise batch normalization, resolves the contradiction that “parameter compression inevitably impairs accuracy”; its 87% parameter reduction ratio far exceeds that of comparable lightweight schemes (e.g., the 5% parameter reduction advantage of YOLOv8-Ghost) while preserving the discriminative power of cross-scale features. The Focal-MPDIoU loss, by combining geometric constraints (MPDIoU) and dynamic weights (Focal), demonstrates superior stability and specifically addresses the regression gradient vanishing problem for low-overlap targets (e.g., partially occluded ships). This is conclusively evidenced by its significant contribution to the improvement in the mAP@0.5:0.95 metric observed in the ablation study, with a particularly notable improvement observed on challenging low-IoU samples. Furthermore, the philosophy of our co-design approach, enhancing model performance through tailored architectural innovations, aligns with advancements in challenging visual detection tasks for unstructured scenarios, such as precise target localization in complex maritime edge environments, which is consistent with the geometric-aware feature learning strategy in [30]. This design logic also resonates with progress in lightweight visual technology applications, such as the development of 3D vision systems for industrial structural defect recognition, where scenario-specific architectural optimization similarly drives performance improvements [31]. This cross-scenario consistency further supports the rationality of our scenario-specific optimization for nighttime maritime detection.
Compared with existing studies, the unique value of this solution is reflected in two aspects. Firstly, existing lightweight models (e.g., Zheng et al.'s MobileNetV3-based design) sacrifice 1%–3% accuracy for parameter compression, whereas CLF-YOLOv8 reduces parameters by 42% (compared to YOLOv8n) while increasing accuracy by 0.7%; this validates the effectiveness of structural innovation over mere pruning. Secondly, the Night-Ships dataset, through CycleGAN transfer and dual-threshold cleaning, addresses the small-quantity and poor-quality issues of traditional nighttime datasets. Its 5532 images covering 6 ship categories provide a reliable benchmark for verifying the generalization of nighttime detection models.
It is important to acknowledge the limitations. Firstly, the mAP@0.5:0.95 (76.0%) still has room for improvement, which is associated with insufficient feature modeling in extreme nighttime scenarios (e.g., target contour distortion caused by intense wave reflections). Secondly, while the significant reduction in FLOPs and parameters strongly indicates real-time potential, direct measurement of inference latency on edge deployment hardware (e.g., Jetson devices) was not conducted due to constraints in hardware availability and remains an important empirical validation for future work. The first issue becomes more obvious in harsh maritime weather such as fog or rain, scenarios that are common in real navigation but underrepresented in the current Night-Ships dataset constructed via CycleGAN day-night transfer and dual-threshold cleaning (Section 3). Specifically, fog causes atmospheric scattering that weakens the edge features of small ships (e.g., fishing boats), reducing the CSP-MSCB module's ability to capture multi-scale details; rain introduces water-surface glare and raindrop occlusion, disrupting the EUCB module's feature enhancement and increasing bounding box regression errors for overlapping targets. Future optimization can proceed in two directions: first, introducing temporal feature fusion to leverage inter-frame continuity in video sequences, enhancing robustness to dynamic noise from waves or raindrops; second, expanding the model's adaptability to fog and rain through adversarial data augmentation and multi-modal sensor fusion (e.g., radar, infrared), drawing inspiration from progress in satellite- and SAR-based maritime detection under similarly challenging conditions.
The design philosophy of CLF-YOLOv8 is not limited to nighttime ship detection. Its lightweight feature fusion, layer-wise detection head, and dynamic loss mechanism can provide a general technical reference for edge-side real-time detection in low-light environments.
This study was driven by the critical challenge of balancing high accuracy with computational efficiency in nighttime maritime detection. To this end, we introduced CLF-YOLOv8, a novel framework that demonstrates the power of a synergistic co-design strategy across data, model architecture, and optimization objectives. Our work proves that significant model lightweighting does not necessitate a compromise in accuracy; rather, through thoughtful architectural innovation, both can be achieved simultaneously.
The primary contribution of this research is threefold. First, we provide a systematic solution to the data scarcity problem through the construction of the high-quality Night-Ships dataset, establishing a valuable benchmark for the community. Second, our technical innovations—the feature fusion enhancements in the Neck, the parameter-sharing mechanism in the LSCSBD-Head, and the targeted Focal-MPDIoU loss—collectively address the core challenges of weak features, model redundancy, and low-overlap regression in nighttime scenarios. Third, and most importantly, the design philosophy of CLF-YOLOv8 offers a generalizable blueprint for developing efficient and accurate vision systems beyond maritime detection, particularly for other edge-based applications operating under low-visibility conditions.
The implications of this work extend to real-world maritime safety and autonomous navigation, where reliable, real-time perception under constrained resources is paramount. Future efforts will focus on enhancing model robustness in more severe conditions (e.g., fog and rain) and exploring further compression techniques for deployment on ultra-low-power embedded platforms.
Acknowledgement: The authors would like to extend their gratitude to the Weihai Institute of Marine Information Science and Technology for their substantial support in dataset construction and experimental validation. Additionally, the authors would like to express their sincere appreciation to Joseph Redmon and his team, the pioneers of the YOLO network, whose groundbreaking work on real-time object detection laid a critical foundation for our improved CLF-YOLOv8 model in nighttime ship detection tasks.
Funding Statement: The authors gratefully acknowledge support from the Shandong Provincial Key Research and Development Program (Grant No. 2024SFGC0201).
Author Contributions: The authors confirm their contribution to the paper as follows: Study conception and design: Zhonghao Wang and Xin Liu; Data collection: Zhonghao Wang, Changhua Yue and Haiwen Yuan; Analysis and interpretation of results: Changhua Yue; Draft manuscript preparation: Zhonghao Wang and Haiwen Yuan. All authors reviewed the results and approved the final version of the manuscript.
Availability of Data and Materials: The SeaShips dataset used in this experiment is publicly available. The Night-Ships dataset and the core code for CLF-YOLOv8 will be made publicly available upon acceptance of this paper.
Ethics Approval: Not applicable.
Conflicts of Interest: The authors declare no conflicts of interest to report regarding the present study.
References
1. Girshick R. Fast R-CNN. In: Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV); 2015 Dec 7–13; Santiago, Chile. [Google Scholar]
2. Ren S, He K, Girshick R, Sun J. Faster R-CNN: towards real-time object detection with region proposal networks. IEEE Trans Pattern Anal Mach Intell. 2017;39(6):1137–49. doi:10.1109/TPAMI.2016.2577031. [Google Scholar] [PubMed] [CrossRef]
3. Redmon J, Divvala S, Girshick R, Farhadi A. You only look once: unified, real-time object detection. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR); 2016 Jun 27–30; Las Vegas, NV, USA. p. 779–88. doi:10.1109/CVPR.2016.91. [Google Scholar] [CrossRef]
4. Liu W, Anguelov D, Erhan D, Szegedy C, Reed S, Fu CY, et al. SSD: single shot MultiBox detector. In: Computer Vision—ECCV 2016. Cham, Switzerland: Springer International Publishing; 2016. p. 21–37. doi:10.1007/978-3-319-46448-0_2. [Google Scholar] [CrossRef]
5. Lin TY, Goyal P, Girshick R, He K, Dollár P. Focal loss for dense object detection. In: 2017 IEEE International Conference on Computer Vision (ICCV); 2017 Oct 22–29; Venice, Italy. p. 2999–3007. doi:10.1109/ICCV.2017.324. [Google Scholar] [CrossRef]
6. Gao Y, Wu Z, Ren M, Wu C. Improved YOLOv4 based on attention mechanism for ship detection in SAR images. IEEE Access. 2022;10:23785–97. doi:10.1109/ACCESS.2022.3154474. [Google Scholar] [CrossRef]
7. Xu CA, Su H, Gao L, Wu JF, Yan WJ, Jian T, et al. Feature aligned ship detection based on improved RPDet in SAR images. Displays. 2022;74:102191. doi:10.1016/j.displa.2022.102191. [Google Scholar] [CrossRef]
8. Woo S, Debnath S, Hu R, Chen X, Liu Z, Kweon IS, et al. ConvNeXt V2: co-designing and scaling ConvNets with masked autoencoders. In: 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); 2023 Jun 17–24; Vancouver, BC, Canada. p. 16133–42. doi:10.1109/CVPR52729.2023.01548. [Google Scholar] [CrossRef]
9. Kim M, Jeong J, Kim S. ECAP-YOLO: efficient channel attention pyramid YOLO for small object detection in aerial image. Remote Sens. 2021;13(23):4851. doi:10.3390/rs13234851. [Google Scholar] [CrossRef]
10. Li D, Zhang Z, Fang Z, Cao F. Ship detection with optical image based on CA-YOLO v3 network. In: 2023 3rd International Conference on Frontiers of Electronics, Information and Computation Technologies (ICFEICT); 2023 May 26–29; Yangzhou, China. p. 589–98. doi:10.1109/ICFEICT59519.2023.00103. [Google Scholar] [CrossRef]
11. Zhao X, Song Y, Shi S, Li S. Improving YOLOv5n for lightweight ship target detection. In: 2023 IEEE 3rd International Conference on Computer Systems (ICCS); 2023 Sep 22–24; Qingdao, China. p. 110–5. doi:10.1109/ICCS59700.2023.10335505. [Google Scholar] [CrossRef]
12. Zheng Y, Zhang Y, Qian L, Zhang X, Diao S, Liu X, et al. A lightweight ship target detection model based on improved YOLOv5s algorithm. PLoS One. 2023;18(4):e0283932. doi:10.1371/journal.pone.0283932. [Google Scholar] [PubMed] [CrossRef]
13. Chang YL, Anagaw A, Chang L, Wang YC, Hsiao CY, Lee WH. Ship detection based on YOLOv2 for SAR imagery. Remote Sens. 2019;11(7):786. doi:10.3390/rs11070786. [Google Scholar] [CrossRef]
14. Zheng Z, Wang P, Liu W, Li J, Ye R, Ren D. Distance-IoU loss: faster and better learning for bounding box regression. Proc AAAI Conf Artif Intell. 2020;34(7):12993–3000. doi:10.1609/aaai.v34i07.6999. [Google Scholar] [CrossRef]
15. Ma S, Xi Y. MPDIoU: a loss for efficient and accurate bounding box regression. arXiv:2307.07662. 2023. [Google Scholar]
16. Nie Y, Lai H, Gao G. DSOD-YOLO: a lightweight dual feature extraction method for small target detection. Digit Signal Process. 2025;164:105268. doi:10.1016/j.dsp.2025.105268. [Google Scholar] [CrossRef]
17. Xu X, Jiang Y, Chen W, Huang Y, Zhang Y, Sun X. DAMO-YOLO: a report on real-time object detection design. arXiv:2211.15444. 2022. [Google Scholar]
18. Bakirci M. Advanced ship detection and ocean monitoring with satellite imagery and deep learning for marine science applications. Reg Stud Mar Sci. 2025;81:103975. doi:10.1016/j.rsma.2024.103975. [Google Scholar] [CrossRef]
19. Bakirci M, Bayraktar I. Assessment of YOLO11 for ship detection in SAR imagery under open ocean and coastal challenges. In: 2024 21st International Conference on Electrical Engineering, Computing Science and Automatic Control (CCE); 2024 Oct 23–25; Mexico City, Mexico. p. 1–6. doi:10.1109/CCE62852.2024.10770926. [Google Scholar] [CrossRef]
20. Awaluddin BA, Chao CT, Chiou JS. Investigating effective geometric transformation for image augmentation to improve static hand gestures with a pre-trained convolutional neural network. Mathematics. 2023;11(23):4783. doi:10.3390/math11234783. [Google Scholar] [CrossRef]
21. Cubuk ED, Zoph B, Mane D, Vasudevan V, Le QV. Autoaugment: learning augmentation policies from data. arXiv:1805.09501. 2018. [Google Scholar]
22. Liang W, Liang Y, Jia J. MiAMix: enhancing image classification through a multi-stage augmented mixed sample data augmentation method. Processes. 2023;11(12):3284. doi:10.3390/pr11123284. [Google Scholar] [CrossRef]
23. Kim YY, Song K, Jang JH, Moon I. Lada: look-ahead data acquisition via augmentation for deep active learning. Adv Neural Inf Process Syst. 2021;34:22919–30. [Google Scholar]
24. Tran NT, Tran VH, Nguyen NB, Nguyen TK, Cheung NM. On data augmentation for GAN training. IEEE Trans Image Process. 2021;30:1882–97. doi:10.1109/tip.2021.3049346. [Google Scholar] [PubMed] [CrossRef]
25. Tan M, Pang R, Le QV. EfficientDet: scalable and efficient object detection. In: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); 2020 Jun 13–19; Seattle, WA, USA. p. 10778–87. doi:10.1109/cvpr42600.2020.01079. [Google Scholar] [CrossRef]
26. Rahman MM, Munir M, Marculescu R. EMCAD: efficient multi-scale convolutional attention decoding for medical image segmentation. In: 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); 2024 Jun 16–22; Seattle, WA, USA. p. 11769–79. doi:10.1109/CVPR52733.2024.01118. [Google Scholar] [CrossRef]
27. Odena A, Dumoulin V, Olah C. Deconvolution and checkerboard artifacts. Distill. 2016;1(10):e3. doi:10.23915/distill.00003. [Google Scholar] [CrossRef]
28. Shao Z, Wu W, Wang Z, Du W, Li C. SeaShips: a large-scale precisely annotated dataset for ship detection. IEEE Trans Multimed. 2018;20(10):2593–604. doi:10.1109/TMM.2018.2865686. [Google Scholar] [CrossRef]
29. Selvaraju RR, Cogswell M, Das A, Vedantam R, Parikh D, Batra D. Grad-CAM: visual explanations from deep networks via gradient-based localization. In: 2017 IEEE International Conference on Computer Vision (ICCV); 2017 Oct 22–29; Venice, Italy. p. 618–26. doi:10.1109/iccv.2017.74. [Google Scholar] [CrossRef]
30. Wang H, Zhang G, Cao H, Hu K, Wang Q, Deng Y, et al. Geometry-aware 3D point cloud learning for precise cutting-point detection in unstructured field environments. J Field Robot. 2025;42(7):3063–76. doi:10.1002/rob.22567. [Google Scholar] [CrossRef]
31. Hu K, Chen Z, Kang H, Tang Y. 3D vision technologies for a self-developed structural external crack damage recognition robot. Autom Constr. 2024;159:105262. doi:10.1016/j.autcon.2023.105262. [Google Scholar] [CrossRef]
Copyright © 2026 The Author(s). Published by Tech Science Press. This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

