Open Access

ARTICLE

EfficientShip: A Hybrid Deep Learning Framework for Ship Detection in the River

Huafeng Chen1, Junxing Xue2, Hanyun Wen2, Yurong Hu1, Yudong Zhang3,*

1 School of Computer Engineering, Jingchu University of Technology, Jingmen, 448000, China
2 School of Computer Science, Yangtze University, Jingzhou, 434023, China
3 School of Computing and Mathematic Sciences, University of Leicester, Leicester, LE1 7RH, UK

* Corresponding Author: Yudong Zhang

Computer Modeling in Engineering & Sciences 2024, 138(1), 301-320. https://doi.org/10.32604/cmes.2023.028738

Abstract

Optical image-based ship detection can ensure the safety of ships and promote the orderly management of ships in offshore waters. Current deep learning research on optical image-based ship detection mainly focuses on improving one-stage detectors for real-time detection, but this sacrifices detection accuracy. To solve this problem, we present a hybrid ship detection framework named EfficientShip. Its core parts are the DLA-backboned object location (DBOL) and the CascadeRCNN-guided object classification (CROC). The DBOL is responsible for finding potential ship objects, and the CROC categorizes them. We also design a pixel-spatial-level data augmentation (PSDA) to reduce the risk of detection-model overfitting. We compare the proposed EfficientShip with state-of-the-art (SOTA) methods on the ship detection dataset SeaShips. Experiments show that our framework achieves 99.63% mAP at 45 fps, which is clearly better than 8 SOTA approaches in detection accuracy and also meets the requirements of real-time application scenarios.

Keywords


1  Introduction

With the continuous advancement of technology and the rapid development of industrial production, international trade is gradually increasing, and the shipping market is flourishing. To ensure the safety of ships and promote their orderly management, satellites (which generate SAR images) are used to monitor ships at sea [1], and surveillance cameras (which generate optical images) are adopted for tracking ships in offshore waters [2,3]. At the technical level, with the maturity of artificial intelligence technology [4], computer-aided methods for ship classification, ship instance segmentation, and ship detection from images have been studied to reduce the burden on human monitors [5]. In this paper, we focus on ship detection based on optical images generated by surveillance cameras.

In recent years, deep learning-based ship detection has become a hot research area [6-8]. Sea ship detection is a special case of general object detection [9]. Research on deep learning-based object detection can be roughly split into two categories: one-stage detectors and two-stage detectors [10]. One-stage detectors combine object location and classification in one deep learning framework, while two-stage detectors first locate potential objects and then classify them. Representative one-stage detection algorithms are RetinaNet [11], FCOS [12], CenterNet [13], ATSS [14], PAA [15], BorderDet [16], and the YOLO series [17-21]. Mainstream two-stage object detection approaches are R-CNN [22], SPPNet [23], Fast RCNN [24], Faster RCNN [25], FPN [26], Cascade RCNN [27], Grid RCNN [28], and CenterNet2 [29].

Generally, one-stage detectors are considered to have faster detection speed, while two-stage detectors achieve higher detection accuracy. Recent ship detection methods [3,30-37] focus on improving one-stage detectors for real-time ship detection, but they sacrifice detection accuracy. In this paper, we present a real-time two-stage ship detection algorithm that improves detection accuracy while ensuring real-time performance. The algorithm includes two parts: the DLA-backboned object location (DBOL) and the CascadeRCNN-guided object classification (CROC). To further improve ship detection accuracy, we design a novel pixel-spatial-level data augmentation (PSDA) that efficiently increases the number of training samples by a large factor. The PSDA, DBOL, and CROC make up the proposed hybrid deep learning framework, EfficientShip.

The contributions of this study can be summarized as follows:

(1) The DBOL is presented for finding potential ship objects in real time. We integrate DLA [38], ResNet-50 [39] and CenterNet [13] into DBOL for evaluating object likelihoods quickly and accurately.

(2) The CROC is put forward to categorize the potential ship objects in real time. We calculate the category scores of suspected objects based on conditional probability and derive the final detections.

(3) The PSDA is proposed to reduce the risk of model overfitting. We amplify the original data by 960 times based on pixel-level and spatial-level image augmentation.

(4) Our EfficientShip (including PSDA, DBOL, and CROC) achieves the best performance compared with 8 existing SOTA methods: 99.63% mAP at 45 fps.

2  Related Work

2.1 Ship Detection

Ship detection can be divided into SAR image-based [5,40] and optical image-based ship detection [2,3]. Here we focus on reviewing optical image-based ship detection. Traditional optical image-based methods use hand-crafted features and a sliding window to obtain candidate regions of the ship target based on a saliency map algorithm or a visual attention mechanism. The features of the candidate targets are then extracted and used to train the detection model [41,42].

Recently, deep learning-based ship detection has attracted researchers' attention. Shao et al. [3] introduced a saliency-aware CNN framework for ship detection. Based on YOLOv2, the ship's location and class under a complex environment were first inferred by the CNN and then refined through saliency detection. Sun et al. [32] presented an algorithm named NSD-SSD for real-time ship detection. They combined dilated convolution and multiscale features to improve performance in detecting small ship objects. To obtain the inference score of every class and the offset of every prior bounding box, they also designed a set of convolution filters at every salient feature layer. Finally, they reconstructed the prior boxes with K-means clustering to improve detection accuracy and efficiency.

Liu et al. [31] designed an enhanced CNN-enabled learning method to promote ship detection under different weather conditions. On the basis of YOLOv3, they devised new anchor box scales, bounding-box localization probabilities, soft non-maximum suppression, and a hybrid loss function to strengthen the learning and expression capacities of the CNN. They also introduced a flexible DA strategy that produces synthetically degraded pictures to enlarge the size and diversity of the original ship detection dataset. Considering the influence of meteorological factors on ship detection accuracy, Nie et al. [30] synthesized foggy and low-visibility pictures using separate physical models. They trained YOLOv3 on the expanded dataset, including both synthetic and original ship pictures, and showed that the trained model achieves excellent ship detection accuracy under a variety of weather conditions. For real-time ship detection, Li et al. [33] compressed the YOLOv3 network by learning predetermined anchors from the SeaShips annotations, replacing max-pooling layers with convolution layers, expanding the channels of the prediction network to improve the detection of tiny objects, and embedding the CBAM attention module into the backbone network to help the model focus on the object. Liu et al. [43] proposed two new anchor-setting methods, the average method and the select-all method, for detecting ship targets on the basis of YOLOv3. Additionally, they adopted the cross-PANet feature fusion structure to combine the different anchor-setting methods. Chen et al. [35] introduced AE-YOLOv3 for real-time end-to-end ship identification; it embeds a feature attention module in the feature extraction network and fuses features through a multiscale feature enhancement model.

Liu et al. [34] presented a method named RDSC on the basis of YOLOv4, reducing the weights by more than 40% compared with the original network. The improved lightweight algorithm achieves a smaller network volume and better real-time performance on ship detection. Zhang et al. [36] presented a lightweight CNN named Light-SDNet for detecting ships under various weather conditions. Based on YOLOv5, they introduced CA-Ghost, C3Ghost, and DWConv modules to decrease the number of model parameters. They also designed a hybrid training strategy that derives jointly degraded pictures to enlarge the original dataset. Zhou et al. [37] improved YOLOv5 for ship target detection and named it YOLO-Ship; it adopts MixConv to replace the classical convolution operation and adds a coordinated attention framework. At the decision stage, they employed Focal Loss and CIoU Loss to optimize the original cost functions.

To reach the goal of real-time application while retaining detection accuracy, most of the above algorithms choose a one-stage detector as the basis for improvement. Different from these methods, we present a real-time two-stage approach as the main ship detection framework and verify its accuracy and real-time performance through experiments.

2.2 Data Augmentation (DA)

Image data collection and labeling are very labor-intensive. Due to funding constraints, ship detection datasets usually have only thousands of annotated images [2], whereas a deep learning model has many parameters and requires tens of thousands of samples for training. When a deep convolutional neural network (CNN) learns a function that is highly correlated with the small training set, it generalizes poorly to the test set (overfitting). Data augmentation can simulate training image variations such as lighting changes, occlusion, scale and orientation variations, background clutter, and object deformation, so that the deep learning model becomes robust to these disturbances and overfitting on test data is reduced [44,45].

Image DA algorithms can be split into basic image manipulations and deep learning approaches [44]. Basic image manipulations change the original image pixels while the image label is preserved. They include geometric transformations, color space transformations, kernel filters, and random erasing. Geometric transformations shift the geometry of the image without altering its actual pixel values; simple examples are flipping, cropping, rotation, and translation. Color space transformations shift pixel values by a constant amount, isolate an RGB color channel, or limit pixel values to a range. Kernel-filter methods sharpen or blur the original image by sliding a filter matrix across the training image. Inspired by CNN dropout regularization, random erasing masks a training-image patch with the value 0, 255, or random numbers. Taylor et al. proved the effectiveness of geometric and color space transformations [46], while Zhong et al. verified the performance of random erasing through experiments [47]. Xu et al. presented a novel shadow enhancement named SBN-3D-SD for higher detection-tracking accuracy [48].
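As a concrete illustration of the basic manipulations above, the following is a minimal sketch of random erasing on a NumPy image array; the function name, patch-size range, and fill mode are illustrative assumptions rather than the settings used in [47].

```python
import numpy as np

def random_erase(img, min_frac=0.02, max_frac=0.2, fill="random", rng=None):
    """Mask a random rectangular patch of img (H x W x C, uint8) in place."""
    rng = rng or np.random.default_rng()
    h, w = img.shape[:2]
    # sample the patch area as a fraction of the image area (illustrative range)
    area = rng.uniform(min_frac, max_frac) * h * w
    ph = max(int(np.sqrt(area)), 1)
    pw = max(int(area / ph), 1)
    ph, pw = min(ph, h), min(pw, w)
    y = rng.integers(0, h - ph + 1)
    x = rng.integers(0, w - pw + 1)
    if fill == "random":
        img[y:y + ph, x:x + pw] = rng.integers(0, 256, (ph, pw, img.shape[2]), dtype=img.dtype)
    else:  # constant fill, e.g., 0 or 255
        img[y:y + ph, x:x + pw] = fill
    return img
```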

Deep learning-based augmentation adopts learned models to produce synthetic examples for training data. It can be divided into adversarial-training-based DA, GAN-based DA, neural-style-transfer-based DA, and meta-learning-based DA [44]. Adversarial-training-based DA generates adversarial samples and inserts them into the training set so that the inference model can learn from them during training [49]. A GAN is an unsupervised generative model that can generate synthetic data given a random noise vector; adding data generated by GAN-based DA to the training set can improve deep learning model parameters [50]. The idea of neural style transfer is to manipulate the feature representations across a CNN so that the image style can be shifted while its original content is retained. Meta-learning-based DA uses a pre-trained neural network to learn DA parameters from mixed images, neural style transfer, and geometric transformations. Images generated by deep learning-based augmentation are abstract and cannot provide precise target bounding boxes, so this family of methods is not suitable for ship detection.

3  Methodology

In this section, we describe the EfficientShip method for ship detection. It includes the proposed PSDA, DBOL, and CROC (as shown in Fig. 1).


Figure 1: The architecture of the proposed EfficientShip. PSDA is used for expanding the number of image samples; DBOL is responsible for detecting potential objects; CROC identifies the potential objects

3.1 Proposed PSDA

The ship detection dataset available for the current study is small. Therefore, we present a method named PSDA to counteract overfitting of the ship detection model. PSDA includes pixel-level DA (PDA), spatial-level DA (SDA), and their combination. PDA changes the content of the input image at the pixel level, while SDA performs geometric transformations on it.

Suppose the number of DA methods we use is $m_{da}$, and a training image is $x_{tr}(i) \in X_{tr}$, where $X_{tr}$ denotes the training set. Each DA method will generate $n_{da}$ new images (as shown in Fig. 2), so every image produces $m_{da} \times n_{da}$ new images. At the pixel level, we perform the following five DA methods on the training set $X_{tr}$.


Figure 2: Schematic of the proposed PSDA. (a) PDA expands the number of image samples at the pixel level; (b) SDA expands the number of image samples at the spatial level

(I) Image Blur

Applying an image blur algorithm to a raw image can generate $n_{da}$ images.

$$x_{tr\_p1}(i) = F_{IB}\left[x_{tr}(i)\right] = \left[x^{1}_{tr\_p1}(i), \ldots, x^{n_{da}}_{tr\_p1}(i)\right] \tag{1}$$

where $F_{IB}$ denotes a certain image blur function [51]. These functions include Gaussian blur, glass blur, median blur, motion blur, zoom blur, etc.

(II) Noise Injection

New $n_{da}$ images are generated by noise injection.

$$x_{tr\_p2}(i) = F_{NI}\left[x_{tr}(i)\right] = \left[x^{1}_{tr\_p2}(i), \ldots, x^{n_{da}}_{tr\_p2}(i)\right] \tag{2}$$

where $F_{NI}$ denotes a noise injection function [51]. Noise injection algorithms include Gaussian noise, ISO noise, multiplicative noise, etc.

(III) Color Jitter

Color jitter generates minor variations of the color values in the training image.

$$x_{tr\_p3}(i) = F_{CJ}\left[x_{tr}(i)\right] = \left[x^{1}_{tr\_p3}(i, h_{bf}),\ x^{1}_{tr\_p3}(i, h_{cf}),\ x^{1}_{tr\_p3}(i, h_{sf}),\ \ldots,\ x^{n_{da}}_{tr\_p3}(i, h_{sf})\right] \tag{3}$$

where $F_{CJ}$ denotes color jitter [51]. Color jitter can be operated in three aspects: brightness $h_{bf}$, contrast $h_{cf}$, and saturation $h_{sf}$.

(IV) Color Shift

Color shift is a color variation caused by different fade rates of dyes or an imbalance of dyes within a picture patch.

$$x_{tr\_p4}(i) = F_{CS}\left[x_{tr}(i)\right] = \left[x^{1}_{tr\_p4}(i, t_{rf}),\ x^{1}_{tr\_p4}(i, t_{bf}),\ x^{1}_{tr\_p4}(i, t_{gf}),\ \ldots,\ x^{n_{da}}_{tr\_p4}(i, t_{gf})\right] \tag{4}$$

where $F_{CS}$ denotes color shift [51]. Color shift can be operated on three channels: red $t_{rf}$, blue $t_{bf}$, and green $t_{gf}$.

(V) Random Generation

The random generation method creates new images by performing multiple operations on the original image pixels, such as brightness, contrast, gamma correction, curve, fog, rain, shadow, snow, sun flare, etc. Each training image in $X_{tr}$ is operated on $n_{da}$ times through a random generation operation $g_{op}$. The variation range of $g_{op}$ is $[-a_z, +a_z]$ and complies with the distribution $V$.

$$g_{op}^{i} \sim V\left[-MSR, +MSR\right] \tag{5}$$

where MSR is the maximum operation range [52]. Hence, we have

$$x_{tr\_p5}(i) = F_{RG}\left[x_{tr}(i)\right] = \left[x^{1}_{tr\_p5}(i, g_{op}^{1}),\ x^{2}_{tr\_p5}(i, g_{op}^{2}),\ \ldots,\ x^{n_{da}}_{tr\_p5}(i, g_{op}^{n_{da}})\right] \tag{6}$$

where $F_{RG}$ denotes random generation [45].
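To make the pixel-level operations above concrete, here is a minimal sketch using OpenCV and NumPy; the kernel size, noise variance, brightness factor, and channel offset are illustrative values, not the parameter settings used in our experiments.

```python
import cv2
import numpy as np

def pixel_level_augment(img, rng=None):
    """Return a few pixel-level variants of img (H x W x 3, uint8, BGR order)."""
    rng = rng or np.random.default_rng()
    variants = []
    # (I) image blur: Gaussian blur with an illustrative 5x5 kernel
    variants.append(cv2.GaussianBlur(img, (5, 5), sigmaX=1.5))
    # (II) noise injection: additive Gaussian noise
    noise = rng.normal(0.0, 10.0, img.shape)
    variants.append(np.clip(img.astype(np.float32) + noise, 0, 255).astype(np.uint8))
    # (III) color jitter: scale brightness by a random factor
    factor = rng.uniform(0.7, 1.3)
    variants.append(np.clip(img.astype(np.float32) * factor, 0, 255).astype(np.uint8))
    # (IV) color shift: add a constant offset to the red channel only (index 2 in BGR)
    shifted = img.copy()
    shifted[..., 2] = np.clip(shifted[..., 2].astype(np.int16) + 30, 0, 255).astype(np.uint8)
    variants.append(shifted)
    return variants
```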

At the spatial level, the transformation does not change the image content, but the object bounding box is transformed along with the image. The main transformations are:

(I) Image Affine

Image affine is a common geometric transformation that preserves collinearity between pixels. It includes translation, rotation, scaling, shear, and their combinations.

$$x_{tr\_s1}(i) = F_{IR}\left[x_{tr}(i)\right] = \left[x^{1}_{tr\_s1}(i, h_a),\ x^{2}_{tr\_s1}(i, h_a),\ \ldots,\ x^{n_{da}}_{tr\_s1}(i, h_a)\right] \tag{7}$$

where $F_{IR}$ denotes the image affine function and $h_a$ represents an operation of translation, rotation, scaling, or shear [45].

(II) Image Cropping

Image cropping can freely crop the input image to any size.

$$x_{tr\_s2}(i) = F_{IC}\left[x_{tr}(i)\right] = \left[x^{1}_{tr\_s2}(i),\ x^{2}_{tr\_s2}(i),\ \ldots,\ x^{n_{da}}_{tr\_s2}(i)\right] \tag{8}$$

where $F_{IC}$ denotes the image cropping function [52].

(III) Elastic Transform

Elastic transformation alters the silhouette of the input picture as if a force were applied within its elastic limit. It is controlled by the parameters of a Gaussian filter and an affine transform.

$$x_{tr\_s3}(i) = F_{ET}\left[x_{tr}(i)\right] = \left[x^{1}_{tr\_s3}(i),\ x^{2}_{tr\_s3}(i),\ \ldots,\ x^{n_{da}}_{tr\_s3}(i)\right] \tag{9}$$

where $F_{ET}$ denotes the elastic transform function [45].
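Because spatial-level DA moves pixels, the ground-truth bounding boxes must be moved consistently. Below is a minimal sketch of how a box in (x_min, y_min, x_max, y_max) format follows a horizontal flip and a uniform scaling; the function names and numbers are illustrative only, and clipping at image borders is ignored.

```python
def flip_box_horizontally(box, img_width):
    """Mirror a (x_min, y_min, x_max, y_max) box around the vertical image axis."""
    x_min, y_min, x_max, y_max = box
    return (img_width - x_max, y_min, img_width - x_min, y_max)

def scale_box(box, scale_x, scale_y):
    """Rescale a box when the image is resized by (scale_x, scale_y)."""
    x_min, y_min, x_max, y_max = box
    return (x_min * scale_x, y_min * scale_y, x_max * scale_x, y_max * scale_y)

# Example: a hypothetical ship box in a 1920x1080 frame, flipped and then halved in size
box = (600, 400, 900, 550)
flipped = flip_box_horizontally(box, img_width=1920)    # (1020, 400, 1320, 550)
resized = scale_box(flipped, scale_x=0.5, scale_y=0.5)  # (510.0, 200.0, 660.0, 275.0)
```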

Algorithm 1 shows the pseudocode of PSDA on one training image $x_{tr}(i)$.

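For concreteness, the following is a minimal sketch of how a PSDA-style pipeline could be assembled with the albumentations library [45]; the specific transforms, parameter ranges, and crop size are illustrative assumptions, not the exact settings of Algorithm 1 or of Tables 1 and 2.

```python
import albumentations as A

# Pixel-level transforms (PDA): change pixel values, leave boxes untouched.
pda = [
    A.GaussianBlur(blur_limit=(3, 7), p=1.0),
    A.GaussNoise(p=1.0),
    A.ColorJitter(brightness=0.3, contrast=0.3, saturation=0.3, p=1.0),
    A.RGBShift(r_shift_limit=30, g_shift_limit=30, b_shift_limit=30, p=1.0),
    A.RandomFog(p=1.0),
]

# Spatial-level transforms (SDA): move pixels, so boxes are transformed as well.
sda = [
    A.Affine(rotate=(-15, 15), translate_percent=0.1, scale=(0.8, 1.2), p=1.0),
    A.RandomCrop(height=960, width=1700, p=1.0),
    A.ElasticTransform(p=1.0),
]

def psda_expand(image, bboxes, labels, n_da=15):
    """Generate n_da variants per transform for one annotated image."""
    out = []
    for t in pda + sda:
        pipeline = A.Compose(
            [t],
            bbox_params=A.BboxParams(format="pascal_voc", label_fields=["class_labels"]),
        )
        for _ in range(n_da):
            aug = pipeline(image=image, bboxes=bboxes, class_labels=labels)
            out.append((aug["image"], aug["bboxes"], aug["class_labels"]))
    return out
```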

3.2 Proposed DLA-Backboned Object Location (DBOL)

The main task in the first step of two-stage object detection is to produce a number of candidate bounding boxes with different proportions and sizes according to characteristic features of the image such as texture, color, and other details. Some of the patches represented by these bounding boxes contain targets, while others contain only background.

As Fig. 1 illustrates, the first step of two-stage ship detection is to generate a set of $K$ ship detections as bounding boxes $b_1, \ldots, b_K$. We use $P(O_k)$ to indicate the likelihood of the object $O_k$ with an unknown category. We can get

$$P(O_k) = \begin{cases} 0, & \text{background} \\ 1, & \text{target waiting to be classified} \end{cases} \tag{10}$$

where $P(O_k) = 0$ indicates that the object $O_k$ is background, while $P(O_k) = 1$ implies that the content of the bounding box is a target waiting to be classified [29].

The network architecture of the proposed DBOL is shown in Fig. 3. We select a compact DLA [38] as the CNN backbone for inferring $P(O_k)$ in the first stage of real-time object detection. The compact DLA is built on the basis of ResNet-50 [39]. The CenterNet [13] method is used to find objects as keypoints and regress the bounding box parameters. The DLA-based feature pyramid generates feature maps from stride 8 to 128. A 4-level regression branch and a classification branch are applied to all feature-pyramid levels to generate a detection heatmap and a bounding box map. During training, annotations of the actual centers are assigned to the feature pyramid levels according to object scale. Locations within the 3×3 neighborhood of a center that yield a high-quality bounding box are added as positives. The distances to the box boundaries are used as the bounding box representation, and the gIoU cost is adopted for bounding box regression.
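As a reference for the regression cost mentioned above, here is a minimal PyTorch sketch of a generalized-IoU (gIoU) loss on axis-aligned boxes; it is a standard formulation written for illustration, not code taken from the DBOL implementation.

```python
import torch

def giou_loss(pred, target, eps=1e-7):
    """pred, target: (N, 4) boxes as (x1, y1, x2, y2). Returns mean 1 - gIoU."""
    # intersection
    x1 = torch.max(pred[:, 0], target[:, 0])
    y1 = torch.max(pred[:, 1], target[:, 1])
    x2 = torch.min(pred[:, 2], target[:, 2])
    y2 = torch.min(pred[:, 3], target[:, 3])
    inter = (x2 - x1).clamp(min=0) * (y2 - y1).clamp(min=0)
    # union
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    union = area_p + area_t - inter
    iou = inter / (union + eps)
    # smallest enclosing box
    ex1 = torch.min(pred[:, 0], target[:, 0]); ey1 = torch.min(pred[:, 1], target[:, 1])
    ex2 = torch.max(pred[:, 2], target[:, 2]); ey2 = torch.max(pred[:, 3], target[:, 3])
    enclose = (ex2 - ex1) * (ey2 - ey1)
    giou = iou - (enclose - union) / (enclose + eps)
    return (1.0 - giou).mean()
```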


Figure 3: The architecture of the proposed DBOL. “Conv*” is the convolution operation, “C3, C4, C5” denote the feature maps of the backbone network, “P3, P4, P5” are the feature levels used for the final prediction, “H*” is the network head, “B*” is the bounding box of proposals, and “C0” is the object classification

3.3 Proposed CascadeRCNN-Guided Object Classification (CROC)

For every ship target $k$, the class distribution is $d_k(c) = P(C_k = c)$ for class $c \in \mathcal{C} \cup \{\text{background}\}$, where $\mathcal{C}$ is the collection of all ship classes, and $P(C_k \,|\, O_k)$ designates the conditional categorical classification at the second detection stage. If $P(O_k) = 0$ holds, then $C_k = \text{background}$, which means $P(C_k = \text{background} \,|\, O_k = 0) = 1$.

The joint category distribution of the ship detection is

$$P(C_k) = \sum_{o} P(C_k \,|\, O_k = o)\, P(O_k = o) \tag{11}$$

where $o$ denotes an arbitrary object state in the image [29]. Maximum likelihood estimation is employed to train the detectors. For every labeled object, we maximize

$$\log P(C_k) = \log P(C_k \,|\, O_k = 1) + \log P(O_k = 1) \tag{12}$$

which decomposes into the maximum-likelihood objectives of the two stages, respectively [29]. The maximum-likelihood objective of the background class is

$$\log P(\text{background}) = \log\left(P(\text{background} \,|\, O_k = 1)\, P(O_k = 1) + P(O_k = 0)\right) \tag{13}$$
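At inference time, Eq. (11) amounts to multiplying the first-stage objectness by the second-stage conditional class scores for the foreground classes (the $O_k = 0$ term only contributes to background). A minimal PyTorch sketch of this combination, with the function name and tensor shapes assumed for illustration:

```python
import torch

def combine_two_stage_scores(obj_prob, cls_prob_given_obj):
    """
    obj_prob:            (K,)   first-stage P(O_k = 1) for K proposals (DBOL output)
    cls_prob_given_obj:  (K, C) second-stage P(C_k = c | O_k = 1) over C ship classes (CROC output)
    Returns (K, C) final detection scores P(C_k = c) following Eq. (11).
    """
    return cls_prob_given_obj * obj_prob.unsqueeze(1)

# Example with 3 proposals and the 6 SeaShips classes
obj_prob = torch.tensor([0.9, 0.4, 0.1])
cls_prob_given_obj = torch.softmax(torch.randn(3, 6), dim=1)
scores = combine_two_stage_scores(obj_prob, cls_prob_given_obj)
```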

The architecture of the proposed CROC is shown in Fig. 4. In this stage of detection, we select CascadeRCNN [27] to infer $P(C_k \,|\, O_k)$ on the basis of the $P(O_k)$ deduced in the first stage. At each cascade stage $t$, CascadeRCNN has a classifier $h_t$ optimized for an IoU threshold $u_t$ ($u_t > u_{t-1}$). It is learned by minimizing the loss

$$L(x^t, g) = L_{cls}\left(h_t(x^t), y^t\right) + \lambda\, [y^t \geq 1]\, L_{loc}\left(f_t(x^t, b^t), g\right) \tag{14}$$

where $b^t = f_{t-1}(x^{t-1}, b^{t-1})$, $g$ is the ground truth object for $x^t$, $\lambda = 1$ is the trade-off coefficient, $[\cdot]$ is the indicator function, and $y^t$ is the label of $x^t$ under the given $u_t$ [27].


Figure 4: The architecture of the proposed CROC. The feature map is generated from the DLA-34 backbone network, “H*” is the network head, “B*” is the bounding box of proposals, and “B0” is the bounding box of proposals produced in Fig. 3

Algorithm 2 shows the pseudocode of the CROC training process.

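To illustrate the training objective, the following is a minimal sketch of one cascade training step following Eq. (14), with three stages whose positive IoU thresholds rise from 0.6 to 0.8 as described in Section 4.2; the loss functions, the intermediate threshold, and the `heads`/`assign_fn` interfaces are illustrative assumptions rather than the exact CROC implementation.

```python
import torch
import torch.nn.functional as F

IOU_THRESHOLDS = [0.6, 0.7, 0.8]  # assumed per-stage positive IoU thresholds u_t

def stage_loss(cls_logits, labels, box_deltas, box_targets, lam=1.0):
    """Eq. (14): classification loss plus localization loss on foreground samples only."""
    cls_loss = F.cross_entropy(cls_logits, labels)
    fg = labels >= 1                       # indicator [y^t >= 1]; label 0 is background
    if fg.any():
        loc_loss = F.smooth_l1_loss(box_deltas[fg], box_targets[fg])
    else:
        loc_loss = box_deltas.sum() * 0.0  # keep the graph valid when no foreground exists
    return cls_loss + lam * loc_loss

def croc_training_step(proposals, feats, heads, assign_fn):
    """Run the cascade: each stage refines the boxes and raises the IoU threshold."""
    total, boxes = 0.0, proposals
    for head, u_t in zip(heads, IOU_THRESHOLDS):
        labels, box_targets = assign_fn(boxes, iou_threshold=u_t)  # label proposals at u_t
        cls_logits, box_deltas, boxes = head(feats, boxes)         # classify and refine b^t
        total = total + stage_loss(cls_logits, labels, box_deltas, box_targets)
    return total
```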

4  Experimental Result and Analysis

In this section, we evaluate the proposed EfficientShip on the SeaShips [2] dataset. The experiments use the PyTorch (1.11.0) library installed on Ubuntu 20.04. The model parameters are trained on an NVIDIA GeForce RTX 3090 GPU with 24 GB of memory, and the CPU is an Intel(R) Xeon(R) Platinum 8255C with 45 GB of RAM.

4.1 Dataset and Evaluation Metrics

The dataset we selected in this paper is SeaShips [2]. It has 7000 images and includes six categories: bulk cargo carrier, container ship, fishing boat, general cargo ship, ore carrier, and passenger ship. Fig. 5 shows the appearance of different ships in SeaShips. The image resolution is 1920 × 1080. All pictures in the dataset are selected from 5400 real-world video segments generated by 156 monitoring cameras in a coastline surveillance system, covering targets with different backgrounds, scales, hull parts, illumination, occlusions, and viewpoints. Following [35], we randomly divide the dataset into a training set and a test set with a 9:1 proportion for the experiments.


Figure 5: Illustration of different ship samples and their labels in the SeaShips dataset. (a) bulk cargo carrier; (b) container ship; (c) fishing boat; (d) general cargo ship; (e) ore carrier; (f) passenger ship

Experimental evaluation metrics include ship detection accuracy and runtime. Runtime is reported in fps, and detection accuracy is evaluated by the standard mAP, defined as

$$\mathrm{mAP} = \frac{\sum_{i=1}^{K} AP_i}{K} \tag{15}$$

where $K = 6$ is the number of ship categories in SeaShips.
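A minimal sketch of Eq. (15), assuming the per-class AP values have already been computed by the detection evaluator; the AP numbers in the example are hypothetical and are not results from Table 3 or Table 4.

```python
def mean_average_precision(ap_per_class):
    """Eq. (15): average the per-class APs over the K = 6 SeaShips categories."""
    return sum(ap_per_class.values()) / len(ap_per_class)

# Example with hypothetical per-class AP values
ap_per_class = {
    "bulk cargo carrier": 0.999, "container ship": 0.997, "fishing boat": 0.996,
    "general cargo ship": 0.998, "ore carrier": 0.995, "passenger ship": 0.992,
}
print(round(mean_average_precision(ap_per_class), 4))
```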

4.2 Parameter Setting

PSDA. For PDA, we select 33 augmentation methods (with 40 adjustable parameters in total) for every original training image. There are 15 parameter variations for each adjustable parameter; the settings are shown in Table 1. Thus, 600 new images can be generated from one raw image at this stage. Fig. 6 displays the augmentation results of RandomFog and ColorJitter (brightness). At the SDA stage, we choose 24 augmentation settings, which generate 24 × 15 = 360 new images with spatial variation. The spatial parameter settings are listed in Table 2, and images generated by Affine (rotate) and Resize are illustrated in Fig. 7. In total, we construct 960 new images for each original training image in SeaShips [2] through PSDA.



Figure 6: Illustration of pixel level DA. Upper: Augmentation with RandomFog; Under: Augmentation with ColorJitter(brightness)



Figure 7: Illustration of spatial level DA. Upper: Augmentation with Affine(rotate); Under: Augmentation with PixelDropout

DBOL & CROC. The DLA method [38] is selected as the backbone of the first ship detection stage. We extend DLA with a 4-layer BiFPN [53] with 160 feature channels and reduce the output FPN levels to 3 with strides 8-32. The model parameters of the first stage are trained with a long schedule that repeatedly fine-tunes the model. The number of object proposals is reduced to 128 in the target-detecting stage. For the second stage, the detection part of CascadeRCNN [27] is adopted to recognize the proposals. We raise the positive IoU threshold from 0.6 to 0.8 for CascadeRCNN to compensate for the change in the IoU distribution.
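The settings above can be summarized in a compact configuration. The sketch below uses plain Python with illustrative field names rather than the syntax of any particular detection framework, and the intermediate cascade threshold of 0.7 is an assumption.

```python
# Hypothetical summary of the EfficientShip training configuration described above.
efficientship_cfg = {
    "first_stage": {                       # DBOL
        "backbone": "DLA (ResNet-50 basis)",
        "neck": {"type": "BiFPN", "layers": 4, "channels": 160},
        "fpn_strides": [8, 16, 32],        # output FPN levels reduced to 3
        "num_proposals": 128,
        "box_regression_cost": "gIoU",
    },
    "second_stage": {                      # CROC
        "detector": "CascadeRCNN",
        "positive_iou_thresholds": [0.6, 0.7, 0.8],  # 0.7 is an assumed middle stage
    },
    "augmentation": {"pixel_variants": 600, "spatial_variants": 360},  # 960 per image
}
```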

4.3 Results and Analysis

(I) Ablation Study

We design different experiments on the modules of the proposed framework to evaluate their effectiveness. We first select EfficientShip without DA as a baseline. Then we add pixel-level and spatial-level DA separately on top of this ship detector. Finally, we test the whole hybrid ship detection framework, which includes all three steps. Details of the experimental results are presented in Table 3. We can observe that the baseline EfficientShip without DA yields the lowest mAP of 98.85%, and the baseline plus SDA gains a 0.43% boost. The baseline plus PDA yields a 0.62% improvement, which shows that PDA is more effective than SDA. The complete EfficientShip achieves a detection accuracy of 99.63%.


Fig. 8 shows the mAP comparison chart of the different modules. It also indicates the changes in detection accuracy among the various categories of the SeaShips dataset. Relatively, the bulk cargo carrier is the easiest object to recognize, while the passenger ship is the most difficult target to identify. After superimposing DA on the two-stage detector, the detection accuracy of each category gradually approaches 100%.


Figure 8: Comparison of AP curves of different modules: (a) EfficientShip (non-DA); (b) EfficientShip (PDA); (c) EfficientShip (SDA); (d) EfficientShip (PSDA)

(II) Comparison to State-of-the-Art Approaches

We compare the proposed approach with 8 SOTA methods [2,3,31-35,43] in terms of ship detection accuracy and efficiency, as shown in Table 4. The values for all SOTA algorithms are taken from their original papers. Although the speeds are not directly comparable because the algorithms run on different platforms, Table 4 shows that all methods meet the requirements of real-time application scenarios. Compared with the earliest sea ship detection algorithm [2], our method improves detection accuracy by 16.63%. The accuracy of the proposed algorithm is 99.63%, a 0.93% increase over the best-performing SOTA algorithm [35].


5  Conclusions

Different from traditional one-stage real-time ship detection methods, we fully utilize the latest real-time object detection algorithms to construct a novel two-stage ship detector named EfficientShip. It includes DBOL, CROC, and PSDA. The DBOL is responsible for producing high-quality bounding boxes of potential ships, and the CROC performs object recognition. We train the two stages jointly to maximize the log-likelihood of actual objects. We also designed the PSDA to further improve the accuracy of target detection. Experiments on the SeaShips dataset show that the proposed EfficientShip has the highest ship detection accuracy among SOTA methods while maintaining real-time performance. In the future, we will further verify the proposed algorithm on new, larger datasets such as LS-SSDD-v1.0 and Official-SSDD [54].

Acknowledgement: The authors wish to express their appreciation to the reviewers for their helpful suggestions which greatly improved the presentation of this paper.

Funding Statement: This work was supported by the Outstanding Youth Science and Technology Innovation Team Project of Colleges and Universities in Hubei Province (Grant No. T201923), Key Science and Technology Project of Jingmen (Grant Nos. 2021ZDYF024, 2022ZDYF019), LIAS Pioneering Partnerships Award, UK (Grant No. P202ED10), Data Science Enhancement Fund, UK (Grant No. P202RE237), and Cultivation Project of Jingchu University of Technology (Grant No. PY201904).

Author Contributions: The authors confirm contribution to the paper as follows: study conception and design: Huafeng Chen; data collection: Junxing Xue; analysis and interpretation of results: Huafeng Chen, Junxing Xue, Yudong Zhang; draft manuscript preparation: Huafeng Chen, Hanyun Wen, Yurong Hu, Yudong Zhang. All authors reviewed the results and approved the final version of the manuscript.

Availability of Data and Materials: The data can be downloaded from http://www.lmars.whu.edu.cn/prof_web/shaozhenfeng/datasets/SeaShips(7000).zip.

Conflicts of Interest: The authors declare that they have no conflicts of interest to report regarding the present study.

References

1. Zhang, T., Zhang, X. (2019). High-speed ship detection in SAR images based on a grid convolutional neural network. Remote Sensing, 11(10), 1206. [Google Scholar]

2. Shao, Z., Wu, W., Wang, Z., Du, W., Li, C. (2018). SeaShips: A large-scale precisely annotated dataset for ship detection. IEEE Transactions on Multimedia, 20(10), 2593–2604. [Google Scholar]

3. Shao, Z., Wang, L., Wang, Z., Du, W., Wu, W. (2019). Saliency-aware convolution neural network for ship detection in surveillance video. IEEE Transactions on Circuits and Systems for Video Technology, 30(3), 781–794. [Google Scholar]

4. Tutsoy, O. (2021). Pharmacological, non-pharmacological policies and mutation: An artificial intelligence based multi-dimensional policy making algorithm for controlling the casualties of the pandemic diseases. IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(12), 9477–9488. [Google Scholar]

5. Zhang, T., Zhang, X., Ke, X., Liu, C., Xu, X. et al. (2021). HOG-ShipCLSNet: A novel deep learning network with hog feature fusion for SAR ship classification. IEEE Transactions on Geoscience and Remote Sensing, 60, 1–22. [Google Scholar]

6. Dai, W., Mao, Y., Yuan, R., Liu, Y., Pu, X. et al. (2020). A novel detector based on convolution neural networks for multiscale SAR ship detection in complex background. Sensors, 20(9), 2547. [Google Scholar] [PubMed]

7. Cao, C., Wu, J., Zeng, X., Feng, Z., Wang, T. et al. (2020). Research on airplane and ship detection of aerial remote sensing images based on convolutional neural network. Sensors, 20(17), 4696. [Google Scholar] [PubMed]

8. Zou, Z., Shi, Z., Guo, Y., Ye, J. (2019). Object detection in 20 years: A survey. arXiv preprint arXiv:1905.05055. [Google Scholar]

9. Rao, Y., Mu, H., Yang, Z., Zheng, W., Wang, F. et al. (2022). B-PesNet: Smoothly propagating semantics for robust and reliable multi-scale object detection for secure systems. Computer Modeling in Engineering & Sciences, 132(3), 1039–1054. https://doi.org/10.32604/cmes.2022.020331 [Google Scholar] [CrossRef]

10. Soviany, P., Ionescu, R. T. (2018). Optimizing the trade-off between single-stage and two-stage deep object detectors using image difficulty prediction. 2018 20th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing (SYNASC), Piscataway, IEEE. [Google Scholar]

11. Lin, T. Y., Goyal, P., Girshick, R., He, K., Dollár, P. (2017). Focal loss for dense object detection. Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy. [Google Scholar]

12. Tian, Z., Shen, C., Chen, H., He, T. (2019). FCOS: Fully convolutional one-stage object detection. Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Korea. [Google Scholar]

13. Zhou, X., Wang, D., Krähenbühl, P. (2019). Objects as points. arXiv preprint arXiv:1904.07850. [Google Scholar]

14. Zhang, S., Chi, C., Yao, Y., Lei, Z., Li, S. Z. (2020). Bridging the gap between anchor-based and anchor-free detection via adaptive training sample selection. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, USA. [Google Scholar]

15. Kim, K., Lee, H. S. (2020). Probabilistic anchor assignment with IoU prediction for object detection. European Conference on Computer Vision, Glasgow, UK, Springer. [Google Scholar]

16. Qiu, H., Ma, Y., Li, Z., Liu, S., Sun, J. (2020). BorderDet: Border feature for dense object detection. European Conference on Computer Vision, Glasgow, UK, Springer. [Google Scholar]

17. Redmon, J., Divvala, S., Girshick, R., Farhadi, A. (2016). You only look once: Unified, real-time object detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, USA. [Google Scholar]

18. Redmon, J., Farhadi, A. (2017). YOLO9000: Better, faster, stronger. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, USA. [Google Scholar]

19. Redmon, J., Farhadi, A. (2018). YOLOv3: An incremental improvement. arXiv preprint arXiv:1804.02767. [Google Scholar]

20. Bochkovskiy, A., Wang, C. Y., Liao, H. Y. M. (2020). YOLOv4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934. [Google Scholar]

21. Glenn, J., Alex, S., Jirka, B., NanoCode012, Ayush, C. et al. (2021). ultralytics/yolov5: v5.0-yolov5-p6 1280 models. https://github.com/ultralytics/yolov5 [Google Scholar]

22. Girshick, R., Donahue, J., Darrell, T., Malik, J. (2014). Rich feature hierarchies for accurate object detection and semantic segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, USA. [Google Scholar]

23. He, K., Zhang, X., Ren, S., Sun, J. (2015). Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 37(9), 1904–1916. [Google Scholar] [PubMed]

24. Girshick, R. (2015). Fast R-CNN. Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile. [Google Scholar]

25. Ren, S., He, K., Girshick, R., Sun, J. (2015). Faster R-CNN: Towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems, 28, 1–9. [Google Scholar]

26. Lin, T.Y., Dollár, P., Girshick, R., He, K., Hariharan, B. et al. (2017). Feature pyramid networks for object detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, USA. [Google Scholar]

27. Cai, Z., Vasconcelos, N. (2018). Cascade R-CNN: Delving into high quality object detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, USA. [Google Scholar]

28. Lu, X., Li, B., Yue, Y., Li, Q., Yan, J. (2019). Grid R-CNN. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, USA. [Google Scholar]

29. Zhou, X., Koltun, V., Krähenbühl, P. (2021). Probabilistic two-stage detection. arXiv preprint arXiv:2103.07461. [Google Scholar]

30. Nie, X., Yang, M., Liu, R. W. (2019). Deep neural network-based robust ship detection under different weather conditions. 2019 IEEE Intelligent Transportation Systems Conference (ITSC), Auckland, New Zealand, IEEE. [Google Scholar]

31. Liu, R. W., Yuan, W., Chen, X., Lu, Y. (2021). An enhanced CNN-enabled learning method for promoting ship detection in maritime surveillance system. Ocean Engineering, 235, 109435. [Google Scholar]

32. Sun, J., Xu, Z., Liang, S. (2021). NSD-SSD: A novel real-time ship detector based on convolutional neural network in surveillance video. Computational Intelligence and Neuroscience, 2021, 1–16. [Google Scholar]

33. Li, H., Deng, L., Yang, C., Liu, J., Gu, Z. (2021). Enhanced YOLO v3 tiny network for real-time ship detection from visual image. IEEE Access, 9, 16692–16706. [Google Scholar]

34. Liu, T., Pang, B., Zhang, L., Yang, W., Sun, X. (2021). Sea surface object detection algorithm based on YOLO v4 fused with reverse depthwise separable convolution (RDSC) for USV. Journal of Marine Science and Engineering, 9(7), 753. [Google Scholar]

35. Chen, D., Sun, S., Lei, Z., Shao, H., Wang, Y. (2021). Ship target detection algorithm based on improved YOLOv3 for maritime image. Journal of Advanced Transportation, 2021, 1–11. [Google Scholar]

36. Zhang, M., Rong, X., Yu, X. (2022). Light-SDNet: A lightweight CNN architecture for ship detection. IEEE Access, 10, 86647–86662. [Google Scholar]

37. Zhou, S., Yin, J. (2022). YOLO-ship: A visible light ship detection method. 2022 2nd International Conference on Consumer Electronics and Computer Engineering (ICCECE), Guangzhou, China, IEEE. [Google Scholar]

38. Yu, F., Wang, D., Shelhamer, E., Darrell, T. (2018). Deep layer aggregation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, USA. [Google Scholar]

39. He, K., Zhang, X., Ren, S., Sun, J. (2016). Deep residual learning for image recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, USA. [Google Scholar]

40. Zhang, T., Zhang, X., Ke, X., Zhan, X., Shi, J. et al. (2020). LS-SSDD-v1.0: A deep learning dataset dedicated to small ship detection from large-scale sentinel-1 SAR images. Remote Sensing, 12(18), 2997. [Google Scholar]

41. Chen, Z., Li, B., Tian, L. F., Chao, D. (2017). Automatic detection and tracking of ship based on mean shift in corrected video sequences. 2017 2nd International Conference on Image, Vision and Computing (ICIVC), Chengdu, China, IEEE. [Google Scholar]

42. Zhang, Y., Li, Q. Z., Zang, F. N. (2017). Ship detection for visual maritime surveillance from non-stationary platforms. Ocean Engineering, 141, 53–63. [Google Scholar]

43. Liu, T., Pang, B., Ai, S., Sun, X. (2020). Study on visual detection algorithm of sea surface targets based on improved YOLOv3. Sensors, 20(24), 7263. [Google Scholar] [PubMed]

44. Shorten, C., Khoshgoftaar, T. M. (2019). A survey on image data augmentation for deep learning. Journal of Big Data, 6(1), 60. [Google Scholar]

45. Buslaev, A., Iglovikov, V. I., Khvedchenya, E., Parinov, A., Druzhinin, M. et al. (2020). Albumentations: Fast and flexible image augmentations. Information, 11(2), 125. [Google Scholar]

46. Taylor, L., Nitschke, G. (2018). Improving deep learning with generic data augmentation. 2018 IEEE Symposium Series on Computational Intelligence (SSCI), Bengaluru, India, IEEE. [Google Scholar]

47. Zhong, Z., Zheng, L., Kang, G., Li, S., Yang, Y. (2020). Random erasing data augmentation. Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34. New York, USA. [Google Scholar]

48. Xu, X., Zhang, X., Zhang, T., Yang, Z., Shi, J. et al. (2022). Shadow-background-noise 3D spatial decomposition using sparse low-rank gaussian properties for video-SAR moving target shadow enhancement. IEEE Geoscience and Remote Sensing Letters, 19, 1–5. [Google Scholar]

49. Madry, A., Makelov, A., Schmidt, L., Tsipras, D., Vladu, A. (2018). Towards deep learning models resistant to adversarial attacks. International Conference on Learning Representations. https://openreview.net/forum?id=rJzIBfZAb [Google Scholar]

50. Huang, S. W., Lin, C. T., Chen, S. P., Wu, Y. Y., Hsu, P. H. et al. (2018). AugGAN: Cross domain adaptation with GAN-based data augmentation. Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany. [Google Scholar]

51. Wang, S. H., Govindaraj, V. V., Górriz, J. M., Zhang, X., Zhang, Y. -D. (2021). COVID-19 classification by FGCNet with deep feature fusion from graph convolutional network and convolutional neural network. Information Fusion, 67, 208–229. [Google Scholar] [PubMed]

52. Zhang, Y., Zhang, X., Zhu, W. (2021). ANC: Attention network for COVID-19 explainable diagnosis based on convolutional block attention module. Computer Modeling in Engineering & Sciences, 127(3), 1037–1058. https://doi.org/10.32604/cmes.2021.015807 [Google Scholar] [CrossRef]

53. Tan, M., Pang, R., Le, Q. V. (2020). EfficientDet: Scalable and efficient object detection. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, USA. [Google Scholar]

54. Zhang, T., Zhang, X., Li, J., Xu, X., Wang, B. et al. (2021). SAR ship detection dataset (SSDD): Official release and comprehensive data analysis. Remote Sensing, 13(18), 3690. [Google Scholar]




This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.