Target Classification of Marine Debris Using Deep Learning

Marine debris is human-created waste dumped into the sea or ocean. It pollutes the aquatic environment and is therefore very dangerous for ocean species. Removing marine debris from the ocean is necessary to eliminate pollution and to protect aquatic life. A robust, automatic system is needed that detects plastic litter and other garbage in real time. In this study, we propose a deep learning based architecture for the detection and classification of marine debris. Histogram equalization combined with a median filter is used to enhance the contrast of images and to remove noise. Experiments are performed on the challenging Forward-Looking Sonar (FLS) Marine Debris Dataset, which includes ten different types of debris. The proposed system not only detects debris but also classifies it into ten classes. To overcome the challenge of data scarcity, Faster-RCNN with transfer learning from the ResNet-50 architecture is used. Faster-RCNN is a popular object detection architecture that combines a Region Proposal Network (RPN) and a detector in a single network. The proposed methodology significantly improves on the state-of-the-art results, achieving a recall of 96% and a mean bounding-box overlap of 3.78. Visual and qualitative assessment shows the effectiveness of the presented technique.


Introduction
As the human population grows, there is a significant need to increase the production of goods. For this purpose, various industries have been built to produce goods in bulk, but these industries pollute the environment in different ways. Many processed materials, such as heavy metals, plastics and rubbers, take many years to decompose completely. Man-made garbage is the key factor in environmental pollution, which is degrading the health of land and aquatic life. Nowadays, access to pure air and water is a challenge. Many efforts have been made to lessen the effects of pollution on humans and animals, such as recycling and reuse of products and anti-litter campaigns [1].
The amount of garbage is increasing dramatically around the globe, and this is a big threat to marine creatures. Proper management is required for the collection of waste material and litter. In the marine environment, sonar images are mostly used for underwater imagery. Sonar images are grey-tone images with no color information. The proposed study deals with the detection of debris in sonar images by specifying its precise location for removal. This study also covers the classification of debris so that it can be removed from the water effectively.

Literature Review
In this section, studies related to detection and recognition are reviewed. Many methods have been deployed for the detection of debris in underwater images [6][7][8]. These methods can be categorized into two classes: conventional machine learning based methods and deep learning based methods.

Traditional Machine Learning Techniques
There are many techniques for detecting objects in sonar images. One such technique is template matching, in which the template and query images are compared: the cross-correlation is computed [9] and the maximum correlation is used for matching [10].
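The template matching idea described above can be illustrated with a minimal pure-Python sketch. It assumes a normalized form of cross-correlation and tiny grayscale images stored as lists of lists; the function names `ncc` and `match_template` are hypothetical and are not taken from the cited works.

```python
import math

def ncc(patch, template):
    """Normalized cross-correlation of two equally sized 2-D lists."""
    a = [v for row in patch for v in row]
    b = [v for row in template for v in row]
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    den = math.sqrt(sum((x - mean_a) ** 2 for x in a) *
                    sum((y - mean_b) ** 2 for y in b))
    # A constant patch has zero variance; treat it as "no correlation".
    return num / den if den > 0 else 0.0

def match_template(image, template):
    """Slide the template over the image and keep the maximum correlation.

    Returns (best_score, (row, col)) of the top-left corner of the best match.
    """
    th, tw = len(template), len(template[0])
    best_score, best_pos = -2.0, (0, 0)
    for r in range(len(image) - th + 1):
        for c in range(len(image[0]) - tw + 1):
            patch = [row[c:c + tw] for row in image[r:r + th]]
            score = ncc(patch, template)
            if score > best_score:
                best_score, best_pos = score, (r, c)
    return best_score, best_pos

# Toy query image containing one exact copy of the template.
image = [
    [0, 0, 0, 0, 0],
    [0, 0, 9, 1, 0],
    [0, 0, 1, 9, 0],
    [0, 0, 0, 0, 0],
]
template = [[9, 1], [1, 9]]
score, location = match_template(image, template)
```

Normalizing by the patch and template variances makes the score insensitive to overall brightness, which is one reason this variant is often preferred over raw cross-correlation.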
Hurtós et al. [9] proposed a framework for chain link detection. In the first step, images were enhanced using the Fourier transform; pattern recognition was then achieved using a clustering method. This solution applies only to the inspection and cleaning of mooring chains using an autonomous underwater vehicle equipped with a forward-looking sonar. The authors applied the algorithm to three different FLS datasets, with detection accuracies of 84%, 92% and 62%, respectively.
The Haar-boosted cascade framework was first introduced in 2001 by Viola et al. [11]. It was then refined by different researchers and used in different applications, including underwater imagery [12,13]. It gained a lot of attention because of its high detection rate and speed.
For the semi-automatic recognition of marine debris on a seashore, a light detection and ranging (LIDAR) technique was proposed by Ge et al. in 2016 [14]. The technique is efficient and reduces laborious manual work. LIDAR is mainly used for the classification of marine debris, but only a few classes of debris were considered. The technique uses 3-dimensional models for detection in laser-scanned images, and a Support Vector Machine is used for classification.

Deep Learning Based Techniques
Traditional neural networks were used for debris detection but gave low detection rates. In [15], the author focuses on marine debris classification, using two stacked convolutional neural networks to classify images of size 96×96 and achieving an accuracy of 70.8%.
Valdenegro-Toro [16] proposed the use of Autonomous Underwater Vehicles to detect submerged marine debris from Forward-Looking Sonar (FLS) imagery. The object features were learned using a convolutional neural network. This work uses an in-house dataset of forward-looking sonar (non-color) images.
There is a rich literature on marine debris pollution in the environment. Laist [17] described the effects of plastic garbage on the marine environment. The report explains how plastic is broken down by sunlight. It also elaborates on the effects of ingesting these plastic particles on the digestive tracts of marine animals. It is also a major cause of death of micro-organisms. Discarded fishing nets made of polystyrene material can trap animals, causing them to drown or be preyed upon by predators [18][19][20].
Kylili et al. [21] proposed a neural network architecture with very small convolutional layers. The research addresses image classification of marine plastic debris, distinguishing between three classes of litter: plastic bottles, plastic buckets and plastic straws. It reported an accuracy of 86%.
Fulton et al. [22] developed a dataset of color images named J-EDI (JAMSTEC E-library of Deep-sea Images). This dataset includes images from three different oceans. Different deep neural networks have also been used for the detection of marine debris [23].
Only a limited number of studies have addressed marine debris detection and classification. Various traditional machine learning and deep learning based techniques have been used and gave competitive results, but on small datasets. Moreover, most of the methods are class-independent and some are class-agnostic. The performance of the class-agnostic methods is not up to the mark and can be improved. In addition, classification should be combined with localization and detection so that the system can be used for autonomous cleaning.

Methodology
This research proposes a methodology for the detection, localization and classification of debris. It also provides data cleaning to deal with poor-quality images.
Faster-RCNN is adapted to the problem under study. Faster-RCNN is a popular object detection architecture that combines a Region Proposal Network (RPN) and a detector in a single network. The RPN generates region proposals at lower cost than external proposal methods such as selective search and edge boxes. Fig. 3 explains the proposed methodology in detail.

Pre-processing
Images in the dataset contain noise caused by impurities in the water and by poor light propagation under water. A median filter and histogram equalization are applied to remove this noise and enhance contrast [24]. The pre-processing procedure is shown in Algorithm 1. The proposed pre-processing technique improves the performance of the localization and classification tasks.
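Since Algorithm 1 is not reproduced here, the following is only a minimal sketch of the two pre-processing operations named above. It assumes an 8-bit grayscale image stored as a list of lists, a 3×3 median window, global (not adaptive) histogram equalization, and the order median filter first, equalization second; all of these choices and the function names are assumptions for illustration.

```python
from statistics import median_low

def median_filter(img, k=3):
    """k x k median filter; at the borders only the in-image part of the window is used."""
    h, w = len(img), len(img[0])
    r = k // 2
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [img[j][i]
                      for j in range(max(0, y - r), min(h, y + r + 1))
                      for i in range(max(0, x - r), min(w, x + r + 1))]
            out[y][x] = median_low(window)  # always an actual pixel value
    return out

def equalize_histogram(img, levels=256):
    """Global histogram equalization via the cumulative distribution function."""
    n = len(img) * len(img[0])
    hist = [0] * levels
    for row in img:
        for v in row:
            hist[v] += 1
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)
    cdf_min = next(c for c in cdf if c > 0)
    if n == cdf_min:  # constant image: nothing to stretch
        return [row[:] for row in img]
    lut = [max(0, round((c - cdf_min) / (n - cdf_min) * (levels - 1))) for c in cdf]
    return [[lut[v] for v in row] for row in img]

def preprocess(img):
    """Denoise first, then stretch the contrast (assumed order)."""
    return equalize_histogram(median_filter(img))

# Low-contrast toy image with a single impulse-noise pixel (255).
noisy = [
    [100, 100, 100, 100],
    [100, 255, 104, 104],
    [108, 108, 104, 104],
    [108, 108, 110, 110],
]
filtered = median_filter(noisy)
enhanced = preprocess(noisy)
```

In the toy example the isolated bright pixel is removed by the median filter, and equalization then spreads the remaining narrow intensity range over the full 0 to 255 scale.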

Data Augmentation
Data augmentation is used to generate more data for training and hence improves performance [25]. The following transformations are applied to the training data to increase its size: flip by 90°, horizontal rotate, and vertical rotate. This increases the size of the training dataset three times, which helps in better training of the proposed model.
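For a detection task, each geometric transformation has to be applied to the bounding boxes as well as to the pixels. The sketch below illustrates this under stated assumptions: the three transformations are interpreted as a 90° clockwise rotation, a horizontal flip and a vertical flip, boxes are `(x_min, y_min, x_max, y_max)` in pixel coordinates, and images are lists of rows. The function names are hypothetical.

```python
def rot90_cw(img, box):
    """Rotate image and box by 90 degrees clockwise."""
    h = len(img)
    x0, y0, x1, y1 = box
    rotated = [list(col) for col in zip(*img[::-1])]
    # A point (x, y) moves to (h - y, x) after a clockwise rotation.
    return rotated, (h - y1, x0, h - y0, x1)

def hflip(img, box):
    """Mirror image and box left to right."""
    w = len(img[0])
    x0, y0, x1, y1 = box
    return [row[::-1] for row in img], (w - x1, y0, w - x0, y1)

def vflip(img, box):
    """Mirror image and box top to bottom."""
    h = len(img)
    x0, y0, x1, y1 = box
    return [row[:] for row in img[::-1]], (x0, h - y1, x1, h - y0)

def augment(img, box):
    """Return the three transformed (image, box) pairs for one labelled sample."""
    return [rot90_cw(img, box), hflip(img, box), vflip(img, box)]

# 2 x 3 toy image; the box covers exactly the pixel holding the value 3.
img = [[1, 2, 3],
       [4, 5, 6]]
box = (2, 0, 3, 1)
samples = augment(img, box)
```

Keeping the image transform and the box transform in the same function makes it harder for the two to drift apart, which is a common source of silently corrupted detection labels.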

Features Extraction and Classification
The proposed model divides the image into several regions, and features are extracted for each region. The extracted features from each region are then given to the Region Proposal Network (RPN). Faster R-CNN, presented in 2015, is used for classification [26]. It is the third revision of the R-CNN architecture. The original R-CNN uses selective search to find possible regions of interest, and a CNN is used to classify the regions [27]. In Fast R-CNN, a technique called Region of Interest (RoI) pooling is used to make the model fast [26]. Faster R-CNN uses a Region Proposal Network (RPN) in place of selective search.
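RoI pooling converts a region of arbitrary size on the feature map into a fixed-size grid so that every proposal can be fed to the same classifier. The following is a simplified sketch, assuming a single-channel feature map, integer RoI coordinates `(x0, y0, x1, y1)` with exclusive upper bounds, and max pooling inside each bin; `roi_max_pool` is a hypothetical name.

```python
import math

def roi_max_pool(feature_map, roi, out_h=2, out_w=2):
    """Max-pool the RoI into an out_h x out_w grid."""
    x0, y0, x1, y1 = roi
    roi_h, roi_w = y1 - y0, x1 - x0
    pooled = []
    for i in range(out_h):
        # floor/ceil bin edges guarantee that every bin holds at least one cell
        ys = y0 + math.floor(i * roi_h / out_h)
        ye = y0 + math.ceil((i + 1) * roi_h / out_h)
        row = []
        for j in range(out_w):
            xs = x0 + math.floor(j * roi_w / out_w)
            xe = x0 + math.ceil((j + 1) * roi_w / out_w)
            row.append(max(feature_map[y][x]
                           for y in range(ys, ye)
                           for x in range(xs, xe)))
        pooled.append(row)
    return pooled

feature_map = [
    [1, 2, 3, 4, 5, 6],
    [7, 8, 9, 1, 2, 3],
    [4, 5, 6, 7, 8, 9],
    [1, 3, 5, 7, 9, 2],
]
pooled = roi_max_pool(feature_map, roi=(1, 0, 5, 4))
```

Because the output size is fixed, the feature extractor runs once per image and all proposals share its result, which is where the speed-up over the original R-CNN comes from.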
The architecture of Faster RCNN and the proposed methodology are shown in Fig. 4. The input image is passed to a pre-trained model to extract features. Using transfer learning for feature extraction is common practice in computer vision tasks. It addresses the issue of data scarcity and improves the performance of the system.
After this, the RPN uses the extracted features to find regions that are likely to contain objects. One of the difficult tasks in deep learning is generating a variable number of bounding boxes. To solve this issue, anchor boxes are used. These anchors are placed uniformly across the image. Instead of detecting objects in the image directly, the problem is solved in two phases: first, the content of each bounding box is classified, and then the bounding box coordinates are adjusted.
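The uniform placement of anchors can be sketched as follows. The scales (128, 256, 512), the height-to-width ratios (0.5, 1, 2) and the feature stride of 16 are the defaults of the original Faster R-CNN design and are assumptions here, since this study does not list its values; the function names are hypothetical.

```python
import math

def base_anchor_sizes(scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """(width, height) for every scale/ratio pair; ratio = height / width, area = scale**2."""
    sizes = []
    for s in scales:
        for r in ratios:
            sizes.append((s / math.sqrt(r), s * math.sqrt(r)))
    return sizes

def anchors_for_feature_map(fm_h, fm_w, stride=16,
                            scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Center every base anchor on every feature-map cell.

    Returns boxes as (x_min, y_min, x_max, y_max) in input-image pixels.
    """
    sizes = base_anchor_sizes(scales, ratios)
    boxes = []
    for gy in range(fm_h):
        for gx in range(fm_w):
            cx = (gx + 0.5) * stride
            cy = (gy + 0.5) * stride
            for w, h in sizes:
                boxes.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return boxes

sizes = base_anchor_sizes()
anchors = anchors_for_feature_map(2, 3)  # tiny 2 x 3 feature map
```

Three scales times three ratios give nine anchors per feature-map location, so even this tiny feature map yields 54 candidate boxes; anchors that extend past the image border are typically ignored during training.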
The standard Faster-RCNN uses the VGG-16 architecture, but other architectures can also be used. In this study, we use the VGG-16 and ResNet-50 architectures, both pre-trained on ImageNet [28]. The advantage of ResNet-50 over VGG-16 is that it is deeper and has the capacity to learn more features.
When working on images, the aim is to find proposals (regions of interest). A proposal is a region of interest derived from fixed, predefined bounding boxes (anchors). These are placed at all image points at specified ratios to predict object locations. There are 9 anchor boxes for each point in the feature map, with aspect ratios 1:1, 1:2 and 2:1. Anchor boxes are shown in Fig. 5.
R-CNN is the final step in Faster-RCNN. The feature maps from the RPN are used for classification. The aim of the R-CNN stage is to classify the proposals into classes and adjust the bounding boxes. Faster RCNN is optimized using a multi-task loss function, which combines a classification loss and a bounding-box regression loss.
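$$L(\{p_i\}, \{t_i\}) = \frac{1}{N_{cls}} \sum_i L_{cls}(p_i, p_i^*) + \lambda \frac{1}{N_{reg}} \sum_i p_i^* L_{reg}(t_i, t_i^*)$$
where $p_i$ is the predicted probability of anchor $i$ as a class and $p_i^*$ is the ground truth of the anchor as an object. $t_i$ represents the parametrized coordinates and $t_i^*$ the ground-truth coordinates. $N_{cls}$ and $N_{reg}$ are the normalization terms, set by the mini-batch size and the number of anchor locations. $\lambda$ is a balancing parameter.
$L_{cls}$ is the log loss over two classes, which can be extended to the multi-class problem, and $L_{reg}$ is the smooth $L_1$ loss. The log loss of a class is represented as:
$$L_{cls}(p_i, p_i^*) = -\left[ p_i^* \log p_i + (1 - p_i^*) \log (1 - p_i) \right]$$
$L_{reg}$ is active only when the anchor contains an object, that is, when the ground truth $p_i^*$ is 1. The term $t_i$ is the output prediction of the regression layer and consists of four variables $[t_x, t_y, t_w, t_h]$. The regression target $t_i^*$ is calculated as:
$$t_x^* = \frac{x^* - x_a}{w_a}, \quad t_y^* = \frac{y^* - y_a}{h_a}, \quad t_w^* = \log \frac{w^*}{w_a}, \quad t_h^* = \log \frac{h^*}{h_a}$$
Here $x$, $y$, $w$ and $h$ correspond to the coordinates of the box center, width and height. $x_a$ and $x^*$ represent the coordinates of the anchor box and its corresponding ground truth, respectively (likewise for $y$, $w$ and $h$).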
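The loss terms and regression targets defined above can be written out as a short sketch. It assumes boxes in `(center_x, center_y, width, height)` form, a binary object/background label per anchor, and a smooth L1 threshold of 1; the function names and the `eps` clamp are illustrative assumptions.

```python
import math

def encode_box(gt, anchor):
    """Regression target t* of a ground-truth box relative to its anchor."""
    x, y, w, h = gt
    xa, ya, wa, ha = anchor
    return ((x - xa) / wa, (y - ya) / ha, math.log(w / wa), math.log(h / ha))

def smooth_l1(d):
    """Quadratic near zero, linear for large errors."""
    d = abs(d)
    return 0.5 * d * d if d < 1.0 else d - 0.5

def log_loss(p, p_star, eps=1e-12):
    """Binary log loss; p is clamped to avoid log(0)."""
    p = min(max(p, eps), 1.0 - eps)
    return -(p_star * math.log(p) + (1 - p_star) * math.log(1.0 - p))

def multi_task_loss(probs, labels, t_pred, t_star, n_cls, n_reg, lam=1.0):
    """Classification loss plus lambda-weighted box loss over positive anchors."""
    cls = sum(log_loss(p, l) for p, l in zip(probs, labels)) / n_cls
    reg = sum(l * sum(smooth_l1(a - b) for a, b in zip(tp, ts))
              for l, tp, ts in zip(labels, t_pred, t_star)) / n_reg
    return cls + lam * reg

anchor = (100.0, 100.0, 100.0, 100.0)
target = encode_box((110.0, 100.0, 200.0, 50.0), anchor)

# Two anchors: one positive with a perfect box prediction, one negative.
loss = multi_task_loss(probs=[0.5, 0.5], labels=[1, 0],
                       t_pred=[target, (0.0, 0.0, 0.0, 0.0)],
                       t_star=[target, (9.0, 9.0, 9.0, 9.0)],
                       n_cls=2, n_reg=2)
```

Multiplying the regression term by the label means the box error of the negative anchor contributes nothing, however far its (meaningless) target is from the prediction.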
At test time, the learned regression output $t_i$ is applied to its corresponding anchor box (that is predicted positive), and the $x$, $y$, $w$, $h$ parameters of the predicted object proposal bounding box are calculated by the following equations:
$$x = t_x w_a + x_a, \quad y = t_y h_a + y_a, \quad w = w_a e^{t_w}, \quad h = h_a e^{t_h}$$
For the assessment of the proposed methodology, we use recall, Mean IoU and accuracy. Mean IoU is used for localization. IoU measures the location similarity of two regions. It is defined as:
$$\mathrm{IoU}(A, B) = \frac{|A \cap B|}{|A \cup B|}$$
where A and B are two bounding boxes; A is the actual bounding box and B is the predicted one. Accuracy measures how correctly the predictions are identified:
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
where TP is true positive, TN is true negative, FP is false positive and FN is false negative.
[Table fragment, last rows only: Standing-bottle, 65; 9, Tire, 147; 10, Valve, 202]
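The test-time box decoding and the two evaluation measures can be sketched in a few lines. The sketch assumes anchors and decoded boxes in `(center_x, center_y, width, height)` form and corner-format boxes `(x_min, y_min, x_max, y_max)` for IoU; the function names are hypothetical.

```python
import math

def decode_box(t, anchor):
    """Apply a regression output t = (tx, ty, tw, th) to its anchor."""
    tx, ty, tw, th = t
    xa, ya, wa, ha = anchor
    return (tx * wa + xa, ty * ha + ya, wa * math.exp(tw), ha * math.exp(th))

def to_corners(box):
    """Convert (cx, cy, w, h) to (x_min, y_min, x_max, y_max)."""
    cx, cy, w, h = box
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)

def iou(a, b):
    """Intersection over union of two corner-format boxes, a value in [0, 1]."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0

anchor = (50.0, 60.0, 20.0, 10.0)
predicted = to_corners(decode_box((0.0, 0.0, 0.0, 0.0), anchor))
overlap = iou((0.0, 0.0, 2.0, 2.0), (1.0, 1.0, 3.0, 3.0))
```

A zero regression output leaves the anchor unchanged, which is a convenient sanity check when wiring the decoder to a trained model; Mean IoU is then the average of `iou` over all matched prediction and ground-truth pairs.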
We conducted experiments comparing the proposed technique with a baseline FCN architecture. We used the VGG16 and ResNet architectures, pre-trained on the ImageNet dataset. The baseline method is a class-agnostic object detector. Tab. 3 compares the results of the baseline approach with Faster RCNN. The baseline architecture is computationally expensive because features are not shared across neural network evaluations, and a simple threshold on objectness values might not generalize well across environments. Faster-RCNN, in contrast, uses an RPN, which follows a divide-and-conquer strategy: the regions are divided into multiple proposals, features are extracted, and the object is detected along with its bounding box. The RPN adjusts the bounding region to the predicted objects because it is class-aware, whereas the baseline architecture fails to do so. With augmentation and pre-processing, Faster-RCNN with ResNet achieves a class-wise accuracy of 95% and a mean IoU of 3.74. The proposed approach outperforms the baseline technique in terms of Mean IoU and class-wise accuracy.
Tab. 4 shows the experimental results before and after applying the pre-processing techniques. Some regions in the debris sonar images are unclear due to noise and distortion. Low light is the main cause of image noise. Applying the pre-processing technique improves the contrast of the sonar scans. After applying the proposed pre-processing mechanism, Mean IoU increases from 3.2 to 3.78 and accuracy improves from 89% to 95%. Fig. 6 shows the learning curves of total loss, Mean IoU and class-wise accuracy using VGG16 with the Faster-RCNN architecture. The learning curves for the ResNet architecture are shown in Fig. 7.

Conclusion and Future Work
A methodology for the detection of marine debris in images is proposed using Faster-RCNN with ResNet-50. An effective pre-processing methodology is also proposed to enhance performance. Experimental results show an improvement after applying the pre-processing technique.
We performed experiments on the challenging task of marine debris detection. To overcome the challenges of data scarcity and debris classification, the Faster RCNN architecture with transfer learning of baseline [...] Microplastics are also a big threat to marine wildlife; this can be addressed in future research. Ensemble techniques may also be explored in future work to improve performance.
Funding Statement: The authors received no specific funding for this study.
Conflicts of Interest: The authors declare that they have no conflicts of interest to report regarding the present study.