Computers, Materials & Continua
Exploiting Deep Learning Techniques for Colon Polyp Segmentation
1Department of Computer Science and Engineering, University of Louisville, Louisville, KY, USA
2Centro de Computacion Cientifica Apolo at Universidad EAFIT, Medellín, Colombia
3eVida Research Group, University of Deusto, Bilbao, Spain
*Corresponding Author: Daniel Sierra-Sosa. Email: firstname.lastname@example.org
Received: 13 August 2020; Accepted: 10 November 2020
Abstract: As colon cancer is among the top causes of death, there is growing interest in developing improved techniques for the early detection of colon polyps. Given the close relation between colon polyps and colon cancer, their detection helps avoid cancer cases. The increased availability of colorectal screening tests and the growing number of colonoscopies have increased the burden on medical personnel. In this article, the application of deep learning techniques for the detection and segmentation of colon polyps in colonoscopies is presented. Four techniques were implemented and evaluated: Mask R-CNN, PANet, Cascade R-CNN and Hybrid Task Cascade (HTC). These were trained and tested using the CVC-Colon database, the ETIS-LARIB Polyp database, and a proprietary dataset. Three experiments were conducted to assess the techniques' performance: 1) training and testing using each database independently, 2) merging the databases for training and testing on each database independently and on a merged test set, and 3) training on each dataset and testing on the merged test set. In our experiments, the PANet architecture had the best performance in polyp detection, and HTC was the most accurate in segmenting them. This approach allows us to employ deep learning techniques to assist healthcare professionals in the medical diagnosis of colon cancer. It is anticipated that this approach can be part of a framework for semi-automated polyp detection in colonoscopies.
Keywords: Colon polyps; deep learning; image segmentation
Colorectal cancer is a disease that frequently goes undiagnosed until it is too late for timely treatment. In its early stages there are several differential diagnoses that must be ruled out, and this issue leads to high mortality rates. There has been a steady increase in the incidence of colon cancer in recent decades, which has led to a growing number of medical tests, colonoscopy being the standard one. This has steadily increased the workload of medical personnel, and specialists often find it hard to keep up. This type of cancer is the third most common cancer in men and the second most common in women. In 2018, there were 1.8 million new cases and 881,000 deaths globally according to the American Cancer Society. The incidence rates are higher in men than in women, at 10.9% and 8.4% respectively. The highest rates are in Australia, New Zealand, Europe and North America, and the lowest are in Africa and South-Central Asia [3–5].
The relative survival rate of patients decreases from 83.4% one year after diagnosis to 64.9% after five years, and it continues to decrease to 58.3% ten years after diagnosis. When colorectal cancer is detected in time, the five-year relative survival rate increases to 90.3%. Nonetheless, if it spreads regionally this rate is reduced to 70.4%, and in metastatic cases the five-year survival rate is just 12.5%. Historically, Germany and Japan have had a low incidence of colorectal cancer [4–7]. Fig. 1 depicts the worldwide incidence of the different types of cancer: colon and rectal cancers added up to 1,849,518 cases in 2018 and caused 861,663 deaths. Fig. 2 presents the most common causes of cancer death; combined, colon and rectal cancers constitute the second leading cause of death.
The increase in incidence is probably correlated with poor eating habits, obesity and smoking. Fig. 3 shows the estimated incidence of colorectal cancer by country. Due to a sedentary lifestyle and unhealthy habits, China is in first place on the list, followed by the United States with 48 cases per 100,000 inhabitants; in this case the incidence is attributed to obesity together with tobacco and alcohol consumption.
Colon cancer diagnosis is more frequent in men than in women between the ages of 50 and 65. Since there is no early diagnosis, one out of every four cases will develop metastasis [9,10]. In Japan, 86% of individuals diagnosed under the age of 50 were symptomatic at the time of diagnosis, which is directly related to advanced stages and a worse prognosis [10,11]. On the other hand, France has a low estimated incidence, probably due to its preventive policy in public primary care, which includes intestinal cancer testing by colonoscopy, fecal occult blood tests and immunological screening.
The proprietary dataset used in this paper comes from Bilbao, in the Basque Country, Spain. Therefore, the detailed incidence of cancer cases in Spain is presented and summarized in Fig. 4. There were 567,463 new cases detected in 2018, meaning 67 cases per 100,000 inhabitants, and 263,895 deaths due to colorectal cancer, meaning 31 deaths per 100,000 inhabitants. By 2035 it is estimated that there will be 315,413 deaths because of cancer. In particular, there were .
Colorectal cancer consists of the appearance of neoplasms or polyps, which originate when healthy cells from the inner lining of the colon or rectum change and grow uncontrollably, forming a mass called adenocarcinoma. Most colorectal cancer cases are preceded by diseases such as intestinal polyposis, Peutz–Jegher disease, Lynch syndrome, and inflammatory bowel disease. Polyps are defined as inflammations of the gastrointestinal wall. When diagnosing colorectal cancer, the spread of the disease is measured using five stages, which could also be identified using a Deep Learning approach. In Stage 0, treatment is usually conducted by removing the polyp through colonoscopy, as the cancer is confined to the inner lining of the colon. In Stage I, the cancer has not grown outside the colon wall; if the polyp is removed completely, no other treatment is needed. In Stage II, the cancer grows through the colon wall and surgery is needed to remove the affected section of the colon; at this stage some doctors may recommend chemotherapy as an additional treatment. In Stages III and IV chemotherapy is needed. In Stage III a colectomy is required to remove the cancerous areas and nearby lymph nodes, and in Stage IV, due to metastases (often to the liver), surgery is unlikely unless it helps to prolong the patient's life. Early detection in Stage 0 through Stage II is important, as cancers can be cured at a rate of 80–90% [15,17]. There are studies explaining that a 1% increase in the polyp detection rate is associated with a 3% decrease in the incidence of interval colorectal cancer.
This article presents the application of different Deep Learning architectures for the automatic detection and segmentation of colon polyps. These techniques are based on images acquired during colonoscopies and allow the early detection of cancer risk. The proposed algorithms have been tested against standard databases, which enables comparison with other published works, and against a new database created in the Basque Country in Spain. This evaluation constitutes a fundamental step in the usage of these techniques for a semi-automatic polyp detection framework.
Early detection of colon polyps provides valuable information to assess the risk of developing cancer. This fact has motivated several studies in automatic polyp detection, leading to diagnosis recommendation systems to assist healthcare professionals. We implemented four deep learning techniques to detect and segment polyps in colonoscopy images, and used the CVC-CLINIC and ETIS-LARIB databases to compare our results with state-of-the-art techniques. All the models were trained using two Nvidia GeForce GTX Titan X GPUs.
2.1 Proposed Method
To achieve automatic detection of polyps, we compare some of the most recent Convolutional Neural Network (CNN) architectures for instance segmentation. Four architectures proposed during the last few years were selected based on their performance on the Microsoft COCO dataset. All predictions are filtered based on their confidence and merged to generate a single binary mask.
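The filter-and-merge step described above can be illustrated with a minimal sketch in plain Python. It assumes each model returns one binary mask and one confidence score per detected instance. The function name `merge_masks` and the default threshold of 0.5 are illustrative assumptions, since the text does not state the confidence threshold that was used.

```python
from typing import List, Tuple

Mask = List[List[int]]  # 2-D grid of 0/1 values, one entry per pixel


def merge_masks(masks: List[Mask], scores: List[float],
                shape: Tuple[int, int], threshold: float = 0.5) -> Mask:
    """Union of all instance masks whose confidence reaches `threshold`.

    `threshold` is a hypothetical default, not a value taken from the paper.
    """
    height, width = shape
    merged = [[0] * width for _ in range(height)]
    for mask, score in zip(masks, scores):
        if score < threshold:  # discard low-confidence detections
            continue
        for r in range(height):
            for c in range(width):
                if mask[r][c]:
                    merged[r][c] = 1  # logical OR across the kept instances
    return merged
```

If no detection passes the threshold, the result is an all-zero mask, which corresponds to the empty binary mask case.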
2.1.1 Mask R-CNN
This technique is an extension of Faster R-CNN that adds a branch to predict segmentation masks and refines the Region of Interest (RoI) pooling. A Convolutional Neural Network (CNN) is employed to extract image features, and from those features another CNN proposes RoIs. This information is then fed to fully connected layers to determine the bounding boxes of the required elements. Mask R-CNN adds a branch with two extra convolution layers for predicting the actual segmentation masks from each of the RoIs. Additionally, in the Mask R-CNN proposal the authors refined the RoI pooling, making every target cell the same size and calculating the feature maps within them by interpolation, which improves the accuracy significantly. The neural network flow is presented in Fig. 5. ResNet-101 with ImageNet pre-trained weights is used as the backbone for feature map extraction over the polyp images. These feature maps are then aligned with the RoIs and fed into fully connected (FC) layers and into additional convolutional layers: the FC layers perform bounding box prediction and classification, and the convolutional branch performs segmentation mask prediction. The neural network architecture used in the convolutional layers has dimensions, and a ReLU activation function was used in the hidden layers. In the presented experiments we used a learning rate of 0.001, a learning momentum of 0.9 and a weight decay of 10^-4.
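The interpolation used by the refined RoI pooling can be illustrated with bilinear sampling of a feature map at a non-integer location. The following is a minimal sketch in plain Python. It is not the authors' implementation, and the function name `bilinear_sample` is hypothetical.

```python
import math
from typing import List


def bilinear_sample(feature: List[List[float]], y: float, x: float) -> float:
    """Interpolate a 2-D feature map at the real-valued location (y, x)."""
    height, width = len(feature), len(feature[0])
    # Clamp the sampling point to the valid extent of the map.
    y = min(max(y, 0.0), height - 1)
    x = min(max(x, 0.0), width - 1)
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, height - 1), min(x0 + 1, width - 1)
    dy, dx = y - y0, x - x0
    # Interpolate along x on the two neighbouring rows, then along y.
    top = feature[y0][x0] * (1 - dx) + feature[y0][x1] * dx
    bottom = feature[y1][x0] * (1 - dx) + feature[y1][x1] * dx
    return top * (1 - dy) + bottom * dy
```

Sampling at real-valued coordinates avoids rounding RoI boundaries to the feature grid, which is the source of the misalignment that the refined pooling is meant to remove.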
2.1.2 PANet
In the previous architecture, the authors explore the usage of a Feature Pyramid Network (FPN) on the backbone of the Mask R-CNN network. In their experiments, they found a noticeable improvement in their metrics over other architectures. In PANet, the authors improve on this architecture by enhancing information propagation between low-level and high-level features. To achieve this, they propose a bottom-up augmentation path to propagate low-level features. At each stage of this process, the feature map of the previous stage passes through a convolutional layer and is added to the current one using a lateral connection. As in Mask R-CNN, these maps pass through a RoIAlign layer in order to pool feature grids from each level. They are then combined using a fusion operation such as element-wise max or element-wise sum in what is called an adaptive feature pooling layer. This architecture is described in Fig. 6. Finally, the authors of this method improved mask prediction by adding a fully connected layer whose output is combined with that of the final convolutional layer to generate the final mask.
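The fusion operation in adaptive feature pooling can be sketched as follows. The sketch assumes that one feature grid of identical size has already been pooled from each pyramid level for a given RoI. The function name `fuse_levels` is hypothetical, and the code only illustrates the element-wise max and sum operations named in the text.

```python
from typing import List

Grid = List[List[float]]  # pooled feature grid for one RoI at one level


def fuse_levels(grids: List[Grid], op: str = "max") -> Grid:
    """Fuse per-level RoI grids element by element with max or sum."""
    if op not in ("max", "sum"):
        raise ValueError("op must be 'max' or 'sum'")
    reduce_fn = max if op == "max" else sum
    height, width = len(grids[0]), len(grids[0][0])
    return [
        [reduce_fn([grid[r][c] for grid in grids]) for c in range(width)]
        for r in range(height)
    ]
```

With element-wise max, each output cell keeps the strongest response across levels, while element-wise sum accumulates evidence from all of them.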
ResNext-101 pre-trained with ImageNet was used as the feature extractor for this architecture. We employ Stochastic Gradient Descent (SGD) as the optimizer with a momentum of 0.9 and a weight decay of 10^-4. We use a learning rate of 0.01 with 500 iterations of gradual warm-up. In order to fit batches of 8 images in memory, we rescale the images to pixels, except the ones from ClinicDB, which have a lower resolution. We trained the neural network for 20 iterations and selected the optimal epoch in order to avoid overfitting the datasets.
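A gradual warm-up of the learning rate over the first 500 iterations could look like the sketch below. The text only states the base learning rate (0.01) and the warm-up length, so the linear ramp and the starting factor of 1/3 are assumptions, and the function name `warmup_lr` is hypothetical.

```python
def warmup_lr(iteration: int, base_lr: float = 0.01,
              warmup_iters: int = 500, start_factor: float = 1.0 / 3.0) -> float:
    """Learning rate at `iteration` under a linear warm-up.

    `start_factor` and the linear shape are assumptions, not paper values.
    """
    if iteration >= warmup_iters:
        return base_lr  # warm-up finished: use the base learning rate
    alpha = iteration / warmup_iters  # progress through warm-up, in [0, 1)
    factor = start_factor * (1.0 - alpha) + alpha
    return base_lr * factor
```

Starting from a reduced learning rate limits the size of the first updates, when the randomly initialized heads still produce large gradients on top of the pre-trained backbone.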
2.1.3 Cascade R-CNN
In the original work, the authors presented a method for training and evaluating models based on the Faster R-CNN framework, which uses an IoU threshold of 0.5 to filter proposed bounding boxes, limiting the performance of deep learning algorithms. This can be attributed to the lack of incentive for the model to predict more accurate bounding boxes, and to the fact that using a higher IoU threshold makes it harder for the model to obtain initial results over which to improve. As presented in Fig. 7, to solve this problem the authors propose a modification to the Faster R-CNN framework that includes a multi-stage extension of the original architecture. They used a combination of cascaded bounding box regression and cascaded detection. With this technique, the model is able to progressively refine its predictions, sampling the training data with increasing IoU thresholds at each stage. This allows the model to handle different training distributions.
The architecture selected as the network backbone is ResNext-101 pre-trained with the ImageNet database. For the experiments we employed SGD as the optimizer with a momentum of 0.9 and a weight decay of 10^-4, and again we rescale the images to pixels, except the ones from ClinicDB, in order to fit them in memory for a batch size of 8. We use a learning rate of 0.005 with a warm-up of 500 iterations. We trained the neural network for 20 iterations and selected the optimal epoch in order to avoid overfitting the datasets.
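The effect of sampling with increasing IoU thresholds can be illustrated with a small sketch that marks which proposals count as positives at each stage. The thresholds (0.5, 0.6, 0.7) are illustrative defaults, since the text only states that the thresholds increase from stage to stage. The helper names `box_iou` and `positives_per_stage` are hypothetical, and the bounding box regression between stages is omitted.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def box_iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def positives_per_stage(proposals: List[Box], ground_truth: Box,
                        thresholds: Sequence[float] = (0.5, 0.6, 0.7)
                        ) -> List[List[bool]]:
    """For each stage threshold, flag the proposals treated as positives."""
    return [[box_iou(p, ground_truth) >= t for p in proposals]
            for t in thresholds]
```

In the real architecture each stage also regresses the boxes before the next stage samples them, so later stages see proposals with higher IoU than this static illustration suggests.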
2.1.4 Hybrid Task Cascade (HTC)
The advantages of cascade models for image segmentation were explored in previous work, which resulted in Cascade Mask R-CNN, but the authors of HTC argue that this is not an optimal way of leveraging the improvements that a cascade model can provide. To improve the Cascade Mask R-CNN results, the authors propose a new cascade architecture (Fig. 8) that interleaves the bounding box and mask prediction branches, so that the latter can take advantage of the updated bounding box predictions. Another addition is the inclusion of a semantic segmentation branch. This branch, connected to the output of the Feature Pyramid, is used as a complementary task that improves performance when its features are fused with the bounding box and mask features during training.
The architecture selected as the network backbone is ResNext-101 pre-trained with the ImageNet database. For the experiments we employ SGD with a momentum of 0.9 and a weight decay of 10^-4, and we rescale the images to pixels, except the ones from ClinicDB, in order to fit them in memory for a batch size of 8. We use a learning rate of 0.005 with a warm-up of 500 iterations. We trained the neural network for 20 iterations and selected the optimal epoch in order to avoid overfitting the datasets.
Colonoscopy is the reference method for the diagnosis and treatment of colonic diseases. It is an exploratory technique that allows the assessment of the colon wall through endoscopic examination. The lesions detected are assessed for removal, and biopsies are taken for analysis. One of the main problems arising in the colon is polyps, which are abnormal tissue growths that appear in the intestinal mucous membrane. They normally occur in between 15% and 20% of the adult population, being one of the most common problems affecting the colon and rectum. Even though most polyps are benign, their association with colorectal cancer, which develops via the adenoma-carcinoma sequence, has been proven. As can be seen in Fig. 9, in order to detect polyps in colonoscopy screening three challenges should be addressed:
• There is a variety of noise in the images, known as artifacts, such as specular highlights, lens frames and the effects of inadequate preparation for the procedure.
• Polyps have a number of shapes and textures, and they can vary in size from 3 mm to 10 mm.
• There are transformations and distortions introduced by the employed imaging system.
Several computer-aided techniques have been proposed to detect polyps in colonoscopies. We used two public databases and one proprietary database to detect and segment colon polyps. All our models were trained and validated using these three databases: the CVC-ClinicDB database and the ETIS-LARIB Polyp database from the 2015 MICCAI sub-challenge on automatic polyp detection, and one proprietary database from the Deusto University e-Vida research group. Tab. 1 contains the number of images, the image size and the train and validation subset sizes. In the first column, the combination of the three databases is called "all" and contains images with different sizes.
We conducted three sets of experiments in order to evaluate the performance of the proposed techniques when tested with different databases. The experimental setup is presented in Fig. 10. In the first experiment we compared the results when training the model on each of the databases independently and testing on independent evaluation sets. In the second experiment we used the training sets from the three databases for training and tested on the testing set of each database independently, adding one testing set formed by combining the three databases. Finally, in the third experiment we trained on each database independently and tested on an expanded testing set formed by combining the three databases. In each of the experiments we employed an 80/20 training/test ratio.
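The three experimental setups can be summarized with a short sketch that builds the train/test pairings from per-database 80/20 splits. The random shuffling, the fixed seed and the names `split_80_20` and `build_experiments` are assumptions made for illustration. The text does not describe how the images were assigned to each subset.

```python
import random
from typing import Dict, List, Tuple

Run = Tuple[str, List[str], str, List[str]]  # (train name, train set, test name, test set)


def split_80_20(items: List[str], seed: int = 0) -> Tuple[List[str], List[str]]:
    """Shuffle with a fixed seed and split 80/20 into train and test."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(0.8 * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def build_experiments(datasets: Dict[str, List[str]]) -> Dict[int, List[Run]]:
    """Return the train/test pairings for experiments 1, 2 and 3."""
    splits = {name: split_80_20(images) for name, images in datasets.items()}
    merged_train = [img for train, _ in splits.values() for img in train]
    merged_test = [img for _, test in splits.values() for img in test]
    # Experiment 1: train and test on each database independently.
    exp1 = [(n, tr, n, te) for n, (tr, te) in splits.items()]
    # Experiment 2: train on the merged set, test on each database and on all.
    exp2 = [("all", merged_train, n, te) for n, (_, te) in splits.items()]
    exp2.append(("all", merged_train, "all", merged_test))
    # Experiment 3: train on each database, test on the merged test set.
    exp3 = [(n, tr, "all", merged_test) for n, (tr, _) in splits.items()]
    return {1: exp1, 2: exp2, 3: exp3}
```

Splitting each database before merging keeps every test image out of all training sets, including the merged one.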
To measure segmentation performance, we compare the binary mask generated by our model with the ground truth pixel by pixel. The metrics used for testing the segmentation performance are presented in Tab. 2. These metrics are defined in terms of four outcomes: a detection inside the polyp region of the ground truth (True Positive), a detection outside the polyp region of the ground truth (False Positive), no detection in a region containing a polyp in the ground truth (False Negative), and no detection in a region without a polyp in the ground truth (True Negative). These regions are described in Fig. 11: the overlap between the prediction and the ground truth is the True Positive region, the part of the ground truth missed by the prediction is the False Negative region, the incorrectly detected area is the False Positive region, and anything falling outside these regions is a True Negative.
The results of applying the Deep Learning models are presented in two sections, covering both the polyp detection rate and the segmentation performance of each technique.
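The pixel-by-pixel comparison can be sketched as below. The confusion counts follow the definitions given in the text. The derived metrics are common choices for segmentation and are an assumption here, since the contents of Tab. 2 are not reproduced in this text. The function names are hypothetical.

```python
from typing import Dict, List, Tuple

Mask = List[List[int]]  # 2-D grid of 0/1 values


def confusion_counts(pred: Mask, truth: Mask) -> Tuple[int, int, int, int]:
    """Count (TP, FP, FN, TN) pixels between a prediction and the ground truth."""
    tp = fp = fn = tn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1  # predicted polyp, polyp in ground truth
            elif p:
                fp += 1  # predicted polyp, background in ground truth
            elif t:
                fn += 1  # missed polyp pixel
            else:
                tn += 1  # background correctly left empty
    return tp, fp, fn, tn


def segmentation_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Common metrics derived from the counts (assumed, not taken from Tab. 2)."""
    def ratio(num: float, den: float) -> float:
        return num / den if den else 0.0
    return {
        "accuracy": ratio(tp + tn, tp + fp + fn + tn),
        "precision": ratio(tp, tp + fp),
        "recall": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "dice": ratio(2 * tp, 2 * tp + fp + fn),
        "iou": ratio(tp, tp + fp + fn),
    }
```

Because polyps usually cover a small part of the frame, True Negative pixels dominate the count, so pixel accuracy tends to be high even for mediocre masks and is best read together with the overlap-based metrics.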
3.1 Polyp Detection Rate
The polyp detection rate for the implemented techniques is presented in Tabs. 3–6. These tables summarize the error results of the three proposed experiments, where the error is measured as the ratio between the detected polyps and the actual presence of a polyp in the image. The lowest error is obtained when the model is trained using all databases and tested on the Deusto database, as highlighted in green. Additionally, for each database used in training, we highlight the best test result in bold. The ETIS dataset is relatively small and therefore, as expected, has a higher average detection error across all models.
3.2 Polyp Segmentation Rate
Tabs. 7–10 present the metrics for polyp segmentation when using the proposed techniques for polyp detection. The best performance for each of the metrics is highlighted in green, while the best performance for each of the databases is highlighted in bold. We note that training on all databases provides an overall advantage, although it may not necessarily lead to the best performance in each metric.
Even though few authors have explored the problem of polyp segmentation, multiple works on polyp detection and localization have been published in recent years. The most remarkable results have been obtained by exploring the use of deep learning and end-to-end models instead of hand-crafted solutions, as can be seen in the results of the 2015 MICCAI sub-challenge on automatic polyp detection.
To identify how relevant our segmentation results are, the model with the lowest polyp missing rate, trained with ClinicDB and tested on ETIS, is compared against previously reported results. In Tab. 11 we show the results of the two best models in the MICCAI competition and their combination. We also include the results of two previous works that used fully convolutional networks for this problem. One of them experimented with multiple well-known architectures such as GoogLeNet and VGG, and the other used a model based on Faster R-CNN with two types of image augmentation for increased accuracy.
For this comparison, the best models from each of these works were included (FCN-VGG, and Faster R-CNN with Aug1 and with Aug2). Our results were adapted to the previously reported metrics based on the Intersection-over-Union (IoU) in the following way, using an IoU threshold of 0.5:
• True Positive (TP): The model made an accurate prediction of the location of a polyp. We assign this label to the prediction if the IoU between the output of our model and the ground truth is greater than or equal to 0.5.
• False Positive (FP): The model predicted a polyp in the wrong location, or its segmentation mask covered an area much bigger or much smaller than the true area of the polyp. We use this label when the model gave a prediction and the IoU is lower than 0.5.
• False Negative (FN): The model did not predict a mask when there is at least one polyp in the image. We only considered this label when none of the predictions of the network make it past the confidence threshold and the model outputs an empty binary mask.
With this, we calculate the precision, recall and F1 metrics. To obtain more precise polyp localizations we also test a higher confidence threshold to filter more detections, which in turn lowers our recall. In both cases we tested the model trained using the ClinicDB dataset with all images from the ETIS dataset and not only those separated for validation.
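The labeling rules above and the resulting metrics can be written as a minimal sketch. It assumes one merged binary mask per image and that every evaluated image contains at least one polyp, as in the False Negative definition. The function names are hypothetical.

```python
from typing import List, Tuple

Mask = List[List[int]]  # 2-D grid of 0/1 values


def mask_iou(pred: Mask, truth: Mask) -> float:
    """Intersection over Union between two binary masks."""
    inter = union = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            inter += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    return inter / union if union else 0.0


def label_prediction(pred: Mask, truth: Mask, iou_threshold: float = 0.5) -> str:
    """Return 'TP', 'FP' or 'FN' for an image that contains a polyp."""
    if not any(any(row) for row in pred):
        return "FN"  # empty mask: nothing passed the confidence filter
    return "TP" if mask_iou(pred, truth) >= iou_threshold else "FP"


def precision_recall_f1(labels: List[str]) -> Tuple[float, float, float]:
    """Precision, recall and F1 from a list of per-image labels."""
    tp, fp, fn = labels.count("TP"), labels.count("FP"), labels.count("FN")
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```

Raising the confidence threshold turns some low-quality predictions into empty masks, which moves them from the FP count to the FN count and explains the trade-off between precision and recall described in the text.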
In this paper the application of four deep learning models for the detection and segmentation of polyps in colonoscopy images was presented. These models were trained and tested using three databases: CVC-CLINIC, ETIS-LARIB and a proprietary database from Deusto University. In order to evaluate their performance, three experiments were conducted and discussed. The results were obtained and compared when using each database independently and when combining them for training and for testing the models. It should be noted that these databases contain images with different resolutions and characteristics, which allowed us to demonstrate the models' capabilities in a real deployment environment. The results for both the polyp detection rate and the segmentation were presented. The best detection rate was obtained when training the model with all the databases and using the PANet architecture, and the best segmentation accuracy (98.17%) was obtained when using the HTC architecture trained with the merged dataset and tested on the CVC-CLINIC database. The results obtained from training and testing with the combined datasets are promising. We are currently working on a framework for real-time processing of live feed from colonoscopy, integrating these techniques for colon polyp detection and segmentation with Kudo's classification of the findings, to generate an alert system that aids medical personnel by providing computer-aided diagnosis of risk. We foresee that the presented method can be used to provide a robust semi-automated polyp detection and segmentation tool.
Acknowledgement: The authors would like to thank Hospital Universitario Donostia (Inés Gil and Luis Bujanda), Hospital Universitario de Cruces (Manuel Zaballa and Ignacio Casado), Hospital Universitario de Basurto (Angel José Calderón and Ana Belen Díaz), Hospital Universitario de Araba (Aitor Orive and Maite Escalante), Hospital de San Eloy (Fidencio Bao and Iñigo Kamiruaga) and Hospital Galdakao (Alain Huerta) health centres and Osakidetza Central (Isabel Idígoras and Isabel Portillo) for their collaboration in the research.
Funding Statement: This research was supported by the Basque Government “Aids for health research projects” and the publication fees supported by the Basque Government Department of Education (eVIDA Certified Group IT905-16).
Conflicts of Interest: The authors declare that they have no conflicts of interest to report regarding the present study.
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.