Segmentation and Classification of Stomach Abnormalities Using Deep Learning

An automated system is proposed for the detection and classification of gastrointestinal (GI) abnormalities. The proposed method operates as two pipeline procedures: (a) segmentation of the bleeding infection region and (b) classification of GI abnormalities by deep learning. First, the bleeding region is segmented using a hybrid approach: a threshold is applied to each channel extracted from the original RGB image, and the channels are then merged through mutual-information and pixel-based techniques to produce the segmented image. In the classification task, texture and deep learning features are extracted. A transfer learning (TL) approach is used to extract the deep features, and the Local Binary Pattern (LBP) method is used for the texture features. An entropy-based feature selection approach then selects the best features from both the deep learning and texture vectors. The selected optimal features are combined with a serial-based technique, and the resulting vector is fed to an ensemble learning classifier. The experiments are evaluated on two datasets: a private dataset and KVASIR. The achieved accuracy is 99.8% on the private dataset and 86.4% on the KVASIR dataset, confirming that the proposed method is effective at detecting and classifying GI abnormalities and outperforms the comparison methods.

cancer were recorded in the USA [6]. A survey conducted in 2017 showed that approximately 21% of the USA population suffers from gastrointestinal infections, and 765,000 deaths were attributed to stomach-related maladies [7]. According to the 2018 global cancer report covering 20 regions of the world, an estimated 18.1 million cancer cases were recorded; colorectal cancer accounted for 9.2% of the deaths and 6.1% of the new cases. According to the American Cancer Society, approximately 27,510 new stomach cancer cases (10,280 in women and 17,230 in men) were observed in the US in 2019, and a total of 11,140 deaths (4340 in women and 6800 in men) were recorded in 2019 due to colorectal cancer [8].
GI infections can be cured easily if they are diagnosed at an early stage. Because the small bowel has a complex structure, push gastroscopy is not considered the best choice for diagnosing small-bowel infections such as bleeding, polyps, and ulcers [9]. The traditional endoscopic method is invasive; it is not favored by endoscopists and is not recommended to patients due to its high level of discomfort [10]. These problems were resolved by a technology introduced in the year 2000, namely Wireless Capsule Endoscopy (WCE) [11]. A WCE device is a small, pill-shaped capsule consisting of batteries, a camera, and a light source [12]. While passing through the gastrointestinal tract (GIT), the capsule captures almost 50,000 images before being excreted naturally. Most malignant diseases of the GIT, such as bleeding, polyps, and ulcers, are diagnosed through WCE, which has proved to be an authentic modality for painless investigation and examination of the GIT [13]. The technique is more convenient than traditional endoscopy and provides better diagnostic accuracy for bleeding and tumor detection, specifically in the small intestine [14]. However, it is difficult for an expert physician to examine all of the captured images thoroughly; manual analysis of WCE frames is a tedious and time-consuming task [15]. To resolve this problem, researchers have introduced several computer-aided diagnosis (CAD) methods [16], which help doctors detect disease more accurately in less time. A typical CAD system consists of five major steps: image pre-processing, feature extraction, optimal feature selection, feature fusion, and classification. The extraction of useful features is one of the most significant tasks; numerous features, including texture [17] and color [18], among others [19], are extracted in CAD systems for accurate diagnosis.
Not all of the extracted features are useful; some may be irrelevant. It is therefore essential to reduce the feature vector by removing irrelevant features for better efficiency.
In this work, two pipeline procedures are considered to detect and classify GI abnormalities: bleeding segmentation and GI abnormality classification. The significant contributions of this work are: i) in the bleeding detection step, a hybrid technique is implemented in which the original image is split into three channels and thresholding is applied to each color channel; pixel-by-pixel matching is then performed and a mask is generated for each channel, and finally all mask images are combined into one image for the final bleeding segmentation; ii) in the classification procedure, transfer learning is utilized for extracting deep learning features. The original images are enhanced using a unified method that combines a chrominance weight map, gamma correction, haze reduction, and YCbCr color conversion. Deep learning features are then extracted using a pre-trained CNN model, and LBP features are also obtained to capture the textural information of each abnormal region; iii) a new method named entropy-controlled ensemble learning is proposed, which selects the best features for correct prediction as well as fast execution. The selected features are combined into one vector using a concatenation approach; iv) the performance of the proposed method is validated using several combinations of features, and several classification methods are also used to validate the selected feature vector.

Related Work
Several machine learning and computer vision-based techniques have been introduced for the diagnosis of human diseases such as lung cancer, brain tumors, and GIT infections from WCE images [20,21]. The stomach is one of the most significant organs of the human body, and its most conspicuous diseases are ulcers, bleeding, and polyps. In [22], the authors utilized six features from different color spaces (CMYK, YUV, RGB, HSV, LAB, and XYZ) for the classification of ulcerous and non-ulcer regions. After feature extraction, the cross-correlation method was used to fuse the extracted features, and a support vector machine (SVM) achieved 97.89% classification accuracy. Suman et al. [23] proposed a method for distinguishing bleeding images from non-bleeding ones based mainly on statistical color features obtained from RGB images. Charfi et al. [24] presented a methodology for colon irregularity detection from WCE images utilizing variance, LBP, and DWT features; using a multilayer perceptron (MLP) and SVM for classification, the method outperformed existing approaches, achieving 85.86% accuracy with linear SVM and 89.43% with MLP. In [25], the authors proposed a CAD method for bleeding classification using both unsupervised and supervised learning algorithms. Souaidi et al. [26] proposed an approach named multiscale completed LBP (MS-CLBP) for ulcer detection, based on the Laplacian pyramid and completed LBP (CLBP). Ulcer detection is performed in two color spaces, using the G channel of RGB and the Cr channel of YCbCr; classification with SVM attained average accuracies of 93.88% and 95.11% on the two datasets. According to the survey conducted by Fan et al.
[27], different pre-trained deep learning models have covered numerous aspects of the medical imaging domain, and many researchers have utilized CNN models for accurate classification and segmentation of diseases and infections.
In contrast, images of the same category should share the same learned features; the overall recognition accuracy of this method is 98%. Sharif et al. [28] used contrast-enhanced color features for segmenting the infected region of the image, after which geometric features are extracted from the segmented portion. Two deep CNN models, VGG19 and VGG16, are also used; the extracted deep features of both models are fused using the Euclidean Fisher vector (EFV) and later combined with the geometric features to obtain a strong representation. Conditional entropy is employed on the resulting feature vector for optimal feature selection, and classification with a KNN classifier achieved the highest accuracy of 99.42%. Diamantis et al. [29] proposed a method named LB-FCN (Look-Behind Fully Convolutional Network), which outperformed existing methods and achieved better GI abnormality detection results. Alaskar et al. [30] utilized GoogleNet and AlexNet for classifying ulcer and non-ulcer images. Khan et al. [31] suggested a technique for ulcer detection and GI abnormality classification in which ResNet101 is used for feature extraction; the features are optimized using a distance fitness function along with grasshopper optimization, and a C-SVM achieved 99.13% classification accuracy. The literature shows that CAD systems for disease recognition mostly rely on handcrafted features (shape, color, and texture information). However, impressed by the performance of CNNs in other domains, some researchers have employed CNN models in the medical field for disease segmentation and classification [32,33]. Inspired by these studies, we utilize deep learning for better classification accuracy.

Proposed Methodology
A hybrid architecture is proposed in this work for automated detection and classification of stomach abnormalities. The proposed architecture follows two pipeline procedures: bleeding abnormality segmentation and GI infection classification. The proposed bleeding segmentation procedure is illustrated in Fig. 1. First, the RGB bleeding images are selected from the database; then all three channels are extracted and thresholding is applied. The output images produced by the threshold function are compared pixel-wise and used to generate a mask for each channel. Finally, the masks of all three channels are combined to obtain the segmented image. A detailed description of each step is given below.

Bleeding Abnormality Segmentation
Consider X(i, j) an original RGB WCE image of dimension 256 × 256 × 3, and let δ_R, δ_G, and δ_B denote the extracted red, green, and blue channels, respectively, as illustrated in Fig. 2. After splitting the channels, Otsu thresholding is applied to each one. Mathematically, the thresholding process is defined as follows. Consider the channels δ_R, δ_G, and δ_B with Δ gray levels (0, 1, 2, …, Δ − 1) and a total of N = \sum_{j=0}^{\Delta-1} \xi_j pixels, where ξ_j is the number of pixels at gray level j. For a candidate threshold p_1, the pixels are divided into two classes with probabilities

Pro^{(1)}(p_1) = \sum_{j=0}^{p_1} \frac{\xi_j}{N}, \qquad Pro^{(2)}(p_1) = \sum_{j=p_1+1}^{\Delta-1} \frac{\xi_j}{N}, (1)

and class means

M_1(p_1) = \frac{1}{Pro^{(1)}(p_1)} \sum_{j=0}^{p_1} j\,\frac{\xi_j}{N}, \qquad M_2(p_1) = \frac{1}{Pro^{(2)}(p_1)} \sum_{j=p_1+1}^{\Delta-1} j\,\frac{\xi_j}{N}. (2)

The between-class (bi-class) variance is

\delta_u(p_1) = Pro^{(1)}(p_1)\, Pro^{(2)}(p_1)\, \big(M_1(p_1) - M_2(p_1)\big)^2, (3)

and in Otsu segmentation the threshold is selected by maximizing this cost function:

p_1^{*} = \arg\max_{p_1} \delta_u(p_1). (4)

Applying the selected threshold yields a thresholded image ϕ(y_i), where i indexes the three thresholded images for channels δ_R, δ_G, and δ_B. Each channel is then compared with its corresponding thresholded channel, generating a mask for each color channel. Finally, the information of all channels is combined: let ϕ(y_1), ϕ(y_2), and ϕ(y_3) denote the thresholded images of the red, green, and blue channels, respectively; the final segmentation merges these three masks.
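The per-channel Otsu thresholding and mask merging described above can be sketched as follows. This is a minimal NumPy illustration: the paper merges channels via mutual information and pixel matching, so the majority vote used here is a simplifying assumption, and the function names are illustrative.

```python
import numpy as np

def otsu_threshold(channel):
    """Exhaustively search for the threshold p1 that maximizes the
    between-class variance delta_u(p1) of Eq. (3)."""
    hist = np.bincount(channel.ravel(), minlength=256).astype(float)
    prob = hist / hist.sum()
    best_t, best_var = 0, -1.0
    for p1 in range(1, 256):
        w1, w2 = prob[:p1].sum(), prob[p1:].sum()  # class probabilities
        if w1 == 0 or w2 == 0:
            continue  # one class is empty; variance undefined
        m1 = (np.arange(p1) * prob[:p1]).sum() / w1        # class-1 mean
        m2 = (np.arange(p1, 256) * prob[p1:]).sum() / w2   # class-2 mean
        var = w1 * w2 * (m1 - m2) ** 2                     # delta_u(p1)
        if var > best_var:
            best_var, best_t = var, p1
    return best_t

def segment_bleeding(rgb):
    """Threshold each color channel independently, then merge the three
    binary masks into one (majority vote, an assumption) segmentation."""
    masks = [rgb[..., c] >= otsu_threshold(rgb[..., c]) for c in range(3)]
    return np.sum(masks, axis=0) >= 2
```

A usage note: with `uint8` channels, `otsu_threshold` returns an integer gray level, and `segment_bleeding` returns a boolean mask of the same spatial size as the input image.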

Abnormalities Classification
In the classification task, we present a deep learning-based architecture for classifying GI abnormalities into bleeding, ulcer, and healthy classes. This task consists of three steps: enhancement of the original images, deep feature extraction, and selection of robust features for classification. The flow diagram is presented in Fig. 5, and the mathematical description of the proposed classification task is given below.

Contrast Enhancement
WCE images may suffer from non-uniform lighting, low visibility, diminished colors, blurring, and low contrast [34]. In the first step, we apply a chromatic weight map to improve the saturation of the images, since color can be an essential indicator of image quality. Let X(i, j) be the input image of dimension 256 × 256. For each pixel, the distance between its saturation S(i, j) and the maximum of the saturation range is computed to obtain the weight map:

X_c(i, j) = \exp\!\left(-\frac{\big(S(i, j) - S_{max}\big)^2}{2\sigma^2}\right),

where X_c(i, j) is the output chromatic weight map, σ is the standard deviation (default value 0.3), and S_{max} is the highest value of the saturation range; pixels far from S_{max} are therefore assigned a small weight. After chromatic weight computation, gamma correction is applied to control the brightness and contrast of X_c(i, j). Input and output values generally range between 0 and 1. In the proposed pre-processing approach we use γ = 0.4, because at this value images are enhanced suitably without losing important information. Gamma correction is formulated as

X_{GM}(i, j) = A \cdot \big(X_c(i, j)\big)^{\gamma},

where A is a constant (generally A = 1) and X_c(i, j) is the positive real value of the chromatic weight map image raised to the power γ. Further, we apply image dehazing, whose main objective is to recover a flawless image from a hazy one, although it may not fully capture the essential features of hazy images. The formation of a hazy image can be presented as a convex combination of the scene radiance J and the atmospheric light A:

X_{GM}(m, n) = J(m, n)\, t\big(X_{GM}(m, n)\big) + A\,\Big(1 - t\big(X_{GM}(m, n)\big)\Big),

where X_{GM}(m, n) is a pixel of the hazy image X_{GM} and t(X_{GM}(m, n)) is its transmission.
As a consequence, the dehazing problem is resolved by recovering the scene radiance J(m, n) from the hazy pixel X_{GM}(m, n). The visual effects of this formulation are illustrated in Fig. 6. Finally, the YCbCr color transformation is applied and the Y channel is selected based on its peak pixel values; this channel is further utilized for the feature extraction task. Deep Learning Features-Using this model, the ratio used for training and testing is 70:30. The feature extraction process is conducted using transfer learning (TL), in which a pre-trained model is trained on GI WCE images. For this purpose, an input and an output layer are required: we use the first convolution layer as input and select the average pool layer as output. After activation of the average pool layer, a feature vector of dimension 1 × 2048 is obtained per image; for N images, the vector is of dimension N × 2048.
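A minimal sketch of the enhancement chain under the formulations above (chromatic weight map, gamma correction, and Y-channel extraction); the dehazing step is omitted for brevity, the function names are hypothetical, and intensities are assumed normalized to [0, 1]:

```python
import numpy as np

def chromatic_weight(saturation, s_max=1.0, sigma=0.3):
    """Gaussian weight of each pixel's saturation distance from the
    maximum saturation S_max (sigma = 0.3, as in the text)."""
    return np.exp(-((saturation - s_max) ** 2) / (2 * sigma ** 2))

def gamma_correct(img, gamma=0.4, A=1.0):
    """X_GM = A * X^gamma on intensities clipped to [0, 1];
    gamma = 0.4 brightens mid-tone pixels."""
    return A * np.clip(img, 0.0, 1.0) ** gamma

def luma_channel(rgb):
    """Y component of the ITU-R BT.601 YCbCr transform,
    used here as the channel passed to feature extraction."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    return 0.299 * r + 0.587 * g + 0.114 * b
```

Note that with γ < 1 the correction raises mid-range intensities (e.g. 0.25 maps to 0.25^0.4 ≈ 0.57), which is the brightening effect the text relies on.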
Texture Oriented Features-For texture-oriented features, we extract Local Binary Patterns (LBP), a widely used method for object identification and detection [35]. LBP operates on a grayscale image and encodes, for each pixel, the bitwise transitions (0 to 1 and 1 to 0) of its thresholded neighborhood. The LBP code is formulated as

LBP_{P,R}(t) = \sum_{p=0}^{P-1} s\big(d_n(p) - t\big)\, 2^{p}, \qquad s(x) = \begin{cases} 1 & x \geq 0 \\ 0 & x < 0, \end{cases}

where P is the number of neighborhood intensities, R is the radius, d_n(p) is the intensity of the p-th neighboring pixel, and t is the central pixel intensity; each neighboring pixel d_n(p) is compared with the central pixel t. This yields a feature vector of dimension 1 × 59 for one image and N × 59 for N images.
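The 59-dimensional descriptor corresponds to the uniform LBP histogram for P = 8, R = 1: the 58 uniform patterns (at most two circular bit transitions) each get a bin, and all non-uniform codes share one extra bin. A self-contained NumPy sketch with an illustrative function name:

```python
import numpy as np

def uniform_lbp_histogram(gray):
    """Normalized 59-bin uniform LBP histogram (P = 8, R = 1)."""
    # Map each of the 256 codes to one of 58 uniform bins, or bin 58.
    def transitions(p):
        bits = [(p >> i) & 1 for i in range(8)]
        return sum(bits[i] != bits[(i + 1) % 8] for i in range(8))
    uniform = [p for p in range(256) if transitions(p) <= 2]  # 58 codes
    mapping = {p: i for i, p in enumerate(uniform)}
    # Compute the 8-bit LBP code of every interior pixel.
    h, w = gray.shape
    center = gray[1:-1, 1:-1]
    codes = np.zeros((h - 2, w - 2), dtype=np.uint8)
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    for bit, (dy, dx) in enumerate(offsets):
        neighbor = gray[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        codes |= (neighbor >= center).astype(np.uint8) << bit
    # Accumulate codes into the 59 bins and normalize.
    hist = np.zeros(59)
    for code, count in zip(*np.unique(codes, return_counts=True)):
        hist[mapping.get(int(code), 58)] += count
    return hist / hist.sum()
```

For a constant image every neighbor satisfies d_n(p) ≥ t, so all pixels produce code 255 (a uniform pattern) and the histogram collapses to a single bin, which is a quick sanity check.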

Features Selection
After the extraction of the texture and deep learning features, the next phase is optimal feature selection. In this work we utilize Shannon entropy along with an ensemble learning classifier for best-feature selection, following a heuristic approach. The Shannon entropy is computed from both vectors separately, and a target function is set based on the mean value of the original entropy vectors. Features whose entropy is equal to or higher than the mean are selected as robust features and passed to the ensemble classifier; this process continues until the error rate of the ensemble classifier falls below 0.1. Let n_{kj} be the number of occurrences of t_j in category c_k, and tf_{kj} the frequency of t_j in that category. The Shannon entropy E(t_j) of the term t_j is given by

E(t_j) = -\sum_{k} P_{kj} \log_2 P_{kj}, \qquad P_{kj} = \frac{tf_{kj}}{\sum_{k'} tf_{k'j}}.

Through this process, approximately 50% of the features are removed from both vectors (deep learning and texture-oriented). The selected vectors are then fused into one vector by a simple concatenation approach. Let A and B be two feature spaces defined on the sample space Ω, where A represents the selected LBP features and B the selected InceptionV3 features. For an arbitrary sample ξ ∈ Ω, the corresponding two feature vectors (FVs) are α_{lbp} ∈ A and β_{incep} ∈ B. The fusion of features for ξ is then defined as γ = (α_{lbp}, β_{incep}); if α_{lbp} is n-dimensional and β_{incep} is m-dimensional, the fused FV γ is (n + m)-dimensional, and all combined FVs form an (n + m)-dimensional feature space.
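The entropy-based selection and serial fusion above can be sketched as follows. The histogram-based entropy estimate and the function names are assumptions for illustration; the iterative loop against the ensemble classifier's error rate is omitted, so only the mean-entropy target function and the concatenation fusion are shown.

```python
import numpy as np

def shannon_entropy(features, bins=16):
    """Per-column Shannon entropy of a (samples x features) matrix,
    estimated from a histogram of each feature (binning is an
    implementation assumption)."""
    ent = np.empty(features.shape[1])
    for j in range(features.shape[1]):
        counts, _ = np.histogram(features[:, j], bins=bins)
        p = counts[counts > 0] / counts.sum()
        ent[j] = -(p * np.log2(p)).sum()
    return ent

def select_by_entropy(features):
    """Keep features whose entropy is >= the mean entropy, mirroring
    the mean-valued target function described in the text."""
    ent = shannon_entropy(features)
    return features[:, ent >= ent.mean()]

def serial_fusion(lbp_feats, deep_feats):
    """Serial (concatenation) fusion: an n-dim LBP vector and an m-dim
    deep vector become one (n + m)-dim vector per sample."""
    return np.concatenate([lbp_feats, deep_feats], axis=1)
```

For example, fusing the full N × 59 LBP matrix with the N × 2048 deep matrix yields an N × 2107 matrix; after the roughly 50% reduction described above, the fused dimension shrinks accordingly.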

Experimental Results and Comparison
Two datasets are used in this work to assess the suggested GI infection detection and classification method. The KVASIR dataset contains a total of 4000 images, verified by expert doctors [36]. The private dataset was collected from the COMSATS Computer Vision Lab [37] and includes a total of 2326 clinical sample images in three categories: ulcer, bleeding, and healthy. The image size is 512 × 512. Some sample images are presented in Fig. 8b.

Results
A detailed description of the classification results, in quantitative and graphical form, is given in this section. Bleeding Segmentation Results: To validate segmentation accuracy, 20 sample images from the private dataset were randomly chosen for bleeding segmentation. Ground truth images prepared by an expert physician are used to calculate the segmentation accuracy: each ground truth image is compared with the corresponding segmented image pixel by pixel. The proposed bleeding segmentation method achieved a highest accuracy of 93.39%; other calculated measures are the Jaccard index (96.58%) and FNR (6.61%). Per-image results are given in Tab. 1, which lists the Jaccard index, Dice, FNR, and time (s) for all 20 selected images. The average Dice rate is 87.59%, which is good for bleeding segmentation. Classification Results: For the classification results, we performed separate experiments for the selection and fusion processes. As mentioned above, the KVASIR and private datasets are used for the evaluation; the private dataset includes a total of 2326 RGB images in three classes (ulcer, bleeding, and healthy). Initially, robust deep learning features are selected using the proposed selection approach. The fused vector is also applied to the KVASIR dataset and achieves a maximum accuracy of 87.8% with the ESDA classifier. Tab. 8 shows the results of the proposed architecture; the other measures for this classifier are SEN (86.4%), SPE (98.06%), PRE (86.99%), F1 score (86.63%), FPR (0.02), and FNR (13.6%). Moreover, Tab. 9 shows the confusion matrix of this classifier, which confirms the reported ESDA accuracy: the diagonal values show the correct predictions for each abnormality, while the remaining values correspond to misclassifications. After the fusion process the accuracy improves, although the execution time increases slightly: the best recorded time is 26.06 s, while the ESDA classifier takes 50.70 s. Overall, it can be concluded that the ESDA classifier shows the best performance.
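The pixel-wise segmentation measures reported above (accuracy, Jaccard index, Dice, and FNR) can be computed from a predicted binary mask and its ground truth as in this short sketch (the function name is illustrative):

```python
import numpy as np

def segmentation_metrics(pred, gt):
    """Pixel-wise accuracy, Jaccard index, Dice score, and false
    negative rate for a pair of binary masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    tp = np.logical_and(pred, gt).sum()     # correctly detected pixels
    fp = np.logical_and(pred, ~gt).sum()    # spurious detections
    fn = np.logical_and(~pred, gt).sum()    # missed bleeding pixels
    tn = np.logical_and(~pred, ~gt).sum()   # correct background
    return {
        "accuracy": (tp + tn) / pred.size,
        "jaccard": tp / (tp + fp + fn),
        "dice": 2 * tp / (2 * tp + fp + fn),
        "fnr": fn / (tp + fn),
    }
```

Note that Dice and Jaccard ignore true-negative background pixels, so they are stricter than plain accuracy when the bleeding region is small relative to the frame.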

Comparison with Existing Methods
From the above review, it is noted that several methods have been proposed for GI abnormality detection and classification. Most existing CAD methods are based on traditional techniques that rely only on color or texture information, although some methods use a combination of features and some are based on deep CNN features. Despite the many existing CAD methods, the older approaches still face limitations such as the low contrast of captured frames, similar colors of infected and healthy regions, the difficulty of proper color model selection, hazy images, and redundant information. These limitations motivated us to develop a robust method for GI abnormality detection and classification with better accuracy. The proposed deep learning method is evaluated on two datasets (private and Kvasir) and achieves accuracies of 99.80% and 87.80%, respectively. The comparison with existing techniques is given in Tab. 10, conducted on the basis of abnormality type or dataset; because most GI datasets are private, we mainly focus on disease type. The table shows that the proposed architecture achieves better accuracy as well as execution time.

Conclusion
In this article, we proposed a deep learning architecture for the detection and classification of GI abnormalities. The proposed architecture consists of two pipeline procedures: detection and classification. In the detection task, the bleeding region is segmented by fusing three separate channels. In the classification task, deep learning and texture-oriented features are extracted, and the best features are selected using the Shannon entropy-controlled ESDA classifier; the selected features are then concatenated and classified. In the evaluation phase, the segmentation process achieves an average accuracy of over 87% for abnormal bleeding regions. For classification, the accuracy is 99.80% on the private dataset and 87.80% on the Kvasir dataset. The results show that the proposed selection method performs better than existing techniques, and that the fusion process is effective when more classes are involved, as in the Kvasir dataset. In addition, texture features have a high impact on disease classification, and their fusion with deep learning features addresses the issue of texture variation. In future work, we will focus on ulcer segmentation through deep learning.