Open Access
ARTICLE
An Explainable Centralized Deep Learning Model for Gastrointestinal Polyp Segmentation Using the Kvasir-SEG Dataset
1 School of Computer Science, National College of Business Administration and Economics, Lahore, Pakistan
2 School of Computing, Horizon University College, Ajman, United Arab Emirates
3 Jadara University Research Center, Jadara University, Irbid, Jordan
4 Riphah School of Computing and Innovation, Faculty of Computing, Riphah International University, Lahore Campus, Lahore, Pakistan
5 Department of Software, Faculty of Artificial Intelligence and Software, Gachon University, Seongnam-si, Republic of Korea
* Corresponding Author: Khan Muhammad Adnan. Email:
Computer Modeling in Engineering & Sciences 2026, 147(1), 36 https://doi.org/10.32604/cmes.2026.081316
Received 27 February 2026; Accepted 30 March 2026; Issue published 27 April 2026
Abstract
Gastrointestinal polyps are well-known precursors to colorectal cancer (CRC), making their accurate detection and segmentation during colonoscopy essential for early diagnosis and cancer prevention. Deep learning–based segmentation models trained on publicly available datasets such as Kvasir-SEG have demonstrated promising performance; however, two key challenges remain: limited robustness across diverse polyp morphologies and endoscopic imaging conditions, and the lack of interpretable decision-making mechanisms that support clinical trust and validation. Many existing centralized segmentation approaches are primarily optimized using overlap-based metrics such as the Dice coefficient and intersection over union (IoU), without adequately analyzing challenging cases such as small, flat, or low-contrast polyps or providing insight into the visual cues influencing model predictions. This study presents an explainable centralized deep learning segmentation model for gastrointestinal polyp segmentation using the Kvasir-SEG dataset. The approach integrates a ResUNet++-Lite encoder–decoder segmentation model with Grad-CAM and masked Grad-CAM visualizations to analyze the spatial regions influencing segmentation predictions. The study focuses on establishing a reproducible and interpretable experimental model that combines systematic preprocessing, data augmentation, centralized training, and explainability analysis. Experimental evaluation on an 80:20 train–test split of the Kvasir-SEG dataset, where data augmentation was applied after splitting, demonstrates stable training behavior and competitive segmentation performance, achieving a pixel accuracy of 0.964, a Dice coefficient of 0.858, and an IoU of 0.791 on the held-out test set. Qualitative explainability results further indicate that the model consistently focuses on anatomically relevant polyp regions. Overall, the study illustrates how segmentation performance and explainable AI techniques can be integrated to support the development of clinically interpretable AI-assisted colonoscopy systems.
1 Introduction
Colorectal cancer (CRC) is one of the most commonly diagnosed cancers worldwide and a leading cause of cancer-related deaths. According to the Global Cancer Observatory (GLOBOCAN), CRC accounted for approximately 1.93 million new cases and 935,000 deaths globally in 2020, ranking third in incidence and second in cancer-related mortality among all cancer types [1]. More recent epidemiological projections show that the global burden of CRC is expected to rise substantially, with incidence and mortality increasing in both older and younger populations, making it a growing public health challenge [2].
A significant percentage of CRC cases arise from benign gastrointestinal polyps that undergo malignant transformation over time through the adenoma–carcinoma sequence. Longitudinal clinical studies have shown that CRC incidence can be reduced by 60%–76%, and CRC mortality by about 50%, if precancerous polyps are detected early and removed during screening colonoscopy. These findings make the accurate detection and segmentation of gastrointestinal polyps a cornerstone of effective CRC prevention and early intervention strategies [3].
Colonoscopy is generally considered the clinical gold standard for CRC screening; however, its detection accuracy is highly operator-dependent, varying with experience, visual acuity, and attention throughout the procedure. Tandem colonoscopy studies have reported overall miss-rates of 20%–30%, with miss-rates above 25% for diminutive polyps (<5 mm) and persistently high miss-rates for flat and low-contrast lesions. Visual challenges such as illumination variability, bowel preparation quality, mucosal folds, motion blur, bubbles, and specular highlights further complicate reliable inspection and increase the risk of missed lesions [4,5]. These limitations have fueled growing interest in computer-aided detection and segmentation systems to assist clinicians during colonoscopic examinations.
Convolutional neural networks (CNNs), and more specifically encoder-decoder architectures such as U-Net [6] and attention-based segmentation frameworks for polyp delineation [7], have shown strong performance on curated datasets. The development of benchmark datasets such as Kvasir-SEG, which includes 1000 expert-annotated colonoscopy images with pixel-level polyp masks, has standardized evaluation and accelerated methodological progress in the gastrointestinal polyp segmentation community. In addition, controlled clinical trials have found that AI-assisted colonoscopy systems can increase adenoma detection rates by 6–14 percentage points, suggesting the potential clinical value of deep learning-based decision support tools [8].
Despite these promising results, the use of deep learning-based polyp segmentation systems in clinical practice remains limited. One major challenge is achieving robust segmentation performance, even within a controlled, centralized evaluation setting. Many centralized segmentation models achieve average Dice scores in the range of 0.80–0.85 on benchmark datasets, but their performance drops when applied to visually challenging cases with small, flat, or poorly contrasted polyps [9,10]. Limited dataset size, class imbalance between polyp and background pixels, and dataset-specific biases further increase the risk of overfitting, leading to models that perform well in controlled evaluations but fail to generalize to the diverse real-world conditions encountered during colonoscopy [10].
A second important challenge is the lack of interpretability in deep learning-based segmentation models. Most state-of-the-art architectures are black boxes that provide little information about which image features or anatomical structures drive segmentation decisions. While overlap-based metrics such as Dice and IoU quantify agreement between predicted and ground-truth masks, they do not explain why a model recognizes a given region as polyp tissue [11]. In safety-critical clinical environments, this lack of transparency undermines clinicians’ trust, complicates model validation, and poses barriers to regulatory approval. Although explainable artificial intelligence (XAI) methods, such as gradient-weighted class activation mapping (Grad-CAM), have been widely used for medical image classification, they have rarely been applied to pixel-level segmentation in a clinically meaningful way for gastrointestinal polyp analysis [12–15].
Motivated by the aforementioned challenges, this study investigates an explainable, centralized deep learning model for gastrointestinal polyp segmentation on the Kvasir-SEG dataset. The approach integrates a ResUNet++-Lite encoder–decoder segmentation model with Grad-CAM and masked Grad-CAM visualizations to analyze the model’s decision-making behavior during segmentation. The objective is to design a systematic and interpretable experimental model that integrates robust training strategies with explainable AI techniques to enable clinically transparent polyp segmentation.
2 Related Work
Recent studies in polyp segmentation focus on improving generalization and segmentation accuracy using advanced deep learning architectures; however, domain shift and reliance on low-level style features continue to limit performance on unseen datasets [16–18]. Recent progress in deep learning techniques, in particular CNNs, encoder-decoder approaches, and attention-based models, has led to much more accurate segmentation results on benchmark datasets, and Kvasir-SEG is no exception. However, most existing approaches are evaluated under the assumption of centralized training with a primary focus on improving overlap-based metrics, while robustness, scalability, and clinically reliable deployment remain under-addressed.
Fan et al. (2020) [19] addressed the inaccuracy of polyp boundary delineation in low-contrast areas by introducing PraNet, a parallel reverse attention network. The model uses reverse attention and multi-scale feature aggregation to progressively refine segmentation boundaries. Experimental results on Kvasir-SEG yielded Dice scores of 0.82–0.85, a clear improvement over standard U-Net baselines. While PraNet succeeds in improving boundary awareness, it relies on a complex, centralized architecture and does not assess robustness under clinical conditions or deployment constraints, leaving scalability-related issues unaddressed.
Jha et al. (2021) [20] focused on the challenge of real-time applicability in colonoscopy procedures, where segmentation models must operate under strict latency constraints. They proposed a CNN-based framework that performs polyp detection, localization, and segmentation simultaneously in real time. The model achieved Dice scores of around 0.80 without sacrificing inference speed. Although this work demonstrated feasibility for live clinical application, the robustness of segmentation for small or flat polyps was not thoroughly analyzed, and the approach remained limited to centralized training.
To overcome the limitations of multi-scale feature representation, an integrated architecture called u-Net was proposed in [21]. The model improves encoder-decoder interaction and feature fusion to better capture polyp morphology across varying spatial scales. Evaluated on Kvasir-SEG, u-Net achieved Dice scores of around 0.83. Despite these improvements, a systematic robustness evaluation was not presented, and scalability for broader clinical deployment was not discussed.
PolypSegNet (2024) [22] addressed the limited feature expressiveness of traditional CNN backbones by combining a ConvNeXt-Tiny encoder with a U-Net decoder via attention. This hybrid architecture improved feature discrimination, achieving Dice scores of approximately 0.84 on Kvasir-SEG. However, the added architectural complexity raises concerns about computational efficiency, and the evaluation primarily focused on aggregate metrics without analyzing performance under challenging visual conditions.
Raghaw et al. (2024) [23] proposed MNet-SAt, a multiscale segmentation network with spatially enhanced attention, to address large variations in polyp sizes and shapes. The model achieved Dice scores of 0.82–0.84, indicating improvements over baseline architectures. However, the study concentrated on accuracy improvements and did not incorporate explainable segmentation mechanisms.
Singh and Sengar (2024) [24] addressed the trade-off between segmentation accuracy and computational efficiency by proposing BetterNet, an efficient CNN architecture with residual connections and attention mechanisms. The model achieved Dice scores of around 0.81–0.83 on Kvasir-SEG with lower computational overhead. Despite these advantages, performance on visually difficult polyp cases and robustness across different imaging conditions were not systematically evaluated.
Fitzgerald and Matuszewski (2023) [25] tackled the problem of modeling long-range dependencies by proposing FCB-SwinV2, a transformer-based segmentation architecture. On Kvasir-SEG, the model achieved Dice scores of around 0.84–0.85. Although transformer-based designs capture global context well, they are computationally expensive, which limits their use for real-time clinical deployment, and robustness analysis was not covered in depth.
Tomar et al. (2022) [26] proposed DilatedSegNet to address the limited receptive fields of conventional CNNs by using dilated convolutions to capture wider contextual information. The model reported Dice scores of approximately 0.80–0.83, an improvement over baseline methods. However, the approach remains a centralized black-box model and lacks detailed evaluation across heterogeneous clinical conditions.
Mei et al. (2023) [27] noted that polyp segmentation studies are scattered across different architectures, training strategies, and evaluation protocols, making it difficult to determine which improvements consistently lead to better segmentation performance. To address this, they conducted an organized survey of deep learning-based polyp segmentation methods, dividing approaches into model families (CNN, attention-based, and transformer-assisted) and identifying common problems such as dataset bias, lack of consistent preprocessing, and limited evaluation using Dice/IoU. While the survey provides an important consolidation of methodological directions and benchmarking practices, it remains a descriptive review and does not propose a unified framework that simultaneously improves segmentation accuracy and segmentation-aware explainability, leaving these clinical requirements as open gaps.
The researchers in [28] presented a comprehensive survey of deep learning–based colorectal polyp segmentation methods published between 2014 and 2023, covering 115 studies. The work addressed challenges related to fragmented architectures, diverse datasets, and inconsistent evaluation practices by introducing a structured taxonomy and analyzing commonly used datasets and metrics. The study provided an overview of state-of-the-art models and highlighted key research trends and open challenges in Colorectal Polyp Segmentation (CPS).
The Graft-U-Net study [29] addressed the issue that endoscopic images are prone to low contrast and inconsistent lighting, which can degrade segmentation boundaries and increase false positives. To address this, the authors proposed a U-Net-style network aided by preprocessing methods such as contrast enhancement (e.g., CLAHE) to improve input quality before segmentation. The method achieved better segmentation performance than basic baselines under their evaluation criteria. However, the approach focuses primarily on preprocessing-driven improvements and does not address explainable segmentation that visually justifies predictions, leaving a gap in clinical transparency. In addition, the performance improvements rest on handcrafted preprocessing decisions that can reduce robustness across different acquisition conditions.
Jha et al. (2020) [30] addressed the basic problem that the field lacked a widely used benchmark dataset with pixel-level annotations for gastrointestinal polyp segmentation. They proposed Kvasir-SEG, a collection of 1000 colonoscopy images with expert-annotated binary segmentation masks, facilitating standardized evaluation and comparison of segmentation networks. Baseline benchmarking with U-Net-style models established a performance reference of ~0.82 Dice. While Kvasir-SEG has played a crucial role in driving progress, the dataset paper itself does not focus on explainable segmentation or advanced training strategies that push accuracy beyond established baselines, motivating ongoing efforts to develop higher-accuracy segmentation with greater transparency.
The PraNet GitHub implementation (2021) [31] addressed practical issues of reproducibility and accessibility by providing open-source training and testing code for polyp segmentation experiments. This makes it more convenient for researchers to reproduce PraNet-style reverse attention segmentation on datasets such as Kvasir-SEG and to compare results across different setups. As an implementation repository, it does not add new peer-reviewed experimental results beyond those reported in the PraNet paper; reported reproduction-level Dice values are typically around 0.83, consistent with the original publication. The repository also contributes no new methodology for explainable segmentation or accuracy improvements beyond recent training/architecture refinements, leaving the same gaps that the present framework aims to address.
Overall, these works collectively highlight two consistent gaps: (1) segmentation accuracy still varies substantially with preprocessing, architecture, and evaluation setup; and (2) segmentation-aware explainability is still handled inconsistently, and rarely in a way that yields clinically interpretable evidence alongside strong Dice/IoU performance. These gaps motivate the present framework, which aims to improve segmentation accuracy while integrating explainable segmentation to increase clinical transparency. Each study is summarized by model/method, dataset, objective, reported Dice performance, presence of explainable segmentation, and limitations. The limitations column highlights only those gaps directly addressed by the proposed framework: improved segmentation accuracy and explainable segmentation.
Table 1 summarizes representative studies on gastrointestinal polyp segmentation, highlighting the models used, datasets, reported Dice performance, and the presence of explainable AI mechanisms. The comparison shows that most existing approaches primarily focus on improving segmentation accuracy through architectural innovations while largely overlooking systematic explainability and reproducible evaluation pipelines. In addition, explainable segmentation mechanisms are rarely integrated into these models, and evaluation is typically limited to overlap-based metrics such as Dice and IoU. These observations highlight the need for segmentation pipelines that combine reliable performance with interpretable model behavior.

3 Limitations of Previous Approaches
Despite considerable progress in deep learning–based gastrointestinal polyp segmentation, several limitations persist, affecting the interpretability and reproducibility of existing approaches. These limitations are consistently observed across representative studies summarized in Table 1.
3.1 Limited Integration of Explainable Segmentation Mechanisms
Many recent segmentation approaches focus primarily on improving segmentation accuracy through architectural innovations such as reverse attention, multiscale feature aggregation, or transformer-based encoders. Representative models, including PraNet [19], PolypSegNet [22], and MNet-SAt [23], demonstrate improved feature representation and boundary delineation. However, these methods typically operate as black-box predictors and do not provide explicit explanations of the image regions influencing segmentation decisions. As a result, the interpretability of segmentation predictions remains limited, which can hinder clinical validation and trust.
3.2 Lack of Reproducible Experimental Pipelines
Another limitation in current research is the lack of clearly defined and reproducible experimental workflows. Several studies emphasize architectural improvements but provide limited details regarding preprocessing strategies, augmentation procedures, training configurations, and evaluation protocols. For example, models such as BetterNet [24], DilatedSegNet [26], and FCB-SwinV2 [25] primarily focus on architectural design and offer limited discussion of reproducible experimental pipelines. This lack of standardized workflows makes it difficult to fairly compare segmentation methods across different studies.
3.3 Limited Joint Evaluation of Segmentation Performance and Explainability
Most existing segmentation studies evaluate model performance using overlap-based metrics such as Dice coefficient and IoU, while explainability analysis is rarely integrated into the evaluation process. Survey studies [27,28] highlight that although segmentation accuracy continues to improve, the integration of explainable AI techniques in medical image segmentation remains limited. Consequently, many models provide strong segmentation performance but lack a systematic analysis of whether predictions correspond to clinically meaningful anatomical regions.
4 Contributions of the Proposed Model
Motivated by the limitations identified above, the main contributions of this study are as follows:
4.1 Explainable Segmentation Pipeline for Gastrointestinal Polyp Analysis
This study develops a centralized deep learning segmentation pipeline that integrates a ResUNet++-Lite encoder–decoder segmentation model with Grad-CAM and masked Grad-CAM visualization techniques. The integration enables analysis of spatial regions that influence segmentation predictions, providing interpretable visual evidence of model decisions in gastrointestinal polyp segmentation.
4.2 Reproducible Experimental Workflow for Kvasir-SEG
The study establishes a systematic experimental workflow, including preprocessing, data augmentation, centralized model training, and quantitative evaluation on the Kvasir-SEG dataset. This workflow provides a reproducible baseline for explainable polyp segmentation research and supports transparent experimental evaluation.
4.3 Joint Evaluation of Segmentation Performance and Explainability
Beyond reporting pixel-level segmentation metrics such as Dice coefficient and IoU, the study incorporates explainability analysis using Grad-CAM and masked Grad-CAM visualizations to examine whether segmentation predictions align with clinically meaningful polyp structures.
5 Proposed Methodology
Although benchmark results suggest that deep learning-based gastrointestinal polyp segmentation models can be effectively applied in the clinic, many models face two practical issues: (i) degraded performance under low contrast, specular highlights, and scale variation; and (ii) a lack of transparency, whereby a model may produce an accurate mask but provide no explanation of which image evidence contributed to the prediction. To overcome these shortcomings, this paper proposes a centralized explainable segmentation architecture trained on Kvasir-SEG using a strengthened U-Net-family backbone (ResUNet++-Lite) and incorporating Grad-CAM-based explanations to facilitate clinically interpretable predictions. Fig. 1 shows the overall workflow.

Figure 1: Proposed explainable deep learning model for gastrointestinal polyp segmentation.
In Fig. 1, a pipeline is shown in sequence: the dataset is acquired, preprocessed, and augmented; centralized model training follows, along with quantitative evaluation (pixel-level and image-level) and explainability generation (Grad-CAM and masked Grad-CAM) on test samples.
5.1 Dataset Acquisition and Characteristics
The proposed model is trained and tested on the Kvasir-SEG dataset [32], which comprises colonoscopy images with polyp regions annotated by experts as pixel-wise masks. Representative samples from the dataset are illustrated in Fig. 2. The dataset exhibits substantial variability in polyp morphology and imaging quality, including inconsistent illumination, contrast changes, and large size differences between small and large lesions.

Figure 2: Representative polyp samples from the Kvasir-SEG dataset [32].
To improve generalization, the dataset is augmented offline, increasing the effective dataset size. The original 1000 image–mask pairs are augmented to yield 2000 pairs in total (1000 original + 1000 augmented) using anatomically realistic transformations (horizontal/vertical flips and regulated brightness/contrast adjustments). After augmentation, the dataset is split into an 80/20 train/test ratio, yielding 1600 training samples and 400 test samples. A sketch of this offline step is given below.
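As a minimal sketch of how such offline doubling could be implemented, the snippet below produces one augmented copy per original pair using the transformations named above. The directory layout (`images/` and `masks/` subfolders), the output paths, and the choice of one random transform per image are illustrative assumptions, not the authors' exact protocol.

```python
# Offline augmentation sketch: each original image-mask pair yields one
# augmented copy, doubling 1000 pairs to 2000. Paths are assumptions.
import os, glob, random
from PIL import Image, ImageEnhance

KVASIR_DIR = "Kvasir-SEG"            # assumed layout: images/ and masks/
OUT_DIR = "Kvasir-SEG-augmented"
for sub in ("images", "masks"):
    os.makedirs(os.path.join(OUT_DIR, sub), exist_ok=True)

def augment_pair(img, mask):
    """Apply one anatomically realistic transform to an image-mask pair."""
    op = random.choice(["hflip", "vflip", "brightness"])
    if op == "hflip":
        return (img.transpose(Image.FLIP_LEFT_RIGHT),
                mask.transpose(Image.FLIP_LEFT_RIGHT))
    if op == "vflip":
        return (img.transpose(Image.FLIP_TOP_BOTTOM),
                mask.transpose(Image.FLIP_TOP_BOTTOM))
    # Photometric change applies to the image only; the mask is unchanged.
    factor = random.uniform(0.8, 1.2)   # illustrative brightness range
    return ImageEnhance.Brightness(img).enhance(factor), mask

for img_path in sorted(glob.glob(os.path.join(KVASIR_DIR, "images", "*.jpg"))):
    name = os.path.basename(img_path)
    img = Image.open(img_path)
    mask = Image.open(os.path.join(KVASIR_DIR, "masks", name))
    aug_img, aug_mask = augment_pair(img, mask)
    aug_img.save(os.path.join(OUT_DIR, "images", "aug_" + name))
    aug_mask.save(os.path.join(OUT_DIR, "masks", "aug_" + name))
```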
Preprocessing Pipeline
Images are uniformly resized to a standard spatial resolution of 256 × 256 pixels, ensuring consistent input dimensions while retaining adequate anatomical detail for boundary delineation.
After spatial resizing (Fig. 3), image intensities are normalized to the range [0, 1] to minimize differences caused by varying illumination conditions and endoscopic devices. This normalization step increases contrast consistency and preserves anatomical texture, both of which are essential for accurate polyp segmentation. Fig. 4 shows the impact of intensity normalization on a resized colonoscopy image. Following normalization, the expert ground-truth annotations are converted into binary segmentation masks to enable pixel-level supervision: a fixed intensity threshold of 127.5 separates polyp pixels from surrounding tissue. This binarization provides a well-defined separation between foreground and background, which is needed for supervised segmentation training. An example of the resulting binary mask is shown in Fig. 5, and a minimal code sketch of the full preprocessing step follows the figures below.

Figure 3: Example of spatial resizing applied during preprocessing.

Figure 4: Colonoscopy image after intensity normalization to the range [0, 1].

Figure 5: Binary segmentation mask obtained from expert annotation using thresholding.
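The preprocessing described above (resize to 256 × 256, scale intensities to [0, 1], binarize the mask at 127.5) can be sketched as follows. The use of OpenCV and the function name `preprocess` are assumptions for illustration; any image library would serve.

```python
# Preprocessing sketch: resize, normalize to [0, 1], binarize mask at 127.5.
import numpy as np
import cv2  # assumed available; any image library would do

IMG_SIZE = 256

def preprocess(image_path: str, mask_path: str):
    image = cv2.imread(image_path)                      # BGR, uint8
    mask = cv2.imread(mask_path, cv2.IMREAD_GRAYSCALE)  # uint8
    image = cv2.resize(image, (IMG_SIZE, IMG_SIZE))
    # Nearest-neighbor keeps mask values binary-friendly after resizing.
    mask = cv2.resize(mask, (IMG_SIZE, IMG_SIZE),
                      interpolation=cv2.INTER_NEAREST)
    image = image.astype(np.float32) / 255.0            # intensities in [0, 1]
    mask = (mask > 127.5).astype(np.float32)            # binary polyp mask
    return image, mask[..., np.newaxis]                 # mask shaped (H, W, 1)
```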
On-the-fly data augmentation during centralized training is used to enhance model generalization and reduce overfitting, given the small, heterogeneous medical data used as inputs. Geometric augmentation is first introduced through horizontal flipping, which improves robustness to the left-right orientation changes often encountered during endoscopic navigation, as shown in Fig. 6.

Figure 6: Horizontal flip applied for geometric data augmentation.
Vertical flipping is then applied to increase invariance to further changes in camera viewpoint that can occur while manipulating the colonoscope. Fig. 7 shows an example of vertical flip augmentation.

Figure 7: Vertical flip used to simulate viewpoint variation.
In addition to geometric transformations, photometric augmentation is applied through random brightness adjustments. This operation emulates illumination variability across different endoscopic systems and lighting conditions, thereby enhancing robustness to intensity variations. An example of brightness-based augmentation is given in Fig. 8. Overall, the augmentation strategy is designed to simulate common visual variations observed during colonoscopy imaging: horizontal and vertical flips improve orientation invariance of polyp structures, while brightness adjustments simulate illumination variability under different endoscopic lighting conditions. These transformations enhance model robustness without altering the anatomical characteristics of the polyp regions. A sketch of this on-the-fly stage appears after Fig. 8.

Figure 8: Random brightness adjustment to model illumination variability.
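A minimal sketch of the on-the-fly stage, assuming a `tf.data` pipeline that yields (image, mask) pairs: flips are applied jointly to image and mask, brightness to the image only. Flip probabilities, the `max_delta` value, and the dataset variable name `train_ds` are illustrative assumptions.

```python
# On-the-fly augmentation sketch: joint random flips plus image-only
# random brightness, keeping intensities in [0, 1].
import tensorflow as tf

def augment(image, mask):
    if tf.random.uniform(()) > 0.5:                 # horizontal flip
        image = tf.image.flip_left_right(image)
        mask = tf.image.flip_left_right(mask)
    if tf.random.uniform(()) > 0.5:                 # vertical flip
        image = tf.image.flip_up_down(image)
        mask = tf.image.flip_up_down(mask)
    image = tf.image.random_brightness(image, max_delta=0.1)
    image = tf.clip_by_value(image, 0.0, 1.0)       # stay in [0, 1]
    return image, mask

# train_ds is assumed to yield (image, mask) pairs of shape (256, 256, C):
# train_ds = train_ds.map(augment, num_parallel_calls=tf.data.AUTOTUNE).batch(8)
```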
The centralized segmentation model is implemented using TensorFlow/Keras. The codebase supports two architectures: Attention U-Net-Lite and the stronger baseline ResUNet++-Lite. Training in this paper is carried out on ResUNet++-Lite, which uses residual connections and squeeze-and-excitation-like channel recalibration to enhance feature strength and boundary accuracy. The ResUNet++-Lite configuration is a computationally simplified version of the original ResUNet++: the lightweight design reduces the number of convolutional filters and simplifies certain residual blocks, decreasing parameter complexity while maintaining effective feature extraction. Selective use of squeeze-and-excitation channel recalibration helps preserve boundary-sensitive feature representation while improving computational efficiency for medical image segmentation.
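To make the building block concrete, the sketch below shows the kind of residual unit with squeeze-and-excitation recalibration that a "Lite" ResUNet++ variant might use. This is an assumption-based reconstruction from the description above, not the authors' exact code; filter counts and the SE reduction ratio are illustrative.

```python
# Sketch of a residual block with squeeze-and-excitation (SE) recalibration,
# the kind of unit described for ResUNet++-Lite. Hyperparameters are assumed.
from tensorflow.keras import layers

def se_block(x, ratio=8):
    """Squeeze-and-excitation: reweight channels by global context."""
    ch = x.shape[-1]
    s = layers.GlobalAveragePooling2D()(x)        # squeeze
    s = layers.Dense(ch // ratio, activation="relu")(s)
    s = layers.Dense(ch, activation="sigmoid")(s) # per-channel gates
    return layers.Multiply()([x, layers.Reshape((1, 1, ch))(s)])

def residual_se_block(x, filters):
    """Pre-activation residual block with 3x3 convs and SE recalibration."""
    shortcut = layers.Conv2D(filters, 1, padding="same")(x)
    y = layers.BatchNormalization()(x)
    y = layers.Activation("relu")(y)
    y = layers.Conv2D(filters, 3, padding="same")(y)
    y = layers.BatchNormalization()(y)
    y = layers.Activation("relu")(y)
    y = layers.Conv2D(filters, 3, padding="same")(y)
    y = se_block(y)
    return layers.Add()([shortcut, y])
```

Stacking such blocks with downsampling in the encoder and upsampling with skip connections in the decoder yields the encoder-decoder structure the paper describes.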
The model employs a Binary Cross-Entropy (BCE)-Dice loss function, which combines BCE with Dice overlap to balance pixel-level accuracy and region-based segmentation quality. AdamW (in place of Adam) is used for optimization, and gradient clipping is applied to stabilize training. Mixed precision training is enabled to accelerate computation while maintaining numerical stability by casting final outputs to float32. The model checkpoint with the highest validation Dice score is selected.
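A minimal sketch of this training objective and optimizer setup follows. The learning rate, weight decay, clipping norm, and smoothing constant are illustrative assumptions; `tf.keras.optimizers.AdamW` is available in TensorFlow 2.11 and later.

```python
# Sketch of the BCE-Dice loss, AdamW with gradient clipping, and mixed
# precision described above. Hyperparameter values are assumptions.
import tensorflow as tf
from tensorflow.keras import mixed_precision

# Mixed precision: compute in float16, but the model's final layer should
# still emit float32 outputs for numerical stability.
mixed_precision.set_global_policy("mixed_float16")

def bce_dice_loss(y_true, y_pred, smooth=1.0):
    y_true = tf.cast(y_true, tf.float32)
    y_pred = tf.cast(y_pred, tf.float32)
    bce = tf.reduce_mean(tf.keras.losses.binary_crossentropy(y_true, y_pred))
    intersection = tf.reduce_sum(y_true * y_pred)
    dice = (2.0 * intersection + smooth) / (
        tf.reduce_sum(y_true) + tf.reduce_sum(y_pred) + smooth)
    return bce + (1.0 - dice)   # low BCE and high Dice both lower the loss

optimizer = tf.keras.optimizers.AdamW(learning_rate=1e-4,
                                      weight_decay=1e-5,
                                      clipnorm=1.0)   # gradient clipping
# model.compile(optimizer=optimizer, loss=bce_dice_loss, metrics=[...])
```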
The final trained model is then evaluated, and the results are stored as reproducible log records. Performance reporting includes:
1. Pixel-level segmentation metrics: Dice coefficient and IoU.
2. Pixel-level confusion statistics: TP, TN, FP, FN computed at a fixed probability threshold of 0.50, with derived precision, recall, specificity, F1-score, and pixel accuracy.
3. Image-level detection recall: image-level TP and FN derived from segmentation outputs using a fixed detection criterion (since Kvasir-SEG contains only polyp-positive images).
The pipeline supports reproducible reporting and traceability by maintaining organized artifacts, including per-epoch logs, final evaluation summaries, pixel-level confusion metrics, image-level metrics, confusion matrix visualizations, threshold-sweep analysis (Dice and IoU vs. threshold), and panels of qualitative predictions. A sketch of the threshold sweep is shown below.
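The threshold-sweep artifact can be produced as in the sketch below: Dice and IoU are recomputed over a grid of probability thresholds to check how sensitive the reported metrics are to the fixed 0.50 cut-off. Array names and the threshold grid are assumptions.

```python
# Threshold-sweep sketch: recompute Dice and IoU across probability
# thresholds. prob_maps and gt_masks are assumed stacked numpy arrays.
import numpy as np

def threshold_sweep(prob_maps, gt_masks, thresholds=np.linspace(0.1, 0.9, 17)):
    results = []
    gt = (gt_masks >= 0.5).astype(np.uint8)
    for thr in thresholds:
        pred = (prob_maps >= thr).astype(np.uint8)
        tp = np.sum(pred & gt)            # pixels predicted and labeled polyp
        fp = np.sum(pred & (1 - gt))      # predicted polyp, labeled background
        fn = np.sum((1 - pred) & gt)      # missed polyp pixels
        dice = 2 * tp / (2 * tp + fp + fn + 1e-8)
        iou = tp / (tp + fp + fn + 1e-8)
        results.append((float(thr), dice, iou))
    return results
```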
To analyze which spatial regions drive the segmentation output, Grad-CAM first computes channel importance weights from the gradients of the aggregated foreground score with respect to a selected convolutional layer:

$$\alpha_k = \frac{1}{Z}\sum_{i}\sum_{j}\frac{\partial y^{c}}{\partial A^{k}_{ij}} \tag{1}$$

where $A^{k}$ denotes the $k$-th feature map of the selected layer, $y^{c}$ is the aggregated prediction score for the foreground (polyp) class, and $Z$ is the number of spatial locations in the feature map. Using these importance weights, the Grad-CAM heatmap is generated as

$$H = \mathrm{ReLU}\left(\sum_{k}\alpha_{k}A^{k}\right) \tag{2}$$

The resulting Grad-CAM heatmap, denoted $H$ in Eq. (2), highlights regions in the input image that positively influence the segmentation prediction. The ReLU transformation suppresses negative contributions, ensuring that only features supporting the foreground prediction are rendered. Additionally, to restrict explanations to regions linked to the predicted polyp, a masked Grad-CAM is obtained by overlaying the predicted segmentation mask onto the original Grad-CAM heatmap:

$$H_{\mathrm{masked}} = H \odot M \tag{3}$$

Here, $M$ denotes the binary predicted segmentation mask and $\odot$ denotes element-wise multiplication.
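A minimal sketch of Eqs. (1)–(3) for a segmentation model follows: gradients of the aggregated foreground score with respect to a chosen convolutional layer give the channel weights, their weighted sum (after ReLU) forms the heatmap, and masking by the predicted binary mask yields the masked variant. The layer name and the choice of summing foreground probabilities as $y^{c}$ are assumptions.

```python
# Grad-CAM and masked Grad-CAM sketch for a Keras segmentation model.
import numpy as np
import tensorflow as tf

def grad_cam_segmentation(model, image, layer_name, threshold=0.5):
    grad_model = tf.keras.Model(
        model.inputs, [model.get_layer(layer_name).output, model.output])
    with tf.GradientTape() as tape:
        feature_maps, pred = grad_model(image[np.newaxis, ...])
        score = tf.reduce_sum(pred)            # aggregated foreground score y^c
    grads = tape.gradient(score, feature_maps) # dy^c / dA^k
    alpha = tf.reduce_mean(grads, axis=(1, 2)) # Eq. (1): spatially pooled weights
    heatmap = tf.nn.relu(                      # Eq. (2): ReLU of weighted sum
        tf.reduce_sum(alpha[:, None, None, :] * feature_maps, axis=-1))
    heatmap = heatmap[0] / (tf.reduce_max(heatmap) + 1e-8)   # normalize to [0, 1]
    heatmap = tf.image.resize(heatmap[..., None], image.shape[:2])[..., 0]
    mask = tf.cast(pred[0, ..., 0] > threshold, tf.float32)  # predicted binary mask
    return heatmap.numpy(), (heatmap * mask).numpy()         # Eq. (3): masked variant
```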
The pipeline above combines reproducible, centralized training with segmentation-aware interpretability through Grad-CAM and masked Grad-CAM. In turn, PolypXAI-Central provides a useful and clear foundation for explainable polyp segmentation studies. All stages of the workflow, including dataset preparation, augmentation, model training, evaluation, and explainability, are outlined in Table 2.

Table 2 presents a concise, reproducible pseudocode representation of the proposed centralized, explainable segmentation model. It outlines the end-to-end workflow: dataset preparation and augmentation, centralized deep-learning-based training of the segmentation model, quantitative analysis, and the application of Grad-CAM and masked Grad-CAM for segmentation-aware interpretability. The design targets high segmentation performance while providing clear visual explanations that facilitate clinical validation and build confidence in the results.
5.2 Performance Evaluation Metrics
To assess the proposed centralized polyp segmentation model quantitatively, standard segmentation evaluation metrics based on confusion matrix statistics are used. These metrics are computed at the pixel level from the binary foreground-background segmentation results on the test set, at a fixed probability threshold of 0.50. With TP, TN, FP, and FN denoting the numbers of true positive, true negative, false positive, and false negative pixels, respectively, the principal metrics are

$$\text{Accuracy} = \frac{TP+TN}{TP+TN+FP+FN}, \quad \text{Precision} = \frac{TP}{TP+FP}, \quad \text{Recall} = \frac{TP}{TP+FN},$$

$$\text{Specificity} = \frac{TN}{TN+FP}, \quad \text{Dice} = \frac{2\,TP}{2\,TP+FP+FN}, \quad \text{IoU} = \frac{TP}{TP+FP+FN}.$$

All simulations were conducted in a Google Colab–based experimental environment, using Python and TensorFlow/Keras for model training and evaluation.
At the image level, the Kvasir-SEG dataset consists of polyp-positive images only. Consequently, image-level evaluation is expressed in terms of detection recall for positive cases, reflecting the segmentation output’s ability to correctly indicate the presence of a polyp in each image.
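The pixel-level metrics and the image-level detection recall can both be derived from the same prediction arrays, as in the sketch below. The function names and the detection rule implementation (an image counts as detected if any pixel probability exceeds the threshold) follow the description in the text; array names are illustrative.

```python
# Sketch of pixel-level confusion statistics, derived metrics at the fixed
# 0.50 threshold, and the image-level detection recall used in this paper.
import numpy as np

def pixel_metrics(prob_maps, gt_masks, thr=0.5):
    pred = (prob_maps >= thr).astype(np.uint8)
    gt = (gt_masks >= 0.5).astype(np.uint8)
    tp = np.sum((pred == 1) & (gt == 1))
    tn = np.sum((pred == 0) & (gt == 0))
    fp = np.sum((pred == 1) & (gt == 0))
    fn = np.sum((pred == 0) & (gt == 1))
    eps = 1e-8
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn + eps),
        "precision": tp / (tp + fp + eps),
        "recall": tp / (tp + fn + eps),
        "specificity": tn / (tn + fp + eps),
        "dice": 2 * tp / (2 * tp + fp + fn + eps),
        "iou": tp / (tp + fp + fn + eps),
    }

def image_level_recall(prob_maps, thr=0.5):
    # All Kvasir-SEG images are polyp-positive, so the mean detection flag
    # equals recall on positive cases; specificity is not measurable here.
    detected = np.array([p.max() > thr for p in prob_maps])
    return detected.mean()
```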
6 Experimental Results
Although deep learning segmentation of gastrointestinal polyps has made significant advances, reliable clinical deployment is hindered by two practical constraints. First, high segmentation accuracy is difficult to achieve under real-world colonoscopy conditions due to substantial variability in polyp size, shape, texture, illumination, and background mucosa. Second, the absence of explainable segmentation outputs limits clinical confidence, especially in safety-critical screening operations where understanding the model’s decisions is essential. The proposed model is evaluated in a centralized training environment using the Kvasir-SEG dataset to address these challenges. The objectives of the experimental assessment are twofold: (i) to evaluate whether the proposed segmentation pipeline can produce high pixel-level segmentation accuracy, and (ii) to determine whether segmentation explanations are possible without a decrease in performance.
Performance is assessed from two complementary perspectives. Pixel-level measures, including accuracy, Dice coefficient, and IoU, quantify segmentation quality and boundary delineation. In addition, image-level detection performance is evaluated from the segmentation results by checking whether a polyp is correctly detected in each image, in line with clinical screening needs. All reported results are obtained from the final trained model evaluated on the held-out test set, ensuring consistency and reproducibility.
6.1 Tabular Summary of Image-Level Detection Performance
As a supplement to the visual image-level detection outcome matrix, the numbers of true positives (TP) and false negatives (FN) are summarized in tabular form and converted into image-level detection recall under the established detection rule. Since the Kvasir-SEG dataset contains only polyp-positive images, the reported recall reflects only the ability to detect positive cases, not full detection performance across both positive and negative classes. Image-level detection performance of the centralized segmentation model is measured on both the training and test splits using the same decision rule as in the confusion matrix analysis.
The centralized model identifies 1583 of 1600 training images and 396 of 400 testing images as polyp-positive, as summarized in Table 3. These results correspond to image-level recall values of 98.94% and 99.0% for the training and testing splits, respectively. The metric reflects the consistency of segmentation-based detection within the evaluated dataset.

6.2 Experimental Setup
The experiments are conducted using the publicly available Kvasir-SEG dataset, comprising 1000 colonoscopy images with pixel-wise polyp masks annotated by experts. An offline augmentation strategy is used to enhance robustness and reduce overfitting, yielding a total of 2000 image–mask pairs (1000 original + 1000 augmented). The augmented data is divided into 1600 training images (80%) and 400 testing images (20%). From the 1600 training samples, 10% are further reserved as a validation subset for model selection and early checkpoint monitoring.
All images are scaled to a spatial resolution of 256 × 256 pixels and normalized to the range [0, 1]. A ResUNet++-Lite architecture is used for the segmentation model; it combines residual learning, squeeze-and-excitation blocks, and boundary refinement to improve boundary delineation while remaining computationally efficient. The BCE-Dice loss is used for training, balancing the optimization of region overlap with the stability of pixel-wise classification. The model is trained for a fixed number of epochs, and for evaluation we select the checkpoint that achieves the highest validation Dice score during training, reporting test-set results with this checkpoint. A sketch of this selection procedure is shown below.
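Checkpoint selection on the best validation Dice can be sketched with a Keras callback, assuming a Dice metric registered under the name `dice` at compile time (so Keras exposes it as `val_dice`). The file name and epoch count are illustrative assumptions.

```python
# Sketch of best-checkpoint selection on validation Dice.
import tensorflow as tf

def dice(y_true, y_pred, smooth=1.0):
    """Dice metric; registering it under the name 'dice' yields 'val_dice'."""
    y_true = tf.cast(y_true, tf.float32)
    y_pred = tf.cast(y_pred, tf.float32)
    inter = tf.reduce_sum(y_true * y_pred)
    return (2.0 * inter + smooth) / (
        tf.reduce_sum(y_true) + tf.reduce_sum(y_pred) + smooth)

checkpoint = tf.keras.callbacks.ModelCheckpoint(
    "best_model.keras",
    monitor="val_dice",   # track the validation Dice metric
    mode="max",           # higher Dice is better
    save_best_only=True)

# model.compile(optimizer=optimizer, loss=bce_dice_loss, metrics=[dice])
# history = model.fit(train_ds, validation_data=val_ds,
#                     epochs=100, callbacks=[checkpoint])
```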
6.3 Pixel-Level Segmentation Performance
Pixel-level segmentation performance of the proposed model on the training and test splits is summarized in Table 4. Segmentation quality is evaluated using three metrics: accuracy, the Dice coefficient, and the IoU. These are computed from the pixel-wise confusion matrix using the formulas given in Section 5.2.

The findings indicate strong segmentation performance: the test Dice coefficient is about 0.86 and the IoU about 0.79. The small margin between training and testing performance suggests good generalization and minimal overfitting, despite the complexity of colonoscopy images. These findings are competitive with representative centralized methods reported in the literature.
6.4 Image-Level Detection Performance
In addition to pixel-level segmentation evaluation using Dice and IoU, image-level detection performance is examined by determining whether the predicted segmentation output indicates the presence of a polyp in an image; an image is considered polyp-positive when the maximum predicted probability exceeds a threshold of 0.50. This metric is used only as a supplementary indicator to verify that the segmentation model correctly identifies polyp-positive cases in the evaluation set. Because the Kvasir-SEG dataset contains only polyp-positive images, it does not represent a full detection evaluation and cannot assess specificity or false-positive rates.
The confusion matrix below reflects 396 true positives and 4 false negatives. Accordingly, image-level performance is reported as detection recall on positive instances.
6.5 Qualitative Results and Explainability Analysis
Explainability analysis in this study focuses on qualitative visualization of model attention using Grad-CAM and masked Grad-CAM, which aids clinical interpretation of the segmentation outcome. Such visualizations provide information about the spatial regions that affect model predictions and can be used to assess whether the model focuses on anatomically relevant polyp structures.
Grad-CAM and masked Grad-CAM visualizations are generated on representative test images to improve clinical transparency and interpretability. Grad-CAM determines which spatial regions have the strongest contribution to the prediction of segmentation, whereas masked Grad-CAM limits such explanations to the areas of the predicted polyp by damping the background activations.
Qualitative analysis reveals that the model consistently attends to clinically significant polyp structures, such as lesion boundaries and internal textures, rather than irrelevant background tissue. Masked Grad-CAM further enhances interpretability by aligning predictions with the explanation maps, reducing spurious activations outside lesion areas. These findings provide visual evidence that anatomically grounded, clinically explainable segmentation decisions are achievable.
6.6 Centralized Baseline Performance
The centralized segmentation model was trained on the augmented Kvasir-SEG dataset using a single-node setup with an 80:20 train-test split. The model achieved a Dice coefficient of about 0.86 and an IoU of about 0.79 on the test set, indicating strong agreement between predicted masks and expert annotations.
Dice coefficient trends for the training and validation sets over epochs are shown in Fig. 9. Early in training, both curves improve rapidly, indicating effective learning of coarse polyp structures. As training proceeds, the validation Dice plateaus at a steady 0.85–0.86 while the training Dice continues to rise. This indicates convergence with a moderate generalization gap, which is encouraging given the complexity of polyp morphology and the limited dataset size.

Figure 9: Centralized training Dice progression.
The IoU performance across training epochs is shown in Fig. 10. As with the Dice trends, both the training and validation sets exhibit a steady increase in IoU, with the validation set approaching 0.79. The strong correlation between Dice and IoU indicates a steady segmentation pattern and good boundary delineation during training.

Figure 10: Centralized training IoU progression.
Fig. 11 displays the training and validation loss curves. The training loss decreases steadily, while the validation loss declines smoothly and then plateaus. The absence of a sharp divergence between training and validation loss indicates controlled overfitting: the model is learning discriminative features rather than memorizing the training data.

Figure 11: Centralized training loss curve.
Overall, the centralized baseline shows stable convergence, high segmentation quality, and strong generalization on the Kvasir-SEG test set. Compared to previous centralized baselines reported in the literature, which generally achieve Dice scores between 0.80 and 0.84, the proposed training strategy and architecture achieve better performance without requiring a particularly heavyweight architecture. These findings provide a consistent reference point for gastrointestinal polyp segmentation and a solid basis for further analysis and clinical interpretability research.
6.7 Explainability with Grad-CAM
Grad-CAM visualization is adopted to examine and justify the decision-making pattern of the proposed centralized polyp segmentation model. The resulting activation maps show that the model targets clinically relevant areas corresponding to polyp tissue, lesion edges, and characteristic textural features. In appropriately segmented cases, high-response regions on Grad-CAM maps are highly consistent with expert-labeled ground-truth masks, indicating that the model relies on meaningful anatomical features rather than irrelevant background information for its predictions.
Under more difficult conditions, such as images with specular highlights, mucus, or complex mucosal folds, Grad-CAM shows greater activation in visually ambiguous regions. Such observations can provide insight into the possible causes of segmentation errors and help explain instances of false-negative predictions at the image level. These visual explanations are useful for understanding model constraints in challenging imaging situations.
To provide further interpretability, masked Grad-CAM is applied by restricting the activation maps to the predicted segmentation mask. This refinement suppresses irrelevant background responses and emphasizes only the regions directly involved in the final segmentation output. Consequently, masked Grad-CAM offers a cleaner, segmentation-aware explanation, allowing a more straightforward assessment of whether the model’s attention is properly focused on the identified polyp area.
Representative Grad-CAM visualizations for correctly segmented test images are shown in Figs. 12–14, demonstrating good agreement between the activation maps and polyp anatomy. Fig. 15 illustrates an example of qualitative segmentation, including an input colonoscopy image, the ground truth (GT) mask manually annotated by an expert, the model-predicted probability map, and the overlay of the prediction on the input image. The high visual congruence between the estimated area and the GT mask indicates correct localization and boundary delineation for this sample. These qualitative results improve the transparency of the proposed centralized model and strengthen confidence in its clinical applicability by providing interpretable evidence for its predictions.

Figure 12: Grad-CAM visualization example 1.

Figure 13: Grad-CAM visualization example 2.

Figure 14: Grad-CAM visualization example 3.

Figure 15: Qualitative polyp segmentation results: (left to right) input image, GT, predicted probability map, and prediction overlay on the input.
Here, the prediction yields strong, high-confidence polyp activation at a spatial scale closely matching the GT mask. The overlay also shows that the segmentation is correctly localized on the clinically significant region of the lesion, indicating accurate delineation even under real endoscopic appearance variability (non-uniform illumination and weak border transitions). It is important to note that the Kvasir-SEG dataset contains only polyp-positive images; therefore, the reported image-level recall should not be interpreted as a full detection performance metric. The recall value indicates that the segmentation model successfully identifies polyps in most images in the evaluation set. Metrics such as specificity, false-positive rate, and screening sensitivity cannot be assessed without datasets containing both polyp and non-polyp frames.
The current research focuses on qualitative explainability via Grad-CAM visualizations, but quantitative analysis of explanation quality, using measures such as pointing-game accuracy and deletion/insertion analysis, could provide additional insight. Recent research on explainable AI in healthcare has emphasized the importance of interpretability for facilitating transparency, clinical validation, and the responsible application of AI systems. Future research will therefore consider quantitative explainability assessment and a broader interpretability paradigm in clinical decision-support contexts.
Discussion
The experimental findings indicate that a well-planned, centralized deep learning architecture can achieve high accuracy in gastrointestinal polyp segmentation on the Kvasir-SEG dataset despite large variations in polyp size, shape, texture, and illumination. The proposed architecture, together with its tailored data augmentation and optimized loss formulation, enables accurate boundary delineation and stable convergence, as evidenced by the gradually increasing Dice and IoU curves and the absence of late-training performance degradation.
In contrast to previous baseline models that reported lower segmentation accuracy, the proposed centralized methodology achieves a test Dice score of about 0.86 and an IoU of about 0.79, representing a measurable improvement in overlap-based segmentation quality relative to several earlier centralized baselines. The training and validation curves show steady learning behavior with a moderate generalization gap, indicating that the model learns clinically relevant polyp structures rather than memorizing training examples. Notably, the image-level detection recall of 99% with only four missed cases on the test split indicates that high-quality pixel-level segmentation translates into reliable lesion-presence detection, which is essential for screening and diagnostic procedures.
Explainability is key to justifying the clinical relevance of the segmentation outputs. Grad-CAM and masked Grad-CAM visualizations demonstrate that the model consistently focuses on anatomically relevant areas of polyp tissue, such as boundaries and internal textures, rather than on insignificant background features such as mucosal folds or specular highlights. In failure cases, attention maps can indicate confusion caused by low contrast or ambiguous visual patterns. Such transparency is critical to clinical adoption, as it enables qualitative verification of model behavior, facilitates trust and interpretability, and supports possible regulatory review.
These results directly address gaps identified in the literature. Although several current studies on gastrointestinal polyp segmentation focus on architectural complexity or marginal accuracy improvements, few offer improved segmentation results together with systematic explainability. The findings reported here show that competitive performance does not necessitate an overly complex model structure or an advanced training paradigm, but rather a set of principled approaches that combine robust architecture design, effective optimization, and interpretable inference.
Overall, the proposed centralized segmentation model provides a strong, interpretable benchmark for gastrointestinal polyp segmentation. It demonstrates high Dice overlap and robust image-level detection and produces clinically meaningful visual explanations, offering a viable and reliable solution for clinical research and decision support systems, and a solid basis for future extension to multi-center or privacy-preserving learning scenarios.
Comparison with Previously Published Methods
The proposed centralized segmentation model is contrasted with representative gastrointestinal polyp segmentation models reported in the literature. Most prior work compares deep learning models under centralized training conditions on benchmarks such as Kvasir-SEG, with the primary goal of increasing overlap-based metrics, specifically Dice and IoU. Although these works have steadily improved segmentation performance through novel architectural designs, they have largely overlooked the need for clinically understandable explanations to justify model predictions.
Early encoder-decoder methods, such as U-Net and its residual counterparts, have served as popular baselines for polyp segmentation. These models achieve reasonable boundary delineation, with Dice scores of 0.78–0.82 on Kvasir-SEG [32], but poor robustness in difficult cases such as flat or low-contrast polyps. Later attention-based and multiscale architectures improved feature representation and localization, providing progressive gains in segmentation performance at the cost of increased architectural complexity.
More recent transformer-based models, such as FCB-SwinV2 proposed by Fitzgerald and Matuszewski [25], aim to capture long-range dependencies and global contextual information in colonoscopy images. Although these models have reported better Dice scores (~0.84) than conventional CNNs, they remain computationally expensive and offer little insight into how they work, serving as black-box predictors with no clear segmentation-level explainability.
Conversely, the proposed model focuses on achieving higher segmentation accuracy and clinically significant interpretability within a centralized training paradigm. Evaluated on an 80:20 split of the augmented Kvasir-SEG dataset, it achieves a test Dice score of about 0.86 and an IoU of about 0.79, performance comparable to or slightly higher than several representative centralized baselines, with a smaller architecture. The pixel-level overlap metrics are supplemented by image-level reliability, with a recall of 99.0% on polyp-positive test images, indicating that lesions are rarely missed under the stipulated detection criterion.
Table 5 shows a comparative analysis of representative gastrointestinal polyp segmentation methods evaluated on the Kvasir-SEG dataset. The comparison highlights commonly reported segmentation performance metrics such as Dice coefficient and IoU, along with the training paradigm and the use of explainable segmentation techniques. The results indicate that most existing approaches primarily focus on improving segmentation accuracy through architectural innovations, but do not incorporate explicit explainability mechanisms. In contrast, the proposed study integrates ResUNet++-Lite with Grad-CAM and masked Grad-CAM to provide interpretable segmentation outputs while maintaining competitive segmentation performance on the benchmark dataset.
Conclusion
This paper presented an explainable, centralized deep learning system for gastrointestinal polyp segmentation using the Kvasir-SEG dataset. To address the variability in polyp sizes, class imbalance, and visual ambiguity in colonoscopy images, a strengthened U-Net-based segmentation network with residual and squeeze-and-excitation enhancements was trained using a purpose-built preprocessing and augmentation pipeline. The proposed approach achieved high test-set performance, with a Dice coefficient of approximately 0.86, indicating good boundary delineation across a wide range of polyp sizes and appearances. In addition to overlap-based measures, image-level detection analysis of the segmentation results was used to evaluate the model, as it reflects clinically relevant screening behavior. The model showed almost perfect image-level detection, with only a few missed cases on the test split, suggesting strong performance in identifying the presence of a polyp under realistic assessment conditions.
Grad-CAM and masked Grad-CAM were used to provide visual explanations aligned with the segmentation predictions for clinical transparency. The explainability analysis confirmed that the model consistently focuses on clinically relevant polyp regions, margins, and textures while suppressing irrelevant background structures. Masked Grad-CAM further increases interpretability and qualitative validation of prediction accuracy by limiting explanations to the predicted segmentation masks. Overall, the findings indicate that high segmentation accuracy and explainable decision-making are achievable within a centralized training paradigm without requiring overly complex architectures. The proposed model can be regarded as a strong and interpretable foundation for gastrointestinal polyp segmentation and a worthwhile basis for future studies on clinically reliable AI-assisted colonoscopy systems.
Limitations and Future Work
Despite strong performance, the model faces limitations in handling challenging polyp appearances, dataset diversity, and centralized training assumptions. The current evaluation is conducted on the Kvasir-SEG dataset; therefore, future work will include validation on additional publicly available polyp segmentation datasets, such as CVC-ClinicDB, CVC-ColonDB, and ETIS-LaribPolypDB, to further assess cross-dataset generalization and robustness. Future work will focus on improving robustness, explainability, fidelity, and real-world clinical deployment.
Acknowledgement: Not applicable.
Funding Statement: The authors received no specific funding for this study.
Author Contributions: Hafeez Rahman, Naveed Butt, Naila Sammar Naz, and Fahad Ahmed collected data from various resources and contributed to the original draft preparation. Hafeez Rahman, Muhammad Saleem, and Adnan Khan performed formal analysis and simulation. Naveed Butt and Khan Muhammad Adnan performed interpretation of results. Hafeez Rahman, Naveed Butt, Adnan Khan, and Naila Sammar Naz contributed to writing, review, and editing and drafted pictures and tables. Fahad Ahmed, Muhammad Saleem, and Khan Muhammad Adnan performed revisions and improved the quality of the draft. All authors reviewed and approved the final version of the manuscript.
Availability of Data and Materials: The original contributions presented in this study are included in the article. In addition, the dataset used in this work is publicly available at: https://github.com/Thehunk1206/PRANet-Polyps-Segmentation. Further inquiries can be directed to the corresponding author.
Ethics Approval: Not applicable.
Conflicts of Interest: The authors declare no conflicts of interest.
References
1. Sung H, Ferlay J, Siegel RL, Laversanne M, Soerjomataram I, Jemal A, et al. Global cancer statistics 2020: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin. 2021;71(3):209–49. doi:10.3322/caac.21660.
2. Wu S, Zhang Y, Lin Z, Wei M. Global burden of colorectal cancer in 2022 and projections to 2050: incidence and mortality estimates from GLOBOCAN. BMC Cancer. 2025;25(1):1770. doi:10.1186/s12885-025-15138-0.
3. Winawer SJ, Zauber AG, Ho MN, O’Brien MJ, Gottlieb LS, Sternberg SS, et al. Prevention of colorectal cancer by colonoscopic polypectomy. N Engl J Med. 1993;329(27):1977–81. doi:10.1056/nejm199312303292701.
4. van Rijn JC, Reitsma JB, Stoker J, Bossuyt PM, van Deventer SJ, Dekker E. Polyp miss rate determined by tandem colonoscopy: a systematic review. Am J Gastroenterol. 2006;101(2):343–50. doi:10.1111/j.1572-0241.2006.00390.x.
5. Leufkens A, van Oijen M, Vleggaar F, Siersema P. Factors influencing the miss rate of polyps in a back-to-back colonoscopy study. Endoscopy. 2012;44(5):470–5. doi:10.1055/s-0031-1291666.
6. Ronneberger O, Fischer P, Brox T. U-Net: convolutional networks for biomedical image segmentation. In: Medical image computing and computer-assisted intervention—MICCAI 2015. Cham, Switzerland: Springer International Publishing; 2015. p. 234–41. doi:10.1007/978-3-319-24574-4_28.
7. Song P, Li J, Fan H. Attention based multi-scale parallel network for polyp segmentation. Comput Biol Med. 2022;146:105476. doi:10.1016/j.compbiomed.2022.105476.
8. Repici A, Badalamenti M, Maselli R, Correale L, Radaelli F, Rondonotti E, et al. Efficacy of real-time computer-aided detection of colorectal neoplasia in a randomized trial. Gastroenterology. 2020;159(2):512–20.e7. doi:10.1053/j.gastro.2020.04.062.
9. Tomar NK, Jha D, Ali S, Johansen HD, Johansen D, Riegler MA, et al. DDANet: dual decoder attention network for automatic polyp segmentation. In: Pattern recognition. ICPR international workshops and challenges. Cham, Switzerland: Springer International Publishing; 2021. p. 307–14. doi:10.1007/978-3-030-68793-9_23.
10. Guo P, Xue Z, Long LR, Antani S. Cross-dataset evaluation of deep learning networks for uterine cervix segmentation. Diagnostics. 2020;10(1):44. doi:10.3390/diagnostics10010044.
11. Litjens G, Kooi T, Bejnordi BE, Setio AAA, Ciompi F, Ghafoorian M, et al. A survey on deep learning in medical image analysis. Med Image Anal. 2017;42:60–88. doi:10.1016/j.media.2017.07.005.
12. Selvaraju RR, Cogswell M, Das A, Vedantam R, Parikh D, Batra D. Grad-CAM: visual explanations from deep networks via gradient-based localization. Int J Comput Vis. 2020;128(2):336–59. doi:10.1007/s11263-019-01228-7.
13. Asare A, Nguyen TH, Bagci U. Improved segmentation of polyps and visual explainability analysis. arXiv:2509.18159. 2025.
14. Shahzad T, Saleem M, Farooq MS, Abbas S, Khan MA, Ouahada K. Developing a transparent diagnosis model for diabetic retinopathy using explainable AI. IEEE Access. 2024;12:149700–9. doi:10.1109/access.2024.3475550.
15. Adnan KM, Ghazal TM, Saleem M, Farooq MS, Yeun CY, Ahmad M, et al. Deep learning driven interpretable and informed decision making model for brain tumour prediction using explainable AI. Sci Rep. 2025;15(1):19223. doi:10.1038/s41598-025-03358-0.
16. Poudel S, Lee SW. Polyp generalization via diversifying style at feature-level space. Appl Sci. 2024;14(7):2780. doi:10.3390/app14072780.
17. Nguyen NQ, Vo DM, Lee SW. Contour-aware polyp segmentation in colonoscopy images using detailed upsampling encoder-decoder networks. IEEE Access. 2020;8:99495–508. doi:10.1109/ACCESS.2020.2995630.
18. Aggarwal S, Gupta I, Kumar A, Kautish S, Almazyad AS, Wagdy Mohamed A, et al. GastroFuse-Net: an ensemble deep learning framework designed for gastrointestinal abnormality detection in endoscopic images. Math Biosci Eng. 2024;21(8):6847–69. doi:10.3934/mbe.2024300.
19. Fan DP, Ji GP, Zhou T, Chen G, Fu H, Shen J, et al. PraNet: parallel reverse attention network for polyp segmentation. In: Medical image computing and computer assisted intervention—MICCAI 2020. Cham, Switzerland: Springer International Publishing; 2020. p. 263–73. doi:10.1007/978-3-030-59725-2_26.
20. Jha D, Ali S, Tomar NK, Johansen HD, Johansen D, Rittscher J, et al. Real-time polyp detection, localization and segmentation in colonoscopy using deep learning. IEEE Access. 2021;9:40496–510. doi:10.1109/access.2021.3063716.
21. Emon MH, Mondal PK, Mozumder MAI, Kim HC, Lapina M, Babenko M, et al. An integrated architecture for colorectal polyp segmentation: the μ-net framework with explainable AI. Diagnostics. 2025;15(22):2890. doi:10.3390/diagnostics15222890.
22. Islam MM, Sohel MK. PolypSegNet: a hybrid ConvNeXt-tiny and attention U-net framework for accurate colorectal polyp segmentation. Preprint. 2025. doi:10.21203/rs.3.rs-7102819/v1.
23. Raghaw CS, Yadav A, Sanjotra JS, Dangi S, Kumar N. MNet-SAt: a multiscale network with spatial-enhanced attention for segmentation of polyps in colonoscopy. Biomed Signal Process Control. 2025;102:107363. doi:10.1016/j.bspc.2024.107363.
24. Singh O, Sengar SS. BetterNet: an efficient CNN architecture with residual learning and attention for precision polyp segmentation. arXiv:2405.04288. 2024.
25. Fitzgerald K, Matuszewski B. FCB-SwinV2 transformer for polyp segmentation. arXiv:2302.01027. 2023.
26. Tomar NK, Jha D, Bagci U. DilatedSegNet: a deep dilated segmentation network for polyp segmentation. In: MultiMedia modeling. Cham, Switzerland: Springer International Publishing; 2023. p. 334–44. doi:10.1007/978-3-031-27077-2_26.
27. Mei J, Zhou T, Huang K, Zhang Y, Zhou Y, Wu Y, et al. A survey on deep learning for polyp segmentation: techniques, challenges and future trends. Vis Intell. 2025;3(1):1. doi:10.1007/s44267-024-00071-w.
28. Wu Z, Lv F, Chen C, Hao A, Li S. Colorectal polyp segmentation in the deep learning era: a comprehensive survey. arXiv:2401.11734. 2024.
29. Ramzan M, Raza M, Sharif MI, Kadry S. Gastrointestinal tract polyp anomaly segmentation on colonoscopy images using graft-U-Net. J Pers Med. 2022;12(9):1459. doi:10.3390/jpm12091459.
30. Jha D, Smedsrud PH, Riegler MA, Halvorsen P, de Lange T, Johansen D, et al. Kvasir-SEG: a segmented polyp dataset. In: MultiMedia modeling. Cham, Switzerland: Springer International Publishing; 2019. p. 451–62. doi:10.1007/978-3-030-37734-2_37.
31. Zhang W, Fu C, Zheng Y, Zhang F, Zhao Y, Sham CW. HSNet: a hybrid semantic network for polyp segmentation. Comput Biol Med. 2022;150(3):106173. doi:10.1016/j.compbiomed.2022.106173.
32. PRANet-Polyps-Segmentation [Internet]. [cited 2026 Jan 1]. Available from: https://github.com/thehunk1206/pranet-polyps-segmentation.
33. Zhang W, Ye W, Yu Z, Ju J. A colon polyp segmentation network via collaborative decision-making of mixture of experts. Expert Syst Appl. 2026;308(4):131117. doi:10.1016/j.eswa.2026.131117.
Copyright © 2026 The Author(s). Published by Tech Science Press. This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.