A prevalent diabetic complication is Diabetic Retinopathy (DR), which can damage the retina’s blood vessels, leading to severe loss of vision. If treated at an early stage, vision loss can be prevented. But since diagnosis takes time and there is a shortage of ophthalmologists, patients often suffer vision loss even before diagnosis. Hence, early detection of DR is the need of the hour. The primary purpose of this work is to apply the data fusion/feature fusion technique, which combines more than one relevant feature to predict diabetic retinopathy at an early stage with greater accuracy. Automated procedures for diabetic retinopathy analysis are fundamental to addressing these issues. While deep learning for binary classification has achieved high validation accuracies, multi-stage classification results are less impressive, particularly for early-stage disease. Densely Connected Convolutional Networks are proposed to detect Diabetic Retinopathy on retinal images. The presented model is trained on a Diabetic Retinopathy Dataset of 3,662 images provided by APTOS. Experimental results suggest that a training accuracy of 93.51%, with 0.98 precision, 0.98 recall, and 0.98 F1-score, has been achieved by the best of the three models in the proposed work. The same model is tested on 550 images of the Kaggle 2015 dataset, where the proposed model was able to detect No DR images with 96% accuracy, Mild DR images with 90% accuracy, Moderate DR images with 89% accuracy, Severe DR images with 87% accuracy, and Proliferative DR images with 94% accuracy.
One of the most prevalent diseases around the world is Diabetes Mellitus. The extended prevalence of diabetes causes several health problems, such as Diabetic Retinopathy, nephropathy, diabetic foot, etc. The most common of these is Diabetic Retinopathy (DR), a diabetes complication that can harm the retina’s blood vessels and lead to significant vision loss. Diabetic Retinopathy typically occurs when high glucose levels damage the blood vessels and limit blood flow to the retina. Initially, it starts with no symptoms or only mild vision problems, and eventually it can cause vision loss. The symptoms can be noticed only when the disease reaches an advanced stage, and it usually affects both eyes [
Even though there are many computer-aided diagnosis tools, detecting diabetic retinopathy at certain severity levels, such as mild and moderate DR, is challenging. The principal target of this work is to apply the data fusion/feature fusion technique, which combines more than one relevant feature to predict diabetic retinopathy at an early stage with greater accuracy. Conventional methods usually take about 7–14 days to detect DR, considering the eye screening and consultation with an ophthalmologist. Since an end-user neural network is used to detect and classify, the process takes around 2–5 min, depending on the resolution of the fundus image. As there is not yet any practically attainable method for capturing fundus images at home, a person might first take a pre-screening or undergo the usual fundoscopy, after which the patient may use the neural network to detect DR [
There are two types of Diabetic Retinopathy: Non-Proliferative DR (L0, L1, L2, L3) and Proliferative DR (L4). Microaneurysms, cotton wool spots, and hemorrhages define non-proliferative retinopathy, while iris or retinal neovascularization defines proliferative retinopathy. Non-Proliferative DR (NPDR) is the milder form and is primarily symptomless, whereas Proliferative DR (PDR) is an advanced stage of DR that leads to the formation of abnormal blood vessels in the retina [
The paper proceeds in the following way: related work on Diabetic Retinopathy is reviewed in Section 2. The proposed implementation model, which includes the data overview and steps such as Ben Graham preprocessing, scaling, and cropping, along with a detailed description of the three proposed dense Convolutional Neural Network (CNN) models, is presented in Section 3. The visual performance evaluation and comparison of the models on the Kaggle 2015 dataset are shown in Section 4. Finally, in Section 5, the conclusions and future scope are presented.
Pires et al. (2019) [
Sarwinda et al. (2018) [
In Khojasteh et al. (2018) [
This proposed work by Costa et al. (2018) [
Dutta et al. (2018) [
Kumar et al. (2018) [
Generally, in comparing the performance and examining the results of conventional and Deep Learning-based techniques, the DL-based methods outperform the traditional ones, as discussed in the literature survey. Reviewing the DL-based techniques individually, Convolutional Neural Networks (CNNs) and their pre-trained architectures have been utilized by the majority of researchers and have delivered more promising outcomes. However, CNNs suffer from various issues; one of them is data annotation, which requires the services of ophthalmologists to label the retinal fundus images. Class imbalance and overfitting are other issues that may result in biased predictions. An increase in data improves the performance of CNN-based systems, which may not be possible for all kinds of problems. Automated DR detection systems remarkably reduce diagnosis time and cost and help ophthalmologists identify retinal abnormalities and provide timely treatment. A few studies have proposed that performance increases when handcrafted and CNN-based features are integrated.
In the future, CNN architectures can be designed to extract more accurate and better image features, which would improve DR detection and classification rates. Initially, this work gave a brief description of medical image processing, Diabetic Retinopathy risk factors, and classifications. Conventional and DL-based DR detection techniques are examined, along with their performance metrics. The majority of researchers have used CNNs for their efficiency and ability to give more accurate results, surpassing other strategies. This review of related work examines recent studies and highlights the most valuable methods, helping the research community detect and classify DR.
In the proposed work, fundus images of patients’ eyes affected by DR are used to train high-density CNNs, i.e., DenseNets, which classify images into five classes (No DR, Mild DR, Moderate DR, Severe DR, and Proliferative DR). The first and most important step is image preprocessing and conversion to grayscale. The model goes through various stages. First, image preprocessing ensures that impurities such as noise and uneven contrast lighting are not included in the image. Then, after reducing the impact of lighting and cropping areas with little information, a dense convolutional neural network model classifies and predicts.
The images contain many distortions, including blur, negative or unsatisfactory color drift, and partial views, which can confuse the result. However, the proposed method works with these noisy data to increase sensitivity and accuracy. In the entire process, the input image is first converted to a grayscale image, reducing the influence of lighting conditions. Since the input images in the DR dataset are of very high resolution, the training process is slowed down, and there is a possibility that training consumes and runs out of memory. For this purpose, the image is scaled to a resolution of 224 × 224. In addition, areas without image information are truncated. Finally, Ben Graham’s pre-processing method is applied to improve lighting conditions. Algorithm 1 below states the same. The normal retinal image and the image after performing Ben Graham’s pre-processing are shown in
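The grayscale conversion and illumination correction described above can be sketched as follows. This is a minimal NumPy illustration, not the paper’s Algorithm 1 verbatim: the function names and the `sigma` value are assumptions, a 2-D grayscale array is assumed as input, and implementations commonly use `cv2.GaussianBlur` with `cv2.addWeighted` instead.

```python
import numpy as np

def gaussian_blur(img, sigma):
    # Separable 1-D Gaussian kernel, applied along columns then rows.
    radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-x**2 / (2.0 * sigma**2))
    kernel /= kernel.sum()
    out = np.apply_along_axis(np.convolve, 0, img, kernel, mode="same")
    out = np.apply_along_axis(np.convolve, 1, out, kernel, mode="same")
    return out

def ben_graham(img, sigma=10):
    # Subtract the local average (a Gaussian-blurred copy) to flatten
    # uneven illumination, then re-centre around mid-grey (128).
    imgf = img.astype(np.float32)
    blurred = gaussian_blur(imgf, sigma)
    return np.clip(4 * imgf - 4 * blurred + 128, 0, 255).round().astype(np.uint8)
```

On a uniformly lit region the output settles at the mid-grey value 128; only local deviations from the neighbourhood average survive, which makes lesions stand out regardless of overall lighting.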
Furthermore, uninformative black areas of the images have been cropped. Algorithm 2 below states the same.
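A common way to implement this cropping step, in the spirit of Algorithm 2 (the tolerance value here is an assumption), is to keep only the rows and columns whose maximum intensity exceeds a darkness threshold:

```python
import numpy as np

def crop_dark_borders(img, tol=7):
    # Rows/columns whose brightest pixel is <= tol carry no retinal
    # information and are trimmed from the borders of the image.
    mask = img > tol
    rows = np.where(mask.any(axis=1))[0]
    cols = np.where(mask.any(axis=0))[0]
    if rows.size == 0 or cols.size == 0:
        return img  # fully dark image: leave unchanged
    return img[rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1]
```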
The project experimented with a variety of pre-trained models and proposes three different high-density CNN models with newly added layers. Choosing suitable design parameters for a CNN directly impacts the model’s performance. Densely Connected Convolutional Networks were chosen for this model because DenseNets have several decisive advantages: they alleviate vanishing gradients, strengthen feature propagation, encourage feature reuse, and significantly reduce the number of parameters.
DenseNets require fewer parameters than a typical CNN because there are no redundant feature maps to learn. Also, DenseNets concatenate a layer’s output feature maps with its input feature maps instead of adding them together. A network contains N layers, each of which executes a non-linear transformation TN.
The dense connectivity in the DenseNet architecture is represented as xN = TN([x0, x1, …, xN−1]),
where [
DenseNets are partitioned into DenseBlocks, where the feature map dimensions remain constant inside a block but the number of filters varies. Since the feature maps are concatenated, the channel dimension grows with each layer. This can be generalized for the Nth layer if TN produces R feature maps every time.
where R is the growth rate. Because of its dense connectivity, this network architecture is referred to as a Dense Convolutional Network [
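The channel growth implied by this concatenation can be illustrated with a toy simulation, in which random arrays stand in for the transformation TN (the shapes and growth rate below are illustrative, not the paper’s):

```python
import numpy as np

def simulate_dense_block(x, n_layers, growth_rate, rng):
    # x has shape (H, W, C). Each iteration stands in for one dense
    # layer: TN would compute `growth_rate` new feature maps from the
    # concatenation of all earlier maps; random arrays are used here.
    for _ in range(n_layers):
        new_maps = rng.standard_normal(x.shape[:2] + (growth_rate,))
        x = np.concatenate([x, new_maps], axis=-1)  # dense connectivity
    return x

rng = np.random.default_rng(0)
x0 = rng.standard_normal((8, 8, 64))  # 64 input channels
out = simulate_dense_block(x0, n_layers=6, growth_rate=32, rng=rng)
# channel count grows to 64 + 6 * 32 = 256
```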
The first model is based on the DenseNet-121 architecture. The network begins with a simple convolution and pooling stage, followed by four dense blocks separated by transition layers, with a classification layer at the end. The initial convolution is 7 × 7 with stride 2 and 64 filters, followed by a 3 × 3 max-pooling layer with stride 2. Then come the dense blocks. Every dense block contains two convolutions, with kernels of size 1 × 1 and 3 × 3. Dense block one is repeated six times, dense block two twelve times, dense block three twenty-four times, and dense block four sixteen times. After every dense block (except the last) is a transition layer consisting of a 1 × 1 convolution and a 2 × 2 average-pooling layer with stride 2. After the last dense block of the model is a global average-pooling layer. Regularization is done using a dropout of 0.5. After the fully connected layer, the softmax function converts the output into a probability distribution. The architecture of model 1 is shown in
The second model is based on the DenseNet-169 architecture. The network begins with a simple convolution and pooling stage, followed by four dense blocks separated by transition layers, with a classification layer at the end. The initial convolution has stride 2 and 64 filters of size 7 × 7, followed by a 3 × 3 max-pooling layer with stride 2. Then come the dense blocks. Every dense block contains two convolutions, with kernels of size 1 × 1 and 3 × 3. Dense block one is repeated six times, dense block two twelve times, dense block three thirty-two times, and dense block four thirty-two times. After every dense block (except the last) is a transition layer consisting of a 1 × 1 convolution and a 2 × 2 average-pooling layer with stride 2. After the last dense block of the model is a global average-pooling layer. Regularization is done using a dropout of 0.5. After the fully connected layer, the softmax function converts the output into a probability distribution. The model 2 architecture is demonstrated in
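The block repetitions above account for the networks’ names under the usual counting convention (initial convolution, two convolutions per dense layer, one convolution per transition, and the final classifier); a quick sanity check:

```python
def densenet_layer_count(block_repeats):
    # Counted layers: the initial 7x7 convolution, the 1x1 bottleneck
    # and 3x3 convolution in every dense layer, one 1x1 convolution in
    # each transition layer (between consecutive dense blocks), and
    # the final fully connected classification layer.
    initial_conv = 1
    dense_convs = 2 * sum(block_repeats)
    transition_convs = len(block_repeats) - 1
    classifier = 1
    return initial_conv + dense_convs + transition_convs + classifier

densenet_layer_count([6, 12, 24, 16])  # DenseNet-121 -> 121
densenet_layer_count([6, 12, 32, 32])  # DenseNet-169 -> 169
```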
The third model is also based on the DenseNet-121 architecture. The network begins with a simple convolution and pooling stage, followed by the dense blocks and transition layers. The initial convolution is 7 × 7 with stride 2 and 64 filters, followed by a 3 × 3 max-pooling layer with stride 2. Then come the dense blocks. Every dense block contains two convolutions, with kernels of size 1 × 1 and 3 × 3. Dense block one is repeated six times, dense block two twelve times, dense block three twenty-four times, and dense block four sixteen times. After every dense block (except the last) is a transition layer consisting of a 1 × 1 convolution and a 2 × 2 average-pooling layer with stride 2. After the last dense block of the model is a global average-pooling layer. Regularization is done using a dropout of 0.5. After the fully connected layer, the softmax function converts the output into a probability distribution.
The model was executed on Google Colab with Python version 3.7. It also required TensorFlow version 2.0 and Keras version 2.3.
The Diabetic Retinopathy Dataset provided by APTOS has been used as the source of color fundus images; it is an extensive collection of retinal images taken with fundus photography under a wide range of lighting and imaging constraints. The dataset contains 3,662 training images and 1,928 testing images captured under a variety of imaging conditions, of which only the training set has been used in the proposed model. Each image has been assessed on a scale of 0 to 4 by a clinician for the severity of diabetic retinopathy. The data found related to this topic is noisy and requires multiple pre-processing steps to get all images into a usable format for training a model. The training dataset is the APTOS 2019 dataset [
The test dataset’s results are assessed using four metrics: accuracy, precision, recall, and F1-score, all of which may be evaluated using the confusion matrix. TP, TN, FP, and FN denote true positives, true negatives, false positives, and false negatives, respectively. Binary cross-entropy has been used, a particular type of cross-entropy with a target of 0 or 1. The calculation is done with the cross-entropy formula, where the target is converted to a one-hot vector such as [0,1] or [1,0] and compared with the respective predictions.
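For reference, the four metrics and the cross-entropy described above can be computed directly from the confusion-matrix counts; the function names and example values below are illustrative, not taken from the paper:

```python
import math

def classification_metrics(tp, tn, fp, fn):
    # Standard definitions derived from the confusion matrix.
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

def cross_entropy(target_onehot, predicted):
    # Cross-entropy between a one-hot target (e.g. [0, 1]) and the
    # model's predicted probabilities; eps guards against log(0).
    eps = 1e-12
    return -sum(t * math.log(p + eps) for t, p in zip(target_onehot, predicted))
```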
This paper proposes three Dense CNN models to classify DR into one of five Diabetic Retinopathy classes according to the severity of the disease: No DR, Mild DR, Moderate DR, Severe DR, and Proliferative DR. The images are trained on DenseNet-based sequential models with a learning rate of 0.00005. It was also observed that significant changes in validation accuracy occurred only after epoch 30 in all models; thus, epoch count and accuracy are not linearly related, although a larger number of epochs generally yields better performance. Model 3 achieved the highest accuracy overall, with a maximum training accuracy of 93.51%. It is followed by Model 2, which reached 89.19% training accuracy, and Model 1, which achieved the lowest of the three, i.e., 83.9% training accuracy. The rate of accuracy is related to the proposed system and the parameters set for each model. Model 3 reduces the problem of overfitting to a great extent [
The graph in
The graph in
A comparison between the training accuracy and validation accuracy is shown in
In this section, a comparison between the models is made based on five parameters. All three models have been trained on the same 3,662 images and for 50 epochs for uniformity. As a result, model 3 reduces overfitting greatly compared to model 1 and model 2.
| | Epochs | Loss | Accuracy | Precision | Recall | F1-score |
| --- | --- | --- | --- | --- | --- | --- |
| Model 1 | 50 | 0.0166 | 0.8390 | 0.9954 | 0.9945 | 0.9949 |
| Model 2 | 50 | 0.0101 | 0.8919 | 0.9962 | 0.9967 | 0.9964 |
| Model 3 | 50 | – | 0.9351 | 0.98 | 0.98 | 0.98 |
A well-shuffled set of 550 images from the Kaggle Diabetic Retinopathy dataset (2015) was pre-processed with the methods mentioned above and used for testing the best-suited model, model 3. The predicted values were compared with the actual dataset values. Moreover, a class-wise classification report has been generated, which depicts the class-wise performance of model 3 on the testing dataset. Model 3 performs best on label 0-No DR images, with an accuracy of 96%, precision of 0.97, recall of 0.85, and F1-score of 0.91, and performs worst on label 3-Severe DR, with an accuracy of 87%, precision of 0.70, recall of 0.68, and F1-score of 0.69.
| | Accuracy | Precision | Recall | F1-score |
| --- | --- | --- | --- | --- |
| No DR | 96% | 0.97 | 0.85 | 0.91 |
| Mild DR | 90% | 0.68 | 0.80 | 0.74 |
| Moderate DR | 89% | 0.73 | 0.72 | 0.72 |
| Severe DR | 87% | 0.70 | 0.68 | 0.69 |
| Proliferative DR | 94% | 0.83 | 0.86 | 0.84 |
Hence, the average per-class accuracy, which measures the per-class effectiveness of the classifier, is 91.2%. The macro-average precision, which measures the average per-class agreement of the true class labels with those of the classifier, the macro-average recall, which measures the average per-class effectiveness of the classifier at identifying class labels, and the macro-average F1-score, the average of the per-class F1-scores, are 0.782, 0.782, and 0.780, respectively.
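These macro-averages follow directly from the per-class values in the classification report above, each being the unweighted mean across the five classes:

```python
# Per-class values from the classification report, in the order
# No DR, Mild DR, Moderate DR, Severe DR, Proliferative DR.
accuracy  = [0.96, 0.90, 0.89, 0.87, 0.94]
precision = [0.97, 0.68, 0.73, 0.70, 0.83]
recall    = [0.85, 0.80, 0.72, 0.68, 0.86]
f1_score  = [0.91, 0.74, 0.72, 0.69, 0.84]

def macro(values):
    # Macro-averaging: unweighted mean over the classes.
    return sum(values) / len(values)

print(round(macro(accuracy), 3))   # 0.912 -> 91.2% average per-class accuracy
print(round(macro(precision), 3))  # 0.782
print(round(macro(recall), 3))     # 0.782
print(round(macro(f1_score), 3))   # 0.78
```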
According to research done before the proposed work, it is hard for models to achieve decent accuracy in detecting Mild Diabetic Retinopathy images, whereas in this case the model was able to do so. In addition, the advantage of combining resizing and augmentation into a single operation, as done here, is that the image does not have to be interpolated multiple times, which typically degrades image quality.
| | Accuracy | Precision | Recall | F1-score |
| --- | --- | --- | --- | --- |
| M.T. Esfahani et al. (2018) [ | 85% | – | 86% | – |
| S. Dutta et al. (2018) [ | For BNN it was observed 42% | – | – | – |
| X. Wang et al. (2018) [ | AlexNet gave 37.43%, | – | – | – |
| B. Harangi et al. (2019) [ | 90.07% | – | – | – |
| P. Vora et al. (2020) [ | 76% | – | – | – |
Furthermore, DenseNet is much more efficient in terms of parameters and computation than ResNet and VGGNet at comparable depth: feature reuse is facilitated, the vanishing-gradient problem is alleviated, and the number of parameters is significantly reduced. Future work can include identifying and proposing machine learning/deep learning model architectures that reduce the remaining overfitting, giving even better results on the test images. It will also include working with feature extraction, such as vessel segmentation, microaneurysm detection, and detection of hard and soft exudates, which will break the problem of diabetic retinopathy down by its symptoms in a very detailed manner. This will also help improve the performance of the model with better time complexity.
Diabetic retinopathy is a critical public health issue affecting quality of life. Patients receive specific treatment from a doctor to protect their vision and to prevent the spread and progression of the disease. In the last few years, eye problems due to diabetes have been witnessed worldwide. Existing DR diagnosis relies on manual fundus image analysis, which requires an experienced and skilled clinician to identify, detect, and analyze the presence and importance of minor features; it is also tedious and challenging. Here, three dense CNN models were proposed to detect DR according to disease severity and classify images into different classes. The proposed models contained various layers and parameters. Model 3 came up with the best accuracy: the highest training accuracy achieved was 93.51%, while the other two models achieved maximum training accuracies of around 84% and 89%, respectively. The advantage of combining resizing and augmentation into a single operation, as done here, is that the image does not have to be interpolated multiple times, which typically degrades image quality. The detection of mild and moderate DR is usually tricky but is handled decently by the proposed model. The datasets available for the proliferative phase are relatively small and pose a significant challenge; therefore, training and classification are subject to the limited circumstances of the dataset. The proposed model also operated within a practical processing-time limit.
While there is a need to improve the current accuracy, the presented results indicate a significant advance in diabetic retinopathy detection using computer vision, enabling efficient software with lower resource usage and portable, economical hardware. Thus, diabetic retinopathy can be detected without consulting an ophthalmologist in remote or medically underserved locations, which could significantly help relieve diabetes-based vision deterioration in the future.
The authors would like to thank the SRM Institute of Science and Technology, Department of CSE, for providing an excellent atmosphere for research on this topic.