Pneumonia is an acute lung infection that has caused many fatalities globally. Radiologists often rely on chest X-rays to identify pneumonia, as they are currently the most effective imaging method for this purpose. Computer-aided diagnosis of pneumonia using deep learning techniques is widely used owing to its effectiveness. In the proposed method, the Synthetic Minority Oversampling Technique (SMOTE) is used to correct the class imbalance in the X-ray dataset. To compensate for the paucity of accessible data, transfer learning with pre-trained networks is used, and an ensemble Convolutional Neural Network (CNN) model is developed. The ensemble model covers all possible combinations of the MobileNetV2, Visual Geometry Group (VGG16), and DenseNet169 models. MobileNetV2 and DenseNet169 performed best as single classifiers, each with an accuracy of about 94%, while the ensemble model (MobileNetV2+DenseNet169) achieved an accuracy of 96.9%. Using the synchronous data-parallel model in Distributed TensorFlow, the training process was accelerated and reached a performance of 98.6%, outperforming other conventional approaches.
Pneumonia is an acute pulmonary infection, often caused by a virus, that affects the lungs, producing inflammation and pleural effusion in which fluid fills the lungs and makes breathing difficult. The majority of pneumonia cases occur in impoverished and developing nations, where medical resources are scarce and overcrowding, pollution, and unsanitary environmental conditions prevail. Pneumonia is the leading cause of mortality attributed to respiratory illnesses, so early diagnosis and care can significantly help prevent the disease from turning fatal. The diagnosis of lung disorders typically involves radiological evaluation of the lungs using imaging techniques such as Computed Tomography (CT), X-ray, and Magnetic Resonance Imaging (MRI). Medical professionals examine X-rays [
In the medical sector, machine learning methods have been reported for pneumonia diagnosis, cancer diagnosis, and real-time healthcare monitoring [
The final result of the ensemble method is an aggregate of the individual single-model outputs. The ensemble model also reduces prediction variance and generalization error, which considerably improves computational learning and allows a minimal quantity of training examples to be used. Typically, training these models on a standard processor takes weeks or months. Even though training neural networks has become hundreds of times faster thanks to contemporary Graphics Processing Units (GPUs) and customized accelerators, training time still limits both the accuracy of these methods’ predictions and their applicability. Many significant application areas can benefit from techniques that accelerate neural network training. Faster training can significantly increase model quality by allowing practitioners to train on more data and by reducing experimental iteration time, enabling researchers to test new ideas and configurations more quickly. Accelerated training also keeps neural networks effective as models and datasets are updated. Data parallelism is a simple and sound method for accelerating neural network training: training instances are distributed among several processors to compute gradient updates, which are then aggregated. In this work, a data-parallel model accelerates the training process, and an extensive experimental study is conducted on the X-ray images. Pneumonia affects many people, especially children, and is most common in developing and impoverished nations, where risk factors such as overcrowding, unsanitary living conditions, and poverty compound the lack of adequate medical services. Most past studies focused on developing a separate network for detecting infected patients, and the combination of ensemble methods with parallelism has not been explored.
Parallel and distributed implementations of medical models are the need of the hour due to the enormous size of medical data. Medical data grows daily, and these applications are inherently parallel since the data is stored locally at hospitals and cannot be shared due to security and ethical issues. The model therefore needs to be parallelized so that data is processed locally and only the results are shared; this, in turn, makes the model faster than centralized processing, which requires data sharing and is subject to node failure. The memory footprint is almost the same when the data is processed in parallel among the nodes rather than sequentially: replicating the model on each node is the only additional memory requirement, and it helps achieve faster and more accurate results. The computational complexity of the data-parallel model is lower than that of the sequential model. Healthcare applications are now even being developed using the 6G framework and the Internet of Things [
The significant contributions in this work are listed below:
- Three fine-tuned transfer learning models implemented for pneumonia prediction
- An ensemble model built from the three transfer learning models
- Regularization and augmentation techniques, such as SMOTE, to reduce overfitting and remove the minority-class imbalance in the pneumonia dataset; managing anomalies and class imbalance reduces learning bias
- A data-parallel model to reduce the training time and accelerate the training process
Detecting pneumonia using X-ray has been an unresolved issue for a long time, with a lack of publicly available data being the most significant obstacle. Chandra et al. [
Algorithms | Details | Reference |
---|---|---|
Inception ResNet-V2 | Transfer learning, Low performance | [ |
ResNet-50 | Transfer learning, Low performance | [ |
ResNet-152+Generative Adversarial Network (GAN) | Transfer learning, data augmentation, low accuracy | [
CovFrameNet | Region CNN | [ |
ImageNet and SqueezeNet | Transfer learning, improved BoxNet | [
CNN model | Automated feature learning, High computational cost | [ |
CNN Model | High computational cost, Seven-layer custom built CNN | [ |
Mask-RCNN | Segmentation using RCNN, Ensemble model | [ |
Inception-ResNetv2, Xception Net, DenseNet-169 | Ensemble model | [ |
Custom-built VGG16 | Accuracy of around 94% | [
DenseNet-121 | Using X-rays to predict cancer in the lungs with an accuracy of around 80% | [
Xception and VGG16 | Xception gave an accuracy of around 82% and VGG16 around 87% | [
VGG19, MobileNet, Inception and XceptionNet | Transfer learning, small dataset of 1,427 X-rays, accuracy of around 96% | [
VGG-16, ResNet-50, Xception, MobileNet, Inception and SqueezeNet | Transfer learning, ensemble model for pediatric pneumonia with AUC 95.21 | [
InceptionV3, DenseNet121, and VGG19 | Transfer learning with fuzzy logic | [ |
DenseNet169, MobileNetV2, Vision Transformer | Fine-tuned transfer learning model with a transformer with 93.9% accuracy | [ |
The proposed model has four main modules: Data Preprocessing and Augmentation, Transfer Learning Models, Ensemble Model, and Data-Parallel Model, as shown in
The dataset is organized into three splits (training, testing, and validation). There are 5,863 X-ray images divided into two classes (normal and pneumonia patients). The chest X-ray images were drawn from retrospective cohorts of pediatric patients aged one to five at the Guangzhou Women and Children’s Medical Center, Guangzhou.
Privacy regulations, the high cost of acquiring annotations, and other issues limit the growth of medical imaging datasets. After dataset preprocessing and partitioning, data augmentation is used during training to supplement data in data-limited circumstances and minimize overfitting. Our approach used geometric transformations such as rescaling, rotation, shifting, shearing, zooming, and flipping. The dataset, however, featured an uneven distribution of positive and negative observations, limited data overall, and significantly fewer normal images than pneumonia images, which can result in poor post-training verification and generalization.
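As a sketch of the geometric augmentations listed above, the Keras `ImageDataGenerator` below applies rescaling, rotation, shifting, shearing, zooming, and flipping; the specific parameter values and the directory path are illustrative assumptions, not the paper's exact settings.

```python
# A sketch of the geometric augmentations; parameter values are illustrative.
from tensorflow.keras.preprocessing.image import ImageDataGenerator

train_datagen = ImageDataGenerator(
    rescale=1.0 / 255,       # rescaling pixel intensities to [0, 1]
    rotation_range=15,       # rotation (degrees)
    width_shift_range=0.1,   # horizontal shifting
    height_shift_range=0.1,  # vertical shifting
    shear_range=0.1,         # shearing
    zoom_range=0.1,          # zooming
    horizontal_flip=True,    # flipping
)

# Stream augmented batches from the (hypothetical) training directory.
train_gen = train_datagen.flow_from_directory(
    "chest_xray/train",
    target_size=(224, 224),
    batch_size=32,
    class_mode="sparse",
)
```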
The prevalence of class imbalance issues in disease diagnosis is high. A conventional classifier may favor the majority class and disregard the significance of the minority class. This issue affects most supervised classification techniques, requiring researchers to exert considerable effort to address it. Handling outliers is likewise a crucial aspect of deep learning; the issue arises when sample data seldom adhere to a distinct pattern. Techniques for managing outliers and unbalanced data have been proposed and may be categorized into two major groups: algorithm-level and data-level methods. The former adapts a learning algorithm toward the data and is considered to incur a high computational expense. The latter is classifier-independent and straightforward to implement since it relies on data pre-treatment approaches [
Synthetic Minority Oversampling Technique [
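A minimal sketch of applying SMOTE to the training set is shown below, assuming the images have already been loaded into NumPy arrays; because the reference `imblearn` implementation of SMOTE interpolates flat feature vectors, each image is flattened before resampling and reshaped afterwards.

```python
# A minimal sketch of SMOTE-based balancing of the training images.
import numpy as np
from imblearn.over_sampling import SMOTE

def balance_with_smote(x_train, y_train):
    n, h, w, c = x_train.shape
    x_flat = x_train.reshape(n, h * w * c)      # one row per image
    x_res, y_res = SMOTE(random_state=42).fit_resample(x_flat, y_train)
    return x_res.reshape(-1, h, w, c), y_res    # restore the image shape
```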
Deep learning models are widely utilized for pneumonia diagnosis. However, owing to privacy regulations, the high cost of collecting annotations, and other factors, only a limited quantity of data is currently accessible in diagnostic imaging, even though deep learning models have shown tremendous performance in medical imaging. In light of the scarcity of medical datasets, transfer learning is used. Transfer learning is a deep learning approach in which a model previously trained on ImageNet is reused and transferred to a new task. VGG16, MobileNetV2, and DenseNet169 are the transfer learning models used in this research. The models were selected after analyzing the problem and dataset, because the network structure significantly impacts the model’s performance [
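The sketch below shows one plausible way to set up the three transfer-learning models with Keras: ImageNet weights are reused, the convolutional base is frozen, and only a small classification head is trained. The head layout and the helper name are assumptions; the optimizer and learning rate follow the parameter table reported later in the paper.

```python
# A sketch of the transfer-learning setup: frozen ImageNet base + small head.
import tensorflow as tf

def build_transfer_model(base_cls):
    base = base_cls(weights="imagenet", include_top=False,
                    input_shape=(224, 224, 3))
    base.trainable = False  # keep the pre-trained features fixed
    model = tf.keras.Sequential([
        base,
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dense(2, activation="softmax"),  # normal vs. pneumonia
    ])
    model.compile(optimizer=tf.keras.optimizers.Adam(1e-4),
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

vgg16     = build_transfer_model(tf.keras.applications.VGG16)
mobilenet = build_transfer_model(tf.keras.applications.MobileNetV2)
densenet  = build_transfer_model(tf.keras.applications.DenseNet169)
```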
With its stacks of 3 × 3 convolutional kernels and 2 × 2 pooling layers, the VGGNet design can be considered an extended AlexNet. Using smaller convolution kernels allows the network to be made deeper, improving the depth of the extracted features. VGGNet-16 and VGGNet-19 are currently the two most popular VGGNet versions. The layers of the VGG model are given in
VGG16 parameters | | MobileNetV2 parameters | | DenseNet169 parameters | |
---|---|---|---|---|---|
Type | Filter | Type | Filter | Type | Filter |
Convolution layer | 64 × 3 × 3 | Convolution layer | 3 × 3 × 3 × 32 | Convolution layer | 7 × 7 |
Convolution layer | 64 × 3 × 3 | Depthwise convolution | 3 × 3 × 32 dw | Pooling layer | Max pooling 3 × 3 |
MaxPooling 2D | 64 × 3 × 3 | Convolution layer | 1 × 1 × 32 × 64 | Dense block | 1 × 1, 3 × 3 |
Convolution layer | 128 × 3 × 3 | Depthwise convolution | 3 × 3 × 64 dw | Transition layer | 1 × 1, average pooling 2 × 2 |
Convolution layer | 128 × 3 × 3 | Convolution layer | 1 × 1 × 128 × 64 | Dense block | 1 × 1, 3 × 3 |
MaxPooling 2D | 128 × 3 × 3 | Depthwise convolution | 3 × 3 × 128 dw | Transition layer | 1 × 1, average pooling 2 × 2 |
Convolution layer | 256 × 3 × 3 | Convolution layer | 1 × 1 × 128 × 128 | Dense block | 1 × 1, 3 × 3 |
Convolution layer | 256 × 3 × 3 | Depthwise convolution | 3 × 3 × 128 dw | Transition layer | 1 × 1, average pooling 2 × 2 |
MaxPooling 2D | 256 × 3 × 3 | Convolution layer | 1 × 1 × 128 × 256 | Dense block | 1 × 1, 3 × 3 |
Convolution layer | 512 × 3 × 3 | Depthwise convolution | 3 × 3 × 256 dw | Classification layer | Average pooling 7 × 7, softmax |
Convolution layer | 512 × 3 × 3 | Convolution layer | 1 × 1 × 256 × 256 | - | - |
Convolution layer | 512 × 3 × 3 | Depthwise convolution | 3 × 3 × 256 dw | - | - |
MaxPooling 2D | 512 × 3 × 3 | Convolution layer | 1 × 1 × 256 × 512 | - | - |
Convolution layer | 512 × 3 × 3 | 5 × Depthwise convolution | 3 × 3 × 512 dw | - | - |
Convolution layer | 512 × 3 × 3 | 5 × Convolution layer | 1 × 1 × 512 × 512 | - | - |
Convolution layer | 512 × 3 × 3 | Depthwise convolution | 3 × 3 × 512 dw | - | - |
MaxPooling 2D | 512 × 3 × 3 | Convolution layer | 1 × 1 × 512 × 1024 | - | - |
Fully connected (FC) | - | Depthwise convolution | 3 × 3 × 1024 dw | - | - |
FC layer | - | Convolution layer | 1 × 1 × 1024 × 1024 | - | - |
FC layer | - | Average pooling | Pool 7 × 7 | - | - |
Flatten, softmax | Classifier | Fully connected layer | 1024 × 1000 | - | - |
- | - | Softmax | Classifier | - | - |
MobileNetV2 is an upgraded form of MobileNetV1, a CNN with 54 layers and a 224 × 224-pixel input image size. Instead of performing a single full two-dimensional convolution over all input channels, it factorizes the operation into a depthwise convolution followed by a 1 × 1 pointwise convolution. Consequently, less memory and fewer parameters are necessary for training, resulting in a small and efficient model. The layers, along with the filter details of the MobileNet model, are given in
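As an illustration of this factorization, the sketch below builds a MobileNetV1-style depthwise-separable block in Keras (MobileNetV2 additionally wraps such blocks in inverted residuals); the batch-normalization and activation placement follows common practice and is an assumption.

```python
# A sketch of the depthwise-separable factorization used by MobileNet.
import tensorflow as tf

def depthwise_separable_block(x, pointwise_filters, stride=1):
    # Per-channel 3 x 3 depthwise convolution.
    x = tf.keras.layers.DepthwiseConv2D(3, strides=stride,
                                        padding="same", use_bias=False)(x)
    x = tf.keras.layers.BatchNormalization()(x)
    x = tf.keras.layers.ReLU()(x)
    # 1 x 1 pointwise convolution mixes information across channels.
    x = tf.keras.layers.Conv2D(pointwise_filters, 1,
                               padding="same", use_bias=False)(x)
    x = tf.keras.layers.BatchNormalization()(x)
    return tf.keras.layers.ReLU()(x)
```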
Huang et al. [
In data parallelism, the neural network model is replicated and trained on several High-Performance Computing (HPC) devices. The training data are distributed among the devices, which run synchronously or asynchronously. All-reduce is a method that aggregates the target arrays across all machines into a single tensor and returns that tensor to every device. Forward and backward propagation are the two critical processes in the stochastic gradient learning of CNNs. After the forward pass computes the outputs for a batch of data, an error with respect to the desired results is calculated. In the backward phase, this error or loss is differentiated with respect to each parameter in the CNN, and the weights of the network are then updated using the obtained gradients. These procedures are repeated iteratively until convergence, that is, until a local minimum of the error function is reached. There are two modes of data parallelism. In the first, the model is replicated on each worker node, and each replica processes a distinct data batch. In the second, parameter server nodes store and update the model parameters: a worker takes the model’s parameters, runs the model on a batch of data, and transmits the gradients to the Parameter Server (PS), where the model is updated. Multiple model-update policies can be selected at the PS to carry out the training. In the synchronous case, the Parameter Server waits until all worker nodes have computed the gradients for their respective data shards, applies the gradients to the current weights, and then sends the updated model back to all worker nodes. Because updates are not made until all worker nodes have completed their calculations, this approach is only as fast as the slowest node and may suffer from variable connection speeds whenever other users share the cluster. However, since more precise gradient estimates are produced, convergence occurs more quickly.
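A minimal sketch of synchronous data parallelism with TensorFlow's `tf.distribute.MirroredStrategy` is shown below, reusing the hypothetical `build_transfer_model` helper and `train_gen` generator from the earlier sketches; variables created inside the strategy scope are mirrored on every device, and gradients are all-reduced at each step.

```python
# A sketch of synchronous data parallelism: the model is replicated on every
# visible GPU, each replica processes a slice of the batch, and gradients
# are all-reduced before each update.
import tensorflow as tf

strategy = tf.distribute.MirroredStrategy()
print("Replicas in sync:", strategy.num_replicas_in_sync)

with strategy.scope():
    # Variables created here are mirrored across all devices.
    model = build_transfer_model(tf.keras.applications.DenseNet169)

# Keras handles batch splitting and gradient aggregation transparently.
model.fit(train_gen, epochs=20, steps_per_epoch=163)
```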
After preprocessing, data segmentation, and data augmentation, the enlarged training set is fed to the proposed models to extract suitable and relevant features. The features acquired from each base model are merged to build the final fully connected layer, which is then utilized to categorize each image into its corresponding class. Each model within the ensemble is trained independently on the same task. The ensemble model’s ultimate output is the mean, or fusion, of the separate model outputs (a sketch of this fusion follows the parameter table below). Ensemble models reduce prediction variance and generalization error, considerably improve computational training, and may be implemented with a small amount of training data. Guided by the literature, this work developed three well-known CNN classification methods for pneumonia in pulmonary images together with the suggested ensemble technique. Early stopping is used to prevent overfitting. The model is built in Python using TensorFlow. Even though TensorFlow already supports distributed training, which addresses the problem of long training periods, there is still room for improvement. The proposed model parameters are shown in
Parameter | Value |
---|---|
Input data size | 224 × 224 |
Model batch size | 32 |
Model learning rate | 0.0001 |
Epochs | 20 |
Optimizer | Adam (β1 = 0.9, β2 = 0.999) |
Training dataset | 5,216 images |
Testing dataset | 624 images |
No. of classes | 2 |
Steps per epoch | Training sample size/batch size = 163 |
VGG16: total parameters | 14,882,242 |
VGG16: trainable parameters | 166,018 |
VGG16: non-trainable parameters | 14,716,224 |
MobileNet: total parameters | 2,625,218 |
MobileNet: trainable parameters | 364,162 |
MobileNet: non-trainable parameters | 2,261,056 |
DenseNet: total parameters | 13,109,954 |
DenseNet: trainable parameters | 463,234 |
DenseNet: non-trainable parameters | 12,646,720 |
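One plausible realization of the output fusion described above is sketched below: the independently trained base models share a single input, and their softmax outputs are averaged. The function and variable names continue the earlier sketches and are assumptions.

```python
# A sketch of output fusion for the ensemble: base models share one input,
# and their predicted class probabilities are averaged.
import tensorflow as tf

def build_ensemble(models):
    inp = tf.keras.Input(shape=(224, 224, 3))
    outputs = [m(inp) for m in models]          # per-model class probabilities
    fused = tf.keras.layers.Average()(outputs)  # mean of the model outputs
    return tf.keras.Model(inputs=inp, outputs=fused)

# e.g., the best-performing pair reported in this work:
ensemble = build_ensemble([mobilenet, densenet])
```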
The dataset contains data and associated labels, from which the model accuracy and other evaluation parameters are calculated. Based on the label associated with each example, the output can be grouped into four classes: True Positives (correctly predicted positive labels), True Negatives (correctly predicted negative labels), False Positives (negative labels predicted as positive), and False Negatives (positive labels predicted as negative). These four quantities provide the basis for most classification evaluation measures. Classification techniques are evaluated using the accuracy statistic, determined by dividing the number of correct predictions for both positive and negative labels by the total number of predictions. Precision is determined by dividing the number of true positives by the sum of true and false positives. When the impact of false negatives is considerable, it is reasonable to apply the recall statistic; recall is determined by dividing the number of true positives by the sum of true positives and false negatives. The training time and speedup are also measured for model analysis.
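The sketch below computes these measures with scikit-learn from true and predicted labels; it is a generic illustration rather than the paper's evaluation script, and it also includes Cohen's Kappa, discussed next.

```python
# A generic sketch of the evaluation measures described above.
from sklearn.metrics import (accuracy_score, cohen_kappa_score, f1_score,
                             precision_score, recall_score)

def evaluate(y_true, y_pred):
    return {
        "accuracy":  accuracy_score(y_true, y_pred),    # (TP + TN) / all
        "precision": precision_score(y_true, y_pred),   # TP / (TP + FP)
        "recall":    recall_score(y_true, y_pred),      # TP / (TP + FN)
        "f1":        f1_score(y_true, y_pred),          # harmonic mean of P, R
        "kappa":     cohen_kappa_score(y_true, y_pred), # chance-corrected
    }
```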
Cohen’s Kappa score is a compelling performance statistic for imbalanced datasets. Various ranges of the Kappa score are used to examine the consistency of the acquired findings. A score lower than zero indicates a lack of agreement. A score between 0.01 and 0.20 indicates only slight agreement; between 0.21 and 0.40, fair agreement; between 0.41 and 0.60, moderate agreement; and between 0.61 and 0.80, strong agreement. A Kappa score between 0.81 and 1.00 represents an almost perfect degree of agreement. Precision and accuracy have also been included as measures for the model’s evaluation. The dataset used for training has 5,863 images. Because most examples correspond to negative samples, the models incorrectly predicted most cases as negative, resulting in much worse accuracy, recall, and F1 score values for positive-class prediction. The model performance before SMOTE is given in
CNN model | Precision | Recall | F1 score | Accuracy |
---|---|---|---|---|
VGG16 | 0.85 | 0.88 | 0.86 | 0.85 |
Mobile Net | 0.90 | 0.89 | 0.89 | 0.90 |
DenseNet | 0.91 | 0.92 | 0.90 | 0.92 |
The performance measures of the three base transfer learning models (VGG16, MobileNetV2, and DenseNet169) after SMOTE are given in
CNN model | Precision | Recall | F1 score | Kappa score | Accuracy |
---|---|---|---|---|---|
VGG16 | 0.89 | 0.95 | 0.914 | 0.78 | 0.89 |
Mobile Net | 0.92 | 0.92 | 0.92 | 0.82 | 0.9371 |
DenseNet | 0.92 | 0.94 | 0.929 | 0.83 | 0.9452 |
In
In
The ensemble model’s output is preferable to that of the base learning models because it incorporates the discriminatory features of all of its component models. A practical approach for classifier merging is weighted-average ensembling; a weighted-fusion sketch follows the table below. In this section, the ensemble learning models are studied to obtain improved performance, and the accuracy is measured. In
Model | Precision | Recall | F1 score | Accuracy |
---|---|---|---|---|
VGG16+Mobile Net | 0.92 | 0.91 | 0.915 | 0.901 |
Mobile Net+Dense Net | 0.92 | 0.95 | 0.935 | 0.969 |
Dense Net+VGG16 | 0.90 | 0.94 | 0.92 | 0.956 |
VGG16+Mobile Net+Dense Net | 0.91 | 0.93 | 0.92 | 0.944 |
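A minimal sketch of the weighted-average fusion mentioned above is given below: each model's predicted class probabilities are scaled by a per-model weight before averaging. The weights shown are illustrative assumptions, not the tuned values from this work.

```python
# A sketch of weighted-average fusion over predicted class probabilities.
import numpy as np

def weighted_ensemble(prob_list, weights):
    w = np.asarray(weights, dtype=float)
    w /= w.sum()                              # normalize the weights
    stacked = np.stack(prob_list)             # (n_models, n_samples, n_classes)
    fused = np.tensordot(w, stacked, axes=1)  # weighted mean per sample
    return fused.argmax(axis=1)               # final class labels

# e.g., favoring the two stronger base models (illustrative weights):
# labels = weighted_ensemble([p_vgg, p_mobile, p_dense], [0.2, 0.4, 0.4])
```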
There are two types of bandwidth in a data pipeline: data-loading bandwidth and model-training bandwidth. The achievable model-training bandwidth is also constrained by the limited on-device memory of GPUs and other accelerators. From the data-flow viewpoint, a larger input data size alone does not explain the long single-node training time; from a system standpoint, the problem is the mismatch between data-loading and model-training bandwidth. This bandwidth mismatch is the fundamental cause of the lengthy single-node model training phase. By using data parallelism, the model-training bandwidth can be boosted in proportion to the number of processors used in the same training run. The main steps of the data-parallel model are shown in
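On the data-loading side, a common way to narrow this mismatch is to overlap loading with training; the `tf.data` sketch below decodes images in parallel and prefetches batches so the accelerator is not starved. The file pattern, decode logic, and label rule are assumptions about the dataset layout.

```python
# A sketch of overlapping data-load and training bandwidth with tf.data.
import tensorflow as tf

def decode(path):
    img = tf.io.decode_jpeg(tf.io.read_file(path), channels=3)
    img = tf.image.resize(img, (224, 224)) / 255.0
    # Hypothetical rule: the directory name encodes the class label.
    label = tf.cast(tf.strings.regex_full_match(path, ".*PNEUMONIA.*"), tf.int32)
    return img, label

ds = (tf.data.Dataset.list_files("chest_xray/train/*/*.jpeg")
      .map(decode, num_parallel_calls=tf.data.AUTOTUNE)  # parallel decoding
      .batch(32)                                         # global batch size
      .prefetch(tf.data.AUTOTUNE))                       # overlap load/train
```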
The training pipeline in a data-parallel model mainly consists of six steps, listed below; a sketch of the corresponding distributed training step follows the list:
1. Data collection, image preprocessing, and augmentation
2. Data partitioning based on the number of available devices/hardware accelerators, without bias
3. Loading the data onto the accelerators
4. Building identical models on the accelerators and training them
5. Model synchronization after the gradient calculation among all nodes/devices
6. Updating the model with the updated parameters on each node/device

Steps 4–6 are repeated until the end of an epoch.
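The following sketch makes steps 4–6 explicit as a custom distributed training step under `MirroredStrategy`; it reuses the hypothetical `build_transfer_model` helper and the `ds` dataset from the earlier sketches, and the global batch size of 32 matches the parameter table.

```python
# A sketch of steps 4-6: each replica computes gradients on its own data
# shard, and apply_gradients all-reduces them across replicas before update.
import tensorflow as tf

strategy = tf.distribute.MirroredStrategy()
with strategy.scope():
    model = build_transfer_model(tf.keras.applications.MobileNetV2)  # step 4
    optimizer = tf.keras.optimizers.Adam(1e-4)
    loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(
        reduction=tf.keras.losses.Reduction.NONE)

@tf.function
def train_step(images, labels):
    def step_fn(imgs, labs):
        with tf.GradientTape() as tape:
            probs = model(imgs, training=True)
            per_example = loss_fn(labs, probs)
            loss = tf.nn.compute_average_loss(per_example, global_batch_size=32)
        grads = tape.gradient(loss, model.trainable_variables)
        # Steps 5-6: gradients are all-reduced across replicas, then applied.
        optimizer.apply_gradients(zip(grads, model.trainable_variables))
        return loss
    per_replica = strategy.run(step_fn, args=(images, labels))
    return strategy.reduce(tf.distribute.ReduceOp.SUM, per_replica, axis=None)

dist_ds = strategy.experimental_distribute_dataset(ds)  # step 3
for epoch in range(20):                                 # repeat per epoch
    for images, labels in dist_ds:
        train_step(images, labels)
```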
Model | Training time with data parallel strategy | Training time w/o data parallel strategy |
---|---|---|
VGG16 | 2770.66 s | 3075.026 s |
Mobile Net V2 | 2593.55 s | 2740.39 s |
Dense Net169 | 2453.4 s | 2737.0556 s |
The proposed model is a parallel ensemble with accelerated training time and an improved performance of 98.6%. The model is compared with existing works, broadly classified as sequential ensemble techniques for pneumonia detection, parallel techniques for pneumonia detection, and parallel ensemble models. From
Model | Accuracy | Remarks | Reference |
---|---|---|---|
Pattern recognition using CUDA | NA | GPU environment; no ensemble used, but parallelism employed for pattern recognition; speedup of 12.75 | [
VGG16 based | 96.81% | No ensemble was used but parallelism employed | [ |
VGG19 based | 96.58% | No ensemble was used but parallelism employed | [ |
NasNet Mobile based | 83.37% | No ensemble was used but parallelism employed | [ |
ResNet152V2 based | 96.35% | No ensemble was used but parallelism employed | [ |
Inception, ResNetV2 | 94.87% | No ensemble was used but parallelism employed | [ |
MobileNetV2, DenseNet169, VGG16 | 98.6% | Ensemble model with data parallelism | Proposed model |
Model | Accuracy | Remarks | Reference |
---|---|---|---|
DenseNet169, MobileNetV2, Vision Transformer | 93.91% | Ensemble of transfer learning models; no parallelism, executed in a sequential manner | [
VGG, ResNet, Inception, SqueezeNet, MobileNet | 90.71% | Pediatric pneumonia dataset; ensemble model, no parallelism, executed in a sequential manner | [
GoogLeNet, ResNet, DenseNet | 98.81% | Ensemble model, no parallelism, executed in a sequential manner | [
VGG19 with different ML classifiers | 97.94% | Ensemble model with ML classifiers, no parallelism, executed in a sequential manner | [ |
MobileNetV2, DenseNet169, VGG16 | 98.6% | Ensemble model with data parallelism | Proposed model |
Model | Accuracy | Remarks | Reference |
---|---|---|---|
3D Inception | 98.21% | Soft-voting mechanism, ensemble classifier, GPU parallelism | [
AI systems | 92.49% | Smaller dataset, CT scan images used | [ |
ECOVNet | 96.59% | Efficient Net with ensemble classifier, parallel model | [ |
MobileNetV2, DenseNet169, VGG16 | 98.6% | Ensemble model with data parallelism | Proposed model |
Early detection of pneumonia can save more lives than later detection. Detecting pneumonia from X-rays is difficult; assistive tools and approaches may therefore aid in diagnosing this illness. Medical data analysis demands a high degree of precision, and as a result, a great deal of research is being conducted to create novel diagnostic procedures. The proposed technique is based on a parallel ensemble model with a performance of 98.6%. The results demonstrate the efficacy of features derived from VGG16, MobileNetV2, and DenseNet169 in conjunction with a data-parallel model for successfully diagnosing pneumonia on a balanced, augmented dataset. Even though the strategy yielded promising outcomes, it must be validated on more datasets to ensure its robustness. After demonstrating its efficiency on larger real-world datasets, the approach may be used as a medical aid. The proposed model ensembles three transfer learning algorithms, which is time-consuming, but the distributed parallel framework effectively offsets this cost. The main limitation of the proposed model is its slightly higher memory requirement, since the model is replicated on each parallel node. High-performance computing systems make the model faster but are costly; even so, the GPU system’s performance outweighs its cost. In the future, the model will be scaled to HPC systems in the cloud, and a general framework will be created for deployment in real-time setups.
The authors received no specific funding for this study.
The authors declare that they have no conflicts of interest to report regarding the present study.