With the increasing demand for doctors specializing in chest-related diseases, there is a 15% performance gap every five years. If this gap is not filled with effective automation of chest disease detection, the healthcare industry may face unfavorable consequences. Only a few studies have targeted X-ray images of cardiothoracic diseases, and most of them addressed a single disease, which is inadequate. Although some related studies have provided an identification framework for all classes, the results are not encouraging due to a lack of data and class imbalance. This research contributes Generative Adversarial Network (GAN)-based synthetic data and four deep learning models with comparable results: a ResNet-152 model with image augmentation (67% accuracy), a ResNet-152 model without image augmentation (62%), transfer learning with Inception-V3 (68%), and a ResNet-152 model with image augmentation targeting only six classes (83%).
Cardiothoracic diseases are serious health problems that may lead to disorders affecting the organs and tissues [
A neural network is a mathematical model of neurons, also defined as a network capable of approximating arbitrary functions mathematically based on the universal approximation theorem [
In this research, we adopted the Convolutional Neural Network (CNN), a class of deep neural networks, and propose a generative adversarial network (GAN)-based model to generate synthetic data for training, as the amount of available data is limited. We also use pre-trained models, i.e., models that were trained on a large benchmark dataset to solve a problem similar to the one we want to solve. For example, the ResNet-152 model we used was initially trained on the ImageNet dataset.
Other research in the field of cardiothoracic disease includes: Ganesan et al. [
References | Method | Findings | Gaps identified |
---|---|---|---|
[ | Used a Keras framework to classify and predict lung diseases in CXRs by training 40,000 CXRs obtained from the NIH dataset using a depthwise separable convolution. | The proposed model has a training accuracy of 86.14% and a validation accuracy of 85.62%. | The accuracy of the model improves as the number of training epochs is increased. |
[ | Used pixel-wise annotated DRR data to learn an unsupervised multi-organ segmentation model on X-ray images. A deep image-to-image network for multi-organ segmentation was trained on the labeled DRR data. | The proposed framework takes synthetic labeled DRR images as input and can produce meaningful segmentation results on real X-ray images without any ground-truth annotations. | Nodule annotations directly on 2D X-ray images are challenging and time-consuming due to the projective nature of X-ray imaging. |
[ | A binary classifier for the detection of pneumonia from frontal-view chest X-ray images. | The proposed CheXNet model achieves an F1 score of 0.435 (95% CI 0.387, 0.481). | Comparing the CheXNet model with the radiologists' diagnoses gives an F1 difference of 0.051 (95% CI 0.005, 0.084); since this interval does not contain 0, CheXNet scores significantly higher. |
[ | A DCGAN-tailored model designed for training with X-ray images, where a generator is trained to generate artificial CXRs. | The proposed model has an approximate accuracy of 70.87% for DS1, 58.90% for DS2, and 92.10% for DS3. | The model obtained its best accuracy when trained on a dataset augmented with DCGAN-synthesized CXRs to balance the imbalanced real dataset (DS3). |
[ | Models were trained to take a single-view chest radiograph as input and output the probability of each of 14 observations. Several models were trained to find the one with the best accuracy. | DenseNet121 produced the best accuracy and was used for the research. | The models are limited to the CheXpert database and liable to over-fitting. |
[ | Five benchmarked classifiers, namely Multilayer Perceptron (MLP), Random Forest, Sequential Minimal Optimization (SMO), Classification via Regression, and Logistic Regression, are used for pneumonia detection. | Of all the classifiers, logistic regression has the highest accuracy at 95.631%. | The model is limited to analysis on automatically registered (non-rigid deformable) lung regions and feature extraction confined to segmented ROIs. |
Previous works used state-of-the-art techniques and achieved significant results on one or two cardiothoracic diseases, but such limited coverage can lead to misclassification. In our work, we adopted GANs to synthesize chest radiographs (CXRs) to augment the training set over multiple cardiothoracic diseases and efficiently diagnose chest diseases across different classes, as shown in
The rest of the manuscript is organized as follows: Section 2 reports the dataset and its analytics. Section 3 elaborates on the proposed solution's overall methodology and presents all deep learning models with different hyperparameters and a comparison of the results. Section 4 discusses the outcomes in the results and analysis section. Finally, the conclusion is drawn in Section 5, with some discussion of limitations and future works.
Training of deep learning models requires big data and substantial computational power for practical training and validation. As more data are gathered, a deep learning model becomes more effective and more accurate, since a model is prone to over-fitting with insufficient data. This research used a state-of-the-art dataset from [
Thoracic diseases | No. of cases |
---|---|
Infiltration | 16,421 |
Effusion | 12,921 |
Atelectasis | 11,610 |
Nodule | 6,971 |
Mass | 6,046 |
Pneumothorax | 5,793 |
The model also requires a class for X-rays with no thoracic disease; hence, we collected 49,186 images for it. Although the images of the remaining classes are not enough for proper training without risk of over-fitting, we were able to resolve the problem by obtaining more images from the dataset of the Kaggle challenge [
We used Python's matplotlib library to graphically represent the distribution of the various cardiothoracic diseases and the cases without cardiothoracic disease. The normal X-ray images (without cardiothoracic disease) used in our study have the highest frequency, higher than all the cardiothoracic diseases combined. Among the cardiothoracic diseases, infiltration has the highest frequency, while consolidation has the lowest, as seen in
In
The dataset gathered for this research is highly imbalanced, as the number of individuals without cardiothoracic disease (49,186) is significantly higher than the others. This is an issue, as it makes it difficult to train the model without over-fitting. To remedy this problem, a GAN is used, as its primary purpose is to learn regularities and make a small dataset usable for training. After the GAN was applied, we further sought the opinions of multiple physicians in the cardiothoracic field to ensure that there are no errors in the dataset.
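To make the imbalance concrete, the case counts from the table above can be turned into inverse-frequency class weights, a common complementary remedy to GAN-based augmentation. This weighting sketch is illustrative and not part of the original pipeline:

```python
# Illustrative: quantify the class imbalance reported in the dataset table
# using inverse-frequency class weights (a common remedy alongside
# GAN-based augmentation). Counts are taken from the table above.
case_counts = {
    "No finding": 49186,
    "Infiltration": 16421,
    "Effusion": 12921,
    "Atelectasis": 11610,
    "Nodule": 6971,
    "Mass": 6046,
    "Pneumothorax": 5793,
}

total = sum(case_counts.values())
n_classes = len(case_counts)

# weight_c = total / (n_classes * count_c): rarer classes get larger weights
class_weights = {c: total / (n_classes * n) for c, n in case_counts.items()}

print(f"imbalance ratio (largest/smallest): "
      f"{max(case_counts.values()) / min(case_counts.values()):.2f}x")
for c, w in sorted(class_weights.items(), key=lambda kv: kv[1]):
    print(f"{c:14s} weight = {w:.3f}")
```

With these counts, the majority class outnumbers the rarest disease class by roughly 8.5 to 1, which is why plain training tends to over-fit toward the "no finding" class.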
Next, we clustered each of our classes of cardiothoracic diseases into separate folders and further divided each folder into training and validation sets. These sets contain the label of each class as it would be used in creating the models. The training and validation sets could not be used as they were: they had to be reshaped and normalized. Therefore, we reshaped the images into (150, 150, 3) and normalized them. Because some images might have been uploaded in the wrong orientation, the images were rotated through a range of angles via augmentation. The labels were then encoded using one-hot encoding, and the array was shaped into (128, 128, 3). These were saved as pickle files to be used later in model creation.
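The normalization and one-hot encoding steps above can be sketched with NumPy. The (150, 150, 3) shape and 14-class label set follow the text; the toy batch and its values are illustrative:

```python
import numpy as np

# Sketch of the preprocessing described above: normalize pixel values and
# one-hot encode labels (shapes and class count follow the text).
rng = np.random.default_rng(0)

# toy batch of 8 uint8 X-ray images, already reshaped to (150, 150, 3)
images = rng.integers(0, 256, size=(8, 150, 150, 3), dtype=np.uint8)
labels = np.array([0, 3, 1, 13, 5, 2, 0, 7])  # 14 disease classes

# normalize pixel intensities to [0, 1]
x = images.astype(np.float32) / 255.0

# one-hot encode the labels: row i of the identity picks out class i
num_classes = 14
y = np.eye(num_classes, dtype=np.float32)[labels]

print(x.shape, x.min() >= 0.0, x.max() <= 1.0)
print(y.shape, y.sum(axis=1))  # each one-hot row sums to 1
```

In practice the arrays would then be serialized with `pickle`, as the text describes, so the expensive preprocessing runs only once.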
As a neural network's depth increases, the training process becomes more tedious, and the convergence time increases significantly. When a deep neural network starts to converge, it is exposed to degradation issues [
He et al. [
The residual block serves two purposes. First, when the input and output dimensions are equal, the identity shortcut, i.e., x, assists in the computation of the output as presented in
On the contrary, when the dimensions change, the shortcut performs the identity mapping from the input with zero-padding to increase the dimensions. The projection shortcut assists in matching the dimensions using a 1 × 1 convolution operation, represented as
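The two shortcut variants can be sketched in NumPy. Here F stands in for the block's learned transformation, and the projection matrix plays the role of the 1 × 1 convolution, which reduces to per-pixel channel mixing:

```python
import numpy as np

# Minimal sketch of the two shortcut variants in a residual block.
# F is a stand-in for the block's learned transformation; W_proj plays
# the role of the 1x1 convolution used when dimensions change.

def residual_identity(x, F):
    """y = F(x) + x when input and output dimensions match."""
    return F(x) + x

def residual_projection(x, F, W_proj):
    """y = F(x) + x @ W_proj when the shortcut must change dimensionality.

    x: (H, W, C_in), W_proj: (C_in, C_out) -- per-pixel channel mixing,
    which is exactly what a 1x1 convolution computes.
    """
    return F(x) + x @ W_proj

x = np.ones((4, 4, 8))                        # toy feature map
F_same = lambda t: 0.5 * t                    # keeps dimensions
W = np.zeros((8, 16)); W[:, :8] = np.eye(8)   # project 8 -> 16 channels
F_wide = lambda t: np.zeros(t.shape[:2] + (16,))

print(residual_identity(x, F_same).shape)       # (4, 4, 8)
print(residual_projection(x, F_wide, W).shape)  # (4, 4, 16)
```

The key design point is that both variants add the shortcut to F(x), so gradients can flow through the addition unchanged even when F is hard to train.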
This architecture makes ResNet very efficient in terms of accuracy, and the depth of the network can be considerably increased. He et al. [
Different hyperparameters were used, and the various results were compared to improve the proposed model, as shown in
Models | Augmentation | Accuracy | Total trainable parameters | Max pooling: #layers | Batch: #layers | Optimizer |
---|---|---|---|---|---|---|
ResNet-152 | Yes | 0.67 | 3,479,631 | 4 | 10 | Adam |
ResNet-152 | No | 0.62 | 3,657,221 | 3 | 10 | Adam |
Transfer learning (Inception-V3) | – | 0.68 | 38,551,567 | None | – | RMSprop |
ResNet-152 with 6 targeted classes | Yes | 0.83 | 38,543,367 | None | 100 | RMSprop |
In the first model, we built four layers of convolutions, each followed by a max-pooling layer. The model has one flatten layer followed by one dropout layer, and finally two fully connected layers at the end. An Adam optimizer is used, and a leaky ReLU is used as the activation function. All of these are used to train a multi-class model with 14 classes for our various cardiothoracic diseases, achieving a validation accuracy of 67%.
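Two of the building blocks named above, 2 × 2 max pooling and the leaky ReLU activation, can be sketched in NumPy. The alpha slope is illustrative, as the text does not specify it:

```python
import numpy as np

# Sketch of two building blocks from the first model: 2x2 max pooling
# and the leaky ReLU activation (the alpha slope is illustrative).

def max_pool_2x2(x):
    """Downsample an (H, W) feature map: max over each 2x2 tile."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

def leaky_relu(x, alpha=0.1):
    """Pass positives through; scale negatives by a small slope alpha."""
    return np.where(x > 0, x, alpha * x)

fmap = np.array([[1., 2., 0., -1.],
                 [3., 4., -2., 0.],
                 [0., 1., 5., 6.],
                 [2., 0., 7., 8.]])

pooled = max_pool_2x2(fmap)
print(pooled)                              # [[4. 0.] [2. 8.]]
print(leaky_relu(np.array([-1.0, 2.0])))   # [-0.1  2. ]
```

Max pooling halves the spatial resolution after each convolution stage, while the leaky ReLU avoids the "dead unit" problem of plain ReLU by keeping a small gradient for negative inputs.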
The next model uses the same model (ResNet) but without image augmentation. It is clear from
The third model is a transfer learning method utilizing Inception-V3. Transfer learning involves reusing an already developed model for one task as the starting point for a model on a second task. For this research, we used a pre-trained Inception-V3 model with a fully connected layer of 1024 units and the ReLU activation function, without batch normalization layers, as we used more parameters. To remedy the over-fitting issue, a dropout of 0.2 is used, and softmax is used to output 15 classes. The validation accuracy is around 68% in predicting cardiothoracic diseases in X-rays using a multi-class classification of 14 target labels. In general, the validation accuracy of the model is good compared to model 2; however, it is not a significant improvement over model 1.
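The classification head described above, dropout at rate 0.2 followed by a softmax over 15 classes, can be sketched in NumPy. The 1024-dimensional feature vectors and the weight matrix are toy stand-ins for the pre-trained Inception-V3 features:

```python
import numpy as np

# Sketch of the head: dropout at rate 0.2 during training,
# then a softmax over the output classes.
rng = np.random.default_rng(1)

def dropout(x, rate=0.2, training=True):
    """Zero a fraction `rate` of units; scale the rest (inverted dropout)."""
    if not training:
        return x
    mask = rng.random(x.shape) >= rate
    return x * mask / (1.0 - rate)

def softmax(z):
    """Numerically stable softmax over the last axis."""
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

features = rng.normal(size=(4, 1024))        # toy pooled features
W = rng.normal(scale=0.01, size=(1024, 15))  # 15 output classes
logits = dropout(features, 0.2) @ W
probs = softmax(logits)
print(probs.shape, np.allclose(probs.sum(axis=1), 1.0))
```

Inverted dropout scales the surviving activations by 1/(1 − rate) so that no rescaling is needed at inference time.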
The last optimized model targeted six of the original 14 classes. A replica of the first model, which used image augmentation in the convolutional neural network, is used, but the number of target classes is reduced to 6 from the original dataset. This model uses all the image augmentation parameters of the first model, which are as follows: a rotation range of 40 and a shift of 0.2.
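Assuming the augmentation parameters reported for this model (a rotation range of 40 degrees and a shift fraction of 0.2), the per-image random transforms can be sampled as in this minimal sketch; it is not the actual augmentation library call:

```python
import numpy as np

# Sketch: how the stated augmentation parameters translate into
# per-image random transforms -- an angle drawn from +/-40 degrees
# and width/height shifts drawn from +/-20% of the image size.
rng = np.random.default_rng(42)

ROTATION_RANGE = 40   # degrees
SHIFT_RANGE = 0.2     # fraction of width/height

def sample_augmentation(img_h=150, img_w=150):
    """Draw one random (angle, dx, dy) triple for a single image."""
    angle = rng.uniform(-ROTATION_RANGE, ROTATION_RANGE)
    dx = rng.uniform(-SHIFT_RANGE, SHIFT_RANGE) * img_w
    dy = rng.uniform(-SHIFT_RANGE, SHIFT_RANGE) * img_h
    return angle, dx, dy

for _ in range(3):
    angle, dx, dy = sample_augmentation()
    print(f"rotate {angle:6.1f} deg, shift x {dx:6.1f} px, y {dy:6.1f} px")
```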
Four different deep learning models were developed for the automatic detection of various cardiothoracic diseases using X-ray images of the chest. After completion of training and validation of the models, we recognize the following:
Model 4 is a replica of model 1 but with the target classes reduced to 6, using all the label samples greater than 100, with a rotation range of 40 and a shift of 0.2. This means model 4 benefited significantly from the clustering approach, with accuracy increasing from 0.6721 to 0.83. The ResNet-152 without image augmentation has a training accuracy of 99% and a validation accuracy of 62%; this model appears to overfit the training data. More training data with balanced classes would significantly increase model accuracy. Image augmentation increases the accuracy of the model in predicting cardiothoracic diseases from an X-ray. More training data, especially for the rare classes, would further increase model performance. Using a pre-trained model can speed up training and increase model accuracy.
This research employs the advantages of computer vision and medical image analysis to develop an automated model with the clinical potential for early detection of the disease. Using deep learning models, the research aims to evaluate the effectiveness and accuracy of different convolutional neural network models in the automatic diagnosis of cardiothoracic diseases from X-ray images compared to diagnosis by experts in the medical community.
After successfully training and validating the models we developed, ResNet-152 with image augmentation proved to be the best model for the automatic detection of cardiothoracic disease. However, one of the main problems associated with deep learning projects and research in radiography is the scarcity and unavailability of sufficient datasets, a critical component of all deep learning models, as they require large amounts of data for training. This is why some of our models used image augmentation to increase the number of images without duplication. As more data are collected in chest radiology, the models could be retrained to improve their accuracy, since deep learning models improve with more data. The future of artificial intelligence in terms of cardiothoracic diseases is unlimited. As more data become available, training and testing of different deep learning models will be possible. Multi-classification techniques with huge datasets are required for effective, efficient, and accurate detection of various cardiothoracic diseases. Hence, we aim to bring transfer learning to the pre-trained model to use the learning procedures of top deep learning models. We also intend to apply optimization algorithms for the automation of hyperparameters in the deep learning models.
We would like to thank the Deanship of Scientific Research, Qassim University for funding the publication of this project.