Deep learning has driven a sharp rise in the development of autonomous image recognition systems, especially in the medical field. Among lung problems, tuberculosis, caused by a bacterium called
According to the World Health Organization Global tuberculosis report 2021 [
One type of imaging commonly used in the diagnosis of pulmonary tuberculosis is the chest radiograph. On chest X-ray images, lesions appear differently depending on the type of tuberculosis and the stage of the disease. Chest X-ray data is rather sensitive, so it can be widely used to screen for pulmonary tuberculosis.
In Vietnam, the government has made several attempts to prevent and eventually eliminate tuberculosis. One of them is the Vietnam National Tuberculosis Control Program. In 2020, the United States Agency for International Development and the Vietnam National Tuberculosis Program [
There have been several research findings on tuberculosis diagnosis using traditional machine learning models, as described in
Year | Authors | Method or technique | Dataset type | Source of data |
---|---|---|---|---|
2011 | Elveren et al. | Traditional multilayer neural networks, genetic algorithms | Patients’ epicrisis reports | Diyarbakir Chest Disease Hospital |
2011 | Dongardive et al. | Identification tree model | Five medical examinations | T.B Hospital Group, Mumbai |
2017 | Amani et al. | Support vector machine | 38 attributes from patient discharge report | A local hospital in the city of Diyarbakir, Turkey |
2017 | Hooda et al. | Custom CNN | X-ray images | Montgomery and Shenzhen datasets |
2017 | Lopes et al. | Pre-trained CNN as the feature extractor | X-ray images | Montgomery and Shenzhen datasets |
2018 | Evangelista et al. | 9 different CNNs | X-ray images | JSRT [ |
2018 | Ojasvi et al. | Resnet50 | X-ray images | NIH, Montgomery, and Shenzhen datasets |
2019 | Pasa et al. | Custom simple CNN | X-ray images | Montgomery, Shenzhen, and combined dataset |
2019 | Mostofa et al. | A generalized model from VGG16 | X-ray images | Montgomery and Shenzhen datasets |
2019 | Hernández et al. | Ensemble approach with Resnet50, VGG19, and InceptionV3 | X-ray images | Montgomery and Shenzhen datasets |
2019 | Meraj et al. | GoogLeNet, VGG-16, VGG-19, and Resnet50 | X-ray images | Montgomery and Shenzhen datasets |
2022 | Zachariou et al. | ResNet, DenseNet, and SqueezeNet | Fluorescence microscopy images | A clinic in Mbeya, Tanzania |
There were several studies using deep learning methods for monitoring tuberculosis, as described in
Most deep learning methods require a large dataset, a requirement that is not easily met. In the context of our research, we need to find deep learning models that can effectively classify X-ray images of Vietnamese people to diagnose pulmonary tuberculosis. Medical data is often restricted by the privacy and security policies of hospitals, and even with access to hospital data, it is difficult to obtain a large enough amount of well-annotated data. To solve these problems, the paper adopts a transfer learning approach, which has enabled significant breakthroughs in many deep learning applications, especially in diagnosing diseases from medical images [
Our study is set in the context of Vietnamese patient data. This research provides a state-of-the-art solution for diagnosing pulmonary tuberculosis on a Vietnamese X-ray imaging dataset collected from a local Vietnamese hospital with the help of VRPACS [ We prepare a ready-to-use Vietnamese X-ray image dataset. We design different transfer learning strategies for training the models. We analyze the effect of each strategy on the Vietnamese imaging dataset to identify the best solution.
The rest of the paper is organized as follows: Section 2 provides the model background for the proposed system. Section 3 presents the methodology. Section 4 discusses the results and evaluation of the proposed study. Lastly, Section 5 concludes this study.
The proposed study requires several CNN models. In our experiments, we chose AlexNet, ResNet, and DenseNet, popular architectures that have been applied successfully in many studies.
In 2012, Alex Krizhevsky et al. proposed AlexNet, the well-known CNN architecture that competed in the ImageNet Large Scale Visual Recognition Challenge 2012 [
Alexnet was applied to the medical images in the research of Nawaz et al. [
In 2016, Residual Network [
Resnet is widely used in image classification problems as well as in medical problems. Some studies on X-ray images are [
DenseNet is a convolutional neural network with dense connections between layers, proposed by Gao Huang et al. in 2017 [
DenseNet is one of the deep learning architectures that has been applied successfully to X-ray images, in studies such as [
Our work focuses on Vietnamese patients, so we need to prepare a Vietnamese X-ray image dataset for training and testing. We named it the VRTB dataset. In the study, all Vietnamese X-ray images were collected in a local Vietnamese hospital using the VRPACS software [
In detail, the patient wears simple clothes with no metal items, then stands against the plate and holds their arms up or to the sides. Before the images were delivered to us, all private information was removed, so the DICOM (Digital Imaging and Communications in Medicine) images contain only pixel data. Regarding image resolution, the size of each dimension is in the range [2320, 3072].
Images were annotated with a tuberculosis state before they could be used for research. Although a chest X-ray image can show several medical signs of tuberculosis, such as nodule, pleural effusion, infiltration, etc., we need only a binary state for each image. Therefore, each image is marked as either the “tuberculosis” state or the “normal” state.
With the help of the VRPACS team, we prepared the VRTB dataset with 2000 well-annotated chest X-ray images. Of these, 1000 images have the “tuberculosis” state and 1000 images have the “normal” state, as described in
Dataset | Tuberculosis | Normal |
---|---|---|
VRTB | 1000 | 1000 |
KaggleTB | 700 | 3500 |
KaggleTB is a dataset published on the Kaggle platform by Rahman et al. [
In the study, we used only the 4200 images that can be downloaded directly from the Kaggle platform. This data was used for training in every strategy. With this dataset, we have tuberculosis data from international patients. In
Based on the VRTB dataset and Kaggle dataset, we set up VRTBTest and VRTBCombineTrain datasets for experiments, which are described in
Dataset | Tuberculosis | Normal |
---|---|---|
VRTBTest | 500 | 500 |
VRTBCombineTrain | 1200 | 4000 |
Our study aims to evaluate the effectiveness of different transfer learning strategies for Vietnamese patient data. To do that, on the one hand, all models need to be verified on a Vietnamese X-ray image testing set, VRTBTest. We created a testing set of 1000 Vietnamese X-ray images balanced between tuberculosis samples and normal samples, so each label has 500 images in VRTBTest.
On the other hand, we also designed VRTBCombineTrain, a training set containing both international data and Vietnamese data. In
VRTBCombineTrain | Tuberculosis | Normal |
---|---|---|
International samples | 700 | 3500 |
Vietnamese samples | 500 | 500 |
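The composition of the two derived datasets can be checked with simple arithmetic. The sketch below is illustrative only: the variable names are hypothetical, and it assumes (as the tables imply) that half of each VRTB class goes to VRTBTest and the other half joins KaggleTB in VRTBCombineTrain.

```python
from collections import Counter

# Per-class image counts taken from the dataset tables in this section.
VRTB = Counter(tuberculosis=1000, normal=1000)
KAGGLE_TB = Counter(tuberculosis=700, normal=3500)

# Assumption implied by the tables: 500 images per class are held out
# for testing, and the rest of VRTB is used for training.
VRTB_TEST = Counter(tuberculosis=500, normal=500)
vrtb_train_part = VRTB - VRTB_TEST

# The combined training set is KaggleTB plus the remaining Vietnamese images.
VRTB_COMBINE_TRAIN = KAGGLE_TB + vrtb_train_part

print(dict(VRTB_COMBINE_TRAIN))
```

The resulting counts, 1200 tuberculosis and 4000 normal images, match the VRTBCombineTrain row reported above.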
In the study, we organized the experiments into 4 strategies: direct training on the KaggleTB dataset, transfer learning from ImageNet followed by training on the KaggleTB dataset, direct training on the VRTBCombineTrain dataset, and transfer learning from ImageNet followed by training on the VRTBCombineTrain dataset. The evaluation scores of these experiments indicate the value of transfer learning for each dataset. All training datasets are imbalanced: KaggleTB has 3500 Normal and 700 Tuberculosis labels, and VRTBCombineTrain has 4000 and 1200. Class imbalance generally affects model performance, and it does so in this paper's experiments as well.
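The four strategies form a 2 × 2 grid over two choices: whether the model starts from ImageNet weights, and which training set is used. The following sketch only encodes that grid and the class imbalance of each training set; the `Strategy` class and the `imbalance_ratio` helper are hypothetical names introduced for illustration, not part of the authors' code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Strategy:
    number: int
    imagenet_init: bool  # True = transfer learning from ImageNet
    train_set: str


STRATEGIES = [
    Strategy(1, imagenet_init=False, train_set="KaggleTB"),
    Strategy(2, imagenet_init=True, train_set="KaggleTB"),
    Strategy(3, imagenet_init=False, train_set="VRTBCombineTrain"),
    Strategy(4, imagenet_init=True, train_set="VRTBCombineTrain"),
]

# Label counts as reported in the text.
TRAIN_COUNTS = {
    "KaggleTB": {"normal": 3500, "tuberculosis": 700},
    "VRTBCombineTrain": {"normal": 4000, "tuberculosis": 1200},
}


def imbalance_ratio(train_set: str) -> float:
    """Number of normal images per tuberculosis image."""
    counts = TRAIN_COUNTS[train_set]
    return counts["normal"] / counts["tuberculosis"]


for s in STRATEGIES:
    print(s.number, s.imagenet_init, s.train_set,
          round(imbalance_ratio(s.train_set), 3))
```

KaggleTB has 5 normal images per tuberculosis image, while VRTBCombineTrain has about 3.333, so the combined set is imbalanced but less so.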
In practice, from VRTB framework in
Firstly, we trained a model directly on the KaggleTB dataset without transfer learning, as described in
Secondly, we trained a model initialized with weights pre-trained on ImageNet, as described in
Next, a model was trained directly on the VRTBCombineTrain dataset with no transfer learning, as described in
Lastly, we ran the experiment that uses both transfer learning and the combined training set, as described in
Assessing the diagnostic quality of a model means estimating the agreement between its predicted values and a reference standard. In our study, we have a series of patients whose disease states are well annotated. These annotations are the references against which we compare the results of our proposed model. Each patient case is assigned to either "normal" or "tuberculosis".
In medical studies, doctors highly recommend using sensitivity and specificity scores. Sensitivity is the proportion of people correctly diagnosed as sick among all people who really have the target condition. In contrast, specificity is the proportion of people identified as healthy among all people who really do not have the target condition. In our context, the target condition is tuberculosis.
These scores are produced by combining predicted values and references. In detail, we have a 2 × 2 table, also called a confusion matrix. In this matrix, there are four scalars presented: True Positive (TP), False Positive (FP), True Negative (TN), and False Negative (FN).
From the confusion matrix, there are two scalars calculated: True Positive Rate (TPR) and False Positive Rate (FPR).
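The formulas themselves do not appear in this text. Assuming the standard definitions, the two rates and their relation to sensitivity and specificity can be written as:

```latex
\begin{align}
\mathrm{TPR} &= \frac{TP}{TP + FN} = \text{sensitivity}, \\
\mathrm{FPR} &= \frac{FP}{FP + TN} = 1 - \text{specificity}, \\
\text{specificity} &= \frac{TN}{TN + FP}.
\end{align}
```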
From the confusion matrix, there is also a simple score that is used widely and can be calculated. This is the accuracy.
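As a minimal sketch, the scores above can be computed from lists of reference labels and predictions as follows. It assumes the standard definitions of each score; the function names and the toy labels are illustrative and are not data from the study.

```python
def confusion_counts(y_true, y_pred, positive="tuberculosis"):
    """Count TP, FP, TN, FN, treating `positive` as the target condition."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def diagnostic_scores(tp, fp, tn, fn):
    """Sensitivity (TPR), specificity, FPR, and accuracy from the four counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "fpr": fp / (fp + tn),
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


# Toy example: 3 tuberculosis cases and 5 normal cases.
y_true = ["tuberculosis"] * 3 + ["normal"] * 5
y_pred = ["tuberculosis", "tuberculosis", "normal",
          "normal", "normal", "normal", "normal", "tuberculosis"]
print(diagnostic_scores(*confusion_counts(y_true, y_pred)))
```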
Another score used in the proposed study is the AUC, the area under the receiver operating characteristic (ROC) curve, which plots TPR against FPR over a range of decision thresholds. AUC measures the overall quality of a binary classifier. A value of 0.5 corresponds to a random classifier, and the maximum value of 1 corresponds to a perfect classifier. Illustrations of the confusion matrix and AUC are described in
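One way to make the AUC concrete is through its equivalent ranking interpretation: it equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The sketch below uses this pairwise form for clarity rather than speed; the function name and toy scores are illustrative assumptions.

```python
def roc_auc(y_true, scores, positive=1):
    """Area under the ROC curve via pairwise comparison of scores."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(pos) * len(neg))


labels = [0, 0, 1, 1]
print(roc_auc(labels, [0.1, 0.2, 0.8, 0.9]))  # positives always ranked higher
print(roc_auc(labels, [0.5, 0.5, 0.5, 0.5]))  # uninformative constant score
```

Under this interpretation, an AUC below 0.5 means the model ranks normal cases above tuberculosis cases more often than the reverse.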
This section discusses the results obtained from our experiments. Both training datasets, KaggleTB and VRTBCombineTrain, were split 80:20 into a training set and a validation set for the training phase. Training was stopped based on the score measured on the validation set; in all experiments, we used validation accuracy as the stopping indicator. The trained model was then evaluated on the VRTBTest dataset, which allows us to compare the effectiveness of each strategy.
To run the experiments, we used an Ubuntu deep learning server with an NVIDIA GeForce RTX 3080 GPU (10 GB memory), 64 GB RAM, and a 1 TB hard disk. Our program was built on top of the PyTorch library.
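The split and the stopping rule can be sketched in plain Python. The text specifies only the 80:20 ratio and the use of validation accuracy as the stopping indicator. The random shuffle, the fixed seed, the patience-based rule, and the function names below are assumptions made for illustration.

```python
import random


def split_80_20(items, seed=0):
    """Shuffle and split into 80% training and 20% validation."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * 4 // 5
    return shuffled[:cut], shuffled[cut:]


def early_stop(val_accuracies, patience=3):
    """Return (best_epoch, stop_epoch) for a sequence of validation accuracies.

    Training stops once `patience` epochs pass without a new best accuracy
    (an assumed rule; the paper does not state its exact criterion).
    """
    best_acc, best_epoch = -1.0, -1
    for epoch, acc in enumerate(val_accuracies):
        if acc > best_acc:
            best_acc, best_epoch = acc, epoch
        elif epoch - best_epoch >= patience:
            return best_epoch, epoch
    return best_epoch, len(val_accuracies) - 1


train, val = split_80_20(range(4200))   # KaggleTB size
print(len(train), len(val))
print(early_stop([0.60, 0.70, 0.72, 0.71, 0.70, 0.69]))
```

With these sizes, KaggleTB yields 3360 training and 840 validation images, and VRTBCombineTrain (5200 images) would yield 4160 and 1040.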
For each strategy, we trained and tested 3 models: ResNet34, AlexNet, and DenseNet121. We then calculated the average of each metric over the 3 models, as in
Strategy | Average AUC | Average sensitivity | Average specificity | Average accuracy |
---|---|---|---|---|
1 | 0.4042 | 0.0335 | 0.9805 | 0.507 |
2 | 0.3456 | 0.0005 | 0.9955 | 0.498 |
3 | 0.8848 | 0.679 | 0.9315 | 0.8053 |
4 | 0.9929 | 0.8745 | 0.992 | 0.9333 |
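Because VRTBTest contains 500 tuberculosis and 500 normal images, accuracy on it equals the mean of sensitivity and specificity, and this relation also holds for averages over models. The short check below applies it to the rows of the table above. The helper name is hypothetical, and the tolerance allows for rounding in the reported values.

```python
# Average metrics per strategy, copied from the results table.
ROWS = {
    1: {"sensitivity": 0.0335, "specificity": 0.9805, "accuracy": 0.507},
    2: {"sensitivity": 0.0005, "specificity": 0.9955, "accuracy": 0.498},
    3: {"sensitivity": 0.679, "specificity": 0.9315, "accuracy": 0.8053},
    4: {"sensitivity": 0.8745, "specificity": 0.992, "accuracy": 0.9333},
}


def expected_accuracy(sensitivity, specificity, n_pos=500, n_neg=500):
    """Accuracy implied by sensitivity and specificity for given class sizes."""
    return (sensitivity * n_pos + specificity * n_neg) / (n_pos + n_neg)


for number, row in ROWS.items():
    implied = expected_accuracy(row["sensitivity"], row["specificity"])
    consistent = abs(implied - row["accuracy"]) < 1e-3
    print(number, round(implied, 4), row["accuracy"], consistent)
```

All four rows are consistent with this relation. For strategies 1 and 2, an accuracy near 0.5 together with sensitivity near 0 indicates that almost every test image was predicted as normal.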
As presented in section 3.2, the experiment strategies differ in whether they use transfer learning and in which training dataset they use. We expected strategy 4 to perform best because it uses transfer learning with the combined training set. The output metrics support this hypothesis: three of the four average metrics for strategy 4 exceed 0.9, and the average sensitivity is 0.8745.
We can see in the table that the best result belongs to strategy 4, which is consistent with the expectation presented in section 3.2.4. Results for strategies 1 and 2 are quite similar, and both are poor: their average accuracies are approximately 0.5, and their average sensitivities are near 0. To discuss this in more detail, we focus on the true positive, true negative, false positive, and false negative counts below.
In
VRTBCombineTrain, the training set used in both strategies 3 and 4, is also imbalanced, though less so than KaggleTB. Its label ratio is 4000:1200, approximately 3.333. Another important factor is that VRTBCombineTrain includes Vietnamese X-ray images, which makes the distributions of the training set and the testing set more similar. In the two pie charts, what stands out is that the majority of outputs are true positives and true negatives, meaning that most of the VRTBTest dataset was classified correctly. This clearly shows the role of Vietnamese data in the training dataset.
In more detail, we analyze the effectiveness of transfer learning by comparing the average of evaluating metrics between strategies 3 and 4 as described in
Lastly, we discuss how the performance of the models differs within strategy 4. As presented in
In this paper, we focused on the influence of the transfer learning approach and the role of Vietnamese training data when testing on Vietnamese X-ray images for tuberculosis classification. We used data collected at a Vietnamese hospital combined with international data from the Kaggle dataset. The results inform how experiments should be designed and how a tuberculosis classification application for Vietnamese X-ray images should be built. Although we achieved good testing AUC, accuracy, sensitivity, and specificity, much work remains before this can become an out-of-the-box product for tuberculosis diagnosis in the healthcare industry. This study also aims to lay a foundation for autonomous diagnosis systems for data scientists, radiologists, and healthcare staff. We saw the importance of Vietnamese data in the training phase, so collecting more data will be a vital task. With transfer learning, several popular deep learning models also achieved high evaluation scores. Transfer learning is therefore a suitable approach for developing a diagnosis system, since it requires less training time and provides highly accurate results on small medical datasets.