Open Access



The IOMT-Based Risk-Free Approach to Lung Disorders Detection from Exhaled Breath Examination

Mohsin Ghani, Ghulam Gilanie*

Department of Artificial Intelligence, Faculty of Computing, The Islamia University of Bahawalpur, Bahawalpur, 63100, Pakistan

* Corresponding Author: Ghulam Gilanie.

Intelligent Automation & Soft Computing 2023, 36(3), 2835-2847. https://doi.org/10.32604/iasc.2023.034857


The lungs are the central part of the human respiratory system and are among the major organs of the human body. Lung disorders, including Coronavirus disease (Covid-19), are among the world’s deadliest and most life-threatening diseases. Early detection and treatment that respect social distancing can save lives as well as protect the rest of humanity. Even though X-rays and Computed Tomography (CT) scans are the imaging techniques used to analyze lung-related disorders, medical practitioners still find it challenging to identify lung cancer from scanned images. Moreover, Covid-19 cannot be diagnosed through these modalities until it reaches the lungs. Therefore, Internet of Medical Things (IoMT) and machine learning-based computer-assisted approaches have been developed and applied to automate these diagnostic procedures. This study investigates an automated, non-invasive approach for detecting Covid-19 and lung disorders other than Covid-19 infection at their early stages through the analysis of human breath. Human breath contains several compounds, including volatile organic compounds, i.e., water vapor (5.0%–6.3%), nitrogen (79%), oxygen (13.6%–16.0%), carbon dioxide (4.0%–5.3%), argon (1%), hydrogen (1 part per million, ppm), carbon monoxide (1%), proteins (1%), isoprene (1%), acetone (1%), and ammonia (1%). Beyond these limits, the presence of a certain volatile organic compound (VOC) may indicate a disease. The proposed research aims not only to increase the accuracy of lung disorder detection from breath analysis but also to deploy the model in a real-time environment as a home appliance. Different sensors detect the VOCs, and microcontrollers and machine learning models are used to detect these lung disorders. Overall, the suggested methodology is accurate, efficient, and non-invasive. The proposed method obtained an accuracy of 93.59%, a sensitivity of 89.59%, a specificity of 94.87%, and an AUC-Value of 0.96.


1  Introduction

To survive, the human body requires oxygen, which it inhales through its respiratory system. The lungs, the main components of the human respiratory system, are flexible, pinkish organs that resemble two upside-down cones in the chest. The right lung is made up of three lobes, while the left lung contains only two lobes to make room for the heart. Through breathing, oxygen from the incoming air enters the bloodstream, whilst carbon dioxide leaves the bloodstream [1]. A blunt or penetrating chest injury, medical operations, ruptured air blisters, or damage caused by underlying lung disease can cause lung cancer, while Covid-19 is a viral disease. Symptoms of these disorders include sudden chest discomfort, acute pain when inhaling, increasing chest pressure over time, elevated heart rate, and shortness of breath. Both disorders are generally asymptomatic in their early stages. For lung cancer, most cases are discovered only after treatment has failed, although the 5-year survival rate increases considerably, from 10% to 80%, if the condition is discovered early.

Lung disorders are usually detected using imaging-based methods, namely Chest X-Ray (CXR) and CT scans, while biopsy, which is the most invasive and riskiest method and requires surgery, is still used as the gold standard. CXR is the most popular and accessible method of identifying lung disorders. However, it carries a risk because it exposes the patient to radiation.

In recent years, computer-aided diagnostic methods have been used to detect these disorders and are gaining popularity to reduce the number of fatalities by anticipating them early. These methods are helping radiologists, chest specialists, pulmonologists, and other medical professionals. Artificial neural networks (ANN), convolutional neural networks (CNN), Naive Bayes (NB), Linear regression (LR), Support Vector Regression (SVR), Random Forest Regression (RFR), Extra Tree Regression (ETR), Support Vector Machines (SVM), Random Forests (RF), and Deep Neural Networks (DNN) are among popular methods of machine learning, which achieve excellent results.

The breathing of living things is a complete cycle of air moving in and out of the lungs for gas exchange in the internal environment, mostly to flush out carbon dioxide and bring in oxygen. Breath composition is an important factor. Inhaled air consists of 79% nitrogen, 20.95% oxygen, and a small amount of some other gases, including argon, carbon dioxide, neon, helium, and hydrogen [2]. The gas exhaled is 4% to 5% by volume of carbon dioxide, about a 100-fold increase over the inhaled amount. The volume of oxygen is reduced by a small amount, 4% to 5%, compared to the oxygen inhaled.

New research has demonstrated that breath analysis can be used to detect lung, stomach, or chest-related disorders, which is an exciting development because it does not require any intrusive procedures [3]. Over the last few decades, numerous distinct kinds of sensors have been developed and used in laboratories. There have also been reports of electronic nose systems being used in clinical settings, in conjunction with data processing methods, to locate and identify stomach cancer more quickly [4].

Although a great number of techniques and sensor materials have been developed and applied in the search for lung, chest, or stomach disorders detection through exhaled breath, most of the previously available solutions have either been too difficult to use or too costly [5]. The clinical diagnosis that is based on the detection of VOCs faces several challenges related to the sensor technology, most of which are concentrated on the following major fronts: the complexity of metabolism and VOC kinetics in a multianalyte system; the inter/intra-person variability of VOC profiles in such a complex environment; the standardization of sensor calibration due to inherent sensor-to-sensor variability; and sensor drift and cross-sensitivities to environmental variables such as temperature.

Breath testing is a non-invasive method for doctors to diagnose a variety of illnesses [6]. Measuring the volume of specific gases in breath enables doctors to make a quick and accurate diagnosis. Research studies established that breath analysis helps diagnose a variety of ailments, including type 1 diabetes, colorectal cancer, lung cancer, obesity, lactose intolerance, fructose intolerance, head and neck cancer, ovarian cancer, bladder cancer, prostate cancer, gastric cancer, Crohn’s disease, ulcerative colitis, multiple sclerosis, pulmonary hypertension, pre-eclampsia toxemia, chronic kidney disease, etc., [7].

Initially, in subjects infected with the Covid-19 virus, the virus is present in the nasal cavity, so breath exhaled through the nose may carry several VOC-based biomarkers that assist its diagnosis. Similarly, breath exhaled through the mouth also carries biomarkers that are related to lung disorders [8].

Pakistan, the world’s sixth-most populous country with a population of around 208 million, spanning an area of approximately 881,913 km², has an urgent need for such automated healthcare facilities for its large population scattered across small towns and villages. Most of its population does not have access to healthcare institutions that address critical health issues, and it is very hard for these people to reach city hospitals due to poor infrastructure, roads, and transport facilities. Further, this population either cannot afford expensive medical checkups or is careless about its health and usually visits a qualified doctor only at the last stage, when the condition has become difficult or impossible to cure [9]. The need for an e-healthcare system becomes more important, and sometimes mandatory, due to the countless viral infections spreading all over the world that require social distancing. The focus of this project is to provide early-stage diagnosis and reduce the number of patient visits to hospitals. The system provides a quality analysis at the patient’s doorstep at a very low cost and with high accuracy.

In this study, breath has been analyzed to diagnose, grade, and monitor lung cancer and the Covid-19 virus. Overall, the proposed technique is intended to be highly accurate, low-cost, and non-invasive. The proposed method uses the Internet of Medical Things (IoMT) and machine learning to lay out an efficient model. The key contributions of this study are four-fold.

1.    To analyse breath efficiently in order to distinguish healthy subjects, subjects infected with Covid-19, and subjects with other lung disorders.

2.    To decrease overall diagnostic time.

3.    To introduce a portable, cost-effective, and lightweight household medical device.

4.    To introduce a mechanism with overall improved efficacy as compared to recently published state-of-the-art studies.

The rest of the article is organized as follows. The literature review is presented in Section 2, materials and methods are described in Section 3, Section 4 contains the results and their discussion, and Section 5 contains the conclusions and future work of the conducted study.

2  Literature Review

In [10], the authors proposed a method to diagnose gastric cancer. They examined the breath of 54 people with gastric cancer and 85 people in the control group. They performed their experiments using the Weka tool and trained models based on NB, SVM, and RF classifiers. Their models achieved a maximum accuracy of 77.8%. The sensor reaction curve representation based on cluster taxonomy enhanced the results. The study in [11] measured acetone concentration through breath analysis using a SmFeO3-based sensor. The mathematical model relating the response of SmFeO3 to the concentration of acetone was developed using the linear regression correlation algorithm. The accuracy of the proposed method exceeded 85% in comparison to the acetone concentration obtained by gas chromatography-mass spectrometry. The gas-sensing substance SmFeO3 was created using the sol-gel method and then annealed at 800°C. At 210°C, the responses to 0.1 to 1 ppm acetone were 1.56, 1.87, 2.39, 3.22, 4.1, 4.79, and 5.92. In [12], the authors proposed a quick and non-invasive breath-monitoring method for detecting and tracking the progression of liver diseases. Machine learning methods, i.e., LR, SVR, RFR, and ETR, were used to predict liver function scores. The dataset used for research and experiments includes 473 liver donors who had morbidity tests performed ten years after their liver transplant. In [13], concentration variations of volatile organic compounds (VOCs) are detected from breath analysis using a solvothermal method. A straightforward solvothermal method is developed to investigate VOCs from three different classes, namely isoprene (hydrocarbons), ethanol (alcohols), and formaldehyde (aldehydes).

In [14], the authors performed an acoustic analysis of breath sounds for the diagnosis of Covid-19 infection. They used the Coswara dataset, a freely accessible dataset of breath sounds, for research and experiments. K-Nearest Neighbors (KNN) and CNN algorithms were used for breath sound analysis of both healthy people and patients infected with Covid-19. The dataset consists of 224 providers with unknown health status, 107 Covid-19 positive patients, 1107 healthy controls, and 48 patients with other respiratory diseases who tested negative for Covid-19 infection. According to the reported results, the CNN classifier achieved an accuracy of over 97 percent, while KNN with optimized features achieved an accuracy of over 85 percent. In the research conducted by [15], the authors proposed a method to diagnose gastric cancer at an early stage using a sparse autoencoder neural network. The breath samples were collected from the Shanghai Tongren Hospital, China. A total of 200 volunteers participated in the study, including 89 advanced gastric cancer patients, 56 healthy individuals, and 55 early gastric cancer patients. Their proposed model achieved an accuracy of 98.7% for early-stage gastric cancer detection using breath analysis. In another study [16], the Covid-19 virus was detected using breath analysis. Different sensors with multiplexed sensing techniques that can detect plaque metabolites in inhaled air were used to diagnose Covid-19. The dataset consists of 49 confirmed Covid-19 patients, 58 healthy subjects, and 33 non-Covid-19 lung infection patients. The reported rates for distinguishing patients from healthy controls were 94% and 76%, while the rates for classifying Covid-19 patients versus patients with other lung infections were 90% and 95%, respectively. In [17], the authors proposed a method to diagnose lung cancer using the Cuckoo search algorithm. Otsu thresholding, along with local binary patterns, is used to extract the features from CT images. Their suggested framework achieved an accuracy and sensitivity of 99.59% and 99.31%, respectively.

In [18], a method has been proposed to diagnose Covid-19 infection at an early stage using X-ray images. The researchers applied enhanced deep-learning techniques to a publicly available dataset of chest X-ray images. Their reported model achieved a specificity of 0.9474 and an accuracy of 0.9597. To control the spread of the Covid-19 virus, a mechanism is proposed in the article "Blockchain and ANFIS empowered IoMT application for privacy-preserved contact tracing in the COVID-19 pandemic" [19]. That paper proposes a blockchain-based framework that preserves patients’ anonymity while tracing their contacts using Bluetooth-enabled smartphones. A smartphone application interacts with the proposed blockchain-based framework via Bluetooth for contact tracing of the public. The obtained data is then stored in the cloud, where it can be accessed by health department personnel for a timely response. The smartphone application is also able to verify Covid-19 status after the analysis of symptoms. The proposed Adaptive Neuro-Fuzzy Inference System (ANFIS) then predicts the Covid-19 status, while K-Nearest Neighbor (KNN) achieved a Covid-19 status detection accuracy of 95.9%.

3  Material and Methods

The following steps are executed, from data acquisition to the classification of the controls into three classes, i.e., healthy, Covid-19 infected, and lung disorders other than Covid-19 infection.

3.1 Dataset Acquisition

A dataset has been collected from the Department of Radiology and Diagnostic Images, Bahawal Victoria Hospital, Bahawalpur (BVHB), Pakistan. A total of 594 subjects were analyzed for the study, of which 186 subjects were healthy, 207 patients were infected with the Covid-19 virus, and the remaining 201 subjects were diagnosed with lung infections other than the Covid-19 virus. The details of the dataset division for training, testing, and validation are described in Table 1. Participants were eligible to take part in the study if they were at least 18 years old, able to perform a breath test, and had signed a consent form. Subjects were not allowed to participate if they had any stomach disorder, additional cancers known to be active, treatment with neoadjuvant chemotherapy, a history of operations on the stomach, late-stage kidney insufficiency, type I diabetes mellitus, active bronchial asthma, past surgical removal of sections of the small intestine, or an upper endoscopy performed one or two days earlier. However, subjects who were diagnosed with adenocarcinoma of the stomach and were scheduled to undergo stomach surgery were included in the study. Among the subjects who had undergone upper endoscopy two days earlier, there was a group of people who did not have any stomach disorder. These individuals were evaluated to check whether the results of the upper endoscopy were accurate and were allowed to participate. Following the examination, the conclusive grouping was carried out by inspecting the histology findings. Before the measurements were performed, the participants were given instructions to follow in order to minimize the influence of any VOCs that could skew the results. All participants were instructed to refrain, before the examination, from eating anything for at least a whole day; from drinking coffee, tea, or soft drinks for at least 12 h; from smoking for at least 2 h; from consuming any medicine containing any sort of organic compound or alcoholic element (C2H5OH) for at least one full day; from brushing their teeth for at least 2 h; from using gum or mouth fresheners for at least 12 h; from using cosmetics and makeup for at least 12 h; and from engaging in strenuous physical activity such as going to the gym, jogging, cycling, or other hard physical work for at least 2 h.


3.2 Experimental Setup

The core of the model is the Arduino Mega, a microcontroller development board that handles all the computation. A hygrometer sensor, thermal conductivity sensor, infrared pulse oximeter sensor, nondispersive infrared sensor, ultrasonic flow sensor, hydrogen micro sensor, carbon sensor, micro-electrical sensor, gas chromatography sensor, WO3-based sensor, electrochemical gas sensor, Vernier ethanol sensor, quartz-enhanced photoacoustic spectroscopy sensor, NOx sensor, passive colorimetric sensor, and diverse sensor are used to sense water vapor (H2O), nitrogen (N2), oxygen (O2), carbon dioxide (CO2), argon (Ar), hydrogen (H2), carbon monoxide (CO), proteins (RCH(NH2)COOH), isoprene (C5H8), acetone (C3H6O), ammonia (NH3), ethanol (C2H5OH), ethane (C2H6), nitrogen oxides (NOx), volatile sulphur compounds, and breath VOC mixes, respectively. The captured data is then transmitted to the computer system for further processing. The primary component is a sampling unit that captures the exhaled air and monitors its flow. This unit is equipped with multiple sensors to sense the presence of each of these VOCs. Additionally, a few sensors that measure temperature, humidity, and air pressure are mounted on the breadboard. The Arduino Mega runs the code for the entire model, interfaces with the hardware to receive and produce data, and provides power to most of the components connected to it. A light-emitting diode (LED) helps extract understandable data as needed when the patient stands before the sensors. The breath sensing setup is shown in Fig. 1.


Figure 1: Experimental setup of the proposed system

3.3 Measurement of Breath

To participate in the study, the individuals were invited to the BVHB. There, a separate room was set up for measuring breath samples to prevent the samples from being tainted by any other VOCs. Before a breath sample was taken, each individual filled out a questionnaire regarding their lifestyle and medical history. This questionnaire included questions concerning potential confounding factors. When the participants had finished the questionnaire, they exhaled into a series of sensors integrated with a breadboard, microcontroller, and computer, which were placed on a table and sensed the breath as soon as it was exhaled. The sensors are designed to be sensitive to a wide variety of VOCs in exhaled breath. This collection of sensors, with a variety of information transfer methods, was chosen based on the findings of prior investigations to detect the medical condition of the control. The measurements are recorded as analog signals, converted to digital format, and then sent to the computer, where a Comma Separated Value (CSV) file is created for all the samples. A sample of the room’s air was taken and automatically measured before the measurement began. This was done to capture the sensors’ response to any possible background VOCs. The computer created another CSV file that contained the sensor data for the room air.
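As a rough illustration of the logging step described above, the following minimal Python sketch turns lines of digitised sensor values into CSV rows. The channel names in `CHANNELS`, the comma-separated line format, and the helper names are assumptions made for illustration; the actual firmware output format is not specified in this article.

```python
import csv
import io

# Hypothetical channel names; the real sensor array and column order are not given here.
CHANNELS = ["co2", "acetone", "ammonia"]


def parse_reading(line):
    """Parse one comma-separated line of digitised sensor values into floats."""
    parts = line.strip().split(",")
    if len(parts) != len(CHANNELS):
        raise ValueError(f"expected {len(CHANNELS)} values, got {len(parts)}")
    return [float(p) for p in parts]


def write_csv(lines, out):
    """Write a header row plus one row per valid reading; return the number of rows written."""
    writer = csv.writer(out)
    writer.writerow(["sample"] + CHANNELS)
    written = 0
    for line in lines:
        try:
            values = parse_reading(line)
        except ValueError:
            continue  # skip malformed lines instead of aborting the whole session
        writer.writerow([written] + values)
        written += 1
    return written


# Usage: an in-memory buffer stands in for the breath (or room-air) CSV file.
buffer = io.StringIO()
write_csv(["4.1,0.9,0.8", "4.3,1.0,0.7"], buffer)
```

The same routine could be called twice per session, once for the room-air sample and once for the breath sample, to produce the two CSV files mentioned in the text.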

3.4 Preprocessing

In this phase, the data is cleaned and enhanced so that subsequent processes can perceive its patterns more accurately. The raw data from the breath measurement was first preprocessed to eliminate any inaccurate measurements, standardize the length of each measurement, and clean up the data. The preprocessed data was then used to construct cluster taxonomies and to obtain the standard characteristics for comparison. After that, the cluster taxonomies and the extracted features were run through feature selection algorithms to choose the attributes carrying the most relevant information. These attributes were then used for classification. An open-source tool, TIBCO Clarity [20], was used for data cleaning.

The measurements were then brought to the same length so that they could be analyzed using methods that require time series of equal length. This yielded one time series per sensor for each measurement. After the sensor response had become stable, the results were compared to the final values of the baseline measurements. The baseline measurements were the readings taken from the air in the room before the breath analysis was performed. This was done to eliminate the impact of the room air and of the VOCs it contained. Each measurement was checked for outliers. The noise in the sensor values was significantly reduced by applying a median filter. This resulted in one preprocessed time series for each type of sensor, showing how the sensor responded when exposed to breath.
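The preprocessing chain described above (baseline correction against the room air, median filtering, and length standardization) might look roughly like the sketch below. The window size, the use of linear interpolation for resampling, the target length, and all function names are assumptions chosen for illustration; the article does not state these details.

```python
from statistics import median


def subtract_baseline(series, baseline):
    """Subtract the final (stabilised) room-air reading from each breath reading."""
    ref = baseline[-1]
    return [x - ref for x in series]


def median_filter(series, window=3):
    """Sliding median; near the edges a truncated window is used."""
    half = window // 2
    return [median(series[max(0, i - half): i + half + 1]) for i in range(len(series))]


def resample(series, length):
    """Linearly interpolate a series onto `length` equally spaced points."""
    if length == 1 or len(series) == 1:
        return [series[0]] * length
    step = (len(series) - 1) / (length - 1)
    out = []
    for k in range(length):
        pos = k * step
        lo = int(pos)
        hi = min(lo + 1, len(series) - 1)
        frac = pos - lo
        out.append(series[lo] * (1 - frac) + series[hi] * frac)
    return out


def preprocess(breath, room_air, length=5):
    """Baseline-correct, denoise, and standardise the length of one sensor's response."""
    return resample(median_filter(subtract_baseline(breath, room_air)), length)
```

Applying `preprocess` to each sensor channel would give one equal-length time series per sensor, which is the form required by the distance-based clustering that follows.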

Preprocessing was necessary to prepare the sensor response curves for training the classification model. Most of the time, the steady-state response and summary values of the transient response, such as the minimum, mean, or maximum values or the area under the curve, are used to characterize a measurement. However, this approach discards a great deal of useful information contained in the curve. One aspect of the curve that was investigated was its overall shape; to describe the contours of the curve, we made use of the characteristic curves specific to each cluster.

Using clustering, observations are divided into groups in which the members of each group share more similarities with one another than with the members of other groups. Therefore, curves that shared similar characteristics were grouped into clusters. The similarity of a measurement to the mean curve of a cluster was used as the feature for that cluster. Feature selection was then applied, because these processes rank the features according to how effectively they help classify the data.
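To make the cluster-mean feature concrete, here is a small sketch that computes a cluster's point-wise mean curve and the distance of a measurement to it. Using the Euclidean distance as the similarity measure and the names `mean_curve` and `distance_to_mean` are assumptions for illustration only.

```python
import math


def mean_curve(curves):
    """Point-wise mean of a list of equal-length curves."""
    n = len(curves)
    return [sum(values) / n for values in zip(*curves)]


def distance_to_mean(curve, cluster_curves):
    """Euclidean distance from one curve to the mean curve of a cluster (smaller = more similar)."""
    centre = mean_curve(cluster_curves)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(curve, centre)))


# Usage: one feature value per (measurement, cluster) pair.
cluster = [[0.0, 2.0], [2.0, 4.0]]
feature = distance_to_mean([4.0, 7.0], cluster)
```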

3.5 Clustering of the Measurements

To produce curve shape taxonomies that could be used for classification, the measurement curves were first sorted into k groups of similar curves, and then the shape that the curves in each group had in common was determined. Before the analysis, it was impossible to know the exact number of groups present. Because of this, we used hierarchical clustering and cut the resulting dendrograms to generate 3 clusters. These clusters were then used to build a hierarchical taxonomy. The clustering was performed using Euclidean distances, which were selected because they are the most frequently used distance measure. The distance ‘d’ between two time series (TS1 and TS2) is calculated by comparing each pair of corresponding points (out of a total of ‘n’ points) in both series, using Eq. (1).

$$d(TS_1, TS_2) = \sqrt{\sum_{i=1}^{n} \left(TS_{1i} - TS_{2i}\right)^2} \tag{1}$$

This capability is very significant in the analysis of sensor responses as reactions can occur at varying rates, and there might be delays before a reaction starts or before it reaches its maximum. This results in each point being aligned with one or more points from the other time series. They infrequently form small clusters with a few data points that are further away from other points on the attribute axes than other points, but these data points are not regarded as outliers. The objective of this distance is to minimize the amount of dissimilarity that exists between the data points that make up each cluster, whereas the focus of the complete distance is to determine the level of separation that exists between any two clusters.
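The Euclidean distance of Eq. (1) and the cut into three clusters can be sketched in plain Python as below. This is only an illustrative sketch: the complete-linkage merge rule is an assumption (the article does not state the linkage criterion unambiguously), and stopping the merging at `k` clusters is used here as a stand-in for cutting the dendrogram at the corresponding level.

```python
import math


def euclidean(ts1, ts2):
    """Eq. (1): square root of the summed squared point-wise differences."""
    if len(ts1) != len(ts2):
        raise ValueError("time series must have equal length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(ts1, ts2)))


def agglomerative(curves, k):
    """Repeatedly merge the two closest clusters until k clusters remain.

    Cluster-to-cluster distance is the largest pairwise distance
    (complete linkage, an assumption made for this sketch).
    """
    clusters = [[i] for i in range(len(curves))]

    def linkage(c1, c2):
        return max(euclidean(curves[i], curves[j]) for i in c1 for j in c2)

    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = linkage(clusters[a], clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]


# Usage: indices of the curves grouped into 3 clusters.
curves = [[0, 0], [0, 1], [10, 10], [10, 11], [50, 50]]
groups = agglomerative(curves, 3)
```

This brute-force version recomputes all pairwise linkages on every merge, which is acceptable for a few hundred curves; a library implementation would normally be preferred for larger datasets.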

3.6 Training

In this phase, three aspects are included, i.e., classification using SVM and NB, transfer learning using the CNN-based algorithms ResNet50 and VGG19, and long short-term memory (LSTM). Of the available dataset, 80% was used for training, while the remaining 20% was used for testing and validation (10% each) of the trained models.
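A possible way to realise such a split is sketched below. It assumes an 80%/10%/10% division for training, testing, and validation, and it stratifies by class so that each subset keeps roughly the same class proportions; stratification, the fixed random seed, and the function name are assumptions of this sketch, and the resulting counts are not taken from Table 1.

```python
import random


def stratified_split(labels, train=0.8, test=0.1, seed=0):
    """Return index lists (train, test, validation), split separately within each class."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    parts = ([], [], [])
    for indices in by_class.values():
        rng.shuffle(indices)
        n_train = round(len(indices) * train)
        n_test = round(len(indices) * test)
        parts[0].extend(indices[:n_train])
        parts[1].extend(indices[n_train:n_train + n_test])
        parts[2].extend(indices[n_train + n_test:])  # remainder goes to validation
    return parts


# Usage with the class sizes reported for the dataset (186 / 207 / 201 subjects).
labels = ["healthy"] * 186 + ["covid"] * 207 + ["other"] * 201
train_idx, test_idx, val_idx = stratified_split(labels)
```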

3.7 Classification

In this phase, the trained model predicts the class to which a data vector belongs. A trained model can classify a new data vector as healthy, infected with the Covid-19 virus, or infected with a lung disorder other than the Covid-19 virus. The model also gives a probability-based score, which represents how likely it is that the data belongs to a certain class. The findings of the clustering process, which included hierarchies, group memberships, and characteristic curves, were then used as input data for the induction of a classifier, which was used to differentiate between the healthy and infected controls. Each breath’s membership in a cluster was used as a separate attribute at each level of the taxonomy built on the breaths. There are a few distinct approaches to selecting an optimal number of clusters and making the appropriate cut in the taxonomy. Some classifiers contain built-in mechanisms for selecting the features that should be used. If the method of choice does not have a built-in technique for making the cut, the cut can be made by an expert’s choice, by the distance between clusters and other measures of dissimilarity, or by feature selection algorithms. Alternatively, the cut can be made based on the number of clusters. In this research, we used feature selection because we wanted to ensure that we had the most accurate cluster sets for classification. To determine which cuts were superior, we used the Information Gain (IG) and Symmetrical Uncertainty Feature Selection (SUFS) algorithms. IG, also called Kullback-Leibler divergence [21], is a method for determining how much entropy is lost when a feature is used to subdivide data into smaller groups. It is used in many settings, even though it does not consider how features may depend on one another. It investigates the relationship between a feature and its corresponding class.
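The two feature-ranking scores named above can be computed for a discrete feature (for example, cluster membership) as in the following sketch. It uses the standard textbook definitions, IG = H(class) − H(class | feature) and SU = 2·IG / (H(feature) + H(class)); the function names and the toy data are illustrative assumptions, not the authors' implementation.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in Counter(values).values())


def information_gain(feature, labels):
    """Reduction in class entropy obtained by splitting the data on the feature."""
    total = len(labels)
    groups = {}
    for f, y in zip(feature, labels):
        groups.setdefault(f, []).append(y)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def symmetrical_uncertainty(feature, labels):
    """Information gain normalised by the two entropies, giving a value in [0, 1]."""
    denom = entropy(feature) + entropy(labels)
    return 0.0 if denom == 0 else 2 * information_gain(feature, labels) / denom


# Usage: rank two hypothetical cluster-membership features against the class labels.
labels = ["healthy", "healthy", "covid", "covid"]
informative = [0, 0, 1, 1]    # separates the classes perfectly
uninformative = [0, 1, 0, 1]  # carries no class information
scores = {
    "informative": symmetrical_uncertainty(informative, labels),
    "uninformative": symmetrical_uncertainty(uninformative, labels),
}
```

Candidate cuts of the taxonomy could then be compared by scoring the membership attributes each cut produces and keeping the cut with the highest-scoring attributes.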

After that, the selected features were used to classify the samples into their respective categories. The data was then reshaped into a two-dimensional representation so that a deep CNN [22] could be applied to learn its patterns. The common characteristics and cluster taxonomies present in the dataset were used as features, and the cycle of selecting features, training the classification model, and testing it was repeated one thousand times for each cluster. Each run was evaluated on a variety of criteria, including its overall accuracy, sensitivity, specificity, and area under the curve (AUC-value). A systematic flow diagram of the proposed framework is presented in Fig. 2.
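The article does not say how the feature vector was arranged in two dimensions. One simple possibility, shown here purely as an assumed sketch, is to zero-pad the flat vector and fill a square grid row by row; the function name and the padding value are hypothetical.

```python
import math


def to_square_grid(features, pad_value=0.0):
    """Pad a flat feature vector and reshape it row-wise into the smallest square grid that fits."""
    if not features:
        raise ValueError("feature vector must not be empty")
    side = math.isqrt(len(features) - 1) + 1  # smallest side with side * side >= len(features)
    padded = list(features) + [pad_value] * (side * side - len(features))
    return [padded[r * side:(r + 1) * side] for r in range(side)]


# Usage: five features become a 3 x 3 grid with four padded cells.
grid = to_square_grid([1, 2, 3, 4, 5])
```

A pretrained image model would additionally need the grid resized to its expected input size and replicated across colour channels, which is outside the scope of this sketch.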


Figure 2: A systematic flow diagram of the proposed framework

4  Results and Discussions

When each type of sensor was analyzed on its own, the results differed considerably from one another. Therefore, multiple sensors, each sensing a specific component of exhaled breath, were integrated into an array, which enabled the classifiers to learn the contributing patterns of healthy or diseased controls.

Features extracted from the preprocessed and clustered dataset of measured VOCs, stored in a CSV file, were input to SVM and NB classifiers to train a model. The same features, after arrangement in two-dimensional space, were also input to the pretrained CNN-based models, i.e., ResNet50 and VGG19, and to LSTM for classification. Most researchers working in the domain of classification adopt accuracy, specificity, sensitivity, and AUC-Value as standard measures to evaluate their outcomes. Therefore, the results obtained during the experiments were evaluated using these standard measures, which are presented in Table 2. For the sake of simplicity, the results are compared and discussed in two groups, i.e., a comparison of machine learning models and a comparison with other recently published state-of-the-art studies.
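For reference, accuracy, sensitivity, and specificity follow directly from confusion-matrix counts: sensitivity = TP / (TP + FN), specificity = TN / (TN + FP), and accuracy = (TP + TN) / (TP + TN + FP + FN). The sketch below computes them for one class treated as positive against the rest; the one-vs-rest treatment of the three classes and the toy labels are assumptions for illustration and do not reproduce the values in Table 2.

```python
def confusion_counts(y_true, y_pred, positive):
    """Count TP, TN, FP, FN when `positive` is treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return tp, tn, fp, fn


def metrics(tp, tn, fp, fn):
    """Accuracy, sensitivity, and specificity; assumes both classes occur in the true labels."""
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Usage on a toy prediction set (not the study's data).
y_true = ["covid", "covid", "healthy", "healthy", "other"]
y_pred = ["covid", "healthy", "healthy", "covid", "other"]
covid_scores = metrics(*confusion_counts(y_true, y_pred, positive="covid"))
```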


4.1 Machine Learning Model-Wise Comparison of the Results

As per the results shown in Table 2, the SVM classifier obtained reasonable measures, with an accuracy of 69.59%, a specificity of 58.53%, and an AUC-Value of 0.87, but its sensitivity of 58.53% was low. Similarly, the results obtained through NB were also acceptable: 73.26%, 57.56%, 82.31%, and 0.89 in terms of accuracy, specificity, sensitivity, and AUC-Value, respectively. Classification through the trained CNN-based models, i.e., ResNet50 and VGG19, gave better results than SVM and NB. ResNet50 achieved 79.36% accuracy, 64.28% sensitivity, 88.75% specificity, and an AUC-Value of 0.93, while VGG19 achieved 83.95% accuracy, 78.25% sensitivity, 89.85% specificity, and an AUC-Value of 0.94. These results establish that the models trained with SVM, NB, ResNet50, and VGG19 could not achieve high sensitivity. According to the experimental results, LSTM was the most accurate classification algorithm, with an overall accuracy of 93.59%, a sensitivity of 89.59%, a specificity of 94.87%, and an AUC-Value of 0.96. It is important to highlight that LSTM achieved a much better sensitivity than the other classifiers.

4.2 Comparison of the Results with State-of-the-art Studies

Table 3 shows the results obtained by recently published state-of-the-art studies and by the proposed model. The study [10] concerns the diagnosis of gastric cancer, in which the authors examined the breath of 54 people with gastric cancer and 85 healthy controls. The model was trained using NB, SVM, and RF classifiers and achieved a maximum accuracy of 77.8%. The dataset used for their research and experiments is small, which could not ensure the robustness of their proposed model. Moreover, the reported accuracy is not very high. In another study [14], Covid-19 infection was diagnosed with KNN and CNN machine learning models using acoustic analysis of breath sounds. The dataset used by the authors consists of 107 Covid-19 positive patients, 1107 healthy controls, and 48 patients with other respiratory diseases. Their reported model based on the CNN classifier achieved an accuracy of over 97%, while KNN with optimized features achieved an accuracy of over 85%. Although the accuracy reported for the CNN model is appreciable, the number of samples for Covid-19 and other respiratory diseases was low. Similarly, another recently published study [15] diagnosed gastric cancer in its early stages from the analysis of breath using a sparse autoencoder neural network. Experiments were performed on a locally developed dataset obtained from Shanghai Tongren Hospital, China, consisting of 200 volunteers (89 with advanced gastric cancer (AGC), 56 healthy individuals, and 55 with early gastric cancer (EGC)). Their proposed model achieved an accuracy of 98.7%. However, the reported study has only a limited volume of data.


The proposed study also analyzed the exhaled breath of the subjects. The dataset used for research and experiments consists of 594 subjects, of which 186 were healthy controls, 207 were positive for Covid-19 infection, and 201 were suffering from lung disorders other than Covid-19 infection. Each of the machine learning approaches, i.e., SVM, NB, ResNet50, VGG19, and LSTM, was used to train a model. LSTM outperformed the others, and its results in terms of the standard evaluation measures, i.e., accuracy, sensitivity, specificity, and AUC-Value, were 93.59%, 89.59%, 94.87%, and 0.96, respectively. All the evaluation measures used to assess the model trained through LSTM indicate its success; in particular, the ability of the reported model to correctly identify a patient with any disorder is up to the mark. Therefore, this model is suitable for deployment in a clinical setup as a real-time environment.
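To put the reported accuracy in context, the short sketch below works out the class shares of the 594-subject dataset described above and the accuracy that a trivial classifier would reach by always predicting the largest class. The subject counts are taken from the text; the variable names and the majority-class baseline itself are illustrative assumptions, not part of the reported methodology.

```python
# Subject counts as stated in the text; dictionary keys are hypothetical labels.
counts = {"healthy": 186, "covid19": 207, "other_lung": 201}

total = sum(counts.values())
shares = {name: n / total for name, n in counts.items()}

# Accuracy of a trivial classifier that always predicts the largest class.
majority_class = max(counts, key=counts.get)
majority_baseline = counts[majority_class] / total

print(f"total subjects: {total}")
for name, share in shares.items():
    print(f"{name}: {share:.1%}")
print(f"majority-class baseline ({majority_class}): {majority_baseline:.1%}")
```

Because the three classes are nearly balanced, this baseline is only about 35%, so accuracy is a reasonably informative measure here alongside sensitivity and specificity.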

5  Conclusion and Future Work

In this study, we have investigated how well recently developed biomedical sensors can diagnose patients suffering from different lung disorders. In general, the proposed approach can differentiate between the breath of people with Covid-19 infection, people with lung disorders other than Covid-19, and healthy people. Because the reported model has high specificity (94.87%), it is well suited to screening. The suggested apparatus makes it feasible to analyze breath anywhere, giving it more flexibility than approaches that are performed in a laboratory. It makes use of sensors, which enables it to maintain a high level of precision. When paired with the described data analysis techniques, it may provide a rapid and accurate mechanism for determining whether someone has a lung disorder based on their breath alone.

In the future, the approach has the potential to detect various types of lung, stomach, and chest disorders, to monitor the development of complications, and to screen the whole population. Further research is needed to enhance the efficacy of the classification system and to put the classification of distinct disorder stages into practice. The proposed model, as an integrated product, is intended for deployment in the Department of Radiology and Diagnostic Images, BVHB, Pakistan, for screening of chest disorders. We also aim to design a printed circuit board that integrates all components into one unit, so that the device can be made available on the market as medical equipment for household use. The main limitation of the proposed model is that it currently distinguishes only between healthy subjects, Covid-19 infection, and lung disorders other than Covid-19; radiological and surgical procedures may still be required to identify the specific lung disorder for proper treatment.

Acknowledgement: The researchers would like to thank the staff of the Department of Radiology and Diagnostics Images, Bahawalpur, Pakistan, for their overall cooperation in the provision of radiological and biological information and expert opinion.

Funding Statement: The authors received no specific funding for this study.

Conflicts of Interest: The authors declare that they have no conflicts of interest to report regarding the present study.

References


  1. M. Abouhawwash and A. M. Alessio, “Multi-objective evolutionary algorithm for PET image reconstruction: Concept,” IEEE Transactions on Medical Imaging, vol. 40, no. 8, pp. 2142–2151, 202
  2. P. Sukul, J. K. Schubert, K. Zanaty, P. Trefz, A. Sinha et al., “Exhaled breath compositions under varying respiratory rhythms reflects ventilatory variations: Translating breathomics towards respiratory medicine,” Scientific Reports, vol. 10, no. 1, pp. 1–16, 2020.
  3. A. R. Javed, M. U. Sarwar, M. O. Beg, M. Asim, T. Baker et al., “A collaborative healthcare framework for shared healthcare plan with ambient intelligence,” Human-Centric Computing and Information Sciences, vol. 10, no. 1, pp. 1–21, 2020.
  4. T. Richter, B. Fishbain, A. Markus, G. Richter-Levin and H. Okon-Singer, “Using machine learning-based analysis for behavioral differentiation between anxiety and depression,” Scientific Reports, vol. 10, no. 1, pp. 1–12, 2020.
  5. M. Rizwan, A. Shabbir, A. R. Javed, G. Srivastava, T. R. Gadekallu et al., “Risk monitoring strategy for confidentiality of healthcare information,” Computers and Electrical Engineering, vol. 100, no. 1, pp. 107833, 2022.
  6. C. Chojnacki, P. Konrad, A. Błońska, J. Chojnacki and M. Mędrek-Socha, “Usefulness of the hydrogen breath test in patients with functional dyspepsia,” Gastroenterology Review/Przegląd Gastroenterologiczny, vol. 15, no. 4, pp. 338–342, 2020.
  7. O. Gould, N. Ratcliffe, E. Król and B. de Lacy Costello, “Breath analysis for detection of viral infection, the current position of the field,” Journal of Breath Research, vol. 14, no. 4, pp. 041001, 2020.
  8. G. Gilanie, U. I. Bajwa, M. M. Waraich, M. Asghar, R. Kousar et al., “Coronavirus (COVID-19) detection from chest radiology images using convolutional neural networks,” Biomedical Signal Processing and Control, vol. 66, no. 3, pp. 102490, 2021.
  9. S. Suganthi, A. Vinayagam, V. Veerasamy, A. Deepa, M. Abouhawwash et al., “Detection and classification of multiple power quality disturbances in microgrid network using probabilistic based intelligent classifier,” Sustainable Energy Technologies and Assessments, vol. 47, no. 1, pp. 101470, 2021.
  10. I. Polaka, M. P. Bhandari, L. Mezmale, L. Anarkulova, V. Veliks et al., “Modular point-of-care breath analyzer and shape taxonomy-based machine learning for gastric cancer detection,” Diagnostics, vol. 12, no. 2, pp. 491, 2022.
  11. H. Zhang, J. Xiao, Y. Wang, L. Zhang, G. Zhao et al., “A portable acetone detector based on SmFeO3 can pre-diagnose diabetes through breath analysis,” Journal of Alloys and Compounds, vol. 922, pp. 166160, 2022.
  12. R. K. Patnaik, Y.-C. Lin, A. Agarwal, M.-C. Ho and J. A. Yeh, “A pilot study for the prediction of liver function related scores using breath biomarkers and machine learning,” Scientific Reports, vol. 12, no. 1, pp. 1–14, 2022.
  13. X. Wu, H. Wang, J. Wang, D. Wang, L. Shi et al., “VOCs gas sensor based on MOFs derived porous Au@Cr2O3-In2O3 nanorods for breath analysis,” Colloids and Surfaces A: Physicochemical and Engineering Aspects, vol. 632, pp. 127752, 2022.
  14. Z. Chen, M. Li, R. Wang, W. Sun, J. Liu et al., “Diagnosis of COVID-19 via acoustic analysis and artificial intelligence by monitoring breath sounds on smartphones,” Journal of Biomedical Informatics, vol. 130, no. 6, pp. 104078, 2022.
  15. M. A. Aslam, C. Xue, Y. Chen, A. Zhang, M. Liu et al., “Breath analysis based early gastric cancer classification from deep stacked sparse autoencoder neural network,” Scientific Reports, vol. 11, no. 1, pp. 1–12, 2021.
  16. B. Shan, Y. Y. Broza, W. Li, Y. Wang, S. Wu et al., “Multiplexed nanomaterial-based sensor array for detection of COVID-19 in exhaled breath,” ACS Nano, vol. 14, no. 9, pp. 12125–12132, 2020.
  17. V. Chapala and P. Bojja, “IoT based lung cancer detection using machine learning and cuckoo search optimization,” International Journal of Pervasive Computing and Communications, vol. 17, no. 5, pp. 549–562, 2021.
  18. S. Mahajan, A. Raina, M. Abouhawwash, X.-Z. Gao and A. K. Pandit, “Covid-19 detection from chest X-ray images using advanced deep learning techniques,” Computers, Materials and Continua, vol. 70, no. 1, pp. 1541–1556, 2021.
  19. B. Aslam, A. R. Javed, C. Chakraborty, J. Nebhen, S. Raqib et al., “Blockchain and ANFIS empowered IoMT application for privacy preserved contact tracing in COVID-19 pandemic,” Personal and Ubiquitous Computing, vol. 65, no. 1, pp. 1–17, 2021.
  20. D. Petrova-Antonova and R. Tancheva, Data cleaning: A case study with OpenRefine and Trifacta Wrangler. Faro, Portugal: Springer, pp. 32–40, 20
  21. T. Van Erven and P. Harremos, “Rényi divergence and Kullback-Leibler divergence,” IEEE Transactions on Information Theory, vol. 60, no. 7, pp. 3797–3820, 2014.
  22. G. Gilanie, U. I. Bajwa, M. M. Waraich and M. W. Anwar, “Risk-free WHO grading of astrocytoma using convolutional neural networks from MRI images,” Multimedia Tools and Applications, vol. 80, no. 3, pp. 4295–4306, 2021.

Cite This Article

M. Ghani and G. Gilanie, "The IoMT-based risk-free approach to lung disorders detection from exhaled breath examination," Intelligent Automation & Soft Computing, vol. 36, no. 3, pp. 2835–2847, 2023.

This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.