Computers, Materials & Continua
Traditional Chinese Medicine Automated Diagnosis Based on Knowledge Graph Reasoning
1School of Computer & Communication Engineering, University of Science & Technology Beijing, Beijing, 100083, China
2Beijing Key Laboratory of Knowledge Engineering for Materials Science, Beijing, 100083, China
3Inspur Electronic Information Industry Co., Ltd. & State Key Laboratory of High-End Server & Storage Technology, Jinan, 250101, China
4Surgical Simulation Research Lab, Department of Surgery, University of Alberta, Edmonton, T6G 2E1, Alberta, Canada
*Corresponding Author: Yonghong Xie. Email: email@example.com
Received: 26 January 2021; Accepted: 01 March 2021
Abstract: Syndrome differentiation is the core diagnosis method of Traditional Chinese Medicine (TCM). We propose a method that simulates syndrome differentiation through deductive reasoning on a knowledge graph to achieve automated diagnosis in TCM. We analyze the reasoning path patterns from symptoms to syndromes on the knowledge graph. There are two kinds of path patterns: one-hop and two-hop. The one-hop path pattern maps a symptom to syndromes directly. The two-hop path pattern maps a symptom to syndromes through the nature of disease, etiology, and pathomechanism to support the diagnostic reasoning. Because different knowledge paths support the reasoning with different strengths, we design a dynamic weight mechanism. We use Naïve Bayes and TF-IDF to implement the reasoning method and the weighted score calculation. The proposed method infers syndromes by calculating their probabilities from the weighted scores of the paths in the knowledge graph that match the reasoning path patterns. We evaluate the method with clinical records and clinical practice in hospitals. The preliminary results suggest that the method achieves high performance and can help TCM doctors make better diagnosis decisions in practice. Meanwhile, the method is robust and explainable under the guidance of the knowledge graph. It could help TCM physicians, especially primary physicians in rural areas, and provide clinical decision support in clinical practice.
Keywords: Traditional Chinese medicine; automated diagnosis; knowledge graph; Naïve Bayes; syndrome differentiation
As a complementary field of medicine outside the modern medicine system, traditional Chinese medicine (TCM) has played a significant role in the healthcare of China for thousands of years [1–4]. According to the China Public Health Statistical Yearbook, over 1 billion TCM treatments are carried out in China each year. Syndrome differentiation is a core diagnosis method of TCM. It analyzes the specific pattern of symptoms, etiology, nature, and location of a disease and guides treatment strategies. In TCM, a syndrome is a concept that abstracts a set of symptoms and determines the phase of a disease. Mastering TCM syndrome differentiation is an intricate and time-consuming process. Because syndrome differentiation is very complicated to conduct, it can be difficult to maintain stable efficacy when treating a given disease. Moreover, the number of TCM doctors is too small to meet the huge demand for TCM treatments.
In recent years, automated diagnosis has received much attention. Automated diagnosis systems utilizing artificial intelligence aim to diagnose and make decisions based on a patient’s condition. Most reported research has applied artificial intelligence in modern medicine [8–10]. Automated diagnosis in TCM is more challenging. Some researchers have begun to study the application of information technology in TCM diagnosis [11,12]. Wang et al. used raw free text as the original input and employed naïve Bayes and support vector machine classifiers for automated diagnosis in TCM. Xu et al. designed an artificial neural network as a classifier for syndrome differentiation and achieved good performance in diagnosing chronic obstructive pulmonary disease. Liu et al. focused on lung cancer syndrome differentiation. They treated syndrome differentiation as a multilabel text classification task and utilized deep learning to model the text features of clinical records for classification. They also used a fusion model approach to obtain better performance than a single model. Meanwhile, Zhang et al. developed a TCM assistive diagnostic system based on artificial intelligence. A long short-term memory (LSTM) network with a conditional random field (CRF) framework extracted features from raw medical record text. Then a convolutional neural network (CNN) was used to predict the disease type. Although these TCM automated diagnosis systems have shown positive preliminary results, they have some limitations. The existing methods require a large volume of annotated clinical records for training. Furthermore, these methods lack interpretability for the diagnosis process. In practice, clinicians need an automated diagnosis method that does not rely on a large amount of annotated data and is explainable.
Knowledge graphs may address these limitations. Knowledge graphs describe concepts, entities, events, and their relationships in the real world. The knowledge graph is a foundational knowledge resource used to implement artificial intelligence systems [17,18]. In TCM, the knowledge graph could organize fragmented theoretical knowledge. In this way, we could reinforce the connectivity of TCM knowledge and support the automated diagnosis method. Xie et al. proposed a personalized diagnostic pattern mining method based on the TCM knowledge graph with a specific doctor’s clinical records. Meanwhile, Yu et al. and Zheng et al. described the construction of a TCM knowledge graph using databases and documents. Zhang et al. introduced a TCM knowledge graph based on ontology. Lastly, Xie et al. proposed a TCM auxiliary diagnosis method combining a knowledge graph and reinforcement learning.
In this study, we propose an artificial intelligence method for automated diagnosis in TCM. This method simulates syndrome differentiation through deductive reasoning on a knowledge graph and infers syndromes from the patient’s symptoms. We analyze the reasoning path patterns from symptoms to syndromes on the knowledge graph. According to these patterns, we model the inference process from a set of symptoms to syndromes with naïve Bayes. The proposed method infers syndromes by calculating their probabilities from the weighted scores of the paths in the knowledge graph. We evaluate the performance of our method with real-world record sets and demonstrate its effectiveness and practicality.
2.1 Task Definition
For a given symptom set $S = \{s_1, s_2, \ldots, s_n\}$, where $s_i$ is a symptom, and a given syndrome set $Z = \{z_1, z_2, \ldots, z_m\}$, which is pre-defined for a specific disease and wherein $z_j$ is a syndrome, we infer the target syndrome utilizing the TCM knowledge graph of Zhang et al. For each syndrome, $P(z_j \mid S)$ represents the probability of syndrome $z_j$ given symptom set $S$. The inference process simulates syndrome differentiation and treats knowledge paths (or reasoning paths) in the knowledge graph as evidence for the inference. These paths need to be consistent with how diseases are understood in TCM and reflect the diagnostic decision-making process of physicians. We limit the length of the pattern to 2 because we believe patterns of this length could provide evidence for diagnosis. Therefore, we define the meta-paths used as reasoning path patterns as in Tab. 1.
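To make the notion of one-hop and two-hop reasoning paths concrete, the following is a minimal sketch of path enumeration on a toy graph. The triples, relation names (`indicates`, `has_pathomechanism`, `has_nature`, `supports`), and helper functions are hypothetical and are not taken from the knowledge graph or Tab. 1 of this paper; only the limit of two hops follows the text.

```python
from collections import defaultdict

# Hypothetical toy triples (head, relation, tail). Entity and relation names
# are illustrative assumptions, not the paper's actual knowledge graph.
TRIPLES = [
    ("chest pain", "indicates", "blood stasis syndrome"),
    ("chest pain", "has_pathomechanism", "qi stagnation"),
    ("qi stagnation", "supports", "blood stasis syndrome"),
    ("qi stagnation", "supports", "qi deficiency syndrome"),
    ("fatigue", "has_nature", "deficiency"),
    ("deficiency", "supports", "qi deficiency syndrome"),
]


def build_adjacency(triples):
    """Index outgoing edges by head entity."""
    adj = defaultdict(list)
    for head, rel, tail in triples:
        adj[head].append((rel, tail))
    return adj


def find_paths(adj, symptom, syndrome, max_hops=2):
    """Return every path of at most max_hops edges from symptom to syndrome."""
    paths = []
    frontier = [(symptom, [])]
    for _ in range(max_hops):
        next_frontier = []
        for node, path in frontier:
            for rel, tail in adj.get(node, []):
                new_path = path + [(node, rel, tail)]
                if tail == syndrome:
                    paths.append(new_path)  # reached the target syndrome
                else:
                    next_frontier.append((tail, new_path))
        frontier = next_frontier
    return paths


if __name__ == "__main__":
    adj = build_adjacency(TRIPLES)
    for p in find_paths(adj, "chest pain", "blood stasis syndrome"):
        print(len(p), "hop:", p)
```

In this toy graph, "chest pain" reaches "blood stasis syndrome" both directly and through the intermediate node "qi stagnation", which is the kind of evidence the reasoning path patterns are meant to capture.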
2.2 Naïve Bayes Automated Diagnosis on Knowledge Graph
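In this section, we describe the automated diagnosis method. According to the definition of the task, the core question is the calculation of the probability $P(z_j \mid S)$ of syndrome $z_j$ given the symptom set $S$. Based on the Bayes formula, we can get this relation:
$$P(z_j \mid S) = \frac{P(S \mid z_j)\,P(z_j)}{P(S)} \tag{1}$$
where $P(S \mid z_j)$ represents the probability that the symptom set $S$ occurs given the syndrome $z_j$, and $P(z_j)$ is the prior probability of syndrome $z_j$ for the specific disease. $P(z_j)$ can be defined by a TCM expert or calculated according to past medical records.
We assume that the symptoms $s_1, \ldots, s_n$ in symptom set $S$ are conditionally independent given the syndrome. Therefore, we can obtain
$$P(S \mid z_j) = \prod_{i=1}^{n} P(s_i \mid z_j) \tag{2}$$
Combining this with Eq. (1), we can obtain
$$P(z_j \mid S) = \frac{P(z_j)\prod_{i=1}^{n} P(s_i \mid z_j)}{P(S)} \tag{3}$$
Because $P(S)$ is the same for every syndrome, it does not affect the ranking and can be dropped. We define the inference score as in Eq. (4). In practice, we use the logarithm to prevent the product of many small probabilities from becoming too small.
$$\mathrm{Score}(z_j \mid S) = \log P(z_j) + \sum_{i=1}^{n} \log P(s_i \mid z_j) \tag{4}$$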
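The log-domain naïve Bayes scoring described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the toy probabilities, and the small constant `eps` used to avoid taking the logarithm of zero are all assumptions.

```python
import math


def inference_score(prior, likelihoods, eps=1e-9):
    """Log prior plus the sum of log likelihoods.

    eps is an assumed floor that guards against log(0); the paper does not
    specify how zero probabilities are handled.
    """
    return math.log(max(prior, eps)) + sum(math.log(max(p, eps)) for p in likelihoods)


def rank_syndromes(symptoms, priors, cond_prob):
    """Rank syndromes by inference score, best first.

    priors:    dict syndrome -> prior probability
    cond_prob: dict (symptom, syndrome) -> probability of symptom given syndrome
    """
    scores = {}
    for syndrome, prior in priors.items():
        likelihoods = [cond_prob.get((s, syndrome), 0.0) for s in symptoms]
        scores[syndrome] = inference_score(prior, likelihoods)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical numbers for illustration only.
    priors = {"A": 0.5, "B": 0.5}
    cond_prob = {("s1", "A"): 0.8, ("s2", "A"): 0.5,
                 ("s1", "B"): 0.2, ("s2", "B"): 0.5}
    print(rank_syndromes(["s1", "s2"], priors, cond_prob))
```

Working in the log domain turns the product over symptoms into a sum, so the ranking is unchanged while numerical underflow is avoided for long symptom lists.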
Next, we need to calculate $P(s_i \mid z_j)$, the probability of symptom $s_i$ given syndrome $z_j$. The knowledge path on the knowledge graph is the main reasoning principle. There are two kinds of path patterns: one-hop and two-hop. We define $f_1(s_i, z_j)$ as the score function of the one-hop path and $f_2(s_i, z_j)$ as the score function of the two-hop path.
For the one-hop path, we search all knowledge paths from each symptom to every syndrome and calculate the one-hop score $f_1(s_i, z_j)$ from the number of one-hop knowledge paths from symptom $s_i$ to syndrome $z_j$.
Two-hop paths represent the support from different perspectives in TCM, including nature of disease, etiology, and pathomechanism. However, the support strengths of the intermediate knowledge nodes for different syndromes are unequal. We use TF-IDF to regularize the path weight. As with the one-hop path, we first need to search all of the knowledge paths. Then, we calculate the path weight based on the frequency of each intermediate knowledge node for each different syndrome.
The two-hop score $f_2(s_i, z_j)$ is calculated from the number of two-hop knowledge paths from symptom $s_i$ to syndrome $z_j$ and the number of all the intermediate knowledge node paths. In this way, some intermediate knowledge nodes will be emphasized if they are particularly strongly related to a specific syndrome.
$P(s_i \mid z_j)$ is determined by adding $f_1(s_i, z_j)$ and $f_2(s_i, z_j)$. Here, we use two hyper-parameters, $\alpha$ and $\beta$, to balance the two different scores:
$$P(s_i \mid z_j) = \alpha\, f_1(s_i, z_j) + \beta\, f_2(s_i, z_j)$$
We weight the one-hop score more heavily than the two-hop score ($\alpha > \beta$) since we think a short path in the knowledge graph would provide better support than a long path. An example of the inference is shown in Fig. 1.
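The way one-hop and two-hop evidence could be combined can be sketched as below. The exact score formulas are not reproduced here, so everything concrete in this block is an assumption: the normalized path count used for the one-hop score, the smoothed TF-IDF-style weight used for the two-hop score, the weights `alpha=0.7` and `beta=0.3`, and the toy graph itself. Only the general scheme (path counts, TF-IDF weighting of intermediate nodes, and a weighted sum that favors short paths) follows the text.

```python
import math
from collections import Counter

# Hypothetical toy graph; node names are illustrative only.
DIRECT = {"chest pain": ["blood stasis"]}                   # symptom -> syndromes
SYMPTOM_TO_MID = {"chest pain": ["qi stagnation", "cold"]}  # symptom -> intermediate nodes
MID_TO_SYNDROME = {                                         # intermediate node -> syndromes
    "qi stagnation": ["blood stasis", "qi deficiency"],
    "cold": ["blood stasis"],
}
SYNDROMES = ["blood stasis", "qi deficiency"]


def one_hop_score(symptom, syndrome):
    """Share of the symptom's one-hop paths that end at this syndrome (assumed form)."""
    counts = Counter(DIRECT.get(symptom, []))
    total = sum(counts.values())
    return counts[syndrome] / total if total else 0.0


def two_hop_score(symptom, syndrome):
    """Sum of TF-IDF-style weights over the symptom's two-hop paths (assumed form)."""
    mids_for_syndrome = [m for m, syns in MID_TO_SYNDROME.items() if syndrome in syns]
    if not mids_for_syndrome:
        return 0.0
    score = 0.0
    for mid in SYMPTOM_TO_MID.get(symptom, []):
        linked = set(MID_TO_SYNDROME.get(mid, []))
        if syndrome not in linked:
            continue
        # TF: weight of this intermediate node among all nodes linked to the syndrome.
        tf = 1.0 / len(mids_for_syndrome)
        # IDF (smoothed): nodes linked to fewer syndromes are more specific.
        idf = math.log((1 + len(SYNDROMES)) / (1 + len(linked))) + 1.0
        score += tf * idf
    return score


def path_score(symptom, syndrome, alpha=0.7, beta=0.3):
    """Weighted sum of both scores; alpha > beta favors short paths (assumed values)."""
    return alpha * one_hop_score(symptom, syndrome) + beta * two_hop_score(symptom, syndrome)


if __name__ == "__main__":
    for z in SYNDROMES:
        print(z, round(path_score("chest pain", z), 4))
```

In this sketch the intermediate node "cold" points to a single syndrome and therefore receives a larger IDF than "qi stagnation", which points to two, mirroring the idea that nodes strongly tied to one syndrome are emphasized.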
3.1 Data Description
We used two datasets to test our method. The first dataset was a clinical record set with data collected from the book, Chinese Medical Records of All Famous Doctors. We selected 519 clinical records related to nine different diseases, including coronary heart disease, diabetes, and some gynecological diseases. The medical records were manually processed by TCM experts. The syndromes of each disease were also defined by TCM experts. Tab. 2 lists the syndromes of each disease. The Chinese–English translations of the syndromes are presented in Appendix A.
We also used a real-world dataset. We deployed our method in nine hospitals to test its efficacy in practice. The hospitals included Guanganmen Traditional Chinese Medicine Hospital and Dongzhimen Traditional Chinese Medicine Hospital in Beijing, China, among others. Doctors at these hospitals used our method to diagnose coronary heart disease and diabetes. Finally, the doctors evaluated the results of the automated diagnosis based on their professional expertise. The distributions of gender, age, and syndromes are shown in Figs. 2–4, respectively.
3.2 Experiment Results
We used the metrics Hit@N and MeanRank to evaluate the performance of the proposed method on the clinical record dataset. First, we ranked the list of predicted syndromes in descending order of inference score. Hit@N measures the proportion of records for which the correct syndrome appears in the top N places of the list. Here, we set N to 1, 3, and 5. MeanRank measures the average position of the correct syndrome in the ranked list. For candidate set ranking, the aim is to rank the correct syndrome at the top position, so a higher Hit@N and a lower MeanRank indicate better performance.
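The two metrics can be computed as in the following sketch. The function names and toy data are assumptions for illustration, and the code assumes that the correct syndrome always appears somewhere in each ranked candidate list.

```python
def hit_at_n(ranked_lists, gold, n):
    """Fraction of records whose correct syndrome is within the top n predictions."""
    hits = sum(1 for ranked, g in zip(ranked_lists, gold) if g in ranked[:n])
    return hits / len(gold)


def mean_rank(ranked_lists, gold):
    """Average 1-based position of the correct syndrome in each ranked list.

    Assumes the correct syndrome is present in every list.
    """
    ranks = [ranked.index(g) + 1 for ranked, g in zip(ranked_lists, gold)]
    return sum(ranks) / len(ranks)


if __name__ == "__main__":
    # Hypothetical predictions for three records; "a" is always the correct syndrome.
    ranked_lists = [["a", "b", "c"], ["b", "a", "c"], ["c", "b", "a"]]
    gold = ["a", "a", "a"]
    print(hit_at_n(ranked_lists, gold, 1))
    print(hit_at_n(ranked_lists, gold, 3))
    print(mean_rank(ranked_lists, gold))
```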
The performance metrics obtained for experiments on the clinical record dataset are shown in Tab. 3. The count column represents the number of clinical records pertaining to each disease. Our method achieves Hit@1, Hit@3, and Hit@5 of 0.708, 0.958, and 0.980, respectively, and a MeanRank of 1.438. Although coronary heart disease and diabetes have worse Hit@1 scores than other diseases, the Hit@5 is greater than 0.90 for both. As indicated by the results, the proposed method achieves high diagnostic accuracy across the diseases tested.
In the real-world experiment, the doctors treated a diagnosis result as correct if one of the top three syndromes was consistent with their own diagnosis. Otherwise, the result was considered wrong. This experiment covered 934 cases of coronary heart disease and 314 cases of diabetes. Tab. 4 displays the results of this evaluation. The ratio of correct diagnoses is high, which suggests that the proposed method performs well in clinical practice.
Unlike most previous research that treats automated diagnosis as a supervised task, our method does not rely on a large annotated dataset for training. We utilize the TCM knowledge graph and develop an unsupervised automated diagnosis method to achieve syndrome differentiation. Compared with other reported work, our method is robust and explainable under the guidance of the knowledge graph. Our method’s performance indicates its effectiveness in clinical practice. Moreover, our method could readily be generalized to other diseases.
However, the proposed method has limitations. First, it requires a high-quality knowledge graph, so a stronger knowledge graph could improve its performance. Second, the method still relies on the prior knowledge of experts to a certain degree, so we will consider introducing supervised learning. Lastly, additional clinical data are needed in future work.
Automated diagnosis is an essential task. For TCM, syndrome differentiation is an important part of the diagnostic process. We propose an automated diagnosis method that simulates syndrome differentiation through deductive reasoning on a knowledge graph. We evaluate the method using a clinical record dataset and assess its application to clinical practice. The preliminary results suggest that the method can support diagnosis. It could help TCM physicians, especially primary physicians in rural areas, make clinical decisions. This could help alleviate the imbalance of medical resources in China and bring social and economic benefits.
Acknowledgement: We thank the anonymous reviewers for their helpful comments. Thanks are also due for the TCM knowledge support from Yingjie Shi and for the data processing by Hu Tao and Jia Li. We thank LetPub (https://www.letpub.com) for its linguistic assistance during the preparation of this manuscript.
Funding Statement: This work is supported by the National Key Research and Development Program of China under Grant 2017YFB1002304 and the China Scholarship Council under Grant 201906465021.
Conflicts of Interest: The authors declare that they have no conflicts of interest to report regarding the present study.
Appendix A. The Chinese–English translations of syndromes
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.