In recent years, medical data mining has developed rapidly. However, the structuring of medical data is still at an early stage, and much medical information remains in the form of natural-language text. These texts, including a large body of literature and medical records, crystallize the knowledge accumulated by people in different regions and eras, and this is especially true of traditional Chinese medicine (TCM).
Because an individual's capacity to learn is limited, scholars have tried to use natural language processing (NLP) to summarize TCM knowledge: distilling it, extracting useful diagnostic and treatment information, and ultimately building a knowledge ontology or knowledge network that provides standards and convenience for subsequent text mining tasks. NLP is a subfield of artificial intelligence whose core aim is to enable computers to understand and generate human natural language. Its tasks include information extraction, machine translation, sentiment analysis, and automatic summarization, and its techniques include named entity recognition, word sense disambiguation, coreference resolution, part-of-speech (POS) tagging, and structural analysis. The medical histories, diagnoses, treatments, drugs, and other terms contained in the large volume of medical texts make NLP applicable, and using NLP to mine the knowledge hidden in these texts is of great significance for the development of medicine; related studies already exist in medicine and biology. Meanwhile, several medical ontology databases were established in the 1980s and 1990s, such as an integrated medical information system and a systematized clinical medicine terminology, which enriched the data and tools available for mining medical knowledge with NLP.
2.1 Association Rule Mining
2.1.1 Overview
Association rule mining is a common data mining method. Its core is to analyze rules of the form "the occurrence of some events leads to other events", including simple, time-series, quantitative, and causal associations, among others. The core algorithms use support and confidence as the criteria for deciding whether an association exists. Well-known algorithms include Apriori and FP-growth, which improves on it by avoiding explicit candidate generation. Both compute frequent itemsets, and a rule derived from a frequent itemset indicates that its antecedent and consequent tend to occur together.
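To make support, confidence, and frequent itemsets concrete, here is a minimal Apriori sketch in Python. It is not taken from any of the cited studies: the function name `apriori`, the herb labels, and the five toy prescriptions are hypothetical, and support is taken as the fraction of prescriptions that contain an itemset.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for every itemset whose support
    (fraction of transactions containing it) is at least min_support."""
    tx = [frozenset(t) for t in transactions]
    n = len(tx)

    def support(itemset):
        return sum(1 for t in tx if itemset <= t) / n

    # Level 1: frequent single items.
    items = {item for t in tx for item in t}
    current = {frozenset([i]) for i in items if support(frozenset([i])) >= min_support}
    frequent = {s: support(s) for s in current}

    k = 2
    while current:
        # Join step: combine frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step (Apriori property): all (k-1)-subsets must be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        current = {c for c in candidates if support(c) >= min_support}
        frequent.update({c: support(c) for c in current})
        k += 1
    return frequent


# Hypothetical toy data: each set is one prescription's herbs.
prescriptions = [
    {"herb_A", "herb_B", "herb_C"},
    {"herb_A", "herb_B"},
    {"herb_A", "herb_C", "herb_D"},
    {"herb_B", "herb_D"},
    {"herb_A", "herb_B", "herb_D"},
]

result = apriori(prescriptions, 0.4)
for itemset, sup in sorted(result.items(), key=lambda kv: (-len(kv[0]), -kv[1])):
    print(sorted(itemset), round(sup, 2))
```

With a minimum support of 0.4, the pair {herb_A, herb_B} is frequent (support 0.6), while {herb_B, herb_C} appears in only one of five prescriptions and is dropped. The prune step relies on the Apriori property that every subset of a frequent itemset must itself be frequent.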
2.1.2 Applications in TCM
In TCM, association rules are mainly used for prescription mining. For example, Nintinger et al. built a database of TCM prescriptions covering about 100,000 prescriptions from the past 2,000 years, with a total of 1 million records, and described a method for mining association rules from it. Wanda applied the Apriori algorithm to a collected prescription database and obtained association rules such as ... ==> Angelica sinensis (support 7.86%, confidence 78.57%) and Bai Yupi ==> Tufuling (Smilax glabra) (support 7.14%, confidence 83.33%). Analyzing commonly co-prescribed herbs in this way can guide syndrome-based formulation of TCM prescriptions. Zhu Licheng et al. analyzed 445 asthma medical cases and mined the associations among the etiology, disease, syndromes, and four-examination information of asthma, the relationships between these factors and drug use, and the correlations among the Chinese medicines themselves.
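Rules such as those reported by Wanda are read off frequent itemsets using the standard definitions support(A ==> B) = count(A ∪ B) / N and confidence(A ==> B) = count(A ∪ B) / count(A). The sketch below uses a hypothetical helper `rules_for_itemset` and toy prescriptions; it does not reproduce Wanda's data or implementation.

```python
from itertools import combinations


def rules_for_itemset(itemset, transactions, min_confidence):
    """List rules A ==> B where A, B split `itemset` and confidence >= min_confidence.

    Each rule is (antecedent, consequent, support, confidence), where
    support = count(A ∪ B) / N and confidence = count(A ∪ B) / count(A).
    """
    tx = [frozenset(t) for t in transactions]
    n = len(tx)
    itemset = frozenset(itemset)
    joint = sum(1 for t in tx if itemset <= t)  # count(A ∪ B)
    rules = []
    for r in range(1, len(itemset)):
        for ante in combinations(sorted(itemset), r):
            a = frozenset(ante)
            a_count = sum(1 for t in tx if a <= t)
            if a_count == 0:
                continue
            confidence = joint / a_count
            if confidence >= min_confidence:
                rules.append((a, itemset - a, joint / n, confidence))
    return rules


# Hypothetical toy prescriptions standing in for a real prescription database.
prescriptions = [
    {"herb_A", "herb_B", "herb_C"},
    {"herb_A", "herb_B"},
    {"herb_A", "herb_C", "herb_D"},
    {"herb_B", "herb_D"},
    {"herb_A", "herb_B", "herb_D"},
]

for a, b, sup, conf in rules_for_itemset({"herb_A", "herb_C"}, prescriptions, 0.7):
    print(f"{sorted(a)} ==> {sorted(b)}  support={sup:.2%}  confidence={conf:.2%}")
```

Here herb_C ==> herb_A holds with confidence 1.0 because herb_C never appears without herb_A, while the reverse rule (confidence 0.5) falls below the 0.7 threshold.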
2.1.3 Limitations
The knowledge that association analysis can yield is limited: it considers only co-occurrence, and a rule generally states no more than that one term tends to appear together with one or several other terms. In addition, most applications presuppose structured data, so the method's strength lies in analyzing structured rather than free-text data.
2.2 Cluster Analysis
2.2.1 Overview
TCM theory includes yin-yang and the five phases, and describes the human body in terms of zang-fu organs and acupoints; all of these emphasize classification, so applying cluster analysis to TCM is in keeping with the nature of TCM itself. Scholars have used cluster analysis in TCM text mining mainly for symptom classification and drug evaluation.
2.2.2 Symptom Classification
Corpora for symptom classification come mostly from TCM clinical case records. A common approach is to start from a specific disease, cluster its symptoms across the case records, and obtain the disease's phenotypic characteristics. Ma Xiaohui et al. clustered 92 clinical phenotypes from a total of 739 cases of biliary tract infection and gallstone disease, obtained the characteristic manifestations of biliary symptoms, and summarized symptom groups of biliary disease. Shihong et al. used cluster analysis to find natural groupings of kidney-deficiency symptoms; the clustering results were largely consistent with descriptions in TCM theory, providing evidence for its scientific basis. Beyond symptoms, He Yumin et al. used fuzzy clustering to obtain constitution types (strong, weak, and maladjusted) and several subtypes.
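The text does not state which clustering algorithms these studies used, so the following is only one plausible setup, assumed for illustration: each symptom is represented by the set of cases in which it occurs, dissimilarity is Jaccard distance, and symptoms are merged by average-linkage agglomerative clustering until k groups remain. The symptom names and case sets are invented, and fuzzy clustering is not shown.

```python
def jaccard_distance(a, b):
    """1 - |a ∩ b| / |a ∪ b|; two empty sets are treated as identical."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def cluster_symptoms(symptom_cases, k):
    """Average-linkage agglomerative clustering of symptoms into k groups.

    symptom_cases maps each symptom to the set of case IDs where it occurs.
    """
    clusters = [[s] for s in sorted(symptom_cases)]

    def linkage(c1, c2):
        # Average pairwise Jaccard distance between members of two clusters.
        dists = [jaccard_distance(symptom_cases[x], symptom_cases[y]) for x in c1 for y in c2]
        return sum(dists) / len(dists)

    while len(clusters) > k:
        # Find the closest pair of clusters (i < j) and merge j into i.
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Hypothetical toy data: symptom -> IDs of the case records mentioning it.
symptom_cases = {
    "symptom_1": {1, 2, 3},
    "symptom_2": {1, 2, 3, 4},
    "symptom_3": {2, 3, 4},
    "symptom_4": {5, 6},
    "symptom_5": {5, 6, 7},
}
print(cluster_symptoms(symptom_cases, k=2))
# [['symptom_1', 'symptom_2', 'symptom_3'], ['symptom_4', 'symptom_5']]
```

On this toy data, symptoms 1 to 3 (which share cases 1 to 4) form one group and symptoms 4 and 5 (cases 5 to 7) form another.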
2.2.3 Drug Evaluation
Cluster-based drug evaluation mainly groups drugs with similar properties or the same efficacy, and then uses TCM theory to summarize the resulting knowledge. Ho striker et al. clustered Chinese medicines by efficacy and defined similarities between drugs, contributing to the classification of Chinese medicines.
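Grouping drugs by efficacy requires a similarity measure between drugs. As an assumed, simplified example (not the method of the cited study), each herb below is described by a hypothetical set of efficacy labels, similarity is the Jaccard index, and herbs whose similarity reaches a threshold are linked; groups are the connected components, found with union-find.

```python
from itertools import combinations


def jaccard(a, b):
    """|a ∩ b| / |a ∪ b|; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def group_by_efficacy(herb_efficacy, threshold):
    """Group herbs whose efficacy-set Jaccard similarity is >= threshold.

    Groups are connected components of the similarity graph (union-find).
    """
    parent = {h: h for h in herb_efficacy}

    def find(h):
        while parent[h] != h:
            parent[h] = parent[parent[h]]  # path halving keeps trees shallow
            h = parent[h]
        return h

    for a, b in combinations(sorted(herb_efficacy), 2):
        if jaccard(herb_efficacy[a], herb_efficacy[b]) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for h in sorted(herb_efficacy):
        groups.setdefault(find(h), set()).add(h)
    return list(groups.values())


# Hypothetical herbs and efficacy labels, for illustration only.
herb_efficacy = {
    "herb_A": {"tonify_blood", "activate_blood"},
    "herb_B": {"tonify_blood", "nourish_yin"},
    "herb_C": {"clear_heat", "detoxify"},
    "herb_D": {"clear_heat", "dry_damp"},
    "herb_E": {"calm_mind"},
}
print(group_by_efficacy(herb_efficacy, threshold=0.3))
```

With a threshold of 0.3, herb_A and herb_B (one shared label out of three) end up together, as do herb_C and herb_D, while herb_E stays alone. Because connected components are transitive, a chain of moderately similar herbs can merge into one group, which is worth keeping in mind when choosing the threshold.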
2.2.4 Limitations
Compared with information extraction, cluster analysis is oriented toward overall properties: it classifies diseases, symptoms, and drugs from a macroscopic perspective and supports only generalized evaluation, so it cannot extract specific diagnostic and treatment information.
2.3 Information Extraction
2.3.1 Overview
TCM documents are mostly written in natural language and are complex: medical records contain symptoms and diagnostic information, medical classics contain prescriptions and pathology, and materia medica texts contain ingredients and preparation methods. Extracting this information manually would cost an immeasurable amount of labor and resources. However, because TCM terminology is embedded in the descriptive language, and that language is concise and logically simple, information extraction algorithms can be used to obtain structured information automatically.
2.3.2 HMM-Based Information Extraction
In recent years, hidden Markov models (HMMs) have been widely used in information extraction. Gu Zheng et al. used an HMM to extract information from TCM classics, treating symptoms, etiology, pulse, and prescriptions as the model's four states; they then combined named entity recognition with manual annotation to extract the corresponding terms from the literature, and finally estimated the HMM parameters to perform the extraction. Zhongli, aiming to provide the general public with convenient information services on TCM clinical diagnosis and treatment, designed and implemented TCMVSE, a vertical search system for TCM clinical diagnosis and treatment that supports web information collection, information extraction, indexing, and retrieval.
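In an HMM-based extractor of the kind described above, the hidden states are field types (here symptom, etiology, pulse, prescription) and the observations are terms from the text; extraction amounts to finding the most probable state sequence, which the Viterbi algorithm computes. The sketch below uses hand-set toy probabilities and made-up tokens. In the cited work the parameters were estimated from annotated literature, and this is not its implementation.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable hidden-state sequence for `observations`.

    Probabilities are combined in log space; unseen emissions get a tiny floor.
    """
    floor = 1e-12

    def lp(p):
        return math.log(p if p > 0 else floor)

    # best[t][s] = (log-prob of best path ending in state s at step t, previous state)
    best = [{s: (lp(start_p[s]) + lp(emit_p[s].get(observations[0], 0)), None) for s in states}]
    for obs in observations[1:]:
        prev = best[-1]
        layer = {}
        for s in states:
            score, arg = max((prev[r][0] + lp(trans_p[r][s]), r) for r in states)
            layer[s] = (score + lp(emit_p[s].get(obs, 0)), arg)
        best.append(layer)

    # Backtrack from the highest-scoring final state.
    state = max(states, key=lambda s: best[-1][s][0])
    path = [state]
    for t in range(len(best) - 1, 0, -1):
        state = best[t][state][1]
        path.append(state)
    return path[::-1]


# Hypothetical toy model: four field types as hidden states.
states = ["symptom", "etiology", "pulse", "prescription"]
start_p = {"symptom": 0.7, "etiology": 0.1, "pulse": 0.1, "prescription": 0.1}
trans_p = {
    "symptom": {"symptom": 0.5, "etiology": 0.2, "pulse": 0.2, "prescription": 0.1},
    "etiology": {"symptom": 0.1, "etiology": 0.1, "pulse": 0.6, "prescription": 0.2},
    "pulse": {"symptom": 0.1, "etiology": 0.1, "pulse": 0.1, "prescription": 0.7},
    "prescription": {"symptom": 0.25, "etiology": 0.25, "pulse": 0.25, "prescription": 0.25},
}
emit_p = {
    "symptom": {"fever": 0.5, "chills": 0.5},
    "etiology": {"wind_cold": 1.0},
    "pulse": {"floating_pulse": 1.0},
    "prescription": {"formula_X": 1.0},
}
obs = ["fever", "chills", "wind_cold", "floating_pulse", "formula_X"]
print(list(zip(obs, viterbi(obs, states, start_p, trans_p, emit_p))))
```

Each decoded state labels the term at the same position, which turns a token sequence into field-value pairs.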
2.3.3 Limitations
Information extraction requires manually defined extraction templates and often faces data loss, so the resulting structured data also has missing values, which complicates further analysis. Nevertheless, as one of the least lossy ways to convert unstructured information into structured information, information extraction plays an important role in TCM NLP research.
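The dependence on hand-written templates, and the gaps it produces, can be seen in a minimal template-based extractor. The field names, regular expressions, and English-language records below are hypothetical stand-ins for real TCM record formats; any field whose template fails to match is left as `None`, which is the missing-data problem described above.

```python
import re

# Hypothetical hand-written templates: one regex per field, value in a named group.
TEMPLATES = {
    "symptom": re.compile(r"Symptoms?:\s*(?P<value>[^;.]+)"),
    "pulse": re.compile(r"Pulse:\s*(?P<value>[^;.]+)"),
    "prescription": re.compile(r"Prescription:\s*(?P<value>[^;.]+)"),
}


def extract(record):
    """Fill one structured row from free text; unmatched fields stay None."""
    row = {}
    for field, pattern in TEMPLATES.items():
        m = pattern.search(record)
        row[field] = m.group("value").strip() if m else None
    return row


records = [
    "Symptoms: fever, aversion to cold; Pulse: floating; Prescription: formula X.",
    "Symptom: cough; prescribed formula Y.",  # wording does not fit the prescription template
]
rows = [extract(r) for r in records]
for row in rows:
    print(row)
```

The second record does mention a prescription, but because its wording does not match the template, that value is lost; covering such variants means writing and maintaining more templates.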
2.4 Machine Learning
Machine learning in medicine is mostly applied to classifying structured data, and comparatively little of it addresses natural language processing; where it is applied to text, it is mainly used for document classification, which is a different research direction from mining knowledge from text. In TCM, some scholars have applied machine learning to specific problems with some success. Xunyan used support vector machines (SVMs) and improved variants to analyze and quantify Shanghan (cold damage) theory, quantitatively analyzing specific medicinal herbs and training SVM classifiers according to an eight-category scheme, and reported some results. Yanjun et al. studied key problems in syndrome differentiation, using rough set theory to derive inference rules for TCM diagnostic syndromes, and discussed systematically the diagnosis of symptoms and the relationship between symptoms and syndromes. YC proposed applying decision trees to TCM syndrome research and illustrated their prospects for TCM diagnosis and syndrome differentiation. Lu Yanxin et al. used part-of-speech (POS) tagging rules to extract nouns and classified them with an SVM to decide whether each was a pathogenic factor; compared against evaluations by epidemiology experts, the best accuracy was 80%.
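Lu Yanxin et al. classified candidate nouns with an SVM. As a self-contained illustration of how a linear SVM is trained (not their features, data, or implementation), the sketch below uses a Pegasos-style subgradient method on hinge loss with L2 regularization, visiting examples in a fixed order rather than sampling them at random so that the result is deterministic. The three binary features per noun are hypothetical (for example, whether the noun follows a causal cue word); practical work would normally use an established library such as LIBSVM.

```python
def train_linear_svm(X, y, lam=0.01, epochs=50):
    """Pegasos-style subgradient training of a linear SVM (hinge loss + L2).

    X: list of feature vectors; y: labels in {-1, +1}.
    A constant 1.0 feature is appended so a bias is learned along with w.
    """
    data = [(list(x) + [1.0], label) for x, label in zip(X, y)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        for x, label in data:
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            margin = label * sum(wi * xi for wi, xi in zip(w, x))
            # Shrink w (gradient of the L2 term), then step on the hinge loss if violated.
            w = [(1 - eta * lam) * wi for wi in w]
            if margin < 1:
                w = [wi + eta * label * xi for wi, xi in zip(w, x)]
    return w


def predict(w, x):
    """Return +1 if the linear score is non-negative, else -1."""
    score = sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1


# Hypothetical binary features per candidate noun:
# [follows a causal cue word, co-occurs with a disease term, is a body-part noun]
X = [[1, 1, 0], [1, 0, 0], [0, 0, 1], [0, 1, 1]]
y = [1, 1, -1, -1]  # +1 = pathogenic factor, -1 = not

w = train_linear_svm(X, y)
print([predict(w, x) for x in X])  # [1, 1, -1, -1]
```

The POS-tagging step that produces candidate nouns is outside this sketch; it assumes feature vectors have already been built.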
2.5 Named Entity Recognition
Biomedical named entity recognition identifies names of given types in biomedical text, such as genes, proteins, RNA, DNA, diseases, cells, and drugs. The main approaches currently used for biomedical named entity recognition are rule-based methods, dictionary matching, and machine learning methods such as support vector machines (SVM), maximum entropy, conditional random fields (CRF), and hidden Markov models (HMM).
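Of the approaches listed, dictionary matching is the simplest to sketch. The function below performs forward maximum matching: at each position it tries the longest lexicon entry first, so "p53 protein" wins over "p53". The lexicon and sentence are hypothetical. Matching is character-based, which suits unsegmented Chinese text; for English text a word-boundary check would be needed to avoid matches inside longer words.

```python
def dictionary_ner(text, lexicon, max_len=None):
    """Tag entities by forward maximum matching against a {surface: type} lexicon.

    Returns (start, end, surface, type) spans; unmatched characters are skipped.
    """
    if max_len is None:
        max_len = max(map(len, lexicon), default=0)
    spans = []
    i = 0
    while i < len(text):
        # Try the longest possible dictionary entry starting at position i first.
        for length in range(min(max_len, len(text) - i), 0, -1):
            surface = text[i:i + length]
            if surface in lexicon:
                spans.append((i, i + length, surface, lexicon[surface]))
                i += length
                break
        else:
            i += 1  # no entry starts here; advance one character
    return spans


# Hypothetical lexicon and sentence, for illustration only.
lexicon = {
    "p53": "GENE",
    "p53 protein": "PROTEIN",
    "breast cancer": "DISEASE",
    "tamoxifen": "DRUG",
}
text = "Expression of the p53 protein was measured in breast cancer cells treated with tamoxifen."
for start, end, surface, label in dictionary_ner(text, lexicon):
    print(start, end, surface, label)
```

Dictionary matching is only as good as its lexicon: it cannot find names that are missing from it, which is one motivation for the machine learning methods listed above.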
1. Wang Haochang, Chi Tiejun. Research and development of biomedical text mining technology [J]. Journal of Chinese Information Processing, 2008 (03).
2. Nintinger, Liu Xiaofeng, Gao Jianbo, Yang Bin, Kong, Zhang Fan, Wang Xin. Introduction to the "Basic Database System of Traditional Chinese Medicine" [J]. Chinese Journal of TCM Information.
3. Nintinger, Liu Xiaofeng, Zhang Fan, Xunyan, Tong. Mining techniques for TCM prescription knowledge [J]. Science & Technology Review, 2010 (15).
4. Wanda-Fu. Application of association rules in a data mart of TCM prescriptions [J]. Journal of Guizhou University (Natural Science Edition), 2006 (03).
5. Zhu Licheng, Linsuch, Shehanrong, Chaqinglin, Zhang Qiming, Lueping. Association rule analysis of 445 TCM asthma medical cases [J]. Journal of Jiangxi College of Traditional Chinese Medicine.
6. Ma Xiaohui, Wang Yu, Ho Yue Yu. Syndrome cluster research [J]. Chinese Journal of Basic Medicine, 2000 (12).
7. Shihong, Hao, Wang Tianfang, Yan Shilin, Bi Ying, Shi Jianmei, Zhao Yan. An exploratory study on the symptoms of kidney deficiency by cluster analysis [J]. Journal of Beijing University of Chinese Medicine, 2006 (04).
8. He Yumin, Chu more martial. Constitution clustering research [J]. Chinese Journal of Basic Medicine, 1996 (05).
9. Ho striker, Zhou Xuezhong, Zhou Zhong Eyebrow, Trimon, Wu Zhaohui. Cluster analysis based on the efficacy of traditional Chinese medicines [J]. Chinese Journal of TCM Information, 2004 (06).
10. Gu Zheng, Gu Ping. Application of information extraction technology in TCM research [J]. Medical Information, 2007 (01).
11. Zhongli. Research on a vertical search system for TCM clinical diagnosis and treatment [D]. 2009.
12. Xunyan. Research on methods for analyzing Shanghan (cold damage) syndromes based on machine learning technology.
13. Yanjun, Zhu WF. Application of rough set theory in the study of TCM syndrome differentiation [J]. Chinese Journal of TCM Basic Medicine, 2006 (02).
14. YC, He Jia, Menghong, He Xianmin, Fan Sichang. Decision tree technology and its application in medicine [J]. Journal of Mathematical Medicine, 2004 (02).
15. Lu Yanxin, Yiu Xu, Wang Songwan. Study on extracting pathogenic factors by natural language processing technology [J]. Journal of Medical Informatics, 2013 (03).
16. Fukuda K, Tamura A, Tsunoda T, et al. Toward information extraction: identifying protein names from biological papers [C]//Pacific Symposium on Biocomputing, 1998: 707-718.
17. Tuason O, Chen L, Liu H, et al. Biological nomenclatures: a source of lexical knowledge and ambiguity [J]. Pacific Symposium on Biocomputing, 2004: 238.
18. Bakir G, Hofmann T, Schölkopf B, et al. Support vector machine learning for interdependent and structured output spaces [C]//International Conference on Machine Learning. ACM, 2004: 104.
19. Lin Y F, Tsai T H, Chou W C, et al. A maximum entropy approach to biomedical named entity recognition [C]//International Conference on Data Mining in Bioinformatics. Springer-Verlag, 2004: 56-61.
20. Su J, Su J. Named entity recognition using an HMM-based chunk tagger [C]//Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, 2002: 473-480.
21. Li Y, Lin H, Yang Z. Incorporating rich background knowledge for gene named entity classification and recognition [J]. BMC Bioinformatics, 2009, 10 (1): 1-15.
Chaihua, Lu Haiming, Liu early morning. A review of research methods of natural language processing in TCM [J]. Journal of Medical Informatics, 2015, 36 (10): 58-63.