Using association rules to explore the relationship between TCM syndromes and malignant tumors

Last Update:2018-07-24 Source: Internet

Author: User


Goal:
Use pathological information to explore the relationship between TCM syndromes and the TNM staging of breast cancer.

Approach and workflow:
The goal is to explore the relationship between TCM syndromes and the TNM staging of breast cancer, so an association rule model is adopted.
With the model chosen, the data on the patients' TCM syndromes and breast cancer TNM staging needs to be organized and preprocessed (data cleaning, attribute reduction, data transformation, and so on) to meet the needs of mining.

Getting the data - data preprocessing - building the model

First, getting the data

TCM syndromes: 'liver-qi stagnation syndrome type coefficient', 'heat-toxin accumulation syndrome type coefficient', 'Chong-Ren imbalance syndrome type coefficient', 'qi and blood deficiency syndrome type coefficient', 'spleen-stomach weakness syndrome type coefficient', 'liver-kidney yin deficiency syndrome type coefficient'
TNM staging of breast cancer: H1: I, H2: II, H3: III, H4: IV. Stage I is the mildest and stage IV the most severe.

The dataset has 930 rows and 7 columns, with no null values.

Second, data preprocessing

The previous step showed that the data is already clean, so the main preprocessing task is to discretize each attribute by clustering it into 4 classes. This is needed because the association rule algorithm cannot handle continuous data.

The key to clustering each attribute into 4 classes is finding the right dividing points. A clustering algorithm finds the cluster centers of each attribute, and each dividing point is the average of two adjacent cluster centers.
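As a small illustration of the dividing-point rule, the sketch below takes four cluster centers and derives the bin edges as midpoints of adjacent centers, with 0 and the column maximum as the outer edges. The center values, the maximum, and the helper names cut_points and label are made up for this example and do not come from the dataset.

```python
from bisect import bisect_left

def cut_points(centers, upper):
    """Bin edges: 0, midpoints of adjacent sorted centers, then the column maximum."""
    c = sorted(centers)
    mids = [(a + b) / 2 for a, b in zip(c, c[1:])]
    return [0.0] + mids + [upper]

def label(value, edges):
    """Index k of the right-closed interval (edges[k], edges[k+1]] containing value.

    A value equal to the lowest edge gets -1, i.e. it falls outside every bin.
    """
    return bisect_left(edges, value) - 1

# Hypothetical cluster centers for one coefficient column (illustrative values only)
centers = [0.125, 0.375, 0.25, 0.75]
edges = cut_points(centers, upper=1.0)
print(edges)              # [0.0, 0.1875, 0.3125, 0.5625, 1.0]
print(label(0.3, edges))  # 1, because 0.3 lies in (0.1875, 0.3125]
```

Four centers give three interior dividing points, so together with the two outer edges each attribute ends up with exactly 4 classes.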

The value ranges and corresponding labels for the liver-qi stagnation syndrome type coefficient:

The code is as follows:

import pandas as pd
from sklearn.cluster import KMeans

# d is the DataFrame loaded in the first step (930 rows, 7 columns).

def f(x):
    # Cluster one column into 4 groups.
    model = KMeans(n_clusters=4)
    model.fit(d[[x]].values)

    # Bin edges: 0, the means of adjacent sorted cluster centers, and the column maximum.
    centers_d = pd.DataFrame(model.cluster_centers_).sort_values(by=0)
    group = [0] + list(centers_d.rolling(2).mean().iloc[1:][0]) + [d[x].max()]
    # Label each bin with the column name followed by 0..3.
    s = pd.cut(d[x], group, labels=[x + str(i) for i in range(4)])
    return s

discretization_d = pd.concat([f('liver-qi stagnation syndrome type coefficient'), f('heat-toxin accumulation syndrome type coefficient'),
                              f('Chong-Ren imbalance syndrome type coefficient'), f('qi and blood deficiency syndrome type coefficient'),
                              f('spleen-stomach weakness syndrome type coefficient'), f('liver-kidney yin deficiency syndrome type coefficient'),
                              d['TNM staging']], axis=1)

The resulting dataset is:

Third, building the model and applying it

Given the purpose of the mining, an association rule model is used here: based on the associations that are mined, information about one attribute is inferred from another.
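Association rules are usually scored by two measures, support and confidence. As a minimal sketch of what they compute, the example below evaluates both on a handful of made-up records. The labels A2, F3, H4 and the helper names support and confidence are hypothetical and are not part of the article's code.

```python
def support(records, items):
    """Fraction of records that contain every item in `items`."""
    items = set(items)
    return sum(items <= r for r in records) / len(records)

def confidence(records, antecedent, consequent):
    """support(antecedent and consequent together) / support(antecedent)."""
    both = set(antecedent) | set(consequent)
    return support(records, both) / support(records, antecedent)

# Toy records (hypothetical labels, not the article's data)
records = [
    {"A2", "F3", "H4"},
    {"A2", "F3", "H4"},
    {"A2", "F3", "H3"},
    {"A1", "F3", "H4"},
    {"A2", "F1", "H2"},
]

# Rule {A2, F3} -> {H4}: 2 of 5 records contain all three labels,
# and 2 of the 3 records containing A2 and F3 also contain H4.
print(support(records, {"A2", "F3", "H4"}))
print(confidence(records, {"A2", "F3"}, {"H4"}))
```

A rule is kept only if both values exceed the chosen minimums, which is why the choice of thresholds matters so much.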

The key to an association rule algorithm is choosing an appropriate minimum support and minimum confidence, but there is no uniform standard for either. After several adjustments, and taking the business context into account, the minimum support was set to 5.9% and the minimum confidence to 75%. The association rule code is as follows:

import pandas as pd

# Custom join function: builds the candidates C_k from L_{k-1}
def connect_string(x, ms):
    x = list(map(lambda i: sorted(i.split(ms)), x))
    l = len(x[0])
    r = []
    for i in range(len(x)):
        for j in range(i, len(x)):
            if x[i][:l-1] == x[j][:l-1] and x[i][l-1] != x[j][l-1]:
                r.append(x[i][:l-1] + sorted([x[j][l-1], x[i][l-1]]))
    return r

# Function that searches for association rules
def find_rule(d, support, confidence, ms=u'--'):
    result = pd.DataFrame(index=['support', 'confidence'])  # output table

    support_series = 1.0 * d.sum() / len(d)  # support of each single item
    column = list(support_series[support_series > support].index)  # initial filter by support
    k = 0

    while len(column) > 1:
        k = k + 1
        print(u'\nRunning search %s ...' % k)
        column = connect_string(column, ms)
        print(u'Number: %s ...' % len(column))
        sf = lambda i: d[i].prod(axis=1, numeric_only=True)  # support function for the new candidates

        # Build the joined data. This is the most time- and memory-consuming step;
        # when the dataset is large, parallel computation can be considered.
        d_2 = pd.DataFrame(list(map(sf, column)), index=[ms.join(i) for i in column]).T

        support_series_2 = 1.0 * d_2[[ms.join(i) for i in column]].sum() / len(d)  # support after joining
        column = list(support_series_2[support_series_2 > support].index)  # new round of support filtering
        support_series = pd.concat([support_series, support_series_2])
        column2 = []

        for i in column:  # enumerate possible rules: is {A,B,C} A+B-->C, B+C-->A, or C+A-->B?
            i = i.split(ms)
            for j in range(len(i)):
                column2.append(i[:j] + i[j+1:] + i[j:j+1])

        cofidence_series = pd.Series(index=[ms.join(i) for i in column2], dtype=float)  # confidence series

        for i in column2:  # compute the confidence of each candidate rule
            cofidence_series[ms.join(i)] = support_series[ms.join(sorted(i))] / support_series[ms.join(i[:len(i)-1])]

        for i in cofidence_series[cofidence_series > confidence].index:  # filter by confidence
            # Row order matches the index of result: support, then confidence.
            result[i] = [support_series[ms.join(sorted(i.split(ms)))], cofidence_series[i]]

    result = result.T.sort_values(by=['confidence', 'support'], ascending=False)
    print(u'\nResults:')
    return result
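Because find_rule computes supports with d.sum() and d[i].prod(axis=1), it appears to expect a table of 0/1 indicators with one column per item, rather than the table of labels produced by the discretization step. The article does not show that conversion, so the following is only a sketch of one way it could be done. The helper name to_indicator_rows and the sample labels are hypothetical.

```python
def to_indicator_rows(records):
    """Turn each record's labels into a 0/1 row with one key per distinct label."""
    items = sorted({label for rec in records for label in rec})
    return [{item: int(item in rec) for item in items} for rec in records]

# Hypothetical discretized records: one label per syndrome column plus the TNM stage
records = [
    ["A2", "F3", "H4"],
    ["A1", "F3", "H4"],
    ["A2", "F1", "H2"],
]

rows = to_indicator_rows(records)
print(rows[0])  # {'A1': 0, 'A2': 1, 'F1': 0, 'F3': 1, 'H2': 0, 'H4': 1}
```

Wrapping the rows in pandas.DataFrame(rows) would give a table of the shape that find_rule appears to work on, with the labels as column names.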

Packaging the find_rule code above into a module, passing in the dataset, and running it gives the following result:

The rule from liver-qi stagnation syndrome type coefficient 2 and liver-kidney yin deficiency syndrome type coefficient 3 to H4 has the highest support, 7.95%, and the highest confidence, 88%. That is, when the liver-qi stagnation syndrome type coefficient is in (0.258, 0.352) and the liver-kidney yin deficiency syndrome type coefficient is in (0.355, 0.607), the probability that the TNM stage is H4 is 88%, and this combination occurs in 7.95% of the records.
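To make the two percentages more concrete, they can be converted back into approximate record counts using the dataset size of 930 given earlier. This is only a rough back-of-the-envelope sketch: the reported support and confidence are rounded, so the counts are approximate, and the variable names are chosen for this example.

```python
n_records = 930          # dataset size stated in the article
rule_support = 0.0795    # reported support of the rule
rule_confidence = 0.88   # reported confidence of the rule

# Records matching both syndrome labels and stage H4
both = rule_support * n_records
# Records matching the two syndrome labels, whatever the stage
antecedent = both / rule_confidence

print(round(both), round(antecedent))  # 74 84
```

In other words, roughly 84 patients fall into both coefficient ranges, and about 74 of them are at stage H4.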

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only.
