Association analysis: annotated Apriori Python code

Last Update:2016-05-12 Source: Internet

Author: User

These notes reflect my own understanding and may contain mistakes. Corrections are welcome ^_^

Getting frequent itemsets: main ideas

Python code

def loadDataSet():
    return [[1, 3, 4], [2, 3, 5], [1, 2, 3, 5], [2, 5]]

createC1(dataSet) builds all candidate 1-itemsets, that is, the first level.

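def createC1(dataSet):
    C1 = []
    for transaction in dataSet:
        for item in transaction:
            if [item] not in C1:
                C1.append([item])
    C1.sort()
    # list(...) so the result can be scanned more than once on Python 3
    return list(map(frozenset, C1))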

# scanD uses the transaction data D to decide which candidate itemsets in Ck are frequent.
def scanD(D, Ck, minSupport):
    ssCnt = {}
    for tid in D:
        for can in Ck:
            if can.issubset(tid):
                if can not in ssCnt:
                    ssCnt[can] = 1
                else:
                    ssCnt[can] += 1
    numItems = float(len(D))
    retList = []
    supportData = {}
    for key in ssCnt:
        support = ssCnt[key] / numItems
        if support >= minSupport:
            retList.insert(0, key)
        supportData[key] = support
    return retList, supportData
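To make the support computation concrete, here is a minimal self-contained sketch that counts, for the four sample transactions, the fraction that contain each single-item candidate. The helper name `support_counts` is my own and is not part of the original code.

```python
def support_counts(transactions, candidates):
    """Return {candidate: fraction of transactions that contain it}."""
    n = float(len(transactions))
    counts = {}
    for tid in transactions:
        for can in candidates:
            if can.issubset(tid):
                counts[can] = counts.get(can, 0) + 1
    return {can: cnt / n for can, cnt in counts.items()}

# Same sample data as loadDataSet()
transactions = [set(t) for t in [[1, 3, 4], [2, 3, 5], [1, 2, 3, 5], [2, 5]]]
candidates = [frozenset([i]) for i in range(1, 6)]

support = support_counts(transactions, candidates)
frequent = sorted(sorted(c) for c, s in support.items() if s >= 0.5)
print(frequent)  # [[1], [2], [3], [5]]
```

Item 4 appears in only one of the four transactions (support 0.25), so it drops out at a minimum support of 0.5.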

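# Build the next level by merging itemsets from the previous level.
# For example, from {1,2}, {3,4}, {1,3} you can get {1,2,3}.
# Note that the candidates produced this way are not necessarily frequent;
# two itemsets are merged only when their first k-2 items match.
def aprioriGen(Lk, k):
    retList = []
    lenLk = len(Lk)
    for i in range(lenLk):
        for j in range(i + 1, lenLk):
            # sort before slicing so the prefix does not depend on set iteration order
            L1 = sorted(Lk[i])[:k-2]; L2 = sorted(Lk[j])[:k-2]
            if L1 == L2:
                retList.append(Lk[i] | Lk[j])
    return retList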

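# Main function: given the data, return the frequent itemsets.
def apriori(dataSet, minSupport=0.5):
    C1 = createC1(dataSet)
    # list(...) so D can be scanned repeatedly and has a length on Python 3
    D = list(map(set, dataSet))
    L1, supportData = scanD(D, C1, minSupport)
    L = [L1]
    k = 2
    while (len(L[k-2]) > 0):
        Ck = aprioriGen(L[k-2], k)
        Lk, supK = scanD(D, Ck, minSupport)
        supportData.update(supK)
        L.append(Lk)
        k += 1
    return L, supportData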
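As a cross-check of the level-wise idea, the following is a compact, self-contained sketch run on the same four transactions. It is not the code above: the function name `frequent_itemsets` is hypothetical, and it uses a simpler join (union any two frequent (k-1)-itemsets whose union has exactly k items) instead of the prefix-based join in aprioriGen.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support=0.5):
    """Level-wise search; returns {frozenset: support} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]
    n = float(len(transactions))

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    # Level 1: every single item is a candidate.
    items = sorted({i for t in transactions for i in t})
    level = [frozenset([i]) for i in items
             if support(frozenset([i])) >= min_support]
    result = {}
    k = 2
    while level:
        for s in level:
            result[s] = support(s)
        # Join step (simplified): keep unions that have exactly k items.
        candidates = {a | b for a, b in combinations(level, 2) if len(a | b) == k}
        level = [c for c in candidates if support(c) >= min_support]
        k += 1
    return result

data = [[1, 3, 4], [2, 3, 5], [1, 2, 3, 5], [2, 5]]
found = frequent_itemsets(data, min_support=0.5)
for itemset in sorted(found, key=lambda s: (len(s), sorted(s))):
    print(sorted(itemset), found[itemset])
```

On this data the sketch finds nine frequent itemsets: four single items, four pairs, and the triple {2, 3, 5} with support 0.5.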

Generating association rules from frequent itemsets: main ideas

If you look only at the right-hand side of the rules, the procedure is the same as the one used to get frequent itemsets.
A rule defined on a frequent itemset must contain all of its elements, so once the right-hand side of a rule is fixed, the left-hand side is the frequent itemset minus the right-hand side. In the code, H holds the candidate right-hand sides.
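A minimal sketch of that relationship, assuming the frequent itemset {2, 3, 5} and support values computed by hand from the four sample transactions returned by loadDataSet. The variable names are illustrative only.

```python
freq_set = frozenset([2, 3, 5])

# Supports computed by hand from the four sample transactions.
support = {
    frozenset([2, 3, 5]): 0.5,
    frozenset([2, 3]): 0.5,
    frozenset([2, 5]): 0.75,
    frozenset([3, 5]): 0.5,
}

# H: candidate right-hand sides, one element each to start with.
H = [frozenset([2]), frozenset([3]), frozenset([5])]

rules = []
for right in H:
    left = freq_set - right                      # left = frequent itemset - right
    conf = support[freq_set] / support[left]     # confidence of left -> right
    rules.append((sorted(left), sorted(right), conf))
    print(sorted(left), '->', sorted(right), 'conf:', round(conf, 3))
```

Only the right-hand side is enumerated; the left-hand side and the confidence follow from it.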

Python code

# Main function. Initially the right-hand side of each rule, H, has only one element.
def generateRules(L, supportData, minConf=0.7):
    bigRuleList = []
    for i in range(1, len(L)):
        for freqSet in L[i]:
            H1 = [frozenset([item]) for item in freqSet]
            if (i > 1):
                rulesFromConseq(freqSet, H1, supportData, bigRuleList, minConf)
            else:
                calcConf(freqSet, H1, supportData, bigRuleList, minConf)
    return bigRuleList

# Check whether the confidence of each rule meets the requirement.
# Returns prunedH, the right-hand sides that passed.
# brl stores all the rules that meet the requirement.
def calcConf(freqSet, H, supportData, brl, minConf=0.7):
    prunedH = []
    for conseq in H:
        conf = supportData[freqSet] / supportData[freqSet - conseq]
        if conf >= minConf:
            print(freqSet - conseq, '-->', conseq, 'conf:', conf)
            brl.append((freqSet - conseq, conseq, conf))
            prunedH.append(conseq)
    return prunedH

# Just as with frequent itemsets, try to merge the right-hand sides of the rules (H),
# then generate new rules from the merged right-hand sides.
def rulesFromConseq(freqSet, H, supportData, brl, minConf=0.7):
    m = len(H[0])
    if (len(freqSet) > (m + 1)):
        Hmp1 = aprioriGen(H, m + 1)
        Hmp1 = calcConf(freqSet, Hmp1, supportData, brl, minConf)
        if (len(Hmp1) > 1):
            rulesFromConseq(freqSet, Hmp1, supportData, brl, minConf)
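For small itemsets, the rules can also be enumerated by brute force, which is handy for checking the output of the recursive version. The sketch below is not the method above: it tries every non-empty proper subset of a frequent itemset as a right-hand side, and `rules_for` is a hypothetical helper name.

```python
from itertools import combinations

# Same sample data as loadDataSet()
data = [frozenset(t) for t in [[1, 3, 4], [2, 3, 5], [1, 2, 3, 5], [2, 5]]]

def support(itemset):
    return sum(1 for t in data if itemset <= t) / float(len(data))

def rules_for(freq_set, min_conf=0.7):
    """Try every non-empty proper subset of freq_set as a right-hand side."""
    out = []
    for size in range(1, len(freq_set)):
        for right in combinations(sorted(freq_set), size):
            right = frozenset(right)
            left = freq_set - right
            conf = support(freq_set) / support(left)
            if conf >= min_conf:
                out.append((left, right, conf))
    return out

rules = rules_for(frozenset([2, 3, 5]))
for left, right, conf in rules:
    print(sorted(left), '->', sorted(right), 'conf:', conf)
```

With a minimum confidence of 0.7, only {3, 5} -> {2} and {2, 3} -> {5} survive for the itemset {2, 3, 5}; every other split has confidence 2/3.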

A note on the Apriori property

From Henry.
At each level k, you have k-itemsets which are frequent (with sufficient support).

At the next level, the (k+1)-itemsets you need to consider must have the property that all of their subsets are frequent (with sufficient support). This is the Apriori property: any subset of a frequent itemset must be frequent.

So if you know at level 2 that the sets {1,2}, {1,3}, {1,5} and {3,5} are the only sets with sufficient support, then at level 3 you join these with each other to produce {1,2,3}, {1,2,5}, {1,3,5} and {2,3,5}, but you need only consider {1,3,5} further: the others each have subsets with insufficient support (such as {2,3} or {2,5}).
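The level-3 example can be reproduced with a short sketch. One assumption to flag: to obtain the same four candidates as in the text, the join here extends each frequent 2-itemset by one more item seen at that level, which differs from the prefix-based join in aprioriGen. The function name is hypothetical.

```python
from itertools import combinations

def candidates_with_pruning(frequent_prev):
    """Extend each frequent (k-1)-itemset by one item, then keep only candidates
    whose (k-1)-item subsets are all frequent (the Apriori property)."""
    frequent_prev = set(frequent_prev)
    items = set().union(*frequent_prev)
    k = len(next(iter(frequent_prev))) + 1
    joined = {s | frozenset([i]) for s in frequent_prev for i in items if i not in s}
    kept = [c for c in joined
            if all(frozenset(sub) in frequent_prev for sub in combinations(c, k - 1))]
    return joined, kept

level2 = [frozenset(s) for s in [{1, 2}, {1, 3}, {1, 5}, {3, 5}]]
joined, kept = candidates_with_pruning(level2)

print(sorted(sorted(c) for c in joined))  # [[1, 2, 3], [1, 2, 5], [1, 3, 5], [2, 3, 5]]
print(sorted(sorted(c) for c in kept))    # [[1, 3, 5]]
```

The pruning step needs no pass over the transactions: it only looks up subsets in the previous level.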

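Maximal frequent itemsets

A frequent itemset is maximal if none of the itemsets that contain it is frequent.

Closed frequent itemsets

A frequent itemset is closed if every itemset that contains it has a smaller support count than it does.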
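Both definitions can be checked by brute force on the four sample transactions. This is a sketch for illustration only, assuming a minimum support count of 2 (0.5 of 4 transactions); it enumerates every itemset, which is only feasible for tiny data.

```python
from itertools import combinations

data = [frozenset(t) for t in [[1, 3, 4], [2, 3, 5], [1, 2, 3, 5], [2, 5]]]
min_count = 2  # assumed threshold: 0.5 of 4 transactions

def count(itemset):
    return sum(1 for t in data if itemset <= t)

# Enumerate all frequent itemsets with their support counts.
items = sorted(set().union(*data))
frequent = {}
for size in range(1, len(items) + 1):
    for combo in combinations(items, size):
        c = count(frozenset(combo))
        if c >= min_count:
            frequent[frozenset(combo)] = c

def frequent_supersets(s):
    return [t for t in frequent if s < t]

# Maximal: no frequent proper superset.
maximal = [s for s in frequent if not frequent_supersets(s)]
# Closed: every proper superset has a strictly smaller support count.
# Checking frequent supersets is enough, since a superset with the same
# count as a frequent itemset would itself be frequent.
closed = [s for s in frequent
          if all(frequent[t] < frequent[s] for t in frequent_supersets(s))]

print(sorted(sorted(s) for s in maximal))  # [[1, 3], [2, 3, 5]]
print(sorted(sorted(s) for s in closed))   # [[1, 3], [2, 3, 5], [2, 5], [3]]
```

Every maximal frequent itemset is also closed, which the output shows: both maximal sets appear in the closed list.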

Exercise12

(a) s({e}) = 0.8, s({b,d}) = 0.2, s({b,d,e}) = 0.2

(a) c(∅ → A) = s(A)
(b) c1 > c2, c2 < c3 -> c1 >= c2, c2 <= c3
(c) The rules have the same confidence -> support. That is, for a rule left -> right, the itemset {left, right} has the same support.
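Answer (a) can be checked from the definition of confidence. A short derivation, assuming the usual definition c(X → Y) = s(X ∪ Y) / s(X) and that the empty set is contained in every transaction, so that s(∅) = 1:

```latex
\[
c(\emptyset \to A)
  = \frac{s(\emptyset \cup A)}{s(\emptyset)}
  = \frac{s(A)}{1}
  = s(A)
\]
```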

(a) 3^6 - 2^6 * 2 + 1 = 602
(b) 4
(c)5+C(4,3)+1+C(4,3) -> C(6,3)
(d) Butter, bread
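The figure 602 in answer (a) can be verified by brute force, assuming it counts every rule that can be formed from 6 items: each item goes to the left side, the right side, or is unused, and both sides must be non-empty. The function name `count_rules` is hypothetical.

```python
from itertools import product

def count_rules(d):
    """Count assignments of d items to left / right / unused
    in which both the left and the right side are non-empty."""
    total = 0
    for assignment in product(('left', 'right', 'unused'), repeat=d):
        if 'left' in assignment and 'right' in assignment:
            total += 1
    return total

print(count_rules(6))  # 602
```

This agrees with the closed form: 3^6 assignments in total, minus 2^6 with an empty left side, minus 2^6 with an empty right side, plus 1 for the assignment counted twice where both are empty.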

(b) {1,2,3,4},{1,2,3,5},{1,2,4,5},{1,3,4,5},{2,3,4,5}
(c) {1,2,3,4},{1,2,3,5},// no {1,4,5}, no {2,4,5}

When drawing, it is important to note that when I is not just the time to draw N, but also to draw n when it is n.
F/total
I/total

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
