Association analysis: Apriori Python code annotations

Source: Internet
Author: User

These notes reflect my own understanding and may contain mistakes; corrections are welcome ^_^

Finding frequent itemsets: main ideas

Python code
def loadDataSet():
    return [[1, 3, 4], [2, 3, 5], [1, 2, 3, 5], [2, 5]]

createC1(dataSet) builds all candidate 1-itemsets, that is, the first level.

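def createC1(dataSet):
    C1 = []
    for transaction in dataSet:
        for item in transaction:
            if [item] not in C1:
                C1.append([item])
    C1.sort()
    # list(...) so the result can be scanned more than once on Python 3
    return list(map(frozenset, C1))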
# scanD scans the transactions D to determine which itemsets in Ck are frequent.
def scanD(D, Ck, minSupport):
    ssCnt = {}
    for tid in D:
        for can in Ck:
            if can.issubset(tid):
                if can not in ssCnt:
                    ssCnt[can] = 1
                else:
                    ssCnt[can] += 1
    numItems = float(len(D))
    retList = []
    supportData = {}
    for key in ssCnt:
        support = ssCnt[key] / numItems
        if support >= minSupport:
            retList.insert(0, key)
        supportData[key] = support
    return retList, supportData
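To make the support computation concrete, here is a minimal sketch that counts support for a few hand-picked itemsets over the sample data from loadDataSet. The helper name `support` is an assumption made for this illustration and is not part of the original code.

```python
# Sample transactions, copied from loadDataSet().
transactions = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]


def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`.

    `support` is a hypothetical helper used only for this illustration.
    """
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset.issubset(t))
    return hits / float(len(transactions))


s_1 = support({1}, transactions)      # item 1 occurs in 2 of 4 transactions
s_4 = support({4}, transactions)      # item 4 occurs in 1 of 4 transactions
s_25 = support({2, 5}, transactions)  # {2, 5} occurs in 3 of 4 transactions
print(s_1, s_4, s_25)
```

With minSupport = 0.5, {1} and {2, 5} would be kept and {4} would be dropped.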
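# Merge the itemsets of the previous level to get the candidates for the next level.
# For example, from {1,2}, {3,4}, {1,3} you can get {1,2,3}.
# Note that the candidates obtained this way are not necessarily frequent;
# two itemsets are merged only when their first k-2 items match.
def aprioriGen(Lk, k):
    retList = []
    lenLk = len(Lk)
    for i in range(lenLk):
        for j in range(i + 1, lenLk):
            # sort before slicing so the k-2 prefix is well defined
            L1 = sorted(Lk[i])[:k-2]
            L2 = sorted(Lk[j])[:k-2]
            if L1 == L2:
                retList.append(Lk[i] | Lk[j])
    return retList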
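The join step can be tried in isolation. The sketch below is a standalone illustration of the prefix-matching idea on the three itemsets mentioned in the comment; `join_candidates` is a hypothetical name, and the function mirrors the idea of aprioriGen rather than claiming to be identical to it.

```python
def join_candidates(Lk, k):
    """Join (k-1)-itemsets whose first k-2 sorted items are equal.

    `join_candidates` is a hypothetical name used only in this sketch.
    """
    candidates = []
    for i in range(len(Lk)):
        for j in range(i + 1, len(Lk)):
            a, b = sorted(Lk[i]), sorted(Lk[j])
            if a[:k - 2] == b[:k - 2]:
                candidates.append(frozenset(Lk[i]) | frozenset(Lk[j]))
    return candidates


L2 = [frozenset([1, 2]), frozenset([3, 4]), frozenset([1, 3])]
C3 = join_candidates(L2, 3)
print([sorted(c) for c in C3])
```

Only {1,2} and {1,3} share the one-item prefix [1], so the single candidate is {1,2,3}; {3,4} joins with neither.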
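# Main function: given the data, return the frequent itemsets.
def apriori(dataSet, minSupport=0.5):
    C1 = createC1(dataSet)
    # list(...) so D can be scanned once per level and len(D) works on Python 3
    D = list(map(set, dataSet))
    L1, supportData = scanD(D, C1, minSupport)
    L = [L1]
    k = 2
    while len(L[k-2]) > 0:
        Ck = aprioriGen(L[k-2], k)
        Lk, supK = scanD(D, Ck, minSupport)
        supportData.update(supK)
        L.append(Lk)
        k += 1
    return L, supportData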
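As a sanity check on what the level-wise search should find, the following sketch enumerates every itemset by brute force on the sample data. It is not Apriori itself: it is exponential in the number of items and only reasonable on tiny inputs. The function name `brute_force_frequent` is an assumption for this example.

```python
from itertools import combinations

# Sample transactions, copied from loadDataSet().
transactions = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]
min_support = 0.5


def brute_force_frequent(transactions, min_support):
    """Enumerate every non-empty itemset and keep the frequent ones.

    Hypothetical helper: exponential cost, meant only as a cross-check.
    """
    items = sorted(set().union(*transactions))
    n = float(len(transactions))
    frequent = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            s = sum(1 for t in transactions if set(combo) <= t) / n
            if s >= min_support:
                frequent[frozenset(combo)] = s
    return frequent


frequent = brute_force_frequent(transactions, min_support)
for itemset in sorted(frequent, key=lambda fs: (len(fs), sorted(fs))):
    print(sorted(itemset), frequent[itemset])
```

On this data the check yields nine frequent itemsets, the largest being {2,3,5}; the non-empty levels returned by a level-wise search at the same threshold should contain the same sets.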
Generating association rules from frequent itemsets: main ideas

Looking only at the right-hand side of the rules, the procedure is the same as the one used to find frequent itemsets.
A rule defined on a frequent itemset must use all of its elements, so once the right-hand side is fixed, the left-hand side is the frequent itemset minus the right-hand side. In the code, H holds the candidate right-hand sides of the rules.
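The relationship "left side = frequent itemset minus right side" can be illustrated with a small standalone sketch. It enumerates every possible right-hand side exhaustively instead of growing them level by level, so it is a simplification of the approach described here. The support values were computed by hand from the sample data, and `rules_for` is a hypothetical name.

```python
from itertools import combinations

# Supports computed by hand from the sample data in loadDataSet().
support = {
    frozenset([2]): 0.75, frozenset([3]): 0.75, frozenset([5]): 0.75,
    frozenset([2, 3]): 0.5, frozenset([2, 5]): 0.75, frozenset([3, 5]): 0.5,
    frozenset([2, 3, 5]): 0.5,
}


def rules_for(freq_set, support, min_conf):
    """Try every non-empty proper subset of freq_set as a right-hand side.

    Hypothetical helper; confidence = support(freq_set) / support(left side).
    """
    rules = []
    for size in range(1, len(freq_set)):
        for rhs in combinations(sorted(freq_set), size):
            rhs = frozenset(rhs)
            lhs = freq_set - rhs
            conf = support[freq_set] / support[lhs]
            if conf >= min_conf:
                rules.append((lhs, rhs, conf))
    return rules


rules = rules_for(frozenset([2, 3, 5]), support, min_conf=0.7)
for lhs, rhs, conf in rules:
    print(sorted(lhs), '-->', sorted(rhs), 'conf:', conf)
```

For {2,3,5} at a minimum confidence of 0.7, only {3,5} --> {2} and {2,3} --> {5} qualify, both with confidence 1.0; the other four splits have confidence 0.5 / 0.75.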

Python code
# Main function. Initially the right-hand side of each rule in H has only one element.
def generateRules(L, supportData, minConf=0.7):
    bigRuleList = []
    for i in range(1, len(L)):
        for freqSet in L[i]:
            H1 = [frozenset([item]) for item in freqSet]
            if i > 1:
                rulesFromConseq(freqSet, H1, supportData, bigRuleList, minConf)
            else:
                calcConf(freqSet, H1, supportData, bigRuleList, minConf)
    return bigRuleList
# Check whether the confidence of each rule meets the requirement.
# Returns prunedH, the right-hand sides whose rules qualify.
# brl stores all the rules that meet the requirement.
def calcConf(freqSet, H, supportData, brl, minConf=0.7):
    prunedH = []
    for conseq in H:
        conf = supportData[freqSet] / supportData[freqSet - conseq]
        if conf >= minConf:
            print('%s --> %s conf: %s' % (freqSet - conseq, conseq, conf))
            brl.append((freqSet - conseq, conseq, conf))
            prunedH.append(conseq)
    return prunedH
# Just as with frequent itemsets, try to merge the right-hand sides of the rules (H),
# then generate new rules from the merged right-hand sides.
def rulesFromConseq(freqSet, H, supportData, brl, minConf=0.7):
    m = len(H[0])
    if len(freqSet) > (m + 1):
        Hmp1 = aprioriGen(H, m + 1)
        Hmp1 = calcConf(freqSet, Hmp1, supportData, brl, minConf)
        if len(Hmp1) > 1:
            rulesFromConseq(freqSet, Hmp1, supportData, brl, minConf)
A note on Apriori

From Henry.
At each level k, you have k-itemsets which are frequent (with sufficient support).

At the next level, the (k+1)-itemsets you need to consider must have the property that all of their subsets are frequent (with sufficient support). This is the Apriori property: any subset of a frequent itemset must be frequent.

So if you know at level 2 that the sets {1,2}, {1,3}, {1,5} and {3,5} are the only sets with sufficient support, then at level 3 you join these with each other to produce {1,2,3}, {1,2,5}, {1,3,5} and {2,3,5}, but you need only consider {1,3,5} further: the others each have subsets with insufficient support (such as {2,3} or {2,5}).
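The pruning argument in this example can be checked mechanically. The sketch below takes the four level-3 candidates exactly as listed in the text and keeps those whose 2-item subsets are all among the frequent level-2 sets. The helper name `all_subsets_frequent` is an assumption for this illustration.

```python
from itertools import combinations

# Level-2 frequent itemsets and level-3 candidates, as listed in the text.
frequent_2 = {frozenset(s) for s in [(1, 2), (1, 3), (1, 5), (3, 5)]}
candidates_3 = [frozenset(s) for s in
                [(1, 2, 3), (1, 2, 5), (1, 3, 5), (2, 3, 5)]]


def all_subsets_frequent(candidate, frequent_prev):
    """True if every (k-1)-subset of a k-item candidate is frequent.

    Hypothetical helper illustrating the Apriori property.
    """
    k = len(candidate)
    return all(frozenset(sub) in frequent_prev
               for sub in combinations(candidate, k - 1))


survivors = [c for c in candidates_3 if all_subsets_frequent(c, frequent_2)]
print([sorted(c) for c in survivors])
```

Only {1,3,5} survives, because {1,3}, {1,5} and {3,5} are all frequent, while each other candidate contains {2,3} or {2,5}.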

Maximal frequent itemsets

A frequent itemset is maximal if none of the itemsets that contain it is frequent.

Closed frequent itemsets

A frequent itemset is closed if every itemset that contains it has a smaller support count.
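Both definitions are easy to test on a table of support counts. The sketch below uses the frequent itemsets of the loadDataSet sample at a minimum support count of 2 (counted by hand); the function names are hypothetical. Comparing only against frequent supersets is enough for the closed check here, since a superset with the same count as a frequent itemset would itself be frequent.

```python
# Support counts of the frequent itemsets in the loadDataSet() sample
# (minimum support count 2), counted by hand.
counts = {
    frozenset([1]): 2, frozenset([2]): 3, frozenset([3]): 3, frozenset([5]): 3,
    frozenset([1, 3]): 2, frozenset([2, 3]): 2, frozenset([2, 5]): 3,
    frozenset([3, 5]): 2, frozenset([2, 3, 5]): 2,
}


def maximal_itemsets(counts):
    """Frequent itemsets with no frequent proper superset (hypothetical helper)."""
    return {s for s in counts if not any(s < t for t in counts)}


def closed_itemsets(counts):
    """Frequent itemsets with no proper superset of equal count (hypothetical helper)."""
    return {s for s in counts
            if not any(s < t and counts[t] == counts[s] for t in counts)}


maximal = maximal_itemsets(counts)
closed = closed_itemsets(counts)
print(sorted(sorted(s) for s in maximal))
print(sorted(sorted(s) for s in closed))
```

Here the maximal itemsets are {1,3} and {2,3,5}, and the closed ones are {3}, {1,3}, {2,5} and {2,3,5}. Every maximal itemset is also closed, but not the other way around.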

Exercise12

(a) s({e}) = 0.8, s({b,d}) = 0.2, s({b,d,e}) = 0.2

3

(a) c(∅ → A) = s(A)
(b) c1 > c2, c2 < c3 -> c1 >= c2, c2 <= c3
(c) The rules have the same confidence -> support. That is, for left -> right, {left, right} has the same support.

6

(a) 3^6 - 2^6 × 2 + 1 = 602
(b) 4
(c)5+C(4,3)+1+C(4,3) -> C(6,3)
(d) Butter, bread
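The value 602 in 6(a) is consistent with the standard count of association rules over d items, 3^d - 2^(d+1) + 1. The sketch below checks that closed form against a brute-force enumeration, assuming a rule X -> Y requires X and Y to be non-empty and disjoint and that d = 6; both function names are hypothetical.

```python
from itertools import combinations


def count_rules(d):
    """Count rules X -> Y over d items by enumeration.

    Assumes X and Y are non-empty and disjoint (hypothetical helper).
    """
    items = list(range(d))
    total = 0
    for k in range(1, d):                      # size of the left-hand side
        for lhs in combinations(items, k):
            rest = [i for i in items if i not in lhs]
            for m in range(1, len(rest) + 1):  # size of the right-hand side
                total += sum(1 for _ in combinations(rest, m))
    return total


def closed_form(d):
    """Closed-form rule count: 3^d - 2^(d+1) + 1."""
    return 3 ** d - 2 ** (d + 1) + 1


print(count_rules(6), closed_form(6))
```

Both functions return 602 for d = 6, and they agree for smaller d as well.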

7

(b) {1,2,3,4}, {1,2,3,5}, {1,2,4,5}, {1,3,4,5}, {2,3,4,5}
(c) {1,2,3,4}, {1,2,3,5}; the rest are pruned because {1,4,5} and {2,4,5} are not frequent

8
    1. When drawing the lattice, note that a node is labeled N not only when one of its subsets is I, but also when one of its subsets is N.
    2. F/total
    3. I/total
