These are my own notes and may contain mistakes; corrections are welcome ^_^
Finding frequent itemsets
Main Ideas
Python code
def loadDataSet():
    return [[1, 3, 4], [2, 3, 5], [1, 2, 3, 5], [2, 5]]
createC1(dataSet) builds all the candidate 1-itemsets, i.e. the first level.
def createC1(dataSet):
    C1 = []
    for transaction in dataSet:
        for item in transaction:
            if [item] not in C1:
                C1.append([item])
    C1.sort()
    # frozenset so that itemsets can be used as dictionary keys
    return list(map(frozenset, C1))
# scanD uses the transaction data D to decide which of the candidate itemsets in Ck are frequent.
def scanD(D, Ck, minSupport):
    ssCnt = {}
    for tid in D:
        for can in Ck:
            if can.issubset(tid):
                if can not in ssCnt:
                    ssCnt[can] = 1
                else:
                    ssCnt[can] += 1
    numItems = float(len(D))
    retList = []
    supportData = {}
    for key in ssCnt:
        support = ssCnt[key] / numItems
        if support >= minSupport:
            retList.insert(0, key)
        supportData[key] = support
    return retList, supportData
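# Build the next level by merging itemsets from the previous level,
# e.g. from {1,2}, {3,4}, {1,3} you can get {1,2,3}.
# Note that the next level obtained this way is not necessarily frequent:
# the merge only compares the first k-2 items, so scanD still has to filter the result.
def aprioriGen(Lk, k):
    retList = []
    lenLk = len(Lk)
    for i in range(lenLk):
        for j in range(i + 1, lenLk):
            # sort before slicing so the first k-2 items are compared in a fixed order
            L1 = sorted(Lk[i])[:k-2]
            L2 = sorted(Lk[j])[:k-2]
            if L1 == L2:
                retList.append(Lk[i] | Lk[j])
    return retList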
# Main function: given the data, return the frequent itemsets.
def apriori(dataSet, minSupport=0.5):
    C1 = createC1(dataSet)
    D = list(map(set, dataSet))  # a list, because D is scanned once per level
    L1, supportData = scanD(D, C1, minSupport)
    L = [L1]
    k = 2
    while len(L[k-2]) > 0:
        Ck = aprioriGen(L[k-2], k)
        Lk, supK = scanD(D, Ck, minSupport)
        supportData.update(supK)
        L.append(Lk)
        k += 1
    return L, supportData
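As a cross-check on what the level-wise search should find, here is a minimal brute-force sketch. It is not part of the code above: the name brute_force_frequent is made up for illustration, and the transactions are simply copied from loadDataSet. It enumerates every possible itemset and keeps those whose support reaches the threshold, which is only practical for tiny data like this.

```python
from itertools import combinations

# Sample transactions, copied from loadDataSet().
transactions = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]

def brute_force_frequent(transactions, min_support=0.5):
    """Enumerate every itemset and keep those with support >= min_support.

    Hypothetical helper for checking results; exponential in the number of items.
    """
    items = sorted(set().union(*transactions))
    n = float(len(transactions))
    frequent = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            candidate = frozenset(combo)
            # count the transactions that contain the whole candidate
            count = sum(1 for t in transactions if candidate <= t)
            if count / n >= min_support:
                frequent[candidate] = count / n
    return frequent

frequent = brute_force_frequent(transactions)
for itemset in sorted(frequent, key=lambda s: (len(s), sorted(s))):
    print(sorted(itemset), frequent[itemset])
```

With minSupport = 0.5 this keeps nine itemsets, the largest being {2,3,5}; item 4 appears in only one of the four transactions and is dropped at the first level.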
Generating association rules from frequent itemsets
Main Ideas
If you look only at the right-hand side of the rules, the procedure is the same as the one used to find frequent itemsets.
A rule defined on a frequent itemset must contain all of its elements, so once the right-hand side of a rule is fixed, the left-hand side = frequent itemset - right-hand side. In the code, H holds the candidate right-hand sides of the rules.
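To make the "left = frequent itemset - right" idea concrete, here is a small sketch. The support table is worked out by hand for the sample data [[1,3,4],[2,3,5],[1,2,3,5],[2,5]], and rule_for is a made-up helper name, not one of the functions in these notes. Confidence is computed as support(frequent itemset) / support(left).

```python
# Hand-computed supports for the sample data (an assumption of this sketch,
# not output captured from the code in these notes).
support = {
    frozenset([2]): 0.75, frozenset([3]): 0.75, frozenset([5]): 0.75,
    frozenset([2, 3]): 0.5, frozenset([2, 5]): 0.75, frozenset([3, 5]): 0.5,
    frozenset([2, 3, 5]): 0.5,
}

def rule_for(freq_set, right):
    """Return (left, right, confidence) for the rule left --> right."""
    left = freq_set - right                    # left side = frequent itemset - right side
    conf = support[freq_set] / support[left]   # confidence of left --> right
    return left, right, conf

freq_set = frozenset([2, 3, 5])
for right in (frozenset([2]), frozenset([3]), frozenset([5])):
    left, right, conf = rule_for(freq_set, right)
    print(sorted(left), '-->', sorted(right), 'conf:', conf)
```

Here {3,5} --> {2} and {2,3} --> {5} have confidence 1.0, while {2,5} --> {3} has confidence 0.5 / 0.75, so it would be dropped with minConf = 0.7.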
Python code
# Main function. Initially the right-hand side of each rule, H, has only one element.
def generateRules(L, supportData, minConf=0.7):
    bigRuleList = []
    for i in range(1, len(L)):
        for freqSet in L[i]:
            H1 = [frozenset([item]) for item in freqSet]
            if i > 1:
                rulesFromConseq(freqSet, H1, supportData, bigRuleList, minConf)
            else:
                calcConf(freqSet, H1, supportData, bigRuleList, minConf)
    return bigRuleList
# Check whether the confidence of each rule meets the requirement.
# Returns prunedH, the collection of right-hand sides that pass.
# brl stores all the rules that meet the requirement.
def calcConf(freqSet, H, supportData, brl, minConf=0.7):
    prunedH = []
    for conseq in H:
        conf = supportData[freqSet] / supportData[freqSet - conseq]
        if conf >= minConf:
            print(freqSet - conseq, '-->', conseq, 'conf:', conf)
            brl.append((freqSet - conseq, conseq, conf))
            prunedH.append(conseq)
    return prunedH
# As with frequent itemsets, try to merge the right-hand sides of the rules (H)
# and then generate new rules from the merged right-hand sides.
def rulesFromConseq(freqSet, H, supportData, brl, minConf=0.7):
    m = len(H[0])
    if len(freqSet) > (m + 1):
        Hmp1 = aprioriGen(H, m + 1)
        Hmp1 = calcConf(freqSet, Hmp1, supportData, brl, minConf)
        if len(Hmp1) > 1:
            rulesFromConseq(freqSet, Hmp1, supportData, brl, minConf)
Notes
Apriori
From Henry.
At each level k, you have k-item sets which are frequent (with sufficient support).
At the next level, the (k+1)-item sets you need to consider must have the property that all of their subsets are frequent (with sufficient support). This is the Apriori property: any subset of a frequent itemset must be frequent.
So if you know at level 2 that the sets {1,2}, {1,3}, {1,5} and {3,5} are the only sets with sufficient support, then at level 3 you join these with each other to produce {1,2,3}, {1,2,5}, {1,3,5} and {2,3,5}, but you need only consider {1,3,5} further: the others each have subsets with insufficient support (such as {2,3} or {2,5}).
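The pruning step in this example can be sketched as follows. This is only an illustration under one assumption: "joining" is read loosely as extending each frequent 2-itemset by one more item that occurs at this level, which is what yields all four candidates listed above. The name extend_and_prune is made up.

```python
from itertools import combinations

def extend_and_prune(prev_level):
    """Generate (k+1)-item candidates from frequent k-itemsets, then keep only
    candidates whose k-item subsets are all frequent (the Apriori property).

    Candidate generation here extends each set by one item; this is an
    assumption chosen to match the example, not the aprioriGen merge.
    """
    prev = set(prev_level)
    k = len(next(iter(prev)))
    items = set().union(*prev)
    candidates = {s | {i} for s in prev for i in items - s}
    kept = {c for c in candidates
            if all(frozenset(sub) in prev for sub in combinations(c, k))}
    return candidates, kept

level2 = [frozenset(s) for s in ({1, 2}, {1, 3}, {1, 5}, {3, 5})]
candidates, kept = extend_and_prune(level2)
print(sorted(sorted(c) for c in candidates))
print(sorted(sorted(c) for c in kept))
```

Only {1,3,5} survives: {1,2,3} and {2,3,5} each contain {2,3}, and {1,2,5} contains {2,5}, neither of which is frequent at level 2.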
Maximal frequent itemsets
None of the itemsets that contain it is frequent.
Closed frequent itemsets
Every itemset that contains it has a smaller support count than it does.
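A minimal sketch of the two definitions, using hand-computed support counts for the sample data [[1,3,4],[2,3,5],[1,2,3,5],[2,5]] and a minimum count of 2. The function name maximal_and_closed and the table are assumptions made for illustration. Checking only frequent supersets is enough here, because a superset with the same count as a frequent itemset is itself frequent.

```python
def maximal_and_closed(support_count, min_count):
    """Split the frequent itemsets into maximal and closed ones."""
    frequent = {s: c for s, c in support_count.items() if c >= min_count}
    maximal, closed = set(), set()
    for s, c in frequent.items():
        supersets = [t for t in frequent if s < t]   # frequent proper supersets
        if not supersets:
            maximal.add(s)        # no itemset containing it is frequent
        if all(frequent[t] < c for t in supersets):
            closed.add(s)         # every itemset containing it has a smaller count
    return maximal, closed

# Hand-computed counts (assumption of this sketch).
support_count = {
    frozenset([1]): 2, frozenset([2]): 3, frozenset([3]): 3,
    frozenset([4]): 1, frozenset([5]): 3,
    frozenset([1, 3]): 2, frozenset([2, 3]): 2,
    frozenset([2, 5]): 3, frozenset([3, 5]): 2,
    frozenset([2, 3, 5]): 2,
}
maximal, closed = maximal_and_closed(support_count, min_count=2)
print(sorted(sorted(s) for s in maximal))
print(sorted(sorted(s) for s in closed))
```

Every maximal itemset is also closed, but not the other way round: {2,5} is closed (count 3, while {2,3,5} has count 2) yet not maximal.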
Exercise12
(a) s({e}) = 0.8, s({b,d}) = 0.2, s({b,d,e}) = 0.2
3
(a) c(∅→A) = s(A)
(b) c1 > c2, c2 < c3 -> c1 >= c2, c2 <= c3
(c) The rules have the same confidence -> support
That is, for left -> right, {left, right} has the same support.
6
(a) 3^6 - 2^6 × 2 + 1 = 602
(b) 4
(c)5+C(4,3)+1+C(4,3) -> C(6,3)
(d) Butter, bread
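The 602 in 6(a) can be checked by brute force. The sketch below assumes d = 6 items and counts rules left -> right where both sides are non-empty and disjoint: each item goes to the left side, the right side, or neither, and assignments with an empty side are skipped. The name count_rules is made up for this check.

```python
from itertools import product

def count_rules(d):
    """Count rules left -> right over d items with non-empty, disjoint sides."""
    count = 0
    # every item is placed on the left side, on the right side, or left unused
    for assignment in product(("left", "right", "unused"), repeat=d):
        if "left" in assignment and "right" in assignment:
            count += 1
    return count

d = 6
print(count_rules(d), 3**d - 2**(d + 1) + 1)
```

Both numbers agree: of the 3^d assignments, 2^d have an empty left side and 2^d have an empty right side, and the single assignment with both sides empty is subtracted twice, so it is added back once.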
7
(b) {1,2,3,4},{1,2,3,5},{1,2,4,5},{1,3,4,5},{2,3,4,5}
(c) {1,2,3,4},{1,2,3,5},// no {1,4,5}, no {2,4,5}
8
- When drawing, it is important to note that when I is not just the time to draw N, but also to draw n when it is n.
- F/total
- I/total
Association analysis: Apriori Python code annotations