The assignment the teacher set this time is simple, and the principle behind its implementation is also clear.
On association rules, the author's humble view is that the essence is still classification: items that can be grouped into one class have some internal relevance. Mining, then, amounts to finding, after classification, the correlations between different items in the same class (that is, why they can be grouped together).
The author picked up a piece of code and found it to be quite portable. The code, the results, and the original URL are posted below for interested readers to reference.
Implementation environment: Python 2.7 on Ubuntu (the version bundled with Ubuntu)
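To make "relevance between items" concrete, here is a minimal sketch of the two standard measures, support and confidence, evaluated on the same four-transaction toy data set that the code in this post uses. The function names `support` and `confidence` are illustrative, not taken from the original code, and confidence is not computed anywhere in the post's listing; it is included only as the usual way to score a rule such as "2 implies 5".

```python
# Toy transactions: the same four baskets used by the code in this post.
transactions = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / float(len(transactions))


def confidence(antecedent, consequent, transactions):
    """support(antecedent union consequent) / support(antecedent)."""
    both = frozenset(antecedent) | frozenset(consequent)
    return support(both, transactions) / support(antecedent, transactions)


# Items 2 and 5 appear together in 3 of the 4 transactions.
print(support({2, 5}, transactions))        # 0.75
# Every transaction that contains 2 also contains 5.
print(confidence({2}, {5}, transactions))   # 1.0
```

An itemset counts as "frequent" when its support reaches a chosen threshold; the post uses 0.5.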
The code is as follows:
# -*- coding: utf-8 -*-
"""
Apriori exercise.
Created on Fri Nov 11:09:03 2015
@author: 90zeng
"""


def loadDataSet():
    """Create a simple data set for testing."""
    return [[1, 3, 4], [2, 3, 5], [1, 2, 3, 5], [2, 5]]


def createC1(dataSet):
    """
    Build the initial list of candidate itemsets C1, that is, the
    collection of all candidate itemsets of size 1.
    """
    C1 = []
    for transaction in dataSet:
        for item in transaction:
            if [item] not in C1:
                C1.append([item])
    C1.sort()
    return map(frozenset, C1)


def scanD(D, Ck, minSupport):
    """
    Compute the support in data set D (the transactions) of the itemsets
    in Ck. Return the itemsets that satisfy the minimum support, and a
    dictionary holding the support of the itemsets.
    """
    ssCnt = {}
    for tid in D:        # for every transaction
        for can in Ck:   # for each candidate itemset
            # check whether the candidate is part of (supported by) the transaction
            if can.issubset(tid):
                ssCnt[can] = ssCnt.get(can, 0) + 1
    numItems = float(len(D))
    retList = []
    supportData = {}
    for key in ssCnt:
        # support of each itemset
        support = ssCnt[key] / numItems
        # add the itemsets that satisfy the minimum support to retList
        if support >= minSupport:
            retList.insert(0, key)
        # record the support value
        supportData[key] = support
    return retList, supportData

#######################################
if __name__ == '__main__':
    # import the data set
    myDat = loadDataSet()
    # build the first list of candidate itemsets C1
    C1 = createC1(myDat)
    # build the set representation D of the data set
    D = map(set, myDat)
    # select the itemsets with support of at least 0.5 as frequent itemsets
    L, suppData = scanD(D, C1, 0.5)
    print u"frequent itemsets L:", L
    print u"support level information for all candidate sets:", suppData

#################################################
# Apriori algorithm
def aprioriGen(Lk, k):
    """
    Generate a new list of candidate itemsets from the frequent itemsets
    Lk; k is the number of elements in each newly generated itemset.
    """
    retList = []
    lenLk = len(Lk)
    for i in range(lenLk):
        for j in range(i + 1, lenLk):
            # sort before slicing so the prefix comparison does not
            # depend on set iteration order
            L1 = sorted(Lk[i])[:k - 2]
            L2 = sorted(Lk[j])[:k - 2]
            if L1 == L2:
                retList.append(Lk[i] | Lk[j])
    return retList


def apriori(dataSet, minSupport=0.5):
    # build the initial candidate set C1
    C1 = createC1(dataSet)
    # convert the data set to sets, as scanD requires
    D = map(set, dataSet)
    # build the initial frequent itemsets, each containing a single element
    L1, suppData = scanD(D, C1, minSupport)
    L = [L1]
    # each itemset in L1 contains one element, so the newly generated
    # itemsets should contain 2 elements, hence k = 2
    k = 2
    while len(L[k - 2]) > 0:
        Ck = aprioriGen(L[k - 2], k)
        Lk, supK = scanD(D, Ck, minSupport)
        # add the support data of the new itemsets to the overall dictionary
        suppData.update(supK)
        # append the itemsets that meet the minimum support to L
        L.append(Lk)
        # the size of the newly generated itemsets keeps increasing
        k += 1
    # return all frequent itemsets that meet the criteria, and the
    # support information for all candidate sets
    return L, suppData

###################################################
if __name__ == '__main__':
    # import the data set
    myDat = loadDataSet()
    # select the frequent itemsets
    L, suppData = apriori(myDat, 0.5)
    print u"frequent itemsets L:", L
    print u"support level information for all candidate sets:", suppData
The result is:
The meaning of the results can be read off directly. This is also an advantage of Python: because it is open source, a large body of existing code is available to draw on.
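The post's code targets Python 2.7, where `map` returns a list. Under Python 3, `map` returns a one-shot iterator, so `len(D)` raises a `TypeError` and repeated scans would see an exhausted iterator. The following is a minimal sketch of the first pass only (size-1 candidates and their support), adapted to run on Python 3; the snake_case names `create_c1` and `scan_d` are this sketch's own renaming, not the original author's identifiers.

```python
def create_c1(data_set):
    """All size-1 candidate itemsets, as a sorted list of frozensets."""
    items = sorted({item for transaction in data_set for item in transaction})
    return [frozenset([item]) for item in items]


def scan_d(D, Ck, min_support):
    """Count each candidate in Ck over the transactions D.

    Returns the candidates whose support is at least min_support, and a
    dict mapping every counted candidate to its support.
    """
    counts = {}
    for tid in D:
        for can in Ck:
            if can.issubset(tid):
                counts[can] = counts.get(can, 0) + 1
    num_items = float(len(D))
    frequent, support_data = [], {}
    for key, count in counts.items():
        support = count / num_items
        if support >= min_support:
            frequent.append(key)
        support_data[key] = support
    return frequent, support_data


data = [[1, 3, 4], [2, 3, 5], [1, 2, 3, 5], [2, 5]]
D = list(map(set, data))   # list(...) is the change needed on Python 3
C1 = create_c1(data)
L1, supp = scan_d(D, C1, 0.5)

for itemset in sorted(L1, key=sorted):
    print(sorted(itemset), supp[itemset])
# [1] 0.5
# [2] 0.75
# [3] 0.75
# [5] 0.75
```

Item 4 appears in only one of the four transactions (support 0.25), so it falls below the 0.5 threshold and is dropped from the frequent itemsets.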
Original website: https://www.cnblogs.com/90zeng/p/apriori.html
You are welcome to discuss.