Apriori Algorithm Implementation

Last Update: 2017-12-07 Source: Internet

Author: User


The assignment our teacher set this time is simple, and the principle behind its implementation is also very clear.

As for association rules, my own humble view is that they are at heart still a form of classification: items that can be grouped into one category have some internal relevance. The essence of mining, then, is to find, after grouping, the correlations among the different items in the same class (that is, why they can be grouped together).
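To make the idea of relevance between items concrete, association-rule mining usually measures it with support and confidence. The following is a minimal sketch, assuming the four-transaction test data set used by the code in this article; the helper names support and confidence are hypothetical and are not part of the original code.

```python
# Illustrative sketch: support and confidence on a toy data set.
# The helper names below are hypothetical, not taken from the original post.

transactions = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset.issubset(t))
    return float(hits) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimate P(consequent | antecedent) as support(A union B) / support(A).

    Assumes the antecedent occurs in at least one transaction.
    """
    both = frozenset(antecedent) | frozenset(consequent)
    return support(both, transactions) / support(antecedent, transactions)


print(support({2, 5}, transactions))       # 0.75
print(confidence({2}, {5}, transactions))  # 1.0
```

Here every transaction that contains item 2 also contains item 5, which is the kind of correlation within a group that the mining step is meant to surface.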

I just ran a piece of code and found that it is very portable. The code, the results, and the original URL are posted below for interested readers to refer to.

Implementation environment: Python 2.7 on Ubuntu (the version that ships with Ubuntu)
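The choice of Python 2.7 matters for the listing below: it passes the result of map around as the transaction collection and scans it more than once, which works because map returns a list in Python 2. Under Python 3 (an assumption; this article only covers 2.7), map returns a one-shot iterator, so a second scan would see no transactions, and the print statements would also need to become function calls. A minimal sketch of the iterator behaviour, with hypothetical variable names:

```python
# Illustrative sketch, assuming Python 3: map() returns a one-shot iterator.
data_set = [[1, 3, 4], [2, 3, 5], [1, 2, 3, 5], [2, 5]]
target = {2, 5}

# A lazy map object can be consumed only once.
lazy_transactions = map(set, data_set)
first_pass = sum(1 for t in lazy_transactions if target <= t)
second_pass = sum(1 for t in lazy_transactions if target <= t)  # iterator is exhausted

# Materializing the transactions as a list makes repeated scans safe.
transactions = list(map(set, data_set))
first_scan = sum(1 for t in transactions if target <= t)
second_scan = sum(1 for t in transactions if target <= t)

print(first_pass, second_pass)  # 3 0 under Python 3
print(first_scan, second_scan)  # 3 3
```

Wrapping each map call in list() is one way to port the listing to Python 3 without changing its logic.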

The code is as follows:

# -*- coding: utf-8 -*-
"""
Apriori exercise.
Created on Fri Nov 11:09:03 2015
@author: 90zeng
"""


def loadDataSet():
    '''Create a simple data set for testing.'''
    return [[1, 3, 4], [2, 3, 5], [1, 2, 3, 5], [2, 5]]


def createC1(dataSet):
    '''
    Build the initial list of candidate itemsets, in which every candidate
    contains exactly one element. C1 is the collection of all candidate
    itemsets of size 1.
    '''
    C1 = []
    for transaction in dataSet:
        for item in transaction:
            if [item] not in C1:
                C1.append([item])
    C1.sort()
    return map(frozenset, C1)


def scanD(D, Ck, minSupport):
    '''
    Compute the support in data set D (the transactions) of each itemset
    in Ck. Return the list of itemsets that meet the minimum support, and
    a dictionary with the support of every itemset that was counted.
    '''
    ssCnt = {}
    for tid in D:
        # for every transaction
        for can in Ck:
            # for each candidate, check whether it is contained in the
            # transaction, i.e. whether the transaction supports it
            if can.issubset(tid):
                ssCnt[can] = ssCnt.get(can, 0) + 1
    numItems = float(len(D))
    retList = []
    supportData = {}
    for key in ssCnt:
        # support of each itemset
        support = ssCnt[key] / numItems
        # add the itemsets that meet the minimum support to retList
        if support >= minSupport:
            retList.insert(0, key)
        # record the support of every counted itemset
        supportData[key] = support
    return retList, supportData

#######################################
if __name__ == '__main__':
    # load the data set
    myDat = loadDataSet()
    # build the first list of candidate itemsets, C1
    C1 = createC1(myDat)
    # represent the data set D as a list of sets
    D = map(set, myDat)
    # keep the itemsets whose support is at least 0.5 as frequent itemsets
    L, suppData = scanD(D, C1, 0.5)
    print u"frequent itemsets L:", L
    print u"support level information for all candidate sets:", suppData


################################################# Apriori algorithm
def aprioriGen(Lk, k):
    '''
    Generate new candidate itemsets from the frequent itemsets in Lk.
    k is the number of elements in each newly generated itemset.
    '''
    retList = []
    lenLk = len(Lk)
    for i in range(lenLk):
        for j in range(i + 1, lenLk):
            # sort before slicing so that the compared prefix does not
            # depend on the iteration order of the sets
            L1 = sorted(Lk[i])[:k - 2]
            L2 = sorted(Lk[j])[:k - 2]
            if L1 == L2:
                retList.append(Lk[i] | Lk[j])
    return retList


def apriori(dataSet, minSupport=0.5):
    # build the initial candidate set C1
    C1 = createC1(dataSet)
    # convert the data set to the list-of-sets format that scanD expects
    D = map(set, dataSet)
    # build the initial frequent itemsets, each containing one element
    L1, suppData = scanD(D, C1, minSupport)
    L = [L1]
    # every itemset in L1 contains one element, and the next itemsets to
    # generate should contain two, so k = 2
    k = 2
    while (len(L[k - 2]) > 0):
        Ck = aprioriGen(L[k - 2], k)
        Lk, supK = scanD(D, Ck, minSupport)
        # merge the support data of the new itemsets into the overall dictionary
        suppData.update(supK)
        # add the itemsets that meet the minimum support to L
        L.append(Lk)
        # the size of the next generated itemsets grows by one
        k += 1
    # return the list of all frequent itemsets and the support
    # information for all candidate itemsets
    return L, suppData

###################################################
if __name__ == '__main__':
    # load the data set
    myDat = loadDataSet()
    # mine the frequent itemsets
    L, suppData = apriori(myDat, 0.5)
    print u"frequent itemsets L:", L
    print u"support level information for all candidate sets:", suppData
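The candidate-generation step is the core of the code above: two frequent itemsets of size k-1 are merged only when their first k-2 items (in sorted order) agree, so each size-k candidate is produced once. The following is a minimal sketch of that join in isolation. The name join_candidates is hypothetical, and the input lists are the frequent 1-itemsets and 2-itemsets that the test data yields at a minimum support of 0.5.

```python
# Illustrative sketch of the prefix join used to build candidate itemsets.
# join_candidates is a hypothetical name, not taken from the original post.

def join_candidates(frequent_prev, k):
    """Build size-k candidates from a list of frequent itemsets of size k-1."""
    candidates = []
    n = len(frequent_prev)
    for i in range(n):
        for j in range(i + 1, n):
            # Sort first, then take the prefix, so that the comparison
            # does not depend on set iteration order.
            prefix_i = sorted(frequent_prev[i])[:k - 2]
            prefix_j = sorted(frequent_prev[j])[:k - 2]
            if prefix_i == prefix_j:
                candidates.append(frequent_prev[i] | frequent_prev[j])
    return candidates


# Frequent 1-itemsets: the prefix is empty, so every pair is joined.
l1 = [frozenset([1]), frozenset([2]), frozenset([3]), frozenset([5])]
c2 = join_candidates(l1, 2)
print(len(c2))  # 6

# Frequent 2-itemsets: only {2, 3} and {2, 5} share the prefix [2].
l2 = [frozenset([1, 3]), frozenset([2, 3]), frozenset([2, 5]), frozenset([3, 5])]
c3 = join_candidates(l2, 3)
print(c3 == [frozenset([2, 3, 5])])  # True
```

Comparing prefixes rather than merging every pair is what prevents the same three-item candidate from being generated several times, for example from {2, 3} and {3, 5} as well as from {2, 3} and {2, 5}.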

The result is:

The meaning of the results is intuitive. This is also an advantage of Python: because it is open source, the pool of available code is large.

Original website: https://www.cnblogs.com/90zeng/p/apriori.html

You are welcome to discuss.
