Apriori Algorithm Implementation

Source: Internet
Author: User

The assignment the teacher set this time is simple, and the principle behind its implementation is also very clear.

As for association rules, in my humble opinion their essence is still the idea of classification: items that can be grouped into one category have some internal relevance to each other. Mining, then, amounts to finding, after that grouping, the correlations among the different items of the same class (that is, why they can be placed in the same class).
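To make "correlation among items" concrete, here is a minimal sketch of the two usual measures, support and confidence, computed on the small test data set used in the code later in this post. The helper names `support` and `confidence` are hypothetical and chosen only for this illustration; they are not part of the original code.

```python
# The four test transactions used by the code in this post.
transactions = [[1, 3, 4], [2, 3, 5], [1, 2, 3, 5], [2, 5]]


def support(itemset, transactions):
    """Fraction of transactions that contain every item of itemset."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset.issubset(t))
    return hits / float(len(transactions))


def confidence(antecedent, consequent, transactions):
    """support(antecedent and consequent together) / support(antecedent)."""
    both = frozenset(antecedent) | frozenset(consequent)
    return support(both, transactions) / support(antecedent, transactions)


print(support([2, 5], transactions))       # 0.75
print(confidence([2], [5], transactions))  # 1.0
print(confidence([3], [1], transactions))  # about 0.667
```

Items 2 and 5 appear together in three of the four transactions, and every transaction that contains 2 also contains 5, which is the kind of relationship the mining step is meant to surface.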

I just put together a piece of code and found that it is quite portable. The code, the results, and the original URL are posted below for interested readers to refer to.

Implementation environment: Python 2.7 on Ubuntu (preinstalled with Ubuntu)

The code is as follows:

# -*- coding: utf-8 -*-
"""
Apriori exercise.
Created on Fri Nov 11:09:03 2015
@author: 90zeng
"""
# Written for Python 2.7; print_function and list(map(...)) also keep it
# runnable on Python 3.
from __future__ import print_function


def loadDataSet():
    """Create a simple data set for testing."""
    return [[1, 3, 4], [2, 3, 5], [1, 2, 3, 5], [2, 5]]


def createC1(dataSet):
    """
    Build the list of initial candidate itemsets, each containing exactly
    one element. C1 is the collection of all candidate itemsets of size 1.
    """
    C1 = []
    for transaction in dataSet:
        for item in transaction:
            if [item] not in C1:
                C1.append([item])
    C1.sort()
    return list(map(frozenset, C1))


def scanD(D, Ck, minSupport):
    """
    Compute the support of each itemset in Ck over the data set D (the
    records, or transactions). Return the list of itemsets that satisfy
    the minimum support, and a dictionary with the support of every
    itemset that was counted.
    """
    ssCnt = {}
    for tid in D:          # for every transaction
        for can in Ck:     # for every candidate itemset
            # check whether the candidate is contained in the transaction
            if can.issubset(tid):
                ssCnt[can] = ssCnt.get(can, 0) + 1
    numItems = float(len(D))
    retList = []
    supportData = {}
    for key in ssCnt:
        # support of this itemset
        support = ssCnt[key] / numItems
        # keep the itemsets that satisfy the minimum support
        if support >= minSupport:
            retList.insert(0, key)
        # record the support of every counted itemset
        supportData[key] = support
    return retList, supportData

#######################################
if __name__ == '__main__':
    # Load the data set
    myDat = loadDataSet()
    # Build the first list of candidate itemsets, C1
    C1 = createC1(myDat)
    # Represent the data set D as a list of sets
    D = list(map(set, myDat))
    # Keep the itemsets with support of at least 0.5 as frequent itemsets
    L, suppData = scanD(D, C1, 0.5)
    print(u"frequent itemsets L:", L)
    print(u"support level information for all candidate sets:", suppData)

#################################################
# Apriori algorithm
def aprioriGen(Lk, k):
    """
    Generate the new candidate itemsets from the frequent itemsets Lk;
    k is the number of elements in each newly generated itemset.
    """
    retList = []
    lenLk = len(Lk)
    for i in range(lenLk):
        for j in range(i + 1, lenLk):
            # Sort before slicing: the iteration order of a frozenset is
            # not guaranteed, so the first k-2 items must be taken from
            # the sorted itemset.
            L1 = sorted(Lk[i])[:k - 2]
            L2 = sorted(Lk[j])[:k - 2]
            if L1 == L2:
                retList.append(Lk[i] | Lk[j])
    return retList


def apriori(dataSet, minSupport=0.5):
    # Build the initial candidate set C1
    C1 = createC1(dataSet)
    # Convert the data set into the list-of-sets format scanD expects
    D = list(map(set, dataSet))
    # Build the initial frequent itemsets, each containing one element
    L1, suppData = scanD(D, C1, minSupport)
    L = [L1]
    # Each itemset in L1 contains one element; the newly generated
    # itemsets should contain 2 elements, so k = 2
    k = 2
    while len(L[k - 2]) > 0:
        Ck = aprioriGen(L[k - 2], k)
        Lk, supK = scanD(D, Ck, minSupport)
        # Add the support data of the new itemsets to the overall dictionary
        suppData.update(supK)
        # Add the itemsets that meet the minimum support to L
        L.append(Lk)
        # The size of the newly generated itemsets keeps growing
        k += 1
    # Return all frequent itemsets that meet the criteria, and the support
    # information for all candidate itemsets
    return L, suppData

###################################################
if __name__ == '__main__':
    # Load the data set
    myDat = loadDataSet()
    # Find the frequent itemsets
    L, suppData = apriori(myDat, 0.5)
    print(u"frequent itemsets L:", L)
    print(u"support level information for all candidate sets:", suppData)

The result is:

The meaning of the output can be read directly from the results. This is also an advantage of Python: it is open source, so there is a large body of existing code to draw on.
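The candidate-generation step of the code above is the least obvious part, so here is a small stand-alone sketch of it. It assumes the frequent 2-itemsets that the test data yields at a minimum support of 0.5, and compares the prefix-based join with a naive union of all pairs. The names `join_step` and `naive` are hypothetical and used only for this illustration.

```python
def join_step(Lk, k):
    """Merge pairs of (k-1)-itemsets whose first k-2 sorted items agree."""
    candidates = []
    for i in range(len(Lk)):
        for j in range(i + 1, len(Lk)):
            if sorted(Lk[i])[:k - 2] == sorted(Lk[j])[:k - 2]:
                candidates.append(Lk[i] | Lk[j])
    return candidates


# Frequent 2-itemsets of the test data at minimum support 0.5.
L2 = [frozenset(s) for s in ([1, 3], [2, 3], [2, 5], [3, 5])]

# Prefix-based join: only {2, 3} and {2, 5} share their first item.
C3 = join_step(L2, 3)
print([sorted(c) for c in C3])             # [[2, 3, 5]]

# Naive alternative: every union of two itemsets that has 3 elements.
naive = set(a | b for a in L2 for b in L2 if len(a | b) == 3)
print(sorted(sorted(c) for c in naive))    # [[1, 2, 3], [1, 3, 5], [2, 3, 5]]
```

The prefix comparison produces each candidate once and, in this case, skips {1, 2, 3} and {1, 3, 5}, neither of which can be frequent because {1, 2} and {1, 5} are not frequent.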

Original website: https://www.cnblogs.com/90zeng/p/apriori.html

Comments and discussion are welcome.
