Python-based Apriori algorithm

Last Update:2015-09-09 Source: Internet

Author: User


The Apriori algorithm is a basic algorithm for mining association rules. It was proposed by Rakesh Agrawal and Ramakrishnan Srikant in 1994. Association rules identify relationships between items in a dataset. This kind of mining is also known as market basket analysis, because analyzing shopping baskets is a typical scenario to which the algorithm applies.
For more information about the algorithm, see the following link:
Detailed explanation of the Apriori algorithm
Next, I will show how to implement the Apriori algorithm in code. The steps are as follows:
1. Create an Apriori class

    class Apriori:
        def __init__(self, min_sup=0.2, dataDic={}):
            self.data = dataDic  # dictionary of records, e.g. {'T800': ['I1', 'I2', 'I3', 'I5'], ...}
            self.size = len(dataDic)  # number of records
            self.min_sup = min_sup  # minimum support threshold
            self.min_sup_val = min_sup * self.size  # minimum support count
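To make the relation between the threshold and the count concrete, here is a minimal sketch. The helper name `min_support_count` is hypothetical and not part of the class above; it only repeats the `min_sup * size` arithmetic from the constructor and rounds the result up to a whole number of records.

```python
import math

def min_support_count(min_sup, num_records):
    """Convert a relative support threshold into an absolute record count."""
    return min_sup * num_records

# With the default threshold of 0.2 and nine records, an itemset must
# reach a count of about 1.8, which in practice means two or more records.
threshold = min_support_count(0.2, 9)
records_needed = int(math.ceil(threshold))
print(records_needed)  # 2
```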

2. Filter out items below the minimum support threshold

        def find_frequent_1_itemsets(self):
            FreqDic = {}  # {item1: freq1, item2: freq2}, the support count of each item
            for event in self.data:  # event is the key of a record, for example T800
                for item in self.data[event]:  # item is I1, I2, I3, I4 or I5
                    if item in FreqDic:
                        FreqDic[item] += 1
                    else:
                        FreqDic[item] = 1
            L1 = []
            for itemset in FreqDic:
                if FreqDic[itemset] >= self.min_sup_val:  # drop items below the minimum support count
                    L1.append([itemset])
            return L1
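The same counting step can be sketched with `collections.Counter` from the standard library. This is an alternative illustration, not the code from this article: the function name `frequent_single_items` is hypothetical, the three records are just the first three from the test data, and unlike the method above it counts an item at most once per record.

```python
from collections import Counter

# Sample records, in the same {record_id: [items]} shape the class expects.
records = {
    'T100': ['I1', 'I2', 'I5'],
    'T200': ['I2', 'I4'],
    'T300': ['I2', 'I3'],
}

def frequent_single_items(data, min_count):
    """Return sorted one-item itemsets whose support count is at least min_count."""
    counts = Counter()
    for items in data.values():
        counts.update(set(items))  # count each item at most once per record
    return [[item] for item in sorted(counts) if counts[item] >= min_count]

print(frequent_single_items(records, 2))  # [['I2']]
```

Only I2 appears in two or more of the three sample records, so it is the only item that survives the filter.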

3. Filter out candidates that have an infrequent subset

        def has_infrequent_subset(self, c, L_last, k):
            # c is the candidate itemset, L_last is the list of frequent (k-1)-itemsets,
            # and k is the number of items in the candidate.
            # Returns True if any (k-1)-subset of c is not a frequent itemset.
            subsets = list(itertools.combinations(c, k - 1))  # e.g. [1, 2, 3] gives [(1, 2), (1, 3), (2, 3)]
            for each in subsets:
                each = list(each)  # convert the tuple to a list
                if each not in L_last:
                    return True
            return False  # all subsets are frequent itemsets

Note:
itertools is a module for permutations and combinations. For example, list(itertools.combinations([1, 2, 3], 2)) returns [(1, 2), (1, 3), (2, 3)].
For details on its use, see: http://www.jb51.net/article/34921.htm
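A short sketch of how `itertools.combinations` supports the pruning check. The lists `frequent_pairs` and `candidate` are illustrative inputs chosen for this example, not output from the class.

```python
import itertools

frequent_pairs = [['I1', 'I2'], ['I1', 'I3'], ['I2', 'I3']]
candidate = ['I1', 'I2', 'I3']

# All 2-item subsets of the 3-item candidate, returned as tuples.
subsets = list(itertools.combinations(candidate, 2))
print(subsets)  # [('I1', 'I2'), ('I1', 'I3'), ('I2', 'I3')]

# The candidate survives pruning only if every subset is already frequent.
all_frequent = all(list(s) in frequent_pairs for s in subsets)
print(all_frequent)  # True
```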

4. Join frequent itemsets to form new candidate itemsets

        def apriori_gen(self, L_last):  # L_last is the list of frequent (k-1)-itemsets
            k = len(L_last[0]) + 1
            Ck = []
            for itemset1 in L_last:
                for itemset2 in L_last:
                    # join step
                    flag = 0
                    for i in range(k - 2):
                        print(k - 2)
                        if itemset1[i] != itemset2[i]:
                            flag = 1  # if any of the first k-2 items differ, the two itemsets cannot be joined
                            break
                    if flag == 1:
                        continue
                    if itemset1[k - 2] < itemset2[k - 2]:
                        c = itemset1 + [itemset2[k - 2]]
                    else:
                        continue
                    # prune step
                    if self.has_infrequent_subset(c, L_last, k):  # skip candidates with an infrequent subset
                        continue
                    else:
                        Ck.append(c)
            return Ck
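The join and prune steps can also be tried out on their own. The sketch below is a standalone variant under a few assumptions: the function name `generate_candidates` is hypothetical, list slicing replaces the flag loop, and the input `L2` is a hand-built list of frequent 2-itemsets whose items are sorted inside each itemset.

```python
import itertools

def generate_candidates(prev_frequent):
    """Join frequent (k-1)-itemsets that share their first k-2 items, then prune."""
    k = len(prev_frequent[0]) + 1
    known = set(tuple(itemset) for itemset in prev_frequent)
    candidates = []
    for a in prev_frequent:
        for b in prev_frequent:
            # Join step: same prefix, and a's last item sorts before b's.
            if a[:k - 2] == b[:k - 2] and a[k - 2] < b[k - 2]:
                c = a + [b[k - 2]]
                # Prune step: every (k-1)-subset must itself be frequent.
                if all(s in known for s in itertools.combinations(c, k - 1)):
                    candidates.append(c)
    return candidates

L2 = [['I1', 'I2'], ['I1', 'I3'], ['I1', 'I5'],
      ['I2', 'I3'], ['I2', 'I4'], ['I2', 'I5']]
print(generate_candidates(L2))  # [['I1', 'I2', 'I3'], ['I1', 'I2', 'I5']]
```

Candidates such as ['I1', 'I3', 'I5'] are joined but then pruned, because ['I3', 'I5'] is not in `L2`.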

5. Iterate to build all frequent itemsets

        def do(self):
            L_last = self.find_frequent_1_itemsets()  # items that reach the minimum support count
            L = L_last
            while L_last != []:
                Ck = self.apriori_gen(L_last)  # join and prune to form the candidate itemsets
                FreqDic = {}
                for event in self.data:  # count the support of each candidate
                    for c in Ck:
                        if set(c) <= set(self.data[event]):  # is the candidate a subset of this record?
                            if tuple(c) in FreqDic:
                                FreqDic[tuple(c)] += 1
                            else:
                                FreqDic[tuple(c)] = 1
                print(FreqDic)
                Lk = []
                for c in FreqDic:
                    print(c)
                    print('------')
                    if FreqDic[c] >= self.min_sup_val:  # keep candidates that reach the minimum support count
                        Lk.append(list(c))
                L_last = Lk
                L += Lk
            return L  # L holds all frequent itemsets found
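The support-counting part of the loop relies on Python's set subset test (`<=`). Here is a minimal standalone sketch; the function name `count_support` is hypothetical, and the three records are a small excerpt chosen for illustration.

```python
def count_support(data, candidates):
    """Count, for each candidate itemset, the records that contain all of its items."""
    counts = {}
    for items in data.values():
        record = set(items)
        for c in candidates:
            if set(c) <= record:  # subset test: every item of c is in the record
                counts[tuple(c)] = counts.get(tuple(c), 0) + 1
    return counts

records = {
    'T100': ['I1', 'I2', 'I5'],
    'T800': ['I1', 'I2', 'I3', 'I5'],
    'T900': ['I1', 'I2', 'I3'],
}
counts = count_support(records, [['I1', 'I2', 'I3'], ['I1', 'I2', 'I5']])
print(counts[('I1', 'I2', 'I3')], counts[('I1', 'I2', 'I5')])  # 2 2
```

Candidates are stored under tuple keys because lists are not hashable and cannot be used as dictionary keys.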

Test example
Data = {'T100':['I1','I2','I5'], 'T200':['I2','I4'], 'T300':['I2','I3'], 'T400':['I1','I2','I4'], 'T500':['I1','I3'], 'T600':['I2','I3'], 'T700':['I1','I3'], 'T800':['I1','I2','I3','I5'], 'T900':['I1','I2','I3']}
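As a cross-check for this data, the frequent itemsets can also be found by brute force: enumerate every possible itemset and count its support directly. This sketch is not part of the Apriori code; the function name `brute_force_frequent` is hypothetical, and it assumes the same default threshold of 0.2. It is only practical for a handful of distinct items.

```python
import itertools

Data = {'T100': ['I1', 'I2', 'I5'], 'T200': ['I2', 'I4'], 'T300': ['I2', 'I3'],
        'T400': ['I1', 'I2', 'I4'], 'T500': ['I1', 'I3'], 'T600': ['I2', 'I3'],
        'T700': ['I1', 'I3'], 'T800': ['I1', 'I2', 'I3', 'I5'],
        'T900': ['I1', 'I2', 'I3']}

def brute_force_frequent(data, min_sup=0.2):
    """Return {itemset tuple: support count} for every frequent itemset."""
    min_count = min_sup * len(data)
    items = sorted(set(i for rec in data.values() for i in rec))
    result = {}
    for size in range(1, len(items) + 1):
        for combo in itertools.combinations(items, size):
            # Support count: number of records containing every item of combo.
            count = sum(1 for rec in data.values() if set(combo) <= set(rec))
            if count >= min_count:
                result[combo] = count
    return result

frequent = brute_force_frequent(Data)
for itemset in sorted(frequent, key=lambda s: (len(s), s)):
    print(itemset, frequent[itemset])
```

For this data it finds 13 frequent itemsets: five single items, six pairs, and the two triples ('I1', 'I2', 'I3') and ('I1', 'I2', 'I5').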

Complete code:

    # -*- coding: utf-8 -*-
    import itertools


    class Apriori:
        def __init__(self, min_sup=0.2, dataDic={}):
            self.data = dataDic  # dictionary of records, e.g. {'T800': ['I1', 'I2', 'I3', 'I5'], ...}
            self.size = len(dataDic)  # number of records
            self.min_sup = min_sup  # minimum support threshold
            self.min_sup_val = min_sup * self.size  # minimum support count

        def find_frequent_1_itemsets(self):
            FreqDic = {}  # {item1: freq1, item2: freq2}, the support count of each item
            for event in self.data:  # event is the key of a record, for example T800
                for item in self.data[event]:  # item is I1, I2, I3, I4 or I5
                    if item in FreqDic:
                        FreqDic[item] += 1
                    else:
                        FreqDic[item] = 1
            L1 = []
            for itemset in FreqDic:
                if FreqDic[itemset] >= self.min_sup_val:  # drop items below the minimum support count
                    L1.append([itemset])
            return L1

        def has_infrequent_subset(self, c, L_last, k):
            # c is the candidate itemset, L_last is the list of frequent (k-1)-itemsets,
            # and k is the number of items in the candidate.
            # Returns True if any (k-1)-subset of c is not a frequent itemset.
            subsets = list(itertools.combinations(c, k - 1))  # e.g. [1, 2, 3] gives [(1, 2), (1, 3), (2, 3)]
            for each in subsets:
                each = list(each)  # convert the tuple to a list
                if each not in L_last:
                    return True
            return False

        def apriori_gen(self, L_last):  # L_last is the list of frequent (k-1)-itemsets
            k = len(L_last[0]) + 1
            Ck = []
            for itemset1 in L_last:
                for itemset2 in L_last:
                    # join step
                    flag = 0
                    for i in range(k - 2):
                        print(k - 2)
                        if itemset1[i] != itemset2[i]:
                            flag = 1  # if any of the first k-2 items differ, the two itemsets cannot be joined
                            break
                    if flag == 1:
                        continue
                    if itemset1[k - 2] < itemset2[k - 2]:
                        c = itemset1 + [itemset2[k - 2]]
                    else:
                        continue
                    # prune step
                    if self.has_infrequent_subset(c, L_last, k):  # skip candidates with an infrequent subset
                        continue
                    else:
                        Ck.append(c)
            return Ck

        def do(self):
            L_last = self.find_frequent_1_itemsets()  # items that reach the minimum support count
            L = L_last
            while L_last != []:
                Ck = self.apriori_gen(L_last)  # join and prune to form the candidate itemsets
                FreqDic = {}
                for event in self.data:  # count the support of each candidate
                    for c in Ck:
                        if set(c) <= set(self.data[event]):  # is the candidate a subset of this record?
                            if tuple(c) in FreqDic:
                                FreqDic[tuple(c)] += 1
                            else:
                                FreqDic[tuple(c)] = 1
                print(FreqDic)
                Lk = []
                for c in FreqDic:
                    print(c)
                    print('------')
                    if FreqDic[c] >= self.min_sup_val:  # keep candidates that reach the minimum support count
                        Lk.append(list(c))
                L_last = Lk
                L += Lk
            return L  # L holds all frequent itemsets found


    # ******* Test *******
    Data = {'T100': ['I1', 'I2', 'I5'],
            'T200': ['I2', 'I4'],
            'T300': ['I2', 'I3'],
            'T400': ['I1', 'I2', 'I4'],
            'T500': ['I1', 'I3'],
            'T600': ['I2', 'I3'],
            'T700': ['I1', 'I3'],
            'T800': ['I1', 'I2', 'I3', 'I5'],
            'T900': ['I1', 'I2', 'I3']}
    a = Apriori(dataDic=Data)
    # print(a.do())
    a.do()

Copyright Disclaimer: This article is an original article by the blogger and cannot be reproduced without the permission of the blogger.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
