Apriori algorithm Python implementation

Source: Internet
Author: User

1. Introduction to the Apriori algorithm

The Apriori algorithm mines frequent itemsets for Boolean association rules. It uses prior knowledge about the properties of frequent itemsets and performs an iterative, level-wise search in which frequent k-itemsets are used to explore (k+1)-itemsets, until all frequent itemsets in the data set have been found. First find the collection of frequent 1-itemsets, L1; then use L1 to find the collection of frequent 2-itemsets, L2; then use L2 to find L3; and so on, until no more frequent k-itemsets can be found. Finding each Lk requires one scan of the database. Note: all non-empty subsets of a frequent itemset must also be frequent. This Apriori property makes the level-wise generation of frequent itemsets more efficient by reducing the search space. Each level of the algorithm consists of two steps: join and prune.

2. Apriori algorithm steps

Consider an example: a transaction list in which I1 to I5 can be regarded as five kinds of goods. The association rules are derived from the frequent itemsets found in the steps below.

Suppose the minimum support threshold is 2; that is, any itemset whose support count is less than 2 is removed.

The first transaction in the list means that I1, I2 and I5 were purchased together.
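To make the notion of a support count concrete, here is a minimal sketch. It uses the transactions listed in the implementation section, with I1 to I5 written as 1 to 5, and assumes the ninth transaction is [1, 2, 3]. The names `transactions`, `MIN_SUPPORT` and `support_count` are illustrative and do not come from the original code.

```python
# Transactions from the implementation section (I1..I5 written as 1..5).
# Assumption: the ninth transaction is [1, 2, 3].
transactions = [
    [1, 2, 5], [2, 4], [2, 3], [1, 2, 4], [1, 3],
    [2, 3], [1, 3], [1, 2, 3, 5], [1, 2, 3],
]
MIN_SUPPORT = 2  # minimum support count


def support_count(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    items = set(itemset)
    return sum(1 for t in transactions if items.issubset(t))


# I1, I2 and I5 bought together.
print(support_count([1, 2, 5], transactions))
# I1 and I4 bought together: below the threshold, so this itemset is removed.
print(support_count([1, 4], transactions) >= MIN_SUPPORT)
```

An itemset is kept only when its count is at least `MIN_SUPPORT`.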

C1 to L1: simply check whether each support count reaches the threshold and keep the items that do. All support counts in C1 are at least 2, so every item in C1 is kept in L1.
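The C1 to L1 step can be sketched as follows. This is an illustration under the same assumption about the data (ninth transaction [1, 2, 3]), not the author's NumPy code; `c1` and `l1` are hypothetical names.

```python
from collections import Counter

# Assumption: same nine transactions as in the implementation section.
transactions = [
    [1, 2, 5], [2, 4], [2, 3], [1, 2, 4], [1, 3],
    [2, 3], [1, 3], [1, 2, 3, 5], [1, 2, 3],
]
MIN_SUPPORT = 2

# C1: every single item with the number of transactions that contain it.
c1 = Counter(item for t in transactions for item in set(t))

# L1: keep only the items whose count reaches the threshold.
l1 = {item: n for item, n in c1.items() if n >= MIN_SUPPORT}

print(sorted(l1.items()))
```

With this data no item falls below the threshold, so L1 contains all five items.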

The process of L1 to C2 is divided into three steps:

    • Traverse L1 and produce all possible combinations, i.e. (I1,I2) ... (I4,I5).
    • Split each generated combination into its subsets to ensure that all non-empty subsets of a frequent itemset are also frequent. For example, (I1,I2) is split into I1 and I2. Because both I1 and I2 are frequent items in L1, this combination is kept.
    • For the combinations remaining in C2, compute the support count from the original data set.

C2 to L2: again, check whether each support count reaches the threshold and keep the itemsets that do.
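The L1 to C2 to L2 steps above can be sketched with `itertools.combinations`. The data again assumes the ninth transaction is [1, 2, 3], and the variable names (`pairs`, `c2`, `l2`) are illustrative only.

```python
from itertools import combinations

# Assumption: same nine transactions as in the implementation section.
transactions = [
    [1, 2, 5], [2, 4], [2, 3], [1, 2, 4], [1, 3],
    [2, 3], [1, 3], [1, 2, 3, 5], [1, 2, 3],
]
MIN_SUPPORT = 2
l1 = [1, 2, 3, 4, 5]  # frequent 1-itemsets from the previous step

# Step 1: generate every 2-item combination of L1.
pairs = list(combinations(l1, 2))

# Step 2: prune pairs that contain an infrequent item
# (nothing is removed here because every single item is frequent).
pairs = [p for p in pairs if all(item in l1 for item in p)]

# Step 3: count the support of each remaining pair in the original data.
c2 = {p: sum(1 for t in transactions if set(p) <= set(t)) for p in pairs}

# C2 -> L2: keep the pairs that reach the threshold.
l2 = {p: n for p, n in c2.items() if n >= MIN_SUPPORT}
print(sorted(l2.items()))
```

Of the ten candidate pairs, six reach the threshold; pairs such as (1, 4) and (3, 5) are dropped.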

The process of L2 to C3:

The same steps as above apply. First generate (1,2,4), (1,2,5), .... Why does (1,2,5) survive while (1,2,4) does not? Because of the pruning step: (1,2,4) is split into (1,2), (1,4) and (2,4). However, (1,4) does not exist in L2, that is, it is not frequent, so (1,2,4) is pruned. The support counts of the combinations remaining in C3 are then computed. (1,2,3) and (1,2,5) are found to have a support count of 2. End of iteration.
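The pruning step for C3 can be illustrated in a few lines. This sketch takes L2 as derived from the example data (assuming the ninth transaction is [1, 2, 3]) and, for simplicity, enumerates every 3-item combination instead of joining pairs that share a prefix. The helper `has_infrequent_subset` is a hypothetical name.

```python
from itertools import combinations

# Frequent 2-itemsets (L2) for the example data, as sorted tuples.
l2 = {(1, 2), (1, 3), (1, 5), (2, 3), (2, 4), (2, 5)}


def has_infrequent_subset(candidate, frequent_prev):
    """True if some (k-1)-subset of `candidate` is missing from `frequent_prev`."""
    k = len(candidate) - 1
    return any(sub not in frequent_prev for sub in combinations(candidate, k))


# Simplification: generate all 3-item combinations of the items seen in L2.
items = sorted({i for pair in l2 for i in pair})
raw_candidates = list(combinations(items, 3))

# Prune every candidate that has an infrequent 2-item subset.
c3 = [c for c in raw_candidates if not has_infrequent_subset(c, l2)]
print(c3)  # (1, 2, 4) is gone because (1, 4) is not in L2
```

Only the surviving candidates need a support count against the data set, which is where the reduction in search space comes from.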

So the algorithm repeats the cycle Ck → Lk → Ck+1 until no new frequent itemsets are found.
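The whole cycle can be written compactly with frozensets. This is an alternative sketch for illustration, not the NumPy implementation in the next section; the function name `apriori` and its return format are assumptions, and the data assumes the ninth transaction is [1, 2, 3].

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset: support count} for every frequent itemset."""
    tsets = [frozenset(t) for t in transactions]

    def count(candidates):
        # One pass over the data per level.
        return {c: sum(1 for t in tsets if c <= t) for c in candidates}

    def keep_frequent(counts):
        return {c: n for c, n in counts.items() if n >= min_support}

    # C1 -> L1
    items = {i for t in tsets for i in t}
    level = keep_frequent(count({frozenset([i]) for i in items}))

    frequent = {}
    k = 1
    while level:
        frequent.update(level)
        prev = set(level)
        # Join: unions of two frequent k-itemsets that have exactly k+1 items.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k + 1}
        # Prune: every k-item subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k))
        }
        # C(k+1) -> L(k+1)
        level = keep_frequent(count(candidates))
        k += 1
    return frequent


transactions = [
    [1, 2, 5], [2, 4], [2, 3], [1, 2, 4], [1, 3],
    [2, 3], [1, 3], [1, 2, 3, 5], [1, 2, 3],
]
result = apriori(transactions, 2)
largest = max(len(s) for s in result)
print(sorted(sorted(s) for s in result if len(s) == largest))
```

The loop ends when a level produces no frequent itemsets, which for this data happens after L3.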

3. Apriori algorithm implementation

    # -*- coding: utf-8 -*-
    """
    Created on Sat Dec 9 15:33:45 2017
    @author: lps
    """
    import numpy as np
    from itertools import combinations  # iteration tools

    # Nine transactions; items I1..I5 are written as 1..5.
    data = [[1, 2, 5], [2, 4], [2, 3], [1, 2, 4], [1, 3],
            [2, 3], [1, 3], [1, 2, 3, 5], [1, 2, 3]]
    minsp = 2  # minimum support count

    d = []
    for i in range(len(data)):
        d.extend(data[i])
    new_d = sorted(set(d))  # distinct items


    def satisfy(s, s_new, k):
        """Prune: keep candidates whose k-item subsets all exist in s."""
        ss_new = []
        for cand in s_new:
            # Generate every k-item combination of the candidate.
            if all(list(sub) in s for sub in combinations(cand, k)):
                ss_new.append(cand)
        return ss_new


    def count(s_new):
        """Return C as an ndarray; the last column is the support count."""
        C = np.array(s_new, dtype=float)
        C = np.column_stack((C, np.zeros(C.shape[0])))
        for i in range(len(s_new)):
            num = 0
            for j in range(len(data)):
                if all(item in data[j] for item in s_new[i]):
                    num = num + 1
            C[i, -1] = num
        return C


    def limit(L):
        """Remove the rows of C that do not meet the threshold."""
        row = [i for i in range(L.shape[0]) if L[i, -1] < minsp]
        return np.delete(L, row, 0)


    def generate(L, k):
        """Convert L(k) into C(k+1): join, prune, then count."""
        s = [list(L[i, :-1]) for i in range(L.shape[0])]
        s_new = []
        for i in range(L.shape[0] - 1):
            for j in range(i + 1, L.shape[0]):
                if L[j, -2] > L[i, -2]:
                    t = list(s[i])
                    t.append(L[j, -2])
                    s_new.append(t)  # s_new is a list of candidates
        s_new = satisfy(s, s_new, k)
        return count(s_new)


    # The initial C and L
    C = np.zeros([len(new_d), 2])
    for i in range(len(new_d)):
        C[i, :] = np.array([new_d[i], d.count(new_d[i])])
    L = limit(np.copy(C))

    # Start iteration
    k = 1
    while True:
        C = generate(L, k)          # C(k+1) produced from L(k)
        L_next = limit(C)           # L(k+1) produced from C(k+1)
        if L_next.shape[0] == 0:    # no frequent (k+1)-itemsets: stop
            break
        L = L_next
        k = k + 1

    # Deduplicate the final result; each tuple is (items..., support count).
    print(sorted(set(tuple(float(x) for x in t) for t in L)))
    # Result: [(1.0, 2.0, 3.0, 2.0), (1.0, 2.0, 5.0, 2.0)]
