1. Introduction to the Apriori algorithm
The Apriori algorithm is an algorithm for mining frequent itemsets for Boolean association rules. It relies on prior knowledge of a property of frequent itemsets. It performs an iterative, level-wise search in which the frequent k-itemsets are used to explore the (k+1)-itemsets, until all frequent itemsets in the data set have been found. First, the set of frequent 1-itemsets, L1, is found. L1 is then used to find the set of frequent 2-itemsets, L2, and L2 is used to find L3. This continues until no more frequent k-itemsets can be found, and finding each Lk requires one scan of the database. Note (the Apriori property): every non-empty subset of a frequent itemset must also be frequent. This property makes the level-wise generation of frequent itemsets more efficient by shrinking the search space. Each level of the algorithm consists of two steps: join and prune.
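The Apriori property can be turned directly into the check used by the prune step. The following is a minimal sketch. It assumes itemsets are stored as Python `frozenset`s of integers, with item In encoded as n. `has_infrequent_subset` is a hypothetical helper name, not part of the code in Section 3. The `L2` set below holds the frequent 2-itemsets that the example transactions in this article yield with a minimum support count of 2.

```python
from itertools import combinations


def has_infrequent_subset(candidate, frequent_k):
    """Return True if some k-item subset of a (k+1)-itemset is not frequent.

    candidate:  frozenset with k + 1 items
    frequent_k: set of frozensets, the frequent k-itemsets Lk
    """
    k = len(candidate) - 1
    return any(frozenset(sub) not in frequent_k
               for sub in combinations(sorted(candidate), k))


# Frequent 2-itemsets of the example data (minimum support count 2)
L2 = {frozenset(p) for p in [(1, 2), (1, 3), (1, 5), (2, 3), (2, 4), (2, 5)]}

print(has_infrequent_subset(frozenset({1, 2, 5}), L2))  # False: candidate is kept
print(has_infrequent_subset(frozenset({1, 2, 4}), L2))  # True: {1, 4} is infrequent
```

If a candidate has even one infrequent subset, it can be discarded without scanning the database. This is where the savings come from.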
2. Apriori algorithm steps
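Consider an example transaction list in which I1 to I5 can be regarded as five kinds of goods. The nine transactions are {I1, I2, I5}, {I2, I4}, {I2, I3}, {I1, I2, I4}, {I1, I3}, {I2, I3}, {I1, I3}, {I1, I2, I3, I5} and {I1, I2, I3}. This is the same data used in the code in Section 3. Association rules are then derived from the frequent itemsets found below.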
Suppose the minimum support count threshold is 2. That is, any itemset whose support count is less than 2 is removed.
The first row of the transaction table (the first transaction) means that I1, I2 and I5 were purchased together.
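The support count of an itemset is the number of transactions that contain all of its items. The sketch below assumes the example transactions are encoded as sets of integers, with item In becoming n as in the code of Section 3. `support_count` and `MIN_SUPPORT` are illustrative names.

```python
# The example transactions, with item In encoded as the integer n
transactions = [
    {1, 2, 5}, {2, 4}, {2, 3}, {1, 2, 4}, {1, 3},
    {2, 3}, {1, 3}, {1, 2, 3, 5}, {1, 2, 3},
]
MIN_SUPPORT = 2  # minimum support count threshold


def support_count(itemset, transactions):
    """Count the transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if set(itemset) <= t)


print(support_count({1, 2, 5}, transactions))  # 2: first and eighth transactions
print(support_count({1, 4}, transactions))     # 1: below MIN_SUPPORT
```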
C1 to L1: simply keep each candidate whose support count is at least the threshold. Every support count in C1 is at least 2 (I4 and I5 reach exactly 2), so all candidates are kept in L1.
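This step can be sketched with `collections.Counter`, using the same integer encoding of the transactions (an assumption carried over from the code in Section 3). The result shows that I4 and I5 sit exactly at the threshold and are therefore kept.

```python
from collections import Counter

transactions = [
    {1, 2, 5}, {2, 4}, {2, 3}, {1, 2, 4}, {1, 3},
    {2, 3}, {1, 3}, {1, 2, 3, 5}, {1, 2, 3},
]
MIN_SUPPORT = 2

# C1: every single item with its support count (one scan of the data)
C1 = Counter(item for t in transactions for item in t)

# L1: keep the items whose count is at least MIN_SUPPORT
L1 = {item: n for item, n in C1.items() if n >= MIN_SUPPORT}

print(sorted(L1.items()))  # [(1, 6), (2, 7), (3, 6), (4, 2), (5, 2)]
```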
The process of L1 to C2 is divided into three steps:
- Join: enumerate all possible pairs of the items in L1, i.e. (I1,I2) ... (I4,I5).
- Prune: split each generated combination into its subsets to check the property that every non-empty subset of a frequent itemset must also be frequent. For example, (I1,I2) splits into I1 and I2. Because I1 and I2 are both frequent items in L1, this combination is kept.
- Count: for the candidates remaining in C2, compute the support count from the original data set.
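The three steps can be sketched in plain Python, assuming the same integer-encoded transactions. The frozenset representation and variable names are illustrative. At this level the prune step can never remove a pair, because every pair is built from items that are already in L1. The same check only starts removing candidates when C3 is built from L2.

```python
from itertools import combinations

transactions = [
    {1, 2, 5}, {2, 4}, {2, 3}, {1, 2, 4}, {1, 3},
    {2, 3}, {1, 3}, {1, 2, 3, 5}, {1, 2, 3},
]
L1 = [1, 2, 3, 4, 5]  # frequent 1-itemsets from the C1 -> L1 step
frequent_1 = {frozenset([i]) for i in L1}

# Step 1 (join): all pairs of items in L1, i.e. (1, 2) ... (4, 5)
pairs = [frozenset(p) for p in combinations(sorted(L1), 2)]

# Step 2 (prune): keep a pair only if both of its 1-item subsets are frequent
C2 = [p for p in pairs if all(frozenset([i]) in frequent_1 for i in p)]

# Step 3 (count): support count of each remaining candidate, one scan of the data
C2_counts = {tuple(sorted(c)): sum(1 for t in transactions if c <= t) for c in C2}

print(C2_counts)
```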
C2 to L2: as before, keep only the candidates whose support count is at least the threshold.
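Filtering the C2 counts against the threshold gives L2 for the example data. This is a minimal sketch that uses the same integer encoding and illustrative names as the earlier steps.

```python
from itertools import combinations

transactions = [
    {1, 2, 5}, {2, 4}, {2, 3}, {1, 2, 4}, {1, 3},
    {2, 3}, {1, 3}, {1, 2, 3, 5}, {1, 2, 3},
]
MIN_SUPPORT = 2

# Support counts of all candidate 2-itemsets (C2)
C2 = {p: sum(1 for t in transactions if set(p) <= t)
      for p in combinations([1, 2, 3, 4, 5], 2)}

# L2: keep the candidates that reach the threshold
L2 = {p: n for p, n in C2.items() if n >= MIN_SUPPORT}

print(L2)
# {(1, 2): 4, (1, 3): 4, (1, 5): 2, (2, 3): 4, (2, 4): 2, (2, 5): 2}
```

(1,4), (3,4), (3,5) and (4,5) drop out here. These are the infrequent pairs that later cause 3-item candidates to be pruned.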
The process of L2 to C3:
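The steps are the same as above. The join step first generates candidates such as (1,2,3), (1,2,4), (1,2,5), ....

Why are only (1,2,3) and (1,2,5) left in C3? Because of the prune step. (1,2,4) splits into (1,2), (1,4) and (2,4), but (1,4) is not in L2, i.e. it is infrequent, so (1,2,4) is deleted. Every other candidate with an infrequent 2-item subset is pruned in the same way.

The remaining candidates in C3 are then counted. Both (1,2,3) and (1,2,5) have a support count of 2, so they form L3. The only 4-itemset that can be joined from them, (1,2,3,5), is pruned because (2,3,5) is not frequent, so the iteration ends.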
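The L2 to C3 step can be sketched under the same data assumptions. This sketch uses a simple union-based join: any two frequent 2-itemsets whose union has three items are merged. This is a deliberately simple variant. It is neither the prefix-based join of the textbook algorithm nor exactly the join in the Section 3 code. It does, however, produce (1,2,4), so the prune step can be seen removing it. `join` and `prune` are illustrative names.

```python
from itertools import combinations

transactions = [
    {1, 2, 5}, {2, 4}, {2, 3}, {1, 2, 4}, {1, 3},
    {2, 3}, {1, 3}, {1, 2, 3, 5}, {1, 2, 3},
]
MIN_SUPPORT = 2
L2 = [frozenset(p) for p in [(1, 2), (1, 3), (1, 5), (2, 3), (2, 4), (2, 5)]]


def join(Lk):
    """Union-based join: merge two k-itemsets whose union has k + 1 items."""
    k = len(next(iter(Lk)))
    return {a | b for a, b in combinations(Lk, 2) if len(a | b) == k + 1}


def prune(candidates, Lk):
    """Drop candidates that have an infrequent k-item subset (Apriori property)."""
    frequent = set(Lk)
    k = len(next(iter(frequent)))
    return {c for c in candidates
            if all(frozenset(s) in frequent for s in combinations(c, k))}


joined = join(L2)
C3 = prune(joined, L2)
counts = {tuple(sorted(c)): sum(1 for t in transactions if c <= t) for c in C3}
L3 = {c: n for c, n in counts.items() if n >= MIN_SUPPORT}

print(frozenset({1, 2, 4}) in joined)  # True: produced by the join
print(sorted(L3.items()))              # [((1, 2, 3), 2), ((1, 2, 5), 2)]
```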
So the algorithm repeats the process Ck → Lk → C(k+1) until no new frequent itemsets are found.
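The whole Ck → Lk → C(k+1) loop can be sketched as a single function. This is an illustrative pure-Python version, not the NumPy code of Section 3. The name `apriori` and its return format, a dict mapping each frequent `frozenset` to its support count, are assumptions. It uses a simple union-based join followed by the subset-based prune, and stops as soon as a level yields no frequent itemsets.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Level-wise Apriori sketch: returns {frozenset itemset: support count}."""
    transactions = [set(t) for t in transactions]

    def count(candidates):
        # One scan of the data per level
        return {c: sum(1 for t in transactions if c <= t) for c in candidates}

    def keep_frequent(counts):
        return {c: n for c, n in counts.items() if n >= min_support}

    # C1 -> L1
    items = {i for t in transactions for i in t}
    Lk = keep_frequent(count({frozenset([i]) for i in items}))
    frequent = dict(Lk)
    k = 1
    while Lk:
        # Join: unions of two frequent k-itemsets that have k + 1 items
        joined = {a | b for a, b in combinations(Lk, 2) if len(a | b) == k + 1}
        # Prune: every k-item subset must be frequent (Apriori property)
        Ck1 = {c for c in joined
               if all(frozenset(s) in Lk for s in combinations(c, k))}
        # Count and filter: C(k+1) -> L(k+1)
        Lk = keep_frequent(count(Ck1))
        frequent.update(Lk)
        k += 1
    return frequent


data = [[1, 2, 5], [2, 4], [2, 3], [1, 2, 4], [1, 3],
        [2, 3], [1, 3], [1, 2, 3, 5], [1, 2, 3]]
result = apriori(data, 2)
largest = max(len(s) for s in result)
print(sorted(tuple(sorted(s)) for s in result if len(s) == largest))
# [(1, 2, 3), (1, 2, 5)]
```

Keeping every level in `frequent`, rather than only the last one, makes all frequent itemsets available for deriving association rules afterwards.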
3. Apriori algorithm implementation
# -*- coding: utf-8 -*-
"""
Created on Sat Dec 9 15:33:45 2017
@author: lps
"""
import numpy as np
from itertools import combinations  # iteration tools

data = [[1, 2, 5], [2, 4], [2, 3], [1, 2, 4], [1, 3],
        [2, 3], [1, 3], [1, 2, 3, 5], [1, 2, 3]]
minsp = 2  # minimum support count

# All distinct items in the data set
d = [item for t in data for item in t]
new_d = sorted(set(d))


def satisfy(s, s_new, k):
    # Prune: keep a candidate only if every k-item subset of it is in s (Lk)
    ss_new = []
    for cand in s_new:
        e = [list(j) for j in combinations(cand, k)]  # all k-element combinations
        if all(l in s for l in e):
            ss_new.append(cand)
    return ss_new  # candidates that meet the criterion


def count(s_new):
    # Return C as an ndarray: one itemset per row, support count in the last column
    C = np.array(s_new, dtype=float)
    C = np.column_stack((C, np.zeros(C.shape[0])))
    for i in range(len(s_new)):
        num = 0
        for j in range(len(data)):
            if all(l in data[j] for l in s_new[i]):  # transaction j contains the itemset
                num = num + 1
        C[i, -1] = num
    return C


def limit(L):
    # Remove the rows whose support count is below minsp
    row = [i for i in range(L.shape[0]) if L[i, -1] < minsp]
    return np.delete(L, row, 0)


def generate(L, k):
    # Convert Lk into C(k+1): join, prune, then count
    s = [list(L[i, :-1]) for i in range(L.shape[0])]  # itemsets without the count column
    s_new = []
    for i in range(L.shape[0] - 1):
        for j in range(i + 1, L.shape[0]):
            if L[j, -2] > L[i, -2]:  # join: extend s[i] with the larger last item of row j
                t = s[i] + [L[j, -2]]
                if t not in s_new:  # skip duplicate candidates
                    s_new.append(t)
    s_new.sort()  # keep candidates in lexicographic order for the next join
    s_new = satisfy(s, s_new, k)
    return count(s_new)


# The initial C1 and L1
C = np.zeros([len(new_d), 2])
for i in range(len(new_d)):
    C[i, :] = [new_d[i], d.count(new_d[i])]
L = limit(C)

# Iterate Ck -> Lk -> C(k+1) until no new frequent itemsets appear
k = 1
while True:
    C = generate(L, k)  # produce C from L
    L_next = limit(C)   # produce L from C
    if L_next.shape[0] == 0:
        break  # L holds the largest frequent itemsets
    L = L_next
    k = k + 1

# Remove duplicate rows and print the final result
print(sorted({tuple(t.tolist()) for t in L}))
# result: [(1.0, 2.0, 3.0, 2.0), (1.0, 2.0, 5.0, 2.0)]