Basic Concepts
For a rule $A \rightarrow B$:
Support:
$sup(A \rightarrow B) = P(A \cap B)$, the probability that A and B occur together.
Confidence:
$conf(A \rightarrow B) = \frac{sup(A \cup B)}{sup(A)} = P(B \mid A)$
That is, the probability that B occurs given that A has occurred.
For example, in market basket analysis: milk ⇒ bread
Example: [support = 3%, confidence = 40%]
Support of 3% means that 3% of all customers buy milk and bread together.
Confidence of 40% means that, of the customers who buy milk, 40% also buy bread.
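To make these two definitions concrete, here is a minimal Python sketch that computes support and confidence by direct counting (the toy transactions list and helper names are invented for illustration):

    # Toy transaction database (hypothetical data)
    transactions = [
        {"milk", "bread"},
        {"milk", "eggs"},
        {"bread", "butter"},
        {"milk", "bread", "butter"},
        {"eggs"},
    ]

    def support(itemset, transactions):
        """Fraction of transactions that contain every item in `itemset`."""
        itemset = set(itemset)
        return sum(itemset <= t for t in transactions) / len(transactions)

    def confidence(antecedent, consequent, transactions):
        """conf(A -> B) = sup(A ∪ B) / sup(A) = P(B | A)."""
        both = set(antecedent) | set(consequent)
        return support(both, transactions) / support(antecedent, transactions)

    print(support({"milk", "bread"}, transactions))       # 0.4
    print(confidence({"milk"}, {"bread"}, transactions))  # ≈ 0.67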
Candidate itemset (candidate itemset):
The set of candidate k-itemsets produced by joining frequent (k-1)-itemsets; denoted $C_k$.
Frequent itemset (frequent itemset):
An itemset whose support is greater than or equal to a specified minimum support (minsup); the set of frequent k-itemsets is denoted $L_k$.
Lift:
$lift(X \rightarrow Y) = lift(Y \rightarrow X) = \frac{conf(X \rightarrow Y)}{sup(Y)} = \frac{conf(Y \rightarrow X)}{sup(X)} = \frac{P(X \cap Y)}{P(X)\,P(Y)}$
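Continuing the sketch above, lift compares how often X and Y occur together with how often they would co-occur if independent; a value above 1 indicates a positive association:

    def lift(x, y, transactions):
        """lift(X -> Y) = P(X ∩ Y) / (P(X) · P(Y)); symmetric in X and Y."""
        joint = support(set(x) | set(y), transactions)
        return joint / (support(x, transactions) * support(y, transactions))

    print(lift({"milk"}, {"bread"}, transactions))  # ≈ 1.11: more co-occurrence than chance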
Downward closure property (downward closure property):
If an itemset satisfies the minimum support requirement, then every non-empty subset of that itemset also satisfies it.
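The property can be verified directly on the toy data above (reusing `transactions` and `support` from the earlier sketch; min_sup is an illustrative threshold):

    from itertools import combinations

    min_sup = 0.4
    itemset = {"milk", "bread"}  # frequent in the toy data
    assert support(itemset, transactions) >= min_sup
    # Downward closure: every non-empty subset must also reach min_sup.
    for r in range(1, len(itemset) + 1):
        for s in combinations(sorted(itemset), r):
            assert support(set(s), transactions) >= min_sup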
Introduction to the Apriori Algorithm
The Apriori algorithm is a classic algorithm for mining the frequent itemsets needed for association rules. Its core idea is to generate candidate itemsets level by level and test them against the minimum support, using the downward closure property to prune candidates that cannot be frequent.
The Apriori algorithm is widely used: in consumer market analysis, to infer purchasing habits; in intrusion detection in the field of network security; in university administration, where mined rules can help management departments target student financial-aid work; and in mobile communications, where it can guide operators' operations and the decisions of supporting service providers.
Mining steps:
1. Find all frequent itemsets (those meeting the support threshold).
2. Generate strong association rules from the frequent itemsets (those meeting the confidence threshold).
Apriori uses an iterative approach known as level-wise search, in which frequent (k-1)-itemsets are used to search for frequent k-itemsets.
First, the transaction database is scanned to find the set of frequent 1-itemsets, denoted L1. L1 is then used to find L2, the set of frequent 2-itemsets; L2 is used to find L3; and so on, until no frequent k-itemsets can be found. Finding each Lk requires one full scan of the database.
The core of each iteration consists of a join step and a prune step. In the join step, Ck is generated by joining pairs of itemsets from Lk-1 whose first k-2 items are identical, with items kept in lexicographic order. The prune step applies the downward closure property: all non-empty subsets of a frequent itemset must also be frequent; conversely, if any (k-1)-subset of a candidate is not frequent, the candidate cannot be frequent and is removed from Ck.
In short:
1. Discover the frequent itemsets by repeating (1) scan, (2) count, (3) compare counts against the minimum support, (4) generate the frequent itemsets Lk, and (5) join and prune to generate the next candidate set, until no larger frequent itemset can be found (a sketch of this loop follows below).
2. Generate the association rules.
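A compact sketch of step 1 (the level-wise loop), reusing the toy `transactions` and `support` helper from the first example; this is a simplified illustration, not an optimized implementation:

    from itertools import combinations

    def apriori(transactions, min_sup):
        """Level-wise search: L1 -> C2 -> L2 -> ... until Lk is empty."""
        items = sorted({i for t in transactions for i in t})
        Lk = [frozenset([i]) for i in items
              if support({i}, transactions) >= min_sup]      # L1
        frequent = list(Lk)
        k = 2
        while Lk:
            prev = set(Lk)
            # Join step: union pairs of (k-1)-itemsets into k-itemsets.
            Ck = {a | b for a in Lk for b in Lk if len(a | b) == k}
            # Prune step: drop candidates with an infrequent (k-1)-subset.
            Ck = {c for c in Ck
                  if all(frozenset(s) in prev for s in combinations(c, k - 1))}
            # Scan and count: keep candidates meeting min_sup.
            Lk = [c for c in Ck if support(c, transactions) >= min_sup]
            frequent.extend(Lk)
            k += 1
        return frequent

    print(apriori(transactions, min_sup=0.4))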
According to the definition of confidence above, association rules are produced as follows:
(1) For each frequent itemset L, generate all non-empty proper subsets of L.
(2) For each non-empty proper subset S of L, if
$\frac{P(L)}{P(S)} \geq min\_conf$
then output the rule $S \rightarrow L - S$.
Note: L − S denotes the itemset L with the items of S removed.
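A sketch of this rule-generation procedure, reusing `apriori` and `support` from the sketches above (min_conf is an illustrative threshold):

    from itertools import combinations

    def generate_rules(transactions, min_sup, min_conf):
        rules = []
        for L in apriori(transactions, min_sup):
            if len(L) < 2:
                continue
            # All non-empty proper subsets S of L.
            for r in range(1, len(L)):
                for S in map(frozenset, combinations(L, r)):
                    conf = support(L, transactions) / support(S, transactions)  # P(L)/P(S)
                    if conf >= min_conf:
                        rules.append((set(S), set(L - S), conf))  # rule S -> L − S
        return rules

    for s, rest, conf in generate_rules(transactions, min_sup=0.4, min_conf=0.6):
        print(s, "->", rest, f"(conf = {conf:.2f})")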
Pseudo Code

    // Find the frequent 1-itemsets
    L1 = find_frequent_1-itemsets(D);
    for (k = 2; Lk-1 != ∅; k++) {
        // Generate candidates and prune
        Ck = apriori_gen(Lk-1);
        // Scan D to count each candidate
        for each transaction t ∈ D {
            Ct = subset(Ck, t);   // get the candidates contained in t
            for each candidate c ∈ Ct
                c.count++;
        }
        // Keep the candidates with at least minimum support
        Lk = {c ∈ Ck | c.count ≥ min_sup};
    }
    return L = ∪k Lk;   // all frequent itemsets

Step one: join

    procedure apriori_gen(Lk-1: frequent (k-1)-itemsets)
    for each itemset l1 ∈ Lk-1
        for each itemset l2 ∈ Lk-1
            if (l1[1] = l2[1]) ∧ (l1[2] = l2[2]) ∧ ... ∧ (l1[k-2] = l2[k-2]) ∧ (l1[k-1] < l2[k-1]) then {
                c = l1 ⋈ l2;   // join step: generate candidate
                // If c has a (k-1)-subset not in Lk-1, prune it
                if has_infrequent_subset(c, Lk-1) then
                    delete c;   // prune step: delete infrequent candidate
                else
                    add c to Ck;
            }
    return Ck;

Step two: prune

    procedure has_infrequent_subset(c: candidate k-itemset; Lk-1: frequent (k-1)-itemsets)
    for each (k-1)-subset s of c
        if s ∉ Lk-1 then
            return TRUE;
    return FALSE;
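For comparison with the pseudocode, here is a direct Python transcription of apriori_gen and has_infrequent_subset, representing itemsets as sorted tuples (a sketch under that assumption, not a library API):

    from itertools import combinations

    def apriori_gen(L_prev, k):
        """Join (k-1)-itemsets whose first k-2 items match, then prune."""
        prev, Ck = set(L_prev), []
        for l1 in L_prev:
            for l2 in L_prev:
                # Join condition: equal prefixes, and l1's last item precedes l2's.
                if l1[:k - 2] == l2[:k - 2] and l1[k - 2] < l2[k - 2]:
                    c = l1 + (l2[k - 2],)             # join step
                    if not has_infrequent_subset(c, prev):
                        Ck.append(c)                  # survives the prune step
        return Ck

    def has_infrequent_subset(c, L_prev):
        """True if any (k-1)-subset of candidate c is not frequent."""
        return any(s not in L_prev for s in combinations(c, len(c) - 1))

    # C3 from a toy L2: only {bread, butter, milk} has all 2-subsets frequent.
    print(apriori_gen([("bread", "butter"), ("bread", "milk"), ("butter", "milk")], k=3))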