Pattern Discovery in Data Mining (II): The Apriori Algorithm

Basic Concepts

For a rule A → B:

Support:

support(A → B) = P(A ∩ B), the probability that a transaction contains both A and B.

Confidence:

conf(A → B) = sup(A ∪ B) / sup(A) = P(B | A)

That is, the probability that B also occurs given that A occurs.

For example, in market basket analysis: milk ⇒ bread

Example: [support: 3%, confidence: 40%]

Support 3%: 3% of customers buy milk and bread together.

Confidence 40%: of the customers who buy milk, 40% also buy bread.
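The two measures above are easy to compute directly. Below is a minimal sketch over a toy transaction database; the item names and the `support`/`confidence` helper names are illustrative, not from the article.

```python
# Toy transaction database: each transaction is a set of items.
transactions = [
    {"milk", "bread"},
    {"milk", "bread", "butter"},
    {"milk"},
    {"bread", "eggs"},
    {"milk", "eggs"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """conf(A -> B) = sup(A ∪ B) / sup(A)."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)

print(support({"milk", "bread"}, transactions))       # 0.4  (2 of 5 transactions)
print(confidence({"milk"}, {"bread"}, transactions))  # 0.5  (2 of the 4 milk buyers)
```

Note that `itemset <= t` is Python's subset test for sets, which matches the definition "the transaction contains the itemset".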

Candidate itemset (candidate set):

The set of candidate k-itemsets produced by joining frequent (k-1)-itemsets.

Denoted C[k].

Frequent itemset:
An itemset whose support is greater than or equal to a given minimum support (minsup).

Denoted L[k].

Lift:

lift(X → Y) = lift(Y → X) = conf(X → Y) / supp(Y) = conf(Y → X) / supp(X) = P(X ∩ Y) / (P(X) P(Y))
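Lift measures how much more often X and Y occur together than expected under independence (lift = 1 means independent, > 1 positive correlation, < 1 negative). A minimal sketch, reusing the same toy transactions as above; names are illustrative.

```python
transactions = [
    {"milk", "bread"},
    {"milk", "bread", "butter"},
    {"milk"},
    {"bread", "eggs"},
    {"milk", "eggs"},
]

def support(itemset):
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def lift(x, y):
    """lift(X -> Y) = P(X ∩ Y) / (P(X) * P(Y)); note it is symmetric in X and Y."""
    return support(set(x) | set(y)) / (support(x) * support(y))

# 0.4 / (0.8 * 0.6) ≈ 0.83: milk and bread are slightly negatively
# correlated in this toy data, despite the rule having 50% confidence.
print(lift({"milk"}, {"bread"}))
```

This illustrates why lift complements confidence: a rule can have high confidence simply because the consequent is common overall.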

Downward closure property (Apriori property)

If an itemset satisfies a minimum support requirement, then every non-empty subset of that itemset also satisfies it.

Introduction to the Apriori Algorithm

The Apriori algorithm is a frequent-itemset algorithm for mining association rules. Its core idea is to generate candidate sets level by level and test them against the downward closure property to identify the frequent itemsets.

The Apriori algorithm is widely used: in consumer market analysis, to infer consumption habits; in intrusion detection in network security; in university administration, where mined rules can help management departments target financial-aid work; and in mobile communications, to guide operators' operations and the decisions of supporting business providers.

Mining steps:

1. Find all frequent itemsets (frequency) according to the minimum support.

2. Generate association rules (strength) from the frequent itemsets according to the minimum confidence.

Apriori uses an iterative approach called level-wise search, in which frequent k-itemsets are derived from the frequent (k-1)-itemsets.

First, the entire transaction database is scanned to find the set of frequent 1-itemsets, denoted L1. L1 is then used to find the set L2 of frequent 2-itemsets, L2 is used to find L3, and so on, until no frequent k-itemset can be found. Finding each Lk requires one full scan of the database.

The core of each iteration is the join step and the prune step. In the join step, two frequent (k-1)-itemsets are joined only if their first k-2 items are identical, with items kept in lexicographic order to guarantee this. The prune step relies on the fact that every non-empty subset of a frequent itemset must itself be frequent; conversely, if any (k-1)-subset of a candidate is not frequent, the candidate cannot be frequent and can be removed from Ck.

Simply put, frequent-itemset discovery is the cycle: (1) scan, (2) count, (3) compare against minsup, (4) generate the frequent itemsets, (5) join and prune to generate the next candidate set. Repeat steps (1)-(5) until no larger frequent set is produced, then generate the association rules.
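The level-wise cycle above can be sketched in a few dozen lines. This is a minimal illustration, not a tuned implementation: the function and variable names are my own, `min_sup` is an absolute count, and the join step is simplified to "union pairs of frequent (k-1)-itemsets whose union has size k" rather than the lexicographic prefix join described above (both produce the same candidate set).

```python
from itertools import combinations

def apriori(transactions, min_sup):
    """Return {frozenset: support_count} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]
    # L1: scan once and count every single item.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    Lk = {s: c for s, c in counts.items() if c >= min_sup}
    frequent = dict(Lk)
    k = 2
    while Lk:
        # Join step (simplified): unions of frequent (k-1)-itemsets of size k.
        candidates = {a | b for a in Lk for b in Lk if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in Lk for s in combinations(c, k - 1))}
        # Scan step: count each surviving candidate against the database.
        counts = {c: sum(c <= t for t in transactions) for c in candidates}
        Lk = {s: n for s, n in counts.items() if n >= min_sup}
        frequent.update(Lk)
        k += 1
    return frequent

transactions = [
    {"milk", "bread"},
    {"milk", "bread", "butter"},
    {"milk"},
    {"bread", "eggs"},
    {"milk", "eggs"},
]
freq = apriori(transactions, min_sup=2)
print(freq[frozenset({"milk", "bread"})])  # 2
```

With min_sup = 2, {butter} is pruned at level 1, so no candidate containing butter is ever counted at level 2 — exactly the saving the downward closure property buys.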

Given the definition of confidence above, association rules are generated as follows:

(1) For each frequent itemset l, generate all non-empty subsets of l;

(2) For each non-empty subset s of l, if

sup(l) / sup(s) ≥ min_conf

then output the rule s → l − s.

Note: l − s denotes the itemset obtained by removing the subset s from the itemset l.
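The rule-generation step above translates directly to code. A minimal sketch, assuming the frequent-itemset supports have already been computed (e.g. by Apriori); the `sup` table values and the `rules` helper name are illustrative.

```python
from itertools import combinations

# Supports of frequent itemsets (as fractions), assumed precomputed.
sup = {
    frozenset({"milk"}): 0.8,
    frozenset({"bread"}): 0.6,
    frozenset({"milk", "bread"}): 0.4,
}

def rules(frequent_itemset, sup, min_conf):
    """Emit (antecedent, consequent, confidence) for each rule s -> l - s."""
    l = frozenset(frequent_itemset)
    out = []
    for r in range(1, len(l)):          # every non-empty proper subset size
        for s in combinations(l, r):
            s = frozenset(s)
            conf = sup[l] / sup[s]      # conf(s -> l-s) = sup(l) / sup(s)
            if conf >= min_conf:
                out.append((s, l - s, conf))
    return out

for a, c, conf in rules({"milk", "bread"}, sup, min_conf=0.5):
    print(set(a), "->", set(c), round(conf, 2))
```

No extra database scan is needed here: every confidence value is a ratio of two supports that were already counted while mining the frequent itemsets.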

Pseudocode

    // Find frequent 1-itemsets
    L1 = find_frequent_1-itemsets(D);
    for (k = 2; Lk-1 != ∅; k++) {
        // Generate candidates and prune
        Ck = apriori_gen(Lk-1);
        // Scan D to count each candidate
        for each transaction t in D {
            Ct = subset(Ck, t);       // get the candidates contained in t
            for each candidate c in Ct
                c.count++;
        }
        // Keep the candidates meeting the minimum support
        Lk = {c ∈ Ck | c.count >= min_sup};
    }
    return L = all frequent itemsets;

    Step 1: join
    procedure apriori_gen(Lk-1: frequent (k-1)-itemsets)
        for each itemset l1 ∈ Lk-1
            for each itemset l2 ∈ Lk-1
                if (l1[1] = l2[1]) && (l1[2] = l2[2]) && ... && (l1[k-2] = l2[k-2]) && (l1[k-1] < l2[k-1])
                then {
                    c = l1 join l2;   // join step: generate candidate
                    // if any (k-1)-subset of c is infrequent, prune
                    if has_infrequent_subset(c, Lk-1)
                        then delete c;    // prune step: remove infrequent candidate
                        else add c to Ck;
                }
        return Ck;

    Step 2: prune
    procedure has_infrequent_subset(c: candidate k-itemset; Lk-1: frequent (k-1)-itemsets)
        for each (k-1)-subset s of c
            if s ∉ Lk-1 then return TRUE;
        return FALSE;
