Data Mining Algorithm--Apriori

Source: Internet
Author: User

As mentioned in the previous article, an introductory overview of data mining algorithms, the Apriori algorithm is the most widely used algorithm for mining association rules. This time we will learn the basics of the algorithm.

I. Algorithm Overview

Apriori is one of the most influential algorithms for mining frequent itemsets for Boolean association rules. It was proposed by Rakesh Agrawal and Ramakrishnan Srikant. It uses an iterative approach known as level-wise search, in which frequent k-itemsets are used to explore (k+1)-itemsets. First, the set of frequent 1-itemsets is found; this set is denoted L1. L1 is then used to find the set of frequent 2-itemsets, L2, which in turn is used to find L3, and so on until no more frequent k-itemsets can be found. Finding each Lk requires one full scan of the database. To make the generation of frequent itemsets more efficient, an important property called the Apriori property is used to shrink the search space. It has two parts: first, every non-empty subset of a frequent itemset must also be frequent; second, every superset of an infrequent itemset must also be infrequent.
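The way the Apriori property shrinks the search space can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the original authors' code: `generate_candidates` is a hypothetical helper name, itemsets are represented as `frozenset`s, and the join step simply unions pairs of frequent (k-1)-itemsets instead of using the sorted-prefix join of the original paper.

```python
from itertools import combinations


def generate_candidates(frequent_prev, k):
    """Build candidate k-itemsets from frequent (k-1)-itemsets (hypothetical helper).

    Join: union pairs of frequent (k-1)-itemsets, keeping unions of exactly k items.
    Prune (Apriori property): drop a candidate if any of its (k-1)-subsets is not
    frequent, because a superset of an infrequent itemset cannot be frequent.
    """
    prev = set(frequent_prev)
    joined = {a | b for a, b in combinations(prev, 2) if len(a | b) == k}
    return {
        c for c in joined
        if all(frozenset(sub) in prev for sub in combinations(c, k - 1))
    }


# Hypothetical frequent 2-itemsets, made up for illustration only.
L2 = {frozenset("AC"), frozenset("BC"), frozenset("BE"), frozenset("CE")}
print(["".join(sorted(c)) for c in generate_candidates(L2, 3)])  # ['BCE']
```

Here the join produces ABC, ACE and BCE (the union of AC and BE has four items and is skipped). ABC is pruned because AB is not frequent, and ACE because AE is not, so only BCE needs its support counted in the next database scan.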

II. Application Scenarios

III. Basic Concepts
The two most important concepts in the Apriori algorithm are support and confidence:

  • Support: support({A, B}) = P(AB), i.e., the probability that events A and B occur together (the fraction of records that contain both A and B).
  • Confidence: confidence(A => B) = support({A, B}) / support({A}), i.e., the probability that B occurs given that A has occurred.
  • Minimum support: a preset threshold, usually chosen by running the algorithm several times and inspecting the results. Candidate itemsets whose support falls below it are discarded; the remaining ones form the frequent itemsets of that level and are used to build the next level.
  • Minimum confidence: a preset threshold that a rule's confidence must reach for the rule to be kept.
  • Strong rules: rules that satisfy both minimum support and minimum confidence are called strong rules
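The definitions above translate directly into code. The sketch below assumes transactions are Python sets and uses a made-up five-record dataset; the names `support`, `confidence`, `is_strong` and `TRANSACTIONS` are illustrative, not taken from any library.

```python
# Hypothetical five-record dataset (items A-E), made up for illustration.
TRANSACTIONS = [
    {"A", "C", "D"},
    {"B", "C", "E"},
    {"A", "B", "C", "E"},
    {"B", "E"},
    {"B", "C", "E"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """confidence(A => B) = support(A and B) / support(A); A must occur at least once."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


def is_strong(antecedent, consequent, transactions, min_sup, min_conf):
    """A strong rule meets both the minimum support and the minimum confidence."""
    both = set(antecedent) | set(consequent)
    return (support(both, transactions) >= min_sup
            and confidence(antecedent, consequent, transactions) >= min_conf)


print(support({"B", "C"}, TRANSACTIONS))             # 0.6
print(confidence({"B", "C"}, {"E"}, TRANSACTIONS))   # 1.0
print(is_strong({"B", "E"}, {"C"}, TRANSACTIONS, min_sup=0.4, min_conf=0.8))  # False
```

The last rule fails only on confidence (about 0.75, below the assumed 0.8 threshold), even though {B, C, E} itself meets the assumed 40% minimum support.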
IV. Implementation Principle

The algorithm has two stages: computing the support of the itemsets at each level, and then computing confidence from those support values. Below is an example with an initial dataset of 5 records. Based on the product combinations in each record, we can compute the support level by level, as follows:
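Because the article's own five records are not reproduced in the text, the following sketch runs the level-wise support computation on a hypothetical five-record dataset over items A to E, with an assumed minimum support count of 2 (40%). The function names are illustrative; each call to `count_support` corresponds to one database scan.

```python
from itertools import combinations

# Hypothetical five-record dataset (items A-E); not the article's original records.
TRANSACTIONS = [
    {"A", "C", "D"},
    {"B", "C", "E"},
    {"A", "B", "C", "E"},
    {"B", "E"},
    {"B", "C", "E"},
]


def count_support(candidates, transactions):
    """One database scan: count the transactions that contain each candidate."""
    counts = dict.fromkeys(candidates, 0)
    for t in transactions:
        for c in candidates:
            if c <= t:
                counts[c] += 1
    return counts


def apriori_levels(transactions, min_count):
    """Level-wise search: return [L1, L2, ...] as dicts of frozenset -> support count."""
    items = {item for t in transactions for item in t}
    candidates = {frozenset([item]) for item in items}
    levels, k = [], 1
    while candidates:
        counts = count_support(candidates, transactions)
        frequent = {c: n for c, n in counts.items() if n >= min_count}
        if not frequent:
            break
        levels.append(frequent)
        k += 1
        # Join frequent (k-1)-itemsets into k-itemsets, then prune any candidate
        # that has an infrequent (k-1)-subset (the Apriori property).
        joined = {a | b for a, b in combinations(frequent, 2) if len(a | b) == k}
        candidates = {
            c for c in joined
            if all(frozenset(s) in frequent for s in combinations(c, k - 1))
        }
    return levels


for k, level in enumerate(apriori_levels(TRANSACTIONS, min_count=2), start=1):
    print(f"L{k}:", sorted(("".join(sorted(c)), n) for c, n in level.items()))
# L1: [('A', 2), ('B', 4), ('C', 4), ('E', 4)]
# L2: [('AC', 2), ('BC', 3), ('BE', 4), ('CE', 3)]
# L3: [('BCE', 3)]
```

Item D appears only once and is dropped at the first level, so no candidate containing D is ever generated; the search stops after L3 because a single frequent 3-itemset cannot be joined into a 4-itemset.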

[Figure: support calculation process]

As you can see, we end up with three levels of frequent itemsets: L1, L2 and L3. Next, we can compute the confidence for each level directly from the support values; here we take L3 as an example:

[Figure: confidence calculation process]

Computing confidence is relatively simple: for each frequent k-itemset, take k-1 of its elements as the antecedent and compute the confidence that the remaining element also appears, applying the confidence formula above directly. From this example we can draw rules: whenever BC appears, E always appears as well, and whenever CE appears, B always appears as well. Of course, this is only a toy example; in practice, a sufficiently large sample is needed for the results to be reliable.
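Generating rules from a frequent 3-itemset in this way can be sketched as follows. The support counts are hypothetical (for a made-up five-record dataset), the 0.8 minimum confidence is an assumed value, and `rules_from_itemset` is an illustrative name. Raw counts can stand in for support fractions because the total number of records cancels in the ratio.

```python
from itertools import combinations

# Hypothetical support counts (out of 5 records), made up for illustration.
SUPPORT_COUNT = {
    frozenset("BC"): 3,
    frozenset("BE"): 4,
    frozenset("CE"): 3,
    frozenset("BCE"): 3,
}


def rules_from_itemset(itemset, support_count, min_conf):
    """Rules X => {y}, where X is each (k-1)-item subset of a frequent k-itemset."""
    itemset = frozenset(itemset)
    rules = []
    for subset in combinations(sorted(itemset), len(itemset) - 1):
        antecedent = frozenset(subset)
        consequent = itemset - antecedent
        # confidence(X => y) = support(X with y) / support(X); the record count cancels.
        conf = support_count[itemset] / support_count[antecedent]
        if conf >= min_conf:
            rules.append((antecedent, consequent, conf))
    return rules


for antecedent, consequent, conf in rules_from_itemset({"B", "C", "E"}, SUPPORT_COUNT, min_conf=0.8):
    print("".join(sorted(antecedent)), "=>", "".join(sorted(consequent)), f"(confidence {conf:.2f})")
# BC => E (confidence 1.00)
# CE => B (confidence 1.00)
```

With these counts, BE => C has confidence 3/4 = 0.75 and is filtered out by the assumed threshold, while the two rules with confidence 1.0 match the pattern "when BC or CE appears, E or B always appears".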

V. Conclusion

As the above shows, the basic principle of the Apriori algorithm is fairly simple. In practice, however, n products can form 2^n − 1 combinations, so the amount of computation becomes very large as soon as n grows even slightly. Real implementations therefore use the Apriori property mentioned at the beginning to prune candidates and reduce the computation. In addition, there are other, more efficient association rule algorithms such as FP-Growth and Eclat; they are not covered here, but interested readers can look into them on their own.
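To make the cost argument concrete, this sketch compares the 2^n − 1 possible itemsets with the number of candidates a level-wise search with pruning actually has to count. It assumes a hypothetical five-record, five-item dataset and a minimum support count of 2; `candidates_counted` is an illustrative name. For scale, with n = 20 items 2^n − 1 is already 1,048,575.

```python
from itertools import combinations

# Hypothetical five-record dataset over five items (A-E), for illustration only.
TRANSACTIONS = [
    {"A", "C", "D"},
    {"B", "C", "E"},
    {"A", "B", "C", "E"},
    {"B", "E"},
    {"B", "C", "E"},
]


def candidates_counted(transactions, min_count):
    """Run a level-wise search with pruning; return how many candidates were counted."""
    items = {item for t in transactions for item in t}
    candidates = {frozenset([item]) for item in items}
    examined, k = 0, 1
    while candidates:
        examined += len(candidates)
        frequent = {c for c in candidates
                    if sum(1 for t in transactions if c <= t) >= min_count}
        k += 1
        # Join, then prune candidates with an infrequent (k-1)-subset.
        joined = {a | b for a, b in combinations(frequent, 2) if len(a | b) == k}
        candidates = {c for c in joined
                      if all(frozenset(s) in frequent for s in combinations(c, k - 1))}
    return examined


n = len({item for t in TRANSACTIONS for item in t})
print("all non-empty itemsets (2^n - 1):", 2 ** n - 1)                  # 31
print("candidates counted by Apriori:", candidates_counted(TRANSACTIONS, 2))  # 12
```

On this toy data the pruned search counts 5 + 6 + 1 = 12 candidates instead of 31; the gap widens quickly as the number of items grows, which is why the pruning step matters in real implementations.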



