Data Mining Algorithm: Apriori

Last Update: 2015-09-20 Source: Internet

Author: User

The previous article, a collation of introductory data mining algorithms, mentioned that Apriori is the most widely used association rule algorithm. This article covers the basics of the algorithm.

I. Overview of the Algorithm

Apriori is one of the most influential algorithms for mining the frequent itemsets of Boolean association rules. It was proposed by Rakesh Agrawal and Ramakrishnan Srikant. It uses an iterative approach called level-wise search, in which frequent k-itemsets are used to explore (k+1)-itemsets. First, the set of frequent 1-itemsets is found and denoted L1. L1 is used to find the set of frequent 2-itemsets, L2; L2 is used to find L3; and so on, until no more frequent k-itemsets can be found. Finding each Lk requires one scan of the database. To make the level-wise generation of frequent itemsets more efficient, an important property, called the Apriori property, is used to shrink the search space. It has two parts: first, every non-empty subset of a frequent itemset must also be frequent; second, every superset of an infrequent itemset must also be infrequent.
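The level-wise search described above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function name `apriori_frequent_itemsets`, the five transactions, and the threshold of 0.5 are assumptions made up for this sketch and are not the article's example data.

```python
from itertools import combinations

def apriori_frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for all frequent itemsets, level by level."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        # One pass over the data: fraction of transactions containing itemset.
        return sum(itemset <= t for t in transactions) / n

    def keep_frequent(candidates):
        scored = {c: support(c) for c in candidates}
        return {c: s for c, s in scored.items() if s >= min_support}

    # L1: frequent 1-itemsets.
    items = {item for t in transactions for item in t}
    current = keep_frequent({frozenset([item]) for item in items})
    frequent = dict(current)

    k = 1
    while current:
        # Join step: unions of two frequent k-itemsets with exactly k+1 items.
        candidates = {a | b for a in current for b in current
                      if len(a | b) == k + 1}
        # Prune step (Apriori property): every k-subset must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in current
                             for s in combinations(c, k))}
        current = keep_frequent(candidates)
        frequent.update(current)
        k += 1
    return frequent

# Hypothetical data set (an assumption for illustration only).
transactions = [
    {"A", "C", "D"},
    {"B", "C", "E"},
    {"A", "B", "C", "E"},
    {"B", "E"},
    {"A", "B", "C", "E"},
]
frequent = apriori_frequent_itemsets(transactions, min_support=0.5)
for itemset, s in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```

On this toy data the candidates {A, B, C} and {A, C, E} are discarded in the prune step without being counted, because {A, B} and {A, E} are not frequent.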

II. Application Scenarios

III. Basic Concepts
The two most important concepts in the Apriori algorithm are support and confidence:

Support: support({A, B}) = P(AB), the probability that A and B occur together.
Confidence: confidence(A=>B) = support({A, B}) / support({A}), that is, the probability that B also occurs given that A occurs. The confidence of A for B is computed as the support of {A, B} divided by the support of {A}.
Minimum support: a preset threshold, usually settled on by running the algorithm several times and inspecting the results. It is used to eliminate elements from each candidate set in order to obtain the next layer of frequent itemsets.
Minimum confidence: a preset threshold against which the confidence of a rule is judged.
Strong rules: rules that satisfy both the minimum support and the minimum confidence are called strong rules.
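These definitions translate almost directly into code. The sketch below is only an illustration: the helper names (`count`, `support`, `confidence`, `is_strong_rule`), the five transactions, and the thresholds are assumptions for this example, not data from the article.

```python
def count(itemset, transactions):
    """Number of transactions that contain every item in itemset."""
    itemset = frozenset(itemset)
    return sum(itemset <= frozenset(t) for t in transactions)

def support(itemset, transactions):
    """support(X) = fraction of transactions containing X."""
    return count(itemset, transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """confidence(A=>B) = support(A and B together) / support(A)."""
    a = frozenset(antecedent)
    both = a | frozenset(consequent)
    a_count = count(a, transactions)
    if a_count == 0:
        raise ValueError("antecedent never occurs; confidence is undefined")
    # Dividing counts is equivalent to dividing supports (same denominator).
    return count(both, transactions) / a_count

def is_strong_rule(antecedent, consequent, transactions,
                   min_support, min_confidence):
    """A rule is strong if it meets both thresholds."""
    both = frozenset(antecedent) | frozenset(consequent)
    return (support(both, transactions) >= min_support
            and confidence(antecedent, consequent, transactions) >= min_confidence)

# Hypothetical transactions (an assumption for illustration only).
transactions = [
    {"A", "C", "D"},
    {"B", "C", "E"},
    {"A", "B", "C", "E"},
    {"B", "E"},
    {"A", "B", "C", "E"},
]
print(support({"B", "E"}, transactions))
print(confidence({"B", "E"}, {"C"}, transactions))
print(is_strong_rule({"B", "C"}, {"E"}, transactions, 0.5, 0.8))
```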

IV. Implementation Principle

The algorithm has two stages: computing the support of each layer, and then computing confidence from those supports. The example here starts from an initial set of 5 records. From the product combinations in those records, the support of each layer can be worked out step by step, as in the following calculation process:

[Figure: support calculation process]

As you can see, three layers of frequent itemsets are eventually obtained: L1, L2, and L3. Next, the confidence of the rules in each layer can be computed directly from the supports. Here we take L3 as an example:

[Figure: confidence calculation process]

The confidence calculation is relatively simple: for each itemset, take k-1 of its elements and compute their confidence for the remaining element by applying the formula above. From this example we can already derive rules: whenever BC appears, E inevitably appears, and whenever CE appears, B inevitably appears. Of course, this is only a simple example. In practice a sufficiently large sample is needed for the results to be reliable.
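The rule derivation step can be sketched as below: for a frequent k-itemset, each choice of k-1 elements is tried as the antecedent and the remaining element as the consequent. The function name `rules_from_itemset`, the transactions, and the 0.8 threshold are assumptions chosen for illustration; they are not the article's original records.

```python
from itertools import combinations

def rules_from_itemset(itemset, transactions, min_confidence):
    """Return (antecedent, consequent, confidence) for rules with a
    (k-1)-item antecedent and a single-item consequent."""
    itemset = frozenset(itemset)
    transactions = [frozenset(t) for t in transactions]

    def count(s):
        return sum(s <= t for t in transactions)

    rules = []
    for combo in combinations(sorted(itemset), len(itemset) - 1):
        antecedent = frozenset(combo)
        consequent = itemset - antecedent
        antecedent_count = count(antecedent)
        if antecedent_count == 0:
            continue  # antecedent never occurs, so no rule can be formed
        conf = count(itemset) / antecedent_count
        if conf >= min_confidence:
            rules.append((antecedent, consequent, conf))
    return rules

# Hypothetical transactions (an assumption for illustration only).
transactions = [
    {"A", "C", "D"},
    {"B", "C", "E"},
    {"A", "B", "C", "E"},
    {"B", "E"},
    {"A", "B", "C", "E"},
]
for antecedent, consequent, conf in rules_from_itemset({"B", "C", "E"}, transactions, 0.8):
    print(sorted(antecedent), "=>", sorted(consequent), conf)
```

With this toy data, {B, C} => {E} and {C, E} => {B} pass the threshold, while {B, E} => {C} does not, because {B, E} also appears in a record without C.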

V. Conclusion

As shown above, the basic principle of the Apriori algorithm is fairly simple. In practice, however, n products yield 2^n−1 possible combinations, so the amount of computation becomes very large once n grows even slightly. A real implementation therefore uses the Apriori property mentioned at the beginning to prune candidates and reduce the computation. There are also more efficient association rule algorithms, such as FP-growth and Eclat. They are not covered here, and interested readers can study them on their own.
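The 2^n−1 figure comes from counting every non-empty subset of n items: there are C(n, k) itemsets of size k, and summing over k = 1..n gives 2^n−1. A small sketch (the function name `nonempty_itemset_count` is made up for this example, and `math.comb` assumes Python 3.8 or later) shows how quickly that number grows:

```python
from math import comb

def nonempty_itemset_count(n):
    """Number of non-empty itemsets over n distinct items."""
    return sum(comb(n, k) for k in range(1, n + 1))

for n in (5, 10, 20, 30):
    # The sum of binomial coefficients matches the closed form 2^n - 1.
    assert nonempty_itemset_count(n) == 2 ** n - 1
    print(n, nonempty_itemset_count(n))
```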

References: "Apriori algorithm in detail"; "Using the Apriori algorithm and the FP-growth algorithm for association analysis"

This article is an English version of an article originally published in Chinese on aliyun.com and is provided for information purposes only.
