The previous article, a roundup of introductory data mining algorithms, mentioned that Apriori is the most widely used association rule algorithm. This time we will learn the basics of the algorithm.
I. Overview of the Algorithm
The Apriori algorithm is one of the most influential algorithms for mining frequent itemsets for Boolean association rules. It was proposed by Rakesh Agrawal and Ramakrishnan Srikant. It uses an iterative approach called level-wise search, in which frequent k-itemsets are used to explore (k+1)-itemsets. First, the set of frequent 1-itemsets is found and denoted L1. L1 is used to find the set of frequent 2-itemsets L2, L2 is used to find L3, and so on, until no more frequent k-itemsets can be found. Finding each Lk requires one scan of the database. To make the level-wise generation of frequent itemsets more efficient, an important property called the Apriori property is used to shrink the search space. It can be stated in two ways: first, every non-empty subset of a frequent itemset must also be frequent; second, every superset of an infrequent itemset must also be infrequent.
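To make the Apriori property concrete, here is a minimal sketch of how candidate k-itemsets might be built from the frequent (k-1)-itemsets. The function name `generate_candidates` and the sample itemsets are assumptions made for illustration; they do not come from the original article.

```python
from itertools import combinations


def generate_candidates(prev_frequent, k):
    """Build candidate k-itemsets from frequent (k-1)-itemsets.

    prev_frequent is a set of frozensets, each of size k-1.
    The name and signature are illustrative assumptions.
    """
    candidates = set()
    for a in prev_frequent:
        for b in prev_frequent:
            union = a | b
            # Join step: keep only unions that contain exactly k items.
            if len(union) != k:
                continue
            # Prune step (Apriori property): a candidate survives only if
            # every one of its (k-1)-subsets is itself frequent.
            if all(frozenset(s) in prev_frequent
                   for s in combinations(union, k - 1)):
                candidates.add(union)
    return candidates


# Hypothetical frequent 2-itemsets, chosen only for this example.
L2 = {frozenset("AC"), frozenset("BC"), frozenset("BE"), frozenset("CE")}
C3 = generate_candidates(L2, 3)
print(sorted("".join(sorted(c)) for c in C3))  # ['BCE']
```

In this sketch {A, B, C} and {A, C, E} are discarded without touching the data, because {A, B} and {A, E} are not frequent. That is the search-space reduction the property provides.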
II. Application Scenarios
The Apriori algorithm is widely used for analyzing prices in consumer markets, inferring customers' purchasing habits, and intrusion detection in the network security field. It can also be used in the management of colleges and universities, where the mined rules can help school management departments carry out financial aid work for students in need, and in the field of mobile communications, where it can guide operators' business operations and the decision making of ancillary service providers.
III. Basic Concepts
The two most important concepts of the Apriori algorithm are support and confidence:
- Support: support({A, B}) = P(AB), the probability that A and B occur together, i.e. the fraction of all records that contain both A and B.
- Confidence: confidence(A => B) = support({A, B}) / support({A}), the probability that B also occurs in records where A occurs. In the calculation, the confidence of A for B is the support of {A, B} divided by the support of {A}.
- Minimum support: a preset threshold, generally chosen by running the algorithm several times and inspecting the results. It is used to eliminate candidate itemsets at each level, leaving the frequent itemsets from which the next level is built.
- Minimum confidence: a preset threshold used to filter rules by their confidence.
- Strong rules: rules that satisfy both minimum support and minimum confidence are called strong rules.
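The definitions above translate almost directly into code. The following is a small sketch, not a reference implementation: the function names, the thresholds, and the five sample transactions are all assumptions made for illustration.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """confidence(A => B) = support({A, B}) / support({A}).

    Assumes the antecedent occurs at least once; otherwise this divides by zero.
    """
    both = frozenset(antecedent) | frozenset(consequent)
    return support(both, transactions) / support(antecedent, transactions)


def is_strong(antecedent, consequent, transactions, min_support, min_confidence):
    """A rule is strong if it meets both the support and confidence thresholds."""
    both = frozenset(antecedent) | frozenset(consequent)
    return (support(both, transactions) >= min_support
            and confidence(antecedent, consequent, transactions) >= min_confidence)


# Hypothetical transactions; each letter stands for one product.
transactions = [frozenset(t) for t in ("ACD", "BCE", "ABCE", "BE", "ABCE")]

print(support("BE", transactions))                    # 0.8
print(confidence("B", "E", transactions))             # 1.0
print(is_strong("B", "E", transactions, 0.5, 0.7))    # True
print(is_strong("A", "B", transactions, 0.5, 0.7))    # False
```

The last rule fails on support alone: {A, B} appears in only 2 of the 5 sample transactions.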
IV. Implementation Principle
The algorithm consists of two stages:
- calculate the support of the itemsets at each level;
- calculate the confidence based on those supports.
Let us go straight to an example. Starting from an initial set of 5 records, and based on the combinations of products in each record, we can work out the support at each level step by step. The calculation process is as follows:
[Figure: support calculation process]
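The level-by-level support calculation can be sketched in a few lines. This is a simplified illustration under stated assumptions: the five transactions below are hypothetical and are not the records from the article's figure, and the threshold `min_count=3` (a minimum support of 0.6 over 5 records) is likewise assumed.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_count):
    """Return [L1, L2, ...]; each Lk maps a frequent k-itemset to its count."""

    def count(candidates):
        # One pass over the data per level, as in the level-wise search.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        # Keep only the candidates that reach the minimum count.
        return {c: n for c, n in counts.items() if n >= min_count}

    single_items = {frozenset([i]) for t in transactions for i in t}
    current = count(single_items)
    levels = []
    k = 1
    while current:
        levels.append(current)
        k += 1
        prev = set(current)
        # Join frequent (k-1)-itemsets, then prune with the Apriori property.
        candidates = {
            a | b
            for a in prev
            for b in prev
            if len(a | b) == k
            and all(frozenset(s) in prev for s in combinations(a | b, k - 1))
        }
        current = count(candidates)
    return levels


# Hypothetical transactions; each letter stands for one product.
transactions = [frozenset(t) for t in ("ACD", "BCE", "ABCE", "BE", "ABCE")]
levels = frequent_itemsets(transactions, min_count=3)
for k, level in enumerate(levels, start=1):
    print(f"L{k}:", {"".join(sorted(s)): n for s, n in level.items()})
```

On this sample data the loop stops after three levels, with {B, C, E} as the only frequent 3-itemset.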
As you can see, we finally obtain three levels of frequent itemsets: L1, L2, and L3. Next, we can calculate the confidence at each level directly from the supports. Here we take L3 as an example:
[Figure: confidence calculation process]
The confidence calculation is relatively simple: for each itemset, take k-1 of its elements and compute their confidence for the remaining element by directly applying the formula above. From this we can actually read off the rules: whenever BC appears, E always appears, and whenever CE appears, B always appears. Of course, this is just a simple example; in practice a large enough sample is needed for the results to be reliable.
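As a rough sketch of this step, the code below derives rules with a single-item consequent from one frequent itemset. The transactions, the function name `rules_from_itemset`, and the 0.8 confidence threshold are assumptions for illustration and are not the data from the article's figure, although the sample was chosen so that the same two rules emerge.

```python
from itertools import combinations


def rules_from_itemset(itemset, transactions, min_confidence):
    """Return (antecedent, consequent, confidence) for rules X => y,
    where X has k-1 items of the itemset and y is the remaining item."""
    itemset = frozenset(itemset)

    def count(s):
        return sum(1 for t in transactions if s <= t)

    rules = []
    for antecedent in combinations(sorted(itemset), len(itemset) - 1):
        antecedent = frozenset(antecedent)
        consequent = itemset - antecedent
        # The ratio of counts equals the ratio of supports.
        conf = count(itemset) / count(antecedent)
        if conf >= min_confidence:
            rules.append(("".join(sorted(antecedent)),
                          "".join(sorted(consequent)),
                          conf))
    return rules


# Hypothetical transactions; each letter stands for one product.
transactions = [frozenset(t) for t in ("ACD", "BCE", "ABCE", "BE", "ABCE")]
for a, c, conf in rules_from_itemset("BCE", transactions, min_confidence=0.8):
    print(f"{a} => {c}: {conf:.2f}")
# BC => E: 1.00
# CE => B: 1.00
```

The rule BE => C is dropped here because its confidence is 3/4, below the assumed threshold.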
V. Conclusion
As shown above, the basic principle of the Apriori algorithm is relatively simple. In practice, however, n products give 2^n - 1 possible non-empty combinations, so the amount of computation becomes very large even when n is only moderately large. Actual implementations therefore use the Apriori property mentioned at the beginning to prune candidates and reduce the amount of computation. In addition, there are more efficient association rule algorithms such as FP-growth and Eclat. They are not introduced here; interested readers can look into them on their own.
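A quick calculation shows how fast the 2^n - 1 figure grows. The helper names below are made up for this illustration.

```python
from itertools import combinations


def count_nonempty_itemsets(n):
    """Number of non-empty combinations of n distinct products."""
    return 2 ** n - 1


def enumerate_itemsets(items):
    """Brute-force enumeration, feasible only for very small inputs."""
    return [c for k in range(1, len(items) + 1)
            for c in combinations(items, k)]


# The closed-form count agrees with brute-force enumeration for 5 items.
print(len(enumerate_itemsets("ABCDE")))  # 31

for n in (5, 10, 20, 30):
    print(n, count_nonempty_itemsets(n))
# 5 31
# 10 1023
# 20 1048575
# 30 1073741823
```

With only 30 products there are already more than a billion candidate itemsets, which is why pruning matters.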
References: "Apriori Algorithm in Detail"; "Association Analysis Using the Apriori and FP-growth Algorithms"