"Machine Learning"--association rule algorithm from initial knowledge to application

Source: Internet
Author: User

First, Introduction

The purpose of association rules is to find relationships between items in a data set; this is also known as market basket analysis. For example, 10% of customers who buy shoes will also buy socks, and 60% of customers who buy bread will also buy milk. One of the most famous examples is the "diapers and beer" story.

Second, related concepts

Association analysis: the task of finding interesting relationships in a large-scale data set. These relationships can take two forms: frequent itemsets or association rules. Frequent itemsets are collections of items that often appear together; association rules suggest that there may be a strong relationship between two items. Items and itemsets: an item is a single object (for example, one product in a basket), and an itemset is a collection of one or more items.

For example, {diapers, beer} above often appear together, so there may be some relationship between them. How, then, do we determine whether an itemset is frequent? The two key measures are support and confidence.

First, what is a rule? A rule has the form "if ... then ...": the former part is the condition and the latter is the result. For example: if a customer buys Coke, he will also buy juice.

How do we measure whether a rule is good enough? There are two measures: confidence and support.

Consider a set of purchase records, tabulated so that each cell of the resulting table gives the number of transactions in which both the row item and the column item were purchased. For example, the number of transactions containing Orange is 4, and the number containing both Orange and Coke is 2.

Confidence indicates to what extent the rule can be trusted. Let the condition itemset be A and the result itemset be B. The confidence is the probability that a transaction containing A also contains B, i.e. confidence(A => B) = P(B | A). For example, consider the confidence of "if Orange then Coke": of the 4 transactions containing Orange, only 2 contain Coke, so the confidence is 2/4 = 0.5.

Support is calculated over all transactions: it is the probability that A and B appear together. For example, 2 of the 5 records contain both Orange and Coke, so this rule has a support of 2/5 = 0.4. The rule can now be stated as: if a customer buys Orange, there is a 50% chance that he also buys Coke, and this combination (buying both Orange and Coke) occurs in 40% of all transactions. Support is defined for itemsets, so you can set a minimum support level and keep only the itemsets that meet it.
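To make the two measures concrete, here is a minimal Python sketch that computes the support and confidence of the rule "Orange => Coke". The five transactions are hypothetical, chosen only so that the counts match the text above (4 transactions contain Orange, 2 contain both Orange and Coke).

```python
# Hypothetical transactions: 4 of the 5 contain Orange,
# and 2 of the 5 contain both Orange and Coke, as in the text above.
transactions = [
    {"Orange", "Coke"},
    {"Orange", "Bread"},
    {"Orange", "Coke", "Milk"},
    {"Orange"},
    {"Bread", "Milk"},
]

def support(itemset, transactions):
    """Fraction of all transactions that contain every item in `itemset`."""
    count = sum(1 for t in transactions if itemset <= t)
    return count / len(transactions)

def confidence(condition, result, transactions):
    """confidence(A => B) = P(B | A) = support(A and B) / support(A)."""
    return support(condition | result, transactions) / support(condition, transactions)

print(support({"Orange", "Coke"}, transactions))       # 0.4
print(confidence({"Orange"}, {"Coke"}, transactions))  # 0.5
```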

Association rules require an itemset to meet a minimum support threshold, called the minimum support of the itemset (minimum support), written Supmin. An itemset whose support is greater than or equal to Supmin is called a frequent itemset (frequent set for short); otherwise it is a non-frequent set. A k-itemset that satisfies Supmin is usually called a frequent k-itemset, written Lk. The minimum confidence of association rules, written Confmin, represents the minimum reliability that an association rule must satisfy.
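Continuing the sketch above (reusing the hypothetical `transactions` and the `support` helper), keeping only the itemsets that meet Supmin might look like this; `filter_frequent` and `sup_min` are illustrative names, not standard library calls.

```python
def filter_frequent(candidates, transactions, sup_min):
    """Keep only the candidate itemsets whose support is >= sup_min (the frequent sets Lk)."""
    return [c for c in candidates if support(c, transactions) >= sup_min]

# Frequent 1-itemsets (L1) from the hypothetical data, with Supmin = 0.4
candidates_1 = [frozenset([item]) for item in set().union(*transactions)]
L1 = filter_frequent(candidates_1, transactions, sup_min=0.4)
```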

Three, Apriori algorithm

1. Principle

If an itemset is frequent, then all of its subsets are also frequent. The contrapositive is more useful in practice: if an itemset is infrequent, then all of its supersets (the sets that contain it) are also infrequent. This is the Apriori principle: once we know that an itemset is infrequent, we do not need to compute the support of its supersets, which avoids exponential growth in the number of candidate itemsets and lets frequent itemsets be computed in a reasonable time.

2. Implementation

The Apriori algorithm is a method for discovering frequent itemsets. Its two input parameters are the minimum support level and the data set. The algorithm first generates a list of candidate itemsets containing each individual item, then scans the transactions to see which of these itemsets meet the minimum support requirement; the itemsets that do not are discarded. The surviving itemsets are then combined to produce candidate itemsets of two items, the transactions are rescanned, the itemsets that do not meet the minimum support are again removed, and the process repeats until no candidate itemsets remain.
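Putting these pieces together, a minimal Apriori sketch (again assuming the hypothetical `transactions` and the `support` helper from the earlier snippets) could look like the following. It alternates a join step, which builds candidate k-itemsets from the frequent (k-1)-itemsets, and a prune step, which drops every candidate below the minimum support, until no candidates survive.

```python
def apriori(transactions, sup_min):
    """Return all frequent itemsets, level by level (L1, L2, ...)."""
    items = set().union(*transactions)
    # L1: frequent 1-itemsets
    current = [frozenset([i]) for i in items
               if support(frozenset([i]), transactions) >= sup_min]
    frequent = list(current)
    k = 2
    while current:
        # Join step: combine frequent (k-1)-itemsets into candidate k-itemsets.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: keep only the candidates that meet the minimum support.
        current = [c for c in candidates if support(c, transactions) >= sup_min]
        frequent.extend(current)
        k += 1
    return frequent

for itemset in apriori(transactions, sup_min=0.4):
    print(set(itemset), support(itemset, transactions))
```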



"Machine Learning"--association rule algorithm from initial knowledge to application

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.