Machine Learning Notes: Association Rules

Source: Internet
Author: User

Statement:

This machine learning series mainly records the references and summaries I collected while learning machine learning algorithms. Some of the content is drawn from reference books and blogs.

Contents:

    1. What are association rules
    2. Key concepts of association rules
    3. The implementation process of association rules
    4. The core of association rules: how to generate frequent itemsets
    5. Points to note in practical use
    6. Summary of association rules and after-class exercises

I. What are association rules

Data mining means analyzing source data in some way to discover potentially useful information, which is why it is also called knowledge discovery. Machine learning algorithms are that "some way". Association rules are one of the ten classic machine learning algorithms, so they are worth understanding well, even though they are not widely used nowadays. As the name implies, association rules are the rules or connections hidden behind the data.

A simple example (the diaper-and-beer story is too well known to repeat): by studying what supermarket customers buy, we might find that 30% of customers buy bed sheets and pillowcases together, and that 80% of the customers who buy sheets also buy pillowcases. This reveals an implicit rule, sheets → pillowcases: customers who buy sheets are likely to buy pillowcases as well, so the store can place sheets and pillowcases in the same area for customers' convenience.
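Read as probabilities over all customers, the two percentages in this example are the support and the confidence of the rule sheets → pillowcases (both are defined in the next section). A short worked reading, using only the 30% and 80% figures from the text, also shows what share of customers buy sheets at all:

```latex
\begin{aligned}
\text{support}(\text{sheets} \rightarrow \text{pillowcases}) &= P(\text{sheets} \cap \text{pillowcases}) = 0.30 \\
\text{confidence}(\text{sheets} \rightarrow \text{pillowcases}) &= P(\text{pillowcases} \mid \text{sheets})
  = \frac{P(\text{sheets} \cap \text{pillowcases})}{P(\text{sheets})} = 0.80 \\
\Rightarrow\; P(\text{sheets}) &= \frac{0.30}{0.80} = 0.375
\end{aligned}
```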

In general, association rules can be applied to scenarios such as:

      • Optimizing how goods are arranged on shelves, or the product catalogue used for mailings
      • Cross-selling or bundled sales
      • Search-term recommendation or anomaly detection

II. Key concepts

    • Item: a field in a transaction database, usually one item in a transaction, such as milk
    • Transaction: the set of all items a customer buys in a single transaction, such as {milk, bread, beer}
    • Itemset: a set of items (taken from one transaction), usually containing at least one item
    • Support: the proportion of all transactions in which the itemset {X, Y} appears (see the example below)
    • Frequent itemset: an itemset whose support is at least a set threshold (chosen by hand, or from the data distribution and experience)
    • Confidence: for the association rule X → Y, the probability that Y occurs given that X occurs (see the example below)
    • Lift: the ratio of the probability of Y given that X occurs to the probability of Y without that condition, i.e. confidence(X → Y) / support(Y)
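The definitions above can be sketched in a few lines of Python. This is a minimal illustration, not the author's code: the function names `support`, `confidence`, `lift` and `is_frequent` and the five-basket `TRANSACTIONS` list are hypothetical, and `fractions.Fraction` is used only so that the ratios stay exact.

```python
from fractions import Fraction
from typing import List, Set

# Hypothetical toy data: each set is one customer's basket (one transaction).
TRANSACTIONS: List[Set[str]] = [
    {"milk", "bread", "beer"},
    {"milk", "bread"},
    {"bread", "beer"},
    {"milk", "diaper", "beer"},
    {"milk", "bread", "diaper", "beer"},
]


def support(itemset: Set[str], transactions: List[Set[str]]) -> Fraction:
    """Share of transactions that contain every item in `itemset`."""
    hits = sum(1 for t in transactions if itemset <= t)
    return Fraction(hits, len(transactions))


def confidence(x: Set[str], y: Set[str], transactions: List[Set[str]]) -> Fraction:
    """P(Y | X) for the rule X -> Y, i.e. support(X | Y) / support(X).

    Raises ZeroDivisionError if X never occurs.
    """
    return support(x | y, transactions) / support(x, transactions)


def lift(x: Set[str], y: Set[str], transactions: List[Set[str]]) -> Fraction:
    """confidence(X -> Y) / support(Y); above 1 means X raises the chance of Y."""
    return confidence(x, y, transactions) / support(y, transactions)


def is_frequent(itemset: Set[str], transactions: List[Set[str]], min_support: Fraction) -> bool:
    """An itemset is frequent when its support reaches the threshold."""
    return support(itemset, transactions) >= min_support


print(confidence({"milk"}, {"bread"}, TRANSACTIONS))  # 3/4
print(lift({"milk"}, {"bread"}, TRANSACTIONS))        # 15/16
```

With this toy data, milk → bread has a confidence of 3/4 but a lift of only 15/16 (below 1, because bread already appears in 4/5 of all baskets), while beer → diaper has a lift of 5/4.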

Suppose there is a rule beef → chicken, where the proportion of all customers who buy both beef and chicken is 3/7, and the proportion of beef buyers who also buy chicken is 3/4. These two proportions are the important metrics called support and confidence in association rules. For the rule beef → chicken, a support of 3/7 means that 3/7 of all customers buy beef and chicken together, which reflects how many customers the rule covers. A confidence of 3/4 means that 3/4 of the customers who bought beef also bought chicken, which reflects how predictable it is that a customer who buys beef will also buy chicken. Viewed as a probability problem, "how likely a customer who buys beef is to also buy chicken" is a conditional probability; viewed in terms of sets, it can be pictured as in the following figure:

The figure above describes the problem well: S is the set of all customers, A the customers who buy beef, B the customers who buy chicken, and C the customers who buy both beef and chicken (C = A ∩ B). So C.count / S.count = 3/7 is the support, and C.count / A.count = 3/4 is the confidence.
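The set view in the figure can be reproduced with Python sets. The customer IDs below are made up, since the underlying data is not given; they are chosen only so that the counts match the text (7 customers, 4 of whom buy beef, 3 of whom also buy chicken). Which customers buy chicken without beef is an arbitrary assumption and does not affect either ratio.

```python
from fractions import Fraction

# Hypothetical customer IDs chosen to match the counts in the text.
S = set(range(1, 8))   # all 7 customers
A = {1, 2, 3, 4}       # customers who buy beef
B = {2, 3, 4, 6}       # customers who buy chicken (customer 6 is an assumption)
C = A & B              # customers who buy both beef and chicken

support = Fraction(len(C), len(S))     # C.count / S.count
confidence = Fraction(len(C), len(A))  # C.count / A.count

print(support, confidence)  # 3/7 3/4
```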

III. The implementation process

IV. How to generate frequent itemsets

V. Points to note in practical use

VI. Summary
