Suppose you manage a supermarket. You will want to understand your customers' shopping habits, in particular which items they tend to buy together in a single trip, so that you can arrange the shelves to increase profit. This is the problem that association rules address. An association rule takes the following form:
bread ⇒ milk [support = 10%; confidence = 60%]
Support and confidence are the two metrics of a rule. They measure how widely the rule applies and how reliable it is, respectively. The rule above means that 60% of customers who buy bread also buy milk, and that transactions containing both make up 10% of all transactions.

Association Rule Mining Process

1. Find all frequent itemsets: itemsets whose number of occurrences is at least the predefined minimum support count min_sup.
2. Generate strong association rules from the frequent itemsets: rules that satisfy both the minimum support and the minimum confidence.
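To make the two metrics concrete, here is a minimal Java sketch that computes support and confidence by counting transactions. The class name `RuleMetrics`, its method names, and the five toy transactions are assumptions made for illustration; they are not taken from the downloadable source, and the sketch needs Java 9 or later for `List.of` and `Set.of`.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper class, for illustration only.
final class RuleMetrics {
    // Number of transactions that contain every item in `items`.
    static long count(List<Set<String>> transactions, Set<String> items) {
        return transactions.stream().filter(t -> t.containsAll(items)).count();
    }

    // support(X) = (transactions containing X) / (all transactions)
    static double support(List<Set<String>> transactions, Set<String> items) {
        if (transactions.isEmpty()) {
            return 0.0; // assumption: treat an empty database as zero support
        }
        return (double) count(transactions, items) / transactions.size();
    }

    // confidence(A => B) = (transactions containing A and B) / (transactions containing A)
    static double confidence(List<Set<String>> transactions,
                             Set<String> antecedent, Set<String> consequent) {
        Set<String> both = new HashSet<>(antecedent);
        both.addAll(consequent);
        long antecedentCount = count(transactions, antecedent);
        if (antecedentCount == 0) {
            return 0.0; // assumption: a rule whose antecedent never occurs gets confidence 0
        }
        return (double) count(transactions, both) / antecedentCount;
    }

    public static void main(String[] args) {
        // Made-up toy data, not real sales records.
        List<Set<String>> transactions = List.of(
                Set.of("bread", "milk"),
                Set.of("bread", "milk", "eggs"),
                Set.of("bread", "eggs"),
                Set.of("milk", "eggs"),
                Set.of("bread", "milk", "beer"));
        // The support of the rule bread => milk is the support of {bread, milk}.
        System.out.println("support    = " + support(transactions, Set.of("bread", "milk")));
        System.out.println("confidence = "
                + confidence(transactions, Set.of("bread"), Set.of("milk")));
    }
}
```

On this toy data, bread and milk appear together in 3 of 5 transactions and bread appears in 4, so the rule bread ⇒ milk has support 3/5 and confidence 3/4.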
In general, research on association rule mining focuses on the first step; the second step amounts to simple counting. Below I introduce Apriori, the classic algorithm for frequent itemset mining, and provide Java source code.

Apriori Algorithm
The Apriori algorithm was proposed by R. Agrawal and R. Srikant in 1994. Its core is the anti-monotone property: if the number of occurrences of an itemset is less than the minimum support, then the number of occurrences of any of its supersets is also less than the minimum support. This property greatly reduces the search space. The algorithm scans the database repeatedly: it first finds the 1-itemsets that meet the minimum support, joins them into candidate 2-itemsets, scans the database again and keeps those whose number of occurrences is at least the minimum support, and so on for larger itemsets. When generating candidates, the anti-monotone property means that many itemsets need not be counted at all: any candidate with an infrequent subset can be discarded.
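The level-wise loop described above can be sketched in a few dozen lines of Java. This is an illustrative sketch under stated assumptions, not the downloadable source: the class name `AprioriSketch`, its method names, and the use of an absolute minimum count `minCount` are hypothetical, and the join step is a simplified pairwise union rather than the prefix-based join usually used in optimized implementations.

```java
import java.util.*;

// Hypothetical sketch of level-wise frequent itemset mining, for illustration only.
final class AprioriSketch {
    // Returns every frequent itemset with its occurrence count.
    static Map<Set<String>, Integer> frequentItemsets(List<Set<String>> transactions, int minCount) {
        Map<Set<String>, Integer> result = new LinkedHashMap<>();

        // First scan: count every single item (the 1-itemsets).
        Map<Set<String>, Integer> counts = new HashMap<>();
        for (Set<String> t : transactions) {
            for (String item : t) {
                counts.merge(new TreeSet<>(Collections.singleton(item)), 1, Integer::sum);
            }
        }
        Set<Set<String>> level = keepFrequent(counts, minCount, result);

        // Later scans: join frequent k-itemsets into (k+1)-candidates, then count them.
        while (!level.isEmpty()) {
            Set<Set<String>> candidates = generateCandidates(level);
            counts = new HashMap<>();
            for (Set<String> t : transactions) {
                for (Set<String> c : candidates) {
                    if (t.containsAll(c)) {
                        counts.merge(c, 1, Integer::sum);
                    }
                }
            }
            level = keepFrequent(counts, minCount, result);
        }
        return result;
    }

    // Keeps itemsets occurring at least minCount times and records them in result.
    private static Set<Set<String>> keepFrequent(Map<Set<String>, Integer> counts, int minCount,
                                                 Map<Set<String>, Integer> result) {
        Set<Set<String>> frequent = new HashSet<>();
        for (Map.Entry<Set<String>, Integer> e : counts.entrySet()) {
            if (e.getValue() >= minCount) {
                frequent.add(e.getKey());
                result.put(e.getKey(), e.getValue());
            }
        }
        return frequent;
    }

    // Joins pairs of frequent k-itemsets whose union has exactly k+1 items.
    private static Set<Set<String>> generateCandidates(Set<Set<String>> level) {
        Set<Set<String>> candidates = new HashSet<>();
        for (Set<String> a : level) {
            for (Set<String> b : level) {
                Set<String> union = new TreeSet<>(a);
                union.addAll(b);
                if (union.size() == a.size() + 1 && allSubsetsFrequent(union, level)) {
                    candidates.add(union);
                }
            }
        }
        return candidates;
    }

    // Anti-monotone pruning: a candidate survives only if every k-subset is frequent.
    private static boolean allSubsetsFrequent(Set<String> candidate, Set<Set<String>> level) {
        for (String item : candidate) {
            Set<String> subset = new TreeSet<>(candidate);
            subset.remove(item);
            if (!level.contains(subset)) {
                return false;
            }
        }
        return true;
    }
}
```

A call such as `AprioriSketch.frequentItemsets(transactions, 3)` returns a map from each frequent itemset to its count. The pruning in `allSubsetsFrequent` is where the anti-monotone property pays off: candidates with an infrequent subset are dropped before the database is scanned, so they are never counted.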
The Apriori source code is given below; click to download.