Reprinted from: http://m.blog.csdn.net/blog/sanqima/42746419
1. Support
Support is the probability that the itemset {X, Y} appears among all transactions. The formula is:
Support(X→Y) = P(X, Y) / P(I) = P(X∪Y) / P(I) = num(X∪Y) / num(I)
Here I denotes the set of all transactions, and num() is the number of transactions in that set that contain a given itemset.
For example, num(I) is the total number of transactions,
and num(X∪Y) is the number of transactions that contain both X and Y.
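The counting definition above can be sketched in a few lines of Python. This is only an illustration: the function name support and the toy baskets list are assumptions made for the example, not part of the original article.

```python
def support(transactions, itemset):
    """Return num(itemset) / num(I) for a list of transactions (sets of items)."""
    if not transactions:
        raise ValueError("transactions must not be empty")
    itemset = set(itemset)
    # num(X∪Y): count the transactions that contain every item of the itemset
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


# Hypothetical toy data, used only to exercise the function
baskets = [
    {"tea", "coffee"},
    {"tea"},
    {"coffee", "milk"},
    {"tea", "coffee", "milk"},
]

# 2 of the 4 baskets contain both tea and coffee
print(support(baskets, {"tea", "coffee"}))  # 0.5
```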
2. Confidence
Confidence is the probability that Y occurs, as the association rule "X→Y" predicts, given that the antecedent X occurs. In other words, it is the likelihood that a transaction containing X also contains Y. The formula is:
Confidence(X→Y) = P(Y|X) = P(X, Y) / P(X) = P(X∪Y) / P(X)
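As a rough sketch of this formula, confidence can be computed as num(X∪Y) / num(X), since the num(I) terms cancel. The helper names count and confidence and the toy data are assumptions for illustration only.

```python
def count(transactions, itemset):
    """num(itemset): number of transactions containing every item of itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= set(t))


def confidence(transactions, x, y):
    """Confidence(X→Y) = num(X∪Y) / num(X)."""
    num_x = count(transactions, x)
    if num_x == 0:
        raise ValueError("the antecedent X never occurs, so P(Y|X) is undefined")
    num_xy = count(transactions, set(x) | set(y))
    return num_xy / num_x


# Hypothetical toy data
baskets = [
    {"tea", "coffee"},
    {"tea"},
    {"coffee", "milk"},
    {"tea", "coffee", "milk"},
]

# Tea appears in 3 baskets, and 2 of those also contain coffee
print(confidence(baskets, {"tea"}, {"coffee"}))
```

Note that confidence is not symmetric: Confidence(X→Y) and Confidence(Y→X) divide by different counts.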
3. Lift
Lift is the ratio of the probability that a transaction contains Y given that it contains X to the overall probability that a transaction contains Y.
Lift(X→Y) = P(Y|X) / P(Y)
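The ratio can also be computed directly from four counts. The sketch below assumes a hypothetical function lift_from_counts and made-up counts; it is meant only to show how the formula maps onto num() values.

```python
import math


def lift_from_counts(num_xy, num_x, num_y, num_total):
    """Lift(X→Y) = P(Y|X) / P(Y), computed from transaction counts."""
    if num_x <= 0 or num_y <= 0 or num_total <= 0:
        raise ValueError("num_x, num_y and num_total must be positive")
    p_y_given_x = num_xy / num_x   # Confidence(X→Y)
    p_y = num_y / num_total        # overall probability of Y
    return p_y_given_x / p_y


# Hypothetical counts: 100 transactions, X in 40, Y in 50, both in 30
value = lift_from_counts(num_xy=30, num_x=40, num_y=50, num_total=100)
print(value)  # 1.5

# Lift is symmetric up to floating-point rounding: swapping X and Y gives the same ratio
print(math.isclose(value, lift_from_counts(30, 50, 40, 100)))  # True
```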
Example 1. 1000 customers are doing their Spring Festival shopping. They are divided into two groups, A and B, of 500 people each. In group A, all 500 people bought tea, and 450 of them also bought coffee. In group B, 450 people bought coffee (and none bought tea), as shown in Table (1):
Table (1) Shopping list
Group      Customers   Bought tea   Bought coffee
Group A    500         500          450
Group B    500         0            450
Find: 1) the support of "tea → coffee"
2) the confidence of "tea → coffee"
3) the lift of "tea → coffee"
Analysis:
Let X = {buy tea} and Y = {buy coffee}. The rule "tea → coffee" means "a customer who bought tea also bought coffee". Of the 1000 customers, 450 bought both tea and coffee, so the support of "tea → coffee" is
Support(X→Y) = num(X∪Y) / num(I) = 450/1000 = 45%
Of the 500 customers who bought tea, 450 also bought coffee, so the confidence of "tea → coffee" is
Confidence(X→Y) = 450/500 = 90%
The lift of "tea → coffee" is
Lift(X→Y) = Confidence(X→Y) / P(Y) = 90% / ((450+450)/1000) = 90% / 90% = 1
Because Lift(X→Y) = 1, X and Y are independent of each other: whether X occurs has no effect on whether Y occurs. In other words, whether a customer buys coffee has nothing to do with whether they buy tea. The rule "tea → coffee" does not hold, or the association is negligible. Although its confidence is as high as 90%, it is not a valid association rule.
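The numbers of Example 1 can be checked with a short script. This sketch takes num(I) to be all 1000 customers, as in the definition of support in section 1, and assumes that the 50 group-B customers who did not buy coffee bought neither item; the variable names are illustrative.

```python
# Rebuild the 1000 customers of Example 1 as sets of purchased items.
# Assumption: the 50 group-B customers without coffee bought neither item.
transactions = (
    [{"tea", "coffee"}] * 450   # group A: bought tea and coffee
    + [{"tea"}] * 50            # group A: bought tea only
    + [{"coffee"}] * 450        # group B: bought coffee only
    + [set()] * 50              # group B: bought neither
)


def count(itemset):
    """Number of transactions that contain every item of itemset."""
    return sum(1 for t in transactions if itemset <= t)


num_i = len(transactions)            # 1000
num_x = count({"tea"})               # 500
num_y = count({"coffee"})            # 900
num_xy = count({"tea", "coffee"})    # 450

support = num_xy / num_i             # num(X∪Y) / num(I)
confidence = num_xy / num_x          # P(Y|X)
lift = confidence / (num_y / num_i)  # P(Y|X) / P(Y)

print(support, confidence, lift)  # 0.45 0.9 1.0
```

Tea buyers purchase coffee at the same 90% rate as customers overall, which is why the lift comes out as 1.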
Rules that satisfy both the minimum support and the minimum confidence are called "strong association rules". However, strong association rules are themselves divided into valid and invalid ones.
If Lift(X→Y) > 1, the rule "X→Y" is a valid strong association rule.
If Lift(X→Y) <= 1, the rule "X→Y" is an invalid strong association rule.
In particular, if Lift(X→Y) = 1, X and Y are independent of each other.
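The classification above could be expressed as a small helper. The function name classify_rule, the returned labels, and the threshold values in the usage line are all hypothetical; the article does not prescribe specific minimum support or confidence values.

```python
import math


def classify_rule(support, confidence, lift, min_support, min_confidence):
    """Label a rule X→Y using the strong / valid / invalid distinction."""
    # A rule is "strong" only if it meets both minimum thresholds
    if support < min_support or confidence < min_confidence:
        return "not a strong rule"
    # Lift = 1 is the special case of independence (still invalid)
    if math.isclose(lift, 1.0):
        return "invalid strong rule (X and Y independent)"
    if lift > 1:
        return "valid strong rule"
    return "invalid strong rule"


# Hypothetical thresholds, applied to a rule with support 0.45,
# confidence 0.9 and lift 1.0
print(classify_rule(0.45, 0.9, 1.0, min_support=0.3, min_confidence=0.6))
```

The comparison with 1 uses math.isclose because lift is usually the result of floating-point division.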
Support, confidence, and lift in association analysis