Pattern Discovery in Data Mining (IV): Pattern Evaluation

Source: Internet
Author: User
Introduction to Pattern evaluation

Pattern evaluation identifies the truly interesting patterns that represent knowledge, based on interestingness measures.

The strong rules obtained earlier from the support-confidence association rule mining framework are not necessarily interesting, so pattern evaluation is needed. In some cases, even the commonly used lift and chi-square measures do not work well.

This article introduces the concept of interestingness in pattern and rule evaluation, demonstrates the importance of null-invariance, and compares multiple interestingness measures.

Basic Concepts

What kind of pattern is interesting

A pattern is interesting if it has the following characteristics:

  • It is easily understood.
  • It is valid on new or test data with some degree of certainty.
  • It is potentially useful.
  • It is novel.

A pattern is also interesting if it validates a hypothesis that the user sought to confirm. Interesting patterns represent knowledge that can be used for decision making.

Two kinds of pattern interestingness measures

Objective measures

Objective measures are based on the structure of the discovered patterns and the statistics underlying them. For an association rule of the form $X \rightarrow Y$, one objective measure is the rule's support, the percentage of transactions in the transaction database that satisfy the rule. Another objective measure is confidence, which assesses the degree of certainty of the discovered rule. In general, each interestingness measure is associated with a threshold that the user can control.

Subjective measures

Subjective measures are based on the user's expectations of the data. They find a pattern interesting if it is unexpected (it conflicts with the user's expectations) or if it provides vital information on which the user can act. In the latter case, the pattern is called actionable. Expected patterns can also be interesting if they confirm a hypothesis the user wanted to validate or resemble the user's hunch. Other interestingness measures include the accuracy and coverage of classification (if-then) rules.

Note: One more remark. After taking the PDDM course, I still did not fully understand what Professor Jiawei Han had said, and many of the concepts were unclear to me. If possible, I recommend first reading up on the basic fundamentals of data mining. Because I could not find material on pattern evaluation, I kept assuming it meant model evaluation until I understood the concepts clearly.

Limitations of the support-confidence framework

$\text{play-basketball} \rightarrow \text{eat-cereal}$ [40%, 66.7%]
$\neg\text{play-basketball} \rightarrow \text{eat-cereal}$ [35%, 87.5%]
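The bracketed pairs are [support, confidence]. A minimal Python sketch of how such figures arise, assuming a hypothetical database of 1000 transactions whose counts are chosen to be consistent with the percentages above (the counts and function names are illustrative, not taken from the source):

```python
def support(n_both, n_total):
    """Fraction of all transactions that contain both sides of the rule."""
    return n_both / n_total


def confidence(n_both, n_antecedent):
    """Fraction of transactions containing the antecedent that also contain the consequent."""
    return n_both / n_antecedent


# Assumed (hypothetical) counts, consistent with the percentages in the text.
N = 1000                         # all transactions
n_basketball = 600               # play basketball
n_basketball_cereal = 400        # play basketball and eat cereal
n_no_basketball = N - n_basketball
n_no_basketball_cereal = 350     # do not play basketball, but eat cereal

rule_1 = (support(n_basketball_cereal, N),
          confidence(n_basketball_cereal, n_basketball))
rule_2 = (support(n_no_basketball_cereal, N),
          confidence(n_no_basketball_cereal, n_no_basketball))

print([round(v, 3) for v in rule_1])
print([round(v, 3) for v in rule_2])
```

Under these assumed counts, both rules would pass typical support and confidence thresholds at the same time.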

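If we rely solely on association rules from the support-confidence framework, we cannot easily reach a definite conclusion: the first rule looks strong, yet the second rule, for people who do not play basketball, has an even higher confidence. This is the limitation of the support-confidence framework.

Lift and Chi-Square Measures

Lift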

Lift measures whether two events are independent or correlated. It closely resembles the test for the independence of two events in probability theory. It is defined as follows:

$$\mathrm{lift}(B, C) = \frac{c(B \rightarrow C)}{s(C)} = \frac{s(B \cup C)}{s(B) \times s(C)}$$

  • $\mathrm{lift}(B, C) = 1$: B and C are independent
  • $\mathrm{lift}(B, C) > 1$: B and C are positively correlated
  • $\mathrm{lift}(B, C) < 1$: B and C are negatively correlated

Example:
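As a hedged worked example, assume 1000 transactions in which 600 contain play-basketball (B), 750 contain eat-cereal (C), and 400 contain both. These counts are hypothetical, chosen to match the [40%, 66.7%] and [35%, 87.5%] rules above, and the function name `lift` is illustrative:

```python
def lift(s_xy, s_x, s_y):
    """Lift of X and Y: joint support divided by the product of the individual supports."""
    return s_xy / (s_x * s_y)


# Assumed (hypothetical) counts, consistent with the rules in the text.
N = 1000
n_b = 600       # transactions containing B (play-basketball)
n_c = 750       # transactions containing C (eat-cereal)
n_bc = 400      # transactions containing both B and C

# Lift of B with C, and of B with not-C.
lift_b_c = lift(n_bc / N, n_b / N, n_c / N)
lift_b_not_c = lift((n_b - n_bc) / N, n_b / N, (N - n_c) / N)

print(round(lift_b_c, 2))       # 0.89
print(round(lift_b_not_c, 2))   # 1.33
```

Under these assumptions, lift(B, C) falls below 1, which points to a negative correlation between B and C even though the rule B → C has high support and confidence, while lift(B, ¬C) is above 1.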

chi-square Measures

In mathematical notation, chi is written $\chi$, which is easy to typeset with MathJax.

$$\chi^2 = \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}}$$

$$\chi^2 = \frac{(400 - 450)^2}{450} + \cdots$$
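The full sum over a 2×2 contingency table can be sketched in Python. The table below is an assumption: 1000 transactions with counts chosen to agree with the observed value 400, the expected value 450, and the rule percentages given earlier. The expected count of each cell is taken as row total × column total / N, the usual value under independence:

```python
def chi_square(observed, expected):
    """Sum of (observed - expected)^2 / expected over all cells."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Assumed (hypothetical) 2x2 table: rows B / not-B, columns C / not-C.
N = 1000
row_totals = [600, 400]    # B, not-B
col_totals = [750, 250]    # C, not-C
observed = [400, 200,      # B and C,     B and not-C
            350, 50]       # not-B and C, not-B and not-C

# Expected counts if B and C were independent.
expected = [r * c / N for r in row_totals for c in col_totals]

chi2 = chi_square(observed, expected)
print(expected)            # [450.0, 150.0, 300.0, 100.0]
print(round(chi2, 2))      # 55.56
```

With these assumed counts, the statistic is far from 0, so B and C do not look independent. Because the observed count for (B, C) is below its expected value (400 against 450), the correlation is negative, which agrees with a lift below 1.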
