In data mining, the Apriori algorithm is a widely used association rule mining algorithm. The single-minimum-support algorithm (Apriori) can be regarded as a special case of the multiple-minimum-support algorithm (MS-Apriori), and in practical applications MS-Apriori is used fairly often. Many data mining books give detailed pseudocode for MS-Apriori. In that pseudocode, only the support count of each frequent itemset is recorded. When association rules are generated, however, the support counts of the frequent itemsets alone are not always enough. This leads to the so-called head-item problem. Let's use a simple example to illustrate it:
Eg: MIS(bread) = 2%, MIS(clothes) = 0.2%, MIS(shoes) = 0.1%. The actual support of the itemset {clothes, bread} is 0.15%, and the actual support of {clothes, shoes, bread} is 0.12%. In MS-Apriori, the minimum support of an itemset is the lowest MIS value among its items, so {clothes, bread} is not a frequent itemset (0.15% < 0.2%), while {clothes, shoes, bread} is a frequent itemset (0.12% ≥ 0.1%). Therefore, the support count of the former is not saved, and the support count of the latter is saved.
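A minimal Python sketch of the frequency test used in the example, assuming MS-Apriori's usual definition that an itemset's minimum support is the lowest MIS among its items. Supports are written as fractions (0.15% = 0.0015), and the names `MIS`, `itemset_mis` and `is_frequent` are illustrative, not taken from any library:

```python
# MIS values from the example, written as fractions (2% = 0.02).
MIS = {"bread": 0.02, "clothes": 0.002, "shoes": 0.001}

# Actual supports quoted in the example.
support = {
    frozenset({"clothes", "bread"}): 0.0015,
    frozenset({"clothes", "shoes", "bread"}): 0.0012,
}

def itemset_mis(itemset):
    """Minimum support of an itemset: the lowest MIS among its items."""
    return min(MIS[item] for item in itemset)

def is_frequent(itemset):
    """Frequent when the actual support reaches the itemset's own MIS."""
    return support[frozenset(itemset)] >= itemset_mis(itemset)

for itemset in support:
    print(sorted(itemset), itemset_mis(itemset), is_frequent(itemset))
```

Adding shoes lowers the threshold of the larger itemset from 0.2% to 0.1%, which is why the superset can be frequent while its subset is not.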
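We therefore cannot calculate the confidence of the rules {clothes, bread} --> {shoes} and {bread} --> {clothes, shoes}, because their antecedents {clothes, bread} and {bread} may not be frequent itemsets, so their support counts may not have been recorded: {clothes, bread} is not frequent, and {bread} is frequent only if its support reaches 2%.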
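The missing counts can be made concrete with a small lookup. This sketch assumes, hypothetically, that only frequent itemsets keep a support value and that {bread} fell short of its 2% threshold; the support given to {clothes, shoes} is an invented value for illustration, and `confidence` is a hypothetical helper:

```python
def confidence(antecedent, consequent, stored):
    """conf(X -> Y) = sup(X ∪ Y) / sup(X), using only retained supports.

    Returns None when either support was never recorded.
    """
    x = frozenset(antecedent)
    f = x | frozenset(consequent)
    if x not in stored or f not in stored:
        return None
    return stored[f] / stored[x]

# Hypothetical store: only frequent itemsets keep their support.
# {clothes, bread} is absent because 0.15% < 0.2%; {bread} is assumed absent too.
stored = {
    frozenset({"clothes", "shoes", "bread"}): 0.0012,
    frozenset({"clothes", "shoes"}): 0.0030,  # invented value, frequent since >= 0.1%
}

print(confidence({"clothes", "bread"}, {"shoes"}, stored))   # prints None
print(confidence({"bread"}, {"clothes", "shoes"}, stored))   # prints None
print(confidence({"clothes", "shoes"}, {"bread"}, stored))
```

The third rule can be evaluated because its antecedent contains shoes, the item with the smallest MIS in {clothes, shoes, bread}.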
We define the head-item problem as follows: when the item with the smallest MIS value in a frequent itemset appears in the consequent (right-hand side) of a rule, we may not be able to calculate the confidence of that rule.
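The definition can be turned into a check that lists, for one frequent itemset, every rule whose consequent holds the head item. This is a sketch using the example's MIS values; `rules_at_risk` is a hypothetical name, and it treats a rule as at risk when no antecedent item has the itemset's minimum MIS, which is the same as "the head item is in the consequent" when MIS values are distinct:

```python
from itertools import combinations

# MIS values from the example.
MIS = {"bread": 0.02, "clothes": 0.002, "shoes": 0.001}

def rules_at_risk(frequent_itemset):
    """Return every rule X -> Y (X ∪ Y = F, X ∩ Y = ∅, both non-empty)
    whose antecedent X does not contain an item with MIS equal to MIS(F)."""
    items = sorted(frequent_itemset)
    f_mis = min(MIS[i] for i in items)  # MIS(F), i.e. the head item's MIS
    at_risk = []
    for k in range(1, len(items)):
        for antecedent in combinations(items, k):
            consequent = tuple(i for i in items if i not in antecedent)
            # The head item ends up in Y exactly when X lacks an item at MIS(F).
            if min(MIS[i] for i in antecedent) > f_mis:
                at_risk.append((antecedent, consequent))
    return at_risk

for x, y in rules_at_risk({"clothes", "shoes", "bread"}):
    print(x, "->", y)
```

For {clothes, shoes, bread} the head item is shoes, so the check flags {bread} -> {clothes, shoes}, {clothes} -> {bread, shoes} and {bread, clothes} -> {shoes}, which include both rules from the example.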
Finally, we use proof by contradiction to show when the head-item problem cannot occur. Let F be a frequent itemset, and let a be the item with the smallest MIS value in F (a is called the head item). By definition, MIS(F) = MIS(a). We need to prove that a rule X -> Y, where X ∪ Y = F, X ∩ Y = ∅ and a ∈ X, does not have the head-item problem. Suppose it did. Since a ∈ X ⊆ F and a has the smallest MIS value in F, MIS(X) = MIS(a). Because X ⊆ F, every transaction containing F also contains X, so sup(X) ≥ sup(F) ≥ MIS(F) = MIS(a) = MIS(X). Hence X is also a frequent itemset, and its support count is retained. F is a frequent itemset, so its support count is retained as well, and the confidence of X -> Y can be calculated directly as sup(F) / sup(X), which contradicts the assumption. Therefore, when a is in the rule's antecedent (left-hand side), the head-item problem cannot occur.
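The argument can also be checked mechanically on a small case. The sketch below uses a hypothetical six-transaction database and invented MIS values, enumerates frequent itemsets by brute force (standing in for MS-Apriori, which is meant to find the same set of frequent itemsets), and asserts that for every frequent itemset F, each subset X that contains the head item is itself frequent, so sup(X) is available:

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical toy database and MIS values, for illustration only.
transactions = [
    {"bread", "clothes", "shoes"},
    {"bread", "milk"},
    {"clothes", "shoes"},
    {"bread", "clothes"},
    {"shoes", "milk"},
    {"bread", "clothes", "shoes", "milk"},
]
MIS = {"bread": Fraction(1, 2), "clothes": Fraction(1, 3),
       "shoes": Fraction(1, 6), "milk": Fraction(1, 4)}
ITEMS = sorted(MIS)

def sup(itemset):
    """Fraction of transactions that contain every item of the itemset."""
    return Fraction(sum(itemset <= t for t in transactions), len(transactions))

def frequent(itemset):
    """MS-Apriori's criterion: sup(I) >= MIS(I) = lowest MIS among I's items."""
    return sup(itemset) >= min(MIS[i] for i in itemset)

# Brute-force enumeration in place of MS-Apriori's level-wise search.
frequent_sets = {frozenset(c) for k in range(1, len(ITEMS) + 1)
                 for c in combinations(ITEMS, k) if frequent(set(c))}

def check_no_head_item_problem():
    """Every antecedent X ⊂ F with the head item a ∈ X must be frequent."""
    for F in frequent_sets:
        a = min(F, key=lambda i: MIS[i])  # head item of F
        for k in range(1, len(F)):
            for X in map(frozenset, combinations(sorted(F), k)):
                if a in X:
                    assert X in frequent_sets, (F, X)
    return True

print(check_no_head_item_problem())
```

Exact `Fraction` arithmetic is used so that the threshold comparisons are not affected by floating-point rounding.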