This learning note is taken from Zhang Mingwei 08.
Data item set I = {I1, I2,..., Im}
A is a subset of data item set I.
Transaction T: corresponds to a subset of the data items. If that subset contains A, transaction T is said to contain A.
Transaction set D
Support (sup) of A: the percentage of transactions in D that contain A.
support(A) = P(A) = |A(T)| / |D|, where A(T) is the set of transactions in D that contain A; |A(T)| is called the support count.
Frequent item set: if support(A) >= min_sup (the minimum support), then A is a frequent item set.
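The support definitions above can be sketched in a few lines of Python. This is only an illustration: the transaction set D is made-up toy data, and the function names support_count, support and is_frequent are hypothetical, not taken from the source.

```python
# Toy transaction set D (hypothetical data, for illustration only).
D = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
    {"bread", "milk", "beer"},
]

def support_count(A, D):
    """Number of transactions in D that contain every item of A, i.e. |A(T)|."""
    A = frozenset(A)
    return sum(1 for t in D if A <= t)

def support(A, D):
    """Fraction of transactions in D that contain A, i.e. |A(T)| / |D|."""
    return support_count(A, D) / len(D)

def is_frequent(A, D, min_sup):
    """A is a frequent item set when support(A) >= min_sup."""
    return support(A, D) >= min_sup

print(support_count({"bread", "milk"}, D))     # 3
print(support({"bread", "milk"}, D))           # 0.6
print(is_frequent({"bread", "milk"}, D, 0.5))  # True
```

Here support is returned as a fraction in [0, 1]; multiply by 100 if a percentage is preferred.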
Association rule A => B: A and B are item sets, and A ∩ B is empty.
The support of the item set A ∪ B is called the support of the association rule A => B, that is, support(A => B) = support(A ∪ B).
Confidence c of association rule A => B: c% of the transactions in D that contain A also contain B. confidence(A => B) = support(A ∪ B) / support(A)
Strong rules: rules that satisfy both min_sup (minimum support) and min_conf (minimum confidence).
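As a rough sketch of how a rule can be checked, the code below uses the usual definitions support(A => B) = support(A ∪ B) and confidence(A => B) = support(A ∪ B) / support(A). The toy data and the function names (rule_support, confidence, is_strong) are assumptions made for illustration.

```python
# Toy transaction set D (hypothetical data, for illustration only).
D = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
    {"bread", "milk", "beer"},
]

def support(A, D):
    """Fraction of transactions in D that contain item set A."""
    A = frozenset(A)
    return sum(1 for t in D if A <= t) / len(D)

def rule_support(A, B, D):
    """support(A => B) = support(A ∪ B)."""
    return support(set(A) | set(B), D)

def confidence(A, B, D):
    """confidence(A => B) = support(A ∪ B) / support(A); 0.0 if A never occurs."""
    sup_a = support(A, D)
    return rule_support(A, B, D) / sup_a if sup_a > 0 else 0.0

def is_strong(A, B, D, min_sup, min_conf):
    """A rule is strong when it meets both thresholds."""
    return rule_support(A, B, D) >= min_sup and confidence(A, B, D) >= min_conf

A, B = {"diapers"}, {"beer"}
print(rule_support(A, B, D))          # 0.4
print(round(confidence(A, B, D), 3))  # 0.667
print(is_strong(A, B, D, 0.3, 0.6))   # True
print(is_strong(A, B, D, 0.3, 0.7))   # False
```

Returning 0.0 when A never occurs is a design choice of this sketch; the definition itself leaves that case undefined.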
Association Mining: discovering strong association rules in large databases.
------------------------------------------------------------
Sequence S = <S1, S2, ..., Sn>: an ordered list of item sets. Each Sj is an item set, also called an element.
Element Sj = (x1, x2, ..., xk): composed of distinct items. The elements of a sequence are ordered, but the items within an element are unordered.
Length: the number of items in a sequence. A sequence of length l is called an l-sequence.
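Note that length counts items, not elements. A small sketch, assuming a sequence is modelled as a Python list of frozensets with single-character items (a representation chosen here for illustration, not one given in the source):

```python
# The sequence <(a)(bc)(ac)(d)>: four elements, each an unordered set of items.
s = [frozenset("a"), frozenset("bc"), frozenset("ac"), frozenset("d")]

def length(seq):
    """Length of a sequence = total number of items over all its elements."""
    return sum(len(element) for element in seq)

print(len(s))     # 4 (number of elements)
print(length(s))  # 6 (number of items, so s is a 6-sequence)
```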
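Subsequence and supersequence: α = <a1, a2, ..., an> is a subsequence of β = <b1, b2, ..., bm> (and β is a supersequence of α) if there exist integers 1 <= j1 < j2 < ... < jn <= m such that a1 ⊆ bj1, a2 ⊆ bj2, ..., an ⊆ bjn.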
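A minimal sketch of a subsequence test, assuming the standard containment definition: α is a subsequence of β when the elements of α can be matched, in order, to distinct elements of β so that each element of α is a subset of its match. The helper seq and the function is_subsequence are hypothetical names, and items are single characters only to keep the example short.

```python
def seq(*elements):
    """Build a sequence from strings, e.g. seq("a", "bc") is <(a)(bc)>."""
    return [frozenset(e) for e in elements]

def is_subsequence(alpha, beta):
    """True if alpha is a subsequence of beta (beta is a supersequence of alpha)."""
    j = 0
    for a in alpha:
        # Greedily advance to the earliest remaining element of beta that contains a.
        while j < len(beta) and not a <= beta[j]:
            j += 1
        if j == len(beta):
            return False
        j += 1  # each element of beta may be matched at most once
    return True

beta = seq("a", "abc", "ac", "d")
print(is_subsequence(seq("a", "bc"), beta))  # True
print(is_subsequence(seq("d", "a"), beta))   # False (order matters)
print(is_subsequence(seq("ab", "d"), beta))  # True
```

Matching each element as early as possible is sufficient here, because an earlier match never rules out a later one.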
Sequence database D
Support of sequence α in sequence database D: the percentage of sequences in D that contain α, that is, that are supersequences of α.
Frequent sequence (sequential pattern): a sequence S with support(S) >= min_sup.
Sequence mining: given a sequence database and a user-defined minimum support min_sup, find all frequent sequences in the sequence database.
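The sketch below computes the support of a sequence in a small database and filters a hand-picked candidate list by min_sup. It is not a mining algorithm: a real miner would also have to generate the candidates. The database, the candidates and all function names are assumptions made for illustration, and items are single characters for brevity.

```python
def seq(*elements):
    """Build a sequence from strings, e.g. seq("a", "bc") is <(a)(bc)>."""
    return [frozenset(e) for e in elements]

def contains(s, alpha):
    """True if sequence s is a supersequence of alpha."""
    it = iter(s)  # shared iterator: each element of s is consumed at most once
    return all(any(a <= e for e in it) for a in alpha)

def support(alpha, db):
    """Fraction of sequences in db that contain alpha."""
    return sum(1 for s in db if contains(s, alpha)) / len(db)

# Toy sequence database (hypothetical data).
db = [
    seq("a", "abc", "ac", "d", "cf"),
    seq("ad", "c", "bc", "ae"),
    seq("ef", "ab", "df", "c", "b"),
    seq("e", "g", "af", "c", "b", "c"),
]

# Hand-picked candidates; generating them is the hard part of real mining.
candidates = {
    "<(a)(c)>": seq("a", "c"),
    "<(ab)(c)>": seq("ab", "c"),
    "<(d)(a)>": seq("d", "a"),
}

min_sup = 0.5
frequent = [name for name, c in candidates.items() if support(c, db) >= min_sup]
print(frequent)  # ['<(a)(c)>', '<(ab)(c)>']
```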