Association rule mining is one of many "rule-based" data mining methods. The basic theory of association rules is not described in detail here (the reader is assumed to know it); what follows describes the algorithm design.
The main idea of the Apriori algorithm: 1. Candidate item sets are constructed based on the principle that "every subset of a frequent item set must be frequent, and every superset of a non-frequent item set must be non-frequent"; the support of each candidate item set is then calculated by traversing the transaction database, which yields the frequent item sets. 2. Association rules are generated from the frequent item sets.

In my opinion, association rules are relatively simple in theory, and I believe many people feel the same way, but designing and implementing the algorithm is quite difficult. The key to the problem is designing a suitable general data structure for item sets (1-item sets, 2-item sets, ..., n-item sets) and a reasonable data structure for rules. Without these two data structures, association rule mining stays an exercise on paper, or remains as hazy as flowers seen through fog.

The data structure of a rule is easy to design. A rule contains only a few members (condition, conclusion, support, confidence, and lift), so a rule can be represented by a struct:

typedef struct
{
    char condition[80];
    char conclusion[80];
    double sup;   // support
    double conf;  // confidence
    double lift;  // lift
} Rule;

All rules can then be kept in a list of this struct (list<Rule> lst_rule) or in a variable-length array (vector<Rule> vt_rule).

/* Any questions, send me an email: datamining@163.com. My QQ: 275869936. From http://blog.sina.com.cn/dataminer321 */

The data structure of the item set is more complicated. First, let us describe the structure and usage of an item set, to get an overall understanding of it. An item set and the set of items in a transaction are similar: both are collections of several items {A, B, C, D, E, G ...}, and the number of elements (items) ranges from 1 to n (n being the size of the item set). Item sets are used in two ways: 1. Two frequent n-item sets are joined into a candidate (n+1)-item set. 2. When calculating the support of candidate item sets, we determine whether a candidate item set is contained in the current transaction (auxiliary function 2). This requires that the data structure of the item set can hold n items as well as n+1 items; that is, it must adapt to changes in the number of items.

Here, the data structure of an item set contains two parts: 1. a string holding the collection of items (which is also suitable for transactions); 2. a support count. A map<string, int> can be used for storage. The first parameter is a string, that is, the STL string. The item set (or transaction) is saved as a string in which items are separated by the symbol "|". For example, the item set composed of item A, item B, and item C is stored as the string m_itemset = "A|B|C". When it is used, auxiliary function 1 splits the m_itemset string on "|" and stores each item in vector<string> vt_itemset, so that each element of the vector is one item. The second parameter is the support count.

Next we discuss the second step, that is, generating rules from the frequent item sets. The program implementation of this step is rarely mentioned in papers, which mostly discuss the first step, generating the frequent item sets, probably because the first step is crucial to the performance of the algorithm. One n-item set can generate 2^n - 2 rules, where the condition and the conclusion of each rule together make up all the items of the item set, that is, n items [method 1]. There is no need to consider anything more here, for example that each n-item set contains n (n-1)-item subsets and that each (n-1)-item set can in turn generate 2^(n-1) - 2 rules. The reason is that our frequent item sets include the frequent 1-item sets, the frequent 2-item sets, and so on up to the frequent n-item sets (instead of storing only the maximum-length frequent item sets), so as long as we follow [method 1] to generate the corresponding rules for each frequent item set, all the rules are obtained.
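The rule-generation step described above can be sketched as follows. This is only an illustration under stated assumptions, not the author's implementation: the names RuleSketch, joinItems, and generateRules are hypothetical, std::string is used in place of the fixed char[80] buffers, and the support map is assumed to already contain a count for every subset of the item set, keyed with items in the same order as in the input vector.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical rule record; std::string replaces the fixed-size char buffers.
struct RuleSketch {
    std::string condition;
    std::string conclusion;
    double sup;
    double conf;
    double lift;
};

// Join items with '|', matching the "A|B|C" item-set string format.
std::string joinItems(const std::vector<std::string>& items) {
    std::string out;
    for (std::size_t i = 0; i < items.size(); ++i) {
        if (i > 0) out += "|";
        out += items[i];
    }
    return out;
}

// Enumerate the 2^n - 2 rules of one frequent n-item set.
// Assumption: supportCount holds the count of every subset of `items`
// (true for Apriori output if keys use a consistent item order).
std::vector<RuleSketch> generateRules(const std::vector<std::string>& items,
                                      const std::map<std::string, int>& supportCount,
                                      int nTransactions) {
    std::vector<RuleSketch> rules;
    const std::size_t n = items.size();
    if (n < 2 || n >= 31 || nTransactions <= 0) return rules;

    const double wholeCount = supportCount.at(joinItems(items));
    // Each bit mask picks the condition side; 0 and all-ones are skipped
    // because they would leave one side of the rule empty.
    for (unsigned mask = 1; mask + 1 < (1u << n); ++mask) {
        std::vector<std::string> lhs, rhs;
        for (std::size_t i = 0; i < n; ++i) {
            if ((mask >> i) & 1u) lhs.push_back(items[i]);
            else rhs.push_back(items[i]);
        }
        RuleSketch r;
        r.condition = joinItems(lhs);
        r.conclusion = joinItems(rhs);
        const double lhsCount = supportCount.at(r.condition);
        const double rhsCount = supportCount.at(r.conclusion);
        r.sup = wholeCount / nTransactions;
        r.conf = wholeCount / lhsCount;
        r.lift = r.conf / (rhsCount / nTransactions);
        rules.push_back(r);
    }
    return rules;
}
```

Calling generateRules once for every frequent item set follows [method 1]; 1-item sets simply yield no rules, since 2^1 - 2 = 0. In practice the result would also be filtered by a minimum confidence, which this sketch leaves out.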
Auxiliary function 1: split an item-set string into its items.
/* Function call example:
   string m_strsource = "A|B|C";  // input parameter
   string substr = "|";           // input parameter
   vector<string> vitem;
   getitemsfromstring(m_strsource, vitem, substr);
   // output parameter: vitem[0] = "A"; vitem[1] = "B"; vitem[2] = "C";
*/
#include <string>
#include <vector>
using namespace std;

void getitemsfromstring(const string &m_strsource, vector<string> &vitem, const string &substr)
{
    vitem.clear();
    if (substr.empty())
    {
        // No separator given: treat m_strsource as a single item
        vitem.push_back(m_strsource);
        return;
    }
    string::size_type start = 0;
    string::size_type pos = m_strsource.find(substr, start);
    while (pos != string::npos)
    {
        // The text between the previous separator and this one is an item
        vitem.push_back(m_strsource.substr(start, pos - start));
        start = pos + substr.size();
        pos = m_strsource.find(substr, start);
    } // end of while
    // The remainder (the whole string if m_strsource is a single item) is the last item
    vitem.push_back(m_strsource.substr(start));
}
Auxiliary function 2: judge whether the candidate item set v1 is contained in transaction v2.
bool isin(const vector<string> &v1, const vector<string> &v2)
{
    for (size_t i = 0; i < v1.size(); i++) // for1: every item of the candidate
    {
        bool m_bflag1 = false;
        for (size_t j = 0; j < v2.size(); j++) // for2: search the transaction
        {
            if (v1[i] == v2[j])
            {
                m_bflag1 = true;
                break;
            }
        } // end of for2
        if (!m_bflag1)
            return false; // one item is missing, so v1 is not in v2
    } // end of for1
    return true;
}
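To show how the two auxiliary functions might work together with the map<string, int> storage, here is a minimal sketch of support counting. It is an assumption-laden illustration rather than the original program: splitItems, containsAll, and countSupport are hypothetical names, the split uses std::getline with a single-character separator instead of auxiliary function 1, and the containment test uses std::find instead of the explicit double loop.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: split "A|B|C" on a single-character separator.
std::vector<std::string> splitItems(const std::string& s, char sep = '|') {
    std::vector<std::string> items;
    std::istringstream in(s);
    std::string item;
    while (std::getline(in, item, sep)) items.push_back(item);
    return items;
}

// True when every item of `candidate` occurs in `transaction`.
bool containsAll(const std::vector<std::string>& candidate,
                 const std::vector<std::string>& transaction) {
    for (std::size_t i = 0; i < candidate.size(); ++i) {
        if (std::find(transaction.begin(), transaction.end(), candidate[i]) ==
            transaction.end())
            return false;
    }
    return true;
}

// One pass over the transaction database: for each candidate item-set
// string, count the transactions that contain all of its items.
std::map<std::string, int> countSupport(const std::vector<std::string>& candidates,
                                        const std::vector<std::string>& transactions) {
    std::map<std::string, int> counts;
    std::vector<std::vector<std::string> > candidateItems;
    for (std::size_t c = 0; c < candidates.size(); ++c) {
        counts[candidates[c]] = 0;
        candidateItems.push_back(splitItems(candidates[c]));  // split once, reuse
    }
    for (std::size_t t = 0; t < transactions.size(); ++t) {
        const std::vector<std::string> items = splitItems(transactions[t]);
        for (std::size_t c = 0; c < candidates.size(); ++c) {
            if (containsAll(candidateItems[c], items)) ++counts[candidates[c]];
        }
    }
    return counts;
}
```

Candidates whose count reaches the minimum support would then be kept as frequent item sets. Splitting each candidate once up front avoids re-parsing its string for every transaction.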
With these two data structures and auxiliary functions, we believe that you can design your association rule mining algorithm.
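As one example of a remaining piece, the first use of item sets mentioned earlier (joining two frequent n-item sets into a candidate (n+1)-item set) could look roughly like the sketch below. The function name joinItemSets is hypothetical, and the sketch assumes the items inside every item set are kept in sorted order, which is the usual convention for the Apriori join but is not stated in the text above.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: join two frequent n-item sets into one candidate
// (n+1)-item set. Assumes both inputs are sorted and have the same size.
// Returns true and fills `out` only when the first n-1 items agree and
// the last item of `a` sorts before the last item of `b`, so each
// candidate is produced exactly once.
bool joinItemSets(const std::vector<std::string>& a,
                  const std::vector<std::string>& b,
                  std::vector<std::string>& out) {
    out.clear();
    if (a.empty() || a.size() != b.size()) return false;
    const std::size_t n = a.size();
    for (std::size_t i = 0; i + 1 < n; ++i) {
        if (a[i] != b[i]) return false;  // shared prefix differs
    }
    if (!(a[n - 1] < b[n - 1])) return false;  // wrong order or identical sets
    out = a;
    out.push_back(b[n - 1]);
    return true;
}
```

The resulting vector can be joined with "|" to form the string key stored in the map<string, int>. A full implementation would normally also prune candidates that have a non-frequent n-item subset before counting their support.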