Principle and implementation of the Apriori algorithm

Source: Internet
Author: User

*********************************************** Declaration ***********************************************

This is an original work from the "Xiaofeng Moon XJ" blog. Reprints are welcome, but please be sure to credit the source (HTTP://BLOG.CSDN.NET/XIAOFENGCANYUEXJ).

For various reasons there may be many shortcomings; corrections and comments are welcome!

************************************************************************************************************

There is a story: wives in the United States often asked their husbands to buy diapers for the children on the way home from work, and after picking up the diapers the husbands would buy beer for themselves, so beer and diapers were frequently bought together. Acting on this pattern reportedly increased sales of both diapers and beer, to the delight of many retailers. "Diapers and beer" is a very well-known story about association rules. Association rule mining looks for relationships between items in a data set and is also known as market basket analysis.

One concept has to be mentioned before discussing association rules. Frequent itemset: an itemset whose support is greater than or equal to the minimum support threshold. There are two more important measures:

1) Support
Support is the ratio of the number of transactions in the transaction set that contain both X and Y to the total number of transactions |D|:
support(X ⇒ Y) = count(X ∪ Y) / |D|
Here count(X ∪ Y) is the number of transactions that contain every item of X and of Y. Support reflects the probability that X and Y appear together. The support of an association rule equals the support of the corresponding frequent itemset.
2) Confidence
Confidence is the ratio of the number of transactions that contain both X and Y to the number of transactions that contain X. That is:
confidence(X ⇒ Y) = support(X ∪ Y) / support(X)
Confidence reflects the probability that a transaction contains Y given that it contains X. In general, only association rules with high support and high confidence are of interest to the user.
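The two measures can be sketched in a few lines of C++. This is only an illustration under stated assumptions: the `Itemset` alias and the functions `countContaining`, `support`, and `confidence` are hypothetical names introduced here, not part of the implementation shown later in this post.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical representation: one transaction (or itemset) is a set of item names.
using Itemset = std::set<std::string>;

// Number of transactions in db that contain every item of `items`.
int countContaining(const std::vector<Itemset>& db, const Itemset& items) {
    int n = 0;
    for (const Itemset& t : db) {
        // std::includes needs sorted ranges; std::set iterates in sorted order.
        if (std::includes(t.begin(), t.end(), items.begin(), items.end())) ++n;
    }
    return n;
}

// support(X => Y): transactions containing X and Y together, divided by |D|.
double support(const std::vector<Itemset>& db, const Itemset& x, const Itemset& y) {
    assert(!db.empty());
    Itemset both = x;
    both.insert(y.begin(), y.end());
    return static_cast<double>(countContaining(db, both)) / db.size();
}

// confidence(X => Y): transactions containing X and Y together,
// divided by transactions containing X.
double confidence(const std::vector<Itemset>& db, const Itemset& x, const Itemset& y) {
    int countX = countContaining(db, x);
    assert(countX > 0);  // assumption: X occurs at least once
    Itemset both = x;
    both.insert(y.begin(), y.end());
    return static_cast<double>(countContaining(db, both)) / countX;
}
```

For a made-up data set of four transactions, {beer, diapers}, {beer, diapers, milk}, {diapers, milk} and {beer}, the rule diapers ⇒ beer has support 2/4 = 0.5 and confidence 2/3, because two of the three transactions that contain diapers also contain beer.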

Association rule mining uses the Apriori algorithm to find frequent itemsets. Apriori is the most classic and basic algorithm for mining the frequent itemsets of Boolean association rules. The algorithm repeatedly generates candidate sets and then prunes them, removing every candidate that contains an infrequent subset. Brute-force enumeration of all subsets of n items costs O(2^n); pruning cuts this search space substantially, and how much it saves depends on the data, the minimum support, and the underlying implementation. The name Apriori comes from the fact that the algorithm uses prior knowledge about the properties of frequent itemsets. Apriori uses an iterative approach called level-wise search, in which frequent k-itemsets are used to explore (k+1)-itemsets. First, the set of frequent 1-itemsets is found; this set is denoted L1. L1 is used to find the set of frequent 2-itemsets L2, L2 is used to find L3, and so on, until no more frequent k-itemsets can be found. The Apriori algorithm has two main steps:

1. Join

Using the Lk already found, Ck+1 is produced by pairwise joins. Note that two itemsets Lk[i] and Lk[j] can be joined only if they share k-1 items; the two remaining, different items come one from Lk[i] and one from Lk[j]. The resulting Ck+1 is the candidate set for Lk+1.
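A minimal sketch of the join, assuming the common prefix-based variant: every itemset is kept as a sorted vector, Lk contains no duplicates, and two itemsets are joined when their first k-1 items match. The names `Itemset` and `joinStep` are hypothetical and are not taken from the code later in this post.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical representation: an itemset is a sorted vector of item names.
using Itemset = std::vector<std::string>;

// Join step: build the (k+1)-item candidates from the frequent k-itemsets lk.
std::vector<Itemset> joinStep(std::vector<Itemset> lk) {
    // Lexicographic order places itemsets with the same (k-1)-prefix next to each other.
    std::sort(lk.begin(), lk.end());
    std::vector<Itemset> candidates;
    for (std::size_t i = 0; i < lk.size(); ++i) {
        for (std::size_t j = i + 1; j < lk.size(); ++j) {
            const Itemset& a = lk[i];
            const Itemset& b = lk[j];
            assert(!a.empty() && a.size() == b.size());  // all itemsets have k items
            // Once the shared prefix stops matching, no later itemset can match either.
            if (!std::equal(a.begin(), a.end() - 1, b.begin())) break;
            Itemset c = a;          // the k items of a ...
            c.push_back(b.back());  // ... plus the one differing item of b
            candidates.push_back(c);
        }
    }
    return candidates;
}
```

With L2 = {A,B}, {A,C}, {B,C}, {B,D}, this produces the candidates {A,B,C} and {B,C,D}. Comparing only the prefix means each candidate is generated exactly once, so no duplicate check is needed.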

2. Pruning

Not every itemset in the candidate set Ck+1 is frequent, so the candidates must be pruned as early as possible to keep the amount of useless data from growing with every pass. Pruning rests on one property: a candidate can be frequent only if all of its subsets are frequent.
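The pruning test can be sketched as follows, assuming each itemset is a sorted vector and Lk is held in a `std::set` for lookup. `Itemset`, `containsInfrequentSubset`, and `pruneStep` are hypothetical names chosen for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical representation: an itemset is a sorted vector of item names.
using Itemset = std::vector<std::string>;

// Returns true if some k-item subset of the (k+1)-item candidate is missing from lk.
bool containsInfrequentSubset(const Itemset& candidate, const std::set<Itemset>& lk) {
    assert(!candidate.empty());
    for (std::size_t skip = 0; skip < candidate.size(); ++skip) {
        Itemset subset;  // the candidate with the item at position `skip` left out
        for (std::size_t i = 0; i < candidate.size(); ++i) {
            if (i != skip) subset.push_back(candidate[i]);
        }
        if (lk.count(subset) == 0) return true;
    }
    return false;
}

// Prune step: keep only the candidates whose k-item subsets are all frequent.
std::vector<Itemset> pruneStep(const std::vector<Itemset>& candidates,
                               const std::set<Itemset>& lk) {
    std::vector<Itemset> kept;
    for (const Itemset& c : candidates) {
        if (!containsInfrequentSubset(c, lk)) kept.push_back(c);
    }
    return kept;
}
```

With L2 = {A,B}, {A,C}, {B,C}, {B,D}, the candidate {A,B,C} survives, while {B,C,D} is dropped because its subset {C,D} is not in L2. The surviving candidates still have to be counted against the data to decide whether they are actually frequent.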


Apriori.h

```cpp
/**
 * Created by Xujin on 2014/12/1.
 * All rights reserved,but.
 */
#ifndef APRIORI_H
#define APRIORI_H

#include <string>
#include <vector>

#include "Transaction.h"
#include "TransactionSet.h"

class CApriori {
private:
    double m_dMinConfidence;   // minimum confidence as a ratio
    double m_dMinSupport;      // minimum support as a ratio
    int m_nSize;               // number of transactions
    int m_nMinConfidence;      // minimum confidence as a transaction count
    int m_nMinSupport;         // minimum support as a transaction count
    int m_nK;                  // largest itemset size to search for
    CTransactionSet *m_pCTransactionSet;    // the data being mined
    CTransactionSet *m_pCDateCandidateSet;  // current candidate / frequent itemsets

private:
    void EraseCandidateSet();
    bool HasInfrequentSubset(std::vector<std::string> &tVecCandidateSet);
    void AprioriGen();
    void FindFrequent1Itemset();
    bool IsFrequentSet(std::vector<std::string> &tVecCandidateSet);
    bool IsExist(std::vector<std::string> &tVecCandidateSet);

public:
    CApriori(double tMinCon, double tMinSup, int tK, CTransactionSet *tPDaSet);
    void FindFrequentKItemset();
    void print();
};

#endif
```


Apriori.cpp

```cpp
/**
 * Created by Xujin on 2014/12/1.
 * All rights reserved,but.
 */
#include "Apriori.h"

#include <iostream>

CApriori::CApriori(double tMinCon, double tMinSup, int tK, CTransactionSet *tPDaSet) {
    this->m_dMinConfidence = tMinCon;
    this->m_dMinSupport = tMinSup;
    this->m_nK = tK;
    this->m_pCTransactionSet = tPDaSet;
    this->m_nSize = tPDaSet->GetSize();
    // Convert the ratio thresholds into absolute transaction counts.
    this->m_nMinConfidence = static_cast<int>(this->m_dMinConfidence * this->m_nSize);
    this->m_nMinSupport = static_cast<int>(this->m_dMinSupport * this->m_nSize);
    this->m_pCDateCandidateSet = new CTransactionSet();
}

// Pruning test: true if dropping any single item yields a subset that is not frequent.
bool CApriori::HasInfrequentSubset(std::vector<std::string> &tVecCandidateSet) {
    bool bRet = false;
    if (this->m_pCDateCandidateSet != nullptr) {
        for (std::vector<std::string>::iterator out = tVecCandidateSet.begin();
             out != tVecCandidateSet.end(); ++out) {
            // Build the subset that leaves out *out.
            std::vector<std::string> tmp;
            for (std::vector<std::string>::iterator in = tVecCandidateSet.begin();
                 in != tVecCandidateSet.end(); ++in) {
                if (*out != *in) {
                    tmp.push_back(*in);
                }
            }
            if (!this->m_pCDateCandidateSet->IsContain(tmp)) {
                bRet = true;
                break;
            }
        }
    }
    return bRet;
}

// Join step followed by pruning: builds the next candidate set from the current one.
void CApriori::AprioriGen() {
    std::vector<std::string> can;
    CTransactionSet *candidateSet = new CTransactionSet();
    for (size_t i = 0; i < this->m_pCDateCandidateSet->GetSize(); ++i) {
        for (size_t j = 0; j < this->m_pCDateCandidateSet->GetSize(); ++j) {
            if (i != j) {
                can.clear();
                if (this->m_pCDateCandidateSet->CombineFrequentSet(i, j, can)) {
                    if (!this->HasInfrequentSubset(can) && !candidateSet->IsExist(can))
                        candidateSet->AddTransaction(CTransaction(can));
                }
            }
        }
    }
    EraseCandidateSet();
    this->m_pCDateCandidateSet = candidateSet;
}

// Releases the current candidate set.
void CApriori::EraseCandidateSet() {
    for (std::vector<CTransaction>::iterator iter =
             this->m_pCDateCandidateSet->GetVecTransaction().begin();
         iter != this->m_pCDateCandidateSet->GetVecTransaction().end(); ++iter) {
        iter->GetVecItem().clear();
    }
    this->m_pCDateCandidateSet->GetVecTransaction().clear();
    delete this->m_pCDateCandidateSet;
    this->m_pCDateCandidateSet = nullptr;
}

bool CApriori::IsFrequentSet(std::vector<std::string> &tVecCandidateSet) {
    int sup = this->m_pCTransactionSet->GetSupportCount(tVecCandidateSet);
    if (sup < m_nMinSupport) return false;
    return true;
}

// Collects every single item whose support reaches the minimum (L1).
void CApriori::FindFrequent1Itemset() {
    for (std::vector<CTransaction>::iterator iter =
             this->m_pCTransactionSet->GetVecTransaction().begin();
         iter != this->m_pCTransactionSet->GetVecTransaction().end(); ++iter) {
        for (std::vector<std::string>::iterator strIter = iter->GetVecItem().begin();
             strIter != iter->GetVecItem().end(); ++strIter) {
            std::vector<std::string> vec1Itemset;
            vec1Itemset.push_back(*strIter);
            if (this->IsFrequentSet(vec1Itemset)) {
                if (!this->IsExist(vec1Itemset))
                    m_pCDateCandidateSet->AddTransaction(CTransaction(vec1Itemset));
            }
        }
    }
}

// Level-wise search: L1, then candidates for sizes 2..m_nK, counting support at each level.
void CApriori::FindFrequentKItemset() {
    this->FindFrequent1Itemset();
    for (int i = 2; i <= this->m_nK; ++i) {
        this->AprioriGen();
        if (this->m_pCDateCandidateSet->GetVecTransaction().size() == 0) return;
        for (std::vector<CTransaction>::iterator iter =
                 this->m_pCDateCandidateSet->GetVecTransaction().begin();
             iter != this->m_pCDateCandidateSet->GetVecTransaction().end();) {
            int sup = this->m_pCTransactionSet->GetSupportCount(iter->GetVecItem());
            if (sup < this->m_nMinSupport) {
                // erase() returns the iterator to the next element.
                iter = this->m_pCDateCandidateSet->GetVecTransaction().erase(iter);
            } else {
                ++iter;
            }
        }
        std::cout << "&&&&&&" << std::endl;  // separator printed after each level
    }
}

void CApriori::print() {
    m_pCDateCandidateSet->print();
}

bool CApriori::IsExist(std::vector<std::string> &tVecCandidateSet) {
    if (this->m_pCDateCandidateSet->IsExist(tVecCandidateSet)) return true;
    return false;
}
```



The main bottleneck of the Apriori algorithm is that it constantly generates and searches candidate itemsets. Can we find an algorithm that does not need to generate candidate itemsets so often? In addition, when the data to be mined is large and has to be stored in a database, the Apriori algorithm has an unavoidable problem: it must scan the database again on every pass, which involves a large number of I/O operations and is very time-consuming (unless, of course, a database is not used).
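One way to keep the number of scans down to one per level, rather than one per candidate, is to count all candidates of a level during a single pass over the transactions. The sketch below illustrates that idea only; it assumes sorted itemsets held in memory, and `Itemset` and `countSupports` are hypothetical names, not part of the implementation above.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical representation: an itemset is a sorted vector of item names.
using Itemset = std::vector<std::string>;

// Counts the support of every candidate in one pass over the transactions.
std::map<Itemset, int> countSupports(const std::vector<Itemset>& transactions,
                                     const std::vector<Itemset>& candidates) {
    std::map<Itemset, int> counts;
    for (const Itemset& c : candidates) counts[c] = 0;
    for (const Itemset& t : transactions) {  // the only scan of the data
        assert(std::is_sorted(t.begin(), t.end()));  // std::includes needs sorted ranges
        for (auto& entry : counts) {
            const Itemset& c = entry.first;
            if (std::includes(t.begin(), t.end(), c.begin(), c.end())) ++entry.second;
        }
    }
    return counts;
}
```

Each transaction is read once and checked against every candidate of the current level, so the I/O cost per level is a single scan no matter how many candidates there are. The work per transaction still grows with the number of candidates.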




Because time was limited, I drew on some of the literature while writing this post; thank you to those authors. Given the limits of my own knowledge, shortcomings are inevitable, and corrections are welcome!


