Apriori algorithm
- Advantages: easy to implement
- Disadvantages: can be slow on large data sets
- Applicable data types: numerical or nominal
Algorithm process:
Association analysis is the task of finding interesting relationships in a large data set. These relationships come in two forms: frequent itemsets and association rules.
Support: the support of an itemset is the proportion of records in the data set that contain that itemset.
Confidence: the confidence of the association rule A -> B is defined as support(A ∪ B) / support(A).
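As a quick check of these definitions, here is a minimal sketch (the helper names `support` and `confidence` are illustrative, not part of the code later in this note), computed on the same toy data set used below:

```python
# Toy data set: each inner list is one transaction.
dataSet = [[1, 3, 4], [2, 3, 5], [1, 2, 3, 5], [2, 5]]

def support(itemset, transactions):
    # Fraction of transactions that contain every item in `itemset`.
    hits = sum(1 for t in transactions if set(itemset) <= set(t))
    return hits / len(transactions)

def confidence(A, B, transactions):
    # confidence(A -> B) = support(A ∪ B) / support(A)
    return support(set(A) | set(B), transactions) / support(A, transactions)

print(support({2, 5}, dataSet))       # 3 of 4 transactions contain {2,5} -> 0.75
print(confidence({5}, {2}, dataSet))  # 0.75 / 0.75 -> 1.0
```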
A brute-force approach over n items would have to examine all 2^n - 1 possible itemsets, which quickly becomes infeasible.
Apriori principle: if an itemset is frequent, then all of its subsets are also frequent.
Taking the contrapositive: if an itemset is infrequent, then every itemset that contains it (every superset) is also infrequent. This is what lets Apriori prune candidates.
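The pruning this enables can be sketched in a few lines (the `can_skip` helper is illustrative, not from this note's code): once {4} is known to be infrequent, any candidate containing it can be skipped without counting its support at all.

```python
# Suppose {4} already failed the minimum-support threshold.
infrequent = [frozenset([4])]

def can_skip(candidate, infrequent_sets):
    # Contrapositive of the Apriori principle: a superset of an
    # infrequent itemset cannot itself be frequent.
    return any(bad <= candidate for bad in infrequent_sets)

print(can_skip(frozenset([1, 4]), infrequent))  # True: contains {4}, prune it
print(can_skip(frozenset([1, 3]), infrequent))  # False: must still be counted
```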
There are two main stages:
1. Generate the frequent itemsets:
This stage alternates between two kinds of sets, C and L: Ck holds the candidate k-itemsets (raw combinations), and Lk holds the candidates that survive the minimum-support filter. The process is roughly:
1. Build the set C1 of candidate single-item itemsets from the raw data set
2. Filter C1 by support to obtain L1
3. Merge itemsets in L1 to form the candidate set C2
4. Repeat: C2 -> L2, C3 -> L3, ... -> Ck, until no new candidates can be produced
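One round of this C -> L alternation can be traced on the toy data set with a `Counter` (a simplified sketch; the note's full implementation appears further down):

```python
from collections import Counter

dataSet = [[1, 3, 4], [2, 3, 5], [1, 2, 3, 5], [2, 5]]
minSupport = 0.5
n = len(dataSet)

# C1 -> L1: count every single item and keep those meeting the threshold.
counts = Counter(frozenset([item]) for t in dataSet for item in t)
L1 = [c for c in counts if counts[c] / n >= minSupport]  # {4} is dropped

# L1 -> C2: union pairs of frequent 1-itemsets into candidate 2-itemsets.
C2 = [a | b for i, a in enumerate(L1) for b in L1[i + 1:]]
print(L1)  # four frequent single items: {1}, {2}, {3}, {5}
print(C2)  # six candidate pairs, e.g. {2,5}
```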
2. Derive the association rules:
With the frequent itemsets from the previous step, list every rule each frequent itemset can generate, compute each rule's confidence, and keep only the rules whose confidence meets the threshold.
Functions:
loadDataSet()
Loads the data set: a list of transactions, where each transaction is a list of items.
createC1(dataSet)
Builds C1 by extracting every individual item. frozenset is used so the itemsets can later serve as dictionary keys.
scanD(D, Ck, minSupport)
Filters out the candidates in Ck that do not meet the minimum support, returning the surviving itemsets Lk together with their support values.
apprioriGen(Lk, k)
Merges the itemsets in Lk to obtain the candidates for the next level. The number of comparisons is reduced by merging two k-itemsets only when their first k-1 (sorted) elements match. For example, when merging {0,1}, {0,2}, {1,2}, only the pair {0,1} and {0,2} share a first element, so {0,1,2} is produced exactly once.
apriori(dataSet, minSupport=0.5)
Ties the functions above together. The loop ends when no new frequent itemsets can be produced.
generateRules(L, supportData, minConf=0.7)
Main entry point for generating association rules, starting from the frequent itemsets that contain at least two items.
calcConf(freqSet, H, supportData, brl, minConf=0.7)
For a given frequent itemset freqSet and a list H of candidate consequents, computes the confidence of each rule freqSet - conseq -> conseq and returns the consequents that satisfy the threshold.
rulesFromConseq(freqSet, H, supportData, brl, minConf=0.7)
The difference here is that the consequents in H can grow. For example, from {1,2,3} we first test rules with single-item consequents such as {1} and {2}; the surviving consequents are then merged (e.g. into {1,2}) to explore the rules more fully. This recursion ends when the consequents can no longer be merged.
# coding=utf-8

def loadDataSet():
    return [[1, 3, 4], [2, 3, 5], [1, 2, 3, 5], [2, 5]]

def createC1(dataSet):
    # Collect every distinct single item as a candidate 1-itemset.
    C1 = []
    for transaction in dataSet:
        for item in transaction:
            if [item] not in C1:
                C1.append([item])
    C1.sort()
    # frozenset so the itemsets can be used as dictionary keys later.
    return list(map(frozenset, C1))

def scanD(D, Ck, minSupport):
    # Count how many transactions contain each candidate in Ck.
    ssCnt = {}
    for tid in D:
        for can in Ck:
            if can.issubset(tid):
                ssCnt[can] = ssCnt.get(can, 0) + 1
    numItems = float(len(D))
    retList = []
    supportData = {}
    for key in ssCnt:
        support = ssCnt[key] / numItems
        if support >= minSupport:
            retList.append(key)
        supportData[key] = support
    return retList, supportData

def apprioriGen(Lk, k):
    # Merge (k-1)-itemsets into candidate k-itemsets: compare the first
    # k-2 sorted elements (all but the last), so each candidate is
    # produced exactly once.
    retList = []
    lenLk = len(Lk)
    for i in range(lenLk):
        for j in range(i + 1, lenLk):
            L1 = sorted(Lk[i])[:k - 2]
            L2 = sorted(Lk[j])[:k - 2]
            if L1 == L2:
                retList.append(Lk[i] | Lk[j])
    return retList

def apriori(dataSet, minSupport=0.5):
    C1 = createC1(dataSet)
    D = list(map(set, dataSet))
    L1, supportData = scanD(D, C1, minSupport)
    L = [L1]
    k = 2
    # Stop once a level produces no frequent itemsets.
    while len(L[k - 2]) > 0:
        Ck = apprioriGen(L[k - 2], k)
        Lk, supK = scanD(D, Ck, minSupport)
        supportData.update(supK)
        L.append(Lk)
        k += 1
    return L, supportData

def generateRules(L, supportData, minConf=0.7):
    bigRuleList = []
    for i in range(1, len(L)):  # start from itemsets with two items
        for freqSet in L[i]:
            H1 = [frozenset([item]) for item in freqSet]
            if i > 1:  # more than two items: merged consequents as well
                rulesFromConseq(freqSet, H1, supportData, bigRuleList, minConf)
            else:
                calcConf(freqSet, H1, supportData, bigRuleList, minConf)
    return bigRuleList

def calcConf(freqSet, H, supportData, brl, minConf=0.7):
    prunedH = []
    for conseq in H:
        conf = supportData[freqSet] / supportData[freqSet - conseq]
        if conf >= minConf:
            print(freqSet - conseq, '-->', conseq, 'conf:', conf)
            brl.append((freqSet - conseq, conseq, conf))
            prunedH.append(conseq)
    return prunedH

def rulesFromConseq(freqSet, H, supportData, brl, minConf=0.7):
    m = len(H[0])
    if len(freqSet) > m + 1:
        # Grow the consequents by one item and recurse on the survivors.
        Hmp1 = apprioriGen(H, m + 1)
        Hmp1 = calcConf(freqSet, Hmp1, supportData, brl, minConf)
        if len(Hmp1) > 1:
            rulesFromConseq(freqSet, Hmp1, supportData, brl, minConf)

def main():
    dataSet = loadDataSet()
    L, supportData = apriori(dataSet, minSupport=0.7)
    print(L)
    rules = generateRules(L, supportData, minConf=0.7)
    print(rules)

if __name__ == '__main__':
    main()