Association Rule Mining and the Apriori Algorithm


Apriori algorithm
    • Advantages: easy to implement
    • Disadvantages: can be slow on large data sets
    • Applicable data types: numerical or nominal
Algorithm process:

Association analysis is the task of finding interesting relationships in a large data set. Two kinds of relationships are of interest: frequent itemsets and association rules.
Support: the support of an itemset is the proportion of records in the data set that contain that itemset.
Confidence: the confidence of the association rule A -> B is support(A, B) / support(A).
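To make the two definitions concrete, here is a minimal sketch on the same four-transaction data set used in the listing further below (the helper names `support` and `confidence` are my own, not part of the original code):

```python
# Minimal sketch of support and confidence (helper names are illustrative).
transactions = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]

def support(itemset, transactions):
    # Fraction of transactions that contain every item in `itemset`.
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(A, B, transactions):
    # Confidence of the rule A -> B: support(A | B) / support(A).
    return support(A | B, transactions) / support(A, transactions)

print(support({2, 5}, transactions))       # 0.75 (3 of 4 transactions)
print(confidence({5}, {2}, transactions))  # 1.0 (every transaction with 5 also has 2)
```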

For n items there are 2^n - 1 possible itemsets, so simple brute-force enumeration quickly becomes infeasible.
Apriori principle: if an itemset is frequent, then all of its subsets are also frequent.
Conversely, if an itemset is not frequent, then any itemset containing it is also not frequent.

There are two main stages:
1. Generating frequent itemsets:

This stage alternates between two kinds of sets, C and L: C holds the candidate itemsets (the raw combinations built from the previous level), and L holds the candidates that survive the minimum-support filter. The process is roughly:
1. Build the set C1 of single-item candidates from the original data set
2. Filter C1 by support to get L1
3. Merge itemsets in L1 to build the candidate set C2
4. Repeat: C2 -> L2, C3 -> L3, ..., Ck -> Lk, until no new itemsets are produced
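The first two rounds of this C -> L alternation can be sketched directly on the sample data (a standalone illustration, separate from the full implementation further below; the helper `sup` is my own name):

```python
from itertools import combinations

transactions = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]
minSupport = 0.5

def sup(c):
    # Support of candidate itemset c over the transactions.
    return sum(1 for t in transactions if c <= t) / len(transactions)

# Step 1: C1 contains every individual item as a candidate itemset.
items = sorted({i for t in transactions for i in t})
C1 = [frozenset([i]) for i in items]

# Step 2: L1 keeps the candidates that meet the minimum support.
L1 = [c for c in C1 if sup(c) >= minSupport]

# Step 3: C2 merges pairs of L1 itemsets; filtering again gives L2, and so on.
C2 = [a | b for a, b in combinations(L1, 2)]
L2 = [c for c in C2 if sup(c) >= minSupport]

print(L1)  # item 4 is dropped: its support is only 0.25
print(L2)
```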

2. Deriving the association rules:

Using the frequent itemsets from the previous step, we enumerate the candidate rules each frequent itemset can produce, compute the confidence of each, and keep the rules whose confidence meets the threshold.
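As a small illustration of this step, here are the candidate rules from one frequent itemset, filtered by confidence (the support values were computed by hand from the sample data set; this is a sketch, not the full rule generator):

```python
# Candidate rules from the frequent itemset {2, 5}, filtered by confidence.
# Support values below were computed by hand from the sample data set.
supportData = {frozenset({2}): 0.75,
               frozenset({5}): 0.75,
               frozenset({2, 5}): 0.75}
freqSet = frozenset({2, 5})
minConf = 0.7

rules = []
for conseq in (frozenset({2}), frozenset({5})):
    antecedent = freqSet - conseq
    conf = supportData[freqSet] / supportData[antecedent]
    if conf >= minConf:
        rules.append((antecedent, conseq, conf))

print(rules)  # both {2} -> {5} and {5} -> {2} have confidence 1.0
```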

Functions:

loadDataSet()
Loads the data set; it contains several lists, each of which is one transaction (itemset).
createC1(dataSet)
Builds C1 by extracting every individual item. frozenset is used so the itemsets can later serve as dictionary keys.
scanD(D, Ck, minSupport)
Filters out the candidates in Ck that do not meet the minimum support, returning the surviving Lk together with the support data.
aprioriGen(Lk, k)
Merges itemsets in Lk to produce Ck+1. Comparing only the first k-2 elements of each sorted itemset cuts down on duplicate work: for example, when merging {0,1}, {0,2}, {1,2} into {0,1,2}, only one merge is performed instead of three.
apriori(dataSet, minSupport=0.5)
Ties the functions above together. The loop ends when no new itemsets can be produced.
generateRules(L, supportData, minConf=0.7)
The main rule-generation function; it starts from frequent itemsets containing two or more items.
calcConf(freqSet, H, supportData, brl, minConf=0.7)
For a given frequent itemset freqSet and a list H of candidate consequents, computes the confidence of each candidate rule and keeps those that pass the threshold.
rulesFromConseq(freqSet, H, supportData, brl, minConf=0.7)
Here the consequents in H can grow more complex: from rules such as {1,2,3} -> {1} and {1,2,3} -> {2}, the consequents are merged to obtain {1,2,3} -> {1,2}, exploring the rule space more fully. The recursion ends when no further merges are possible.
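The k-2 prefix trick used in aprioriGen can be seen on the {0,1}, {0,2}, {1,2} example above (a standalone sketch):

```python
# Merging frequent 2-itemsets into 3-item candidates by comparing
# only the first k-2 (here: 1) elements of each sorted itemset.
L2 = [frozenset({0, 1}), frozenset({0, 2}), frozenset({1, 2})]
k = 3
candidates = []
for i in range(len(L2)):
    for j in range(i + 1, len(L2)):
        if sorted(L2[i])[:k - 2] == sorted(L2[j])[:k - 2]:
            candidates.append(L2[i] | L2[j])

print(candidates)  # [frozenset({0, 1, 2})] -- produced exactly once
```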

# coding=utf-8

def loadDataSet():
    return [[1, 3, 4], [2, 3, 5], [1, 2, 3, 5], [2, 5]]

def createC1(dataSet):
    C1 = []
    for transaction in dataSet:
        for item in transaction:
            if [item] not in C1:
                C1.append([item])
    C1.sort()
    return list(map(frozenset, C1))  # frozensets can be used as dict keys

def scanD(D, Ck, minSupport):
    ssCnt = {}
    for tid in D:
        for can in Ck:
            if can.issubset(tid):
                ssCnt[can] = ssCnt.get(can, 0) + 1
    numItems = float(len(D))
    retList = []
    supportData = {}
    for key in ssCnt:
        support = ssCnt[key] / numItems
        if support >= minSupport:
            retList.append(key)
        supportData[key] = support
    return retList, supportData

def aprioriGen(Lk, k):
    retList = []
    lenLk = len(Lk)
    for i in range(lenLk):
        for j in range(i + 1, lenLk):
            L1 = list(Lk[i])[:k - 2]  # compare only the first k-2 elements
            L2 = list(Lk[j])[:k - 2]
            L1.sort()
            L2.sort()
            if L1 == L2:
                retList.append(Lk[i] | Lk[j])
    return retList

def apriori(dataSet, minSupport=0.5):
    C1 = createC1(dataSet)
    D = list(map(set, dataSet))
    L1, supportData = scanD(D, C1, minSupport)
    L = [L1]
    k = 2
    while len(L[k - 2]) > 0:
        Ck = aprioriGen(L[k - 2], k)
        Lk, supK = scanD(D, Ck, minSupport)
        supportData.update(supK)
        L.append(Lk)
        k += 1
    return L, supportData

def generateRules(L, supportData, minConf=0.7):
    bigRuleList = []
    for i in range(1, len(L)):  # start from itemsets with two items
        for freqSet in L[i]:
            H1 = [frozenset([item]) for item in freqSet]
            if i > 1:  # frequent itemsets with more than two elements
                rulesFromConseq(freqSet, H1, supportData, bigRuleList, minConf)
            else:
                calcConf(freqSet, H1, supportData, bigRuleList, minConf)
    return bigRuleList

def calcConf(freqSet, H, supportData, brl, minConf=0.7):
    prunedH = []
    for conseq in H:
        conf = supportData[freqSet] / supportData[freqSet - conseq]
        if conf >= minConf:
            print(freqSet - conseq, '-->', conseq, 'conf:', conf)
            brl.append((freqSet - conseq, conseq, conf))
            prunedH.append(conseq)
    return prunedH

def rulesFromConseq(freqSet, H, supportData, brl, minConf=0.7):
    m = len(H[0])
    if len(freqSet) > m + 1:
        Hmp1 = aprioriGen(H, m + 1)  # merge consequents into larger ones
        Hmp1 = calcConf(freqSet, Hmp1, supportData, brl, minConf)
        if len(Hmp1) > 1:
            rulesFromConseq(freqSet, Hmp1, supportData, brl, minConf)

def main():
    dataSet = loadDataSet()
    L, supportData = apriori(dataSet, minSupport=0.5)
    print(L)
    rules = generateRules(L, supportData, minConf=0.7)
    print(rules)

if __name__ == '__main__':
    main()

