FP Association Rules Mining
Blog categories: Hadoop, MapReduce

Last year the company reorganized: 1 unit was split into 4, then into 3, then into 25, with some 72 personnel moves. Watching it happen I felt a wave of dread, but a year later it has affected neither my colleagues' work nor mine, and I have heard no bad news, so apparently it was handled at a high level. One side effect of splitting into 25 units is that front-end traffic inevitably gets divided up, which must be a headache; but that is off topic, so I will stop there. This year my technical direction leans more toward algorithms, which is a personal interest; the team is focused on CRM, though what everyone mentions now is CEM, as if it were embarrassing to even say CRM out loud. To improve the user experience we are building user behavior analysis: collect user behavior to serve members better, and one concrete goal is to infer, from a member's status and behavior, what they intend to buy. There are three mainstream association rule algorithms: Apriori, partition-based algorithms, and FP-Growth, each with its own focus. Encyclopedia entry: http://baike.baidu.com/view/1076817.htm
The basic idea of all of them is to find frequent itemsets. Apriori finds them iteratively, which is inefficient but conceptually very clear; partition-based algorithms optimize its performance. FP-Growth is Jiawei Han's algorithm and only needs to scan the database twice, a big performance improvement. Mahout ships a corresponding implementation with good MapReduce support, so I chose it.
I. Environment
FP-Growth wiki: https://cwiki.apache.org/confluence/display/MAHOUT/Parallel+Frequent+Pattern+Mining
Hadoop 0.20.2 + Mahout 0.5 + JDK 1.6. Different versions are incompatible; I stepped into this pit myself (see my earlier article for the pitfalls and installation). Many of the problems you hit during installation can be solved with a Baidu search, but many others need Google and the foreigners' forums. Every time Baidu failed me, Google did not: the foreigners' explanations are often quite brief, but they actually solve the problem. So I strongly recommend searching the English forums directly before anything else; it feels like the problems discussed domestically are always more scattered.
II. Before the algorithm, a few words on configuring Eclipse for debugging
1. Install the MapReduce plug-in for Eclipse; any copy you find online should do, and it should not matter which Hadoop version it was built against.
2. Next, configure the plug-in with your Hadoop information, since it has to talk to the Hadoop environment: it needs the NameNode listening port and the JobTracker listening port. If you have forgotten what you configured, check the config files. Different Hadoop versions keep these settings in different files; mine is Hadoop 0.20.2 (this is also a pit):
core-site.xml: fs.default.name
mapred-site.xml: mapred.job.tracker
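For reference, the relevant entries in a Hadoop 0.20.x setup look roughly like this; the host and port values here are placeholders, so use whatever your own cluster actually listens on:

```xml
<!-- core-site.xml -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>  <!-- NameNode address, placeholder -->
  </property>
</configuration>

<!-- mapred-site.xml -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>  <!-- JobTracker address, placeholder -->
  </property>
</configuration>
```

These are the same host:port pairs the Eclipse plug-in asks for.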
With that in place you can run and debug in Eclipse.
III. Algorithm logic
The program's main entry is FPGrowthDriver, which is really just a launcher: it parses the input parameters, such as the input and output paths, and based on the method parameter chooses between stand-alone and distributed computation. For the exact parameters see the Mahout FP-Growth wiki page (or just pass something wrong and read the usage message it prints). I specified mapreduce as the method; the code looks like this:
    if ("sequential".equalsIgnoreCase(classificationMethod)) {
      runFPGrowth(params);
    } else if ("mapreduce".equalsIgnoreCase(classificationMethod)) {
      Configuration conf = new Configuration();
      HadoopUtil.delete(conf, outputDir);
      PFPGrowth.runPFPGrowth(params);
    }
PFPGrowth.runPFPGrowth holds the main computational logic. It calls five methods in turn, so the computation splits into five steps; below I explain in detail what each step does. There is another blog post that covers the first few steps very thoroughly, with pictures, but it leaves out many details: http://www.cnblogs.com/zhangchaoyang/articles/2198946.html
1. Count
startParallelCounting is essentially a WordCount: it counts the occurrences of every item in the DB, hence the name Count, which makes it easy to remember. The output is the
fList: [(potato chips, 7), (bread, 7), (eggs, 7), (milk, 5), (beer, 4)]. This list is used again later, so these variable names are worth understanding and remembering.
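To make the step concrete, here is a hypothetical single-machine sketch of what startParallelCounting computes with MapReduce; the class and method names are my own, not Mahout's:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class FListSketch {
    // Count each item's occurrences across all transactions, then sort
    // descending by support: that ordered (item, count) list is the fList.
    public static List<Map.Entry<String, Integer>> buildFList(List<List<String>> transactions) {
        Map<String, Integer> counts = new HashMap<>();
        for (List<String> t : transactions) {
            for (String item : t) {
                counts.merge(item, 1, Integer::sum); // the "WordCount" part
            }
        }
        List<Map.Entry<String, Integer>> fList = new ArrayList<>(counts.entrySet());
        fList.sort((a, b) -> b.getValue() - a.getValue()); // most frequent first
        return fList;
    }
}
```

In the real job the counting is distributed: each mapper emits (item, 1) and the reducers sum, but the resulting fList is the same idea.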
2. Group
startGroupingItems divides the fList into numGroups groups, each holding maxPerGroup elements, where maxPerGroup = fList.size() / numGroups.
The grouped data is the gList: {potato chips = 0, milk = 3, eggs = 2, bread = 1, beer = 4}, where the number is the group id.
This step does not use MapReduce; it runs locally.
The gList is simply the mapping from each fList element (or the element's fList index) to its group.
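A minimal sketch of the grouping, again with hypothetical names: each fList position is assigned to a contiguous block of maxPerGroup items, and the block number becomes the group id. With the five items above and numGroups = 5 this reproduces the gList in the text:

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class GroupingSketch {
    // Cut the fList (ordered by frequency) into numGroups blocks of
    // maxPerGroup items and map every item to the id of its block.
    public static Map<String, Integer> buildGList(List<String> fListItems, int numGroups) {
        int maxPerGroup = fListItems.size() / numGroups; // integer division, as in the text
        Map<String, Integer> gList = new HashMap<>();
        for (int i = 0; i < fListItems.size(); i++) {
            // clamp so leftover items (when size % numGroups != 0) land in the last group
            gList.put(fListItems.get(i), Math.min(i / maxPerGroup, numGroups - 1));
        }
        return gList;
    }
}
```

Because the step is just this index arithmetic over the in-memory fList, there is no need for a MapReduce job.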
3. Code
startTransactionSorting encodes each transaction: items are replaced by numbers (their fList positions), duplicates are removed, and the result is sorted. Input and output look like the line below, except that [0, 1, 2] is not an array but a TransactionTree. At this point the positional logic is clear; step four starts building the tree structure.
milk, eggs, bread, potato chips ->> single-branch tree [0, 1, 2]
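The per-transaction encoding above can be sketched like this (my own helper, not Mahout's API; in Mahout the result is wrapped in a TransactionTree rather than returned as an array):

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

public class TransactionEncoder {
    // Replace each item by its fList index, drop duplicates, and sort
    // ascending, so the most frequent item (index 0) comes first.
    public static int[] encode(List<String> transaction, Map<String, Integer> fListIndex) {
        SortedSet<Integer> ids = new TreeSet<>(); // dedupes and sorts in one go
        for (String item : transaction) {
            Integer id = fListIndex.get(item);
            if (id != null) {   // items missing from the fList are infrequent: skip them
                ids.add(id);
            }
        }
        int[] out = new int[ids.size()];
        int i = 0;
        for (int id : ids) out[i++] = id;
        return out;
    }
}
```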
4. Tree
startParallelFPGrowth
Map: reads the trees output by step three and splits each into multiple trees, e.g. [0, 1, 2] -> [0, 1, 2], [0, 1], [0]. It outputs k = groupId (the group id that the item's fList index maps to) and v = the corresponding tree.
Reducer: builds one tree per group id out of all the step-three-style output sharing that group id. It then iterates over the header-table entries and recursively collects all parent nodes from the tree, so that each single element in the header table corresponds to multiple paths; those paths are then fed back in as step-three-style input, iterating.
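The map-side splitting can be sketched as follows (hypothetical names; in Mahout the values are TransactionTrees, here they are plain arrays). Walking the sorted transaction from the tail, we emit for each group id encountered the prefix ending at that position, at most once per group, so with one item per group [0, 1, 2] yields exactly the three trees shown above:

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class GroupPrefixSplit {
    // For each group id appearing in the transaction, emit
    // (groupId, prefix of the transaction up to that group's last item).
    public static Map<Integer, int[]> split(int[] transaction, Map<Integer, Integer> gList) {
        Map<Integer, int[]> emitted = new HashMap<>();
        for (int j = transaction.length - 1; j >= 0; j--) {
            int groupId = gList.get(transaction[j]);
            if (!emitted.containsKey(groupId)) { // emit each group at most once
                emitted.put(groupId, Arrays.copyOfRange(transaction, 0, j + 1));
            }
        }
        return emitted;
    }
}
```

This is what lets each reducer mine its group independently: every group receives every prefix that could contribute to its items' patterns.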
5. Mining
startAggregating takes the output of step four and emits the top-K frequent patterns.
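The aggregation idea, keeping only the K best-supported patterns, can be sketched with a size-K min-heap; this is my own illustration of the top-K selection, not Mahout's actual internals:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

public class TopKPatterns {
    // Keep the k patterns with the highest support: push every (pattern,
    // support) pair into a min-heap and evict the smallest past size k.
    public static List<Map.Entry<List<Integer>, Long>> topK(
            Map<List<Integer>, Long> patterns, int k) {
        PriorityQueue<Map.Entry<List<Integer>, Long>> heap =
            new PriorityQueue<>((a, b) -> Long.compare(a.getValue(), b.getValue()));
        for (Map.Entry<List<Integer>, Long> e : patterns.entrySet()) {
            heap.offer(e);
            if (heap.size() > k) heap.poll(); // drop the least supported
        }
        List<Map.Entry<List<Integer>, Long>> out = new ArrayList<>(heap);
        out.sort((a, b) -> Long.compare(b.getValue(), a.getValue())); // best first
        return out;
    }
}
```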
PS: Time is tight; today I have to set up the environment and do ETL, so I will write just this much and update later. I noticed the blog post I referenced does not cover the details I wanted; they have a diagram, and I will add one too later, plus pull out some of my code notes, which should make things much clearer for readers. Execution, execution! The diagram will come. That's it for now, heading home... 2013-03-30