The Parallel Frequent Pattern Mining Algorithm FP-Growth and Its Command-Line Usage in Mahout

Last Update: 2017-02-27 Source: Internet

Author: User



Today I looked into the parallel frequent pattern mining algorithm PFP-growth (parallel FP-growth) and how to run it from the command line in Mahout. The test results are recorded here for later reference:

Environment: JDK 1.7 + Hadoop 2.2.0 single-node pseudo-distributed cluster + Mahout 0.6 (neither Mahout 0.8 nor 0.9 includes this algorithm, and Mahout 0.6 has a few minor problems running on Hadoop 2.2.0, orz)

Part of the input data is shown below. Each line represents one shopping basket, i.e. a comma-separated list of item IDs:

4750,19394,25651,6395,5592
26180,10895,24571,23295,20578,27791,2729,8637
7380,18805,25086,19048,3190,21995,10908,12576
3458,12426,20578
1880,10702,1731,5185,18575,28967
21815,10872,18730
20626,17921,28930,14580,2891,11080
18075,6548,28759,17133
7868,15200,13494
7868,28617,18097,22999,16323,8637,7045,25733
12189,8816,22950,18465,13258,27791,20979
26728
17512,14821,18741
26619,14470,21899,6731
5184
28653,28662,18353,27437,5661,12078,11849,15784,7248,7061,18612,24277,4807,15584,9671,18741,3647,1000

......
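To make the input format concrete, here is a minimal Python sketch of how each line can be read as one transaction. It assumes items are separated by commas (optionally padded with spaces or tabs), as in the sample above; the function name `parse_baskets` is hypothetical and this is not Mahout's own parser. Items are kept in a set so that an item repeated within one basket is counted once, the usual convention when computing support.

```python
import re

# Assumption: commas or tabs separate item IDs, possibly padded with spaces/tabs.
SPLITTER = re.compile(r"[ \t]*[,\t][ \t]*")

def parse_baskets(lines):
    """Turn raw input lines into a list of baskets (sets of item-ID strings)."""
    baskets = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # A set, so an item listed twice in one basket only counts once.
        baskets.append(set(SPLITTER.split(line)))
    return baskets

# Three lines copied from the sample input above.
sample = [
    "4750,19394,25651,6395,5592",
    "3458,12426,20578",
    "26728",
]
baskets = parse_baskets(sample)
print(len(baskets))        # 3
print(sorted(baskets[1]))  # ['12426', '20578', '3458']
```

Item IDs are kept as strings here, which is enough for counting and matches how they appear in the dumped results later on.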

Command to run:

mahout fpg -i /workspace/dataguru/hadoopdev/week13/fpg/in/ -o /workspace/dataguru/hadoopdev/week13/fpg/out -method mapreduce -s 3

Parameter description:

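-i input path. Because the job runs on Hadoop, the input path must be an HDFS path; in this experiment the input file is /workspace/dataguru/hadoopdev/week13/fpg/in/user2items.csv

-o output path, i.e. the output directory in HDFS

-method mapreduce runs the parallel (MapReduce) version of the algorithm; the alternative, sequential, runs it in a single process

-s minimum support: a pattern is reported only if it occurs in at least this many baskets (3 in this run)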
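The minimum support given with -s is an absolute count of baskets. The following brute-force Python sketch shows what that threshold means on a hypothetical four-basket toy data set; the function names are made up for illustration. It enumerates item pairs directly, which is fine for a toy example but is not how FP-growth works: FP-growth avoids candidate enumeration by compressing the baskets into a prefix tree.

```python
from itertools import combinations

def support(itemset, baskets):
    """Number of baskets that contain every item in `itemset`."""
    wanted = set(itemset)
    return sum(1 for basket in baskets if wanted <= basket)

def frequent_pairs(baskets, min_support):
    """Brute-force search for 2-item patterns with support >= min_support."""
    counts = {}
    for basket in baskets:
        # Sorting gives each pair one canonical (smaller, larger) form.
        for pair in combinations(sorted(basket), 2):
            counts[pair] = counts.get(pair, 0) + 1
    return {pair: n for pair, n in counts.items() if n >= min_support}

# Hypothetical toy data, not taken from user2items.csv.
toy = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"a", "b", "d"}]
print(support({"a", "b"}, toy))  # 3
print(frequent_pairs(toy, 3))    # {('a', 'b'): 3}
```

With a threshold of 3, the pair ('a', 'c') (support 2) is dropped, just as -s 3 drops any pattern seen in fewer than three baskets.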

See the following table for a full parameter description:

Contents of the output directory after the command finishes:

casliyang@singlehadoop:~$ hadoop dfs -ls /workspace/dataguru/hadoopdev/week13/fpg/out
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
Found 4 items
-rw-r--r-- 3 casliyang supergroup 5567 2014-06-17 17:50 /workspace/dataguru/hadoopdev/week13/fpg/out/flist
drwxr-xr-x - casliyang supergroup 0 2014-06-17 17:51 /workspace/dataguru/hadoopdev/week13/fpg/out/fpgrowth
drwxr-xr-x - casliyang supergroup 0 2014-06-17 17:51 /workspace/dataguru/hadoopdev/week13/fpg/out/frequentpatterns
drwxr-xr-x - casliyang supergroup 0 2014-06-17 17:50 /workspace/dataguru/hadoopdev/week13/fpg/out/parallelcounting
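The parallelcounting, flist and fpgrowth entries above appear to be intermediate results of the PFP stages (this reading is based on the directory names and the general PFP scheme, not on the Mahout 0.6 source). In PFP, a first pass counts every item and keeps the frequent ones sorted by descending count (the F-list); the F-list is then split into groups, and each basket is sharded into "group-dependent" transactions so that each group can be mined independently. The in-memory Python sketch below illustrates those two stages; the function names and the contiguous-slice grouping rule are hypothetical.

```python
from collections import Counter

def build_flist(baskets, min_support):
    """Stage 1 (counting): frequent items, most frequent first."""
    counts = Counter(item for basket in baskets for item in basket)
    flist = [(item, n) for item, n in counts.items() if n >= min_support]
    # Descending count; ties broken by item ID so the order is deterministic.
    flist.sort(key=lambda pair: (-pair[1], pair[0]))
    return flist

def group_dependent_transactions(baskets, flist, num_groups):
    """Stage 2 (map side): split each basket into per-group transactions."""
    rank = {item: r for r, (item, _) in enumerate(flist)}
    # Hypothetical grouping rule: contiguous slices of the F-list.
    group_size = max(1, -(-len(flist) // num_groups))  # ceiling division
    shards = {}
    for basket in baskets:
        # Drop infrequent items, order the rest by F-list rank.
        items = sorted((i for i in basket if i in rank), key=rank.get)
        emitted = set()
        # Walk from the least frequent item toward the most frequent one.
        for pos in range(len(items) - 1, -1, -1):
            gid = rank[items[pos]] // group_size
            if gid not in emitted:
                emitted.add(gid)
                # The prefix ending at this item is all that group needs.
                shards.setdefault(gid, []).append(items[: pos + 1])
    return shards

# Hypothetical toy data.
toy = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"a", "b", "d"}]
flist = build_flist(toy, 2)
shards = group_dependent_transactions(toy, flist, 2)
print(flist)  # [('a', 4), ('b', 3), ('c', 2)]
print(shards)
```

In PFP, each reducer would then build an ordinary FP-tree from one shard and report only the patterns that involve items of its own group; that mining step is not shown here.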

The mined frequent patterns are stored in the frequentpatterns folder:

casliyang@singlehadoop:~$ hadoop dfs -ls /workspace/dataguru/hadoopdev/week13/fpg/out/frequentpatterns
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
Found 2 items
-rw-r--r-- 3 casliyang supergroup 0 2014-06-17 17:51 /workspace/dataguru/hadoopdev/week13/fpg/out/frequentpatterns/_SUCCESS
-rw-r--r-- 3 casliyang supergroup 10017 2014-06-17 17:51 /workspace/dataguru/hadoopdev/week13/fpg/out/frequentpatterns/part-r-00000
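As the dumped results further down show, the final output is keyed by item: every frequent pattern is listed under each item it contains. The Python sketch below illustrates that kind of aggregation step in memory. The function name `aggregate_by_item` and the per-item cap `k` are hypothetical; the sample patterns are copied from the dump below.

```python
def aggregate_by_item(patterns, k):
    """Index every pattern under each item it contains, keeping at most k per item.

    `patterns` is an iterable of (tuple_of_item_ids, support) pairs.
    """
    by_item = {}
    for items, sup in patterns:
        for item in items:
            by_item.setdefault(item, []).append((items, sup))
    # Highest support first; sorted() is stable, so ties keep input order.
    return {item: sorted(pats, key=lambda p: p[1], reverse=True)[:k]
            for item, pats in by_item.items()}

# Patterns copied from the dumped output shown later in this article.
patterns = [
    (("3151",), 7),
    (("14020", "3151"), 4),
    (("29203",), 9),
    (("14020", "29203"), 3),
]
index = aggregate_by_item(patterns, 50)
print(index["14020"])  # [(('14020', '3151'), 4), (('14020', '29203'), 3)]
```

Indexing by item is what makes lookups such as "which patterns involve item 14020?" cheap afterwards, at the cost of storing each multi-item pattern several times.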

The file is a serialized Hadoop SequenceFile and cannot be read directly; Mahout provides a command to convert it to plain text:


mahout seqdumper -s /workspace/dataguru/hadoopdev/week13/fpg/out/frequentpatterns/part-r-00000 -o /home/casliyang/outpattern

Note that the output path given to -o must be on the local Linux file system (not HDFS), and the target file must be created in advance; otherwise the command fails with an error.

Part of the final output written to /home/casliyang/outpattern:

Key: 29099: Value: ([29099],18), ([29099, 4479],3)
Key: 29202: Value: ([29202],3)
Key: 29203: Value: ([29203],9), ([14020, 29203],3)
Key: 29224: Value: ([29224],3)
Key: 29547: Value: ([29547],5)
Key: 2963: Value: ([2963],8), ([2963, 21146],3)
Key: 2999: Value: ([2999],3)
Key: 3032: Value: ([3032],4)
Key: 3047: Value: ([3047],4)
Key: 3151: Value: ([3151],7), ([14020, 3151],4)
Key: 3181: Value: ([3181],3)
Key: 3228: Value: ([3228],14)
Key: 3313: Value: ([3313],3)
Key: 3324: Value: ([3324],3)
Key: 3438: Value: ([3438],3)
Key: 3458: Value: ([3458],4)
Key: 3627: Value: ([3627],11), ([3627, 11176],3)

......

Meaning:

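Key: the item ID

Value: the frequent patterns that contain this item, each followed by its support (the number of baskets in which the whole pattern occurs). For example, ([29099, 4479],3) means that items 29099 and 4479 appear together in 3 baskets.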
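To process the dumped text further, each line has to be turned back into structured data. The Python sketch below parses one line into the key and a list of (pattern, support) pairs. It assumes the line layout shown above; the regular expressions accept both "Key:29099:value:" and "Key: 29099: Value:" spellings, and the function name `parse_dump_line` is hypothetical.

```python
import re

# Assumed layout: "Key: <item>: Value: ([i1, i2, ...],support), ..."
LINE = re.compile(r"^Key:\s*(\S+?):\s*value:\s*(.*)$", re.IGNORECASE)
PATTERN = re.compile(r"\(\[([^\]]*)\],\s*(\d+)\)")

def parse_dump_line(line):
    """Return (key, [(items_tuple, support), ...]) or None for other lines."""
    m = LINE.match(line.strip())
    if m is None:
        return None  # e.g. headers or the "......" placeholder
    key, rest = m.group(1), m.group(2)
    patterns = []
    for items, sup in PATTERN.findall(rest):
        # "29099, 4479" -> ("29099", "4479")
        patterns.append((tuple(s.strip() for s in items.split(",")), int(sup)))
    return key, patterns

print(parse_dump_line("Key:29099:value: ([29099],18), ([29099, 4479],3)"))
# ('29099', [(('29099',), 18), (('29099', '4479'), 3)])
```

Reading the dump file line by line with this function yields one record per item.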

Once the frequent patterns have been mined, you can write programs that process them further according to your business needs.
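One common follow-up (offered here as an example, not something the article itself does) is deriving simple association rules. The confidence of a rule {key} → {other items} is support(pattern) / support({key}). From the dump above, item 29099 occurs in 18 baskets and together with 4479 in 3, so the rule 29099 → 4479 has confidence 3/18 ≈ 0.17. A minimal Python sketch, with the hypothetical function name `rules_for_item`, taking one parsed record as input:

```python
def rules_for_item(key, patterns, min_confidence):
    """Derive rules {key} -> rest from one item's (items_tuple, support) patterns."""
    key_support = dict(patterns).get((key,))
    if not key_support:
        return []  # the single-item pattern is needed as the denominator
    rules = []
    for items, sup in patterns:
        if len(items) < 2:
            continue  # a single item is not a rule
        consequent = tuple(i for i in items if i != key)
        confidence = sup / key_support
        if confidence >= min_confidence:
            rules.append((key, consequent, confidence))
    return rules

# Record for item 29099, copied from the dump above.
record = [(("29099",), 18), (("29099", "4479"), 3)]
for antecedent, consequent, conf in rules_for_item("29099", record, 0.1):
    print(antecedent, "->", consequent, round(conf, 3))  # 29099 -> ('4479',) 0.167
```

Because the output may list only a limited number of patterns per item, rules computed this way cover only the patterns that were actually written out.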

Mahout is really a great open source project!

Author: CSDN blog u010967382

This article is an English version of an article originally written in Chinese on aliyun.com and is provided for information purposes only.
