The parallel frequent pattern mining algorithm PFP-Growth and its command usage in Mahout


Today we look at the parallel frequent pattern mining algorithm PFP-Growth and how to run it from the Mahout command line. The test results are recorded here for later reference:

Environment: JDK 1.7 + Hadoop 2.2.0 (single-node pseudo-distributed cluster) + Mahout 0.6. (Versions 0.8 and 0.9 do not include this algorithm, and Mahout 0.6 runs into a few problems with Hadoop 2.2.0. Orz)

Part of the input data is shown below; each line represents one shopping basket:

4750,19394,25651,6395,5592
26180,10895,24571,23295,20578,27791,2729,8637
7380,18805,25086,19048,3190,21995,10908,12576
3458,12426,20578
1880,10702,1731,5185,18575,28967
21815,10872,18730
20626,17921,28930,14580,2891,11080
18075,6548,28759,17133
7868,15200,13494
7868,28617,18097,22999,16323,8637,7045,25733
12189,8816,22950,18465,13258,27791,20979
26728
17512,14821,18741
26619,14470,21899,6731
5184
28653,28662,18353,27437,5661,12078,11849,15784,7248,7061,18612,24277,4807,15584,9671,18741,3647,1000

......
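As a rough illustration of this input format, the following Python sketch parses basket lines into item lists and counts how many baskets contain each item. The function names are made up for this example and are not part of Mahout; the four sample lines are copied from the excerpt above.

```python
from collections import Counter

def parse_baskets(lines):
    """Turn comma-separated lines into lists of item IDs, skipping blank lines."""
    baskets = []
    for line in lines:
        line = line.strip()
        if line:
            baskets.append([item.strip() for item in line.split(",")])
    return baskets

def item_support(baskets):
    """Count the number of baskets containing each item (once per basket)."""
    counts = Counter()
    for basket in baskets:
        counts.update(set(basket))
    return counts

# Four lines taken from the input excerpt.
sample = [
    "3458,12426,20578",
    "26180,10895,24571,23295,20578,27791,2729,8637",
    "7868,15200,13494",
    "7868,28617,18097,22999,16323,8637,7045,25733",
]
support = item_support(parse_baskets(sample))
print(support["7868"], support["20578"], support["8637"], support["3458"])
# prints: 2 2 2 1
```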

The command executed:

mahout fpg -i /workspace/dataguru/hadoopdev/week13/fpg/in/ -o /workspace/dataguru/hadoopdev/week13/fpg/out -method mapreduce -s 3

Parameter description:

-i: the input path. Because the job runs in a Hadoop environment, this must be an HDFS path; the input file used in this experiment is /workspace/dataguru/hadoopdev/week13/fpg/in/user2items.csv

-o: the output path, also a path in HDFS
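The command also passes -s 3. In Mahout's fpg driver, -s is the minimum support: a pattern is reported only if at least that many baskets contain it. The sketch below shows the idea with a naive enumeration of single items and pairs. It is not the FP-growth algorithm and not Mahout code, and the toy baskets are invented for the example.

```python
from collections import Counter
from itertools import combinations

def frequent_itemsets(baskets, min_support, max_size=2):
    """Naively count every itemset up to max_size; keep those meeting min_support."""
    counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))
        for size in range(1, max_size + 1):
            for combo in combinations(items, size):
                counts[combo] += 1
    return {itemset: n for itemset, n in counts.items() if n >= min_support}

# Hypothetical toy baskets, not taken from the experiment's data.
baskets = [
    ["a", "b", "c"],
    ["a", "b"],
    ["a", "b", "d"],
    ["a", "c"],
    ["b", "d"],
]
result = frequent_itemsets(baskets, min_support=3)
for itemset, n in sorted(result.items()):
    print(itemset, n)
# prints:
# ('a',) 4
# ('a', 'b') 3
# ('b',) 4
```

Enumerating all combinations grows very quickly with basket size, which is the cost that FP-growth is designed to avoid.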

See the following table for a full parameter description:

The output directory after the command executes:

casliyang@singlehadoop:~$ hadoop dfs -ls /workspace/dataguru/hadoopdev/week13/fpg/out
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
Found 4 items
-rw-r--r--   3 casliyang supergroup 5567 2014-06-17 17:50 /workspace/dataguru/hadoopdev/week13/fpg/out/fList
drwxr-xr-x   - casliyang supergroup 0 2014-06-17 17:51 /workspace/dataguru/hadoopdev/week13/fpg/out/fpgrowth
drwxr-xr-x   - casliyang supergroup 0 2014-06-17 17:51 /workspace/dataguru/hadoopdev/week13/fpg/out/frequentpatterns
drwxr-xr-x   - casliyang supergroup 0 2014-06-17 17:50 /workspace/dataguru/hadoopdev/week13/fpg/out/parallelcounting

The mined frequent patterns are stored under the frequentpatterns folder:

casliyang@singlehadoop:~$ hadoop dfs -ls /workspace/dataguru/hadoopdev/week13/fpg/out/frequentpatterns
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
Found 2 items
-rw-r--r--   3 casliyang supergroup 0 2014-06-17 17:51 /workspace/dataguru/hadoopdev/week13/fpg/out/frequentpatterns/_SUCCESS
-rw-r--r--   3 casliyang supergroup 10017 2014-06-17 17:51 /workspace/dataguru/hadoopdev/week13/fpg/out/frequentpatterns/part-r-00000

The part-r-00000 file is a serialized file (a Hadoop SequenceFile) and cannot be viewed directly. Mahout provides a command to convert it to plain text:


mahout seqdumper -s /workspace/dataguru/hadoopdev/week13/fpg/out/frequentpatterns/part-r-00000 -o /home/casliyang/outpattern

Note that the output path given with -o must be on the local Linux file system, and the target file must be created in advance; otherwise the command fails with an error.
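Since the target file has to exist before seqdumper runs, a small helper can create it first. The sketch below is only a convenience wrapper under stated assumptions: the function name is hypothetical, it creates the empty local file and assembles the same argument list as the command above, and it does not run Mahout itself. A temporary directory stands in for /home/casliyang so the example can run anywhere.

```python
import tempfile
from pathlib import Path

def prepare_seqdumper(seq_file, local_output):
    """Create the empty local output file and return the seqdumper argument list."""
    out = Path(local_output)
    out.parent.mkdir(parents=True, exist_ok=True)
    out.touch(exist_ok=True)  # the target file must already exist
    # Same flags as the command above: -s sequence file, -o local output file.
    return ["mahout", "seqdumper", "-s", str(seq_file), "-o", str(out)]

with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "outpattern"
    argv = prepare_seqdumper(
        "/workspace/dataguru/hadoopdev/week13/fpg/out/frequentpatterns/part-r-00000",
        target,
    )
    created = target.exists()

print(created, argv[:3])
```

On a machine where Mahout is installed, the returned list could be handed to subprocess.run to launch the actual dump.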

Part of the final output written to /home/casliyang/outpattern:

Key: 29099: Value: ([29099],18), ([29099, 4479],3)
Key: 29202: Value: ([29202],3)
Key: 29203: Value: ([29203],9), ([14020, 29203],3)
Key: 29224: Value: ([29224],3)
Key: 29547: Value: ([29547],5)
Key: 2963: Value: ([2963],8), ([2963, 21146],3)
Key: 2999: Value: ([2999],3)
Key: 3032: Value: ([3032],4)
Key: 3047: Value: ([3047],4)
Key: 3151: Value: ([3151],7), ([14020, 3151],4)
Key: 3181: Value: ([3181],3)
Key: 3228: Value: ([3228],14)
Key: 3313: Value: ([3313],3)
Key: 3324: Value: ([3324],3)
Key: 3438: Value: ([3438],3)
Key: 3458: Value: ([3458],4)
Key: 3627: Value: ([3627],11), ([3627, 11176],3)

......

Meaning:

Key: the item ID

Value: the frequent patterns that contain the item, each followed by its support (the number of baskets containing the pattern)
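To work with these results in a program, each dumped line can be split into its key and its list of (pattern, support) pairs. The following Python sketch does this with two regular expressions. The function name is invented for this example, and the regular expressions are deliberately tolerant about spacing and letter case around "Key" and "Value", since the exact layout is assumed from the lines shown above.

```python
import re

# Matches "Key: <id>: Value: <rest>", ignoring case and optional spaces.
LINE_RE = re.compile(r"^\s*Key:\s*(\S+?)\s*:\s*Value:\s*(.*)$", re.IGNORECASE)
# Matches one "([item, item, ...],support)" group inside the value.
PATTERN_RE = re.compile(r"\(\[([^\]]*)\],\s*(\d+)\)")

def parse_pattern_line(line):
    """Return (key, [(items, support), ...]) for one line, or None if it does not match."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    key, value = m.group(1), m.group(2)
    patterns = []
    for items, support in PATTERN_RE.findall(value):
        patterns.append(([i.strip() for i in items.split(",")], int(support)))
    return key, patterns

print(parse_pattern_line("Key: 29099: Value: ([29099],18), ([29099, 4479],3)"))
# prints: ('29099', [(['29099'], 18), (['29099', '4479'], 3)])
```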

Once the frequent patterns have been mined, they can be processed further by your own programs according to business needs.
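As one example of such follow-up processing, the sketch below keeps only the patterns with two or more items and ranks them by support. The input layout (a list of (items, support) tuples) and the function name are assumptions made for this illustration; the sample values are copied from the output excerpt. Duplicates are merged in case the same pattern is listed under more than one key.

```python
def top_multi_item_patterns(patterns, limit=10):
    """Keep patterns with at least two items, merge duplicates, sort by support (descending)."""
    best = {}
    for items, support in patterns:
        if len(items) < 2:
            continue
        key = frozenset(items)
        best[key] = max(support, best.get(key, 0))
    # Highest support first; ties are broken by the sorted item list.
    ranked = sorted(best.items(), key=lambda kv: (-kv[1], sorted(kv[0])))
    return [(sorted(items), support) for items, support in ranked[:limit]]

patterns = [
    (["29099"], 18), (["29099", "4479"], 3),
    (["29203"], 9), (["14020", "29203"], 3),
    (["3151"], 7), (["14020", "3151"], 4),
]
ranked = top_multi_item_patterns(patterns)
for items, support in ranked:
    print(items, support)
# prints:
# ['14020', '3151'] 4
# ['14020', '29203'] 3
# ['29099', '4479'] 3
```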

Mahout is really a great open source project!

Author: csdn Blog u010967382
