Today we look at the parallel frequent pattern mining algorithm PFP-Growth and how to run it from the Mahout command line. This is a brief record of the test results for later reference.
Environment: JDK 1.7 + Hadoop 2.2.0 single-machine pseudo-distributed cluster + Mahout 0.6 (versions 0.8 and 0.9 do not include this algorithm; note that Mahout 0.6 can run into some problems on Hadoop 2.2.0 Orz)
Part of the input data is shown below; each line represents one shopping basket:
4750,19394,25651,6395,5592
26180,10895,24571,23295,20578,27791,2729,8637
7380,18805,25086,19048,3190,21995,10908,12576
3458,12426,20578
1880,10702,1731,5185,18575,28967
21815,10872,18730
20626,17921,28930,14580,2891,11080
18075,6548,28759,17133
7868,15200,13494
7868,28617,18097,22999,16323,8637,7045,25733
12189,8816,22950,18465,13258,27791,20979
26728
17512,14821,18741
26619,14470,21899,6731
5184
28653,28662,18353,27437,5661,12078,11849,15784,7248,7061,18612,24277,4807,15584,9671,18741,3647,1000
......
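To make the input format concrete, here is a minimal Python sketch (not part of Mahout) that reads baskets in the comma-separated format above and counts, for each item, how many baskets contain it. It then keeps only the items that reach a minimum support. The function name `count_item_support` and the small `sample` list are made up for illustration; the sample lines are taken from the data shown above.

```python
from collections import Counter

def count_item_support(lines, min_support=3):
    """Count how many baskets contain each item; keep items with support >= min_support."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # One line is one basket; a set makes a repeated item count once per basket.
        items = {tok.strip() for tok in line.split(",") if tok.strip()}
        counts.update(items)
    return {item: n for item, n in counts.items() if n >= min_support}

# A few baskets copied from the input excerpt (illustrative subset only).
sample = [
    "3458,12426,20578",
    "26180,10895,24571,23295,20578,27791,2729,8637",
    "7868,15200,13494",
    "7868,28617,18097,22999,16323,8637,7045,25733",
    "12189,8816,22950,18465,13258,27791,20979",
]
frequent_items = count_item_support(sample, min_support=2)
print(frequent_items)
```

On the full data set the threshold would be 3 to match the -s 3 used in this experiment; the sample uses 2 only because it is so small.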
Command to execute:
mahout fpg -i /workspace/dataguru/hadoopdev/week13/fpg/in/ -o /workspace/dataguru/hadoopdev/week13/fpg/out -method mapreduce -s 3
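Parameter description:
-i input path. Because the job runs in the Hadoop environment, the input path must be an HDFS path; the input file in this experiment is /workspace/dataguru/hadoopdev/week13/fpg/in/user2items.csv
-o output path, given as a path in HDFS
-method execution method; mapreduce runs the job on Hadoop
-s minimum support; with 3, only patterns that occur in at least 3 baskets are kept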
See the following table for a full parameter description:
The output directory after the command executes:
casliyang@singlehadoop:~$ hadoop dfs -ls /workspace/dataguru/hadoopdev/week13/fpg/out
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
Found 4 items
-rw-r--r--   3 casliyang supergroup       5567 2014-06-17 17:50 /workspace/dataguru/hadoopdev/week13/fpg/out/flist
drwxr-xr-x   - casliyang supergroup          0 2014-06-17 17:51 /workspace/dataguru/hadoopdev/week13/fpg/out/fpgrowth
drwxr-xr-x   - casliyang supergroup          0 2014-06-17 17:51 /workspace/dataguru/hadoopdev/week13/fpg/out/frequentpatterns
drwxr-xr-x   - casliyang supergroup          0 2014-06-17 17:50 /workspace/dataguru/hadoopdev/week13/fpg/out/parallelcounting
The mined frequent patterns are stored under the frequentpatterns folder:
casliyang@singlehadoop:~$ hadoop dfs -ls /workspace/dataguru/hadoopdev/week13/fpg/out/frequentpatterns
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
Found 2 items
-rw-r--r--   3 casliyang supergroup          0 2014-06-17 17:51 /workspace/dataguru/hadoopdev/week13/fpg/out/frequentpatterns/_SUCCESS
-rw-r--r--   3 casliyang supergroup      10017 2014-06-17 17:51 /workspace/dataguru/hadoopdev/week13/fpg/out/frequentpatterns/part-r-00000
The part-r-00000 file is a serialized file and cannot be viewed directly. Mahout provides a command to convert it to plain text:
mahout seqdumper -s /workspace/dataguru/hadoopdev/week13/fpg/out/frequentpatterns/part-r-00000 -o /home/casliyang/outpattern
Note that the output file path given by -o must be on the local Linux file system, not in HDFS, and the target file must be created in advance; otherwise the command fails with an error.
Partial results of the final output written to /home/casliyang/outpattern:
Key: 29099: Value: ([29099],18), ([29099, 4479],3)
Key: 29202: Value: ([29202],3)
Key: 29203: Value: ([29203],9), ([14020, 29203],3)
Key: 29224: Value: ([29224],3)
Key: 29547: Value: ([29547],5)
Key: 2963: Value: ([2963],8), ([2963, 21146],3)
Key: 2999: Value: ([2999],3)
Key: 3032: Value: ([3032],4)
Key: 3047: Value: ([3047],4)
Key: 3151: Value: ([3151],7), ([14020, 3151],4)
Key: 3181: Value: ([3181],3)
Key: 3228: Value: ([3228],14)
Key: 3313: Value: ([3313],3)
Key: 3324: Value: ([3324],3)
Key: 3438: Value: ([3438],3)
Key: 3458: Value: ([3458],4)
Key: 3627: Value: ([3627],11), ([3627, 11176],3)
......
Meaning:
Key: the item ID
Value: the frequent patterns that contain this item, each followed by its support (the number of baskets in which the pattern occurs)
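The text dump can be turned back into structured data with a few lines of code. The following is a minimal Python sketch, assuming each line has the "Key ... Value ..." shape shown in the partial results above; the function name `parse_pattern_line` and both regular expressions are my own and are not part of Mahout. The matching is deliberately tolerant of spacing and letter case, since the exact layout of the dump may differ from what is shown here.

```python
import re

# Matches "Key: <id>: Value: <rest>", tolerant of spacing and letter case.
LINE_RE = re.compile(r"^\s*Key:\s*(\S+?):\s*Value:\s*(.*)$", re.IGNORECASE)
# Matches one "([item, item, ...],support)" group inside the value part.
PATTERN_RE = re.compile(r"\(\[([^\]]*)\],\s*(\d+)\)")

def parse_pattern_line(line):
    """Return (key, [(items_tuple, support), ...]) or None if the line does not match."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    key, rest = m.group(1), m.group(2)
    patterns = [
        (tuple(item.strip() for item in items.split(",")), int(support))
        for items, support in PATTERN_RE.findall(rest)
    ]
    return key, patterns

parsed = parse_pattern_line("Key: 29099: Value: ([29099],18), ([29099, 4479],3)")
print(parsed)
```

Lines that do not match, such as any header or summary lines in the dump, return None and can simply be skipped.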
Once the frequent patterns have been mined, you can process them further in your own program according to business needs.
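As one example of such follow-up processing, the sketch below derives simple association rules from mined patterns. It is a minimal Python illustration, not something Mahout does for you, and it assumes the numbers in the output are support counts as described above. The function name `rules_from_patterns` is hypothetical. Only rules with a single item on the left side are produced, and only when the support of that single item is known. Confidence is computed as support(pattern) / support(item).

```python
def rules_from_patterns(patterns, min_confidence=0.0):
    """patterns: list of (items_tuple, support).

    Returns a list of (antecedent_item, consequent_items, confidence).
    """
    support = {frozenset(items): n for items, n in patterns}
    rules = []
    for itemset, n in support.items():
        if len(itemset) < 2:
            continue  # a single item cannot form a rule
        for item in itemset:
            antecedent = frozenset([item])
            if antecedent not in support:
                continue  # support of this single item is unknown, so skip it
            confidence = n / support[antecedent]
            if confidence >= min_confidence:
                rules.append((item, tuple(sorted(itemset - antecedent)), confidence))
    return rules

# Patterns copied from the partial results above.
patterns = [
    (("29099",), 18), (("29099", "4479"), 3),
    (("3151",), 7), (("14020", "3151"), 4),
]
for antecedent, consequent, confidence in rules_from_patterns(patterns):
    print(f"{antecedent} -> {consequent}: confidence {confidence:.2f}")
```

For instance, item 3151 appears in 7 baskets and together with 14020 in 4 of them, so the rule 3151 -> 14020 has a confidence of 4/7. No rule is produced in the other direction here because the support of 14020 alone is not in the excerpt.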
Mahout is really a great open source project!
Author: csdn Blog u010967382