The previous article introduced the open source data mining software Weka for association rule mining. Weka is convenient and practical, but it cannot handle large data sets: once the data no longer fits in memory, giving it more time does not help, so distributed computing is needed. Mahout is an open source distributed data mining project based on Hadoop (a mahout is a person who rides an elephant). Once you have mastered the basic algorithms and usage of association rules and can also run the mining in a distributed way, you can handle basic association rule mining tasks; in practice, what remains is to understand the business and the data.
Install Mahout
A mahout needs a mighty elephant to ride, but this article does not cover the elephant, Hadoop, so I assume Hadoop is already installed. For instructions on installing Hadoop, please search online.
Download Mahout 0.8 from the Apache website
Extract
tar -zxvf mahout-distribution-0.8.tar.gz
Move
sudo mv mahout-distribution-0.8 /usr/local/mahout-8
Configuration
sudo gedit /etc/profile
Enter the following content:
export MAHOUT_HOME=/usr/local/mahout-8
export PATH=$MAHOUT_HOME/bin:$PATH
export HADOOP_HOME=/usr/local/hadoop
export PATH=$HADOOP_HOME/bin:$PATH
Log out and log back in so that the configuration takes effect. Run mahout -version to test whether the installation succeeded.
Data preparation
Download a shopping basket data set, retail.dat, from http://fimi.ua.ac.be/data/.
Upload to Hadoop file system
hadoop fs -mkdir /user/hadoop/mahoutdata  # create the directory
hadoop fs -put ~/data/retail.dat /user/hadoop/mahoutdata
Call the FP-Growth algorithm
mahout fpg -i /user/hadoop/mahoutdata/retail.dat -o patterns -method mapreduce -s 1000 -regex '[\ ]'
-i specifies the input, -o the output, and -s the minimum support; the -regex pattern '[\ ]' means that the items in each row are separated by a space.
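To make the roles of the separator pattern and the minimum support concrete, here is a minimal Python sketch. It is not Mahout's implementation: it only splits each row with the same pattern and keeps the single items that reach the threshold, assuming the minimum support is an absolute number of transactions. The function name frequent_items and the four sample transactions are made up for illustration.

```python
import re
from collections import Counter

# Same pattern as the one passed to -regex: a single space.
SPLITTER = re.compile(r"[\ ]")

def frequent_items(lines, min_support):
    """Count in how many rows each item occurs and keep items with
    a count of at least min_support (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        # A set, so an item repeated within one row is counted once.
        items = {tok for tok in SPLITTER.split(line.strip()) if tok}
        counts.update(items)
    return {item: n for item, n in counts.items() if n >= min_support}

# Made-up transactions in the same one-row-per-basket layout as retail.dat.
transactions = ["39 48 225", "39 48", "39 41", "48 41 225"]
print(frequent_items(transactions, min_support=3))
```

With a threshold of 3, only the items that appear in at least three of the four sample rows survive; FP-Growth then searches for larger itemsets among such frequent items.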
After about two minutes of execution, the result is written as a serialized file that looks garbled when viewed directly, so it needs to be converted back to text with Mahout:
mahout seqdumper -i /user/hadoop/patterns/fpgrowth/part-r-00000 -o ~/data/patterns.txt
Output results:
Key: 39: Value: ([39],50675)
Key: 48: Value: ([48],42135), ([, 48],29142)
Key: 38: Value: ([38],15596), ([39, 38],10345), ([38],7944, 38],6102)
Key: 32: Value: ([32],15167), ([39, 32],8455), ([48, 32],8034), ([39, 48, 32],5402), ([32],2833, 32],1840), ([A., 32],1646), ([A, A, 32],1236)
Key: 41: Value: ([41],14945), ([41],11414, 41],9018), ([38, 41],7366), ([39, 41],3897), ([32, 41],3196), ([38, 41],3051), ([41],2374, 41],2359), ([48, 32, 41],2063), ([39, 48, 38, 41],1991), ([39, 48, 32, 41],1646)
Key: 65: Value: ([65],4472), ([, 65],2787), ([65],2529), ([A, 65],1797)
Key: 89: Value: ([89],3837), ([ ([89],2798, 89],2749), ([89],2125]
Key: 225: Value: ([225],3257), ([39, 225],2351), ([48, 225],1736), ([39, 48, 225],1400)
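To work with these results in a program, the dumped text can be read back into a lookup table. The following Python sketch assumes that every entry has the form ([item, item, ...],support) as in the output above; the names ENTRY and parse_patterns are illustrative and not part of Mahout. Entries that do not match this form are skipped.

```python
import re

# One "([item, item, ...],support)" entry; anything else is ignored.
ENTRY = re.compile(r"\(\[(\d+(?:, ?\d+)*)\],(\d+)\)")

def parse_patterns(lines):
    """Return {frozenset of item IDs: support count} for each entry found."""
    supports = {}
    for line in lines:
        for items, count in ENTRY.findall(line):
            itemset = frozenset(int(x) for x in items.split(","))
            supports[itemset] = int(count)
    return supports

# Two lines copied from the output above.
sample = [
    "Key: 39: Value: ([39],50675)",
    "Key: 225: Value: ([225],3257), ([39, 225],2351), "
    "([48, 225],1736), ([39, 48, 225],1400)",
]
supports = parse_patterns(sample)
print(supports[frozenset({39, 225})])
```

Using frozenset keys makes the lookup independent of the order in which the items are listed.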
This output contains only frequent itemsets, but it is not difficult to extract association rules from them.
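One possible way to do this, sketched in Python, is to compute the confidence of a rule A -> B as support(A and B together) / support(A) and keep the rules above a chosen threshold. The function association_rules and the threshold of 0.7 are assumptions for illustration; the support counts are taken from the output above. An antecedent whose support is not in the table is skipped instead of guessed.

```python
from itertools import combinations

def association_rules(supports, min_confidence):
    """Return (antecedent, consequent, confidence) tuples.

    supports maps frozenset itemsets to their support counts.
    """
    rules = []
    for itemset, count in supports.items():
        if len(itemset) < 2:
            continue
        for size in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), size):
                a = frozenset(antecedent)
                if a not in supports:
                    continue  # support of the antecedent is unknown
                confidence = count / supports[a]
                if confidence >= min_confidence:
                    rules.append((a, itemset - a, confidence))
    return rules

# Support counts copied from the output above.
supports = {
    frozenset({39}): 50675,
    frozenset({48}): 42135,
    frozenset({225}): 3257,
    frozenset({39, 225}): 2351,
    frozenset({48, 225}): 1736,
    frozenset({39, 48, 225}): 1400,
}
for a, b, conf in association_rules(supports, min_confidence=0.7):
    print(sorted(a), "->", sorted(b), round(conf, 3))
```

With these counts, the rule 225 -> 39 has confidence 2351 / 3257, about 0.72, and the rule {48, 225} -> 39 has confidence 1400 / 1736, about 0.81; every other candidate falls below 0.7.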
Source: www.cnblogs.com/fengfenggirl