Data Mining Series (5): Using Mahout to Mine Association Rules from Massive Data


The previous article used the open source data mining software Weka to mine association rules. Weka is convenient and practical, but it cannot handle large data sets: the data does not fit in memory, and giving it more time does not help. Distributed computing is needed, and Mahout is an open source distributed data mining project built on Hadoop (the word "mahout" originally means a person who rides an elephant). Once you have mastered the basic association rule algorithms and how to use them, and combine that with distributed mining, you can handle basic association rule mining. In practice, what remains is to understand the business and the data.

Install Mahout

An elephant rider needs a tamed elephant, but this article does not cover the elephant, Hadoop, itself, so I assume Hadoop is already installed. For how to install Hadoop, please search Google.

Download Mahout 0.8 from the Apache website

Extract

tar -zxvf mahout-distribution-0.8.tar.gz

Move

sudo mv mahout-distribution-0.8 /usr/local/mahout-8

Configuration

sudo gedit /etc/profile

Enter the following content:

export MAHOUT_HOME=/usr/local/mahout-8
export PATH=$MAHOUT_HOME/bin:$PATH
export HADOOP_HOME=/usr/local/hadoop
export PATH=$HADOOP_HOME/bin:$PATH

Log out and log back in so that the configuration takes effect. Run mahout -version to test whether the installation was successful.

Data preparation

Download the shopping basket data set retail.dat from http://fimi.ua.ac.be/data/.

Upload it to the Hadoop file system

hadoop fs -mkdir /user/hadoop/mahoutdata  # create the directory
hadoop fs -put ~/data/retail.dat /user/hadoop/mahoutdata

Call the FP-Growth algorithm

mahout fpg -i /user/hadoop/mahoutdata/retail.dat -o patterns -method mapreduce -s 1000 -regex '[\ ]'

-i is the input path, -o is the output path, -s is the minimum support, and '[\ ]' means that the items in each row are separated by a space.
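To make the minimum support and the separator pattern concrete, here is a minimal Python sketch of what they mean for single items. It is not Mahout's implementation: the function name frequent_items and the toy transactions are made up for illustration, and the threshold is lowered to 2 so the tiny sample produces a result.

```python
import re
from collections import Counter

def frequent_items(lines, min_support, splitter=r"[\ ]"):
    """Count how many transactions contain each item; keep the frequent ones."""
    counts = Counter()
    for line in lines:
        # Split the transaction on the separator pattern and drop empty tokens.
        items = {tok for tok in re.split(splitter, line.strip()) if tok}
        # items is a set, so each item counts at most once per transaction.
        counts.update(items)
    return {item: n for item, n in counts.items() if n >= min_support}

# Hypothetical toy transactions in the same space-separated layout.
sample = ["39 48 38", "39 48", "39 41", "48 32"]
print(frequent_items(sample, min_support=2))
```

An item (or, in the full algorithm, an itemset) is reported only if it occurs in at least the given number of transactions, which is why a higher -s value gives fewer patterns.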

The job finishes after about two minutes. The result file is a serialized sequence file, so viewing it directly shows garbled text; use Mahout to dump it back into readable text:

mahout seqdumper -i /user/hadoop/patterns/fpgrowth/part-r-00000 -o ~/data/patterns.txt

Output results:

Key: 39: Value: ([39],50675)
Key: 48: Value: ([48],42135), ([, 48],29142)
Key: 38: Value: ([38],15596), ([39, 38],10345), ([38],7944, 38],6102)
Key: 32: Value: ([32],15167), ([39, 32],8455), ([48, 32],8034), ([39, 48, 32],5402), ([32],2833, 32],1840), ([A., 32],1646), ([A, A, 32],1236)
Key: 41: Value: ([41],14945), ([41],11414, 41],9018), ([38, 41],7366), ([39, 41],3897), ([32, 41],3196), ([38, 41],3051), ([41],2374, 41],2359), ([48, 32, 41],2063), ([39, 48, 38, 41],1991), ([39, 48, 32, 41],1646)
Key: 65: Value: ([65],4472), ([, 65],2787), ([65],2529), ([A, 65],1797)
Key: 89: Value: ([89],3837), ([ ([89],2798, 89],2749), ([89],2125]
Key: 225: Value: ([225],3257), ([39, 225],2351), ([48, 225],1736), ([39, 48, 225],1400)
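
To work with the dumped text programmatically, the lines can be parsed into (itemset, support) pairs. The following is a minimal Python sketch that assumes the "([items],support)" layout shown above; the regular expression and the function name parse_patterns are my own, and the sample line is the key 225 entry from the output, joined onto one line.

```python
import re

# Matches one "([item, item, ...],support)" entry in a dumped line.
PATTERN_RE = re.compile(r"\(\[([^\]]*)\],(\d+)\)")

def parse_patterns(line):
    """Return a list of (frozenset_of_items, support) pairs found in one line."""
    result = []
    for items_text, support in PATTERN_RE.findall(line):
        # Items are comma-separated inside the square brackets.
        items = frozenset(
            tok.strip() for tok in items_text.split(",") if tok.strip()
        )
        result.append((items, int(support)))
    return result

line = ("Key: 225: Value: ([225],3257), ([39, 225],2351), "
        "([48, 225],1736), ([39, 48, 225],1400)")
for items, support in parse_patterns(line):
    print(sorted(items), support)
```

Items are kept as strings and stored in a frozenset so that an itemset can later be used as a dictionary key regardless of item order.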

This output contains only the frequent itemsets, but extracting association rules from them is not difficult.
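
As a rough illustration of that last step, the Python sketch below derives rules X -> Y from a table of itemset supports using the standard definition confidence(X -> Y) = support(X and Y together) / support(X). It is not part of Mahout: the function name association_rules and the 0.5 confidence threshold are assumptions for the example. The support counts are copied from the key 39, 48 and 225 entries of the output above, and any rule whose antecedent support is missing from the table is skipped.

```python
from itertools import combinations

def association_rules(supports, min_confidence):
    """Return a list of (antecedent, consequent, confidence) tuples.

    supports maps frozenset itemsets to their support counts.
    """
    rules = []
    for itemset, support in supports.items():
        if len(itemset) < 2:
            continue  # a rule needs a non-empty antecedent and consequent
        for size in range(1, len(itemset)):
            for combo in combinations(sorted(itemset), size):
                antecedent = frozenset(combo)
                # Skip antecedents whose support is not in the table.
                if antecedent not in supports:
                    continue
                confidence = support / supports[antecedent]
                if confidence >= min_confidence:
                    rules.append((antecedent, itemset - antecedent, confidence))
    return rules

# Support counts taken from the dumped frequent itemsets.
supports = {
    frozenset({"39"}): 50675,
    frozenset({"48"}): 42135,
    frozenset({"225"}): 3257,
    frozenset({"39", "225"}): 2351,
    frozenset({"48", "225"}): 1736,
    frozenset({"39", "48", "225"}): 1400,
}

for antecedent, consequent, confidence in association_rules(supports, 0.5):
    print(sorted(antecedent), "->", sorted(consequent), round(confidence, 3))
```

For example, the rule 225 -> 39 has confidence 2351 / 3257, roughly 0.72, while 39 -> 225 has confidence 2351 / 50675 and is filtered out by the threshold.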

Source: www.cnblogs.com/fengfenggirl
