Install and configure Mahout on Ubuntu 10.04

Source: Internet
Author: User

 

How to install Mahout on Ubuntu 10.04

After a day or two of getting familiar with it, I have worked out how to install Mahout. The installation steps are described below.

 

 

1. Software requirements:

1. jdk-6u27-linux-i586.bin

2. apache-maven-2.2.1-bin.tar.gz

3. hadoop-0.20.204.0.tar.gz (do not use the latest version; it reports an error)

4. mahout-distribution-0.5.tar.gz

2. Key steps:

1. Copy the software to a folder.

2. Decompress the software.

3. Configure environment variables in /etc/profile.

3. Installation steps:

1. Download the software listed above. The JDK is available from the official Oracle website. Maven, Hadoop, and Mahout can all be downloaded from the official Apache website.

2. Put the software in /tmp, either by copying it from a USB flash drive or by downloading it directly from the Internet.

The four files can be dragged from the USB flash drive straight into the /tmp directory.

Note: change to the /tmp directory (cd /tmp) before running the steps below.

 

3. Copy the four files to the /usr/mahout directory one by one. Files under /tmp disappear after the next restart, so they must be copied to another directory. Copying into /usr can run into permission problems, so do it with commands in the terminal.

 

4. mahout is a new directory. Create it first: cd /usr (switch to the /usr directory), then mkdir mahout (create the mahout directory).

Copy the four files to /usr/mahout:

cp jdk-6u27-linux-i586.bin /usr/mahout
cp apache-maven-2.2.1-bin.tar.gz /usr/mahout
cp hadoop-0.20.204.0.tar.gz /usr/mahout
cp mahout-distribution-0.5.tar.gz /usr/mahout
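The copy step can also be sketched as one small shell function. This is a minimal sketch rather than the author's own script: the function name stage_archives is hypothetical, and it assumes all four archives sit in one source directory under exactly the file names listed in the software requirements.

```shell
# stage_archives SRC DEST
# Hypothetical helper: copies the four downloaded archives from SRC into DEST,
# creating DEST if needed and stopping at the first missing file.
stage_archives() {
    src=$1
    dest=$2
    mkdir -p "$dest" || return 1
    for f in \
        jdk-6u27-linux-i586.bin \
        apache-maven-2.2.1-bin.tar.gz \
        hadoop-0.20.204.0.tar.gz \
        mahout-distribution-0.5.tar.gz
    do
        if [ -f "$src/$f" ]; then
            cp "$src/$f" "$dest/" || return 1
        else
            echo "missing: $src/$f" >&2
            return 1
        fi
    done
}
```

Called as stage_archives /tmp /usr/mahout from a shell with write access to /usr, it would perform the same four copies as the commands above.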

 

5. If the copy fails, there are two ways to solve the problem:

The first is to grant the required permissions on each file and then copy it.

The second is to switch to the superuser (root, via su):

Enable the root account by running: sudo passwd root

You are first asked for the password of the user you logged in as. Then enter a new root password and confirm it.

Once root is enabled, switch to it in the terminal with su and repeat the preceding steps.

 

6. Decompress the files:

For *.bin files, run ./*.bin to unpack them. For *.tar.gz or *.tgz files, use tar zxvf *.tar.gz (or *.tgz) to unpack them.

For the files above, the commands are:

./jdk-6u27-linux-i586.bin
tar zxvf apache-maven-2.2.1-bin.tar.gz
tar zxvf hadoop-0.20.204.0.tar.gz
tar zxvf mahout-distribution-0.5.tar.gz

Run them in sequence so that the files are unpacked into the /usr/mahout directory.
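The unpacking step can be sketched as a loop over the archives. This is a sketch under assumptions: the function name unpack_all is hypothetical, and the JDK installer is started with sh so that it does not need the execute bit, which is a substitution for the ./ form used in the text.

```shell
# unpack_all DIR
# Hypothetical helper: unpacks the JDK self-extracting installer (if present)
# and every *.tar.gz archive found in DIR, in place.
unpack_all() {
    dir=$1
    (
        cd "$dir" || exit 1
        # The JDK .bin file is a self-extracting shell script.
        if [ -f jdk-6u27-linux-i586.bin ]; then
            sh ./jdk-6u27-linux-i586.bin || exit 1
        fi
        for archive in *.tar.gz; do
            # Skip the literal pattern when no archive matches.
            [ -f "$archive" ] || continue
            # x = extract, z = gunzip, f = read from the named file
            tar -xzf "$archive" || exit 1
        done
    )
}
```

Running unpack_all /usr/mahout would leave one unpacked directory per archive next to the original files.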

 

7. Configure environment variables by adding lines at the end of the /etc/profile file.

In the terminal, open the /etc/profile file with gedit: sudo gedit /etc/profile. Add the environment variable settings at the end of the file.

Note: add the new lines after the umask 022 line and do not modify the existing content of the file; otherwise the system may fail to start. The directory names used in the new lines are the names of the unpacked directories, so adjust them to match what was actually extracted. To rename a directory, use mv.

Save the file and close gedit. In these lines, the colon (:) separates the entries of a path list, and the dollar sign ($) expands the value of a variable.
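The original screenshot of the added lines is not available, so the following is only a sketch of what they might look like. The directory names are assumptions based on the archive names listed in the requirements, and the choice of variables (JAVA_HOME, M2_HOME, HADOOP_HOME, HADOOP_CONF_DIR, MAHOUT_HOME) is an assumption as well, not the author's confirmed configuration.

```shell
# Hypothetical lines for the end of /etc/profile (after the existing umask 022 line).
# Every directory name below is an assumption; use the names the archives actually unpacked to.
export JAVA_HOME=/usr/mahout/jdk1.6.0_27
export M2_HOME=/usr/mahout/apache-maven-2.2.1
export HADOOP_HOME=/usr/mahout/hadoop-0.20.204.0
export HADOOP_CONF_DIR=$HADOOP_HOME/conf
export MAHOUT_HOME=/usr/mahout/mahout-distribution-0.5
# ":" joins entries of a list; "$NAME" expands to the value of NAME.
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export PATH=$JAVA_HOME/bin:$M2_HOME/bin:$HADOOP_HOME/bin:$MAHOUT_HOME/bin:$PATH
```

Putting the new directories in front of the existing $PATH means these versions are found before any system-wide java or mvn.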

Run source /etc/profile in the terminal to reload the /etc/profile file.

 

8. The installation is essentially done. Verify each component by entering the following commands in the terminal:

1) Verify that the JDK is successfully installed: javac, java

If both commands are found and print their output, the JDK PATH and CLASSPATH are configured correctly.

2) Verify that Maven is successfully installed: mvn

If the command is found and Maven responds, Maven is successfully installed.

3) Verify that Hadoop is successfully installed: hadoop

If the command is found and prints its output, Hadoop is successfully installed.

4) Verify that Mahout is successfully installed: mahout

If the command is found and prints its output, Mahout is successfully installed.
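The four checks above can be run in one pass. A minimal sketch, assuming only that the tools are expected on PATH; the function name check_tools is hypothetical, and it tests whether each command can be found, not whether it works correctly.

```shell
# check_tools NAME...
# Hypothetical helper: reports whether each named command is on PATH.
# Returns 0 if all were found and 1 if at least one is missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "ok: $tool -> $(command -v "$tool")"
        else
            echo "NOT FOUND: $tool"
            missing=1
        fi
    done
    return "$missing"
}
```

For this installation the call would be check_tools java javac mvn hadoop mahout; any NOT FOUND line points to a PATH entry in /etc/profile that needs correcting.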

 

4. Standalone test:

Data Preparation
cd /tmp
wget http://archive.ics.uci.edu/ml/databases/synthetic_control/synthetic_control.data

hadoop fs -mkdir testdata
hadoop fs -put synthetic_control.data testdata
hadoop fs -lsr testdata

Run the clustering algorithms on the Hadoop cluster
cd /usr/local/mahout (or wherever Mahout was unpacked)

mahout org.apache.mahout.clustering.syntheticcontrol.canopy.Job
mahout org.apache.mahout.clustering.syntheticcontrol.kmeans.Job
mahout org.apache.mahout.clustering.syntheticcontrol.fuzzykmeans.Job
mahout org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job
mahout org.apache.mahout.clustering.syntheticcontrol.meanshift.Job

If the execution succeeds, the output appears in /user/dev/output on HDFS.
GroupLens data sets
http://www.grouplens.org/node/12 offers the MovieLens data sets, the WikiLens data set, the Book-Crossing data set, the Jester joke data set, and the EachMovie data set.

Download the 1M (one million) ratings data set

mkdir 1m_rating
wget http://www.grouplens.org/system/files/million-ml-data.tar__0.gz
tar vxzf million-ml-data.tar__0.gz
rm million-ml-data.tar__0.gz

Copy the data to the GroupLens example code directory so that we can first try out Mahout locally.
cp *.dat /usr/local/mahout/examples/src/main/java/org/apache/mahout/cf/taste/example/grouplens

cd /usr/local/mahout/examples/
Run
mvn -q exec:java -Dexec.mainClass="org.apache.mahout.cf.taste.example.grouplens.GroupLensRecommenderEvaluatorRunner"
If you do not want to copy the files as above, specify the location of the input file instead, as follows:

Upload to HDFS
hadoop fs -copyFromLocal 1m_rating/ mahout_input/1mrating
mvn -q exec:java -Dexec.mainClass="org.apache.mahout.cf.taste.example.grouplens.GroupLensRecommenderEvaluatorRunner" -Dexec.args="-i mahout_input/1mrating"

 

 

5. Notes:

1. Do not use the latest version of Hadoop. Otherwise, an exception is thrown when the hadoop command is run.

2. There are several other places to set environment variables, such as /etc/environment and ~/.bashrc. They have different priorities in Linux; search online for the details.

3. To install these tools you need to be familiar with Linux commands. At first I edited the /etc/profile file with vi /etc/profile. Editing files with vi was quite frustrating because I was not familiar with it and did not even know how to delete a character.

6. Extend the environment (install Eclipse or MyEclipse):

Install the Eclipse or MyEclipse development environment:

Download the Linux package (*.tgz or *.tar.gz) and decompress it. For the MyEclipse GA release, find the file named like *-install and double-click it; it installs automatically. For Eclipse, if the environment variables are configured correctly, you can double-click the eclipse icon right after decompressing. If Eclipse reports that it cannot find a JRE, create a symbolic link named jre in the Eclipse directory that points to the jre directory of the JDK.
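The symbolic link for the JRE can be created with ln. A minimal sketch, assuming the JDK contains a jre subdirectory and that Eclipse looks for a directory named jre inside its own installation directory; the function name link_jre and both paths in the usage note are hypothetical.

```shell
# link_jre ECLIPSE_DIR JDK_DIR
# Hypothetical helper: makes ECLIPSE_DIR/jre a symbolic link to JDK_DIR/jre.
link_jre() {
    eclipse_dir=$1
    jdk_dir=$2
    if [ ! -d "$jdk_dir/jre" ]; then
        echo "no jre directory under $jdk_dir" >&2
        return 1
    fi
    # -s: symbolic link, -f: replace an existing link, -n: do not descend into an existing link
    ln -sfn "$jdk_dir/jre" "$eclipse_dir/jre"
}
```

A call might look like link_jre /usr/mahout/eclipse /usr/mahout/jdk1.6.0_27, with both directories adjusted to the actual installation.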

 

That is the basic process. It's easy.

Data mining comes next...

 

 

 

 

 

 

 

 
