Learning Mahout from Scratch, Part 1: Building a Standalone Environment

1. What is Mahout?

Mahout is an open-source Apache project (http://mahout.apache.org/) that provides several classic machine learning algorithms, allowing developers to build machine learning and data mining applications quickly.

Mahout is based on Hadoop, and the name is an interesting one. Hadoop is named after an elephant, and a mahout is a person who keeps and drives an elephant, so the two are clearly closely related. (This naturally reminds me of Sun and Eclipse...)

 

At this point I am a complete layman: I have never used Mahout and have no practical experience with Hadoop, so I am truly starting from zero. My goal is to set up a Mahout development environment in the simplest way possible and get Mahout running quickly, as a base for further study.

This article is the result of some trial and error. It records how to build a standalone environment with Eclipse + Maven + Mahout. I don't know whether this is the simplest Mahout development environment, but it should be relatively simple.

 

2. Install Eclipse

There is not much to say here: even a complete beginner needs to be able to use Eclipse.

Download Eclipse from http://www.eclipse.org/downloads/ and select the Standard edition.

After installation, start Eclipse so that it is ready for the following steps.

 

3. Install Maven

What is Maven? All you need to know is that Maven is a project management tool. With it, you can install Mahout and its dependencies very efficiently.

Here Maven is used through its Eclipse plug-in, m2eclipse, whose address is http://www.eclipse.org/m2e/. You can skip this section if you have already installed Maven.

 

The following describes how to install m2eclipse.

Go to the download page: http://www.eclipse.org/m2e/download/

 

There are two ways to install m2eclipse. The first is to press and hold the mouse on the Install icon on the download page and drag it into the Eclipse window (try different areas, such as the title bar, to find where the drop is accepted). A dialog box then pops up; click "Confirm".

The second method is to use Install New Software under the Eclipse Help menu:

Clicking this menu item opens a dialog box. Click the Add button, then enter a name and a location (http://download.eclipse.org/technology/m2e/releases); the location is copied from the download page.

After you confirm, the available items are listed. Select all of them and click Next.

Accept the license agreement and click Finish; the Maven plug-in is then installed automatically.

After installation, you can use Help > About > Installation Details to confirm that the plug-in is installed.

 

4. Use Maven to create a Mahout project

Run Eclipse, choose File > New > Project to create a project, and select Maven Project.

Click Next.

Select maven-archetype-quickstart.

Enter a Group Id and an Artifact Id. You can name them as you like.

After you click Finish, Eclipse creates the project.

Double-click pom.xml, select the Dependencies tab in the editor on the right, click Add, and enter mahout in the dialog box that pops up. Maven searches for matching packages; select mahout-core and click OK.

Then press Ctrl + S to save pom.xml. Maven downloads the required JAR packages, which you can then see under Maven Dependencies in the project.

At this point, our environment has been set up, and the next step is to write code.

 

5. Write code and run the program

Double-click App.java to edit it.

For a first attempt, it is best to pick a simple algorithm. Here I use user-based collaborative filtering to compute product recommendations. The complete code is as follows:

    package com.mine.mahout.practice;

    import java.io.File;
    import java.util.List;

    import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
    import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;
    import org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender;
    import org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarity;
    import org.apache.mahout.cf.taste.model.DataModel;
    import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;
    import org.apache.mahout.cf.taste.recommender.RecommendedItem;
    import org.apache.mahout.cf.taste.recommender.Recommender;
    import org.apache.mahout.cf.taste.similarity.UserSimilarity;

    public class App
    {
        public static void main(String[] args)
        {
            try {
                // Load data from a file
                DataModel model = new FileDataModel(new File("E:\\data.csv"));
                // Specify the user similarity measure: Pearson correlation
                UserSimilarity similarity = new PearsonCorrelationSimilarity(model);
                // Specify the number of user neighbors: 2
                UserNeighborhood neighborhood = new NearestNUserNeighborhood(2, similarity, model);
                // Build a user-based recommender
                Recommender recommender = new GenericUserBasedRecommender(model, neighborhood, similarity);
                // Get recommendations for a given user: here, two recommendations for user 1
                List<RecommendedItem> recommendations = recommender.recommend(1, 2);
                // Print the recommendations
                for (RecommendedItem recommendation : recommendations) {
                    System.out.println(recommendation);
                }
            } catch (Exception e) {
                System.out.println(e);
            }
        }
    }
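To get a feel for what the Pearson similarity step compares, here is a minimal plain-Java sketch of the textbook Pearson correlation between two users, computed over the items both have rated. It is not Mahout's implementation: the class PearsonSketch and its helper ratings are hypothetical names made up for this illustration, and the ratings in main are those of users 1 and 5 from the sample data file used in this article.

```java
import java.util.HashMap;
import java.util.Map;

class PearsonSketch {

    // Pearson correlation between two users over the items both have rated.
    // Returns NaN when fewer than two items are shared or when either user's
    // shared ratings do not vary (the formula would divide by zero).
    static double pearson(Map<Long, Double> a, Map<Long, Double> b) {
        double sumA = 0.0, sumB = 0.0;
        int shared = 0;
        for (Map.Entry<Long, Double> e : a.entrySet()) {
            Double other = b.get(e.getKey());
            if (other != null) {
                sumA += e.getValue();
                sumB += other;
                shared++;
            }
        }
        if (shared < 2) {
            return Double.NaN;
        }
        double meanA = sumA / shared;
        double meanB = sumB / shared;
        double sumAB = 0.0, sumAA = 0.0, sumBB = 0.0;
        for (Map.Entry<Long, Double> e : a.entrySet()) {
            Double other = b.get(e.getKey());
            if (other != null) {
                double da = e.getValue() - meanA;
                double db = other - meanB;
                sumAB += da * db;
                sumAA += da * da;
                sumBB += db * db;
            }
        }
        double denominator = Math.sqrt(sumAA * sumBB);
        return denominator == 0.0 ? Double.NaN : sumAB / denominator;
    }

    // Hypothetical helper: builds an itemID -> rating map from
    // alternating (itemID, rating) values.
    static Map<Long, Double> ratings(double... pairs) {
        Map<Long, Double> m = new HashMap<>();
        for (int i = 0; i + 1 < pairs.length; i += 2) {
            m.put((long) pairs[i], pairs[i + 1]);
        }
        return m;
    }

    public static void main(String[] args) {
        // Ratings of users 1 and 5 from the sample data file.
        Map<Long, Double> user1 = ratings(101, 5, 102, 3, 103, 2.5);
        Map<Long, Double> user5 = ratings(101, 4, 102, 3, 103, 2, 104, 4, 105, 3.5, 106, 4);
        System.out.println("similarity(1, 5) = " + pearson(user1, user5));
    }
}
```

Only items 101, 102 and 103 are shared by the two users, so only those three ratings enter the calculation. A value close to 1 means the two users rate the shared items in a similar pattern.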

E:\data.csv in the code above is the data file. The first column is the user ID, the second is the product ID, and the third is the user's score for that product:

    1,101,5
    1,102,3
    1,103,2.5
    2,101,2
    2,102,2.5
    2,103,5
    2,104,2
    3,101,2.5
    3,104,4
    3,105,4.5
    3,107,5
    4,101,5
    4,103,3
    4,104,4.5
    4,106,4
    5,101,4
    5,102,3
    5,103,2
    5,104,4
    5,105,3.5
    5,106,4
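The three-column layout (user ID, product ID, score) can be read with a few lines of plain Java. The sketch below is only an illustration of the file format, not how Mahout's FileDataModel works internally; the class name PreferenceFileSketch is hypothetical, and a few sample rows are inlined so that it runs without the file on disk.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class PreferenceFileSketch {

    // Parses lines of the form "userID,itemID,rating" into a nested map:
    // userID -> (itemID -> rating). Blank lines are skipped.
    static Map<Long, Map<Long, Double>> parse(List<String> lines) {
        Map<Long, Map<Long, Double>> prefs = new LinkedHashMap<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            String[] fields = trimmed.split(",");
            long userId = Long.parseLong(fields[0].trim());
            long itemId = Long.parseLong(fields[1].trim());
            double rating = Double.parseDouble(fields[2].trim());
            prefs.computeIfAbsent(userId, k -> new LinkedHashMap<>()).put(itemId, rating);
        }
        return prefs;
    }

    public static void main(String[] args) {
        // The first rows of the sample data, inlined for the sketch.
        List<String> sample = Arrays.asList("1,101,5", "1,102,3", "1,103,2.5", "2,101,2");
        Map<Long, Map<Long, Double>> prefs = parse(sample);
        System.out.println(prefs);
    }
}
```

Grouping the rows by user in this way makes it easy to see which products each user has scored, which is the view a user-based algorithm works from.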

(Note: The code and test data above are taken from this blog post: http://blog.csdn.net/aidayei/article/details/6626699)

Now you can run the program. When prompted, select Java Application.

Select App, or run App.java directly instead of running the whole project.

The output result is as follows:

    SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
    SLF4J: Defaulting to no-operation (NOP) logger implementation
    SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
    RecommendedItem[item:104, value:4.257081]
    RecommendedItem[item:106, value:4.0]

As you can see, Mahout recommends two products for user 1: 104 and 106.
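To see where a value such as 4.257081 can come from, the following plain-Java sketch estimates a score as the similarity-weighted average of the neighbors' scores. It is a simplified illustration, not Mahout's code. It assumes that users 4 and 5 are the two nearest neighbors of user 1, with textbook Pearson similarities of 1.0 and about 0.9449 (that is what plain Pearson correlation gives on the sample data). The class EstimateSketch and its methods are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class EstimateSketch {

    // Ratings of the two assumed neighbors (users 4 and 5), copied from the sample data.
    static Map<Long, Map<Long, Double>> neighborRatings() {
        Map<Long, Double> user4 = new LinkedHashMap<>();
        user4.put(101L, 5.0);
        user4.put(103L, 3.0);
        user4.put(104L, 4.5);
        user4.put(106L, 4.0);
        Map<Long, Double> user5 = new LinkedHashMap<>();
        user5.put(101L, 4.0);
        user5.put(102L, 3.0);
        user5.put(103L, 2.0);
        user5.put(104L, 4.0);
        user5.put(105L, 3.5);
        user5.put(106L, 4.0);
        Map<Long, Map<Long, Double>> ratings = new LinkedHashMap<>();
        ratings.put(4L, user4);
        ratings.put(5L, user5);
        return ratings;
    }

    // Assumed Pearson similarities between user 1 and each neighbor.
    static Map<Long, Double> similarities() {
        Map<Long, Double> sims = new LinkedHashMap<>();
        sims.put(4L, 1.0);
        sims.put(5L, 2.5 / Math.sqrt(7.0)); // about 0.9449
        return sims;
    }

    // Similarity-weighted average of the neighbors' ratings for one item.
    // Returns NaN when no neighbor has rated the item.
    static double estimate(long itemId,
                           Map<Long, Map<Long, Double>> neighbors,
                           Map<Long, Double> sims) {
        double weightedSum = 0.0;
        double totalWeight = 0.0;
        for (Map.Entry<Long, Map<Long, Double>> e : neighbors.entrySet()) {
            Double rating = e.getValue().get(itemId);
            if (rating != null) {
                double sim = sims.get(e.getKey());
                weightedSum += sim * rating;
                totalWeight += sim;
            }
        }
        return totalWeight == 0.0 ? Double.NaN : weightedSum / totalWeight;
    }

    public static void main(String[] args) {
        // Items the neighbors have rated but user 1 has not.
        long[] candidates = {104L, 105L, 106L};
        for (long item : candidates) {
            System.out.println("item " + item + " -> "
                    + estimate(item, neighborRatings(), similarities()));
        }
    }
}
```

Under these assumptions the sketch gives about 4.2571 for product 104 and 4.0 for product 106, in line with the output above. Product 105 comes out at 3.5 from a single neighbor, so it ranks third and does not appear when only two recommendations are requested.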

 

With that, we have completed our first Mahout program. Isn't the whole process simple? I hope it is helpful to others who are also starting from zero.

 

Supplement: The output above begins with three lines in red, indicating that SLF4J failed to load StaticLoggerBinder. This does not affect the result, but it is annoying. The fix is to edit the dependencies in pom.xml again and add the slf4j-nop package.
