Mahout Learning: Introduction, Installation, Configuration, and a Getting-Started Program Test

Source: Internet
Author: User

I. Introduction to Mahout

Look up what "mahout" means (a person who rides and drives an elephant), then look at the Mahout logo. Well, if you want to play happily with the little yellow elephant (Hadoop), you have to play along with the elephant driver too...

Attached Logo:

(That's him: the mahout on the elephant's head.)

Now to the main topic:

Mahout is a powerful data mining tool: a collection of machine learning algorithms that includes a collaborative filtering implementation called Taste, as well as classification and clustering. Mahout's biggest advantage is that it is built on Hadoop. Many algorithms that previously ran only on a single machine have been converted to the MapReduce model, which greatly increases both the amount of data they can handle and their processing performance. The machine learning algorithms implemented in Mahout are:

Algorithm class | Algorithm name | Description
--- | --- | ---
Classification | Logistic Regression | Logistic regression
 | Bayesian | Bayesian classifier
 | SVM | Support vector machine
 | Perceptron | Perceptron algorithm
 | Neural Network | Neural network
 | Random Forests | Random forest
 | Restricted Boltzmann Machines | Restricted Boltzmann machine
Clustering | Canopy Clustering | Canopy clustering
 | K-means Clustering | K-means algorithm
 | Fuzzy K-means | Fuzzy k-means
 | Expectation Maximization | EM (expectation-maximization) clustering
 | Mean Shift Clustering | Mean shift clustering
 | Hierarchical Clustering | Hierarchical clustering
 | Dirichlet Process Clustering | Dirichlet process clustering
 | Latent Dirichlet Allocation | LDA clustering
 | Spectral Clustering | Spectral clustering
Association rule mining | Parallel FP Growth | Parallel FP-growth algorithm
Regression | Locally Weighted Linear Regression | Locally weighted linear regression
Dimensionality reduction | Singular Value Decomposition | Singular value decomposition
 | Principal Components Analysis | Principal component analysis
 | Independent Component Analysis | Independent component analysis
 | Gaussian Discriminative Analysis | Gaussian discriminant analysis
Evolutionary algorithms | | Parallelization of the Watchmaker framework
Recommendation / collaborative filtering | Non-distributed recommenders | Taste (UserCF, ItemCF, SlopeOne)
 | Distributed recommenders | ItemCF
Vector similarity computation | RowSimilarityJob | Computes pairwise similarity between rows
 | VectorDistanceJob | Computes distances between vectors
Non-MapReduce algorithms | Hidden Markov Models | Hidden Markov model
Collection extensions | Collections | Extends Java's Collections classes
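The claim that Mahout converts single-machine algorithms to the MapReduce model can be made concrete with k-means from the table. Below is a minimal, in-process Python sketch of one k-means iteration. It is written as a map step, which assigns each point to its nearest centroid, and a reduce step, which averages the points in each cluster. The sketch only simulates the map, shuffle, and reduce phases with lists and a dictionary. It is not Mahout's code, and all function names are hypothetical.

```python
import math
from collections import defaultdict


def nearest(point, centroids):
    """Return the index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def kmeans_map(points, centroids):
    """Map phase: emit (cluster_id, point) for every input point."""
    for p in points:
        yield nearest(p, centroids), p


def kmeans_reduce(cluster_id, assigned):
    """Reduce phase: average all points assigned to one cluster."""
    dim = len(assigned[0])
    return [sum(p[d] for p in assigned) / len(assigned) for d in range(dim)]


def kmeans_iteration(points, centroids):
    """One iteration: map, group values by key (the 'shuffle'), then reduce."""
    groups = defaultdict(list)
    for key, value in kmeans_map(points, centroids):
        groups[key].append(value)
    # Clusters that received no points keep their previous centroid.
    new_centroids = [list(c) for c in centroids]
    for key, assigned in groups.items():
        new_centroids[key] = kmeans_reduce(key, assigned)
    return new_centroids
```

For example, `kmeans_iteration([(0, 0), (0, 1), (10, 10), (10, 11)], [(0, 0), (10, 10)])` returns `[[0.0, 0.5], [10.0, 10.5]]`. Each point's assignment depends only on the current centroids, so the map step can be spread across many machines. Only the points belonging to one cluster, or in a tuned implementation their partial sums and counts, need to meet at a reducer. Repeating the iteration until the centroids stop moving gives the full algorithm.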

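Several table entries rest on comparing rows of a sparse matrix: the vector similarity jobs (RowSimilarityJob, VectorDistanceJob) and the item-based recommenders. Below is a minimal single-machine sketch. It assumes each item is a row stored as a dict of user to rating, and it computes pairwise cosine similarity between rows plus a Euclidean distance between two vectors. Cosine is only one common measure, chosen here for illustration; Pearson correlation and Tanimoto are also used in collaborative filtering. This is not the RowSimilarityJob implementation, and the data and names are hypothetical.

```python
import math
from itertools import combinations


def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as {index: value} dicts."""
    dot = sum(v * b[k] for k, v in a.items() if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def pairwise_row_similarity(rows):
    """Return {(row_i, row_j): similarity} for every unordered pair of rows."""
    return {(i, j): cosine(rows[i], rows[j]) for i, j in combinations(sorted(rows), 2)}


def euclidean_distance(a, b):
    """Distance between two sparse vectors; missing entries count as 0."""
    keys = set(a) | set(b)
    return math.sqrt(sum((a.get(k, 0.0) - b.get(k, 0.0)) ** 2 for k in keys))


# Hypothetical item rows: item -> {user: rating}
items = {
    "item1": {"u1": 5, "u2": 3},
    "item2": {"u1": 5, "u2": 3},
    "item3": {"u3": 4},
}
sims = pairwise_row_similarity(items)
```

Here `item1` and `item2` were rated identically by the same users, so their similarity is (up to floating-point rounding) 1.0. `item3` shares no users with either, so its similarity to both is 0.0. An item-based recommender then scores unseen items for a user by weighting that user's ratings with these item-item similarities.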
II. Mahout Installation and Configuration

1. Download Mahout

   http://archive.apache.org/dist/mahout/

2. Extract the archive

   tar -zxvf mahout-distribution-0.9.tar.gz

3. Configure environment variables

3.1 Configure the Mahout environment variables

   # set Mahout environment
   export MAHOUT_HOME=/home/yujianxin/mahout/mahout-distribution-0.9
   export MAHOUT_CONF_DIR=$MAHOUT_HOME/conf
   export PATH=$MAHOUT_HOME/conf:$MAHOUT_HOME/bin:$PATH

3.2 Configure the Hadoop environment variables that Mahout needs

   # set Hadoop environment
   export HADOOP_HOME=/home/yujianxin/hadoop/hadoop-1.1.2
   export HADOOP_CONF_DIR=$HADOOP_HOME/conf
   export PATH=$PATH:$HADOOP_HOME/bin
   # suppress Hadoop 1.x's "$HADOOP_HOME is deprecated" warning
   export HADOOP_HOME_WARN_SUPPRESS=not_null

4. Verify that Mahout is installed

   Run the command mahout. If it prints a list of available algorithms, the installation succeeded.

5. Getting started with Mahout

5.1 Start Hadoop.

5.2 Download the test data file synthetic_control.data from:

   http://archive.ics.uci.edu/ml/databases/synthetic_control/

5.3 Upload the test data to HDFS:

   hadoop fs -put synthetic_control.data /user/root/testdata

5.4 Run Mahout's k-means clustering example:

   mahout -core org.apache.mahout.clustering.syntheticcontrol.kmeans.Job

   Clustering takes about 9 minutes to complete.

5.5 View the clustering results:

   hadoop fs -ls /user/root/output

OK, that's it for now. More Mahout learning to come...
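Before uploading the test data to HDFS, you may want to sanity-check the downloaded file locally. Below is a minimal sketch. It assumes synthetic_control.data is plain text with one time series per line, written as whitespace-separated numbers. The UCI description lists 600 series of 60 values each, so 60 is the default expected length. The function names are hypothetical.

```python
def load_series(lines, expected_length=60):
    """Parse one whitespace-separated numeric series per line.

    Blank lines are skipped; a row with the wrong number of values
    raises ValueError so that a truncated or corrupted download is caught early.
    """
    series = []
    for line_no, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue
        if len(fields) != expected_length:
            raise ValueError(
                f"line {line_no}: expected {expected_length} values, got {len(fields)}"
            )
        series.append([float(x) for x in fields])
    return series


def summarize(series):
    """Return (number of series, smallest value, largest value)."""
    if not series:
        return 0, None, None
    values = [v for row in series for v in row]
    return len(series), min(values), max(values)
```

To check a real download, run `with open("synthetic_control.data") as f: print(summarize(load_series(f)))`. If the file matches the UCI description, the first number printed should be 600.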
