[Mahout Learning] Mahout Introduction, Installation, Configuration, and Getting-Started Program Test

Last Update:2015-01-18 Source: Internet

Author: User


I. Introduction to Mahout

Look up what "mahout" means and you get "a person who rides and drives an elephant". Then look at the Mahout logo. Well, if you want to have fun with the little yellow elephant (Hadoop), you also have to keep its rider company and play along with him...

[Mahout logo: that's him, the mahout sitting on the elephant's head]

Now, on to the main topic:

Mahout is a powerful data mining tool: a collection of distributed machine learning algorithms, including a collaborative filtering implementation called Taste, classification, and clustering. Mahout's biggest advantage is that it is built on Hadoop. Many algorithms that previously ran on a single machine have been converted to the MapReduce model, which greatly increases the amount of data they can handle and improves their processing performance. The machine learning algorithms implemented in Mahout are:

Algorithm class	Algorithm name	Chinese name (translated)
Classification	Logistic Regression	Logistic regression
	Bayesian	Bayesian
	SVM	Support vector machine
	Perceptron	Perceptron algorithm
	Neural Network	Neural network
	Random Forests	Random forest
	Restricted Boltzmann Machines	Restricted Boltzmann machine
Clustering	Canopy Clustering	Canopy clustering
	K-means Clustering	K-means algorithm
	Fuzzy K-means	Fuzzy k-means
	Expectation Maximization	EM clustering (expectation-maximization clustering)
	Mean Shift Clustering	Mean shift clustering
	Hierarchical Clustering	Hierarchical clustering
	Dirichlet Process Clustering	Dirichlet process clustering
	Latent Dirichlet Allocation	LDA clustering
	Spectral Clustering	Spectral clustering
Association rule mining	Parallel FP Growth Algorithm	Parallel FP-growth algorithm
Regression	Locally Weighted Linear Regression	Locally weighted linear regression
Dimensionality reduction	Singular Value Decomposition	Singular value decomposition
	Principal Components Analysis	Principal component analysis
	Independent Component Analysis	Independent component analysis
	Gaussian Discriminative Analysis	Gaussian discriminant analysis
Evolutionary algorithms	Parallelization of the Watchmaker framework
Recommendation/collaborative filtering	Non-distributed recommenders	Taste (UserCF, ItemCF, SlopeOne)
	Distributed recommenders	ItemCF
Vector similarity computation	RowSimilarityJob	Computes the similarity between the rows of a matrix
	VectorDistanceJob	Computes the distance between vectors
Non-MapReduce algorithms	Hidden Markov Models	Hidden Markov model
Collections extensions	Collections	Extends Java's Collections classes
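The collaborative filtering and vector similarity rows in the table above can be made more concrete with a small in-memory sketch. This is plain Python, not Mahout's Taste or RowSimilarityJob code. It assumes input as (userID, itemID, preference) triples, builds one sparse vector per item, uses cosine similarity (only one of several measures a recommender could use), and scores unrated items as a similarity-weighted average of the user's own ratings. All function names and the toy data are hypothetical.

```python
import math
from collections import defaultdict


def item_vectors(prefs):
    """Group (user_id, item_id, preference) triples into one sparse vector
    per item, stored as {user_id: preference}."""
    vectors = defaultdict(dict)
    for user, item, value in prefs:
        vectors[item][user] = value
    return dict(vectors)


def cosine(a, b):
    """Cosine similarity of two sparse vectors given as {index: value} dicts."""
    dot = sum(value * b[key] for key, value in a.items() if key in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def pairwise_similarity(vectors):
    """Cosine similarity for every unordered pair of items (rows of the
    item matrix). Quadratic in the number of items: fine for a toy example."""
    items = sorted(vectors)
    return {
        (x, y): cosine(vectors[x], vectors[y])
        for i, x in enumerate(items)
        for y in items[i + 1:]
    }


def recommend(user, prefs, top_n=2):
    """Score each item the user has not rated as a similarity-weighted
    average of the user's own ratings; return the top_n (item, score) pairs."""
    vectors = item_vectors(prefs)
    sims = pairwise_similarity(vectors)
    rated = {item: value for u, item, value in prefs if u == user}
    scores = {}
    for candidate in vectors:
        if candidate in rated:
            continue
        num = den = 0.0
        for item, value in rated.items():
            s = sims[(min(candidate, item), max(candidate, item))]
            num += s * value
            den += abs(s)
        if den > 0.0:
            scores[candidate] = num / den
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]


# Hypothetical toy data: (user_id, item_id, preference)
PREFS = [
    (1, 101, 5.0), (1, 102, 3.0), (1, 103, 2.5),
    (2, 101, 2.0), (2, 102, 2.5), (2, 103, 5.0), (2, 104, 2.0),
    (3, 101, 2.5), (3, 104, 4.0), (3, 105, 4.5),
]

print(recommend(1, PREFS))
```

For user 1 the sketch ranks item 105 above item 104. Mahout's distributed version spreads the same kind of item-item similarity computation over MapReduce jobs instead of a single Python dictionary.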

II. Installing and configuring Mahout

1. Download Mahout: http://archive.apache.org/dist/mahout/

2. Decompress the archive:

    tar -zxvf mahout-distribution-0.9.tar.gz

3. Configure environment variables

3.1 Configure the Mahout environment variables:

    # set Mahout environment
    export MAHOUT_HOME=/home/yujianxin/mahout/mahout-distribution-0.9
    export MAHOUT_CONF_DIR=$MAHOUT_HOME/conf
    export PATH=$MAHOUT_HOME/conf:$MAHOUT_HOME/bin:$PATH

3.2 Configure the Hadoop environment variables that Mahout needs:

    # set Hadoop environment
    export HADOOP_HOME=/home/yujianxin/hadoop/hadoop-1.1.2
    export HADOOP_CONF_DIR=$HADOOP_HOME/conf
    export PATH=$PATH:$HADOOP_HOME/bin
    export HADOOP_HOME_WARN_SUPPRESS=not_null

4. Verify that Mahout is installed successfully

Run the command mahout. If it lists a number of algorithms, the installation succeeded.

    mahout

5. Getting started with Mahout

5.1 Start Hadoop.

5.2 Download the test data synthetic_control.data from http://archive.ics.uci.edu/ml/databases/synthetic_control/

5.3 Upload the test data to HDFS:

    hadoop fs -put synthetic_control.data /user/root/testdata

5.4 Run the k-means clustering algorithm in Mahout with the command:

    mahout -core org.apache.mahout.clustering.syntheticcontrol.kmeans.Job

Clustering takes about 9 minutes to complete.

5.5 View the clustering results:

    hadoop fs -ls /user/root/output

That's it for now. More Mahout learning to come...
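To see what the k-means job in step 5.4 computes, here is a single-machine sketch in plain Python. It is not Mahout's implementation. It is structured the way k-means is usually expressed in MapReduce: a map phase assigns each point to its nearest center, and a reduce phase averages the points of each cluster into a new center. The parameter names k, max_iterations and convergence_delta, the random choice of initial centers, and all function names are illustrative assumptions. parse_lines assumes the whitespace-separated, one-series-per-line layout of synthetic_control.data.

```python
import math
import random
from collections import defaultdict


def parse_lines(lines):
    """Parse text laid out like synthetic_control.data: one time series per
    line, values separated by whitespace. Blank lines are skipped."""
    return [[float(x) for x in line.split()] for line in lines if line.strip()]


def nearest(point, centroids):
    """Index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def map_phase(points, centroids):
    """'Map' step: emit a (cluster_id, point) pair for every input point."""
    for p in points:
        yield nearest(p, centroids), p


def reduce_phase(pairs, dim):
    """'Shuffle + reduce' step: group pairs by cluster_id and average the
    points of each group to get that cluster's new centroid."""
    sums = defaultdict(lambda: [0.0] * dim)
    counts = defaultdict(int)
    for cluster_id, p in pairs:
        acc = sums[cluster_id]
        for j, v in enumerate(p):
            acc[j] += v
        counts[cluster_id] += 1
    return {cid: [v / counts[cid] for v in acc] for cid, acc in sums.items()}


def kmeans(points, k, max_iterations=10, convergence_delta=0.5, seed=0):
    """Repeat map/reduce rounds until no centroid moves more than
    convergence_delta or max_iterations is reached. Returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]  # random initial centers
    for _ in range(max_iterations):
        new = reduce_phase(map_phase(points, centroids), len(points[0]))
        # A cluster that received no points keeps its previous centroid.
        updated = [new.get(i, centroids[i]) for i in range(k)]
        converged = all(
            math.dist(old, cur) <= convergence_delta
            for old, cur in zip(centroids, updated)
        )
        centroids = updated
        if converged:
            break
    labels = [nearest(p, centroids) for p in points]
    return centroids, labels


if __name__ == "__main__":
    # Two obvious groups of 2-D points, in the same text layout as the data file.
    pts = parse_lines(["0 0", "0 1", "1 0", "10 10", "10 11", "11 10"])
    _, labels = kmeans(pts, k=2)
    print(labels)
```

On the real file, reading it with parse_lines(open("synthetic_control.data")) and calling kmeans(points, k=6) would be a natural choice, since the UCI data set contains 600 series drawn from six classes of control-chart patterns. Mahout performs each map and reduce phase as a Hadoop job over HDFS instead of over Python lists, which is what lets the same loop scale past one machine's memory.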
This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only.
