Mahout Introduction
Mahout is an open source project under the Apache Software Foundation (ASF).
It provides scalable implementations of classic machine learning algorithms, designed to help developers build intelligent applications more quickly and easily.
Mahout Related Resources
• Mahout home page: http://mahout.apache.org/
• Mahout 0.8 (latest version) download: http://mirrors.hust.edu.cn/apache/mahout/0.8/
Use mahout-distribution-0.8.tar.gz to try running Mahout; the source code is in mahout-distribution-0.8-src.tar.gz.
• Brief Mahout installation steps:
If you only want to try running Mahout and do not need to modify the source code, you do not have to install Maven (many online tutorials take this detour; skip it). For details, see the following tutorial:
http://www.hadoopor.com/thread-983-1-1.html
If you need to modify the source code and recompile the package, you must install Maven. See the following document: http://wenku.baidu.com/view/dbd15bd276a20029bd642d55.html
• Mahout professional tutorial: Mahout in Action http://yunpan.taobao.com/share/link/R56BdLH5O
Note: Published in 2012 and covering Mahout 0.5, this is currently the most recent book on Mahout. It is available only in English, but the vocabulary is mostly basic computing terminology and the text comes with diagrams and source code, so it is easy to read.
• IBM introduction to Mahout: http://www.ibm.com/developerworks/cn/java/j-mahout/
Note: This article is in Chinese and was last updated in 2009, but it describes Mahout fairly comprehensively. It is recommended reading, especially the reference list at the end, which is useful for deeper study.
Course Introduction
This course covers the following topics:
1. Mahout Data Mining Tools
2. A comprehensive hands-on project implementing a recommendation system on Hadoop, combining MapReduce, Pig, and Mahout
Intended Audience
1. This course is suitable for technical staff who have some basic Java knowledge, some understanding of databases and SQL, and are proficient with Linux, especially those who want to change jobs or pursue a higher-paying career.
2. Ideally, students already have a foundation in big data technologies such as Greenplum Hadoop, Hadoop 2.0, YARN, Sqoop, Flume, Avro, and Mahout. It is best to have taken the North Wind courses "Greenplum Distributed Database Development: From Introduction to Mastery", "Comprehensive In-Depth Greenplum Hadoop Big Data Analysis Platform", "Hadoop 2.0 and YARN Explained in Simple Terms", and "MapReduce and HBase Advanced Improvement".
Course Outline
Mahout Data Mining Tools (10 hours)
Data mining concepts and system components
Common data mining methods and algorithms (regression analysis, classification, clustering, etc.)
Data Mining analysis tools
Algorithms supported by Mahout
Mahout origin and characteristics
Mahout installation, configuration and testing
Hands-on: K-means cluster analysis with Mahout
Mahout implementation of the Canopy algorithm
Mahout implementation of classification algorithms
Hands-on: Logistic regression classification and prediction with Mahout
Hands-on: Naive Bayes classification with Mahout
Concept and classification of recommendation systems
Concept, classification and application of collaborative filtering recommendation algorithm
Hands-on: Implementing a Mahout-based movie recommendation system
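To make the collaborative filtering topics above concrete, here is a minimal sketch of user-based collaborative filtering in plain Python. It does not use Mahout or its API; the ratings, user names, movie labels, and the `cosine_similarity` and `recommend` helpers are all hypothetical and chosen only for illustration. The choice of cosine similarity over co-rated items is also an assumption; other similarity measures are equally common.

```python
from math import sqrt

# Toy ratings: user -> {movie: rating}. All names and values are made up.
ratings = {
    "alice": {"A": 5.0, "B": 3.0, "C": 4.0},
    "bob": {"A": 4.0, "B": 3.0, "C": 5.0, "D": 4.0},
    "carol": {"A": 1.0, "B": 5.0, "D": 2.0},
}


def cosine_similarity(a, b):
    """Cosine similarity computed over the items both users have rated."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = sqrt(sum(a[i] ** 2 for i in common))
    norm_b = sqrt(sum(b[i] ** 2 for i in common))
    return dot / (norm_a * norm_b)


def recommend(user, ratings, top_n=3):
    """Score items the user has not rated by a similarity-weighted
    average of the other users' ratings, highest score first."""
    totals, weights = {}, {}
    for other, other_ratings in ratings.items():
        if other == user:
            continue
        sim = cosine_similarity(ratings[user], other_ratings)
        if sim <= 0:
            continue
        for item, rating in other_ratings.items():
            if item in ratings[user]:
                continue  # skip items the user has already rated
            totals[item] = totals.get(item, 0.0) + sim * rating
            weights[item] = weights.get(item, 0.0) + sim
    scored = [(item, totals[item] / weights[item]) for item in totals]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_n]


print(recommend("alice", ratings))
```

In this toy data, alice's tastes are closer to bob's than to carol's, so the predicted rating for movie D is pulled toward bob's rating.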
Hadoop Comprehensive Hands-On: Text Mining Project (7 hours)
Concepts and application scenarios of text mining
Project background
Project workflow
Chinese word segmentation techniques
Using the Paoding (庖丁解牛) word segmenter
Design and implementation of a parallel MapReduce word segmentation program
Partitioning the data set with Pig
Building a naive Bayes text classifier with Mahout
Model application: computing users' preferred categories
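As a rough illustration of the naive Bayes text classification step in this project, the sketch below trains a multinomial naive Bayes classifier with add-one (Laplace) smoothing on documents that have already been split into tokens. It is plain Python, not Mahout's implementation; the `train` and `predict` helpers, the category labels, and the sample tokens are hypothetical.

```python
from collections import Counter, defaultdict
from math import log


def train(docs):
    """docs is a list of (tokens, label) pairs; returns the counts
    needed for prediction."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in docs:
        label_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return label_counts, word_counts, vocab


def predict(tokens, model):
    """Return the label with the highest log posterior."""
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, float("-inf")
    for label, n_docs in label_counts.items():
        score = log(n_docs / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for tok in tokens:
            if tok not in vocab:
                continue  # ignore words never seen in training
            # Add-one smoothing avoids zero probabilities.
            count = word_counts[label][tok] + 1
            score += log(count / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Made-up, pre-segmented training documents.
docs = [
    (["match", "goal", "team"], "sports"),
    (["team", "coach", "win"], "sports"),
    (["stock", "market", "bank"], "finance"),
    (["bank", "rate", "market"], "finance"),
]
model = train(docs)
print(predict(["team", "goal"], model))
```

Working on tokens rather than raw text mirrors the project flow above, where word segmentation runs before classification.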
Hadoop Mahout Data Mining in Practice (algorithm analysis, hands-on projects, Chinese word segmentation techniques)