Hadoop and Mahout Data Mining in Practice (Algorithm Analysis, Hands-on Projects, Chinese Word Segmentation)

Source: Internet
Author: User

Introduction to Mahout

Mahout is an open-source project of the Apache Software Foundation (ASF). It provides scalable implementations of classic machine learning algorithms, designed to help developers build intelligent applications more quickly and easily.

Mahout Resources

• Mahout home page: http://mahout.apache.org/

• Mahout 0.8, the latest version at the time of writing, can be downloaded from: http://mirrors.hust.edu.cn/apache/mahout/0.8/

To simply try Mahout out, use mahout-distribution-0.8.tar.gz; the source code is in mahout-distribution-0.8-src.tar.gz.

• Brief Mahout installation steps:

If you only want to run Mahout and do not need to modify its source code, you do not need to install Maven (many online tutorials take this detour; skip it). For details, see the following tutorial:

http://www.hadoopor.com/thread-983-1-1.html

If you need to modify the source code and rebuild the package, you must install Maven; see this document: http://wenku.baidu.com/view/dbd15bd276a20029bd642d55.html

• In-depth Mahout tutorial: Mahout in Action, http://yunpan.taobao.com/share/link/R56BdLH5O

Note: Published in 2012 and based on Mahout 0.5, this is currently the most recent book on Mahout. It is available only in English, but that is not a serious obstacle: its vocabulary is mostly standard computing terminology, and it includes diagrams and source code, so it reads easily.

• IBM's introduction to Mahout: http://www.ibm.com/developerworks/cn/java/j-mahout/

Note: This article is in Chinese and was last updated in 2009, but it covers Mahout quite comprehensively and is recommended reading, especially the book list at the end, which is useful for deeper study.

Course Introduction

This course covers the following topics:

1. Mahout Data Mining Tools

2. A comprehensive recommendation system implemented on Hadoop, with hands-on work combining MapReduce, Pig, and Mahout

Target audience

1. This course is suitable for technical staff who have basic Java knowledge, some understanding of databases and SQL, and are proficient with Linux, especially those who want to change jobs or pursue a higher-paying career.

2. Ideally, learners already have a foundation in big data technologies such as Greenplum Hadoop, Hadoop 2.0, YARN, Sqoop, Flume, Avro, and Mahout. It is best to have first taken the Beifeng courses "Greenplum Distributed Database Development: From Beginner to Master", "Comprehensive, In-Depth Greenplum Hadoop Big Data Analysis Platform", "Hadoop 2.0 and YARN Explained Simply", "MapReduce, HBase Advanced Ascension", and "MapReduce, HBase Advanced Promotion".

Course Outline

Mahout Data Mining Tools (10 hours)

Data mining concepts and system components

Common data mining methods and algorithms (regression analysis, classification, clustering, etc.)

Data mining analysis tools

Algorithms supported by Mahout

Origins and features of Mahout

Mahout installation, configuration, and testing

Hands-on: K-means cluster analysis with Mahout

Mahout's implementation of the Canopy algorithm

Mahout's implementations of classification algorithms

Hands-on: Classification and prediction with Mahout logistic regression

Hands-on: Naive Bayes classification with Mahout

Concepts and types of recommendation systems

Collaborative filtering recommendation algorithms: concepts, types, and applications

Hands-on: Building a movie recommendation system with Mahout
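
The movie recommendation exercise is built on collaborative filtering. The following plain-Java sketch illustrates user-based collaborative filtering under stated assumptions: all ratings fit in memory as a map from user ID to (item ID to rating), user similarity is cosine similarity over the items both users rated, and a predicted rating is the similarity-weighted average of other users' ratings for that item. The class UserBasedCf and its methods are hypothetical names for illustration; this is not Mahout's API (Mahout's recommenders live in the org.apache.mahout.cf.taste packages) and not necessarily the method used in the course.

```java
import java.util.Map;

// Hypothetical sketch of user-based collaborative filtering; not Mahout's Taste API.
final class UserBasedCf {
    // ratings.get(user).get(item) is that user's rating for the item (assumed in memory).
    private final Map<String, Map<String, Double>> ratings;

    UserBasedCf(Map<String, Map<String, Double>> ratings) {
        this.ratings = ratings;
    }

    // Cosine similarity computed only over the items both users have rated.
    double similarity(String u, String v) {
        Map<String, Double> ru = ratings.get(u);
        Map<String, Double> rv = ratings.get(v);
        if (ru == null || rv == null) {
            return 0.0; // unknown user: treat as dissimilar
        }
        double dot = 0.0, normU = 0.0, normV = 0.0;
        for (Map.Entry<String, Double> e : ru.entrySet()) {
            Double other = rv.get(e.getKey());
            if (other == null) {
                continue; // item not rated by v
            }
            double mine = e.getValue();
            dot += mine * other;
            normU += mine * mine;
            normV += other * other;
        }
        if (normU == 0.0 || normV == 0.0) {
            return 0.0; // no co-rated items (or only zero ratings)
        }
        return dot / (Math.sqrt(normU) * Math.sqrt(normV));
    }

    // Predicted rating = similarity-weighted average of the ratings other users gave the item.
    // Returns NaN when no user with positive similarity has rated the item.
    double predict(String user, String item) {
        double weightedSum = 0.0, weightTotal = 0.0;
        for (Map.Entry<String, Map<String, Double>> entry : ratings.entrySet()) {
            if (entry.getKey().equals(user)) {
                continue;
            }
            Double rating = entry.getValue().get(item);
            if (rating == null) {
                continue;
            }
            double sim = similarity(user, entry.getKey());
            if (sim <= 0.0) {
                continue; // ignore unrelated or negatively correlated users
            }
            weightedSum += sim * rating;
            weightTotal += sim;
        }
        return weightTotal == 0.0 ? Double.NaN : weightedSum / weightTotal;
    }
}
```

For example, if alice rated m1=5, m2=3, m3=4, bob gave the same ratings plus m4=5, and carol rated m1=1, m2=5, m4=2, then alice and bob have similarity 1.0 and the prediction for alice on m4 lands nearer bob's 5 than carol's 2. Cosine over co-rated items is a simplification: with only one shared item it always returns 1.0 for positive ratings, which is one reason real recommenders often prefer Pearson correlation and restrict predictions to the N most similar users.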

Comprehensive Hadoop Hands-on Project: Text Mining (7 hours)

Text mining concepts and application scenarios

Project background

Project workflow

Chinese word segmentation techniques

Using the Paoding Analyzer Chinese word segmenter

Design and implementation of a parallel word segmentation program with MapReduce

Partitioning the data set with Pig

Building a naive Bayes text classifier with Mahout

Applying the model: computing users' preferred categories
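
The last part of the project trains a naive Bayes text classifier on segmented text and uses it to infer users' preferred categories. A minimal plain-Java sketch of multinomial naive Bayes with add-one (Laplace) smoothing follows, assuming each document has already been split into words (for example by the segmenter covered earlier) and that the training data fits in memory. The class NaiveBayesText and its methods are hypothetical names for illustration; this is not Mahout's own classifier, which runs on Hadoop over vectorized input.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical multinomial naive Bayes over pre-segmented tokens; not Mahout's implementation.
final class NaiveBayesText {
    private final Map<String, Integer> docsPerClass = new HashMap<>();
    private final Map<String, Map<String, Integer>> termCounts = new HashMap<>();
    private final Map<String, Integer> termsPerClass = new HashMap<>();
    private final Set<String> vocabulary = new HashSet<>();
    private int totalDocs = 0;

    // Add one training document: its category label and its already-segmented words.
    void train(String label, List<String> tokens) {
        totalDocs++;
        docsPerClass.merge(label, 1, Integer::sum);
        Map<String, Integer> counts = termCounts.computeIfAbsent(label, k -> new HashMap<>());
        for (String token : tokens) {
            counts.merge(token, 1, Integer::sum);
            termsPerClass.merge(label, 1, Integer::sum);
            vocabulary.add(token);
        }
    }

    // log P(label) + sum of log P(token | label), with add-one smoothing over the vocabulary.
    double logScore(String label, List<String> tokens) {
        double score = Math.log((double) docsPerClass.get(label) / totalDocs);
        Map<String, Integer> counts = termCounts.get(label);
        double denominator = termsPerClass.getOrDefault(label, 0) + vocabulary.size();
        for (String token : tokens) {
            if (!vocabulary.contains(token)) {
                continue; // skip words never seen in training (a common simplification)
            }
            score += Math.log((counts.getOrDefault(token, 0) + 1) / denominator);
        }
        return score;
    }

    // Return the label with the highest log score.
    String classify(List<String> tokens) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String label : docsPerClass.keySet()) {
            double s = logScore(label, tokens);
            if (s > bestScore) {
                bestScore = s;
                best = label;
            }
        }
        return best;
    }
}
```

Scores are kept as log-probabilities so that long documents do not underflow to zero. To estimate a user's preferred category, one could classify the texts that user has read and count the labels, though how the course aggregates them is not specified here.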
