Hadoop Application Development Practice (Flume application development, search engine algorithms, Pipes, clustering, PageRank algorithms)

Source: Internet
Author: User

Hadoop is one of the hottest technologies of 2013. This outline follows the course "In-Depth Hadoop Practical Development", taught by instructor Robby of North Wind Network.

What is Hadoop, and why learn it?


Hadoop is a distributed system infrastructure developed by the Apache Software Foundation. It lets users develop distributed programs without knowing the underlying details of the distribution, and harness the power of a cluster for high-speed computation and storage. Hadoop implements a distributed file system, the Hadoop Distributed File System (HDFS). HDFS is highly fault tolerant and is designed to be deployed on inexpensive hardware. It provides high-throughput access to application data, which suits applications with very large data sets. HDFS relaxes some POSIX requirements so that file system data can be accessed as a stream.
Hadoop is a software framework for the distributed processing of large amounts of data, and it does so in a reliable, efficient, and scalable way. It is reliable because it assumes that compute elements and storage will fail, so it maintains multiple copies of working data and can redistribute processing away from failed nodes. It is efficient because it processes data in parallel. It is scalable and can handle petabytes of data. In addition, Hadoop runs on commodity servers, so it costs less and can be used by anyone.
Hadoop's framework is written in Java, and it is well suited to running on Linux in production. This course uses a Linux platform to simulate real-world scenarios.

Highlight one: Advanced technology, Classic applications


The course uses a new platform: Oracle VirtualBox + CentOS + the latest stable Hadoop 1.x release, which is more stable and secure and closer to real enterprise deployments. It introduces and implements several classic Hadoop applications in full detail: search engine auto-recommendation, smart friend recommendation, the shortest path algorithm, and PageRank. These are among the most successful and widely used kinds of Hadoop applications. PageRank in particular is the technique that Google's search engine was built on and still relies on. Working through these cases is a great help in learning how to develop successful Hadoop applications.

Highlight two: practical, comprehensive and in-depth

Non-Java MapReduce applications are an important part of Hadoop, and Streaming and Pipes are the key technologies for porting existing applications to the Hadoop platform. Hadoop provides solutions for processing huge amounts of data, but how is the raw data collected in the first place? Apache Flume answers that question, and the new Apache Flume solution is simpler, more practical, and more efficient. The course also details how to manage the nodes of a cluster effectively with the tools Hadoop provides, which is prerequisite knowledge for any Hadoop administrator.

Highlight three: A lecturer with rich experience operating a telecom group's cloud platform

The lecturer, Robby, has extensive experience at a telecom group, is currently responsible for all aspects of its cloud platform, and has many years of in-house training experience. The course content stays close to real enterprise needs rather than remaining theoretical.


1st Chapter: (4 hours)

Search Engine Auto-recommendation (4 hours)


> Building the new platform: use VirtualBox to create a CentOS virtual machine, then install and use Hadoop from RPM packages


> Installation and basic use of the in-memory database Redis


> The search engine auto-recommendation algorithm explained


> Using jQuery UI + Ajax + Redis to build the front-end and back-end framework


> Using MapReduce to implement the data statistics algorithm


> Customizing the MapReduce output to write data directly to the Redis in-memory database
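
The statistics step in this chapter can be pictured with a small single-process sketch. It is not the course's implementation: it assumes the input is a search log with one query per line, simulates map, shuffle, and reduce in plain Python, and uses an ordinary dictionary where the course writes to Redis. All function names are hypothetical.

```python
from collections import defaultdict
from itertools import groupby


def map_query(line):
    """Map step: emit (query, 1) for each non-empty search-log line."""
    query = line.strip().lower()
    if query:
        yield query, 1


def reduce_counts(query, counts):
    """Reduce step: sum the occurrences of one query."""
    return query, sum(counts)


def run_job(lines):
    """Simulate map -> shuffle/sort -> reduce in a single process."""
    pairs = sorted(kv for line in lines for kv in map_query(line))
    return dict(
        reduce_counts(key, (value for _, value in group))
        for key, group in groupby(pairs, key=lambda kv: kv[0])
    )


def build_suggestions(counts, top_n=3):
    """Index every prefix of every query to its most frequent completions.

    The returned dict is a stand-in (assumption) for the Redis store.
    """
    by_prefix = defaultdict(list)
    for query, count in counts.items():
        for end in range(1, len(query) + 1):
            by_prefix[query[:end]].append((count, query))
    # Highest count first; ties are broken alphabetically.
    return {
        prefix: [q for _, q in sorted(items, key=lambda cq: (-cq[0], cq[1]))[:top_n]]
        for prefix, items in by_prefix.items()
    }


log = ["hadoop", "hadoop pipes", "hadoop", "hbase", "Hadoop Pipes", "hadoop"]
counts = run_job(log)
suggestions = build_suggestions(counts)
```

Looking up `suggestions["had"]` returns the completions a front end could show while the user types; storing one entry per prefix trades memory for a constant-time lookup.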


2nd Chapter: (3 hours)

Smart Friend Recommendation (3 hours)


> Detailed explanation of application background and algorithms


> Building the web framework with Struts2 + Redis


> Using MapReduce together with Redis to implement a potential-friend lookup algorithm


> Complete application logic: following friends in the front end, finding potential friends in the back end, then recommending them in the front end
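
One common way to phrase a potential-friend lookup as a MapReduce job is to count common friends between users who are not yet connected. The sketch below assumes that formulation, which the outline does not confirm, and runs in a single process with hypothetical function names and sample data.

```python
from collections import defaultdict
from itertools import combinations


def map_user(user, friends):
    """Map step: emit pair records for one user's friend list.

    Direct friendships are marked 0 so they can be excluded later. Any two
    friends of the same user share that user as a common friend: marked 1.
    """
    for friend in friends:
        yield tuple(sorted((user, friend))), 0
    for a, b in combinations(sorted(friends), 2):
        yield (a, b), 1


def reduce_pair(pair, marks):
    """Reduce step: drop existing friends, otherwise count common friends."""
    marks = list(marks)
    if 0 in marks:
        return None
    return pair, sum(marks)


def potential_friends(graph):
    grouped = defaultdict(list)  # stands in for the shuffle phase
    for user, friends in graph.items():
        for pair, mark in map_user(user, friends):
            grouped[pair].append(mark)
    results = (reduce_pair(pair, marks) for pair, marks in grouped.items())
    return dict(r for r in results if r is not None)


graph = {
    "ann": ["bob", "cat"],
    "bob": ["ann", "dan"],
    "cat": ["ann", "dan"],
    "dan": ["bob", "cat"],
}
recommendations = potential_friends(graph)
```

Each remaining pair maps to its number of common friends, which can serve as a ranking score for recommendations.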


3rd Chapter: (2 hours)

Hadoop Streaming (2 hours)


> Non-Java MapReduce implementations


> How to implement a MapReduce task in a scripting language

> How to implement a MapReduce task in C
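
Hadoop Streaming runs any executable as a mapper or reducer, exchanging tab-separated key/value lines over standard input and output. As a minimal sketch of the scripting-language case, the two functions below implement word count; the word-count task and the function names are illustrative assumptions, not taken from the course. In a real job each function would live in its own script that reads `sys.stdin` and prints its records, and the scripts would be passed to the streaming jar through its `-mapper` and `-reducer` options.

```python
from itertools import groupby


def mapper(lines):
    """Emit one "word<TAB>1" record per word, as a Streaming mapper would print."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reducer(sorted_records):
    """Sum counts per key; Streaming delivers reducer input sorted by key."""
    parsed = (record.rstrip("\n").split("\t", 1) for record in sorted_records)
    for word, group in groupby(parsed, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


# Local simulation: sorted() stands in for Hadoop's sort between the stages.
lines = ["hadoop streaming", "hadoop pipes"]
shuffled = sorted(mapper(lines))
result = list(reducer(shuffled))
```

Because the reducer only sees a sorted stream rather than grouped values, it has to detect key boundaries itself, which `groupby` does here.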


4th Chapter: (1 hour)

Hadoop Pipes (1 hour)


> Using C++ for MapReduce tasks with Hadoop Pipes


5th Chapter: (2 hours)

Apache Flume Basics (2 hours)


> Apache Flume Overview


> Flume agents: detailed introduction and use


> In-depth look at Flume submodules: source, sink, and channel usage
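
The roles of source, channel, and sink can be illustrated with a toy model. This is a conceptual sketch only: it does not use Flume's API or configuration format, and the class and function names are hypothetical. A source pushes events into a bounded channel, and a sink drains the channel in batches.

```python
from collections import deque


class MemoryChannel:
    """Bounded in-memory buffer that decouples a source from a sink."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.events = deque()

    def put(self, event):
        if len(self.events) >= self.capacity:
            return False  # channel full: the source must retry or back off
        self.events.append(event)
        return True

    def take(self, batch_size):
        batch = []
        while self.events and len(batch) < batch_size:
            batch.append(self.events.popleft())
        return batch


def run_source(lines, channel):
    """Source: wrap raw lines as events; return the lines the channel rejected."""
    return [line for line in lines if not channel.put({"body": line})]


def run_sink(channel, store, batch_size=2):
    """Sink: drain the channel in batches and deliver event bodies to a store."""
    while True:
        batch = channel.take(batch_size)
        if not batch:
            break
        store.extend(event["body"] for event in batch)


channel = MemoryChannel(capacity=3)
rejected = run_source(["q=hadoop", "q=flume", "q=redis", "q=hbase"], channel)
delivered = []
run_sink(channel, delivered)
```

The channel is what lets a fast source and a slow sink run at different speeds; when it fills up, the source has to hold or retry events, as the rejected line shows.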


6th Chapter: (3 hours)

Flume Practice and Distributed Applications (3 hours)


> Integration with the search engine auto-recommendation application


> Building distributed Flume applications


7th Chapter: (5 hours)

Hadoop implementation of the shortest path algorithm (approx. 5 hours)


> Introduction to the shortest path algorithm


> How to implement the shortest path algorithm using MapReduce


> Finding the shortest route between any two bus stops in a city
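
A shortest path search is usually expressed in MapReduce as repeated rounds of distance relaxation, one job per round, until no distance improves. The sketch below assumes that iterative formulation and non-negative edge weights; the outline does not say which variant the course uses. The stop names and weights are made up, and every stop must appear as a key in the graph.

```python
INF = float("inf")


def map_node(node, dist, edges):
    """Map step: re-emit the node's own distance and offer distances to neighbours."""
    yield node, dist
    if dist < INF:
        for neighbour, weight in edges.items():
            yield neighbour, dist + weight


def reduce_node(node, candidates):
    """Reduce step: keep the smallest distance proposed for this node."""
    return min(candidates)


def shortest_paths(graph, start):
    dist = {node: INF for node in graph}
    dist[start] = 0
    while True:  # each pass corresponds to one MapReduce job
        grouped = {node: [] for node in graph}
        for node, edges in graph.items():
            for target, candidate in map_node(node, dist[node], edges):
                grouped[target].append(candidate)
        updated = {node: reduce_node(node, c) for node, c in grouped.items()}
        if updated == dist:  # no distance improved, so the result is final
            return dist
        dist = updated


stops = {
    "A": {"B": 4, "C": 1},
    "B": {"D": 1},
    "C": {"B": 2, "D": 6},
    "D": {},
}
distances = shortest_paths(stops, "A")
```

Here the direct edge from A to B costs 4, but the route through C costs 3, and the second round of relaxation picks that up. Running the search once per starting stop gives the routes between any pair of stops.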


8th Chapter: (3 hours)

Hadoop Cluster Management (approx. 3 hours)


> Log Management for Hadoop


> Adding and removing Hadoop nodes dynamically


> Introduction to the NameNode and DataNode directory structure


> Data security for HDFS: fsimage and the edit log

> Using the Hadoop management tools dfsadmin and fsck


9th Chapter: (5 hours)

Hadoop implementation of the PageRank algorithm (approx. 5 hours)


> Google's claim to fame: introduction to the PageRank algorithm


> How to implement the PageRank algorithm using Hadoop
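
PageRank fits the MapReduce model naturally: in each round, every page sends an equal share of its rank to the pages it links to (map), and every page sums what it receives and applies a damping factor (reduce). The sketch below runs those rounds in a single process using the standard formula PR(p) = (1 - d)/N + d * sum(PR(q)/L(q)). It assumes every page has at least one outgoing link, and the three-page graph, the damping value 0.85, and the function names are illustrative choices rather than details from the course.

```python
def map_page(page, rank, links):
    """Map step: share a page's rank equally among the pages it links to."""
    for target in links:
        yield target, rank / len(links)


def reduce_page(contributions, n_pages, damping):
    """Reduce step: combine incoming contributions with the damping term."""
    return (1 - damping) / n_pages + damping * sum(contributions)


def pagerank(graph, damping=0.85, iterations=50):
    n = len(graph)
    ranks = {page: 1 / n for page in graph}
    for _ in range(iterations):  # each pass corresponds to one MapReduce job
        incoming = {page: [] for page in graph}
        for page, links in graph.items():
            for target, share in map_page(page, ranks[page], links):
                incoming[target].append(share)
        ranks = {page: reduce_page(c, n, damping) for page, c in incoming.items()}
    return ranks


web = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(web)
```

In this graph C ends up with the highest rank because it receives links from both A and B, while B only receives half of A's rank. A production job would also need to handle pages without outgoing links and stop on convergence instead of a fixed iteration count.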
