How do you start learning Hadoop when you want to switch to big data?

Source: Internet
Author: User
Tags hadoop ecosystem

To learn big data, first understand the learning route: be clear about what to learn first and what comes after, get to know the major frameworks, and then work through them step by step from the fundamentals. A popular route is: Hadoop ecosystem -> Storm -> Spark -> algorithms. Learning Hadoop is therefore the first step. Note that you need a Java foundation before you start, because Hadoop itself is written in Java, and you need to know basic Linux shell commands, because the first thing you have to do is install Hadoop. Hadoop holds a critical position in the big data technology stack: it is the foundation of big data technology, and how well you master its fundamentals will determine how far you can go on this path.

Let's talk about how to start learning Hadoop. This article follows the installation and deployment of Apache Hadoop 2.x as its main thread, and uses it to introduce the Hadoop 2.x architecture, how each module works, and the technical details. Installation is not the goal in itself; the goal is to understand Hadoop by installing it.

Hadoop Environment Setup

Part I: Linux environment installation

Hadoop runs on Linux. Tools exist for running it on Windows as well, but Linux is recommended. The first part covers installing and configuring the Linux environment, installing the Java JDK, and so on.
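As a rough illustration of what "environment ready" can mean before installing Hadoop, the sketch below checks three things: that JAVA_HOME is set, that a java launcher is on the PATH, and that ssh is available (Hadoop's start and stop scripts use ssh to reach nodes). The function name check_prerequisites and its parameters are hypothetical helpers for this example, not part of Hadoop.

```python
import os
import shutil


def check_prerequisites(env=None, which=shutil.which):
    """Return a list of human-readable problems; an empty list means OK.

    `env` and `which` are injectable so the check can be tried out
    without touching the real machine (an assumption of this sketch).
    """
    env = os.environ if env is None else env
    problems = []
    # Hadoop's scripts need to know where the JDK lives.
    if not env.get("JAVA_HOME"):
        problems.append("JAVA_HOME is not set")
    # The java launcher should be reachable from the shell.
    if which("java") is None:
        problems.append("java was not found on PATH")
    # The start/stop scripts use ssh to reach each node, even on one machine.
    if which("ssh") is None:
        problems.append("ssh was not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print("missing:", problem)
```

Running the script on a fresh Linux machine prints one line per missing prerequisite and prints nothing when all three checks pass.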

Part II: Hadoop local (standalone) mode installation

Local mode is intended only for local development and debugging, or for a quick first experience of installing Hadoop, so this part is a brief introduction.

Part III: Hadoop pseudo-distributed mode installation

Hadoop is typically learned in pseudo-distributed mode. In this mode all of Hadoop's modules run on a single machine. "Pseudo-distributed" means that although each module runs in its own process, they all run on one operating system, so the setup is not truly distributed.
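In practice, pseudo-distributed mode mostly comes down to a few configuration properties. The sketch below generates the two minimal files usually edited for a single machine: core-site.xml with fs.defaultFS and hdfs-site.xml with dfs.replication. The property names are the ones used in Hadoop's single-node setup guide. The host, the port 9000, and the helper build_site_xml are assumptions made for this example.

```python
from xml.etree import ElementTree as ET


def build_site_xml(properties):
    """Render a dict of Hadoop properties as a *-site.xml document string."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    return ET.tostring(root, encoding="unicode")


# Assumed values for a single-machine setup; adjust host and port as needed.
CORE_SITE = {"fs.defaultFS": "hdfs://localhost:9000"}  # where HDFS clients connect
HDFS_SITE = {"dfs.replication": 1}  # only one DataNode, so keep one copy per block

if __name__ == "__main__":
    print(build_site_xml(CORE_SITE))  # content for etc/hadoop/core-site.xml
    print(build_site_xml(HDFS_SITE))  # content for etc/hadoop/hdfs-site.xml
```

A replication factor of 1 is chosen because a single machine has only one DataNode to hold each block.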

Part IV: Fully distributed installation

Fully distributed mode is the mode used in production: Hadoop runs on a cluster of servers, and production environments are generally configured for HA (high availability).

Part V: Hadoop HA installation

HA stands for high availability. To address Hadoop's single point of failure, production environments generally use an HA deployment. This part describes how to configure high availability for Hadoop 2.x and briefly explains how HA works.
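The idea behind HA can be pictured with a toy model: one NameNode is active, another waits as standby, and the standby is promoted when the active one fails. The sketch below is only a conceptual simulation under that assumption. The class NameNode and the function elect_active are hypothetical names for this example and are not Hadoop APIs. A real Hadoop 2.x cluster handles failover through its own HA configuration, optionally automated with ZooKeeper.

```python
class NameNode:
    """Toy stand-in for a NameNode; not a Hadoop class."""

    def __init__(self, name):
        self.name = name
        self.state = "standby"
        self.healthy = True


def elect_active(namenodes):
    """Keep one healthy NameNode active, promoting a standby if needed."""
    # Mark any node that has failed so it can no longer serve clients.
    for node in namenodes:
        if not node.healthy:
            node.state = "failed"
    active = [n for n in namenodes if n.state == "active"]
    if active:
        return active[0]
    # No active node left: promote the first healthy standby.
    for node in namenodes:
        if node.healthy and node.state == "standby":
            node.state = "active"
            return node
    return None  # nothing left to promote: the service is unavailable


if __name__ == "__main__":
    nn1, nn2 = NameNode("nn1"), NameNode("nn2")
    cluster = [nn1, nn2]
    print(elect_active(cluster).name)  # first election picks nn1
    nn1.healthy = False                # simulate a crash of the active node
    print(elect_active(cluster).name)  # the standby nn2 takes over
```

With only one NameNode in the list, a single failure leaves nothing to promote, which is the single point of failure that HA removes.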

The installation process is interspersed with brief introductions to the concepts involved. I hope it is helpful to everyone.

The outline above covers only the overall framework for setting up the environment. Because of limited time, specific steps are not covered here; message me if you want to discuss them.

Once the environment is set up, try writing a MapReduce program, packaging it, and running it. When you have no remaining questions about Hadoop programming, try to gain a deeper understanding of the core ideas of MapReduce, especially map, shuffle, join, and reduce.
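To make the map, shuffle, and reduce steps concrete, here is a minimal word count written in plain Python. It does not use Hadoop at all. It only imitates the data flow on one machine, and the function names are chosen for this example. A real job would be written against Hadoop's MapReduce API, and the framework would perform the shuffle across the cluster.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group all values that share the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in order over a list of text lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(pairs)
    # Keys are sorted before reducing, as Hadoop sorts keys during the shuffle.
    return dict(reduce_phase(k, v) for k, v in sorted(grouped.items()))


if __name__ == "__main__":
    print(word_count(["big data", "Big Hadoop"]))
```

The shuffle step sits between the other two because a reducer can only sum a word's counts once every pair for that word has been brought together.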

Beginners will run into many problems, and that is normal. Problems are nothing to fear: as long as you work out a way to solve them yourself, your ability will improve little by little. I hope everyone on the big data road learns something.
