To learn big data, first get clear on the learning route: know what to learn first and what comes next, understand the overall framework, and then work your way up step by step from the foundations. A popular learning route is: the Hadoop ecosystem → Storm → Spark → algorithms. Learning Hadoop is therefore the first step. One caveat: you need a Java foundation before learning Hadoop, because Hadoop's internals are written in Java, and at the system level you need the basic Linux shell commands, because before you can learn Hadoop you have to install it. Hadoop's position in the big data technology stack is critical: it is the foundation of big data technology, and how well you master the Hadoop fundamentals will determine how far you can go down the big data path.
Let's talk about how to start learning Hadoop. The approach of this article is to use the installation and deployment of Apache Hadoop 2.x as the main thread, and along the way introduce the architecture of Hadoop 2.x, how each module works, and the relevant technical details. Installation is not the goal in itself; understanding Hadoop through the installation is.
Hadoop Environment Setup
Part I: Linux environment installation
Hadoop runs on Linux. Although it can also be made to run on Windows with extra tooling, running it on a Linux system is recommended. This first part covers installing and configuring the Linux environment, installing the Java JDK, and so on.
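As a minimal sketch of the JDK setup step, the commands below unpack a downloaded JDK tarball and export `JAVA_HOME`. The paths and version numbers are assumptions for illustration; adjust them for whichever JDK build and install location you actually use.

```shell
# Unpack a downloaded JDK tarball into /opt (the filename is hypothetical):
#   tar -xzf jdk-8u301-linux-x64.tar.gz -C /opt

# Point JAVA_HOME at the unpacked directory and put its bin/ on PATH.
# Hadoop's scripts read JAVA_HOME to find the JVM.
export JAVA_HOME=/opt/jdk1.8.0_301
export PATH="$JAVA_HOME/bin:$PATH"

# Once a real JDK is in place, verify with:
#   java -version
echo "JAVA_HOME is $JAVA_HOME"
```

To make the setting survive reboots, the same two `export` lines usually go into `~/.bashrc` or `/etc/profile`.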
Part II: Hadoop native mode installation
Hadoop native mode (also called local or standalone mode) is only for local development and debugging, or for a quick installation to get a first taste of Hadoop. This part is a brief introduction.
Part III: Hadoop pseudo-distributed mode installation
Learning Hadoop is typically done in pseudo-distributed mode. In this mode, all of Hadoop's modules run on a single machine: each daemon runs in its own separate process, so it looks distributed, but everything runs on one operating system, so it is not truly distributed.
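The heart of a pseudo-distributed setup is pointing HDFS at the local machine. A minimal sketch of `etc/hadoop/core-site.xml` might look like the following (the hostname and port are the conventional Hadoop 2.x defaults, but treat them as assumptions for your setup):

```xml
<!-- etc/hadoop/core-site.xml: make HDFS on this machine
     the default filesystem for all Hadoop clients. -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
```

Since there is only one DataNode, `dfs.replication` in `etc/hadoop/hdfs-site.xml` is usually set to 1 as well; otherwise HDFS will report under-replicated blocks.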
Part IV: Fully distributed installation
Fully distributed mode is the mode used in production: Hadoop runs on a cluster of servers, and production environments generally also deploy HA to achieve high availability.
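One concrete difference from the single-machine modes is that the master node needs a list of the worker machines. In Hadoop 2.x this is the `etc/hadoop/slaves` file; a sketch with placeholder hostnames:

```
# etc/hadoop/slaves -- one worker host per line; the start scripts
# launch a DataNode and NodeManager on each of these machines.
# (node1..node3 are placeholders for your own cluster's hostnames.)
node1
node2
node3
```

The cluster start scripts SSH into each listed host, which is why passwordless SSH from the master to every worker is part of a fully distributed setup.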
Part V: Hadoop HA installation
HA stands for high availability. To solve Hadoop's single-point-of-failure problem, production environments generally deploy HA. This part describes how to configure high availability for Hadoop 2.x and briefly explains how HA works.
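To give a feel for what HDFS HA configuration looks like, here is a sketch of the core settings in `hdfs-site.xml` that define one logical nameservice backed by two NameNodes. The names `mycluster`, `nn1`/`nn2`, and the hostnames are placeholders; a full HA deployment additionally needs JournalNodes (shared edit log), a fencing method, and usually ZooKeeper for automatic failover.

```xml
<!-- Sketch: two NameNodes behind one logical nameservice.
     All values below are placeholders for illustration. -->
<configuration>
  <property>
    <name>dfs.nameservices</name>
    <value>mycluster</value>
  </property>
  <property>
    <name>dfs.ha.namenodes.mycluster</name>
    <value>nn1,nn2</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.mycluster.nn1</name>
    <value>master1:8020</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.mycluster.nn2</name>
    <value>master2:8020</value>
  </property>
</configuration>
```

Clients then address the filesystem as `hdfs://mycluster` rather than a specific host, which is what lets a failover happen without reconfiguring every client.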
The installation process will be interspersed with brief introductions to the knowledge involved. I hope this is helpful to everyone.
The environment setup above only outlines the framework; as space is limited, message me if you want to discuss the specific steps.
Once the environment is set up, try writing a MapReduce job, packaging it, and running it. When you no longer have questions about the programming side of Hadoop, you can try to dig into the core ideas of MapReduce, especially map, shuffle, join, and reduce.
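The map → shuffle → reduce flow mentioned above can be sketched with an ordinary shell pipeline using word count, MapReduce's classic introductory example. This is only an analogy, not Hadoop itself: `tr` plays the mapper (emit one word per line), `sort` plays the shuffle (bring equal keys together), and `uniq -c` plays the reducer (aggregate each group).

```shell
# Word count as a pipeline analogy for MapReduce (illustration only):
#   map     -> split input into one word per line
#   shuffle -> sort groups identical words together
#   reduce  -> uniq -c counts each group
printf 'big data\nbig hadoop\n' \
  | tr -s ' ' '\n' \
  | sort \
  | uniq -c
```

In real Hadoop, the same three stages run in parallel across machines, and the shuffle additionally moves each key's records over the network to the reducer responsible for that key.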
Beginners will run into many problems, and that is normal. Problems are nothing to be afraid of: as long as you find ways to solve them yourself, your ability will improve bit by bit. I wish all my fellow travelers on the big data road success in their learning.