Step by Step: Installing and Configuring a Hadoop Multi-Node Cluster
1. Cluster deployment
1.1 Hadoop Introduction
Hadoop is an open-source distributed computing platform under the Apache Software Foundation. With the Hadoop Distributed File System (HDFS) and MapReduce (an open-source implementation of Google's MapReduce) at its core, Hadoop provides users with a distributed infrastructure that hides the details of the underlying system.
The nodes in a Hadoop cluster can be divided into Master and Slave roles. An HDFS cluster consists of one NameNode and several DataNodes: the NameNode acts as the master server, managing the file system namespace and client access to the file system, while the DataNodes manage the data stored on their nodes. The MapReduce framework consists of a single JobTracker running on the master node and one TaskTracker running on each slave node. The master node schedules all the tasks that make up a job, distributing them across the slave nodes; it monitors their execution and re-runs any tasks that fail. The slave nodes are responsible only for the tasks assigned to them by the master. When a job is submitted, the JobTracker receives the job and its configuration information, distributes the configuration to the slave nodes, schedules the tasks, and monitors the TaskTrackers as they execute.
As the introduction above shows, HDFS and MapReduce together form the core of the Hadoop distributed system architecture.
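To make these roles concrete, here is a minimal sketch of how a Hadoop 1.x cluster (the JobTracker/TaskTracker generation described above) is typically wired together in its configuration files. The hostnames master, slave1, and slave2 and the ports 9000 and 9001 are assumptions for illustration, not values from this article.

conf/core-site.xml (identical on every node) tells all nodes where the NameNode runs:

    <configuration>
      <property>
        <!-- URI of the NameNode; "master" is an assumed hostname -->
        <name>fs.default.name</name>
        <value>hdfs://master:9000</value>
      </property>
    </configuration>

conf/mapred-site.xml (identical on every node) tells each TaskTracker where the JobTracker runs:

    <configuration>
      <property>
        <!-- host:port of the JobTracker; port 9001 is an assumed choice -->
        <name>mapred.job.tracker</name>
        <value>master:9001</value>
      </property>
    </configuration>

conf/slaves (on the master only) lists the slave hostnames, one per line; the start-up scripts launch a DataNode and a TaskTracker on each of these hosts:

    slave1
    slave2

With this layout, starting the cluster from the master brings up the NameNode and JobTracker locally and a DataNode/TaskTracker pair on every host named in conf/slaves, matching the master/slave division described above.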