This article explains the configuration of a Hadoop cluster. It assumes the reader is already familiar with the Hadoop stand-alone (pseudo-distributed) setup, so those parts are not restated here.
We take three test machines as an example to build a small cluster; their IPs are:
192.168.200.1, 192.168.200.2, 192.168.200.3
Cygwin and the JDK are installed as described in "Single-machine pseudo-distributed deployment of Hadoop under Windows (1)", so that step is skipped here.
1. Configure the hosts file
Add the following records to the hosts file of the three machines:
192.168.200.1 hadoop1 # master / NameNode
192.168.200.2 hadoop2 # DataNode
192.168.200.3 hadoop3 # DataNode
2. Configure Hadoop on hadoop1
hadoop-env.sh, core-site.xml, hdfs-site.xml, and mapred-site.xml are configured the same way as in "Single-machine pseudo-distributed deployment of Hadoop under Windows (1)";
just change the hostname localhost to hadoop1, so the details are skipped here.
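For example, the fs.default.name entry in core-site.xml would point at hadoop1 instead of localhost (port 9000 is the value commonly used in the stand-alone setup; adjust if yours differs):

```xml
<!-- core-site.xml: the only change from the stand-alone setup is the hostname -->
<property>
  <name>fs.default.name</name>
  <value>hdfs://hadoop1:9000</value>
</property>
```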
Configure the masters file: hadoop1
Configure the slaves file: hadoop2 and hadoop3
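Concretely, the two files in the conf directory hold one hostname per line. conf/masters contains:

```
hadoop1
```

and conf/slaves contains:

```
hadoop2
hadoop3
```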
3. Copy the Hadoop folder from hadoop1 to the hadoop2 and hadoop3 machines.
If the JDK installation directory differs between machines, change the JAVA_HOME path in hadoop-env.sh accordingly.
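The JAVA_HOME line in conf/hadoop-env.sh on each machine would look something like the following; the path shown is only an illustrative Cygwin-style example, so substitute your actual JDK location:

```shell
# conf/hadoop-env.sh -- point at the local JDK install (example path)
export JAVA_HOME=/cygdrive/c/Java/jdk1.6.0_21
```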
4. Configure passwordless SSH login
Run ssh-keygen on all three machines. Copy the id_rsa.pub files from hadoop2 and hadoop3 to hadoop1,
append all three public keys into the authorized_keys file on hadoop1, and then copy that authorized_keys file to hadoop2 and hadoop3.
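The key-merging step can be sketched as follows. This is a local simulation using placeholder key files under /tmp rather than the real ~/.ssh paths, just to show how the three public keys end up in one authorized_keys file:

```shell
# Simulate merging three id_rsa.pub files into a single authorized_keys file.
# The key contents below are placeholders; on the real machines they come
# from ssh-keygen on each host.
mkdir -p /tmp/ssh_merge_demo
cd /tmp/ssh_merge_demo
echo "ssh-rsa AAAAB3...key1 user@hadoop1" > id_rsa.pub.hadoop1
echo "ssh-rsa AAAAB3...key2 user@hadoop2" > id_rsa.pub.hadoop2
echo "ssh-rsa AAAAB3...key3 user@hadoop3" > id_rsa.pub.hadoop3

# Append all three public keys into one authorized_keys file; this is the
# file that then gets copied back to hadoop2 and hadoop3.
cat id_rsa.pub.hadoop1 id_rsa.pub.hadoop2 id_rsa.pub.hadoop3 > authorized_keys
wc -l < authorized_keys
```

On the real cluster the same concatenation happens in ~/.ssh, and note that some sshd configurations require authorized_keys to have 600 permissions.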
5. On hadoop1, first format the NameNode:
hadoop namenode -format
Then run start-all.sh to start the entire cluster.
Use the jps command to check whether the Hadoop processes on the master and slaves started successfully: jps should show the NameNode and JobTracker processes on the master,
and the DataNode and TaskTracker processes on each slave.
6. For integration with MyEclipse, see "Single-machine pseudo-distributed deployment of Hadoop under Windows (3)".