Hadoop Fully Distributed Mode Environment Setup


This is my first time building a fully distributed Hadoop environment. This article is based on a tutorial I found online, reorganized according to my own hands-on process. I used three virtual machines, each running Ubuntu Server 16.04.1 (64-bit). There are many steps and parameters in the process that I am still studying, so I cannot yet explain all of the underlying principles; I will revise and improve this article as I learn more. Reference articles are listed at the end and can be consulted together.

1. Preparation

(1) Virtual machines: three VMs running Ubuntu Server 16.04.1 (ubuntu-16.04.1-desktop-amd64.iso), with the hostnames Master, Slave0 and Slave1.
(2) Hadoop version: hadoop-2.7.3.tar.gz
(3) JDK: jdk-8u101-linux-x64.tar.gz
(4) Create a new user on each virtual machine, named hadoop. (Note: the username can be anything, but it must be the same on every virtual machine, otherwise password-free remote login between the machines cannot be configured successfully.)

2. Install the JDK (same operation on every machine)

(1) Download from: http://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html
(2) Place the archive in the /usr/lib/jvm directory.
(3) Unpack the archive: sudo tar -zxvf jdk-8u101-linux-x64.tar.gz
(4) Configure the environment variables. Open the shell environment variable file of the current user:

vim ~/.bashrc

and add the following at the end of the file:

# begin copy
export JAVA_HOME=/usr/lib/jvm/jdk1.8.0_101
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export PATH=$JAVA_HOME/bin:$PATH
# end copy

Notes:
(a) The JAVA_HOME value above is the path of the JDK folder you extracted and can be set according to your own situation. For example, if you unpack jdk-8u101-linux-x64.tar.gz under /usr/local, change it to export JAVA_HOME=/usr/local/jdk1.8.0_101.
(b) The configuration above applies only to the current user. If you want all users to be able to use the JDK, configure /etc/bashrc instead. For the relationship between /etc/profile, /etc/bashrc, ~/.bashrc and ~/.bash_profile, see http://blog.csdn.net/chenchong08/article/details/7833242

(5) Make the configured environment variables take effect: source ~/.bashrc (changes to ~/.bashrc take effect immediately after sourcing, without restarting the system).
(6) Check whether the installation succeeded: java -version. If the installation succeeded, the Java version information is displayed.
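For convenience, steps (4) to (6) can also be done entirely from the command line. The snippet below is a minimal sketch, not part of the original tutorial, assuming the JDK was unpacked to /usr/lib/jvm/jdk1.8.0_101 as above; adjust the path to your own layout.

# append the JDK variables to ~/.bashrc (quoting 'EOF' keeps $JAVA_HOME literal in the file)
cat >> ~/.bashrc <<'EOF'
export JAVA_HOME=/usr/lib/jvm/jdk1.8.0_101
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export PATH=$JAVA_HOME/bin:$PATH
EOF
# reload the file and confirm the JDK is found
source ~/.bashrc
java -version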
3. Password-free SSH login

When Hadoop is running, the nodes have to communicate with each other. Normally, communication between Linux machines requires a username and password (to keep the communication secure). If a password had to be typed in by hand every time, it would obviously be inconvenient, so the purpose of this step is to let each node pass the security check automatically without affecting normal communication.

Before setting up password-free login, make sure the SSH service is installed. Just run the commands below; if it is already installed, you will be told so.

sudo apt-get install openssh-server
sudo apt-get install ssh

3.1 First generate the public and private keys on Master

Log in to the system as the hadoop user and run:

cd ~/.ssh

If the hidden .ssh folder does not exist yet, run ssh localhost first. Then, in the .ssh directory, run:

ssh-keygen -t rsa -P ''

(Note: the last argument is two single quotes.) This uses the RSA algorithm to generate a public/private key pair; -P '' means an empty passphrase. After the command finishes, the .ssh directory under the home directory contains two files: id_rsa (the private key) and id_rsa.pub (the public key).

3.2 Import the public key

cat .ssh/id_rsa.pub >> .ssh/authorized_keys

After this you can test on the same machine by connecting to yourself with SSH, that is: ssh localhost (or ssh Master). If you are unfortunately still prompted for a password, it has not taken effect yet, and there is one more key step:

chmod 600 .ssh/authorized_keys

(modify the file permissions, otherwise it does not work). Then test ssh localhost again; if no password is required, the connection succeeds and this machine is done.

3.3 Generate key pairs on the other machines and copy their public keys to Master

a) Log in to the other two machines, Slave0 and Slave1, as the hadoop user and run:

ssh-keygen -t rsa -P ''

to generate their public/private key pairs.

b) Then use the scp command to send the public key files to Master (the machine that was just set up):

on Slave0: scp .ssh/id_rsa.pub hadoop@Master:/home/hadoop/id_rsa_0.pub
on Slave1: scp .ssh/id_rsa.pub hadoop@Master:/home/hadoop/id_rsa_1.pub

After these two commands have been run, go back to Master and look at the /home/hadoop directory. There should be two new files, id_rsa_0.pub and id_rsa_1.pub. Then, on Master, import these two public keys:

cat id_rsa_0.pub >> .ssh/authorized_keys
cat id_rsa_1.pub >> .ssh/authorized_keys

Now the authorized_keys file on Master contains the public keys of all three machines. After this step, only Master can log in to Slave0 and Slave1 without a password; the two slave nodes (Slave0, Slave1) still cannot log in to the master node without a password. For that we need the next step.

3.4 Copy the "most complete" authorized_keys on Master to the other machines

a) Still on Master, run:

scp .ssh/authorized_keys hadoop@Slave0:/home/hadoop/.ssh/authorized_keys
scp .ssh/authorized_keys hadoop@Slave1:/home/hadoop/.ssh/authorized_keys

b) Modify the permissions of the authorized_keys file on the other machines (if the hadoop user already has permission 6 or 7 on authorized_keys, this step is not necessary). On Slave0 and Slave1, run:

chmod 600 .ssh/authorized_keys

3.5 Verify on each virtual machine by connecting to the other machines over SSH using their hostnames. If the connections succeed without a password, everything is OK.
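As a quick check for step 3.5, the loop below is a small sketch (not from the original article) that runs hostname on every node over SSH; if three hostnames are printed without any password prompt, the password-free setup is working. It assumes the hostnames Master, Slave0, Slave1 and the hadoop user used throughout this article.

for host in Master Slave0 Slave1; do
    # should print the remote hostname without asking for a password
    ssh hadoop@"$host" hostname
done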

Summary: this step is very important. The main idea is to generate a public/private key pair on each node and then distribute the public key to all of the other nodes. RSA is an asymmetric encryption algorithm: only the public key is published, and as long as the private key is not leaked nothing can be decrypted, so security is still guaranteed.

If this step fails, in my personal experience it is mostly a permissions issue. Check whether the hadoop user has sufficient permissions; it is recommended to add hadoop to the sudoers list and to the root user group. Some other causes of SSH password-free login failures are collected in a separate summary article on SSH password-free setup failures.

4. Install Hadoop

We install Hadoop directly under the user's home directory (/home/hadoop/).

4.1 Unpack the Hadoop archive directly into the home directory:

tar -zxvf hadoop-2.7.3.tar.gz

After extraction there is a new hadoop-2.7.3 folder under /home/hadoop/.

4.2 Modify the configuration files

A total of seven files need to be configured:

$HADOOP_HOME/etc/hadoop/hadoop-env.sh
$HADOOP_HOME/etc/hadoop/yarn-env.sh
$HADOOP_HOME/etc/hadoop/core-site.xml
$HADOOP_HOME/etc/hadoop/hdfs-site.xml
$HADOOP_HOME/etc/hadoop/mapred-site.xml
$HADOOP_HOME/etc/hadoop/yarn-site.xml
$HADOOP_HOME/etc/hadoop/slaves
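Before editing, it can be handy to keep a copy of the originals. The loop below is an optional sketch, not part of the original tutorial, assuming HADOOP_HOME points at /home/hadoop/hadoop-2.7.3 as explained in the next paragraph.

export HADOOP_HOME=/home/hadoop/hadoop-2.7.3
for f in hadoop-env.sh yarn-env.sh core-site.xml hdfs-site.xml mapred-site.xml yarn-site.xml slaves; do
    # keep a pristine .orig copy next to each file; skip files that do not exist
    # (in some 2.x distributions mapred-site.xml only ships as a .template)
    [ -e "$HADOOP_HOME/etc/hadoop/$f" ] && cp "$HADOOP_HOME/etc/hadoop/$f" "$HADOOP_HOME/etc/hadoop/$f.orig"
done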

Here $HADOOP_HOME stands for the Hadoop root directory, which in this article defaults to /home/hadoop/hadoop-2.7.3.

(1) Configuration file 1: hadoop-env.sh

This file configures the basic environment that Hadoop runs in and needs to point at the local Java virtual machine. Modify the JAVA_HOME value in this file to the local installation location (with the JDK installed above, export JAVA_HOME=/usr/lib/jvm/jdk1.8.0_101).

(2) Configuration file 2: yarn-env.sh

This file configures the runtime environment of the YARN framework and also needs the location of the Java virtual machine. Modify the JAVA_HOME value in this file to the local installation location in the same way.

(3) Configuration file 3: slaves

This file lists all the slave nodes. In this example, write the hostnames of the slaves:

Slave0
Slave1

(4) Configuration file 4: core-site.xml

<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoop/tmp</value>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://Master:9000</value>
  </property>
</configuration>

This is the core configuration file of Hadoop, and these two properties need to be set. fs.default.name configures the name of Hadoop's HDFS file system, located on port 9000 of the master host, and hadoop.tmp.dir configures the root location of Hadoop's tmp directory. This location does not exist in the file system yet, so create it with the mkdir command (I created the tmp folder in the user's home directory). For details about the core-site.xml parameters, see: http://hadoop.apache.org/docs/r2.7.3/hadoop-project-dist/hadoop-common/core-default.xml

(5) Configuration file 5: hdfs-site.xml

<configuration>
  <property>
    <name>dfs.http.address</name>
    <value>Master:50070</value>
  </property>
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>Master:50090</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/home/hadoop/dfs/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/home/hadoop/dfs/data</value>
  </property>
  <property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
  </property>
</configuration>

Note: the paths /home/hadoop/dfs/name and /home/hadoop/dfs/data need to be created manually. dfs.http.address configures the HTTP access location of HDFS, and dfs.replication configures the number of replicas of each file block, which is generally no more than the number of slave machines. In the NameNode's hdfs-site.xml, the dfs.webhdfs.enabled property must be set to true; otherwise WebHDFS operations that list file or folder status, such as LISTSTATUS and GETFILESTATUS, cannot be used, because that information is kept by the NameNode. HDFS on the NameNode is accessed through port 50070 and WebHDFS on the DataNodes through port 50075: file and folder information is accessed with the NameNode IP and port 50070, while file content and operations such as open, upload, modify and download use the DataNode IP and port 50075. To perform all WebHDFS operations through the NameNode's IP and port alone, without distinguishing ports, dfs.webhdfs.enabled must be set to true in hdfs-site.xml on all DataNodes as well.
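The tmp, dfs/name and dfs/data directories referenced above must exist before HDFS is formatted. A minimal sketch, run as the hadoop user with the paths configured above (the name directory is only used on the NameNode and the data directory on the DataNodes, but creating both everywhere does no harm):

mkdir -p /home/hadoop/tmp        # hadoop.tmp.dir
mkdir -p /home/hadoop/dfs/name   # dfs.namenode.name.dir
mkdir -p /home/hadoop/dfs/data   # dfs.datanode.data.dir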
For a detailed description of the hdfs-site.xml parameters, see: http://hadoop.apache.org/docs/r2.7.3/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml

(6) Configuration file 6: mapred-site.xml

<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>Master:9001</value>
  </property>
  <property>
    <name>mapred.map.tasks</name>
    <value>20</value>
  </property>
  <property>
    <name>mapred.reduce.tasks</name>
    <value>4</value>
  </property>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>Master:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>Master:19888</value>
  </property>
</configuration>

This is the configuration for MapReduce tasks. Because Hadoop 2.x uses the YARN framework, the mapreduce.framework.name property must be set to yarn to obtain a distributed deployment. mapred.map.tasks and mapred.reduce.tasks are the numbers of map and reduce tasks respectively; for what map and reduce actually are, please refer to other material. The remaining properties are port configurations for some of the processes, all of them on the master host. For the detailed mapred-site.xml parameters, see: http://hadoop.apache.org/docs/r2.7.3/hadoop-mapreduce-client/hadoop-mapreduce-client-core/mapred-default.xml

(7) Configuration file 7: yarn-site.xml

<configuration>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>Master:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>Master:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>Master:8088</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>Master:8031</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>Master:8033</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
</configuration>

This file configures the YARN framework; it mainly tells the YARN processes where to start.

4.3 Copy the configured Hadoop directory directly to the other slave nodes (run the following commands on the Master node):

scp -r ~/hadoop-2.7.3 hadoop@Slave0:~/
scp -r ~/hadoop-2.7.3 hadoop@Slave1:~/

4.4 Start and verify

(1) Format the NameNode (run the following command in the Hadoop installation folder):

bin/hdfs namenode -format

(2) Start HDFS:

sbin/start-dfs.sh

At this point the processes running on Master are: NameNode, SecondaryNameNode. The process running on Slave0 and Slave1 is: DataNode.

(3) Start YARN:

sbin/start-yarn.sh

At this point the processes running on Master are: NameNode, SecondaryNameNode, ResourceManager. The processes running on Slave0 and Slave1 are: DataNode, NodeManager.

(4) Check the startup results.

View the cluster status: bin/hdfs dfsadmin -report
View the file block composition: bin/hdfs fsck / -files -blocks
View HDFS: http://192.168.56.109:50070 (host IP)
View the ResourceManager: http://192.168.56.109:8088 (host IP)
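The lists of expected daemons above can be checked quickly with the jps command that ships with the JDK (this is not mentioned in the original text, just a common convenience). The loop below is a small sketch, run from Master, assuming the hostnames and the hadoop user used throughout this article; if jps is not found over a non-interactive SSH session, simply run jps locally on each node instead.

for host in Master Slave0 Slave1; do
    echo "== $host =="
    # expect NameNode, SecondaryNameNode and ResourceManager on Master,
    # and DataNode, NodeManager on the slaves
    ssh hadoop@"$host" jps
done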
5. References

(1) http://www.aboutyun.com/thread-11824-1-1.html
(2) http://jingyan.baidu.com/article/27fa73269c02fe46f9271f45.html
(3) http://www.cnblogs.com/yjmyzz/p/4280069.html
(4) http://www.cnblogs.com/yjmyzz/p/4481720.html
(5) http://blog.csdn.net/m1213642578/article/details/52468829?locationnum=3
(6) http://hadoop.apache.org/docs/r2.7.3/hadoop-project-dist/hadoop-hdfs/ExtendedAttributes.html
(7) http://blog.csdn.net/shoubuliaolebu/article/details/43575027
(8) http://blog.csdn.net/yangjl38/article/details/7583374
