This article describes how to build a basic Hadoop HDFS distributed environment suitable for production use. It is a personal summary and, I hope, a convenient reference for newcomers.
1. Basic Environment

Before installing Hadoop on Linux, you need to install two programs first:
1.1 Installation Instructions

1. JDK 1.6 or later (this article uses JDK 1.7);
2. SSH (Secure Shell); OpenSSH is recommended.

A brief explanation of why these two programs are needed:

1. Hadoop is developed in Java, so the JDK is required both to compile Hadoop and to run MapReduce jobs.
2. Hadoop uses SSH to launch the daemons on each host in the slave list, so SSH must be installed even for a pseudo-distributed installation, because Hadoop does not distinguish between clustered and pseudo-distributed modes. In pseudo-distributed mode Hadoop behaves exactly as in a cluster: it starts the processes on the hosts listed in conf/slaves in sequence, the only difference being that the slave is localhost (i.e. the machine itself). So even for pseudo-distributed Hadoop, SSH is a must.
1.2 Installation and Configuration of the JDK

1. Upload the package. I used the WinSCP tool to upload jdk-7u76-linux-x64.tar.gz.
2. Extract the package:
   tar -zxvf jdk-7u76-linux-x64.tar.gz
3. Move the extracted directory to /usr/local:
   mv /lutong/jdk1.7.0_76/ /usr/local/
4. Configure the environment variables:
   vim /etc/profile
5. Reload /etc/profile to make the configuration take effect:
   source /etc/profile
6. Check whether the configuration took effect:
   echo $PATH
   java -version

If the expected version information appears, the JDK is configured.

2. Host Configuration

Because the Hadoop cluster I am building contains three machines, the hosts file on each machine must be modified:

vim /etc/hosts

If you do not have sufficient permissions, switch to the root user. Add the same host entries on all three machines. You can set the server names to master, slave1 and slave2 with the hostname command, for example:

hostname master
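The screenshots showing the exact /etc/profile additions and /etc/hosts entries have not survived, so here is a minimal sketch. The JDK path matches the move step above; the IP addresses are placeholders, not values from the original article, and must be replaced with your machines' real addresses.

```shell
# Lines to append to /etc/profile (JDK path from the move step above)
export JAVA_HOME=/usr/local/jdk1.7.0_76
export PATH=$JAVA_HOME/bin:$PATH

# Example /etc/hosts entries for the three machines
# (the IP addresses below are placeholders, not from the original article):
#   192.168.1.100 master
#   192.168.1.101 slave1
#   192.168.1.102 slave2

echo "$JAVA_HOME"
```

After sourcing /etc/profile, `java -version` should report the 1.7 JDK if the path is correct.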
3. Installation and Configuration of Hadoop
3.1 Creating the File Directories

For ease of administration, create directories on master for the HDFS NameNode, DataNode and temporary files:

/data/hdfs/name
/data/hdfs/data
/data/hdfs/tmp

Then copy these directories to the same path on slave1 and slave2 with the scp command.
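The steps above can be sketched as the following commands. The article uses /data/hdfs as the base; the base directory here is made overridable so the commands can be tried anywhere, and the scp targets assume the hostnames configured in the hosts file.

```shell
# Create the NameNode, DataNode and temp directories.
# The article uses /data/hdfs; the demo default avoids needing root.
BASE=${HDFS_BASE:-/tmp/hdfs-demo}
mkdir -p "$BASE/name" "$BASE/data" "$BASE/tmp"
ls "$BASE"

# Then copy the same layout to the two slaves (hostnames from /etc/hosts):
#   scp -r /data/hdfs slave1:/data/
#   scp -r /data/hdfs slave2:/data/
```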
3.2 Downloads
First, go to the Apache website (http://www.apache.org/dyn/closer.cgi/hadoop/common/) and select a recommended download mirror (http://mirrors.hust.edu.cn/apache/hadoop/common/). I chose the hadoop-2.7.1 release and downloaded it to the /data directory on the master machine using the following command:
wget http://mirror.bit.edu.cn/apache/hadoop/common/hadoop-2.7.1/hadoop-2.7.1.tar.gz

Then unzip hadoop-2.7.1.tar.gz into the /data directory:

tar -zxvf hadoop-2.7.1.tar.gz
3.3 Configuring Environment Variables

Go back to the /data directory and configure the Hadoop environment variables:

vim /etc/profile

Add the Hadoop entries to /etc/profile, then make them take effect immediately:

source /etc/profile

Then run the hadoop command; if usage hints appear, the configuration is in effect.
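The profile lines for Hadoop were shown in a screenshot that has not survived; a minimal sketch, assuming the install location from section 3.2:

```shell
# Lines to append to /etc/profile for Hadoop (install path from section 3.2)
export HADOOP_HOME=/data/hadoop-2.7.1
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH

# Quick check that the bin directory is now on PATH
echo "$PATH" | grep -q "$HADOOP_HOME/bin" && echo "HADOOP_HOME is on PATH"
```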
3.4 Configuring Hadoop

Go to the hadoop-2.7.1 configuration directory:

cd /data/hadoop-2.7.1/etc/hadoop

and modify the core-site.xml, hdfs-site.xml, mapred-site.xml and yarn-site.xml files.

3.4.1 Modify core-site.xml: vim core-site.xml
3.4.2 Modify hdfs-site.xml: vim hdfs-site.xml
3.4.3 Modify mapred-site.xml: vim mapred-site.xml
3.4.4 Modify yarn-site.xml: vim yarn-site.xml

Because we have already configured the JAVA_HOME environment variable, hadoop-env.sh and yarn-env.sh need no modification: both contain the line export JAVA_HOME=${JAVA_HOME}.

Finally, copy the entire hadoop-2.7.1 folder and its subfolders to the same directory on slave1 and slave2 using scp:

scp -r /data/hadoop-2.7.1 [email protected]:/data
scp -r /data/hadoop-2.7.1 [email protected]:/data

5. Running Hadoop

5.1 Format the NameNode

Execute the command:

hadoop namenode -format

The execution process looks like this:
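The screenshots with the concrete property values for the four files in section 3.4 have not survived. Below is a minimal sketch of typical values for core-site.xml and hdfs-site.xml, assumed (not taken from the original article) to match the directories created in section 3.1 and the hostname master; treat it as a starting point rather than the article's exact configuration.

```xml
<!-- core-site.xml (sketch; values assumed, not from the original screenshots) -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://master:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>file:/data/hdfs/tmp</value>
  </property>
</configuration>

<!-- hdfs-site.xml (sketch; paths match section 3.1, two DataNodes) -->
<configuration>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/data/hdfs/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/data/hdfs/data</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
</configuration>
```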
The final execution results are as follows:
5.2 Start the NameNode

Execute the following command:

/data/hadoop-2.7.1/sbin/hadoop-daemon.sh start namenode
Execute the jps command on master and get the following result:
5.3 Start the DataNodes

Execute the following command:

/data/hadoop-2.7.1/sbin/hadoop-daemons.sh start datanode

The execution results are as follows:
Master
Slave1
Slave2
This indicates that the DataNodes on slave1 and slave2 are running properly. The two startup steps above (NameNode and DataNodes) can be replaced with the single start-dfs.sh script:
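A sketch of that combined startup, guarded so it only actually starts anything if the script exists at the path used in this article:

```shell
# Start the NameNode and all DataNodes with one script instead of
# hadoop-daemon.sh / hadoop-daemons.sh (install path from section 3.2).
HADOOP_SBIN=${HADOOP_SBIN:-/data/hadoop-2.7.1/sbin}
if [ -x "$HADOOP_SBIN/start-dfs.sh" ]; then
  "$HADOOP_SBIN/start-dfs.sh"
else
  echo "start-dfs.sh not found under $HADOOP_SBIN; adjust HADOOP_SBIN"
fi
```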
5.4 Running YARN

Running YARN is similar to running HDFS. You could start the ResourceManager and the NodeManagers one daemon at a time, but we will not go into that here; instead use the simpler start-yarn.sh script. Then run jps on master:
Indicates that the ResourceManager is operating normally.
Run jps on both slaves and you will see the NodeManager running normally, for example:
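The start-yarn.sh invocation described above can be sketched in the same guarded style as the HDFS startup:

```shell
# Start the ResourceManager and NodeManagers with one script
# (install path from section 3.2).
HADOOP_SBIN=${HADOOP_SBIN:-/data/hadoop-2.7.1/sbin}
if [ -x "$HADOOP_SBIN/start-yarn.sh" ]; then
  "$HADOOP_SBIN/start-yarn.sh"
else
  echo "start-yarn.sh not found under $HADOOP_SBIN; adjust HADOOP_SBIN"
fi
```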
6. Testing Hadoop

6.1 Testing HDFS
The final step is to test whether the Hadoop cluster is working properly. The test commands are as follows:
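The exact commands were shown in a screenshot that has not survived; a typical smoke test creates a directory in HDFS, uploads a file and lists it back. The block is guarded so it is a no-op on machines without the hdfs client.

```shell
# Simple HDFS smoke test (requires the cluster built in this article)
if command -v hdfs >/dev/null 2>&1; then
  hdfs dfs -mkdir -p /test          # create a test directory in HDFS
  hdfs dfs -put /etc/hosts /test/hosts   # upload a small local file
  hdfs dfs -ls /test                # list it back to confirm the write
  status="ran"
else
  echo "hdfs client not found; run this on the master node"
  status="skipped"
fi
```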
6.2 Testing YARN

You can verify YARN by opening its web management interface; the ResourceManager's web UI listens on port 8088 by default (http://master:8088), as shown below:
6.3 Testing MapReduce
I did not want to write MapReduce code just for a test. Fortunately, the Hadoop installation package provides ready-made examples in Hadoop's share/hadoop/mapreduce directory. Run one of the examples:
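The exact example run was shown in a screenshot that has not survived; a typical choice is the bundled wordcount job. The jar name matches the 2.7.1 layout, while /test/hosts and /test/out are hypothetical HDFS paths used only for illustration.

```shell
# Run the bundled wordcount example (jar path follows the 2.7.1 layout).
# /test/hosts and /test/out are illustrative HDFS paths, not from the article.
EXAMPLES_JAR=/data/hadoop-2.7.1/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar
if [ -f "$EXAMPLES_JAR" ] && command -v hadoop >/dev/null 2>&1; then
  hadoop jar "$EXAMPLES_JAR" wordcount /test/hosts /test/out
  hadoop fs -cat /test/out/part-r-00000   # inspect the reducer output
else
  echo "examples jar or hadoop command not found; run this on the master node"
fi
```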
7. Problems Encountered While Configuring and Running Hadoop
7.1 JAVA_HOME Is Not Set

On startup, the following error is reported:
You need to edit /data/hadoop-2.7.1/etc/hadoop/hadoop-env.sh and set the JAVA_HOME path explicitly.
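The fix is to replace the ${JAVA_HOME} indirection with a hard-coded path, the one where the JDK was installed earlier in this article:

```shell
# In /data/hadoop-2.7.1/etc/hadoop/hadoop-env.sh, replace
#   export JAVA_HOME=${JAVA_HOME}
# with an explicit path (the JDK location used earlier in this article):
export JAVA_HOME=/usr/local/jdk1.7.0_76

echo "$JAVA_HOME"
```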
7.2 Incompatible clusterIDs
Since configuring a Hadoop cluster is not an overnight job, it usually involves repeated cycles of reconfiguring, formatting and running. During this process the DataNode may fail to start, and checking its log often reveals the following problem:
This issue occurs because each time the NameNode is formatted a new cluster ID is generated, so you need to clean out the data in the data directory on the failing node (such as the /home/jiaan.gja/hdfs/data directory I created).
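A sketch of that cleanup, with the data directory made overridable; in Hadoop 2.x the DataNode keeps the stale clusterID in the VERSION file under the data directory's current subdirectory, and recreates it on the next start. The demo default below avoids touching real data by accident.

```shell
# Remove stale DataNode state so the node re-registers with the new clusterID.
# DATA_DIR must match dfs.datanode.data.dir on the failing node
# (e.g. /home/jiaan.gja/hdfs/data in the article); the demo default is safe.
DATA_DIR=${DATA_DIR:-/tmp/hdfs-demo/data}
mkdir -p "$DATA_DIR/current"   # simulate existing state for the demo
rm -rf "$DATA_DIR/current"     # the VERSION file with the old clusterID lives here
echo "cleaned $DATA_DIR/current"
```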
7.3 NativeCodeLoader Warning
When testing Hadoop, a careful reader may notice a warning message in the output. It comes from NativeCodeLoader and typically says that the native-hadoop library could not be loaded for your platform and that built-in Java classes will be used instead; it does not affect functionality.
Construction of Hadoop2.7.1 cluster environment under Linux (Super detailed version)