Hadoop series HDFS (Distributed File System) installation and configuration

Environment:
IP              Node
192.168.3.10    hdfs-master
192.168.3.11    hdfs-slave1
192.168.3.12    hdfs-slave2
1. Add the host entries on all machines
192.168.3.10 hdfs-master
192.168.3.11 hdfs-slave1
192.168.3.12 hdfs-slave2
# Description
// Host names must not contain underscores or other special characters; otherwise many errors can occur.
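For illustration, setting this up on each node might look like the following sketch. It assumes a CentOS/RHEL-style system (consistent with the rpm-based JDK install below); hostnamectl is only present on systemd-based releases, so adjust on older systems.
# On every node, append the entries to /etc/hosts:
cat >> /etc/hosts << 'EOF'
192.168.3.10 hdfs-master
192.168.3.11 hdfs-slave1
192.168.3.12 hdfs-slave2
EOF
# On each node, set its own hostname (use hdfs-slave1 / hdfs-slave2 on the slaves):
hostnamectl set-hostname hdfs-master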
2. Configure passwordless SSH login between the machines
Perform the following operations on 192.168.3.10:
/usr/bin/ssh-keygen -t rsa
ssh-copy-id -i /root/.ssh/id_rsa.pub 192.168.3.10    # type yes, then the root password of 192.168.3.10
ssh-copy-id -i /root/.ssh/id_rsa.pub 192.168.3.11    # type yes, then the root password of 192.168.3.11
ssh-copy-id -i /root/.ssh/id_rsa.pub 192.168.3.12    # type yes, then the root password of 192.168.3.12
ssh 192.168.3.10    # test the login; if no password is requested, passwordless login works
ssh 192.168.3.11    # test the login; if no password is requested, passwordless login works
ssh 192.168.3.12    # test the login; if no password is requested, passwordless login works
3. Install the JDK and configure the Java environment on all machines
cd /root/soft
rpm -ivh jdk-7u51-linux-x64.rpm
3.1 Open /etc/profile and append the following
export JAVA_HOME=/usr/java/default
export CLASSPATH=.:$JAVA_HOME/jre/lib/rt.jar:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export PATH=$PATH:$JAVA_HOME/bin
3.2 Reload the environment variables
source /etc/profile
java -version    # verify that the installation succeeded
4. Install the Hadoop software on all machines
4.1 Install Hadoop
mv /root/soft/hadoop-2.4.0.tar.gz /usr/local/
cd /usr/local/
tar zxvf hadoop-2.4.0.tar.gz
mv hadoop-2.4.0 hadoop
4.2 Open /etc/profile and append the following environment variables
export HADOOP_HOME=/usr/local/hadoop
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_YARN_HOME=$HADOOP_HOME
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export PATH=$PATH:$HADOOP_HOME/bin
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"
4.3 Reload the environment variables
source /etc/profile
4.4 Create the Hadoop data directories
mkdir -p /data/hadoop/{tmp,name,data,var}
5. Configure Hadoop on 192.168.3.10
5.1 Configure hadoop-env.sh
vim /usr/local/hadoop/etc/hadoop/hadoop-env.sh
Change export JAVA_HOME=${JAVA_HOME} to export JAVA_HOME=/usr/java/default
5.2 Configure the slaves file
vim /usr/local/hadoop/etc/hadoop/slaves
# Add the following content
hdfs-slave1
hdfs-slave2
5.3 Configure the core-site.xml file
vim /usr/local/hadoop/etc/hadoop/core-site.xml
# Add the following content
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://hdfs-master:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/data/hadoop/tmp</value>
  </property>
</configuration>
# Description
// fs.default.name is the URI of the NameNode, in the form hdfs://hostname:port/
// hadoop.tmp.dir is Hadoop's default temporary directory. If a newly added DataNode, or a DataNode that fails to start for no obvious reason, cannot come up, it is recommended to delete this tmp directory on that node. However, if you delete this directory on the NameNode machine, you must re-run the NameNode format command.
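A minimal sketch of that recovery procedure, assuming the directory layout and script locations configured in this walkthrough:
# On a DataNode that will not start: stop it, clear the temporary directory, start it again
/usr/local/hadoop/sbin/hadoop-daemon.sh stop datanode
rm -rf /data/hadoop/tmp/*
/usr/local/hadoop/sbin/hadoop-daemon.sh start datanode
# If the tmp directory was deleted on the NameNode machine instead, the namespace must be
# re-formatted, which destroys the existing HDFS metadata:
hdfs namenode -format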
5.4 Configure the hdfs-site.xml file
vim /usr/local/hadoop/etc/hadoop/hdfs-site.xml
# Add the following content
<configuration>
  <property>
    <name>dfs.name.dir</name>
    <value>/data/hadoop/name</value>
    <description></description>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/data/hadoop/data</value>
    <description></description>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
</configuration>
# Description
// dfs.name.dir is the local filesystem path where the NameNode persistently stores the namespace and transaction logs. If the value is a comma-separated list of directories, the name table is replicated to all of them for redundancy.
// dfs.replication is the number of replicas kept for each block. The default is 3; if it is larger than the number of DataNodes in the cluster, errors will occur. (There are only 2 DataNodes here, so it is set to 2.)
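Once the cluster is running (steps 7-9 below), the effective replication can be double-checked with fsck, for example:
hdfs fsck / -files -blocks    # lists each file's blocks together with their replication factor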
5.5 Configure the mapred-site.xml file
cp /usr/local/hadoop/etc/hadoop/mapred-site.xml.template /usr/local/hadoop/etc/hadoop/mapred-site.xml
vim /usr/local/hadoop/etc/hadoop/mapred-site.xml
# Add the following content
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>hdfs-master:9001</value>
  </property>
  <property>
    <name>mapred.local.dir</name>
    <value>/data/hadoop/var</value>
  </property>
</configuration>
# Description
// mapred.job.tracker is the host (or IP) and port of the JobTracker, in the form host:port. The /data/hadoop/var directory must be created in advance, and its ownership must be adjusted with chown -R.
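The directory itself is already created in step 4.4; a minimal sketch of the ownership adjustment, assuming the Hadoop daemons run as root as elsewhere in this walkthrough (substitute the real user and group otherwise):
mkdir -p /data/hadoop/var
chown -R root:root /data/hadoop/var    # give the directory to the user that runs the Hadoop daemons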
5.6 Configure the yarn-site.xml file
vim /usr/local/hadoop/etc/hadoop/yarn-site.xml
# Add the following content
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
5.7 Synchronize the Hadoop directory to hdfs-slave1 and hdfs-slave2
scp -r /usr/local/hadoop root@192.168.3.11:/usr/local/
scp -r /usr/local/hadoop root@192.168.3.12:/usr/local/
6. Format the distributed file system
# Format HDFS on 192.168.3.10
hdfs namenode -format
7. Start the Hadoop cluster
# Start Hadoop on 192.168.3.10
/usr/local/hadoop/sbin/start-all.sh
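In Hadoop 2.x, start-all.sh is marked as deprecated; starting HDFS and YARN with the two dedicated scripts should be equivalent:
/usr/local/hadoop/sbin/start-dfs.sh     # starts the NameNode, SecondaryNameNode, and DataNodes
/usr/local/hadoop/sbin/start-yarn.sh    # starts the ResourceManager and NodeManagers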
8. Check the processes on the master and the slaves
# Check the Java processes on 192.168.3.10:
[root@hdfs-master hadoop]# jps
8030 NameNode
8213 SecondaryNameNode
8615 Jps
8356 ResourceManager
# Check the Java processes on the DataNode slave nodes:
[root@hdfs-slave1 hadoop]# jps
2821 Jps
2702 NodeManager
2586 DataNode
Note: if all of these processes are running on each node, the cluster has been deployed successfully.
9. Simple HDFS verification
# Test HDFS distributed storage on 192.168.3.10:
// Create directories
hdfs dfs -mkdir /test
hdfs dfs -mkdir /test/01
// List the directory
[root@hdfs-master sbin]# hdfs dfs -ls /
drwxr-xr-x   - root supergroup          0 /test
// Recursively list the directory, including subdirectories
[root@hdfs-master sbin]# hdfs dfs -ls -R /
drwxr-xr-x   - root supergroup          0 /test
drwxr-xr-x   - root supergroup          0 /test/01
// Upload a file
hdfs dfs -put /root/soft/aa.txt /test
[root@hdfs-master sbin]# hdfs dfs -ls -R /test
drwxr-xr-x   - root supergroup          0 /test/01
-rw-r--r--   2 root supergroup          4 /test/aa.txt
// Download the file
hdfs dfs -get /test/aa.txt /tmp/
[root@hdfs-master sbin]# ls /tmp/aa.txt
/tmp/aa.txt
// Delete the file
hdfs dfs -rm /test/aa.txt
[root@hdfs-master sbin]# hdfs dfs -ls -R /test
drwxr-xr-x   - root supergroup          0 /test/01
// Delete the directory
hdfs dfs -rm -r /test/01
[root@hdfs-master sbin]# hdfs dfs -ls -R /
drwxr-xr-x   - root supergroup          0 /test
// Change the file owner
hdfs dfs -chown -R root /test
// Change permissions
hdfs dfs -chmod -R 700 /test
// View the status of the whole cluster in a browser
http://192.168.3.10:50070
// View the status of the whole cluster from the command line
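For example (two common choices, assuming the PATH configured in step 4.2):
hdfs dfsadmin -report    # capacity, remaining space, and the state of every DataNode
yarn node -list          # the NodeManagers registered with the ResourceManager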
10. Stop the Hadoop cluster
# Stop Hadoop on 192.168.3.10
/usr/local/hadoop/sbin/stop-all.sh


Note: the warning "WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable" has not been resolved yet, but HDFS uploads and downloads work normally. If you know how to fix it, please leave a comment.
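A few hedged pointers: this warning usually just means the bundled native library does not match the platform, and the following may help confirm or work around it (checknative may not be available on every 2.x release).
hadoop checknative -a    # reports which native libraries can actually be loaded
# Some reports suggest pointing java.library.path at lib/native rather than lib:
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"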

This article is from the "Chengdu @ A like" blog; please keep this source when reposting: http://azhuang.blog.51cto.com/9176790/1551782

