Hadoop Configuration Process in Practice


1 Hadoop configuration

Caveat: turn off all firewalls on every node.

Server    IP           System
master    10.0.0.9     CentOS 6.0 x64
slave1    10.0.0.11    CentOS 6.0 x64
slave2    10.0.0.12    CentOS 6.0 x64

Hadoop version: hadoop-0.20.2.tar.gz

1.1 On master (the operations on slave1 and slave2 are the same as below)

# vi /etc/hosts    (the same configuration on all three machines)
10.0.0.9 master
10.0.0.11 slave1
10.0.0.12 slave2

# vi /etc/sysconfig/network

NETWORKING=yes
HOSTNAME=<host name>

# reboot
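
A quick sanity check after the reboot (a minimal sketch, assuming the /etc/hosts entries above) is to make sure every host name resolves from every machine:

$ ping -c 1 master
$ ping -c 1 slave1
$ ping -c 1 slave2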

1.2 Log in as root and install scp (on master, slave1 and slave2)

# yum install openssh-clients

1.3 Log in as root and create the hadoop user (on master, slave1 and slave2)

# useradd hadoop
# passwd hadoop    (enter 111111 as the password)

1.4 Configuring SSH authentication

1.4.1 On master

# su - hadoop    (switch to the hadoop user and its home directory)
$ ssh-keygen -t rsa    (creates the .ssh directory; press Enter at every prompt)
$ cd /home/hadoop/.ssh

1.4.2 On slave1

# su - hadoop    (switch to the hadoop user and its home directory)
$ ssh-keygen -t rsa    (creates the .ssh directory; press Enter at every prompt)

1.4.3 On slave2

# su - hadoop    (switch to the hadoop user and its home directory)
$ ssh-keygen -t rsa    (creates the .ssh directory; press Enter at every prompt)

1.4.4 On master

$ scp -r id_rsa.pub hadoop@slave1:/home/hadoop/.ssh/authorized_keys_m
Copy master's public key to the hadoop user on slave1, saving it there as authorized_keys_m.
Enter the password: 111111

$ scp -r id_rsa.pub hadoop@slave2:/home/hadoop/.ssh/authorized_keys_m
Copy master's public key to the hadoop user on slave2, saving it there as authorized_keys_m.

Enter the password: 111111

1.4.5 On slave1

$ cd /home/hadoop/.ssh

$ scp -r id_rsa.pub hadoop@master:/home/hadoop/.ssh/authorized_keys_s1
Upload slave1's public key to the hadoop user on master.

$ cat id_rsa.pub >> authorized_keys_m    (append the local key to authorized_keys_m)

$ cp authorized_keys_m authorized_keys

$ rm -rf authorized_keys_m

1.4.6 On slave2

$ cd /home/hadoop/.ssh

$ scp -r id_rsa.pub hadoop@master:/home/hadoop/.ssh/authorized_keys_s2
Upload slave2's public key to the hadoop user on master.

$ cat id_rsa.pub >> authorized_keys_m    (append the local key to authorized_keys_m)

$ cp authorized_keys_m authorized_keys

$ rm -rf authorized_keys_m

1.4.7 On master

$ cd /home/hadoop/.ssh

$ cat id_rsa.pub >> authorized_keys_s1    (append the local key to authorized_keys_s1)

$ cat authorized_keys_s2 >> authorized_keys_s1

$ cp authorized_keys_s1 authorized_keys

$ rm -rf authorized_keys_s1 authorized_keys_s2
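
At this point each slave's authorized_keys holds master's public key (plus its own), and master's authorized_keys holds the keys of all three machines. A quick way to verify password-less SSH (a minimal sketch, assuming the hostnames above) is to run, as the hadoop user on master:

$ ssh slave1 hostname
$ ssh slave2 hostname

Both commands should print the remote host name without prompting for a password; the same test can be repeated from slave1 and slave2 towards master.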

1.5 Log in as root and install the JDK (on master, slave1 and slave2)

# yum install java-1.6.0-openjdk-devel

With this installation the Java executables are automatically added to the /usr/bin/ directory.
Verify with the shell command java -version and check that it reports the expected version number.

1.6 Log in as root and install Hadoop (on master; I also did this on slave1 and slave2 myself, though I am not sure it is required there)

# mount -t auto /dev/sdb /mnt    (only needed if installing from a USB stick)

# cp hadoop-0.20.2.tar.gz /home/hadoop

# cd /home/hadoop

# tar -vxzf hadoop-0.20.2.tar.gz -C /home/hadoop

Modify /etc/profile on master and add the following:
export HADOOP_HOME=/home/hadoop/hadoop-0.20.2
export PATH=$PATH:$HADOOP_HOME/bin

Then run

# source /etc/profile    (to make it take effect)
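
To confirm that the new environment variables are in effect (a minimal sketch):

$ which hadoop
$ hadoop version

which hadoop should point into /home/hadoop/hadoop-0.20.2/bin, and hadoop version should report release 0.20.2.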

1.7 Log in as root and configure Hadoop (on master)

Configure the conf/hadoop-env.sh file:

# Add the following line, adjusting the path to your own JDK installation location:
export JAVA_HOME=/usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64

Test the Hadoop installation (as the hadoop user):

$ hadoop jar hadoop-0.20.2-examples.jar wordcount conf/ /tmp/out
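
Because the cluster configuration has not been written yet, this test job runs in local (standalone) mode and writes its result to /tmp/out on the local filesystem. A minimal way to inspect it (the part-r-00000 file name is the usual default and is an assumption here):

$ ls /tmp/out
$ cat /tmp/out/part-r-00000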

1.8 Cluster configuration (the same on all nodes; alternatively, configure on master and copy to the other machines)

1.8.1 Configuration file: conf/core-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://master:49000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoop/hadoop_home/var</value>
  </property>
</configuration>

1) fs.default.name is the URI of the NameNode, in the form hdfs://hostname:port/.

2) hadoop.tmp.dir is Hadoop's default temporary path. It is best to set it explicitly. If a newly added node (or, in other cases, an existing DataNode) inexplicably fails to start, delete the tmp directory configured here on that node. Note that if you delete this directory on the NameNode machine, you must re-run the NameNode format command.
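
As a concrete illustration of that recovery step for a DataNode that will not start (a sketch, assuming the hadoop.tmp.dir value configured above):

$ bin/stop-all.sh                          (on master)
$ rm -rf /home/hadoop/hadoop_home/var      (on the affected DataNode)
$ bin/start-all.sh                         (on master)

If the directory was removed on the NameNode machine instead, re-run bin/hadoop namenode -format before starting the cluster again.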

1.8.2 Configuration file: conf/mapred-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>master:49001</value>
  </property>
  <property>
    <name>mapred.local.dir</name>
    <value>/home/hadoop/hadoop_home/var</value>
  </property>
</configuration>

1) mapred.job.tracker is the host (or IP) and port of the JobTracker, in the form host:port.

1.8.3 Configuration file: conf/hdfs-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>dfs.name.dir</name>
    <value>/home/hadoop/name1</value>
    <description></description>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/home/hadoop/data1</value>
    <description></description>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
</configuration>

1) dfs.name.dir is the local filesystem path where the NameNode persistently stores the namespace and transaction log. When this value is a comma-delimited list of directories, the name table data is replicated into all of them for redundancy.
2) dfs.data.dir is the local filesystem path (or comma-separated list of paths) where the DataNode stores block data. When it is a list of directories, the data is spread across all of them, usually on different devices.
3) dfs.replication is the number of replicas to keep for each block. The default is 3, and there will be errors if this number is larger than the number of DataNodes in the cluster.

Note: The name1, name2, data1 and data2 directories must not be created in advance; Hadoop creates them automatically when you format the filesystem, and creating them beforehand causes problems.
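
For reference, the redundant, comma-delimited form mentioned above would look roughly like this (a sketch; the name2 and data2 paths are only illustrative):

<property>
  <name>dfs.name.dir</name>
  <value>/home/hadoop/name1,/home/hadoop/name2</value>
</property>
<property>
  <name>dfs.data.dir</name>
  <value>/home/hadoop/data1,/home/hadoop/data2</value>
</property>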

1.8.4 Configure the masters and slaves files (master and slave nodes)

Configure conf/masters and conf/slaves to define the master and slave nodes. It is best to use host names, and every machine must be reachable from the others by host name.

vi masters:
Input:

node1

vi slaves:
Input:

node2
node3

When configuration is finished, copy the configured Hadoop folder to the other cluster machines, and make sure the configuration above is also correct for them; for example, if another machine's Java installation path differs, modify its conf/hadoop-env.sh accordingly.

$ scp -r /home/hadoop/hadoop-0.20.2 hadoop@slave1:/home/hadoop

$ scp -r /home/hadoop/hadoop-0.20.2 hadoop@slave2:/home/hadoop

1.8.5 As root, change directory permissions (on master, slave1 and slave2; this step is optional)

# cd /home/

# chown -R hadoop:hadoop hadoop

# chmod ugo+rwx hadoop
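
To confirm the ownership change took effect (a minimal check):

$ ls -ld /home/hadoop

The directory should now be listed as owned by hadoop:hadoop.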

1.9 Starting Hadoop

Precautions:

Formatting and starting Hadoop only needs to be done on master; the slaves are started automatically along with master and require no manual action.

1.9.1 (as the hadoop user) Format a new distributed filesystem (make sure every firewall such as iptables is shut down)

First, format a new distributed filesystem:
$ bin/hadoop namenode -format
On success the system output looks like this:

12/02/06 00:46:50 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = ubuntu/127.0.1.1
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 0.20.203.0
STARTUP_MSG:   build = http://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20-security-203 -r 1099333; compiled by 'oom' on Wed May 4 07:57:50 PDT 2011
************************************************************/
12/02/06 00:46:50 INFO namenode.FSNamesystem: fsOwner=root,root
12/02/06 00:46:50 INFO namenode.FSNamesystem: supergroup=supergroup
12/02/06 00:46:50 INFO namenode.FSNamesystem: isPermissionEnabled=true
12/02/06 00:46:50 INFO common.Storage: Image file of size 94 saved in 0 seconds.
12/02/06 00:46:50 INFO common.Storage: Storage directory /opt/hadoop/hadoopfs/name1 has been successfully formatted.
12/02/06 00:46:50 INFO common.Storage: Image file of size 94 saved in 0 seconds.
12/02/06 00:46:50 INFO common.Storage: Storage directory /opt/hadoop/hadoopfs/name2 has been successfully formatted.
12/02/06 00:46:50 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at v-jiwan-ubuntu-0/127.0.0.1
************************************************************/

Check the output to make sure the distributed filesystem was formatted successfully.
After execution you can see the /home/hadoop/name1 directory on the master machine (and /home/hadoop/name2 as well, if a second directory is listed in dfs.name.dir). Hadoop is then started on the master node, and the master node starts Hadoop on all the slave nodes.
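
A quick way to confirm the format succeeded (a minimal sketch, using the dfs.name.dir configured earlier):

$ ls /home/hadoop/name1

A freshly formatted name directory contains a current/ subdirectory holding the fsimage and edits files.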

1.9.2 (as the hadoop user) Start all nodes

Starting method 1:

$ bin/start-all.sh    (starts both HDFS and Map/Reduce)
System output:

starting namenode, logging to /usr/local/hadoop/logs/hadoop-hadoop-namenode-ubuntu.out
node2: starting datanode, logging to /usr/local/hadoop/logs/hadoop-hadoop-datanode-ubuntu.out
node3: starting datanode, logging to /usr/local/hadoop/logs/hadoop-hadoop-datanode-ubuntu.out
node1: starting secondarynamenode, logging to /usr/local/hadoop/logs/hadoop-hadoop-secondarynamenode-ubuntu.out
starting jobtracker, logging to /usr/local/hadoop/logs/hadoop-hadoop-jobtracker-ubuntu.out
node2: starting tasktracker, logging to /usr/local/hadoop/logs/hadoop-hadoop-tasktracker-ubuntu.out
node3: starting tasktracker, logging to /usr/local/hadoop/logs/hadoop-hadoop-tasktracker-ubuntu.out

As you can see in the slaves' output above, each slave automatically formats its storage directory (specified by dfs.data.dir) if it is not formatted already, and also creates the directory if it does not exist yet.


After execution, the /home/hadoop/data1 directory configured in dfs.data.dir appears on the DataNode machines (the slaves, node2 and node3).

Starting method 2:

Starting the Hadoop cluster this way means starting the HDFS cluster and the Map/Reduce cluster separately.

On the designated NameNode, run the following command to start HDFS:
$ bin/start-dfs.sh    (starts the HDFS cluster on its own)

The bin/start-dfs.sh script refers to the ${HADOOP_CONF_DIR}/slaves file on the NameNode and starts the DataNode daemon on every slave listed there.

On the designated JobTracker, run the following command to start Map/Reduce:
$ bin/start-mapred.sh    (starts Map/Reduce on its own)

The bin/start-mapred.sh script refers to the ${HADOOP_CONF_DIR}/slaves file on the JobTracker and starts the TaskTracker daemon on every slave listed there.

1.9.3 Stop all nodes

Hadoop is stopped from the master node; the master shuts down Hadoop on all the slave nodes.

$ bin/stop-all.sh

The logs of the Hadoop daemons are written to the ${HADOOP_LOG_DIR} directory (default: ${HADOOP_HOME}/logs).

${HADOOP_HOME} is the installation path.
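
When a daemon fails to start, its log is usually the first place to look; for example (a sketch; the exact file name depends on the user and host name, as in the sample output above):

$ tail -n 50 $HADOOP_HOME/logs/hadoop-hadoop-namenode-master.log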

1.10 Testing

Note: If you are accessing the cluster from a Windows client, add the master, slave1 and other host/IP entries to the hosts file under C:\Windows.
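
On a typical Windows installation that file is C:\Windows\System32\drivers\etc\hosts, and the entries simply mirror the ones used on the cluster, for example:

10.0.0.9 master
10.0.0.11 slave1
10.0.0.12 slave2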

1) Browse the web interfaces of the NameNode and the JobTracker; by default their addresses are:

NameNode  - http://node1:50070/
JobTracker - http://node1:50030/

3) Use netstat -nat to check whether ports 49000 and 49001 are in use.

4) Use jps to view processes.

To check whether the daemons are running, use the jps command (the ps utility for JVM processes). It lists the five Hadoop daemons and their process identifiers.

5) Copy the input files to the distributed filesystem:
$ bin/hadoop fs -mkdir input
$ bin/hadoop fs -put conf/core-site.xml input

Run the sample program shipped with the release:
$ bin/hadoop jar hadoop-0.20.2-examples.jar grep input output 'dfs[a-z.]+'

6. Supplement
Q: What does bin/hadoop jar hadoop-0.20.2-examples.jar grep input output 'dfs[a-z.]+' mean?
A: bin/hadoop jar (run a jar with Hadoop) hadoop-0.20.2-examples.jar (the name of the jar) grep (the example class to run, followed by its arguments) input output 'dfs[a-z.]+'
The whole command runs the grep example from the Hadoop sample programs, taking the input directory on HDFS as input and writing the result to the output directory.
Q: What is grep?
A: A Map/Reduce program that counts the matches of a regex in the input.

To view the output file:

Copy the output files from the distributed filesystem to the local filesystem and view them:
$ bin/hadoop fs -get output output
$ cat output/*

Or

View the output files directly on the distributed filesystem:
$ bin/hadoop fs -cat output/*

The resulting counts:
$ bin/hadoop fs -cat output/part-00000
3 dfs.class
2 dfs.period
1 dfs.file
1 dfs.replication
1 dfs.servers
1 dfsadmin

Other viewing tools

On the NameNode, use the jps tool that ships with Java to view the processes:

$ /usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64/bin/jps
18978 JobTracker
21242 Jps
18899 SecondaryNameNode
18731 NameNode

Check the processes on each DataNode:

$ /usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64/bin/jps
17706 TaskTracker
20499 Jps
17592 DataNode

$ /usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64/bin/jps
28550 TaskTracker
28436 DataNode
30798 Jps

To view the cluster status on the NameNode:

$ hadoop dfsadmin -report
Configured Capacity: 123909840896 (115.4 GB)
Present Capacity: 65765638144 (61.25 GB)
DFS Remaining: 65765257216 (61.25 GB)
DFS Used: 380928 (372 KB)
DFS Used%: 0%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
Datanodes available: 2 (2 total, 0 dead)
