Hadoop pseudo-distributed and fully distributed configuration


Three Hadoop modes:
Local mode: runs locally in a single JVM, without using the distributed file system.
Pseudo-distributed mode: all five daemons run on a single host.
Fully distributed mode: at least three nodes; in this setup the NameNode and JobTracker share one host, the SecondaryNameNode runs on a second host, and the DataNode and TaskTracker run on a third host.

Test environment:

CentOS 6 (kernel 2.6.32-358.el6.x86_64)

jdk-7u21-linux-x64.rpm

hadoop-0.20.2-cdh3u6.tar.gz

1. Hadoop pseudo-distributed mode configuration
[root@localhost ~]# rpm -ivh jdk-7u21-linux-x64.rpm
[root@localhost ~]# vim /etc/profile.d/java.sh
JAVA_HOME=/usr/java/latest
PATH=$JAVA_HOME/bin:$PATH
export JAVA_HOME PATH
[root@localhost ~]# tar xf hadoop-0.20.2-cdh3u6.tar.gz -C /usr/local/
[root@localhost ~]# cd /usr/local/
[root@localhost local]# ln -sv hadoop-0.20.2-cdh3u6 hadoop
[root@localhost ~]# vim /etc/profile.d/hadoop.sh
HADOOP_HOME=/usr/local/hadoop
PATH=$HADOOP_HOME/bin:$PATH
export HADOOP_HOME PATH

Test whether the JDK and Hadoop are correctly installed:
[root@localhost ~]# java -version
[root@localhost ~]# hadoop version
Create a user and change the ownership of the Hadoop files:
[root@localhost ~]# useradd hduser
[root@localhost ~]# passwd hduser
[root@localhost ~]# chown -R hduser.hduser /usr/local/hadoop/
Create a temporary Hadoop data storage directory:
[root@localhost ~]# mkdir -pv /hadoop/temp
[root@localhost ~]# chown -R hduser.hduser /hadoop/

Main scripts:
/usr/local/hadoop/bin/start-dfs.sh starts the NameNode, DataNode, and SecondaryNameNode processes
/usr/local/hadoop/bin/start-mapred.sh starts the JobTracker and TaskTracker processes
/usr/local/hadoop/bin/hadoop-daemon.sh starts a single process separately (see the sketch after this list)
/usr/local/hadoop/bin/start-all.sh starts all processes
/usr/local/hadoop/bin/stop-all.sh stops all processes
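For example, a single daemon can be started or stopped by name with hadoop-daemon.sh; a minimal sketch:
[hduser@localhost ~]$ hadoop-daemon.sh start namenode     start only the NameNode
[hduser@localhost ~]$ hadoop-daemon.sh stop namenode      stop only the NameNode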
Main configuration files:
/usr/local/hadoop/conf/masters saves the location of the SecondaryNameNode
/usr/local/hadoop/conf/slaves saves the locations of all slave nodes that run a TaskTracker and DataNode
/usr/local/hadoop/conf/core-site.xml defines system-level parameters
/usr/local/hadoop/conf/hdfs-site.xml HDFS settings
/usr/local/hadoop/conf/mapred-site.xml MapReduce settings, such as the default number of reduce tasks and the default upper and lower memory limits for tasks
/usr/local/hadoop/conf/hadoop-env.sh defines settings related to the Hadoop runtime environment (see the sketch after this list)
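For example, hadoop-env.sh is typically where JAVA_HOME is set for Hadoop; a minimal sketch, assuming the JDK is installed under /usr/java/latest as above:
[root@localhost conf]# vim hadoop-env.sh
export JAVA_HOME=/usr/java/latest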

To run Hadoop in pseudo-distributed mode, only the configuration files need to be modified.
[root@localhost conf]# vim core-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/hadoop/temp</value>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:8020</value>
  </property>
</configuration>


[root@localhost conf]# vim mapred-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:8021</value>
  </property>
</configuration>


[root@localhost conf]# vim hdfs-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>


Configure hduser to access the local machine through SSH without a password:
[hduser@localhost ~]$ ssh-keygen -t rsa -P ''
[hduser@localhost .ssh]$ ssh-copy-id -i id_rsa.pub hduser@localhost

[hduser@localhost ~]$ hadoop namenode -format     format the name node
[hduser@localhost ~]$ start-all.sh                start the services
[hduser@localhost ~]$ jps                         view the processes
NameNode
DataNode
JobTracker
TaskTracker
SecondaryNameNode
If all five of the processes above are running, the Hadoop pseudo-distributed configuration is successful.
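As an extra check (not part of the original steps), the overall HDFS status can be inspected with dfsadmin:
[hduser@localhost ~]$ hadoop dfsadmin -report     show capacity, live DataNodes, and replication status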

Common Hadoop commands:
[hduser@localhost ~]$ hadoop fs                        view help for the file system commands
[hduser@localhost ~]$ hadoop fs -mkdir test            create a directory on HDFS
[hduser@localhost ~]$ hadoop fs -ls                    list files or directories on HDFS
[hduser@localhost ~]$ hadoop fs -put test.txt test     upload a local file to HDFS

Use the example jobs shipped with Hadoop to test availability:
[hduser@localhost ~]$ hadoop jar /usr/local/hadoop/hadoop-examples-0.20.2-cdh3u6.jar               list the example programs in the jar
[hduser@localhost ~]$ hadoop jar /usr/local/hadoop/hadoop-examples-0.20.2-cdh3u6.jar wordcount     view the wordcount usage
Usage: wordcount <in> <out>
<in> is the input location on HDFS and <out> is the location on HDFS where results are saved (the output directory must not exist in advance).

[hduser@localhost ~]$ hadoop jar /usr/local/hadoop/hadoop-examples-0.20.2-cdh3u6.jar wordcount test wordcount-out
[hduser@localhost ~]$ hadoop job -list all             view executed jobs
[hduser@localhost ~]$ hadoop fs -ls wordcount-out      view the task output directory
[hduser@localhost ~]$ hadoop fs -cat wordcount-out/part-r-00000
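Each line of the result file contains a word, a tab, and its count; for example (hypothetical words and counts):
hadoop	3
hello	2
world	1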

View the web interfaces provided by Hadoop (the firewall must be disabled or opened for these ports to be reachable):
JobTracker HTTP server address and port; default 0.0.0.0:50030
TaskTracker HTTP server address and port; default 0.0.0.0:50060
NameNode HTTP server address and port; default 0.0.0.0:50070
DataNode HTTP server address and port; default 0.0.0.0:50075
SecondaryNameNode HTTP server address and port; default 0.0.0.0:50090
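For the pseudo-distributed setup above, for example, the two most commonly used pages can be opened in a browser as follows (hostname assumed to be localhost):
http://localhost:50070/     NameNode (HDFS) status page
http://localhost:50030/     JobTracker (MapReduce) status page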


2. Hadoop fully distributed configuration:
NameNode and JobTracker on one node (lab201)
SecondaryNameNode (SNN) on one node (lab202)
DataNode and TaskTracker on one node (lab203)
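These hostnames must resolve on every node; a minimal /etc/hosts sketch with hypothetical IP addresses (adjust to your network):
[root@localhost ~]# vim /etc/hosts
192.168.1.201   lab201
192.168.1.202   lab202
192.168.1.203   lab203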

Perform the following operations on all three nodes. (Note: keep the clocks of the three nodes synchronized; see the sketch below.)
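For example, the clocks can be synchronized with ntpdate; the NTP server shown here is only an illustration, substitute your own:
[root@localhost ~]# ntpdate pool.ntp.org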
[root@localhost ~]# rpm -ivh jdk-7u21-linux-x64.rpm
[root@localhost ~]# vim /etc/profile.d/java.sh
JAVA_HOME=/usr/java/latest
PATH=$JAVA_HOME/bin:$PATH
export JAVA_HOME PATH
[root@localhost ~]# tar xf hadoop-0.20.2-cdh3u6.tar.gz -C /usr/local/
[root@localhost ~]# cd /usr/local/
[root@localhost local]# ln -sv hadoop-0.20.2-cdh3u6 hadoop
[root@localhost ~]# vim /etc/profile.d/hadoop.sh
HADOOP_HOME=/usr/local/hadoop
PATH=$HADOOP_HOME/bin:$PATH
export HADOOP_HOME PATH
[root@localhost ~]# java -version
[root@localhost ~]# hadoop version
[root@localhost ~]# useradd hduser
[root@localhost ~]# passwd hduser
[root@localhost ~]# chown -R hduser.hduser /usr/local/hadoop/
[root@lab201 ~]# mkdir -pv /hadoop/temp
[root@lab201 ~]# chown -R hduser.hduser /hadoop

Master node configuration (lab201):
Configure passwordless SSH access from hduser on the master node to itself and the other nodes:
[root@lab201 ~]# su - hduser
[hduser@lab201 ~]$ ssh-keygen -t rsa -P ''
[hduser@lab201 ~]$ ssh-copy-id -i .ssh/id_rsa.pub hduser@lab201
[hduser@lab201 ~]$ ssh-copy-id -i .ssh/id_rsa.pub hduser@lab202
[hduser@lab201 ~]$ ssh-copy-id -i .ssh/id_rsa.pub hduser@lab203

[hduser@lab201 conf]$ vim masters     set the SecondaryNameNode host
lab202

[hduser@lab201 conf]$ vim slaves      set the slave nodes
lab203

[hduser@lab201 conf]$ vim core-site.xml

<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/hadoop/temp</value>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://lab201:8020</value>
  </property>
</configuration>


[hduser@lab201 conf]$ vim mapred-site.xml

<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>lab201:8021</value>
  </property>
</configuration>


[hduser@lab201 conf]$ vim hdfs-site.xml     (hduser needs write permission on /hadoop so that /hadoop/data, /hadoop/name, and the other directories below can be created automatically)

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
    <description>The actual number of replications can be specified when the file is created.</description>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/hadoop/data</value>
    <final>true</final>
    <description>The directories where the datanode stores blocks.</description>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>/hadoop/name</value>
    <final>true</final>
    <description>The directories where the namenode stores its persistent metadata.</description>
  </property>
  <property>
    <name>fs.checkpoint.dir</name>
    <value>/hadoop/namesecondary</value>
    <final>true</final>
    <description>The directories where the secondarynamenode stores checkpoints.</description>
  </property>
</configuration>

Copy the configuration files to the other nodes:
[hduser@lab201 conf]$ scp hdfs-site.xml core-site.xml mapred-site.xml lab202:/usr/local/hadoop/conf/
[hduser@lab201 conf]$ scp hdfs-site.xml core-site.xml mapred-site.xml lab203:/usr/local/hadoop/conf/

[hduser@lab201 conf]$ hadoop namenode -format
[Hduser @ lab201 conf] $ start-all.sh
[Hduser @ lab201 conf] $ jps
Use jps on each node to check whether the expected processes have started, then open the relevant web pages in a browser to check whether they can be accessed normally; a sketch follows.
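What to expect, based on the role assignment above; the web addresses assume the default ports listed earlier:
[hduser@lab201 ~]$ jps     expect NameNode and JobTracker
[hduser@lab202 ~]$ jps     expect SecondaryNameNode
[hduser@lab203 ~]$ jps     expect DataNode and TaskTracker
http://lab201:50070/       NameNode (HDFS) status page
http://lab201:50030/       JobTracker (MapReduce) status page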

Functional testing is the same as in the pseudo-distributed mode.

This article is from the linuxgfc blog. For more information, contact the author!
