Run Hadoop in pseudo-distributed mode on a single machine

1) Install and configure the Java environment
2) Install Hadoop

Download hadoop-0.20.2.tar.gz from the Apache Hadoop download site and extract it: tar zxvf hadoop-0.20.2.tar.gz

Add the following to conf/hadoop-env.sh:
export JAVA_HOME=/home/heyutao/tools/jdk1.6.0_20
export HADOOP_HOME=/home/heyutao/tools/hadoop-0.20.2
export PATH=$PATH:/home/heyutao/tools/hadoop-0.20.2/bin

Test whether Hadoop is installed successfully: $ bin/hadoop
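
As an additional sanity check (a minimal sketch, not part of the original steps; the exact version strings depend on your JDK and Hadoop build), confirm that both the JDK and Hadoop report the expected versions:

$ java -version          # should report the JDK that JAVA_HOME points to, e.g. 1.6.0_20
$ bin/hadoop version     # should report Hadoop 0.20.2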

3) Configure Hadoop in a single-host environment

A) Edit the configuration files

1) Modify conf/core-site.xml:
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>

  <property>
    <name>hadoop.tmp.dir</name>
    <value>/tmp/hadoop-${user.name}</value>
  </property>
</configuration>

2) Modify conf/mapred-site.xml:
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>

3) Modify conf/hdfs-site.xml:
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>

# The fs.default.name parameter specifies the address and port of the NameNode. Its default value is file:///, meaning the local file system is used (single-machine, non-distributed mode). Here we point it at a NameNode running on localhost.
# The mapred.job.tracker parameter specifies the address and port of the JobTracker. Its default value is local, meaning the JobTracker and TaskTracker run inside the same local Java process (standalone, non-distributed mode). Here we point it at a JobTracker running on localhost, i.e. a separate Java process acts as the JobTracker.
# The dfs.replication parameter specifies how many times each HDFS block is replicated, providing redundant backups of the data. In a typical production system this is usually set to 3; for this single-node setup we use 1.

B) Disable the firewall.
$ sudo ufw disable
Note: This step is important. If the firewall is left enabled, the DataNode cannot be found.
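
If you prefer not to disable the firewall entirely, a possible alternative (a sketch only; the port list is an assumption based on this setup's configuration and Hadoop 0.20's default web and data-transfer ports) is to open just the ports used here:

$ sudo ufw allow 9000/tcp     # fs.default.name (NameNode RPC)
$ sudo ufw allow 9001/tcp     # mapred.job.tracker (JobTracker RPC)
$ sudo ufw allow 50010/tcp    # DataNode data transfer
$ sudo ufw allow 50030/tcp    # JobTracker web UI
$ sudo ufw allow 50070/tcp    # NameNode web UI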

C) Set up SSH

$ ssh localhost

If the following error occurs, the ssh server is probably not installed:

ssh: connect to host localhost port 22: Connection refused

Install openssh-server:

$ sudo apt-get install openssh-server
Configure SSH so that you can log in without entering a password:
$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
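
To verify the passwordless login (a quick check that is not part of the original steps; the chmod is only needed if the permissions on ~/.ssh/authorized_keys are too permissive for sshd to accept the key):

$ chmod 600 ~/.ssh/authorized_keys
$ ssh localhost    # should now log in without asking for a password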

Start the ssh server:

$ sudo /etc/init.d/ssh start

D) Format a new distributed file system.

$ cd hadoop-0.20.2
$ bin/hadoop namenode -format
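
If the format succeeds, you can optionally check the NameNode's storage directory (a sketch that assumes hadoop.tmp.dir is /tmp/hadoop-${user.name} as configured above, so the name directory defaults to ${hadoop.tmp.dir}/dfs/name):

$ ls /tmp/hadoop-$USER/dfs/name
# should contain a current/ directory holding the fsimage and edits files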

E) Start the Hadoop processes.

$ Bin/start-all.sh
The console output should show that the NameNode, DataNode, secondary NameNode, JobTracker, and TaskTracker are being started. Once startup completes, you can confirm with ps -ef that five new Java processes are running.
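
Alternatively (a small convenience not mentioned above), the jps tool that ships with the JDK lists running Java processes by class name, which makes the five daemons easy to identify:

$ jps
# expected entries (PIDs will differ): NameNode, DataNode, SecondaryNameNode, JobTracker, TaskTracker, plus Jps itself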
 
F) Run the wordcount example

$ cd hadoop-0.20.2
$ mkdir test
$ cd test

# Create two text files in the test directory; the wordcount program will count how many times each word occurs.

$ echo "Hello world, bye, world." > file1.txt
$ echo "Hello hadoop, goodbye, hadoop" > file2.txt
$ cd ..
# Copy the local ./test directory into HDFS under the name input
$ bin/hadoop dfs -put ./test input

# Run the wordcount example:
$ bin/hadoop jar hadoop-0.20.2-examples.jar wordcount input output
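
While the job is running (an optional check, not part of the original steps), you can watch it from another terminal with the job client:

$ bin/hadoop job -list
# lists the currently running jobs with their IDs, state, and owner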

# View execution results:
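
For example (assuming the job wrote its results to the output directory as above), first list the output files in HDFS:

$ bin/hadoop dfs -ls output
# typically shows a _logs directory plus one part-* file per reducer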

# Copy the files from HDFS to the local file system and view them there:

$ bin/hadoop dfs -get output output

$ cat output/*

# Or view the output directly in HDFS:

$ bin/hadoop dfs -cat output/*

G) $ bin/stop-all.sh # Stop the Hadoop processes
H) Stop the ssh server

$ sudo /etc/init.d/ssh stop

Fault diagnosis:
(1) After $ bin/start-all.sh starts the Hadoop processes, five Java processes are launched and five PID files are created under /tmp to record their process IDs. These files tell you which Java process corresponds to the NameNode, DataNode, secondary NameNode, JobTracker, and TaskTracker. If Hadoop does not seem to be working properly, first check whether these five Java processes are running normally.
(2) Use the web interfaces. Visit http://localhost:50030 to check the status of the JobTracker, http://localhost:50060 for the TaskTracker, and http://localhost:50070 for the NameNode and the distributed file system as a whole, including its files and logs.
(3) Check the log files under the ${HADOOP_HOME}/logs directory. The NameNode, DataNode, secondary NameNode, JobTracker, and TaskTracker each write their own log file, and every submitted job also produces application log files. Analyzing these logs helps you find the cause of a fault.
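
For instance (a sketch; the exact file name follows the hadoop-<user>-<daemon>-<hostname>.log convention and depends on your user and host names), you can follow the NameNode log while reproducing a problem:

$ tail -f ${HADOOP_HOME}/logs/hadoop-${USER}-namenode-$(hostname).log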
