Run Hadoop in pseudo-distributed mode on a single machine

1) Install and configure the Java environment
2) Install Hadoop

Download hadoop-0.20.2.tar.gz from the Apache Hadoop download site and extract it: tar zxvf hadoop-0.20.2.tar.gz

Add the following to conf/hadoop-env.sh:
export JAVA_HOME=/home/heyutao/tools/jdk1.6.0_20
export HADOOP_HOME=/home/heyutao/tools/hadoop-0.20.2
export PATH=$PATH:/home/heyutao/tools/hadoop-0.20.2/bin

Test whether Hadoop is installed successfully: $ bin/hadoop
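
As an additional sanity check (a minimal sketch, not part of the original steps; the exact version strings depend on your JDK and Hadoop build), confirm that both the JDK and Hadoop report the expected versions:

$ java -version          # should report the JDK that JAVA_HOME points to, e.g. 1.6.0_20
$ bin/hadoop version     # should report Hadoop 0.20.2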

3) Configure Hadoop in a single-host environment

A) Edit the configuration files

1) Modify conf/core-site.xml:
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>

  <property>
    <name>hadoop.tmp.dir</name>
    <value>/tmp/hadoop-${user.name}</value>
  </property>
</configuration>

2) Modify conf/mapred-site.xml:
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>

3) Modify conf/hdfs-site.xml:
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>

# The fs.default.name parameter specifies the address and port of the NameNode. Its default value is file:///, meaning the local file system is used (single-machine, non-distributed mode). Here we point it at a NameNode running on localhost.
# The mapred.job.tracker parameter specifies the address and port of the JobTracker. Its default value is local, meaning the JobTracker and TaskTracker run inside the same local Java process (standalone, non-distributed mode). Here we point it at a JobTracker running on localhost, i.e. a separate Java process acts as the JobTracker.
# The dfs.replication parameter specifies how many times each HDFS block is replicated, providing redundant backups of the data. In a typical production system this is usually set to 3; for this single-node setup we use 1.

B) Disable the firewall.
$ sudo ufw disable
Note: This step is important. If the firewall is left enabled, the DataNode cannot be found.
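
If you prefer not to disable the firewall entirely, a possible alternative (a sketch only; the port list is an assumption based on this setup's configuration and Hadoop 0.20's default web and data-transfer ports) is to open just the ports used here:

$ sudo ufw allow 9000/tcp     # fs.default.name (NameNode RPC)
$ sudo ufw allow 9001/tcp     # mapred.job.tracker (JobTracker RPC)
$ sudo ufw allow 50010/tcp    # DataNode data transfer
$ sudo ufw allow 50030/tcp    # JobTracker web UI
$ sudo ufw allow 50070/tcp    # NameNode web UI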

C) Set up SSH

$ ssh localhost

If the following error occurs, the ssh server is probably not installed:

ssh: connect to host localhost port 22: Connection refused

Install openssh-server:

$ sudo apt-get install openssh-server
Configure SSH so that you can log in without entering a password:
$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
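
To verify the passwordless login (a quick check that is not part of the original steps; the chmod is only needed if the permissions on ~/.ssh/authorized_keys are too permissive for sshd to accept the key):

$ chmod 600 ~/.ssh/authorized_keys
$ ssh localhost    # should now log in without asking for a password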

Start the ssh server:

$ sudo /etc/init.d/ssh start

D) Format a new distributed file system.

$ cd hadoop-0.20.2
$ bin/hadoop namenode -format
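
If the format succeeds, you can optionally check the NameNode's storage directory (a sketch that assumes hadoop.tmp.dir is /tmp/hadoop-${user.name} as configured above, so the name directory defaults to ${hadoop.tmp.dir}/dfs/name):

$ ls /tmp/hadoop-$USER/dfs/name
# should contain a current/ directory holding the fsimage and edits files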

E) Start the Hadoop processes.

$ Bin/start-all.sh
The console output should show that the NameNode, DataNode, secondary NameNode, JobTracker, and TaskTracker are being started. Once startup completes, you can confirm with ps -ef that five new Java processes are running.
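
Alternatively (a small convenience not mentioned above), the jps tool that ships with the JDK lists running Java processes by class name, which makes the five daemons easy to identify:

$ jps
# expected entries (PIDs will differ): NameNode, DataNode, SecondaryNameNode, JobTracker, TaskTracker, plus Jps itself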
 
F) Run the wordcount example

$ cd hadoop-0.20.2
$ mkdir test
$ cd test

# Create two text files in the test directory; the wordcount program will count how many times each word occurs.

$ echo "Hello world, bye, world." > file1.txt
$ echo "Hello hadoop, goodbye, hadoop" > file2.txt
$ cd ..
# Copy the local ./test directory into HDFS under the name input
$ bin/hadoop dfs -put ./test input

# Run the wordcount example:
$ bin/hadoop jar hadoop-0.20.2-examples.jar wordcount input output
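
While the job is running (an optional check, not part of the original steps), you can watch it from another terminal with the job client:

$ bin/hadoop job -list
# lists the currently running jobs with their IDs, state, and owner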

# View execution results:
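
For example (assuming the job wrote its results to the output directory as above), first list the output files in HDFS:

$ bin/hadoop dfs -ls output
# typically shows a _logs directory plus one part-* file per reducer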

# Copy the files from HDFS to the local file system and view them there:

$ bin/hadoop dfs -get output output

$ cat output/*

# Or view the output directly in HDFS:

$ bin/hadoop dfs -cat output/*

G) $ bin/stop-all.sh # Stop the Hadoop processes
H) Stop the ssh server

$ sudo /etc/init.d/ssh stop

Fault diagnosis:
(1) After $ bin/start-all.sh starts the Hadoop processes, five Java processes are launched and five PID files are created under /tmp to record their process IDs. These files tell you which Java process corresponds to the NameNode, DataNode, secondary NameNode, JobTracker, and TaskTracker. If Hadoop does not seem to be working properly, first check whether these five Java processes are running normally.
(2) Use the web interfaces. Visit http://localhost:50030 to check the status of the JobTracker, http://localhost:50060 for the TaskTracker, and http://localhost:50070 for the NameNode and the distributed file system as a whole, including its files and logs.
(3) Check the log files under the ${HADOOP_HOME}/logs directory. The NameNode, DataNode, secondary NameNode, JobTracker, and TaskTracker each write their own log file, and every submitted job also produces application log files. Analyzing these logs helps you find the cause of a fault.
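
For instance (a sketch; the exact file name follows the hadoop-<user>-<daemon>-<hostname>.log convention and depends on your user and host names), you can follow the NameNode log while reproducing a problem:

$ tail -f ${HADOOP_HOME}/logs/hadoop-${USER}-namenode-$(hostname).log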
