Building a Hadoop 2.4 Cluster on Ubuntu (Pseudo-Distributed Mode)


To really learn Hadoop you need to run it on a cluster, but most developers do not have a large cluster available for testing, so pseudo-distributed mode is the practical alternative. This post walks through building a pseudo-distributed setup.

To save time and space, the earlier setup steps are not repeated here. This post assumes you already have a working standalone (single-machine) installation; if you do not, see my previous post, Building a Hadoop 2.4 cluster on Ubuntu (standalone mode).

Step 1: Configure hdfs-site.xml

/usr/local/hadoop/etc/hadoop/hdfs-site.xml configures HDFS for the hosts in the cluster; here it specifies which local directories the NameNode and DataNode store their data in.

Open the file with:

sudo gedit /usr/local/hadoop/etc/hadoop/hdfs-site.xml


Modify the file contents

Add the following between the <configuration> and </configuration> tags of the file:

<property>
    <name>dfs.replication</name>
    <value>1</value>
</property>
<property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/usr/local/hadoop/hdfs/name</value>
</property>
<property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/usr/local/hadoop/hdfs/data</value>
</property>

Save and close the editor.

Then create the new folders under the Hadoop directory. You can choose your own locations, but be careful: the paths written in hdfs-site.xml must match the directories you actually create.

Create the folders from the command line. A single command can do it, though creating them separately also works (remember they go under the Hadoop folder), as sketched below.
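A minimal sketch of that command, assuming the /usr/local/hadoop layout used above (adjust the owner to whichever user runs Hadoop):

# Create the NameNode and DataNode directories referenced in hdfs-site.xml.
sudo mkdir -p /usr/local/hadoop/hdfs/name /usr/local/hadoop/hdfs/data
# Make sure the user that runs Hadoop owns them.
sudo chown -R $USER:$USER /usr/local/hadoop/hdfs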


Step 2: Configure core-site.xml

The file /usr/local/hadoop/etc/hadoop/core-site.xml contains configuration that Hadoop reads at startup.

First, open the configuration file:

sudo gedit /usr/local/hadoop/etc/hadoop/core-site.xml


Add the following between the <configuration> and </configuration> tags of the file. A note here: there are three options you can try. I prefer the first, which generally causes no problems. Occasionally, because of the Hadoop version or the system itself, Hadoop will not start with that configuration; in that case try the other two.

The first option:

<property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
</property>

The second option:

<property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
</property>

fs.defaultFS is the newer name for this property. The older fs.default.name usually still works, but you may see a deprecation warning telling you to switch to fs.defaultFS. Either can be tried.
The third option:

<property>
    <name>hadoop.tmp.dir</name>
    <value>file:/usr/local/hadoop/tmp</value>
    <description>A base for other temporary directories.</description>
</property>
<property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
</property>

The point of this option is that if hadoop.tmp.dir is not set, Hadoop defaults to a temporary directory such as /tmp/hadoop-hadoop, which is wiped on every reboot, forcing you to re-run the format step (I have not verified this myself; readers can try). So it is best to set it explicitly in a pseudo-distributed configuration.
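If you do use the third option, the directory named in hadoop.tmp.dir has to exist and be writable; a minimal sketch under the same assumptions as before:

# Create the temporary directory referenced by hadoop.tmp.dir.
sudo mkdir -p /usr/local/hadoop/tmp
sudo chown -R $USER:$USER /usr/local/hadoop/tmp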

Step 3: Configure yarn-site.xml

/usr/local/hadoop/etc/hadoop/yarn-site.xml holds the configuration MapReduce needs at startup. Earlier Hadoop versions did not have this file; you simply started everything with bin/start-all.sh. Now that the YARN framework is used, it must be configured. When I set up 2.4 I first tried the old 1.1-era approach and wasted half a day before realizing this.

Open the file in the editor:

sudo gedit /usr/local/hadoop/etc/hadoop/yarn-site.xml

Add the following between the <configuration> and </configuration> tags of the file:

<property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
</property>
<property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>

Save and close the editor.

Step 4: Configure mapred-site.xml

By default there is a mapred-site.xml.template file under the /usr/local/hadoop/etc/hadoop/ folder. We need to copy it and name the copy mapred-site.xml; this file specifies which framework MapReduce uses. Some earlier versions shipped mapred-site.xml directly, but 2.4 only provides the template, so we have to create the file ourselves.

Copy and rename it (run this from /usr/local/hadoop/etc/hadoop/):

cp mapred-site.xml.template mapred-site.xml

Open the new file in the editor:

sudo gedit /usr/local/hadoop/etc/hadoop/mapred-site.xml


Add the following between the <configuration> and </configuration> tags of the file:

<property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
</property>

Save and close the editor.

Step 5: Format the distributed file system and start Hadoop

Use the following command (run it from the Hadoop installation directory; beginners easily overlook this, and the same path caveat applies to all of the steps above):

bin/hdfs namenode -format


It only needs to be executed once; if you run it again after Hadoop has been in use, all data on HDFS will be erased.

After the configuration and formatting described above, you can start this single-node cluster.

Run the startup command:

sbin/start-dfs.sh    

On the first run, if a yes/no prompt appears, type yes and press Enter.


Next, execute:

sbin/start-yarn.sh

The results are as follows


After these two commands, Hadoop is started and running.

Run the jps command and you should see the Hadoop-related processes, as in the sketch below. If any of them are missing, the configuration is not correct.
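The original screenshot is not reproduced here; on a healthy pseudo-distributed node, jps typically prints something like the following (the PIDs are illustrative):

$ jps
4821 NameNode
4975 DataNode
5152 SecondaryNameNode
5311 ResourceManager
5430 NodeManager
5701 Jps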


We can also check the web interfaces in a browser.

Open http://localhost:50070/ in the browser to see the HDFS admin page.


Open http://localhost:8088 to see the Hadoop cluster management (YARN ResourceManager) page.
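If you prefer checking from the terminal, a quick probe of both web UIs (not part of the original steps, just a convenience check) could look like this:

# Confirm the HDFS and YARN web UIs respond; expect HTTP 200 for both.
curl -s -o /dev/null -w "NameNode UI:        HTTP %{http_code}\n" http://localhost:50070/
curl -s -o /dev/null -w "ResourceManager UI: HTTP %{http_code}\n" http://localhost:8088/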



Step 6: Verify the setup with the WordCount example

Before going through the steps, note that bin/hadoop is the command used below; in newer versions you can also use bin/hdfs dfs for the same file system operations.

First create a folder. This is not an ordinary local folder: once created you will not see it in the local directory; you have to use the Hadoop file system commands to list it. (If you develop with Eclipse you can browse the files in HDFS there; in the next post I will explain how to set up a Hadoop environment with Eclipse.)

bin/hadoop fs -mkdir -p input
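The new directory lives in HDFS, not on the local disk, so a normal ls will not show it; a quick check (assuming you are still in the Hadoop directory) looks like this:

# List the current user's HDFS home directory; "input" should appear here.
bin/hadoop fs -ls
# A plain ls of the local directory will not show it, which is expected.
ls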


Copy README.txt from the Hadoop directory into the new input directory on HDFS; you can also copy other files, or write one yourself with vim if you are familiar with it.

bin/hadoop fs -copyFromLocal README.txt input


-copyFromLocal is one of the Hadoop file system commands.
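To confirm the copy landed in HDFS (same assumptions as above, run from the Hadoop directory):

# The file should now be listed inside the HDFS "input" directory.
bin/hadoop fs -ls input
# Its contents can be read back straight from HDFS.
bin/hadoop fs -cat input/README.txt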

Run WordCount

bin/hadoop jar share/hadoop/mapreduce/sources/hadoop-mapreduce-examples-2.4.0-sources.jar org.apache.hadoop.examples.WordCount input output
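The command above uses the sources jar; the binary distribution also ships a pre-built examples jar, and if its path matches your installation (worth double-checking, since jar names vary between releases) the equivalent run would be:

# Run WordCount from the compiled examples jar instead of the sources jar.
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.4.0.jar wordcount input output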

You can watch the job's progress in the terminal, and see the summary when it finishes.


When the job has finished, view the word-count results:

bin/hadoop fs -cat output/*


The result is a list of the words in README.txt, each followed by the number of times it appears.

Before running the job a second time, delete the output directory, otherwise the job will fail with an error.

Use the command:

bin/hadoop fs -rm -r ./output


Attentive readers may have noticed that I kept getting the following warning:

WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

This happens because the bundled Hadoop native library was compiled for 32-bit systems, so on a 64-bit system you get this warning. To get rid of it you need to download the Hadoop source and recompile the native library; see http://stackoverflow.com/questions/19943766/hadoop-unable-to-load-native-hadoop-library-for-your-platform-error-on-centos
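A quick way to check whether this is the cause on your machine (assuming the stock lib/native layout under /usr/local/hadoop) is to compare the bundled library's architecture with your system's:

# Shows whether the bundled native library is a 32-bit or 64-bit ELF binary.
file /usr/local/hadoop/lib/native/libhadoop.so.1.0.0
# Shows your machine's architecture (e.g. x86_64 for 64-bit).
uname -m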

In my case recompiling did not resolve the problem; it may have something to do with my particular system.

For details you can refer to this blog post; the author explains it clearly at the end of the article:

http://dblab.xmu.edu.cn/blog/powerxing/install-hadoop-2-4-1-single-node/






