Hadoop, an Open-Source Framework for Distributed Computing: Introduction to Hadoop in Practice (II)

In fact, the official Hadoop documentation is enough to get a distributed environment configured and running, but since I am writing this up anyway, there are a few details worth noting; these are the details that can otherwise cost you half a day of fumbling. Hadoop can run in standalone mode or as a cluster. Standalone mode needs little explanation: just follow the instructions in the demo and execute the commands. What I mainly want to cover here is the process of configuring and running a cluster.

Environment

7 ordinary machines, all running Linux. Memory and CPU are not worth dwelling on; one of Hadoop's big selling points is that it favors many ordinary machines over a few powerful ones. The JDK must be 1.5 or later; remember this. The machine names of the 7 machines must all be different; as discussed below, machine names have a great impact on MapReduce.
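Before going further, it is worth quickly verifying these two points on every machine; a minimal check, assuming the JDK is already on the PATH:

# run on each of the 7 machines
java -version    # must report 1.5 or later
hostname         # must be unique across all 7 machines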

Deployment considerations

As described above, the roles in a Hadoop cluster fall into two broad categories: master and slave. The former hosts the Namenode and Jobtracker roles, responsible for managing the distributed data and for breaking down and scheduling tasks; the latter hosts the Datanode and Tasktracker roles, responsible for distributed data storage and task execution. I originally wanted to see whether a single machine could be configured as both master and slave, but found that the machine-name configuration conflicted between Namenode initialization and Tasktracker execution (Namenode and Tasktracker had conflicting requirements on the hosts configuration: whether the line mapping the machine name to its IP or the localhost line comes first caused problems; it may also just have been my own mistake, so feel free to give me feedback based on your own results). In the end I settled on one master and six slaves; later comparisons involving more complex application development and testing will add more machines to the configuration.
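For reference, the hosts configuration mentioned above looks roughly like the following on each node; the slave names and addresses here are purely illustrative, and the ordering of the machine-name lines versus the localhost line is exactly the detail that caused me trouble:

# /etc/hosts (illustrative machine names and addresses)
10.2.224.46    hadoop-master    # Namenode + Jobtracker
10.2.226.40    hadoop-slave1    # Datanode + Tasktracker
10.2.226.41    hadoop-slave2    # one line per slave, six in total
127.0.0.1      localhost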

Implementation steps

Create the same directory on all machines. You can also create the same user on every machine and use that user's home directory as the Hadoop installation path. For example, I created /home/wenchu on all the machines.
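A minimal sketch of this step, assuming root access and the user name wenchu used throughout this article:

# run on every machine in the cluster (master and all six slaves)
useradd -m -d /home/wenchu wenchu    # creates the user with /home/wenchu as its home directory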

Download Hadoop and extract it on the master first. The version I downloaded here is 0.17.1. At this point the Hadoop installation path is /home/wenchu/hadoop-0.17.1.
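For example, something like the following on the master; the mirror URL is only an assumption, use whichever Apache mirror you prefer:

# on the master, as the wenchu user
cd /home/wenchu
wget http://archive.apache.org/dist/hadoop/core/hadoop-0.17.1/hadoop-0.17.1.tar.gz
tar -xzf hadoop-0.17.1.tar.gz    # yields /home/wenchu/hadoop-0.17.1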

After extraction, go into the conf directory. The main files that need to be modified are: hadoop-env.sh, hadoop-site.xml, masters, and slaves.
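Of these, masters and slaves are just plain lists of machine names, and hadoop-env.sh mainly needs JAVA_HOME pointed at your JDK. A minimal sketch, with illustrative machine names and an illustrative JDK path:

# conf/hadoop-env.sh : point Hadoop at a JDK 1.5 or later
export JAVA_HOME=/usr/java/jdk1.6.0

# conf/masters : the machine name of the master (Namenode / Jobtracker)
hadoop-master

# conf/slaves : one machine name per line, the six slaves (Datanode / Tasktracker)
hadoop-slave1
hadoop-slave2
hadoop-slave3
hadoop-slave4
hadoop-slave5
hadoop-slave6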

The underlying configuration file for Hadoop is hadoop-default.xml. From Hadoop's code you can see that when a job is created, a Config is created by default; it first reads the hadoop-default.xml configuration and then reads the hadoop-site.xml configuration (this file is initially empty). hadoop-site.xml is mainly where you put the system-level settings you need to override from hadoop-default.xml, as well as the custom settings you need in your own MapReduce process (for specific usage such as final, refer to the documentation).

The following is a simple hadoop-site.xml configuration:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <!-- Your Namenode configuration: machine name (or IP) plus port -->
    <name>fs.default.name</name>
    <value>hdfs://10.2.224.46:54310/</value>
  </property>
  <property>
    <!-- Your Jobtracker configuration: machine name (or IP) plus port -->
    <name>mapred.job.tracker</name>
    <value>hdfs://10.2.224.46:54311/</value>
  </property>
  <property>
    <!-- Number of replicas the data needs; the default is 3 -->
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <!-- Hadoop's default temporary path. It is best to configure this: if a newly added
         node (or some other situation) leaves a Datanode inexplicably unable to start,
         delete the tmp directory configured here. However, if this directory is deleted
         on the Namenode machine, the Namenode format command needs to be executed again. -->
    <name>hadoop.tmp.dir</name>
    <value>/home/wenchu/hadoop/tmp/</value>
  </property>
  <property>
    <!-- Java virtual machine parameters for child tasks can be configured here -->
    <name>mapred.child.java.opts</name>
    <value>-Xmx512m</value>
  </property>
  <property>
    <!-- Block size in bytes; it will come up later. It must be a multiple of 512,
         because CRC is used for file integrity checking and the default 512 is the
         smallest checksum unit. -->
    <name>dfs.block.size</name>
    <value>5120000</value>
    <description>The default block size for new files.</description>
  </property>
</configuration>
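The comment on hadoop.tmp.dir above refers to re-running the Namenode format command. For reference, in this version that is roughly the following, run on the master as the Hadoop user (note that it wipes any existing HDFS metadata):

cd /home/wenchu/hadoop-0.17.1
bin/hadoop namenode -format    # re-initializes the Namenode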
