This morning, I remotely helped a newcomer build a Hadoop cluster (1.x, or versions earlier than 0.22), and it reminded me how confusing the first setup can be. Here I will write down the simplest Apache Hadoop setup procedure as a help to new users, explained in as much detail as I can. Click here to view the avatorhadoop setup steps.
1. Environment preparation:
1). Machine preparation: the target machines must be able to ping each other. Virtual machines on different hosts must therefore be configured with a "bridged connection" (if running on a physical host, disable the host machine's firewall first; for the specific bridged-networking setup, search for VMware bridged networking or KVM bridged networking; Xen lets you configure the LAN IP manually during installation; leave a comment if you need help). Disable the firewall on every machine: /etc/init.d/iptables stop; chkconfig iptables off. We recommend changing each machine's hostname to hadoopserverN, where N is the number you actually assign to that machine, because hostnames containing '_', '.' or other special characters will cause startup problems. Modify /etc/hosts on every machine and add the mapping between IP address and hostname.
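For reference, the /etc/hosts mapping could look like the following on every machine (the IP addresses below are placeholders; use your real LAN addresses):
192.168.1.101 hadoopserver1
192.168.1.102 hadoopserver2
192.168.1.103 hadoopserver3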
2). Download and extract a stable Hadoop release and configure the Java environment (the Java variables usually go in ~/.bash_profile rather than a system-wide file, considering machine security);
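A minimal sketch of the ~/.bash_profile additions, assuming the JDK is unpacked under /usr/java/jdk1.6.0 (a placeholder path; use your own):
export JAVA_HOME=/usr/java/jdk1.6.0
export PATH=$JAVA_HOME/bin:$PATH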
3). Passwordless SSH. Here is a small trick: on hadoopserver1
ssh-keygen -t rsa -P ''; press ENTER at every prompt
ssh-copy-id user@host;
Then copy id_rsa and id_rsa.pub under the ~/.ssh/ directory to the other machines;
ssh hadoopserver2; run scp -r ~/.ssh/authorized_keys hadoopserver1:~/.ssh/; after this, passwordless login is set up and the machines can SSH to each other. Practice and experiment; many guides on the Internet do not mention that ssh-copy-id can be used to simplify the passwordless setup.
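Put together, a rough sketch of the whole passwordless setup from hadoopserver1, assuming three nodes named hadoopserver1..3 with the same user on each (adjust to your cluster):
ssh-keygen -t rsa -P ''                          # generate the key pair, accept the default file location
for host in hadoopserver1 hadoopserver2 hadoopserver3; do
  ssh-copy-id $host                              # append the public key to ~/.ssh/authorized_keys on each node
done
ssh hadoopserver2 hostname                       # should print the remote hostname without asking for a password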
2. Setup steps:
1). On hadoopserver1 (the namenode), modify the following files in the conf directory of the extracted Hadoop:
core-site.xml:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://hadoopserver1:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/xxx/hadoop-version/tmp</value>
  </property>
</configuration>
hdfs-site.xml:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>dfs.permissions</name>
    <value>false</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>/xxx/hadoop-version/name</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/xxx/hadoop-version/data</value>
  </property>
  <property>
    <name>dfs.block.size</name>
    <value>670720</value>
  </property>
  <!--
  <property>
    <name>dfs.secondary.http.address</name>
    <value>0.0.0.0:60090</value>
    <description>The secondary namenode http server address and port. If the port is 0 then the server will start on a free port.</description>
  </property>
  <property>
    <name>dfs.datanode.address</name>
    <value>0.0.0.0:60010</value>
    <description>The address where the datanode server will listen to. If the port is 0 then the server will start on a free port.</description>
  </property>
  <property>
    <name>dfs.datanode.http.address</name>
    <value>0.0.0.0:60075</value>
    <description>The datanode http server address and port. If the port is 0 then the server will start on a free port.</description>
  </property>
  <property>
    <name>dfs.datanode.ipc.address</name>
    <value>0.0.0.0:60020</value>
    <description>The datanode ipc server address and port. If the port is 0 then the server will start on a free port.</description>
  </property>
  <property>
    <name>dfs.http.address</name>
    <value>0.0.0.0:60070</value>
    <description>The address and the base port where the dfs namenode web ui will listen on. If the port is 0 then the server will start on a free port.</description>
  </property>
  -->
  <property>
    <name>dfs.support.append</name>
    <value>true</value>
    <description>Does HDFS allow appends to files? This is currently set to false because there are bugs in the "append code" and it is not supported in any production cluster.</description>
  </property>
</configuration>
mapred-site.xml:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>hadoopserver1:9001</value>
  </property>
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>2</value>
  </property>
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>2</value>
  </property>
  <!--
  <property>
    <name>mapred.job.tracker.http.address</name>
    <value>0.0.0.0:50030</value>
    <description>The job tracker http server address and port the server will listen on. If the port is 0 then the server will start on a free port.</description>
  </property>
  <property>
    <name>mapred.task.tracker.http.address</name>
    <value>0.0.0.0:60060</value>
    <description>The task tracker http server address and port. If the port is 0 then the server will start on a free port.</description>
  </property>
  -->
</configuration>
The masters file takes the hostname of the secondary namenode; this tells Hadoop to start the secondary namenode on that machine;
The slaves file lists the datanode nodes, one hostname per line, as in the sketch below.
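For illustration, on a three-node cluster with the secondary namenode on hadoopserver2 and datanodes on hadoopserver2 and hadoopserver3 (an assumed layout; use your own hostnames), the two files would be:
masters:
hadoopserver2
slaves:
hadoopserver2
hadoopserver3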
2). Modify hadoop-env.sh:
Set JAVA_HOME to your Java installation directory;
Add a startup option: export HADOOP_OPTS="-Djava.net.preferIPv4Stack=true", which ensures the daemons bind to IPv4 addresses;
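A minimal sketch of the corresponding lines in conf/hadoop-env.sh (the JDK path is a placeholder):
export JAVA_HOME=/usr/java/jdk1.6.0
export HADOOP_OPTS="-Djava.net.preferIPv4Stack=true"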
3). Manual distribution: scp -r the Hadoop directory to hadoopserver2..N, placing it under the same path on every machine;
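As a sketch, from hadoopserver1, assuming the extracted directory is /xxx/hadoop-version and two slave nodes (adjust names and paths to your cluster):
scp -r /xxx/hadoop-version hadoopserver2:/xxx/
scp -r /xxx/hadoop-version hadoopserver3:/xxx/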
4). Start:
bin/hadoop namenode -format
bin/start-all.sh
5). Enter http://<hadoopserver1 IP>:50070 in a browser to view the cluster status.
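To double-check from the command line, two commands that should be available on this Hadoop generation (a sketch; the exact output depends on your cluster):
jps                               # on each node: NameNode/JobTracker on hadoopserver1, DataNode/TaskTracker on the slaves
bin/hadoop dfsadmin -report       # run on the namenode; lists live datanodes and their capacity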