This morning, I remotely helped a newcomer build a Hadoop cluster (1.x, or versions earlier than 0.22), and it reminded me how confusing the first setup can be. Here I will write down the simplest Apache Hadoop setup procedure as a help to new users, explained in as much detail as I can. Click here to view the avatorhadoop setup steps.
1. Environment preparation:
1). Machine preparation: the target machines must be able to ping each other. Virtual machines on different hosts must therefore be configured with a "bridged connection" (if running on a physical host, disable the host machine's firewall first; for the specific bridged-networking setup, search for VMware bridged networking or KVM bridged networking; Xen lets you configure the LAN IP manually during installation; leave a comment if you need help). Disable the firewall on every machine: /etc/init.d/iptables stop; chkconfig iptables off. We recommend changing each machine's hostname to hadoopserverN, where N is the number you actually assign to that machine, because hostnames containing '_', '.' or other special characters will cause startup problems. Modify /etc/hosts on every machine and add the mapping between IP address and hostname.
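For reference, the /etc/hosts mapping could look like the following on every machine (the IP addresses below are placeholders; use your real LAN addresses):
192.168.1.101 hadoopserver1
192.168.1.102 hadoopserver2
192.168.1.103 hadoopserver3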
2). Download and extract a stable Hadoop release and configure the Java environment (the Java variables usually go in ~/.bash_profile rather than a system-wide file, considering machine security);
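A minimal sketch of the ~/.bash_profile additions, assuming the JDK is unpacked under /usr/java/jdk1.6.0 (a placeholder path; use your own):
export JAVA_HOME=/usr/java/jdk1.6.0
export PATH=$JAVA_HOME/bin:$PATH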
3). Passwordless SSH. Here is a small trick: on hadoopserver1
ssh-keygen -t rsa -P ''; press ENTER at every prompt
ssh-copy-id user@host;
Then copy id_rsa and id_rsa.pub under the ~/.ssh/ directory to the other machines;
ssh hadoopserver2; run scp -r ~/.ssh/authorized_keys hadoopserver1:~/.ssh/; after this, passwordless login is set up and the machines can SSH to each other. Practice and experiment; many guides on the Internet do not mention that ssh-copy-id can be used to simplify the passwordless setup.
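Put together, a rough sketch of the whole passwordless setup from hadoopserver1, assuming three nodes named hadoopserver1..3 with the same user on each (adjust to your cluster):
ssh-keygen -t rsa -P ''                          # generate the key pair, accept the default file location
for host in hadoopserver1 hadoopserver2 hadoopserver3; do
  ssh-copy-id $host                              # append the public key to ~/.ssh/authorized_keys on each node
done
ssh hadoopserver2 hostname                       # should print the remote hostname without asking for a password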
2. Setup steps:
1). On hadoopserver1 (the namenode), modify the following files in the conf directory of the extracted Hadoop:
core-site.xml:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://hadoopserver1:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/xxx/hadoop-version/tmp</value>
  </property>
</configuration>
hdfs-site.xml:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>dfs.permissions</name>
    <value>false</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>/xxx/hadoop-version/name</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/xxx/hadoop-version/data</value>
  </property>
  <property>
    <name>dfs.block.size</name>
    <value>670720</value>
  </property>
  <!--
  <property>
    <name>dfs.secondary.http.address</name>
    <value>0.0.0.0:60090</value>
    <description>The secondary namenode http server address and port. If the port is 0 then the server will start on a free port.</description>
  </property>
  <property>
    <name>dfs.datanode.address</name>
    <value>0.0.0.0:60010</value>
    <description>The address where the datanode server will listen to. If the port is 0 then the server will start on a free port.</description>
  </property>
  <property>
    <name>dfs.datanode.http.address</name>
    <value>0.0.0.0:60075</value>
    <description>The datanode http server address and port. If the port is 0 then the server will start on a free port.</description>
  </property>
  <property>
    <name>dfs.datanode.ipc.address</name>
    <value>0.0.0.0:60020</value>
    <description>The datanode ipc server address and port. If the port is 0 then the server will start on a free port.</description>
  </property>
  <property>
    <name>dfs.http.address</name>
    <value>0.0.0.0:60070</value>
    <description>The address and the base port where the dfs namenode web ui will listen on. If the port is 0 then the server will start on a free port.</description>
  </property>
  -->
  <property>
    <name>dfs.support.append</name>
    <value>true</value>
    <description>Does HDFS allow appends to files? This is currently set to false because there are bugs in the "append code" and it is not supported in any production cluster.</description>
  </property>
</configuration>
mapred-site.xml:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>hadoopserver1:9001</value>
  </property>
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>2</value>
  </property>
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>2</value>
  </property>
  <!--
  <property>
    <name>mapred.job.tracker.http.address</name>
    <value>0.0.0.0:50030</value>
    <description>The job tracker http server address and port the server will listen on. If the port is 0 then the server will start on a free port.</description>
  </property>
  <property>
    <name>mapred.task.tracker.http.address</name>
    <value>0.0.0.0:60060</value>
    <description>The task tracker http server address and port. If the port is 0 then the server will start on a free port.</description>
  </property>
  -->
</configuration>
The masters file takes the hostname of the secondary namenode; this tells Hadoop to start the secondary namenode on that machine;
The slaves file lists the datanode nodes, one hostname per line, as in the sketch below.
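For illustration, on a three-node cluster with the secondary namenode on hadoopserver2 and datanodes on hadoopserver2 and hadoopserver3 (an assumed layout; use your own hostnames), the two files would be:
masters:
hadoopserver2
slaves:
hadoopserver2
hadoopserver3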
2). Modify hadoop-env.sh:
Set JAVA_HOME to your Java installation directory;
Add a startup option: export HADOOP_OPTS="-Djava.net.preferIPv4Stack=true", which ensures the daemons bind to IPv4 addresses;
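A minimal sketch of the corresponding lines in conf/hadoop-env.sh (the JDK path is a placeholder):
export JAVA_HOME=/usr/java/jdk1.6.0
export HADOOP_OPTS="-Djava.net.preferIPv4Stack=true"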
3). Manual distribution: scp -r the Hadoop directory to hadoopserver2..N, placing it under the same path on every machine;
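As a sketch, from hadoopserver1, assuming the extracted directory is /xxx/hadoop-version and two slave nodes (adjust names and paths to your cluster):
scp -r /xxx/hadoop-version hadoopserver2:/xxx/
scp -r /xxx/hadoop-version hadoopserver3:/xxx/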
4). Start:
bin/hadoop namenode -format
bin/start-all.sh
5). Enter http://<hadoopserver1 IP>:50070 in a browser to view the cluster status.
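To double-check from the command line, two commands that should be available on this Hadoop generation (a sketch; the exact output depends on your cluster):
jps                               # on each node: NameNode/JobTracker on hadoopserver1, DataNode/TaskTracker on the slaves
bin/hadoop dfsadmin -report       # run on the namenode; lists live datanodes and their capacity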