Although I have installed a Cloudera CDH cluster before (see http://www.cnblogs.com/pojishou/p/6267616.html for a tutorial), it consumed too much memory and the component versions it ships are fixed, with no room to choose. If you just want to study the technology on a single machine with limited memory, I recommend installing a native Apache cluster to play with; production is naturally a Cloudera cluster, unless you have a very capable operations team.
This time I use 3 virtual machine nodes, each given 4 GB of memory. If the host only has 8 GB, three nodes with 2 GB each should also be OK.
0. Installation file preparation
Hadoop 2.7.3:http://www.apache.org/dyn/closer.cgi/hadoop/common/hadoop-2.7.3/hadoop-2.7.3.tar.gz
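Since this is an offline deployment, you can fetch the tarball on a machine with internet access and push it to the master node; a minimal sketch (the Apache archive URL is equivalent to the mirror link above, and /opt/soft is an assumed staging directory):

# On a machine with internet access, download the Hadoop 2.7.3 tarball
wget http://archive.apache.org/dist/hadoop/common/hadoop-2.7.3/hadoop-2.7.3.tar.gz

# Push it to the master node; /opt/soft is an assumed staging directory
scp hadoop-2.7.3.tar.gz root@node00:/opt/soft/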
One, Virtual machine preparation
Set the IP addresses, hosts file, SSH passwordless login, scp, sudo, disable the firewall, and configure YUM and NTP time synchronization (details omitted).
Java installation is also omitted.
Reference: http://www.cnblogs.com/pojishou/p/6267616.html
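For reference, a minimal sketch of the hosts and passwordless-login steps (the IP addresses are assumptions; the hostnames node00/node01/node02 match the ones used in the configs below):

# Map hostnames to IPs on every node (example addresses, adjust to your network)
cat >> /etc/hosts <<EOF
192.168.1.100 node00
192.168.1.101 node01
192.168.1.102 node02
EOF

# On node00, generate a key pair and push the public key to every node
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
for host in node00 node01 node02; do ssh-copy-id $host; done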
Two, Hadoop installation
1. Decompression
tar -zxvf hadoop-2.7.3.tar.gz -C /opt/program/
ln -s /opt/program/hadoop-2.7.3 /opt/hadoop
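Optionally, put Hadoop on the PATH so the /opt/hadoop/bin and /opt/hadoop/sbin prefixes used below can be dropped; a sketch (appending to /etc/profile is my own habit, not part of the original steps):

# Append Hadoop environment variables to /etc/profile
cat >> /etc/profile <<'EOF'
export HADOOP_HOME=/opt/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
EOF
source /etc/profile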
2. Modify the configuration files
(1), hadoop-env.sh
vi /opt/hadoop/etc/hadoop/hadoop-env.sh

export JAVA_HOME=/opt/java
(2), core-site.xml
vi /opt/hadoop/etc/hadoop/core-site.xml

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://node00:9000</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/opt/hadoop/tmp</value>
    </property>
</configuration>
(3), hdfs-site.xml
vi /opt/hadoop/etc/hadoop/hdfs-site.xml

<configuration>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>/opt/hadoop/data/name</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>/opt/hadoop/data/data</value>
    </property>
    <property>
        <name>dfs.replication</name>
        <value>3</value>
    </property>
    <property>
        <name>dfs.secondary.http.address</name>
        <value>node00:50090</value>
    </property>
</configuration>
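The tmp, name, and data directories referenced above do not exist yet; creating them up front on every node avoids startup surprises (a sketch; Hadoop can also create some of them itself):

mkdir -p /opt/hadoop/tmp /opt/hadoop/data/name /opt/hadoop/data/data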
(4), mapred-site.xml
In 2.7.3 this file only ships as a template, so copy it first:

cp /opt/hadoop/etc/hadoop/mapred-site.xml.template /opt/hadoop/etc/hadoop/mapred-site.xml
vi /opt/hadoop/etc/hadoop/mapred-site.xml

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>
(5), yarn-site.xml
vi /opt/hadoop/etc/hadoop/yarn-site.xml

<configuration>
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>node00</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
</configuration>
(6), slaves
vi /opt/hadoop/etc/hadoop/slaves

node01
node02
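The same installation, including the config files just edited, must exist on node01 and node02 as well; a minimal sketch of pushing it out from node00:

# Copy the whole installation to each slave and recreate the symlink
for host in node01 node02; do
    scp -r /opt/program/hadoop-2.7.3 $host:/opt/program/
    ssh $host "ln -s /opt/program/hadoop-2.7.3 /opt/hadoop"
done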
3. Initialize HDFS
/opt/hadoop/bin/hadoop namenode -format
4. Start the cluster
/opt/hadoop/sbin/start-all.sh
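To confirm the daemons came up, run jps on each node; roughly the following processes should appear, given the configuration above:

# On node00 (master)
jps    # NameNode, SecondaryNameNode, ResourceManager

# On node01 / node02 (slaves)
ssh node01 jps    # DataNode, NodeManager
ssh node02 jps    # DataNode, NodeManager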
5. Testing
/opt/hadoop/bin/hadoop jar /opt/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar pi 5 10
If the job prints an estimate of pi at the end, the cluster is working.
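Beyond the pi job, a quick HDFS round trip is another handy sanity check, and the web UIs should be reachable (a sketch; the /test path is arbitrary):

# Write a file into HDFS and read it back
echo hello > /tmp/hello.txt
/opt/hadoop/bin/hadoop fs -mkdir -p /test
/opt/hadoop/bin/hadoop fs -put /tmp/hello.txt /test/
/opt/hadoop/bin/hadoop fs -cat /test/hello.txt

# Web UIs (open in a browser):
#   http://node00:50070   HDFS NameNode
#   http://node00:8088    YARN ResourceManager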