1. Environment
Operating system: CentOS 6.5, 64-bit
Note: Hadoop 2.0 and above requires JDK 1.7, so uninstall the JDK that ships with the Linux distribution and reinstall.
Download address: http://www.oracle.com/technetwork/java/javase/downloads/index.html
Software versions: hadoop-2.3.0-cdh5.1.0.tar.gz, zookeeper-3.4.5-cdh5.1.0.tar.gz
Download address: http://archive.cloudera.com/cdh5/cdh/5/
Now start the installation.
2. JDK Installation
1. Check whether a JDK is already installed:
rpm -qa | grep jdk
java-1.6.0-openjdk-1.6.0.0-1.45.1.11.1.el6.i686
2. Uninstall the bundled JDK:
yum -y remove java-1.6.0-openjdk-1.6.0.0-1.45.1.11.1.el6.i686
or:
rpm -e --nodeps java-1.7.0-openjdk-1.7.0.45-2.4.3.3.el6.x86_64
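If more than one OpenJDK package is installed, they can all be removed in one pass. A minimal sketch (run as root; it assumes, as in the listing above, that the package names contain "openjdk"):

```shell
# Remove every installed OpenJDK package in one pass (run as root).
# `rpm -qa` lists installed packages; grep keeps only the OpenJDK
# entries; --nodeps skips dependency checks, as in the command above.
for pkg in $(rpm -qa 2>/dev/null | grep -i openjdk); do
    rpm -e --nodeps "$pkg" || echo "could not remove $pkg (are you root?)"
done
```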
3. Install jdk-7u55-linux-x64.tar.gz
Create the directory java under /usr, place jdk-7u55-linux-x64.tar.gz in it, then run tar -zxvf jdk-7u55-linux-x64.tar.gz there to extract the JDK into the java directory.
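The steps above can be sketched as a small helper (install_jdk is a made-up name for illustration; on the node, run it as root with the tarball's actual download path and /usr/java as arguments):

```shell
# install_jdk TARBALL DEST — create DEST and unpack the JDK tarball
# into it; -C makes tar extract inside DEST, yielding DEST/jdk1.7.0_55.
install_jdk() {
    mkdir -p "$2"
    tar -zxf "$1" -C "$2"
}
```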
[root@master01 java]# ls
jdk1.7.0_55
3. Configure Environment Variables
Run vim /etc/profile and append:
# /etc/profile
# System wide environment and startup programs, for login setup
# Functions and aliases go in /etc/bashrc
export JAVA_HOME=/usr/java/jdk1.7.0_55
export JRE_HOME=/usr/java/jdk1.7.0_55/jre
export CLASSPATH=/usr/java/jdk1.7.0_55/lib
export PATH=$JAVA_HOME/bin:$PATH
Save the changes, then run source /etc/profile to reload the environment variables.
Run java -version to verify:
[root@master01 java]# java -version
java version "1.7.0_55"
Java(TM) SE Runtime Environment (build 1.7.0_55-b13)
Java HotSpot(TM) 64-Bit Server VM (build 24.55-b03, mixed mode)
The JDK is now configured successfully.
4. System Configuration
1. Turn off the firewall
chkconfig iptables off (keeps it off across reboots; run service iptables stop to stop it immediately)
Also configure the host name and the /etc/hosts file on each machine.
2. Passwordless SSH configuration
Hadoop manages its daemons remotely: the NameNode connects to each DataNode over SSH (Secure Shell) to start and stop their processes. For this to work unattended, SSH must not prompt for a password, so the NameNode must be able to reach every DataNode without one, and each DataNode must likewise be able to reach the NameNode.
Configure on each machine:
Open /etc/ssh/sshd_config with vi and enable:
RSAAuthentication yes        # enable RSA authentication
PubkeyAuthentication yes     # enable public/private key pair authentication
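Flipping the two options by hand on every machine is error-prone; here is a minimal sed sketch (enable_pubkey_auth is a made-up helper name, and the patterns assume the stock layout where each option sits on its own, possibly commented, line; on a real node run it against /etc/ssh/sshd_config as root, then service sshd restart):

```shell
# enable_pubkey_auth FILE — set the two sshd options to "yes" in FILE,
# whether they are currently commented out or set to "no".
enable_pubkey_auth() {
    sed -i -e 's/^#\?RSAAuthentication.*/RSAAuthentication yes/' \
           -e 's/^#\?PubkeyAuthentication.*/PubkeyAuthentication yes/' "$1"
}
```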
On master01 run: ssh-keygen -t rsa -P "" (press Enter at each prompt; do not set a passphrase)
The keys are stored in the /root/.ssh directory by default.
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
[root@master01 .ssh]# ls
authorized_keys  id_rsa  id_rsa.pub  known_hosts
If you are not installing as root (root is assumed here), you also need to fix the permissions with the following commands:
chmod 755 ~/.ssh
chmod 600 ~/.ssh/authorized_keys
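Appending the public key to authorized_keys is what makes the passwordless login work, and the append can be made idempotent so re-running the setup never duplicates the key (add_key is a hypothetical helper; on a real cluster, ssh-copy-id root@<datanode> performs the same job over the network for each DataNode):

```shell
# add_key PUBFILE AUTHFILE — append the public key in PUBFILE to
# AUTHFILE only if it is not already there, mirroring what ssh-copy-id
# does on the remote side.
add_key() {
    grep -qxF "$(cat "$1")" "$2" 2>/dev/null || cat "$1" >> "$2"
}
```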
5. Hadoop Pseudo-Distributed Configuration
5.1 Edit etc/hadoop/hadoop-env.sh (note: if JAVA_HOME already has a value there, replace it with your own JAVA_HOME):
# set to the root of your Java installation
export JAVA_HOME=/usr/java/latest
# assuming your installation directory is /usr/local/hadoop
export HADOOP_PREFIX=/usr/local/hadoop
5.2 Add the Hadoop environment variable
export HADOOP_HOME=/usr/local/cdh/hadoop
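The text only sets HADOOP_HOME; putting the bin and sbin directories on the PATH as well lets the hdfs client and the start/stop scripts run from any directory. A sketch of the full /etc/profile additions (the /usr/local/cdh/hadoop prefix follows the text):

```shell
# Hadoop environment variables for /etc/profile: bin holds the hdfs and
# hadoop clients, sbin holds the start/stop scripts.
export HADOOP_HOME=/usr/local/cdh/hadoop
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
```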
5.3 Edit etc/hadoop/core-site.xml:
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
Edit etc/hadoop/hdfs-site.xml (the /usr/local/cdh/hadoop/data/dfs/name directory must be created manually before formatting, otherwise formatting fails):
<configuration>
<property>
<!-- turn on WebHDFS -->
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>/usr/local/cdh/hadoop/data/dfs/name</value>
<description>Local directories where the NameNode stores the name table (fsimage) (modify as needed)</description>
</property>
<property>
<name>dfs.namenode.edits.dir</name>
<value>${dfs.namenode.name.dir}</value>
<description>Local directory where the NameNode stores the transaction file (edits) (modify as needed)</description>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/usr/local/cdh/hadoop/data/dfs/data</value>
<description>Local directories where the DataNode stores blocks (modify as needed)</description>
</property>
</configuration>
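As noted above, the name and data directories must exist before HDFS is formatted; creating both can be wrapped in a one-liner (make_dfs_dirs is a made-up helper name; on the node, run it as root with /usr/local/cdh/hadoop/data as the argument):

```shell
# make_dfs_dirs BASE — create the NameNode and DataNode storage
# directories under BASE, matching the hdfs-site.xml values above.
make_dfs_dirs() {
    mkdir -p "$1/dfs/name" "$1/dfs/data"
}
```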
Edit etc/hadoop/mapred-site.xml:
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
Edit etc/hadoop/yarn-site.xml:
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>
6. Start and Verify the Installation
Format HDFS first:
bin/hdfs namenode -format
Start the daemons:
sbin/start-dfs.sh
sbin/start-yarn.sh
Check the processes with jps:
7448 ResourceManager
8277 SecondaryNameNode
7547 NodeManager
8079 DataNode
7975 NameNode
8401 Jps
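All five daemons (plus jps itself) should appear in the listing. A small convenience sketch, not part of Hadoop, that flags any missing daemon by comparing the jps output against the expected process names:

```shell
# Report any Hadoop daemon that does not show up in the jps listing.
expected="NameNode DataNode SecondaryNameNode ResourceManager NodeManager"
procs=$(jps 2>/dev/null)
for p in $expected; do
    echo "$procs" | grep -qw "$p" || echo "$p is NOT running"
done
```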
1. Open a browser
NameNode web UI - http://localhost:50070/
2. Create folders
$ bin/hdfs dfs -mkdir /user
$ bin/hdfs dfs -mkdir /user/<username>
3. Copy files
$ bin/hdfs dfs -put etc/hadoop input
4. Run a job
$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.3.0-cdh5.1.0.jar grep input output 'dfs[a-z.]+'
5. View the output
$ bin/hdfs dfs -get output output
$ cat output/*