Hadoop is used for processing big data; its core components are HDFS and MapReduce. Although my current work does not require it, extra technical skills are never a burden, and after many attempts on virtual machines I finally got a Hadoop 2.5.2 environment built successfully.
First, prepare a CentOS machine, change the hostname to master, and add a line mapping the native IP address to master in /etc/hosts.
Linux Basic Configuration
vi /etc/sysconfig/network
# edit the file and set HOSTNAME=master
vi /etc/hosts
# add the following line (192.168.1.112 is the native IP used throughout this walkthrough)
192.168.1.112 master
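A quick way to confirm the mapping works (note the hostname change itself only takes effect after a re-login or reboot):
ping -c 1 master
# should resolve to the native IP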
Then stop iptables and disable it at boot.
service iptables stop
chkconfig iptables off
Reboot the system, then configure passwordless SSH login, so that Hadoop can later be started without entering a password.
SSH Login Without a Password
vi /etc/ssh/sshd_config
# uncomment the following 4 lines
HostKey /etc/ssh/ssh_host_rsa_key
RSAAuthentication yes
PubkeyAuthentication yes
AuthorizedKeysFile .ssh/authorized_keys
# save, then restart sshd
service sshd restart
# generate the passwordless login key pair
ssh-keygen -t rsa
# just press Enter at every prompt; 2 files will then be generated in the .ssh folder of the current user's home directory
# enter the .ssh directory
cat id_rsa.pub >> authorized_keys
# you can now ssh into the system without a password
ssh localhost
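If ssh still prompts for a password, the usual culprit is directory permissions; tightening them as below generally fixes it:
chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys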
JDK Installation and Configuration (omitted)
The version used is jdk-7u79-linux-x64.
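The JDK setup itself is skipped here; as a minimal sketch, assuming the Oracle RPM package jdk-7u79-linux-x64.rpm has been downloaded:
rpm -ivh jdk-7u79-linux-x64.rpm
# installs to /usr/java/jdk1.7.0_79 by default
java -version
# should report java version "1.7.0_79"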
Installing and Configuring Hadoop 2.5.2
Upload the downloaded tar.gz package to the environment.
tar -zxvf hadoop-2.5.2.tar.gz -C /usr
vi /etc/profile
# append the following at the end of the file
export JAVA_HOME=/usr/java/jdk1.7.0_79
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export HADOOP_HOME=/usr/hadoop-2.5.2
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_YARN_HOME=$HADOOP_HOME
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"
# save, then run: source /etc/profile
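Once the profile is sourced, a quick way to confirm that PATH and HADOOP_HOME are picked up:
hadoop version
# should print Hadoop 2.5.2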
# configure Hadoop
# create the name and data directories for HDFS
mkdir -p /usr/hdfs/name
mkdir -p /usr/hdfs/data
mkdir -p /usr/hdfs/tmp
cd /usr/hadoop-2.5.2/etc/hadoop
Set JAVA_HOME in the following files:
hadoop-env.sh yarn-env.sh
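For example, in hadoop-env.sh (and likewise in yarn-env.sh), replace the default ${JAVA_HOME} reference with the explicit path used above:
export JAVA_HOME=/usr/java/jdk1.7.0_79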
vi core-site.xml
# add the following configuration inside the <configuration> element; note that the IP should be changed to the native IP
<property>
<name>hadoop.tmp.dir</name>
<value>/usr/hdfs/tmp</value>
<description>A base for other temporary directories.</description>
</property>
<!-- file system properties -->
<property>
<name>fs.defaultFS</name>
<value>hdfs://192.168.1.112:9000</value>
</property>
vi hdfs-site.xml
# likewise, add the following configuration inside the <configuration> element
<property>
<name>dfs.namenode.name.dir</name>
<value>/usr/hdfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/usr/hdfs/data</value>
</property>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
# copy mapred-site.xml from the template
cp mapred-site.xml.template mapred-site.xml
vi mapred-site.xml
# likewise, add the following configuration inside the <configuration> element
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
vi yarn-site.xml
# likewise, add the following configuration inside the <configuration> element; pay attention to changing the IP address to the native machine's IP
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<description>The address of the applications manager interface in the RM.</description>
<name>yarn.resourcemanager.address</name>
<value>192.168.1.112:18040</value>
</property>
<property>
<description>The address of the scheduler interface.</description>
<name>yarn.resourcemanager.scheduler.address</name>
<value>192.168.1.112:18030</value>
</property>
<property>
<description>The address of the RM web application.</description>
<name>yarn.resourcemanager.webapp.address</name>
<value>192.168.1.112:18088</value>
</property>
<property>
<description>The address of the resource tracker interface.</description>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>192.168.1.112:8025</value>
</property>
At this point the basic Hadoop environment is configured; the NameNode must be formatted before the first start.
Enter the command "hadoop namenode -format".
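Note that in Hadoop 2.x this form still works but is deprecated; the equivalent current command is:
hdfs namenode -format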
Start command:
start-dfs.sh
start-yarn.sh
Stop command:
stop-dfs.sh
stop-yarn.sh
When startup is complete, open http://192.168.1.112:50070 and http://192.168.1.112:18088 in a browser to verify the installation.
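You can also check from the shell with jps, which for this single-node setup should list the HDFS and YARN daemons:
jps
# expected (PIDs will vary):
# NameNode
# DataNode
# SecondaryNameNode
# ResourceManager
# NodeManager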
Test Hadoop
Verify that the installation is correct by running the wordcount example that ships with Hadoop.
Go to the Hadoop installation directory and enter the following commands.
mkdir example
cd example
Create file1.txt and file2.txt:
vi file1.txt
hello zhm
hello hadoop
hello cz
vi file2.txt
hadoop is ok
hadoop is newbee
hadoop 2.5.2
cd ..
hadoop fs -mkdir /data
hadoop fs -put -f example/file1.txt example/file2.txt /data
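To confirm the upload, list the directory:
hadoop fs -ls /data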
# run the wordcount example
hadoop jar ./share/hadoop/mapreduce/sources/hadoop-mapreduce-examples-2.5.2-sources.jar org.apache.hadoop.examples.WordCount /data /output
# view the results
hadoop fs -cat /output/part-r-00000
# the output is as follows:
2.5.2 1
cz 1
hadoop 4
hello 3
is 2
newbee 1
ok 1
zhm 1
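Note that MapReduce refuses to write to an existing output directory, so to rerun the job you first need to delete /output:
hadoop fs -rm -r /output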
That completes the environment setup; next comes using Maven to develop Hadoop projects.
You are bound to hit problems during installation; a careful search online will usually turn up the answer you need.