Configuring Hadoop on CentOS


Hadoop is used for processing big data; its core components are HDFS and MapReduce. Although my current work does not require it yet, extra skills never hurt, so after experimenting with a number of virtual machines I finally got a Hadoop 2.5.2 environment built successfully.

First, prepare a CentOS machine, change its hostname to master, and add an entry mapping the machine's IP address to master in /etc/hosts.

Linux Basic Configuration

vi /etc/sysconfig/network
# set HOSTNAME=master in this file
vi /etc/hosts
# add a line mapping the machine's IP address to master
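With the IP address used throughout the rest of this article, the added line would be:

192.168.1.112 master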
Then stop iptables and disable it from starting at boot.

service iptables stop
chkconfig iptables off
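To confirm the firewall is stopped and disabled at boot, its status can be checked with the standard CentOS 6 service commands:

service iptables status
chkconfig --list iptables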
Reboot the system, then configure passwordless SSH login. This is so that the Hadoop start scripts can run without prompting for a password.

Passwordless SSH Login

vi /etc/ssh/sshd_config
# uncomment the following 4 lines
HostKey /etc/ssh/ssh_host_rsa_key
RSAAuthentication yes
PubkeyAuthentication yes
AuthorizedKeysFile .ssh/authorized_keys

# save the file and restart sshd

service sshd restart

# generate a key pair for passwordless login
ssh-keygen -t rsa
# just press Enter at every prompt; two files will then be generated in the .ssh folder of the current user's home directory
# enter the .ssh directory
cat id_rsa.pub >> authorized_keys

# you can now log in over SSH without a password
ssh localhost
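If ssh localhost still prompts for a password, overly loose permissions on the key files are the most common cause; tightening them usually fixes it:

chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys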
JDK Installation and Configuration (omitted)

The version used is jdk-7u79-linux-x64.
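The installation itself is only sketched here; assuming the Oracle RPM package was downloaded (the exact file name may differ), it would look roughly like this:

rpm -ivh jdk-7u79-linux-x64.rpm
# the JDK ends up under /usr/java/jdk1.7.0_79, which is the JAVA_HOME used below
java -version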

Installing and Configuring Hadoop 2.5.2
Upload the downloaded hadoop-2.5.2.tar.gz package to the machine, then extract it to /usr:

tar -zxvf hadoop-2.5.2.tar.gz -C /usr

vi /etc/profile

# append the following lines at the end of the file
export JAVA_HOME=/usr/java/jdk1.7.0_79
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export HADOOP_HOME=/usr/hadoop-2.5.2
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_YARN_HOME=$HADOOP_HOME
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS=-Djava.library.path=$HADOOP_HOME/lib


# save the file, then run: source /etc/profile
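To confirm the new environment variables took effect, the versions can be checked from any directory:

java -version
hadoop version
# these should report 1.7.0_79 and 2.5.2 respectively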



# configure Hadoop
# create the name, data, and tmp directories for HDFS
mkdir -p /usr/hdfs/name
mkdir -p /usr/hdfs/data
mkdir -p /usr/hdfs/tmp

cd /usr/hadoop-2.5.2/etc/hadoop
Set JAVA_HOME in the following files:
hadoop-env.sh yarn-env.sh
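In both files the change is the same JAVA_HOME line; with the JDK path used in this article it looks like this:

export JAVA_HOME=/usr/java/jdk1.7.0_79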


vi core-site.xml
# add the following inside the <configuration> element; replace the IP with your machine's IP
<property>
<name>hadoop.tmp.dir</name>
<value>/usr/hdfs/tmp</value>
<description>a base for other temporary directories.</description>
</property>
<!--file System properties-->
<property>
<name>fs.defaultFS</name>
<value>hdfs://192.168.1.112:9000</value>
</property>



vi hdfs-site.xml
# likewise, add the following inside the <configuration> element
<property>
<name>dfs.namenode.name.dir</name>
<value>/usr/hdfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/usr/hdfs/data</value>
</property>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>


# copy mapred-site.xml from its template
cp mapred-site.xml.template mapred-site.xml
vi mapred-site.xml
# likewise, add the following inside the <configuration> element
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>


vi yarn-site.xml
# likewise, add the following inside the <configuration> element; remember to change the IP addresses to your machine's IP
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<description>the address of the Applications manager interface in the rm.</description>
<name>yarn.resourcemanager.address</name>
<value>192.168.1.112:18040</value>
</property>
<property>
<description>the address of the scheduler interface.</description>
<name>yarn.resourcemanager.scheduler.address</name>
<value>192.168.1.112:18030</value>
</property>
<property>
<description>the address of the RM Web application.</description>
<name>yarn.resourcemanager.webapp.address</name>
<value>192.168.1.112:18088</value>
</property>
<property>
<description>the address of the resource tracker interface.</description>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>192.168.1.112:8025</value>
</property>
At this point, the basic Hadoop environment is configured; the NameNode must be formatted before starting Hadoop for the first time.

Enter the command hadoop namenode -format (in Hadoop 2.x the equivalent hdfs namenode -format is preferred).

Start command:

start-dfs.sh

start-yarn.sh

Stop command:

stop-dfs.sh

stop-yarn.sh

When startup is complete, open http://192.168.1.112:50070 (HDFS) and http://192.168.1.112:18088 (YARN) in a browser to verify the installation.
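The running daemons can also be checked with jps; on a single-node setup like this one, something like the following processes should be listed (process IDs will differ):

jps
# NameNode, DataNode, SecondaryNameNode, ResourceManager, NodeManager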



Test Hadoop

Verify that the installation works by running the WordCount example that ships with Hadoop.

Go to the Hadoop installation directory and enter the following commands.

mkdir example
cd example


Create file1.txt and file2.txt with the following content:

vi file1.txt
Hello ZHM

Hello Hadoop

Hello CZ


vi file2.txt
Hadoop is OK

Hadoop is Newbee

Hadoop 2.5.2

cd ..
hadoop fs -mkdir /data
hadoop fs -put -f example/file1.txt example/file2.txt /data
# run the wordcount example
hadoop jar ./share/hadoop/mapreduce/sources/hadoop-mapreduce-examples-2.5.2-sources.jar org.apache.hadoop.examples.WordCount /data /output
# view the results
hadoop fs -cat /output/part-r-00000
# the output is as follows:
2.5.2 1
CZ 1
Hadoop 4
Hello 3
is 2
Newbee 1
OK 1
ZHM 1
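Note that the job will fail if /output already exists in HDFS; to re-run the example, delete the old output directory first:

hadoop fs -rm -r /output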
At this point the environment is fully set up; the next step is to use Maven to develop Hadoop projects.

You will inevitably run into problems during installation; a careful search on the internet will usually turn up the answer you need.
