Brief introduction:
Hadoop is the primary tool for working with big data; its core components are HDFS and MapReduce. For convenience of study, I built a pseudo-distributed environment on a virtual machine to use for development and learning.
I. Preparation before installation
1) Linux server: CentOS 6.4 minimal installation on VMware
2) JDK: jdk-7u65-linux-x64.gz
3) SSH: an SSH client
4) A properly configured yum repository (check with yum list)
5) Hadoop: hadoop-2.5.2.tar.gz
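Before going further, it can help to confirm the downloads and tooling are actually in place. A quick sanity check, assuming the archives were downloaded to /tmp:
ls -lh /tmp/jdk-7u65-linux-x64.gz /tmp/hadoop-2.5.2.tar.gz
yum list installed | grep -i openssh    # confirm the SSH packages are present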
II. Environment configuration
1) Basic Linux environment settings
vi /etc/sysconfig/network
# set HOSTNAME=Master
vi /etc/hosts
# add a line mapping this machine's IP address to the hostname:
# <this machine's IP> Master
# stop the iptables firewall and keep it from starting at boot
service iptables stop
chkconfig iptables off
2) JDK installation and configuration
# extract the JDK to the target directory /opt/java
mkdir /opt/java
tar -zxvf jdk-7u65-linux-x64.gz -C /opt/java
# configure the environment variables
vi /etc/profile
# add the following at the end of the file:
export JAVA_HOME=/opt/java/jdk1.7.0_65
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export PATH=$PATH:$JAVA_HOME/bin
# save and exit, then make the profile take effect immediately
source /etc/profile
# check whether Java is configured correctly
java -version
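If the variables don't seem to stick in a new shell, confirm them explicitly. A minimal check, assuming the paths above:
echo $JAVA_HOME    # should print /opt/java/jdk1.7.0_65
which java         # should resolve to a path under $JAVA_HOME/bin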
Reboot the system, then configure passwordless SSH login. The reason for configuring this is that Hadoop's start scripts log in to the node over SSH, and this way they can start Hadoop without prompting for a password.
3) Configure passwordless SSH login
vi /etc/ssh/sshd_config
# uncomment the following four lines:
HostKey /etc/ssh/ssh_host_rsa_key
RSAAuthentication yes
PubkeyAuthentication yes
AuthorizedKeysFile .ssh/authorized_keys
# save, then restart sshd
service sshd restart
# generate a passphrase-less key pair
ssh-keygen -t rsa
# just press Enter at every prompt; afterwards two files appear in the .ssh
# folder under the current user's home directory
# enter the .ssh directory
cd ~/.ssh
cat id_rsa.pub >> authorized_keys
# you can now log in over ssh without a password
ssh localhost
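If ssh still prompts for a password, the most common cause is that sshd's StrictModes check rejects loose permissions on the key files; tightening them usually fixes it:
chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys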
PS: If ssh reports an error, the SSH client may not be installed. Find and install it:
yum list | grep ssh
yum install -y <ssh client package name>    # on CentOS 6 this is typically openssh-clients
III. Hadoop installation and configuration
1) Upload the downloaded Hadoop package (hadoop-2.5.2.tar.gz) to the /tmp directory via FTP.
2) Extract the Hadoop package into the target directory:
mkdir /opt/hadoop
tar -zxvf hadoop-2.5.2.tar.gz -C /opt/hadoop
3) Configure the environment variables:
vi /etc/profile
# add the following at the end of the file:
export HADOOP_HOME=/opt/hadoop/hadoop-2.5.2
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_YARN_HOME=$HADOOP_HOME
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS=-Djava.library.path=$HADOOP_HOME/lib
# save, then execute:
source /etc/profile
# cd $HADOOP_HOME to check that you can enter Hadoop's home directory
4) Configure Hadoop
# the Hadoop configuration files live in $HADOOP_HOME/etc/hadoop
# five files mainly need to be edited: hadoop-env.sh, core-site.xml, hdfs-site.xml, mapred-site.xml, yarn-site.xml
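Before editing them, it is worth a quick check that the environment variables took effect; hadoop should now be on the PATH:
hadoop version    # should report Hadoop 2.5.2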
# create hadoop's name and data directories
mkdir -p /usr/hdfs/name
mkdir -p /usr/hdfs/data
mkdir -p /usr/tmp
# hadoop-env.sh
vi hadoop-env.sh
# set export JAVA_HOME to the JDK directory configured earlier
# core-site.xml
vi core-site.xml
# add the following inside the <configuration> node; Master resolves to the local IP address
<property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/tmp</value>
    <description>A base for other temporary directories.</description>
</property>
<!-- file system properties -->
<property>
    <name>fs.defaultFS</name>
    <value>hdfs://Master:9000</value>
</property>
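For orientation, this is what the whole assembled core-site.xml looks like (a sketch; the properties above sit inside the file's <configuration> element, and the other *-site.xml files below follow the same pattern):
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/usr/tmp</value>
        <description>A base for other temporary directories.</description>
    </property>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://Master:9000</value>
    </property>
</configuration>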
# hdfs-site.xml
vi hdfs-site.xml
# likewise, add the following inside the <configuration> node; this sets the replication factor and the namenode and datanode directories
<property>
    <name>dfs.replication</name>
    <value>1</value>
</property>
<property>
    <name>dfs.namenode.name.dir</name>
    <value>/usr/hdfs/name</value>
</property>
<property>
    <name>dfs.datanode.data.dir</name>
    <value>/usr/hdfs/data</value>
</property>
# mapred-site.xml
# copy mapred-site.xml from its template
cp mapred-site.xml.template mapred-site.xml
vi mapred-site.xml
# likewise, add the following inside the <configuration> node; from version 0.23.0 onward the new MapReduce framework YARN replaces the old one, so the framework must be set to yarn
<property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
</property>
# yarn-site.xml
vi yarn-site.xml
# likewise, add the following inside the <configuration> node; use Master or this machine's IP address
<property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
</property>
<property>
    <description>The address of the applications manager interface in the RM.</description>
    <name>yarn.resourcemanager.address</name>
    <value>Master:18040</value>
</property>
<property>
    <description>The address of the scheduler interface.</description>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>Master:18030</value>
</property>
<property>
    <description>The address of the RM web application.</description>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>Master:18088</value>
</property>
<property>
    <description>The address of the resource tracker interface.</description>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>Master:8025</value>
</property>
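Before starting anything, you can check that Hadoop actually picks these values up; hdfs getconf reads the effective configuration:
hdfs getconf -confKey fs.defaultFS      # should print hdfs://Master:9000
hdfs getconf -confKey dfs.replication   # should print 1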
At this point the basic Hadoop environment is configured, and our Hadoop journey can begin!
IV. Starting Hadoop
1) Format HDFS
# before starting Hadoop for the first time, the namenode must be formatted
hadoop namenode -format
2) Start Hadoop
# the start and stop scripts live in $HADOOP_HOME/sbin
# start:
start-dfs.sh
start-yarn.sh
# stop:
stop-dfs.sh
stop-yarn.sh
3) Check the startup status
# jps, which ships with the JDK, should show six processes:
jps
9293 Jps
5762 ResourceManager
4652 NameNode
5850 NodeManager
4907 SecondaryNameNode
4733 DataNode
With that, Hadoop is up and running! You can also verify the installation through a browser by opening http://<Master IP>:50070 (the NameNode UI) and http://<Master IP>:18088 (the ResourceManager web UI port configured above).
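If curl happens to be installed, the same web UIs can be probed from the shell; a minimal sketch:
curl -s -o /dev/null -w "%{http_code}\n" http://Master:50070    # NameNode UI, expect 200
curl -s -o /dev/null -w "%{http_code}\n" http://Master:18088    # ResourceManager UI, expect 200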
V. Testing Hadoop
# verify the installation by running the wordcount example that ships with Hadoop
# enter the hadoop installation directory and run the following commands:
mkdir example
cd example
# create file1.txt and file2.txt
vi file1.txt
hello zhm
hello hadoop
hello cz
vi file2.txt
hadoop is ok
hadoop is newbee
hadoop 2.5.2
# copy file1 and file2 into the input directory /data
cd ..
hadoop fs -mkdir /data
hadoop fs -put -f example/file1.txt example/file2.txt /data
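A quick check that the upload landed where expected:
hadoop fs -ls /data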
# from the HADOOP_HOME directory, run the wordcount example
hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.2.jar wordcount /data /output
# view the result
hadoop fs -cat /output/part-r-00000
# the output is as follows:
2.5.2 1
cz 1
hadoop 4
hello 3
is 2
newbee 1
ok 1
zhm 1
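One caveat when re-running the example: the job fails if the output directory already exists, so remove it first:
hadoop fs -rm -r /output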
With that, the environment is fully configured; next comes using Maven to develop Hadoop projects.