Building a Hadoop 2.5.2 pseudo-distributed environment on CentOS 6.4

Brief introduction:

Hadoop is a primary tool for working with big data, and its core components are HDFS and MapReduce. To make learning easier, I built a pseudo-distributed environment on a virtual machine for development and study.

First, preparation before installation:

1) Linux server: a CentOS 6.4 minimal installation running on VMware

2) JDK: jdk-7u65-linux-x64.gz

3) SSH: an SSH client

4) A properly configured yum repository (check with yum list; a quick sanity check is sketched after this list)

5) Hadoop: hadoop-2.5.2.tar.gz
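
A minimal sanity check for items 3) and 4) is sketched below; the package name openssh-clients is an assumption and may differ on your system:

# Confirm the yum repositories respond and list the SSH packages
yum list | grep ssh
# Install an SSH client if one is missing (package name assumed to be openssh-clients)
yum install -y openssh-clients
# Confirm the sshd service exists, since passwordless login is configured later
service sshd status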

Second, environment configuration

1) Basic Linux environment settings:

# Set the hostname
vi /etc/sysconfig/network      # edit HOSTNAME=Master
vi /etc/hosts                  # add a line: <local IP address>   Master
# Stop the iptables firewall and disable it at boot
service iptables stop
chkconfig iptables off

2) JDK installation and configuration

# Extract the JDK to the target directory /opt/java
mkdir /opt/java
tar -zxvf jdk-7u65-linux-x64.gz -C /opt/java
# Configure the environment variables
vi /etc/profile
# Add the following at the end of the file
export JAVA_HOME=/opt/java/jdk1.7.0_65
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export PATH=$PATH:$JAVA_HOME/bin
# Save and exit, then make the configuration take effect immediately
source /etc/profile
# Check whether Java is configured correctly
java -version

Reboot the system, then configure passwordless SSH login. The reason for configuring this is so that Hadoop can be started without entering a password.
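
Before moving on to SSH, it is worth confirming that the host settings from step 1) took effect after the reboot. A minimal sketch, using the Master hostname configured above:

hostname                   # should now print Master
ping -c 1 Master           # should resolve to the local IP added in /etc/hosts
chkconfig --list iptables  # every runlevel should show off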

3) Configure SSH login without password

vi /etc/ssh/sshd_config
# Uncomment the following 4 lines:
HostKey /etc/ssh/ssh_host_rsa_key
RSAAuthentication yes
PubkeyAuthentication yes
AuthorizedKeysFile .ssh/authorized_keys
# Save, then restart sshd
service sshd restart
# Generate the passwordless-login key pair
ssh-keygen -t rsa
# Just press Enter at each prompt; two files are generated in the .ssh folder of the current user's home directory.
# Enter the .ssh directory.
cd ~/.ssh
cat id_rsa.pub >> authorized_keys
# You can now log in over SSH without a password.
ssh localhost

PS: If this fails because no SSH client is installed, find the client package with yum list | grep ssh and install it with yum install -y <ssh client package name>.

Third, Hadoop installation and configuration

1) Upload the downloaded Hadoop installation package (hadoop-2.5.2.tar.gz) to the /tmp directory via FTP.

2) Unzip the Hadoop installation package into the specified directory:

mkdir /opt/hadoop
tar -zxvf hadoop-2.5.2.tar.gz -C /opt/hadoop

3) Configure the environment variables:

vi /etc/profile
# Add the following at the end of the file

export HADOOP_HOME=/opt/hadoop/hadoop-2.5.2
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_YARN_HOME=$HADOOP_HOME
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"

# Save, then apply the profile
source /etc/profile
# cd $HADOOP_HOME to check that you can enter Hadoop's home directory

4) Configure Hadoop

# The Hadoop configuration files live in $HADOOP_HOME/etc/hadoop.
# Five files carry the main configuration: hadoop-env.sh, core-site.xml, hdfs-site.xml, mapred-site.xml, yarn-site.xml.

# Create the Hadoop name, data, and tmp directories
mkdir -p /usr/hdfs/name
mkdir -p /usr/hdfs/data
mkdir -p /usr/tmp

# hadoop-env.sh
vi hadoop-env.sh
# Set the JDK directory in the export JAVA_HOME line

# core-site.xml
vi core-site.xml
# Add the following inside the configuration node; Master is the local IP address

<property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/tmp</value>
    <description>A base for other temporary directories.</description>
</property>
<!-- file system properties -->
<property>
    <name>fs.defaultFS</name>
    <value>hdfs://Master:9000</value>
</property>

# hdfs-site.xml
vi hdfs-site.xml
# Again inside the configuration node, configure the replication factor and the namenode and datanode directories

<property>
    <name>dfs.replication</name>
    <value>1</value>
</property>
<property>
    <name>dfs.namenode.name.dir</name>
    <value>/usr/hdfs/name</value>
</property>
<property>
    <name>dfs.datanode.data.dir</name>
    <value>/usr/hdfs/data</value>
</property>

# mapred-site.xml
# Copy it from the template first
cp mapred-site.xml.template mapred-site.xml
vi mapred-site.xml
# Again inside the configuration node, add the following. Since version 0.23.0 the new YARN framework replaces the old MapReduce framework, so the framework must be set to yarn

<property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
</property>

# yarn-site.xml
vi yarn-site.xml
# Again inside the configuration node, add the following; use Master or the local IP address

<property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
</property>
<property>
    <description>The address of the applications manager interface in the RM.</description>
    <name>yarn.resourcemanager.address</name>
    <value>Master:18040</value>
</property>
<property>
    <description>The address of the scheduler interface.</description>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>Master:18030</value>
</property>
<property>
    <description>The address of the RM web application.</description>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>Master:18088</value>
</property>
<property>
    <description>The address of the resource tracker interface.</description>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>Master:8025</value>
</property>

At this point the basic Hadoop environment is configured, and our Hadoop journey can begin.

Fourth, Hadoop startup

1) Format HDFS

# Before starting Hadoop for the first time, the NameNode must be formatted. Enter the command:
hadoop namenode -format

2) Start and stop Hadoop

# The start and stop scripts are in $HADOOP_HOME/sbin.
# Start commands:
start-dfs.sh
start-yarn.sh
# Stop commands:
stop-dfs.sh
stop-yarn.sh

3) Check the startup status

# Use the jps command that ships with the JDK; there should be 6 processes:
9293 Jps
5762 ResourceManager
4652 NameNode
5850 NodeManager
4907 SecondaryNameNode
4733 DataNode

Hadoop is now up and running. You can also verify the installation in a browser by opening http://<Master IP>:50070 and http://<Master IP>:8088.
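
Beyond jps, a few standard Hadoop 2.x client commands can confirm that HDFS and YARN are actually answering requests. A minimal sketch, assuming the environment variables configured earlier are loaded:

# Ask the NameNode for a cluster report; it should show 1 live datanode
hdfs dfsadmin -report
# Ask the ResourceManager for its registered NodeManagers; one node should be listed
yarn node -list
# List the HDFS root to confirm the file system client can connect
hadoop fs -ls /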

Fifth, testing Hadoop

# Verify that the installation is correct by running the wordcount example that ships with Hadoop.

# Enter the Hadoop installation directory and run the following commands.
mkdir example
cd example
# Edit file1.txt and file2.txt
vi file1.txt

hello zhm
hello hadoop
hello cz

vi file2.txt

hadoop is ok
hadoop is newbee
hadoop 2.5.2

# Copy file1 and file2 to the input directory /data
cd ..
hadoop fs -mkdir /data
hadoop fs -put -f example/file1.txt example/file2.txt /data
# Enter the HADOOP_HOME directory and run the wordcount example
hadoop jar ./share/hadoop/mapreduce/sources/hadoop-mapreduce-examples-2.5.2-sources.jar org.apache.hadoop.examples.WordCount /data /output
# View the results
hadoop fs -cat /output/part-r-00000
# The results are as follows:
2.5.2   1
cz      1
hadoop  4
hello   3
is      2
newbee  1
ok      1
zhm     1

At this point the environment is fully set up; a follow-up covers using Maven to develop Hadoop projects.
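
One practical note if you repeat the test: MapReduce will not overwrite an existing output directory, so /output has to be removed before the job runs again. A minimal sketch using the same paths as above:

# Delete the previous output directory, then rerun the wordcount job
hadoop fs -rm -r /output
hadoop jar ./share/hadoop/mapreduce/sources/hadoop-mapreduce-examples-2.5.2-sources.jar org.apache.hadoop.examples.WordCount /data /output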
