Hadoop includes a distributed filesystem, HDFS (Hadoop Distributed File System). Hadoop is a software framework for the distributed processing of large amounts of data, and it processes data in a reliable, efficient, and scalable way. It is reliable because it assumes that compute and storage elements will fail, so it keeps multiple copies of the working data and redistributes the processing of failed nodes. Hadoop is a framework written in the Java language.
The master node of Hadoop runs the NameNode, the secondary NameNode, and the JobTracker daemons, along with the utilities and browsers used to manage the cluster. The slave nodes run the TaskTracker and DataNode daemons. In other words, the master node hosts the daemons that manage and orchestrate the Hadoop cluster, while the slave nodes host the daemons that implement HDFS storage and the MapReduce data-processing functionality.
The NameNode is the primary server in Hadoop. It typically runs on a separate machine in an HDFS instance and manages the filesystem namespace and access to the files stored in the cluster. Each Hadoop cluster has one NameNode and one secondary NameNode. When an external client sends a request to create a file, the NameNode responds with the block identity and the IP address of the DataNode that will hold the first copy of the block. The NameNode also notifies the other DataNodes that will receive copies of the block.
DataNodes: a Hadoop cluster consists of one NameNode and a large number of DataNodes. DataNodes are usually organized into racks, with a rack switch connecting all the systems together. DataNodes respond to read and write requests from HDFS clients, and they also respond to commands from the NameNode to create, delete, and replicate blocks.
The JobTracker is a master service. Once started, it receives jobs, dispatches each of a job's subtasks to run on TaskTrackers, and monitors them; if a task fails, the JobTracker runs it again.
The TaskTracker is a slave service that runs on multiple nodes. It actively communicates with the JobTracker, receives tasks, and is responsible for executing each task directly. TaskTrackers are required to run on the DataNodes of HDFS.
The NameNode, secondary NameNode, and JobTracker run on the master node, while each slave node runs a DataNode and a TaskTracker, so that data processing runs on the server that holds the data and stays as local as possible.
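Once a cluster is running (as set up below), the block-to-DataNode mapping that the NameNode maintains can be inspected with the fsck tool. A minimal sketch, assuming a file has already been uploaded under the HDFS path /user/hadoop/test:
bin/hadoop fsck /user/hadoop/test -files -blocks -locations    ## lists each block, its replicas, and the DataNodes that hold them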
server2.example.com 172.25.45.2 (master)
server3.example.com 172.25.45.3 (slave)
server4.example.com 172.25.45.4 (slave)
server5.example.com 172.25.45.5 (slave)
1. Configure the legacy Hadoop version:
On server2, server3, server4, and server5, add the hadoop user:
useradd -u 900 hadoop
echo westos | passwd --stdin hadoop
Server2:
sh jdk-6u32-linux-x64.bin    ## install the JDK
mv jdk1.6.0_32/ /home/hadoop/java
mv hadoop-1.2.1.tar.gz /home/hadoop/
su - hadoop
vim .bash_profile
export JAVA_HOME=/home/hadoop/java
export CLASSPATH=.:$JAVA_HOME/lib:$JAVA_HOME/jre/lib
export PATH=$PATH:$HOME/bin:$JAVA_HOME/bin
source .bash_profile
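A quick sanity check that the new environment variables took effect (the exact version string depends on the JDK installed above):
echo $JAVA_HOME    ## should print /home/hadoop/java
java -version      ## should report the JDK just installed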
tar zxf hadoop-1.1.2.tar.gz    ## set up a single-node hadoop
ln -s hadoop-1.1.2 hadoop
cd /home/hadoop/hadoop/conf
vim hadoop-env.sh
export JAVA_HOME=/home/hadoop/java
cd ..
mkdir input
cp conf/*.xml input/
bin/hadoop jar hadoop-examples-1.1.2.jar
bin/hadoop jar hadoop-examples-1.1.2.jar grep input output 'dfs[a-z.]+'
cd output/
cat *
1 dfsadmin
Set up passwordless SSH login from the master to the slaves:
Server2:
su - hadoop
ssh-keygen
ssh-copy-id localhost
ssh-copy-id 172.25.45.3
ssh-copy-id 172.25.45.4
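Before continuing, it is worth checking that passwordless login really works, for example:
ssh 172.25.45.3 hostname    ## should print server3.example.com without prompting for a password
ssh 172.25.45.4 hostname    ## should print server4.example.com without prompting for a password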
cd /home/hadoop/hadoop/conf
vim core-site.xml    ## specify the NameNode
<property>
  <name>fs.default.name</name>
  <value>hdfs://172.25.45.2:9000</value>
</property>
vim mapred-site.xml    ## specify the JobTracker
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>172.25.45.2:9001</value>
  </property>
</configuration>
vim hdfs-site.xml    ## specify the number of replicas for stored files
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
cd ..
bin/hadoop namenode -format    ## format a new filesystem
ls /tmp
hadoop-hadoop  hsperfdata_hadoop  hsperfdata_root  yum.log
bin/start-dfs.sh    ## start the Hadoop daemons
jps
bin/start-mapred.sh
jps
Open in Browser: 172.25.45.2:50030
Open 172.25.45.2:50070
bin/hadoop fs -put input test    ## copy the newly created input files into the distributed filesystem
bin/hadoop jar hadoop-examples-1.2.1.jar wordcount test output
At the same time, the running job can be watched on the web page.
The uploaded files can be viewed on the web page; the job output can also be downloaded and inspected locally:
bin/hadoop fs -get output test
cat test/*
rm -fr test/    ## delete the downloaded files
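The job output can also be read directly from HDFS without downloading it first, for example:
bin/hadoop fs -cat output/*    ## print the result files stored in HDFS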
2. Server2:
Shared File system:
su - root
yum install nfs-utils -y
/etc/init.d/rpcbind start
/etc/init.d/nfs start
vim /etc/exports
/home/hadoop *(rw,anonuid=900,anongid=900)
exportfs -rv
exportfs -v
Server3 and Server4:
yum install nfs-utils -y
/etc/init.d/rpcbind start
showmount -e 172.25.45.2
Export list for 172.25.45.2:
/home/hadoop *
mount 172.25.45.2:/home/hadoop/ /home/hadoop/
df
Server2:
su - hadoop
cd hadoop/conf
vim hdfs-site.xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
</configuration>
vim slaves    ## IPs of the slave nodes
172.25.45.3
172.25.45.4
vim masters    ## IP of the master node
172.25.45.2
Hint: if processes from the previous setup are still running, they must be stopped before formatting, so that jps shows no Hadoop processes.
Steps to stop the processes:
bin/stop-all.sh    ## after this finishes, the tasktracker and datanode are sometimes still up, so stop them explicitly
bin/hadoop-daemon.sh stop tasktracker
bin/hadoop-daemon.sh stop datanode
As the hadoop user, delete the old files under /tmp, so that no leftover files remain that the hadoop user has no permission over.
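A minimal cleanup sketch, assuming the data directories sit under the default hadoop.tmp.dir in /tmp (matching the /tmp listing shown earlier):
jps                                                 ## should list no Hadoop daemons before formatting
rm -rf /tmp/hadoop-hadoop /tmp/hsperfdata_hadoop    ## remove leftover data owned by the hadoop user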
su - hadoop
bin/hadoop namenode -format
bin/start-dfs.sh
bin/start-mapred.sh
bin/hadoop fs -put input test
bin/hadoop jar hadoop-examples-1.2.1.jar grep test output 'dfs[a-z.]+'
While the upload and the job run, open 172.25.45.2:50030 in the browser to watch the progress.
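Besides the web page, the JobTracker's view of running jobs can also be queried from the command line. A minimal sketch (the job id is a placeholder taken from the -list output):
bin/hadoop job -list              ## list the jobs currently tracked by the JobTracker
bin/hadoop job -status <job_id>   ## show the progress of one job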
su - hadoop
bin/hadoop dfsadmin -report
dd if=/dev/zero of=bigfile bs=1M count=200
bin/hadoop fs -put bigfile test
Open 172.25.45.2:50070 in the browser
3. Add server5.example.com 172.25.45.5 as a new slave node:
su - root    ## the package install and user creation below need root
yum install nfs-utils -y
/etc/init.d/rpcbind start
useradd -u 900 hadoop
echo westos | passwd --stdin hadoop
mount 172.25.45.2:/home/hadoop/ /home/hadoop/
su - hadoop
vim hadoop/conf/slaves
172.25.45.3
172.25.45.4
172.25.45.5
cd /home/hadoop/hadoop
bin/hadoop-daemon.sh start datanode
bin/hadoop-daemon.sh start tasktracker
jps
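From the master (server2), it can be confirmed that the new DataNode has registered. A quick check using the report command already shown above (the exact report format varies by version):
bin/hadoop dfsadmin -report | grep -A 2 "Name: 172.25.45.5"    ## the new node should appear in the report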
Remove a slave node:
Server2:
su - hadoop
cd /home/hadoop/hadoop/conf
vim hdfs-site.xml    ## add the exclude file used for decommissioning DataNodes
<property>
  <name>dfs.hosts.exclude</name>
  <value>/home/hadoop/hadoop/conf/datanode-excludes</value>
</property>
vim /home/hadoop/hadoop/conf/datanode-excludes
172.25.45.3    ## decommission 172.25.45.3 so it no longer serves as a slave
cd /home/hadoop/hadoop
bin/hadoop dfsadmin -refreshNodes    ## refresh the node list
bin/hadoop dfsadmin -report    ## check node status; the data on server3 is transferred to server5
On server3:
su - hadoop
bin/stop-all.sh
cd /home/hadoop/hadoop
bin/hadoop-daemon.sh stop tasktracker
bin/hadoop-daemon.sh stop datanode
Server2:
vim /home/hadoop/hadoop/conf/slaves
172.25.45.4
172.25.45.5
4. Configure the new version of Hadoop:
Server2:
su - hadoop
cd /home/hadoop
tar zxf jdk-7u79-linux-x64.tar.gz
ln -s jdk1.7.0_79/ java
tar zxf hadoop-2.6.4.tar.gz
ln -s hadoop-2.6.4 hadoop
cd /home/hadoop/hadoop/etc/hadoop
vim hadoop-env.sh
export JAVA_HOME=/home/hadoop/java
export HADOOP_PREFIX=/home/hadoop/hadoop
cd /home/hadoop/hadoop
mkdir input
cp etc/hadoop/*.xml input
tar -tf hadoop-native-64-2.6.0.tar
tar -xf hadoop-native-64-2.6.0.tar -C hadoop/lib/native/
cd /home/hadoop/hadoop
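Optionally, Hadoop 2.x can report whether the native libraries just unpacked are actually loadable; a quick check from /home/hadoop/hadoop:
bin/hadoop checknative -a    ## each available native library should be reported as true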
rm -fr output/
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.4.jar grep input output 'dfs[a-z.]+'
cd /home/hadoop/hadoop/etc/hadoop/
vim slaves
172.25.45.3
172.25.45.4
vim core-site.xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://172.25.45.2:9000</value>
  </property>
</configuration>
vim mapred-site.xml
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>172.25.45.2:9001</value>
  </property>
</configuration>
vim hdfs-site.xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
</configuration>
cd /home/hadoop/hadoop
bin/hdfs namenode -format
sbin/start-dfs.sh
jps
bin/hdfs dfs -mkdir -p /user/hadoop    ## the target directory must be created before files are uploaded
bin/hdfs dfs -put input/ test
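To confirm the upload before running the example (the listing output will vary):
bin/hdfs dfs -ls test    ## the copied *.xml files should appear under /user/hadoop/test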
rm -fr input/
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.4.jar grep test output 'dfs[a-z.]+'
bin/hdfs dfs -cat output/*
1       dfsadmin
Open 172.25.45.2:50070 in the browser