Please credit this article with a link when reposting.
Preparatory work:
VMware Workstation (networking set to bridged mode for all VMs)
Xshell or PuTTY (convenient for copying and pasting commands under Windows; the first is recommended because it can save the IP address, account, and password)
FileZilla (for transferring files; port 22, SFTP protocol)
Environment:
CentOS 6.5 x86 Minimal
Hadoop 1.2.1
jdk-8u73-linux-i586
First configure pseudo-distributed mode, verify that it runs, and then upgrade it to fully distributed mode.
Note: 192.168.67.57 is the master node host.
The following steps are performed as the root user.
I. Create a directory on the Linux system
mkdir /opt
Upload hadoop-1.2.1-bin.tar.gz and jdk-8u73-linux-i586.rpm to the /opt directory.
Installing Hadoop under /opt keeps things simple.
II. Set a static IP
vi /etc/sysconfig/network-scripts/ifcfg-eth0
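A minimal static-IP configuration might look like the following; the address matches the master used in this article, but the netmask, gateway, and DNS values are examples that must match your own bridged network:
DEVICE=eth0
ONBOOT=yes
BOOTPROTO=static
IPADDR=192.168.67.57
NETMASK=255.255.255.0
GATEWAY=192.168.67.1
DNS1=192.168.67.1
After editing, restart networking with: service network restart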
III. Disable the firewall
vi /etc/selinux/config and set SELINUX=disabled
Also, run the following commands:
service iptables status      # view firewall status
service iptables stop        # stop the firewall
service ip6tables stop       # stop the firewall
chkconfig ip6tables off      # keep the firewall from starting at boot
chkconfig iptables off       # keep the firewall from starting at boot
chkconfig iptables --list    # view the firewall service status list
chkconfig ip6tables --list   # view the firewall service status list
# iptables and ip6tables are both Linux firewall tools; the difference is that ip6tables handles IPv6 traffic.
IV. Modify the hosts file and hostname, setting this machine as master
vi /etc/hosts and add:
192.168.67.57 master
vi /etc/sysconfig/network
NETWORKING=yes
HOSTNAME=master
V. Add a user group and user
groupadd hadoop
useradd -g hadoop hadoop
passwd hadoop
VI. Install Java and configure the Java environment
I used the RPM package to simplify installation.
rpm -ivh jdk-8u73-linux-i586.rpm
The installation directory is /usr/java/jdk1.8.0_73
Copy this installation path; it makes configuring the environment variables easier later.
vi /etc/profile
Add at the bottom:
export JAVA_HOME=/usr/java/jdk1.8.0_73
export JRE_HOME=/usr/java/jdk1.8.0_73/jre
export CLASSPATH=.:$JAVA_HOME/lib:$JAVA_HOME/jre/lib
export PATH=$PATH:$JAVA_HOME/bin:$JAVA_HOME/jre/bin
Save and exit, then run the following commands to make the configuration take effect:
chmod +x /etc/profile    # add execute permission
source /etc/profile      # make the configuration take effect
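To confirm that the Java environment is in effect, you can check the version; the exact build string depends on the JDK you installed, but with jdk-8u73 it should report 1.8.0_73:
java -version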
The above actions are done under the root user.
VII. Set up passwordless SSH login (switch to the hadoop user)
First check whether the ssh and rsync tools are present:
rpm -qa | grep ssh
rpm -qa | grep rsync    (optional, but nice to have)
If ssh or rsync is missing, install them with the following commands:
yum install openssh-server openssh-clients    # install SSH
yum install rsync    # rsync is a remote data synchronization tool that can quickly sync files between hosts over a LAN/WAN
service sshd restart    # restart the SSH service
ssh-keygen -t rsa                                  # press Enter at every prompt
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys    # add the public key to the authorized list
chmod 600 ~/.ssh/authorized_keys                   # fix the file permissions, otherwise passwordless login will fail
chmod 700 ~/.ssh                                   # fix the directory permissions
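To verify that passwordless login works, SSH to the local machine as the hadoop user; the first connection asks you to confirm the host key, after which no password should be requested:
ssh master    # or: ssh localhost
exit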
VIII. Install Hadoop
cd /opt    # enter the /opt directory
tar -zxf /opt/hadoop-1.2.1-bin.tar.gz -C /opt    # extract
mv hadoop-1.2.1 hadoop    # rename
chown -R hadoop:hadoop hadoop    # change ownership to the hadoop user; very important
IX. Try standalone mode
Hadoop's default mode is non-distributed (standalone); once the Java environment is set up it can run without any Hadoop configuration.
cd /opt/hadoop
mkdir ./input
cp ./conf/*.xml ./input    # use the configuration files as input
./bin/hadoop jar hadoop-examples-1.2.1.jar grep ./input ./output 'dfs[a-z.]+'
cat ./output/*
Hadoop will not overwrite the result directory, so delete ./output first; otherwise the next run of the example will fail.
rm -r ./output
X. Configure Hadoop
(1) vi /etc/profile
Add the following:
# Hadoop environment variables
export HADOOP_HOME=/opt/hadoop
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
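Then reload the profile and confirm that the hadoop command is found; with this setup it should report version 1.2.1:
source /etc/profile
hadoop version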
————————————————————————————————
Now enter Hadoop's configuration directory:
cd /opt/hadoop/conf/
(2) Configure hadoop-env.sh
vi hadoop-env.sh and add at the bottom:
export JAVA_HOME=/usr/java/jdk1.8.0_73
export HADOOP_HOME_WARN_SUPPRESS=1    # my environment showed "Warning: $HADOOP_HOME is deprecated" when formatting the namenode; add this line if you hit the same warning
(3) Configure core-site.xml
vi core-site.xml
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/opt/hadoop/tmp</value>
<description>A base for other temporary directories.</description>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://192.168.67.57:9000</value>
</property>
</configuration>
(4) Configure hdfs-site.xml
vi hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
(5) Configure mapred-site.xml
vi mapred-site.xml
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>http://192.168.67.57:9001</value>
</property>
</configuration>
(6) Configure the masters and slaves files (this step can be skipped for pseudo-distributed mode)
Clear both files, then add the master's IP address.
(7) Format the namenode and start Hadoop
cd /opt/hadoop
hadoop namenode -format
start-all.sh
(8) Check the running status
jps
If the daemons listed below all appear, Hadoop has started successfully.
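In pseudo-distributed mode, jps should normally show the following five Hadoop daemons in addition to the Jps process itself (the process IDs will differ on your machine); if any are missing, check the corresponding log under /opt/hadoop/logs:
NameNode
SecondaryNameNode
DataNode
JobTracker
TaskTracker
Jps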
XI. Run the example program
Basic Hadoop operation commands:
(1) List files (HDFS is its own filesystem within the system)
hadoop fs -ls /
(2) Stop Hadoop
stop-all.sh
(3) Create a new directory
hadoop fs -mkdir /newfile
(4) Add files to HDFS
You can check the files in FileZilla; I put a couple of English sentences into file1.txt and file2.txt (they can be prepared as in the sketch below).
Note that the file directory and the files inside it must all belong to the hadoop user, otherwise the upload will fail.
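If the local files do not exist yet, one way to prepare them is shown here; the contents are just sample sentences, and the path matches the put command that follows:
mkdir -p /opt/hadoop/file
echo "hello hadoop hello world" > /opt/hadoop/file/file1.txt
echo "hadoop stores data on a cluster of machines" > /opt/hadoop/file/file2.txt
chown -R hadoop:hadoop /opt/hadoop/file    # only needed if the files were created as root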
hadoop fs -put /opt/hadoop/file/* /newfile
(5) Run the WordCount example
First find where the examples jar is:
ll /opt/hadoop | grep jar
hadoop jar /opt/hadoop/hadoop-examples-1.2.1.jar wordcount /newfile /output/file
/newfile is the input directory
/output/file is the output directory
(6) View the execution results
hadoop fs -cat /output/file/part-r-00000
(7) Other common commands
Download a file:
hadoop fs -get /output/file/part-r-00000 /home/hadoop/
(8) Delete files
hadoop fs -rmr /output/file
Summary of issues
Before settling on the current environment
CentOS 6.5 x86 Minimal
Hadoop 1.2.1
jdk-8u73-linux-i586
I had originally used
CentOS 6.5 x64 Minimal
Hadoop 2.6.0
jdk-8u101-linux-x64
but ran into
"WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable"
Although the NameNode and DataNode started, creating a new directory under HDFS failed:
hadoop fs -mkdir input
and nothing could be done with HDFS.
I later switched to
CentOS 6.5 x86 Minimal
Hadoop 2.6.0
jdk-8u101-linux-x64
(reference: http://f.dataguru.cn/thread-542396-1-1.html)
and then downgraded Hadoop, ending up with
CentOS 6.5 x86 Minimal
Hadoop 1.2.1
jdk-8u73-linux-i586
At startup I hit "Warning: $HADOOP_HOME is deprecated."; the export HADOOP_HOME_WARN_SUPPRESS=1 line mentioned earlier fixes it.
————————————————————————————————————————————
Here is how to upgrade the setup to fully distributed mode.
I. Clone the virtual machines
With pseudo-distributed mode working, shut the machine down and clone three more virtual machines. (A CentOS 6.5 VM with a static IP loses network access after cloning; see the reference for the fix.)
Modify each clone's IP (keep it static), then modify the hostname and the hosts file; /etc/profile does not need to be configured again.
What needs to be modified:
vi /etc/sysconfig/network-scripts/ifcfg-eth0
vi /etc/sysconfig/network
vi /etc/hosts
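As an illustration, on a four-node cluster the /etc/hosts file on every machine could look like this; 192.168.67.57 and 192.168.67.58 appear elsewhere in this article, while the remaining addresses and the slave hostnames are placeholders for your own values:
192.168.67.57 master
192.168.67.58 slave1
192.168.67.59 slave2
192.168.67.60 slave3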
II. SSH configuration
Because the machines are clones, every virtual machine's SSH files are identical.
All that is needed is that, under the hadoop user, master and the slaves can SSH to one another without a password.
III. Configure Hadoop
Only the masters and slaves files need to be modified.
(1) On the master node
vi /opt/hadoop/conf/masters
Add 192.168.67.57, which is my master's IP address.
(2) Also on the master node
vi /opt/hadoop/conf/slaves
Why four IPs? Because I later added another slave dynamically while the cluster was running. Just write one IP per slave.
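For illustration, with four slaves the file simply lists one IP per line; apart from 192.168.67.58, which appears in the scp commands below, these addresses are placeholders:
192.168.67.58
192.168.67.59
192.168.67.60
192.168.67.61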
(3) Send the slaves and masters files to the other three slaves
scp /opt/hadoop/conf/slaves 192.168.67.58:/opt/hadoop/conf/
scp /opt/hadoop/conf/masters 192.168.67.58:/opt/hadoop/conf/
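The same two files have to reach every slave; a small loop saves repeating the scp commands (the extra IPs here are placeholders for your own slave addresses):
for ip in 192.168.67.58 192.168.67.59 192.168.67.60; do
  scp /opt/hadoop/conf/slaves /opt/hadoop/conf/masters $ip:/opt/hadoop/conf/
done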
(4) Formatting
Before formatting, delete /opt/hadoop/tmp and empty the logs under /opt/hadoop/logs.
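Assuming the same paths as in the pseudo-distributed setup, one possible sequence on the master node is the following; if the slaves were cloned from the pseudo-distributed image, clear /opt/hadoop/tmp on them as well, otherwise the DataNodes may fail to join because of a namespaceID mismatch:
rm -rf /opt/hadoop/tmp
rm -rf /opt/hadoop/logs/*
hadoop namenode -format
start-all.sh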
IV. Dynamically add a slave node
It is recommended to clone a virtual machine directly, then modify its configuration by following the pseudo-distributed and fully distributed steps above.
Step one: modify the virtual machine's basic information.
What needs to be modified:
vi /etc/sysconfig/network-scripts/ifcfg-eth0
vi /etc/sysconfig/network
vi /etc/hosts
Step two: SSH, as described above.
Step three: modify the masters and slaves files under /opt/hadoop/conf/ on the master host and send them to all slave nodes.
Step four: because the other nodes are already running, there is no need to format HDFS again;
only the DataNode and TaskTracker processes need to be started on the new slave node:
hadoop-daemon.sh start datanode
hadoop-daemon.sh start tasktracker
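On the new slave, jps should then show the two daemons just started, along with Jps itself:
DataNode
TaskTracker
Jps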
You can check with jps that they are running, or view the newly added node through the web UI.
Step five: load balancing, if necessary
Run start-balancer.sh on the master node to rebalance data across the cluster:
start-balancer.sh
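The balancer also accepts a threshold argument, the percentage of disk-usage deviation it tolerates between nodes (10 is the default; smaller values balance more aggressively), for example:
start-balancer.sh -threshold 5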