Please credit this article with a link when reposting.
Preparatory work:
VMware Workstation (networking set to bridged mode for all VMs)
Xshell or PuTTY (convenient for copying and pasting commands under Windows; the first is recommended because it can save the IP address, account, and password)
FileZilla (for transferring files; port 22, SFTP protocol)
Environment:
CentOS 6.5 x86 Minimal
Hadoop 1.2.1
jdk-8u73-linux-i586
First configure pseudo-distributed mode, verify that it runs, and then upgrade it to fully distributed mode.
Note: 192.168.67.57 is the master node host.
The following steps are performed as the root user.
I. Create a directory on the Linux system
mkdir /opt
Upload hadoop-1.2.1-bin.tar.gz and jdk-8u73-linux-i586.rpm to the /opt directory.
Installing Hadoop under /opt keeps things simple.
II. Set a static IP
vi /etc/sysconfig/network-scripts/ifcfg-eth0
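A minimal static-IP configuration might look like the following; the address matches the master used in this article, but the netmask, gateway, and DNS values are examples that must match your own bridged network:
DEVICE=eth0
ONBOOT=yes
BOOTPROTO=static
IPADDR=192.168.67.57
NETMASK=255.255.255.0
GATEWAY=192.168.67.1
DNS1=192.168.67.1
After editing, restart networking with: service network restart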
III. Disable the firewall
vi /etc/selinux/config and set SELINUX=disabled
Also, run the following commands:
service iptables status      # view firewall status
service iptables stop        # stop the firewall
service ip6tables stop       # stop the firewall
chkconfig ip6tables off      # keep the firewall from starting at boot
chkconfig iptables off       # keep the firewall from starting at boot
chkconfig iptables --list    # view the firewall service status list
chkconfig ip6tables --list   # view the firewall service status list
# iptables and ip6tables are both Linux firewall tools; the difference is that ip6tables handles IPv6 traffic.
IV. Modify the hosts file and hostname, setting this machine as master
vi /etc/hosts and add:
192.168.67.57 master
vi /etc/sysconfig/network
NETWORKING=yes
HOSTNAME=master
V. Add a user group and user
groupadd hadoop
useradd -g hadoop hadoop
passwd hadoop
VI. Install Java and configure the Java environment
I used the RPM package to simplify installation.
rpm -ivh jdk-8u73-linux-i586.rpm
The installation directory is /usr/java/jdk1.8.0_73
Copy this installation path; it makes configuring the environment variables easier later.
vi /etc/profile
Add at the bottom:
export JAVA_HOME=/usr/java/jdk1.8.0_73
export JRE_HOME=/usr/java/jdk1.8.0_73/jre
export CLASSPATH=.:$JAVA_HOME/lib:$JAVA_HOME/jre/lib
export PATH=$PATH:$JAVA_HOME/bin:$JAVA_HOME/jre/bin
Save and exit, then run the following commands to make the configuration take effect:
chmod +x /etc/profile    # add execute permission
source /etc/profile      # make the configuration take effect
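To confirm that the Java environment is in effect, you can check the version; the exact build string depends on the JDK you installed, but with jdk-8u73 it should report 1.8.0_73:
java -version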
The above actions are done under the root user.
VII. Set up passwordless SSH login (switch to the hadoop user)
First check whether the ssh and rsync tools are present:
rpm -qa | grep ssh
rpm -qa | grep rsync    (optional, but nice to have)
If ssh or rsync is missing, install them with the following commands:
yum install openssh-server openssh-clients    # install SSH
yum install rsync    # rsync is a remote data synchronization tool that can quickly sync files between hosts over a LAN/WAN
service sshd restart    # restart the SSH service
ssh-keygen -t rsa                                  # press Enter at every prompt
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys    # add the public key to the authorized list
chmod 600 ~/.ssh/authorized_keys                   # fix the file permissions, otherwise passwordless login will fail
chmod 700 ~/.ssh                                   # fix the directory permissions
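To verify that passwordless login works, SSH to the local machine as the hadoop user; the first connection asks you to confirm the host key, after which no password should be requested:
ssh master    # or: ssh localhost
exit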
VIII. Install Hadoop
cd /opt    # enter the /opt directory
tar -zxf /opt/hadoop-1.2.1-bin.tar.gz -C /opt    # extract
mv hadoop-1.2.1 hadoop    # rename
chown -R hadoop:hadoop hadoop    # change ownership to the hadoop user; very important
IX. Try standalone mode
Hadoop's default mode is non-distributed (standalone); once the Java environment is set up it can run without any Hadoop configuration.
cd /opt/hadoop
mkdir ./input
cp ./conf/*.xml ./input    # use the configuration files as input
./bin/hadoop jar hadoop-examples-1.2.1.jar grep ./input ./output 'dfs[a-z.]+'
cat ./output/*
Hadoop will not overwrite the result directory, so delete ./output first; otherwise the next run of the example will fail.
rm -r ./output
X. Configure Hadoop
(1) vi /etc/profile
Add the following:
# Hadoop environment variables
export HADOOP_HOME=/opt/hadoop
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
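Then reload the profile and confirm that the hadoop command is found; with this setup it should report version 1.2.1:
source /etc/profile
hadoop version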
————————————————————————————————
Now enter Hadoop's configuration directory:
cd /opt/hadoop/conf/
(2) Configure hadoop-env.sh
vi hadoop-env.sh and add at the bottom:
export JAVA_HOME=/usr/java/jdk1.8.0_73
export HADOOP_HOME_WARN_SUPPRESS=1    # my environment showed "Warning: $HADOOP_HOME is deprecated" when formatting the namenode; add this line if you hit the same warning
(3) Configure core-site.xml
vi core-site.xml
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/opt/hadoop/tmp</value>
<description>A base for other temporary directories.</description>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://192.168.67.57:9000</value>
</property>
</configuration>
(4) Configure hdfs-site.xml
vi hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
(5) Configure mapred-site.xml
vi mapred-site.xml
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>http://192.168.67.57:9001</value>
</property>
</configuration>
(6) Configure the masters and slaves files (this step can be skipped for pseudo-distributed mode)
Clear both files, then add the master's IP address.
(7) Format the namenode and start Hadoop
cd /opt/hadoop
hadoop namenode -format
start-all.sh
(8) Check the running status
jps
If the daemons listed below all appear, Hadoop has started successfully.
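In pseudo-distributed mode, jps should normally show the following five Hadoop daemons in addition to the Jps process itself (the process IDs will differ on your machine); if any are missing, check the corresponding log under /opt/hadoop/logs:
NameNode
SecondaryNameNode
DataNode
JobTracker
TaskTracker
Jps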
XI. Run the example program
Basic Hadoop operation commands:
(1) List files (HDFS is its own filesystem within the system)
hadoop fs -ls /
(2) Stop Hadoop
stop-all.sh
(3) Create a new directory
hadoop fs -mkdir /newfile
(4) Add files to HDFS
You can check the files in FileZilla; I put a couple of English sentences into file1.txt and file2.txt (they can be prepared as in the sketch below).
Note that the file directory and the files inside it must all belong to the hadoop user, otherwise the upload will fail.
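If the local files do not exist yet, one way to prepare them is shown here; the contents are just sample sentences, and the path matches the put command that follows:
mkdir -p /opt/hadoop/file
echo "hello hadoop hello world" > /opt/hadoop/file/file1.txt
echo "hadoop stores data on a cluster of machines" > /opt/hadoop/file/file2.txt
chown -R hadoop:hadoop /opt/hadoop/file    # only needed if the files were created as root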
hadoop fs -put /opt/hadoop/file/* /newfile
(5) Run the WordCount example
First find where the examples jar is:
ll /opt/hadoop | grep jar
hadoop jar /opt/hadoop/hadoop-examples-1.2.1.jar wordcount /newfile /output/file
/newfile is the input directory
/output/file is the output directory
(6) View the execution results
hadoop fs -cat /output/file/part-r-00000
(7) Other common commands
Download a file:
hadoop fs -get /output/file/part-r-00000 /home/hadoop/
(8) Delete files
hadoop fs -rmr /output/file
Summary of issues
Before settling on the current environment
CentOS 6.5 x86 Minimal
Hadoop 1.2.1
jdk-8u73-linux-i586
I had originally used
CentOS 6.5 x64 Minimal
Hadoop 2.6.0
jdk-8u101-linux-x64
but ran into
"WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable"
Although the NameNode and DataNode started, creating a new directory under HDFS failed:
hadoop fs -mkdir input
and nothing could be done with HDFS.
I later switched to
CentOS 6.5 x86 Minimal
Hadoop 2.6.0
jdk-8u101-linux-x64
(reference: http://f.dataguru.cn/thread-542396-1-1.html)
and then downgraded Hadoop, ending up with
CentOS 6.5 x86 Minimal
Hadoop 1.2.1
jdk-8u73-linux-i586
At startup I hit "Warning: $HADOOP_HOME is deprecated."; the export HADOOP_HOME_WARN_SUPPRESS=1 line mentioned earlier fixes it.
————————————————————————————————————————————
Here is how to upgrade the setup to fully distributed mode.
I. Clone the virtual machines
With pseudo-distributed mode working, shut the machine down and clone three more virtual machines. (A CentOS 6.5 VM with a static IP loses network access after cloning; see the reference for the fix.)
Modify each clone's IP (keep it static), then modify the hostname and the hosts file; /etc/profile does not need to be configured again.
What needs to be modified:
vi /etc/sysconfig/network-scripts/ifcfg-eth0
vi /etc/sysconfig/network
vi /etc/hosts
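As an illustration, on a four-node cluster the /etc/hosts file on every machine could look like this; 192.168.67.57 and 192.168.67.58 appear elsewhere in this article, while the remaining addresses and the slave hostnames are placeholders for your own values:
192.168.67.57 master
192.168.67.58 slave1
192.168.67.59 slave2
192.168.67.60 slave3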
II. SSH configuration
Because the machines are clones, every virtual machine's SSH files are identical.
All that is needed is that, under the hadoop user, master and the slaves can SSH to one another without a password.
III. Configure Hadoop
Only the masters and slaves files need to be modified.
(1) On the master node
vi /opt/hadoop/conf/masters
Add 192.168.67.57, which is my master's IP address.
(2) Also on the master node
vi /opt/hadoop/conf/slaves
Why four IPs? Because I later added another slave dynamically while the cluster was running. Just write one IP per slave.
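For illustration, with four slaves the file simply lists one IP per line; apart from 192.168.67.58, which appears in the scp commands below, these addresses are placeholders:
192.168.67.58
192.168.67.59
192.168.67.60
192.168.67.61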
(3) Send the slaves and masters files to the other three slaves
scp /opt/hadoop/conf/slaves 192.168.67.58:/opt/hadoop/conf/
scp /opt/hadoop/conf/masters 192.168.67.58:/opt/hadoop/conf/
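The same two files have to reach every slave; a small loop saves repeating the scp commands (the extra IPs here are placeholders for your own slave addresses):
for ip in 192.168.67.58 192.168.67.59 192.168.67.60; do
  scp /opt/hadoop/conf/slaves /opt/hadoop/conf/masters $ip:/opt/hadoop/conf/
done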
(4) Formatting
Before formatting, delete /opt/hadoop/tmp and empty the logs under /opt/hadoop/logs.
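Assuming the same paths as in the pseudo-distributed setup, one possible sequence on the master node is the following; if the slaves were cloned from the pseudo-distributed image, clear /opt/hadoop/tmp on them as well, otherwise the DataNodes may fail to join because of a namespaceID mismatch:
rm -rf /opt/hadoop/tmp
rm -rf /opt/hadoop/logs/*
hadoop namenode -format
start-all.sh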
IV. Dynamically add a slave node
It is recommended to clone a virtual machine directly, then modify its configuration by following the pseudo-distributed and fully distributed steps above.
Step one: modify the virtual machine's basic information.
What needs to be modified:
vi /etc/sysconfig/network-scripts/ifcfg-eth0
vi /etc/sysconfig/network
vi /etc/hosts
Step two: SSH, as described above.
Step three: modify the masters and slaves files under /opt/hadoop/conf/ on the master host and send them to all slave nodes.
Step four: because the other nodes are already running, there is no need to format HDFS again;
only the DataNode and TaskTracker processes need to be started on the new slave node:
hadoop-daemon.sh start datanode
hadoop-daemon.sh start tasktracker
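On the new slave, jps should then show the two daemons just started, along with Jps itself:
DataNode
TaskTracker
Jps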
You can check with jps that they are running, or view the newly added node through the web UI.
Step five: load balancing, if necessary
Run start-balancer.sh on the master node to rebalance data across the cluster:
start-balancer.sh
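The balancer also accepts a threshold argument, the percentage of disk-usage deviation it tolerates between nodes (10 is the default; smaller values balance more aggressively), for example:
start-balancer.sh -threshold 5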