The "three-step" process of the Hadoop pseudo-distribution environment
First, JDK installation and environment variable configuration
1. First, test whether the JDK is already installed:
java -version
2. Check whether CentOS is 32-bit or 64-bit:
file /bin/ls
3. Switch to /usr/ and create the java/ directory:
cd /
ls
cd usr/
mkdir java
cd java/
ls
4. Try to upload the locally downloaded package; this shows the upload command is not installed:
rz
5. Install the rz and sz commands:
yum -y install lrzsz
6. Upload the locally downloaded jdk-7u79-linux-x64.tar.gz:
rz
7. Check whether the upload succeeded; jdk-7u79-linux-x64.tar.gz should now be listed:
ls
8. Extract the installation package jdk-7u79-linux-x64.tar.gz:
tar zxvf jdk-7u79-linux-x64.tar.gz
Extraction complete!
9. Configure the JDK environment variables:
cd /etc/profile.d
ls
vi java.sh
JAVA_HOME=/usr/java/jdk1.7.0_79
PATH=$JAVA_HOME/bin:$PATH
CLASSPATH=$JAVA_HOME/lib:$CLASSPATH
export JAVA_HOME PATH CLASSPATH
10. Use source to make the new JDK environment variable configuration file take effect:
source /etc/profile.d/java.sh
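As a quick optional check that the variables took effect (the exact version string depends on the JDK build actually installed):
java -version
echo $JAVA_HOME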
11. Configure the hosts file, adding the mapping between the hostname and the IP address for convenient access later.
12. First, run ifconfig to view the IP address.
13. Run vi /etc/hosts to modify the hosts file.
Then :wq to save and exit.
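The entry is a single line mapping the IP reported by ifconfig to the hostname; the address below is only a placeholder, substitute your own:
192.168.1.100 djt002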
14. Prepare a dedicated Hadoop user and group.
groupadd hadoop (creates the hadoop user group)
useradd -g hadoop hadoop (creates a new hadoop user and adds it to the hadoop group)
passwd hadoop (sets the password for the hadoop user)
15. Because Hadoop uses many ports, it is recommended to turn off the firewall to avoid unnecessary problems; in a production environment the corresponding ports can be secured individually instead. First, check the firewall status.
The firewall that ships with CentOS 6.5 is iptables (not firewalld).
service iptables status
16. Then turn off the firewall with a command. There are two ways of doing this:
chkconfig iptables off (permanent: the firewall will not come back after the system restarts)
service iptables stop (takes effect immediately, but the firewall will be restored after the system restarts)
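Optionally, the boot-time setting can be confirmed by listing the service's runlevel configuration:
chkconfig --list iptables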
Second, SSH configuration
1. Configure SSH passwordless authentication. First switch to the hadoop user you just created.
Since Hadoop requires passwordless login to the DataNode nodes, and since this single-node deployment makes the current node both the NameNode and a DataNode, passwordless SSH login must be configured now. Here is how:
su hadoop
cd
2. Create the .ssh directory and generate the key pair:
mkdir .ssh
ssh-keygen -t rsa
3. Switch to the .ssh directory and view the public and private keys:
cd .ssh
ls
4. Copy the public key into the authorized_keys file, then check that the copy succeeded:
cp id_rsa.pub authorized_keys
ls
5. View the contents of the authorized_keys file:
vi authorized_keys
6. Go back to /home/hadoop/ and set the permissions:
cd ..
chmod 700 .ssh (sets the .ssh folder's permissions to 700)
chmod 600 .ssh/* (gives the files inside the .ssh folder 600 permissions)
7. Switch to the root user and install the SSH client package:
su root
yum -y install openssh-clients
8. Switch back to /home/hadoop/ and test passwordless SSH access:
su hadoop
ssh djt002
yes
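Only the very first connection asks for the "yes" host-key confirmation; after that, the login should not prompt for a password. A quick extra check, assuming the djt002 hostname configured earlier:
exit (leave the session opened by the test login above)
ssh djt002 date (should print the date without asking for a password)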
Third, Hadoop environment configuration
Next, build the Hadoop pseudo-distributed environment.
Download Hadoop 2.2.0, unzip it, and place the files in the /usr/java/ directory. Note: before downloading, confirm whether your Linux system is 64-bit or 32-bit and download the corresponding version; if you download the wrong one, there will be many problems later.
1. First switch to /usr/java/, then switch to the root user:
cd /usr/java
su root
2. Install the wget command if it is not already installed:
yum -y install wget
3. Download hadoop-2.2.0-x64.tar.gz online with the wget command:
wget http://hadoop.f.dajiangtai.com/hadoop2.2/hadoop-2.2.0-x64.tar.gz
Alternatively, for step 3, use the rz command to upload hadoop-2.2.0-x64.tar.gz from Windows to /usr/java/.
4. View and unzip the archive:
ls
tar zxvf hadoop-2.2.0-x64.tar.gz
5. Rename the hadoop-2.2.0 directory to hadoop:
mv hadoop-2.2.0 hadoop
6. Give ownership of the newly renamed hadoop directory to the hadoop user:
chown -R hadoop:hadoop hadoop
7. Create the Hadoop data directories and give ownership of the entire data directory to the hadoop user:
mkdir -p /data/dfs/name
mkdir -p /data/dfs/data
mkdir -p /data/tmp
chown -R hadoop:hadoop /data
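To verify the ownership (optional):
ls -lR /data (every directory should show hadoop as both owner and group)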
8. Modify Hadoop's configuration files.
Switch to the hadoop user and into the hadoop directory:
su hadoop
cd hadoop
9. View the configuration files under etc/hadoop:
cd etc/
ls
cd hadoop/
ls
10. Modify the etc/hadoop/core-site.xml configuration file and add the following information:
vi core-site.xml
The following sets the address and port of HDFS (the Hadoop Distributed File System):
<property>
<name>fs.defaultFS</name>
<value>hdfs://djt002:9000</value>
</property>
The following configures the common directory where HDFS stores its data:
<property>
<name>hadoop.tmp.dir</name>
<value>file:/data/tmp</value>
</property>
The following configuration is needed because of the security mechanism introduced since Hadoop 1.0: a job submitted from a client runs as the hadoop user regardless of who originally submitted it. To solve this, Hadoop provides the proxy-user (secure impersonation) feature, which allows a superuser to submit jobs or execute commands on behalf of other users, while to the outside the executor still appears to be the ordinary user.
The following setting allows any client host:
<property>
<name>hadoop.proxyuser.hadoop.hosts</name>
<value>*</value>
</property>
The following setting allows any user group:
<property>
<name>hadoop.proxyuser.hadoop.groups</name>
<value>*</value>
</property>
The complete configuration is as follows:
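(Reconstructed from the four properties above; they all go inside the file's existing <configuration> element.)
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://djt002:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>file:/data/tmp</value>
</property>
<property>
<name>hadoop.proxyuser.hadoop.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.hadoop.groups</name>
<value>*</value>
</property>
</configuration>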
11. Modify the etc/hadoop/hdfs-site.xml configuration file and add the following information.
vi hdfs-site.xml
The following configures the NameNode file directory:
<property>
<name>dfs.namenode.name.dir</name>
<value>/data/dfs/name</value>
<final>true</final>
</property>
The following configures the DataNode file directory:
<property>
<name>dfs.datanode.data.dir</name>
<value>/data/dfs/data</value>
<final>true</final>
</property>
The following configures the block replication factor and HDFS permission checking:
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
The complete configuration is as follows:
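(Again reconstructed from the properties above; they all go inside the file's <configuration> element.)
<configuration>
<property>
<name>dfs.namenode.name.dir</name>
<value>/data/dfs/name</value>
<final>true</final>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/data/dfs/data</value>
<final>true</final>
</property>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
</configuration>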
12. Modify the etc/hadoop/mapred-site.xml configuration file and add the following information.
cp mapred-site.xml.template mapred-site.xml
vi mapred-site.xml
The following configures MapReduce to run on YARN:
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
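Since the copied template contains an empty <configuration> element, the finished mapred-site.xml should look roughly like this:
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>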
13. Modify the etc/hadoop/yarn-site.xml configuration file and add the following information.
vi yarn-site.xml
To run MapReduce programs, the NodeManager must load the shuffle service at startup, so the following setting is required:
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
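Likewise, the property goes inside yarn-site.xml's <configuration> element:
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>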
14. Modify etc/hadoop/slaves and add the following information.
vi slaves
This is a pseudo-distributed single-node cluster, so the DataNode and the NameNode run on the same node.
Change the default entry to the hostname of this node: djt002
15. Set the Hadoop environment variables.
Switch to the root user and modify the /etc/profile file:
su root
vi /etc/profile
Note: it is not HADOOP_HOME=/usr/java/hadoop-2.2.0, because the directory was renamed earlier.
HADOOP_HOME=/usr/java/hadoop
PATH=$HADOOP_HOME/bin:$PATH
export HADOOP_HOME PATH
Summary: the JDK environment variables are placed in /etc/profile.d, while the Hadoop environment variables are placed in /etc/profile.
16. Make the configuration file take effect:
source /etc/profile
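To confirm that the Hadoop variables are picked up, an optional check:
echo $HADOOP_HOME
hadoop version (should report Hadoop 2.2.0)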
Fourth, test running Hadoop
1. Switch to the hadoop user and go back to the hadoop directory:
su hadoop
cd ..
cd ..
ls
2. Format the NameNode:
bin/hadoop namenode -format
3. Start the cluster:
sbin/start-all.sh
4. View the cluster processes:
jps
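If everything started correctly, jps should list roughly the following daemons in addition to Jps itself (process IDs will vary):
NameNode
DataNode
SecondaryNameNode
ResourceManager
NodeManager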
5. On Windows, run Notepad as Administrator.
6. Open the local hosts file and add the same hostname-to-IP mapping as on Linux.
Then save and close.
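On Windows the hosts file lives at C:\Windows\System32\drivers\etc\hosts; the entry mirrors the Linux one (the IP is a placeholder):
192.168.1.100 djt002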
7. Finally, verify that Hadoop was installed successfully.
On Windows, you can open the web UI at http://djt002:50070 to view the status of the NameNode, the cluster, and the file system. This is the web page for HDFS.
http://djt002:50070
8. Create a new djt.txt file to test with the WordCount program that ships with Hadoop.
Edit the file, then save and exit.
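Any short text works for the word count; a hypothetical example:
vi djt.txt
hadoop dajiangtai
hadoop hadoop dajiangtai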
9. View the HDFS file directories and upload the test file:
hadoop fs -ls /
hadoop fs -mkdir /dajiangtai
hadoop fs -ls /
hadoop fs -put /usr/java/hadoop/djt.txt /dajiangtai
hadoop fs -ls /dajiangtai
10. Open the web page to watch the job's progress:
http://djt002:8088/cluster/apps
11. Run Hadoop's built-in WordCount program as a test:
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar wordcount /dajiangtai/djt.txt /dajiangtai/wordcount-out
To explain: /dajiangtai/djt.txt is the input path and /dajiangtai/wordcount-out is the output path.
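After the job finishes, the word counts can be read from the output directory (the part file name may differ slightly):
hadoop fs -ls /dajiangtai/wordcount-out
hadoop fs -cat /dajiangtai/wordcount-out/part-r-00000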
If all of the above is OK, then congratulations: your Hadoop single-node pseudo-distributed environment has been built successfully!