Hadoop Pseudo-Distributed Setup


Experimental platform:

VirtualBox 4.3.24

CentOS 7

JDK 1.8.0_60

Hadoop 2.6.0

The basic Hadoop installation and configuration consists of the following steps:

1) Create a Hadoop user

2) Install Java

3) Set up SSH login permissions

4) Stand-alone installation and configuration

5) Pseudo-distributed installation and configuration

1.1 Creating a Hadoop user

On Linux, the command to create a user is useradd, and the command to set a password is passwd.

Under CentOS, we first create a hadoop user with the useradd command and set its password (here also hadoop):

useradd hadoop    # create the hadoop user
passwd hadoop     # set the hadoop user's password

A hadoop folder then appears under /home.
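A quick way to confirm this (a minimal check; the owner and group shown will depend on how the user was created):

ls -ld /home/hadoop    # should list the new hadoop user's home directory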

1.2 Installing the JDK

Download the JDK from Oracle's website and copy it to /usr/lib/jvm:

cp jdk-8u60-linux-x64.tar.gz /usr/lib/jvm

Then extract it (inside /usr/lib/jvm):

tar -zxvf jdk-8u60-linux-x64.tar.gz

-z means the archive is gzip-compressed

-x means extract (whereas -c means create an archive)

-v means verbose: show the files being processed

-f means the next argument is the archive file name

Open the .bashrc file with an editor:

vi ~/.bashrc

Add the following lines at the top of the file:

export JAVA_HOME=/usr/lib/jvm/jdk1.8.0_60
export JAVA_BIN=$JAVA_HOME/bin
export PATH=$PATH:$JAVA_HOME/bin
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export JAVA_HOME JAVA_BIN PATH CLASSPATH

Then make the changes take effect:

source ~/.bashrc

Use the echo command to check the environment variable:

echo $JAVA_HOME
/usr/lib/jvm/jdk1.8.0_60

If the configuration is successful, this prints the JDK path. You can also use the java -version command to check whether the JDK was installed successfully.
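For example (the exact output depends on the installed build, but it should report the 1.8.0_60 JDK configured above):

java -version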

1.3 Configuring SSH

CentOS 7 installs the OpenSSH client by default, so we only need to start the SSH service. In a shell, type the following command:

service sshd start
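On CentOS 7 the service command is a wrapper around systemd, so the equivalent call (using the standard sshd unit) is:

systemctl start sshd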

We also need to install the SSH server; run the following in the shell:

yum install openssh-server

After the installation, you can use the ssh command to log in to this machine (ssh localhost). Because it is the first connection, a confirmation prompt appears.
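The prompt typically looks something like this (the key type and fingerprint vary from machine to machine):

The authenticity of host 'localhost (127.0.0.1)' can't be established.
RSA key fingerprint is ...
Are you sure you want to continue connecting (yes/no)?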

Follow the prompt and type yes, then enter the hadoop user's password to log in to the machine. Having to enter the password every time is inconvenient, however, so we configure SSH to allow passwordless access.

Type exit to close the connection you just established.

To generate an SSH key:

ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa):    # press Enter
Enter passphrase (empty for no passphrase):                 # press Enter
Enter same passphrase again:                                # press Enter
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
(The key fingerprint and randomart image are also printed.)

The key is generated successfully and the public key is stored at /root/.ssh/id_rsa.pub. The key is then added to the authorized keys:

cd /root/.ssh
cp id_rsa.pub authorized_keys    # add the key just generated to the authorized keys

Then run the ssh localhost command again; this time no login password is required.
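If SSH still prompts for a password, overly permissive file modes are a common cause; a typical fix (assuming the keys live in /root/.ssh as above) is:

chmod 700 /root/.ssh
chmod 600 /root/.ssh/authorized_keys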

1.4 Installing Hadoop

Extract the downloaded Hadoop package (note: use the binary release, not the source) into the /usr/local directory:

tar -zxvf ./hadoop-2.6.0.tar.gz -C /usr/local    # extract to /usr/local

Rename the resulting folder (under /usr/local) to hadoop:

mv ./hadoop-2.6.0/ ./hadoop

Go to the bin folder under the hadoop directory and check whether the installation succeeded with the hadoop version command.
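For example (assuming the /usr/local/hadoop path used above):

cd /usr/local/hadoop
./bin/hadoop version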

If the version information is printed, Hadoop has been installed successfully. We can then run one of the official Hadoop examples to test that it works; here we run the grep example to verify the installation. First, create an input folder in the Hadoop directory to hold the input data and copy the configuration files under ./etc/hadoop/ into it; then create an output folder in the Hadoop directory to store the results.

mkdir input
cp etc/hadoop/*.xml input
mkdir output

Finally, execute the following command, which invokes the grep example from the Hadoop examples jar (the standard invocation for the 2.6.0 binary distribution), searching the input files and writing matches of the regular expression to output:

./bin/hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar grep ./input ./output 'dfs[a-z.]+'

Then look at the contents of the output:

cat ./output/*

Running the above command produces the following result:

1 dfsadmin

1.5 Hadoop pseudo-distributed configuration

A pseudo-distributed installation simulates a small cluster on a single machine. When Hadoop runs in pseudo-distributed mode on a single node, each Hadoop daemon runs as a separate Java process, and the node acts as both NameNode and DataNode. Whether the cluster is truly distributed or pseudo-distributed, we need to edit configuration files so the components work together. For a pseudo-distributed configuration we need to modify three files: core-site.xml, hdfs-site.xml, and mapred-site.xml (the latest Hadoop does not ship this last file by default). The Hadoop configuration files are located in /usr/local/hadoop/etc/hadoop/.

Modify core-site.xml, changing

<configuration></configuration>

to:

<configuration>
    <property>
        <name>fs.default.name</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>

The <name> tag holds the configuration item's name and <value> holds its value. For core-site.xml we only need to set the address and port number of HDFS; the port is set to 9000 as in the official documentation. Next, modify the hdfs-site.xml file as follows:

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>

Once the configuration is complete, the file system must be initialized. Because much of Hadoop's work takes place on its own HDFS file system, HDFS has to be initialized before any computing task can be started. Format the NameNode from the bin directory:

./hdfs namenode -format

15/10/10 19:28:30 INFO util.ExitUtil: Exiting with status 0
15/10/10 19:28:30 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at localhost/127.0.0.1
************************************************************/

An exit status of 0 indicates that initialization succeeded; a status of 1 means formatting failed.

Then start the NameNode and DataNode daemons:
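A minimal way to do this, assuming the /usr/local/hadoop layout used above, is the start-dfs.sh script:

cd /usr/local/hadoop
./sbin/start-dfs.sh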

After startup completes, run the jps command to check whether startup succeeded. If it did, you will see the following processes (Jps itself plus the three Hadoop daemons):

Jps, NameNode, DataNode, and SecondaryNameNode
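For example, the jps output might look like the following (the process IDs here are purely illustrative):

3456 NameNode
3578 DataNode
3789 SecondaryNameNode
3920 Jps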

After successful startup, you can access the Web interface http://localhost:50070 to view Hadoop information.

In the stand-alone example above, grep read local data. In pseudo-distributed mode the data is read from HDFS instead, so we first need to create a directory in the HDFS file system:

bin/hdfs dfs -mkdir -p /user/hadoop/input

Because this directory is created inside the Hadoop file system, it does not show up in the Linux file system. Next, copy all the XML files under etc/hadoop into the HDFS input folder:

bin/hdfs dfs -put etc/hadoop/*.xml /user/hadoop/input

After the copy is complete, you can view the list of files with the following command:

bin/hdfs dfs -ls /user/hadoop/input
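With the input files now in HDFS, the grep example can be run again in pseudo-distributed mode. A sketch, assuming the same examples jar as in the stand-alone test (the output directory is created in HDFS under the current user):

bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar grep /user/hadoop/input output 'dfs[a-z.]+'
bin/hdfs dfs -cat output/*    # view the result stored in HDFS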

Reference:

http://www.powerxing.com/install-hadoop/

http://www.centoscn.com/CentOS/config/2013/0926/1713.html

Principles and Applications of Big Data Technology, by Lin Ziyu

