Introduction and environment preparation
The core of Hadoop is the distributed file system HDFS and the batch computation framework MapReduce. In recent years the rise of big data, cloud computing, and the Internet of Things greatly attracted my interest. I read many articles online but still felt foggy, because introductory tutorials are full of unnecessary configuration. By summarizing what I learned from thinking and from tutorials, I want to offer a simple path to students who, like me, want to get started with Hadoop. In fact, if you have a good Java foundation, you will find Hadoop quite simple once you get started: big data is nothing more than a volume of data so large that many machines must work together to store it, and cloud computing is nothing more than many machines working together to provide computation.
Suggested approach: first understand the theory at a high level, then finish the hands-on steps, and only then come back to the theory. In the following articles I will analyze the theory and finally use a mind map to summarize the overall shape of Hadoop.
Environment preparation: http://pan.baidu.com/s/1dFrHyxV password: 1e9g (it is recommended to download the packages from the official sites instead, for the original flavor; do not use second-hand copies)
CentOS Linux system: CentOS-7-x86_64-DVD-1511.iso
VirtualBox virtual machine: VirtualBox-5.1.18-114002-Win.exe
Xshell remote login tool: Xshell.exe
Xftp remote file transfer: Xftp.exe
Hadoop: hadoop-2.7.3.tar.gz
JDK 8: jdk-8u91-linux-x64.rpm
The physical architecture of Hadoop
Physical architecture: assume the machine room has four machines forming a cluster: master (ip: 192.168.56.100), slave1 (ip: 192.168.56.101), slave2 (ip: 192.168.56.102), slave3 (ip: 192.168.56.103). Here is only a brief introduction; I will cover the specifics in more detail in the HDFS article of this Hadoop series.
Distributed: a computer system in which machines in different locations, with different roles and forms, are connected through a communication network and, under unified control, cooperate to complete large-scale information processing. Simply put, what a single hard disk holds can be split into two parts, the file index and the file data: the file index is deployed on a separate server we call the master or root node (NameNode), while the file data is deployed on the child nodes managed by the master, called slave nodes (DataNodes).
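Once the cluster built at the end of this article is running, this split is easy to see from the HDFS shell; a small sketch (the file name here is just an example):

# upload a file: the NameNode records the index, the DataNodes store the blocks
hdfs dfs -put hadoop-2.7.3.tar.gz /demo.tar.gz
# ask which DataNodes hold each block of the file
hdfs fsck /demo.tar.gz -files -blocks -locations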
Installing Linux with VirtualBox
Reference: http://www.cnblogs.com/qiuyong/p/6815903.html
Configure the cluster to communicate under the same virtual LAN
Note: through the steps above, the master machine (192.168.56.100) has been installed; now configure the virtual network so that all machines sit on the same virtual LAN.
- vim /etc/sysconfig/network
- NETWORKING=yes GATEWAY=192.168.56.1 (explanation: this sets the gateway to the VirtualBox host-only NIC)
- vim /etc/sysconfig/network-scripts/ifcfg-enp0s3
- TYPE=Ethernet IPADDR=192.168.56.100 NETMASK=255.255.255.0 (explanation: this sets the machine's own static IP)
- Modify the host name: hostnamectl set-hostname master
- Restart the network: service network restart
- View the IP: ifconfig
- Turn off the firewall so that Windows and the VM can ping each other; verify with master: ping 192.168.56.1 and Windows: ping 192.168.56.100
- systemctl stop firewalld --> systemctl disable firewalld
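For reference, a minimal sketch of the resulting ifcfg-enp0s3 on the master; BOOTPROTO and ONBOOT are additions I would expect a static CentOS 7 interface to need, so adapt them to your file:

# /etc/sysconfig/network-scripts/ifcfg-enp0s3 (VirtualBox host-only adapter)
TYPE=Ethernet
BOOTPROTO=static       # use the fixed address below instead of DHCP
ONBOOT=yes             # bring the interface up at boot
IPADDR=192.168.56.100
NETMASK=255.255.255.0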
Remote login and file transfer using Xshell and Xftp
Logging in and uploading files through the VirtualBox console is troublesome, so use Xshell for remote login.
Upload files using Xftp.
Upload hadoop-2.7.3.tar.gz and jdk-8u91-linux-x64.rpm to the /usr/local directory. Novice tip: in the right-hand window select the /usr/local directory, then double-click each package in the left-hand window and the upload succeeds.
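If you prefer a command line to Xftp, the same upload can be done with scp (assuming an SSH-capable shell on the Windows side and root login enabled on the VM):

# copy both packages to /usr/local on the master
scp hadoop-2.7.3.tar.gz jdk-8u91-linux-x64.rpm root@192.168.56.100:/usr/local/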
Configuring the Hadoop environment
- Install jdk-8u91-linux-x64.rpm: rpm -ivh /usr/local/jdk-8u91-linux-x64.rpm --> the default installation directory is /usr/java
- Confirm that the JDK installed successfully: rpm -qa | grep jdk, then java -version to see whether the installation succeeded.
- Unpack hadoop-2.7.3.tar.gz: tar -zxvf /usr/local/hadoop-2.7.3.tar.gz -C /usr/local
- Rename the directory to hadoop: mv /usr/local/hadoop-2.7.3 /usr/local/hadoop
- Switch to the Hadoop configuration file directory: cd /usr/local/hadoop/etc/hadoop
- vim hadoop-env.sh
- Modify the export JAVA_HOME line to: export JAVA_HOME=/usr/java/default
- Exit the editor: press ESC, then type :wq
- vim /etc/profile
- Append export PATH=$PATH:/usr/local/hadoop/bin:/usr/local/hadoop/sbin to the end of the file (note the /usr/local prefix; it must match where you installed Hadoop)
- source /etc/profile
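Put together, this section is roughly the following shell session (a sketch under the paths above; the sed line is just a non-interactive stand-in for the vim edit of hadoop-env.sh):

cd /usr/local
rpm -ivh jdk-8u91-linux-x64.rpm        # installs to /usr/java by default
tar -zxvf hadoop-2.7.3.tar.gz
mv hadoop-2.7.3 hadoop
# point Hadoop at the JDK
sed -i 's|^export JAVA_HOME=.*|export JAVA_HOME=/usr/java/default|' hadoop/etc/hadoop/hadoop-env.sh
# put the Hadoop commands on the PATH
echo 'export PATH=$PATH:/usr/local/hadoop/bin:/usr/local/hadoop/sbin' >> /etc/profile
source /etc/profile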
Divergent thinking: going a step further
Question 1: Only the master is configured so far. Do slave1, slave2, and slave3 each have to be set up the same way?
A: Instinct says there must be a way to avoid that, and indeed VirtualBox provides the ability to replicate machines. Select the master VM and right-click Clone. This produces a machine exactly the same as the master; we only need to modify its network configuration. Note: building this cluster environment requires you to make three clones.
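The cloning can also be scripted from the host machine with VBoxManage (shipped with VirtualBox); a sketch, assuming the source VM is named Master:

# run on the host: clone Master three times and register each clone
VBoxManage clonevm Master --name slave1 --register
VBoxManage clonevm Master --name slave2 --register
VBoxManage clonevm Master --name slave3 --register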
Question 2: How do I check that these Linux machines are all on the same network?
A: Let me walk through it again. Start all four Linux machines (you can right-click and start them headless, without an interface), log in to each with Xshell, and use the Tools menu to send key input to all sessions. Then enter ping 192.168.56.100, 192.168.56.101, 192.168.56.102, and 192.168.56.103 in turn.
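Alternatively, a small loop run on any single machine checks all four addresses at once (a sketch; -c 1 sends one ping, -W 1 waits at most a second):

for ip in 192.168.56.100 192.168.56.101 192.168.56.102 192.168.56.103; do
    ping -c 1 -W 1 "$ip" > /dev/null && echo "$ip ok" || echo "$ip unreachable"
done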
Configure and start Hadoop
1. Configure the host names on all four machines: vim /etc/hosts
192.168.56.100 master
192.168.56.101 slave1
192.168.56.102 slave2
192.168.56.103 slave3
2. Switch to the Hadoop configuration file directory: cd /usr/local/hadoop/etc/hadoop, then vim core-site.xml
3. Modify core-site.xml on all four Linux machines to name which machine is the master (NameNode):
<property>
    <name>fs.defaultFS</name>
    <value>hdfs://master:9000</value>
</property>
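For reference, this property lives inside the file's <configuration> root element; a minimal complete core-site.xml would look roughly like this:

<?xml version="1.0" encoding="UTF-8"?>
<configuration>
    <!-- the default file system: the NameNode on master, port 9000 -->
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://master:9000</value>
    </property>
</configuration>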
4. On the master node, name its child nodes: vim /usr/local/hadoop/etc/hadoop/slaves (each line names one child node; the hostnames resolve to IPs through /etc/hosts)
slave1
slave2
slave3
5. Initialize the master configuration: hdfs namenode -format
6. Start the Hadoop cluster and use jps to view the processes started on each node:
Start the master: hadoop-daemon.sh start namenode
Start each slave: hadoop-daemon.sh start datanode
7. View the cluster status: hdfs dfsadmin -report, or use the web UI at http://192.168.56.100:50070/
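Putting steps 5 through 7 together, a sketch of the start-and-verify sequence (the process IDs in the jps output will of course differ):

# on master
hdfs namenode -format              # first start only
hadoop-daemon.sh start namenode
jps                                # typically shows: NameNode, Jps

# on each of slave1, slave2, slave3
hadoop-daemon.sh start datanode
jps                                # typically shows: DataNode, Jps

# back on master: all three DataNodes should appear as live nodes
hdfs dfsadmin -report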
Hadoop simplified: from installing Linux to building a cluster environment