Hadoop simplified: from installing Linux to building a cluster environment


Introduction and Environment preparation

The core of Hadoop is the distributed file system HDFS and the batch computation framework MapReduce. In recent years the rise of big data, cloud computing, and the Internet of Things has greatly attracted my interest, but after reading many articles online I still felt foggy: a lot of unnecessary configuration shows up in the introductory tutorials. By summarizing my own thinking and those tutorials, I want to show, in the simplest possible way, how students who want to get started with Hadoop can do so. In fact, if you have a good Java foundation, once you get started you will find that Hadoop is actually very simple: big data is nothing more than a large amount of data that needs many machines to store together, and cloud computing is nothing more than many machines computing together.

Suggested approach: first understand a little theory, then finish the hands-on steps, and only then go back to the theory. In the articles that follow I will analyze the theory and finally use a mind map to summarize what Hadoop looks like as a whole.

Environment preparation: http://pan.baidu.com/s/1dFrHyxV password: 1e9g (it is recommended to download everything from the official websites instead, for the original flavor rather than secondhand copies)

CentOS Linux system: CentOS-7-x86_64-DVD-1511.iso

VirtualBox virtual machine: VirtualBox-5.1.18-114002-Win.exe

Xshell remote login tool: Xshell.exe

Xftp remote file transfer: Xftp.exe

Hadoop: hadoop-2.7.3.tar.gz

JDK 8: jdk-8u91-linux-x64.rpm

The physical architecture of Hadoop

Physical architecture: assume the machine room has four machines for building the cluster environment: Master (IP: 192.168.56.100), Slave1 (IP: 192.168.56.101), Slave2 (IP: 192.168.56.102), and Slave3 (IP: 192.168.56.103). Here is only a brief introduction; I will cover the details in the HDFS article of this Hadoop series.

Distributed: computers in different locations, with different functions, or holding different data are connected through a communication network and, under unified control, coordinate to complete large-scale information-processing tasks. Simply put, a hard disk can be viewed as two parts, the file index and the file data: the file index is deployed on a separate server we call the master node (NameNode), while the file data is deployed on child nodes managed by the master node, called slave nodes (DataNodes).
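To make the index/data split concrete: once the cluster built later in this article is running, a command like the one below shows that HDFS keeps a file's metadata on the NameNode while the actual blocks sit on DataNodes. This is only a sketch; /test/hosts is a hypothetical path standing in for any file already uploaded to HDFS.

  # list the blocks of an HDFS file and which DataNodes hold them
  hdfs fsck /test/hosts -files -blocks -locations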


Installing Linux with VirtualBox

Reference: http://www.cnblogs.com/qiuyong/p/6815903.html

Configure the cluster to communicate under the same virtual LAN

Note: with the steps above, the Master (192.168.56.100) machine has been installed; next, configure the virtual network so that all the machines sit in the same virtual LAN.

  1. vim /etc/sysconfig/network
  2. NETWORKING=yes, GATEWAY=192.168.56.1 (description: this connects the machine to the VirtualBox host-only NIC)
  3. vim /etc/sysconfig/network-scripts/ifcfg-enp0s3
  4. TYPE=Ethernet, IPADDR=192.168.56.100, NETMASK=255.255.255.0 (description: this sets the machine's own IP; both files are sketched in full after this list)
  5. Modify the host name: hostnamectl set-hostname master
  6. Restart the network: service network restart
  7. View the IP: ifconfig
  8. Check connectivity: on the Master, ping 192.168.56.1; on Windows, ping 192.168.56.100. If the ping fails, turn off the firewall:
  9. systemctl stop firewalld --> systemctl disable firewalld
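For reference, here is a minimal sketch of the two files edited in steps 1-4, assuming the host-only adapter is named enp0s3 as above; the DEVICE/ONBOOT/BOOTPROTO lines are typical CentOS 7 settings I am assuming, not something listed in the original steps:

  # /etc/sysconfig/network
  NETWORKING=yes
  GATEWAY=192.168.56.1

  # /etc/sysconfig/network-scripts/ifcfg-enp0s3
  TYPE=Ethernet
  DEVICE=enp0s3
  ONBOOT=yes
  BOOTPROTO=static
  IPADDR=192.168.56.100
  NETMASK=255.255.255.0

After editing, steps 6-7 (service network restart, ifconfig) apply as before.
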
Remote login and file transfer using Xshell and Xftp

Logging in and uploading files through the VirtualBox console is troublesome, so use Xshell to log in remotely.


Upload files using Xftp.


Upload hadoop-2.7.3.tar.gz and jdk-8u91-linux-x64.rpm to the /usr/local directory. Novice tip: in the right (remote) pane select the /usr/local directory, then double-click the archive in the left (local) pane and the upload starts.
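If you prefer the command line over Xftp, scp from any machine with an SSH client does the same job; this is just an alternative sketch, not part of the original steps:

  # copy both archives to /usr/local on the Master
  scp hadoop-2.7.3.tar.gz jdk-8u91-linux-x64.rpm root@192.168.56.100:/usr/local/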

Configuring the Hadoop environment
  1. Install the JDK: rpm -ivh /usr/local/jdk-8u91-linux-x64.rpm --> the default installation directory is /usr/java
  2. Confirm that the JDK is installed successfully: rpm -qa | grep jdk and java -version show whether the installation succeeded.
  3. Unpack hadoop-2.7.3.tar.gz: tar -xvf /usr/local/hadoop-2.7.3.tar.gz -C /usr/local
  4. Rename the directory to hadoop: mv /usr/local/hadoop-2.7.3 /usr/local/hadoop
  5. Switch to the Hadoop configuration file directory: cd /usr/local/hadoop/etc/hadoop
  6. vim hadoop-env.sh
  7. Modify the export JAVA_HOME line to: export JAVA_HOME=/usr/java/default
  8. Exit the editor: press ESC, then enter :wq
  9. vim /etc/profile
  10. Append to the end of the file: export PATH=$PATH:/usr/local/hadoop/bin:/usr/local/hadoop/sbin
  11. source /etc/profile (a quick verification sketch follows this list)
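A quick way to confirm the environment after step 11 is to check the versions both tools report (a sketch; the exact build strings may differ slightly):

  java -version     # should mention 1.8.0_91 for jdk-8u91
  hadoop version    # should report Hadoop 2.7.3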
Divergent thinking: going a step further

Question 1: only the Master is configured so far. Do Slave1, Slave2, and Slave3 have to be set up the same way from scratch?

A: Intuition says there must be a way to avoid that, and indeed VirtualBox provides the ability to clone machines. Select Master, right-click, and choose Clone; this produces a machine exactly the same as the Master, and only its network configuration needs to be modified (a sketch of the changes for each clone follows). Note: building the cluster environment requires three such clones.
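For each clone, only the IP address and the host name need to change; a minimal sketch for slave1 (repeat with 192.168.56.102/slave2 and 192.168.56.103/slave3):

  # change IPADDR=192.168.56.100 to the clone's own address
  vim /etc/sysconfig/network-scripts/ifcfg-enp0s3
  hostnamectl set-hostname slave1
  service network restart

When cloning, ticking the option to reinitialize the MAC addresses in VirtualBox's clone dialog also helps avoid NIC conflicts between the copies.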

Question 2: how do I check whether these Linux machines are all in the same network environment?

A: Let's run through it. Start the four Linux machines (you can right-click and choose to start them headless, without an interface), log in to each with Xshell, and use the Tools option for sending key input to all open sessions. Then ping 192.168.56.100, 192.168.56.101, 192.168.56.102, and 192.168.56.103 from every machine; a loop version of this check is sketched below.
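Instead of typing the four addresses by hand, a small loop sent to all sessions performs the same check (a sketch):

  # ping each cluster machine twice
  for ip in 100 101 102 103; do ping -c 2 192.168.56.$ip; done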

Configure and start Hadoop

1. Configure the host names for the four machines: vim /etc/hosts and add:

192.168.56.100 master

192.168.56.101 slave1

192.168.56.102 slave2

192.168.56.103 slave3

2. Switch to the Hadoop configuration file directory /usr/local/hadoop/etc/hadoop and open the file: vim core-site.xml

3. Modify core-site.xml on all four Linux machines to name which machine is the master (NameNode), by adding the following property inside the existing <configuration> element:

<property>
<name>fs.defaultFS</name>
<value>hdfs://master:9000</value>
</property>

4. On the master node, name its child nodes: vim /usr/local/hadoop/etc/hadoop/slaves (each line names one child node, by hostname or IP):

slave1

slave2

slave3

5. Initialize the master configuration (format the NameNode): hdfs namenode -format

6. Start the Hadoop cluster and use jps to check that the processes have started.

Start the master: hadoop-daemon.sh start namenode

Start each slave: hadoop-daemon.sh start datanode
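After starting the daemons, jps on each machine should list the corresponding process; the process IDs below are placeholders, only the class names matter:

  # on the master
  jps        # expect something like: 1234 NameNode / 1300 Jps
  # on each slave
  jps        # expect something like: 1234 DataNode / 1300 Jps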


7. Check the cluster status: hdfs dfsadmin -report, or use the web UI at http://192.168.56.100:50070/
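As a final smoke test, uploading a small file and listing it confirms the NameNode and the DataNodes are cooperating (a sketch; /etc/hosts is just a convenient small file to upload):

  hdfs dfs -mkdir -p /test
  hdfs dfs -put /etc/hosts /test/
  hdfs dfs -ls /test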


