Introduction and environment preparation
The core of Hadoop is the distributed file system HDFS and the batch computation framework MapReduce. In recent years the rise of big data, cloud computing, and the Internet of Things greatly attracted my interest. I read many articles online but still felt foggy, because introductory tutorials are full of unnecessary configuration. By summarizing what I learned from thinking and from tutorials, I want to offer a simple path to students who, like me, want to get started with Hadoop. In fact, if you have a good Java foundation, you will find Hadoop quite simple once you get started: big data is nothing more than a volume of data so large that many machines must work together to store it, and cloud computing is nothing more than many machines working together to provide computation.
Suggested approach: first understand the theory at a high level, then finish the hands-on steps, and only then come back to the theory. In the following articles I will analyze the theory and finally use a mind map to summarize the overall shape of Hadoop.
Environment preparation: http://pan.baidu.com/s/1dFrHyxV password: 1e9g (it is recommended to download the packages from the official sites instead, for the original flavor; do not use second-hand copies)
CentOS Linux system: CentOS-7-x86_64-DVD-1511.iso
VirtualBox virtual machine: VirtualBox-5.1.18-114002-Win.exe
Xshell remote login tool: Xshell.exe
Xftp remote file transfer: Xftp.exe
Hadoop: hadoop-2.7.3.tar.gz
JDK 8: jdk-8u91-linux-x64.rpm
The physical architecture of Hadoop
Physical architecture: assume the machine room has four machines forming a cluster: master (ip: 192.168.56.100), slave1 (ip: 192.168.56.101), slave2 (ip: 192.168.56.102), slave3 (ip: 192.168.56.103). Here is only a brief introduction; I will cover the specifics in more detail in the HDFS article of this Hadoop series.
Distributed: a computer system in which machines in different locations, with different roles and forms, are connected through a communication network and, under unified control, cooperate to complete large-scale information processing. Simply put, what a single hard disk holds can be split into two parts, the file index and the file data: the file index is deployed on a separate server we call the master or root node (NameNode), while the file data is deployed on the child nodes managed by the master, called slave nodes (DataNodes).
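Once the cluster built at the end of this article is running, this split is easy to see from the HDFS shell; a small sketch (the file name here is just an example):

# upload a file: the NameNode records the index, the DataNodes store the blocks
hdfs dfs -put hadoop-2.7.3.tar.gz /demo.tar.gz
# ask which DataNodes hold each block of the file
hdfs fsck /demo.tar.gz -files -blocks -locations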
Installing Linux with VirtualBox
Reference: http://www.cnblogs.com/qiuyong/p/6815903.html
Configure the cluster to communicate under the same virtual LAN
Note: through the steps above, the master machine (192.168.56.100) has been installed; now configure the virtual network so that all machines sit on the same virtual LAN.
- vim /etc/sysconfig/network
- NETWORKING=yes GATEWAY=192.168.56.1 (explanation: this sets the gateway to the VirtualBox host-only NIC)
- vim /etc/sysconfig/network-scripts/ifcfg-enp0s3
- TYPE=Ethernet IPADDR=192.168.56.100 NETMASK=255.255.255.0 (explanation: this sets the machine's own static IP)
- Modify the host name: hostnamectl set-hostname master
- Restart the network: service network restart
- View the IP: ifconfig
- Turn off the firewall so that Windows and the VM can ping each other; verify with master: ping 192.168.56.1 and Windows: ping 192.168.56.100
- systemctl stop firewalld --> systemctl disable firewalld
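For reference, a minimal sketch of the resulting ifcfg-enp0s3 on the master; BOOTPROTO and ONBOOT are additions I would expect a static CentOS 7 interface to need, so adapt them to your file:

# /etc/sysconfig/network-scripts/ifcfg-enp0s3 (VirtualBox host-only adapter)
TYPE=Ethernet
BOOTPROTO=static       # use the fixed address below instead of DHCP
ONBOOT=yes             # bring the interface up at boot
IPADDR=192.168.56.100
NETMASK=255.255.255.0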
Remote login and file transfer using Xshell and Xftp
Logging in and uploading files through the VirtualBox console is troublesome, so use Xshell for remote login.
Upload files using Xftp.
Upload hadoop-2.7.3.tar.gz and jdk-8u91-linux-x64.rpm to the /usr/local directory. Novice tip: in the right-hand window select the /usr/local directory, then double-click each package in the left-hand window and the upload succeeds.
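If you prefer a command line to Xftp, the same upload can be done with scp (assuming an SSH-capable shell on the Windows side and root login enabled on the VM):

# copy both packages to /usr/local on the master
scp hadoop-2.7.3.tar.gz jdk-8u91-linux-x64.rpm root@192.168.56.100:/usr/local/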
Configuring the Hadoop environment
- Install jdk-8u91-linux-x64.rpm: rpm -ivh /usr/local/jdk-8u91-linux-x64.rpm --> the default installation directory is /usr/java
- Confirm that the JDK installed successfully: rpm -qa | grep jdk, then java -version to see whether the installation succeeded.
- Unpack hadoop-2.7.3.tar.gz: tar -zxvf /usr/local/hadoop-2.7.3.tar.gz -C /usr/local
- Rename the directory to hadoop: mv /usr/local/hadoop-2.7.3 /usr/local/hadoop
- Switch to the Hadoop configuration file directory: cd /usr/local/hadoop/etc/hadoop
- vim hadoop-env.sh
- Modify the export JAVA_HOME line to: export JAVA_HOME=/usr/java/default
- Exit the editor: press ESC, then type :wq
- vim /etc/profile
- Append export PATH=$PATH:/usr/local/hadoop/bin:/usr/local/hadoop/sbin to the end of the file (note the /usr/local prefix; it must match where you installed Hadoop)
- source /etc/profile
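Put together, this section is roughly the following shell session (a sketch under the paths above; the sed line is just a non-interactive stand-in for the vim edit of hadoop-env.sh):

cd /usr/local
rpm -ivh jdk-8u91-linux-x64.rpm        # installs to /usr/java by default
tar -zxvf hadoop-2.7.3.tar.gz
mv hadoop-2.7.3 hadoop
# point Hadoop at the JDK
sed -i 's|^export JAVA_HOME=.*|export JAVA_HOME=/usr/java/default|' hadoop/etc/hadoop/hadoop-env.sh
# put the Hadoop commands on the PATH
echo 'export PATH=$PATH:/usr/local/hadoop/bin:/usr/local/hadoop/sbin' >> /etc/profile
source /etc/profile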
Divergent thinking: going a step further
Question 1: Only the master is configured so far. Do slave1, slave2, and slave3 each have to be set up the same way?
A: Instinct says there must be a way to avoid that, and indeed VirtualBox provides the ability to replicate machines. Select the master VM and right-click Clone. This produces a machine exactly the same as the master; we only need to modify its network configuration. Note: building this cluster environment requires you to make three clones.
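The cloning can also be scripted from the host machine with VBoxManage (shipped with VirtualBox); a sketch, assuming the source VM is named Master:

# run on the host: clone Master three times and register each clone
VBoxManage clonevm Master --name slave1 --register
VBoxManage clonevm Master --name slave2 --register
VBoxManage clonevm Master --name slave3 --register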
Question 2: How do I check that these Linux machines are all on the same network?
A: Let me walk through it again. Start all four Linux machines (you can right-click and start them headless, without an interface), log in to each with Xshell, and use the Tools menu to send key input to all sessions. Then enter ping 192.168.56.100, 192.168.56.101, 192.168.56.102, and 192.168.56.103 in turn.
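Alternatively, a small loop run on any single machine checks all four addresses at once (a sketch; -c 1 sends one ping, -W 1 waits at most a second):

for ip in 192.168.56.100 192.168.56.101 192.168.56.102 192.168.56.103; do
    ping -c 1 -W 1 "$ip" > /dev/null && echo "$ip ok" || echo "$ip unreachable"
done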
Configure and start Hadoop
1. Configure the host names on all four machines: vim /etc/hosts
192.168.56.100 master
192.168.56.101 slave1
192.168.56.102 slave2
192.168.56.103 slave3
2. Switch to the Hadoop configuration file directory: cd /usr/local/hadoop/etc/hadoop, then vim core-site.xml
3. Modify core-site.xml on all four Linux machines to name which machine is the master (NameNode):
<property>
    <name>fs.defaultFS</name>
    <value>hdfs://master:9000</value>
</property>
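For reference, this property lives inside the file's <configuration> root element; a minimal complete core-site.xml would look roughly like this:

<?xml version="1.0" encoding="UTF-8"?>
<configuration>
    <!-- the default file system: the NameNode on master, port 9000 -->
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://master:9000</value>
    </property>
</configuration>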
4. On the master node, name its child nodes: vim /usr/local/hadoop/etc/hadoop/slaves (each line names one child node; the hostnames resolve to IPs through /etc/hosts)
slave1
slave2
slave3
5. Initialize the master configuration: hdfs namenode -format
6. Start the Hadoop cluster and use jps to view the processes started on each node:
Start the master: hadoop-daemon.sh start namenode
Start each slave: hadoop-daemon.sh start datanode
7. View the cluster status: hdfs dfsadmin -report, or use the web UI at http://192.168.56.100:50070/
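Putting steps 5 through 7 together, a sketch of the start-and-verify sequence (the process IDs in the jps output will of course differ):

# on master
hdfs namenode -format              # first start only
hadoop-daemon.sh start namenode
jps                                # typically shows: NameNode, Jps

# on each of slave1, slave2, slave3
hadoop-daemon.sh start datanode
jps                                # typically shows: DataNode, Jps

# back on master: all three DataNodes should appear as live nodes
hdfs dfsadmin -report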
Hadoop simplified: from installing Linux to building a cluster environment