Instruction Manual 01: Installing Hadoop
Part 1: Installing and configuring virtual machines
1. Install Linux.
(1) Open Oracle VM VirtualBox.
(2) Choose Machine → New, enter the virtual machine name "master + number", type: Linux, version: Other Linux (64-bit).
The installation image is CentOS-6.8-x86_64-bin-DVD1.
(3) Recommended memory: 1024 MB. Create a VHD virtual hard disk, dynamically allocated, 20 GB.
(4) Right-click the virtual machine master you created and open Settings. Network, Adapter 1: Bridged Adapter. Storage: CD drive, select the CentOS-6.8-x86_64-bin-DVD1 installation image. (Unzip it first: d:\Big Data software\centos-6.8-x86_64-bin-dvd1.rar)
(5) Choose "Install or upgrade an existing system".
(6) Select the installation language (Simplified Chinese), keyboard U.S. English, storage devices: Basic Storage Devices, then choose "Yes, discard any data" (delete all data detected on the hard drive).
(7) Set the host name "master01.centos.com", click Next, select the time zone, and set the administrator password: hadoop.
(8) Select "Use All Space".
(9) Choose the CentOS component set to install; select "Desktop" here.
(10) After the installation succeeds, reboot, then log in with username "root" and password "hadoop".
2. Set the IP address.
The IP address should be on the same network segment as the host. The system acquires an IP address automatically (DHCP); view it with the ifconfig command, then ping the virtual machine's address from the host to confirm connectivity.
You can also set the IP address manually, as follows:
(1) The configuration file to modify is "/etc/sysconfig/network-scripts/ifcfg-eth0".
(2) Execute the command "vi /etc/sysconfig/network-scripts/ifcfg-eth0".
(3) Set ONBOOT=yes (activate the NIC at startup) and BOOTPROTO=static (static IP), and add the IP address, subnet mask, and gateway.
(4) Restart the network service (execute the command "service network restart").
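The edited ifcfg-eth0 would look roughly like the sketch below. The IP, netmask, and gateway values are example placeholders; substitute addresses from your own network segment. The sketch writes to a temporary file rather than the real /etc/sysconfig path so it can be inspected safely:

```shell
# Sketch of a static-IP ifcfg-eth0 (example values, not real addresses).
# On the VM the target file is /etc/sysconfig/network-scripts/ifcfg-eth0.
cfg=$(mktemp)
cat > "$cfg" <<'EOF'
DEVICE=eth0
TYPE=Ethernet
ONBOOT=yes
BOOTPROTO=static
IPADDR=192.168.1.100
NETMASK=255.255.255.0
GATEWAY=192.168.1.1
EOF
cat "$cfg"
```

After copying such content into the real file, "service network restart" applies it.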
3. Connect to the virtual machine remotely (install an SSH tool on Windows).
(1) Ping the virtual machine from the host to check connectivity. (Note: both the Linux guest and the Windows host need their firewalls shut down; the Linux command is "service iptables stop".)
(2) Install SSH Secure Shell on the host. After installation there are two desktop icons: one is the remote terminal tool, the other is the file transfer tool.
(3) Click Quick Connect, enter the Linux IP address, user name root, port 22, and the password hadoop to log in remotely.
4. Install software online in the virtual machine.
Tip: yum is an RPM-based package manager used in Fedora and Red Hat, as well as in SuSE.
1. RPM is the Linux package format, with file names ending in .rpm.
2. yum is a package manager developed at Duke University to simplify installing RPM packages.
3. The yum installation method resolves the dependency problems of installing RPM packages directly.
4. yum provides concise commands to find, install, and remove individual packages or all of them.
yum command reference: http://man.linuxde.net/yum
The difference between yum and apt-get: https://www.cnblogs.com/wxishang1991/p/5322489.html
To configure the local yum source, the main steps are:
(1) Execute the command "cd /etc/yum.repos.d".
(2) View the files in the yum.repos.d directory:
CentOS-Base.repo is the network source; CentOS-Media.repo is the local source.
Configuring the local source requires disabling every yum source other than the local one.
Execute the command "mv CentOS-Base.repo CentOS-Base.repo.bak".
(3) Execute the command "vi CentOS-Media.repo". Change the value of baseurl to file:///media/ (the disc mount point), change gpgcheck to 0 (do not verify RPM packages downloaded from this source), and change enabled to 1 (enable this source).
(4) Mount the disc.
Execute "mount /dev/dvd /media".
If the mount fails, check whether the virtual machine's CD drive is enabled.
(5) Clear the yum cache:
yum clean all
(6) Install software from the disc with yum. For example, install vim, zip, openssh-server, and openssh-clients:
yum install -y vim zip openssh-server openssh-clients
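After the edits in step (3), the CentOS-Media.repo file would look roughly like the sketch below (based on the stock CentOS 6 media repo layout; written to a temporary file here for illustration):

```shell
# Sketch of an edited CentOS-Media.repo (local disc source enabled,
# GPG check disabled). On the VM the target is
# /etc/yum.repos.d/CentOS-Media.repo.
repo=$(mktemp)
cat > "$repo" <<'EOF'
[c6-media]
name=CentOS-$releasever - Media
baseurl=file:///media/
gpgcheck=0
enabled=1
EOF
cat "$repo"
```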
5. Test.
Use the vim editor to write a file a.txt in the /opt directory of the virtual machine master.
(1) Use the SSH tool to open a session on master.
(2) Enter the /opt directory; the command is cd /opt.
(3) Create an empty file with vim; the command is vim a.txt.
(4) Press "a", "i", or "o" to enter insert mode and write a line in the a.txt file: "Welcome hadoop!"
(5) Press the Esc key to leave insert mode, type ":wq", and press Enter to save and exit.
Alternatively:
Use the SSH file transfer tool to connect to master, write a.txt on the Windows machine, and upload it to the /opt folder on master.
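A third option is to create the file non-interactively from the shell. The sketch below uses a temporary directory standing in for /opt so it is safe to run anywhere:

```shell
# Stand-in for /opt so the sketch can run without touching the system.
optdir=$(mktemp -d)
printf 'Welcome hadoop!\n' > "$optdir/a.txt"
cat "$optdir/a.txt"
```

On the VM itself the single command would be: printf 'Welcome hadoop!\n' > /opt/a.txt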
Linux basic commands article: http://www.cnblogs.com/yjd_hycf_space/p/7730690.html
Vim usage article: 54314053
Part 2: Installing Java
1. Installing Java under Windows
(1) Double-click the JDK installer.
(2) Change the JDK installation directory if desired.
(3) Change the JRE installation directory if desired.
(4) Configure environment variables:
System variables → New: JAVA_HOME, value: C:\Program Files\Java\jdk1.8.0_121
System variables → Path → Edit: append to the variable value: %JAVA_HOME%\bin;%JAVA_HOME%\jre\bin;
System variables → New: CLASSPATH, value: .;%JAVA_HOME%\lib;%JAVA_HOME%\lib\tools.jar
2. Installing Java under Linux
(1) Upload the JDK installation package to the /opt directory.
(2) Enter the /opt directory and execute the command "rpm -ivh jdk-7u80-linux-x64.rpm" to install the JDK.
3. Test.
To view the JDK version under Windows, follow these steps.
(1) Open the Windows Run window (or a command prompt).
(2) Execute the command "java -version".
The following are the steps to view the JDK version under Linux.
(1) Open a terminal session.
(2) Execute the command "java -version".
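On Linux the check can also be scripted; note that java -version prints its banner to stderr, so a script must capture it with 2>&1. The sketch below parses a hard-coded sample banner for JDK 7u80 instead of invoking java, so it runs even on a machine without a JDK:

```shell
# Sample first line of `java -version` output for JDK 7u80 (hard-coded
# example; in practice use: banner=$(java -version 2>&1 | head -n 1)).
banner='java version "1.7.0_80"'
# Extract the quoted version string from the banner.
version=$(printf '%s\n' "$banner" | sed 's/.*"\(.*\)".*/\1/')
echo "$version"
```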
Part 3: Hadoop Installation and Configuration
Overall approach:
1. Create a new virtual machine master, configure a fixed IP (two network adapters can be set, one bridged and one in NAT mode), shut down the firewall, and install the necessary software.
2. Clone master to slave1, slave2, and slave3.
3. Change the IPs of slave1~slave3 to fixed IPs.
1. Upload the Hadoop installation package.
Upload the hadoop-2.6.4.tar.gz file to the /opt directory with the SSH Secure File Transfer client.
2. Unzip the hadoop-2.6.4.tar.gz file.
tar -zxf /opt/hadoop-2.6.4.tar.gz -C /usr/local
After decompression you will see the /usr/local/hadoop-2.6.4 folder.
3. Configure Hadoop.
Enter the directory:
cd /usr/local/hadoop-2.6.4/etc/hadoop/
Modify the following files in turn.
4.1 core-site.xml
<configuration>
<!--Configure Namenode address for HDFs-->
<property>
<name>fs.defaultFS</name>
<value>hdfs://master:8020</value>
</property>
<!-- Storage directory for data the Hadoop runtime produces (not merely temporary data) -->
<property>
<name>hadoop.tmp.dir</name>
<value>/var/log/hadoop/tmp</value>
</property>
</configuration>
4.2 hadoop-env.sh
# Modify JAVA_HOME as follows:
export JAVA_HOME=/usr/java/jdk1.7.0_80
4.3 hdfs-site.xml
<configuration>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:///data/hadoop/hdfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:///data/hadoop/hdfs/data</value>
</property>
<!-- Specify the HTTP address of the SecondaryNameNode -->
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>master:50090</value>
</property>
<!-- Specify the number of replicas HDFS keeps for each data block -->
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
</configuration>
4.4 mapred-site.xml
<configuration>
<!-- Specify that the MapReduce programming model runs on YARN -->
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<!-- JobHistory server properties -->
<property>
<name>mapreduce.jobhistory.address</name>
<value>master:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>master:19888</value>
</property>
</configuration>
4.5 yarn-site.xml
<configuration>
<!-- Specify YARN's master, i.e. the ResourceManager hostname -->
<property>
<name>yarn.resourcemanager.hostname</name>
<value>master</value>
</property>
<!-- Address exposed to clients for submitting applications to the RM -->
<property>
<name>yarn.resourcemanager.address</name>
<value>${yarn.resourcemanager.hostname}:8032</value>
</property>
<!-- Address ApplicationMasters use to request and release resources from the RM scheduler -->
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>${yarn.resourcemanager.hostname}:8030</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>${yarn.resourcemanager.hostname}:8088</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.https.address</name>
<value>${yarn.resourcemanager.hostname}:8090</value>
</property>
<!-- Address exposed to NodeManagers (resource tracker) -->
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>${yarn.resourcemanager.hostname}:8031</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>${yarn.resourcemanager.hostname}:8033</value>
</property>
<property>
<name>yarn.nodemanager.local-dirs</name>
<value>/data/hadoop/yarn/local</value>
</property>
<property>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
</property>
<property>
<name>yarn.nodemanager.remote-app-log-dir</name>
<value>/data/tmp/logs</value>
</property>
<property>
<name>yarn.log.server.url</name>
<value>http://master:19888/jobhistory/logs/</value>
<description>url for Job history server</description>
</property>
<property>
<name>yarn.nodemanager.vmem-check-enabled</name>
<value>false</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>2048</value>
</property>
<property>
<name>yarn.scheduler.minimum-allocation-mb</name>
<value>512</value>
</property>
<property>
<name>yarn.scheduler.maximum-allocation-mb</name>
<value>4096</value>
</property>
<property>
<name>mapreduce.map.memory.mb</name>
<value>2048</value>
</property>
<property>
<name>mapreduce.reduce.memory.mb</name>
<value>2048</value>
</property>
<!-- Total number of virtual CPU cores available to the NodeManager -->
<property>
<name>yarn.nodemanager.resource.cpu-vcores</name>
<value>1</value>
</property>
</configuration>
4.6 yarn-env.sh
export JAVA_HOME=/usr/java/jdk1.7.0_80
If you already have Linux slave nodes, copy the Hadoop installation files to each cluster slave node (or run the cloning in step 6 below instead):
scp -r /usr/local/hadoop-2.6.4 slave1:/usr/local
scp -r /usr/local/hadoop-2.6.4 slave2:/usr/local
scp -r /usr/local/hadoop-2.6.4 slave3:/usr/local
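The per-slave scp commands can be generated with a loop. The sketch below is a dry run that only collects and prints the commands (remove the echo-style collection and execute each line to actually copy):

```shell
# Dry run: build the scp command line for each slave without executing it.
cmds=$(for host in slave1 slave2 slave3; do
  echo "scp -r /usr/local/hadoop-2.6.4 $host:/usr/local"
done)
printf '%s\n' "$cmds"
```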
4.7 Modify the slaves file
slave1
slave2
slave3
4.8 Set the IP mappings
Edit /etc/hosts:
10.0.2.4 master master.centos.com
10.0.2.5 slave1 slave1.centos.com
10.0.2.6 slave2 slave2.centos.com
10.0.2.7 slave3 slave3.centos.com
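The four mappings can be appended in one heredoc. The sketch below writes to a temporary file standing in for /etc/hosts (on the VMs, replace the temp file with /etc/hosts and run it on every node):

```shell
# Stand-in for /etc/hosts so the sketch is safe to run anywhere.
hosts=$(mktemp)
cat >> "$hosts" <<'EOF'
10.0.2.4 master master.centos.com
10.0.2.5 slave1 slave1.centos.com
10.0.2.6 slave2 slave2.centos.com
10.0.2.7 slave3 slave3.centos.com
EOF
cat "$hosts"
```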
5. Build the cluster network environment.
(1) In the VirtualBox global settings, create a NAT network with the cluster segment set to 10.0.2.0/24.
(2) Add network adapter 2 to the master host and set it to NAT Network mode.
The adapter 2 IP address is set to 10.0.2.4.
6. Clone the virtual machines.
Clone master to slave1, slave2, and slave3, then change each slave's IP to a fixed address.
Power on the virtual machine slave1.
Open Settings → Network and refresh the MAC address of the network adapter.
(1) Execute the command "rm -rf /etc/udev/rules.d/70-persistent-net.rules" to delete the file.
(2) Execute the command "ifconfig -a" and note the HWADDR.
(3) Modify the /etc/sysconfig/network-scripts/ifcfg-eth0 file: update the IP address and the NIC hardware address.
The NIC IP addresses of slave1, slave2, and slave3 are set to 10.0.2.5, 10.0.2.6, and 10.0.2.7 respectively.
(4) Modify the machine name; execute the command "vim /etc/sysconfig/network".
The modified machine names are slave1.centos.com, slave2.centos.com, and slave3.centos.com.
7. Configure passwordless SSH login.
(1) Use ssh-keygen to generate a public/private key pair.
Enter the command "ssh-keygen -t rsa" and press Enter three times:
[root@master ~]# ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa):
Created directory '/root/.ssh'.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
a6:13:5a:7b:54:eb:77:58:bd:56:ef:d0:64:90:66:d4 root@master.centos.com
The key's randomart image is:
+--[ RSA 2048]----+
|       ..        |
|      . . E      |
|       . =       |
|      . . o o    |
|       o S. . =  |
|      o *. o ++  |
|     . + . o ooo |
|      o. .. o    |
|            .    |
+-----------------+
This generates two files: the private key id_rsa and the public key id_rsa.pub. ssh-keygen generates and manages SSH keys; the "-t" option specifies the type of key to create, here rsa.
(2) Use ssh-copy-id to copy the public key to the remote machines:
ssh-copy-id -i /root/.ssh/id_rsa.pub master    # answer yes, then enter the root password
ssh-copy-id -i /root/.ssh/id_rsa.pub slave1
ssh-copy-id -i /root/.ssh/id_rsa.pub slave2
ssh-copy-id -i /root/.ssh/id_rsa.pub slave3
(3) Verify that passwordless login works.
Enter in turn:
ssh slave1
ssh slave2
ssh slave3
8. Configure the time synchronization service.
(1) Install the NTP service on every node:
yum -y install ntp
(2) Assume the master node is the NTP master server; configure it as follows.
Use the command "vim /etc/ntp.conf" to open the /etc/ntp.conf file, comment out the lines beginning with server, and add:
restrict 10.0.2.0 mask 255.255.255.0 nomodify notrap
server 127.127.1.0
fudge 127.127.1.0 stratum 10
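The master-node additions can be sketched as a file fragment; here it is written to a temporary file standing in for /etc/ntp.conf so it can be checked without touching the system:

```shell
# Master-node NTP additions (stand-in for /etc/ntp.conf).
ntpconf=$(mktemp)
cat > "$ntpconf" <<'EOF'
# Allow the 10.0.2.0/24 cluster segment to query this server.
restrict 10.0.2.0 mask 255.255.255.0 nomodify notrap
# Use the local clock as the reference when no upstream is reachable.
server 127.127.1.0
fudge 127.127.1.0 stratum 10
EOF
cat "$ntpconf"
```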
(3) Configure NTP on slave1, slave2, and slave3: modify the /etc/ntp.conf file the same way, commenting out the lines beginning with server and adding:
server master
(4) Execute the command "service iptables stop && chkconfig iptables off" to shut down the firewall permanently, on both the master node and the slave nodes.
(5) Start the NTP service.
① Execute the command "service ntpd start && chkconfig ntpd on" on the master node.
② Execute the command "ntpdate master" on slave1, slave2, and slave3 to synchronize the time once.
③ Execute "service ntpd start && chkconfig ntpd on" on slave1, slave2, and slave3 to start the NTP service and enable it permanently.
9. Add JAVA_HOME and the Hadoop path to /etc/profile:
export HADOOP_HOME=/usr/local/hadoop-2.6.4
export PATH=$HADOOP_HOME/bin:$PATH:/usr/java/jdk1.7.0_80/bin
Execute "source /etc/profile" to make the changes take effect.
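The effect of the two export lines can be verified by sourcing them and inspecting PATH. The sketch below writes them to a temporary fragment (standing in for /etc/profile) and sources it:

```shell
# Stand-in for the /etc/profile additions.
profile=$(mktemp)
cat > "$profile" <<'EOF'
export HADOOP_HOME=/usr/local/hadoop-2.6.4
export PATH=$HADOOP_HOME/bin:$PATH:/usr/java/jdk1.7.0_80/bin
EOF
# Source the fragment, then the Hadoop bin directory is on PATH.
. "$profile"
echo "$PATH"
```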
10. Format the NameNode.
Enter the directory:
cd /usr/local/hadoop-2.6.4/bin
Perform the formatting:
./hdfs namenode -format
11. Start the cluster.
Enter the directory:
cd /usr/local/hadoop-2.6.4/sbin
Execute the start scripts:
./start-dfs.sh
./start-yarn.sh
./mr-jobhistory-daemon.sh start historyserver
Use jps to view the processes:
[root@master sbin]# jps
3672 NodeManager
3301 DataNode
3038 NameNode
4000 JobHistoryServer
4058 Jps
3589 ResourceManager
3408 SecondaryNameNode
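A quick scripted sanity check over the jps output can confirm every expected daemon is up. The sketch below runs against a hard-coded sample listing (matching the output above) instead of live jps, and reports any missing daemon:

```shell
# Hard-coded sample jps listing; in practice use: sample=$(jps)
sample='3672 NodeManager
3301 DataNode
3038 NameNode
4000 JobHistoryServer
3589 ResourceManager
3408 SecondaryNameNode'
missing=""
# Collect any expected daemon that does not appear in the listing.
for d in NameNode DataNode SecondaryNameNode ResourceManager NodeManager JobHistoryServer; do
  printf '%s\n' "$sample" | grep -qw "$d" || missing="$missing $d"
done
echo "missing:${missing:- none}"
```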
Turn off the firewall (performed on all nodes):
service iptables stop
chkconfig iptables off
View in a browser:
http://master:50070
http://master:8088