Instruction manual 01: Installing Hadoop

Source: Internet
Author: User
Tags: ssh, secure file transfer


Part 1: Installing and configuring virtual machines

1. Install Linux.

(1) Open VM VirtualBox.

(2) Machine → New, enter the virtual machine name "master + number", type: Linux, version: Other Linux (64-bit)

Installation image: CentOS-6.8-x86_64-bin-DVD1

(3) Memory: 1024 MB recommended. Create a VHD virtual hard disk, dynamically allocated, 20 GB.

(4) Right-click the virtual machine master you created, then Settings: Network, Adapter 1: Bridged Adapter. Storage: CD, select the CentOS-6.8-x86_64-bin-DVD1 installation image. (Unzip it first: d:\Big Data software\CentOS-6.8-x86_64-bin-DVD1.rar)

(5) Choose "Install or upgrade an existing system".

(6) Select the installation language (Simplified Chinese), the keyboard (U.S. English), and the storage device (Basic Storage Devices), then select "Yes, discard any data" (delete all detected hard drive data).

(7) Set the host name "master01.centos.com", click Next, select the time zone, and set the administrator password: hadoop.

(8) Select "Use All Space" (use all the disk space).

(9) Choose the CentOS components to install; select "Desktop" here.

(10) After the installation succeeds, reboot, log in, and enter the username "root" and password "hadoop".

2. Set the IP address

The IP must be on the same network segment as the host. The system acquires an IP address automatically; view it with the ifconfig command, then ping the virtual machine's address from the host to check connectivity.

You can also manually modify the IP address as follows:

(1) Modify the configuration file "/etc/sysconfig/network-scripts/ifcfg-eth0".

(2) Execute the command "vi /etc/sysconfig/network-scripts/ifcfg-eth0".

(3) Set ONBOOT=yes (activate the NIC at startup) and BOOTPROTO=static (static IP), then add the IP address, subnet mask, and gateway.

(4) Restart the network service (execute the command "service network restart").
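For reference, a minimal static configuration of ifcfg-eth0 might look like the sketch below; the addresses are only examples and must be replaced to match your own network segment:

```
DEVICE=eth0
TYPE=Ethernet
ONBOOT=yes
BOOTPROTO=static
IPADDR=192.168.1.100
NETMASK=255.255.255.0
GATEWAY=192.168.1.1
```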

3. Connect to the virtual machine remotely (install an SSH tool on Windows)

(1) Ping the virtual machine from the host to see whether it responds. (Note: both the Linux guest and the Windows host need their firewalls shut down; the Linux command is "service iptables stop".)

(2) Install SSH Secure Shell on the host. After installation there are two desktop icons: one is the remote command-line tool, the other is the file transfer tool.

(3) Click Quick Connect, enter the Linux IP address, user name root, port 22, and the password hadoop to log in remotely.

4. Install software on the virtual machine online

Tip: yum is an RPM-based package manager used in Fedora and Red Hat as well as in SUSE.

1. RPM is the Linux package format; package files end with .rpm

2. yum is a package manager developed at Duke University to simplify the installation of RPM packages.

3. Installing with yum resolves the dependency problems of plain RPM installation.

4. yum provides commands to find, install, and remove individual packages or even all packages, and the commands are concise and easy to remember.

For yum command usage, see: http://man.linuxde.net/yum

For the difference between yum and apt-get, see: https://www.cnblogs.com/wxishang1991/p/5322489.html

The main steps to configure a local yum source:

(1) Execute the command "cd /etc/yum.repos.d".

(2) View the files in the yum.repos.d directory:

CentOS-Base.repo is the network source; CentOS-Media.repo is the local source.

Configuring the local source requires disabling every yum source other than the local one.

Execute the command "mv CentOS-Base.repo CentOS-Base.repo.bak"

(3) Execute the command "vi CentOS-Media.repo". Change the value of baseurl to file:///media/ (the disc mount point), change gpgcheck to 0 (RPM packages from this source are not signature-checked), and set enabled to 1 (the source is enabled).
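After these edits, CentOS-Media.repo would look roughly like the sketch below; the [c6-media] section name and the gpgkey line follow the stock CentOS 6 file, and only baseurl, gpgcheck, and enabled are changed by this step:

```
[c6-media]
name=CentOS-$releasever - Media
baseurl=file:///media/
gpgcheck=0
enabled=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-6
```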

(4) Mount the disc

Execute "mount /dev/dvd /media"

If the mount fails, check whether the virtual machine's CD drive is enabled.

(5) Refresh the yum cache

yum clean all

(6) Install software from the disc using yum. For example, install vim, zip, openssh-server, and openssh-clients.

yum install -y vim zip openssh-server openssh-clients

5. Test Implementation

Use the vim editor to write the file a.txt in the /opt directory of the virtual machine master.

(1) Use the SSH tool to open the master session.

(2) Enter the /opt directory; the command is cd /opt

(3) Use vim to create an empty file; the command is: vim a.txt

(4) Press the "a", "i", or "o" key to enter edit mode, then write a line in the a.txt file: "Welcome hadoop!"

(5) Press the "Esc" key to exit edit mode, enter ":wq", and press Enter to save and exit.
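The same file can also be created non-interactively, which is handy for scripting; this sketch writes to /tmp instead of /opt so it does not need root:

```shell
# Write the same line that vim would save, without opening an editor.
# /tmp is used here so the sketch needs no root; the manual's target is /opt/a.txt.
printf 'Welcome hadoop!\n' > /tmp/a.txt
cat /tmp/a.txt
```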

Another way

Use the SSH file transfer tool to connect to master, write a.txt on the Windows machine, and upload it to master's /opt folder.

An article on basic Linux commands: http://www.cnblogs.com/yjd_hycf_space/p/7730690.html

An article on using vim: 54314053

Part 2: Installing Java

1. Installing Java under Windows

(1) Double-click the JDK file to install it;

(2) Change the JDK installation directory;

(3) Change the JRE installation directory;

(4) Configuring environment variables

System variables → New: JAVA_HOME, variable value: C:\Program Files\Java\jdk1.8.0_121

System variables → Path → Edit: append to the variable value: %JAVA_HOME%\bin;%JAVA_HOME%\jre\bin;

System variables → New: CLASSPATH, variable value: .;%JAVA_HOME%\lib;%JAVA_HOME%\lib\tools.jar

2. Installing Java under Linux

(1) Upload the JDK installation package to the /opt directory

(2) Enter the /opt directory and execute the command "rpm -ivh jdk-7u80-linux-x64.rpm" to install the JDK

3. Configure passwordless SSH login

(1) Use Ssh-keygen to generate public and private key pairs.

Enter the command "ssh-keygen -t rsa" and press the Enter key three times

4. Test Implementation

To view the JDK version under Windows, follow these steps.

(1) Open the Windows Run window.

(2) Execute the command "java -version".

The following are the steps to view the JDK version under Linux systems.

(1) Open a terminal session.

(2) Execute the command "java -version".
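A small sketch of the Linux check that first tests for the JDK, so it degrades gracefully on a machine without Java:

```shell
# Print the Java version if a JDK is installed; otherwise report that it is missing.
if command -v java >/dev/null 2>&1; then
  java -version
else
  echo "java not found on PATH"
fi
```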

Part 3: Hadoop Installation and Configuration

Basic approach:

1. Create a new virtual machine master, configure a fixed IP (you can set up two network adapters: one bridged, the other in NAT mode), shut down the firewall, and install the necessary software;

2. Clone master to slave1, slave2, slave3;

3. Change the IPs of slave1~slave3 to fixed IPs.

1. Upload the Hadoop installation package.

Upload the hadoop-2.6.4.tar.gz file to the /opt directory via the SSH Secure File Transfer client

2. Unzip the hadoop-2.6.4.tar.gz file

tar -zxf hadoop-2.6.4.tar.gz -C /usr/local

After decompression, the /usr/local/hadoop-2.6.4 folder appears

3. Configure Hadoop

Enter the directory:

cd /usr/local/hadoop-2.6.4/etc/hadoop/

Modify the following files in turn:

4.1 core-site.xml

<configuration>

<!-- Configure the NameNode address for HDFS -->

<property>

<name>fs.defaultFS</name>

<value>hdfs://master:8020</value>

</property>

<!-- Configure the directory where data produced at Hadoop runtime is stored; this is not temporary data -->

<property>

<name>hadoop.tmp.dir</name>

<value>/var/log/hadoop/tmp</value>

</property>

</configuration>

4.2 hadoop-env.sh

# Modify JAVA_HOME as follows:

export JAVA_HOME=/usr/java/jdk1.7.0_80

4.3 hdfs-site.xml

<configuration>

<property>

<name>dfs.namenode.name.dir</name>

<value>file:///data/hadoop/hdfs/name</value>

</property>

<property>

<name>dfs.datanode.data.dir</name>

<value>file:///data/hadoop/hdfs/data</value>

</property>

<!-- Specify the HTTP address of the SecondaryNameNode -->

<property>

<name>dfs.namenode.secondary.http-address</name>

<value>master:50090</value>

</property>

<!-- Specify the number of replicas HDFS keeps of the stored data -->

<property>

<name>dfs.replication</name>

<value>3</value>

</property>

</configuration>

4.4 mapred-site.xml

<configuration>

<!-- Run the MapReduce programming model on YARN -->

<property>

<name>mapreduce.framework.name</name>

<value>yarn</value>

</property>

<!-- JobHistory properties -->

<property>

<name>mapreduce.jobhistory.address</name>

<value>master:10020</value>

</property>

<property>

<name>mapreduce.jobhistory.webapp.address</name>

<value>master:19888</value>

</property>

</configuration>

4.5 yarn-site.xml

<configuration>

<!-- Specify YARN's master: the address of the ResourceManager -->

<property>

<name>yarn.resourcemanager.hostname</name>

<value>master</value>

</property>

<!-- The address the RM exposes to clients; clients submit applications to the RM at this address -->

<!-- ApplicationMasters also apply to the RM to request and release resources -->

<property>

<name>yarn.resourcemanager.address</name>

<value>${yarn.resourcemanager.hostname}:8032</value>

</property>

<!-- The scheduler address exposed to ApplicationMasters -->

<property>

<name>yarn.resourcemanager.scheduler.address</name>

<value>${yarn.resourcemanager.hostname}:8030</value>

</property>

<property>

<name>yarn.resourcemanager.webapp.address</name>

<value>${yarn.resourcemanager.hostname}:8088</value>

</property>

<property>

<name>yarn.resourcemanager.webapp.https.address</name>

<value>${yarn.resourcemanager.hostname}:8090</value>

</property>

<property>

<name>yarn.resourcemanager.resource-tracker.address</name>

<value>${yarn.resourcemanager.hostname}:8031</value>

</property>

<property>

<name>yarn.resourcemanager.admin.address</name>

<value>${yarn.resourcemanager.hostname}:8033</value>

</property>

<property>

<name>yarn.nodemanager.local-dirs</name>

<value>/data/hadoop/yarn/local</value>

</property>

<property>

<name>yarn.log-aggregation-enable</name>

<value>true</value>

</property>

<property>

<name>yarn.nodemanager.remote-app-log-dir</name>

<value>/data/tmp/logs</value>

</property>

<property>

<name>yarn.log.server.url</name>

<value>http://master:19888/jobhistory/logs/</value>

<description>url for Job history server</description>

</property>

<property>

<name>yarn.nodemanager.vmem-check-enabled</name>

<value>false</value>

</property>

<property>

<name>yarn.nodemanager.aux-services</name>

<value>mapreduce_shuffle</value>

</property>

<property>

<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>

<value>org.apache.hadoop.mapred.ShuffleHandler</value>

</property>

<property>

<name>yarn.nodemanager.resource.memory-mb</name>

<value>2048</value>

</property>

<property>

<name>yarn.scheduler.minimum-allocation-mb</name>

<value>512</value>

</property>

<property>

<name>yarn.scheduler.maximum-allocation-mb</name>

<value>4096</value>

</property>

<property>

<name>mapreduce.map.memory.mb</name>

<value>2048</value>

</property>

<property>

<name>mapreduce.reduce.memory.mb</name>

<value>2048</value>

</property>

<!-- Total number of virtual CPU cores available to the NodeManager -->

<property>

<name>yarn.nodemanager.resource.cpu-vcores</name>

<value>1</value>

</property>

</configuration>

4.6 yarn-env.sh

export JAVA_HOME=/usr/java/jdk1.7.0_80

If you have Linux slave nodes,

copy the Hadoop installation files to the cluster's slave nodes, or run the cloning in step 6 directly:

scp -r /usr/local/hadoop-2.6.4 slave1:/usr/local

scp -r /usr/local/hadoop-2.6.4 slave2:/usr/local

scp -r /usr/local/hadoop-2.6.4 slave3:/usr/local

4.7 Modify the slaves file

slave1

slave2

slave3

4.8 Setting IP Mappings

Edit /etc/hosts:

10.0.2.4 master master.centos.com

10.0.2.5 slave1 slave1.centos.com

10.0.2.6 slave2 slave2.centos.com

10.0.2.7 slave3 slave3.centos.com

5. Build the cluster network environment

(1) In the global settings, create a NAT network with the cluster segment set to 10.0.2.0

(2) Add network adapter 2 to the master host and set it to NAT network mode.

Set adapter 2's IP address to: 10.0.2.4

6. Clone the virtual machines

Clone master to slave1, slave2, slave3, then change the IPs of slave1~slave3 to fixed IPs;

Turn on the virtual machine slave1.

Open Settings → Network and refresh the NIC MAC address.

(1) Execute the command "rm -rf /etc/udev/rules.d/70-persistent-net.rules" to delete the file.

(2) Execute the command "ifconfig -a" and note the HWADDR.

(3) Modify the /etc/sysconfig/network-scripts/ifcfg-eth0 file: update the IP address and the NIC hardware address.

The NIC IP addresses of slave1, slave2, and slave3 are set to 10.0.2.5, 10.0.2.6, and 10.0.2.7 respectively.

(4) Modify the machine name; execute the command "vim /etc/sysconfig/network".

The modified machine names are slave1.centos.com, slave2.centos.com, and slave3.centos.com.

7. Configure passwordless SSH login

(1) Use Ssh-keygen to generate public and private key pairs.

Enter the command "ssh-keygen -t rsa" and press the Enter key three times

[root@master ~]# ssh-keygen -t rsa

Generating public/private rsa key pair.

Enter file in which to save the key (/root/.ssh/id_rsa):

Created directory '/root/.ssh'.

Enter passphrase (empty for no passphrase):

Enter same passphrase again:

Your identification has been saved in /root/.ssh/id_rsa.

Your public key has been saved in /root/.ssh/id_rsa.pub.

The key fingerprint is:

a6:13:5a:7b:54:eb:77:58:bd:56:ef:d0:64:90:66:d4 root@master.centos.com

The key's randomart image is:

+--[ RSA 2048]----+

(randomart omitted)

+-----------------+

This generates two files: a private key id_rsa and a public key id_rsa.pub. ssh-keygen generates and manages keys; the parameter "-t" specifies the type of SSH key to create, here rsa.

(2) Use ssh-copy-id to copy the public key to the remote machines

ssh-copy-id -i /root/.ssh/id_rsa.pub master // enter yes, then the root user's password

ssh-copy-id -i /root/.ssh/id_rsa.pub slave1

ssh-copy-id -i /root/.ssh/id_rsa.pub slave2

ssh-copy-id -i /root/.ssh/id_rsa.pub slave3
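The four ssh-copy-id calls above can be driven by a loop. The sketch below is a dry run that only prints each command, since ssh-copy-id needs live hosts and a password prompt:

```shell
# Dry run: print the ssh-copy-id command for each node instead of executing it.
for host in master slave1 slave2 slave3; do
  echo "ssh-copy-id -i /root/.ssh/id_rsa.pub $host"
done
```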

(3) Verify passwordless login

Enter in turn:

ssh slave1

ssh slave2

ssh slave3

8. Configure the time synchronization service

(1) Install the NTP service on each node:

yum -y install ntp

(2) Assume the master node is the NTP service's primary node; configure it as follows.

Use the command "vim /etc/ntp.conf" to open the /etc/ntp.conf file, comment out the lines beginning with server, and add:

restrict 10.0.2.0 mask 255.255.255.0 nomodify notrap

server 127.127.1.0

fudge 127.127.1.0 stratum 10

(3) Configure NTP on slave1, slave2, and slave3: likewise modify the /etc/ntp.conf file, comment out the lines beginning with server, and add:

server master

(4) Execute the command "service iptables stop && chkconfig iptables off" to shut down the firewall permanently, on both the master node and the slave nodes.

(5) Start the NTP service.

① Execute the command "service ntpd start && chkconfig ntpd on" on the master node

② Execute the command "ntpdate master" on slave1, slave2, and slave3 to synchronize the time

③ Execute "service ntpd start && chkconfig ntpd on" on slave1, slave2, and slave3 to start the NTP service and enable it permanently.

9. Add the JAVA_HOME and Hadoop paths in /etc/profile

export HADOOP_HOME=/usr/local/hadoop-2.6.4

export PATH=$HADOOP_HOME/bin:$PATH:/usr/java/jdk1.7.0_80/bin

Run "source /etc/profile" to make the changes take effect
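The effect of the two export lines can be checked in a fresh shell; the paths below are the install locations used in this manual:

```shell
# Reproduce the /etc/profile additions and confirm the Hadoop bin directory is on PATH.
export HADOOP_HOME=/usr/local/hadoop-2.6.4
export PATH=$HADOOP_HOME/bin:$PATH:/usr/java/jdk1.7.0_80/bin
echo "$PATH" | grep -q "/usr/local/hadoop-2.6.4/bin" && echo "PATH updated"
```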

10. Format the NameNode

Enter the directory:

cd /usr/local/hadoop-2.6.4/bin

Perform the formatting:

./hdfs namenode -format

11. Start the cluster

Go to the directory:

cd /usr/local/hadoop-2.6.4/sbin

Execute the start scripts:

./start-dfs.sh

./start-yarn.sh

./mr-jobhistory-daemon.sh start historyserver

Use jps to view the processes

[root@master sbin]# jps

3672 NodeManager

3301 DataNode

3038 NameNode

4000 JobHistoryServer

4058 Jps

3589 ResourceManager

3408 SecondaryNameNode

Turn off the firewall (on all nodes):

service iptables stop

chkconfig iptables off

Browser view:

http://master:50070

http://master:8088

