Spark Getting Started in Practice Series, 2: Spark Compilation and Deployment (Part 1), Building the Basic Environment

Last Update:2016-06-03 Source: Internet

Author: User


Note:

1. The installation packages and test data used in this series can be obtained from "Portrait of the Big gift--spark Getting Started Combat series".

2. Spark compilation and deployment are based on a 64-bit CentOS operating system, mainly because 64-bit systems are what is generally used in practice. The content is divided into three parts: basic environment building, Hadoop compilation and installation, and Spark compilation and installation. Together they are the foundation of the experiments that follow.

3. The articles demonstrate the Hadoop and Spark compilation process, and the accompanying resources also provide compiled installation packages. If compiling takes too long, you can deploy directly from these compiled packages.

1. Operating Environment Description

1.1 Hardware and software environment

- Host operating system: Windows 64-bit, dual core, 4 threads, 2.2 GHz clock, 10 GB memory

- Virtualization software: VMware® Workstation 9.0.0 build-812388

- Virtual machine operating system: CentOS 6.5 64-bit, single core, 1 GB memory

- Virtual machine runtime environment:

  - JDK: 1.7.0_55 64-bit

  - Hadoop: 2.2.0 (must be compiled for 64-bit)

  - Scala: 2.10.4

  - Spark: 1.1.0 (must be compiled)

1.2 Cluster network environment

The cluster contains three nodes that can reach each other over SSH without a password. The IP addresses and host names are assigned as follows:

No.	IP address	Machine name	Type	Cores/memory	User name	Directories
1	192.168.0.61	hadoop1	NN/DN/RM Master/Worker	1 core/3 GB	hadoop	/app (program path): /app/scala-..., /app/hadoop, /app/complied
2	192.168.0.62	hadoop2	DN/NM/Worker	1 core/2 GB	hadoop
3	192.168.0.63	hadoop3	DN/NM/Worker	1 core/2 GB	hadoop

1. All nodes run CentOS 6.5 64-bit with the firewall and SELinux disabled. A hadoop user is created on every node, its home directory is /home/hadoop, and uploaded files are stored in the /home/hadoop/upload folder.

2. A directory /app is created on every node to hold the installed programs. Its owner must be the hadoop user, with rwx permissions. The usual practice is for the root user to create /app under the root directory and then change the owner to hadoop with the chown command. Otherwise, the hadoop user gets insufficient-permission errors when distributing files to the other machines over SSH.

1.3 Tools used for installation

1.3.1 Linux file transfer tool

It is recommended to transfer files to the Linux system with SSH Secure File Transfer. The top of its window holds the tool menu and shortcuts. The middle part shows the local file directory on the left and the remote file directory on the right, and files can be downloaded or uploaded by dragging between them. The bottom is the monitoring area for operation status.

1.3.2 Linux command-line execution tool

- SSH Secure Shell: provides remote command execution.

- SecureCRT: a commonly used tool for running Linux commands remotely.

2. Building the template machine environment

The cluster in this installation has three nodes. This section builds the template machine in three steps: installing the operating system, setting up the system environment, and configuring the runtime environment.

2.1 Installing the operating system

Step 1: Insert the CentOS 6.5 installation media and boot the computer from it. The following menu appears:

- Install or upgrade an existing system: install a new system or upgrade an existing one

- Install system with basic video driver: use the basic video driver during installation

- Rescue installed system: enter rescue mode

- Boot from local drive: exit the installer and boot from the hard disk

- Memory test: run a memory check

Step 2: At the media check prompt, select "Skip".

Step 3: When the welcome screen appears, click "Next".

Step 4: Select the language for the installation process; choose "Chinese".

Step 5: For the keyboard layout, select "U.S. English".

Step 6: Select "Basic Storage Devices" and click "Next".

Step 7: When asked whether to discard all data, select "Yes, discard any data".

Step 8: Fill in the hostname, in the format "English name."

Step 9: Set the time zone by clicking on the map; select "Shanghai" and uncheck "System clock uses UTC".

Step 10: Set the root password.

Step 11: Partition the hard disk; be sure to choose the options as illustrated.

Step 12: When asked whether to write the changes to the hard drive, select "Write changes to disk".

Step 13: Select "Desktop" as the system installation type.

Step 14: Once the desktop environment is selected, click to start the installation.

Step 15: When the installation completes, restart.

Step 16: After the restart, the license information is shown.

Step 17: On the create-user screen, do not set a user or password here.

Step 18: On the "Date and Time" screen, check "Synchronize date and time over the network".

After you click Finish, the system restarts.

2.2 Setting up the system environment

The settings in this section must be made locally on the server. After configuring, restart the server to confirm that the settings have taken effect. A server that will be accessed remotely in particular needs a fixed IP address.

2.2.1 Setting the machine name

Log in as root and open the configuration file with # vi /etc/sysconfig/network. Set the server's machine name (the HOSTNAME entry) according to your plan. The new machine name takes effect after a reboot.
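The same edit can be scripted instead of made in vi. The following is a minimal sketch rather than part of the original procedure: set_hostname is a hypothetical helper name, the sample file content is assumed, and the demonstration works on a scratch file instead of the real /etc/sysconfig/network so that it can be tried without root privileges.

```shell
#!/bin/sh
# Hypothetical helper: set HOSTNAME=<name> in a CentOS 6 style network file.
# Usage: set_hostname <file> <name>   (assumes GNU sed, as shipped with CentOS)
set_hostname() {
    file=$1
    name=$2
    if grep -q '^HOSTNAME=' "$file"; then
        # Replace the existing HOSTNAME line in place.
        sed -i "s/^HOSTNAME=.*/HOSTNAME=$name/" "$file"
    else
        # No HOSTNAME line yet: append one.
        echo "HOSTNAME=$name" >> "$file"
    fi
}

# Demonstration on a scratch file with assumed sample content.
net_file=$(mktemp)
printf 'NETWORKING=yes\nHOSTNAME=localhost.localdomain\n' > "$net_file"
set_hostname "$net_file" hadoop1
cat "$net_file"
```

On a real node the helper would be pointed at /etc/sysconfig/network as root, followed by the reboot described above.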

2.2.2 Setting the IP address

1. Click System --> Preferences --> Network Connections.

2. Modify the existing network connection or create a new one, set the method to Manual, and enter the following network information:

IP Address: 192.168.0.61

Subnet Mask: 255.255.255.0

Gateway: 192.168.0.1

DNS: 221.12.1.227 (set the DNS server according to your location)

Attention:

1. Set the gateway, DNS, and so on according to your actual network, and check "Available to all users" for the connection. Otherwise, remote connections to the server will fail after the server restarts.

2. If the machine is a VMware virtual machine, use bridged networking, so that the machine can reach the Internet for the later Hadoop and Spark compilation and other experiments.

3. On the command line, use the ifconfig command to check the IP address settings. If the changed IP address has not taken effect, restart the machine. If the machine is to be accessed remotely after setup, restarting is recommended anyway, to verify that the IP address is in effect.

2.2.3 Setting the host mapping file

1. As root, edit the /etc/hosts mapping file and add the mappings between IP addresses and machine names as follows:

# vi /etc/hosts

192.168.0.61 hadoop1

192.168.0.62 hadoop2

192.168.0.63 hadoop3

2. Restart the network settings using the following command

# /etc/init.d/network restart

or

# service network restart

3. Verify that the settings are successful
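The mapping step can be made repeatable, which helps when the same entries have to be added on three nodes. The following is a sketch under stated assumptions: add_host is a hypothetical helper name, and the demonstration writes to a scratch file instead of /etc/hosts so that it runs without root privileges.

```shell
#!/bin/sh
# Hypothetical helper: append "ip name" to a hosts file unless the name is already mapped.
# Usage: add_host <file> <ip> <name>
add_host() {
    file=$1
    ip=$2
    name=$3
    # Ignore comment lines; -w matches the name as a whole word.
    if ! grep -v '^#' "$file" | grep -qw "$name"; then
        echo "$ip $name" >> "$file"
    fi
}

# Demonstration on a scratch file.
hosts_file=$(mktemp)
printf '127.0.0.1 localhost\n' > "$hosts_file"
for i in 1 2 3; do
    add_host "$hosts_file" "192.168.0.6$i" "hadoop$i"
done
# Calling it again for an existing name changes nothing.
add_host "$hosts_file" 192.168.0.61 hadoop1
cat "$hosts_file"
```

On the real nodes the helper would be run as root against /etc/hosts. One way to check the result afterwards is ping -c 1 hadoop2 from hadoop1, which should resolve to 192.168.0.62.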

2.2.4 Shutting down the firewall

The firewall and SELinux need to be turned off during the Hadoop installation, or exceptions will occur.

1. Run service iptables status to check the firewall state and see whether iptables is running.

2. As the root user, disable iptables with the following command:

# chkconfig iptables off

2.2.5 Disabling SELinux

1. Use the getenforce command to check whether SELinux is already disabled.

2. Modify the /etc/selinux/config file.

Change SELINUX=enforcing to SELINUX=disabled, then restart the machine after saving the file.

# vi /etc/selinux/config
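The change to the SELINUX line can also be made non-interactively. This is a minimal sketch, assuming GNU sed and using a scratch file with assumed sample content in place of /etc/selinux/config:

```shell
#!/bin/sh
# Scratch stand-in for /etc/selinux/config (sample content is an assumption).
selinux_cfg=$(mktemp)
printf '# sample config\nSELINUX=enforcing\nSELINUXTYPE=targeted\n' > "$selinux_cfg"

# Rewrite only the SELINUX= line; SELINUXTYPE= is not matched by this pattern.
sed -i 's/^SELINUX=.*/SELINUX=disabled/' "$selinux_cfg"

grep '^SELINUX=' "$selinux_cfg"
```

On a real node the same sed expression would be run as root against /etc/selinux/config. The file is only read at boot, so the restart is still needed. Running setenforce 0 as root switches SELinux to permissive mode until then.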

2.3 Configuring the runtime environment

2.3.1 Updating OpenSSL

There is a bug in the OpenSSL that ships with CentOS. If OpenSSL is not updated, the nodes cannot be connected over SSH during the Ambari deployment process. Update it with the following command:

# yum update openssl

2.3.2 Modifying an SSH configuration file

1. As the root user, open the sshd_config configuration file with the following command:

# vi /etc/ssh/sshd_config

Enable (uncomment) the following three settings:

RSAAuthentication yes

PubkeyAuthentication yes

AuthorizedKeysFile .ssh/authorized_keys

2. Restart the service after configuring:

# service sshd restart
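Uncommenting the three settings can be scripted as well. The sketch below assumes that the settings are present in the file as commented-out lines, that GNU sed is available, and it uses a scratch file with assumed sample content rather than the real /etc/ssh/sshd_config:

```shell
#!/bin/sh
# Scratch stand-in for /etc/ssh/sshd_config (sample content is an assumption).
sshd_cfg=$(mktemp)
printf '#RSAAuthentication yes\n#PubkeyAuthentication yes\n#AuthorizedKeysFile\t.ssh/authorized_keys\n#PasswordAuthentication yes\n' > "$sshd_cfg"

for key in RSAAuthentication PubkeyAuthentication AuthorizedKeysFile; do
    # Remove the leading "#" only from the line that starts with this keyword.
    sed -i "s/^#[[:space:]]*\($key[[:space:]]\)/\1/" "$sshd_cfg"
done

# Count the lines that are no longer commented out.
enabled=$(grep -c -v '^#' "$sshd_cfg")
echo "$enabled settings enabled"
```

Other commented settings, such as the PasswordAuthentication line in the sample, are left untouched. On a real node the loop would be run as root against /etc/ssh/sshd_config before restarting sshd.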

2.3.3 adding Hadoop groups and users

Use the following commands to add the hadoop group and the hadoop user, create the directory that stores the Hadoop components, and set the user's password:

# groupadd hadoop

# useradd -u 2000 -g hadoop hadoop

# mkdir -p /app/hadoop

# chown -R hadoop:hadoop /app/hadoop

# passwd hadoop

Create the upload directory for the hadoop user, and set the directory's owner and group to hadoop:

# mkdir /home/hadoop/upload

# chown -R hadoop:hadoop /home/hadoop/upload

2.3.4 JDK Installation and Configuration

1. Download the JDK 1.7 64-bit installation package

The download link for the JDK 1.7 64-bit installation package is:

http://www.oracle.com/technetwork/java/javase/downloads/jdk7-downloads-1880260.html

On that page, select Accept License Agreement and then download jdk-7u55-linux-x64.tar.gz.

2. Give the hadoop user read and write permissions on the /usr/lib/java directory, using the following command:

$ sudo chmod -R 777 /usr/lib/java

This step may run into problem 4.2; refer to the solution given there.

3. Upload the downloaded installation package to the /usr/lib/java directory using the SSH tool introduced in 1.3.1, and unpack it with the following command:

$ tar -zxvf jdk-7u55-linux-x64.tar.gz

After unpacking, the directory contains a jdk1.7.0_55 folder.

4. As the root user, add the following lines to the /etc/profile file so that the configuration takes effect:

export JAVA_HOME=/usr/lib/java/jdk1.7.0_55

export PATH=$JAVA_HOME/bin:$PATH

export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar

5. Log in again and verify

$ logout

$ java -version
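The effect of these profile lines can be checked without logging out. The sketch below is an illustration only: it writes the same three export lines to a scratch file, loads them into the current shell in the way that source /etc/profile would, and inspects the result. The scratch file and the variable first_path_entry are assumptions made for the example.

```shell
#!/bin/sh
# Scratch stand-in for the lines added to /etc/profile.
profile_snippet=$(mktemp)
cat > "$profile_snippet" <<'EOF'
export JAVA_HOME=/usr/lib/java/jdk1.7.0_55
export PATH=$JAVA_HOME/bin:$PATH
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
EOF

# Load the settings into the current shell.
. "$profile_snippet"

# The JDK bin directory is prepended, so it should be the first PATH entry.
first_path_entry=${PATH%%:*}
echo "$first_path_entry"
```

Because the JDK directory is placed at the front of PATH, the java found first is the one under JAVA_HOME even if another Java is installed on the system.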

2.3.5 Scala Installation and configuration

1. Download the Scala installation package

The download link for the Scala 2.10.4 installation package is: http://www.scala-lang.org/download/2.10.4.html. Because IDEA behaves somewhat abnormally with Scala 2.11.4, installing Scala 2.10.4 is recommended here.

2. Upload the Scala installation file

Upload the downloaded Scala installation package to the /home/hadoop/upload directory using the SSH Secure File Transfer tool (described in 1.3.1).

3. Unpack

Go to the upload directory and unpack the package with the following commands:

$ cd /home/hadoop/upload

$ tar -zxf scala-2.10.4.tgz

Move it to the /app directory:

$ sudo mv scala-2.10.4 /app/

4. As the root user, add the following lines to the /etc/profile file so that the configuration takes effect:

export SCALA_HOME=/app/scala-2.10.4

export PATH=$PATH:${SCALA_HOME}/bin

5. Log in again and verify

$ exit

$ scala -version

3. Configuring the cluster environment

Copy the template machine to create the other two nodes, set their machine names and IP addresses according to the plan, and finally set up SSH login without a password.

3.1 Copying the template machine

Make two copies of the template machine to serve as the hadoop2 and hadoop3 nodes.

3.2 Setting the machine name and IP address

Log in as root, open the configuration file with vi /etc/sysconfig/network, and change the machine name according to the plan in 1.2. Restart the machine afterwards; the new machine name takes effect after the reboot.

Change the machine's IP address following the method in 2.2.2.

3.3 Configuring SSH login without password

1. Log in as the hadoop user on each of the three nodes and generate the private and public keys with the following command:

$ ssh-keygen -t rsa

2. On each node, go to the /home/hadoop/.ssh directory and copy the public key to a file named authorized_keys_hadoop1, authorized_keys_hadoop2, or authorized_keys_hadoop3 according to the node. On hadoop1, for example:

$ cd /home/hadoop/.ssh

$ cp id_rsa.pub authorized_keys_hadoop1

3. Transfer the public keys of the two slave nodes (hadoop2, hadoop3) to the /home/hadoop/.ssh folder of the hadoop1 node using the scp command:

$ scp authorized_keys_hadoop2 hadoop@hadoop1:/home/hadoop/.ssh

$ scp authorized_keys_hadoop3 hadoop@hadoop1:/home/hadoop/.ssh

4. On hadoop1, save the public keys of all three nodes into the authorized_keys file

Run $ cat authorized_keys_hadoop1 >> authorized_keys, and repeat the command for authorized_keys_hadoop2 and authorized_keys_hadoop3.

5. Distribute the file to the other two slave nodes

Use $ scp authorized_keys hadoop@hadoop2:/home/hadoop/.ssh to distribute the key file, and repeat the command for hadoop3.

6. On all three machines, set the read and write permissions of authorized_keys (owner only) with the following command:

$ chmod 600 authorized_keys

7. Test whether SSH password-free login is effective
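The merge and permission steps on hadoop1 can be sketched as a short script. This is an illustration under stated assumptions: it runs in a scratch directory rather than /home/hadoop/.ssh, the key files contain placeholder text instead of real public keys, and mode 600 (owner read and write) is an assumed choice for the permission step.

```shell
#!/bin/sh
# Scratch stand-in for /home/hadoop/.ssh on hadoop1.
ssh_dir=$(mktemp -d)

# Placeholder files standing in for the three collected public keys.
for n in 1 2 3; do
    echo "ssh-rsa PLACEHOLDERKEY$n hadoop@hadoop$n" > "$ssh_dir/authorized_keys_hadoop$n"
done

# Merge the per-node files into a single authorized_keys file.
: > "$ssh_dir/authorized_keys"
for n in 1 2 3; do
    cat "$ssh_dir/authorized_keys_hadoop$n" >> "$ssh_dir/authorized_keys"
done

# Restrict the file to its owner (read and write only).
chmod 600 "$ssh_dir/authorized_keys"

key_count=$(wc -l < "$ssh_dir/authorized_keys" | tr -d ' ')
echo "$key_count keys merged"
```

Once the real file has been distributed, running ssh hadoop2 hostname from hadoop1 as the hadoop user is one way to test the result: it should print the remote host name without asking for a password.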

3.4 Setting the machine startup mode (optional)

After setting up the cluster environment, you can let the cluster run in command-line mode to reduce the resources it consumes. As the root user, run # vi /etc/inittab and change id:5:initdefault: to id:3:initdefault:

A Linux system is always running at a specified run level. Different run levels start different programs and services, and serve different purposes. CentOS defines the run levels shown in the following table, and the system can switch between them to accomplish different tasks.

Run level	Description
0	All processes are terminated and the machine halts in an orderly way; the system is at this run level when shutting down
1	Single-user mode. Used for system maintenance; only a handful of processes run and no services are started
2	Multi-user mode. The same as run level 3, except that the Network File System (NFS) service is not started
3	Multi-user mode. Allows multiple users to log on to the system; this is the system's default boot level
4	Reserved for user-defined run levels
5	Multi-user mode, with X Window started after boot to provide a graphical login window
6	All processes are terminated and the system restarts
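The change of the default run level in /etc/inittab described in 3.4 can be scripted. A minimal sketch, assuming GNU sed and a scratch file with assumed sample content in place of the real /etc/inittab:

```shell
#!/bin/sh
# Scratch stand-in for /etc/inittab (sample content is an assumption).
inittab=$(mktemp)
printf '# Default runlevel.\nid:5:initdefault:\n' > "$inittab"

# Switch the default run level from 5 (graphical login) to 3 (multi-user text mode).
sed -i 's/^id:5:initdefault:/id:3:initdefault:/' "$inittab"

# Read back the second colon-separated field of the id line.
default_level=$(awk -F: '/^id:/ {print $2}' "$inittab")
echo "default run level: $default_level"
```

On a real node the sed command would be run as root against /etc/inittab, and the new default applies from the next boot. The runlevel command can be used afterwards to confirm the current level.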

4. Problem solving

4.1 Installing the CentOS 64-bit virtual machine fails with "This host supports Intel VT-x, but Intel VT-x is disabled"

Compiling and installing 64-bit Hadoop 2.x requires a 64-bit virtual machine, and this error can occur while installing it. To fix it:

1. Press F1 to enter the BIOS Setup Utility.

2. Use the arrow keys to go to the Security panel and find Virtualization.

3. Press Enter to go inside, and change Intel Virtualization Technology to Enabled.

4. Press F10 to save and exit; select Yes and press Enter.

5. Shut down completely (power off), wait a few seconds, and restart the computer.

Intel virtualization technology is now enabled.

4.2 Workaround for "*** is not in the sudoers file"

When the hadoop user uses sudo with the chmod command to grant permissions on a folder, the error "hadoop is not in the sudoers file. This incident will be reported." appears. Resolve it as follows:

1. Use the su command to switch to the root user.

2. Add write permission to the file with the command: chmod u+w /etc/sudoers

3. Edit the /etc/sudoers file with the command "vi /etc/sudoers". Find the line "root ALL=(ALL) ALL", add "hadoop ALL=(ALL) ALL" below it, then save and exit.


This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
