Install and configure Spark under CentOS 7.0


Installation environment:
Virtual machine: VMware® Workstation 8.0.1 (bridged networking)
OS: CentOS 7
JDK: jdk-7u79-linux-x64.tar
Scala version: scala-2.11.7
Spark version: spark-1.4.0-bin-hadoop2.4
User: hadoop, created when installing CentOS, member of the administrators group
Step One: Configure SSH

Log in as the hadoop user and run in a terminal:

yum install openssh-server

If yum reports that another process holds the yum lock, the yum service is in use and must be forcibly unlocked:

rm -rf /var/run/yum.pid

yum will download the installation package from the network and install it on its own. After the installation completes, enter the following command to verify that port 22 is open:

netstat -nat

Make sure port 22 is listed, then check that SSH is installed correctly by entering:

ssh localhost

Enter the current user's password and press Enter. If the login succeeds, SSH is installed correctly; at this point an SSH login still requires a password.

One point worth emphasizing:

What the SSH configuration actually sets up is passwordless access: identity credentials are used instead of password authentication, so access only requires presenting a credential rather than typing a password. In other words, each user has a unique credential, and you grant access by giving your public credential to whoever should accept you (that is, copying it into the other account's directory).
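For instance, many Linux systems ship the ssh-copy-id helper, which appends your public key to a remote account's authorized_keys in one step (the host name below is only an illustration):

ssh-copy-id hadoop@some-other-host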

Next, change into the hadoop account's home directory:

cd /home/hadoop

Then run:

ssh-keygen -t rsa

and press Enter at each prompt to accept the defaults (an empty passphrase is what makes passwordless login possible).

Next, go into the .ssh folder and append id_rsa.pub to the authorized_keys file. First:

cd .ssh

While you are there, look at what this directory contains: id_rsa is the account's private key, and id_rsa.pub is the account's public key, which is the part that gets handed out.

One more note: what if several accounts need passwordless access to the same master server?

The master server keeps a file called authorized_keys; every account that needs passwordless access appends its public key to that file.
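To append a key without overwriting existing ones, a command along these lines can be used (run from inside ~/.ssh):

cat id_rsa.pub >> authorized_keys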

Here, on a single machine with a single key, simply copying the key into place is enough:

cp id_rsa.pub authorized_keys

Test the passwordless login again:

ssh localhost

It is best to open a few more terminals and test SSH logins from them, or restart the service and test again:

service sshd restart (restart the service)

service sshd start (start the service)

service sshd stop (stop the service)

netstat -antp | grep sshd (check whether port 22 is being listened on)

Any time a command is refused for lack of permission, rerun it with sudo in front.

chkconfig sshd on (start sshd at boot)

chkconfig sshd off (do not start sshd at boot)
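Since CentOS 7 is systemd-based, the equivalent systemctl commands also work:

systemctl enable sshd (start sshd at boot)
systemctl restart sshd (restart the service)
systemctl status sshd (check the service status)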

Step Two: Configure Java, Scala, and Spark

Copy all of the required software into the /home/data directory in one go. This has to be done as the root user; other accounts do not have sufficient permissions.

The root user can work directly in the desktop UI and unpack the archives there.
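From a terminal this amounts to something like the following (archive names and locations are illustrative, assuming gzipped tarballs in root's working directory):

mkdir -p /home/data
tar -zxvf jdk-7u79-linux-x64.tar.gz -C /home/data
tar -zxvf scala-2.11.7.tgz -C /home/data
tar -zxvf spark-1.4.0-bin-hadoop2.4.tgz -C /home/data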

Then switch back to the hadoop user (important).

To configure the Java path:

sudo gedit /etc/profile

Add the following at the end of the file:

#JAVA VARIABLES START
export JAVA_HOME=/home/data/jdk1.7.0_79
export PATH=$PATH:$JAVA_HOME/bin
#JAVA VARIABLES END

Then reload the system configuration and check the Java installation:

source /etc/profile

java -version

If the Java version information is printed, the Java environment was installed successfully.

Next, configure the Scala environment:

sudo gedit /etc/profile

Add the following at the end of the file:

#SCALA VARIABLES START
export SCALA_HOME=/home/data/scala-2.11.7
export PATH=$PATH:$SCALA_HOME/bin
#SCALA VARIABLES END

Then reload the system configuration and check the Scala installation:

source /etc/profile

scala -version

If the Scala version information is printed, the installation succeeded.

Next, configure the Spark environment:

sudo gedit /etc/profile

Add the following at the end of the file:

#SPARK VARIABLES START
export SPARK_HOME=/home/data/spark-1.4.0-bin-hadoop2.4
export PATH=$PATH:$SPARK_HOME/bin
#SPARK VARIABLES END

After these three additions, the end of /etc/profile should look as follows:
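#JAVA VARIABLES START
export JAVA_HOME=/home/data/jdk1.7.0_79
export PATH=$PATH:$JAVA_HOME/bin
#JAVA VARIABLES END

#SCALA VARIABLES START
export SCALA_HOME=/home/data/scala-2.11.7
export PATH=$PATH:$SCALA_HOME/bin
#SCALA VARIABLES END

#SPARK VARIABLES START
export SPARK_HOME=/home/data/spark-1.4.0-bin-hadoop2.4
export PATH=$PATH:$SPARK_HOME/bin
#SPARK VARIABLES END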

Then reload the system configuration:

source /etc/profile

Enter the conf directory of Spark:
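Assuming the installation path used above, that is:

cd /home/data/spark-1.4.0-bin-hadoop2.4/conf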

Create spark-env.sh from the bundled template:

sudo mv spark-env.sh.template spark-env.sh

Then edit the newly created file:

sudo gedit spark-env.sh

At the bottom of the file, add:

export SCALA_HOME=/home/data/scala-2.11.7
export JAVA_HOME=/home/data/jdk1.7.0_79
export SPARK_MASTER_IP=localhost
export SPARK_WORKER_MEMORY=1024m
export MASTER=spark://localhost:7077

Finally, edit the machine names in the conf/slaves file; since my machine name is localhost (the default entry), I leave it unchanged.

Step Three: Run Spark

Start the Spark cluster by entering the sbin directory and running the start-up script.
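Assuming the installation path used above:

cd /home/data/spark-1.4.0-bin-hadoop2.4/sbin
./start-all.sh

(start-all.sh starts the master and the workers listed in conf/slaves, and relies on the passwordless SSH configured in Step One.)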

Then visit localhost:8080 in a browser.

The page shows the information for one worker node.

Next, we enter Spark's bin directory and start the spark-shell console:
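Assuming the same installation path:

cd /home/data/spark-1.4.0-bin-hadoop2.4/bin
./spark-shell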

If there are no errors, the Spark shell's Scala prompt should appear.

Now for a test.

You can open the spark-shell web console by visiting http://localhost:4040.

Create a test file, /home/file/test1.txt (the path read below), with the following content:

Hello World
Hello Hadoop
Pls say hello

Then enter the read program at the Scala prompt:

val readFile = sc.textFile("file:///home/file/test1.txt")

Then execute:

readFile.collect
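As an optional extra check, a simple word count can be run over the same file; this is a minimal sketch using standard RDD operations (flatMap, map, reduceByKey):

val counts = readFile.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)
counts.collect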

Then view the spark-shell web console at http://localhost:4040 again; the jobs run by these actions are listed there.
