Spark 2.2.0 cluster installation and Hadoop cluster deployment


In production, Spark is typically deployed on a cluster of Linux machines. Installing Spark on Linux requires pre-installing dependencies such as the JDK and Scala.

Because Spark is a computing framework, the cluster also needs a persistence layer that stores the data, such as HDFS, Hive, or Cassandra; applications are then launched through Spark's startup scripts.


1. Install the JDK

Oracle JDK download address: http://www.oracle.com/technetwork/java/javase/downloads/index.html

Configure the environment variables:

vim ~/.bash_profile

Add the following content:
export JAVA_HOME=/opt/jdk1.8.0_65
export CLASSPATH=$JAVA_HOME/lib/
export PATH=$PATH:$HOME/bin:$JAVA_HOME/bin

Execute source ~/.bash_profile to make the environment variables take effect.
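As a quick sanity check that the new variables are picked up (the exact version banner depends on your JDK build):

# should report java version "1.8.0_65" and the configured JAVA_HOME
java -version
echo $JAVA_HOME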
2. Install Scala

Scala download address: http://www.scala-lang.org/download/

Configure the environment variables by adding the following:

export SCALA_HOME=/data/spark/scala-2.12.3/
export PATH=$PATH:$SCALA_HOME/bin

Execute source ~/.bash_profile to make the environment variables take effect.
Run scala -version; normal output indicates success.


3. Install Hadoop

Host name   IP address      JDK       User
Master      10.116.33.109   1.8.0_65  root
Slave1      10.27.185.72    1.8.0_65  root
Slave2      10.25.203.67    1.8.0_65  root

Download address for Hadoop: http://hadoop.apache.org/

Configure the hosts file (same operation on every node): vim /etc/hosts
10.116.33.109 Master
10.27.185.72 Slave1
10.25.203.67 Slave2
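Because the verification notes later in this step attribute task failures to hostname resolution problems, it is worth confirming from every node that these names resolve to the addresses above; a minimal check (assuming ping is available):

# run on each node; every name should answer from the IP listed above
ping -c 1 Master
ping -c 1 Slave1
ping -c 1 Slave2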
	
Passwordless SSH configuration (reference: Linux SSH password-free login): on the master node you must verify that you can log in to every node without a password, or later steps will fail:

ssh master
ssh slave1
ssh slave2

If passwordless login is not set up yet, a minimal sketch is given below.
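A minimal sketch of the passwordless SSH setup on the master node, assuming the root user and that ssh-copy-id is available (adjust user names and paths to your environment):

ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa   # generate a key pair without a passphrase
ssh-copy-id root@master                    # the master must also reach itself
ssh-copy-id root@slave1
ssh-copy-id root@slave2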
Hadoop cluster setup: after extracting hadoop-2.7.2.tar.gz, configure the environment variables with vim ~/.bash_profile:

export HADOOP_HOME=/data/spark/hadoop-2.7.2
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_ROOT_LOGGER=INFO,console
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"
Execute source ~/.bash_profile to make the environment variables take effect. These environment variables are configured identically on all nodes.
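As a quick check that the Hadoop binaries are now on the PATH (illustrative only):

# should print "Hadoop 2.7.2" followed by build information
hadoop version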

Modify $HADOOP_HOME/etc/hadoop/hadoop-env.sh:
export JAVA_HOME=/opt/jdk1.8.0_65/
Even if JAVA_HOME is already configured as an environment variable, it must be set here as well; otherwise you will get "Error: JAVA_HOME is not set and could not be found."
Modify $HADOOP_HOME/etc/hadoop/slaves:
Slave1
Slave2

Modify $HADOOP_HOME/etc/hadoop/core-site.xml:
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://Master:9000</value>
    </property>
    <property>
        <name>io.file.buffer.size</name>
        <value>131072</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/data/spark/hadoop-2.7.2/tmp</value>
    </property>
</configuration>

Modify $HADOOP_HOME/etc/hadoop/hdfs-site.xml:
<configuration>
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>Master:50090</value>
    </property>
    <property>
        <name>dfs.replication</name>
        <value>2</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:/data/spark/hadoop-2.7.2/hdfs/name</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:/data/spark/hadoop-2.7.2/hdfs/data</value>
    </property>
</configuration>

Modify $HADOOP_HOME/etc/hadoop/mapred-site.xml (cp mapred-site.xml.template mapred-site.xml):
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>Master:10020</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>Master:19888</value>
    </property>
</configuration>

Modify $HADOOP_HOME/etc/hadoop/yarn-site.xml:
<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.resourcemanager.address</name>
        <value>Master:8032</value>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>Master:8030</value>
    </property>
    <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>Master:8031</value>
    </property>
    <property>
        <name>yarn.resourcemanager.admin.address</name>
        <value>Master:8033</value>
    </property>
    <property>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>Master:8088</value>
    </property>
</configuration>

Copy the Hadoop folder of the master node to Slave1 and Slave2.
scp -r hadoop-2.7.2 slave1:/data/spark/
scp -r hadoop-2.7.2 slave2:/data/spark/

Start the cluster on the master node and format the Namenode before starting:

hadoop namenode -format
(On Hadoop 2.x this command is deprecated in favor of hdfs namenode -format; either works.)

Start:

$HADOOP_HOME/sbin/start-all.sh

To check, execute jps on each node:

The master node should show the NameNode process and the slave nodes the DataNode process. The Hadoop management interface is at http://master:8088/.

Note: in this deployment the server hostnames were not changed; the node names were only configured in the hosts file. This caused various tasks to fail later, mainly because the server IP address could not be resolved from the hostname. Symptoms include MapReduce jobs stuck in the ACCEPTED state and never running.
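Before moving on to Spark, it can be useful to run a small end-to-end job against the new cluster. The sketch below uses the example jar shipped with Hadoop 2.7.2; the input file and output paths are arbitrary choices for illustration:

# put a small file into HDFS and run the bundled wordcount example
hdfs dfs -mkdir -p /input
hdfs dfs -put $HADOOP_HOME/etc/hadoop/core-site.xml /input/
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar \
    wordcount /input /output
hdfs dfs -cat /output/part-r-00000 | head

If this job also hangs in the ACCEPTED state, revisit the hostname note above.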
4. Install Spark

Spark download address: http://spark.apache.org/. This example uses the spark-2.2.0-bin-hadoop2.7.tgz package. Add the following environment variables:

export SPARK_HOME=/data/spark/spark-2.2.0-bin-hadoop2.7
export PATH=$PATH:$SPARK_HOME/bin

Enter the $SPARK_HOME/conf directory and copy the templates: cp spark-env.sh.template spark-env.sh; cp slaves.template slaves. Then configure the spark-env.sh file by adding the following:
export SCALA_HOME=/data/spark/scala-2.12.3/
export JAVA_HOME=/opt/jdk1.8.0_65
export SPARK_MASTER_IP=10.116.33.109
export SPARK_WORKER_MEMORY=128m
export HADOOP_CONF_DIR=/data/spark/hadoop-2.7.2/etc/hadoop
export SPARK_DIST_CLASSPATH=$(/data/spark/hadoop-2.7.2/bin/hadoop classpath)
export SPARK_LOCAL_IP=10.116.33.109
export SPARK_MASTER_HOST=10.116.33.109
SPARK_MASTER_HOST must be configured; otherwise the slave nodes will report "Caused by: java.io.IOException: Failed to connect to localhost/127.0.0.1:7077".

Modify $SPARK_HOME/conf/slaves and add the following:
Master
Slave1
Slave2

Copy the configured Spark directory to the Slave1 and Slave2 nodes:
scp -r $SPARK_HOME root@slave1:$SPARK_HOME
scp -r $SPARK_HOME root@slave2:$SPARK_HOME

Start the cluster on the master node:
$SPARK_HOME/sbin/start-all.sh

To check whether the cluster started successfully, run jps on each node:
the master node should show a new Master process;

the slave nodes should show a new Worker process.
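To verify the standalone cluster end to end, one option is to submit the SparkPi example that ships with the distribution; the jar name below matches the spark-2.2.0-bin-hadoop2.7 package layout, and the master URL assumes the default standalone port 7077 (the same port seen in the connection error above):

$SPARK_HOME/bin/spark-submit \
    --master spark://10.116.33.109:7077 \
    --class org.apache.spark.examples.SparkPi \
    $SPARK_HOME/examples/jars/spark-examples_2.11-2.2.0.jar 100
# the driver output should contain a line similar to "Pi is roughly 3.14"

The standalone master web UI (port 8080 by default) should also list the workers registered from the slaves file.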




