Perform the following operations on each node (or complete them on one node first and then scp the results to the other nodes, as shown after the tar step below):
1. Unpack the Spark installer into the program directory /bigdata/soft/spark-1.4.1; this directory is referred to below as $SPARK_HOME.
tar -zxvf spark-1.4.1-bin-hadoop2.6.tar.gz
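If the package is unpacked on only one node, the resulting directory can be distributed to the other nodes with scp (assuming every host uses the same /bigdata/soft layout), e.g.:
scp -r /bigdata/soft/spark-1.4.1 cloud-002:/bigdata/soft/
scp -r /bigdata/soft/spark-1.4.1 cloud-003:/bigdata/soft/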
2. Configure Spark
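Spark ships only a template for the environment file, so first create spark-env.sh from it:
cd $SPARK_HOME/conf
cp spark-env.sh.template spark-env.sh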
### Add the following content to spark-env.sh:
export JAVA_HOME=/bigdata/soft/jdk1.7.0_79
export SCALA_HOME=/bigdata/soft/scala-2.10.5
export HADOOP_CONF_DIR=/bigdata/soft/hadoop-2.6.0/etc/hadoop
export SPARK_MASTER_IP=cloud-001
#export SPARK_MASTER_PORT=7077
export SPARK_WORKER_MEMORY=1G
export SPARK_WORKER_CORES=1
export SPARK_WORKER_INSTANCES=1
export SPARK_CLASSPATH=$SPARK_CLASSPATH:/bigdata/soft/spark-1.4.1/lib/mysql-connector-java-5.1.31.jar
## List the worker (slave) nodes of the cluster in conf/slaves:
cloud-002
cloud-003
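The configuration files must match on every node; assuming passwordless ssh between the hosts, they can be pushed from cloud-001, e.g.:
scp $SPARK_HOME/conf/spark-env.sh $SPARK_HOME/conf/slaves cloud-002:$SPARK_HOME/conf/
scp $SPARK_HOME/conf/spark-env.sh $SPARK_HOME/conf/slaves cloud-003:$SPARK_HOME/conf/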
## First create the Spark log directory on HDFS
$HADOOP_HOME/bin/hadoop fs -mkdir /applogs
$HADOOP_HOME/bin/hadoop fs -mkdir /applogs/spark
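In Hadoop 2.x the -p flag creates both levels at once, and fs -ls verifies the result:
$HADOOP_HOME/bin/hadoop fs -mkdir -p /applogs/spark
$HADOOP_HOME/bin/hadoop fs -ls /applogs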
## Create spark-defaults.conf from the bundled template
cp spark-defaults.conf.template spark-defaults.conf
## Uncomment (or add) the following lines:
spark.master              spark://cloud-001:7077
spark.eventLog.enabled    true
spark.eventLog.dir        hdfs://cloud-001:8020/applogs/spark
### Create hive-site.xml in $SPARK_HOME/conf; its content is essentially the same as the Hive configuration, as shown below:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://localhost:3306/hive_1_2_0?createDatabaseIfNotExist=true</value>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
</property>
<property>
<name>javax.jdo.PersistenceManagerFactoryClass</name>
<value>org.datanucleus.api.jdo.JDOPersistenceManagerFactory</value>
</property>
<property>
<name>javax.jdo.option.DetachAllOnCommit</name>
<value>true</value>
</property>
<property>
<name>javax.jdo.option.NonTransactionalRead</name>
<value>true</value>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>root</value>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>0p;/9ol.</value>
</property>
<property>
<name>javax.jdo.option.Multithreaded</name>
<value>true</value>
</property>
<property>
<name>datanucleus.connectionPoolingType</name>
<value>BoneCP</value>
</property>
<property>
<name>hive.metastore.warehouse.dir</name>
<value>/user/hive/warehouse</value>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://cloud-001:8020</value>
</property>
<property>
<name>hive.server2.thrift.port</name>
<value>10000</value>
</property>
<property>
<name>hive.server2.thrift.bind.host</name>
<value>cloud-001</value>
</property>
</configuration>
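If Hive is already set up on this machine, the file can simply be copied from the Hive installation (assuming the standard $HIVE_HOME/conf location):
cp $HIVE_HOME/conf/hive-site.xml $SPARK_HOME/conf/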
## Copy the MySQL JDBC driver into Spark's lib directory, e.g.:
cp $HIVE_HOME/lib/mysql-connector-java-5.1.31.jar $SPARK_HOME/lib
3. Start the cluster in standalone mode
Start the Master and Workers:
$SPARK_HOME/sbin/start-all.sh
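A quick check: run jps on each node; cloud-001 should show a Master process and cloud-002/cloud-003 should each show a Worker process.
jps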
Start Spark's Hive service (the Thrift JDBC server):
$SPARK_HOME/sbin/start-thriftserver.sh --master spark://cloud-001:7077 --driver-memory 1g --executor-memory 1g --total-executor-cores 2
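To confirm the Thrift server came up, check that something is listening on port 10000 (the hive.server2.thrift.port configured above); on Linux this can be done with netstat:
netstat -tlnp | grep 10000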
4. Test
Test spark-shell:
$SPARK_HOME/bin/spark-shell --master spark://cloud-001:7077 --driver-memory 1g --executor-memory 1g --total-executor-cores 2
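A minimal smoke test (an assumption here: the REPL reads piped stdin, so the snippet below should print a count of 100 once the job finishes):
echo 'sc.parallelize(1 to 100).count()' | $SPARK_HOME/bin/spark-shell --master spark://cloud-001:7077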
Test spark-sql:
$SPARK_HOME/bin/spark-sql --master spark://cloud-001:7077 --driver-memory 1g --executor-memory 1g --total-executor-cores 2
or
$SPARK_HOME/bin/beeline -u jdbc:hive2://cloud-001:10000 -n hadoop
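Both clients talk to the same metastore; a simple non-interactive query (both tools accept -e) confirms the connection, e.g.:
$SPARK_HOME/bin/spark-sql --master spark://cloud-001:7077 -e "SHOW TABLES;"
$SPARK_HOME/bin/beeline -u jdbc:hive2://cloud-001:10000 -n hadoop -e "SHOW TABLES;"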