Spark Cluster Setup
1 Spark Compilation
1.1 Downloading the source code
git clone git://github.com/apache/spark.git -b branch-1.6
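Note that GitHub has since dropped support for the unauthenticated git:// protocol, so cloning over HTTPS is more reliable:
git clone https://github.com/apache/spark.git -b branch-1.6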
1.2 Modifying the pom file
Add a cdh5.0.2 profile as follows:
<profile>
  <id>cdh5.0.2</id>
  <properties>
    <hadoop.version>2.3.0-cdh5.0.2</hadoop.version>
  </properties>
</profile>
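If Maven cannot resolve the CDH artifacts, the Cloudera repository may also need to be declared in the pom's <repositories> section; a minimal sketch (the id is arbitrary):
<repository>
  <id>cloudera</id>
  <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
</repository>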
1.3 Compiling
build/mvn -Pyarn -Pcdh5.0.2 -Phive -Phive-thriftserver -Pnative -DskipTests package
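If you compile with a locally installed mvn rather than the bundled build/mvn script (which sets these options itself), the Spark 1.6 build docs recommend raising Maven's memory limits first:
export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m"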
The first run of the command above fails because maven.twttr.com is blocked by the Great Firewall. Adding the following entry to /etc/hosts and re-running the build works around it:
199.16.156.89 maven.twttr.com
2 Spark Cluster Build [Spark on YARN]
2.1 Modifying the configuration files
--spark-env.sh--
export SPARK_SSH_OPTS="-p9413"
export HADOOP_CONF_DIR=/opt/hadoop/hadoop-cluster/modules/hadoop-2.3.0-cdh5.0.2/etc/hadoop
export SPARK_EXECUTOR_INSTANCES=1
export SPARK_EXECUTOR_CORES=4
export SPARK_EXECUTOR_MEMORY=1g
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$HADOOP_HOME/lib/native/
--slaves--
192.168.3.211 hadoop-dev-211
192.168.3.212 hadoop-dev-212
192.168.3.213 hadoop-dev-213
192.168.3.214 hadoop-dev-214
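Every node must see the same configuration, so the two files above also need to be copied to each worker. A minimal sketch over the non-default SSH port configured above (the Spark install path here is hypothetical; adjust it to your layout):
for host in hadoop-dev-212 hadoop-dev-213 hadoop-dev-214; do
  # -P matches the port set in SPARK_SSH_OPTS; /opt/hadoop/spark is a placeholder path
  scp -P 9413 conf/spark-env.sh conf/slaves $host:/opt/hadoop/spark/conf/
done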
2.2 Cluster planning and startup
--Cluster planning--
hadoop-dev-211 Master, Worker
hadoop-dev-212 Worker
hadoop-dev-213 Worker
hadoop-dev-214 Worker
--Start master--
sbin/start-master.sh
--Start workers--
sbin/start-slaves.sh
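Alternatively, sbin/start-all.sh launches the master and all workers in one step. To confirm the daemons are up, jps on each node should show the corresponding processes:
jps
# expected on hadoop-dev-211: Master and Worker
# expected on the other nodes: Worker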
2.3 Viewing the web interface
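By default the master serves a web UI on port 8080 (http://hadoop-dev-211:8080) listing the cluster's workers and running applications; each worker serves its own UI on port 8081.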
3 Integrating Hive
Copy hive-site.xml and hive-log4j.properties into Spark's conf directory.
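To verify the integration, note that a spark-shell built with -Phive exposes sqlContext as a HiveContext, so the Hive metastore should be visible directly:
sqlContext.sql("show databases").show()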
4 Spark Example Demo
4.1 Reading MySQL data into Hive
# step 1, start spark-shell
bin/spark-shell --jars lib_managed/jars/hadoop-lzo-0.4.17.jar \
  --driver-class-path /opt/hadoop/hadoop-cluster/modules/apache-hive-1.2.1-bin/lib/mysql-connector-java-5.6-bin.jar

// step 2, read the MySQL data
val jdbcDF = sqlContext.read.format("jdbc").options(Map(
  "url" -> "jdbc:mysql://hadoop-dev-212:3306/hive",
  "dbtable" -> "VERSION",
  "user" -> "hive",
  "password" -> "123456")).load()

// step 3, save it as a Hive table
jdbcDF.saveAsTable("test")
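DataFrame.saveAsTable has been deprecated since Spark 1.4 in favor of jdbcDF.write.saveAsTable("test"), though both work on branch-1.6. Either way, the result can be checked from the same shell:
sqlContext.sql("select * from test").show()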