If you want to use the Spark SQL CLI to read Hive tables directly for analysis, only a few simple steps are needed.
1. Copy hive-site.xml into Spark's conf directory
$ cp /usr/local/hive/conf/hive-site.xml /usr/local/spark-1.5.1/conf/
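For reference, the part of hive-site.xml that matters here is the metastore database connection, which is also why the MySQL driver is needed in the next step. A minimal sketch, assuming a MySQL-backed metastore; the host, port, database name, user, and password are all placeholders you must replace with your own values:

<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://localhost:3306/hive</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hive</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>hive</value>
  </property>
</configuration>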
2. Configure the Spark classpath to include the MySQL driver
$ vim conf/spark-env.sh
export SPARK_CLASSPATH=$SPARK_CLASSPATH:$SPARK_LOCAL_DIRS/lib/mysql-connector-java-5.1.x.jar
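If you would rather not edit spark-env.sh, the same effect can usually be achieved per invocation with the standard --driver-class-path option; a sketch, where the jar path is an assumption that must point at your actual MySQL connector:

$ bin/spark-sql --master yarn-client \
    --driver-class-path /usr/local/spark-1.5.1/lib/mysql-connector-java-5.1.x.jar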
3. Start the Hive metastore
$ hive --service metastore 2>&1 &
(In testing, the Spark SQL CLI also started successfully without this step.)
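To confirm the metastore actually came up, check that a process is listening on its Thrift port (9083 is the Hive default; yours may differ):

$ netstat -lnpt | grep 9083

If nothing is listening, check the metastore log before moving on.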
4. Start the Spark SQL CLI
$ bin/spark-sql --master yarn-client
SET spark.sql.hive.version=1.2.1
SET spark.sql.hive.version=1.2.1
spark-sql>
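A quick smoke test at the new prompt is to list the databases and tables; if the hive-site.xml copy worked, the output should match what the Hive CLI reports:

spark-sql> show databases;
spark-sql> use default;
spark-sql> show tables;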
That completes the setup.
Now run the same SQL statement in Hive and in the Spark SQL CLI and compare.
Hive:
hive> select count(cookie) from dmp_data where age='21001';
...
Total MapReduce CPU Time Spent: ... 490 msec
OK
2839776
Time taken: 42.092 seconds, Fetched: 1 row(s)
Hive takes about 42 seconds.
Spark SQL CLI:
$ bin/spark-sql --master yarn-client
spark-sql> select count(cookie) from dmp_data where age='21001';
...
INFO scheduler.DAGScheduler: ResultStage 3 (processCmd at CliDriver.java:376) finished in 2.402 s
INFO scheduler.DAGScheduler: Job 2 finished: processCmd at CliDriver.java:376, took 22.894938 s
2839776
Time taken: 23.917 seconds, Fetched 1 row(s)
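The same statement can also be run non-interactively, which makes timing comparisons easy to script; spark-sql accepts the Hive CLI-style -e flag for a one-off query (table and column names are taken from the example above):

$ bin/spark-sql --master yarn-client \
    -e "select count(cookie) from dmp_data where age='21001'"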
Spark SQL takes about 24 seconds, cutting the runtime almost in half.
Since the logic of many tasks is not very complex, they can be completed directly in SQL, as in the sketch below.
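For example, a top-10 breakdown over the same table needs nothing beyond plain SQL; a sketch, assuming dmp_data has the cookie and age columns used above:

spark-sql> select age, count(cookie) as cnt
         > from dmp_data
         > group by age
         > order by cnt desc
         > limit 10;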
This concludes the configuration and use of the Spark SQL CLI.