Brief introduction
Spark SQL provides JDBC connectivity, which is useful for connecting business intelligence (BI) tools to a Spark cluster and for sharing a cluster across multiple users. The JDBC server runs as a standalone Spark driver program that can be shared by multiple clients. Any client can cache tables in memory, query them, and so on, and the cluster resources and cached data are shared among all of them.
Spark SQL's JDBC server corresponds to HiveServer2 in Hive. It is also known as the "Thrift server" since it uses the Thrift communication protocol. Note that the JDBC server requires Spark to be built with Hive support.
Operating Environment
Cluster Environment: CDH5.3.0
The specific jar versions are as follows:
Spark version: 1.2.0-cdh5.3.0
Hive version: 0.13.1-cdh5.3.0
Hadoop version: 2.5.0-cdh5.3.0
Start the JDBC server
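# Make the Hive configuration visible to Spark
cd /etc/spark/conf
ln -s /etc/hive/conf/hive-site.xml hive-site.xml
# Make the Spark log directory writable
cd /opt/cloudera/parcels/cdh/lib/spark/
chmod -R 777 logs/
# Start the Thrift JDBC server on YARN
cd /opt/cloudera/parcels/cdh/lib/spark/sbin
./start-thriftserver.sh --master yarn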
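The Thrift server listens on port 10000 by default, which is the port the Beeline connection in this article uses. If that port is taken, the Spark SQL documentation describes passing `--hiveconf hive.server2.thrift.port=<port>` to the start script. The following is a minimal sketch that only prints the resulting command so it can be reviewed first. The `SPARK_HOME` default, the `THRIFT_PORT` variable, and the `thrift_start_cmd` helper are assumptions made for illustration, not part of the original setup.

```shell
#!/bin/sh
# Assumed install location (taken from the paths used in this article).
SPARK_HOME="${SPARK_HOME:-/opt/cloudera/parcels/cdh/lib/spark}"
# Assumed variable for the listening port; 10000 is the HiveServer2 default.
THRIFT_PORT="${THRIFT_PORT:-10000}"

# Hypothetical helper: print the start command instead of running it.
thrift_start_cmd() {
  echo "$SPARK_HOME/sbin/start-thriftserver.sh --master yarn --hiveconf hive.server2.thrift.port=$THRIFT_PORT"
}

# Dry run: show what would be executed.
thrift_start_cmd
```

Once the printed command looks right, it can be run directly, for example with `eval "$(thrift_start_cmd)"`. Any port chosen here must match the port in the JDBC URL that clients use.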
Connecting to the JDBC server with Beeline
cd /opt/cloudera/parcels/cdh/lib/spark/bin
beeline -u jdbc:hive2://hadoop04:10000
scan complete in 2ms
Connecting to jdbc:hive2://hadoop04:10000
Connected to: Spark SQL (version 1.2.0)
Driver: Hive JDBC (version 0.13.1-cdh5.3.0)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 0.13.1-cdh5.3.0 by Apache Hive
0: jdbc:hive2://hadoop04:10000>
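Besides the interactive prompt, Beeline can run a single statement and exit by using its `-e` option together with `-u`. The sketch below wraps that in two small helpers. The names `THRIFT_HOST`, `THRIFT_PORT`, `jdbc_url`, and `run_sql` are hypothetical and chosen only for this example. The host and port defaults are the ones from the Beeline session above.

```shell
#!/bin/sh
# Assumed connection settings, matching the Beeline session in this article.
THRIFT_HOST="${THRIFT_HOST:-hadoop04}"
THRIFT_PORT="${THRIFT_PORT:-10000}"

# Hypothetical helper: build the HiveServer2-style JDBC URL.
jdbc_url() {
  echo "jdbc:hive2://${THRIFT_HOST}:${THRIFT_PORT}"
}

# Hypothetical helper: run one SQL statement non-interactively.
# -u (connection URL) and -e (statement to execute) are Beeline options.
run_sql() {
  beeline -u "$(jdbc_url)" -e "$1"
}
```

With the server running, `run_sql "SHOW TABLES;"` lists the tables visible through the Hive metastore, and `run_sql "CACHE TABLE src;"` asks Spark SQL to cache a table in memory, where `src` stands in for a table that exists in your metastore. Because the server is a single shared driver, a table cached this way is also available to other connected clients.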
How to use the JDBC server for Spark SQL