Spark SQL here refers to the spark-sql CLI with Hive integration; it accesses HBase tables through Hive, specifically via the hive-hbase-handler, whose configuration is described in the earlier post Hive (V): Hive and HBase integration.
Contents:
- Configuring Spark SQL access to HBase
- Test validation
Configuring Spark SQL access to HBase:
- Copy the HBase-related jar packages to the $SPARK_HOME/lib directory on the Spark node; the required jars are listed below (a copy sketch follows the list):
guava-14.0.1.jar
htrace-core-3.1.0-incubating.jar
hbase-common-1.1.2.2.4.2.0-258.jar
hbase-common-1.1.2.2.4.2.0-258-tests.jar
hbase-client-1.1.2.2.4.2.0-258.jar
hbase-server-1.1.2.2.4.2.0-258.jar
hbase-protocol-1.1.2.2.4.2.0-258.jar
hive-hbase-handler-1.2.1000.2.4.2.0-258.jar
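A minimal shell sketch of this copy step, assuming the jars sit in the usual HDP 2.4.2 client directories (the source paths /usr/hdp/2.4.2.0-258/hbase/lib and /usr/hdp/2.4.2.0-258/hive/lib are assumptions; adjust them to your cluster):

# Source directories below are assumed from the standard HDP 2.4.2 layout
cd /usr/hdp/2.4.2.0-258
cp hbase/lib/guava-14.0.1.jar spark/lib/
cp hbase/lib/htrace-core-3.1.0-incubating.jar spark/lib/
cp hbase/lib/hbase-common-1.1.2.2.4.2.0-258.jar spark/lib/
cp hbase/lib/hbase-common-1.1.2.2.4.2.0-258-tests.jar spark/lib/
cp hbase/lib/hbase-client-1.1.2.2.4.2.0-258.jar spark/lib/
cp hbase/lib/hbase-server-1.1.2.2.4.2.0-258.jar spark/lib/
cp hbase/lib/hbase-protocol-1.1.2.2.4.2.0-258.jar spark/lib/
cp hive/lib/hive-hbase-handler-1.2.1000.2.4.2.0-258.jar spark/lib/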
- In Ambari, edit $SPARK_HOME/conf/spark-env.sh on the Spark node to add the jars above to SPARK_CLASSPATH, for example:
- The configuration entry is as follows. Note: there must be no spaces or line breaks between the jar paths (an equivalent loop form follows the entry):
export SPARK_CLASSPATH=/usr/hdp/2.4.2.0-258/spark/lib/guava-11.0.2.jar:/usr/hdp/2.4.2.0-258/spark/lib/hbase-client-1.1.2.2.4.2.0-258.jar:/usr/hdp/2.4.2.0-258/spark/lib/hbase-common-1.1.2.2.4.2.0-258.jar:/usr/hdp/2.4.2.0-258/spark/lib/hbase-protocol-1.1.2.2.4.2.0-258.jar:/usr/hdp/2.4.2.0-258/spark/lib/hbase-server-1.1.2.2.4.2.0-258.jar:/usr/hdp/2.4.2.0-258/spark/lib/hive-hbase-handler-1.2.1000.2.4.2.0-258.jar:/usr/hdp/2.4.2.0-258/spark/lib/htrace-core-3.1.0-incubating.jar:/usr/hdp/2.4.2.0-258/spark/lib/protobuf-java-2.5.0.jar:${SPARK_CLASSPATH}
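Because the single long line is easy to get wrong, here is an equivalent sketch that builds the same classpath in spark-env.sh with a loop (LIB is an illustrative variable name, not part of the stock file):

# Build SPARK_CLASSPATH from a jar list instead of one long line
LIB=/usr/hdp/2.4.2.0-258/spark/lib
for j in guava-11.0.2.jar \
         hbase-client-1.1.2.2.4.2.0-258.jar \
         hbase-common-1.1.2.2.4.2.0-258.jar \
         hbase-protocol-1.1.2.2.4.2.0-258.jar \
         hbase-server-1.1.2.2.4.2.0-258.jar \
         hive-hbase-handler-1.2.1000.2.4.2.0-258.jar \
         htrace-core-3.1.0-incubating.jar \
         protobuf-java-2.5.0.jar; do
  # ${VAR:+...} avoids a leading ':' when SPARK_CLASSPATH starts out empty
  SPARK_CLASSPATH="${SPARK_CLASSPATH:+${SPARK_CLASSPATH}:}${LIB}/${j}"
done
export SPARK_CLASSPATH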
- Copy hbase-site.xml to ${HADOOP_CONF_DIR}; because the Hadoop configuration directory ${HADOOP_CONF_DIR} is already set in spark-env.sh, hbase-site.xml is picked up from there automatically. The key parameters in hbase-site.xml are shown below, followed by a copy sketch:
<property>
  <name>hbase.zookeeper.quorum</name>
  <value>r,hdp2,hdp3</value>
  <description>ZooKeeper nodes used by HBase</description>
</property>
<property>
  <name>hbase.client.scanner.caching</name>
  <value>100</value>
  <description>HBase client scan cache; very helpful for query performance</description>
</property>
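A minimal sketch of that copy step, assuming HDP's default configuration paths (both paths are assumptions; substitute your actual ${HADOOP_CONF_DIR}):

# On HDP, ${HADOOP_CONF_DIR} is typically /etc/hadoop/conf
cp /etc/hbase/conf/hbase-site.xml /etc/hadoop/conf/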
- After modifying the configuration in Ambari, restart the component services that the change affects.
Test validation:
- Validation on any Spark client node:
- Command: cd /usr/hdp/2.4.2.0-258/spark/bin (the Spark installation directory)
- Command: ./spark-sql
- Execute: select * from stocksinfo; (stocksinfo is the Hive external table associated with the HBase table)
- The query returns rows as expected; a scripted variant of the check follows.
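The same check can be run non-interactively; spark-sql accepts -e for a quoted query just like the Hive CLI it is based on (the limit clause here is only to keep the output small):

cd /usr/hdp/2.4.2.0-258/spark/bin
# Run the validation query without entering the interactive shell
./spark-sql -e "select * from stocksinfo limit 10;"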