Spark (iv): Spark SQL reads HBase

Source: Internet
Author: User
Tags: zookeeper

Spark SQL here refers to the spark-sql CLI, which integrates Hive. It accesses HBase tables through Hive, specifically through hive-hbase-handler, using the configuration described in Hive (v): Hive and HBase integration.
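Because the access path goes through Hive, spark-sql only ever sees a Hive external table backed by the handler. A minimal sketch of such a mapping is below; the column names, column family, and HBase table name are hypothetical examples, not taken from this article:

```shell
# Hypothetical Hive DDL that maps an HBase table to a Hive external table
# via hive-hbase-handler; spark-sql can then query it like any Hive table.
hive -e "
CREATE EXTERNAL TABLE stocksinfo (rowkey STRING, price STRING)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ('hbase.columns.mapping' = ':key,info:price')
TBLPROPERTIES ('hbase.table.name' = 'stocksinfo');
"
```

The `hbase.columns.mapping` property pairs each Hive column with an HBase column (`:key` is the row key); this is what lets the handler translate SQL scans into HBase scans.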

Directory:

    • Spark SQL access to HBase configuration
    • Test validation

Spark SQL access to HBase configuration:

  • Copy the HBase-related jar packages to the $SPARK_HOME/lib directory on the Spark node, as in the following list:
    guava-14.0.1.jar
    htrace-core-3.1.0-incubating.jar
    hbase-common-1.1.2.2.4.2.0-258.jar
    hbase-common-1.1.2.2.4.2.0-258-tests.jar
    hbase-client-1.1.2.2.4.2.0-258.jar
    hbase-server-1.1.2.2.4.2.0-258.jar
    hbase-protocol-1.1.2.2.4.2.0-258.jar
    hive-hbase-handler-1.2.1000.2.4.2.0-258.jar
  • In Ambari, edit $SPARK_HOME/conf/spark-env.sh on the Spark node to add the jar packages above to SPARK_CLASSPATH, for example:
  • The configuration line is as follows. Note: there must be no spaces or line breaks between the jar paths.
    export SPARK_CLASSPATH=/usr/hdp/2.4.2.0-258/spark/lib/guava-11.0.2.jar:/usr/hdp/2.4.2.0-258/spark/lib/hbase-client-1.1.2.2.4.2.0-258.jar:/usr/hdp/2.4.2.0-258/spark/lib/hbase-common-1.1.2.2.4.2.0-258.jar:/usr/hdp/2.4.2.0-258/spark/lib/hbase-protocol-1.1.2.2.4.2.0-258.jar:/usr/hdp/2.4.2.0-258/spark/lib/hbase-server-1.1.2.2.4.2.0-258.jar:/usr/hdp/2.4.2.0-258/spark/lib/hive-hbase-handler-1.2.1000.2.4.2.0-258.jar:/usr/hdp/2.4.2.0-258/spark/lib/htrace-core-3.1.0-incubating.jar:/usr/hdp/2.4.2.0-258/spark/lib/protobuf-java-2.5.0.jar:${SPARK_CLASSPATH}
  • Copy hbase-site.xml to ${HADOOP_CONF_DIR}. Because the Hadoop configuration directory ${HADOOP_CONF_DIR} is set in spark-env.sh, hbase-site.xml is loaded from there. The main parameters configured in hbase-site.xml are:
    <property>
      <name>hbase.zookeeper.quorum</name>
      <value>R,hdp2,hdp3</value>
      <description>ZooKeeper nodes used by HBase</description>
    </property>
    <property>
      <name>hbase.client.scanner.caching</name>
      <value>100</value>
      <description>HBase client scan cache; very helpful for query performance</description>
    </property>
    • After modifying the configuration in Ambari, restart the affected component services.
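The classpath line above is easy to get wrong by hand, since spark-env.sh tolerates no whitespace between entries. The steps can be sketched as a small helper that joins jar paths with colons; the function name and example paths are hypothetical, and versions should be adjusted to your cluster:

```shell
#!/bin/sh
# Sketch: build a SPARK_CLASSPATH value from a jar list, joined by ':'
# with no spaces or line breaks between entries.
build_classpath() {
  dir=$1; shift
  out=""
  for jar in "$@"; do
    # Prepend "existing:" only when out is non-empty, then append the path.
    out="${out:+$out:}$dir/$jar"
  done
  printf '%s' "$out"
}

# Example usage (HDP 2.4.2 layout assumed), appended to spark-env.sh:
# SPARK_CLASSPATH="$(build_classpath /usr/hdp/2.4.2.0-258/spark/lib \
#     hbase-client-1.1.2.2.4.2.0-258.jar \
#     hbase-common-1.1.2.2.4.2.0-258.jar \
#     hive-hbase-handler-1.2.1000.2.4.2.0-258.jar):$SPARK_CLASSPATH"
```

Generating the line this way keeps the no-whitespace constraint satisfied even as jars are added or removed.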

Test validation:

    • Validate on any Spark client node:
    • Command: cd /usr/hdp/2.4.2.0-258/spark/bin (the Spark installation directory)
    • Command: ./spark-sql
    • Execute: select * from Stocksinfo; (Stocksinfo is the Hive external table associated with HBase)
    • The results are returned correctly.
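The same validation can be run non-interactively, which is convenient for scripting a smoke test. A sketch, assuming the HDP 2.4.2 install path used above (the LIMIT clause is an addition to keep output small):

```shell
# Run the validation query in one shot instead of the interactive shell.
cd /usr/hdp/2.4.2.0-258/spark/bin
./spark-sql -e "SELECT * FROM Stocksinfo LIMIT 10;"
```

A non-empty result set confirms that spark-sql can reach HBase through the Hive external table.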
