Background: Some of the company's business data is stored in HBase, and business people keep coming to me for all kinds of data, so I want to load it directly into an RDD with Spark (spark-shell) for computation.
Summary:
1. Related environment
2. Code examples
Content
1. Related environment
Spark version: 2.0.0
Hadoop version: 2.4.0
HBase version: 0.98.6
Note: the cluster is built with CDH5.
Write a submit script:
export SPARK2_HOME=/var/lib/hadoop-hdfs/spark-2.0.0-bin-hadoop2.4
export HBASE_LIB_HOME=/opt/cloudera/parcels/CDH/lib/hbase
$SPARK2_HOME/bin/spark-shell \
--jars $HBASE_LIB_HOME/hbase-common-0.98.6-cdh5.3.2.jar,$HBASE_LIB_HOME/hbase-client-0.98.6-cdh5.3.2.jar,$HBASE_LIB_HOME/hbase-protocol-0.98.6-cdh5.3.2.jar,\
$HBASE_LIB_HOME/hbase-server-0.98.6-cdh5.3.2.jar,$HBASE_LIB_HOME/lib/htrace-core-2.04.jar
2. Code examples
Add the HBase dependency to the POM: https://github.com/Tongzhenguo/my_scala_code/blob/master/pom.xml
Write the Spark driver application class: https://github.com/Tongzhenguo/my_scala_code/blob/master/src/main/scala/utils/HbaseSparkReadUtils.scala
Spark reads HBase:
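The full driver class lives in the repository linked above. As a rough sketch of the idea, reading an HBase table into an RDD from spark-shell typically goes through `TableInputFormat` and `SparkContext.newAPIHadoopRDD`; the ZooKeeper quorum host, the table name `my_table`, and the column `cf:col` below are placeholders, not values from the original post:

```scala
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.Result
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.TableInputFormat
import org.apache.hadoop.hbase.util.Bytes

// Point the HBase client at the cluster (hostname is a placeholder)
// and tell TableInputFormat which table to scan.
val conf = HBaseConfiguration.create()
conf.set("hbase.zookeeper.quorum", "zk-host")
conf.set(TableInputFormat.INPUT_TABLE, "my_table")

// Each record is a (row key, Result) pair; sc is the spark-shell SparkContext.
val hbaseRDD = sc.newAPIHadoopRDD(
  conf,
  classOf[TableInputFormat],
  classOf[ImmutableBytesWritable],
  classOf[Result])

// Example transformation: extract one column ("cf:col") per row.
val values = hbaseRDD.map { case (key, result) =>
  val rowKey = Bytes.toString(key.get())
  val v = Bytes.toString(result.getValue(Bytes.toBytes("cf"), Bytes.toBytes("col")))
  (rowKey, v)
}
values.count()
```

Because `newAPIHadoopRDD` just wraps the MapReduce input format, any scan options supported by `TableInputFormat` (start/stop row, column filters) can be set on the same `conf` before the RDD is created.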