Using Spark ThriftServer to Operate CarbonData on CDH


CarbonData is a new indexed columnar file format for distributed computing. This article uses the Spark ThriftServer mode to operate CarbonData and briefly describes how to start a Spark CarbonData ThriftServer.

Versions: CDH 5.10.3, Spark 2.1.0, CarbonData 1.2.0

Downloads:
Spark: https://archive.apache.org/dist/spark/spark-2.1.0/spark-2.1.0-bin-hadoop2.6.tgz
CarbonData: https://dist.apache.org/repos/dist/release/carbondata/1.2.0/apache-carbondata-1.2.0-source-release.zip
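If the build machine has network access, the two packages can be fetched straight from the URLs above; a minimal sketch, assuming wget is installed:

wget https://archive.apache.org/dist/spark/spark-2.1.0/spark-2.1.0-bin-hadoop2.6.tgz
wget https://dist.apache.org/repos/dist/release/carbondata/1.2.0/apache-carbondata-1.2.0-source-release.zip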

CarbonData Compilation

The compilation environment is JDK 1.8.0_151 and Maven 3.5.2 (lower versions have not been tried, but the official minimum is JDK 7 and Maven 3.3).

unzip apache-carbondata-1.2.0-source-release.zip
cd carbondata-parent-1.2.0
mvn -DskipTests -Pspark-2.1 -Dspark.version=2.1.0 clean package

After the compilation succeeds, carbondata_2.11-1.2.0-shade-hadoop2.2.0.jar will appear in the carbondata-parent-1.2.0/assembly/target/scala-2.11/ directory (the jar name may differ for different build versions).

Startup Steps

Note:
1. In the commands below, the /carbondata directory on HDFS should be created by the hdfs user; if the user who starts the ThriftServer did not create it, grant that user read and write permission with the chmod command (see the sketch after these notes).
2. Do not start the ThriftServer on the same node as the HiveServer2 service, to prevent port conflicts (I have not yet found a way to change the ThriftServer port).
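A minimal sketch of the HDFS preparation from note 1, assuming the hdfs superuser is available and the ThriftServer will be started by a hypothetical user named spark:

sudo -u hdfs hadoop fs -mkdir /carbondata
# grant read/write to the startup user (chown to that user, or a permissive chmod)
sudo -u hdfs hadoop fs -chown spark:spark /carbondata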

tar -zxvf spark-2.1.0-bin-hadoop2.6.tgz
cd spark-2.1.0-bin-hadoop2.6
cp /etc/hive/conf/hive-site.xml conf/   # so that Hive tables can be read
cp conf/spark-env.sh.template conf/spark-env.sh
vi conf/spark-env.sh
Add:
export HADOOP_CONF_DIR=/etc/hadoop/conf
export YARN_CONF_DIR=/etc/hadoop/conf.cloudera.yarn/

mkdir carbondata_lib
# Copy carbondata_2.11-1.2.0-shade-hadoop2.2.0.jar into the carbondata_lib directory. If you need to set CarbonData parameters, rename the carbon.properties.template file in the carbondata-parent-1.2.0/conf directory to carbon.properties and copy it to the spark-2.1.0-bin-hadoop2.6/conf directory (see the sketch below).
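A sketch of the copy steps described in the comment above, assuming the carbondata-parent-1.2.0 source tree was unpacked next to the spark-2.1.0-bin-hadoop2.6 directory (adjust the relative paths to your layout):

cp ../carbondata-parent-1.2.0/assembly/target/scala-2.11/carbondata_2.11-1.2.0-shade-hadoop2.2.0.jar carbondata_lib/
# optional, only if CarbonData parameters need to be set:
cp ../carbondata-parent-1.2.0/conf/carbon.properties.template conf/carbon.properties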
cd bin
# The startup command is:
./spark-submit \
--master yarn \
--deploy-mode client \
--conf spark.sql.hive.thriftServer.singleSession=true \
--class org.apache.carbondata.spark.thriftserver.CarbonThriftServer \
../carbondata_lib/carbondata_2.11-1.2.0-shade-hadoop2.2.0.jar \
hdfs://[namenode ip]:8020/carbondata

ThriftServer Usage

beeline -u jdbc:hive2://[startup node ip]:10000 -n username
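As a quick smoke test of the connection, a sketch that creates and queries a CarbonData table through beeline; the test_carbon table and its columns are hypothetical, and CREATE TABLE ... STORED BY 'carbondata' is the CarbonData 1.x DDL syntax:

beeline -u jdbc:hive2://[startup node ip]:10000 -n username \
  -e "CREATE TABLE IF NOT EXISTS test_carbon (id INT, name STRING) STORED BY 'carbondata';"
beeline -u jdbc:hive2://[startup node ip]:10000 -n username \
  -e "SELECT count(*) FROM test_carbon;"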

Postscript

Personally, I think this is the best way to operate CarbonData; after all, the JDBC approach is the most convenient.
CarbonData can also be operated from spark-sql and spark-shell, but different modes of operation on the same version may support different operations.
If resources are plentiful, you can tune the Spark parameters so that containers are started up front, letting SQL tasks get resources as soon as possible; if multiple applications share the YARN cluster, you can use Spark dynamic allocation, which requests resources and creates containers only when they are needed (example options below).
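As an illustration of the dynamic allocation mentioned above, options like the following could be added to the ./spark-submit command (before the jar path); the executor counts are example values, not from the original post, and the YARN NodeManagers need the Spark external shuffle service enabled:

--conf spark.dynamicAllocation.enabled=true \
--conf spark.shuffle.service.enabled=true \
--conf spark.dynamicAllocation.minExecutors=1 \
--conf spark.dynamicAllocation.maxExecutors=10 \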

If anything above is wrong, please correct me.
