Environment configuration
Maven 3.3.3
JDK 7u79
Scala 2.10.6
Hive 2.0.1
Spark 1.5.0 (built from source)
Hadoop 2.6.4
The Hive version must match the Spark version, so download the Hive source code and check spark.version in its pom.xml to determine which version of Spark to use.
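For example, with hive-2.0.1 the expected Spark version can be read from the root pom.xml of the Hive source tree:
grep '<spark.version>' pom.xml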
Note that you must use a version of Spark that does not include the Hive jars, i.e. one that was not built with the Hive profile.
Note: the pre-built spark-2.x packages on the official Spark website are all integrated with Hive, so if you want to use Hive on Spark you have to download the Spark source code and compile it yourself.
Recommended pairings: hive-1.2.1 on spark-1.3.1, or hive-2.0.1 on spark-1.5.2.
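A quick way to check whether an existing Spark package already contains Hive classes (the assembly jar path below is only illustrative and depends on your build):
jar tf lib/spark-assembly-1.5.0-hadoop2.6.4.jar | grep 'org/apache/hadoop/hive' | head
If this prints any entries, the package was built with the Hive profile and cannot be used for Hive on Spark.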
Compiling Spark
By default Spark is compiled with Scala 2.10.4:
export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m"
mvn -Pyarn -Phadoop-2.6 -DskipTests clean package
./make-distribution.sh --name Xm-spark --tgz -Phadoop-2.6 -Pyarn
If compiling with Scala 2.11.x:
./dev/change-scala-version.sh 2.11
mvn -Pyarn -Phadoop-2.6 -Dscala-2.11 -DskipTests clean package
./make-distribution.sh --name Xm-spark --tgz -Phadoop-2.6 -Pyarn
The tar package will be generated in the root of the Spark source directory.
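A typical next step is to unpack the generated package and point SPARK_HOME at it (the file name and install path below are only illustrative; the name depends on the --name passed to make-distribution.sh):
tar -zxvf spark-1.5.0-bin-Xm-spark.tgz -C /usr/local/
export SPARK_HOME=/usr/local/spark-1.5.0-bin-Xm-spark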
hive-site.xml Configuration
<property>
<name>hive.execution.engine</name>
<value>spark</value>
</property>
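The engine can also be switched (or verified) per session from the Hive CLI, without editing hive-site.xml:
hive> set hive.execution.engine=spark;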
spark-defaults.conf Configuration
spark.master=<spark master URL>    # can be left unset by default
spark.eventLog.enabled=true
spark.eventLog.dir=<spark event log folder (must exist)>
spark.executor.memory=512m
spark.serializer=org.apache.spark.serializer.KryoSerializer
spark.executor.instances=x
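The same Spark properties can also be set per session from the Hive CLI, which is convenient for testing values before writing them into the file, for example:
hive> set spark.executor.memory=512m;
hive> set spark.serializer=org.apache.spark.serializer.KryoSerializer;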
Configuration recommended on the Hive website
hive.vectorized.execution.enabled=true
hive.cbo.enable=true
hive.optimize.reducededuplication.min.reducer=4
hive.optimize.reducededuplication=true
hive.orc.splits.include.file.footer=false
hive.merge.mapfiles=true
hive.merge.sparkfiles=false
hive.merge.smallfiles.avgsize=16000000
hive.merge.size.per.task=256000000
hive.merge.orcfile.stripe.level=true
hive.auto.convert.join=true
hive.auto.convert.join.noconditionaltask=true
hive.auto.convert.join.noconditionaltask.size=894435328
hive.optimize.bucketmapjoin.sortedmerge=false
hive.map.aggr.hash.percentmemory=0.5
hive.map.aggr=true
hive.optimize.sort.dynamic.partition=false
hive.stats.autogather=true
hive.stats.fetch.column.stats=true
hive.vectorized.execution.reduce.enabled=false
hive.vectorized.groupby.checkinterval=4096
hive.vectorized.groupby.flush.percent=0.1
hive.compute.query.using.stats=true
hive.limit.pushdown.memory.usage=0.4
hive.optimize.index.filter=true
hive.exec.reducers.bytes.per.reducer=67108864
hive.smbjoin.cache.rows=10000
hive.exec.orc.default.stripe.size=67108864
hive.fetch.task.conversion=more
hive.fetch.task.conversion.threshold=1073741824
hive.fetch.task.aggr=false
mapreduce.input.fileinputformat.list-status.num-threads=5
spark.kryo.referenceTracking=false
spark.kryo.classesToRegister=org.apache.hadoop.hive.ql.io.HiveKey,org.apache.hadoop.io.BytesWritable,org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch
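If these are placed in hive-site.xml rather than set per session, each entry becomes a <property> block in the same form as hive.execution.engine above, for example:
<property>
<name>hive.auto.convert.join</name>
<value>true</value>
</property>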
Summary of Issues
1. Caused by: java.lang.NoClassDefFoundError: org/apache/hive/spark/client/Job
A. Spark was compiled with -Phive or -Phive-thriftserver (the Spark build used by Hive must not include the Hive jars, see the note above)
B. The Hive version and the Spark version used for compilation do not match
2. Failed to execute spark task, with exception 'org.apache.hadoop.hive.ql.metadata.HiveException (Failed to create spark client.)' FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.spark.SparkTask
A. The Hive and Spark versions do not match
B. The Spark client failed to start because of a Scala environment problem (install Scala, then restart YARN)
3. Errors caused by incorrect environment configuration (some quick checks are shown below)
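A few commands that help rule out the environment-related causes above (these are standard Scala, Hadoop and YARN commands; which checks matter depends on your cluster):
scala -version
hadoop version
yarn node -list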