Spark Parameter Configuration Description

Source: Internet
Author: User

1 Modify the spark-defaults.conf file in the $SPARK_HOME/conf directory

Add the following configuration items:

spark.sql.hive.convertMetastoreParquet false

hive.exec.compress.output false

If spark.sql.hive.convertMetastoreParquet is not set to false, the inventory preview in the front end displays garbled content.

Because the Parquet file format has built-in compression, the output does not need to be compressed again; if compression is enabled, the inventory download function fails.
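The two settings above are plain key/value pairs in spark-defaults.conf. A minimal sketch of the resulting file section:

```properties
# $SPARK_HOME/conf/spark-defaults.conf
# Keep Hive Parquet tables readable by the front-end preview
spark.sql.hive.convertMetastoreParquet  false
# Parquet is already compressed internally, so do not compress output again
hive.exec.compress.output               false
```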

2 Modify the spark-env.sh file in the $SPARK_HOME/conf directory and set the following parameters:

SPARK_EXECUTOR_INSTANCES=11

SPARK_EXECUTOR_CORES=2

SPARK_EXECUTOR_MEMORY=1g

SPARK_DRIVER_MEMORY=3g

Configure these values as needed. If the settings consume exactly all of the available memory, no memory is left over for other tasks to run.
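Put together, the spark-env.sh fragment might look like the following sketch; the values are the examples from this section, not recommendations:

```shell
# $SPARK_HOME/conf/spark-env.sh (example values from this article)
SPARK_EXECUTOR_INSTANCES=11   # executors requested from YARN
SPARK_EXECUTOR_CORES=2        # CPU cores per executor
SPARK_EXECUTOR_MEMORY=1g      # memory shared by one executor's cores
SPARK_DRIVER_MEMORY=3g        # memory for the driver (Thrift server) process
```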

2.1 Parameter SPARK_EXECUTOR_INSTANCES

This parameter determines how many executor instances can run at the same time in the YARN cluster. The number of executors actually started by YARN is less than or equal to this value. If you cannot determine the maximum number of executors that can be started, it is recommended to set this value reasonably large at first.

2.2 SPARK_EXECUTOR_CORES

This parameter sets the number of CPU cores that each executor can use.

The YARN cluster can run at most SPARK_EXECUTOR_INSTANCES × SPARK_EXECUTOR_CORES tasks in parallel. SPARK_EXECUTOR_CORES is typically set to 2.

That is, with SPARK_EXECUTOR_INSTANCES=11 and SPARK_EXECUTOR_CORES=2, the maximum number of parallel tasks is 22.
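The parallelism arithmetic above can be checked directly; this is just the formula instances × cores applied to the example values:

```shell
# Maximum parallel tasks = executor instances x cores per executor
SPARK_EXECUTOR_INSTANCES=11
SPARK_EXECUTOR_CORES=2
MAX_PARALLEL_TASKS=$((SPARK_EXECUTOR_INSTANCES * SPARK_EXECUTOR_CORES))
echo "$MAX_PARALLEL_TASKS"   # prints 22
```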

2.3 SPARK_EXECUTOR_MEMORY

This parameter sets the amount of memory allocated to each executor. Note that this memory is shared by all the cores set in SPARK_EXECUTOR_CORES.

In the example above, the 2 CPU cores share the 1 GB of memory.

2.4 SPARK_DRIVER_MEMORY

This parameter sets the amount of memory allocated to the driver, that is, the memory allocated to the Thrift server on the machine where start-thriftserver.sh is executed.

3 yarn.nodemanager.resource.memory-mb

In the yarn-site.xml file in the $HADOOP_HOME/etc/hadoop directory, the parameter yarn.nodemanager.resource.memory-mb configures the amount of physical memory, in megabytes, that YARN can use on each machine.

If you find that the memory used by the cluster is significantly less than the physical memory of the machines, you can increase this parameter.
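The property is set in yarn-site.xml like the sketch below; the 8192 MB value is a hypothetical example, not a recommendation:

```xml
<!-- $HADOOP_HOME/etc/hadoop/yarn-site.xml -->
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <!-- physical memory (MB) YARN may use on this node; example value -->
  <value>8192</value>
</property>
```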

4 spark.yarn.executor.memoryOverhead

This parameter specifies the amount of additional memory each executor receives on top of its allocated memory; by default it is 7% of the executor memory.
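As a sketch of how the default works out for the 1 GB executors above, assuming the formula used by older Spark-on-YARN versions (overhead = 7% of executor memory, with a 384 MB minimum):

```shell
# Overhead for a 1 GB executor under the assumed default formula
EXECUTOR_MEMORY_MB=1024                        # SPARK_EXECUTOR_MEMORY=1g
OVERHEAD_MB=$((EXECUTOR_MEMORY_MB * 7 / 100))  # 7% = 71 MB
[ "$OVERHEAD_MB" -lt 384 ] && OVERHEAD_MB=384  # minimum floor applies
echo "$OVERHEAD_MB"                            # prints 384
```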
