Spark Parameter Configuration Description

Source: Internet
Author: User

1 Modify the spark-defaults.conf file in the $SPARK_HOME/conf directory

Add the following configuration items:

spark.sql.hive.convertMetastoreParquet false

hive.exec.compress.output false

If spark.sql.hive.convertMetastoreParquet is not set to false, the inventory preview in the front end displays garbled content.

Because the Parquet file format has built-in compression, the output does not need to be compressed again; if compression is enabled, the inventory download function fails.
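The two settings above are plain key/value pairs in spark-defaults.conf. A minimal sketch of the resulting file section:

```properties
# $SPARK_HOME/conf/spark-defaults.conf
# Keep Hive Parquet tables readable by the front-end preview
spark.sql.hive.convertMetastoreParquet  false
# Parquet is already compressed internally, so do not compress output again
hive.exec.compress.output               false
```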

2 Modify the spark-env.sh file in the $SPARK_HOME/conf directory and set the following parameters:

SPARK_EXECUTOR_INSTANCES=11

SPARK_EXECUTOR_CORES=2

SPARK_EXECUTOR_MEMORY=1g

SPARK_DRIVER_MEMORY=3g

Configure these values as needed. If the settings consume exactly all of the available memory, no memory is left over for other tasks to run.
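Put together, the spark-env.sh fragment might look like the following sketch; the values are the examples from this section, not recommendations:

```shell
# $SPARK_HOME/conf/spark-env.sh (example values from this article)
SPARK_EXECUTOR_INSTANCES=11   # executors requested from YARN
SPARK_EXECUTOR_CORES=2        # CPU cores per executor
SPARK_EXECUTOR_MEMORY=1g      # memory shared by one executor's cores
SPARK_DRIVER_MEMORY=3g        # memory for the driver (Thrift server) process
```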

2.1 Parameter SPARK_EXECUTOR_INSTANCES

This parameter determines how many executor instances can run at the same time in the YARN cluster. The number of executors actually started by YARN is less than or equal to this value. If you cannot determine the maximum number of executors that can be started, it is recommended to set this value reasonably large at first.

2.2 SPARK_EXECUTOR_CORES

This parameter sets the number of CPU cores that each executor can use.

The YARN cluster can run at most SPARK_EXECUTOR_INSTANCES × SPARK_EXECUTOR_CORES tasks in parallel. SPARK_EXECUTOR_CORES is typically set to 2.

That is, with SPARK_EXECUTOR_INSTANCES=11 and SPARK_EXECUTOR_CORES=2, the maximum number of parallel tasks is 22.
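The parallelism arithmetic above can be checked directly; this is just the formula instances × cores applied to the example values:

```shell
# Maximum parallel tasks = executor instances x cores per executor
SPARK_EXECUTOR_INSTANCES=11
SPARK_EXECUTOR_CORES=2
MAX_PARALLEL_TASKS=$((SPARK_EXECUTOR_INSTANCES * SPARK_EXECUTOR_CORES))
echo "$MAX_PARALLEL_TASKS"   # prints 22
```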

2.3 SPARK_EXECUTOR_MEMORY

This parameter sets the amount of memory allocated to each executor. Note that this memory is shared by all the cores set in SPARK_EXECUTOR_CORES.

In the example above, the 2 CPU cores share the 1 GB of memory.

2.4 SPARK_DRIVER_MEMORY

This parameter sets the amount of memory allocated to the driver, that is, the memory allocated to the Thrift server on the machine where start-thriftserver.sh is executed.

3 yarn.nodemanager.resource.memory-mb

In the yarn-site.xml file in the $HADOOP_HOME/etc/hadoop directory, the parameter yarn.nodemanager.resource.memory-mb configures the amount of physical memory, in megabytes, that YARN can use on each machine.

If you find that the memory used by the cluster is significantly less than the physical memory of the machines, you can increase this parameter.
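The property is set in yarn-site.xml like the sketch below; the 8192 MB value is a hypothetical example, not a recommendation:

```xml
<!-- $HADOOP_HOME/etc/hadoop/yarn-site.xml -->
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <!-- physical memory (MB) YARN may use on this node; example value -->
  <value>8192</value>
</property>
```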

4 spark.yarn.executor.memoryOverhead

This parameter specifies the amount of additional memory each executor receives on top of its allocated memory; by default it is 7% of the executor memory.
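As a sketch of how the default works out for the 1 GB executors above, assuming the formula used by older Spark-on-YARN versions (overhead = 7% of executor memory, with a 384 MB minimum):

```shell
# Overhead for a 1 GB executor under the assumed default formula
EXECUTOR_MEMORY_MB=1024                        # SPARK_EXECUTOR_MEMORY=1g
OVERHEAD_MB=$((EXECUTOR_MEMORY_MB * 7 / 100))  # 7% = 71 MB
[ "$OVERHEAD_MB" -lt 384 ] && OVERHEAD_MB=384  # minimum floor applies
echo "$OVERHEAD_MB"                            # prints 384
```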
