Problem Description:
When using Spark, two kinds of errors sometimes occur as the amount of data grows:
java.lang.OutOfMemoryError: Java heap space
java.lang.OutOfMemoryError: GC overhead limit exceeded
I used to think these two errors meant the executor memory was not large enough, but a closer look showed that it was not the executor memory that was lacking; it was the driver memory that was insufficient. When a job is submitted with spark-submit in standalone client mode (with a standalone deployment, client mode is used by default for submitting jobs), our own program (the main class) is the driver. If no memory is explicitly assigned to the driver, it gets 512M by default. In that case, if the data being processed or loaded is large (I was loading data from Hive), the driver can run out of memory and the OOM errors above appear.
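To check how much heap the driver actually got, you can print the driver JVM's maximum heap size. The lines below are only a rough sketch: spark-shell itself runs as a driver in client mode, the master URL is just an example (you can omit it to run locally), and Runtime.maxMemory roughly corresponds to the -Xmx value that the driver memory setting controls.

# Sketch: start a spark-shell (which runs as a driver) and print its max heap in MB,
# then compare it with the value you expect from --driver-memory / spark.driver.memory.
echo 'println("driver max heap (MB): " + Runtime.getRuntime.maxMemory / 1024 / 1024)' \
  | ./spark-shell --master spark://master:7077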
Workaround:
Reference: http://spark.apache.org/docs/latest/configuration.html
Method One: Pass the --driver-memory <memSize> parameter to spark-submit to set the JVM memory size of the driver; the other parameters that can be set are listed by spark-submit --help.
e.g.
./spark-submit \
  --master spark://master:7077 \
  --class $MAIN_CLASS \
  --executor-memory 3G \
  --total-executor-cores 10 \
  --driver-memory 2g \
  --name $APP_NAME \
  --conf "spark.executor.extraJavaOptions=-XX:+PrintGCDetails -XX:+PrintGCTimeStamps" \
  $SPARK_APP_JAR
Method Two: Copy the spark-defaults.conf.template template file in the $SPARK_HOME/conf/ directory to $SPARK_HOME/conf/spark-defaults.conf, then set the spark.driver.memory <memSize> property in that file to change the driver memory size (the shell commands after the example below show this copy step).
e.g.
spark.master                     spark://master:7077
spark.default.parallelism
spark.driver.memory              2g
spark.serializer                 org.apache.spark.serializer.KryoSerializer
spark.sql.shuffle.partitions
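The copy-and-edit step described in Method Two can be done with commands like the following; this is only a sketch, assuming $SPARK_HOME points at your Spark installation and using 2g purely as an example value.

# Sketch: create spark-defaults.conf from the shipped template and append the driver memory setting.
cd $SPARK_HOME/conf
cp spark-defaults.conf.template spark-defaults.conf
echo "spark.driver.memory 2g" >> spark-defaults.conf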