Performance Optimization Parameters
The tuning parameters for Spark SQL performance are as follows:
Code example
import java.util.List;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.api.java.JavaSQLContext;
import org.apache.spark.sql.api.java.Row;
import org.apache.spark.sql.hive.api.java.JavaHiveContext;

public class PerformanceTuneDemo {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("SimpleDemo").setMaster("local");
        conf.set("spark.sql.codegen", "false");
        conf.set("spark.sql.inMemoryColumnarStorage.compressed", "false");
        conf.set("spark.sql.inMemoryColumnarStorage.batchSize", "1000"); // the default batch size
        conf.set("spark.sql.parquet.compression.codec", "snappy");

        JavaSparkContext sc = new JavaSparkContext(conf);
        JavaSQLContext sqlCtx = new JavaSQLContext(sc);
        JavaHiveContext hiveCtx = new JavaHiveContext(sc);

        List<Row> result = hiveCtx.sql("SELECT foo, bar, name FROM pokes2 LIMIT 10").collect();
        for (Row row : result) {
            System.out.println(row.getString(0) + ", " + row.getString(1) + ", " + row.getString(2));
        }
    }
}
Setting optimization parameters from the Beeline command line
beeline> SET spark.sql.codegen=true;
SET spark.sql.codegen=true
spark.sql.codegen=true
Time taken: 1.196 seconds
Description of important parameters
spark.sql.codegen: when this option is enabled, Spark SQL compiles each query to Java bytecode on the fly. Codegen can speed up long-running or frequently executed queries, because it produces specialized bytecode for each one. For a short (1-2 second) ad hoc query, however, it can add overhead, since every query must first be compiled.
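The rule of thumb above (codegen pays off for long-running or repeated queries, but hurts short ad hoc ones) can be expressed as a small helper. This is a hypothetical sketch, not part of any Spark API; it only builds the key/value pair you would pass to SparkConf.set:

```java
import java.util.HashMap;
import java.util.Map;

public class CodegenPolicy {
    // Hypothetical helper: choose spark.sql.codegen based on the workload.
    // Per the text, compilation is worthwhile for long-running or frequently
    // executed queries, but adds overhead for 1-2 second ad hoc queries.
    public static Map<String, String> tuningFor(double expectedSeconds, boolean repeated) {
        Map<String, String> conf = new HashMap<>();
        boolean worthCompiling = repeated || expectedSeconds > 2.0;
        conf.put("spark.sql.codegen", Boolean.toString(worthCompiling));
        return conf;
    }

    public static void main(String[] args) {
        System.out.println(tuningFor(1.5, false));  // short ad hoc query: codegen off
        System.out.println(tuningFor(60.0, false)); // long-running query: codegen on
        System.out.println(tuningFor(1.0, true));   // short but repeated query: codegen on
    }
}
```

The 2-second threshold mirrors the "1-2 second" figure in the text and is illustrative, not a measured cutoff.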
spark.sql.inMemoryColumnarStorage.batchSize:
When caching SchemaRDDs, Spark SQL groups the records of the RDD into batches of the size given by this option (default: 1000) and compresses each batch. Very small batch sizes lead to poor compression; on the other hand, very large batch sizes can also be problematic, because each batch might be too large to build in memory.