Spark SQL Performance Optimization

Performance Optimization Parameters

The main tuning parameters for Spark SQL performance can be set in code as follows:

Code example:

import java.util.List;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.api.java.JavaSQLContext;
import org.apache.spark.sql.api.java.Row;
import org.apache.spark.sql.hive.api.java.JavaHiveContext;

public class PerformanceTuneDemo {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("SimpleDemo").setMaster("local");
        conf.set("spark.sql.codegen", "false");                             // compile each query to Java bytecode on the fly
        conf.set("spark.sql.inMemoryColumnarStorage.compressed", "false");  // compress the in-memory columnar cache
        conf.set("spark.sql.inMemoryColumnarStorage.batchSize", "1000");    // records per cached batch (default: 1000)
        conf.set("spark.sql.parquet.compression.codec", "snappy");          // Parquet compression codec
        JavaSparkContext sc = new JavaSparkContext(conf);
        JavaSQLContext sqlCtx = new JavaSQLContext(sc);
        JavaHiveContext hiveCtx = new JavaHiveContext(sc);
        List<Row> result = hiveCtx.sql("SELECT foo, bar, name FROM pokes2 LIMIT 10").collect();
        for (Row row : result) {
            System.out.println(row.getString(0) + ", " + row.getString(1) + ", " + row.getString(2));
        }
    }
}
Setting optimization parameters from the Beeline command line:

beeline> set spark.sql.codegen=true;
SET spark.sql.codegen=true
spark.sql.codegen=true
Time taken: 1.196 seconds
Important Parameter Descriptions

spark.sql.codegen: when set to true, Spark SQL compiles each SQL query to Java bytecode before executing it. This can speed up long-running or frequently executed queries, because the generated bytecode is specialized for that query. However, for short (1-2 second) ad hoc queries it can add overhead, because every query must first be compiled.
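
As a rough sketch of when codegen pays off (reusing the Spark 1.x-era Java API and the pokes2 table from the example above; the class name and loop count are illustrative assumptions, not part of the original article), enabling it before running the same query many times lets the one-time compilation cost be amortized:

import java.util.List;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.api.java.Row;
import org.apache.spark.sql.hive.api.java.JavaHiveContext;

public class CodegenDemo {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("CodegenDemo").setMaster("local");
        conf.set("spark.sql.codegen", "true");  // pay the bytecode compilation cost once up front
        JavaSparkContext sc = new JavaSparkContext(conf);
        JavaHiveContext hiveCtx = new JavaHiveContext(sc);
        // The same query is executed repeatedly, so the compilation cost is amortized.
        for (int i = 0; i < 100; i++) {
            List<Row> rows = hiveCtx.sql("SELECT foo, bar, name FROM pokes2 LIMIT 10").collect();
            System.out.println("run " + i + ": " + rows.size() + " rows");
        }
        sc.stop();
    }
}

For a single short ad hoc query, the loop collapses to one execution and the compilation overhead dominates, which is why the article recommends leaving codegen off in that case.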

spark.sql.inMemoryColumnarStorage.batchSize: when caching SchemaRDDs, Spark SQL groups the records in the RDD into batches of the size given by this option (default: 1000) and compresses each batch. Very small batch sizes lead to poor compression, while very large batch sizes can also be problematic, because each batch might be too large to build in memory.
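
A minimal sketch of where this setting takes effect (again assuming the Spark 1.x-era Java API and the pokes2 table from the example above; the class name and the CACHE TABLE statement are illustrative assumptions), since the batch size only matters once a table is cached in the in-memory columnar format:

import java.util.List;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.api.java.Row;
import org.apache.spark.sql.hive.api.java.JavaHiveContext;

public class BatchSizeDemo {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("BatchSizeDemo").setMaster("local");
        // Smaller batches need less memory to build but compress worse;
        // larger batches compress better but may not fit in memory.
        conf.set("spark.sql.inMemoryColumnarStorage.batchSize", "1000");
        conf.set("spark.sql.inMemoryColumnarStorage.compressed", "true");
        JavaSparkContext sc = new JavaSparkContext(conf);
        JavaHiveContext hiveCtx = new JavaHiveContext(sc);
        // The batch size is only used when the table is cached in columnar form.
        hiveCtx.sql("CACHE TABLE pokes2");
        List<Row> rows = hiveCtx.sql("SELECT foo, bar, name FROM pokes2 LIMIT 10").collect();
        System.out.println(rows.size() + " rows read from the cached table");
        sc.stop();
    }
}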
