Spark distributed SQL engine


I. Overview
In addition to entering the interactive execution environment using the Spark-SQL command, spark SQL can also use JDBC/ODBC or the command line interface for Distributed Query. In this mode, end users or applications can directly perform interactive SQL queries with Spark SQL without writing any scala code.

II. Using the Thrift JDBC server

Spark version: 1.4.0

YARN version: CDH 5.4.0

1. Preparations

Copy or symlink hive-site.xml into $SPARK_HOME/conf.
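A minimal sketch of that step, assuming the Hive client configuration lives under /etc/hive/conf (a typical CDH path; adjust to your installation):

ln -s /etc/hive/conf/hive-site.xml $SPARK_HOME/conf/hive-site.xml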

2. Run the script below from the Spark installation directory to start the Hive Thrift server. With no arguments it starts in local mode, occupying a single local JVM process.

sbin/start-thriftserver.sh

3. Start in yarn-client mode. In this setup the server listens on port 10001 (the port comes from hive.server2.thrift.port; the stock default is 10000).

sbin/start-thriftserver.sh --master yarn
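If you also want to size the executors or pin the listening port at startup, the same script accepts the usual spark-submit resource flags plus --hiveconf; a sketch with illustrative values (not taken from the original setup):

sbin/start-thriftserver.sh --master yarn --executor-memory 2g --hiveconf hive.server2.thrift.port=10001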

On the YARN web UI we can then see that 25 containers have been started.

Why does a JDBC service occupy so many resources? Because conf/spark-env.sh configures SPARK_EXECUTOR_INSTANCES to 24 executor instances, and yarn-client mode adds one more container for the YARN application master (the driver itself runs inside the thrift server process on the client).

export SPARK_EXECUTOR_INSTANCES=24

Looking at the processes on the YARN NodeManager nodes, the thrift server keeps these executors resident as processes named org.apache.spark.executor.CoarseGrainedExecutorBackend, ready to run tasks for subsequent SQL jobs at any time. The advantage is that Spark SQL queries do not pay the container startup cost; the price is that while the thrift server is idle these containers stay allocated and are not released to other Spark or MapReduce jobs.
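One way to observe this on a NodeManager host is a plain process listing (standard ps/grep, nothing Spark-specific):

ps -ef | grep CoarseGrainedExecutorBackend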

4. Use beeline to connect to the Spark SQL interactive engine

bin/beeline -u jdbc:hive2://localhost:10001 -n root -p root

Note: In non-secure Hadoop mode, the user name defaults to the current system user and the password may be empty or any value. In Kerberos-secured Hadoop mode, you must authenticate with a valid Kerberos principal and ticket to log in with beeline.
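For the Kerberos case, the server principal is carried in the JDBC URL rather than in -n/-p; a sketch with a hypothetical principal (replace it with your thrift server's principal, after obtaining a ticket with kinit):

bin/beeline -u "jdbc:hive2://localhost:10001/default;principal=hive/thriftserver-host@EXAMPLE.COM"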

III. Command-Line Help

1. Thrift server

The options accepted by the startup script can be printed with its --help flag: they are the standard spark-submit options (master URL, deploy mode, driver and executor resources, and so on), plus the Thrift server option --hiveconf property=value.
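A sketch of how to print that help from the same installation directory used above:

sbin/start-thriftserver.sh --help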

2. beeline


-u <database url>               the JDBC URL to connect to
-n <username>                   the username to connect as
-p <password>                   the password to connect as
-d <driver class>               the driver class to use
-e <query>                      query that should be executed
-f <file>                       script file that should be executed
--hiveconf property=value       use value for the given property
--hivevar name=value            Hive variable name and value; these are Hive-specific
                                settings that can be set at the session level and
                                referenced in Hive commands or queries
--color=[true/false]            control whether color is used for display
--showHeader=[true/false]       show column names in query results
--headerInterval=ROWS           the interval at which headers are displayed
--fastConnect=[true/false]      skip building table/column list for tab-completion
--autoCommit=[true/false]       enable/disable automatic transaction commit
--verbose=[true/false]          show verbose error messages and debug info
--showWarnings=[true/false]     display connection warnings
--showNestedErrs=[true/false]   display nested errors
--numberFormat=[pattern]        format numbers using a DecimalFormat pattern
--force=[true/false]            continue running a script even after errors
--maxWidth=MAXWIDTH             the maximum width of the terminal
--maxColumnWidth=MAXCOLWIDTH    the maximum width to use when displaying columns
--silent=[true/false]           be more silent
--autosave=[true/false]         automatically save preferences
--outputformat=[table/vertical/csv/tsv]   format mode for result display
--isolation=LEVEL               set the transaction isolation level
--nullemptystring=[true/false]  set to true to get the historic behavior of printing null as an empty string
--help                          display this message
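As a small usage sketch of these flags, a non-interactive query against the server started above (the table name is hypothetical):

bin/beeline -u jdbc:hive2://localhost:10001 -n root --outputformat=csv -e "SELECT COUNT(*) FROM some_table"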
