Spark SQL: Using the Spark SQL CLI


Spark SQL CLI Description

The Spark SQL CLI makes it easy to query Hive directly through the Hive Metastore from Spark SQL. Note that in the current version, the Spark SQL CLI cannot be used to talk to the Thrift JDBC server.

Note: when using the Spark SQL CLI, the hive-site.xml configuration file needs to be copied into the $SPARK_HOME/conf directory.
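A minimal sketch of that step, assuming Hive's configuration lives under $HIVE_HOME/conf (adjust the source path to your installation):

# copy Hive's metastore configuration so spark-sql can find the Hive Metastore
cp $HIVE_HOME/conf/hive-site.xml $SPARK_HOME/conf/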

The Spark SQL CLI command-line options are described below:

cd $SPARK_HOME/bin
spark-sql --help
Usage: ./bin/spark-sql [options] [cli option]
Spark assembly has been built with Hive, including Datanucleus jars on classpath

Options:
  --master MASTER_URL         spark://host:port, mesos://host:port, yarn, or local.
  --deploy-mode DEPLOY_MODE   Whether to launch the driver program locally ("client") or
                              on one of the worker machines inside the cluster ("cluster")
                              (Default: client).
  --class CLASS_NAME          Your application's main class (for Java / Scala apps).
  --name NAME                 A name of your application.
  --jars JARS                 Comma-separated list of local jars to include on the driver
                              and executor classpaths.
  --py-files PY_FILES         Comma-separated list of .zip, .egg, or .py files to place
                              on the PYTHONPATH for Python apps.
  --files FILES               Comma-separated list of files to be placed in the working
                              directory of each executor.
  --conf PROP=VALUE           Arbitrary Spark configuration property.
  --properties-file FILE      Path to a file from which to load extra properties. If not
                              specified, this will look for conf/spark-defaults.conf.
  --driver-memory MEM         Memory for driver (e.g. 1000M, 2G) (Default: 512M).
  --driver-java-options       Extra Java options to pass to the driver.
  --driver-library-path       Extra library path entries to pass to the driver.
  --driver-class-path         Extra class path entries to pass to the driver. Note that
                              jars added with --jars are automatically included in the
                              classpath.
  --executor-memory MEM       Memory per executor (e.g. 1000M, 2G) (Default: 1G).
  --help, -h                  Show this help message and exit.
  --verbose, -v               Print additional debug output.

 Spark standalone with cluster deploy mode only:
  --driver-cores NUM          Cores for driver (Default: 1).
  --supervise                 If given, restarts the driver on failure.

 Spark standalone and Mesos only:
  --total-executor-cores NUM  Total cores for all executors.

 YARN-only:
  --executor-cores NUM        Number of cores per executor (Default: 1).
  --queue QUEUE_NAME          The YARN queue to submit to (Default: "default").
  --num-executors NUM         Number of executors to launch (Default: 2).
  --archives ARCHIVES         Comma separated list of archives to be extracted into the
                              working directory of each executor.

CLI options:
 -d,--define <key=value>          Variable substitution to apply to hive
                                  commands. e.g. -d A=B or --define A=B
    --database <databasename>     Specify the database to use
 -e <quoted-query-string>         SQL from command line
 -f <filename>                    SQL from files
 -h <hostname>                    Connecting to Hive Server on remote host
    --hiveconf <property=value>   Use value for given property
    --hivevar <key=value>         Variable substitution to apply to hive
                                  commands. e.g. --hivevar A=B
 -i <filename>                    Initialization SQL file
 -p <port>                        Connecting to Hive Server on port number
 -S,--silent                      Silent mode in interactive shell
 -v,--verbose                     Verbose mode (echo executed SQL to the console)

When you start spark-sql without specifying --master, it runs in local mode; --master can be set either to a standalone master address or to yarn, as sketched below.
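A quick sketch of the three launch forms (the standalone master URL is the one used later in this article):

spark-sql                                   # no --master: runs in local mode
spark-sql --master spark://hadoop000:7077   # standalone cluster
spark-sql --master yarn                     # YARN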

When --master is set to yarn (spark-sql --master yarn), you can monitor the entire job execution through the YARN ResourceManager web UI at http://hadoop000:8088;

Note: if spark.master is configured as spark://hadoop000:7077 in $SPARK_HOME/conf/spark-defaults.conf, then starting spark-sql without specifying --master also runs on the standalone cluster.
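For reference, the corresponding spark-defaults.conf entry would look like this (a sketch using the master URL from the note above):

# $SPARK_HOME/conf/spark-defaults.conf
spark.master    spark://hadoop000:7077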

Spark-sql Usage

Start spark-sql: since spark.master spark://hadoop000:7077 is already configured in spark-defaults.conf, I did not specify --master when starting spark-sql.

cd $SPARK_HOME/bin
spark-sql
SELECT track_time, url, session_id, referer, ip, end_user_id, city_id FROM page_views WHERE city_id = -10;
SELECT session_id, count(*) FROM page_views GROUP BY session_id;

The two SQL statements above assume the page_views table already exists in Hive. If it has not been created yet, the table creation and data import scripts are as follows:

-- columns match the SELECT statements above; string types are assumed
create table page_views(
  track_time string,
  url string,
  session_id string,
  referer string,
  ip string,
  end_user_id string,
  city_id string
) row format delimited fields terminated by '\t';

load data local inpath '/home/spark/software/data/page_views.dat' into table page_views;
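To sanity-check the load, you can also run a query non-interactively with the -e option listed in the help output above (a sketch; the count depends on your data file):

spark-sql -e "SELECT count(*) FROM page_views"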
