Spark SQL: Using the Spark SQL CLI


Spark SQL CLI Description

The Spark SQL CLI makes it easy to query Hive directly through the Hive Metastore from Spark SQL. Note that in the current version, the Spark SQL CLI cannot be used to talk to the Thrift JDBC server.

Note: when using the Spark SQL CLI, the hive-site.xml configuration file needs to be copied into the $SPARK_HOME/conf directory.
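A minimal sketch of that step, assuming Hive's configuration lives under $HIVE_HOME/conf (adjust the source path to your installation):

# copy Hive's metastore configuration so spark-sql can find the Hive Metastore
cp $HIVE_HOME/conf/hive-site.xml $SPARK_HOME/conf/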

The Spark SQL CLI command-line options are described below:

cd $SPARK_HOME/bin
spark-sql --help
Usage: ./bin/spark-sql [options] [cli option]
Spark assembly has been built with Hive, including Datanucleus jars on classpath

Options:
  --master MASTER_URL         spark://host:port, mesos://host:port, yarn, or local.
  --deploy-mode DEPLOY_MODE   Whether to launch the driver program locally ("client") or
                              on one of the worker machines inside the cluster ("cluster")
                              (Default: client).
  --class CLASS_NAME          Your application's main class (for Java / Scala apps).
  --name NAME                 A name of your application.
  --jars JARS                 Comma-separated list of local jars to include on the driver
                              and executor classpaths.
  --py-files PY_FILES         Comma-separated list of .zip, .egg, or .py files to place
                              on the PYTHONPATH for Python apps.
  --files FILES               Comma-separated list of files to be placed in the working
                              directory of each executor.
  --conf PROP=VALUE           Arbitrary Spark configuration property.
  --properties-file FILE      Path to a file from which to load extra properties. If not
                              specified, this will look for conf/spark-defaults.conf.
  --driver-memory MEM         Memory for driver (e.g. 1000M, 2G) (Default: 512M).
  --driver-java-options       Extra Java options to pass to the driver.
  --driver-library-path       Extra library path entries to pass to the driver.
  --driver-class-path         Extra class path entries to pass to the driver. Note that
                              jars added with --jars are automatically included in the
                              classpath.
  --executor-memory MEM       Memory per executor (e.g. 1000M, 2G) (Default: 1G).
  --help, -h                  Show this help message and exit.
  --verbose, -v               Print additional debug output.

 Spark standalone with cluster deploy mode only:
  --driver-cores NUM          Cores for driver (Default: 1).
  --supervise                 If given, restarts the driver on failure.

 Spark standalone and Mesos only:
  --total-executor-cores NUM  Total cores for all executors.

 YARN-only:
  --executor-cores NUM        Number of cores per executor (Default: 1).
  --queue QUEUE_NAME          The YARN queue to submit to (Default: "default").
  --num-executors NUM         Number of executors to launch (Default: 2).
  --archives ARCHIVES         Comma separated list of archives to be extracted into the
                              working directory of each executor.

CLI options:
 -d,--define <key=value>          Variable substitution to apply to hive
                                  commands. e.g. -d A=B or --define A=B
    --database <databasename>     Specify the database to use
 -e <quoted-query-string>         SQL from command line
 -f <filename>                    SQL from files
 -h <hostname>                    Connecting to Hive Server on remote host
    --hiveconf <property=value>   Use value for given property
    --hivevar <key=value>         Variable substitution to apply to hive
                                  commands. e.g. --hivevar A=B
 -i <filename>                    Initialization SQL file
 -p <port>                        Connecting to Hive Server on port number
 -S,--silent                      Silent mode in interactive shell
 -v,--verbose                     Verbose mode (echo executed SQL to the console)

When you start spark-sql without specifying --master, it runs in local mode; --master can be set either to a standalone master address or to yarn, as sketched below.
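A quick sketch of the three launch forms (the standalone master URL is the one used later in this article):

spark-sql                                   # no --master: runs in local mode
spark-sql --master spark://hadoop000:7077   # standalone cluster
spark-sql --master yarn                     # YARN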

When --master is set to yarn (spark-sql --master yarn), you can monitor the entire job execution through the YARN ResourceManager web UI at http://hadoop000:8088;

Note: if spark.master is configured as spark://hadoop000:7077 in $SPARK_HOME/conf/spark-defaults.conf, then starting spark-sql without specifying --master also runs on the standalone cluster.
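For reference, the corresponding spark-defaults.conf entry would look like this (a sketch using the master URL from the note above):

# $SPARK_HOME/conf/spark-defaults.conf
spark.master    spark://hadoop000:7077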

Spark-sql Usage

Start spark-sql: since spark.master spark://hadoop000:7077 is already configured in spark-defaults.conf, I did not specify --master when starting spark-sql.

cd $SPARK_HOME/bin
spark-sql
SELECT track_time, url, session_id, referer, ip, end_user_id, city_id FROM page_views WHERE city_id = -10;
SELECT session_id, count(*) FROM page_views GROUP BY session_id;

The two SQL statements above assume the page_views table already exists in Hive. If it has not been created yet, the table creation and data import scripts are as follows:

-- columns match the SELECT statements above; string types are assumed
create table page_views(
  track_time string,
  url string,
  session_id string,
  referer string,
  ip string,
  end_user_id string,
  city_id string
) row format delimited fields terminated by '\t';

load data local inpath '/home/spark/software/data/page_views.dat' into table page_views;
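To sanity-check the load, you can also run a query non-interactively with the -e option listed in the help output above (a sketch; the count depends on your data file):

spark-sql -e "SELECT count(*) FROM page_views"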
