Hive Learning Pathway (18): Shell Operations for Hive

Source: Internet
Author: User

I. Hive command line

1. Commands supported by Hive

Command                  Description

quit, exit               Use quit or exit to leave the interactive shell.
set <key>=<value>        Sets the value of a particular configuration variable. Note that if you misspell the variable name, the CLI will not show an error.
set                      Prints the list of configuration variables that are overridden by the user or by Hive.
set -v                   Prints all Hadoop and Hive configuration variables.
add FILE <file> <file>*  Adds one or more files to the list of resources.
add JAR <jarname>        Adds a JAR file to the list of resources.
list FILE                Lists all the files added to the distributed cache.
list FILE <file>*        Checks whether the given resources have already been added to the distributed cache.
! <cmd>                  Executes a shell command from the Hive shell.
dfs <dfs cmd>            Executes a dfs command from the Hive shell.
<query>                  Executes a Hive query and prints the results to standard output.
source <file>            Executes a script file inside the CLI.

2. Syntax

hive [-hiveconf x=y]* [<-i filename>]* [<-f filename>|<-e query-string>] [-S]

Description

1. -i <filename>: runs the HQL in the given file as initialization
2. -e <query-string>: executes the specified HQL from the command line
3. -f <filename>: executes an HQL script file
4. -v: verbose mode; echoes the executed HQL statements to the console
5. -p <port>: connects to Hive Server on the given port number
6. -hiveconf x=y: sets a Hive/Hadoop configuration variable
7. -S: silent mode; log and progress output is not printed, only the results
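As a rough illustration of how these options combine, the sketch below runs one query in silent mode and saves the result to a file. The table name is reused from this article's query example; the LIMIT clause, the output file name result.txt, and the guard that skips the call when hive is not on the PATH are assumptions added for illustration.

```shell
# Illustrative query; cookie.cookie1 is the table from this article's example.
QUERY='SELECT * FROM cookie.cookie1 LIMIT 5;'

# -S : silent mode, so only the query result reaches standard output
# -e : take the HQL from the command line instead of a file
if command -v hive >/dev/null 2>&1; then
  hive -S -e "$QUERY" > result.txt
else
  echo "hive not found on PATH; skipping the query"
fi
```

Redirecting standard output works well with -S because the log lines are not mixed into the saved result.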

3. Examples

(1) Run a query

hive -e "SELECT * FROM cookie.cookie1;"

(2) Run a file

First write the HQL statements to a file such as hive.sql, then run that file with the -f option.
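The two steps can be sketched as follows. The file name hive.sql comes from the text; the file contents (the query from example (1) with an added LIMIT) and the guard around the hive call are assumptions for illustration.

```shell
# Write a small HQL script; the file name hive.sql follows the text above.
cat > hive.sql <<'EOF'
-- Statements in a script file are separated by semicolons.
SELECT * FROM cookie.cookie1 LIMIT 5;
EOF

# Run every statement in the file with -f.
if command -v hive >/dev/null 2>&1; then
  hive -f hive.sql
else
  echo "hive not found on PATH; hive.sql was written but not run"
fi
```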

(3) Run with a parameter file

Start Hive with a configuration file and load the configuration parameters from that file.
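A minimal sketch of this, assuming the text refers to the -i option described earlier. The file name hive-init.hql and the parameter values are hypothetical. The -e query is added only so that the command prints one of the loaded values and exits instead of opening the interactive shell.

```shell
# Hypothetical initialization file holding session parameters.
cat > hive-init.hql <<'EOF'
set hive.cli.print.header=true;
set hive.exec.reducers.max=50;
EOF

# -i runs the file first; -e then prints one of the values it set.
if command -v hive >/dev/null 2>&1; then
  hive -i hive-init.hql -e "set hive.exec.reducers.max;"
else
  echo "hive not found on PATH; hive-init.hql was written but not loaded"
fi
```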

II. Hive parameter configuration

1. Complete list of Hive configuration parameters

https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties

2. How to set Hive parameters

When developing Hive applications, you will inevitably need to set Hive parameters. Doing so lets you tune the execution efficiency of HQL code or helps you locate problems. In practice, however, a common question is why a parameter that was set has no effect. This is usually caused by setting it in the wrong way.

For general parameters, there are three ways to set them:

1. Configuration file (globally valid)

2. Command-line parameters (valid for the Hive instance being started)

3. Parameter declaration (valid for the Hive connection session)

(1) Configuration file

Configuration files for Hive include:

A. User-defined configuration file: $HIVE_CONF_DIR/hive-site.xml

B. Default configuration file: $HIVE_CONF_DIR/hive-default.xml

The user-defined configuration overrides the default configuration.

In addition, Hive reads the Hadoop configuration, because Hive starts as a Hadoop client, and Hive's configuration overrides Hadoop's configuration.

Configuration file settings are valid for all Hive processes started on the local machine.
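To make the file format concrete, here is a small sketch of a user-defined override. It writes the fragment to a scratch file named hive-site.example.xml (a hypothetical name) so that no live configuration is touched. The property hive.exec.reducers.max is discussed later in this article, and the value 50 is an arbitrary example.

```shell
# Write an example fragment to a scratch file instead of the live configuration.
cat > hive-site.example.xml <<'EOF'
<?xml version="1.0"?>
<configuration>
  <!-- Illustrative override; the value 50 is an arbitrary example. -->
  <property>
    <name>hive.exec.reducers.max</name>
    <value>50</value>
  </property>
</configuration>
EOF

# Show where the real file would live if HIVE_CONF_DIR is set.
echo "target: ${HIVE_CONF_DIR:-<HIVE_CONF_DIR not set>}/hive-site.xml"
```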

(2) Command-line parameters

When you start Hive (in client or server mode), you can add -hiveconf param=value on the command line to set a parameter.

This setting is valid for the session started by that command (in server mode, for all request sessions served by that instance).
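For instance, a startup command might look like the sketch below. The two parameters and their values are illustrative choices, not taken from the original example, and the -e query is added only so that the command prints a value and exits.

```shell
# Illustrative parameter settings passed at startup.
LOGGER='hive.root.logger=INFO,console'
MAX_REDUCERS='hive.exec.reducers.max=50'

if command -v hive >/dev/null 2>&1; then
  # Each -hiveconf takes one param=value pair; the option can be repeated.
  hive -hiveconf "$LOGGER" -hiveconf "$MAX_REDUCERS" \
       -e "set hive.exec.reducers.max;"
else
  echo "hive not found on PATH; skipping"
fi
```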

(3) Parameter declaration

Parameters can be set in HQL with the SET keyword.

The scope of this setting is also the session level.

set hive.exec.reducers.bytes.per.reducer=<bytes>  Average amount of data handled by each reduce task. Hive estimates the total amount of input data and divides it by this value to get the number of reduce tasks.

set hive.exec.reducers.max=<number>  Upper limit on the number of reduce tasks.

set mapreduce.job.reduces=<number>  A fixed number of reduce tasks.

However, Hive ignores mapreduce.job.reduces when the business logic dictates that only one reduce task can be used. For example, even with set mapreduce.job.reduces=3, a query that uses ORDER BY still runs with a single reduce task, and the setting is ignored.
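The estimate described above can be sketched as a small shell function. This is a simplified model of the text (total input divided by bytes per reducer, capped by the maximum), not Hive's actual implementation, which may apply further adjustments. The function name and the sample values are assumptions for illustration.

```shell
# estimate_reducers TOTAL_INPUT_BYTES BYTES_PER_REDUCER MAX_REDUCERS
# Simplified model: ceil(total / per_reducer), clamped to the range [1, max].
estimate_reducers() {
  total=$1
  per_reducer=$2
  max=$3
  count=$(( (total + per_reducer - 1) / per_reducer ))  # ceiling division
  if [ "$count" -lt 1 ]; then count=1; fi
  if [ "$count" -gt "$max" ]; then count=$max; fi
  echo "$count"
}

# 10 GB of input, 256 MB per reducer, cap of 1009 (illustrative values).
estimate_reducers 10000000000 256000000 1009   # prints 40
```

In this model, lowering the bytes-per-reducer value raises the estimate until the cap set by hive.exec.reducers.max is reached.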

The priority of the above three methods increases in that order. That is, a parameter declaration overrides command-line parameters, and command-line parameters override configuration file settings. Note that some system-level parameters, such as log4j-related settings, must be set in one of the first two ways, because those parameters are read before the session is established.
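One way to observe this ordering is to pass a value with -hiveconf and then override it with SET in the same session. In the sketch below, the script name precedence.hql and the values 50 and 20 are hypothetical. The first set statement should report the command-line value and the last one the declared value.

```shell
KEY='hive.exec.reducers.max'

# Session script: show the startup value, override it, then show it again.
cat > precedence.hql <<EOF
set ${KEY};
set ${KEY}=20;
set ${KEY};
EOF

if command -v hive >/dev/null 2>&1; then
  # -hiveconf supplies the command-line value that the script later overrides.
  hive -S -hiveconf "${KEY}=50" -f precedence.hql
else
  echo "hive not found on PATH; precedence.hql was written but not run"
fi
```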
