Common SQL command operations for Hive


Hive provides a number of built-in functions. Running SHOW FUNCTIONS on the command line lists them all, and you will find that the names are very similar to MySQL's, the vast majority identical; DESCRIBE FUNCTION functionname shows how a given function is used. The data types Hive supports are simple atomic types: INT (4-byte integer), BIGINT (8-byte integer), FLOAT (single precision), DOUBLE (double precision), BOOLEAN, STRING, and so on. There is not even a datetime type, but functions such as to_date, unix_timestamp, datediff, date_add, and date_sub cover the same complex time and date operations as MySQL. For example:

SELECT * FROM tablename WHERE to_date(cz_time) > to_date('2050-12-31');

SELECT * FROM tablename WHERE unix_timestamp(cz_time) > unix_timestamp('2050-12-31 15:32:28');

Partitioning in Hive is somewhat different from MySQL. MySQL partitions on fields from the table structure (range, list, hash, and so on), whereas a Hive partition column is specified manually, is independent of the table structure, yet behaves as a column of the table; the partition is given explicitly when the data is loaded.
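A minimal sketch of looking up and composing these functions in the CLI (tablename and cz_time are the hypothetical names from the examples above):

hive> SHOW FUNCTIONS;

hive> DESCRIBE FUNCTION to_date;

hive> SELECT datediff('2050-12-31', cz_time), date_add(cz_time, 7), date_sub(cz_time, 7) FROM tablename;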

Create a table

hive> CREATE TABLE pokes (foo INT, bar STRING COMMENT 'This is bar');

Create a table with a partition column ds

hive> CREATE TABLE invites (foo INT, bar STRING) PARTITIONED BY (ds STRING);

Show All Tables

hive> SHOW TABLES;

Show tables matching a regular expression

hive> SHOW TABLES '.*s';

Add a column to a table

hive> ALTER TABLE pokes ADD COLUMNS (new_col INT);

Add a column with a field comment

hive> ALTER TABLE invites ADD COLUMNS (new_col2 INT COMMENT 'a comment');

Change table name

hive> ALTER TABLE events RENAME TO 3koobecaf;

Drop a table

hive> DROP TABLE pokes;
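Individual partitions can also be dropped without dropping the whole table; a minimal sketch against the invites table created above (the partition value is illustrative):

hive> ALTER TABLE invites DROP PARTITION (ds='2008-08-15');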

Metadata storage

Hive keeps table and partition definitions in a metastore; by default this is an embedded Derby database on the local machine, and a standalone relational database can be configured instead for shared use.

To load data from a local file into a table

hive> LOAD DATA LOCAL INPATH './examples/files/kv1.txt' OVERWRITE INTO TABLE pokes;

Load local data with partition information specified

hive> LOAD DATA LOCAL INPATH './examples/files/kv2.txt' OVERWRITE INTO TABLE invites PARTITION (ds='2008-08-15');

Load HDFS data with partition information specified

hive> LOAD DATA INPATH '/user/myname/kv2.txt' OVERWRITE INTO TABLE invites PARTITION (ds='2008-08-15');

The above command loads data from an HDFS file or directory into the table. Note that loading from HDFS moves the file or directory rather than copying it, which is why the operation is almost instantaneous.
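After a load, you can confirm which partitions the table now holds:

hive> SHOW PARTITIONS invites;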

SQL operations

Query by condition

hive> SELECT a.foo FROM invites a WHERE a.ds='2008-08-15';

Output query results to an HDFS directory

hive> INSERT OVERWRITE DIRECTORY '/tmp/hdfs_out' SELECT a.* FROM invites a WHERE a.ds='2008-08-15';
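The exported files can be inspected without leaving the Hive shell; the dfs command runs HDFS commands directly (the exact file names under the directory depend on the job):

hive> dfs -ls /tmp/hdfs_out;

hive> dfs -cat /tmp/hdfs_out/*;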

Output query results to a local directory

hive> INSERT OVERWRITE LOCAL DIRECTORY '/tmp/local_out' SELECT a.* FROM pokes a;

More examples: inserting query results into tables and directories

hive> INSERT OVERWRITE TABLE events SELECT a.* FROM profiles a;

hive> INSERT OVERWRITE TABLE events SELECT a.* FROM profiles a WHERE a.key < 100;

hive> INSERT OVERWRITE LOCAL DIRECTORY '/tmp/reg_3' SELECT a.* FROM events a;

hive> INSERT OVERWRITE DIRECTORY '/tmp/reg_4' SELECT a.invites, a.pokes FROM profiles a;

hive> INSERT OVERWRITE DIRECTORY '/tmp/reg_5' SELECT COUNT(1) FROM invites a WHERE a.ds='2008-08-15';

hive> INSERT OVERWRITE DIRECTORY '/tmp/reg_5' SELECT a.foo, a.bar FROM invites a;

hive> INSERT OVERWRITE LOCAL DIRECTORY '/tmp/sum' SELECT SUM(a.pc) FROM pc1 a;

Insert a table's statistical results into another table

hive> FROM invites a INSERT OVERWRITE TABLE events SELECT a.bar, COUNT(1) WHERE a.foo > 0 GROUP BY a.bar;

hive> INSERT OVERWRITE TABLE events SELECT a.bar, COUNT(1) FROM invites a WHERE a.foo > 0 GROUP BY a.bar;

JOIN

hive> FROM pokes t1 JOIN invites t2 ON (t1.bar = t2.bar) INSERT OVERWRITE TABLE events SELECT t1.bar, t1.foo, t2.foo;
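Hive also supports outer joins; the same query as a LEFT OUTER JOIN keeps the rows from pokes that have no match in invites:

hive> FROM pokes t1 LEFT OUTER JOIN invites t2 ON (t1.bar = t2.bar) INSERT OVERWRITE TABLE events SELECT t1.bar, t1.foo, t2.foo;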

Inserting data from one table into multiple tables and directories in a single query

FROM src
INSERT OVERWRITE TABLE dest1 SELECT src.* WHERE src.key < 100
INSERT OVERWRITE TABLE dest2 SELECT src.key, src.value WHERE src.key >= 100 AND src.key < 200
INSERT OVERWRITE TABLE dest3 PARTITION (ds='2008-04-08', hr='12') SELECT src.key WHERE src.key >= 200 AND src.key < 300
INSERT OVERWRITE LOCAL DIRECTORY '/tmp/dest4.out' SELECT src.value WHERE src.key >= 300;
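Because every INSERT clause shares the single FROM src scan, the source table is read only once no matter how many destinations are written, which is what makes this multi-insert form cheaper than running separate queries.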

Streaming data through an external script

hive> FROM invites a INSERT OVERWRITE TABLE events SELECT TRANSFORM(a.foo, a.bar) AS (oof, rab) USING '/bin/cat' WHERE a.ds > '2008-08-09';

This streams the data in the map phase through the script /bin/cat (like Hadoop Streaming). Similarly, streaming can be used on the reduce side (see the Hive Tutorial for examples).
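A sketch of the reduce-side variant (here /bin/cat again stands in for a real reducer script, and the output directory is illustrative); the CLUSTER BY clause routes rows with equal keys to the same reducer:

FROM (
  FROM invites a
  SELECT TRANSFORM(a.foo, a.bar) USING '/bin/cat' AS (oof, rab)
  CLUSTER BY oof) m
INSERT OVERWRITE DIRECTORY '/tmp/reduce_out'
SELECT TRANSFORM(m.oof, m.rab) USING '/bin/cat';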

Practical examples

Create a table

CREATE TABLE u_data (
  userid INT,
  movieid INT,
  rating INT,
  unixtime STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE;

Download the sample data file and unzip

wget http://www.grouplens.org/system/files/ml-data.tar__0.gz

tar xvzf ml-data.tar__0.gz

Loading data into a table

LOAD DATA LOCAL INPATH 'ml-data/u.data'
OVERWRITE INTO TABLE u_data;

Count the total number of records

SELECT COUNT(1) FROM u_data;

Now do some complicated data analysis.

Create a file weekday_mapper.py that maps each record to its day of the week

import sys
import datetime

for line in sys.stdin:
    line = line.strip()
    userid, movieid, rating, unixtime = line.split('\t')
    # generate the day of the week for each record
    weekday = datetime.datetime.fromtimestamp(float(unixtime)).isoweekday()
    print('\t'.join([userid, movieid, rating, str(weekday)]))

Using the mapping script

Create a table whose fields are split by the tab delimiter

CREATE TABLE u_data_new (
  userid INT,
  movieid INT,
  rating INT,
  weekday INT)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t';

Register the Python script with Hive

ADD FILE weekday_mapper.py;

Split the data by day of the week

INSERT OVERWRITE TABLE u_data_new
SELECT
  TRANSFORM (userid, movieid, rating, unixtime)
  USING 'python weekday_mapper.py'
  AS (userid, movieid, rating, weekday)
FROM u_data;

SELECT weekday, COUNT(1)
FROM u_data_new
GROUP BY weekday;
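To read the result with the busiest day first, the same aggregation can be sorted (a small variant of the query above):

SELECT weekday, COUNT(1) AS cnt
FROM u_data_new
GROUP BY weekday
ORDER BY cnt DESC;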
