DDL operations
Create a table
hive> CREATE TABLE pokes (foo INT, bar STRING);
Create a table partitioned by the column ds
hive> CREATE TABLE invites (foo INT, bar STRING) PARTITIONED BY (ds STRING);
Show all tables
hive> SHOW TABLES;
Show tables whose names match a regular expression
hive> SHOW TABLES '.*s';
Add a column to the table
hive> ALTER TABLE pokes ADD COLUMNS (new_col INT);
Add a column with a column comment
hive> ALTER TABLE invites ADD COLUMNS (new_col2 INT COMMENT 'a comment');
Change table name
hive> ALTER TABLE events RENAME TO 3koobecaf;
Drop a table
hive> DROP TABLE pokes;
DML operations
Load the data in the file into the table
hive> LOAD DATA LOCAL INPATH './examples/files/kv1.txt' OVERWRITE INTO TABLE pokes;
Load local data and specify partition information
hive> LOAD DATA LOCAL INPATH './examples/files/kv2.txt' OVERWRITE INTO TABLE invites PARTITION (ds='2017-08-15');
Load HDFS data and specify partition information
hive> LOAD DATA INPATH '/user/myname/kv2.txt' OVERWRITE INTO TABLE invites PARTITION (ds='2017-08-15');
The command above loads data from an HDFS file or directory into the table. Note that loading from HDFS moves the file or directory rather than copying it, so the operation is almost instantaneous.
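Because the load is a move, the source path no longer contains the file afterwards. As a quick check (a minimal sketch; Hive's dfs command runs the equivalent hadoop fs command, and the path is the one from the example above):
hive> dfs -ls /user/myname/;
hive> SHOW PARTITIONS invites;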
SQL operations
Query a partition
hive> SELECT a.foo FROM invites a WHERE a.ds = '<date>';
Output query data to a directory
hive> INSERT OVERWRITE DIRECTORY '/tmp/hdfs_out' SELECT a.* FROM invites a WHERE a.ds = '<date>';
Output query results to local directory
hive> INSERT OVERWRITE LOCAL DIRECTORY '/tmp/local_out' SELECT a.* FROM pokes;
More examples: insert query results into tables, HDFS directories, and local directories
hive> INSERT OVERWRITE TABLE events SELECT a.* FROM profiles a;
hive> INSERT OVERWRITE TABLE events SELECT a.* FROM profiles a WHERE a.key < 100;
hive> INSERT OVERWRITE LOCAL DIRECTORY '/tmp/reg_3' SELECT a.* FROM events a;
hive> INSERT OVERWRITE DIRECTORY '/tmp/reg_4' SELECT a.invites, a.pokes FROM profiles a;
hive> INSERT OVERWRITE DIRECTORY '/tmp/reg_5' SELECT COUNT(1) FROM invites a WHERE a.ds = '<date>';
hive> INSERT OVERWRITE DIRECTORY '/tmp/reg_5' SELECT a.foo, a.bar FROM invites a;
hive> INSERT OVERWRITE LOCAL DIRECTORY '/tmp/sum' SELECT SUM(a.pc) FROM pc1 a;
Insert aggregated results from one table into another table
hive> FROM invites a INSERT OVERWRITE TABLE events SELECT a.bar, COUNT(1) WHERE a.foo > 0 GROUP BY a.bar;
hive> INSERT OVERWRITE TABLE events SELECT a.bar, COUNT(1) FROM invites a WHERE a.foo > 0 GROUP BY a.bar;
Join
hive> FROM pokes t1 JOIN invites t2 ON (t1.bar = t2.bar) INSERT OVERWRITE TABLE events SELECT t1.bar, t1.foo, t2.foo;
Insert data from one table into multiple tables and directories in a single query
FROM src
INSERT OVERWRITE TABLE dest1 SELECT src.* WHERE src.key < 100
INSERT OVERWRITE TABLE dest2 SELECT src.key, src.value WHERE src.key >= 100 AND src.key < 200
INSERT OVERWRITE TABLE dest3 PARTITION (ds='2008-04-08', hr='12') SELECT src.key WHERE src.key >= 200 AND src.key < 300
INSERT OVERWRITE LOCAL DIRECTORY '/tmp/dest4.out' SELECT src.value WHERE src.key >= 300;
Stream data through an external script
hive> FROM invites a INSERT OVERWRITE TABLE events SELECT TRANSFORM(a.foo, a.bar) AS (oof, rab) USING '/bin/cat' WHERE a.ds > '2017-08-09';
This streams the data in the map phase through the script /bin/cat (much like Hadoop streaming). Streaming can likewise be used on the reduce side (see the Hive tutorial for examples).
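As a minimal sketch of reduce-side streaming, adapted from the Hive tutorial (the trivial script /bin/cat is reused here, and the events table is assumed to accept the two output columns):
FROM (
  FROM invites a
  MAP a.foo, a.bar
  USING '/bin/cat'
  AS oof, rab
  CLUSTER BY oof) m
INSERT OVERWRITE TABLE events
REDUCE m.oof, m.rab
USING '/bin/cat'
AS oof, rab;
CLUSTER BY sends all rows with the same key to the same reduce-side invocation of the script.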
Actual Example
Create a table
CREATE TABLE u_data (
  userid INT,
  movieid INT,
  rating INT,
  unixtime STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE;
Download the sample data file and decompress it.
wget http://www.grouplens.org/system/files/ml-data.tar__0.gz
tar xvzf ml-data.tar__0.gz
Load data to a table
LOAD DATA LOCAL INPATH 'ml-data/u.data'
OVERWRITE INTO TABLE u_data;
Count the total number of rows
SELECT COUNT(1) FROM u_data;
Perform some complex data analysis
Create a file named weekday_mapper.py, which converts each record's unix timestamp into a day of the week.
import sys
import datetime

for line in sys.stdin:
    line = line.strip()
    userid, movieid, rating, unixtime = line.split('\t')
    # derive the day of the week from the unix timestamp
    weekday = datetime.datetime.fromtimestamp(float(unixtime)).isoweekday()
    print '\t'.join([userid, movieid, rating, str(weekday)])
Use the mapping script
-- create a table whose fields are delimited by tabs
CREATE TABLE u_data_new (
  userid INT,
  movieid INT,
  rating INT,
  weekday INT)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t';
-- make the Python script available to the query (it is shipped to the task nodes)
ADD FILE weekday_mapper.py;
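To confirm the script was registered (optional; LIST FILES simply prints the resources added to the current session):
LIST FILES;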
Split data by week
INSERT OVERWRITE TABLE u_data_new
SELECT
  TRANSFORM (userid, movieid, rating, unixtime)
  USING 'python weekday_mapper.py'
  AS (userid, movieid, rating, weekday)
FROM u_data;

SELECT weekday, COUNT(1)
FROM u_data_new
GROUP BY weekday;
Processing Apache weblog data
Parse Apache web logs with a regular-expression SerDe so that the required fields are extracted into table columns.
ADD JAR ../build/contrib/hive_contrib.jar;

CREATE TABLE apachelog (
  host STRING,
  identity STRING,
  user STRING,
  time STRING,
  request STRING,
  status STRING,
  size STRING,
  referer STRING,
  agent STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
  "input.regex" = "([^ ]*) ([^ ]*) ([^ ]*) (-|\\[[^\\]]*\\]) ([^ \"]*|\"[^\"]*\") (-|[0-9]*) (-|[0-9]*)(?: ([^ \"]*|\"[^\"]*\") ([^ \"]*|\"[^\"]*\"))?",
  "output.format.string" = "%1$s %2$s %3$s %4$s %5$s %6$s %7$s %8$s %9$s"
)
STORED AS TEXTFILE;
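Once a log file has been loaded into apachelog (not shown above), a simple aggregation such as the following sketch summarizes requests per HTTP status code:
SELECT status, COUNT(1) AS requests
FROM apachelog
GROUP BY status;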