Hive provides a number of built-in functions, which can be listed from the command line with SHOW FUNCTIONS. Most of their names are the same as, or very close to, their MySQL counterparts, and DESCRIBE FUNCTION function_name shows how a given function is used.

Hive supports simple atomic data types: INT (4-byte integer), BIGINT (8-byte integer), FLOAT (single precision), DOUBLE (double precision), BOOLEAN, STRING, and so on. There is no native datetime type, but functions such as to_date, unix_timestamp, datediff, date_add, and date_sub cover the same complex date and time operations as MySQL. For example:

SELECT * FROM tablename WHERE to_date(cz_time) > to_date('2050-12-31');
SELECT * FROM tablename WHERE unix_timestamp(cz_time) > unix_timestamp('2050-12-31 15:32:28');

Partitioning in Hive also differs somewhat from MySQL. MySQL partitions a table by fields taken from the table structure (range, list, hash, and so on), whereas a Hive partition column is declared separately from the data columns: it still behaves as a column of the table, but it is not stored in the data files, and its value is specified manually when data is loaded.
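For example, to browse the built-in functions and check how one of them is used (a minimal sketch of a Hive CLI session; to_date is just an arbitrary built-in chosen for illustration):

hive> SHOW FUNCTIONS;
hive> DESCRIBE FUNCTION to_date;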
Create a table
hive> CREATE TABLE pokes (foo INT, bar STRING COMMENT 'This is bar');
Create a table with a partition column ds
hive> CREATE TABLE invites (foo INT, bar STRING) PARTITIONED BY (ds STRING);
Show all tables
hive> SHOW TABLES;
List tables matching a regular expression
hive> SHOW TABLES '.*s';
Add a column to a table
hive> ALTER TABLE pokes ADD COLUMNS (new_col INT);
Add a column with a field comment
hive> ALTER TABLE invites ADD COLUMNS (new_col2 INT COMMENT 'a comment');
Rename a table
hive> ALTER TABLE events RENAME TO 3koobecaf;
Drop a table
hive> DROP TABLE pokes;
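Note that DROP TABLE removes the entire table. Hive has no direct DROP COLUMN; to remove a single column you can instead redefine the column list with REPLACE COLUMNS (a sketch against the pokes table above; only the columns named are kept):

hive> ALTER TABLE pokes REPLACE COLUMNS (foo INT, bar STRING);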
Metadata storage
To load data from a local file into a table
hive> LOAD DATA LOCAL INPATH './examples/files/kv1.txt' OVERWRITE INTO TABLE pokes;
Load local data, given partition information
hive> LOAD DATA LOCAL INPATH './examples/files/kv2.txt' OVERWRITE INTO TABLE invites PARTITION (ds='2008-08-15');
Load HDFS data, given partition information
hive> LOAD DATA INPATH '/user/myname/kv2.txt' OVERWRITE INTO TABLE invites PARTITION (ds='2008-08-15');
The above command loads data from an HDFS file or directory into the table. Note that loading from HDFS moves the file or directory rather than copying it, which is why the operation is almost instantaneous.
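Both effects can be checked from the Hive CLI itself (a quick sketch; dfs runs an HDFS command from inside Hive, and /user/myname is the source path used above):

hive> dfs -ls /user/myname/;
hive> SHOW PARTITIONS invites;

The first listing should no longer show kv2.txt, since the file was moved; the second should list the ds=2008-08-15 partition.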
SQL operations
Query by partition
hive> SELECT a.foo FROM invites a WHERE a.ds='2008-08-15';
Output query results to an HDFS directory
hive> INSERT OVERWRITE DIRECTORY '/tmp/hdfs_out' SELECT a.* FROM invites a WHERE a.ds='2008-08-15';
Output query results to a local directory
hive> INSERT OVERWRITE LOCAL DIRECTORY '/tmp/local_out' SELECT a.* FROM pokes a;
More examples: writing query results to tables, HDFS directories, and local directories
hive> INSERT OVERWRITE TABLE events SELECT a.* FROM profiles a;
hive> INSERT OVERWRITE TABLE events SELECT a.* FROM profiles a WHERE a.key < 100;
hive> INSERT OVERWRITE LOCAL DIRECTORY '/tmp/reg_3' SELECT a.* FROM events a;
hive> INSERT OVERWRITE DIRECTORY '/tmp/reg_4' SELECT a.invites, a.pokes FROM profiles a;
hive> INSERT OVERWRITE DIRECTORY '/tmp/reg_5' SELECT COUNT(1) FROM invites a WHERE a.ds='2008-08-15';
hive> INSERT OVERWRITE DIRECTORY '/tmp/reg_5' SELECT a.foo, a.bar FROM invites a;
hive> INSERT OVERWRITE LOCAL DIRECTORY '/tmp/sum' SELECT SUM(a.pc) FROM pc1 a;
Insert one table's aggregated results into another table (two equivalent forms)
hive> FROM invites a INSERT OVERWRITE TABLE events SELECT a.bar, COUNT(1) WHERE a.foo > 0 GROUP BY a.bar;
hive> INSERT OVERWRITE TABLE events SELECT a.bar, COUNT(1) FROM invites a WHERE a.foo > 0 GROUP BY a.bar;
JOIN
hive> FROM pokes t1 JOIN invites t2 ON (t1.bar = t2.bar) INSERT OVERWRITE TABLE events SELECT t1.bar, t1.foo, t2.foo;
Insert data from one source table into multiple tables and directories in a single pass
FROM src
INSERT OVERWRITE TABLE dest1 SELECT src.* WHERE src.key < 100
INSERT OVERWRITE TABLE dest2 SELECT src.key, src.value WHERE src.key >= 100 AND src.key < 200
INSERT OVERWRITE TABLE dest3 PARTITION (ds='2008-04-08', hr='12') SELECT src.key WHERE src.key >= 200 AND src.key < 300
INSERT OVERWRITE LOCAL DIRECTORY '/tmp/dest4.out' SELECT src.value WHERE src.key >= 300;
Stream table data through an external script (map-side streaming)
hive> FROM invites a INSERT OVERWRITE TABLE events SELECT TRANSFORM (a.foo, a.bar) AS (oof, rab) USING '/bin/cat' WHERE a.ds > '2008-08-09';
This streams the data in the map phase through the script /bin/cat (much like Hadoop streaming). Similarly, streaming can be used on the reduce side (see the Hive Tutorial for examples).
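A reduce-side sketch adapted from the Hive Tutorial pattern, using only the invites table above (/bin/cat stands in for real map and reduce scripts; CLUSTER BY routes the mapped rows to reducers by key):

FROM (
  FROM invites a
  SELECT TRANSFORM (a.foo, a.bar) USING '/bin/cat' AS (oof, rab)
  CLUSTER BY oof
) mapped
INSERT OVERWRITE TABLE events
SELECT TRANSFORM (mapped.oof, mapped.rab) USING '/bin/cat' AS (foo, bar);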
Practical examples
Create a table
CREATE TABLE u_data (
  userid INT,
  movieid INT,
  rating INT,
  unixtime STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE;
Download the sample data file and unzip
wget http://www.grouplens.org/system/files/ml-data.tar__0.gz
tar xvzf ml-data.tar__0.gz
Loading data into a table
LOAD DATA LOCAL INPATH 'ml-data/u.data'
OVERWRITE INTO TABLE u_data;
Count the total number of rows
SELECT COUNT(1) FROM u_data;
Now we can do some more complex data analysis.
Create a file weekday_mapper.py that splits the data by weekday:
import sys
import datetime

for line in sys.stdin:
    line = line.strip()
    userid, movieid, rating, unixtime = line.split('\t')
    # Derive the day of the week from the unix timestamp
    weekday = datetime.datetime.fromtimestamp(float(unixtime)).isoweekday()
    print('\t'.join([userid, movieid, rating, str(weekday)]))
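The script can be sanity-checked locally before handing it to Hive (a quick sketch, run from the directory where ml-data was unpacked):

cat ml-data/u.data | python weekday_mapper.py | head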
Using the mapping script
Create a table whose rows are split into fields by the tab delimiter
CREATE TABLE u_data_new (
  userid INT,
  movieid INT,
  rating INT,
  weekday INT)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t';
Add the Python script to Hive's distributed cache
ADD FILE weekday_mapper.py;
Split the data by weekday
INSERT OVERWRITE TABLE u_data_new
SELECT
  TRANSFORM (userid, movieid, rating, unixtime)
  USING 'python weekday_mapper.py'
  AS (userid, movieid, rating, weekday)
FROM u_data;
SELECT weekday, COUNT(1)
FROM u_data_new
GROUP BY weekday;