Hive Programming Guide _ Learning Note 01

Source: Internet
Author: User

The fourth chapter: HQL data definition
1: Create a database
Create DATABASE Financials;
Create database if not exists financials;


2: View Database
show databases;
Fuzzy Query Database
Show databases like ' h.* ';


3: Create database to modify the default location of the database
Create DATABASE Financials localtion '/my/preferred/directory '


4: Add descriptive information to the database
Create DATABASE Financials Comment ' holds all financials tables '
5: Information that shows the description of the database
Describe database financials;
6: Add key-value pair property information for some and related properties
Create DATABASE Financials
With Dbproperties (' create ' = ' Mark moneybags ', ' data ' = ' 2012-12-12 ');
Describe database extended financials;

7: There is no command prompt to let the user see the database that is currently located. Use can be reused
Use financials;
You can display the currently located database in the prompt by setting a property value
Set hive.cli.print.current.db = true;
Set hive.cli.print.current.db= false;

8: Delete Database
Drop database if exists financials;
Hive is not allowed to delete a database that contains a table.
If you add the keyword: cascade, it's ready, Hive automatically deletes the table in the database
Drop database if exists financials cascade;

9: Modify database, set Dbproperties key value pair property value
Alert Database Financials Set Dbproperties (' edited-by ' = ' Joe dba ');

10: CREATE TABLE:
CREATE table if not EXISTS employees (
Name string Comment ' Employee name ',
Salary float Comment ' Employee salary ',
Subordinates array<string> Comment ' Employee name of subordinates ',
Deductions Map<string,float>
Address struct<street:string,city:string,state:string,zip:int>
)
Comment ' Description of the table '
Tblproperties (' creater ' = ' me ', ' created_at ' = ' 2012-12-12 ');
Location '/user/hive/warehouse/mydb.db/employees '

--Tblproperties's main function is: key-value pairs in the format of the table to add additional documentation description

11: Enumerate the Tblproperties property information for a table
Show Tblproperties employees;

12: Copy Table
CREATE table if not exists mydb.employees2 like Mydb.employees2


13: Select Database
Use MyDB
Show Table
Show tables;
Show tables in MyDB;
14: View detailed result information for this table
Describe extended mydb.employees
Use the formatted keyword instead Extended
Describe formatted Mydb.employees

15: Management table: Internal table: Deletes the table's data when the table is deleted
Create an external table: it can read all the comma-separated data in the/data/stocks directory
Create external table if not existsStocks
Exchange String,
Symbol String,
Ymd String,
Price_open float,
Price_hight float,
Price_low float,
Price_close float,
Volume Int,price_adj_close float)
Row format delimited terminated by ', '
Location '/data/stocks '

16: See if the table is a management table or an external table
Describe extended tablename
Output information:
tabletype.managed_table--Management Table
tabletype.external_table--External Table

--Copy tables but not copy data
CREATE table if not EXISTS mydb.employees3 (new table)
Like Mydb.employees2 (original table) location '/data/stocks '


17: Create a partitioned table
CREATE TABLE Employees (
Name String,
Salary float,
Subordinates Array<string>
Deductions Map<string,float>
Address struct<street:string,city:string,state:string,zip:int>
)
Partitioned by (country string,state String);

Partition from segment:
Country String,state String is the same as a normal field, which is equivalent to an indexed field,
Query based on partitioned fields, improve query performance by submitting efficiency

18:set hive.mapred.mode=strict;
If you query a partitioned table and the WHERE clause is not filtered by partition,
It will be forbidden to submit this task.
Can be set to: Nostrict

19: View all partitions that exist in the table
Show partitions employees;

20: See if a partition with a specific partition key is stored
Show partitions employees partition (country= ' US ');
Describe Extended employees command also displays the partition key


The most common scenario for managing large production datasets: Using an external partition table
21: In the Management table, the user can create the partition by loading the data:
Load data local inpath '/home/hive/california-employees '
into table Employees
Partition (country= ' US ', state= ' CA ');


Hive will create the directory corresponding to this partition ..../employees/country=us/state=ca


22: Create external partition table

CREATE table if not exists log_messages (
HMS Int,
Severity string,
Server String,
process_id int,
Message string

)
Partitioned by (year int,month Int,day int)
Row format delimited fields terminated by ' \ t '





1:order by will sort the input globally

2:sort can control that each of the files produced by the reduce is sorted, and then a number of good files sorted two times merge sort.

The sort by features are as follows:
1). The sort by is basically affected by whether Hive.mapred.mode is strict, nonstrict, but partitioning is required if there is a partition.
2). The data in the sort by is sorted by the specified field in the same reduce.
3). Sort by can specify the number of reduce to be executed, for example: Set Mapred.reduce.tasks=5, and then perform a merge order on the output data, that is, all results can be obtained.

Result Description: In strict mode, sort by does not specify a limit number and can be executed normally. Sort by is less affected by hive.mapred.mode=sctrict.

3:distribute by
Distribute by IS to control how the map end is split to the reduce side. The hash algorithm is used by default for data distribution based on the number of columns behind distribute by and reduce. Distribute can use the length method to divide the lengths of string types into different reduce and eventually output to different files. Length is a built-in function, or you can specify other functions or this uses custom functions.

4:cluster by

Cluster by, in addition to the distribute by function, will also sort the field, so cluster by = Distribute by +sort by



































Hive Programming Guide _ Learning Note 01

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.