Hive notes (self-paced)

Last Update: 2016-05-05 Source: Internet

Author: User

Part I: Database Management

Create a database:

    CREATE DATABASE xxx;

List databases:

    SHOW DATABASES;

List databases whose names match a pattern:

    SHOW DATABASES LIKE 'de.*';

Explanation: Creating a database creates a directory in HDFS whose name ends in .db. The default path is /user/hive/warehouse/zqx.db (here for a database named zqx).

A different path can be specified when the database is created:

    CREATE DATABASE xxx
    LOCATION '/my/preferred/directory';

Explanation: To make later maintenance easier, a description can be added when the database is created:

    CREATE DATABASE xxx
    COMMENT 'This is my first database';

DESCRIBE DATABASE xxx then shows the description.

Drop a database:

    DROP DATABASE IF EXISTS xxx;

IF EXISTS is optional; it avoids an error when database xxx does not exist.

Explanation: Hive does not allow dropping a database that still contains tables. Either
1. drop all tables in the database first and then drop the database, or
2. add the keyword CASCADE to the DROP statement, which makes Hive drop all tables in the database first:

    DROP DATABASE IF EXISTS xxx CASCADE;

When a database is dropped, its physical path and files are deleted as well.

Part II: Table Management

Create a table:

    CREATE TABLE IF NOT EXISTS employees (
        emp_id       STRING COMMENT 'id',
        name         STRING COMMENT 'name',
        phone_number STRING COMMENT 'phone',
        depar_id     STRING COMMENT 'depart_id'
    )
    COMMENT 'employees_table'
    LOCATION '/user/hive/warehouse/zqx.db/employees';

IF NOT EXISTS is optional; if the table already exists, Hive skips the statement without any message.
LOCATION specifies the path of the table in HDFS.
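The directory naming described above can be sketched in a few lines of Python. This is only an illustration of the convention, assuming the default warehouse root /user/hive/warehouse quoted in these notes; the helper names default_db_path and default_table_path are hypothetical and are not part of Hive.

```python
import posixpath

# Default warehouse root quoted in the notes; a real cluster may configure another.
WAREHOUSE_ROOT = "/user/hive/warehouse"


def default_db_path(db_name: str) -> str:
    """Default directory of a database: <root>/<name>.db (hypothetical helper)."""
    return posixpath.join(WAREHOUSE_ROOT, db_name + ".db")


def default_table_path(db_name: str, table_name: str) -> str:
    """Directory of a table kept under its database directory (hypothetical helper)."""
    return posixpath.join(default_db_path(db_name), table_name)


print(default_db_path("zqx"))
print(default_table_path("zqx", "employees"))
```

The second path is the same one passed to LOCATION in the employees example, which is why that clause could be left out there without changing where the data lives.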
List the tables in a database:

    SHOW TABLES IN zqx;

View the table structure:

    DESC employees;

View a single column:

    DESC employees.phone_number;

Modify a table: ALTER TABLE

Copy the structure of an existing table:

    CREATE TABLE IF NOT EXISTS zqx.copy_table LIKE zqx.employees;

Explanation: The employees table created above is what Hive calls an "internal" (managed) table. When the table is dropped, its data is deleted too.

Explanation: Internal tables are inconvenient for sharing data with other tools. Suppose Pig is used to process the data: Pig needs the data that Hive uses, but it may not have been given the right to use data that Hive owns. In that case we can create an external table that points to this data, so that Hive can query it directly without owning it.

Create an external table:

    CREATE EXTERNAL TABLE IF NOT EXISTS departitions (
        depart_id   STRING COMMENT 'depart_id',
        depart_name STRING COMMENT 'departition name'
    )
    COMMENT 'departition name'
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
    LOCATION '/user/hive/warehouse/zqx.db/departitions';

The keyword EXTERNAL marks the table as an external table.
External table behavior: Hive does not consider itself the owner of the data, so dropping the table does not delete the data.
Check whether a table is external or internal:

    DESCRIBE EXTENDED departitions;

Internal table: tableType:MANAGED_TABLE
External table: tableType:EXTERNAL_TABLE

Partitioned table:

    CREATE TABLE departments (
        depart_id   STRING COMMENT 'depart id',
        depart_name STRING COMMENT 'depart name'
    )
    PARTITIONED BY (acct_month STRING)
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
    STORED AS TEXTFILE;

PARTITIONED BY specifies that the table is partitioned by acct_month.

View the partitions of a table:

    SHOW PARTITIONS departments;

View a specific partition:

    SHOW PARTITIONS departments PARTITION (acct_month = '201509');

Add a partition to a table:

    ALTER TABLE employees ADD PARTITION (acct_month = '201509');

Rename a table:

    ALTER TABLE employees RENAME TO employees_new;

Operations on table partitions (these apply only to tables created with PARTITIONED BY):

1. Add partitions:

    ALTER TABLE employees ADD
        PARTITION (acct_month = '201509') LOCATION 'XXXX'
        PARTITION (acct_month = '201510') LOCATION 'XXXX'
        ...;

Several partitions can be added in the same statement.

2. Drop a partition:

    ALTER TABLE employees DROP PARTITION (acct_month = '201509');

Add columns:

    ALTER TABLE employees ADD COLUMNS (
        alter_1 STRING COMMENT 'alter one',
        alter_2 STRING COMMENT 'alter'
    );

Explanation: Adding a Chinese comment to a field can fail because of inconsistent encodings. The text given in COMMENT '...' is saved to the MySQL metastore; when the encodings do not match, the saved text is garbled and an error is raised.

Fix 1: Convert the encoding of the metastore columns that hold comments to UTF-8:

(1) Column comments and table comments:

    ALTER TABLE COLUMNS_V2 MODIFY COLUMN COMMENT varchar(n) CHARACTER SET utf8;

    ALTER TABLE TABLE_PARAMS MODIFY COLUMN PARAM_VALUE varchar(4000) CHARACTER SET utf8;

(2) Partition comments:

    ALTER TABLE PARTITION_PARAMS MODIFY COLUMN PARAM_VALUE varchar(4000) CHARACTER SET utf8;

    ALTER TABLE PARTITION_KEYS MODIFY COLUMN PKEY_COMMENT varchar(4000) CHARACTER SET utf8;

These statements are run in MySQL against the metastore database, not in Hive.

Fix 2: Make the connection from Hive to MySQL use UTF-8 by setting the character encoding in the metastore JDBC connection string:

<property>
  <name></name>
  <value>jdbc:mysql://ip:3306/hive?createDatabaseIfNotExist=true&amp;characterEncoding=UTF-8</value>
  <description>JDBC connect string for a JDBC metastore</description>
</property>
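Why an encoding mismatch garbles a Chinese comment can be seen in a small Python sketch. It assumes, purely for illustration, that one side sends UTF-8 bytes while the other side interprets them as latin-1; the character set actually configured in a given MySQL installation may differ.

```python
comment = "姓名"  # a Chinese column comment meaning "name"

# Bytes sent by a client that uses UTF-8.
utf8_bytes = comment.encode("utf-8")

# Assumed mismatch: the same bytes read as latin-1 become unreadable text.
garbled = utf8_bytes.decode("latin-1")

# With the same encoding on both sides the text survives unchanged.
round_trip = utf8_bytes.decode("utf-8")

print(repr(garbled))
print(round_trip == comment)
```

Both fixes above aim at the same thing: making the storage side and the connection side agree on UTF-8 so that the round trip is lossless.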

Part III: Data loading

Explanation: At present Hive does not support row-level inserts, updates, or deletes. The only way to get data into a table is a bulk load operation that loads files into the table.

    LOAD DATA LOCAL INPATH '<directory containing the files>'
    OVERWRITE INTO TABLE employees
    PARTITION (acct_month = '201509');

If the target partition directory does not exist, it is created first. If employees is not a partitioned table, omit the PARTITION clause.

The LOCAL keyword: with LOCAL, the given path is a path on the local file system, and the data is copied to the target location. Without LOCAL, the path must be a path in the distributed file system, and the data is moved (not copied) from that path to the target location.

Summary:
    LOAD DATA LOCAL ...   copies local data to the target location in the distributed file system.
    LOAD DATA ...         moves data from one HDFS path to the target location.

Explanation: Hive does not verify that the loaded data matches the schema of the table. It does verify that the file format matches the table definition (for example, files loaded into a table stored as SEQUENCEFILE must be sequence files).

Part IV: Data export

Export query results to a local directory:

    INSERT OVERWRITE LOCAL DIRECTORY 'XXXX' SELECT * FROM xxx;

Load the exported file with sqlldr:

    sqlldr userid=mid/[email protected] control=tag_file.ctl
    sqlldr userid=mid/[email protected] control=sqlldr_ora.ctl direct=true parallel=true

    direct=true     does not show the progress of the load
    parallel=true   loads concurrently

Convert the file encoding:

    iconv -f UTF-8 -t GB18030 t_m_make_tag_file.txt1 -o t_m_make_tag_file.txt

Part V: Data query

LIMIT clause: limits the number of rows a query returns.

CASE ... WHEN ... THEN: used the same way as in Oracle.

When Hive avoids MapReduce

Query principle: For a query on the employees table, Hive reads the files under the storage path that corresponds to the employees table.
1. When the query conditions use only partition fields, MapReduce is not triggered. This is called local mode. Example:

    SELECT * FROM departments WHERE acct_month = '201509';

Here acct_month is the partition field.

2. When the property hive.exec.mode.local.auto is set to true, Hive also tries to use local mode for other operations.

Apart from these two cases, Hive triggers MapReduce to execute all queries.

Explanation: Hive starts one MapReduce job for each JOIN.

Explanation: About the order in which Hive executes joins, consider:

    SELECT a.ymd, a.price_close, b.price_close, c.price_close
    FROM stocks a
    JOIN stocks b ON a.ymd = b.ymd
    JOIN stocks c ON a.ymd = c.ymd
    WHERE a.symbol = 'AAPL'
      AND b.symbol = 'IBM'
      AND c.symbol = 'GE';

In this example, one MapReduce job is first started to join table a and table b; then another MapReduce job is started to join the output of the first job with table c. Hive executes the joins from left to right.

About joins

1. Optimization: In the statement above, tables a, b, and c are all joined on the same field, ymd. When three or more tables are joined and every ON clause uses the same join field, only one MapReduce job is produced.

2. Ordering principle: When executing a query, Hive assumes that the last table in the query is the largest one. While joining each row, it buffers the other tables and then streams the last table through. So put the table with the largest amount of data last.
3. Hint mechanism: It is not always necessary to place the largest table at the end of the query, because Hive provides a hint that tells it which table is the largest and should therefore be streamed. For example:

    SELECT /*+ STREAMTABLE(s) */ s.ymd, s.symbol, d.price_close
    FROM stocks s JOIN dividends d ON s.ymd = d.ymd;

Data type conversion: DOUBLE to STRING:

    cast(cast(user_id AS BIGINT) AS STRING)

Casting to BIGINT first drops the fractional part, so the resulting string contains only the integer digits.

Data types:

Data type	Size and range	Supported since
TINYINT	1 byte, -128 ~ 127
SMALLINT	2 bytes, -32,768 ~ 32,767
INT	4 bytes, -2,147,483,648 ~ 2,147,483,647
BIGINT	8 bytes, -9,223,372,036,854,775,808 ~ 9,223,372,036,854,775,807
BOOLEAN
FLOAT	4 bytes, single precision
DOUBLE	8 bytes, double precision
STRING
BINARY		Hive 0.8.0
TIMESTAMP		Hive 0.8.0
DECIMAL		Hive 0.11.0
CHAR		Hive 0.13.0
VARCHAR		Hive 0.12.0
DATE		Hive 0.12.0
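
The integer ranges in the table follow directly from the byte sizes: a signed two's-complement integer of n bytes covers -2^(8n-1) to 2^(8n-1)-1. A short Python check of that arithmetic (the function name signed_range is made up for this sketch):

```python
from typing import Tuple


def signed_range(num_bytes: int) -> Tuple[int, int]:
    """Smallest and largest value of a signed two's-complement integer of this size."""
    bits = 8 * num_bytes
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


# Byte sizes taken from the table above.
for name, size in [("TINYINT", 1), ("SMALLINT", 2), ("INT", 4), ("BIGINT", 8)]:
    low, high = signed_range(size)
    print(f"{name}: {low:,} ~ {high:,}")
```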
