Hive metadata Parsing

Source: Internet
Author: User
Tags serdes
Hive metadata parsing this article is a Hive metadata table prepared by the author. If there is any inaccuracy, please pat it. I will add it later. 1. hive0.11 meta data Table Summary online Hive0.11metastore includes the following 39 tables, mainly divided into the following categories: Database-related Table-related data storage-related SDSCOLUMN-related SERDE-related (serialization) P

Hive metadata parsing this article is a Hive metadata table prepared by the author. If there is any inaccuracy, please pat it. I will add it later. 1. hive 0.11 meta data Table Summary online Hive 0.11 metastore includes the following 39 tables, mainly divided into the following categories: Database-related Table-related data storage-related sds column-related SERDE-related (serialization) P

Hive metadata Parsing

This is the Hive metadata table that I have prepared. If there is any inaccuracy, pat it. I will add it later.

1. Hive 0.11 yuan data table Summary

Online Hive 0.11 metastore contains 39 tables, which are divided into the following types:

Database Problems

Table-related

SDS related to data storage

COLUMN-related

SERDE-related (serialization)

Partition)

SKEW-related (data SKEW)

BUCKET-related (BUCKET-based)

PRIVS related (permission management)

mysql> show tables;+---------------------------+| Tables_in_hive_yz_test    |+---------------------------+| BUCKETING_COLS            || CDS                       || COLUMNS_V2                || DATABASE_PARAMS           || DBS                       || DB_PRIVS                  || GLOBAL_PRIVS              || IDXS                      || INDEX_PARAMS              || NUCLEUS_TABLES            || PARTITIONS                || PARTITION_EVENTS          || PARTITION_KEYS            || PARTITION_KEY_VALS        || PARTITION_PARAMS          || PART_COL_PRIVS            || PART_COL_STATS            || PART_PRIVS                || ROLES                     || ROLE_MAP                  || SDS                       || SD_PARAMS                 || SEQUENCE_TABLE            || SERDES                    || SERDE_PARAMS              || SKEWED_COL_NAMES          || SKEWED_COL_VALUE_LOC_MAP  || SKEWED_STRING_LIST        || SKEWED_STRING_LIST_VALUES || SKEWED_VALUES             || SORT_COLS                 || TABLE_PARAMS              || TAB_COL_STATS             || TBLS                      || TBL_COL_PRIVS             || TBL_PRIVS                 || TYPES                     || TYPE_FIELDS               || VERSION                   |+---------------------------+39 rows in set (0.00 sec)

2. Meaning of each table

2.1 Database table: DBS

Describes the metadata information of Hive Database stored in this table. DB_ID indicates the Database ID, NAME indicates the Database NAME, DB_LOCATION_URI indicates the location of the Database in HDFS, and DESC indicates the description of the Database.

Mysql> desc DBS; + ----------------- + --------------- + ------ + ----- + --------- + ------- + | Field | Type | Null | Key | Default | Extra | + ----------------- + ------------- + ------ + ----- + --------- + ------- + | DB_ID | bigint (20) | NO | PRI | NULL | DESC | varchar (4000) | YES | NULL | DB_LOCATION_URI | varchar (4000) | NO | NULL | NAME | varchar (128) | YES | UNI | NULL | + ----------------- + --------------- + ------ + ----- + --------- + ------- + example: the following information about the acorn_3g database indicates that the database ID is 81 and the path is hdfs: // YZSJHL19-42.opi.com/warehouse.test/acorn_3g.dbmysql> select * from DBS where NAME = 'acorn _ 3g '; + ------- + ------ + bytes + ---------- + | DB_ID | DESC | DB_LOCATION_URI | NAME | + ------- + ------ + bytes + ---------- + | 81 | NULL | hdfs: // YZSJHL19-42.opi.com/warehouse.test/acorn_3g.db | acorn_3g | + ------- + ------ + -------------------------------------------------- +

2. 2 Table

Description:

Tsung stores the metadata information of Hive Table. Each Table has a unique TBL_ID

The foreign key of SD_ID points to the Database to which it belongs, and SD_IID associates with the primary key of the SDS table. SDS stores information such as the column CD_ID. Tsung. SD_ID is associated with SDS. SD_ID, SDS. SD_ID is associated with CDS. CD_ID,

CDS. CD_ID Association COLUMNS_V2.CD_ID

Example: * acorn_3g.user_act table information: TBL_ID is 41231, TBL_TYPE is MANAGED_TABLE normal table (if the value is EXTERNAL, it indicates EXTERNAL table), DB_ID is 81, it indicates Database belonging to DB_ID = 81.

mysql> select * from TBLS where TBL_NAME='user_act' and DB_ID=81 \G*************************** 1. row ***************************            TBL_ID: 41231       CREATE_TIME: 1366188055             DB_ID: 81  LAST_ACCESS_TIME: 0             OWNER: xianbing.liu         RETENTION: 0             SD_ID: 263311          TBL_NAME: user_act          TBL_TYPE: MANAGED_TABLEVIEW_EXPANDED_TEXT: NULLVIEW_ORIGINAL_TEXT: NULL

2.3 SDS table (Data Storage table)

Description:

The SDS table stores all HDFS data file information in the Hive Data Warehouse. Each SD_ID uniquely identifies a data storage record.

CD_ID is associated with COLUMN_V2.CD_ID to specify the field information of the data.

SERDE_ID is associated with SERDES. SERDE_ID to specify the serialization information of the data (for example, whether the data is a serialized table or a DELIMITED field)

Example:

According to the SDS table, the CD_ID of the acorn_3g.user_act table is 263311, and SERDE_ID is 263301. The default storage location is

hdfs://YZSJHL19-42.opi.com/warehouse.test/acorn_3g.db/user_actmysql> select * from SDS where SD_ID=263311 \G*************************** 1. row ***************************                    SD_ID: 263311                    CD_ID: 263311             INPUT_FORMAT: org.apache.hadoop.mapred.TextInputFormat            IS_COMPRESSED:                 LOCATION: hdfs://YZSJHL19-42.opi.com/warehouse.test/acorn_3g.db/user_act              NUM_BUCKETS: -1            OUTPUT_FORMAT: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat                 SERDE_ID: 263301IS_STOREDASSUBDIRECTORIES: 

2.4 CDS and COLUMN_V2 (column information)

CDS table

Description:

The table has only one field, CD_ID, which is permanently stored in the Hive data warehouse.

Example:

We can see that the CD_ID corresponding to the acorn_3g.user_act table is recorded in the CDS

mysql> desc CDS;+-------+------------+------+-----+---------+-------+| Field | Type       | Null | Key | Default | Extra |+-------+------------+------+-----+---------+-------+| CD_ID | bigint(20) | NO   | PRI | NULL    |       |+-------+------------+------+-----+---------+-------+1 row in set (0.00 sec)mysql> select * FROM CDS where CD_ID=263311;+--------+| CD_ID  |+--------+| 263311 |+--------+1 row in set (0.00 sec)

COLUMN_V2 table

Description:

This table stores information about all fields corresponding to a CD_ID.

Example:

View the COLUMN information of the acorn_3g.user_act table. We can see that the acorn_3g.user_act table has 14 fields, COLUMN_NAME is the field name, TYPE_NAME is the field type, and INTEGER_IDX is the field number.

Mysql & gt; select * from COLUMNS_V2 where CD_ID = 263311 order by integer_idx; + -------- + --------- + response + ----------- + | CD_ID | COMMENT | COLUMN_NAME | TYPE_NAME | INTEGER_IDX | + -------- + --------- + response + ----------- + ------------- + | 263311 | NULL | id | bigint | 0 | 263311 | NULL | action_id | int | 1 | 263311 | NULL | user_id | bigint | 2 | 263311 | NULL | request | string | 3 | 263311 | NULL | visit_time | string | 4 | 263311 | NULL | source_id | int | 5 | 263311 | NULL | sess_id | string | 6 | 263311 | NULL | mobile_number | string | 7 | 263311 | NULL | from_id | string | 8 | 263311 | NULL | app_id | string | 9 | 263311 | NULL | version | string | 10 | 263311 | NULL | reg_type | int | 11 | 263311 | NULL | uniqid | string | 12 | 263311 | NULL | failure | int | 13 | + -------- + --------- + --------------- + ----------- + ------------- + 2.5 SERDES and SERDE_PARAMS (serialization)

Description:

SERDES stores all serialization information (SERDE_ID, SLIB). SLIB indicates the Java class used for serialization.

SERDES_PARAMS stores serialization parameters and values

Example:

SERDE_ID = 263301 for the acorn_3g.user_act table indicates that the hive default serialization class org is used. apache. hadoop. hive. serde2.lazy. lazySimpleSerDe, DELIMITED character is '\ t' (that is, the name specified when the table is created... delimited by '\ t '...) mysql> select * from serdes where SERDE_ID = 263301; + ---------- + ------ + region + | SERDE_ID | NAME | SLIB | + ---------- + ------ + region + | 263301 | NULL | org. apache. hadoop. hive. serde2.lazy. lazySimpleSerDe | + ---------- + ------ + response + mysql> select SERDE_ID, PARAM_KEY, REPLACE (PARAM_VALUE, '\ t',' \ t') from SERDE_PARAMS where SERDE_ID = 263301; + ---------- + -------------------- + response + | SERDE_ID | PARAM_KEY | REPLACE (PARAM_VALUE, '\ t',' \ t') | + ---------- + -------------------- + field + | 263301 |. delim | \ t | 263301 | serialization. format | \ t | + ---------- + ---------------------- + ------------------------------- +

2.6 PARTITIONS PARTITION_KEY and PARTITION_KEY_VALS (partition)

PARTITION_KEYS

Description:

PARTITION_KEYS stores all partition tables used for partition Fields

Example:

Use PARTITION_KEYS to view the partition information of acorn_3g.user_act. You can see that the table is a partition table with the partition field log_date. The INTEGER_IDX indicates the sequence number of the partition field, which corresponds to the partition field one by one.

mysql> desc PARTITION_KEYS;+--------------+---------------+------+-----+---------+-------+| Field        | Type          | Null | Key | Default | Extra |+--------------+---------------+------+-----+---------+-------+| TBL_ID       | bigint(20)    | NO   | PRI | NULL    |       || PKEY_COMMENT | varchar(4000) | YES  |     | NULL    |       || PKEY_NAME    | varchar(128)  | NO   | PRI | NULL    |       || PKEY_TYPE    | varchar(767)  | NO   |     | NULL    |       || INTEGER_IDX  | int(11)       | NO   |     | NULL    |       |+--------------+---------------+------+-----+---------+-------+mysql> select * FROM PARTITION_KEYS WHERE TBL_ID=41231;+--------+--------------+-----------+-----------+-------------+| TBL_ID | PKEY_COMMENT | PKEY_NAME | PKEY_TYPE | INTEGER_IDX |+--------+--------------+-----------+-----------+-------------+|  41231 | NULL         | log_date  | string    |           0 |+--------+--------------+-----------+-----------+-------------+PARTITIONS

Description:

PARTITIONS stores all the partition information of the Hive Data Warehouse. Each partition is identified by PART_ID, where TBL_ID is the Table and SD_ID is the SDS (see 2.3)

Example:

View the partition information of the acorn_3g.user_act table in the PARTITIONS table. For example, PART_ID is 168301, its name is log_date =, and SD_ID is 231621.

mysql> desc PARTITIONS;+------------------+--------------+------+-----+---------+-------+| Field            | Type         | Null | Key | Default | Extra |+------------------+--------------+------+-----+---------+-------+| PART_ID          | bigint(20)   | NO   | PRI | NULL    |       || CREATE_TIME      | int(11)      | NO   |     | NULL    |       || LAST_ACCESS_TIME | int(11)      | NO   |     | NULL    |       || PART_NAME        | varchar(767) | YES  | MUL | NULL    |       || SD_ID            | bigint(20)   | YES  | MUL | NULL    |       || TBL_ID           | bigint(20)   | YES  | MUL | NULL    |       |+------------------+--------------+------+-----+---------+-------+mysql> select * FROM PARTITIONS WHERE TBL_ID=41231 order by PART_NAME limit 5;+---------+-------------+------------------+---------------------+--------+--------+| PART_ID | CREATE_TIME | LAST_ACCESS_TIME | PART_NAME           | SD_ID  | TBL_ID |+---------+-------------+------------------+---------------------+--------+--------+|  168301 |  1366259946 |                0 | log_date=2013-03-01 | 231621 |  41231 ||  168321 |  1366260063 |                0 | log_date=2013-03-02 | 231641 |  41231 ||  168331 |  1366260176 |                0 | log_date=2013-03-03 | 231651 |  41231 ||  168346 |  1366260298 |                0 | log_date=2013-03-04 | 231666 |  41231 ||  168361 |  1366260398 |                0 | log_date=2013-03-05 | 231681 |  41231 |+---------+-------------+------------------+---------------------+--------+--------+PARTITION_KEY_VALS

Description:

PARTITION_KEY_VALS stores the value of the partition field described in PARTITION_KEY, which is usually used in combination with the PARTITIONS and PARTITION_KEYS tables.

Example:

View the values of each partition field in partition PART_ID = 168301. PARTITION_KEY_VALS stores the correspondence between all partition Field Numbers (INTEGER_IDX) and values (PART_KEY_VAL. In this example, the value of the acorn_3g.user_act PART_ID = 168301 partition log_date field is '2017-03-01'

mysql> select pk.PKEY_NAME,pk.PKEY_TYPE,pk.INTEGER_IDX,pkv.PART_KEY_VAL from PARTITION_KEYS pk,PARTITION_KEY_VALS pkv        where pk.INTEGER_IDX=pkv.INTEGER_IDX and pk.TBL_ID=41231 and  pkv.PART_ID=168301;+-----------+-----------+-------------+--------------+| PKEY_NAME | PKEY_TYPE | INTEGER_IDX | PART_KEY_VAL |+-----------+-----------+-------------+--------------+| log_date  | string    |           0 | 2013-03-01   |+-----------+-----------+-------------+--------------+

2.7 BUCKET-related tables

Description

The BUCKETING_COLS table describes all SDS using the BUCKET-based technology. Currently, the company does not use buckets. // TODO

mysql> desc BUCKETING_COLS;+-----------------+--------------+------+-----+---------+-------+| Field           | Type         | Null | Key | Default | Extra |+-----------------+--------------+------+-----+---------+-------+| SD_ID           | bigint(20)   | NO   | PRI | NULL    |       || BUCKET_COL_NAME | varchar(256) | YES  |     | NULL    |       || INTEGER_IDX     | int(11)      | NO   | PRI | NULL    |       |+-----------------+--------------+------+-----+---------+-------+

2.8 PRIVS permission management table

TBL_PRIVS DB_PRIVS PART_PRIVS, etc. Currently, Hive is far inferior to relational databases in terms of permission management, and the company has not uniformly managed permissions.

2.9 SKEW data SKEW table

Compared with version 0.8, table SKEWED_COL_NAMES related to data skew is added to table SKEWED_COL_NAMES metadata SKEWED_STRING_LIST SKEWED_STRING_LIST_VALUES SKEWED_VALUES. These advanced features are still in the test phase and are not available.

2.3 others

For example, VERSION describes VERSION information. Developers do not have to worry too much about such tables.

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.