Two Kinds of Tables in Hive

Source: Internet
Author: User
Tags: types of tables

1. Internal tables

  • When LOAD DATA is run against an internal table, Hive copies data files from the local file system into its warehouse directory. Data files that are already on HDFS are instead moved (cut, not copied) into the warehouse directory.
  • During LOAD DATA, Hive does not check whether the files conform to the schema declared for the table. A mismatch only shows up at query time, when a SELECT returns NULL for fields that cannot be parsed.
  • When an internal table is dropped, both its metadata (in MySQL) and its data files (in HDFS) are deleted.
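-- Table and column names are illustrative assumptions.
CREATE TABLE t_name (id BIGINT, name STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';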
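
The loading and dropping behaviour described above can be sketched in HiveQL. This is a minimal illustration rather than anything taken from the original article: the table `t_order`, its columns, and both file paths are hypothetical.

```sql
-- Hypothetical internal (managed) table; names and paths are assumptions.
CREATE TABLE t_order (id BIGINT, product STRING, price DOUBLE)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';

-- LOCAL: the file is copied from the local file system into the table's
-- warehouse directory; the local original stays where it was.
LOAD DATA LOCAL INPATH '/tmp/orders_local.tsv' INTO TABLE t_order;

-- No LOCAL: the file already on HDFS is moved (not copied) into the
-- warehouse directory, so it no longer exists under /data/incoming.
LOAD DATA INPATH '/data/incoming/orders_hdfs.tsv' INTO TABLE t_order;

-- Neither LOAD inspected the file contents. Fields that cannot be parsed
-- as the declared type surface as NULL only when the table is queried.
SELECT id, product, price FROM t_order LIMIT 10;

-- Dropping a managed table removes both its metadata and its data files.
DROP TABLE t_order;
```

Because the HDFS variant moves the file, anything else that still expects to find it under the original path will break, which is one reason to consider an external table instead.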

2. External tables

    • For an external table, you explicitly specify the storage location of the data when you create the table. Hive does not move the data into its own warehouse directory.
    • In fact, when creating the table, Hive does not even check whether this external location exists. This is a very important feature because it means that you can create a table before uploading the data files.
    • If no location is specified, Hive creates a folder named after the external table under /user/hive/warehouse/ on HDFS. The folder is created automatically, and the data files belonging to the table are stored there.
    • When you drop an external table, Hive does not touch the data files; it deletes only the metadata.
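-- Table and column names are illustrative assumptions; only the '/book' location comes from the original.
CREATE EXTERNAL TABLE t_book (id BIGINT, name STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' LOCATION '/book';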
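
As a rough sketch of how this looks in practice, the statements below create an external table, check its type, and drop it. The table `t_log`, its columns, and the path `/data/logs` are hypothetical and chosen only for illustration.

```sql
-- Hypothetical external table; the name, columns, and path are assumptions.
CREATE EXTERNAL TABLE t_log (id BIGINT, message STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/data/logs';

-- The "Table Type" row of the output reads EXTERNAL_TABLE for this table
-- (MANAGED_TABLE for an internal one); "Location" shows where the data lives.
DESCRIBE FORMATTED t_log;

-- Only the metadata is removed; the files under /data/logs stay on HDFS.
DROP TABLE t_log;
```

Running the same CREATE EXTERNAL TABLE statement again after the drop should make the untouched files queryable once more, since the table definition is just metadata layered over the directory.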

In summary, the two types of tables differ as follows:

    1. When data is imported into an external table, it is not moved to the '/user/hive/warehouse/t_name' directory, which means that Hive does not manage the external table's data itself. The data of an internal table, by contrast, is moved into the warehouse directory and managed by Hive.
    2. When an internal table is dropped, Hive deletes all the metadata and data files that belong to it. When an external table is dropped, Hive deletes only the table's metadata and leaves the data files in place.

3. Choosing between the two types of tables

    • If all processing is done by Hive, use internal tables.
    • If Hive and other tools work together on the same dataset, use external tables. External tables can also be used to associate different schemas with the same dataset.
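
One way to picture "different schemas for the same dataset" is two external tables defined over a single directory. The sketch below assumes a hypothetical directory `/data/events` holding tab-separated text files; both table names and all columns are made up for illustration.

```sql
-- View 1: each row exposed as a single string column. This relies on the
-- assumption that the files contain no Ctrl-A bytes (the default delimiter),
-- so the whole line lands in the one column.
CREATE EXTERNAL TABLE t_event_raw (line STRING)
LOCATION '/data/events';

-- View 2: the same files split into typed columns on tabs.
CREATE EXTERNAL TABLE t_event_parsed (user_id BIGINT, action STRING, ts STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/data/events';

-- Dropping either table leaves the files, and the other table, intact.
DROP TABLE t_event_raw;
```

Since neither table owns the files, other tools can keep reading and writing the directory without Hive interfering.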

4. Storage format

    • If the ROW FORMAT or STORED AS clause is omitted when creating the table, Hive uses its default format to split each row into fields.
    • The default field delimiter within a row is not a tab but Ctrl-A, the ASCII control character with code 1.
    • Ctrl-A is used because it is less likely than a tab to appear in text data.
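-- Table and column names are illustrative assumptions.
CREATE TABLE t_name (id BIGINT, name STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';

Explanation: FIELDS TERMINATED BY '\t' sets the delimiter between fields within a row to a tab instead of the default Ctrl-A.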
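
For comparison, the defaults can be written out explicitly. The sketch below uses a hypothetical table `t_default_format` and assumes a stock configuration in which plain text is the default file format; beyond the Ctrl-A field delimiter mentioned above, it also shows the default delimiters Hive uses for collection items and map keys.

```sql
-- Hypothetical table with Hive's default text format spelled out.
-- Omitting everything after the column list should give the same layout
-- on an installation whose default file format has not been changed.
CREATE TABLE t_default_format (
  id    BIGINT,
  tags  ARRAY<STRING>,
  attrs MAP<STRING, STRING>
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\001'            -- Ctrl-A between fields
  COLLECTION ITEMS TERMINATED BY '\002'  -- Ctrl-B between array or map items
  MAP KEYS TERMINATED BY '\003'          -- Ctrl-C between a map key and its value
  LINES TERMINATED BY '\n'               -- one row per line
STORED AS TEXTFILE;
```

Replacing '\001' with '\t' in the FIELDS clause gives the tab-delimited variant used in this article.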
