Hive internal table differs from external table

Source: Internet
Author: User
Tags relational database table

1. What is the difference between creating an internal table and an external table?
What is the role of the 2.external keyword?
3. What is the difference between an external table and an internal table?
4. What is the difference between an internal table and an external table when deleting a table?
5.load data local inpath '/home/wyp/data/wyp.txt ' into table wyp; What is the process like?
6. disk, hdfs,hive table What is the process like between them?


I believe that many users have used relational databases, we can create tables in the relational database (create table), the table to be discussed here and the relational database table is conceptually similar. We can create a table in hive with the following statement:

    1. Hive> CREATE TABLE WYP (id int,
    2. > Name string,
    3. > Age int,
    4. > Tele String)
    5. > ROW FORMAT Delimited
    6. > Fields TERMINATED by ' \ t '
    7. > STORED as Textfile;
    8. Ok
    9. Time taken:0.759 seconds
Copy Code

So we created a normal table in hive, and now we're going to import the data into this table:

    1. hive> Load Data local inpath '/home/wyp/data/wyp.txt ' into table wyp;
    2. Copying data from File:/home/wyp/data/wyp.txt
    3. Copying File:file:/home/hdfs/wyp.txt
    4. Loading Data to Table DEFAULT.WYP
    5. Table default.wyp Stats: [num_partitions:0, Num_files:1,
    6. num_rows:0, total_size:67, raw_data_size:0]
    7. Ok
    8. Time taken:3.289 seconds
    9. Hive> select * from WYP;
    10. Ok
    11. 1 WYP 25 13188888888888
    12. 2 Test 30 13888888888888
    13. 3 ZS 34 899314121
    14. Time taken:0.41 seconds, Fetched:3 row (s)
Copy Code


Note:The/home/wyp/data/path is the Linux local file system path, while/home/hdfs/is the path above the HDFs file system! From the above output we can see that the data is first copied from the local/home/wyp/data/folder to the/home/hdfs/wyp.txt in HDFs (this is caused by the configuration in hive) file! Finally, Hive will move the data from HDFs to the WYP table! Where does the data moved to the table reside in HDFs? In fact, the Hive.metastore.warehouse.dir property of the ${hive_home}/conf/hive-site.xml configuration file in hive points to the path of the Hive table data (the configuration in my store is/user/ Hive/warehouse), and hive creates a folder with the table name for each table created in Hive.metastore.warehouse.dir, and all data belonging to the table is stored in this folder. So, the data just imported into the WYP table is stored in the/user/hive/warehouse/wyp/folder, and we can go to see:

    1. Hive> Dfs-ls/user/hive/warehouse/wyp;
    2. Found 1 Items
    3. -rw-r--r--3 wyp supergroup 2014-01-14 22:23/user/hive/warehouse/wyp/wyp.txt
Copy Code


See, the above command shows all the contents of/USER/HIVE/WAREHOUSE/WYP on HDFs. If you need to delete the WYP table, you can use the following command:

    1. hive> drop table Wyp;
    2. Moved: ' Hdfs://mycluster/user/hive/warehouse/wyp ' to
    3. Trash at:hdfs://mycluster/user/hdfs/. Trash/current
    4. Ok
    5. Time taken:2.503 seconds
Copy Code



From the above output moved: ' Hdfs://mycluster/user/hive/warehouse/wyp ' to trash at:hdfs://mycluster/user/hdfs/. Trash/current we can tell that the data that originally belonged to the Wyp table was moved to hdfs://mycluster/user/hdfs/. Trash/current folder (if your Hadoop does not have a garbage bin mechanism, the drop table WYP command will delete all data belonging to the WYP table!) In fact, the data that belongs to the WYP table is deleted. Remember these, because these are very different from the external tables. At the same time, the metadata belonging to the table WYP has been deleted!
Let's try to create an external table:

    1. hive> Create external Table exter_table (
    2. > ID int,
    3. > Name string,
    4. > Age int,
    5. > Tel string)
    6. > Location '/home/wyp/external ';
    7. Ok
    8. Time taken:0.098 seconds
Copy Code


Look carefully at the difference between creating a table and an external table, and a careful classmate will find that creating an external table is more external keyword description and location '/home/wyp/external '. Yes, you're right! If you need to create an external table, you need to add the external keyword when creating the table, and specify the path to the external table to hold the data (you can, of course, not specify a storage path for the external table, so that hive will be on the HDFs/user/hive/warehouse/ folder, create a folder with the table name of the external table and store the data belonging to this table here):

    1. hive> Load Data local inpath '/home/wyp/data/wyp.txt ' into table exter_table;
    2. Copying data from File:/home/wyp/data/wyp.txt
    3. Copying File:file:/home/hdfs/wyp.txt
    4. Loading Data to Table default.exter_table
    5. Table default.exter_table Stats: [num_partitions:0, Num_files:
    6. 1, num_rows:0, total_size:67, raw_data_size:0]
    7. Ok
    8. Time taken:0.456 seconds
Copy Code


As with the import data above to the table, the local data is imported into the external table, and the data is copied from the local file system into the/home/hdfs/wyp.txt file in HDFs, but the last data is not moved to the external table/user/hive/warehouse/exter_ Table folder (unless you specify a data storage path when you create a table)! Everyone can go to the HDFS to see! For external tables, the data is moved to the directory specified when the table was created (this is the case in the/home/wyp/external folder)! If you want to delete the external table, take a look at the following actions:

    1. hive> drop table exter_table;
    2. Ok
    3. Time taken:0.093 seconds
Copy Code

Compared to the table above to remove hive, it can be found that there is no output to move data from one place to another! Does it delete the external table when the data is deleted directly? The answer is not this:

    1. Hive> dfs-ls/home/wyp/external;
    2. Found 1 Items
    3. -rw-r--r--3 wyp supergroup 2014-01-14 23:21/home/wyp/external/wyp.txt
Copy Code


You will find that when the external table is deleted, the data is not deleted, which is completely different from deleting the table data!

Finally, the difference between a table in hive and an external table is summarized:
1, in the import data to the external table, the data is not moved to their own data Warehouse directory, that is, the data in the external table is not managed by itself! And the table is not the same;
2. When the table is deleted, hive will delete all the metadata and data belonging to the table, and when the external table is deleted, hive simply deletes the external table's metadata and the data is not deleted!
So how do you choose which kind of table to use? In most cases there is not much difference, so the choice is just a matter of personal preference. But as an experience, if all the processing needs to be done by hive, then you should create the table, otherwise use the external table!

Hive internal table differs from external table

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.