How to find the source file and exact location of data in Hive

Normally, when you run a SELECT statement in Hive, you cannot tell which file, or which position in that file, a result row came from. Hive addresses this with virtual columns, three of which can be selected like ordinary columns:
1. INPUT__FILE__NAME: the full path of the input file read by the map task.
2. BLOCK__OFFSET__INSIDE__FILE: for RCFile or SequenceFile, the offset of the current block within the file, that is, the file offset of the block's first byte; for TextFile, the offset of the first byte of the current line within the file.
3. ROW__OFFSET__INSIDE__BLOCK: for RCFile and SequenceFile, the row number within the current block; for TextFile, always 0.
Note: To query ROW__OFFSET__INSIDE__BLOCK, you must first run set hive.exec.rowoffset=true;
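
For a TextFile table, the offset rule above can be reproduced outside Hive. The Python sketch below is only a model of the behaviour described in this article, not Hive's own code: it splits a file's raw bytes on newline characters and reports, for each line, the file path, the byte offset of the line's first byte, and a row offset of 0. The helper name text_virtual_columns and the sample file content are hypothetical, and the sketch assumes UTF-8 text with Unix newline line endings.

```python
from typing import Iterator, Tuple

# One simulated result row, in the column order used by the queries here:
# (line text, INPUT__FILE__NAME, BLOCK__OFFSET__INSIDE__FILE, ROW__OFFSET__INSIDE__BLOCK)
Row = Tuple[str, str, int, int]


def text_virtual_columns(path: str, data: bytes) -> Iterator[Row]:
    """Model the three virtual columns for one TextFile (hypothetical helper).

    Assumes UTF-8 text with Unix newline endings; offsets are counted in bytes.
    """
    lines = data.split(b"\n")
    if lines and lines[-1] == b"":
        lines.pop()  # a trailing newline does not start another line
    offset = 0
    for raw in lines:
        # TextFile: the "block" offset is the byte offset of the line's first
        # byte, and the row offset inside the block is always 0.
        yield (raw.decode("utf-8"), path, offset, 0)
        offset += len(raw) + 1  # +1 for the newline terminator


if __name__ == "__main__":
    # Hypothetical content: the digits 1 to 5, one per line.
    for row in text_virtual_columns("t1.txt", b"1\n2\n3\n4\n5\n"):
        print(row)
```

Run as a script, it reports offsets 0, 2, 4, 6 and 8, the same values the first test below shows for the lines 1 to 5 of t1.txt: each one-character line plus its newline occupies two bytes.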

Test:
1.
Table: test_virtual_columns
InputFormat: org.apache.hadoop.mapred.TextInputFormat
Query:
SELECT a, INPUT__FILE__NAME, BLOCK__OFFSET__INSIDE__FILE, ROW__OFFSET__INSIDE__BLOCK FROM test_virtual_columns;
Result:

a           INPUT__FILE__NAME                                                         BLOCK__OFFSET__INSIDE__FILE  ROW__OFFSET__INSIDE__BLOCK
qweqwe      hdfs://10.2.6.102/user/hive/warehouse/tmp.db/test_virtual_columns/t3.txt  0                            0
dfdf        hdfs://10.2.6.102/user/hive/warehouse/tmp.db/test_virtual_columns/t3.txt  7                            0
sdafsafsaf  hdfs://10.2.6.102/user/hive/warehouse/tmp.db/test_virtual_columns/t3.txt                               0
dfdffd      hdfs://10.2.6.102/user/hive/warehouse/tmp.db/test_virtual_columns/t3.txt                               0
dsf         hdfs://10.2.6.102/user/hive/warehouse/tmp.db/test_virtual_columns/t3.txt                               0
1           hdfs://10.2.6.102/user/hive/warehouse/tmp.db/test_virtual_columns/t1.txt  0                            0
2           hdfs://10.2.6.102/user/hive/warehouse/tmp.db/test_virtual_columns/t1.txt  2                            0
3           hdfs://10.2.6.102/user/hive/warehouse/tmp.db/test_virtual_columns/t1.txt  4                            0
4           hdfs://10.2.6.102/user/hive/warehouse/tmp.db/test_virtual_columns/t1.txt  6                            0
5           hdfs://10.2.6.102/user/hive/warehouse/tmp.db/test_virtual_columns/t1.txt  8                            0
6           hdfs://10.2.6.102/user/hive/warehouse/tmp.db/test_virtual_columns/t1.txt                               0
7           hdfs://10.2.6.102/user/hive/warehouse/tmp.db/test_virtual_columns/t1.txt                               0
8           hdfs://10.2.6.102/user/hive/warehouse/tmp.db/test_virtual_columns/t2.txt  0                            0
9           hdfs://10.2.6.102/user/hive/warehouse/tmp.db/test_virtual_columns/t2.txt  2                            0
            hdfs://10.2.6.102/user/hive/warehouse/tmp.db/test_virtual_columns/t2.txt  4                            0
            hdfs://10.2.6.102/user/hive/warehouse/tmp.db/test_virtual_columns/t2.txt  7                            0

2.
Table: nginx
InputFormat: org.apache.hadoop.hive.ql.io.RCFileInputFormat
Query:
SELECT hostname, INPUT__FILE__NAME, BLOCK__OFFSET__INSIDE__FILE, ROW__OFFSET__INSIDE__BLOCK FROM nginx WHERE dt = '2013-09-01' LIMIT 10;
Result:

hostname    INPUT__FILE__NAME                                                  BLOCK__OFFSET__INSIDE__FILE  ROW__OFFSET__INSIDE__BLOCK
10.1.2.162  hdfs://10.2.6.102/share/data/log/nginx_rcfile/2013-09-01/000000_0  537155468                    0
10.1.2.162  hdfs://10.2.6.102/share/data/log/nginx_rcfile/2013-09-01/000000_0  537155468                    1
10.1.2.162  hdfs://10.2.6.102/share/data/log/nginx_rcfile/2013-09-01/000000_0  537155468                    2
10.1.2.162  hdfs://10.2.6.102/share/data/log/nginx_rcfile/2013-09-01/000000_0  537155468                    3
10.1.2.162  hdfs://10.2.6.102/share/data/log/nginx_rcfile/2013-09-01/000000_0  537155468                    4
10.1.2.162  hdfs://10.2.6.102/share/data/log/nginx_rcfile/2013-09-01/000000_0  537155468                    5
10.1.2.162  hdfs://10.2.6.102/share/data/log/nginx_rcfile/2013-09-01/000000_0  537155468                    6
10.1.2.162  hdfs://10.2.6.102/share/data/log/nginx_rcfile/2013-09-01/000000_0  537155468                    7
10.1.2.162  hdfs://10.2.6.102/share/data/log/nginx_rcfile/2013-09-01/000000_0  537155468                    8
10.1.2.162  hdfs://10.2.6.102/share/data/log/nginx_rcfile/2013-09-01/000000_0  537155468                    9
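
For RCFile, the two offset columns work as a pair: BLOCK__OFFSET__INSIDE__FILE identifies the block and ROW__OFFSET__INSIDE__BLOCK numbers the rows inside it, which is why every row above reports 537155468 while the row offset counts from 0 to 9. The Python sketch below models only that addressing scheme as described in this article; rcfile_positions is a hypothetical helper, and the block offsets and row counts it takes are made-up inputs rather than values read from a real RCFile.

```python
from typing import Iterator, List, Tuple


def rcfile_positions(blocks: List[Tuple[int, int]]) -> Iterator[Tuple[int, int]]:
    """Yield (BLOCK__OFFSET__INSIDE__FILE, ROW__OFFSET__INSIDE__BLOCK) per row.

    `blocks` is a hypothetical list of (block_start_byte, row_count) pairs in
    file order; a real RCFile would supply these values itself.
    """
    for block_start, row_count in blocks:
        for row_number in range(row_count):
            # Every row in a block reports the same block offset, and the
            # row number restarts at 0 in each new block.
            yield (block_start, row_number)


if __name__ == "__main__":
    # Invented layout: two blocks starting at bytes 1000 and 5000.
    for position in rcfile_positions([(1000, 3), (5000, 2)]):
        print(position)
```

With this invented layout the script prints (1000, 0), (1000, 1), (1000, 2), (5000, 0) and (5000, 1): the block offset stays fixed while the row number counts up, and both change when the next block starts.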

If you run into dirty data or unexpected results, selecting these three virtual columns lets you trace each problem row back to its original file and to its position within that file, which makes debugging much easier.
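
Once a query has reported the file and BLOCK__OFFSET__INSIDE__FILE of a bad row in a TextFile table, you can read the original line directly. The Python sketch below assumes the file has already been copied to local disk (for example with hdfs dfs -get); read_line_at is a hypothetical helper that seeks to the byte offset and returns the line starting there. For RCFile or SequenceFile the offset points at a block rather than a line, so this shortcut does not apply to them.

```python
import os
import tempfile


def read_line_at(local_path: str, offset: int) -> str:
    """Return the text line that starts at byte `offset` (hypothetical helper).

    Only meaningful for TextFile data, where BLOCK__OFFSET__INSIDE__FILE is the
    offset of the line's first byte. Assumes UTF-8 text and Unix newlines.
    """
    with open(local_path, "rb") as f:
        f.seek(offset)
        raw = f.readline()  # reads up to and including the next newline byte
    return raw.rstrip(b"\n").decode("utf-8")


if __name__ == "__main__":
    # Build a small stand-in for a text file copied from HDFS.
    with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
        tmp.write(b"qweqwe\ndfdf\n")
        path = tmp.name
    try:
        print(read_line_at(path, 7))  # prints "dfdf"
    finally:
        os.remove(path)
```

Seeking on a binary file keeps the offset in bytes, which matches how the TextFile offsets are counted; opening in text mode could make multi-byte characters or newline translation shift the position.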
