When querying data in Hive, you often need to know which file, and which position in that file, a result row came from. Hive exposes this through three virtual columns that can be selected like ordinary columns:
1. input__file__name: the full path of the file read by the map task.
2. block__offset__inside__file: for RCFile or SequenceFile, the offset within the file of the first byte of the current block; for TextFile, the offset within the file of the first byte of the current line.
3. row__offset__inside__block: for RCFile and SequenceFile, the row number within the block; for TextFile, always 0.
Note: to display row__offset__inside__block, you must first run set hive.exec.rowoffset=true;
Test:
1.
Table: test_virtual_columns
InputFormat: org.apache.hadoop.mapred.TextInputFormat
Query:
select a, input__file__name, block__offset__inside__file, row__offset__inside__block from test_virtual_columns;
Result:
Qweqwe      hdfs://10.2.6.102/user/hive/warehouse/tmp.db/test_virtual_columns/t3.txt  0   0
dfdf        hdfs://10.2.6.102/user/hive/warehouse/tmp.db/test_virtual_columns/t3.txt  7   0
Sdafsafsaf  hdfs://10.2.6.102/user/hive/warehouse/tmp.db/test_virtual_columns/t3.txt      0
DFDFFD      hdfs://10.2.6.102/user/hive/warehouse/tmp.db/test_virtual_columns/t3.txt      0
DSF         hdfs://10.2.6.102/user/hive/warehouse/tmp.db/test_virtual_columns/t3.txt      0
1           hdfs://10.2.6.102/user/hive/warehouse/tmp.db/test_virtual_columns/t1.txt  0   0
2           hdfs://10.2.6.102/user/hive/warehouse/tmp.db/test_virtual_columns/t1.txt  2   0
3           hdfs://10.2.6.102/user/hive/warehouse/tmp.db/test_virtual_columns/t1.txt  4   0
4           hdfs://10.2.6.102/user/hive/warehouse/tmp.db/test_virtual_columns/t1.txt  6   0
5           hdfs://10.2.6.102/user/hive/warehouse/tmp.db/test_virtual_columns/t1.txt  8   0
6           hdfs://10.2.6.102/user/hive/warehouse/tmp.db/test_virtual_columns/t1.txt      0
7           hdfs://10.2.6.102/user/hive/warehouse/tmp.db/test_virtual_columns/t1.txt      0
8           hdfs://10.2.6.102/user/hive/warehouse/tmp.db/test_virtual_columns/t2.txt  0   0
9           hdfs://10.2.6.102/user/hive/warehouse/tmp.db/test_virtual_columns/t2.txt  2   0
            hdfs://10.2.6.102/user/hive/warehouse/tmp.db/test_virtual_columns/t2.txt  4   0
            hdfs://10.2.6.102/user/hive/warehouse/tmp.db/test_virtual_columns/t2.txt  7   0
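The text-file offsets in this result are simply the byte position at which each line starts. As a rough illustration, the Python sketch below recomputes them for an in-memory byte string. The helper name line_offsets and the sample content are hypothetical, and the sketch assumes every line ends with a single "\n"; it mimics the arithmetic only and is not how Hive computes the value internally.

```python
def line_offsets(data: bytes) -> list[tuple[int, bytes]]:
    """Return (offset of the line's first byte, line content) for each line."""
    offsets = []
    position = 0
    for raw_line in data.splitlines(keepends=True):
        offsets.append((position, raw_line.rstrip(b"\n")))
        # The next line starts right after this line and its newline.
        position += len(raw_line)
    return offsets


# Assumed content: the first two values reported for t3.txt,
# each terminated by one newline byte.
sample = b"qweqwe\ndfdf\n"
for offset, line in line_offsets(sample):
    print(offset, line.decode())
```

With this sample the second line starts at byte 7 (six characters plus one newline), which matches the offset reported for dfdf in t3.txt.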
2.
Table: nginx
InputFormat: org.apache.hadoop.hive.ql.io.RCFileInputFormat
Query:
select hostname, input__file__name, block__offset__inside__file, row__offset__inside__block from nginx where dt='2013-09-01' limit 10;
Result:
10.1.2.162  hdfs://10.2.6.102/share/data/log/nginx_rcfile/2013-09-01/000000_0  537155468  0
10.1.2.162  hdfs://10.2.6.102/share/data/log/nginx_rcfile/2013-09-01/000000_0  537155468  1
10.1.2.162  hdfs://10.2.6.102/share/data/log/nginx_rcfile/2013-09-01/000000_0  537155468  2
10.1.2.162  hdfs://10.2.6.102/share/data/log/nginx_rcfile/2013-09-01/000000_0  537155468  3
10.1.2.162  hdfs://10.2.6.102/share/data/log/nginx_rcfile/2013-09-01/000000_0  537155468  4
10.1.2.162  hdfs://10.2.6.102/share/data/log/nginx_rcfile/2013-09-01/000000_0  537155468  5
10.1.2.162  hdfs://10.2.6.102/share/data/log/nginx_rcfile/2013-09-01/000000_0  537155468  6
10.1.2.162  hdfs://10.2.6.102/share/data/log/nginx_rcfile/2013-09-01/000000_0  537155468  7
10.1.2.162  hdfs://10.2.6.102/share/data/log/nginx_rcfile/2013-09-01/000000_0  537155468  8
10.1.2.162  hdfs://10.2.6.102/share/data/log/nginx_rcfile/2013-09-01/000000_0  537155468  9
If you run into dirty data or abnormal results, selecting these three columns lets you locate the source file and the exact position of the offending row, which is very convenient.
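Once a query has reported a file name and offset for a suspicious row in a text-format table, the raw line can be inspected directly. The Python sketch below is one possible way to do that against a local copy of the file (for example one fetched with hdfs dfs -get). The function name read_line_at and the demo file are hypothetical. It applies to text files only, since for RCFile and SequenceFile the offset points at a binary block rather than a line.

```python
import os
import tempfile


def read_line_at(path: str, offset: int) -> str:
    """Return the line that starts at the given byte offset of a text file."""
    with open(path, "rb") as handle:
        handle.seek(offset)  # jump to the value of block__offset__inside__file
        raw_line = handle.readline()
    return raw_line.rstrip(b"\r\n").decode("utf-8", errors="replace")


# Demo on a stand-in file; a real check would use the downloaded copy of the
# file named by input__file__name.
with tempfile.TemporaryDirectory() as workdir:
    local_copy = os.path.join(workdir, "t3.txt")
    with open(local_copy, "wb") as handle:
        handle.write(b"qweqwe\ndfdf\n")
    print(read_line_at(local_copy, 7))
```

Opening the file in binary mode keeps the offset a plain byte count, which is the unit Hive reports.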