Null in hive (hive null processing)

Source: Internet
Author: User
Tags serialization hadoop fs
By default, NULL is saved as \ n in the hive table, and you can view the table's source file (Hadoop fs-cat or Hadoop fs-text), where a large amount of \ n is stored in the file,  
resulting in a lot of wasted space. And in Java, Python directly into the path to manipulate the source data, the resolution should also be noted. In

addition, in the source file of the hive table, the default column delimiter is \001 (SOH), the line delimiter is \ n (currently only \ n is supported, the other is not available, so you do not need to display the declaration when defining). The delimiter between the elements delimiter \002,map in key and value is \003.

For example, if a record in the source file is:
10000042soh77soh435soh16soh22soh1156120000soh\nsoh\nsoh\nsoh\nsoh\nsoh\nsoh\nsohyoukusoh85133.0soh111
You can see that storing null is a waste of space.

However, the null of hive is sometimes required:
1) The INSERT statement in hive must match the number of columns, does not support no write, and a column without a value must use a null placeholder.

2) separate columns by delimiter in the data file of the Hive table. An empty column holds null (\ n) to preserve the column position. However, if the columns are not sufficient when the external table loads some data, such as Table 13 columns, and the file data has only 2 columns, the remaining columns at the end of the table when the table is queried have no data corresponding to them and are automatically displayed as null.

therefore, null conversion to an empty string, you can save disk space, the implementation method has several
    1) the direct designation when the table is built (two ways)
        A, the phrase
ROW FORMAT SERDE ' Org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe ' with
    serdeproperties (' serialization.null.format ' = ')
        implementation, note that both must be used together, such as
   CREATE TABLE hive_tb (ID int,name string)
   partitioned by (' Day ' string, ' type ' tinyint COMMENT ' 0 as bid, 1 as win, 2 As CK ', ' hour ' tinyint)
   ROW FORMAT SERDE ' org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe '
   with Serdeproperties (
        ' field.delim ' = '/t ',
        ' escape.delim ' = '//',
        ' serialization.null.format ' = '
   ) STORED as Textfile;
        b, or via row FORMAT delimited NULL DEFINED as '
   CREATE TABLE hive_tb (ID int,name string)
   partitioned by (' Day ' string, ' type ' tinyint COMMENT ' 0 as bid, 1 as win, 2 As CK ', ' hour ' tinyint)
   ROW FORMAT delimited 
        NULL DEFINED as ' STORED as 
   textfile;
    2) Modifying a table that already exists
    ALTER TABLE HIVE_TB set Serdeproperties (' serialization.null.format ' = ');
The 
 space-saving validation results are as follows: Hadoop fs-du/hivedata/warehouse/pmp.db/hive_tb/day=2016-05-14/type=1/hour=00/0* 1137 Hadoo P fs-du/hivedata/warehouse/pmp.db/hive_tb/day=2016-05-14/type=1/hour=01/0* 319753------------------------------ -----Hadoop fs-du/hivedata/warehouse/pmp.db/hive_tb/day=2016-05-14/type=1/hour=00/0* 885 Hadoop Fs-du/hiv edata/warehouse/pmp.db/hive_tb/day=2016-05-14/type=1/hour=01/0* 249529 

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.