Parquet: a columnar storage format with support for nested data

Source: Internet
Author: User
Tags: hadoop ecosystem

Brief introduction

Apache Parquet is a columnar storage format used primarily in the Hadoop ecosystem. It is independent of the data processing framework, data model, and programming language in use. Impala, Cloudera's big data online analytical processing (OLAP) project, uses this format as its column store.

Parquet began as a columnar store used internally at Twitter. It is now open source, with the code hosted in parquet-format on

Parquet is a columnar storage format for use with Hadoop. It provides a column-based data representation that supports highly efficient compression, is available to every project in the Hadoop ecosystem, and does not depend on any particular data processing framework, data model, or programming language.
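To make "column-based representation" concrete, here is a minimal sketch that pivots a handful of rows into per-column lists. It only illustrates the layout idea and is not Parquet's on-disk format; the function name to_columns and the sample field names are made up for this example, and every row is assumed to carry the same fields.

```python
def to_columns(rows):
    """Pivot a list of row dicts into a dict of column lists.

    Assumption: every row has the same keys, in any order.
    """
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


# Hypothetical sample data, row-oriented.
rows = [
    {"id": 1, "country": "US"},
    {"id": 2, "country": "DE"},
    {"id": 3, "country": "US"},
]

columns = to_columns(rows)
# A query that needs only "country" can now read that one list
# and skip "id" entirely.
countries = columns["country"]
```

Because all values of one column sit next to each other and share a type, they tend to compress better than the same values interleaved with other fields.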

Like Google Dremel, Parquet is built with complex nested data structures in mind and encodes them using repetition levels and definition levels. This approach is considered superior to simply flattening nested namespaces.
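The repetition/definition level idea can be sketched for a single nested column. The example below assumes a hypothetical schema in which each record holds a repeated group "contacts", and each contact holds a repeated required string "phones", so the column contacts.phones has a maximum repetition level of 2 and a maximum definition level of 2. It follows the Dremel-style model rather than the exact layout Parquet uses for its LIST type, and the names shred and assemble are illustrative only.

```python
def shred(records):
    """Turn nested records into (value, repetition, definition) triples.

    Each record is a list of contacts; each contact is a list of phones.
    repetition: 0 = new record, 1 = new contact, 2 = another phone.
    definition: 0 = no contacts, 1 = contact without phones, 2 = phone present.
    """
    triples = []
    for contacts in records:
        if not contacts:
            triples.append((None, 0, 0))
            continue
        for i, phones in enumerate(contacts):
            contact_rep = 0 if i == 0 else 1
            if not phones:
                triples.append((None, contact_rep, 1))
                continue
            for j, phone in enumerate(phones):
                triples.append((phone, contact_rep if j == 0 else 2, 2))
    return triples


def assemble(triples):
    """Rebuild the nested records from the flat column of triples."""
    records = []
    for value, rep, definition in triples:
        if rep == 0:
            records.append([])          # start a new record
        if definition == 0:
            continue                    # record has no contacts at all
        if rep <= 1:
            records[-1].append([])      # start a new contact
        if definition == 2:
            records[-1][-1].append(value)
    return records


records = [[["p1", "p2"], [], ["p3"]], [], [["p4"]]]
triples = shred(records)
roundtrip = assemble(triples)
```

The column is stored flat, yet the two small integers per value are enough to rebuild the original nesting, including empty lists.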

Parquet allows compression to be specified per column, and more encodings can be added in the future. Because the concept of encoding is kept separate from compression, Parquet consumers can implement operators that work directly on encoded data, without paying the cost of decompression and decoding where that can be avoided.
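A rough sketch of why separating encoding from compression is useful: the column is first dictionary-encoded into small integer codes, compression is applied as an independent step, and an equality filter then runs on the codes without rebuilding the original strings. Dictionary encoding is one of the encodings Parquet offers, but the byte layout here is simplified (one byte per code, so at most 256 distinct values), and the helper names are hypothetical.

```python
import zlib


def dictionary_encode(values):
    """Replace each value with an integer code into a dictionary of distinct values."""
    dictionary, index, codes = [], {}, []
    for value in values:
        if value not in index:
            index[value] = len(dictionary)
            dictionary.append(value)
        codes.append(index[value])
    return dictionary, codes


def count_equal(dictionary, codes, target):
    """Count rows equal to target by comparing codes, not decoded values."""
    if target not in dictionary:
        return 0
    code = dictionary.index(target)
    return sum(1 for c in codes if c == code)


column = ["US", "DE", "US", "US", "FR", "DE"]
dictionary, codes = dictionary_encode(column)

# Compression is a separate, optional step applied to the encoded bytes.
compressed = zlib.compress(bytes(codes))
restored_codes = list(zlib.decompress(compressed))

# The filter works on the encoded form; strings are looked up only once.
us_rows = count_equal(dictionary, restored_codes, "US")
```

Swapping zlib for another codec, or skipping compression entirely, would not change dictionary_encode or count_equal, which is the point of keeping the two concepts apart.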

Parquet is designed for anyone to use. The Hadoop ecosystem has many data processing frameworks, and an efficient, well-implemented columnar storage substrate should be usable by all of them.

The parquet-mr project provides Java building blocks for working with columnar data, Hadoop input/output formats, Pig storage/load functions, and tools for converting to and from the Parquet format.

Parquet metadata is encoded using Apache Thrift.
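As a small illustration of where that metadata lives: a Parquet file starts and ends with the 4-byte magic "PAR1", and the Thrift-encoded file metadata sits just before a 4-byte little-endian length field at the end of the file. The sketch below only locates those raw bytes; decoding them would need a Thrift decoder, which is not shown. The function name is hypothetical, and the sample "file" is a fake byte string, not real Parquet data.

```python
import struct

MAGIC = b"PAR1"


def footer_metadata_bytes(data: bytes) -> bytes:
    """Return the raw, still Thrift-encoded, footer metadata of a file image."""
    if len(data) < 12 or data[:4] != MAGIC or data[-4:] != MAGIC:
        raise ValueError("not a Parquet file: missing PAR1 magic")
    # The 4 bytes before the trailing magic hold the metadata length.
    (length,) = struct.unpack("<I", data[-8:-4])
    end = len(data) - 8
    start = end - length
    if start < 4:
        raise ValueError("footer length exceeds file size")
    return data[start:end]


# Fake file image for illustration: the body and "META" are placeholders.
fake_file = MAGIC + b"column-chunks" + b"META" + struct.pack("<I", 4) + MAGIC
raw_metadata = footer_metadata_bytes(fake_file)
```

Keeping the metadata at the end lets a writer stream column data first and record the final offsets once everything has been written.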
