Parquet is a tabular storage format for analytic business, developed by Twitter and Cloudera, and graduated from the Apache incubator in May 2015 as an Apache top-level project, so here's a summary of what the Parquet data structure really looks like.
A parquet file consists of a header and one or more block blocks, ending with a footer. The header contains only a 4-byte digital PAR1 to identify the entire Parquet file format. All metadata in the file are present in the footer. The metadata in footer contains the format version information, schema information, Key-value Paris, and all metadata information in the block. The last two fields in footer are a metadata with 4-byte-length footer and the same PAR1 as the header contains.
Note here that unlike the header of sequence files and the Avro data format file and sync markers, it is used to split blocks. The Parquet format file does not require sync markers, so the boundaries of the block are stored with the Meatada of footer.
In the parquet file, each block has a set of row groups, which are column data that consists of a set of columns chunk. Continue down, and each of the column chunk contains the pages it has. Each page contains values from the same column.
Parquet also uses a more compact form of encoding that, when written to a parquet file, automatically fits an appropriate encoding based on the column type, for example, a Boolean value will be used for run-length encoding.
Reference: "Hadoop:the Definitive Guide, 4th Edition"
Parquet File Structure notes