Reading GZ and Parquet files with Spark

Source: Internet
Author: User
Tags: gz file
1. Reading GZ-compressed files from HDFS with Spark

Spark 1.5 and later can read GZ-compressed files directly, exactly as they read other plain-text files.
Start the Spark shell and read the GZ files the same way you would read a plain-text file:

sc.textFile("/your/path/*.gz").map{...}
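
As a slightly fuller illustration of the one-liner above, here is a minimal sketch that writes a small sample file and reads it back. It assumes a spark-shell session started in local mode on Linux or macOS, where sc is predefined. The temporary directory, the sample log lines, and the names demoDir, lines and errors are made up for illustration and are not from the original article.

```scala
// Run inside spark-shell started in local mode, where `sc` (the SparkContext) is predefined.
import java.io.{File, FileOutputStream, OutputStreamWriter}
import java.nio.file.Files
import java.util.zip.GZIPOutputStream

// Write a tiny gzip-compressed sample so the sketch does not depend on existing data.
// (Hypothetical sample content, for illustration only.)
val demoDir = Files.createTempDirectory("gz-demo").toFile
val writer = new OutputStreamWriter(
  new GZIPOutputStream(new FileOutputStream(new File(demoDir, "sample.gz"))), "UTF-8")
writer.write("INFO start\nERROR disk full\nINFO done\n")
writer.close()

// textFile selects the gzip codec from the .gz extension and returns decompressed lines.
val lines = sc.textFile("file://" + demoDir.getAbsolutePath + "/*.gz")

// A gzip file cannot be split, so each .gz file is read as a single partition.
// Repartitioning afterwards spreads the later stages over more tasks.
val errors = lines.repartition(4).filter(_.startsWith("ERROR"))
println(errors.count())
```

On a real cluster, the file:// path would be replaced by an HDFS path such as "/your/path/*.gz". Whether repartitioning pays off depends on how large the individual files are.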

That is all it takes to read GZ-compressed files.

2. Reading Parquet files with Spark

Spark supports the Parquet format natively.
Again in the Spark shell, run the following:

val parquetFile = sqlContext.parquetFile("/your/path/*.parquet")

Print the schema of the Parquet file:

parquetFile.printSchema()

To view the actual contents:

parquetFile.take(2).foreach(println)
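
Note that sqlContext.parquetFile has been deprecated since Spark 1.4 in favor of sqlContext.read.parquet, and it was removed in Spark 2.0. The following is a minimal sketch of the same steps using the reader API. It assumes a Spark 1.5 or 1.6 spark-shell in local mode on Linux or macOS, where sc and sqlContext are predefined. The sample rows, the column names id and name, and the temporary output path are hypothetical and exist only to make the sketch self-contained.

```scala
// Run inside a Spark 1.5/1.6 spark-shell, where `sc` and `sqlContext` are predefined.
import java.nio.file.Files
import sqlContext.implicits._

// Write a small Parquet data set first so there is something to read back.
// (Hypothetical sample rows, for illustration only.)
val outPath = "file://" +
  Files.createTempDirectory("parquet-demo").toFile.getAbsolutePath + "/people.parquet"
val people = sc.parallelize(Seq((1, "alice"), (2, "bob"))).toDF("id", "name")
people.write.parquet(outPath)

// Reader API that replaces the deprecated sqlContext.parquetFile.
val parquetFile = sqlContext.read.parquet(outPath)

// The schema is stored in the Parquet files themselves, so none has to be supplied.
parquetFile.printSchema()

// The result is a DataFrame, so it can be filtered and projected by column.
parquetFile.filter($"id" > 1).select("name").show()
```

In Spark 2.0 and later, the equivalent call is spark.read.parquet(path) on the predefined SparkSession.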

take(2) returns the first two rows, so you can inspect the actual contents of the file.

3. Using parquet-tools

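https://github.com/apache/parquet-mr/tree/master/parquet-tools

First download the appropriate parquet-tools jar.
Then define the following alias on your local machine. The --cluster c3prc-hadoop option appears to be specific to the author's Hadoop environment, so adjust or drop it for your own cluster:

alias parquetview='hadoop --cluster c3prc-hadoop jar /path/to/your/downloaded/parquet-tools-1.8.1.jar'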

Next, use meta to view the schema and head to view the data. Pointing the command at a single file is faster:

parquetview meta /hdfs/path/single/file/faster
parquetview head /hdfs/path/single/file/faster
