Spark reads a CSV file

Source: Internet
Author: User
Tags: first row, databricks
1. Reading a CSV file with Spark

You can read CSV files with Databricks' third-party package (com.databricks.spark.csv): download the package and put it in the specified path.

1.1 Defining the data format
Before importing the data, define a schema that matches the layout of the file.
Use StructType to define the schema; its fields correspond one-to-one with the columns of the dataset.

Each StructField takes three parameters: the field name, the field's data type, and whether the field is allowed to be null (nullable).
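import org.apache.spark.sql.types.{DoubleType, StringType, StructField, StructType}

// Each StructField is (field name, data type, nullable)
val FieldSchema = StructType(Array(
  StructField("TID", StringType, true),
  StructField("Lat", DoubleType, true),
  StructField("Lon", DoubleType, true),
  StructField("Time", StringType, true)))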
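
Before using a schema it can help to check that it describes the columns you expect. The snippet below is a minimal sketch meant to be pasted into the Spark shell; it rebuilds the same four-column layout under the illustrative name schema (a name not used elsewhere in this article) so that it stands on its own, and it only uses StructType's fieldNames, apply(name) and printTreeString methods.

```scala
import org.apache.spark.sql.types.{DoubleType, StringType, StructField, StructType}

// Same layout as in the text, repeated so the snippet is self-contained.
// "schema" is an illustrative name, not one from the original article.
val schema = StructType(Array(
  StructField("TID", StringType, true),
  StructField("Lat", DoubleType, true),
  StructField("Lon", DoubleType, true),
  StructField("Time", StringType, true)))

// Field names in declaration order
println(schema.fieldNames.mkString(", "))  // TID, Lat, Lon, Time

// Look up a single field by name and inspect its type and nullability
val lat = schema("Lat")
println(lat.dataType)
println(lat.nullable)

// Print the whole schema as a tree
schema.printTreeString()
```

If a column name or type is wrong here, it is easier to correct it at this point than after the file has been loaded.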

2. Reading the data

After the schema has been defined, call the read interface provided by sqlContext and specify the format com.databricks.spark.csv, which is defined in the third-party library. Because the first row of the dataset used in this lesson does not contain column names, set the read option header to false. Finally, pass the path of the dataset file to the load method.

val taxidf = sqlContext.read.format("com.databricks.spark.csv").option("header", "false").schema(FieldSchema).load("/home/shiyanlou/taxi.csv")

Note that because we are entering code interactively in the Spark shell, the shell has already created the sqlContext object during startup, so we can use it directly. If you are developing a standalone Spark application, create the SQLContext manually from the SparkContext.
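
For the standalone case, the following is a minimal sketch rather than a definitive implementation. It assumes a Spark 1.x build with the spark-csv package on the classpath; the object name TaxiCsvApp, the application name and the local[*] master are hypothetical choices made for illustration. The schema, read options and file path are the ones used above.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.types.{DoubleType, StringType, StructField, StructType}

// Hypothetical application object; the name is illustrative only.
object TaxiCsvApp {
  // Schema from section 1.1: (field name, data type, nullable)
  val FieldSchema = StructType(Array(
    StructField("TID", StringType, true),
    StructField("Lat", DoubleType, true),
    StructField("Lon", DoubleType, true),
    StructField("Time", StringType, true)))

  def main(args: Array[String]): Unit = {
    // App name and local master are assumptions for a quick local run.
    val conf = new SparkConf().setAppName("TaxiCsvApp").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      // Unlike the Spark shell, a standalone program must create this itself.
      val sqlContext = new SQLContext(sc)

      val taxidf = sqlContext.read
        .format("com.databricks.spark.csv")
        .option("header", "false") // the file has no header row
        .schema(FieldSchema)
        .load("/home/shiyanlou/taxi.csv")

      // Quick sanity check of what was loaded
      taxidf.printSchema()
      taxidf.show(5)
    } finally {
      // Release cluster resources even if the read fails
      sc.stop()
    }
  }
}
```

The only difference from the shell session is the explicit creation of the SparkContext and SQLContext; the read call itself is unchanged. On Spark 2.0 and later, SparkSession is the usual entry point instead of SQLContext.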
