Spark SQL Create Dataframes


Table of Contents

    • 1. Spark SQL
    • 2. SQLContext
      • 2.1. SQLContext is the entry point for all Spark SQL functionality
      • 2.2. Creating a SQLContext from a SparkContext
      • 2.3. HiveContext offers more functionality than SQLContext; future versions will add that functionality to SQLContext
    • 3. DataFrames
      • 3.1. Features
      • 3.2. Creating DataFrames
      • 3.3. DSL
1. Spark SQL
    • A module of Spark
    • Used for working with structured data
    • Provides DataFrames as the programming abstraction
    • Also functions as a distributed SQL query engine
    • Data can be read from Hive
2. SQLContext

2.1. SQLContext is the entry point for all Spark SQL functionality

2.2. Creating a SQLContext from a SparkContext

val sc: SparkContext // an existing SparkContext
val sqlContext = new org.apache.spark.sql.SQLContext(sc)

Note that the `val sc: SparkContext` line above does not actually need to be typed, because spark-shell prints the following hint at startup, so `sc` can be used directly:

Spark context available as sc.
2.3. HiveContext offers more functionality than SQLContext; future versions will add that functionality to SQLContext

But since I'm not interested in Hive, I won't create a HiveContext here.

3. DataFrames

3.1. Features
    • A distributed collection of data
    • Organized into named columns
    • Can be thought of as a table in a relational database
    • Can be constructed from structured files, Hive tables, external databases, or existing RDDs
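The last bullet above can be sketched in code. A minimal example of constructing a DataFrame from an existing RDD, using the Spark 1.x API this article is based on; the `Person` case class and the sample rows are made up for illustration:

```scala
// Paste into spark-shell, where sc (SparkContext) already exists.
case class Person(name: String, age: Long)

val sqlContext = new org.apache.spark.sql.SQLContext(sc)
import sqlContext.implicits._ // enables rdd.toDF()

// Build an RDD of case-class instances, then convert it to a DataFrame.
val peopleRDD = sc.parallelize(Seq(Person("dean", 38), Person("ann", 25)))
val peopleDF = peopleRDD.toDF()
peopleDF.show()
```

The column names (`name`, `age`) are inferred from the case-class fields via reflection.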
3.2. Creating DataFrames

DataFrames are created through the SQLContext and can be built from an external file, a Hive table, or an RDD. To test with a local file, do not pass `--master` to connect to a Spark master when starting spark-shell; otherwise the local file will not be found. To access files on HDFS, start spark-shell with `--master`.

scala> val sqlContext = new org.apache.spark.sql.SQLContext(sc)
sqlContext: org.apache.spark.sql.SQLContext = org.apache.spark.sql.SQLContext@...

scala> val df = sqlContext.read.json("/home/smile/people.json")
df: org.apache.spark.sql.DataFrame = [age: bigint, name: string]

scala> df.printSchema()
root
 |-- age: long (nullable = true)
 |-- name: string (nullable = true)

scala> df.show()
+---+----+
|age|name|
+---+----+
| 38|dean|
+---+----+
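Since Spark SQL is also a distributed SQL query engine (section 1), the DataFrame loaded above can be registered as a temporary table and queried with plain SQL. A sketch using the Spark 1.x API; the table name `people` is an arbitrary choice:

```scala
// Assumes df was created as above from /home/smile/people.json.
df.registerTempTable("people") // Spark 1.x API (createOrReplaceTempView in 2.x+)

// Run an ordinary SQL query against the registered table;
// the result is itself a DataFrame.
val adults = sqlContext.sql("SELECT name FROM people WHERE age > 30")
adults.show()
```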
3.3. DSL

DataFrames provide a DSL for manipulating structured data, with functions such as select, printSchema, show, groupBy, filter, and so on.

scala> df.select("name").show()
+----+
|name|
+----+
|dean|
+----+
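The other DSL functions mentioned above can be sketched the same way, still against the `df` loaded earlier:

```scala
// Assumes df was created as above from /home/smile/people.json.
df.filter(df("age") > 30).show()            // keep only rows where age > 30
df.groupBy("age").count().show()            // row count per distinct age
df.select(df("name"), df("age") + 1).show() // select with a computed column
```

Each call returns a new DataFrame, so these operations can be chained.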

For details, please refer to the official documentation

Author: dean

Created: 2015-11-12 Thu 09:33



Copyright notice: This is the author's original article; please do not reproduce it without the author's permission.

