Table of Contents
- 1. Spark SQL
- 2. SQLContext
- 2.1. SQLContext is the entry point for all Spark SQL functionality
- 2.2. Create an SQLContext from a SparkContext
- 2.3. HiveContext provides more functionality than SQLContext; future releases will add it to SQLContext
- 3. DataFrames
- 3.1. Features
- 3.2. Create DataFrames
- 3.3. DSL
1. Spark SQL
- It is a module of Spark.
- It works with structured data.
- It provides DataFrames as the programming abstraction.
- It also acts as a distributed SQL query engine.
- Data can also be read from Hive.
2. SQLContext
2.1. SQLContext is the entry point for all Spark SQL functionality
2.2. Create an SQLContext from a SparkContext
val sc: SparkContext // An existing SparkContext.
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
Note that the val sc: SparkContext line above does not actually need to be typed, because spark-shell prints the following hint at startup, so sc can be used directly:
Spark context available as sc.
2.3. HiveContext provides more functionality than SQLContext; future releases will add it to SQLContext
But since I'm not interested in Hive, I won't create a HiveContext here.
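For reference only (not run here), a HiveContext would be created the same way as an SQLContext. This is a minimal sketch based on the Spark 1.x API and assumes a Spark build with Hive support:

// Requires a Spark build with Hive support; sc is the existing SparkContext.
val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)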
3. DataFrames
3.1. Features
- A distributed collection of data
- Organized into named columns
- Can be understood as a table in a relational database
- Can be constructed from structured data files, Hive tables, external databases, or existing RDDs (see the sketch after this list)
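As an example of the last point, here is a minimal sketch of building a DataFrame from an existing RDD in spark-shell (the Person case class and the sample row are made up for illustration; sqlContext is the one created in section 2.2):

// A case class defines the schema of the resulting DataFrame (illustrative).
case class Person(name: String, age: Long)

// Build an RDD of Person objects from local data.
val peopleRdd = sc.parallelize(Seq(Person("dean", 38)))

// The implicits from sqlContext enable the rdd.toDF() conversion.
import sqlContext.implicits._
val dfFromRdd = peopleRdd.toDF()
dfFromRdd.show()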
3.2. Create DataFrames
DataFrames are created through the SQLContext, from an external file, a Hive table, or an RDD. To test with a local file, do not pass --master to connect to the Spark master when starting spark-shell; otherwise you will get an error saying the local file cannot be found. If you want to access a file on HDFS, start spark-shell with --master.
scala> val sqlContext = new org.apache.spark.sql.SQLContext(sc)
sqlContext: org.apache.spark.sql.SQLContext = org.apache.spark.sql.SQLContext@...

scala> val df = sqlContext.read.json("/home/smile/people.json")
df: org.apache.spark.sql.DataFrame = [age: bigint, name: string]

scala> df.printSchema()
root
 |-- age: long (nullable = true)
 |-- name: string (nullable = true)

scala> df.show()
+---+----+
|age|name|
+---+----+
| 38|dean|
+---+----+
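Because Spark SQL is also a distributed SQL query engine (section 1), the DataFrame above can be queried with plain SQL once it is registered as a temporary table. A minimal sketch continuing the same session (the table name people is just an illustrative choice):

// Register the DataFrame as a temporary table so SQL can refer to it by name.
df.registerTempTable("people")

// Run a SQL query through the same sqlContext; the result is again a DataFrame.
val adults = sqlContext.sql("SELECT name, age FROM people WHERE age >= 18")
adults.show()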
3.3. DSL
DataFrames provide a DSL for manipulating structured data, with functions such as select, printSchema, show, groupBy, filter, and so on.
Scala> df.select ("name"). Show () +----+|name|+----+| dean|+----+
For details, please refer to the official documentation.
Author: dean
Created: 2015-11-12 Thu 09:33