DSL style syntax
1. View the contents of the DataFrame
scala> df1.show
+---+--------+---+
| id|    name|age|
+---+--------+---+
|  1|zhansgan| 16|
|  2|    lisi| 18|
|  3|  wangwu| 21|
|  4|xiaofang| 22|
+---+--------+---+
2. View the data in selected columns of the DataFrame
scala> df1.select(df1.col("name")).show
+--------+
|    name|
+--------+
|zhansgan|
|    lisi|
|  wangwu|
|xiaofang|
+--------+

scala> df1.select(col("name"), col("age")).show
+--------+---+
|    name|age|
+--------+---+
|zhansgan| 16|
|    lisi| 18|
|  wangwu| 21|
|xiaofang| 22|
+--------+---+

scala> df1.select("name").show
+--------+
|    name|
+--------+
|zhansgan|
|    lisi|
|  wangwu|
|xiaofang|
+--------+
3. View the DataFrame's schema information
scala> df1.printSchema
root
 |-- id: integer (nullable = false)
 |-- name: string (nullable = true)
 |-- age: integer (nullable = false)
4. Query name, age, and age + 1
scala> df1.select(col("name"), col("age") + 1).show
+--------+---------+
|    name|(age + 1)|
+--------+---------+
|zhansgan|       17|
|    lisi|       19|
|  wangwu|       22|
|xiaofang|       23|
+--------+---------+

scala> df1.select(df1("name"), df1("age") + 1).show
+--------+---------+
|    name|(age + 1)|
+--------+---------+
|zhansgan|       17|
|    lisi|       19|
|  wangwu|       22|
|xiaofang|       23|
+--------+---------+
5. Filter for people older than 20
scala> df1.filter(col("age") > 20).show
+---+--------+---+
| id|    name|age|
+---+--------+---+
|  3|  wangwu| 21|
|  4|xiaofang| 22|
+---+--------+---+
6. Group by age and count the number of people of each age
scala> df1.groupBy("age").count().show
+---+-----+
|age|count|
+---+-----+
| 16|    1|
| 18|    1|
| 21|    1|
| 22|    1|
+---+-----+
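For readers without a Spark shell at hand, the select/filter/groupBy pattern above can be mimicked with plain Scala collections. This is only an illustrative sketch, not Spark's API: the `Person` case class, the `DslSketch` object, and the in-memory sample data (copied from the table shown earlier) are all assumptions made for the example.

```scala
// Sketch: the same rows as df1, modeled with plain Scala collections.
case class Person(id: Int, name: String, age: Int)

object DslSketch {
  val people = List(
    Person(1, "zhansgan", 16),
    Person(2, "lisi", 18),
    Person(3, "wangwu", 21),
    Person(4, "xiaofang", 22)
  )

  def main(args: Array[String]): Unit = {
    // "select name": project a single field out of each row
    val names = people.map(_.name)
    println(names) // List(zhansgan, lisi, wangwu, xiaofang)

    // "filter age > 20": keep only the rows matching a predicate
    val over20 = people.filter(_.age > 20)
    println(over20.map(_.name)) // List(wangwu, xiaofang)

    // "groupBy age, count": group rows by a key and count each group
    val countsByAge = people.groupBy(_.age).map { case (age, ps) => (age, ps.size) }
    println(countsByAge.toList.sorted)
  }
}
```

The analogy is only conceptual: Spark evaluates these operations lazily and distributes them across a cluster, while collection methods run eagerly on one machine.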
SQL style
Before using the SQL style, you first need to register the DataFrame as a table:

df1.registerTempTable("t_person")
1. Find the two oldest people
scala> sqlContext.sql("select * from t_person order by age desc limit 2").show
+---+--------+---+
| id|    name|age|
+---+--------+---+
|  4|xiaofang| 22|
|  3|  wangwu| 21|
+---+--------+---+
2. Display the table's schema information
scala> sqlContext.sql("desc t_person").show
+--------+---------+-------+
|col_name|data_type|comment|
+--------+---------+-------+
|      id|      int|       |
|    name|   string|       |
|     age|      int|       |
+--------+---------+-------+
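The "order by age desc limit 2" query above is just a sort followed by taking the first two rows. As a hedged plain-Scala sketch of the same logic (the `TopTwoSketch` object and tuple-based rows are assumptions for illustration; the data is copied from the table above):

```scala
// Sketch: ORDER BY age DESC LIMIT 2, expressed over plain Scala collections.
object TopTwoSketch {
  // Same rows as t_person in the examples above: (id, name, age).
  val rows = List((1, "zhansgan", 16), (2, "lisi", 18), (3, "wangwu", 21), (4, "xiaofang", 22))

  // Sort by age descending, then keep the first two rows.
  def topTwoOldest: List[(Int, String, Int)] =
    rows.sortBy { case (_, _, age) => -age }.take(2)

  def main(args: Array[String]): Unit =
    topTwoOldest.foreach(println) // (4,xiaofang,22) then (3,wangwu,21)
}
```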
DataFrame API Operations
package bigdata.spark.sql

import org.apache.spark.sql.SQLContext
// col comes from the functions object (not from scala.reflect, a common IDE mis-import)
import org.apache.spark.sql.functions.col
import org.apache.spark.{SparkConf, SparkContext}

/**
 * Created by Administrator on 2017/4
 */
object SparkSqlDemo {
  def main(args: Array[String]) {
    val conf = new SparkConf()
    conf.setAppName("SparkSqlDemo")
    conf.setMaster("local")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)
    val rdd1 = sc.textFile("hdfs://m1:9000/persons.txt").map(_.split(" "))
    val rdd2 = rdd1.map(x => Person(x(0).toInt, x(1), x(2).toInt))
    // import implicit conversions, which include the method that converts an RDD to a DataFrame
    import sqlContext.implicits._
    // df1 is now a DataFrame
    val df1 = rdd2.toDF
    df1.show
    df1.select("age").show()
    df1.select(df1.col("age")).show
    df1.select(col("age")).show
    df1.select(col("age") > 20).show
    df1.select(col("age") + 1).show
    df1.filter(col("age") > 20).show()
    df1.registerTempTable("t_person")
    sqlContext.sql("select * from t_person").show()
    sqlContext.sql("select * from t_person order by age desc limit 2").show()
    sc.stop()
  }

  // this case class must be defined outside the main method, otherwise it causes an error
  case class Person(id: Int, name: String, age: Int)
}
Specifying the schema with StructType
package bigdata.spark.sql

import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}
import org.apache.spark.sql.{Row, SQLContext}
import org.apache.spark.{SparkConf, SparkContext}

/**
 * Created by Administrator on 2017/4
 */
object SparkSqlDemo {
  def main(args: Array[String]) {
    val conf = new SparkConf()
    conf.setAppName("SparkSqlDemo")
    conf.setMaster("local")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)
    val rdd1 = sc.textFile("hdfs://m1:9000/persons.txt").map(_.split(" "))
    val rdd2 = rdd1.map(x => Row(x(0).toInt, x(1), x(2).toInt))
    // create the schema; the third argument of StructField says whether the column may be null
    val schema = StructType(List(
      StructField("id", IntegerType, false),
      StructField("name", StringType, false),
      StructField("age", IntegerType, false)
    ))
    // create the DataFrame from the Row RDD and the schema
    val df1 = sqlContext.createDataFrame(rdd2, schema)
    df1.registerTempTable("t_person")
    sqlContext.sql("select * from t_person").show()
    sc.stop()
  }
}
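The Row-based approach depends on each line of persons.txt holding space-separated id, name, and age fields that match the schema's types. A minimal plain-Scala sketch of that parsing step (the `ParseSketch` object is an assumption for illustration; the sample line mirrors the data shown earlier):

```scala
// Sketch: parse one "id name age" line the way the rdd1/rdd2 steps do,
// converting the id and age fields from String to Int.
object ParseSketch {
  def parseLine(line: String): (Int, String, Int) = {
    val fields = line.split(" ")
    (fields(0).toInt, fields(1), fields(2).toInt)
  }

  def main(args: Array[String]): Unit =
    println(parseLine("1 zhansgan 16")) // (1,zhansgan,16)
}
```

If a field is missing or non-numeric, `toInt` throws at runtime, which is worth keeping in mind since the schema above declares every column non-nullable.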
Spark SQL Operations