scala> import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.SparkSession

scala> val spark = SparkSession.builder().getOrCreate()
spark: org.apache.spark.sql.SparkSession = [email protected]

// Enable implicit conversion of RDDs to DataFrames and the subsequent SQL operations
scala> import spark.implicits._
import spark.implicits._

scala> val df = spark.read.json("file:///usr/local/spark/examples/src/main/resources/people.json")
df: org.apache.spark.sql.DataFrame = [age: bigint, name: string]

scala> df.show()
+----+-------+
| age|   name|
+----+-------+
|null|Michael|
|  30|   Andy|
|  19| Justin|
+----+-------+

// Print schema information
scala> df.printSchema()
root
 |-- age: long (nullable = true)
 |-- name: string (nullable = true)

// Select multiple columns
scala> df.select(df("name"), df("age") + 1).show()
+-------+---------+
|   name|(age + 1)|
+-------+---------+
|Michael|     null|
|   Andy|       31|
| Justin|       20|
+-------+---------+

// Conditional filter (the comparison value is missing in the source;
// the output shown implies a threshold of at least 19 and below 30)
scala> df.filter(df("age") > ).show()
+---+----+
|age|name|
+---+----+
| 30|Andy|
+---+----+

// Group and aggregate
scala> df.groupBy("age").count().show()
+----+-----+
| age|count|
+----+-----+
|  19|    1|
|null|    1|
|  30|    1|
+----+-----+

// Sort
scala> df.sort(df("age").desc).show()
+----+-------+
| age|   name|
+----+-------+
|  30|   Andy|
|  19| Justin|
|null|Michael|
+----+-------+

// Multi-column sort
scala> df.sort(df("age").desc, df("name").asc).show()
+----+-------+
| age|   name|
+----+-------+
|  30|   Andy|
|  19| Justin|
|null|Michael|
+----+-------+

// Rename a column
scala> df.select(df("name").as("username"), df("age")).show()
+--------+----+
|username| age|
+--------+----+
| Michael|null|
|    Andy|  30|
|  Justin|  19|
+--------+----+

// Use a Spark SQL statement
scala> df.createTempView("table1")

scala> spark.sql("select * from table1 limit 10")
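The session ends with a spark.sql call whose result is never displayed. As a rough sketch of how that last step might look outside the shell, the standalone program below registers a temporary view and runs a SQL query against it. Several things here are assumptions rather than part of the original post: the object and method names (TempViewSketch, peopleOlderThan), the local[*] master, the in-memory rows standing in for people.json, the threshold of 20, and the use of createOrReplaceTempView in place of createTempView so that rerunning the code does not fail on an already registered view name.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical names; a sketch, not the original author's code.
object TempViewSketch {

  // Registers the sample rows as the temp view "table1" and returns the
  // people whose age is greater than minAge, selected through Spark SQL.
  def peopleOlderThan(spark: SparkSession, minAge: Long): DataFrame = {
    import spark.implicits._

    // In-memory stand-in for people.json (assumption); Option[Long]
    // models the null age of the first row.
    val df = Seq[(Option[Long], String)](
      (None, "Michael"),
      (Some(30L), "Andy"),
      (Some(19L), "Justin")
    ).toDF("age", "name")

    // createTempView throws if "table1" is already registered;
    // createOrReplaceTempView overwrites it instead.
    df.createOrReplaceTempView("table1")

    // Rows with a null age do not satisfy the comparison and are dropped.
    spark.sql(s"SELECT name, age FROM table1 WHERE age > $minAge")
  }

  def main(args: Array[String]): Unit = {
    // local[*] is an assumption so the sketch runs without a cluster.
    val spark = SparkSession.builder()
      .appName("TempViewSketch")
      .master("local[*]")
      .getOrCreate()
    try {
      // show() triggers execution and prints the result table.
      peopleOlderThan(spark, 20L).show()
    } finally {
      spark.stop()
    }
  }
}
```

Calling show() (or collect()) is what actually runs the query; spark.sql on its own only returns a DataFrame describing it.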
These are the basic DataFrame operations in common use.
For details, see the blog post:
52802150
Spark SQL official documentation:
http://spark.apache.org/docs/1.6.2/api/scala/index.html#org.apache.spark.sql.DataFrame
Common operations for the "Spark SQL" DataFrame