The Spark version tested in this article is 1.3.1.
Text file test:
A simple Person.txt file contains:
JChubby,13
Looky,14
LL,15
Name and age, respectively.
Create a new object in IDEA with the following initial code:
object TextFile {
  def main(args: Array[String]) {
  }
}
Spark SQL programming model:
Step one:
Create a SQLContext object, which is the entry point for Spark SQL operations; building a SQLContext in turn requires a SparkContext.
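A minimal sketch of this step (the application name and master URL are assumptions used only for illustration):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Build a SparkContext first, then the Spark SQL entry point on top of it
val conf = new SparkConf().setAppName("SparkSQLDemo").setMaster("local[2]")
val sc = new SparkContext(conf)
val sqlContext = new SQLContext(sc)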
Step two:
After building the entry object, import its implicit conversions. They convert the data read from various sources into DataFrames, the data type that Spark SQL manipulates uniformly.
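Assuming the SQLContext from step one is named sqlContext, the import looks like this:

// Brings in the implicit conversions that make .toDF() available on RDDs of case classes
import sqlContext.implicits._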
Step three:
Define a case class that matches the format of the data. It describes the structure of the rows being read, so that data of various raw types ends up in one unified format that is easy to program against.
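For the Person.txt file above, a matching case class could look like this (the field names are chosen here to mirror the file's two columns):

// One Person per line of Person.txt: a name and an age
case class Person(name: String, age: Int)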
Step four:
Use the SQLContext object to read the file and convert it to a DataFrame.
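A minimal sketch of this step, assuming the file path is passed in as args(0) and the Person case class from step three:

// Read the text file, split each line on commas, build Person objects, then convert to a DataFrame
val people = sc.textFile(args(0))
  .map(_.split(","))
  .map { case Array(name, age) => Person(name, age.toInt) }
  .toDF()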
Step five:
Operate on the data. There are three ways to do this, all shown in the code below:
1. DataFrame's built-in methods. DataFrame provides many methods for manipulating data, such as where and select.
2. DSL style. The DSL uses the same methods provided by DataFrame, but lets you refer to a column concisely by prefixing its name with a single quote, e.g. 'name.
3. Register the data as a table and manipulate it with SQL statements.
import org.apache.spark.SparkContext
import org.apache.spark.sql.SQLContext

object TextFile {
  def main(args: Array[String]) {
    // Step one
    // Build the SparkContext object; use new to call the constructor,
    // otherwise it would be a call to a companion object's apply method
    val sc = new SparkContext()
    // Build the SQLContext object
    val sqlContext = new SQLContext(sc)
    // Step two
    import sqlContext.implicits._
    // Step three
    case class Person(name: String, age: Int)
    // Step four: textFile reads the file from the given path (an HDFS address in cluster mode).
    // Two map operations turn each line into a Person object, one object per row; toDF converts the result to a DataFrame
    val people = sc.textFile("file path")
      .map(_.split(","))
      .map { case Array(name, age) => Person(name, age.toInt) }
      .toDF()
    // Step five
    // DataFrame methods
    println("------------------------DataFrame------------------------------------")
    // Select the records with age > 10, keep only the name column, and output with show
    people.where(people("age") > 10).select(people("name")).show()
    // DSL
    println("---------------------------DSL---------------------------------")
    people.where('age > 10).select('name).show()
    // SQL
    println("-----------------------------SQL-------------------------------")
    // Register people as a table named "people"
    people.registerTempTable("people")
    // Use SQLContext's sql method to run a SQL statement,
    // then collect the result and print each row
    sqlContext.sql("select name from people where age > 10").collect().foreach(println)
    // Save as a Parquet file, which the Parquet demo below will use
    people.saveAsParquetFile("save path")
  }
}
Parquet file test:
val sc = new SparkContext()
val sql = new SQLContext(sc)
import sql.implicits._
// parquetFile reads a Parquet file (here, the one saved by the previous example) into a DataFrame
val parquet = sql.parquetFile(args(0))
println("------------------------DataFrame------------------------------------")
parquet.where(parquet("age") > 10).select(parquet("name")).show()
println("---------------------------DSL---------------------------------")
parquet.where('age > 10).select('name).show()
println("-----------------------------SQL-------------------------------")
parquet.registerTempTable("parquet")
sql.sql("select name from parquet where age > 10")
  .map(p => "name: " + p(0))
  .collect()
  .foreach(println)
JSON file test:
val sc = new SparkContext()
val sql = new SQLContext(sc)
import sql.implicits._
// jsonFile reads a JSON file into a DataFrame
val json = sql.jsonFile(args(0))
println("------------------------DataFrame------------------------------------")
json.where(json("age") > 10).select(json("name")).show()
println("---------------------------DSL---------------------------------")
json.where('age > 10).select('name).show()
println("-----------------------------SQL-------------------------------")
json.registerTempTable("json")
sql.sql("select name from json where age > 10")
  .map(p => "name: " + p(0))
  .collect()
  .foreach(println)
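For reference, jsonFile expects one complete JSON object per line. An input file matching the earlier Person data might look like this (a hypothetical people.json, shown only for illustration):

{"name": "JChubby", "age": 13}
{"name": "Looky", "age": 14}
{"name": "LL", "age": 15}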
As you can see, this code is almost identical to the text-file example: apart from the parquetFile/jsonFile methods called on the SQLContext to read the files, the operations are exactly the same.
Because Parquet and JSON data already carry a usable schema and are converted to DataFrames automatically when read, the case class definition and the toDF step are omitted.
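You can confirm the inferred schema with printSchema; for the JSON data above the output would look roughly like this (the exact types and formatting may differ):

json.printSchema()
// root
//  |-- age: long (nullable = true)
//  |-- name: string (nullable = true)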
That covers basic usage of the Spark SQL API.