Spark SQL: Loading and Saving Data
Spark SQL's data input and output revolve around DataFrame, which provides a set of common load and save operations.
You can create a DataFrame with load, persist a DataFrame to a file with save, use format to declare how a file should be read or how the output should be written, and read files of a supported type directly.
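As a quick illustration, here is a minimal Scala sketch of the generic load/save path; the file paths, app name, and object name are placeholders, not from the original:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object LoadSaveSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local").setAppName("LoadSaveSketch")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)

    // Generic load: name the input format explicitly, then load.
    val df = sqlContext.read.format("json").load("person.json")

    // Generic save: name the output format explicitly, then save.
    df.select("name").write.format("parquet").save("names.parquet")
  }
}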
SQLContext source:
Load and save methods:

@deprecated("Use read.load(path). This will be removed in Spark 2.0.", "1.4.0")
def load(path: String): DataFrame = {
  read.load(path)
}
/**
 * Returns the dataset stored at path as a DataFrame, using the given data source.
 *
 * @group genericdata
 * @deprecated As of 1.4.0, replaced by `read().format(source).load(path)`.
 *             This will be removed in Spark 2.0.
 */
@deprecated("Use read.format(source).load(path). This will be removed in Spark 2.0.", "1.4.0")
def load(path: String, source: String): DataFrame = {
  read.format(source).load(path)
}
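As the deprecation messages say, each old call forwards to the new reader API; the two forms below are equivalent (assuming an existing SQLContext named sqlContext and a placeholder path):

// Deprecated form, removed in Spark 2.0:
val df1 = sqlContext.load("person.json", "json")

// Replacement form:
val df2 = sqlContext.read.format("json").load("person.json")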
DataFrameReader source:
/**
 * Specifies the input data source format.
 *
 * @since 1.4.0
 */
def format(source: String): DataFrameReader = {
  this.source = source
  this
}

/**
 * Loads input in as a [[DataFrame]], for data sources that don't require a path (e.g. external
 * key-value stores).
 *
 * @since 1.4.0
 */
def load(): DataFrame = {
  val resolved = ResolvedDataSource(
    sqlContext,
    userSpecifiedSchema = userSpecifiedSchema,
    partitionColumns = Array.empty[String],
    provider = source,
    options = extraOptions.toMap)
  DataFrame(sqlContext, LogicalRelation(resolved.relation))
}
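Because this load() overload takes no path, it suits sources configured purely through options, such as JDBC. A sketch of that usage; the connection URL, table name, and credentials are placeholders:

val jdbcDF = sqlContext.read
  .format("jdbc")
  .option("url", "jdbc:mysql://localhost:3306/test") // placeholder connection URL
  .option("dbtable", "person")                       // placeholder table name
  .option("user", "root")                            // placeholder credentials
  .option("password", "secret")
  .load()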
ResolvedDataSource source:

object ResolvedDataSource extends Logging {

  /** A map to maintain backward compatibility in case we move data sources around. */
  private val backwardCompatibilityMap = Map(
    "org.apache.spark.sql.jdbc" -> classOf[jdbc.DefaultSource].getCanonicalName,
    "org.apache.spark.sql.jdbc.DefaultSource" -> classOf[jdbc.DefaultSource].getCanonicalName,
    "org.apache.spark.sql.json" -> classOf[json.DefaultSource].getCanonicalName,
    "org.apache.spark.sql.json.DefaultSource" -> classOf[json.DefaultSource].getCanonicalName,
    "org.apache.spark.sql.parquet" -> classOf[parquet.DefaultSource].getCanonicalName,
    "org.apache.spark.sql.parquet.DefaultSource" -> classOf[parquet.DefaultSource].getCanonicalName
  )
The formats registered above can be read directly: JDBC, JSON, and Parquet.
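For these built-in formats, DataFrameReader also exposes shortcut methods, so the format string can be skipped; a sketch with placeholder paths and connection details:

val jsonDF    = sqlContext.read.json("person.json")
val parquetDF = sqlContext.read.parquet("person.parquet")
val jdbcTable = sqlContext.read.jdbc(
  "jdbc:mysql://localhost:3306/test", "person", new java.util.Properties())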
The apply overload used when writing a DataFrame out:

def apply(
    sqlContext: SQLContext,
    provider: String,
    partitionColumns: Array[String],
    mode: SaveMode,
    options: Map[String, String],
    data: DataFrame): ResolvedDataSource = {
  // ...
DataFrameWriter source:
/**
 * Specifies the behavior when data or table already exists. Options include:
 *   - `SaveMode.Overwrite`: overwrite the existing data.
 *   - `SaveMode.Append`: append the data.
 *   - `SaveMode.Ignore`: ignore the operation (i.e. no-op).
 *   - `SaveMode.ErrorIfExists`: default option, throw an exception at runtime.
 *
 * @since 1.4.0
 */
def mode(saveMode: SaveMode): DataFrameWriter = {
  this.mode = saveMode
  this
}
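A typical call site for mode when writing; the output path and the choice of SaveMode.Append are placeholders:

import org.apache.spark.sql.SaveMode

peopleDF.select("name")
  .write
  .format("json")
  .mode(SaveMode.Append)  // or Overwrite / Ignore / ErrorIfExists
  .save("personname.json")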
A complete Java example that loads a JSON file, selects one column, and saves the result as JSON:

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SQLContext;

/**
 * @version created: May 8, 2016 7:54:28
 */
public class SparkSQLLoadSaveOps {
  public static void main(String[] args) {
    SparkConf conf = new SparkConf().setMaster("local").setAppName("rdd2d");
    JavaSparkContext sc = new JavaSparkContext(conf);
    SQLContext sqlContext = new SQLContext(sc);

    // Read a JSON file into a DataFrame.
    DataFrame peopleDF = sqlContext.read().format("json").load("D://person.json");

    // Select the "name" column and write it back out as JSON.
    peopleDF.select("name").write().format("json").save("D://logs//personname.json");
  }
}
Append semantics: because Spark writes output as a directory of part files, `SaveMode.Append` adds new part files to the existing output directory rather than appending to a single file.
Day 61: a hands-on deep dive into the internals of Spark SQL data loading and saving.