Usage of the load and save methods
DataFrame usersDF = sqlContext.read().load("hdfs://spark1:9000/users.parquet");
usersDF.select("name", "favorite_color").write()
    .save("hdfs://spark1:9000/namesandfavcolors.parquet");
load and save methods: specifying the file format
DataFrame peopleDF = sqlContext.read().format("json")
    .load("hdfs://spark1:9000/people.json");
peopleDF.select("name").write().format("parquet")
    .save("hdfs://spark1:9000/peoplename_java");
Parquet data source:
-"Load Parquet data"
DataFrame usersDF = sqlContext.read().parquet("hdfs://spark1:9000/spark-study/users.parquet");
-"Parquet automatic partition inference"
Save users.parquet, which contains only two fields, under the /users/gender=male/country=us/ directory.
After the data is loaded with the following code, the resulting usersDF has 4 fields:
DataFrame usersDF = sqlContext.read().parquet("hdfs://spark1:9000/spark-study/users/gender=male/country=us/users.parquet");
The two extra fields are inferred from the directory names: the value of the gender field is male, and the value of the country field is us.
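The idea behind partition inference can be sketched in plain Java, without Spark. This is only an illustration of how key=value directory names can be turned into column values. It is not Spark's actual implementation, and the class and method names (PartitionPathSketch, partitionColumns) are made up for this note.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Spark: extracts partition columns from a path.
final class PartitionPathSketch {

    // Collects every "key=value" directory segment of the path, in path order.
    static Map<String, String> partitionColumns(String path) {
        Map<String, String> columns = new LinkedHashMap<>();
        String[] segments = path.split("/");
        // The last segment is the data file itself, so it is skipped.
        for (int i = 0; i < segments.length - 1; i++) {
            String segment = segments[i];
            int eq = segment.indexOf('=');
            // Require a non-empty key and a non-empty value.
            if (eq > 0 && eq < segment.length() - 1) {
                columns.put(segment.substring(0, eq), segment.substring(eq + 1));
            }
        }
        return columns;
    }

    public static void main(String[] args) {
        Map<String, String> columns = partitionColumns(
            "hdfs://spark1:9000/spark-study/users/gender=male/country=us/users.parquet");
        System.out.println(columns); // prints {gender=male, country=us}
    }
}
```

With the path from the example above, the sketch finds gender=male and country=us, which together with the two fields stored in the file gives the 4 fields mentioned in the note.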
-"Merge metadata"
Parquet metadata merging: http://www.cnblogs.com/key1309/p/5332089.html
JSON data source:
DataFrame studentScoresDF = sqlContext.read().json("hdfs://spark1:9000/spark-study/students.json");
Format requirements for JSON data sources:
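As far as I know, sqlContext.read().json(...) treats its input as line-delimited JSON: each line must be one complete, self-contained JSON object, so a pretty-printed object spread over several lines is not read correctly. The plain-Java sketch below writes a small file in that layout and runs a rough check on it. The sample records, the class name JsonLinesSketch and the check itself are made up for illustration. The check only looks at the braces and is not a real JSON validator.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Spark.
final class JsonLinesSketch {

    // Made-up sample records: one complete JSON object per line.
    static List<String> sampleLines() {
        return Arrays.asList(
            "{\"name\":\"Leo\",\"score\":85}",
            "{\"name\":\"Marry\",\"score\":99}",
            "{\"name\":\"Jack\",\"score\":74}");
    }

    // Rough check only: every line must start with '{' and end with '}'.
    static boolean looksLikeOneObjectPerLine(List<String> lines) {
        for (String line : lines) {
            String trimmed = line.trim();
            if (!trimmed.startsWith("{") || !trimmed.endsWith("}")) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        // Write the records to a temporary local file, one object per line.
        Path file = Files.createTempFile("students", ".json");
        Files.write(file, sampleLines(), StandardCharsets.UTF_8);
        List<String> written = Files.readAllLines(file, StandardCharsets.UTF_8);
        System.out.println(looksLikeOneObjectPerLine(written)); // prints true
        Files.delete(file);
    }
}
```

A file laid out like this, copied to HDFS, is the kind of input the students.json example above would expect.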
Hive Data Source
Cond...
JDBC Data Source:
http://www.cnblogs.com/key1309/p/5350179.html
Spark SQL: the load and save methods and several data sources