The Java version of the code is as follows:
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SQLContext;

/**
 * Using Java to work through DataFrame operations.
 */
public class DataFrameOps {
    public static void main(String[] args) {
        // Create a SparkConf, which reads system configuration information and sets the name of the current application.
        SparkConf conf = new SparkConf().setMaster("local").setAppName("DataFrameOps");
        // Create a JavaSparkContext object instance as the core cornerstone of the entire driver.
        JavaSparkContext sc = new JavaSparkContext(conf);
        // Create a SQLContext context object for SQL analysis.
        SQLContext sqlContext = new SQLContext(sc);
        // Create a DataFrame; a DataFrame can simply be thought of as a table.
        DataFrame df = sqlContext.read().json("E://people.json");
        // SELECT * FROM table
        df.show();
        // DESC table
        df.printSchema();
        // SELECT name FROM table
        df.select("name").show();
        // SELECT name, age + 1 FROM table
        df.select(df.col("name"), df.col("age").plus(1)).show();
        // SELECT * FROM table WHERE age > 10
        df.filter(df.col("age").gt(10)).show();
        // SELECT COUNT(1) FROM table GROUP BY age
        df.groupBy(df.col("age")).count().show();
    }
}
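As a point of reference, read().json() expects one self-contained JSON object per line (the JSON Lines format), not a single pretty-printed array. Assuming E://people.json is a copy of the people.json sample bundled with Spark's examples, its contents would look like this:

{"name":"Michael"}
{"name":"Andy", "age":30}
{"name":"Justin", "age":19}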
The Scala version performs the same operations:
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.sql.SQLContext

object DataFrameOps {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local").setAppName("DataFrameOps")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)
    val df = sqlContext.read.json("E://people.json")
    df.show()
    df.printSchema()
    df.select("name").show()
    df.select(df.col("name"), df.col("age").plus(1)).show()
    df.filter(df.col("age").gt(10)).show()
    df.groupBy(df.col("age")).count().show()
  }
}
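Because each DataFrame call above mirrors a SQL statement, the same queries can also be issued as literal SQL once the DataFrame is registered as a temporary table. Here is a minimal sketch continuing with the df and sqlContext values from the Scala listing above; the table name "people" is an arbitrary choice:

// Register the DataFrame under a name so it can be referenced from SQL.
df.registerTempTable("people")
// Equivalent to df.select(df.col("name"), df.col("age").plus(1)).show()
sqlContext.sql("SELECT name, age + 1 FROM people").show()
// Equivalent to df.filter(df.col("age").gt(10)).show()
sqlContext.sql("SELECT * FROM people WHERE age > 10").show()
// Equivalent to df.groupBy(df.col("age")).count().show()
sqlContext.sql("SELECT age, COUNT(1) FROM people GROUP BY age").show()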