import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.api.java.function.VoidFunction;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SQLContext;

SparkConf sparkConf = new SparkConf().setMaster("local").setAppName("Clzmap");
JavaSparkContext javaSparkContext = new JavaSparkContext(sparkConf);
JavaRDD<String> lineStr = javaSparkContext.textFile("C:\\users\\administrator\\desktop\\stud.txt");
JavaRDD<KK> lineKK = lineStr.map(new Function<String, KK>() {
    @Override
    public KK call(String s) throws Exception {
        String[] attr = s.split(",");
        KK k = new KK();
        k.setName(attr[0]);
        k.setAge(Integer.parseInt(attr[1]));
        k.setYear(attr[2]);
        return k;
    }
});

SQLContext sqlContext = new SQLContext(javaSparkContext);
DataFrame df = sqlContext.createDataFrame(lineKK, KK.class);

// Two ways to filter the data (1: the DataFrame Java API, 2: a SQL query against a temporary table)
// ------------------------- method 1 -------------------------
DataFrame dfFilter = df.filter(df.col("age").geq(19));
// ------------------------- method 2 -------------------------
df.registerTempTable("KK"); // register a temporary table; the argument is the table name
DataFrame dfFilter1 = sqlContext.sql("select * from KK where age >= 19");
// ------------------------- end -------------------------

JavaRDD<Row> dfRow = dfFilter1.javaRDD(); // convert the DataFrame back into an RDD
JavaRDD<KK> dfKK = dfRow.map(new Function<Row, KK>() {
    @Override
    public KK call(Row row) throws Exception {
        // the column order in a Row may differ from the field order in the original file
        KK k = new KK();
        k.setAge(row.getInt(0));
        k.setName(row.getString(1));
        k.setYear(row.getString(2));
        return k;
    }
});

dfKK.foreach(new VoidFunction<KK>() {
    @Override
    public void call(KK kk) throws Exception {
        System.out.println("getAge->" + kk.getAge());
        System.out.println("getYear->" + kk.getYear());
        System.out.println("getName->" + kk.getName());
        System.out.println("=============");
    }
});
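The code above builds the DataFrame by reflecting on the KK bean class; the other common route in Spark 1.x is to declare a StructType schema programmatically and pair it with an RDD of Rows. A minimal sketch of that variant, assuming the same stud.txt layout (the class name SchemaDemo is hypothetical):

```java
import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SQLContext;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructField;
import org.apache.spark.sql.types.StructType;

public class SchemaDemo {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setMaster("local").setAppName("SchemaDemo");
        JavaSparkContext sc = new JavaSparkContext(conf);
        SQLContext sqlContext = new SQLContext(sc);

        JavaRDD<String> lines = sc.textFile("C:\\users\\administrator\\desktop\\stud.txt");

        // Describe the columns explicitly instead of reflecting on a bean;
        // the column order is then exactly the declared order.
        StructType schema = DataTypes.createStructType(Arrays.<StructField>asList(
                DataTypes.createStructField("name", DataTypes.StringType, true),
                DataTypes.createStructField("age", DataTypes.IntegerType, true),
                DataTypes.createStructField("year", DataTypes.StringType, true)));

        JavaRDD<Row> rows = lines.map(new Function<String, Row>() {
            @Override
            public Row call(String s) throws Exception {
                String[] attr = s.split(",");
                return RowFactory.create(attr[0], Integer.parseInt(attr[1]), attr[2]);
            }
        });

        DataFrame df = sqlContext.createDataFrame(rows, schema);
        df.filter(df.col("age").geq(19)).show(); // same filter as method 1 above
        sc.stop();
    }
}
```

A side benefit of the programmatic schema: because the columns keep their declared order, the Row-to-bean mapping later in the post does not have to guess at column positions.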
As the code above shows, KK is a plain JavaBean and must implement Serializable.

Content of the text file (stud.txt):

zzq,19,2016
yyu,18,2016
uui,90,2015
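The KK bean itself is not shown in the post; a minimal sketch consistent with the setter and getter calls in the code (field names and types inferred, so treat them as assumptions):

```java
import java.io.Serializable;

// Minimal JavaBean used as the DataFrame schema source; fields inferred
// from the setName/setAge/setYear calls in the example above.
public class KK implements Serializable {
    private String name;
    private int age;
    private String year;

    public String getName() { return name; }
    public void setName(String name) { this.name = name; }

    public int getAge() { return age; }
    public void setAge(int age) { this.age = age; }

    public String getYear() { return year; }
    public void setYear(String year) { this.year = year; }
}
```

createDataFrame(rdd, KK.class) reads these getter/setter pairs by reflection to derive the columns, and Serializable is required because instances are shipped between Spark tasks.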
Spark SQL: converting an RDD to a DataFrame and two ways to query it