Example: Spark SQL registers temp tables and joins them (inner join, left outer join, right outer join, full outer join)
Code
# Written against the Spark 1.x Python API (SQLContext.inferSchema, registerTempTable) and Python 2 print statements.
from pyspark import SparkConf, SparkContext
from pyspark.sql import SQLContext, Row

conf = SparkConf().setAppName("spark_sql_table_join")
sc = SparkContext(conf=conf)
sqlCtx = SQLContext(sc)

# Two small data sets; each string is "name value".
line1 = sc.parallelize(["name1 a", "name3 c", "name4 d"])
line2 = sc.parallelize(["name1 1", "name2 2", "name3 3"])

word1 = line1.map(lambda line: line.split(" "))
word2 = line2.map(lambda line: line.split(" "))

# Turn each split line into a Row with named columns.
table1 = word1.map(lambda words: Row(name=words[0], title=words[1]))
table2 = word2.map(lambda words: Row(name=words[0], fraction=words[1]))

tableSchema1 = sqlCtx.inferSchema(table1)
tableSchema2 = sqlCtx.inferSchema(table2)

# Register both as temp tables so they can be referenced from SQL.
tableSchema1.registerTempTable("table1")
tableSchema2.registerTempTable("table2")

def printRows(rows):
    if rows:
        for row in rows:
            print row

# inner join
rows = sqlCtx.sql("select table1.name, table1.title, table2.fraction from table1 join table2 on table1.name = table2.name").collect()
printRows(rows)

print "============================================="
# left outer join
rows = sqlCtx.sql("select table1.name, table1.title, table2.fraction from table1 left outer join table2 on table1.name = table2.name").collect()
printRows(rows)

# right outer join
rows = sqlCtx.sql("select table1.name, table1.title, table2.fraction from table1 right outer join table2 on table1.name = table2.name").collect()
print "============================================="
printRows(rows)

# full outer join
rows = sqlCtx.sql("select table1.name, table1.title, table2.fraction from table1 full outer join table2 on table1.name = table2.name").collect()
print "============================================="
printRows(rows)

"""
Row(name=u'name1', title=u'a', fraction=u'1')
Row(name=u'name3', title=u'c', fraction=u'3')
=============================================
Row(name=u'name1', title=u'a', fraction=u'1')
Row(name=u'name3', title=u'c', fraction=u'3')
Row(name=u'name4', title=u'd', fraction=None)
=============================================
Row(name=u'name1', title=u'a', fraction=u'1')
Row(name=None, title=None, fraction=u'2')
Row(name=u'name3', title=u'c', fraction=u'3')
=============================================
Row(name=u'name1', title=u'a', fraction=u'1')
Row(name=None, title=None, fraction=u'2')
Row(name=u'name3', title=u'c', fraction=u'3')
Row(name=u'name4', title=u'd', fraction=None)
"""

sc.stop()
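To see why the four result sets differ, the join semantics can be sketched in plain Python, without Spark, on the same data. This is only an illustration: `join_tables` is a hypothetical helper written for this example and is not how Spark executes joins. It returns the same three columns the queries select (table1.name, table1.title, table2.fraction), which is why an unmatched table2 row shows None for name: the selected name column comes from table1.

```python
# Plain-Python illustration of the four join types (runs without Spark).
# join_tables is a hypothetical helper for this sketch, not a Spark API.

table1 = [("name1", "a"), ("name3", "c"), ("name4", "d")]  # (name, title)
table2 = [("name1", "1"), ("name2", "2"), ("name3", "3")]  # (name, fraction)


def join_tables(left, right, how="inner"):
    """Return (table1.name, table1.title, table2.fraction) tuples."""
    rows = []
    matched_right = set()  # indexes of right rows that found a partner
    for l_name, title in left:
        hits = [(i, r) for i, r in enumerate(right) if r[0] == l_name]
        for i, (_, fraction) in hits:
            matched_right.add(i)
            rows.append((l_name, title, fraction))
        # Left and full joins keep unmatched left rows, padded with None.
        if not hits and how in ("left", "full"):
            rows.append((l_name, title, None))
    # Right and full joins keep unmatched right rows; both selected
    # table1 columns are None because no table1 row matched.
    if how in ("right", "full"):
        for i, (_, fraction) in enumerate(right):
            if i not in matched_right:
                rows.append((None, None, fraction))
    return rows


for how in ("inner", "left", "right", "full"):
    print(how, join_tables(table1, table2, how))
```

Row order here follows the loop order and may differ from the order Spark returns, since SQL joins give no ordering guarantee without an ORDER BY clause.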
Spark SQL Table Join (Python)