[Spark][Python] DataFrame examples of left and right joins
$ hdfs dfs -cat people.json
{"name": "Alice", "pcode": "94304"}
{"name": "Brayden", "age": 30, "pcode": "94304"}
{"name": "Carla", "age": 19, "pcoe": "10036"}
{"name": "Diana", "age": 46}
{"name": "Etienne", "pcode": "94104"}
$ hdfs dfs -cat pcodes.json
{"pcode": "10036", "city": "New York", "state": "NY"}
{"pcode": "87501", "city": "Santa Fe", "state": "NM"}
{"pcode": "94304", "city": "Palo Alto", "state": "CA"}
{"pcode": "94104", "city": "San Francisco", "state": "CA"}
$ pyspark
from pyspark.sql import HiveContext
# sc is the SparkContext that the pyspark shell creates on startup
sqlContext = HiveContext(sc)
peopleDF = sqlContext.read.json("people.json")
peopleDF.limit(5).show()
+----+-------+-----+-----+
| age|   name|pcode| pcoe|
+----+-------+-----+-----+
|null|  Alice|94304| null|
|  30|Brayden|94304| null|
|  19|  Carla| null|10036|
|  46|  Diana| null| null|
|null|Etienne|94104| null|
+----+-------+-----+-----+
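The extra `pcoe` column comes from the misspelled key in Carla's record: the inferred schema covers every key seen in any record, and a record that lacks a key gets null for it. The following is a plain-Python sketch of that behaviour, not Spark code. The helper names `infer_columns` and `to_rows` are hypothetical, and the two age values are taken from the table above.

```python
import json

# The five records from people.json, one JSON object per line.
PEOPLE_JSON = """\
{"name": "Alice", "pcode": "94304"}
{"name": "Brayden", "age": 30, "pcode": "94304"}
{"name": "Carla", "age": 19, "pcoe": "10036"}
{"name": "Diana", "age": 46}
{"name": "Etienne", "pcode": "94104"}
"""


def infer_columns(records):
    """Return the union of all keys seen in any record, sorted by name."""
    return sorted({key for record in records for key in record})


def to_rows(records, columns):
    """Build one tuple per record; keys a record lacks become None (null)."""
    return [tuple(record.get(col) for col in columns) for record in records]


records = [json.loads(line) for line in PEOPLE_JSON.splitlines() if line]
columns = infer_columns(records)
rows = to_rows(records, columns)

print(columns)
for row in rows:
    print(row)
```

Because Carla's postcode sits under `pcoe`, her `pcode` is null, which is why she drops out of the inner join and shows up with null city and state in the left outer join.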
sqlContext = HiveContext(sc)
pcodesDF = sqlContext.read.json("pcodes.json")
pcodesDF.limit(5).show()
+-------------+-----+-----+
|         city|pcode|state|
+-------------+-----+-----+
|     New York|10036|   NY|
|     Santa Fe|87501|   NM|
|    Palo Alto|94304|   CA|
|San Francisco|94104|   CA|
+-------------+-----+-----+
# Inner join (the default) on the shared "pcode" column
mydf000 = peopleDF.join(pcodesDF, "pcode")
mydf000.limit(5).show()
+-----+----+-------+----+-------------+-----+
|pcode| age|   name|pcoe|         city|state|
+-----+----+-------+----+-------------+-----+
|94304|null|  Alice|null|    Palo Alto|   CA|
|94304|  30|Brayden|null|    Palo Alto|   CA|
|94104|null|Etienne|null|San Francisco|   CA|
+-----+----+-------+----+-------------+-----+
# Left semi join: left-side rows that have a match, left-side columns only
mydf001 = peopleDF.join(pcodesDF, "pcode", "leftsemi")
mydf001.limit(5).show()
+-----+----+-------+----+
|pcode| age|   name|pcoe|
+-----+----+-------+----+
|94304|null|  Alice|null|
|94304|  30|Brayden|null|
|94104|null|Etienne|null|
+-----+----+-------+----+
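The left semi join keeps the same three people as the inner join but returns only the left-hand columns. A minimal plain-Python sketch of the difference follows. It imitates the semantics on simplified records (name and pcode only) and is not Spark's implementation. The function names `inner_join` and `left_semi_join` are hypothetical.

```python
# Simplified records: Carla and Diana have no pcode value (null).
people = [
    {"name": "Alice", "pcode": "94304"},
    {"name": "Brayden", "pcode": "94304"},
    {"name": "Carla", "pcode": None},
    {"name": "Diana", "pcode": None},
    {"name": "Etienne", "pcode": "94104"},
]
pcodes = [
    {"pcode": "10036", "city": "New York"},
    {"pcode": "87501", "city": "Santa Fe"},
    {"pcode": "94304", "city": "Palo Alto"},
    {"pcode": "94104", "city": "San Francisco"},
]


def inner_join(left, right, key):
    """Pair each left row with every right row sharing the same non-null key."""
    return [
        {**l, **r}
        for l in left
        for r in right
        if l[key] is not None and l[key] == r[key]
    ]


def left_semi_join(left, right, key):
    """Keep left rows whose key exists on the right; add no right columns."""
    right_keys = {r[key] for r in right if r[key] is not None}
    return [dict(l) for l in left if l[key] in right_keys]


inner = inner_join(people, pcodes, "pcode")
semi = left_semi_join(people, pcodes, "pcode")

print([row["name"] for row in inner])  # matched people, with city attached
print([row["name"] for row in semi])   # same people, left columns only
```

A semi join acts as a filter on the left table, so it is a reasonable choice when the right table is only needed to decide which rows to keep.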
# Left outer join: every person is kept; unmatched rows get null city and state
mydf002 = peopleDF.join(pcodesDF, "pcode", "left_outer")
mydf002.limit(5).show()
+-----+----+-------+-----+-------------+-----+
|pcode| age|   name| pcoe|         city|state|
+-----+----+-------+-----+-------------+-----+
|94304|null|  Alice| null|    Palo Alto|   CA|
|94304|  30|Brayden| null|    Palo Alto|   CA|
| null|  19|  Carla|10036|         null| null|
| null|  46|  Diana| null|         null| null|
|94104|null|Etienne| null|San Francisco|   CA|
+-----+----+-------+-----+-------------+-----+
# Right outer join: every postcode is kept; unmatched rows get null person fields
mydf003 = peopleDF.join(pcodesDF, "pcode", "right_outer")
mydf003.limit(5).show()
+-----+----+-------+----+-------------+-----+
|pcode| age|   name|pcoe|         city|state|
+-----+----+-------+----+-------------+-----+
|10036|null|   null|null|     New York|   NY|
|87501|null|   null|null|     Santa Fe|   NM|
|94304|null|  Alice|null|    Palo Alto|   CA|
|94304|  30|Brayden|null|    Palo Alto|   CA|
|94104|null|Etienne|null|San Francisco|   CA|
+-----+----+-------+----+-------------+-----+
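The two outer joins mirror each other: a left outer join keeps every row of the left table, and a right outer join keeps every row of the right table. The plain-Python sketch below illustrates this on simplified records (name, pcode and city only). It is an illustration of the semantics rather than Spark code, and the helper name `outer_join` is hypothetical.

```python
# Simplified records: Carla and Diana have no pcode value (null).
people = [
    {"name": "Alice", "pcode": "94304"},
    {"name": "Brayden", "pcode": "94304"},
    {"name": "Carla", "pcode": None},
    {"name": "Diana", "pcode": None},
    {"name": "Etienne", "pcode": "94104"},
]
pcodes = [
    {"pcode": "10036", "city": "New York"},
    {"pcode": "87501", "city": "Santa Fe"},
    {"pcode": "94304", "city": "Palo Alto"},
    {"pcode": "94104", "city": "San Francisco"},
]


def outer_join(keep, other, key):
    """Keep every row of `keep`; fill `other`'s columns with None if unmatched."""
    other_columns = {col for row in other for col in row} - {key}
    result = []
    for k in keep:
        # A null key never matches anything.
        matches = [o for o in other if k[key] is not None and o[key] == k[key]]
        if matches:
            result.extend({**k, **o} for o in matches)
        else:
            result.append({**k, **{col: None for col in other_columns}})
    return result


# Left outer: people is the preserved side.
left_outer = outer_join(people, pcodes, "pcode")
# Right outer: the same operation with pcodes as the preserved side.
right_outer = outer_join(pcodes, people, "pcode")

for row in left_outer:
    print(row["name"], row["pcode"], row["city"])
for row in right_outer:
    print(row["pcode"], row["city"], row["name"])
```

In this sketch Carla and Diana survive only in the left outer result, while New York and Santa Fe survive only in the right outer result, which matches the tables above.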