[Spark] [Python] Spark Join Small Example

$ hdfs dfs -cat people.json
{"Name": "Alice", "Pcode": "94304"}
{"Name": "Brayden", "age": +, "Pcode": "94304"}
{"Name": "Carla", "age": +, "Pcoe": "10036"}
{"Name": "Diana", "Age": 46}
{"Name": "Etienne", "Pcode": "94104"}
$ hdfs dfs -cat pcodes.json
{"Pcode": "10036", "City": "New York", "state": "NY"}
{"Pcode:" 87501 "," City ":" Santa Fe "," state ":" NM "}
{"Pcode": "94304", "City": "Palo Alto", "state": "CA"}
{"Pcode": "94104", "City": "San Francisco", "state": "CA"}
from pyspark.sql import HiveContext   # import needed when running as a script

sqlContext = HiveContext(sc)
peopleDF = sqlContext.read.json("people.json")
pcodesDF = sqlContext.read.json("pcodes.json")
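Before joining, it is worth confirming what schema Spark inferred. The columns are the union of all keys seen across records, so the misspelled pcoe becomes its own nullable column rather than merging into pcode. A quick check (the expected columns in the comments are an inference from the data above):

peopleDF.printSchema()   # expect: age, name, pcode, pcoe
pcodesDF.printSchema()   # expect: _corrupt_record, city, pcode, state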
myDF001 = peopleDF.join(pcodesDF, "pcode")
myDF001.limit(5).show()
+-----+----+-------+----+---------------+-------------+-----+
|pcode| age|   name|pcoe|_corrupt_record|         city|state|
+-----+----+-------+----+---------------+-------------+-----+
|94304|null|  Alice|null|           null|    Palo Alto|   CA|
|94304|  30|Brayden|null|           null|    Palo Alto|   CA|
|94104|null|Etienne|null|           null|San Francisco|   CA|
+-----+----+-------+----+---------------+-------------+-----+
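Only three of the five people survive: join(pcodesDF, "pcode") is an inner join, and both Carla (whose code sits under the misspelled pcoe column) and Diana (no postal code at all) have a null pcode, so their rows are dropped. To keep them anyway, a left outer join is one option; a minimal sketch using the how argument of DataFrame.join:

# Unmatched people are kept, with null city/state.
peopleDF.join(pcodesDF, "pcode", "left_outer").show()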
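For reference, the same flow on current Spark versions, where SparkSession has replaced HiveContext; a minimal sketch, assuming the two JSON files sit at the paths shown above:

from pyspark.sql import SparkSession

# One entry point replaces sc/sqlContext in Spark 2.x and later.
spark = SparkSession.builder.appName("join-example").getOrCreate()
peopleDF = spark.read.json("people.json")
pcodesDF = spark.read.json("pcodes.json")
peopleDF.join(pcodesDF, "pcode").limit(5).show()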