We create two files in the /data/join directory of HDFS:
Upload the first file, named 1.txt.
Its first column is a date and its second column is a UID (user ID).
Upload the second file, named 2.txt.
It has the same layout: first column a date, second column a UID (user ID).
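The two files can be prepared locally before the upload. The records below are hypothetical sample data (the dates and UIDs are invented for illustration); the only properties taken from the text are the tab-separated layout with a date in the first column and a UID in the second:

```scala
import java.io.PrintWriter

// Hypothetical sample data: tab-separated, date then UID.
// 1.txt holds registration records, 2.txt holds login records.
val registerLines = Seq(
  "2016-07-20\t10001",
  "2016-07-20\t10002",
  "2016-07-21\t10001"
)
val loginLines = Seq(
  "2016-07-22\t10001",
  "2016-07-23\t10002"
)

// Write the files locally; they are uploaded to HDFS afterwards.
def writeLines(path: String, lines: Seq[String]): Unit = {
  val pw = new PrintWriter(path)
  try lines.foreach(pw.println) finally pw.close()
}
writeLines("1.txt", registerLines)
writeLines("2.txt", loginLines)
```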
Upload the files to HDFS:
Query them from the HDFS command line:
Query them from the web management console:
First, define the date format on the command line:
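In the spark-shell session this is presumably a `SimpleDateFormat`; the exact pattern, `yyyy-MM-dd`, is an assumption based on the date column of the data files:

```scala
import java.text.SimpleDateFormat

// Date format matching the first column of 1.txt and 2.txt;
// the "yyyy-MM-dd" pattern is assumed, not stated in the source.
val format = new SimpleDateFormat("yyyy-MM-dd")
val d = format.parse("2016-07-20")
```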
Then declare two case classes: Register and Login.
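A minimal sketch of the two case classes. The field layout (a parsed date plus the UID) is an assumption; the source only says the value is built from the columns of each line:

```scala
import java.text.SimpleDateFormat

// Assumed fields: the parsed date and the UID from each line.
case class Register(date: java.util.Date, uid: String)
case class Login(date: java.util.Date, uid: String)

val fmt = new SimpleDateFormat("yyyy-MM-dd")
val r = Register(fmt.parse("2016-07-20"), "10001")
val l = Login(fmt.parse("2016-07-22"), "10001")
```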
Read the first file (1.txt) and process it: split each line on the TAB character, use the second column (the UID) as the key, and construct a Register from the line's columns as the value.
Inspect the result with take:
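Since the original runs on an RDD in spark-shell, here is the same per-line logic on a plain Scala `Seq` standing in for `sc.textFile`; the sample lines are hypothetical:

```scala
import java.text.SimpleDateFormat

case class Register(date: java.util.Date, uid: String)

val format = new SimpleDateFormat("yyyy-MM-dd")

// Stand-in for reading hdfs://.../data/join/1.txt as lines.
val lines = Seq("2016-07-20\t10001", "2016-07-21\t10002")

// Split on TAB; the second column (UID) becomes the key,
// a Register built from the line's columns becomes the value.
val registerPairs = lines.map { line =>
  val cols = line.split("\t")
  (cols(1), Register(format.parse(cols(0)), cols(1)))
}
```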
Read the second file (2.txt) and process it the same way, this time constructing a Login as the value.
Inspect the result with take:
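The second file gets the same treatment, wrapping each line in a Login instead; again a plain-Scala stand-in with hypothetical lines:

```scala
import java.text.SimpleDateFormat

case class Login(date: java.util.Date, uid: String)

val format = new SimpleDateFormat("yyyy-MM-dd")

// Stand-in for reading hdfs://.../data/join/2.txt as lines.
val loginLines = Seq("2016-07-22\t10001")

// Same shape as the register pairs: key on the UID,
// wrap the line's columns in a Login value.
val loginPairs = loginLines.map { line =>
  val cols = line.split("\t")
  (cols(1), Login(format.parse(cols(0)), cols(1)))
}
```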
Perform a join on the two keyed datasets:
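Spark's pair join yields `(key, (leftValue, rightValue))` for every key present on both sides. A plain-Scala sketch of that semantics, with hypothetical string values in place of the case-class instances:

```scala
// Plain-Scala sketch of Spark's pair join: for each key found in
// both datasets, emit (key, (leftValue, rightValue)).
def join[K, V, W](left: Seq[(K, V)], right: Seq[(K, W)]): Seq[(K, (V, W))] =
  for {
    (k, v)  <- left
    (k2, w) <- right
    if k == k2
  } yield (k, (v, w))

val registers = Seq(("10001", "reg-2016-07-20"), ("10002", "reg-2016-07-20"))
val logins    = Seq(("10001", "login-2016-07-22"))

// Only UID 10001 appears on both sides, so only it survives the join.
val joined = join(registers, logins)
```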
Fetch the result of the join operation with take:
The result:
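In spark-shell the joined records are pulled back with `take(n)`, which returns at most n elements; the method behaves the same way on a plain Scala collection, shown here with hypothetical joined records:

```scala
// Hypothetical joined records in the (key, (left, right)) shape.
val joinedResults = Seq(
  ("10001", ("2016-07-20", "2016-07-22")),
  ("10002", ("2016-07-20", "2016-07-23"))
)

// take(n) returns at most n elements, even if n exceeds the size.
val firstTwo = joinedResults.take(2)
val capped   = joinedResults.take(10)
```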
Alternatively, save the execution results to HDFS:
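The real step would be a `saveAsTextFile` to an hdfs:// path, which writes each record's string form one per line into part files. As a local stand-in (the `joinResult.txt` output name and the records are assumptions), the same output shape can be produced with a writer:

```scala
import java.io.PrintWriter

// Hypothetical joined records in the (key, (left, right)) shape.
val results = Seq(("10001", ("2016-07-20", "2016-07-22")))

// Local stand-in for saveAsTextFile: one record per line,
// using the tuple's toString form, as Spark would.
val pw = new PrintWriter("joinResult.txt")
try results.foreach(rec => pw.println(rec.toString)) finally pw.close()
```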
View the results of the run in the web console:
Check the results of the run in HDFS:
Spark API Programming Hands-on Practice 07: the join operation in depth