[Shell] Intersection and difference of data sets
When working with statistics you often have two files, A and B, and need to know which lines are shared by A and B and which appear only in A (or only in B). The uniq command can handle this.
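uniq -d prints only the lines that are repeated, one copy of each.
uniq -u prints only the lines that are not repeated.
Combined with sort, uniq can also produce simple subtotals. Note that uniq only compares adjacent lines, so its input must be sorted first.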
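A minimal sketch of why the sort matters, using a made-up three-line input held in a hypothetical variable named data: the two 100 lines are not adjacent, so uniq -d alone misses them until sort brings them together. The last pipeline uses uniq -c, which prefixes each line with its count, as one plausible reading of the "subtotal" mentioned above; awk only tidies the spacing.

```shell
#!/bin/sh
# Made-up sample input: "100" appears twice, but not on adjacent lines.
data='100
101
100'

# Without sorting, uniq -d sees no adjacent duplicates and prints nothing.
unsorted_dups=$(printf '%s\n' "$data" | uniq -d)

# After sorting, the two "100" lines are adjacent.
sorted_dups=$(printf '%s\n' "$data" | sort | uniq -d)   # repeated lines
sorted_uniq=$(printf '%s\n' "$data" | sort | uniq -u)   # non-repeated lines

# uniq -c prefixes each line with its count; awk reorders as "line: count".
counts=$(printf '%s\n' "$data" | sort | uniq -c | awk '{print $2 ": " $1}')

echo "unsorted -d: [$unsorted_dups]"
echo "sorted -d:   $sorted_dups"
echo "sorted -u:   $sorted_uniq"
echo "$counts"
```

Here the unsorted uniq -d prints nothing, while after sorting 100 is reported as repeated and 101 as unique.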
For example, file A:

    100
    101
    102
    100

File B:

    103
    102
    102
The script is as follows:
    cat A | sort | uniq > tmpA              # deduplicate A
    cat B | sort | uniq > tmpB              # deduplicate B
    cat tmpA tmpB | sort | uniq -d          # intersection of A and B
    cat tmpA tmpB | sort | uniq -u          # lines in only one of A and B (symmetric difference)
    cat tmpA tmpB tmpB | sort | uniq -u     # lines in A but not in B (A - B)
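To check the approach end to end, the sketch below recreates the example files A and B in a temporary directory (assuming mktemp -d is available, as it is on Linux and macOS) and captures each result in a shell variable; the variable names are illustrative only. Besides the intersection and the uniq -u result, it computes A - B by listing tmpB twice, so that every line from B occurs at least twice and only A's own lines survive uniq -u.

```shell
#!/bin/sh
# Work in a throwaway directory so A, B, tmpA and tmpB do not clobber real files.
workdir=$(mktemp -d) || exit 1
cd "$workdir" || exit 1

# Recreate the example data from the text.
printf '%s\n' 100 101 102 100 > A
printf '%s\n' 103 102 102 > B

# Deduplicate each file.
cat A | sort | uniq > tmpA
cat B | sort | uniq > tmpB

# Lines that occur in both files.
intersection=$(cat tmpA tmpB | sort | uniq -d)
# Lines that occur in exactly one of the two files.
only_one=$(cat tmpA tmpB | sort | uniq -u)
# Lines of A that are not in B: tmpB is listed twice.
a_minus_b=$(cat tmpA tmpB tmpB | sort | uniq -u)

# Unquoted expansion joins the lines with spaces for display.
echo "intersection:" $intersection
echo "in only one: " $only_one
echo "A - B:       " $a_minus_b

# Clean up the temporary files.
cd / && rm -rf "$workdir"
```

On the example data this prints 102 for the intersection, 100 101 103 for the lines in only one file, and 100 101 for A - B.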
How does this compute the intersection and the difference? Running the steps one at a time makes it clear.
    $ cat A | sort | uniq > tmpA; cat tmpA
    100
    101
    102
    $ cat B | sort | uniq > tmpB; cat tmpB
    102
    103
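As an aside, POSIX sort has a -u option that sorts and drops duplicate lines in one step, so sort -u A should give the same tmpA as cat A | sort | uniq. A small check on file A's values, held in a hypothetical variable named a_lines:

```shell
#!/bin/sh
# The contents of file A from the example.
a_lines='100
101
102
100'

# Two ways to sort and deduplicate.
via_uniq=$(printf '%s\n' "$a_lines" | sort | uniq)
via_sort_u=$(printf '%s\n' "$a_lines" | sort -u)

# Report the shared result only if both pipelines agree.
if [ "$via_uniq" = "$via_sort_u" ]; then
    echo "same result:"
    echo "$via_sort_u"
fi
```

Both pipelines yield 100, 101 and 102; sort -u simply saves one process in the pipeline.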
Then concatenate the two files with cat tmpA tmpB:

    $ cat tmpA tmpB
    100
    101
    102
    102
    103
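Finally, sort the combined lines and use uniq -d to print the lines that occur more than once. Because each file was deduplicated first, a line can appear twice only if it occurs in both A and B, so the result is the intersection of A and B.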
    $ cat tmpA tmpB | sort | uniq -d
    102
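Another way to see the principle is to count occurrences instead of filtering them. The sketch below holds the deduplicated contents of tmpA and tmpB in hypothetical shell variables rather than files and uses uniq -c: a count of 2 marks a line from both files (the intersection), and a count of 1 marks a line from only one file, which is exactly what uniq -d and uniq -u select.

```shell
#!/bin/sh
# Deduplicated contents of tmpA and tmpB from the walkthrough.
tmpA='100
101
102'
tmpB='102
103'

# Count how often each line occurs in the combined, sorted input.
# Count 2 = in both files; count 1 = in only one file.
counts=$(printf '%s\n%s\n' "$tmpA" "$tmpB" | sort | uniq -c | awk '{print $2, $1}')
echo "$counts"
```

On the example data, 102 is the only line with a count of 2.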
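The difference works on the same principle. uniq -u keeps the lines that occur exactly once in the combined input, so cat tmpA tmpB | sort | uniq -u returns the lines that are in only one of the two files (100, 101 and 103 here), which is the symmetric difference rather than A - B. To get only the lines of A that are not in B, list tmpB twice so that every line of B occurs at least twice: cat tmpA tmpB tmpB | sort | uniq -u returns 100 and 101.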
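The same set operations can also be done with comm, a different standard tool that the approach above does not use. comm compares two sorted files and prints three columns: lines only in the first file, lines only in the second, and lines in both; the options -1, -2 and -3 suppress the corresponding column. The sketch below assumes mktemp -d is available and rebuilds tmpA and tmpB from the example data; the variable names are illustrative.

```shell
#!/bin/sh
# Throwaway directory for the example files.
workdir=$(mktemp -d) || exit 1
cd "$workdir" || exit 1

# Sorted, deduplicated versions of A and B (comm requires sorted input).
printf '%s\n' 100 101 102 100 | sort -u > tmpA
printf '%s\n' 103 102 102 | sort -u > tmpB

both=$(comm -12 tmpA tmpB)     # hide columns 1 and 2: lines in both
only_a=$(comm -23 tmpA tmpB)   # hide columns 2 and 3: lines only in A
only_b=$(comm -13 tmpA tmpB)   # hide columns 1 and 3: lines only in B

# Unquoted expansion joins the lines with spaces for display.
echo "A and B:" $both
echo "A - B:  " $only_a
echo "B - A:  " $only_b

# Clean up the temporary files.
cd / && rm -rf "$workdir"
```

Unlike the uniq approach, comm needs no duplicated-file trick for A - B or B - A, but both inputs must be sorted under the same locale settings.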