How to compute the intersection and difference of two files in the Linux shell

Source: Internet
Author: User

Assume that two files, file1 and file2, represent the sets A and B (the commands below name them a.txt and b.txt). The content of file1 is as follows:

a
b
c
e
d
a

The content of file2 is as follows:

c
d
a
c

There are basically two methods: one is the comm command and the other is the grep command. They are described as follows:

 

comm command

The comm command compares the sorted files file1 and file2 line by line. With no options, it produces three-column output: column one contains lines unique to file1, column two contains lines unique to file2, and column three contains lines common to both files. Note that both files must be sorted and free of duplicates. In set terms, the first column is the difference A-B, the second column is the difference B-A, and the third column is the intersection A∩B.
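
To make the column layout concrete, here is a minimal sketch. The files left.txt and right.txt and their contents are hypothetical, chosen so that each column receives at least one line; they are not the article's files.

```shell
# Scratch directory so no existing files are touched.
workdir=$(mktemp -d)

# Hypothetical sample sets, already sorted and free of duplicates.
printf '%s\n' a b c > "$workdir/left.txt"
printf '%s\n' b c d > "$workdir/right.txt"

# LC_ALL=C pins the collation order that comm expects the inputs to follow.
columns=$(LC_ALL=C comm "$workdir/left.txt" "$workdir/right.txt")
printf '%s\n' "$columns"

# How to read the output:
#   no indent  -> column 1: only in left.txt   (A-B):  a
#   one tab    -> column 2: only in right.txt  (B-A):  d
#   two tabs   -> column 3: in both files      (A∩B):  b, c

rm -rf "$workdir"
```

The indentation is made of tab characters, so the column a line belongs to can be read off from the number of leading tabs.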

Running comm directly on the unsorted files gives:

$ comm a.txt b.txt
a
b
		c
	d
	a
	c
e
d
a

Sorted only:

$ comm <(sort a.txt) <(sort b.txt)
		a
a
b
		c
	c
		d
e

Sorted and unique:

$ comm <(sort a.txt | uniq) <(sort b.txt | uniq)
		a
b
		c
		d
e
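
The sort step has to come before uniq because uniq only collapses adjacent duplicate lines. The sketch below illustrates this on sample data modeled on file2 (the contents are an assumption), and also shows that sort -u is a common shorthand for sort | uniq.

```shell
workdir=$(mktemp -d)

# Sample data modeled on file2: "c" appears twice, but not on adjacent lines.
printf '%s\n' c d a c > "$workdir/b.txt"

# uniq alone removes nothing here, because the two "c" lines are not adjacent.
uniq_only=$(uniq "$workdir/b.txt")

# Sorting first brings duplicates together so uniq can drop them.
sort_then_uniq=$(LC_ALL=C sort "$workdir/b.txt" | uniq)

# sort -u sorts and de-duplicates in one step.
sort_u=$(LC_ALL=C sort -u "$workdir/b.txt")

printf 'uniq only:\n%s\n' "$uniq_only"
printf 'sort | uniq:\n%s\n' "$sort_then_uniq"
printf 'sort -u:\n%s\n' "$sort_u"

rm -rf "$workdir"
```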

If you only want the intersection, suppress the first two columns with -12:

$ comm -12 <(sort a.txt | uniq) <(sort b.txt | uniq)
a
c
d

The difference sets are left as an exercise for the reader.
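
One possible answer, as a sketch: comm -23 keeps only column one (A-B) and comm -13 keeps only column two (B-A). The file contents below are assumed sample data modeled on file1 and file2, and temporary sorted copies are used instead of process substitution so that the snippet also runs in a plain POSIX sh.

```shell
workdir=$(mktemp -d)

# Assumed sample data modeled on file1 and file2.
printf '%s\n' a b c e d a > "$workdir/a.txt"
printf '%s\n' c d a c > "$workdir/b.txt"

# comm needs sorted, duplicate-free input.
LC_ALL=C sort -u "$workdir/a.txt" > "$workdir/a.sorted"
LC_ALL=C sort -u "$workdir/b.txt" > "$workdir/b.sorted"

# -23 suppresses columns 2 and 3: lines only in the first file, i.e. A-B.
a_minus_b=$(LC_ALL=C comm -23 "$workdir/a.sorted" "$workdir/b.sorted")

# -13 suppresses columns 1 and 3: lines only in the second file, i.e. B-A.
b_minus_a=$(LC_ALL=C comm -13 "$workdir/a.sorted" "$workdir/b.sorted")

printf 'A-B:\n%s\n' "$a_minus_b"   # b and e
printf 'B-A:\n%s\n' "$b_minus_a"   # empty for this data

rm -rf "$workdir"
```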

 

grep command

The grep command is commonly used to search text content. With the -f option, which reads patterns from a file, it can find the intersection as follows:

$ grep -F -f a.txt b.txt
c
d
a
c

grep does not require sorted input, but because this is a set operation the result must contain no duplicates (otherwise it would not be a set). Therefore:

$ grep -F -f a.txt b.txt | sort | uniq
a
c
d
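
One caveat worth illustrating: with -F alone, grep matches each pattern anywhere within a line, so lines that merely contain an element of the other set are also reported. This does not show up with single-letter lines, but it does with longer ones. The sketch below uses hypothetical files patterns.txt and lines.txt and adds the -x option (match whole lines only) as one way to get true set semantics.

```shell
workdir=$(mktemp -d)

# Hypothetical data: "cat" is a substring of "catalog".
printf '%s\n' cat dog > "$workdir/patterns.txt"
printf '%s\n' catalog dog > "$workdir/lines.txt"

# -F alone: fixed-string match anywhere in the line, so "catalog" slips in.
loose=$(grep -F -f "$workdir/patterns.txt" "$workdir/lines.txt" | LC_ALL=C sort -u)

# -F -x: the whole line must equal a pattern, so only "dog" remains.
strict=$(grep -F -x -f "$workdir/patterns.txt" "$workdir/lines.txt" | LC_ALL=C sort -u)

printf 'without -x:\n%s\n' "$loose"
printf 'with -x:\n%s\n' "$strict"

rm -rf "$workdir"
```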

What about the difference sets?

$ grep -F -v -f a.txt b.txt | sort | uniq
$ grep -F -v -f b.txt a.txt | sort | uniq
b
e

The first command computes B-A, which is empty; the second computes A-B. Note that the order of the files matters!
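
Because the argument order is easy to get wrong, the grep approach can be wrapped in small helper functions. The following is only a sketch: the function names set_intersection and set_difference are made up for illustration, the -x option is added so that whole lines are compared, and the file contents are assumed sample data modeled on file1 and file2.

```shell
# set_intersection FILE1 FILE2: lines present in both files.
set_intersection() {
    grep -F -x -f "$1" "$2" | LC_ALL=C sort -u
}

# set_difference FILE1 FILE2: lines of FILE1 that do not occur in FILE2.
# Note the swap: FILE2 supplies the patterns, FILE1 is the file searched.
set_difference() {
    grep -F -x -v -f "$2" "$1" | LC_ALL=C sort -u
}

workdir=$(mktemp -d)
printf '%s\n' a b c e d a > "$workdir/a.txt"
printf '%s\n' c d a c > "$workdir/b.txt"

both=$(set_intersection "$workdir/a.txt" "$workdir/b.txt")
a_only=$(set_difference "$workdir/a.txt" "$workdir/b.txt")
b_only=$(set_difference "$workdir/b.txt" "$workdir/a.txt")

printf 'A∩B:\n%s\n' "$both"     # a, c, d
printf 'A-B:\n%s\n' "$a_only"   # b, e
printf 'B-A:\n%s\n' "$b_only"   # empty for this data

rm -rf "$workdir"
```

Swapping the two arguments of set_difference switches between A-B and B-A, which mirrors the order sensitivity of the raw grep commands.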
