Remove duplicate lines with uniq

Source: Internet
Author: User
 
 
 
Duplicate lines usually do no harm, but sometimes they do. When that happens, you don't have to spend an afternoon writing a filter for them: the uniq command is the tool for the job. Read on to see how it can save you time and effort.

After sorting a file, you may find that some lines are duplicated. Sometimes this duplicate information is not needed, and you can remove it to save disk space. The lines do not have to be sorted, but remember that uniq compares lines as it reads them and removes only the second and subsequent copies among adjacent identical lines. The following example shows how it works in practice:

Listing 1. Removing duplicate lines with uniq

        $ cat happybirthday.txt
        Happy Birthday to You!
        Happy Birthday to You!
        Happy Birthday Dear Tux!
        Happy Birthday to You!
        $ sort happybirthday.txt
        Happy Birthday Dear Tux!
        Happy Birthday to You!
        Happy Birthday to You!
        Happy Birthday to You!
        $ sort happybirthday.txt | uniq
        Happy Birthday Dear Tux!
        Happy Birthday to You!
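To see why the sort step matters, the following sketch runs uniq on the same four lines both unsorted and sorted. The temporary file created with mktemp stands in for happybirthday.txt and is an assumption of this example, as are the variable names.

```shell
# Recreate the sample lines in a temporary file (illustrative stand-in
# for happybirthday.txt).
tmp=$(mktemp)
printf '%s\n' \
    'Happy Birthday to You!' \
    'Happy Birthday to You!' \
    'Happy Birthday Dear Tux!' \
    'Happy Birthday to You!' > "$tmp"

# Without sort, only the adjacent pair (lines 1 and 2) collapses;
# line 4 survives because line 3 separates it from the others.
unsorted=$(uniq "$tmp")
printf '%s\n' "$unsorted"

# With sort, all identical lines become adjacent, so uniq removes every repeat.
sorted=$(sort "$tmp" | uniq)
printf '%s\n' "$sorted"

rm -f "$tmp"
```

The first command should print three lines and the second only two, which is the reason the listings pipe through sort first.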

Warning: Do not use uniq or any other tool to remove duplicate lines from files that contain financial or other important data. In such files, a repeated line almost always represents another transaction for the same amount, and removing it would cause a lot of trouble for the accounting department. Never do this!

More information about uniq

This series of articles is an introduction to the text utilities that supplements the information found in the man pages and info pages. To learn more, open a new terminal window and type man uniq or info uniq, or open a new browser window and view the uniq manual page at gnu.org.

What if you want to make your work easier by displaying only the unique lines or only the repeated lines? The -u (unique) and -d (repeated) options do exactly that, for example:

Listing 2. Using the -u and -d options

        $ sort happybirthday.txt | uniq -u
        Happy Birthday Dear Tux!
        $ sort happybirthday.txt | uniq -d
        Happy Birthday to You!

You can also get some statistics out of uniq with the -c option:

Listing 3. Using the -c option

        $ sort happybirthday.txt | uniq -uc
              1 Happy Birthday Dear Tux!
        $ sort happybirthday.txt | uniq -dc
              3 Happy Birthday to You!
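The counts produced by -c are often passed to a second sort to rank lines by how often they occur. The sketch below shows that idiom with the four sample lines supplied inline; the variable name ranked is only illustrative, and the width of the padding before each count may differ between uniq implementations.

```shell
# Count each distinct line, then sort numerically in reverse order
# so the most frequent line comes first.
ranked=$(printf '%s\n' \
    'Happy Birthday to You!' \
    'Happy Birthday to You!' \
    'Happy Birthday Dear Tux!' \
    'Happy Birthday to You!' |
    sort | uniq -c | sort -rn)
printf '%s\n' "$ranked"
```

The line that occurs three times should appear at the top, followed by the line that occurs once.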

Even though uniq compares whole lines by default, that is not all the command can do. Especially handy is the -f option, followed by the number of fields to skip before comparing. This is useful when you look at system logs. Often some entries are repeated many times, which makes the logs hard to read. Plain uniq does not help, because each entry starts with a different timestamp. But if you tell it to skip the time fields, your logs suddenly become much more manageable. Try uniq -f 3 /var/log/messages.
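If you would rather not experiment on a real log, here is a small sketch of the same idea. The log entries, the host name tux, and the temporary file are all made up for illustration; it assumes the month, day, and time occupy the first three whitespace-separated fields.

```shell
# Hypothetical syslog-style entries: month, day, and time are fields 1 to 3.
log=$(mktemp)
printf '%s\n' \
    'Jan  5 10:00:01 tux kernel: eth0: link down' \
    'Jan  5 10:00:07 tux kernel: eth0: link down' \
    'Jan  5 10:00:12 tux kernel: eth0: link down' \
    'Jan  5 10:01:30 tux kernel: eth0: link up' > "$log"

# Plain uniq keeps all four lines because every timestamp differs.
plain=$(uniq "$log")

# Skipping the three timestamp fields collapses the repeated message;
# uniq prints the first line of each run of duplicates.
skipped=$(uniq -f 3 "$log")
printf '%s\n' "$skipped"

rm -f "$log"
```

With -f 3, the three "link down" entries should collapse into the first one, leaving two lines in total.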

There is another option, -s, that works just like -f but skips a given number of characters instead. You can use -f and -s together: uniq skips the fields first, then the characters. And what if you want to compare only a given number of characters? Try -w.
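A short sketch of both options follows, using invented sample lines. Note that -w is provided by GNU uniq and may not be available in other implementations, so treat that part as an assumption about your system.

```shell
# -s 3 skips the first three characters (a hypothetical two-digit ID plus
# a space) before comparing, so the two "apple" lines count as duplicates.
by_skip=$(printf '%s\n' '01 apple' '02 apple' '03 pear' | uniq -s 3)
printf '%s\n' "$by_skip"

# -w 5 compares only the first five characters of each line, so both
# lines starting with "error" are treated as the same.
by_width=$(printf '%s\n' \
    'error: disk full' \
    'error: disk failing' \
    'warn: low memory' | uniq -w 5)
printf '%s\n' "$by_width"
```

In both cases uniq keeps the first line of each group it considers identical.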

 
