Examples of removing duplicate lines with the uniq command under Linux


One, what uniq is used for

Duplicate lines in a text file are usually not what we want, so we need to remove them. Linux has other commands that can remove duplicate lines, but I find uniq the most convenient. When using uniq, pay attention to the following two points:
1. When working on text, uniq is usually combined with the sort command, because uniq only detects duplicate lines that are adjacent. If you just want sorted, de-duplicated output, you can also use sort -u (see the sketch right after this list).
2. A field is a run of blank characters (usually spaces and/or tabs) followed by non-blank characters; the blank characters before a field are skipped when fields are counted.
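
A quick sketch of point 1, using the test file uniqtest that is introduced in section three (an illustration, not one of the original examples):
# duplicates that are not adjacent are NOT removed
uniq uniqtest
# sorting first makes duplicate lines adjacent, so uniq can remove them
sort uniqtest | uniq
# sort -u sorts and removes duplicates in a single step
sort -u uniqtest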

Two, uniq parameter descriptions


[zhangy@blackghost ~]$ uniq --help
Usage: uniq [OPTION]... [INPUT [OUTPUT]]
Filter adjacent matching lines from INPUT (or standard input), writing to OUTPUT (or standard output).

With no options, matching lines are merged to the first occurrence.

Mandatory arguments to long options are mandatory for short options too.
-c, --count           // prefix each line with the number of times it occurs
-d, --repeated        // only output the lines that are repeated
-D, --all-repeated    // only output repeated lines, printing every occurrence (a line repeated N times is printed N times)
-f, --skip-fields=N   // ignore the first N fields; -f 1 ignores the first field
-i, --ignore-case     // ignore case when comparing
-s, --skip-chars=N    // similar to -f, but skips characters; -s 5 ignores the first 5 characters of each line
-u, --unique          // remove the repeated lines entirely and show only what is left; somewhat like MySQL's DISTINCT
-z, --zero-terminated // end lines with a 0 byte (NUL) instead of a newline
-w, --check-chars=N   // compare no more than the first N characters of each line
--help                // display this help and exit
--version             // output version information and exit
I am not sure what -z is useful for.
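
As for -z, one place it can help (a sketch, assuming GNU find, sort and xargs) is de-duplicating NUL-terminated data, for example file names that might contain newlines:
# print base names NUL-terminated, de-duplicate them, then consume them safely
find . -type f -printf '%f\0' | sort -z | uniq -z | xargs -0 -n1 echo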

Three, the test text file uniqtest

This is a test
This is a test
This is a test
I am tank
I love tank
I love tank
This is a test
whom have a try
WhoM have a try
You have a try
I want to abroad
Those are good Men
We are good Men

Four, detailed examples


[zhangy@blackghost mytest]$ uniq -c uniqtest
3 This is a test
1 I am tank
2 I love tank
1 This is a test    // this line is a duplicate, but it is not adjacent to the earlier occurrences
1 whom have a try
1 WhoM have a try
1 You have a try
1 I want to abroad
1 Those are good Men
1 We are good Men

As the example above shows, uniq only checks adjacent lines when looking for duplicates, and in real data the duplicate lines are often not adjacent to each other.

[zhangy@blackghost mytest]$ sort uniqtest | uniq -c
1 WhoM have a try
1 I am tank
2 I love tank
1 I want to abroad
4 This is a test
1 Those are good Men
1 We are good Men
1 whom have a try
1 You have a try

Sorting first solves the problem shown in the previous example.
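
A common follow-up (a sketch, not part of the original examples): pipe the counted output through sort -rn to list the most frequent lines first.
# count the duplicate lines, then sort by count in descending order
sort uniqtest | uniq -c | sort -rn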

[zhangy@blackghost mytest]$ uniq -d -c uniqtest
3 This is a test
2 I love tank

uniq -d shows only the lines that are duplicated; here it is combined with -c to show the counts as well.

[zhangy@blackghost mytest]$ uniq -D uniqtest
This is a test
This is a test
This is a test
I love tank
I love tank

uniq -D also shows only the duplicated lines, but it prints every occurrence of each one. It cannot be combined with -c.
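
If this is GNU uniq, -D (--all-repeated) also accepts a grouping method; a small sketch:
# separate each group of repeated lines with a blank line (GNU extension)
uniq --all-repeated=separate uniqtest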


[zhangy@blackghost mytest]$ uniq -f 1 -c uniqtest
3 This is a test
1 I am tank
2 I love tank
1 This is a test
2 whom have a try
1 You have a try
1 I want to abroad
2 Those are good Men    // there is only one such line in the file, yet the count is 2

Here 'Those are good Men' appears only once in the file, but it is reported as repeated. This is because -f 1 ignores the first field, so the comparison starts from the second field, where 'Those are good Men' and 'We are good Men' are identical.


[zhangy@blackghost mytest]$ uniq -i -c uniqtest
3 This is a test
1 I am tank
2 I love tank
1 This is a test
2 whom have a try    // one is 'whom', the other 'WhoM'; they differ only in case
1 You have a try
1 I want to abroad
1 Those are good Men
1 We are good Men

With -i, the comparison ignores case.
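
If duplicates that differ only in case are not adjacent, sort case-insensitively first; a sketch:
# fold case while sorting, then merge the groups case-insensitively
sort -f uniqtest | uniq -i -c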


[zhangy@blackghost mytest]$ uniq -s 4 -c uniqtest
3 This is a test
1 I am tank
2 I love tank
1 This is a test
3 whom have a try    // note how this differs from the earlier examples
1 I want to abroad
1 Those are good Men
1 We are good Men

When comparing, the first 4 characters of each line are ignored, so 'whom have a try' is treated the same as 'You have a try'.

[zhangy@blackghost mytest]$ uniq -u uniqtest
I am tank
This is a test
whom have a try
WhoM have a try
You have a try
I want to abroad
Those are good Men
We are good Men

The repeated (adjacent) lines are removed entirely, and only what remains is displayed.
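
To get the lines that occur exactly once anywhere in the file, sort first so that repeated lines become adjacent; a sketch:
# after sorting, -u keeps only the truly unrepeated lines
sort uniqtest | uniq -u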

[zhangy@blackghost mytest]$ uniq -w 2 -c uniqtest
3 This is a test
3 I am tank
1 This is a test
1 whom have a try
1 WhoM have a try
1 You have a try
1 I want to abroad
1 Those are good Men
1 We are good Men
Only the first 2 characters of each line are checked; everything after them is ignored, so 'I am tank' and 'I love tank' are treated as the same line.

Deleting rows with a duplicated field from a large data file

A data collection program I wrote recently generated a file with more than 10 million rows, each row consisting of 4 fields. The requirement was to delete the rows whose second field was duplicated. I could not find a suitable tool for this on Linux: sed, gawk and other stream-processing tools handle one line at a time and cannot pick out the rows whose field is duplicated. It looked like I would have to write a Python script, but then I suddenly remembered MySQL, so I took the long way around:
1. Use mysqlimport --local dbname data.txt to import the data into a table; the table name must match the file name.
2. Execute the following SQL statements (uniqfield is the field that must be unique):


USE dbname;
ALTER TABLE tablename ADD rowid INT AUTO_INCREMENT NOT NULL PRIMARY KEY;
CREATE TABLE t SELECT MIN(rowid) AS rowid FROM tablename GROUP BY uniqfield;
CREATE TABLE t2 SELECT tablename.* FROM tablename, t WHERE tablename.rowid = t.rowid;
DROP TABLE tablename;
RENAME TABLE t2 TO tablename;
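
For reference, a purely command-line sketch of the same idea, assuming data.txt is tab-separated and it does not matter which of the duplicate rows is kept:
# keep one row per distinct value of the 2nd field (tab-separated input assumed)
sort -t $'\t' -k2,2 -u data.txt > deduped.txt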
