Linux text processing tools are rich and powerful. Take the following file as an example:
The code is as follows:
cat log
www.jb51.net 192.168.1.1
www.jb51.net 192.168.1.1
www.jb51.net 192.168.1.2
ffffffffffffffffff
ffffffffffffffffff
eeeeeeeeeeeeeeeeeeee
fffffffffffffffffff
eeeeeeeeeeeeeeeeeeee
eeeeeeeeeeeeeeeeeeee
gggggggggggggggggggg
You can remove the duplicate lines in the following ways:
1. Remove duplicate lines with sort/uniq
Note: uniq alone will not do the job.
The code is as follows:
shell> sort -k2n file | uniq > a.out
From my own quick test: uniq cannot remove all duplicate lines when the duplicates in the file are not adjacent. After sorting, identical lines become adjacent, so uniq can then remove the duplicates as expected.
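To make that concrete, here is a minimal illustration with a throwaway three-line input (not the log file above), showing that uniq by itself only collapses adjacent duplicates:
shell> printf 'a\nb\na\n' | uniq          # the two "a" lines are not adjacent, so both survive
a
b
a
shell> printf 'a\nb\na\n' | sort | uniq   # sorting makes the duplicates adjacent first
a
b
sort -u gives the same result as sort | uniq in a single command.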
2. Use the sort+awk command
Note: awk alone will not work either, for the same reason as above.
The code is as follows:
shell> sort -k2n file | awk '{if ($0 != line) print; line = $0}'
You can also use awk '!i[$1]++' log.
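For reference, a variant of that awk idiom (the array name seen here is just an illustrative choice): keying the array on $0 instead of $1 compares whole lines rather than only the first field. It keeps the first occurrence of each line, preserves the original order, and needs no prior sort:
shell> awk '!seen[$0]++' log    # print a line only the first time it is seen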
3. Use the sort+sed command, which also requires sorting with sort first.
The code is as follows:
shell> sort -k2n file | sed '$!N; /^\(.*\)\n\1$/!P; D'
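Roughly, this one-liner works as follows: $!N appends the next input line to the pattern space (except on the last line); the address /^\(.*\)\n\1$/ matches when the two lines in the pattern space are identical; !P prints the first of the two lines only when they differ; and D deletes that first line and restarts the cycle with the remainder. A small sanity check on already-sorted input:
shell> printf 'a\na\nb\n' | sed '$!N; /^\(.*\)\n\1$/!P; D'
a
b
Like uniq, this only collapses adjacent duplicates, which is why the sort in front is still needed.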