Linux text processing tools sort and uniq: detailed examples
sort: sorts input lines by key field, data-type option, and locale.
Syntax: sort [options] [file(s)]
Main options:
-b: ignore leading whitespace.
-c: check whether the input is already sorted; do not sort.
-f: fold lowercase letters to uppercase, so the comparison is case-insensitive.
-m: merge several already-sorted files into one sorted output stream.
-M: sort by three-letter month abbreviation.
-k: define the sort key field and sort by that field.
-n: compare by numeric value.
-o outfile: write the sorted result to the specified file.
-r: reverse the sort order, from largest to smallest.
-t char: use the single character char as the field delimiter instead of the default whitespace.
-u: keep unique records only; of all records with equal keys, one is kept and the rest are discarded.
--help: display help.
--version: display version information.
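As a rough illustration of how a few of these options interact, the sketch below sorts a small made-up list of numbers three different ways. The data and the variable names are assumptions for illustration and are not taken from the article.

```shell
# Hypothetical sample data: four numbers, one of them repeated.
data=$(printf '10\n9\n100\n9\n')

# Default: byte-by-byte comparison, so "10" and "100" come before "9".
lexical=$(printf '%s\n' "$data" | LC_ALL=C sort)

# -n: compare by numeric value.
numeric=$(printf '%s\n' "$data" | LC_ALL=C sort -n)

# -n -r -u: numeric, largest first, one line per distinct value.
unique_desc=$(printf '%s\n' "$data" | LC_ALL=C sort -n -r -u)

printf 'default:\n%s\n\n-n:\n%s\n\n-n -r -u:\n%s\n' "$lexical" "$numeric" "$unique_desc"
```

The default run places 100 before 9, which is usually the first sign that -n was forgotten.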
1). Sort by field
Sort key field types
Letter | Description
b | Ignore leading blanks
d | Dictionary order
f | Fold case (case-insensitive)
g | Compare as general floating-point numbers (GNU version only)
i | Ignore nonprintable characters
n | Compare as integer numbers
r | Reverse the sort order
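These letters can be attached to an individual -k key, so that each key is compared in its own way. A minimal sketch, assuming a made-up list of name:score records (hypothetical data, not from the article):

```shell
# Hypothetical name:score records.
records=$(printf '%s\n' 'carol:90' 'Dave:85' 'alice:90' 'bob:85')

# Key 1: field 2, numeric (n), descending (r).
# Key 2: field 1, case folded (f); it only breaks ties on the score.
ranked=$(printf '%s\n' "$records" | LC_ALL=C sort -t: -k2,2nr -k1,1f)

printf '%s\n' "$ranked"
```

Without the f modifier on the second key, the C locale would place Dave before bob, because uppercase letters sort before lowercase ones.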
Example 1: sort in traditional ASCII order
[bkjia@test ~]$ LC_ALL=C sort /etc/passwd
#gzdev1:x:829:829:/home/gzdev1:/bin/bash
#gzdev2:x:830:830:/home/gzdev2:/bin/bash
...
Meat:x:814:814:/home/Meat:/bin/bash
adm:x:3:4:adm:/var/adm:/sbin/nologin
...
bin:x:1:1:bin:/sbin/nologin
cvsroot:x:778:502:/home/cvsroot:/bin/bash
...
dbus:x:81:81:System message bus:/:/sbin/nologin
dovecot:x:99:99:dovecot:/usr/libexec/dovecot:/sbin/nologin
...
ftpuser:x:505:505:/home/ftpuser:/bin/bash
gdm:x:42:42:/var/gdm:/sbin/nologin
Notes:
# LC_ALL=C overrides all localization settings, so the command compares lines byte by byte in ASCII order.
# LC_ALL is an environment variable. When it is set, its value overrides every LC_* category setting and takes precedence over LANG; the value of LANG itself is not modified.
# "C" is the system's default locale, and "POSIX" is an alias for "C". Programs run in the C (POSIX) locale when no locale variables are set.
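The effect of the C locale is easiest to see on mixed-case input. The sketch below uses a made-up word list (an assumption for illustration); the second command also sets LANG to show that LC_ALL takes precedence over it.

```shell
# Hypothetical mixed-case input.
words=$(printf '%s\n' 'banana' 'Apple' 'cherry' 'Blueberry')

# In the C locale, all uppercase letters sort before all lowercase letters.
c_sorted=$(printf '%s\n' "$words" | LC_ALL=C sort)

# LC_ALL wins over LANG, so this run is still performed in the C locale.
with_lang=$(printf '%s\n' "$words" | LANG=en_US.UTF-8 LC_ALL=C sort)

printf '%s\n' "$c_sorted"
```

In a typical non-C locale the same input would usually come out in dictionary-like order, with Apple and banana next to each other, but that result depends on the locales installed on the system.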
Example 2: sort by user name
[bkjia@test ~]$ sort -t: -k1,1 /etc/passwd
adm:x:3:4:adm:/var/adm:/sbin/nologin
avahi:x:70:70:Avahi daemon:/:/sbin/nologin
bin:x:1:1:bin:/sbin/nologin
cvsroot:x:778:502:/home/cvsroot:/bin/bash
daemon:x:2:2:daemon:/sbin/nologin
ftp:x:14:50:FTP User:/var/ftp:/sbin/nologin
gdm:x:42:42:/var/gdm:/sbin/nologin
...
# -t: sets the field delimiter to a colon, and -k1,1 sorts on the first field only (the key starts and ends at field 1).
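The end position in -k1,1 matters: with plain -k1 the key runs from field 1 to the end of the line, delimiters included. A small sketch with two hypothetical, shortened account lines (not from the article) shows the difference in the C locale:

```shell
# Hypothetical, shortened passwd-style lines.
accounts=$(printf '%s\n' 'ann:x:500' 'ann-marie:x:501')

# -k1: the key is the whole rest of the line, so "-" is compared with ":".
open_ended=$(printf '%s\n' "$accounts" | LC_ALL=C sort -t: -k1)

# -k1,1: the key is the user name only, so "ann" is a prefix of "ann-marie".
bounded=$(printf '%s\n' "$accounts" | LC_ALL=C sort -t: -k1,1)

printf 'with -k1:\n%s\n\nwith -k1,1:\n%s\n' "$open_ended" "$bounded"
```

With -k1 the line for ann-marie comes first, because "-" has a lower byte value than ":"; with -k1,1 the shorter name ann comes first.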
Example 3: sort by UID in reverse order
[bkjia@test ~]$ sort -t: -k3nr /etc/passwd
[bkjia@test ~]$ sort -t: -k3nr,3 /etc/passwd
# A more precise key specification is -k3,3nr or -k3nr,3 or -k3,3 -n -r:
# the key starts at field 3, is compared numerically in reverse order, and ends at field 3.
nfsnobody:x:65534:65534:Anonymous NFS User:/var/lib/nfs:/sbin/nologin
sninf_kenchoi:x:860:860:/home/sninf_kenchoi:/bin/bash
bkjia:x:859:859:/home/bkjia:/bin/bash
gz_kinma:x:857:857:/home/gz_kinma:/bin/bash
sninf_tonyhung:x:856:856:/home/sninf_tonyhung:/bin/bash
sninf_simonlau:x:855:855:/home/sninf_simonlau:/bin/bash
sninf_kenchan:x:854:854:/home/sninf_kenchan:/bin/bash
sninf_thomaschan:x:853:853:/home/sninf_thomaschan:/bin/bash
gz_jones:x:851:851:/home/gz_jonesyan:/bin/bash
...
# -t: sets the delimiter to a colon, -k3 sorts on the third field, n compares the field as an integer, and r reverses the order.
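To see why the n modifier is needed here, the following sketch compares a numeric and a purely textual reverse sort on the third field. The four shortened name:x:uid:gid lines are assumptions for illustration.

```shell
# Hypothetical, shortened passwd-style lines (name:x:uid:gid).
users=$(printf '%s\n' 'root:x:0:0' 'nfsnobody:x:65534:65534' 'bkjia:x:859:859' 'bin:x:1:1')

# Numeric reverse sort on field 3: 65534 > 859 > 1 > 0.
by_uid=$(printf '%s\n' "$users" | LC_ALL=C sort -t: -k3,3nr)

# Textual reverse sort on field 3: "859" > "65534", since "8" > "6".
by_text=$(printf '%s\n' "$users" | LC_ALL=C sort -t: -k3,3r)

printf 'numeric:\n%s\n\ntextual:\n%s\n' "$by_uid" "$by_text"
```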
Example 4: sort by GID, keeping unique values
[bkjia@test ~]$ sort -t: -k4n -u /etc/passwd
root:x:0:0:root:/bin/bash
bin:x:1:1:bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin/nologin
adm:x:3:4:adm:/var/adm:/sbin/nologin
lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin
mail:x:8:12:mail:/var/spool/mail:/sbin/nologin
news:x:9:13:news:/etc/news:
...
# -t: sets the delimiter to a colon, -k4 sorts on the fourth field, n compares the field as an integer, and -u keeps only one line for each distinct key value.
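With -u, uniqueness is judged on the sort key, not on the whole line. A minimal sketch, assuming four shortened name:x:uid:gid lines in which two accounts share GID 0 (hypothetical data):

```shell
# Hypothetical, shortened passwd-style lines; root and sync share GID 0.
sample=$(printf '%s\n' 'root:x:0:0' 'sync:x:5:0' 'bin:x:1:1' 'daemon:x:2:2')

# One line survives per distinct GID; cut then shows just the GID column.
unique_gids=$(printf '%s\n' "$sample" | LC_ALL=C sort -t: -k4,4n -u | cut -d: -f4)

printf '%s\n' "$unique_gids"
```

Four input lines yield three output lines. It is safest not to rely on which of the lines sharing a GID is the one kept.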
2). Sort text blocks
[bkjia@test ~]$ cat > my-friends
# SORTKEY: ma, Kin
Kin ma
Zhujiangxincheng 78
D-305 Letaijie
TaiShan

# SORTKEY: yan, Jones
Jones yan
Dongpu 68
B_602 Dongpujie
YangJiang

# SORTKEY: wu, Will
Will wu
Shangshe 36
A_205 Heguanlu
MaoMing
[bkjia@test ~]$ cat my-friends |
awk -v RS="" '{ gsub("\n", "^Z"); print }' |
sort -f
# SORTKEY: ma, Kin^ZKin ma^ZZhujiangxincheng 78^ZD-305 Letaijie^ZTaiShan
# SORTKEY: wu, Will^ZWill wu^ZShangshe 36^ZA_205 Heguanlu^ZMaoMing
# SORTKEY: yan, Jones^ZJones yan^ZDongpu 68^ZB_602 Dongpujie^ZYangJiang
[bkjia@test ~]$ cat my-friends |                 # pipe in the address data file
awk -v RS="" '{ gsub("\n", "^Z"); print }' |    # join each address into a single line
sort -f |                                        # sort the address records, ignoring case
awk -v ORS="\n" '{ gsub("^Z", "\n"); print }'
# Restore the line structure. Note: some awk versions cannot restore it.
# ^Z is meant as a single Ctrl-Z control character; typed as the two characters ^ and Z, the second gsub() treats ^ as a regular-expression anchor and the lines are not restored.
# gsub() performs a global substitution, similar to the s/x/y/g construct in sed.
# SORTKEY: ma, Kin
Kin ma
Zhujiangxincheng 78
D-305 Letaijie
TaiShan
# SORTKEY: wu, Will
Will wu
Shangshe 36
A_205 Heguanlu
MaoMing
# SORTKEY: yan, Jones
Jones yan
Dongpu 68
B_602 Dongpujie
YangJiang
[bkjia@test ~]$ cat my-friends |
awk -v RS="" '{ gsub("\n", "^Z"); print }' |
sort -f |
awk -v ORS="\n" '{ gsub("^Z", "\n"); print }' |
grep -v '# SORTKEY'     # delete the marker lines
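The whole block-sorting pipeline can be tried without typing a control character. The sketch below is a variation, not the article's exact command: it joins lines with a tab instead of Ctrl-Z (assuming the data contains no tabs), uses ORS="\n\n" so that the blank line between records is written back, and feeds in two shortened, made-up records.

```shell
# Two shortened records separated by a blank line (hypothetical data).
# Assumption: the data contains no tab characters, so a tab is a safe joiner.
sorted_blocks=$(
  printf '%s\n' \
    '# SORTKEY: wu, Will' 'Will wu' 'Shangshe 36' '' \
    '# SORTKEY: ma, Kin' 'Kin ma' 'Zhujiangxincheng 78' |
  awk -v RS="" '{ gsub("\n", "\t"); print }' |       # one record per line
  LC_ALL=C sort -f |                                 # sort records, ignoring case
  awk -v ORS="\n\n" '{ gsub("\t", "\n"); print }' |  # restore lines and blank separators
  grep -v '^# SORTKEY'                               # drop the marker lines
)

printf '%s\n' "$sorted_blocks"
```

RS="" puts awk in paragraph mode, where runs of blank lines separate records, which is why the input records must be separated by an empty line.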
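3). Sort stability: sort is not stable
[bkjia@test ~]$ sort -t_ -k1,1 -k2,2 << EOF
> one_two
> one_two_three
> one_two_four
> one_two_five
> EOF
one_two
one_two_five
one_two_four
one_two_three
# All four lines have equal keys in fields 1 and 2, yet the output order differs from the input order, so the sort is not stable.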
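When all keys compare equal, sort falls back to a last-resort comparison of the whole line, which explains the reordering above. GNU sort offers -s (--stable) to switch that fallback off; this option is not part of POSIX, so the sketch below assumes a GNU-compatible sort.

```shell
# Same four lines as in the example above.
lines=$(printf '%s\n' 'one_two' 'one_two_three' 'one_two_four' 'one_two_five')

# Default: equal keys are broken by comparing whole lines.
default_order=$(printf '%s\n' "$lines" | LC_ALL=C sort -t_ -k1,1 -k2,2)

# -s (GNU extension): lines with equal keys keep their input order.
stable_order=$(printf '%s\n' "$lines" | LC_ALL=C sort -s -t_ -k1,1 -k2,2)

printf 'default:\n%s\n\nstable (-s):\n%s\n' "$default_order" "$stable_order"
```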
4). Removing duplicates
uniq: filters out adjacent duplicate lines
Options:
-c: prefix each line with the number of times it occurs.
-d: display only duplicated lines.
-u: display only lines that are not duplicated.
Usage:
sort ... | uniq ...
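uniq is normally fed from sort because it only collapses duplicate lines that are adjacent. A short sketch on made-up input shows the difference:

```shell
# Hypothetical input in which the two "one" lines are not adjacent.
input=$(printf '%s\n' 'one' 'two' 'one')

# Without sorting, the duplicates are not adjacent and both survive.
unsorted=$(printf '%s\n' "$input" | uniq)

# After sorting, the duplicates are adjacent and collapse to one line.
sorted=$(printf '%s\n' "$input" | LC_ALL=C sort | uniq)

printf 'uniq alone:\n%s\n\nsort | uniq:\n%s\n' "$unsorted" "$sorted"
```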
Example:
[bkjia@test ~]$ cat > number
one
two
threefour
four
five
two
one
one
[bkjia@test ~]$ sort number | uniq       # sort, then drop duplicate lines
five
four
one
threefour
two
[bkjia@test ~]$ sort number | uniq -c    # show each line with its occurrence count
      1 five
      1 four
      3 one
      1 threefour
      2 two
[bkjia@test ~]$ sort number | uniq -d    # show only duplicated lines
one
two
[bkjia@test ~]$ sort number | uniq -u    # show only non-duplicated lines
five
four
threefour
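The count produced by uniq -c can itself be sorted, which gives a simple frequency ranking. A minimal sketch on made-up words (hypothetical data); the final awk step only normalizes the spacing of the count column, which differs between uniq implementations:

```shell
# Hypothetical word list: "one" three times, "two" twice, "five" once.
ranking=$(
  printf '%s\n' 'one' 'two' 'five' 'two' 'one' 'one' |
  LC_ALL=C sort |               # make duplicates adjacent
  uniq -c |                     # prefix each distinct line with its count
  LC_ALL=C sort -k1,1nr |       # highest count first
  awk '{ print $2, $1 }'        # print "word count"
)

printf '%s\n' "$ranking"
```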