Merge and split files:
1. sort
Command format:
sort -cmu -o output_file [other options] +pos1 -pos2 input_file
-c: check whether the input file is already sorted.
-m: merge two files that are already sorted.
-u: remove duplicate lines.
-o: name of the file in which to store the sort result.
Other options include:
-b: ignore leading blanks when sorting on a field.
-n: sort the field numerically.
-t: field separator; use it when fields are separated by a character other than spaces or tabs.
-r: reverse the sort order.
+n: n is a field number; the sort key starts at this field. (The first field is field 0.)
pos1 and pos2 can be written as m.n, where m is the field number and n is the number of characters to skip within that field. For example, +4.6 sorts on the 5th field, starting from its 7th character.
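The +pos form above is the traditional syntax. Current POSIX sort expresses the same keys with -k, counting fields and characters from 1 instead of 0, so +2 -3 becomes -k3,3 and +4.6 becomes -k5.7. A minimal sketch, assuming a POSIX sort; the function names and sample data are hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: sort stdin on field 3 only.
# Traditional equivalent: sort +2 -3
sort_on_field3() {
    sort -k3,3
}

# Hypothetical helper: sort ':'-separated stdin starting at the
# 3rd character of field 2 (the key runs to the end of the line).
# Traditional equivalent: sort -t: +1.2
sort_field2_from_char3() {
    sort -t: -k2.3
}

# Hypothetical sample data.
printf 'a x c\nb y a\nc z b\n' | sort_on_field3
printf 'x:abz\ny:aaa\nz:zzb\n' | sort_field2_from_char3
```

A key given as -k3,3 stops at the end of field 3; without the ",3" the key would extend to the end of the line, just as +2 without -3 does.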
$ sort f1.txt
$ sort -t: f2.txt % fields are separated by ':'
For numeric fields, use n:
$ sort +0n f1.txt % 0 is the 1st field; any field number can be used
$ sort +2 f1.txt % sort from the 3rd field onward
Remove duplicate lines while sorting:
$ sort -u f1.txt
$ head -n N f1.txt % display the first N lines of the file
$ tail -n N f1.txt % display the last N lines of the file
$ sort +0nr f1.txt | head -n 1 | awk '{print "worst case", $1, "has been rented by", $2}'
worst case 483 has been rented by tfj
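The same pipeline written with the POSIX -k form might look like this. The rental data and the worst_case function name are hypothetical; column 1 is assumed to be a count and column 2 a user name:

```shell
#!/bin/sh
# Hypothetical: read "count user" lines on stdin and report the line
# with the largest count. -k1,1nr = field 1 only, numeric, reversed
# (traditional form: +0nr).
worst_case() {
    sort -k1,1nr | head -n 1 |
        awk '{print "worst case", $1, "has been rented by", $2}'
}

# Hypothetical sample data.
printf '12 abc\n483 tfj\n97 zyc\n' | worst_case
# prints: worst case 483 has been rented by tfj
```

Putting n and r on the key itself matters: a key that carries its own modifiers does not inherit global options such as a separate -r.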
Merge files:
$ sort -m file1 file2 % both files must already be sorted
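The -m option does not sort its inputs; it only interleaves files that are each already sorted, which is cheaper than sorting their concatenation. A small sketch, assuming mktemp is available; the file contents and the merge_demo name are hypothetical:

```shell
#!/bin/sh
# Hypothetical demo: merge two pre-sorted files with sort -m.
merge_demo() (
    dir=$(mktemp -d) || exit 1
    printf 'apple\ncherry\n' > "$dir/file1"   # already sorted
    printf 'banana\ndate\n' > "$dir/file2"    # already sorted
    sort -m "$dir/file1" "$dir/file2"
    rm -r "$dir"
)

merge_demo
# prints apple, banana, cherry, date (one per line)
```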
Extract and sort the usernames in /etc/passwd:
$ sort -t: +0 /etc/passwd | awk -F: '{print $1}'
Sort by the last field of an IP address:
$ sort -t. +3n iplist_file % each line has the form nnn.nnn.nnn.nnn
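With -k the IP example becomes -t. -k4,4n. Sorting on the last octet alone ignores the first three, so a full numeric ordering needs one key per octet. A sketch under those assumptions; the function names and addresses are hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: sort dotted-quad IPv4 addresses on stdin
# by the last octet only (traditional form: sort -t. +3n).
sort_by_last_octet() {
    sort -t. -k4,4n
}

# Hypothetical helper: sort by all four octets numerically,
# most significant octet first.
sort_ip_full() {
    sort -t. -k1,1n -k2,2n -k3,3n -k4,4n
}

# Hypothetical sample data.
printf '10.0.0.20\n10.0.0.3\n192.168.1.1\n' | sort_by_last_octet
printf '10.0.0.20\n9.1.1.1\n10.0.0.3\n' | sort_ip_full
```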
uniq: delete *consecutive* repeated lines
Format: uniq [-u | -d] [-c] [-f n] input_file [output_file]
Options:
-u: display only lines that are not repeated.
-d: display only repeated lines, one copy of each.
-c: prefix each line with the number of times it occurs.
-f n: ignore the first n fields when comparing lines (n is a number).
Some systems do not recognize -f; on those, use -n instead (for example, uniq -2 to skip two fields).
$ more a.txt
May
May
May
Haha
May
$ uniq
May
Haha
May
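Because uniq only compares adjacent lines, it is usually preceded by sort when every duplicate should be collapsed. The sketch below feeds the same five lines as a.txt through the options listed above. The helper name sample is hypothetical, and awk is used only to normalize the column padding of uniq -c, which varies between implementations:

```shell
#!/bin/sh
# Hypothetical helper reproducing the contents of a.txt above.
sample() {
    printf 'May\nMay\nMay\nHaha\nMay\n'
}

sample | uniq                  # May, Haha, May (adjacent runs collapsed)
sample | sort | uniq           # Haha, May (all duplicates collapsed)
sample | uniq -d               # May (only the first run is repeated)
sample | uniq -u               # Haha, May (lines that occur in runs of one)
sample | sort | uniq -c | awk '{print $1, $2}'   # 1 Haha, 4 May
```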
Joining files: join
The two files must share a common field, similar to a join operation in a database, and both must be sorted on that field.
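As an illustration, two hypothetical files keyed on a user name can be joined on their first field, which is the field join uses by default. Both files are created already sorted on that field; the join_demo name and the contents are made up, and mktemp is assumed to be available:

```shell
#!/bin/sh
# Hypothetical demo: join two files on field 1 (the default join field).
join_demo() (
    dir=$(mktemp -d) || exit 1
    printf 'tfj id111\nzyc id222\n' > "$dir/ids"           # sorted on field 1
    printf 'tfj /bin/bash\nzyc /bin/sh\n' > "$dir/shells"  # sorted on field 1
    # Output: the join field, then the remaining fields of each file.
    join "$dir/ids" "$dir/shells"
    rm -r "$dir"
)

join_demo
# tfj id111 /bin/bash
# zyc id222 /bin/sh
```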
cut: extract parts of file content
The general format of cut is:
cut [options] file1 file2
The following describes the available options:
-c list: the character positions to cut.
-f list: the fields to cut.
-d: specifies the field delimiter (the default is the tab character).
-c specifies the range to cut, for example:
-c 1,5-7: cut the 1st character, then the 5th through 7th characters.
-c 1-50: cut the first 50 characters.
-f uses the same format as -c:
-f 1,5: cut the 1st and 5th fields.
-f 1,10-12: cut the 1st field, then the 10th through 12th fields.
$ more
P. joines:Office Runner:id897
S. Round:Unix admin:id666
L. clip:personnel chief:id982
$ cut -d: -f3
id897
id666
id982
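The range forms listed above can be checked on a fixed string. A sketch with made-up input; cut_demo is a hypothetical name:

```shell
#!/bin/sh
# Hypothetical demo of cut character and field ranges.
cut_demo() {
    printf 'abcdefghij\n' | cut -c 1,5-7          # aefg
    printf 'abcdefghij\n' | cut -c 1-3            # abc
    printf 'f1:f2:f3:f4:f5\n' | cut -d: -f 1,5    # f1:f5 (delimiter kept)
    printf 'f1:f2:f3:f4:f5\n' | cut -d: -f 2-4    # f2:f3:f4
}

cut_demo
```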
Extract the username and login shell from /etc/passwd:
$ cut -d: -f1,7 /etc/passwd
root:/bin/sh
tfj:/bin/bash
....
paste: paste files together line by line
paste -d: f1 f2 % the two files do not need to have the same number of lines
-d: specifies the delimiter placed between columns (the default is a tab).
$ more
tfj
ZYC
$ more b
id111
id222
$ paste a b
tfj id111
ZYC id222
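When the files have different lengths, paste still emits one output line per line of the longer file and leaves the missing column empty. A sketch with hypothetical contents (the third line of a is made up), using -d: so the empty column is visible; paste_demo is a hypothetical name and mktemp is assumed to be available:

```shell
#!/bin/sh
# Hypothetical demo: paste two files of unequal length.
paste_demo() (
    dir=$(mktemp -d) || exit 1
    printf 'tfj\nZYC\nroot\n' > "$dir/a"   # three lines
    printf 'id111\nid222\n' > "$dir/b"     # two lines
    paste -d: "$dir/a" "$dir/b"
    rm -r "$dir"
)

paste_demo
# tfj:id111
# ZYC:id222
# root:
```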
split: split a large file into smaller files
split -N file % N is the number of lines in each output file, e.g. split -100 file (default 1000)
The output files are named xaa, xab, xac, ..., xzz.
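A sketch, assuming a POSIX split (which spells the line count as -l N) and mktemp; the 25-line input is generated with awk and is hypothetical, as is the split_demo name. At 10 lines per piece it should yield three files, the last holding 5 lines:

```shell
#!/bin/sh
# Hypothetical demo: split a 25-line file into 10-line pieces.
split_demo() (
    dir=$(mktemp -d) || exit 1
    cd "$dir" || exit 1
    # Hypothetical input: the numbers 1..25, one per line.
    awk 'BEGIN { for (i = 1; i <= 25; i++) print i }' > big
    split -l 10 big             # writes xaa, xab, xac
    echo x*                     # list the generated pieces
    awk 'END { print NR }' xac  # line count of the last piece
    cd / && rm -r "$dir"
)

split_demo
# xaa xab xac
# 5
```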
tr: translate, squeeze, or delete characters read from standard input
tr [-cds] string1 [string2] % -c: complement string1, -d: delete, -s: squeeze repeats
$ echo "aaaaaabbcccd" | tr -s "[a-z]"
abcd
Delete blank lines:
$ tr -s "\n" < file
Convert uppercase to lowercase:
$ cat file | tr "[A-Z]" "[a-z]"
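The sketch below repeats these uses of tr and adds -d and -c. It uses POSIX character classes such as [:lower:] instead of ranges like a-z, since the meaning of a range can depend on the locale; tr_demo and the input strings are hypothetical:

```shell
#!/bin/sh
# Hypothetical demo of common tr operations.
tr_demo() {
    echo "aaaaaabbcccd" | tr -s '[:lower:]'          # squeeze: abcd
    printf 'one\n\n\ntwo\n' | tr -s '\n'             # squeeze newlines: one, two
    echo "Hello World" | tr '[:upper:]' '[:lower:]'  # hello world
    echo "a1b2c3" | tr -d '[:digit:]'                # delete digits: abc
    echo "a-b_c" | tr -cd '[:lower:]\n'              # keep lowercase and newline: abc
}

tr_demo
```

Squeezing newlines removes blank lines between other lines, but a blank first line survives, because there is no preceding newline for it to be merged with.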
All tr functions can be implemented using sed.
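For comparison, a sketch of sed equivalents for the tr examples in this section: the y command translates characters one-for-one, s///g deletes, a back-reference squeezes repeated characters, and /^$/d drops blank lines. The sed_demo name and inputs are hypothetical:

```shell
#!/bin/sh
# Hypothetical demo: sed counterparts of the tr examples.
sed_demo() {
    # Squeeze runs of any repeated character (tr -s on every character).
    echo "aaaaaabbcccd" | sed 's/\(.\)\1*/\1/g'              # abcd
    # Delete empty lines.
    printf 'one\n\n\ntwo\n' | sed '/^$/d'                     # one, two
    # Translate uppercase to lowercase, one character at a time.
    echo "Hello World" |
        sed 'y/ABCDEFGHIJKLMNOPQRSTUVWXYZ/abcdefghijklmnopqrstuvwxyz/'
    # Delete digits (like tr -d).
    echo "a1b2c3" | sed 's/[0-9]//g'                         # abc
}

sed_demo
```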