Linux beginner notes, part 17: command-line text and file processing tools
I. File browsing
cat View file contents
more View the contents of the file in pages (only pages down)
less View the contents of the file in pages (you can page up and down)
head View the first few lines of the file (default 10 lines)
tail View the last few lines of the file (default 10 lines)
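As a small illustration of head and tail, the sketch below builds a throwaway 20-line file and reads both ends of it. The file name numbers.txt and the temporary directory are made up for the example; the -n option sets how many lines to show instead of the default 10.

```shell
# Build a throwaway 20-line file (hypothetical sample data).
workdir=$(mktemp -d)
seq 1 20 > "$workdir/numbers.txt"

# -n 3: show 3 lines instead of the default 10.
first_three=$(head -n 3 "$workdir/numbers.txt")
last_two=$(tail -n 2 "$workdir/numbers.txt")

printf '%s\n' "$first_three"   # lines 1, 2, 3
printf '%s\n' "$last_two"      # lines 19, 20
```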
II. Regular expression matching
The grep command searches its input for lines that match a regular expression and prints each matching line:
grep 'mingc' /etc/passwd    print the lines of /etc/passwd that contain mingc (that user's account entry)
find / -user mingc | grep '.*\.png$'    find all png files owned by mingc (using a pipe)
Common parameters:
-i ignore case when searching
-n show the line number of each matching line
-v output the lines that do not match the regular expression (inverts the match)
-A n also output the n lines that follow each matching line
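The options above can be tried on a small file. This is only a sketch: sample.txt and its contents are invented for the example.

```shell
# Hypothetical sample file with mixed-case entries.
workdir=$(mktemp -d)
printf '%s\n' 'Alpha one' 'beta two' 'ALPHA three' 'gamma four' > "$workdir/sample.txt"

# -i: case-insensitive match; -n: prefix each hit with its line number.
matches=$(grep -in 'alpha' "$workdir/sample.txt")

# -v: keep only the lines that do NOT match.
non_matches=$(grep -iv 'alpha' "$workdir/sample.txt")

# -A 1: also print one line of context after each match.
with_context=$(grep -A 1 'beta' "$workdir/sample.txt")

printf '%s\n' "$matches"        # 1:Alpha one / 3:ALPHA three
printf '%s\n' "$non_matches"    # beta two / gamma four
printf '%s\n' "$with_context"   # beta two / ALPHA three
```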
III. Text cutting and printing fields
The cut command is often used to cut text lines and print selected fields:
cut -d: -f1 /etc/passwd    print the first colon-separated field (the user name) of every line in the passwd file
grep mingc /etc/passwd | cut -d: -f3    print the third colon-separated field (the uid) of mingc's user entry
Common parameters:
-d specify the field delimiter (default: Tab)
-f select fields by number (numbering starts at 1)
-c select characters by position (a range such as m-n, from the m-th to the n-th character)
Example:
grep mingc /etc/passwd | cut -d: -f3    print the third colon-separated field (the uid) of mingc's user entry
grep mingc /etc/passwd | cut -d: -f6,7    print mingc's home directory and login shell
grep mingc /etc/passwd | cut -c1-5    print characters 1 through 5 of mingc's user entry
grep mingc /etc/passwd | cut -c1-    print everything from the 1st character to the end of the line (here, the whole line)
grep mingc /etc/passwd | cut -c-5    print everything from the start of the line through the 5th character
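To try these without depending on a real account, the sketch below feeds cut a made-up record in /etc/passwd layout. The values in the record are assumptions for illustration only.

```shell
# A made-up record in /etc/passwd layout (not a real account).
record='mingc:x:1000:1000:Ming C:/home/mingc:/bin/bash'

# -d: sets the delimiter to a colon; -f picks fields by number.
uid=$(printf '%s\n' "$record" | cut -d: -f3)
home_and_shell=$(printf '%s\n' "$record" | cut -d: -f6,7)

# -c picks character positions instead of fields.
first_five=$(printf '%s\n' "$record" | cut -c1-5)

printf '%s\n' "$uid"              # 1000
printf '%s\n' "$home_and_shell"   # /home/mingc:/bin/bash
printf '%s\n' "$first_five"       # mingc
```

Note that when several fields are selected, cut joins them with the same delimiter it split on.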
IV. Text statistics
The wc command counts the lines, words, and bytes in a file:
wc test.md
With no options, wc prints a single line in the following format:
line count    word count    byte count    file name
Common parameters:
-l count lines only
-w count words only
-c count bytes only
-m count characters only
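A quick way to check the counts is to run wc on a file whose contents are known. The sketch below writes a hypothetical two-line test.md into a temporary directory; reading from standard input keeps the file name out of the output, and tr strips the padding that some wc implementations add.

```shell
# Hypothetical two-line file: "one two" and "three".
workdir=$(mktemp -d)
printf 'one two\nthree\n' > "$workdir/test.md"

line_count=$(wc -l < "$workdir/test.md" | tr -d ' ')
word_count=$(wc -w < "$workdir/test.md" | tr -d ' ')
byte_count=$(wc -c < "$workdir/test.md" | tr -d ' ')

printf '%s %s %s\n' "$line_count" "$word_count" "$byte_count"   # 2 3 14
```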
V. Text sorting
The sort command sorts the lines of a file (it can also sort standard input):
sort filename
Common parameters:
-r sort in reverse order
-n sort numerically
-f ignore case
-u deduplicate (remove duplicate lines)
-t <separator> specify the field separator (normally used together with -k; splitting into fields alone does not change the result)
-k n when fields are in use, sort by the nth field (n starts from 1)
The -r, -n, -t, and -k options can be combined:
sort -r -n -t: -k3 test.md    split each line of test.md on colons and sort in reverse numeric order by the 3rd field
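The combination can be checked on a few colon-separated lines. The contents of test.md below are invented for the sketch.

```shell
# Hypothetical colon-separated data; the 3rd field is numeric.
workdir=$(mktemp -d)
printf '%s\n' 'a:x:5' 'b:x:100' 'c:x:20' > "$workdir/test.md"

# -t: splits on colons, -k3 sorts by the 3rd field,
# -n compares numerically, -r reverses the order.
sorted=$(sort -t: -k3 -n -r "$workdir/test.md")

printf '%s\n' "$sorted"   # b:x:100 / c:x:20 / a:x:5
```

Without -n the third field would be compared as text, so 100 would not sort as the largest value.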
VI. Remove duplicate lines
The command sort -u removes duplicate lines from the file content, with the side effect that the output is sorted.
The uniq command also removes duplicate lines, but only when the duplicates are adjacent, so the input is usually sorted first.
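The difference between the two approaches shows up when duplicates are not adjacent. A minimal sketch, using an invented file dups.txt:

```shell
# Hypothetical input with both adjacent and non-adjacent duplicates.
workdir=$(mktemp -d)
printf '%s\n' 'b' 'a' 'b' 'b' 'a' > "$workdir/dups.txt"

# uniq alone only collapses runs of identical neighbouring lines.
adjacent_only=$(uniq "$workdir/dups.txt")

# Sorting first makes all duplicates adjacent, so uniq removes them all.
sorted_then_uniq=$(sort "$workdir/dups.txt" | uniq)

# sort -u gives the same result in one step.
sort_u=$(sort -u "$workdir/dups.txt")

printf '%s\n' "$adjacent_only"      # b / a / b / a
printf '%s\n' "$sorted_then_uniq"   # a / b
```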
VII. Text comparison
The diff command is used to compare the differences between the two files:
diff test1.md test2.md
Common parameters:
-i ignore case
-b ignore changes in the amount of whitespace
-u output in unified format (usually used to generate patch files)
Example:
diff -u old.md new.md > update.patch    write the differences between old.md and new.md to the patch file update.patch
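The patch workflow can be sketched with two small files. The file contents are made up for the example. Note that diff exits with status 1 when the files differ, which is why the sketch appends "|| true" for scripts that stop on errors.

```shell
# Two hypothetical versions of the same file.
workdir=$(mktemp -d)
printf '%s\n' 'line one' 'line two' > "$workdir/old.md"
printf '%s\n' 'line one' 'line 2'   > "$workdir/new.md"

# diff returns 1 when differences are found; "|| true" keeps
# a script running under "set -e".
diff -u "$workdir/old.md" "$workdir/new.md" > "$workdir/update.patch" || true

# In unified format, removed lines start with "-" and added lines with "+".
removed=$(grep -c '^-line two$' "$workdir/update.patch")
added=$(grep -c '^+line 2$' "$workdir/update.patch")

cat "$workdir/update.patch"
```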
VIII. Spell checking
The command aspell is used to display and check English spelling:
aspell check filename
aspell list < filename
(CentOS 6.9 64-bit does not seem to ship this command, and it is rarely used, so it is not covered in detail.)
IX. Character conversion
The tr command is used to convert characters from standard input. If you are processing input from a file, you need to redirect it.
Delete characters (tr -d removes every character in the given set wherever it occurs, not just the whole word):
tr -d 'keyword' < filename
Case conversion (lowercase to uppercase):
tr 'a-z' 'A-Z' < filename
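Both uses can be tried directly on a pipe. The input string in this sketch is arbitrary; the second command shows that tr -d works on a set of characters, not on a word.

```shell
# Convert lowercase letters to uppercase.
upper=$(printf 'hello linux\n' | tr 'a-z' 'A-Z')

# Delete every "l" and every "x", wherever they occur.
stripped=$(printf 'hello linux\n' | tr -d 'lx')

printf '%s\n' "$upper"      # HELLO LINUX
printf '%s\n' "$stripped"   # heo inu
```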
X. Stream editing: search and replace
sed is a stream editor that works with regular expressions. It reads the input one line at a time, places the current line in a temporary buffer called the pattern space, applies the commands to the contents of that buffer, and then writes the result to the screen. It then moves on to the next line and repeats until the end of the file. The file itself is not changed unless the output is redirected and saved. sed is mainly used to edit one or more files automatically, to simplify repetitive edits, and to write conversion programs.
Syntax:
sed [options] 'command' file(s)
sed [options] -f scriptfile file(s)
Common options:
-e <script> process the text with the commands given on the command line
-f <script file> process the text with the commands read from a script file
-n suppress the automatic output; only lines printed explicitly (for example with p) are shown
Common commands:
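d delete the line (the whole pattern space)
D delete the first line of the pattern space
s substitute text that matches a pattern
h copy the pattern space to the hold buffer
H append the pattern space to the hold buffer
g replace the pattern space with the contents of the hold buffer
G append the contents of the hold buffer to the pattern space
p print the line (the whole pattern space)
P print the first line of the pattern space
q exit sed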
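A few of these commands can be seen in action on a three-line file. This is a sketch with an invented file abc.txt; the last command is a well-known one-liner that uses the hold buffer to reverse the order of lines.

```shell
# Hypothetical three-line input: a, b, c.
workdir=$(mktemp -d)
printf '%s\n' 'a' 'b' 'c' > "$workdir/abc.txt"

# d: delete line 2.
deleted=$(sed '2d' "$workdir/abc.txt")

# p with -n: print only line 2.
printed=$(sed -n '2p' "$workdir/abc.txt")

# q: quit after line 2, so line 3 is never read.
quit_early=$(sed '2q' "$workdir/abc.txt")

# G and h: on every line except the first, append the hold buffer (G),
# then save the result back to the hold buffer (h); delete everything
# except the last line, which by then holds all lines in reverse order.
reversed=$(sed '1!G;h;$!d' "$workdir/abc.txt")

printf '%s\n' "$deleted"      # a / c
printf '%s\n' "$printed"      # b
printf '%s\n' "$quit_early"   # a / b
printf '%s\n' "$reversed"     # c / b / a
```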
Example:
sed 's/linux/unix/g' filename    s means search and replace, linux is replaced with unix, g makes the replacement global within each line, filename is the target file
sed '1,50s/linux/unix/g' filename    apply the global replacement to lines 1 through 50 only
sed -e 's/linux/unix/g' -e 's/ming/mingc/g' filename    use -e to apply several commands in one run
sed -f script filename    process the file with the commands in the given script file
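The examples above can be run against a small file to see the effect of each form. The names notes.txt and replace.sed and their contents are assumptions made for this sketch.

```shell
# Hypothetical input file.
workdir=$(mktemp -d)
printf '%s\n' 'linux is linux' 'ming uses linux' 'no match here' > "$workdir/notes.txt"

# Global replacement on every line.
replaced=$(sed 's/linux/unix/g' "$workdir/notes.txt")

# Address range: only line 1 is changed.
first_only=$(sed '1,1s/linux/unix/g' "$workdir/notes.txt")

# Two commands in one run with -e.
multi=$(sed -e 's/linux/unix/g' -e 's/ming/mingc/g' "$workdir/notes.txt")

# The same replacement read from a script file with -f.
printf '%s\n' 's/linux/unix/g' > "$workdir/replace.sed"
from_script=$(sed -f "$workdir/replace.sed" "$workdir/notes.txt")

printf '%s\n' "$multi"   # unix is unix / mingc uses unix / no match here

# The input file is left untouched: sed wrote only to standard output.
cat "$workdir/notes.txt"
```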