Unlike Windows, Linux handles most text processing on the command line, where the shell chains together many small commands. This article describes the most common tools for processing text in the Linux shell: find, grep, xargs, sort, uniq, tr, cut, paste, wc, sed, and awk.
The examples and options shown are the most common and practical ones.
My rule for shell scripts is to write each command on a single line and try not to exceed two lines.
For more complex tasks, consider Python.
find: file search
• Find .txt and .pdf files
find . \( -name "*.txt" -o -name "*.pdf" \) -print
• Find .txt and .pdf files with a regular expression
find . -regex ".*\(\.txt\|\.pdf\)$"
-iregex: same as -regex, but ignores case.
• Negating a test
Find all files that are not .txt files:
find . ! -name "*.txt" -print
• Specify the search depth
Print the files in the current directory (depth 1):
find . -maxdepth 1 -type f
Custom search
• Search by type:
find . -type d -print    # list only directories
-type f matches regular files; -type l matches symbolic links.
• Search by time:
-atime: access time (in days; use -amin for minutes, and likewise for the options below)
-mtime: modification time (the content was modified)
-ctime: change time (the metadata or permissions were changed)
All files accessed within the last seven days:
find . -atime -7 -type f -print
• Search by size:
Units: w (words), k, M, G
Search for files larger than 2k:
find . -type f -size +2k
• Search by permission:
find . -type f -perm 644 -print    # find all files whose permissions are exactly 644
• Search by user:
find . -type f -user weber -print    # find files owned by the user weber
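The name, negation, and depth tests above can be tried safely on a throwaway directory. The following is a minimal sketch; the directory layout and file names are made up for illustration, and -maxdepth assumes GNU find (or another find that supports it).

```shell
# Build a small temporary tree to search (all names here are hypothetical).
tmp=$(mktemp -d)
mkdir -p "$tmp/docs/deep"
touch "$tmp/a.txt" "$tmp/b.pdf" "$tmp/c.log" "$tmp/docs/d.txt" "$tmp/docs/deep/e.pdf"

# Match .txt or .pdf at any depth; LC_ALL=C sort makes the order deterministic.
by_name=$(cd "$tmp" && find . \( -name "*.txt" -o -name "*.pdf" \) -print | LC_ALL=C sort)

# Negation: regular files that are NOT .txt files.
not_txt=$(cd "$tmp" && find . -type f ! -name "*.txt" | LC_ALL=C sort)

# Depth limit: only regular files directly inside the starting directory.
top_level=$(cd "$tmp" && find . -maxdepth 1 -type f | LC_ALL=C sort)

printf '%s\n' "$by_name" "--" "$not_txt" "--" "$top_level"
rm -rf "$tmp"
```

find does not guarantee any output order, which is why the sketch pipes each result through sort.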
Actions to take on the files found
• Delete:
Delete all .swp files under the current directory:
find . -type f -name "*.swp" -delete
• Execute a command (the powerful -exec)
find . -type f -user root -exec chown weber {} \;    # give files owned by root under the current directory to weber
Note: {} is a special string; for each matching file, {} is replaced with that file's name.
Example: copy all the files found to another directory:
find . -type f -mtime +10 -name "*.txt" -exec cp {} OLD \;
• Combining multiple commands
Tip: if you need to run several commands on each file, put them in a script and have -exec call that script:
-exec ./commands.sh {} \;
• Delimiters used by -print
-print uses '\n' as the delimiter between file names by default;
-print0 uses '\0' as the delimiter, so that file names containing spaces can be handled safely.
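To see why the '\0' delimiter matters, here is a small sketch that deletes .swp files, one of which has a space in its name. The file names are invented for the example, and xargs -0 assumes GNU or BSD xargs.

```shell
# Temporary directory with made-up file names, one containing a space.
tmp=$(mktemp -d)
touch "$tmp/plain.swp" "$tmp/with space.swp" "$tmp/keep.txt"

# -print0 and -0 keep "with space.swp" as one argument instead of two.
find "$tmp" -type f -name "*.swp" -print0 | xargs -0 rm -f

remaining=$(ls "$tmp")
echo "$remaining"    # keep.txt
rm -rf "$tmp"
```

With plain -print and xargs, the name would be split at the space and rm would look for two files that do not exist.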
grep: text search
grep match_pattern file    # prints the matching lines by default
• Common options
-o: output only the matched part of each line, vs. -v: output only the lines that do not match
-c: count the lines in the file that contain the text
grep -c "text" filename
-n: print the line number of each match
-i: ignore case when searching
-l: print only the names of the matching files
• Recursive text search in nested directories (a programmer's favorite for searching code):
grep "class" . -R -n
• Match multiple patterns
grep -e "class" -e "virtual" file
• Make grep terminate each output file name with \0 (-Z):
grep "test" file* -lZ | xargs -0 rm
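The options above are easy to compare side by side on a tiny sample. This sketch creates two hypothetical source files (a.cpp and b.cpp, invented for the example) and captures what each option reports.

```shell
# Two small sample files with made-up contents.
tmp=$(mktemp -d)
printf 'class Foo\nvirtual void run()\nint x;\nClass Bar\n' > "$tmp/a.cpp"
printf 'int main()\n' > "$tmp/b.cpp"

count=$(grep -c "class" "$tmp/a.cpp")                    # matching lines: 1
count_i=$(grep -ci "class" "$tmp/a.cpp")                 # ignoring case: 2
numbered=$(grep -n "virtual" "$tmp/a.cpp")               # 2:virtual void run()
either=$(grep -c -e "class" -e "virtual" "$tmp/a.cpp")   # either pattern: 2
files=$(cd "$tmp" && grep -l "main" *.cpp)               # b.cpp

printf '%s\n' "$count" "$count_i" "$numbered" "$either" "$files"
rm -rf "$tmp"
```

Note that -c counts matching lines, not individual occurrences within a line.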
xargs: command-line argument conversion
xargs converts its input into command-line arguments for another command, so it combines well with many commands, such as grep and find.
• Convert multi-line input into single-line output
cat file.txt | xargs
The \n delimiters between the input lines are replaced with spaces.
• Convert single-line input into multi-line output
cat single.txt | xargs -n 3
-n: specifies the number of fields displayed on each line.
xargs options
-d: defines the input delimiter (by default, items are separated by spaces and by \n between lines)
-n: specifies the number of arguments per line, splitting the output into multiple lines
-I {}: specifies the replacement string, which xargs replaces with each input item when it expands the command; use it when the command to be executed takes several arguments
Example:
cat file.txt | xargs -I {} ./command.sh -p {} -1
-0: use \0 as the input delimiter
Example: count the lines of source code
find source_dir/ -type f -name "*.cpp" -print0 | xargs -0 wc -l
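A quick way to get a feel for these options is to feed xargs some literal input and let it run its default command, echo. The inputs below are made up for illustration.

```shell
# Several lines in, one line out (xargs runs echo by default).
one_line=$(printf '1\n2\n3\n4\n5\n6\n7\n' | xargs)

# One line in, at most three arguments per output line.
three_per_line=$(printf '1 2 3 4 5 6 7\n' | xargs -n 3)

# -I {} runs the command once per input line, substituting {} each time.
wrapped=$(printf 'a\nb\n' | xargs -I {} echo "item={};")

printf '%s\n' "$one_line" "--" "$three_per_line" "--" "$wrapped"
```

The last command prints item=a; and item=b; on separate lines, because -I implies one invocation per input line.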
sort: sorting
Options:
-n: sort numerically, vs. -d: sort in dictionary order
-r: sort in reverse order
-k N: sort by column N
Example:
sort -nrk 1 data.txt
sort -bd data    # ignore leading blank characters such as spaces
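The difference between numeric and plain ordering is easiest to see on data where the two disagree. A minimal sketch with invented sample data follows; LC_ALL=C is set on the non-numeric sorts only to make the byte-wise order predictable across locales.

```shell
# Sample data: a count and a name per line (made up for the example).
data=$(printf '10 pear\n9 apple\n100 fig\n')

# Numeric, descending, keyed on column 1: 100, 10, 9.
numeric_desc=$(printf '%s\n' "$data" | sort -nrk 1)

# Keyed on column 2: apple, fig, pear.
by_name=$(printf '%s\n' "$data" | LC_ALL=C sort -k 2)

# Plain character order puts "9 apple" last, because "1" sorts before "9".
plain=$(printf '%s\n' "$data" | LC_ALL=C sort)

printf '%s\n' "$numeric_desc" "--" "$by_name" "--" "$plain"
```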
uniq: eliminate duplicate lines
• Eliminate duplicate lines
sort unsort.txt | uniq
• Count how many times each line appears in the file
sort unsort.txt | uniq -c
• Find the duplicated lines
sort unsort.txt | uniq -d
You can restrict which part of each line is compared: -s sets how many leading characters to skip, and -w sets how many characters to compare.
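uniq only compares adjacent lines, which is why every example above sorts first. The sketch below, on made-up input, also shows a common follow-up: finding the most frequent line by sorting the uniq -c counts.

```shell
# Unsorted input with repeats (invented for the example).
words=$(printf 'b\na\nb\nc\na\nb\n')

unique=$(printf '%s\n' "$words" | LC_ALL=C sort | uniq)       # a, b, c
dups=$(printf '%s\n' "$words" | LC_ALL=C sort | uniq -d)      # a, b

# Most frequent line: count, sort the counts descending, keep the first.
top=$(printf '%s\n' "$words" | LC_ALL=C sort | uniq -c | sort -nr | head -n 1 | awk '{ print $2, $1 }')

printf '%s\n' "$unique" "--" "$dups" "--" "$top"    # last line: b 3
```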
tr: character translation
• Common usage
echo 12345 | tr '0-9' '9876543210'    # encryption/decryption: replace each digit with its counterpart
cat text | tr '\t' ' '    # convert tabs to spaces
• Deleting characters with tr
cat file | tr -d '0-9'    # delete all digits
-c: use the complement of the set
cat file | tr -d -c '0-9'    # keep only the digits in the file
cat file | tr -d -c '0-9 \n'    # delete everything except digits, spaces, and newlines
• Squeezing characters with tr
tr -s squeezes runs of repeated characters in the text; it is most often used to squeeze redundant spaces:
cat file | tr -s ' '
• Character classes
tr supports various character classes:
alnum: letters and digits
alpha: letters
digit: digits
space: whitespace characters
lower: lowercase letters
upper: uppercase letters
cntrl: control (non-printing) characters
print: printable characters
Usage: tr '[:class:]' '[:class:]'
Example: tr '[:lower:]' '[:upper:]'
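The main tr modes (translate, delete, complement, squeeze) can be checked on short literal strings. In this sketch the input strings and the reversed-digit mapping are chosen purely for illustration; the mapping is its own inverse, so applying it twice restores the input.

```shell
upper=$(echo "hello world" | tr '[:lower:]' '[:upper:]')    # HELLO WORLD
digits=$(echo "tel: 555-0199 ext 7" | tr -d -c '0-9\n')      # 55501997
squeezed=$(echo "a    b   c" | tr -s ' ')                    # a b c

# Swap each digit with its mirror (0<->9, 1<->8, ...), then swap back.
encoded=$(echo 12345 | tr '0-9' '9876543210')                # 87654
decoded=$(echo "$encoded" | tr '0-9' '9876543210')           # 12345

printf '%s\n' "$upper" "$digits" "$squeezed" "$encoded" "$decoded"
```

The \n in the complemented set keeps the trailing newline, so the output still ends as a proper line.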
cut: split text by column
• Extract the 2nd and 4th columns of a file:
cut -f2,4 filename
• Print every column of the file except the 3rd:
cut -f3 --complement filename
• -d specifies the delimiter:
cut -f2 -d ";" filename
• Ranges for cut
N-: from the Nth field to the end
-M: from the 1st field to the Mth
N-M: fields N through M
• Units for cut
-b: bytes
-c: characters
-f: fields (split on the delimiter)
• Example:
cut -c1-5 file    # print the first through fifth characters
cut -c-2 file    # print the first 2 characters
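The range forms are easiest to read against a single delimited line. The sketch below uses a sample line in /etc/passwd format (typed in literally, not read from the system) with ':' as the delimiter.

```shell
# A sample colon-delimited record, hard-coded for the example.
line='root:x:0:0:root:/root:/bin/bash'

name_and_shell=$(echo "$line" | cut -d ':' -f1,7)    # root:/bin/bash
from_sixth=$(echo "$line" | cut -d ':' -f6-)         # /root:/bin/bash
up_to_second=$(echo "$line" | cut -d ':' -f-2)       # root:x
first_four_chars=$(echo "$line" | cut -c1-4)         # root

printf '%s\n' "$name_and_shell" "$from_sixth" "$up_to_second" "$first_four_chars"
```

When several fields are selected, cut joins them with the same delimiter it split on.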
paste: join text by column
Joins two files column by column:
cat file1
1
2
cat file2
colin
book
paste file1 file2
1	colin
2	book
The default delimiter is a tab; use -d to specify a different one:
paste file1 file2 -d ","
1,colin
2,book
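The same two-file example can be reproduced end to end with temporary files. This is a small sketch; the temporary directory is created only for the demonstration.

```shell
# Recreate file1 and file2 from the example in a temporary directory.
tmp=$(mktemp -d)
printf '1\n2\n' > "$tmp/file1"
printf 'colin\nbook\n' > "$tmp/file2"

tabbed=$(paste "$tmp/file1" "$tmp/file2")           # columns joined by a tab
comma=$(paste -d ',' "$tmp/file1" "$tmp/file2")     # columns joined by a comma

printf '%s\n' "$tabbed" "--" "$comma"
rm -rf "$tmp"
```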
wc: count lines, words, and characters
wc -l file    # count lines
wc -w file    # count words
wc -c file    # count bytes (use -m to count characters)
sed: text replacement
• Replace the first match
sed 's/text/replace_text/' file    # replace the first match on each line
• Replace globally
sed 's/text/replace_text/g' file
By default, sed prints the result of the replacement; to modify the original file in place, use -i:
sed -i 's/text/replace_text/g' file
• Remove blank lines:
sed '/^$/d' file
• Referencing the matched text
The matched string is referenced with the & marker:
echo this is an example | sed 's/\w\+/[&]/g'
$> [this] [is] [an] [example]
• Referencing a matched group
The content matched by the first pair of parentheses is referenced with \1:
sed 's/hello\([0-9]\)/\1/'
• Double quotes
sed expressions are usually written in single quotes, but double quotes also work; inside double quotes the shell expands variables before sed runs:
sed "s/$var/HELLO/"
With double quotes, variables can be used in both the sed pattern and the replacement string:
Example:
p=pattern
r=replaced
echo "line con a pattern" | sed "s/$p/$r/g"
$> line con a replaced
• Other examples
Inserting a character into a string: convert each line of the text (PEKSHA) to PEK/SHA:
sed 's/^.\{3\}/&\//g' file
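The sed forms in this section can be run against short literal strings to confirm what each one does. In this sketch the inputs are invented, and the word pattern is written as [a-z][a-z]* rather than \w\+ so that it does not depend on GNU sed extensions.

```shell
first=$(echo "a-b-c" | sed 's/-/+/')             # first match only: a+b-c
all=$(echo "a-b-c" | sed 's/-/+/g')              # every match: a+b+c
no_blank=$(printf 'x\n\ny\n' | sed '/^$/d')      # blank line removed

# & stands for the whole match; \1 for the first parenthesized group.
wrapped=$(echo "this is an example" | sed 's/[a-z][a-z]*/[&]/g')
digit=$(echo "hello7 world" | sed 's/hello\([0-9]\)/\1/')     # 7 world

# Double quotes let the shell expand $p and $r before sed sees the script.
p=pattern
r=replaced
expanded=$(echo "line con a pattern" | sed "s/$p/$r/g")

split=$(echo "PEKSHA" | sed 's/^.\{3\}/&\//')    # PEK/SHA

printf '%s\n' "$first" "$all" "$no_blank" "$wrapped" "$digit" "$expanded" "$split"
```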
awk: data stream processing
• Structure of an awk script
awk 'BEGIN{ statements } statements2 END{ statements }'
• How it works
1. Execute the BEGIN block;
2. Read a line from the file or stdin and execute statements2; repeat until all the input has been read;
3. Execute the END block.
Printing the current line
• print without arguments prints the current line:
echo -e "line1\nline2" | awk 'BEGIN{ print "start" } { print } END{ print "End" }'
• When the arguments to print are separated by commas, they are printed separated by spaces:
echo | awk '{ var1 = "v1"; var2 = "V2"; var3 = "v3"; \
print var1, var2, var3; }'
$> v1 V2 v3
• Joining with "-" (placing strings next to each other, i.e. "", is the concatenation operator):
echo | awk '{ var1 = "v1"; var2 = "V2"; var3 = "v3"; \
print var1 "-" var2 "-" var3; }'
$> v1-V2-v3
Special variables: NR, NF, $0, $1, $2
NR: the number of records; during execution it is the current line number;
NF: the number of fields; during execution it is the number of fields in the current line;
$0: the text of the current line;
$1: the text of the first field;
$2: the text of the second field;
echo -e "line1 f2 f3\nline2\nline 3" | awk '{ print NR ":" $0 "-" $1 "-" $2 }'
• Print the second and third fields of each line:
awk '{ print $2, $3 }' file
• Count the lines in a file:
awk 'END{ print NR }' file
• Sum the first field of every line:
echo -e "1\n2\n3\n4\n" | awk 'BEGIN{ sum = 0;
print "begin"; } { sum += $1; } END{ print "=="; print sum }'
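The BEGIN / per-line / END flow and the NR, NF, and $N variables can be seen together in one short program. This is a minimal sketch on made-up input; it uses printf rather than echo -e so that it behaves the same in any POSIX shell.

```shell
# Each input line is "<count> <name>"; the data is invented for the example.
report=$(printf '3 apples\n5 pears\n2 figs\n' | awk '
    BEGIN { sum = 0 }                       # runs once, before any input
    { sum += $1                             # runs once per line
      print NR ": " $2 " (fields: " NF ")" }
    END { print "total=" sum ", lines=" NR }   # runs once, after all input
')

printf '%s\n' "$report"
```

In the END block NR still holds the number of the last line read, which is why it doubles as a line count.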
Passing external variables
var=1000
echo | awk '{ print vara }' vara=$var    # input from stdin
awk '{ print vara }' vara=$var file    # input from a file
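One detail worth knowing about the var=value form is that awk applies it only when it reaches that operand, which is after the BEGIN block has run. The standard -v option sets the variable before BEGIN. A short sketch comparing the two (the variable names follow the example above):

```shell
var=1000

# Assignment operand: visible in the main block.
from_assignment=$(echo | awk '{ print vara }' vara="$var")             # 1000

# Same operand, but BEGIN runs before it is processed, so vara is empty.
in_begin=$(awk 'BEGIN { print "[" vara "]" }' vara="$var")             # []

# -v assigns before BEGIN, so the value is available everywhere.
from_v=$(awk -v vara="$var" 'BEGIN { print vara + 1 }')                # 1001

printf '%s\n' "$from_assignment" "$in_begin" "$from_v"
```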
Filtering the lines awk processes with patterns
awk 'NR < 5'    # line number less than 5
awk 'NR==1,NR==4 { print }' file    # print lines 1 through 4
awk '/linux/'    # lines containing the text linux (the pattern can be a regular expression, which is very powerful)
awk '!/linux/'    # lines that do not contain the text linux
Setting the delimiter
Use -F to set the field delimiter (whitespace by default):
awk -F: '{ print $NF }' /etc/passwd
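Patterns and the -F delimiter can be tried without touching real system files. The sketch below uses a numbered list and two hard-coded lines in /etc/passwd format; both inputs are made up for the example.

```shell
nums=$(printf '1\n2\n3\n4\n5\n6\n')
below=$(printf '%s\n' "$nums" | awk 'NR < 3' | xargs)          # 1 2
range=$(printf '%s\n' "$nums" | awk 'NR==2,NR==4' | xargs)     # 2 3 4

# Two sample records in passwd format, not read from the system.
passwd_sample=$(printf 'root:x:0:0:root:/root:/bin/bash\nmail:x:8:8:mail:/var/mail:/usr/sbin/nologin\n')

# With -F: the last field ($NF) is the login shell.
shells=$(printf '%s\n' "$passwd_sample" | awk -F: '{ print $1 "=" $NF }')

# A negated regular expression keeps the lines that do not match.
not_root=$(printf '%s\n' "$passwd_sample" | awk -F: '!/root/ { print $1 }')

printf '%s\n' "$below" "$range" "$shells" "$not_root"
```

A pattern with no action, such as NR < 3, prints the matching lines by default.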
Reading command output
Use getline to read the output of an external shell command into the variable cmdout:
echo | awk '{ "grep root /etc/passwd" | getline cmdout; print cmdout }'
Using loops in awk
for (i = 0; i < 10; i++) { print $i; }
for (i in array) { print array[i]; }
Example:
Print the lines in reverse order (an implementation of the tac command):
seq 9 | \
awk '{ lifo[NR] = $0; lno = NR } \
END { for (; lno > 0; lno--) { print lifo[lno]; }
}'
Implementing head and tail in awk
• head:
awk 'NR <= 10 { print }' filename
• tail:
awk '{ buffer[NR % 10] = $0; } END { for (i = (NR > 10 ? NR - 9 : 1); i <= NR; i++) { \
print buffer[i % 10] } }' filename
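The tail idea is a ring buffer: each line overwrites slot NR % n, so after the last line the buffer holds the final n lines, and they only need to be read back in line-number order. Here is a sketch with n passed in as a variable (the name n and the 25-line input are choices made for this example), together with the reverse-order loop for comparison.

```shell
# Generate the numbers 1..25 as sample input.
numbers=$(awk 'BEGIN { for (i = 1; i <= 25; i++) print i }')

# tail -n 3: keep the last n lines in a ring buffer, then replay them in order.
last_three=$(printf '%s\n' "$numbers" | awk -v n=3 '
    { buffer[NR % n] = $0 }
    END {
        start = (NR > n) ? NR - n + 1 : 1    # handle inputs shorter than n
        for (i = start; i <= NR; i++) print buffer[i % n]
    }' | xargs)

# tac: store every line, then walk the array backwards down to index 1.
reversed=$(printf 'a\nb\nc\n' | awk '{ lifo[NR] = $0 } END { for (i = NR; i > 0; i--) print lifo[i] }' | xargs)

printf '%s\n' "$last_three" "$reversed"    # 23 24 25, then c b a
```

The ring buffer uses memory proportional to n, while the reverse-order version has to hold the whole input.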
Printing a specified column
• With awk:
ls -lrt | awk '{ print $6 }'
• With cut (squeeze the repeated spaces first, because cut treats every single delimiter as a field boundary):
ls -lrt | tr -s ' ' | cut -d ' ' -f6
Printing a specified region of text
• By line number
seq 100 | awk 'NR==4,NR==6 { print }'
• By text
Print the text between start_pattern and end_pattern:
awk '/start_pattern/,/end_pattern/' filename
Example:
seq 100 | awk '/13/,/15/'
cat /etc/passwd | awk '/mai.*mail/,/news.*news/'
Common built-in awk functions
index(string, search_string): returns the position at which search_string appears in string.
sub(regex, replacement_str, string): replaces the first match of the regular expression in string with replacement_str;
match(string, regex): checks whether the regular expression matches the string;
length(string): returns the length of the string.
echo | awk '{ "grep root /etc/passwd" | getline cmdout; print length(cmdout) }'
printf works like printf in C.
Example:
seq 10 | awk '{ printf "->%4s\n", $1 }'
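The built-in functions can be exercised on one short string. In this sketch the input "hello world" is arbitrary; note that match() also sets the variables RSTART and RLENGTH, and that sub() with no third argument operates on the current line.

```shell
out=$(echo "hello world" | awk '{
    print index($0, "world")                        # position of the substring
    print length($0)                                # length of the line
    if (match($0, /o w/)) print RSTART, RLENGTH     # where the match starts, and its length
    sub(/world/, "awk"); print                      # replace the first match in $0
    printf "->%4s<-\n", "ab"                        # right-align in a width of 4
}')

printf '%s\n' "$out"
```

This prints 7, 11, "5 3", "hello awk", and "->  ab<-" on separate lines.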
Iterating over the lines, words, and characters in a file
1. Iterate over each line in the file
• while loop:
while read line;
do
echo $line;
done < file.txt
As a subshell:
cat file.txt | ( while read line; do echo $line; done )
• awk:
cat file.txt | awk '{ print }'
2. Iterate over each word in a line
for word in $line;
do
echo $word;
done
3. Iterate over each character
${string:start_pos:num_of_chars}: extracts a substring from the string (bash string slicing)
${#word}: returns the length of the variable word.
for (( i = 0; i < ${#word}; i++ ))
do
echo ${word:i:1};
done
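The line and word loops can be combined into one small script. The sketch below sticks to POSIX shell features (so it leaves out the bash-only character slicing) and uses a temporary file with made-up contents. It reads with IFS= read -r so that leading spaces and backslashes in a line are preserved, and it redirects the file into the loop instead of piping, so that the variables set inside the loop are still available afterwards.

```shell
# Sample file: two lines, three words in total (invented for the example).
tmp=$(mktemp -d)
printf 'alpha beta\ngamma\n' > "$tmp/file.txt"

line_count=0
word_list=""
while IFS= read -r line; do
    line_count=$((line_count + 1))
    for word in $line; do        # unquoted on purpose: split the line into words
        word_list="$word_list<$word>"
    done
done < "$tmp/file.txt"

echo "$line_count lines: $word_list"    # 2 lines: <alpha><beta><gamma>
rm -rf "$tmp"
```

With the piped subshell form, line_count and word_list would be updated in the subshell and would be empty or zero again once the loop ended.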