This article describes the most common tools for processing text with the shell under Linux:
find, grep, xargs, sort, uniq, tr, cut, paste, wc, sed, awk.
The examples and options shown are the most commonly used and most practical ones.
My rule for shell scripts is to write them as command lines and keep them to no more than two lines;
for more complex tasks, consider Python.
find File Lookup
- Find txt and pdf files
find . \( -name "*.txt" -o -name "*.pdf" \) -print
- Find .txt and .pdf files with a regular expression
find . -regex ".*\(\.txt\|\.pdf\)$"
-iregex: the same, but ignoring case
- Negation: find everything that is not a .txt file
find . ! -name "*.txt" -print
- Limit the search depth: print the regular files in the current directory (depth 1)
find . -maxdepth 1 -type f
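To see how these name tests combine, here is a minimal sketch that runs them against a throwaway directory. The directory layout and file names are made up for the demo, and LC_ALL=C sort is only there to make the output order predictable.

```shell
# Build a small scratch tree (hypothetical names, created under a temp dir).
demo_dir=$(mktemp -d)
mkdir "$demo_dir/sub"
touch "$demo_dir/a.txt" "$demo_dir/b.pdf" "$demo_dir/c.log" "$demo_dir/sub/d.txt"

# Files matching either name pattern, at any depth.
txt_or_pdf=$(cd "$demo_dir" && find . \( -name "*.txt" -o -name "*.pdf" \) -print | LC_ALL=C sort)

# Regular files that are NOT *.txt.
not_txt=$(cd "$demo_dir" && find . -type f ! -name "*.txt" | LC_ALL=C sort)

# Regular files directly in the top directory only.
top_level=$(cd "$demo_dir" && find . -maxdepth 1 -type f | LC_ALL=C sort)

printf '%s\n' "$txt_or_pdf" "--" "$not_txt" "--" "$top_level"
rm -rf "$demo_dir"
```

Note that sub/d.txt shows up in the first result but not in the last one, because -maxdepth 1 stops find from descending into sub.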
Custom search
- Search by type:
find . -type d -print   # list directories only
-type f: regular file; -type l: symbolic link
- Search by time: -atime is access time (in days; -amin is the equivalent in minutes, and likewise for the others), -mtime is modification time (content changed), -ctime is change time (metadata or permissions changed)
All files accessed within the last 7 days:
find . -atime -7 -type f -print
- Search by size: units are w (two-byte words), k, M, G. Find files larger than 2k:
find . -type f -size +2k
- Search by permissions:
find . -type f -perm 644 -print   # find all files whose permissions are exactly 644
- Search by user:
find . -type f -user weber -print   # find the files owned by the user weber
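The size, permission and time tests can be tried out the same way. This is a sketch under stated assumptions: the two file names are invented, and the files are created on the spot so their access time is "now".

```shell
demo_dir=$(mktemp -d)
# A 3 KiB file and an empty file (illustrative names only).
head -c 3072 /dev/zero > "$demo_dir/big.bin"
: > "$demo_dir/small.bin"
chmod 644 "$demo_dir/big.bin"
chmod 600 "$demo_dir/small.bin"

# +2k means "more than 2 KiB"; only big.bin qualifies.
larger_than_2k=$(cd "$demo_dir" && find . -type f -size +2k)

# -perm 644 matches the permission bits exactly.
mode_644=$(cd "$demo_dir" && find . -type f -perm 644)

# -atime -7 means "accessed less than 7 days ago"; both new files qualify.
recent=$(cd "$demo_dir" && find . -type f -atime -7 | LC_ALL=C sort)

printf '%s\n' "$larger_than_2k" "$mode_644" "$recent"
rm -rf "$demo_dir"
```

A bare number such as -atime 7 would instead match files accessed exactly 7 days ago, which is rarely what is wanted.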
Actions on the files found
- Delete: delete all .swp files under the current directory:
find . -type f -name "*.swp" -delete
- Run a command (the powerful -exec)
find . -type f -user root -exec chown weber {} \;   # change the owner of root-owned files under the current directory to weber
Note: {} is a special string; for each matching file, {} is replaced with that file's name.
E.g., copy all the files found into another directory:
find . -type f -mtime +10 -name "*.txt" -exec cp {} old \;
- Combining multiple commands. Tip: if you need to run several commands on each file, put them in a script and run that script from -exec:
-exec ./commands.sh {} \;
Delimiter used by -print
'\n' is used as the delimiter between file names by default;
-print0 uses '\0' as the delimiter, so file names that contain spaces can be handled safely.
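The following sketch shows -exec and -print0 handling a file name with a space in it. The names plain.txt, "with space.txt" and the old directory are assumptions made for the demo.

```shell
demo_dir=$(mktemp -d)
mkdir "$demo_dir/old"
touch "$demo_dir/plain.txt" "$demo_dir/with space.txt"

# -exec runs cp once per match; {} is replaced by the file name as one argument.
(cd "$demo_dir" && find . -maxdepth 1 -type f -name "*.txt" -exec cp {} old \;)
copied=$(ls "$demo_dir/old" | LC_ALL=C sort)

# -print0 plus xargs -0 keeps "with space.txt" as a single argument to rm.
(cd "$demo_dir" && find . -maxdepth 1 -type f -name "*.txt" -print0 | xargs -0 rm)
remaining=$(cd "$demo_dir" && find . -maxdepth 1 -type f | wc -l | tr -d ' ')

printf '%s\n' "$copied" "remaining: $remaining"
rm -rf "$demo_dir"
```

Without -print0 and -0, xargs would split "with space.txt" into two arguments and rm would look for files named "./with" and "space.txt".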
grep Text Search
grep match_pattern file   # prints the matching lines by default
- Common options: -o outputs only the matched text; -v outputs only the lines that do not match; -c counts the lines in the file that contain the text
grep -c "text" filename
-n prints the line numbers of matches
-i ignores case when searching
-l prints only the file names
- Recursively search text in a multi-level directory (a programmer's favorite for searching code):
grep "class" . -R -n
- Match multiple patterns
grep -e "class" -e "virtual" file
- Make grep end each output file name with '\0' (-Z):
grep "test" file* -lZ | xargs -0 rm
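A quick way to get a feel for these options is to run them on a tiny file. The four sample lines below are invented for the illustration.

```shell
sample=$(mktemp)
printf '%s\n' 'class Foo' 'virtual void run()' 'int x = 42;' 'Class Bar' > "$sample"

match_count=$(grep -c "class" "$sample")               # number of matching lines
match_count_nocase=$(grep -ci "class" "$sample")       # same, ignoring case
numbered=$(grep -n "virtual" "$sample")                # line number, colon, line
only_match=$(grep -o "[0-9][0-9]*" "$sample")          # just the matched text
inverted=$(grep -v -e "class" -e "virtual" "$sample")  # lines matching neither pattern

printf '%s\n' "$match_count" "$match_count_nocase" "$numbered" "$only_match" "$inverted"
rm -f "$sample"
```

Note that -c counts matching lines, not individual occurrences, so a line containing the pattern twice still counts once.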
xargs Command-Line Argument Conversion
xargs converts input data into command-line arguments for another command, so it combines well with many commands such as grep and find.
- Convert multi-line output to a single line ('\n' is the delimiter between the input lines):
cat file.txt | xargs
- Convert a single line to multi-line output (-n specifies the number of fields per line):
cat single.txt | xargs -n 3
xargs option description
-d defines the delimiter (by default input is split on whitespace, with '\n' separating lines)
-n specifies the number of arguments per output line
-I {} specifies the replacement string, which is substituted with each input item when xargs runs the command; use it when the command needs the argument in several places
e.g.
cat file.txt | xargs -I {} ./command.sh -p {} -1
-0: use '\0' as the input delimiter
E.g., count the lines of a program's source:
find source_dir/ -type f -name "*.cpp" -print0 | xargs -0 wc -l
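The reshaping behaviour of xargs is easiest to see with echo, which is what xargs runs when no command is given. A small sketch with made-up input:

```shell
# Several input lines become one line of arguments.
one_line=$(printf '1\n2\n3\n' | xargs)

# -n 3 passes at most three arguments per invocation, so echo runs three times.
three_per_line=$(printf '1 2 3 4 5 6 7\n' | xargs -n 3)

# -I {} runs the command once per input line, substituting the line for {}.
substituted=$(printf 'a\nb\n' | xargs -I {} echo "item-{}-end")

printf '%s\n' "$one_line" "$three_per_line" "$substituted"
```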
sort Sorting
Option description:
-n sorts numerically, -d sorts in dictionary order
-r sorts in reverse order
-k N sorts by the Nth column
e.g.
sort -nrk 1 data.txt
sort -bd data   # ignore leading whitespace such as spaces
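The difference between numeric and plain text ordering is worth seeing once. In this sketch the three data lines are invented, and LC_ALL=C is an assumption added to make the text comparisons byte-based and repeatable.

```shell
data=$(printf '%s\n' '10 pear' '9 apple' '100 fig')

# Numeric, descending, keyed on column 1.
numeric_desc=$(printf '%s\n' "$data" | LC_ALL=C sort -nrk 1)

# Keyed on column 2 (the fruit name).
by_name=$(printf '%s\n' "$data" | LC_ALL=C sort -k 2)

# Plain text order: "100" sorts before "9" because '1' < '9'.
plain=$(printf '%s\n' "$data" | LC_ALL=C sort)

printf '%s\n' "$numeric_desc" "--" "$by_name" "--" "$plain"
```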
uniq Eliminating Duplicate Lines
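uniq only collapses adjacent duplicate lines, so it is normally fed sorted input. A minimal sketch with made-up input words; the awk step is just an assumption-free way to strip the padding that uniq -c puts in front of each count.

```shell
words=$(printf '%s\n' b a b c a b)

# One copy of every distinct line.
distinct=$(printf '%s\n' "$words" | LC_ALL=C sort | uniq)

# -c prefixes each line with its count; reformat as word:count.
counted=$(printf '%s\n' "$words" | LC_ALL=C sort | uniq -c | awk '{ print $2 ":" $1 }')

# -d prints only the lines that occur more than once.
repeated=$(printf '%s\n' "$words" | LC_ALL=C sort | uniq -d)

printf '%s\n' "$distinct" "--" "$counted" "--" "$repeated"
```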
tr Character Conversion
- General usage
echo 12345 | tr '0-9' '9876543210'   # simple encryption/decryption: replace each character with its counterpart
cat text | tr '\t' ' '   # convert tabs to spaces
- Deleting characters with tr
cat file | tr -d '0-9'   # delete all digits
-c takes the complement of the set
cat file | tr -c '0-9\n' ' '   # keep the digits; replace every other character with a space
cat file | tr -d -c '0-9 \n'   # delete everything that is not a digit, space or newline
- Squeezing characters: tr -s squeezes runs of repeated characters in the text; most often used to squeeze extra spaces
cat file | tr -s ' '
- Character classes: tr can use the following character classes:
alnum: letters and digits
alpha: letters
digit: digits
space: whitespace characters
lower: lowercase letters
upper: uppercase letters
cntrl: control (non-printable) characters
print: printable characters
Usage: tr '[:class:]' '[:class:]'
e.g. tr '[:lower:]' '[:upper:]'
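The tr modes described above (translate, delete, complement, squeeze, character classes) can be checked on short strings. The input strings here are made up for the demo.

```shell
# Translate: each digit is mapped to the digit at the same position in the second set.
swapped=$(echo 12345 | tr '0-9' '9876543210')

# Delete all digits.
no_digits=$(echo 'a1b22c333' | tr -d '0-9')

# Complement + delete: remove everything except digits and the newline.
digits_only=$(echo 'a1b22c333' | tr -cd '0-9\n')

# Squeeze runs of spaces down to one space.
squeezed=$(echo 'too    many   spaces' | tr -s ' ')

# Character classes.
upper=$(echo 'hello world' | tr '[:lower:]' '[:upper:]')

printf '%s\n' "$swapped" "$no_digits" "$digits_only" "$squeezed" "$upper"
```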
cut Splitting Text by Column
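cut selects fields (-f, tab-delimited unless -d is given) or character positions (-c) from each line. A minimal sketch; the sample lines, including the passwd-style record, are invented for the illustration.

```shell
# Second tab-separated field.
second_field=$(printf 'a\tb\tc\n' | cut -f 2)

# Fields 1 and 7 of a colon-separated record.
name_and_shell=$(echo 'root:x:0:0:root:/root:/bin/bash' | cut -d: -f 1,7)

# Characters 1 through 3.
first_three_chars=$(echo 'abcdef' | cut -c 1-3)

printf '%s\n' "$second_field" "$name_and_shell" "$first_three_chars"
```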
paste Joining Text by Column
Join two files together column by column:
cat file1
1
2
cat file2
colin
book
paste file1 file2
1 colin
2 book
The default delimiter is a tab, which can be changed with -d:
paste file1 file2 -d ","
1,colin
2,book
wc Counting Lines and Characters
wc -l file   # count lines
wc -w file   # count words
wc -c file   # count bytes (the same as characters for single-byte text)
sed Text Replacement Tool
- Replace the first match
sed 's/text/replace_text/' file   # replace the first match on each line
- Global replacement
sed 's/text/replace_text/g' file
By default sed prints the result after substitution; to modify the original file in place, use -i:
sed -i 's/text/replace_text/g' file
- Remove blank lines:
sed '/^$/d' file
- Matched-text reference: the matched string is referenced with the & marker.
echo this is an example | sed 's/\w\+/[&]/g'
$>[this] [is] [an] [example]
- Substring match reference: the content matched by the first pair of parentheses is referenced with \1
sed 's/hello\([0-9]\)/\1/'
- Double quotes: sed scripts are usually wrapped in single quotes; double quotes can also be used, in which case the shell expands variables in the expression before sed sees it. Inside single quotes $var stays literal:
sed 's/$var/HLLOE/'
With double quotes, we can use variables in both the sed pattern and the replacement string;
e.g.
p=patten
r=replaced
echo "line con a patten" | sed "s/$p/$r/g"
$>line con a replaced
- Another example, inserting a character into a string: convert each line of the text (PEKSHA) to PEK/SHA
sed 's/^.\{3\}/&\//' file
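The sed forms above can be verified on one-line inputs. This is a sketch with invented sample strings; it uses [a-z][a-z]* instead of \w\+ only because that spelling does not depend on GNU extensions.

```shell
first_only=$(echo 'aa bb aa' | sed 's/aa/X/')       # first match on the line
all_matches=$(echo 'aa bb aa' | sed 's/aa/X/g')     # every match on the line

# & stands for the whole matched text.
bracketed=$(echo 'this is an example' | sed 's/[a-z][a-z]*/[&]/g')

# \1 stands for what the first \( ... \) group matched.
digit_only=$(echo 'hello7 world' | sed 's/hello\([0-9]\)/\1/')

# Double quotes let the shell expand $p and $r before sed runs.
p=pattern
r=replaced
expanded=$(echo 'line with a pattern' | sed "s/$p/$r/g")

# Insert "/" after the first three characters.
split=$(echo 'PEKSHA' | sed 's/^.\{3\}/&\//')

# Delete empty lines.
no_blank=$(printf 'a\n\nb\n' | sed '/^$/d')

printf '%s\n' "$first_only" "$all_matches" "$bracketed" "$digit_only" "$expanded" "$split" "$no_blank"
```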
awk Data Stream Processing Tool
- awk script structure: awk 'BEGIN{ statements } statements2 END{ statements }'
- How it works:
1. Execute the statement block in BEGIN;
2. Read a line from the file or stdin and execute statements2, repeating until the input has been read completely;
3. Execute the END statement block.
print: printing the current line
- When print is used without arguments, the current line is printed;
echo -e "line1\nline2" | awk 'BEGIN{print "start"} {print} END{print "End"}'
- When the arguments of print are separated by commas, they are printed separated by spaces;
echo | awk '{var1 = "v1"; var2 = "V2"; var3 = "v3"; print var1, var2, var3;}'
$>v1 V2 v3
- Joining with "-" (in awk, strings placed next to each other are concatenated);
echo | awk '{var1 = "v1"; var2 = "V2"; var3 = "v3"; print var1 "-" var2 "-" var3;}'
$>v1-V2-v3
Special variables: NR NF $0 $1 $2
NR: the number of records; during execution it is the current line number;
NF: the number of fields; during execution it is the number of fields in the current line;
$0: the text of the current line;
$1: the text of the first field;
$2: the text of the second field;
echo -e "line1 f2 f3\n line2 \n line 3" | awk '{print NR":"$0"-"$1"-"$2}'
- Print the second and third fields of each line:
awk '{print $2, $3}' file
- Count the lines in a file:
awk 'END {print NR}' file
- Sum the first field of every line:
echo -e "1\n 2\n 3\n 4\n" | awk 'BEGIN{sum = 0; print "begin";} {sum += $1;} END {print "=="; print sum}'
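To make NR, NF and the field variables concrete, here is a small sketch on made-up input. It uses printf rather than echo -e so the input looks the same in any shell.

```shell
# NR is the line number, NF the field count, $NF the last field.
fields=$(printf 'a b c\nd e\n' | awk '{ print NR ": " NF " fields, first=" $1 ", last=" $NF }')

# In the END block NR holds the total number of lines read.
line_count=$(printf 'x\ny\nz\n' | awk 'END { print NR }')

# Accumulate the first field; uninitialised awk variables start at 0.
total=$(printf '1\n2\n3\n4\n' | awk '{ sum += $1 } END { print sum }')

printf '%s\n' "$fields" "$line_count" "$total"
```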
Passing external variables
var=1000
echo | awk '{print vara}' vara=$var   # input from stdin
awk '{print vara}' vara=$var file   # input from a file
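Besides the trailing name=value form, awk also accepts -v name=value before the program. The sketch below compares the two; the practical difference is that a -v variable is already set inside BEGIN, while a trailing assignment only takes effect once input processing starts.

```shell
var=1000

# -v assignment: visible in the main block ...
from_v=$(echo | awk -v vara="$var" '{ print vara }')

# ... and also in BEGIN, before any input is read.
from_begin=$(awk -v vara="$var" 'BEGIN { print vara + 1 }')

# Trailing assignment: processed like a file argument, after BEGIN has run.
from_assignment=$(echo | awk '{ print vara }' vara="$var")

printf '%s\n' "$from_v" "$from_begin" "$from_assignment"
```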
Filtering the lines awk processes with patterns
awk 'NR==1,NR==4 {print}' file   # print lines 1 through 4
awk '/linux/'   # lines containing the text linux (a regular expression can be used; very powerful)
awk '!/linux/'   # lines not containing the text linux
Setting the delimiter
Use -F to set the field delimiter (whitespace by default)
awk -F: '{print $NF}' /etc/passwd
Reading command output
Using getline, the output of an external shell command is read into the variable cmdout;
echo | awk '{"grep root /etc/passwd" | getline cmdout; print cmdout}'
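A single getline call reads only one line of the command's output. The sketch below, which uses seq 3 as a stand-in command, shows that and the usual loop for reading every line.

```shell
# One getline call: only the first line of the command output lands in "out".
first_line=$(awk 'BEGIN { cmd = "seq 3"; cmd | getline out; close(cmd); print out }')

# Loop until getline stops returning 1 to consume all lines.
line_total=$(awk 'BEGIN { cmd = "seq 3"; while ((cmd | getline line) > 0) n++; close(cmd); print n }')

printf '%s\n' "$first_line" "$line_total"
```

Calling close(cmd) lets the same command string be run again from the start later in the script.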
Using loops in awk
for (i = 0; i < n; i++) { statements }
for (i in array) { print array[i]; }
e.g.
Print the lines in reverse order (an implementation of the tac command):
seq 9 | awk '{lifo[NR] = $0; lno = NR} END{for (; lno > 0; lno--) {print lifo[lno];}}'
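Both loop forms in one small sketch. The counting loop walks the stored lines backwards; the for-in loop visits array keys in no guaranteed order, which is why its output is piped through sort here. The input values are made up.

```shell
# Counting loop: store every line, then print from the last index down to 1.
reversed=$(seq 5 | awk '{ lifo[NR] = $0 } END { for (i = NR; i > 0; i--) print lifo[i] }')

# for-in loop over the keys of an associative array (word frequency count).
word_counts=$(printf 'a\nb\na\n' | awk '{ seen[$1]++ } END { for (w in seen) print w, seen[w] }' | LC_ALL=C sort)

printf '%s\n' "$reversed" "--" "$word_counts"
```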
Implementing the head and tail commands with awk
Printing a specified column
Printing a specified region of text
- By line number:
seq 100 | awk 'NR==4,NR==6{print}'
- By pattern: print the text between start_pattern and end_pattern;
awk '/start_pattern/,/end_pattern/' filename
e.g.
seq 100 | awk '/13/,/15/'
cat /etc/passwd | awk '/mai.*mail/,/news.*news/'
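A range pattern turns printing on at the first condition and off after the second, both ends included. A minimal sketch; the START and STOP marker lines are invented for the demo.

```shell
# Range by line number: lines 4 through 6.
by_line_number=$(seq 10 | awk 'NR==4, NR==6')

# Range by pattern: from the line matching START through the line matching STOP.
by_pattern=$(printf '%s\n' header START x y STOP footer | awk '/START/, /STOP/')

printf '%s\n' "$by_line_number" "--" "$by_pattern"
```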
Common awk built-in functions
index(string, search_string): returns the position at which search_string appears in string
sub(regex, replacement_str, string): replaces the first regex match in string with replacement_str
match(string, regex): checks whether the regular expression matches the string
length(string): returns the length of the string
echo | awk '{"grep root /etc/passwd" | getline cmdout; print length(cmdout)}'
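A short sketch exercising the four functions on one made-up string. Positions in awk strings start at 1, and match() also sets the built-in variables RSTART and RLENGTH.

```shell
builtins=$(awk 'BEGIN {
    s = "hello world"
    print index(s, "world")      # position of the substring
    print length(s)              # number of characters
    pos = match(s, /o w/)        # position of the regex match, 0 if none
    print pos, RSTART, RLENGTH   # match() also sets RSTART and RLENGTH
    sub(/o/, "0", s)             # replace the first "o" in s, in place
    print s
}')

printf '%s\n' "$builtins"
```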
printf formats output, similar to printf in C
e.g.
seq 10 | awk '{printf "->%4s\n", $1}'
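A sketch of a few common format specifiers, using made-up values: %4s right-aligns in a width of 4, %-5s left-aligns in a width of 5, and %05.1f zero-pads a number to width 5 with one decimal place. Unlike print, printf adds no newline unless the format contains \n.

```shell
padded=$(seq 3 | awk '{ printf "->%4s\n", $1 }')

formatted=$(awk 'BEGIN { printf "%-5s|%05.1f|%d\n", "ab", 3.14159, 42 }')

printf '%s\n' "$padded" "$formatted"
```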
Iterating over the lines, words, and characters in a file
1. Iterate over each line in a file
2. Iterate over each word in a line
for word in $line; do echo $word; done
3. Iterate over each character
${string:start_pos:num_of_chars}: extracts a substring from a string (bash string slicing)
${#word}: returns the length of the variable word
for ((i = 0; i < ${#word}; i++)); do echo ${word:i:1}; done
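For the first two steps, one common idiom is a while read loop with the file redirected into it. This is a sketch, not necessarily the form the author had in mind; the file contents and variable names are made up.

```shell
text_file=$(mktemp)
printf 'one two\nthree\n' > "$text_file"

line_no=0
word_total=0
# IFS= and -r keep each line exactly as written (no trimming, no backslash processing).
while IFS= read -r line; do
    line_no=$((line_no + 1))
    # Unquoted $line is split on whitespace into words.
    for word in $line; do
        word_total=$((word_total + 1))
        echo "line $line_no: $word (${#word} chars)"
    done
done < "$text_file"

echo "lines=$line_no words=$word_total"
rm -f "$text_file"
```

Redirecting the file into the loop, rather than piping cat into it, keeps the loop in the current shell so the counters are still set after done.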
Linux Shell Text Processing tool collection