This article covers the most common tools for processing text in the Linux shell: find, grep, xargs, sort, uniq, tr, cut, paste, wc, sed, and awk. The examples and parameters shown are the most common and practical ones. My working principle for shell one-liners: try not to exceed two lines; if the task is more complex than that, consider Python instead.
find: file search
Find .txt and .pdf files:
find . \( -name "*.txt" -o -name "*.pdf" \) -print
Find .txt and .pdf files with a regular expression:
find . -iregex ".*\(\.txt\|\.pdf\)$"  # -iregex: case-insensitive regex
Negate the match to find everything that is not a .txt file:
find . ! -name "*.txt" -print
Specify Search Depth
Print only the files in the current directory (depth 1):
find . -maxdepth 1 -type f
Custom Search
Search By Type:
-type f: regular file; -type l: symbolic link; -type d: directory
find . -type d -print  # list all directories only
Search by Time:
-atime: access time (in days; the minutes-based variant is -amin, and likewise for the options below)
-mtime: modification time (file content changed)
-ctime: change time (metadata or permissions changed)
All files accessed within the last 7 days:
find . -atime -7 -type f -print
Search by Size:
Units: b (blocks), c (bytes), w (words), k, M, G. Find files larger than 2k:
find . -type f -size +2k
Search by permissions:
find . -type f -perm 644 -print  # find all files whose permissions are exactly 644
Search by User:
find . -type f -user weber -print  # find the files owned by the user weber
Acting on what was found
Delete all .swp files under the current directory:
find . -type f -name "*.swp" -delete
Execute a command on each result (the powerful -exec)
find . -type f -user root -exec chown weber {} \;  # change ownership of root-owned files in the current directory to weber
Note: {} is a special placeholder; for each matching file, {} is replaced with that file's name.
Example: copy all matched files to another directory:
find . -type f -mtime +10 -name "*.txt" -exec cp {} old \;
Combine multiple commands
Tip: if you need to run several commands on each result, write them into a script and call that script from -exec, as sketched below:
-exec ./commands.sh {} \;
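A minimal sketch of such a script; the name commands.sh, the backup-and-compress behaviour, and the *.log pattern below are purely illustrative:
#!/bin/bash
# commands.sh -- illustrative: run several commands on the one file passed in by find
f="$1"
cp "$f" "$f.bak"   # keep a backup copy
gzip "$f"          # then compress the original
It would then be invoked like: find . -type f -name "*.log" -exec ./commands.sh {} \;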
Delimiters for -print: by default each file name is followed by '\n' (a newline);
-print0 uses '\0' (the NUL character) as the delimiter instead, which makes it safe to handle file names that contain spaces.
Example: list the 10 largest files (including hidden ones) under the current directory, excluding "." itself:
find . -maxdepth 1 ! -name "." -print0 | xargs -0 du -b | sort -nr | head -10 | nl
grep: text search
grep match_pattern file  # by default, prints the matching lines
Common parameters
-o: output only the matched portion of the text; -v: output the lines that do not match
-c: count the number of matching lines in a file
grep -c "text" filename
-n: print the line numbers of matches
-i: ignore case when searching
-l: print only the names of matching files (a combined example follows below)
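Putting a few of these options together; the file names below are only placeholders:
grep -in "main" hello.c   # case-insensitive search with line numbers
grep -l "TODO" *.c        # list only the files that contain TODO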
Recursive search in a directory tree (a programmer's favorite for searching code):
grep "class" . -r -n
Match multiple patterns
Grep-e "Class"-E "vitural" file
Output matching file names terminated by NUL characters (-Z, combined with -l):
grep "test" file* -lZ | xargs -0 rm
xargs: command-line argument conversion
xargs converts its input into command-line arguments for another command, so it combines well with commands such as grep and find.
Convert multi-line output to single-line output
cat file.txt | xargs
'\n' is the delimiter between the lines of text
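A self-contained illustration of the same idea, using echo -e instead of a file:
echo -e "1\n2\n3" | xargs
# output: 1 2 3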
Convert a single line to multiple lines of output
cat single.txt | xargs -n 3  # -n: the number of arguments per output line
xargs parameter description
-d: define the input delimiter (for multi-line input the default is '\n')
-n: split the output into multiple lines, with the given number of arguments on each
-I {}: specify a replacement string; it is substituted wherever {} appears, which is needed when the command takes its argument in a particular position or more than once
Example:
cat file.txt | xargs -I {} ./command.sh -p {} -1
-0: use NUL ('\0') as the input delimiter
Example: count the lines of source code:
find source_dir/ -type f -name "*.cpp" -print0 | xargs -0 wc -l
sort: sorting
Option description:
-n: sort numerically; -d: sort in dictionary order
-r: reverse the order
-k N: sort by the Nth column
Example:
sort -nrk 1 data.txt
sort -bd data  # ignore leading whitespace characters such as spaces
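A small worked example; the contents of data.txt below are invented for illustration (numeric, reverse sort on the 2nd column):
cat data.txt
apple 3
banana 10
cherry 2
sort -nrk 2 data.txt
banana 10
apple 3
cherry 2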
uniq: eliminating duplicate lines
Eliminate duplicate lines:
sort unsort.txt | uniq
Count how many times each line appears in a file:
sort unsort.txt | uniq -c
Show only the duplicated lines:
sort unsort.txt | uniq -d
You can restrict which part of each line is compared: -s skips the first N characters, -w compares at most N characters; see the sketch below.
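A minimal sketch; the file ids.txt and its contents are invented. Skip the first 3 characters (the ID and the space) and compare only the next 3:
cat ids.txt
01 aaa detail-1
02 aaa detail-2
03 bbb detail-3
sort ids.txt | uniq -s 3 -w 3
01 aaa detail-1
03 bbb detail-3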
tr: character translation
General usage
echo 12345 | tr '0-9' '9876543210'  # a toy encrypt/decrypt: replace each digit with its counterpart
cat text | tr '\t' ' '  # convert tabs to spaces
tr: deleting characters
cat file | tr -d '0-9'  # delete all digits
-c: use the complement of the given set
cat file | tr -c '0-9' '\n'   # keep only the digits in the file (every non-digit becomes a newline)
cat file | tr -d -c '0-9\n'   # delete all non-numeric data
tr: squeezing characters
tr -s squeezes runs of repeated characters; most commonly it is used to squeeze extra spaces:
cat file | tr -s ' '
Character class
Various character classes are available in tr:
alnum: letters and digits
alpha: letters
digit: digits
space: whitespace characters
lower: lowercase letters
upper: uppercase letters
cntrl: control (non-printable) characters
print: printable characters
Usage: tr [:class:] [:class:]
Example: tr '[:lower:]' '[:upper:]'
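A complete invocation, for instance:
echo "hello world" | tr '[:lower:]' '[:upper:]'
# output: HELLO WORLD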
cut: splitting text by column
Extract the 2nd and 4th fields of a file:
cut -f2,4 filename
Print every field except the 3rd:
cut -f3 --complement filename
-d: specify the delimiter:
cut -f2 -d ";" filename
Ranges in cut
N-: from the Nth field to the end
-M: from the 1st field to the Mth
N-M: from the Nth to the Mth field
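These range forms combine with -f and -d; filename below is just a placeholder:
cut -f2- -d "," filename   # from the 2nd field to the end
cut -f-3 -d "," filename   # fields 1 through 3
cut -f2-4 -d "," filename  # fields 2 through 4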
Units in cut
-b: bytes
-c: characters
-f: fields (using the delimiter)
cut -c1-5 file  # print the 1st through 5th characters
cut -c-2 file   # print the first 2 characters
paste: concatenating text by column
Join two files together column by column:
cat file1
1
2
cat file2
colin
book
paste file1 file2
1 colin
2 book
The default delimiter is a tab character; it can be changed with -d:
paste file1 file2 -d ","
1,colin
2,book
wc: counting lines, words, and characters
wc -l file  # count lines
wc -w file  # count words
wc -c file  # count characters (bytes)
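With no options, wc prints all three counts at once (lines, words, bytes) followed by the file name:
wc file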
sed: text replacement
Replace the first match
sed 's/text/replace_text/' file  # replace the first match on each line
Global substitution
sed 's/text/replace_text/g' file
By default sed prints the substituted text to stdout; to modify the original file in place, use -i:
sed -i 's/text/replace_text/g' file
Remove blank lines:
sed '/^$/d' file
The matched-string marker &
The matched string can be referenced with the marker &:
echo this is an example | sed 's/\w\+/[&]/g'
# output: [this] [is] [an] [example]
Sub-match markers
The content of the first matched group is referenced with \1:
sed 's/hello\([0-9]\)/\1/'
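A worked example; the input string is invented for illustration:
echo "hello7 world" | sed 's/hello\([0-9]\)/\1/'
# output: 7 world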
Double-quote evaluation
sed expressions are usually enclosed in single quotes, but double quotes can also be used; with double quotes the shell evaluates variables inside the expression:
sed 's/$var/hello/'  # single quotes: $var is treated literally
With double quotes, shell variables can be used both in the sed pattern and in the replacement string:
p=patten
r=replaced
echo "line con a patten" | sed "s/$p/$r/g"
# output: line con a replaced
Other examples
Insert a character into a string: convert each line of the text (e.g. peksha) into pek/sha, i.e. insert a '/' after the 3rd character:
sed 's/^.\{3\}/&\//g' file
awk: data-stream processing
awk script structure
awk 'BEGIN{ statements } { statements2 } END{ statements }'
How it works
1. Execute the BEGIN block;
2. Read a line from the file or stdin and execute statements2; repeat until the input is exhausted;
3. Execute the END block.
print: printing the current line
When print is used without arguments, the current line is printed:
echo -e "line1\nline2" | awk 'BEGIN{ print "start" } { print } END{ print "END" }'
When print arguments are separated by commas, they are joined with spaces in the output:
echo | awk '{ var1="v1"; var2="v2"; var3="v3"; print var1, var2, var3; }'
# output: v1 v2 v3
Concatenation uses "" (juxtaposition); here a literal "-" string is placed between the variables:
echo | awk '{ var1="v1"; var2="v2"; var3="v3"; print var1 "-" var2 "-" var3; }'
# output: v1-v2-v3
Special variables: NR, NF, $0, $1, $2
NR: the record (line) number; while awk runs it holds the current line number;
NF: the number of fields; while awk runs it holds the number of fields on the current line;
$0: the text of the current line;
$1: the text of the first field;
$2: the text of the second field;
echo -e "line1 f2 f3\nline2 \nline 3" | awk '{ print NR":"$0"-"$1"-"$2 }'
Print the second and third fields of each line:
awk '{ print $2, $3 }' file
Count the number of lines in a file:
awk 'END{ print NR }' file
Sum the first field of every line:
echo -e "1\n2\n3\n4\n" | awk 'BEGIN{ sum = 0; print "BEGIN"; } { sum += $1; } END{ print "=="; print sum }'
Passing external variables
var=1000
echo | awk '{ print vara }' vara=$var   # input from stdin
awk '{ print vara }' vara=$var file     # input from a file
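awk's -v option is another way to pass a shell variable in, and it is available already in the BEGIN block:
awk -v vara=$var 'BEGIN{ print vara }'
# output: 1000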
Filtering the lines that awk processes with a pattern
awk 'NR < 5'                      # lines whose number is less than 5
awk 'NR==1,NR==4 {print}' file    # print from line 1 through line 4
awk '/linux/'                     # lines containing the text "linux" (regular expressions can be used, very powerful)
awk '!/linux/'                    # lines not containing the text "linux"
Set delimiter
Use -F to set the field delimiter (the default is whitespace):
awk -F: '{ print $NF }' /etc/passwd
Read command output
With getline, the output of an external shell command can be read into a variable (here cmdout):
echo | awk '{ "grep root /etc/passwd" | getline cmdout; print cmdout }'
Using loops in awk
for (i=0;i<10;i++) {print $i;}
for (i in array) {print array[i];}
Print lines in reverse order (an awk implementation of the tac command):
seq 9 | awk '{ lifo[NR] = $0; lno = NR } END{ for(; lno > 0; lno--) { print lifo[lno]; } }'
awk implementations of the head and tail commands
head:
awk 'NR<=10{ print }' filename
tail:
awk '{ buffer[NR%10] = $0; } END{ start = (NR < 10) ? 1 : NR - 9; for(i = start; i <= NR; i++) { print buffer[i%10] } }' filename
Print the specified column
The awk way:
ls -lrt | awk '{ print $6 }'
The cut way:
ls -lrt | cut -f6
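Note that ls separates its columns with a variable number of spaces rather than tabs, so the cut version is more reliable if the spaces are squeezed first (a sketch that reuses the tr -s and cut -d options shown above):
ls -lrt | tr -s ' ' | cut -d ' ' -f6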
Print the specified text area
By line number:
seq 100 | awk 'NR==4,NR==6{ print }'
By text pattern:
Print the text that lies between start_pattern and end_pattern:
awk '/start_pattern/,/end_pattern/' filename
Examples:
seq 100 | awk '/13/,/15/'
cat /etc/passwd | awk '/mai.*mail/,/news.*news/'
awk common built-in functions
index(string, search_string): returns the position at which search_string appears in string
sub(regex, replacement_str, string): replaces the first regex match in string with replacement_str
match(string, regex): checks whether the regular expression matches the string
length(string): returns the length of the string
echo | awk '{ "grep root /etc/passwd" | getline cmdout; print length(cmdout) }'
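A few more one-liners for the other functions; the strings are invented for illustration:
echo | awk '{ s = "hello world"; print index(s, "world") }'   # prints 7
echo | awk '{ s = "aaa bbb"; sub(/a+/, "X", s); print s }'    # prints X bbb
echo | awk '{ print match("hello 42", /[0-9]+/) }'            # prints 7, the start position of the match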
printf formats output, similar to C:
seq 10 | awk '{ printf "->%4s\n", $1 }'
Iterate over lines, words, and characters in a file
1. Iterate through each line in the file
while loop method:
while read line; do echo $line; done < file.txt
In a sub-shell:
cat file.txt | (while read line; do echo $line; done)
awk method:
cat file.txt | awk '{ print }'
2. Iterate over each word in a line
for word in $line; do echo $word; done
3. Iterate over each character in a word
${string:start_pos:num_of_chars}: extract a substring from a string (bash text slicing)
${#word}: the length of the variable word
for ((i=0; i<${#word}; i++)); do echo ${word:i:1}; done