A Roundup of the Most Commonly Used Linux Shell Text-Processing Tools
This article covers the most common tools for processing text in the Linux shell: find, grep, xargs, sort, uniq, tr, cut, paste, wc, sed, and awk. The examples and parameters given are the most common and practical ones. The principle I follow for shell scripts is to write one-liners and try not to exceed two lines; if the task is more complex, consider Python instead.
find: file search
Find .txt and .pdf files:
find . \( -name "*.txt" -o -name "*.pdf" \) -print
Find .txt and .pdf files using a regular expression:
find . -iregex ".*\(\.txt\|\.pdf\)$"    # -iregex: ignore case in the regular expression
Negate a condition: find all files that are not .txt:
find . ! -name "*.txt" -print
Specify the search depth
Print the files in the current directory only (depth 1):
find . -maxdepth 1 -type f
Custom searches
Search by type:
-type f (regular file), -type l (symbolic link), -type d (directory)
List all directories only:
find . -type d -print
Search by time:
-atime  access time (in days; the minute-based equivalent is -amin, and likewise for the options below)
-mtime  modification time (contents changed)
-ctime  change time (metadata or permissions changed)
All files accessed within the last 7 days:
find . -atime -7 -type f -print
Search by size:
Units: w (words), k, M, G. Find files larger than 2k:
find . -type f -size +2k
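The same option also works in the other direction, for example:
find . -type f -size -2k    # files smaller than 2k
find . -type f -size 2k     # files of exactly 2k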
Search by permissions:
Find all files that have execute permission:
find . -type f -perm /111 -print    # any execute bit set
Search by user:
Find the files owned by user weber:
find . -type f -user weber -print
Acting on the results
Delete all .swp files in the current directory:
find . -type f -name "*.swp" -delete
Execute a command on each result (the powerful -exec)
Change the ownership of every file in the current directory to weber:
find . -type f -exec chown weber {} \;
Note: {} is a special string; for each matching file, {} is replaced with the corresponding file name.
Eg: copy all the found files to another directory:
find . -type f -mtime +10 -name "*.txt" -exec cp {} old \;
Combine multiple commands
Tip: if you need to execute several commands on each result, write them into a script and then call that script from -exec:
-exec ./commands.sh {} \;
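For illustration, a minimal sketch of such a script (the name commands.sh and the actions inside it are only assumptions for this example):
#!/bin/bash
# commands.sh: invoked by find as  find . -name "*.log" -exec ./commands.sh {} \;
file="$1"                  # find passes the matched file name as the first argument
echo "processing $file"    # first command: report which file we are handling
gzip -k "$file"            # second command: compress it, keeping the original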
-print delimiters
-print appends a newline ('\n') after each file name. -print0 instead uses the null character ('\0') as the delimiter, which makes it possible to handle file names that contain spaces.
Sort the files in the current directory (including hidden files) from largest to smallest, excluding the directory "." itself:
find . -maxdepth 1 ! -name "." -print0 | xargs -0 du -b | sort -nr | head -10 | nl
grep: text search
grep match_pattern file    # by default, prints the matching lines
Common parameters:
-o  print only the matched parts of the text
-v  print only the lines that do not match
-c  count the number of matching lines in a file:
grep -c "text" filename
-n  print the line number of each match
-i  ignore case when searching
-l  print only the names of matching files
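For example (the file names here are just placeholders), these options are commonly combined:
grep -in "main" hello.c    # case-insensitive search, with line numbers
grep -l "TODO" *.c         # list only the files that contain a match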
Recursively search for text in a directory tree (a favourite of programmers searching code):
grep "class" . -r -n
Match multiple patterns:
grep -e "class" -e "virtual" file
Make grep terminate each output file name with the null character (-Z, used together with -l):
grep "test" file* -lZ | xargs -0 rm
xargs: command-line argument conversion
xargs converts its input data into command-line arguments for another command, so it can be combined with many commands, such as grep and find.
Convert multi-line output into a single line:
cat file.txt | xargs
Convert a single line into multiple lines of output ('\n' is the delimiter between the resulting lines):
-n  specify the number of arguments to print per line:
cat file.txt | xargs -n 3
xargs parameter description
-d  define the delimiter (for multi-line input the default delimiter is '\n')
-n  specify how many arguments go on each output line
-I {}  specify a replacement string; {} is replaced by each input item, which is needed when the executed command takes its arguments in particular positions, eg:
cat file.txt | xargs -I {} ./command.sh -p {} -1
-0  use the null character as the input delimiter, eg: count the lines of source code:
find source_dir/ -type f -name "*.cpp" -print0 | xargs -0 wc -l
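A small illustration of -d (the input string is made up for this example):
echo -n "aXbXc" | xargs -d X    # splits on X and prints: a b c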
sort: sorting
Options:
-n  sort numerically
-d  sort in dictionary order
-r  reverse the order
-k N  sort by the Nth column
-b  ignore leading whitespace characters such as spaces
eg:
sort -nrk 1 file.txt
sort -bd file.txt
uniq: removing duplicate lines
sort unsort.txt | uniq
Count how many times each line appears in the file:
sort unsort.txt | uniq -c
Find the duplicated lines:
sort unsort.txt | uniq -d
You can specify which part of each line is compared when detecting duplicates: -s skips the first N characters, -w compares at most N characters.
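For example (the offsets are arbitrary):
sort unsort.txt | uniq -s 2 -w 3    # skip the first 2 characters, then compare only the next 3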
tr: character translation
General usage
tr maps characters from one set to the corresponding characters of another set, eg:
echo 12345 | tr '0-9' '9876543210'    # a simple encode/decode conversion: replace each character with its counterpart
cat file | tr '\t' ' '                # convert tabs to spaces
Deleting characters with tr
cat file | tr -d '0-9'    # delete all digits
-c  use the complement of the character set
cat file | tr -cd '0-9\n'    # delete everything that is not a digit (or a newline), i.e. keep only the numbers in the file
Squeezing characters with tr
tr -s squeezes repeated characters in the text; it is most commonly used to squeeze extra spaces:
cat file | tr -s ' '
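The same idea works for other characters; for example, squeezing consecutive newlines removes blank lines:
cat file | tr -s '\n'    # squeeze repeated newlines, which deletes blank lines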
Character classes
tr provides various character classes:
alnum: letters and digits
alpha: letters
digit: digits
space: whitespace characters
lower: lowercase letters
upper: uppercase letters
cntrl: control (non-printable) characters
print: printable characters
Usage: tr [:class:] [:class:]
eg: tr '[:lower:]' '[:upper:]'
cut: splitting text by column
Extract the 2nd and 4th columns of a file:
cut -f2,4 filename
Print all columns except the 3rd column of a file:
cut -f3 --complement filename
-d  specify the delimiter:
cut -f2 -d ";" filename
Ranges with cut
N-   from the Nth field to the end
-M   from the 1st to the Mth field
N-M  from the Nth to the Mth field
Units with cut
-b  in bytes
-c  in characters
-f  in fields (using the delimiter)
cut -c1-5 file    # print the 1st to 5th characters
cut -c-2 file     # print the first 2 characters
paste: joining text by column
Join two texts together column by column:
cat file1
1
2
cat file2
colin
book
paste file1 file2
1 colin
2 book
The default delimiter is a tab character; it can be changed with -d:
paste file1 file2 -d ","
1,colin
2,book
wc: counting lines, words, and characters
wc -l file    # count lines
wc -w file    # count words
wc -c file    # count characters
sed: text substitution
Replace the first occurrence
Replace the first matching text on each line:
sed 's/text/replace_text/' file
Global substitution:
sed 's/text/replace_text/g' file
By default sed prints the substituted content; to modify the original file in place, use -i:
sed -i 's/text/replace_text/g' file
Remove blank lines:
sed '/^$/d' file
Referencing the matched string
The matched string can be referenced in the replacement with the marker &:
echo this is an example | sed 's/\w\+/[&]/g'
# output: [this] [is] [an] [example]
Substring match markers
The content of the first matched pair of parentheses can be referenced with the marker \1:
sed 's/hello\([0-9]\)/\1/'
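To make the effect concrete (the input string is just an assumption):
echo "hello7 world" | sed 's/hello\([0-9]\)/\1/'    # prints: 7 world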
Evaluation with double quotes
sed expressions are usually written in single quotes; double quotes can be used as well, in which case the shell evaluates the expression before sed sees it:
sed 's/$var/hello/'    # with single quotes, $var is taken literally
When double quotes are used, we can refer to shell variables in the sed pattern and in the replacement string:
p=pattern
r=replaced
echo "line con a pattern" | sed "s/$p/$r/g"
# output: line con a replaced
Other examples
Inserting a character into a string: convert each line of the text (e.g. peksha) into pek/sha:
sed 's/^.\{3\}/&\//g' file
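A quick check of the command above:
echo "peksha" | sed 's/^.\{3\}/&\//g'    # prints: pek/sha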
awk: data stream processing
awk script structure
awk 'BEGIN{ statements } { statements2 } END{ statements }'
How it works
1. Execute the statement block in BEGIN;
2. Read a line from the file or stdin and execute statements2; repeat this step until the input is fully read;
3. Execute the END statement block.
print prints the current line when called without arguments:
echo -e "line1\nline2" | awk 'BEGIN{print "start"} {print} END{print "END"}'
When the arguments to print are separated by commas, they are printed separated by spaces:
echo | awk '{var1 = "v1"; var2 = "v2"; var3 = "v3"; print var1, var2, var3;}'
# output: v1 v2 v3
Use "" as the concatenation operator to join values directly:
echo | awk '{var1 = "v1"; var2 = "v2"; var3 = "v3"; print var1"-"var2"-"var3;}'
# output: v1-v2-v3
Special variables: NR, NF, $0, $1, $2
NR: the current record (line) number, which counts up as awk reads the input;
NF: the number of fields in the current record;
$0: the text of the current line;
$1: the text of the first field;
$2: the text of the second field;
echo -e "line1 f2 f3\nline2 f4 f5\nline3 f6 f7" | awk '{print NR":"$0"-"$1"-"$2}'
Print the second and third fields of every line:
awk '{print $2, $3}' file
Count the number of lines in a file:
awk 'END {print NR}' file
Accumulate the first field of every line:
echo -e "1\n 2\n 3\n 4\n" | awk 'BEGIN{sum = 0; print "begin";} {sum += $1;} END {print "=="; print sum}'
Passing external variables
var=1000
echo | awk '{print vara}' vara=$var    # input from stdin
awk '{print vara}' vara=$var file      # input from a file
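Another standard way to pass a shell variable into awk is the -v option (not shown above, but part of POSIX awk):
awk -v vara=$var 'BEGIN{print vara}'    # -v makes the variable available even inside the BEGIN block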
Filtering the lines awk processes with patterns
awk 'NR < 5'                       # lines whose line number is less than 5
awk 'NR==1,NR==4 {print}' file     # print lines 1 through 4
awk '/linux/'                      # lines containing the text "linux" (regular expressions can be used; very powerful)
awk '!/linux/'                     # lines not containing the text "linux"
Setting the field delimiter
Use -F to set the delimiter (the default is space):
awk -F: '{print $NF}' /etc/passwd
Reading command output
Use getline to read the output of an external shell command into a variable:
echo | awk '{"grep root /etc/passwd" | getline cmdout; print cmdout}'
Using loops in awk
for (i=0; i<10; i++) {print $i;}
for (i in array) {print array[i];}
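For instance, a numeric for loop can sum the fields of a line (the input is made up for this example):
echo "1 2 3 4" | awk '{s = 0; for (i = 1; i <= NF; i++) s += $i; print s}'    # prints: 10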
Print lines in reverse order (an awk implementation of the tac command):
seq 9 | awk '{lifo[NR] = $0; lno = NR} END{for (; lno > 0; lno--) {print lifo[lno];}}'
awk implementations of the head and tail commands
head: awk 'NR <= 10 {print}' filename
tail: awk '{buffer[NR % 10] = $0;} END{for (i = NR - 9; i <= NR; i++) if (i > 0) print buffer[i % 10]}' filename
Print a specified column
awk implementation:
ls -lrt | awk '{print $6}'
cut implementation:
ls -lrt | cut -f6
Print a specified text region
By line number:
seq 100 | awk 'NR==4,NR==6 {print}'
By text:
Print the text that lies between start_pattern and end_pattern:
awk '/start_pattern/,/end_pattern/' filename
eg:
seq 100 | awk '/13/,/15/'
cat /etc/passwd | awk '/mai.*mail/,/news.*news/'
Common awk built-in functions
index(string, search_string): returns the position at which search_string appears in string
sub(regex, replacement_str, string): replaces the first match of the regex in string with replacement_str
match(regex, string): checks whether the regular expression matches the string
length(string): returns the length of the string
echo | awk '{"grep root /etc/passwd" | getline cmdout; print length(cmdout)}'
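A small sketch exercising these functions (the string is made up for this example):
awk 'BEGIN{ s = "hello123"; print index(s, "lo"); if (sub(/[0-9]+/, "#", s)) print s; print match(s, /hel/) }'
# output: 4, then hello#, then 1 (one value per line)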
printf formats output, similarly to C:
seq 10 | awk '{printf "->%4s\n", $1}'
Iterating over the lines, words, and characters of a file
1. Iterate over each line of a file
while loop method:
while read line; do
  echo $line;
done < file.txt
Or run it in a subshell:
cat file.txt | (while read line; do echo $line; done)
awk method:
cat file.txt | awk '{print}'
2. Iterate over each word in a line
for word in $line; do
  echo $word;
done
3. Iterate over each character in a word
${string:start_pos:num_of_chars}: extract a character from a string (bash text slicing)
${#word}: the length of the variable word
for ((i = 0; i < ${#word}; i++)); do
  echo ${word:i:1};
done