This article describes the most commonly used shell tools for processing text under Linux:
find, grep, xargs, sort, uniq, tr, cut, paste, wc, sed, awk;
The examples and parameters shown are the most common and practical ones;
My principle for shell scripting is to write it as a command line, and try not to exceed 2 lines;
If the task is more complex, consider Python instead.
find: file search
Find txt and PDF files
find . \( -name "*.txt" -o -name "*.pdf" \) -print
Find txt and PDF files with a regular expression
find . -regex ".*\(\.txt\|\.pdf\)$"
-iregex: case-insensitive regular expression matching
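As a minimal sketch (assuming GNU find), the same txt/PDF search with case ignored:
find . -iregex ".*\(\.txt\|\.pdf\)$"   //with -iregex this also matches .TXT, .Pdf, etc.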
Negation parameter
Find all files that are not txt:
find . ! -name "*.txt" -print
Specify search depth
Print only the files in the current directory (depth 1):
find . -maxdepth 1 -type f
Custom search
Search by type:
find . -type d -print   //list directories only
-type f: regular files / -type l: symbolic links
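For example, a sketch that lists only the symbolic links under the current directory:
find . -type l -print   //list symbolic links only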
Search by time:
-atime access time (in days; for minutes use -amin, and similarly below)
-mtime modification time (content modified)
-ctime change time (metadata or permissions changed)
All files accessed within the last 7 days:
find . -atime -7 -type f -print
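A hedged variant using the minute-granularity option mentioned above:
find . -amin -60 -type f -print   //files accessed within the last 60 minutes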
Search by size:
Units: b (blocks), c (bytes), w (words), k, M, G
Find files larger than 2k:
find . -type f -size +2k
Find by permission:
find . -type f -perm 644 -print   //find all files whose permission bits are exactly 644
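If what you actually want are files carrying any execute bit (rather than an exact mode such as 644), GNU find accepts a "/" prefix on the mode; a sketch:
find . -type f -perm /111 -print   //files with at least one execute permission bit set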
Find by user:
find . -type f -user weber -print   //find files owned by the user weber
Actions after finding
Delete:
Delete all .swp files in the current directory:
find . -type f -name "*.swp" -delete
Execute an action (the mighty -exec)
find . -type f -user root -exec chown weber {} \;   //change the owner of files under the current directory to weber
Note: {} is a special string; for each matched file, {} is replaced with the corresponding filename;
Eg: copy all found files to another directory:
find . -type f -mtime +10 -name "*.txt" -exec cp {} old \;
Combine multiple commands
Tips: if you need to execute several commands on the matched files, write them into a script and then call the script with -exec:
-exec ./commands.sh {} \;
The delimiter of -print
By default '\n' is used as the delimiter between file names;
-print0 uses '\0' (NUL) as the delimiter instead, which makes it possible to handle file names containing spaces;
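A small sketch of why -print0 matters, assuming some of the matched names contain spaces:
find . -type f -name "*.txt" -print0 | xargs -0 grep "pattern"   //names with spaces are passed safely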
grep: text search
grep match_pattern file   //print matching lines by default
Common parameters
-o print only the matched part of the text vs -v print only lines that do not match
-c count the number of lines that match the text
grep -c "text" filename
-n print matching line numbers
-i ignore case when searching
-l print only the names of matching files
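The parameters can be combined; a hedged example (log.txt is a hypothetical file):
grep -in "error" log.txt   //case-insensitive search, printing line numbers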
Recursively search for text in a directory tree (a favorite of programmers searching code):
grep "class" . -r -n
Match multiple patterns
grep -e "class" -e "virtual" file
grep output of file names terminated with \0 (-Z, used together with -l):
grep "test" file* -lZ | xargs -0 rm
xargs: command-line argument conversion
xargs converts its input data into command-line arguments of a given command; it can be combined with many commands, such as grep and find;
Convert multi-line output into a single line
cat file.txt | xargs
\n is the delimiter between the lines of text
Convert a single line into multi-line output
cat single.txt | xargs -n 3
-n: specifies the maximum number of arguments per line
xargs parameter description
-d defines the delimiter (by default the delimiter between lines is \n)
-n splits the output into multiple lines
-I {} specifies a replacement string that is substituted when xargs expands, used when the executed command needs several arguments
eg
cat file.txt | xargs -I {} ./command.sh -p {} -1
-0: specifies \0 (NUL) as the input delimiter
Eg: count the number of lines of source code
find source_dir/ -type f -name "*.cpp" -print0 | xargs -0 wc -l
sort: sorting
Field description:
-n sort numerically vs -d sort in dictionary order
-r reverse the order
-k N sort by the Nth column
eg
sort -nrk 1 data.txt
sort -bd data   //ignore leading whitespace characters such as spaces
uniq: eliminate duplicate lines
Eliminate duplicate lines
sort unsort.txt | uniq
Count how many times each line appears in the file
sort unsort.txt | uniq -c
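A classic follow-up (a sketch, not from the original text): sort the counts so the most frequent lines come first:
sort unsort.txt | uniq -c | sort -rn   //frequency table, most frequent lines first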
Find duplicate lines
sort unsort.txt | uniq -d
You can specify which part of each line to compare when looking for duplicates: -s skips the first N characters, -w compares at most N characters
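A hedged illustration of -s and -w, assuming each line starts with a 2-character ID that should be ignored during comparison:
sort data.txt | uniq -s 2 -w 3 -d   //skip 2 characters, compare the next 3, print duplicates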
tr: character translation
Common usage
echo 12345 | tr '0-9' '9876543210'   //simple substitution cipher: replace each digit with its counterpart
cat text | tr '\t' ' '   //translate tabs into spaces
tr: delete characters
cat file | tr -d '0-9'   //delete all digits
-c: take the complement of the set
cat file | tr -cd '0-9\n'   //keep only the digits in the file
cat file | tr -d -c '0-9 \n'   //delete everything that is not a digit, space or newline
tr: squeeze characters
tr -s squeezes repeated characters in the text; most commonly used to squeeze extra spaces
cat file | tr -s ' '
Character classes
tr provides various character classes:
alnum: letters and digits
alpha: letters
digit: digits
space: whitespace characters
lower: lowercase letters
upper: uppercase letters
cntrl: control (non-printable) characters
print: printable characters
Usage: tr '[:class:]' '[:class:]'
Eg: tr '[:lower:]' '[:upper:]'
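A runnable sketch of the class syntax:
echo "Hello World 123" | tr '[:upper:]' '[:lower:]'   //prints hello world 123
echo "Hello World 123" | tr -d '[:digit:]'            //deletes the digits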
cut: split text by column
Extract columns 2 and 4 of the file:
cut -f2,4 filename
Print all columns of the file except column 3:
cut -f3 --complement filename
-d specifies the delimiter:
cut -f2 -d ";" filename
cut ranges
N-   from the Nth field to the end
-M   from the 1st field to the Mth field
N-M  from the Nth to the Mth field
cut units
-b in bytes
-c in characters
-f in fields (using the delimiter)
eg
cut -c1-5 file   //print the 1st through 5th characters
cut -c-2 file    //print the first 2 characters
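Delimiter, fields and ranges can be combined; a hedged example on /etc/passwd (colon-separated, field 6 is the home directory):
cut -d: -f1,6- /etc/passwd   //print the user name plus everything from the home directory onward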
paste: concatenate text by column
Join two files together column by column;
cat file1
1
2
cat file2
colin
book
paste file1 file2
1 colin
2 book
The default delimiter is a tab; use -d to specify another delimiter
paste file1 file2 -d ","
1,colin
2,book
wc: line, word and character counting tool
wc -l file   //count lines
wc -w file   //count words
wc -c file   //count characters
sed: the text-substitution weapon
Replace the first occurrence
sed 's/text/replace_text/' file   //replace the first match on each line
Global substitution
sed 's/text/replace_text/g' file
By default sed prints the result after substitution; to modify the original file in place, use -i:
sed -i 's/text/replace_text/g' file
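With GNU sed you can also keep a backup of the original file when editing in place; a sketch:
sed -i.bak 's/text/replace_text/g' file   //the original content is saved as file.bak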
Remove blank lines:
sed '/^$/d' file
Matched string marker &
The matched string can be referenced with the marker &:
echo this is an example | sed 's/\w\+/[&]/g'
$> [this] [is] [an] [example]
Substring match marker
The content of the first matched group (parentheses) can be referenced with the marker \1:
sed 's/hello\([0-9]\)/\1/'
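A runnable sketch of the group reference:
echo "hello123" | sed 's/hello\([0-9]\)/\1/'   //prints 123: "hello1" is replaced by the captured "1"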
Double quotes
sed expressions are usually quoted with single quotes; double quotes can also be used, in which case the shell evaluates the expression:
sed 's/$var/hello/'   //with single quotes, $var is taken literally and not expanded
When double quotes are used, variables can be used in the sed pattern and in the replacement string;
eg
p=pattern
r=replaced
echo "line con a pattern" | sed "s/$p/$r/g"
$> line con a replaced
Other examples
Insert a character into a string: convert each line of text (PEKSHA) into PEK/SHA
sed 's/^.\{3\}/&\//g' file
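The same command checked on a pipe:
echo PEKSHA | sed 's/^.\{3\}/&\//g'   //prints PEK/SHA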
awk: data-stream processing tool
awk script structure
awk 'BEGIN{ statements } { statements2 } END{ statements }'
How it works
1. Execute the statement block in BEGIN;
2. Read a line from the file or stdin and execute statements2, repeating this step until the whole file has been read;
3. Execute the END statement block;
print: print the current line
When print is used without arguments, the current line is printed;
echo -e "line1\nline2" | awk 'BEGIN{print "start"} {print} END{print "End"}'
When the arguments of print are separated by commas, they are joined with a space in the output;
echo | awk '{var1 = "v1"; var2 = "v2"; var3 = "v3"; \
print var1, var2, var3; }'
$> v1 v2 v3
Concatenating with "-" (in awk, strings placed side by side are concatenated):
echo | awk '{var1 = "v1"; var2 = "v2"; var3 = "v3"; \
print var1 "-" var2 "-" var3; }'
$> v1-v2-v3
Special variables: NR NF $0 $1 $2
NR: the number of records; during execution it corresponds to the current line number;
NF: the number of fields; during execution it corresponds to the number of fields of the current line;
$0: this variable contains the text of the current line during execution;
$1: the text of the first field;
$2: the text of the second field;
echo -e "line1 f2 f3\nline2 \nline 3" | awk '{print NR":"$0"-"$1"-"$2}'
Print the second and third field of every line:
awk '{print $2, $3}' file
Count the number of lines in a file:
awk 'END {print NR}' file
Accumulate the first field of every line:
echo -e "1\n 2\n 3\n 4\n" | awk 'BEGIN{sum = 0; \
print "begin";} {sum += $1;} END{print "=="; print sum}'
Passing external variables
var=1000
echo | awk '{print vara}' vara=$var   # input from stdin
awk '{print vara}' vara=$var file     # input from a file
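Most awk implementations also accept -v for the same purpose, which makes the variable available in the BEGIN block as well; a sketch:
echo | awk -v vara=$var '{print vara}'   # -v assigns the variable before the program runs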
Filter the lines awk processes with a pattern
awk 'NR < 5'                     # lines whose line number is less than 5
awk 'NR==1,NR==4 {print}' file   # print lines 1 through 4
awk '/linux/'                    # lines containing the text linux (a regular expression can be used, which is very powerful)
awk '!/linux/'                   # lines not containing the text linux
Set the delimiter
Use -F to set the field delimiter (the default is a space)
awk -F: '{print $NF}' /etc/passwd
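Once the delimiter is set, patterns and fields combine naturally; a hedged example (assumes the usual /etc/passwd layout and the common convention that regular users start at UID 1000):
awk -F: '$3 >= 1000 {print $1}' /etc/passwd   # print the names of non-system users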
Read command output
With getline, the output of an external shell command is read into the variable cmdout;
echo | awk '{"grep root /etc/passwd" | getline cmdout; print cmdout}'
Using loops in awk
for(i=0;i<10;i++){print $i;}
for(i in array){print array[i];}
eg
Print lines in reverse order: (an implementation of the tac command)
seq 9 | \
awk '{lifo[NR] = $0; lno = NR} \
END{ for(; lno > -1; lno--){print lifo[lno];} \
}'
awk implementations of the head and tail commands
head:
awk 'NR<=10{print}' filename
tail:
awk '{buffer[NR%10] = $0;} END{for(i=NR-9;i<=NR;i++){ \
if(i>0) print buffer[i%10]} }' filename
Print a specified column
awk implementation:
ls -lrt | awk '{print $6}'
cut implementation:
ls -lrt | cut -f6
Print a specified text area
By line number
seq 100 | awk 'NR==4,NR==6{print}'
By text pattern
Print the text between start_pattern and end_pattern;
awk '/start_pattern/, /end_pattern/' filename
eg
seq 100 | awk '/13/,/15/'
cat /etc/passwd | awk '/mai.*mail/,/news.*news/'
Commonly used awk built-in functions
index(string, search_string): returns the position at which search_string occurs in string
sub(regex, replacement_str, string): replaces the first match of the regex with replacement_str;
match(regex, string): checks whether the regular expression can match the string;
length(string): returns the length of the string
echo | awk '{"grep root /etc/passwd" | getline cmdout; print length(cmdout)}'
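A small sketch exercising sub() and index() together:
echo | awk '{s = "hello world"; sub(/world/, "awk", s); print s, index(s, "awk")}'   # prints: hello awk 7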
printf, similar to printf in C, formats the output
eg
seq 10 | awk '{printf "->%4s\n", $1}'
Iterating over the lines, words and characters of a file
1. Iterate over each line of a file
while loop method
while read line;
do
echo $line;
done < file.txt
In a subshell:
cat file.txt | (while read line; do echo $line; done)
awk method:
cat file.txt | awk '{print}'
2. Iterate over each word of a line
for word in $line;
do
echo $word;
done
3. Iterate over each character
${string:start_pos:num_of_chars}: extract a substring from a string (bash text slicing)
${#word}: returns the length of the variable word
for((i=0;i<${#word};i++))
do
echo ${word:i:1};
done
This article is a set of reading notes for an introduction to Linux shell scripting; the main content and examples come from