A gentleman is open-minded and optimistic; a petty person is narrow-minded and pessimistic.
"The gentleman is magnanimous; the petty man is always anxious."
Reference: VBird's Linux Private Kitchen: Basic Study (third edition)
I. Regular expression basics
1. What is a regular expression
A regular expression (RE) is a pattern for processing strings: a "rule string" built from predefined special characters and combinations of them, used to express filtering logic that is applied to text.
2. Why learn regular expressions
Thanks to their powerful string-processing capabilities, many tools and programming languages now support regular expressions, and mastering them helps us get things done more efficiently.
3. How the locale affects regular expressions
Because different locales order characters differently, the same regular expression may match differently from one locale to another. Confirm the current locale before use, or use the following special symbols (POSIX character classes), which avoid the problems that different locales cause:
Special symbol | Meaning
[:alnum:] | English uppercase and lowercase letters and digits, i.e. 0-9, A-Z, a-z
[:alpha:] | English uppercase and lowercase letters, i.e. A-Z, a-z
[:blank:] | Space and Tab
[:cntrl:] | Control characters, including CR, LF, Tab, Del, etc.
[:digit:] | Digits, i.e. 0-9
[:graph:] | All characters except whitespace (space and Tab)
[:lower:] | Lowercase letters, i.e. a-z
[:print:] | Any printable character
[:punct:] | Punctuation symbols, e.g. " ' ? ! ; : # $
[:upper:] | Uppercase letters, i.e. A-Z
[:space:] | Any whitespace character, including space, Tab, CR, etc.
[:xdigit:] | Hexadecimal digits, i.e. 0-9, A-F, a-f
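A small sketch of the locale point above, assuming a POSIX grep and a made-up word list. Inside grep, a class name is written within an outer bracket expression, so [:upper:] becomes [[:upper:]]. Setting LC_ALL=C for a single command pins plain ASCII ordering, so a range such as [A-Z] cannot pick up lowercase letters.

```shell
# Hypothetical sample input: one word per line.
sample=$(printf 'apple\nBanana\ncherry\nDate\n')

# Count lines that contain an uppercase letter, using a POSIX class.
# The class sits inside a bracket expression: [[:upper:]].
upper_posix=$(printf '%s\n' "$sample" | grep -c '[[:upper:]]')

# Same search with an explicit range; LC_ALL=C pins ASCII ordering
# for this one command only.
upper_range=$(printf '%s\n' "$sample" | LC_ALL=C grep -c '[A-Z]')

echo "POSIX class: $upper_posix, C-locale range: $upper_range"
```

Both counts should agree here; in a locale with a different collation order, only the first form is guaranteed to mean "uppercase letters".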
4. Basic Regular Expression characters
RE character | Meaning and example
^word | Meaning: the string to find (word) is at the start of a line. Example: find lines that begin with '#': grep -n '^#' filename
word$ | Meaning: the string to find (word) is at the end of a line. Example: find lines that end with '!': grep -n '!$' filename
. | Meaning: exactly one arbitrary character, which must be present. Example: find lines that contain 'a' followed by at least one more character: grep -n 'a.' filename
\ | Meaning: escape character; removes the special meaning of a special symbol. Example: find lines that contain a single quotation mark: grep -n \' filename
* | Meaning: zero or more repetitions of the preceding character. Example: find lines that contain es, ess, esss, and so on: grep -n 'ess*' filename
[list] | Meaning: any one of the characters in the list. Example: find lines that contain 'get' or 'got': grep -n 'g[eo]t' filename
[n1-n2] | Meaning: any one character in the range n1 to n2. Example: find lines that contain a digit: grep -n '[0-9]' filename
[^list] | Meaning: any one character that is not in the list. Example: find lines that contain a character other than an uppercase letter: grep -n '[^A-Z]' filename
\{n,m\} | Meaning: n to m consecutive repetitions of the preceding character; \{n\} is exactly n repetitions, and \{n,\} is n or more. Example: find lines that contain 'goog' or 'gooog': grep -n 'go\{2,3\}g' filename
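The basic RE characters in the table can be tried out with a quick sketch like the following. The sample text is made up for illustration, and grep -c (count matching lines) is used instead of -n so the results are easy to compare.

```shell
# Hypothetical sample text used only for this illustration.
text=$(printf '# a comment\nI got it!\nget 3 apples\ngoog and gooog\nplain line\n')

# ^ anchors at the start of a line: lines beginning with '#'.
starts_hash=$(printf '%s\n' "$text" | grep -c '^#')

# $ anchors at the end of a line: lines ending with '!'.
ends_bang=$(printf '%s\n' "$text" | grep -c '!$')

# [eo] matches exactly one of the listed characters: 'get' or 'got'.
get_got=$(printf '%s\n' "$text" | grep -c 'g[eo]t')

# [0-9] matches any single digit.
has_digit=$(printf '%s\n' "$text" | grep -c '[0-9]')

# \{2,3\} repeats the preceding 'o' two or three times.
goog=$(printf '%s\n' "$text" | grep -c 'go\{2,3\}g')

echo "$starts_hash $ends_bang $get_got $has_digit $goog"
```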
5. Extended Regular Expressions
Extended regular expressions add grouping and alternation, so several searches can be combined into a single one. Use grep -E or egrep to enable extended regular expressions.
6. Extended regular expression characters and examples
RE character | Meaning and example
+ | Meaning: one or more repetitions of the preceding RE character. Example: find strings such as god, good, goood: egrep -n 'go+d' filename
? | Meaning: zero or one occurrence of the preceding RE character. Example: find the two strings gd and god: egrep -n 'go?d' filename
| | Meaning: alternation ("or") between patterns. Example: find lines that contain 'gd', 'good', or 'dog': egrep -n 'gd|good|dog' filename
() | Meaning: groups a string. Example: find lines that contain 'glad' or 'good': egrep -n 'g(la|oo)d' filename
()+ | Meaning: one or more repetitions of a group. Example: find strings that begin with 'A' and end with 'C' with one or more 'xyz' in between: egrep -n 'A(xyz)+C' filename
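The extended RE characters can be checked the same way. This is only a sketch on a made-up word list; it uses grep -E, which the notes above treat as equivalent to egrep, and -c to count matching lines.

```shell
# Hypothetical word list, one word per line.
words=$(printf 'gd\ngod\ngood\nglad\ndog\nAxyzxyzC\nAC\n')

# + : one or more 'o' between g and d (god, good).
plus=$(printf '%s\n' "$words" | grep -E -c 'go+d')

# ? : zero or one 'o' between g and d (gd, god).
optional=$(printf '%s\n' "$words" | grep -E -c 'go?d')

# | : alternation between whole patterns (gd, good, dog).
either=$(printf '%s\n' "$words" | grep -E -c 'gd|good|dog')

# () : grouping, here around an alternation (glad, good).
group=$(printf '%s\n' "$words" | grep -E -c 'g(la|oo)d')

# ()+ : the whole group repeats one or more times (AxyzxyzC, but not AC).
repeat=$(printf '%s\n' "$words" | grep -E -c 'A(xyz)+C')

echo "$plus $optional $either $group $repeat"
```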
II. Tools that use regular expressions
1. The sed tool
sed is a line-oriented tool that can replace, delete, add, and select specific lines, and it also works in pipelines.
Usage: sed [-nefri] [action]
Options:
    -n: quiet mode; with -n, only the lines that sed explicitly processes are displayed.
    -e: give the sed action directly on the command line.
    -f: read the sed actions from a file; -f filename runs the sed actions stored in filename.
    -r: use extended regular expression syntax in sed actions (the default is basic regular expression syntax).
    -i: modify the contents of the file directly instead of printing to the screen.
Action format: [n1[,n2]]function
    n1, n2: optional; usually the line numbers that the action applies to.
    function is one of the following:
    a: append; the string that follows a appears on a new line after the current line.
    c: change; the string that follows c replaces the lines from n1 to n2.
    d: delete; usually takes no argument.
    i: insert; the string that follows i appears on a new line before the current line.
    p: print the selected data; usually used together with sed -n.
    s: substitute; replaces text directly, usually with a regular expression.
Example:
① Add or delete by line:
    $ sed '2,5d' filename                  # delete lines 2 through 5
    $ sed '2a Hello World' filename        # add a new line "Hello World" after line 2
② Replace and display by line:
    $ sed '2,5c No 2-5 number' filename    # replace lines 2 through 5 with a single new line "No 2-5 number"
    $ sed -n '2,5p' filename               # print lines 2 through 5; without -n, all the data is printed as well, so those lines appear twice
③ Find and replace part of the data:
    $ sed 's/old/new/g' filename           # replace 'old' with 'new'
    $ sed 's/#.*$//g'                      # delete comments
    $ sed '/^$/d'                          # delete blank lines
④ Modify the file directly: adding the -i option changes the contents of the file itself (this can be used, for example, to comment out several lines in a source file).
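To see these actions on real data, here is a minimal sketch that feeds a made-up five-line input through sed in a pipeline. The variable names are only for illustration. Two -e actions are chained in the last command so that comment stripping and blank-line removal happen in one pass.

```shell
# Hypothetical five-line input; the third line is empty.
input=$(printf 'line1\nline2 # note\n\nline4\nline5\n')

# d: delete lines 2 through 4, leaving line1 and line5.
deleted=$(printf '%s\n' "$input" | sed '2,4d')

# -n with p: print only lines 4 through 5.
printed=$(printf '%s\n' "$input" | sed -n '4,5p')

# s: strip everything from '#' to the end of the line (plus spaces before it),
# then d: drop lines that are empty. The two -e actions run in order.
cleaned=$(printf '%s\n' "$input" | sed -e 's/ *#.*$//' -e '/^$/d')

printf '%s\n' "$cleaned"
```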
Recommended reading: A concise tutorial on sed
2. The awk tool
awk is an excellent text-processing tool that supports regular expressions and pipelines. It is typically used to split each line into several "fields" and process them.
    Usage: awk 'condition1 {action1} condition2 {action2} ...' filename    # the default field separator is a space or [Tab]
Processing Flow:
① Read the first line and store its data in the variables $0, $1, $2, and so on;
② Decide, based on the condition, whether to perform the action that follows;
③ Work through all the conditions and actions;
④ Repeat ① to ③ until all the data has been processed.
    # Field variables: $0 is the entire line, $1 is the first field, and so on
    # Built-in variables: NF (number of fields in the current line), NR (number of the line currently being processed), FS (current field separator)
    # Logical operators: >, <, >=, <=, ==, !=
    # Keywords: BEGIN, END
    # Syntax: BEGIN { statements to run before any line is processed }
    #         END { statements to run after all lines have been processed }
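The processing flow and the built-in variables can be sketched as follows. The colon-separated records are made up for illustration. BEGIN sets FS before any input is read, the middle block runs once per line, and END runs after the last line.

```shell
# Hypothetical colon-separated records: name:score1:score2
data=$(printf 'ann:10:20\nbob:5:7\n')

# BEGIN runs once before any input and sets the field separator FS.
# The middle block has no condition, so it runs for every line;
# NR is the current line number and NF the number of fields in that line.
# END runs once after the last line has been processed.
report=$(printf '%s\n' "$data" | awk '
BEGIN { FS = ":" }
{ sum = $2 + $3; grand += sum; printf "%d %s %d fields sum=%d\n", NR, $1, NF, sum }
END { printf "lines=%d grand=%d\n", NR, grand }
')

printf '%s\n' "$report"
```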
Example:
Suppose you have a file named pay.txt with the following contents:
    Name    1st     2nd     3th
    VBird   23000   24000   25000
    DMTsai  21000   20000   23000
    Bird2   43000   42000   41000
To calculate the total for each person and print it in a fixed format, you can run:
    $ cat pay.txt | awk 'NR==1 {printf "%10s %10s %10s %10s %10s\n", $1, $2, $3, $4, "Total"}
    > NR>=2 {total = $2 + $3 + $4
    > printf "%10s %10d %10d %10d %10.2f\n", $1, $2, $3, $4, total}'
printf is used the same way as in C. The output looks like this:
          Name        1st        2nd        3th      Total
         VBird      23000      24000      25000   72000.00
        DMTsai      21000      20000      23000   64000.00
         Bird2      43000      42000      41000  126000.00
Recommended reading: A concise tutorial on awk
III. Other tools
1. File comparison tools: diff, cmp, patch
    # diff: usually used to compare old and new versions of the same file, line by line
    # Usage:
    $ diff [-bBi] from-file to-file    # compare from-file with to-file and print the differences
    # cmp: compares files byte by byte
    # Usage:
    $ cmp [-l | -s] file1 file2        # compare file1 and file2 byte by byte and report the first difference found; -l lists every differing byte, while -s prints nothing and only sets the exit status
    # patch: used together with diff to create patch files and update files
    # Usage:
    $ diff -Naur oldfile newfile > patch_file    # create a patch file by comparing the old and new files; patch_file usually has a .patch suffix
    $ patch -pN < patch_file           # update the old file; N is the number of leading directory levels to strip
    $ patch -R -pN < patch_file        # revert to the original file
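A minimal round trip through these tools might look like the sketch below. The file names and contents are made up, and plain diff -u is used here instead of -Naur because only two single files are compared. The patch step is skipped when patch is not installed.

```shell
# Throwaway working directory with two hypothetical versions of a file.
workdir=$(mktemp -d)
printf 'alpha\nbeta\ngamma\n' > "$workdir/old.txt"
printf 'alpha\nBETA\ngamma\n' > "$workdir/new.txt"

# cmp -s prints nothing; only its exit status tells whether the files differ.
if cmp -s "$workdir/old.txt" "$workdir/new.txt"; then same=yes; else same=no; fi

# diff -u writes a unified diff; a nonzero exit status means differences were found.
if diff -u "$workdir/old.txt" "$workdir/new.txt" > "$workdir/change.patch"; then
    changed=no
else
    changed=yes
fi
# Count the added lines in the patch that start with "+BETA".
added=$(grep -c '^+BETA' "$workdir/change.patch")

# Apply the patch to a copy of the old file, but only if patch is installed.
patched=skipped
if command -v patch >/dev/null 2>&1; then
    cp "$workdir/old.txt" "$workdir/copy.txt"
    patch "$workdir/copy.txt" < "$workdir/change.patch" >/dev/null
    if cmp -s "$workdir/copy.txt" "$workdir/new.txt"; then patched=yes; else patched=no; fi
fi

echo "same=$same changed=$changed added=$added patched=$patched"
rm -rf "$workdir"
```

Testing the exit status with if, rather than reading $? afterwards, keeps the sketch safe to run in scripts that use set -e.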
2. Preparing files for printing: pr
    # pr: adds a header and page numbers to a file that you want to print
    # Usage:
    $ pr filename    # adds the file's timestamp, the file name, and the page number to the output of filename
Summary: Regular expressions are a very useful tool: simple rules can match complex strings. Using them efficiently is often difficult, though, and takes continued practice to master.
Bash Shell Learning: Regular Expression Basics (notes)