3. Regular expression definition
A regular expression describes a pattern for matching strings. It can be used to check whether a string contains a particular substring, to replace matched substrings, or to extract substrings that meet certain conditions from a string.
When listing a directory, the *.txt in dir *.txt or ls *.txt is not a regular expression: it is a shell wildcard pattern, and * there has a different meaning from * in a regular expression.
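As a minimal Python sketch (the file names are made-up sample data), the standard library's fnmatch module handles shell-style wildcards, while re handles regular expressions. The same intent, "names ending in .txt", needs a different pattern in each, and the wildcard pattern is not even a valid regular expression:

```python
import fnmatch
import re

# Hypothetical sample file names
filenames = ["notes.txt", "data.csv", "todo.txt"]

# Shell-style wildcard: * means "any sequence of characters"
glob_hits = fnmatch.filter(filenames, "*.txt")

# Regular expression with the same intent: .* means "any characters",
# and \. is a literal dot (a bare . would match any single character)
txt_regex = re.compile(r".*\.txt")
regex_hits = [name for name in filenames if txt_regex.fullmatch(name)]

# As a regular expression, "*.txt" is invalid: the leading * has nothing to repeat
try:
    re.compile("*.txt")
    glob_is_valid_regex = True
except re.error:
    glob_is_valid_regex = False

print(glob_hits, regex_hits, glob_is_valid_regex)
```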
A regular expression is a text pattern made up of ordinary characters (such as the letters a to z) and special characters (known as metacharacters). The regular expression acts as a template that is matched against the string being searched.
3.1 Ordinary characters
Ordinary characters are all the printable and non-printable characters that are not explicitly designated as metacharacters. They include all uppercase and lowercase letters, all digits, all punctuation marks, and some other symbols.
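A quick Python illustration (the sample strings are arbitrary): a pattern made only of ordinary characters matches that exact sequence of characters, and the match is case-sensitive by default.

```python
import re

# "cat" contains only ordinary characters, so it matches the literal text "cat"
match = re.search("cat", "concatenate")
print(match.group(), match.span())  # cat (3, 6)

# Ordinary characters are case-sensitive: "Cat" does not occur in the text
no_match = re.search("Cat", "concatenate")
print(no_match)  # None
```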
3.2 Non-printable characters
Character | Description
\cx | Matches the control character specified by x. For example, \cM matches a Control-M or carriage return character. The value of x must be in the range A-Z or a-z; otherwise, c is treated as a literal "c" character.
\f | Matches a form feed. Equivalent to \x0c and \cL.
\n | Matches a line feed. Equivalent to \x0a and \cJ.
\r | Matches a carriage return. Equivalent to \x0d and \cM.
\s | Matches any whitespace character, including space, tab, form feed, and line breaks. In ASCII text it is equivalent to [ \f\n\r\t\v].
\S | Matches any non-whitespace character. In ASCII text it is equivalent to [^ \f\n\r\t\v].
\t | Matches a tab. Equivalent to \x09 and \cI.
\v | Matches a vertical tab. Equivalent to \x0b and \cK.
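The following Python sketch exercises several of these escapes on a small made-up string. One caveat: Python's re module does not support the \cx form (it rejects \c as a bad escape), so the sketch uses the equivalent \xhh escape instead.

```python
import re

# Hypothetical tab-separated text with a Windows (\r\n) and a Unix (\n) line ending
text = "name\tage\r\nAda\t36\n"

tabs = re.findall(r"\t", text)          # \t matches a tab
same_tabs = re.findall(r"\x09", text)   # \x09 is the same character as \t
line_feeds = re.findall(r"\n", text)    # \n matches a line feed
carriage_returns = re.findall(r"\r", text)  # \r matches a carriage return

# \S+ matches runs of non-whitespace, so it extracts the fields
fields = re.findall(r"\S+", text)

# \s+ matches runs of whitespace (tab, \r, \n, space, ...), useful for splitting
parts = re.split(r"\s+", text.strip())

print(fields)  # ['name', 'age', 'Ada', '36']
```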
3.3 Special characters
Special characters are characters with a special meaning, such as the * in *.txt mentioned above, which simply stands for any string. To find a file whose name contains a literal *, you must escape the *, that is, put a backslash (\) in front of it: ls \*.txt. Regular expressions have the following special characters:
Character | Description
$ | Matches the end of the input string. If the Multiline property of the RegExp object is set, $ also matches the position before \n or \r. To match the $ character itself, use \$.
( ) | Mark the start and end of a subexpression. The matched subexpression can be captured for later use. To match these characters, use \( and \).
* | Matches the preceding subexpression zero or more times. To match the * character, use \*.
+ | Matches the preceding subexpression one or more times. To match the + character, use \+.
. | Matches any single character except the newline \n. To match a literal ., use \.
[ | Marks the start of a bracket expression. To match [, use \[.
? | Matches the preceding subexpression zero times or once, or marks a quantifier as non-greedy. To match the ? character, use \?.
\ | Marks the next character as a special character, a literal character, a backreference, or an octal escape. For example, n matches the character "n", while \n matches a newline. The sequence \\ matches "\" and \( matches "(".
^ | Matches the start of the input string. When used as the first character inside a bracket expression, it instead negates the set, so the expression matches any character not listed. To match the ^ character itself, use \^.
{ | Marks the start of a quantifier expression. To match {, use \{.
| | Specifies a choice between two items. To match the | character, use \|.
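A short Python sketch of escaping (the price string is invented for illustration): each special character gets a backslash when it should be matched literally, and re.escape() can add the backslashes automatically.

```python
import re

price_text = "Total: $5.00 (tax included)"

# Escape $, . and ( so that they are matched as literal characters
literal = re.search(r"\$5\.00 \(tax", price_text)

# re.escape() inserts the backslashes for arbitrary literal text
auto = re.search(re.escape("$5.00"), price_text)

# Unescaped, . matches any single character, so "5.00" also matches "5x00"
loose = re.search(r"5.00", "5x00")

print(literal.group(), auto.group(), loose.group())
```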
A regular expression is constructed the same way as an arithmetic expression: small expressions are combined with various metacharacters and operators to build larger ones. A component of a regular expression can be a single character, a character set, a character range, a choice between characters, or any combination of these.
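As an illustration in Python (the building blocks and the sample text are invented for this example), a larger pattern can be assembled from smaller pieces: a character range with a quantifier, optional whitespace, and a choice between alternatives. The (?:...) grouping is covered under 3.6 below; it groups the alternatives without capturing them.

```python
import re

# Small building blocks, each a valid regular expression on its own
digits = r"[0-9]+"         # a character range, repeated one or more times
spacing = r"\s?"           # an optional whitespace character
unit = r"(?:kg|mg|g)"      # a choice between three alternatives

# Concatenating the pieces yields a larger expression
quantity = re.compile(digits + spacing + unit)

recipe = "Add 250 g flour, 2kg sugar and 5 mg salt"
print(quantity.findall(recipe))  # ['250 g', '2kg', '5 mg']
```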
3.4 Quantifiers
A quantifier (also called a qualifier) specifies how many times a given component of a regular expression must occur for a match. There are six kinds: *, +, ?, {n}, {n,}, and {n,m}.
The quantifiers *, + and ? are greedy: they match as much text as possible. Appending a ? after them makes them non-greedy, so they perform a minimal match instead.
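The difference is easy to see in Python (the HTML snippet is only sample text): the greedy .* runs to the last > in the string, while the non-greedy .*? stops at the first one.

```python
import re

html = "<h1>Title</h1>"

greedy = re.search(r"<.*>", html).group()   # matches as much as possible
lazy = re.search(r"<.*?>", html).group()    # matches as little as possible

# The same applies to +: o+ takes every o, o+? takes just one
greedy_o = re.search(r"o+", "foooo").group()
lazy_o = re.search(r"o+?", "foooo").group()

print(greedy, lazy)  # <h1>Title</h1> <h1>
```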
Regular expressions have the following quantifiers:
Character | Description
* | Matches the preceding subexpression zero or more times. For example, zo* matches "z" and "zoo". * is equivalent to {0,}.
+ | Matches the preceding subexpression one or more times. For example, zo+ matches "zo" and "zoo", but not "z". + is equivalent to {1,}.
? | Matches the preceding subexpression zero times or once. For example, do(es)? matches "do" or "does". ? is equivalent to {0,1}.
{n} | n is a non-negative integer. Matches exactly n times. For example, o{2} does not match the "o" in "Bob", but matches the two o's in "food".
{n,} | n is a non-negative integer. Matches at least n times. For example, o{2,} does not match the "o" in "Bob", but matches all the o's in "foooood". o{1,} is equivalent to o+, and o{0,} is equivalent to o*.
{n,m} | m and n are non-negative integers with n <= m. Matches at least n and at most m times. For example, o{1,3} matches the first three o's in "fooooood". o{0,1} is equivalent to o?. Note that there must be no spaces between the comma and the two numbers.
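The examples from the table can be checked with Python's re module. In this sketch, fullmatch() tests whether the whole string matches, while search() returns the first matching substring (or None).

```python
import re

results = {
    # * : zero or more times, so "z" and "zoo" both match
    "zo* on z": bool(re.fullmatch(r"zo*", "z")),
    "zo* on zoo": bool(re.fullmatch(r"zo*", "zoo")),
    # + : one or more times, so "z" alone does not match
    "zo+ on z": bool(re.fullmatch(r"zo+", "z")),
    # ? : zero times or once
    "do(es)? on does": bool(re.fullmatch(r"do(es)?", "does")),
    # {n} : exactly n times
    "o{2} in Bob": re.search(r"o{2}", "Bob"),
    "o{2} in food": re.search(r"o{2}", "food").group(),
    # {n,} : at least n times (greedy, so every o is taken)
    "o{2,} in foooood": re.search(r"o{2,}", "foooood").group(),
    # {n,m} : between n and m times
    "o{1,3} in fooooood": re.search(r"o{1,3}", "fooooood").group(),
}

for pattern, outcome in results.items():
    print(pattern, "->", outcome)
```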
3.5 Positioning operators
Positioning operators describe the boundaries of a string or a word. ^ and $ match the start and the end of a string, respectively. \b matches a word boundary, and \B matches a position that is not a word boundary. Quantifiers cannot be used with positioning operators.
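A Python sketch of the positioning operators (the sample strings are invented). Python's counterpart of the RegExp Multiline property mentioned in 3.3 is the re.MULTILINE flag, which lets ^ and $ also match at line boundaries.

```python
import re

text = "first line\nsecond line"

# ^ anchors at the start of the string; with re.MULTILINE also after each \n
starts = re.findall(r"^\w+", text)
line_starts = re.findall(r"^\w+", text, re.MULTILINE)

# $ anchors at the end of the string
ends_with_line = re.search(r"line$", text) is not None

# \b is a word boundary: the "cat" inside "concat" is not a whole word
whole_words = re.findall(r"\bcat\b", "cat concat cat.")

# \B is a non-word boundary: "er" followed by another word character
inside_word = re.search(r"er\B", "verb").group()
at_word_end = re.search(r"er\b", "verb")

print(starts, line_starts, whole_words)
```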
3.6 Selection
Enclose all the alternatives in parentheses and separate adjacent alternatives with |. Using parentheses has a side effect, however: the corresponding match is cached (captured). Placing ?: right after the opening parenthesis, before the first alternative, eliminates this side effect.
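In Python (the sample text is invented), the side effect is visible with re.findall(): when the pattern contains a capturing group, findall() returns the group's text instead of the whole match, and switching the group to (?:...) restores whole matches.

```python
import re

text = "cats and dogs"

# Capturing group: findall() returns only what the parentheses captured
captured = re.findall(r"(cat|dog)s?", text)

# Non-capturing group: the alternatives are grouped but nothing is saved
whole = re.findall(r"(?:cat|dog)s?", text)

print(captured)  # ['cat', 'dog']
print(whole)     # ['cats', 'dogs']
```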
?: is one of the non-capturing metacharacters; the other two are ?= and ?!, which carry additional meaning. The former is a positive lookahead: it matches the search string at any position where a string matching the pattern in parentheses begins. The latter is a negative lookahead: it matches the search string at any position where no string matching the pattern begins.
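A Python sketch of both lookaheads, using Windows version strings as sample input. The lookahead only checks what follows "Windows" and does not consume it, so the matched text is just "Windows" and the match ends right after it.

```python
import re

positive = re.compile(r"Windows(?=95|98|NT|2000)")  # must be followed by one of these
negative = re.compile(r"Windows(?!95|98|NT|2000)")  # must NOT be followed by any of them

pos_hit = positive.search("Windows2000")
pos_miss = positive.search("Windows3.1")
neg_hit = negative.search("Windows3.1")
neg_miss = negative.search("Windows2000")

print(pos_hit.group(), pos_hit.end())  # Windows 7
```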
3.7 Backreferences
Placing parentheses around a regular expression, or around part of one, causes the matched text to be stored in a temporary buffer. Each captured submatch is stored in the order in which it appears in the pattern, from left to right. The buffers are numbered consecutively starting at 1, up to a maximum of 99 captured subexpressions. Each buffer can be accessed with \n, where n is a one- or two-digit decimal number that identifies a specific buffer.
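A classic use of a backreference is finding repeated words. In this Python sketch (the sentence is sample text), ([a-z]+) captures a word into buffer 1 and \1 requires the same text to appear again; re.sub() can then refer to the same buffer as \1 in the replacement string.

```python
import re

sentence = "the the cost of of gasoline going up up?"

# Buffer 1 holds the captured word; \1 must repeat exactly that text
doubled = re.compile(r"\b([a-z]+) \1\b")

repeats = doubled.findall(sentence)
fixed = doubled.sub(r"\1", sentence)

print(repeats)  # ['the', 'of', 'up']
print(fixed)    # the cost of gasoline going up?
```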
You can use the non-capturing metacharacters ?:, ?=, or ?! to prevent a match from being saved.