3. Regular Expression Definitions
A regular expression describes a string-matching pattern. It can be used to check whether a string contains a given substring, to replace a matched substring, or to extract from a string the substrings that meet a certain condition.
When listing a directory, the *.txt in dir *.txt or ls *.txt is not a regular expression, because * means something different there than it does in a regular expression.
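The difference between a shell wildcard and a regular expression can be sketched as follows. This is a minimal illustration that assumes Python's fnmatch and re modules; the file name is made up for the example.

```python
import fnmatch
import re

name = "notes.txt"  # hypothetical file name

# Shell-style wildcard: * on its own stands for any run of characters.
glob_hit = fnmatch.fnmatchcase(name, "*.txt")

# Regular expression with the same intent: . is any character, * repeats
# the item before it, \. is a literal dot, and $ anchors the end.
regex_hit = re.search(r".*\.txt$", name) is not None

# Used as a regular expression, "*.txt" is not even valid here:
# the * has nothing in front of it to repeat.
try:
    re.compile("*.txt")
    glob_is_valid_regex = True
except re.error:
    glob_is_valid_regex = False

print(glob_hit, regex_hit, glob_is_valid_regex)
```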
A regular expression is a text pattern made up of ordinary characters (for example, the letters a through z) and special characters, called metacharacters. The pattern acts as a template that is matched against the string being searched.
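The three uses named above (checking, replacing, and extracting) might look like this in practice. The sketch assumes Python's re module; the sample text and the date-like pattern are invented for illustration.

```python
import re

# Hypothetical sample text.
text = "Order 66 shipped on 2024-05-17; order 67 ships on 2024-06-02."

# Ordinary character (-) combined with metacharacters ([ ] and { }).
date_pattern = r"[0-9]{4}-[0-9]{2}-[0-9]{2}"

# Check: does the text contain a date-like substring?
has_date = re.search(date_pattern, text) is not None

# Replace: mask every date-like substring.
masked = re.sub(date_pattern, "<date>", text)

# Extract: collect every date-like substring.
dates = re.findall(date_pattern, text)

print(has_date)
print(masked)
print(dates)
```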
3.1 Ordinary characters
Ordinary characters are all printable and non-printable characters that are not explicitly designated as metacharacters. This includes all uppercase and lowercase letters, all digits, all punctuation marks, and some symbols.
3.2 Non-printable characters
| Character | Meaning |
|---|---|
| \cx | Matches the control character indicated by x. For example, \cM matches Control-M, the carriage return character. The value of x must be in the range A-Z or a-z. Otherwise, c is treated as a literal c character. |
| \f | Matches a form feed (page break). Equivalent to \x0c and \cL. |
| \n | Matches a line break. Equivalent to \x0a and \cJ. |
| \r | Matches a carriage return. Equivalent to \x0d and \cM. |
| \s | Matches any whitespace character, including spaces, tabs, form feeds, and so on. Equivalent to [ \f\n\r\t\v]. |
| \S | Matches any non-whitespace character. Equivalent to [^ \f\n\r\t\v]. |
| \t | Matches a tab character. Equivalent to \x09 and \cI. |
| \v | Matches a vertical tab. Equivalent to \x0b and \cK. |
3.3 Special characters
Special characters are characters that carry a special meaning, such as the * in "*.txt" mentioned above, which stands for any string. To look for a file whose name contains a literal *, you must escape the * by putting a backslash in front of it: ls \*.txt. Regular expressions have the following special characters.
| Special character | Description |
|---|---|
| $ | Matches the end position of the input string. If the Multiline property of the RegExp object is set, $ also matches the position before '\n' or '\r'. To match the $ character itself, use \$. |
| ( ) | Mark the start and end of a subexpression. Subexpressions can be captured for later use. To match these characters, use \( and \). |
| * | Matches the preceding subexpression zero or more times. To match the * character, use \*. |
| + | Matches the preceding subexpression one or more times. To match the + character, use \+. |
| . | Matches any single character except the newline character \n. To match a literal ., use \. |
| [ | Marks the beginning of a bracket expression. To match [, use \[. |
| ? | Matches the preceding subexpression zero or one time, or marks a qualifier as non-greedy. To match the ? character, use \?. |
| \ | Marks the next character as a special character, a literal character, a backreference, or an octal escape. For example, 'n' matches the character "n", while '\n' matches a line break. The sequence '\\' matches "\" and '\(' matches "(". |
| ^ | Matches the start position of the input string, unless it is used inside a bracket expression, where it negates the character set (the listed characters are not accepted). To match the ^ character itself, use \^. |
| { | Marks the start of a qualifier expression. To match {, use \{. |
| \| | The vertical bar indicates a choice between two items. To match a literal vertical bar, put a backslash in front of it. |
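The effect of escaping can be seen in a short sketch like the one below. It assumes Python's re module; the file name is hypothetical, and re.escape is Python's helper for escaping a literal string, not something the text above prescribes.

```python
import re

filename = "report*.txt"  # hypothetical name containing a literal asterisk

# Unescaped, * is a qualifier: "t*" means zero or more t's, so this
# pattern has no way to match the asterisk in the file name.
unescaped = re.fullmatch(r"report*\.txt", filename) is not None

# Escaped, \* matches a literal asterisk and \. a literal dot.
escaped = re.fullmatch(r"report\*\.txt", filename) is not None

# re.escape builds the escaped form of an arbitrary literal string.
auto = re.fullmatch(re.escape(filename), filename) is not None

# \$ matches a dollar sign instead of the end of the string.
price = re.search(r"\$[0-9]+", "Total: $42").group()

print(unescaped, escaped, auto, price)
```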
A regular expression is constructed the same way as an arithmetic expression: metacharacters and operators combine small expressions into larger ones. A component of a regular expression can be a single character, a character set, a range of characters, a choice between characters, or any combination of these.
3.4 Qualifiers
Qualifiers specify how many times a given component of a regular expression must occur for the match to succeed. There are six of them: *, +, ?, {n}, {n,}, and {n,m}.
The *, +, and ? qualifiers are greedy: they match as much text as possible. Appending a ? to a qualifier makes it non-greedy, so that it matches as little text as possible.
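The difference between greedy and non-greedy matching is easiest to see on a string with several possible end points. A minimal sketch, assuming Python's re module and an invented sample string:

```python
import re

# Hypothetical sample with two tag-like pairs.
html = "<b>bold</b> and <i>italic</i>"

# Greedy: .* runs as far as it can, so the match ends at the last >.
greedy = re.search(r"<.*>", html).group()

# Non-greedy: .*? stops at the first > that lets the match succeed.
lazy = re.search(r"<.*?>", html).group()

print(greedy)
print(lazy)
```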
The qualifiers for a regular expression are:
| Character | Description |
|---|---|
| * | Matches the preceding subexpression zero or more times. For example, zo* can match "z" and "zoo". * is equivalent to {0,}. |
| + | Matches the preceding subexpression one or more times. For example, zo+ can match "zo" and "zoo", but not "z". + is equivalent to {1,}. |
| ? | Matches the preceding subexpression zero or one time. For example, do(es)? can match "do" or "does". ? is equivalent to {0,1}. |
| {n} | n is a non-negative integer. Matches exactly n times. For example, o{2} cannot match the o in "Bob", but matches the two o's in "food". |
| {n,} | n is a non-negative integer. Matches at least n times. For example, o{2,} cannot match the o in "Bob", but matches all the o's in "foooood". o{1,} is equivalent to o+, and o{0,} is equivalent to o*. |
| {n,m} | m and n are non-negative integers, where n <= m. Matches at least n and at most m times. For example, o{1,3} matches the first three o's in "fooooood". o{0,1} is equivalent to o?. Note that there must be no space between the comma and the two numbers. |
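The examples in the table can be checked with a few lines of code. This sketch assumes Python's re module; the helper name all_matches is made up for the illustration.

```python
import re

def all_matches(pattern, text):
    """Return the full text of every non-overlapping match (hypothetical helper)."""
    return [m.group() for m in re.finditer(pattern, text)]

zo_star = all_matches(r"zo*", "z zo zoo")        # zero or more o's
zo_plus = all_matches(r"zo+", "z zo zoo")        # one or more o's
optional = all_matches(r"do(es)?", "do does")    # optional "es"
exact = all_matches(r"o{2}", "Bob food")         # exactly two o's
at_least = all_matches(r"o{2,}", "Bob foooood")  # two or more o's
bounded = re.search(r"o{1,3}", "fooooood")       # one to three o's

print(zo_star, zo_plus, optional)
print(exact, at_least, bounded.group(), bounded.start())
```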
3.5 Locators
Locators (anchors) describe the boundaries of a string or a word. ^ and $ refer to the beginning and the end of a string, \b describes the front or back boundary of a word, and \B represents a non-word boundary. Qualifiers cannot be applied to locators.
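To see where each locator applies, it helps to look at match positions rather than matched text. A minimal sketch, assuming Python's re module; the sample string and the helper name positions are invented for the example.

```python
import re

text = "cat concat catalog"  # hypothetical sample

def positions(pattern):
    """Return the start index of every match of pattern in text."""
    return [m.start() for m in re.finditer(pattern, text)]

word_start = positions(r"\bcat")    # "cat" at the front of a word
word_end = positions(r"cat\b")      # "cat" at the back of a word
whole_word = positions(r"\bcat\b")  # "cat" as a complete word
inside = positions(r"\Bcat")        # "cat" not at the front of a word

at_string_start = re.search(r"^cat", text) is not None
at_string_end = re.search(r"cat$", text) is not None

print(word_start, word_end, whole_word, inside)
print(at_string_start, at_string_end)
```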
3.6 Selection
Enclose all the alternatives in parentheses and separate adjacent alternatives with |. Parentheses have a side effect, however: the related match is cached (captured). Placing ?: before the first alternative eliminates this side effect.
?: is one of the non-capturing elements. The other two are ?= and ?!, which carry additional meaning. The former is a positive lookahead: it matches the search string at any position where the pattern inside the parentheses begins to match. The latter is a negative lookahead: it matches the search string at any position where the pattern inside the parentheses does not match.
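The three non-capturing elements can be compared side by side. The following is a sketch under the assumption of Python's re module; the sample strings are made up.

```python
import re

# Capturing parentheses: the chosen alternative is saved as group 1.
capturing = re.search(r"(gray|grey) wolf", "a grey wolf")

# Non-capturing parentheses: same match, but nothing is saved.
non_capturing = re.search(r"(?:gray|grey) wolf", "a grey wolf")

# Positive lookahead: match "Windows" only where " 10" follows.
# The text checked by the lookahead is not part of the match.
positive = re.search(r"Windows(?= 10)", "Windows 10")

# Negative lookahead: match "Windows" only where " 10" does not follow.
negative_blocked = re.search(r"Windows(?! 10)", "Windows 10")
negative_allowed = re.search(r"Windows(?! 10)", "Windows 7")

print(capturing.groups(), non_capturing.groups())
print(positive.group(), negative_blocked, negative_allowed.group())
```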
3.7 Backreferences
Adding parentheses around a regular expression pattern, or around part of a pattern, causes the related match to be stored in a temporary buffer. Each captured submatch is stored in the order in which it is encountered, from left to right, in the pattern. Buffer numbers start at 1 and run up to a maximum of 99 subexpressions. Each buffer can be accessed with \n, where n is a one- or two-digit decimal number that identifies a particular buffer.
You can use the non-capturing metacharacters ?:, ?=, or ?! to skip saving the related match.
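A common use of a backreference is finding a word that is immediately repeated. The sketch below assumes Python's re module, where \1 refers to buffer 1 both in the pattern and in the replacement string; the sample sentence is invented.

```python
import re

# Hypothetical sentence with accidentally doubled words.
text = "This is is a test of of the the pattern."

# ([A-Za-z]+) stores a word in buffer 1; \1 requires the same text again.
doubled = r"\b([A-Za-z]+) \1\b"

# With one capturing group, findall returns the content of buffer 1.
repeated = re.findall(doubled, text)

# In the replacement, \1 inserts the stored word once.
cleaned = re.sub(doubled, r"\1", text)

print(repeated)
print(cleaned)
```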