Regular Expressions

Source: Internet
Author: User
A regular expression is a text pattern made up of ordinary characters (for example, the letters a through z) and special characters, called metacharacters. The pattern serves as a template that is matched against the string being searched.
You construct a regular expression by placing the components of the expression pattern between a pair of delimiters, namely /expression/.
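As a rough illustration of a pattern acting as a template, here is a minimal sketch in Python. Python's `re` module is an assumption here, since the text describes the JScript/VBScript RegExp object. Python has no /expression/ delimiter syntax, so the pattern is passed as a raw string; the sample text and variable names are made up for the example.

```python
import re

# The pattern combines ordinary characters ("reg") with metacharacters
# ("[a-z]" is a bracket expression, "+" is a qualifier).
pattern = re.compile(r"reg[a-z]+")

text = "A regular expression is a template."
match = pattern.search(text)

# search() returns None when nothing matches, so guard before reading the match.
found = match.group(0) if match else None
print(found)  # regular
```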


Ordinary characters

Ordinary characters are all printable and non-printable characters that are not explicitly designated as metacharacters. This includes all uppercase and lowercase letters, all digits, all punctuation marks, and some symbols.


Non-printable characters

Character   Meaning
\cx         Matches the control character indicated by x. For example, \cM matches Control-M, the carriage return character. The value of x must be in the range A-Z or a-z; otherwise, c is treated as a literal 'c' character.
\f          Matches a form feed character. Equivalent to \x0c and \cL.
\n          Matches a newline character. Equivalent to \x0a and \cJ.
\r          Matches a carriage return character. Equivalent to \x0d and \cM.
\s          Matches any white-space character, including space, tab, form feed, and so on. Equivalent to [ \f\n\r\t\v].
\S          Matches any non-white-space character. Equivalent to [^ \f\n\r\t\v].
\t          Matches a tab character. Equivalent to \x09 and \cI.
\v          Matches a vertical tab character. Equivalent to \x0b and \cK.
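The escapes in the table can be tried out with a short sketch. It uses Python's `re` module, which is an assumption and differs from the engine described here in two ways: it does not support the \cx control-character form, and for text strings its \s also matches other Unicode white space. The sample string is invented for the example.

```python
import re

# \t and \x09 describe the same tab character.
tab_equivalent = re.fullmatch(r"\x09", "\t") is not None

# \n and \x0a describe the same newline character.
newline_equivalent = re.fullmatch(r"\x0a", "\n") is not None

# \s picks out each white-space character; \S picks out everything else.
sample = "a \tb\nc"
whitespace = re.findall(r"\s", sample)
non_whitespace = re.findall(r"\S", sample)

print(tab_equivalent, newline_equivalent)  # True True
print(whitespace)       # [' ', '\t', '\n']
print(non_whitespace)   # ['a', 'b', 'c']
```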



Special characters

Special characters are characters that carry a special meaning. In the file-name pattern "*.txt", for example, the * stands for any string. To find files whose names contain a literal *, you need to escape the * by preceding it with a backslash: ls \*.txt. Regular expressions have the following special characters.
Character   Description
$           Matches the end position of the input string. If the Multiline property of the RegExp object is set, $ also matches the position before '\n' or '\r'. To match the $ character itself, use \$.
( )         Mark the start and end of a subexpression. The subexpression can be captured for later use. To match these characters, use \( and \).
*           Matches the preceding subexpression zero or more times. To match the * character, use \*.
+           Matches the preceding subexpression one or more times. To match the + character, use \+.
.           Matches any single character except the newline character \n. To match the . character, use \. .
[           Marks the beginning of a bracket expression. To match [, use \[.
?           Matches the preceding subexpression zero times or once, or indicates a non-greedy qualifier. To match the ? character, use \?.
\           Marks the next character as a special character, a literal character, a back reference, or an octal escape. For example, 'n' matches the character 'n', while '\n' matches a newline. The sequence '\\' matches '\', and '\(' matches '('.
^           Matches the start position of the input string, unless it is used in a bracket expression, where it negates the character set. To match the ^ character itself, use \^.
{           Marks the beginning of a qualifier expression. To match {, use \{.
|           Indicates a choice between two items. To match |, use \|.
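To make the escaping rule concrete, the sketch below matches a literal "*.txt" in a file name. It uses Python's `re` module as an assumed stand-in for the engine described here, and the file names are invented. `re.escape` is a Python convenience that adds the backslashes automatically.

```python
import re

# Escaped: "\*" is a literal asterisk and "\." is a literal dot.
literal_star = re.compile(r"\*\.txt")

with_star = literal_star.search("notes*.txt")   # contains a literal "*"
without_star = literal_star.search("notes.txt") # has no "*" to find

# re.escape() produces the same escaped pattern from the plain text.
escaped = re.escape("*.txt")

# Unescaped, a leading "*" has no preceding subexpression to repeat,
# so Python rejects the pattern outright.
try:
    re.compile("*.txt")
    bare_star_is_error = False
except re.error:
    bare_star_is_error = True

print(with_star.group(0))   # *.txt
print(without_star)         # None
print(bare_star_is_error)   # True
```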




Regular expressions are constructed in the same way as mathematical expressions: metacharacters and operators combine small expressions into larger ones. A component of a regular expression can be a single character, a character set, a character range, a choice between characters, or any combination of these components.




Qualifier

A qualifier specifies how many times a given component of a regular expression must appear for the match to succeed. There are six qualifiers: *, +, ?, {n}, {n,}, and {n,m}.
The *, +, and ? qualifiers are greedy: they match as much text as possible. Adding a ? after any of them makes the match non-greedy, or minimal.
The qualifiers for regular expressions are:
Character   Description
*           Matches the preceding subexpression zero or more times. For example, 'zo*' matches "z" and "zoo". * is equivalent to {0,}.
+           Matches the preceding subexpression one or more times. For example, 'zo+' matches "zo" and "zoo", but not "z". + is equivalent to {1,}.
?           Matches the preceding subexpression zero times or once. For example, 'do(es)?' matches "do" or "does". ? is equivalent to {0,1}.
{n}         n is a non-negative integer. Matches exactly n times. For example, 'o{2}' does not match the 'o' in "Bob", but matches the two o's in "food".
{n,}        n is a non-negative integer. Matches at least n times. For example, 'o{2,}' does not match the 'o' in "Bob", but matches all the o's in "foooood". 'o{1,}' is equivalent to 'o+'. 'o{0,}' is equivalent to 'o*'.
{n,m}       m and n are non-negative integers, where n <= m. Matches at least n times and at most m times. For example, 'o{1,3}' matches the first three o's in "fooooood". 'o{0,1}' is equivalent to 'o?'. Note that there must be no space between the comma and the two numbers.
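The examples in the table can be checked with a brief sketch. It is written in Python's `re` module, which is an assumption, although these qualifiers behave the same way there. The variable names are illustrative only.

```python
import re

# Greedy "+" takes as many o's as it can; adding "?" makes it take as few as it can.
greedy = re.search(r"o+", "oooo").group(0)
lazy = re.search(r"o+?", "oooo").group(0)

# {n} needs exactly n repetitions: one "o" in "Bob" is not enough.
exact_bob = re.search(r"o{2}", "Bob")
exact_food = re.search(r"o{2}", "food").group(0)

# {n,m} is bounded above: only the first three o's are taken.
bounded = re.search(r"o{1,3}", "fooooood").group(0)

# "?" makes the "(es)" subexpression optional.
optional = [m.group(0) for m in re.finditer(r"do(es)?", "do does")]

print(greedy, lazy)        # oooo o
print(exact_bob)           # None
print(exact_food, bounded) # oo ooo
print(optional)            # ['do', 'does']
```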




Locator characters

Locators describe the boundary of a string or a word. ^ and $ refer to the beginning and the end of a string, respectively; \b describes the leading or trailing boundary of a word, and \B represents a non-word boundary. Qualifiers cannot be applied to locators.
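A small sketch of the four locators follows, written in Python's `re` module as an assumed stand-in for the engine described here. The sample strings are invented, and start offsets are recorded to show where each match lands.

```python
import re

# ^ anchors the match to the start of the string, $ to the end.
starts_with_cat = re.search(r"^cat", "cat nap") is not None
cat_not_at_start = re.search(r"^cat", "a cat") is not None
ends_with_nap = re.search(r"nap$", "cat nap") is not None

text = "cat concat cats"

# \b requires a word boundary on both sides: only the standalone word matches.
whole_word_starts = [m.start() for m in re.finditer(r"\bcat\b", text)]

# \B requires a non-boundary: only the "cat" inside "concat" matches.
inside_word_starts = [m.start() for m in re.finditer(r"\Bcat", text)]

print(starts_with_cat, cat_not_at_start, ends_with_nap)  # True False True
print(whole_word_starts)   # [0]
print(inside_word_starts)  # [7]
```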



Alternation

Enclose all the alternatives in parentheses, separating adjacent alternatives with |. Parentheses have a side effect, however: the related match is cached (captured). Placing ?: before the first alternative eliminates this side effect.
?: is one of the non-capturing elements; the other two are ?= and ?!. These two carry additional meaning. The former is a positive lookahead: it matches the search string at any position where the pattern inside the parentheses begins to match. The latter is a negative lookahead: it matches the search string at any position where the pattern inside the parentheses does not begin to match.
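The difference between capturing and non-capturing groups, and the two lookaheads, can be sketched as follows. Python's `re` module is an assumption here; it uses the same (?:...), (?=...) and (?!...) syntax. The sample words are chosen for illustration.

```python
import re

# Plain parentheses group the alternatives and also capture what matched.
capturing = re.search(r"industr(y|ies)", "industries")

# (?:...) groups the alternatives without capturing anything.
non_capturing = re.search(r"industr(?:y|ies)", "industries")

# (?=...) succeeds only if the lookahead pattern follows; it consumes no text.
positive = re.search(r"Windows (?=95|98)", "Windows 98")

# (?!...) succeeds only if the lookahead pattern does NOT follow.
negative_blocked = re.search(r"Windows (?!95|98)", "Windows 98")
negative_allowed = re.search(r"Windows (?!95|98)", "Windows 3.1")

print(capturing.groups())        # ('ies',)
print(non_capturing.groups())    # ()
print(repr(positive.group(0)))   # 'Windows '
print(negative_blocked)          # None
print(repr(negative_allowed.group(0)))  # 'Windows '
```

Note that the lookahead text ("98") is not part of the returned match, which is why both lookahead matches end after the space.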



Back reference

Adding parentheses around a regular expression pattern, or part of a pattern, causes the related match to be stored in a temporary buffer. Each captured submatch is stored in the order in which it is encountered, from left to right, in the regular expression pattern. Buffer numbers start at 1 and run consecutively up to a maximum of 99 subexpressions. Each buffer can be accessed using '\n', where n is a one- or two-digit decimal number that identifies a particular buffer.
You can use the non-capturing metacharacters '?:', '?=', or '?!' to prevent the related match from being saved.
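One common use of a back reference is finding a word that is repeated by mistake. The sketch below uses Python's `re` module, which is an assumption; it supports the same \1 notation inside a pattern. The sample sentence is invented.

```python
import re

text = "the cost of of gasoline"

# (\w+) captures a word into buffer 1; \1 then requires the same word again.
doubled_word = re.compile(r"\b(\w+) \1\b")

# findall() returns the contents of buffer 1 for each match.
repeats = doubled_word.findall(text)

# In the replacement string, \1 refers to the same buffer,
# so each doubled word collapses to a single copy.
cleaned = doubled_word.sub(r"\1", text)

print(repeats)  # ['of']
print(cleaned)  # the cost of gasoline
```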



Operator precedence

Operators of equal precedence are evaluated from left to right; operators of different precedence are evaluated from highest to lowest. The precedence of the operators, from highest to lowest, is as follows:
Operator                        Description
\                               Escape character
(), (?:), (?=), []              Parentheses and square brackets
*, +, ?, {n}, {n,}, {n,m}       Qualifiers
^, $, \anymetacharacter         Positions and sequence
|                               Alternation ("or")
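Two consequences of this ordering are easy to check: a qualifier applies only to the item immediately before it, and | splits the whole sequence on either side. The sketch uses Python's `re` module, an assumption that follows the same precedence rules; the test strings are illustrative.

```python
import re

# Qualifiers bind tighter than sequence: "ab*" means "a" followed by "b*".
star_on_b_only = re.fullmatch(r"ab*", "abbb") is not None
star_not_on_ab = re.fullmatch(r"ab*", "abab") is not None

# Parentheses bind tighter still, so they can make "*" apply to "ab" as a unit.
star_on_group = re.fullmatch(r"(?:ab)*", "abab") is not None

# "|" binds loosest: "^cat|dog$" means "(^cat)" or "(dog$)",
# so a string that merely starts with "cat" is accepted.
loose_alternation = re.search(r"^cat|dog$", "catalog") is not None

# Grouping the alternatives makes the anchors apply to both of them.
grouped_alternation = re.search(r"^(?:cat|dog)$", "catalog") is not None

print(star_on_b_only, star_not_on_ab, star_on_group)  # True False True
print(loose_alternation, grouped_alternation)         # True False
```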




Summary of all symbols

Character   Description
\           Marks the next character as a special character, a literal character, a back reference, or an octal escape. For example, 'n' matches the character 'n', while '\n' matches a newline character. The sequence '\\' matches '\', and '\(' matches '('.
^           Matches the start position of the input string. If the Multiline property of the RegExp object is set, ^ also matches the position after '\n' or '\r'.
$           Matches the end position of the input string. If the Multiline property of the RegExp object is set, $ also matches the position before '\n' or '\r'.
*           Matches the preceding subexpression zero or more times. For example, 'zo*' matches "z" and "zoo". * is equivalent to {0,}.
+           Matches the preceding subexpression one or more times. For example, 'zo+' matches "zo" and "zoo", but not "z". + is equivalent to {1,}.
?           Matches the preceding subexpression zero times or once. For example, 'do(es)?' matches "do" or "does". ? is equivalent to {0,1}.
{n}         n is a non-negative integer. Matches exactly n times. For example, 'o{2}' does not match the 'o' in "Bob", but matches the two o's in "food".
{n,}        n is a non-negative integer. Matches at least n times. For example, 'o{2,}' does not match the 'o' in "Bob", but matches all the o's in "foooood". 'o{1,}' is equivalent to 'o+'. 'o{0,}' is equivalent to 'o*'.
{n,m}       m and n are non-negative integers, where n <= m. Matches at least n times and at most m times. For example, 'o{1,3}' matches the first three o's in "fooooood". 'o{0,1}' is equivalent to 'o?'. Note that there must be no space between the comma and the two numbers.
?           When this character immediately follows any other qualifier (*, +, ?, {n}, {n,}, {n,m}), the matching pattern is non-greedy. A non-greedy pattern matches as little of the searched string as possible, while the default greedy pattern matches as much as possible. For example, for the string "oooo", 'o+?' matches a single "o", while 'o+' matches all the o's.
.           Matches any single character except '\n'. Inside a bracket expression the dot is a literal character, so to match any character including '\n', use a pattern such as '[\s\S]'.
(pattern)   Matches pattern and captures the match. The captured match can be retrieved from the resulting Matches collection, using the SubMatches collection in VBScript or the $0...$9 properties in JScript. To match parenthesis characters, use '\(' or '\)'.
(?:pattern)
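The multiline behaviour of ^ and the newline exception for the dot, both listed in the table above, can be sketched as follows. Python's `re` module is an assumption; its `re.MULTILINE` flag plays the role of the Multiline property of the RegExp object. The two-line sample text is invented.

```python
import re

text = "first line\nsecond line"

# Without the multiline flag, ^ matches only at the very start of the string.
single_mode = re.findall(r"^\w+", text)

# With the multiline flag, ^ also matches right after each newline.
multi_mode = re.findall(r"^\w+", text, flags=re.MULTILINE)

# "." does not match "\n", so a pattern spanning the line break fails ...
dot_across_newline = re.search(r"line.second", text)

# ... whereas [\s\S] (white space or non-white space) matches any character.
any_across_newline = re.search(r"line[\s\S]second", text)

print(single_mode)          # ['first']
print(multi_mode)           # ['first', 'second']
print(dot_across_newline)   # None
print(repr(any_across_newline.group(0)))  # 'line\nsecond'
```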


