Regular Expressions Learning Notes

Source: Internet
Author: User
A regular expression describes a pattern for matching strings. It can be used to check whether a string contains a certain substring, to replace a matching substring, or to extract from a string the substring that meets some condition.
When listing a directory, the *.txt in dir *.txt or ls *.txt is not a regular expression, because the * there means something different from the * in a regular expression.
For ease of understanding and memorization, these notes start with a few concepts. All special characters and character combinations are collected in a summary table near the end, followed by some examples that illustrate the corresponding concepts.
Regular expressions
A regular expression is a text pattern consisting of ordinary characters (for example, the letters a through z) and special characters (called metacharacters). It acts as a template that is matched against the string being searched.
You can construct a regular expression by putting the various components of the pattern between a pair of delimiters, namely /expression/.
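The three uses described at the start of these notes (checking, replacing, extracting) can be sketched in JavaScript, where a pattern is written between / delimiters. This is only a minimal illustration; the sample string and variable names are invented for the example.

```javascript
// Hypothetical sample text, made up for this sketch.
const text = "report.txt and notes.txt";

// 1. Check whether the string contains a match.
const hasTxt = /\.txt/.test(text);              // true

// 2. Replace the matching substrings (the g flag replaces every match).
const replaced = text.replace(/\.txt/g, ".md"); // "report.md and notes.md"

// 3. Extract the first substring that matches the pattern.
const firstFile = text.match(/\w+\.txt/)[0];    // "report.txt"

console.log(hasTxt, replaced, firstFile);
```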
Ordinary characters
Ordinary characters are all printable and non-printable characters that are not explicitly designated as metacharacters. This includes all uppercase and lowercase letters, all digits, all punctuation marks, and some symbols.
Non-printable characters
Character    Meaning
\cx    Matches the control character indicated by x. For example, \cM matches Control-M, the carriage return character. The value of x must be A-Z or a-z; otherwise, c is treated as a literal 'c' character.
\f    Matches a form feed character. Equivalent to \x0c and \cL.
\n    Matches a newline character. Equivalent to \x0a and \cJ.
\r    Matches a carriage return character. Equivalent to \x0d and \cM.
\s    Matches any white space character, including space, tab, form feed, and so on. Equivalent to [ \f\n\r\t\v].
\S    Matches any non-white-space character. Equivalent to [^ \f\n\r\t\v].
\t    Matches a tab character. Equivalent to \x09 and \cI.
\v    Matches a vertical tab character. Equivalent to \x0b and \cK.
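The equivalences in the table can be checked with a short JavaScript sketch. It assumes a standard JavaScript engine; the test strings and variable names are made up for illustration.

```javascript
// Each pair of escapes below denotes the same character.
const sameCarriageReturn = /\cM/.test("\r") && /\x0d/.test("\r");
const sameTab = /\cI/.test("\t") && /\x09/.test("\t");
const sameFormFeed = /\cL/.test("\f") && /\x0c/.test("\f");

// \s matches any single white space character; \S matches everything else.
const sample = "a\tb\nc d";
const pieces = sample.split(/\s/);    // ["a", "b", "c", "d"]
const visible = sample.match(/\S/g);  // ["a", "b", "c", "d"]

console.log(sameCarriageReturn, sameTab, sameFormFeed, pieces, visible);
```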
Special Characters
Special characters are characters that carry a special meaning, such as the * in "*.txt" above, which simply stands for any string. If you want to find files with a * in the file name, you need to escape the * by putting a \ in front of it: ls \*.txt. Regular expressions have the following special characters.
Special character    Description
$    Matches the end position of the input string. If the Multiline property of the RegExp object is set, $ also matches the position before '\n' or '\r'. To match the $ character itself, use \$.
( )    Mark the start and end of a subexpression. The match for the subexpression can be captured for later use. To match these characters, use \( and \).
*    Matches the preceding subexpression zero or more times. To match the * character, use \*.
+    Matches the preceding subexpression one or more times. To match the + character, use \+.
.    Matches any single character except the newline character \n. To match ., use \.
[    Marks the beginning of a bracket expression. To match [, use \[.
?    Matches the preceding subexpression zero or one time, or marks a non-greedy qualifier. To match the ? character, use \?.
\    Marks the next character as a special character, a literal character, a back reference, or an octal escape. For example, 'n' matches the character 'n', '\n' matches a newline, the sequence '\\' matches '\', and '\(' matches '('.
^    Matches the start position of the input string, unless it is used inside a bracket expression, where it negates the character set. To match the ^ character itself, use \^.
{    Marks the beginning of a qualifier expression. To match {, use \{.
|    Indicates a choice between two items. To match |, use \|.
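To see what escaping does in practice, here is a small JavaScript sketch. The sample strings are invented; the point is only that a metacharacter preceded by \ is matched literally.

```javascript
// Hypothetical sample text containing several metacharacters.
const price = "cost: $5.00 (approx.)";

// Escaped metacharacters are matched as literal characters.
const hasDollar = /\$5\.00/.test(price);         // true
const hasParens = /\(approx\.\)/.test(price);    // true

// An unescaped . matches any character, an escaped \. only a period.
const looseDot = /5.00/.test("5x00");            // true
const strictDot = /5\.00/.test("5x00");          // false

// A literal * in a file name, as in the ls \*.txt example.
const starFile = /^\*\.txt$/.test("*.txt");      // true

console.log(hasDollar, hasParens, looseDot, strictDot, starFile);
```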
Regular expressions are constructed in the same way as arithmetic expressions: small expressions are combined using various metacharacters and operators to create larger expressions. A component of a regular expression can be a single character, a character set, a character range, a choice between characters, or any combination of these.
Qualifier
A qualifier specifies how many times a given component of a regular expression must occur for a match to succeed. There are six kinds: *, +, ?, {n}, {n,}, and {n,m}.
The *, +, and ? qualifiers are greedy: they match as much text as possible. Adding a ? after one of them makes it non-greedy, so that it matches as little as possible.
The qualifiers for regular expressions are:
Character    Description
*    Matches the preceding subexpression zero or more times. For example, 'zo*' matches "z" and "zoo". * is equivalent to {0,}.
+    Matches the preceding subexpression one or more times. For example, 'zo+' matches "zo" and "zoo", but not "z". + is equivalent to {1,}.
?    Matches the preceding subexpression zero or one time. For example, 'do(es)?' matches "do" and "does". ? is equivalent to {0,1}.
{n}    n is a non-negative integer. Matches exactly n times. For example, 'o{2}' does not match the 'o' in "Bob", but matches the two o's in "food".
{n,}    n is a non-negative integer. Matches at least n times. For example, 'o{2,}' does not match the 'o' in "Bob", but matches all the o's in "foooood". 'o{1,}' is equivalent to 'o+'. 'o{0,}' is equivalent to 'o*'.
{n,m}    m and n are non-negative integers, where n <= m. Matches at least n and at most m times. For example, 'o{1,3}' matches the first three o's in "fooooood". 'o{0,1}' is equivalent to 'o?'. Note that there must be no space between the comma and the two numbers.
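The qualifier examples above, and the difference between greedy and non-greedy matching, can be tried out with the following JavaScript sketch. Variable names are illustrative only.

```javascript
// Bounded repetition: at most three o's are taken.
const firstThree = "fooooood".match(/o{1,3}/)[0];  // "ooo"

// Greedy versus non-greedy on the same input.
const greedy = "oooo".match(/o+/)[0];              // "oooo"
const lazy = "oooo".match(/o+?/)[0];               // "o"

// * allows zero occurrences, + requires at least one.
const starOnZ = /zo*/.test("z");                   // true
const plusOnZ = /zo+/.test("z");                   // false

// ? makes the group optional.
const withEs = "does".match(/do(es)?/)[0];         // "does"
const withoutEs = "do".match(/do(es)?/)[0];        // "do"

// {2} needs two consecutive o's, which "Bob" does not have.
const twoInBob = /o{2}/.test("Bob");               // false

console.log(firstThree, greedy, lazy, starOnZ, plusOnZ, withEs, withoutEs, twoInBob);
```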
Locator Character
Locators (anchors) describe the boundary of a string or a word: ^ and $ refer to the beginning and end of a string, \b to the leading or trailing boundary of a word, and \B to a non-word boundary. Qualifiers cannot be applied to locators.
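A brief JavaScript sketch of the locators follows. It assumes that the m flag is the JavaScript counterpart of the Multiline property mentioned earlier; the sample strings are made up.

```javascript
// ^ and $ pin the pattern to the start and end of the string.
const exact = /^abc$/.test("abc");          // true
const prefixed = /^abc$/.test("xabc");      // false

// \b requires a word boundary, \B requires its absence.
const erAtEnd = /er\b/.test("never");       // true: "er" ends the word
const erInside = /er\b/.test("verb");       // false: "er" is followed by "b"
const erNonBoundary = /er\B/.test("verb");  // true

// With the m flag, ^ and $ also match at line breaks.
const lines = "one\ntwo".match(/^\w+$/gm);  // ["one", "two"]
const noMultiline = "one\ntwo".match(/^\w+$/g); // null

console.log(exact, prefixed, erAtEnd, erInside, erNonBoundary, lines, noMultiline);
```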
Choose
Enclose all the alternatives in parentheses and separate adjacent alternatives with |. Parentheses have a side effect, however: the related match is cached (captured). Put ?: before the first alternative to eliminate this side effect.
?: is one of the non-capturing elements. The other two are ?= and ?!, which carry additional meaning. The former is a positive lookahead: it matches the search string at any position where a string matching the pattern in the parentheses begins. The latter is a negative lookahead: it matches the search string at any position where a string that does not match the pattern begins.
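The difference between capturing and non-capturing alternatives, and the behavior of the two lookaheads, can be sketched in JavaScript as follows. The test strings are chosen for illustration.

```javascript
// A plain group captures the alternative that matched.
const capturing = "industries".match(/industr(y|ies)/);
// capturing[0] is "industries", capturing[1] is "ies"

// With ?: the group still alternates but nothing is captured.
const nonCapturing = "industries".match(/industr(?:y|ies)/);
// nonCapturing has a single entry, the whole match

// Positive lookahead: the version number must follow but is not consumed.
const ahead = "Windows 2000".match(/Windows (?=95|98|NT|2000)/)[0]; // "Windows "
const aheadMiss = /Windows (?=95|98|NT|2000)/.test("Windows 3.1");  // false

// Negative lookahead: the listed versions must not follow.
const notAhead = /Windows (?!95|98|NT|2000)/.test("Windows 3.1");     // true
const notAheadMiss = /Windows (?!95|98|NT|2000)/.test("Windows 2000"); // false

console.log(capturing[1], nonCapturing.length, ahead, aheadMiss, notAhead, notAheadMiss);
```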
Back reference
Placing parentheses around a regular expression pattern or part of a pattern causes the corresponding match to be stored in a temporary buffer. Each captured submatch is stored in the order in which it is encountered from left to right in the pattern. The buffers are numbered starting at 1 and continuing up to a maximum of 99 subexpressions. Each buffer can be accessed with '\n', where n is a one- or two-digit decimal number identifying a particular buffer.
You can use the non-capturing metacharacters '?:', '?=', or '?!' to avoid saving the related match.
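A minimal JavaScript sketch of numbered buffers and back references. The date string is an invented example.

```javascript
// \1 refers back to whatever the first group captured.
const doubled = /(.)\1/;
const hasDouble = doubled.test("hello");   // true: "ll"
const noDouble = doubled.test("world");    // false

// Groups are numbered from 1, left to right.
const dateParts = "2024-01-15".match(/(\d+)-(\d+)-(\d+)/);
// dateParts[1] is "2024", dateParts[2] is "01", dateParts[3] is "15"

// A non-capturing group does not take a number, so the
// month becomes group 1 here.
const monthOnly = "2024-01-15".match(/(?:\d+)-(\d+)/);
// monthOnly[1] is "01"

console.log(hasDouble, noDouble, dateParts.slice(1), monthOnly[1]);
```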
Operator precedence
Operators of equal precedence are evaluated from left to right; operators of different precedence are evaluated from highest to lowest. The precedence of the operators, from highest to lowest, is as follows:
Operator    Description
\    Escape
(), (?:), (?=), []    Parentheses and brackets
*, +, ?, {n}, {n,}, {n,m}    Qualifiers
^, $, \anymetacharacter    Position and sequence
|    Alternation ("or")
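Two consequences of this precedence order are easy to get wrong, and the JavaScript sketch below illustrates them. The patterns are illustrative and do not come from the original notes.

```javascript
// | binds most loosely, so /^a|b$/ means "starts with a" OR "ends with b".
const startsWithA = /^a|b$/.test("axx");        // true
const endsWithB = /^a|b$/.test("xxb");          // true

// Grouping the alternatives applies both anchors to each of them.
const groupedMiss = /^(?:a|b)$/.test("axx");    // false
const groupedHit = /^(?:a|b)$/.test("a");       // true

// A qualifier binds tighter than sequence: /ab+/ is a followed by b+.
const lastCharRepeated = "abbb".match(/ab+/)[0];     // "abbb"
const onlyFirstPair = "abab".match(/ab+/)[0];        // "ab"
const wholeGroupRepeated = "abab".match(/(?:ab)+/)[0]; // "abab"

console.log(startsWithA, endsWithB, groupedMiss, groupedHit,
            lastCharRepeated, onlyFirstPair, wholeGroupRepeated);
```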
All symbols explained
Character    Description
\    Marks the next character as a special character, a literal character, a back reference, or an octal escape. For example, 'n' matches the character 'n', '\n' matches a newline character, the sequence '\\' matches '\', and '\(' matches '('.
^    Matches the start position of the input string. If the Multiline property of the RegExp object is set, ^ also matches the position after '\n' or '\r'.
$    Matches the end position of the input string. If the Multiline property of the RegExp object is set, $ also matches the position before '\n' or '\r'.
*    Matches the preceding subexpression zero or more times. For example, 'zo*' matches "z" and "zoo". * is equivalent to {0,}.
+    Matches the preceding subexpression one or more times. For example, 'zo+' matches "zo" and "zoo", but not "z". + is equivalent to {1,}.
?    Matches the preceding subexpression zero or one time. For example, 'do(es)?' matches "do" and "does". ? is equivalent to {0,1}.
{n}    n is a non-negative integer. Matches exactly n times. For example, 'o{2}' does not match the 'o' in "Bob", but matches the two o's in "food".
{n,}    n is a non-negative integer. Matches at least n times. For example, 'o{2,}' does not match the 'o' in "Bob", but matches all the o's in "foooood". 'o{1,}' is equivalent to 'o+'. 'o{0,}' is equivalent to 'o*'.
{n,m}    m and n are non-negative integers, where n <= m. Matches at least n and at most m times. For example, 'o{1,3}' matches the first three o's in "fooooood". 'o{0,1}' is equivalent to 'o?'. Note that there must be no space between the comma and the two numbers.
?    When this character immediately follows any other qualifier (*, +, ?, {n}, {n,}, {n,m}), the matching pattern is non-greedy. Non-greedy mode matches as little of the searched string as possible, while the default greedy mode matches as much of it as possible. For example, for the string "oooo", 'o+?' matches a single "o", while 'o+' matches all the o's.
.    Matches any single character except "\n". To match any character including '\n', use a pattern such as '[\s\S]' (inside brackets, '.' is only a literal period, so '[.\n]' does not work for this).
(pattern)    Matches pattern and captures the match. The captured match can be retrieved from the resulting Matches collection, using the SubMatches collection in VBScript or the $0...$9 properties in JScript. To match parenthesis characters, use '\(' or '\)'.
(?:pattern)    Matches pattern but does not capture the match; that is, it is a non-capturing match and is not stored for later use. This is useful for combining parts of a pattern with the "or" character (|). For example, 'industr(?:y|ies)' is a more economical expression than 'industry|industries'.
(?=pattern)    Positive lookahead. Matches the search string at any position where a string matching pattern begins. This is a non-capturing match; that is, the match is not stored for later use. For example, 'Windows (?=95|98|NT|2000)' matches the "Windows" in "Windows 2000" but not the "Windows" in "Windows 3.1". Lookaheads do not consume characters: after a match occurs, the search for the next match begins immediately after the last match, not after the characters that made up the lookahead.
(?!pattern)    Negative lookahead. Matches the search string at any position where a string that does not match pattern begins. This is a non-capturing match; that is, the match is not stored for later use. For example, 'Windows (?!95|98|NT|2000)' matches the "Windows" in "Windows 3.1" but not the "Windows" in "Windows 2000". Lookaheads do not consume characters: after a match occurs, the search for the next match begins immediately after the last match, not after the characters that made up the lookahead.
x|y    Matches either x or y. For example, 'z|food' matches "z" or "food". '(z|f)ood' matches "zood" or "food".
[xyz]    A character set. Matches any one of the enclosed characters. For example, '[abc]' matches the 'a' in "plain".
[^xyz]    A negative character set. Matches any character that is not enclosed. For example, '[^abc]' matches the 'p' in "plain".
[a-z]    A character range. Matches any character in the specified range. For example, '[a-z]' matches any lowercase letter in the range 'a' through 'z'.
[^a-z]    A negative character range. Matches any character that is not in the specified range. For example, '[^a-z]' matches any character that is not in the range 'a' through 'z'.
\b    Matches a word boundary, that is, the position between a word character and a non-word character such as a space, or the start or end of the string. For example, 'er\b' matches the 'er' in "never" but not the 'er' in "verb".
\B    Matches a non-word boundary. 'er\B' matches the 'er' in "verb" but not the 'er' in "never".
\cx    Matches the control character indicated by x. For example, \cM matches Control-M, the carriage return character. The value of x must be A-Z or a-z; otherwise, c is treated as a literal 'c' character.
\d    Matches a digit character. Equivalent to [0-9].
\D    Matches a non-digit character. Equivalent to [^0-9].
\f    Matches a form feed character. Equivalent to \x0c and \cL.
\n    Matches a newline character. Equivalent to \x0a and \cJ.
\r    Matches a carriage return character. Equivalent to \x0d and \cM.
\s    Matches any white space character, including space, tab, form feed, and so on. Equivalent to [ \f\n\r\t\v].
\S    Matches any non-white-space character. Equivalent to [^ \f\n\r\t\v].
\t    Matches a tab character. Equivalent to \x09 and \cI.
\v    Matches a vertical tab character. Equivalent to \x0b and \cK.
\w    Matches any word character, including the underscore. Equivalent to '[A-Za-z0-9_]'.
\W    Matches any non-word character. Equivalent to '[^A-Za-z0-9_]'.
\xn    Matches n, where n is a hexadecimal escape value. Hexadecimal escape values must be exactly two digits long. For example, '\x41' matches "A". '\x041' is equivalent to '\x04' followed by "1". You can use ASCII codes in regular expressions ...
\num    Matches num, where num is a positive integer. A back reference to a captured match. For example, '(.)\1' matches two consecutive identical characters.
\n    Identifies either an octal escape value or a back reference. If \n is preceded by at least n captured subexpressions, n is a back reference. Otherwise, if n is an octal digit (0-7), n is an octal escape value.
\nm    Identifies either an octal escape value or a back reference. If \nm is preceded by at least nm captured subexpressions, nm is a back reference. If \nm is preceded by at least n captures, n is a back reference followed by the literal m. If neither condition holds and both n and m are octal digits (0-7), \nm matches the octal escape value nm.
\nml    Matches the octal escape value nml when n is an octal digit (0-3) and both m and l are octal digits (0-7).
\un    Matches n, where n is a Unicode character expressed as four hexadecimal digits. For example, \u00A9 matches the copyright symbol (©).
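A few entries from the table can be checked with a short JavaScript sketch. It covers only character sets, the shorthand classes, and the hexadecimal and Unicode escapes; the sample strings and variable names are made up.

```javascript
// Character sets and negated sets, using the "plain" example.
const inSet = "plain".match(/[abc]/)[0];      // "a"
const notInSet = "plain".match(/[^abc]/)[0];  // "p"

// Shorthand classes for digits and word characters.
const numbers = "a1b22".match(/\d+/g);        // ["1", "22"]
const digitsOnly = "a1b22".replace(/\D/g, ""); // "122"
const isWord = /^\w+$/.test("snake_case9");   // true
const hasNonWord = /\W/.test("a-b");          // true: the hyphen

// Hexadecimal and Unicode escapes.
const hexA = /\x41/.test("A");                // true
const copyright = /\u00A9/.test("\u00A9");    // true
// \x041 is \x04 followed by the literal character 1.
const hexThenOne = /^\x041$/.test(String.fromCharCode(4) + "1"); // true

console.log(inSet, notInSet, numbers, digitsOnly, isWord, hasNonWord,
            hexA, copyright, hexThenOne);
```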
Some examples
Regular expression    Description
/\b([a-z]+) \1\b/gi    Finds places where the same word occurs twice in a row
/(\w+):\/\/([^/:]+)(:\d*)?([^#]*)/    Resolves a URL into protocol, domain, port, and relative path
/^(?:Chapter|Section) [1-9][0-9]{0,1}$/    Matches a line consisting of "Chapter" or "Section" followed by a number from 1 to 99
/[-a-z]/    Matches the 26 letters a to z plus the - (hyphen) character
/ter\b/    Matches the "ter" in "chapter" but not in "terminal"
/\Bapt/    Matches the "apt" in "chapter" but not in "aptitude"
/Windows(?=95|98|NT)/    Matches the "Windows" in Windows95, Windows98, or WindowsNT; after a match is found, the search for the next match starts right after "Windows"
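Several of the example patterns can be run as-is in JavaScript. The sketch below uses an invented URL and sentence as input; only the patterns come from the list above (with the / inside the bracket expression escaped for safety).

```javascript
// Repeated words: the i flag makes the back reference case-insensitive.
const sentence = "Is is the cost of of gasoline going up up?";
const repeats = sentence.match(/\b([a-z]+) \1\b/gi);
// ["Is is", "of of", "up up"]

// URL parsing into protocol, domain, port, and path.
const urlPattern = /(\w+):\/\/([^\/:]+)(:\d*)?([^#]*)/;
const urlParts = "http://www.example.com:80/docs/index.htm".match(urlPattern);
// urlParts[1] "http", urlParts[2] "www.example.com",
// urlParts[3] ":80", urlParts[4] "/docs/index.htm"

// Chapter or section headings numbered 1 to 99.
const heading = /^(?:Chapter|Section) [1-9][0-9]{0,1}$/;
const okHeading = heading.test("Chapter 12");    // true
const zeroHeading = heading.test("Chapter 0");   // false
const longHeading = heading.test("Section 123"); // false

// Word boundary and non-word boundary.
const terChapter = /ter\b/.test("chapter");      // true
const terTerminal = /ter\b/.test("terminal");    // false
const aptChapter = /\Bapt/.test("chapter");      // true
const aptAptitude = /\Bapt/.test("aptitude");    // false

console.log(repeats, urlParts.slice(1), okHeading, zeroHeading, longHeading,
            terChapter, terTerminal, aptChapter, aptAptitude);
```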
