Regular Expression Basics (Reading Notes)

Last Update:2015-03-04 Source: Internet

Author: User


A regular expression (regex) is a tool for describing patterns in text.

Two basic functions of a regular expression: search and replace.
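The notes do not name a regex engine, so as a minimal sketch the two functions can be illustrated with Python's standard re module. The sample sentence and the variable names are made up for illustration.

```python
import re

text = "The cat sat on the mat."

# Search: locate the first place where the pattern occurs.
match = re.search(r"cat", text)
found_at = match.start() if match else -1

# Replace: substitute every occurrence of the pattern.
replaced = re.sub(r"cat", "dog", text)

print(found_at)   # 4
print(replaced)   # The dog sat on the mat.
```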

The . character (period) matches any single character: a letter, a digit, a punctuation mark, or even the . character itself. (In most engines the one exception by default is the line break.)

The \ character is the escape metacharacter. It signals that the character after it has a special meaning rather than its literal one; placed before a metacharacter, as in \., it does the reverse and makes that metacharacter match literally.

(Conclusion: . matches any single character; \ is used to escape a character.)
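A small sketch of the difference between an unescaped and an escaped period, assuming Python's re module. The file names are hypothetical sample data, not taken from the notes.

```python
import re

files = "sales1.xls sales2.xls salesX.xls sales.xls na1.xls"

# Unescaped ".": any single character is accepted in that position.
any_char = re.findall(r"sales.", files)

# Escaped "\.": only a literal period is accepted.
literal_dot = re.findall(r"sales\.", files)

print(any_char)     # ['sales1', 'sales2', 'salesX', 'sales.']
print(literal_dot)  # ['sales.']
```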

[ and ] do not match any characters themselves. They only delimit a character set.

The hyphen (-) is a metacharacter that defines a character range. It acts as a metacharacter only between [ and ]; outside a character set it is an ordinary character.

Valid character ranges:

A-Z matches all uppercase letters from A to Z;

a-z matches all lowercase letters from a to z;

A-z matches all characters from ASCII A to ASCII z, which also includes the few non-letter characters that sit between Z and a (not commonly used).

^ is the negation metacharacter. Placed right after the opening [, it negates the character set, so the set matches any character that is not listed.
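Character sets, ranges, and negation can be tried out as follows. This is only a sketch in Python's re module; the file names and test strings are invented for the example.

```python
import re

words = "na1.xls sa1.xls ca1.xls nab.xls"

# [ns] accepts exactly one character: either "n" or "s".
set_match = re.findall(r"[ns]a1\.xls", words)

# [0-9] is a range; [^0-9] is the same set negated.
digit_after_a = re.findall(r"[ns]a[0-9]\.xls", words)
non_digit_after_a = re.findall(r"[ns]a[^0-9]\.xls", words)

# A-z spans ASCII codes 65 to 122, so it also accepts "_" and "^".
range_A_to_z = re.findall(r"[A-z]", "a_Z^9")

print(set_match)          # ['na1.xls', 'sa1.xls']
print(non_digit_after_a)  # ['nab.xls']
print(range_A_to_z)       # ['a', '_', 'Z', '^']
```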

Metacharacters fall roughly into two types: those used to match text (for example, .) and those required by the regular expression syntax (for example, [ and ]).

// 2015.02.17

Whitespace metacharacters:

[\b]	Backspace (moves back one character and deletes it)
\f	Form feed
\n	Line feed
\r	Carriage return
\t	Tab (Tab key)
\v	Vertical tab

Digit metacharacters:

\d	Any digit (equivalent to [0-9])
\D	Any non-digit (equivalent to [^0-9])

Alphanumeric metacharacters:

\w	Any letter (upper- or lowercase), digit, or underscore (equivalent to [a-zA-Z0-9_])
\W	Any character that is not a letter, digit, or underscore (equivalent to [^a-zA-Z0-9_])

Whitespace class metacharacters:

\s	Any whitespace character (equivalent to [\f\n\r\t\v]; most engines also include the space character)
\S	Any non-whitespace character (equivalent to [^\f\n\r\t\v])
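The class metacharacters can be checked against their bracket equivalents. The sketch below assumes Python's re module, where these classes are Unicode-aware by default for str patterns, so the re.ASCII flag is passed to get the ASCII-only behavior the tables describe. The sample string is made up.

```python
import re

sample = "id_42:\tOK\n"

digits = re.findall(r"\d", sample, re.ASCII)
word_chars = "".join(re.findall(r"\w", sample, re.ASCII))
non_word = re.findall(r"\W", sample, re.ASCII)
whitespace = re.findall(r"\s", sample)

# The shorthand and the explicit set should pick out the same characters.
same_as_explicit_set = (
    re.findall(r"\w", sample, re.ASCII) == re.findall(r"[a-zA-Z0-9_]", sample)
)

print(digits)      # ['4', '2']
print(word_chars)  # id_42OK
```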

+ matches one or more occurrences of the preceding character or character set (at least one; it does not match zero occurrences).

* matches zero or more occurrences of the preceding character or character set.

? matches zero or one occurrence of the preceding character or character set.

{n} sets an exact number of repetitions (for example, {3} means that the preceding character or character set must appear exactly three times in a row).

{m,n} sets an interval for the number of repetitions (for example, {2,4} means that the preceding character or character set must appear at least twice and at most four times in a row; {3,} means that it must appear at least three times).
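To make the repetition counts concrete, here is a short sketch using Python's re module. The helper fully_matches and all test strings are hypothetical and serve only this illustration.

```python
import re

def fully_matches(pattern, candidates):
    """Return the candidates that the pattern matches from start to end."""
    return [s for s in candidates if re.fullmatch(pattern, s)]

candidates = ["ac", "abc", "abbbc"]

star = fully_matches(r"ab*c", candidates)      # zero or more "b"
plus = fully_matches(r"ab+c", candidates)      # one or more "b"
optional = fully_matches(r"ab?c", candidates)  # zero or one "b"

exactly_three = re.findall(r"\d{3}", "12 123 1234")
two_to_four = re.findall(r"\d{2,4}", "1 12 12345")
three_or_more = re.findall(r"\d{3,}", "12 123 12345")

print(star)      # ['ac', 'abc', 'abbbc']
print(plus)      # ['abc', 'abbbc']
print(optional)  # ['ac', 'abc']
```

Note that \d{3} finds "123" inside "1234" as well, because findall searches within the text rather than requiring whole-token matches.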

Greedy metacharacters and their lazy versions:

*	*?
+	+?
{n,}	{n,}?

(Conclusion: the real power of regular expressions shows in repeated matching. + matches one or more occurrences of a character or character set, * matches zero or more occurrences, and ? matches zero or one occurrence. For finer control, use the {} syntax to set an exact repetition count or a minimum and maximum. These repetition metacharacters come in two kinds, "greedy" and "lazy"; to prevent over-matching, build the regular expression with the lazy versions.)
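Over-matching is easiest to see on markup. The following sketch, assuming Python's re module and an invented HTML fragment, compares the greedy * with its lazy version *?.

```python
import re

html = "living in <b>AK</b> and <b>HI</b>"

# Greedy: .* grabs as much as possible, running to the last </b>.
greedy = re.findall(r"<b>.*</b>", html)

# Lazy: .*? stops at the first </b> that completes a match.
lazy = re.findall(r"<b>.*?</b>", html)

print(greedy)  # ['<b>AK</b> and <b>HI</b>']
print(lazy)    # ['<b>AK</b>', '<b>HI</b>']
```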

\b matches the start or end of a word (a word boundary).

\B matches any position that is not a word boundary.

^ matches the start of a string, and $ matches the end of a string.

(?m) enables multiline mode. (?m) must appear at the very beginning of the pattern.

(Conclusion: regular expressions can match not only blocks of text of any length but also text at a specific position in a string. \b specifies a word boundary (\B is the opposite). ^ and $ specify the string boundaries (the start and the end of the string). When combined with (?m), ^ and $ also match at the start and end of each line, that is, right after and right before a line break (the line break is then treated as a string separator).)
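Position matching can be sketched like this, assuming Python's re module, which accepts the inline (?m) flag at the start of a pattern. All sample strings are made up for the example.

```python
import re

sentence = "The cat scattered his food"

# Without boundaries, "cat" is also found inside "scattered".
anywhere = re.findall(r"cat", sentence)
whole_word = re.findall(r"\bcat\b", sentence)

# \B-\B: a hyphen that is not touching a word character on either side.
loose_hyphen = re.findall(r"\B-\B", "a color - coded nine-digit id")

lines = "first line\nsecond line\nthird line"

# By default ^ anchors only at the start of the whole string.
string_start = re.findall(r"^\w+", lines)

# With (?m), ^ and $ also anchor at every line break.
line_starts = re.findall(r"(?m)^\w+", lines)
line_ends = re.findall(r"(?m)\w+$", lines)

print(len(anywhere), len(whole_word))  # 2 1
print(string_start)                    # ['first']
print(line_starts)                     # ['first', 'second', 'third']
```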
