Regular expression syntax and common examples

Brief Introduction

A regular expression is a logical formula for manipulating strings. It is built from ordinary characters (for example, the letters a through z) and special characters called "metacharacters". These predefined characters, and combinations of them, form a "rule string" that expresses filtering logic for other strings.
A regular expression is a text pattern that describes one or more strings to match when searching text.
Regular expressions are often used to search for and replace text that matches a pattern (rule).

Syntax

Ordinary characters

Ordinary characters include all printable and non-printable characters that are not explicitly designated as metacharacters. This includes all uppercase and lowercase letters, all digits, all punctuation marks, and some other symbols.
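As a minimal sketch using Python's standard `re` module (the sample sentence is made up for illustration), ordinary characters in a pattern simply match themselves. That alone is enough for a basic search and a basic replace:

```python
import re

# Ordinary characters such as letters match themselves literally.
text = "The cat sat on the mat."

# re.search returns a Match object for the first occurrence, or None.
match = re.search(r"cat", text)
print(match.group(), match.start())  # cat 4

# re.sub replaces every occurrence of the pattern (here, the literal "at").
replaced = re.sub(r"at", "og", text)
print(replaced)  # The cog sog on the mog.
```

Python patterns are usually written as raw strings (`r"..."`) so that backslashes reach the regex engine unchanged.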

Non-printable characters

Nonprinting characters can also be part of a regular expression. The following table lists the escape sequences that represent nonprinting characters:

Character	Description
\f	Matches a form feed character
\n	Matches a line feed (newline) character
\r	Matches a carriage return character
\t	Matches a tab character
\v	Matches a vertical tab character
\s	Matches any whitespace character, including spaces, tabs, form feeds, and so on
\S	Matches any non-whitespace character
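A short Python sketch of these escapes, assuming a made-up string that contains a tab, a carriage return, and a line feed. `\t` picks out only the tabs, while `\s` and `\S` split the text into whitespace and non-whitespace runs:

```python
import re

# Hypothetical sample: tab-separated fields with a Windows-style line ending.
sample = "name\tage\r\nAda\t36"

# \t matches only the tab characters.
tabs = re.findall(r"\t", sample)
print(tabs)  # ['\t', '\t']

# \s+ matches runs of whitespace (tab, CR, LF, space, ...), so "\r\n" is one separator.
fields = re.split(r"\s+", sample)
print(fields)  # ['name', 'age', 'Ada', '36']

# \S+ matches runs of non-whitespace characters, giving the same pieces.
tokens = re.findall(r"\S+", sample)
print(tokens)  # ['name', 'age', 'Ada', '36']
```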

Special characters

Special characters are characters that have a special meaning. For example, * repeats the preceding item zero or more times. To match a literal * character instead, you must escape it by preceding it with the backslash escape character: \*.

The following table lists the special characters in the regular expression:

Special character	Description
^	Matches the start position of a string. Inside a bracket expression it instead negates the character set; for example, [^a] matches any character other than a. To match the ^ character itself, escape it: \^
$	Matches the end position of a string. To match the $ character itself, escape it: \$
()	Mark the beginning and end of a subexpression. To match the parentheses themselves, escape them: \( and \)
[	Marks the beginning of a bracket expression. To match [ itself, escape it: \[
{	Marks the beginning of a quantifier expression. To match { itself, escape it: \{
|	Indicates a choice between two items. To match | itself, escape it: \|
\	Marks the next character as a special character, a literal character, a backreference, or an octal escape. For example, 'n' matches the character 'n', '\n' matches a newline character, and '\\' matches a literal backslash
.	Matches any single character except the newline character \n. To match . itself, escape it: \.
?	Quantifier: matches the preceding subexpression 0 or 1 times, or marks the preceding quantifier as non-greedy
*	Quantifier: matches the preceding subexpression 0 or more times
+	Quantifier: matches the preceding subexpression 1 or more times (at least once)
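The escaping rules above can be tried out in Python. This is a sketch on a made-up price string. `re.escape()` is the standard-library helper that backslash-escapes every special character in a literal string. The last line shows that without escaping, `.` and `*` act as metacharacters:

```python
import re

price_text = "Total: $42.50 (incl. tax)"

# Unescaped, "$" is an end anchor and "." matches any character,
# so both are escaped with a backslash to match them literally.
m = re.search(r"\$\d+\.\d+", price_text)
print(m.group())  # $42.50

# \( and \) match literal parentheses; the inner ( ) is a capturing group.
inside = re.search(r"\(([^)]*)\)", price_text).group(1)
print(inside)  # incl. tax

# re.escape() escapes the special characters of a literal string for you.
pattern = re.escape("a.b*c")
print(re.fullmatch(pattern, "a.b*c") is not None)   # True
print(re.fullmatch(pattern, "axbbbc") is not None)  # False

# Without escaping, "." and "*" are metacharacters, so "axbbbc" matches.
print(re.fullmatch("a.b*c", "axbbbc") is not None)  # True
```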

Quantifiers

A quantifier specifies how many times a given component of a regular expression must occur for a match to succeed. The quantifiers are:

Character	Description
{n}	n is a non-negative integer. Matches exactly n times. For example, 'o{2}' does not match the 'o' in 'Bob', but matches the two o's in 'food'
{n,}	n is a non-negative integer. Matches at least n times. For example, 'o{2,}' does not match the 'o' in 'Bob', but matches all the o's in 'foooood'. 'o{1,}' is equivalent to 'o+', and 'o{0,}' is equivalent to 'o*'
{n,m}	m and n are non-negative integers, where n <= m. Matches at least n and at most m times. For example, 'o{1,3}' matches the first three o's in 'fooooood'. 'o{0,1}' is equivalent to 'o?'. Note that there must be no spaces around the comma between the two numbers
?	Matches the preceding subexpression 0 or 1 times. For example, 'do(es)?' matches 'do', the 'does' in 'does', and the 'do' in 'doxy'. Equivalent to {0,1}
*	Matches the preceding subexpression 0 or more times. For example, 'zo*' matches 'z' and 'zoo'. * is equivalent to {0,}
+	Matches the preceding subexpression 1 or more times. For example, 'zo+' matches 'zo' and 'zoo', but not 'z'. + is equivalent to {1,}
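The examples in the quantifier table can be checked with Python's `re` module. This is a minimal sketch. The runs of o's are built with string multiplication so the counts are explicit:

```python
import re

# {n}: exactly n occurrences.
print(re.search(r"o{2}", "Bob"))                # None
print(re.search(r"o{2}", "food").group())       # oo

# {n,}: at least n occurrences; being greedy, it takes every "o" available.
many_o = "f" + "o" * 5 + "d"                     # 'f', five 'o's, 'd'
print(len(re.search(r"o{2,}", many_o).group()))  # 5

# {n,m}: between n and m occurrences; here only the first three o's.
six_o = "f" + "o" * 6 + "d"
print(re.search(r"o{1,3}", six_o).group())      # ooo

# ?: the group (es) is optional.
print([re.fullmatch(r"do(es)?", w) is not None for w in ("do", "does", "doe")])  # [True, True, False]
print(re.search(r"do(es)?", "doxy").group())    # do

# * allows zero occurrences, + requires at least one.
print(re.fullmatch(r"zo*", "z") is not None)    # True
print(re.fullmatch(r"zo+", "z") is not None)    # False
print(re.fullmatch(r"zo+", "zoo") is not None)  # True
```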

Note: '*' and '+' are greedy, meaning they match as much text as possible. Adding a '?' after them makes them non-greedy, so they perform a minimal match instead.
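Here is a small Python sketch of the greedy/non-greedy difference, using a hypothetical one-line H1 element. Python writes patterns as raw strings rather than inside `/.../` delimiters, but the pattern bodies are the same:

```python
import re

# Hypothetical sample markup, used only to illustrate the difference.
html = "<h1>Chapter 1</h1>"

# Greedy: .* runs as far as possible, so the match ends at the last ">".
greedy = re.search(r"<.*>", html).group()
print(greedy)  # <h1>Chapter 1</h1>

# Non-greedy: .*? stops at the first ">" that lets the match succeed.
lazy = re.search(r"<.*?>", html).group()
print(lazy)  # <h1>

# findall with the non-greedy pattern returns each tag separately.
print(re.findall(r"<.*?>", html))  # ['<h1>', '</h1>']

# \w+? accepts only word characters, so "</h1>" (which has "/") is skipped.
print(re.findall(r"<\w+?>", html))  # ['<h1>']
```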

For example, consider matching an H1 tag in an HTML document.

Greedy mode: the following expression matches everything from the less-than symbol (<) that opens the H1 tag to the greater-than symbol (>) that closes the </H1> tag:

/<.*>/ 
Non-greedy mode: if you only want to match the individual opening and closing H1 tags, the following non-greedy expression matches just <H1> rather than the whole element:

/<.*?>/ 
If you want to match only the opening H1 tag, the expression is:

/<\w+?>/

Locator characters
A locator lets you anchor a regular expression to the beginning or end of a line. Locators also let you create regular expressions that match within a word, at the beginning of a word, or at the end of a word.

A locator describes a string or word boundary. The locators are:

Character	Description
^	Matches the start position of the string
$	Matches the end position of the string
\b	Matches a word boundary, that is, the position between a word character and a non-word character such as a space. For example, 'er\b' matches the 'er' in 'never', but not the 'er' in 'verb'
\B	Matches a non-word boundary. For example, 'er\B' matches the 'er' in 'verb', but not the 'er' in 'never'
Note: Quantifiers cannot be used with locators. Because there cannot be more than one position immediately before or after a newline or word boundary, expressions such as ^* are not allowed.
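The locator table and the note above can be checked in Python as a sketch. By default `^` and `$` anchor to the whole string; the standard `re.MULTILINE` flag makes them anchor to each line. Python also rejects a quantifier placed directly after an anchor:

```python
import re

# \b: word boundary; \B: non-word boundary.
print(re.search(r"er\b", "never") is not None)  # True
print(re.search(r"er\b", "verb") is not None)   # False
print(re.search(r"er\B", "verb") is not None)   # True
print(re.search(r"er\B", "never") is not None)  # False

# ^ and $ anchor to the start and end of the whole string...
text = "first line\nsecond line"
print(re.findall(r"^\w+", text))                # ['first']
# ...or to the start and end of each line when re.MULTILINE is set.
print(re.findall(r"^\w+", text, re.MULTILINE))  # ['first', 'second']
print(re.findall(r"\w+$", text, re.MULTILINE))  # ['line', 'line']

# A quantifier directly after an anchor is rejected by the pattern compiler.
try:
    re.compile(r"^*")
    anchor_quantifier_rejected = False
except re.error as exc:
    anchor_quantifier_rejected = True
    print("rejected:", exc)
```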

To match the text at the beginning of a line of text, use the ^ character at the beginning of the regular expression. Do not confuse this usage of ^ with the usage within the bracket expression.

To match text at the end of a line, use the $ character at the end of the regular expression.

Other metacharacters

Character	Description
x|y	Matches x or y. For example, 'z|food' matches 'z' or 'food', while '(z|f)ood' matches 'zood' or 'food'
[xyz]	Character set. Matches any one of the enclosed characters. For example, '[abc]' matches the 'a' in 'plain'
[^xyz]	Negated character set. Matches any character that is not enclosed. For example, '[^abc]' matches 'p', 'l', 'i', and 'n' in 'plain'
[a-z]	Character range. Matches any character within the specified range. For example, '[a-z]' matches any lowercase letter from 'a' to 'z'
[^a-z]	Negated character range. Matches any character that is not in the specified range. For example, '[^a-z]' matches any character that is not a lowercase letter from 'a' to 'z'
\d	Matches a digit character. Equivalent to [0-9]
\D	Matches a non-digit character. Equivalent to [^0-9]
\w	Matches a letter, digit, or underscore. Equivalent to [A-Za-z0-9_]
\W	Matches any character that is not a letter, digit, or underscore. Equivalent to [^A-Za-z0-9_]
\s	Matches any whitespace character, including spaces, tabs, form feeds, and so on. Equivalent to [ \f\n\r\t\v]
\S	Matches any non-whitespace character. Equivalent to [^ \f\n\r\t\v]
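A sketch of these metacharacters in Python, on made-up sample strings. One caveat for Python specifically: for `str` patterns, `\d`, `\w`, and `\s` are Unicode-aware by default. The standard `re.ASCII` flag restricts them to the ASCII equivalents listed in the table:

```python
import re

# x|y: alternation.
print(re.findall(r"z|food", "z food zood"))  # ['z', 'food', 'z']
print([bool(re.fullmatch(r"(z|f)ood", w)) for w in ("zood", "food", "good")])  # [True, True, False]

# [abc] and [^abc]: character set and negated character set.
print(re.findall(r"[abc]", "plain"))   # ['a']
print(re.findall(r"[^abc]", "plain"))  # ['p', 'l', 'i', 'n']

# [a-z]: character range (uppercase letters are not included).
print(re.findall(r"[a-z]+", "Hello World"))  # ['ello', 'orld']

# \d and \D.
print(re.findall(r"\d+", "Room 101, floor 3"))  # ['101', '3']
print(re.sub(r"\D", "", "Tel: 555-0100"))       # 5550100

# \w and \s.
print(re.findall(r"\w+", "snake_case, CamelCase!"))  # ['snake_case', 'CamelCase']
print(re.sub(r"\s+", " ", "a \t b\n c"))            # a b c

# Unicode-aware by default; re.ASCII limits \d to [0-9].
arabic_indic_three = "\u0663"  # ARABIC-INDIC DIGIT THREE
print(len(re.findall(r"\d", arabic_indic_three + "3")))  # 2
print(re.findall(r"\d", arabic_indic_three + "3", re.ASCII))  # ['3']
```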
Commonly used regular expressions

Verifying numbers

A. Digits only: ^[0-9]*$

B. Exactly n digits: ^\d{n}$

C. Between m and n digits: ^\d{m,n}$

D. Zero, or a number without leading zeros: ^(0|[1-9][0-9]*)$

E. Positive integer: ^[1-9]\d*$

F. Negative integer: ^-[1-9]\d*$

G. Integer (nonzero): ^-?[1-9]\d*$

Verifying characters

A. Chinese characters: ^[\u4e00-\u9fa5]{0,}$

B. Letters and digits: ^[A-Za-z0-9]+$

Special-purpose expressions

A. Email address: ^\w+([-+.]\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*$

B. Domain name: [a-zA-Z0-9][-a-zA-Z0-9]{0,62}(\.[a-zA-Z0-9][-a-zA-Z0-9]{0,62})+\.?

C. Internet URL: [a-zA-Z]+://[^\s]* or ^http://([\w-]+\.)+[\w-]+(/[\w./?%&=-]*)?$

D. Mobile phone number: ^(13[0-9]|14[57]|15[0-35-9]|18[0-35-9])\d{8}$

E. ID number (15 or 18 digits; in the 18-digit form the last character is a check digit, which may be a digit or the character X): (^\d{15}$)|(^\d{18}$)|(^\d{17}(\d|X|x)$)

F. Date format: ^\d{4}-\d{1,2}-\d{1,2}

G. Tencent QQ number: [1-9][0-9]{4,}

H. IPv4 address: (?:(?:25[0-5]|2[0-4]\d|[01]?\d?\d)\.){3}(?:25[0-5]|2[0-4]\d|[01]?\d?\d)

I. China ZIP code (6 digits): [1-9]\d{5}(?!\d)
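A minimal Python sketch that runs a few of the listed patterns against sample inputs. The sample values are made up. `is_valid` is a hypothetical helper name, not part of any library. The patterns are written as raw strings without the extraction spacing. `re.fullmatch()` checks the whole input, so the IPv4 and ZIP patterns, which are listed without ^ and $, are still validated end to end:

```python
import re

# A few patterns from the list above, as Python raw strings.
PATTERNS = {
    "integer": r"^-?[1-9]\d*$",
    "letters_digits": r"^[A-Za-z0-9]+$",
    "chinese": r"^[\u4e00-\u9fa5]{0,}$",
    "email": r"^\w+([-+.]\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*$",
    "ipv4": r"(?:(?:25[0-5]|2[0-4]\d|[01]?\d?\d)\.){3}(?:25[0-5]|2[0-4]\d|[01]?\d?\d)",
    "zip_cn": r"[1-9]\d{5}(?!\d)",
}


def is_valid(kind: str, text: str) -> bool:
    """Hypothetical helper: True if text matches the named pattern in full."""
    return re.fullmatch(PATTERNS[kind], text) is not None


samples = [
    ("integer", "-42"), ("integer", "0"),
    ("letters_digits", "abc123"), ("letters_digits", "abc-123"),
    ("chinese", "中文"), ("chinese", "abc"),
    ("email", "user.name@example.com"), ("email", "not-an-email"),
    ("ipv4", "192.168.1.1"), ("ipv4", "256.1.1.1"),
    ("zip_cn", "100080"), ("zip_cn", "012345"),
]
for kind, text in samples:
    print(f"{kind:15} {text!r:26} {is_valid(kind, text)}")
```

Note that the "integer" pattern, as listed, rejects 0.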
Python Regular Expressions 
The Python Regular Expressions section of the Rookie Tutorial's Python Basics tutorial is devoted to regular expressions in Python.
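For orientation, here is a short sketch of the main entry points in Python's standard `re` module, reusing the date pattern from the list above with capturing groups added. `match()` only tries at the start of the string, `search()` scans the whole string, and `fullmatch()` requires the entire string to match. The sample text is made up:

```python
import re

# Compile once when a pattern is reused; the compiled object has the same methods.
date_re = re.compile(r"(\d{4})-(\d{1,2})-(\d{1,2})")

text = "Released 2019-3-8, patched 2019-04-15."

# match() only tries at position 0; search() scans the whole string.
print(date_re.match(text))            # None
print(date_re.search(text).group(0))  # 2019-3-8

# fullmatch() requires the entire string to match.
print(date_re.fullmatch("2019-04-15") is not None)  # True

# finditer() yields one Match per occurrence; groups() returns the captures.
all_dates = [m.groups() for m in date_re.finditer(text)]
print(all_dates)  # [('2019', '3', '8'), ('2019', '04', '15')]

# Flags change matching behavior, e.g. re.IGNORECASE.
print(re.findall(r"released", text, re.IGNORECASE))  # ['Released']
```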

The above is excerpted from the Rookie Tutorial's Regular Expression tutorial.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness, ownership, or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
