Regular expression syntax and common examples



Introduction

A regular expression is a logical formula for operating on strings. It is a "rule string" built from a predefined set of ordinary characters (for example, the letters a to z) and special characters called "metacharacters", together with combinations of these characters. The rule string expresses filtering logic for strings.
A regular expression is a text pattern that describes one or more strings to match when searching text.
Regular expressions are commonly used to search for and replace text that matches a given pattern (rule).

Syntax

Ordinary characters

Ordinary characters include all printable and non-printable characters that are not explicitly designated as metacharacters. This covers all uppercase and lowercase letters, all digits, all punctuation marks, and some other symbols.

Non-printable characters

Non-printing characters can also be part of a regular expression. The following table lists the escape sequences that represent non-printing characters:

Character	Description
\f	Matches a form feed
\n	Matches a newline (line feed)
\r	Matches a carriage return
\t	Matches a tab
\v	Matches a vertical tab
\s	Matches any whitespace character, including space, tab, form feed, and so on
\S	Matches any non-whitespace character
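To check the table, the following sketch uses Python 3 and the standard re module. The sample string is made up. It shows that each escape matches exactly one non-printing character, and it also exercises \s and \S. In Python 3, \s in a str pattern matches other Unicode whitespace as well.

```python
import re

# Made-up sample text: a tab, a CR+LF line ending and a form feed
text = "name\tvalue\r\nnext\fpage"

# Each escape sequence matches exactly one non-printing character
tabs = re.findall(r"\t", text)
line_ends = re.findall(r"\r\n", text)

# \s matches whitespace, so splitting on runs of \s yields the visible words
words = re.split(r"\s+", text)

# \S matches any non-whitespace character
visible = "".join(re.findall(r"\S", text))

print(words)    # ['name', 'value', 'next', 'page']
print(visible)  # namevaluenextpage
```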

Special characters

Special characters are characters with special meanings. For example, * means "match the preceding element zero or more times". To match a literal * character, you must escape it by placing the backslash escape character \ in front of it: \*.
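The sketch below shows escaping in Python 3's re module. The test strings are illustrative. re.escape() is a standard-library helper that backslash-escapes any metacharacters in a string, which is useful when the text to match comes from user input.

```python
import re

# Unescaped, * is a qualifier: "o*" means "zero or more o characters"
repeated = re.fullmatch(r"go*gle", "gooogle")        # matches
literal_fails = re.fullmatch(r"go*gle", "go*gle")    # None: * is not literal here

# Escaped with a backslash, \* matches a literal asterisk
literal = re.fullmatch(r"go\*gle", "go*gle")         # matches

# re.escape() escapes every metacharacter in an arbitrary string
safe = re.escape("1+1=2?")
escaped_ok = re.fullmatch(safe, "1+1=2?")            # matches

print(bool(repeated), literal_fails, bool(literal), bool(escaped_ok))
```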

The following table lists the special characters in the regular expression:

Special Characters	Description
^	Matches the start position of the input string. Inside a bracket expression it instead negates the character set: for example, [^a] matches any character other than a. To match the ^ character itself, escape it: \^
$	Matches the end position of the input string. To match the $ character itself, escape it: \$
( )	Marks the start and end of a subexpression. To match the parentheses themselves, escape them: \( and \)
[	Marks the start of a bracket expression. To match [ itself, escape it: \[
{	Marks the start of a qualifier expression. To match { itself, escape it: \{
|	Indicates a choice between two alternatives. To match | itself, escape it: \|
\	Marks the next character as a special character, a literal character, a backreference, or an octal escape. For example, n matches the character n, \n matches a newline, and \\ matches a single \
.	Matches any single character except the newline character \n. To match . itself, escape it: \.
?	Qualifier: matches the preceding subexpression zero or one time. When it follows another qualifier, it makes that qualifier non-greedy. To match ? itself, escape it: \?
*	Qualifier: matches the preceding subexpression zero or more times. To match * itself, escape it: \*
+	Qualifier: matches the preceding subexpression one or more times (at least once). To match + itself, escape it: \+
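The following sketch demonstrates several rows of the table using Python 3's re module. The input strings are made up for illustration.

```python
import re

# . matches any character except the newline \n
dots = re.findall(r"a.c", "abc a\nc a-c")

# Inside brackets, ^ negates the set: [^a] is "any character except a"
not_a = re.findall(r"[^a]", "banana")

# \$ matches a literal dollar sign instead of the end of the string
prices = re.findall(r"\$\d+", "cost: $5 or $12")

# ( ) groups a subexpression and | chooses between alternatives
m = re.fullmatch(r"(cat|dog)s?", "dogs")
animal = m.group(1) if m else None

# + needs at least one b; * also accepts zero
plus_ok = re.fullmatch(r"ab+", "a") is not None
star_ok = re.fullmatch(r"ab*", "a") is not None

print(dots, not_a, prices, animal, plus_ok, star_ok)
```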

Qualifiers

Qualifiers specify how many times a given component of a regular expression must occur for a match to succeed. The regular expression qualifiers are:

Character	Description
{n}	n is a non-negative integer. Matches exactly n times. For example, o{2} does not match the o in Bob, but matches the two o's in food
{n,}	n is a non-negative integer. Matches at least n times. For example, o{2,} does not match the o in Bob, but matches all the o's in foooood. o{1,} is equivalent to o+, and o{0,} is equivalent to o*
{n,m}	m and n are non-negative integers, where n <= m. Matches at least n and at most m times. For example, o{1,3} matches the first three o's in fooooood. o{0,1} is equivalent to o?. Note that there must be no spaces between the comma and the two numbers
?	Matches the preceding subexpression zero or one time. For example, do(es)? matches do, the does in does, and the do in doxy. Equivalent to {0,1}
*	Matches the preceding subexpression zero or more times. For example, zo* matches z and zoo. * is equivalent to {0,}
+	Matches the preceding subexpression one or more times. For example, zo+ matches zo and zoo, but not z. + is equivalent to {1,}
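The table's own examples can be reproduced with Python 3's re module. The sketch below does this. first_match is a hypothetical helper that returns the first matching substring, or None if there is no match.

```python
import re

def first_match(pattern, text):
    """Hypothetical helper: return the first substring matching pattern, or None."""
    m = re.search(pattern, text)
    return m.group() if m else None

print(first_match(r"o{2}", "Bob"))         # None
print(first_match(r"o{2}", "food"))        # oo
print(first_match(r"o{2,}", "foooood"))    # ooooo
print(first_match(r"o{1,3}", "fooooood"))  # ooo
print(first_match(r"do(es)?", "does"))     # does
print(first_match(r"do(es)?", "doxy"))     # do
print(first_match(r"zo*", "z"), first_match(r"zo+", "z"))  # z None
```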

Note: * and + are greedy, meaning they match as much text as possible. Adding a ? after them makes them non-greedy, so they perform a minimal match.

For example, suppose you want to match the H1 tag in an HTML document.

Greedy mode: the following expression matches everything from the opening less-than sign (<) to the greater-than sign (>) of the closing H1 tag:

/<.*>/ 
Non-greedy mode: if you only need to match the opening and closing H1 tags rather than everything between them, the following non-greedy expression matches only <H1>:

/<.*?>/ 
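The following Python 3 sketch compares the greedy and non-greedy patterns above. The H1 snippet is a made-up stand-in for the HTML document. Note that Python writes patterns as strings rather than between /.../ delimiters.

```python
import re

# Hypothetical HTML snippet standing in for the document being searched
html = "<h1>Regular expressions</h1>"

# Greedy: .* runs as far as possible, up to the last >
greedy = re.search(r"<.*>", html).group()
# Non-greedy: .*? stops at the first >
lazy = re.search(r"<.*?>", html).group()
all_tags = re.findall(r"<.*?>", html)
# \w cannot match "/", so only the opening tag matches
opening = re.findall(r"<\w+?>", html)

print(greedy)    # <h1>Regular expressions</h1>
print(lazy)      # <h1>
print(all_tags)  # ['<h1>', '</h1>']
print(opening)   # ['<h1>']
```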
If you want to match only the opening H1 tag, the expression is:

/<\w+?>/

Locators

Locators enable you to anchor a regular expression to the beginning or end of a line. They also enable you to create regular expressions that match within a word, at the beginning of a word, or at the end of a word.

A locator is used to describe the bounds of a string or word.

Character	Description
^	Matches the start position of the string
$	Matches the end position of the string
\b	Matches a word boundary, that is, the position between a word character and a non-word character such as a space. For example, er\b matches the er in never, but not the er in verb
\B	Matches a non-word boundary. For example, er\B matches the er in verb, but not the er in never
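The following Python 3 sketch checks the \b and \B examples from the table. It also shows one engine-specific detail: in Python's re module, ^ and $ anchor only to the start and end of the whole string by default, and they match at every line break only when the re.MULTILINE flag is set. Other engines use their own flags for this.

```python
import re

# \b requires a word boundary after "er"; \B forbids one
ends_never = bool(re.search(r"er\b", "never"))    # True
ends_verb = bool(re.search(r"er\b", "verb"))      # False
inner_verb = bool(re.search(r"er\B", "verb"))     # True
inner_never = bool(re.search(r"er\B", "never"))   # False

text = "first line\nsecond line"
# Without flags, ^ anchors only at the start of the whole string
starts = re.findall(r"^\w+", text)
# With re.MULTILINE, ^ and $ also anchor at each line break
line_starts = re.findall(r"^\w+", text, re.MULTILINE)
line_ends = re.findall(r"\w+$", text, re.MULTILINE)

print(starts, line_starts, line_ends)  # ['first'] ['first', 'second'] ['line', 'line']
```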
    
   

  
Note: You cannot use qualifiers with locators. Because there cannot be more than one position immediately before or after a newline or a word boundary, expressions such as ^* are not allowed.
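As one concrete case, Python 3's re module rejects such patterns when compiling them and raises re.error with a "nothing to repeat" message. Other engines may report the problem differently. compiles is a hypothetical helper that reports whether a pattern is accepted.

```python
import re

def compiles(pattern):
    """Hypothetical helper: True if Python's re module accepts the pattern."""
    try:
        re.compile(pattern)
    except re.error:
        return False
    return True

print(compiles(r"^*"))   # False: a locator cannot be repeated
print(compiles(r"\b+"))  # False
print(compiles(r"^a*"))  # True: the qualifier applies to "a", not to ^
```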

To match text at the beginning of a line, use the ^ character at the beginning of the regular expression. Do not confuse this usage of ^ with its usage inside a bracket expression.

To match text at the end of a line, use the $ character at the end of the regular expression.

Other metacharacters

Character	Description
x|y	Matches either x or y. For example, z|food matches z or food, while (z|f)ood matches zood or food
[xyz]	Character set. Matches any one of the enclosed characters. For example, [abc] matches the a in plain
[^xyz]	Negated character set. Matches any character that is not enclosed. For example, [^abc] matches the p, l, i, and n in plain
[a-z]	Character range. Matches any character in the specified range. For example, [a-z] matches any lowercase letter from a to z
[^a-z]	Negated character range. Matches any character that is not in the specified range. For example, [^a-z] matches any character that is not a lowercase letter from a to z
\d	Matches a digit character. Equivalent to [0-9]
\D	Matches a non-digit character. Equivalent to [^0-9]
\w	Matches a letter, digit, or underscore. Equivalent to [A-Za-z0-9_]
\W	Matches any character that is not a letter, digit, or underscore. Equivalent to [^A-Za-z0-9_]
\s	Matches any whitespace character, including space, tab, form feed, and so on. Equivalent to [ \f\n\r\t\v]
\S	Matches any non-whitespace character. Equivalent to [^ \f\n\r\t\v]
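The sketch below reproduces the table's examples in Python 3. It also shows a Python-specific caveat: for str patterns, \d and \w are Unicode-aware, so they match more than the ASCII equivalents listed above unless the re.ASCII flag is used. The input strings are illustrative.

```python
import re

alternation = re.findall(r"z|food", "z food")
grouped = bool(re.fullmatch(r"(z|f)ood", "zood"))
in_set = re.findall(r"[abc]", "plain")
not_in_set = re.findall(r"[^abc]", "plain")
lower = re.findall(r"[a-z]", "Ab1c")
digits = re.findall(r"\d", "a1b22")
non_digits = re.findall(r"\D", "a1b22")
words = re.findall(r"\w+", "foo_bar, baz-1")

# ARABIC-INDIC DIGIT THREE (U+0663) counts as \d in Python 3 str patterns;
# re.ASCII restricts \d to [0-9]
unicode_digit = re.findall(r"\d", "\u0663")
ascii_only = re.findall(r"\d", "\u0663", re.ASCII)

print(alternation, in_set, not_in_set, words)
```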
    
  

  
Common Regular Expressions

Verifying numbers

A. Number: ^[0-9]*$

B. n digits: ^\d{n}$

C. m to n digits: ^\d{m,n}$

D. Zero, or a number that does not start with 0: ^(0|[1-9][0-9]*)$

E. Positive integers: ^[1-9]\d*$

F. Negative integers: ^-[1-9]\d*$

G. Non-zero integer: ^-?[1-9]\d*$

Verifying characters

A. Chinese characters: ^[\u4e00-\u9fa5]{0,}$

B. Letters and digits: ^[A-Za-z0-9]+$

Special requirements

A. Email address: ^\w+([-+.]\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*$

B. Domain name: [a-zA-Z0-9][-a-zA-Z0-9]{0,62}(\.[a-zA-Z0-9][-a-zA-Z0-9]{0,62})+\.?

C. Internet URL: [a-zA-Z]+://[^\s]* or ^http://([\w-]+\.)+[\w-]+(/[\w./?%&=-]*)?$

D. Mobile phone number (mainland China): ^(13[0-9]|14[57]|15[0-35-9]|18[0-35-9])\d{8}$

E. Chinese resident ID card number (15 or 18 digits; in the 18-digit form the last character is a check digit, which may be a digit or the letter X): (^\d{15}$)|(^\d{18}$)|(^\d{17}(\d|X|x)$)

F. Date format: ^\d{4}-\d{1,2}-\d{1,2}

G. Tencent QQ number: [1-9][0-9]{4,}

H. IP address: ((?:(?:25[0-5]|2[0-4]\d|[01]?\d?\d)\.){3}(?:25[0-5]|2[0-4]\d|[01]?\d?\d))

I. China postal code (6 digits): [1-9]\d{5}(?!\d)
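The sketch below applies several of the patterns above in Python 3. The patterns are adapted from the lists with spacing artifacts removed, and the sample inputs are made up. For validation it uses re.fullmatch, which requires the whole input to match, so the IP pattern works even though it has no ^ and $ anchors. The postcode pattern uses a lookahead and is meant for searching inside longer text, so it is applied with re.search instead. Keep in mind that \w in Python 3 also matches non-ASCII letters, which makes the email pattern looser than it may appear.

```python
import re

# Patterns adapted from the lists above
PATTERNS = {
    "positive_integer": r"^[1-9]\d*$",
    "chinese": r"^[\u4e00-\u9fa5]{0,}$",
    "email": r"^\w+([-+.]\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*$",
    "id_card": r"(^\d{15}$)|(^\d{18}$)|(^\d{17}(\d|X|x)$)",
    "ipv4": r"((?:(?:25[0-5]|2[0-4]\d|[01]?\d?\d)\.){3}(?:25[0-5]|2[0-4]\d|[01]?\d?\d))",
}

def is_valid(kind, text):
    """True if the whole of text matches the pattern named kind."""
    return re.fullmatch(PATTERNS[kind], text) is not None

# (?!\d) stops a 6-digit run from matching inside a longer number
POSTCODE = re.compile(r"[1-9]\d{5}(?!\d)")

def find_postcode(text):
    """Return the first China postal code found in text, or None."""
    m = POSTCODE.search(text)
    return m.group() if m else None

print(is_valid("ipv4", "192.168.1.1"), is_valid("ipv4", "256.1.1.1"))  # True False
print(find_postcode("Beijing 100080"))  # 100080
```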
Python Regular Expressions

The Python Basic Tutorial on Rookie Tutorial (Runoob) includes a Python Regular Expressions section devoted to using regular expressions in Python.

Statement

This article is excerpted from the Rookie Tutorial (Runoob) regular expression tutorial.


This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
