Regular expression syntax and common examples

Source: Internet
Author: User
Introduction

A regular expression is a logical formula for operating on strings: a "rule string" built from ordinary characters (for example, the letters a through z) and special characters (called "metacharacters") that together express a filtering logic for strings.
A regular expression is a text pattern that describes one or more strings to match when searching for text.
Regular expressions are often used to retrieve and replace text that conforms to a pattern (rule).

Syntax

Ordinary characters

Ordinary characters are all printable and non-printable characters that are not explicitly designated as metacharacters. This includes all uppercase and lowercase letters, all digits, all punctuation marks, and some other symbols.

Non-printable characters

Nonprinting characters can also be part of a regular expression. The following table lists the escape sequences that represent nonprinting characters:

Character   Description
\f          Matches a form feed (page break)
\n          Matches a line feed (newline)
\r          Matches a carriage return
\t          Matches a tab
\v          Matches a vertical tab
\s          Matches any whitespace character, including spaces, tabs, form feeds, and so on
\S          Matches any non-whitespace character

Special characters

Special characters are characters that carry a special meaning. The '*' character, for example, means that the preceding element may repeat any number of times. To match a literal '*', you must escape it by placing the '\' escape character in front of it: '\*'.
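The effect of escaping can be sketched in Python, whose `re` module follows the conventions described here. The sample string is made up for illustration, and `re.escape` is a standard-library helper that inserts the backslashes automatically.

```python
import re

text = "price: 3*4 = 12"  # hypothetical sample text

# Unescaped, '*' is a qualifier applied to '3', so "3*4" means
# "zero or more 3s followed by a 4". Here it matches only the "4".
unescaped = re.search(r"3*4", text)

# Escaped, r"3\*4" matches the three literal characters "3*4".
escaped = re.search(r"3\*4", text)

# re.escape() adds the backslash in front of the special character.
auto_escaped = re.escape("3*4")

print(unescaped.group())  # 4
print(escaped.group())    # 3*4
print(auto_escaped)       # 3\*4
```

Building a pattern from user input is the usual reason to reach for `re.escape` rather than escaping by hand.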

The following table lists the special characters in the regular expression:

Character   Description
^           Matches the start position of the input string, unless it is used inside a bracket expression, where it negates the character set. For example, [^a] matches any character other than a. To match the ^ character itself, use \^
$           Matches the end position of the input string. To match the $ character itself, use \$
( )         Mark the start and end of a subexpression. To match these characters themselves, use \( and \)
[           Marks the beginning of a bracket expression. To match [ itself, use \[
{           Marks the beginning of a qualifier expression. To match { itself, use \{
|           Indicates a choice between two items. To match | itself, use \|
\           Marks the next character as a special character, a literal character, a backreference, or an octal escape. For example, 'n' matches the character 'n', '\n' matches a line break, and '\\' matches '\'
.           Matches any single character except the newline \n. To match . itself, use \.
?           Qualifier: matches the preceding subexpression 0 or 1 times, or marks the preceding qualifier as non-greedy. To match ? itself, use \?
*           Qualifier: matches the preceding subexpression 0 or more times. To match * itself, use \*
+           Qualifier: matches the preceding subexpression 1 or more times (at least once). To match + itself, use \+

Qualifiers

Qualifiers are used to specify how many times a given component of a regular expression will appear to satisfy a match. The qualifiers for a regular expression are:

Character   Description
{n}         n is a non-negative integer. Matches exactly n times. For example, 'o{2}' cannot match the 'o' in 'Bob', but matches the two o's in 'food'
{n,}        n is a non-negative integer. Matches at least n times. For example, 'o{2,}' cannot match the 'o' in 'Bob', but matches all the o's in 'foooood'. 'o{1,}' is equivalent to 'o+', and 'o{0,}' is equivalent to 'o*'
{n,m}       m and n are non-negative integers, where n <= m. Matches at least n and at most m times. For example, 'o{1,3}' matches the first three o's in 'fooooood'. 'o{0,1}' is equivalent to 'o?'. Note that there must be no spaces between the comma and the two numbers
?           Matches the preceding subexpression 0 or 1 times. For example, 'do(es)?' matches 'do', the 'does' in 'does', and the 'do' in 'doxy'. Equivalent to {0,1}
*           Matches the preceding subexpression 0 or more times. For example, 'zo*' matches 'z' and 'zoo'. Equivalent to {0,}
+           Matches the preceding subexpression 1 or more times. For example, 'zo+' matches 'zo' and 'zoo', but not 'z'. Equivalent to {1,}
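
The examples in the table can be checked with a short Python sketch. The variable names are chosen for illustration only, and `(?:...)` is used instead of `(...)` so that `re.findall` returns the whole match rather than the contents of the group.

```python
import re

# {n}: exactly n repetitions.
exact_bob = re.search(r"o{2}", "Bob")              # None: only one 'o'
exact_food = re.search(r"o{2}", "food").group()    # 'oo'

# {n,}: at least n repetitions (greedy, so it takes every 'o').
at_least = re.search(r"o{2,}", "foooood").group()  # 'ooooo'

# {n,m}: between n and m repetitions; at most three are consumed.
bounded = re.search(r"o{1,3}", "fooooood").group() # 'ooo'

# ?: the group "es" is optional.
optional = re.findall(r"do(?:es)?", "do does doxy")  # ['do', 'does', 'do']

# * accepts zero repetitions, + requires at least one.
star_matches_z = re.fullmatch(r"zo*", "z") is not None  # True
plus_matches_z = re.fullmatch(r"zo+", "z") is not None  # False

print(exact_bob, exact_food, at_least, bounded)
print(optional, star_matches_z, plus_matches_z)
```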

Note: '*' and '+' are greedy: they match as much text as possible. Adding a '?' after either one makes the match non-greedy (minimal).

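For example, suppose you want to match the H1 tag in an HTML document that contains an H1 heading element, that is, an opening <H1> tag, some heading text, and a closing </H1> tag.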

Greedy mode: the following expression matches everything from the opening less-than sign (<) to the greater-than sign (>) that closes the H1 element.

/<.*>/

Non-greedy mode: if you only need to match the opening and closing H1 tags themselves, the following non-greedy expression stops at the first greater-than sign, so its first match is only <H1>.

/<.*?>/

If you want to match only the starting H1 tag, the expression is:

/<\w+?>/
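
The difference between the three expressions can be seen in a small Python sketch. The heading text in `html` is a made-up sample, since any H1 element behaves the same way. The patterns are written as Python raw strings, without the surrounding slashes.

```python
import re

html = "<H1>Chapter 1</H1>"  # hypothetical sample document

# Greedy: .* runs to the last '>' in the string.
greedy = re.search(r"<.*>", html).group()

# Non-greedy: .*? stops at the first '>' it can.
lazy_first = re.search(r"<.*?>", html).group()
lazy_all = re.findall(r"<.*?>", html)

# \w does not match '/', so only the opening tag qualifies.
opening_only = re.findall(r"<\w+?>", html)

print(greedy)        # <H1>Chapter 1</H1>
print(lazy_first)    # <H1>
print(lazy_all)      # ['<H1>', '</H1>']
print(opening_only)  # ['<H1>']
```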

Locators

Locators let you anchor a regular expression to the beginning or end of a line. They also let you write regular expressions that match inside a word, at the beginning of a word, or at the end of a word.

A locator describes a boundary of a string or a word.

Character   Description
^           Matches the start position of the input string
$           Matches the end position of the input string
\b          Matches a word boundary, that is, the position between a word character and a non-word character such as a space. For example, 'er\b' matches the 'er' in 'never' but not the 'er' in 'verb'
\B          Matches a non-word boundary. For example, 'er\B' matches the 'er' in 'verb' but not the 'er' in 'never'
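
A minimal Python sketch of the locators in the table follows. The two-line sample text is an assumption for illustration. In Python, `^` and `$` anchor to the whole string by default and only anchor to each line when the `re.MULTILINE` flag is passed.

```python
import re

# \b requires a word boundary after "er"; \B requires that there is none.
er_end_never = re.search(r"er\b", "never")     # match: "er" ends the word
er_end_verb = re.search(r"er\b", "verb")       # None: "er" is followed by "b"
er_inside_verb = re.search(r"er\B", "verb")    # match
er_inside_never = re.search(r"er\B", "never")  # None

text = "one two\nthree four"  # hypothetical two-line input

# Without MULTILINE, ^ matches only at the very start of the string.
first_default = re.findall(r"^\w+", text)

# With MULTILINE, ^ and $ match at the start and end of every line.
first_per_line = re.findall(r"^\w+", text, re.MULTILINE)
last_per_line = re.findall(r"\w+$", text, re.MULTILINE)

print(first_default)   # ['one']
print(first_per_line)  # ['one', 'three']
print(last_per_line)   # ['two', 'four']
```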

Note: You cannot use qualifiers with locators. Because there cannot be more than one position immediately before or after a newline or word boundary, expressions such as ^* are not allowed.

To match text at the beginning of a line, use the ^ character at the beginning of the regular expression. Do not confuse this usage of ^ with its usage inside a bracket expression.

To match text at the end of a line, use the $ character at the end of the regular expression.

Other metacharacters

Character   Description
x|y         Matches x or y
[xyz]       Character set. Matches any one of the enclosed characters. For example, '[abc]' matches the 'a' in 'plain'
[^xyz]      Negated character set. Matches any character that is not enclosed. For example, '[^abc]' matches the 'p', 'l', 'i', and 'n' in 'plain'
[a-z]       Character range. Matches any character within the specified range. For example, '[a-z]' matches any lowercase letter in the range 'a' to 'z'
[^a-z]      Negated character range. Matches any character that is not in the specified range. For example, '[^a-z]' matches any character that is not in the range 'a' to 'z'
\d          Matches a digit. Equivalent to [0-9]
\D          Matches a non-digit. Equivalent to [^0-9]
\w          Matches a letter, digit, or underscore. Equivalent to [A-Za-z0-9_]
\W          Matches any character other than a letter, digit, or underscore. Equivalent to [^A-Za-z0-9_]
\s          Matches any whitespace character, including spaces, tabs, form feeds, and so on. Equivalent to [ \f\n\r\t\v]
\S          Matches any non-whitespace character. Equivalent to [^ \f\n\r\t\v]
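
The classes in the table can be tried out with the following Python sketch. The sample strings are invented for illustration. One caveat is that, for Python 3 string patterns, `\d` and `\w` also match non-ASCII digits and letters, so the bracket equivalences in the table hold exactly only when the `re.ASCII` flag is passed.

```python
import re

# Alternation, character sets, and negated sets.
either = re.findall(r"z|food", "zood food")          # ['z', 'food']
in_set = re.findall(r"[abc]", "plain")               # ['a']
not_in_set = re.findall(r"[^abc]", "plain")          # ['p', 'l', 'i', 'n']

# Shorthand classes.
numbers = re.findall(r"\d+", "room 101, floor 7")    # ['101', '7']
words = re.findall(r"\w+", "snake_case and CamelCase!")
fields = re.split(r"\s+", "a  b\tc\nd")              # ['a', 'b', 'c', 'd']

# \w is Unicode-aware by default; re.ASCII restricts it to [A-Za-z0-9_].
unicode_word = re.fullmatch(r"\w", "\u00e9") is not None          # True
ascii_word = re.fullmatch(r"\w", "\u00e9", re.ASCII) is not None  # False

print(either, in_set, not_in_set)
print(numbers, words, fields)
print(unicode_word, ascii_word)
```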
Common regular expressions

Verifying numbers
A. Digits: ^[0-9]*$
B. Exactly n digits: ^\d{n}$
C. Between m and n digits: ^\d{m,n}$
D. Zero, or a number that does not start with 0: ^(0|[1-9][0-9]*)$
E. Positive integers: ^[1-9]\d*$
F. Negative integers: ^-[1-9]\d*$
G. Integers (excluding 0): ^-?[1-9]\d*$

Verifying characters
A. Chinese characters: ^[\u4e00-\u9fa5]{0,}$
B. Letters and digits: ^[A-Za-z0-9]+$

Special requirements
A. Email address: ^\w+([-+.]\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*$
B. Domain name: [a-zA-Z0-9][-a-zA-Z0-9]{0,62}(\.[a-zA-Z0-9][-a-zA-Z0-9]{0,62})+\.?
C. Internet URL: [a-zA-Z]+://[^\s]* or ^http://([\w-]+\.)+[\w-]+(/[\w./?%&=-]*)?$
D. Mobile phone number (China): ^(13[0-9]|14[57]|15[012356789]|18[012356789])\d{8}$
E. Identity card number (15 or 18 characters; the last character of an 18-character number is a check digit, which may be a digit or the letter X): (^\d{15}$)|(^\d{18}$)|(^\d{17}(\d|X|x)$)
F. Date format: ^\d{4}-\d{1,2}-\d{1,2}
G. Tencent QQ number: [1-9][0-9]{4,}
H. IP address: ((?:(?:25[0-5]|2[0-4]\d|[01]?\d?\d)\.){3}(?:25[0-5]|2[0-4]\d|[01]?\d?\d))
I. Chinese postal code (6 digits): [1-9]\d{5}(?!\d)

Python regular expressions
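
As a rough sketch of how a few of the validation patterns listed above might be used from Python, the following wraps them in a small helper. The names `PATTERNS` and `is_valid` are made up for this illustration. Using `re.fullmatch` (so that unanchored patterns such as the date format must cover the whole input) and `re.ASCII` (so that `\d` and `\w` mean `[0-9]` and `[A-Za-z0-9_]`) are choices of this sketch, not part of the original list.

```python
import re

# A few patterns transcribed from the lists above (hypothetical selection).
PATTERNS = {
    "positive_integer": r"^[1-9]\d*$",
    "email": r"^\w+([-+.]\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*$",
    "date": r"^\d{4}-\d{1,2}-\d{1,2}",
    "ipv4": r"(?:(?:25[0-5]|2[0-4]\d|[01]?\d?\d)\.){3}"
            r"(?:25[0-5]|2[0-4]\d|[01]?\d?\d)",
    "postcode": r"[1-9]\d{5}(?!\d)",
}


def is_valid(kind: str, value: str) -> bool:
    """Return True if the whole of `value` matches the pattern for `kind`."""
    return re.fullmatch(PATTERNS[kind], value, re.ASCII) is not None


print(is_valid("positive_integer", "42"))        # True
print(is_valid("positive_integer", "042"))       # False
print(is_valid("email", "user.name+tag@example.com"))  # True
print(is_valid("date", "2024-02-29"))            # True
print(is_valid("ipv4", "192.168.0.1"))           # True
print(is_valid("ipv4", "256.1.1.1"))             # False
print(is_valid("postcode", "100000"))            # True
```

These patterns check shape only. The date pattern, for instance, accepts "2024-99-99", so a real application would still need to validate the values themselves.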

The Python Regular Expressions section of the beginner's basic Python tutorial covers regular expressions in Python in detail.

Statement

This article is excerpted from the Rookie Tutorial: regular expression tutorial.
