Regular expression syntax and common examples

Source: Internet
Author: User
Tags: alphabetic character, character set, numeric, lowercase, printable characters, regular expression

Brief Introduction

A regular expression is a string made up of ordinary characters (for example, the letters a through z) and special characters (called "metacharacters"). It works as a logical formula for operating on strings: predefined characters and combinations of them form a "rule string", and that rule string expresses a filtering logic to apply to other strings.
A regular expression is a text pattern that describes one or more strings to match when searching for text.
Regular expressions are often used to find and replace text that conforms to a pattern (rule).

Syntax

Ordinary characters

Ordinary characters include all printable and nonprinting characters that are not explicitly designated as metacharacters. This includes all uppercase and lowercase letters, all digits, all punctuation marks, and some other symbols.

Non-printable characters

Nonprinting characters can also be part of a regular expression. The following table lists the escape sequences that represent nonprinting characters:

Character  Description
\f         Matches a form feed character
\n         Matches a line feed (newline) character
\r         Matches a carriage return character
\t         Matches a tab
\v         Matches a vertical tab
\s         Matches any whitespace character, including spaces, tabs, form feeds, and so on
\S         Matches any non-whitespace character

Special characters

Special characters are characters that carry a special meaning, such as '*', which acts as a quantifier instead of standing for itself. To match a literal '*', escape it by preceding it with the '\' escape character: '\*'.
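
As a minimal sketch of escaping in Python's standard `re` module, the snippet below compares an unescaped and an escaped dot and then matches a few literal metacharacters. The sample string `price` and the variable names are assumptions made up for illustration.

```python
import re

# Hypothetical sample text containing several metacharacters as literals.
price = "Total: $5.00 (approx.) for 2*3 items"

# Unescaped, '.' matches any character, so "5x00" also matches.
loose = re.compile(r"5.00")
# Escaped, '\.' matches only a literal dot.
strict = re.compile(r"5\.00")

loose_hits = [bool(loose.search(s)) for s in ("5.00", "5x00")]
strict_hits = [bool(strict.search(s)) for s in ("5.00", "5x00")]

# '\$', '\(', '\)' and '\*' match the literal symbols.
literal_dollar = re.search(r"\$\d+\.\d+", price).group()
literal_parens = re.search(r"\(.*\)", price).group()
literal_star = re.search(r"\d\*\d", price).group()

# re.escape builds an escaped pattern from an arbitrary literal string.
escaped = re.escape("2*3")

print(loose_hits, strict_hits)
print(literal_dollar, literal_parens, literal_star)
```

When the text to match comes from user input, `re.escape` is usually safer than escaping by hand.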

The following table lists the special characters in the regular expression:

Special character  Description
^   Matches the start position of the string. Inside a bracket expression it instead negates the character set: for example, [^a] matches any character other than 'a'. To match the ^ character itself, use \^
$   Matches the end position of the string. To match the $ character itself, use \$
()  Mark the beginning and end of a subexpression. To match the parentheses themselves, use \( and \)
[   Marks the beginning of a bracket expression. To match [ itself, use \[
{   Marks the beginning of a quantifier expression. To match { itself, use \{
|   Indicates a choice between two alternatives. To match | itself, use \|
\   Marks the next character as a special character, a literal character, a backreference, or an octal escape. For example, 'n' matches the character 'n', '\n' matches a newline character, and '\\' matches '\'
.   Matches any single character except the newline character \n. To match . itself, use \.
?   Quantifier: matches the preceding subexpression 0 or 1 times, or marks a quantifier as non-greedy. To match ? itself, use \?
*   Quantifier: matches the preceding subexpression 0 or more times. To match * itself, use \*
+   Quantifier: matches the preceding subexpression 1 or more times (at least once). To match + itself, use \+

Quantifiers

A quantifier specifies how many times a given component of a regular expression must occur for a match to succeed. The quantifiers for regular expressions are:

Character  Description
{n}    n is a non-negative integer. Matches exactly n times. For example, 'o{2}' cannot match the 'o' in 'Bob', but matches the two o's in 'food'
{n,}   n is a non-negative integer. Matches at least n times. For example, 'o{2,}' cannot match the 'o' in 'Bob' but matches all the o's in 'foooood'. 'o{1,}' is equivalent to 'o+'. 'o{0,}' is equivalent to 'o*'
{n,m}  m and n are non-negative integers with n <= m. Matches at least n times and at most m times. For example, 'o{1,3}' matches the first three o's in 'fooooood'. 'o{0,1}' is equivalent to 'o?'. Note that there must be no space between the comma and the two numbers
?      Matches the preceding subexpression 0 or 1 times. For example, 'do(es)?' matches 'do', 'does' in 'does', and 'do' in 'doxy'. Equivalent to {0,1}
*      Matches the preceding subexpression 0 or more times. For example, 'zo*' matches 'z' and 'zoo'. Equivalent to {0,}
+      Matches the preceding subexpression 1 or more times. For example, 'zo+' matches 'zo' and 'zoo', but cannot match 'z'. Equivalent to {1,}

Note: '*' and '+' are greedy: they match as much text as possible. Adding a '?' after them makes the match non-greedy (minimal).
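
The quantifier examples in the table can be checked with a short Python sketch like the one below. It uses only the standard `re` module; the variable names are chosen here for illustration.

```python
import re

# {n}: exactly n times. 'o{2}' needs two consecutive o's.
no_match = re.search(r"o{2}", "Bob")            # only one 'o', so no match
exact = re.search(r"o{2}", "food").group()

# {n,}: at least n times; {n,m}: at least n and at most m times.
at_least = re.search(r"o{2,}", "foooood").group()
bounded = re.search(r"o{1,3}", "fooooood").group()

# ?, * and + are shorthand for {0,1}, {0,} and {1,}.
optional = [m.group() for m in re.finditer(r"do(es)?", "do does doxy")]
star = [bool(re.fullmatch(r"zo*", s)) for s in ("z", "zo", "zoo")]
plus = [bool(re.fullmatch(r"zo+", s)) for s in ("z", "zo", "zoo")]

print(no_match, exact, at_least, bounded)
print(optional, star, plus)
```

`re.finditer` with `m.group()` is used for 'do(es)?' because `re.findall` would return only the contents of the capturing group instead of the whole match.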

For example, consider an HTML document that contains an H1 element of the form <H1>heading text</H1>.

Greedy mode: the following expression matches everything from the opening less-than sign (<) to the greater-than sign (>) that closes the H1 end tag.

/<.*>/

Non-greedy mode: if you only need to match the individual start and end H1 tags, the following non-greedy expression stops at the first greater-than sign, so its first match is just <H1>.

/<.*?>/

If you want to match only the starting H1 tag, the expression is:

/<\w+?>/
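
The three expressions above can be compared side by side in Python. This is a sketch only: the sample element in `html` is a made-up assumption, and the patterns are written as Python raw strings without the surrounding `/.../` delimiters.

```python
import re

# Hypothetical sample element; the heading text is an assumption.
html = "<H1>Chapter 1</H1>"

# Greedy: '.*' runs to the last '>' on the line.
greedy = re.search(r"<.*>", html).group()

# Non-greedy: '.*?' stops at the first '>' each time.
lazy = re.findall(r"<.*?>", html)

# Only word characters between '<' and '>', so the end tag (with '/') is skipped.
opening = re.findall(r"<\w+?>", html)

print(greedy)   # <H1>Chapter 1</H1>
print(lazy)     # ['<H1>', '</H1>']
print(opening)  # ['<H1>']
```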

Locator characters

Locators (also called anchors) let you pin a regular expression to the beginning or end of a line. They also let you write regular expressions that match within a word, at the beginning of a word, or at the end of a word.

Locator characters describe the boundaries of a string or a word.

Character  Description
^    Matches the start position of the string
$    Matches the end position of the string
\b   Matches a word boundary, that is, the position between a word and a space. For example, 'er\b' matches the 'er' in 'never', but not the 'er' in 'verb'
\B   Matches a non-word boundary. For example, 'er\B' matches the 'er' in 'verb', but not the 'er' in 'never'

Note: Quantifiers cannot be applied to locator characters. Because there cannot be more than one position immediately before or after a line break or word boundary, expressions such as ^* are not allowed.
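
The locator characters in the table can be tried out with the following Python sketch. The sample words come from the table; the two-line string `lines` is an assumption added for illustration. In Python, ^ and $ anchor to each line only when the `re.MULTILINE` flag is passed.

```python
import re

# \b: 'er' must end at a word boundary; \B: it must not.
at_boundary = [bool(re.search(r"er\b", w)) for w in ("never", "verb")]
not_boundary = [bool(re.search(r"er\B", w)) for w in ("never", "verb")]

# Hypothetical two-line sample text.
lines = "first line\nsecond line"

# Without flags, ^ anchors only to the start of the whole string.
only_start = re.findall(r"^\w+", lines)

# With re.MULTILINE, ^ and $ also anchor to the start and end of each line.
starts = re.findall(r"^\w+", lines, flags=re.MULTILINE)
ends = re.findall(r"\w+$", lines, flags=re.MULTILINE)

print(at_boundary, not_boundary)
print(only_start, starts, ends)
```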

To match the text at the beginning of a line of text, use the ^ character at the beginning of the regular expression. Do not confuse this usage of ^ with the usage within the bracket expression.

To match the text at the end of a line of text, use the $ character at the end of the regular expression.

Other metacharacters

Character  Description
x|y     Matches x or y. For example, 'z|food' matches 'z' or 'food'
[xyz]   Character set. Matches any one of the enclosed characters. For example, '[abc]' matches the 'a' in 'plain'
[^xyz]  Negated character set. Matches any character that is not enclosed. For example, '[^abc]' matches 'p', 'l', 'i', 'n' in 'plain'
[a-z]   Character range. Matches any character within the specified range. For example, '[a-z]' matches any lowercase letter in the range 'a' to 'z'
[^a-z]  Negated character range. Matches any character that is not in the specified range. For example, '[^a-z]' matches any character that is not in the range 'a' to 'z'
\d      Matches a digit. Equivalent to [0-9]
\D      Matches a non-digit character. Equivalent to [^0-9]
\w      Matches a letter, digit, or underscore. Equivalent to [A-Za-z0-9_]
\W      Matches any character that is not a letter, digit, or underscore. Equivalent to [^A-Za-z0-9_]
\s      Matches any whitespace character, including spaces, tabs, form feeds, and so on. Equivalent to [ \f\n\r\t\v]
\S      Matches any non-whitespace character. Equivalent to [^ \f\n\r\t\v]
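
A short Python sketch of the classes in the table follows. The string `sample` is an assumption made up for illustration. One caveat for Python specifically: with `str` patterns, \d, \w and \s also match non-ASCII Unicode characters by default, so the bracket equivalences in the table hold exactly only when the `re.ASCII` flag is used.

```python
import re

# Character sets and negated sets, using the word from the table.
in_set = re.findall(r"[abc]", "plain")
not_in_set = re.findall(r"[^abc]", "plain")

# Hypothetical sample text.
sample = "Room 42, Floor_3!"

digits = re.findall(r"\d+", sample, flags=re.ASCII)
words = re.findall(r"\w+", sample, flags=re.ASCII)
non_words = re.findall(r"\W+", sample, flags=re.ASCII)
lowercase_runs = re.findall(r"[a-z]+", sample)
whitespace = re.findall(r"\s", "a b\tc\n")

# Alternation: the whole left side or the whole right side.
either = re.findall(r"z|food", "zoo food")

print(in_set, not_in_set)
print(digits, words, non_words, lowercase_runs)
print(whitespace, either)
```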

Commonly used regular expressions

Validating numbers

A. Digits only: ^[0-9]*$
B. Exactly n digits: ^\d{n}$
C. Between m and n digits: ^\d{m,n}$
D. Zero, or a number that does not start with 0: ^(0|[1-9][0-9]*)$
E. Positive integer: ^[1-9]\d*$
F. Negative integer: ^-[1-9]\d*$
G. Integer (excluding 0): ^-?[1-9]\d*$

Validating characters

A. Chinese characters: ^[\u4e00-\u9fa5]{0,}$
B. Letters and digits: ^[A-Za-z0-9]+$

Special cases

A. Email address: ^\w+([-+.]\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*$
B. Domain name: [a-zA-Z0-9][-a-zA-Z0-9]{0,62}(\.[a-zA-Z0-9][-a-zA-Z0-9]{0,62})+\.?
C. Internet URL: [a-zA-Z]+://[^\s]* or ^http://([\w-]+\.)+[\w-]+(/[\w./?%&=-]*)?$
D. Mobile phone number: ^(13[0-9]|14[57]|15[0-35-9]|18[0-35-9])\d{8}$
E. ID card number (15 or 18 digits; the last character of an 18-digit number is a check digit that may be a digit or the letter X): (^\d{15}$)|(^\d{18}$)|(^\d{17}(\d|X|x)$)
F. Date format: ^\d{4}-\d{1,2}-\d{1,2}
G. Tencent QQ number: [1-9][0-9]{4,}
H. IP address: (?:(?:25[0-5]|2[0-4]\d|[01]?\d?\d)\.){3}(?:25[0-5]|2[0-4]\d|[01]?\d?\d)
I. China ZIP code (6 digits): [1-9]\d{5}(?!\d)

Python Regular Expressions

The Python Basics tutorial on Rookie Tutorial has a section devoted to Python regular expressions.
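
As a rough illustration of how a few of the commonly used patterns might be applied in Python, the sketch below wraps them in a small helper. The dictionary `PATTERNS`, the helper `is_valid`, and the test values are all hypothetical names introduced here, not part of any library. The helper uses `re.fullmatch`, so the whole input must match even for patterns that lack ^ or $.

```python
import re

# A selection of the commonly used patterns, written as raw strings.
PATTERNS = {
    "integer": r"^-?[1-9]\d*$",
    "email": r"^\w+([-+.]\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*$",
    "date": r"^\d{4}-\d{1,2}-\d{1,2}",
    "ipv4": r"(?:(?:25[0-5]|2[0-4]\d|[01]?\d?\d)\.){3}"
            r"(?:25[0-5]|2[0-4]\d|[01]?\d?\d)",
    "zip_cn": r"[1-9]\d{5}(?!\d)",
}


def is_valid(kind: str, value: str) -> bool:
    """Return True if the whole of `value` matches the named pattern."""
    # re.ASCII limits \d and \w to ASCII digits and word characters.
    return re.fullmatch(PATTERNS[kind], value, flags=re.ASCII) is not None


if __name__ == "__main__":
    for kind, value in [
        ("integer", "-42"),
        ("email", "user.name+tag@example.co.uk"),
        ("date", "2024-2-9"),
        ("ipv4", "192.168.1.1"),
        ("ipv4", "256.1.1.1"),
        ("zip_cn", "100000"),
    ]:
        print(kind, value, is_valid(kind, value))
```

These patterns check only the shape of the input. For example, the date pattern accepts 2024-02-30, so values that must be semantically valid still need a further check.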

The above is excerpted from Rookie Tutorial: Regular Expression Tutorial.
