Regular Expressions (4): Examples and Matching Rules

Source: Internet
Author: User
Tags: alphanumeric characters

Some examples

Regular Expression                              Description
/\b([a-z]+) \1\b/gi                             Finds a word that appears twice in a row
/(\w+):\/\/([^/:]+)(:\d*)?([^# ]*)/             Parses a URL into protocol, domain, port, and relative path
/^(?:Chapter|Section) [1-9][0-9]{0,1}$/         Locates a chapter or section heading numbered 1 to 99
/[-a-z]/                                        The 26 letters a to z, plus the minus sign
/ter\b/                                         Matches "chapter" but not "terminal"
/\Bapt/                                         Matches "chapter" but not "aptitude"
/Windows (?=95|98|NT)/                          Matches the "Windows " in "Windows 95", "Windows 98", or "Windows NT"

In the last example the lookahead (?=...) does not consume any characters, so after a match is found, the search for the next match starts right after "Windows ".
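
A few of the patterns in the table can be tried out directly. The sketch below uses Python's re module instead of JScript, so the /.../gi literal becomes re.compile with re.IGNORECASE. The sample sentences are made up for illustration; the URL is the one quoted in the References section.

```python
import re

# Repeated word: \b([a-z]+) \1\b, with the "i" flag written as re.IGNORECASE.
repeated_word = re.compile(r"\b([a-z]+) \1\b", re.IGNORECASE)
doubled = [m.group(1) for m in
           repeated_word.finditer("Is is the cost of of gasoline going up up?")]

# URL parsing: protocol, domain, optional port, relative path.
url = re.compile(r"(\w+)://([^/:]+)(:\d*)?([^# ]*)")
parts = url.search("http://www.contoso.com:8080/letters/readme.html").groups()

# Word boundaries: ter\b needs "ter" at the end of a word;
# \Bapt needs "apt" to begin somewhere other than a word boundary.
ter_end = re.compile(r"ter\b")
apt_inside = re.compile(r"\Bapt")

# Lookahead: match "Windows " only when 95, 98, or NT follows.
# The lookahead consumes nothing, so the match ends right after "Windows ".
windows = re.compile(r"Windows (?=95|98|NT)")
hit = windows.search("Windows NT")

print(doubled)
print(parts)
print(bool(ter_end.search("chapter")), bool(ter_end.search("terminal")))
print(bool(apt_inside.search("chapter")), bool(apt_inside.search("aptitude")))
print(repr(hit.group(0)), hit.end())
```

Because the lookahead is not part of the match, hit.group(0) holds only "Windows " and the next search would resume at hit.end().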

Matching rules

7.1 Basic pattern matching
Let's start with the basics. Patterns are the most basic element of a regular expression: a set of characters that describes a string. A pattern can be very simple, made up of ordinary characters, or very complex, often using special characters to indicate a range of characters, repetition, or context. For example:

^once

This pattern contains the special character ^, which means the pattern matches only strings that start with "once". For example, it matches the string "once upon a time" but does not match "There once was a man from New York". Just as ^ anchors a pattern to the start of a string, $ matches strings that end with the given pattern.

bucket$

This pattern matches "Who kept all of his cash in a bucket" but does not match "buckets". When ^ and $ are used together, they require an exact match (the whole string must be the same as the pattern). For example:

^bucket$

matches only the string "bucket". If a pattern includes neither ^ nor $, it matches any string that contains the pattern. For example, the pattern

once

matches the string

There once was a man from New York
Who kept all of his cash in a bucket.

because the string contains "once".
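
The effect of the two anchors can be checked with a short script. This is a minimal sketch in Python (the re.search call looks for the pattern anywhere in the string unless an anchor restricts it); the helper name matching and the sample list are made up for illustration.

```python
import re

lines = [
    "once upon a time",
    "There once was a man from New York",
    "Who kept all of his cash in a bucket",
    "buckets",
    "bucket",
]

def matching(pattern):
    """Return the sample lines in which the pattern is found."""
    return [line for line in lines if re.search(pattern, line)]

starts_with_once = matching(r"^once")    # anchored at the start
ends_with_bucket = matching(r"bucket$")  # anchored at the end
exactly_bucket = matching(r"^bucket$")   # anchored at both ends
contains_once = matching(r"once")        # no anchors: found anywhere

for name, result in [("^once", starts_with_once), ("bucket$", ends_with_bucket),
                     ("^bucket$", exactly_bucket), ("once", contains_once)]:
    print(name, "->", result)
```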
In this pattern, the letters (o-n-c-e) are literal characters: each one stands for itself. Digits are literal too. Some other characters, such as certain punctuation marks and whitespace (spaces, tabs, and so on), are written with escape sequences. Every escape sequence starts with a backslash (\). The escape sequence for a tab is \t, so to check whether a string starts with a tab character, use this pattern:

^\t

Similarly, \n represents a newline and \r represents a carriage return. Other special characters can be escaped by putting a backslash in front of them: the backslash itself is written \\, a period is written \., and so on.
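
As a small illustration of these escape sequences, the following Python sketch uses raw strings (r"...") so that the backslashes reach the regex engine unchanged. The variable names and test strings are assumptions chosen for the example.

```python
import re

# ^\t : the string must start with a tab character.
starts_with_tab = re.compile(r"^\t")

# \\ matches one literal backslash; \. matches one literal period.
backslash = re.compile(r"\\")
literal_dot = re.compile(r"\.")

# Unescaped, the period is special: it matches any character except a newline.
any_char = re.compile(r".")

print(bool(starts_with_tab.search("\tindented")))     # line begins with a tab
print(bool(starts_with_tab.search("not indented")))   # no leading tab
print(bool(backslash.search("C:\\temp")))             # contains a backslash
print(bool(literal_dot.search("abc")), bool(any_char.search("abc")))
```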
7.2 Character clusters
In Internet programs, regular expressions are typically used to validate user input. When a user submits a form, plain literal patterns are not enough to determine whether the phone number, address, e-mail address, or credit card number entered is valid.
We therefore need a more flexible way to describe the pattern we want: the character cluster. To create a character cluster that represents all vowels, put the vowels inside square brackets:

[AaEeIiOoUu]

This pattern matches any vowel, but it stands for only one character. A hyphen can be used to indicate a range of characters, for example:

[a-z] // matches any lowercase letter
[A-Z] // matches any uppercase letter
[a-zA-Z] // matches any letter
[0-9] // matches any digit
[0-9\.\-] // matches any digit, period, or minus sign
[ \f\r\t\n] // matches any whitespace character
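
To see that each cluster stands for exactly one character, the sketch below lists every single-character match in a sample string. It uses Python's re.findall; the sample text is invented for the example.

```python
import re

sample = "R2-D2 costs 9.50\tnow"

vowels = re.findall(r"[AaEeIiOoUu]", sample)
lowercase = re.findall(r"[a-z]", sample)
uppercase = re.findall(r"[A-Z]", sample)
digits = re.findall(r"[0-9]", sample)
number_chars = re.findall(r"[0-9\.\-]", sample)   # digit, period, or minus sign
whitespace = re.findall(r"[ \f\r\t\n]", sample)   # space, form feed, CR, tab, LF

# Every result is a list of one-character strings.
for name, found in [("vowels", vowels), ("lowercase", lowercase),
                    ("uppercase", uppercase), ("digits", digits),
                    ("number_chars", number_chars), ("whitespace", whitespace)]:
    print(name, found)
```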

Again, each of these represents only a single character, which is important to remember. To match a string made up of one lowercase letter followed by one digit, such as "z2", "t6", or "g7", but not "ab2", "r2d3", or "b52", use this pattern:

^[a-z][0-9]$

Although [a-z] covers a range of 26 letters, here it matches only a single character: the first character of the string must be a lowercase letter.
Earlier, ^ was used to mark the start of a string, but it has a second meaning. When ^ appears as the first character inside square brackets, it means "not" or "exclude", and it is often used to rule out certain characters. Returning to the previous example, suppose the first character must not be a digit:

^[^0-9][0-9]$

This pattern matches "&5", "g7", and "-2", but does not match "12" or "66". The following examples exclude specific characters:

[^a-z] // any character except a lowercase letter
[^\\\/\^] // any character except (\), (/), and (^)
[^\"\'] // any character except a double quote (") or a single quote (')
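
The two anchored patterns and the negated clusters can be compared side by side. This is a minimal Python sketch; the helper accepted and the test strings for the negated clusters are assumptions made for illustration.

```python
import re

letter_digit = re.compile(r"^[a-z][0-9]$")      # one lowercase letter, one digit
nondigit_digit = re.compile(r"^[^0-9][0-9]$")   # one non-digit, one digit

def accepted(pattern, candidates):
    """Return the candidates that the compiled pattern matches."""
    return [s for s in candidates if pattern.search(s)]

first = accepted(letter_digit, ["z2", "t6", "g7", "ab2", "r2d3", "b52"])
second = accepted(nondigit_digit, ["&5", "g7", "-2", "12", "66"])

# Negated clusters match any single character NOT listed in the brackets.
not_lowercase = re.findall(r"[^a-z]", "ab-3 c")
# Delete everything except backslash, slash, and caret.
only_specials = re.sub(r"[^\\/^]", "", r"a\b/c^d")

print(first)
print(second)
print(not_lowercase)
print(only_specials)
```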

The special character "." (dot, period) in a regular expression represents any character except a newline. So the pattern ^.5$ matches any two-character string that ends with the digit 5 and starts with any character other than a newline. The pattern . matches any string except the empty string and a string consisting of only a newline.
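
The behavior of the dot can be confirmed with a few cases. The sketch below is in Python, where the dot also excludes the newline by default; the test strings are made up.

```python
import re

dot_five = re.compile(r"^.5$")   # any non-newline character, then the digit 5
any_one = re.compile(r".")       # any single non-newline character

two_char_hits = [s for s in ["a5", "55", "#5", "\n5", "ab5", "5"]
                 if dot_five.search(s)]

print(two_char_hits)
print(any_one.search("") is None)     # the empty string has nothing to match
print(any_one.search("\n") is None)   # a lone newline is excluded too
print(any_one.search("x\n") is not None)
```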
PHP regular expressions provide some built-in general-purpose character clusters:

Character cluster   Meaning
[[:alpha:]]         Any letter
[[:digit:]]         Any digit
[[:alnum:]]         Any letter or digit
[[:space:]]         Any whitespace character
[[:upper:]]         Any uppercase letter
[[:lower:]]         Any lowercase letter
[[:punct:]]         Any punctuation mark
[[:xdigit:]]        Any hexadecimal digit, equivalent to [0-9a-fA-F]
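
These bracket names come from the POSIX regex flavor, and Python's re module does not recognize them. As a rough sketch only, the table can be approximated with explicit clusters. The dictionary POSIX_EQUIVALENTS and the helper is_all are hypothetical names, and the mapping assumes plain ASCII text.

```python
import re
import string

# Hypothetical lookup table: ASCII-only stand-ins for the POSIX names above.
POSIX_EQUIVALENTS = {
    "alpha": "[A-Za-z]",
    "digit": "[0-9]",
    "alnum": "[A-Za-z0-9]",
    "space": r"[ \t\n\r\f\v]",
    "upper": "[A-Z]",
    "lower": "[a-z]",
    "punct": "[" + re.escape(string.punctuation) + "]",
    "xdigit": "[0-9A-Fa-f]",
}

def is_all(posix_name, text):
    """True if text is non-empty and every character is in the named cluster."""
    return re.fullmatch(POSIX_EQUIVALENTS[posix_name] + "+", text) is not None

print(is_all("alpha", "Regex"), is_all("alpha", "R2"))
print(is_all("xdigit", "DEADbeef"), is_all("xdigit", "xyz"))
print(is_all("punct", "!?;"), is_all("space", " \t\n"))
```

In a POSIX-aware engine the bracket names would be used directly, and their meaning can vary with the locale, so this mapping is an approximation rather than an exact equivalent.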

7.3 Detecting repeated occurrences
So far you know how to match a single letter or digit, but more often you will want to match a word or a number. A word consists of several letters, and a number consists of several digits. Curly braces ({}) placed after a character or character cluster specify how many times the preceding content must occur.

Pattern               Meaning
^[a-zA-Z_]$           Any single letter or underscore
^[[:alpha:]]{3}$      Any three-letter word
^a$                   The letter a
^a{4}$                aaaa
^a{2,4}$              aa, aaa, or aaaa
^a{1,3}$              a, aa, or aaa
^a{2,}$               A string of two or more a's
^a{2,}                For example aardvark and aaab, but not apple
a{2,}                 For example baad and aaa, but not Nantucket
\t{2}                 Two tabs
.{2}                  Any two characters
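
Several rows of the table can be verified against a word list. This is a minimal Python sketch; the helper matches and the word list are assumptions, with the words taken from the table's own examples.

```python
import re

words = ["a", "aa", "aaa", "aaaa", "aaaaa",
         "aardvark", "aaab", "apple", "baad", "Nantucket"]

def matches(pattern):
    """Return the words in which the pattern is found."""
    return [w for w in words if re.search(pattern, w)]

exactly_four = matches(r"^a{4}$")    # exactly four a's and nothing else
two_to_four = matches(r"^a{2,4}$")   # between two and four a's
only_as = matches(r"^a{2,}$")        # two or more a's and nothing else
starts_aa = matches(r"^a{2,}")       # begins with at least two a's
contains_aa = matches(r"a{2,}")      # at least two a's in a row, anywhere

for label, result in [("^a{4}$", exactly_four), ("^a{2,4}$", two_to_four),
                      ("^a{2,}$", only_as), ("^a{2,}", starts_aa),
                      ("a{2,}", contains_aa)]:
    print(label, "->", result)
```

Dropping the $ and then the ^ widens the set of accepted strings step by step, which is why "aardvark" and then "baad" appear in the later results.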

These examples show three different uses of curly braces. A single number, {x}, means "the preceding character or character cluster appears exactly x times"; a number followed by a comma, {x,}, means "the preceding content appears x or more times"; and two numbers separated by a comma, {x,y}, means "the preceding content appears at least x times but no more than y times". The patterns can be extended to cover more words or numbers:

^[a-zA-Z0-9_]{1,}$ // any string of one or more letters, digits, or underscores
^[0-9]{1,}$ // any non-negative integer
^\-{0,1}[0-9]{1,}$ // any integer
^\-{0,1}[0-9]{0,}\.{0,1}[0-9]{0,}$ // any decimal number

The last example is hard to read, isn't it? Look at it this way: the string starts (^) with an optional minus sign (\-{0,1}), followed by zero or more digits ([0-9]{0,}), an optional decimal point (\.{0,1}), zero or more digits again ([0-9]{0,}), and nothing else ($). A simpler notation is described next.
The special character "?" is equivalent to {0,1}: both mean "zero or one of the preceding content", or "the preceding content is optional". So the example above can be simplified to:

^\-?[0-9]{0,}\.?[0-9]{0,}$

The special character "*" is equivalent to {0,}: both mean "zero or more of the preceding content". Finally, "+" is equivalent to {1,}, meaning "one or more of the preceding content". So the four examples above can be written as:

^[a-zA-Z0-9_]+$ // any string of one or more letters, digits, or underscores
^[0-9]+$ // any non-negative integer
^\-?[0-9]+$ // any integer
^\-?[0-9]*\.?[0-9]*$ // any decimal number
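
One way to convince yourself that the shorthand forms behave like the brace forms is to run both against the same inputs. The Python sketch below does that; the names PAIRS, SAMPLES, and verdicts are made up for the example, and the sample inputs are assumptions.

```python
import re

# Each pair holds the brace form and its shorthand form.
PAIRS = [
    (r"^[a-zA-Z0-9_]{1,}$", r"^[a-zA-Z0-9_]+$"),
    (r"^[0-9]{1,}$", r"^[0-9]+$"),
    (r"^\-{0,1}[0-9]{1,}$", r"^\-?[0-9]+$"),
    (r"^\-{0,1}[0-9]{0,}\.{0,1}[0-9]{0,}$", r"^\-?[0-9]*\.?[0-9]*$"),
]
SAMPLES = ["user_42", "42", "-42", "3.14", "-.5", "", "-", "1.2.3", "abc!"]

def verdicts(pattern):
    """Return one True/False per sample: does the pattern match it?"""
    return [re.search(pattern, s) is not None for s in SAMPLES]

# True only if every shorthand agrees with its brace form on every sample.
same = all(verdicts(brace_form) == verdicts(short_form)
           for brace_form, short_form in PAIRS)

decimal = re.compile(PAIRS[3][1])
decimal_hits = [s for s in SAMPLES if decimal.search(s)]

print(same)
print(decimal_hits)
```

Because every part of the decimal pattern is optional, it also accepts the empty string and a lone "-", so a stricter pattern would be needed for real input validation.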

Of course, this does not technically reduce the complexity of regular expressions, but it can make them easier to read.

8. References
Regular Expressions of JScript and VBScript

Microsoft MSDN examples (in English):

Scanning for HREFS
Provides an example that searches an input string and prints out all the href = "..." values and their locations in the string.
Changing Date Formats
Provides an example that replaces dates of the form mm/dd/yy with dates of the form dd-mm-yy.
Extracting URL Information
Provides an example that extracts a protocol and port number from a string containing a URL. For example, "http://www.contoso.com:8080/letters/readme.html" returns "http:8080".
Cleaning an Input String
Provides an example that strips invalid non-alphanumeric characters from a string.
Confirming Valid E-Mail Format
Provides an example that you can use to verify that a string is in valid e-mail format.
