Regular Expressions: Introduction to Matching Rules

Source: Internet
Author: User

Basic pattern matching

Let's start with the basics. A pattern is the most basic element of a regular expression: a sequence of characters that describes a set of strings. A pattern can be very simple, consisting only of ordinary characters, or very complex, using special characters to indicate ranges of characters, repetition, or context. For example:

^once

This pattern contains the special character ^, which means that it matches only strings that start with "once". For example, it matches the string "once upon a time" but does not match "There once was a man from New York". Just as ^ anchors a pattern to the start of a string, $ anchors it to the end, so it matches strings that end with the given pattern.

bucket$

This pattern matches "Who kept all of his cash in a bucket" but does not match "buckets". When ^ and $ are used together, the pattern requires an exact match (the whole string must be the same as the pattern). For example:

^bucket$

matches only the string "bucket". If a pattern includes neither ^ nor $, it matches any string that contains it. For example, the pattern

once

matches both lines of the text

There once was a man from New York
Who kept all of his cash in a bucket.

only if the line contains "once", so it matches the first line and not the second.
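The anchoring rules above can be tried out with a short sketch. It uses Python's re module as an assumption (the article itself does not name a language at this point), and the helper name matching is hypothetical:

```python
import re

# Sample strings taken from the examples in the text.
samples = [
    "once upon a time",
    "There once was a man from New York",
    "Who kept all of his cash in a bucket",
    "buckets",
    "bucket",
]

def matching(pattern, strings):
    """Return the strings in which the pattern is found somewhere."""
    return [s for s in strings if re.search(pattern, s)]

starts_with_once = matching(r"^once", samples)    # anchored at the start
ends_with_bucket = matching(r"bucket$", samples)  # anchored at the end
exactly_bucket = matching(r"^bucket$", samples)   # anchored at both ends
contains_once = matching(r"once", samples)        # not anchored

print(starts_with_once)
print(ends_with_bucket)
print(exactly_bucket)
print(contains_once)
```

re.search is used rather than re.match so that the unanchored pattern can be found anywhere in the string, which is the behaviour the text describes.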

In this pattern, the letters (o-n-c-e) are literal characters, that is, each stands for itself; digits work the same way. Escape sequences are used for slightly more complicated characters, such as punctuation marks and whitespace characters (spaces, tabs, and so on). All escape sequences start with a backslash (\). The escape sequence for a tab is \t, so to check whether a string starts with a tab character, we can use this pattern:

^\t 

Similarly, \n represents a newline and \r represents a carriage return. Other special characters can be matched literally by putting a backslash in front of them. For example, the backslash itself is written as \\, the period as \., and so on.
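As a minimal sketch of these escape sequences, again assuming Python's re module (the variable names are illustrative only):

```python
import re

# Raw strings (r"...") pass the backslashes through to the regex engine unchanged.
starts_with_tab = re.compile(r"^\t")   # a tab at the start of the string
literal_period = re.compile(r"\.")     # a literal period, not "any character"
literal_backslash = re.compile(r"\\")  # a single literal backslash

tab_result = bool(starts_with_tab.search("\tindented line"))
no_tab_result = bool(starts_with_tab.search("plain line"))
period_result = bool(literal_period.search("file.txt"))
no_period_result = bool(literal_period.search("filetxt"))
backslash_result = bool(literal_backslash.search("C:\\temp"))

print(tab_result, no_tab_result, period_result, no_period_result, backslash_result)
```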

Character Cluster

In Internet applications, regular expressions are commonly used to validate user input. After a user submits a form, the program needs to determine whether the phone number, address, email address, credit card number, and so on are valid, and matching literal characters alone is not enough for that.

We therefore need a more flexible way to describe the pattern we want: the character cluster. To create a character cluster that represents all vowels, put all the vowel characters inside square brackets:

[AaEeIiOoUu]

This pattern matches any vowel, but it stands for only one character. A hyphen can be used to indicate a range of characters, for example:

[a-z] // match all lowercase letters
[A-Z] // match all uppercase letters
[a-zA-Z] // match all letters
[0-9] // match all digits
[0-9\.\-] // match all digits, periods, and minus signs
[ \f\r\t\n] // match all whitespace characters

Again, each of these stands for only one character, which is an important point. If you want to match a string consisting of one lowercase letter followed by one digit, such as "z2", "t6", or "g7", but not "ab2", "r2d3", or "b52", use this pattern:

^[a-z][0-9]$

Although [a-z] covers a range of 26 letters, here it matches only a single lowercase letter in the first position of the string.
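The following sketch checks the pattern against the sample strings mentioned above. It assumes Python's re module; the list names are illustrative:

```python
import re

# One lowercase letter followed by exactly one digit, and nothing else.
lower_then_digit = re.compile(r"^[a-z][0-9]$")

candidates = ["z2", "t6", "g7", "ab2", "r2d3", "b52"]
accepted = [s for s in candidates if lower_then_digit.search(s)]
rejected = [s for s in candidates if not lower_then_digit.search(s)]

print(accepted)
print(rejected)
```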

We have seen that ^ indicates the start of a string, but it has a second meaning. When ^ appears as the first character inside square brackets, it means "not" or "excluding", and it is often used to rule out certain characters. Adapting the previous example, the following pattern requires that the first character not be a digit:

^[^0-9][0-9]$

This pattern matches "&5", "g7", and "-2", but does not match "12" or "66". Here are some more examples that exclude specific characters:

[^a-z] // all characters except lowercase letters
[^\\\/\^] // all characters except (\), (/), and (^)
[^\"\'] // all characters except double quotation marks (") and single quotation marks (')
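A short sketch of negated character clusters, assuming Python's re module (variable names are illustrative):

```python
import re

# First character: anything except a digit. Second character: a digit.
not_digit_then_digit = re.compile(r"^[^0-9][0-9]$")

candidates = ["&5", "g7", "-2", "12", "66"]
accepted = [s for s in candidates if not_digit_then_digit.search(s)]

# [^a-z] matches any single character that is not a lowercase letter.
non_lowercase = re.findall(r"[^a-z]", "abC-9d")

print(accepted)
print(non_lowercase)
```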

The special character "." (dot, period) represents any character except a newline. Therefore, the pattern "^.5$" matches any two-character string that ends with the digit 5 and starts with any character other than a newline. The pattern "." on its own matches any string except the empty string and strings that consist only of newlines.
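The behaviour of the dot can be illustrated with the sketch below. It assumes Python's re module with default flags, where "." does not match a newline:

```python
import re

# Any single non-newline character followed by the digit 5, and nothing else.
any_then_five = re.compile(r"^.5$")

candidates = ["a5", "55", " 5", "\n5", "5", "a55"]
accepted = [s for s in candidates if any_then_five.search(s)]

# A lone dot needs at least one character that is not a newline.
dot_on_empty = re.search(r".", "") is not None
dot_on_newline = re.search(r".", "\n") is not None
dot_on_text = re.search(r".", "x\n") is not None

print(accepted)
print(dot_on_empty, dot_on_newline, dot_on_text)
```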

PHP regular expressions have some built-in general character clusters. The list is as follows:

Character cluster   Description
[[:alpha:]]         Any letter
[[:digit:]]         Any digit
[[:alnum:]]         Any letter or digit
[[:space:]]         Any whitespace character
[[:upper:]]         Any uppercase letter
[[:lower:]]         Any lowercase letter
[[:punct:]]         Any punctuation mark
[[:xdigit:]]        Any hexadecimal digit, equivalent to [0-9a-fA-F]


Matching repeated occurrences

So far, you know how to match a single letter or digit, but more often you need to match a word or a number. A word consists of several letters, and a number consists of several digits. Braces ({}) placed after a character or character cluster specify how many times the preceding item must occur.

Pattern             Description
^[a-zA-Z_]$         Any single letter or underscore
^[[:alpha:]]{3}$    All 3-letter words
^a$                 The letter a
^a{4}$              aaaa
^a{2,4}$            aa, aaa, or aaaa
^a{1,3}$            a, aa, or aaa
^a{2,}$             Strings consisting of two or more a's
^a{2,}              For example, aardvark and aaab, but not apple
a{2,}               For example, baad and aaa, but not Nantucket
\t{2}               Two tabs
.{2}                Any two characters

These examples show three different uses of braces. A single number, {x}, means "the preceding character or character cluster occurs exactly x times". A number followed by a comma, {x,}, means "the preceding item occurs x or more times". Two numbers separated by a comma, {x,y}, mean "the preceding item occurs at least x times but no more than y times". We can extend these patterns to cover whole words or numbers:

^[a-zA-Z0-9_]{1,}$ // all strings of one or more letters, digits, or underscores
^[0-9]{1,}$ // all strings of one or more digits (non-negative integers)
^\-{0,1}[0-9]{1,}$ // all integers
^\-{0,1}[0-9]{0,}\.{0,1}[0-9]{0,}$ // all decimals

The last example is harder to read, so let's walk through it. It starts (^) with an optional minus sign (\-{0,1}), followed by zero or more digits ([0-9]{0,}), an optional decimal point (\.{0,1}), zero or more further digits ([0-9]{0,}), and nothing else ($). Because every part is optional, this pattern also matches the empty string and strings such as "-" or ".". A more compact notation for these quantifiers is described next.
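Before moving on, the brace quantifiers can be checked with a short sketch. It assumes Python's re module, and the helper name matches is hypothetical; the sample words come from the table earlier in this section:

```python
import re

def matches(pattern, text):
    """Return True if the pattern is found anywhere in the text."""
    return re.search(pattern, text) is not None

# {x}: exactly x occurrences.
exact_four = [s for s in ["aaa", "aaaa", "aaaaa"] if matches(r"^a{4}$", s)]

# {x,y}: between x and y occurrences.
two_to_four = [s for s in ["a", "aa", "aaa", "aaaa", "aaaaa"]
               if matches(r"^a{2,4}$", s)]

# {x,}: x or more occurrences, anchored at the start only.
starts_with_aa = [s for s in ["aardvark", "aaab", "apple"]
                  if matches(r"^a{2,}", s)]

# {x,}: x or more occurrences, anywhere in the string.
contains_aa = [s for s in ["baad", "aaa", "Nantucket"]
               if matches(r"a{2,}", s)]

print(exact_four, two_to_four, starts_with_aa, contains_aa)
```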

The special character "?" is equivalent to {0,1}; both mean "zero or one of the preceding item", that is, "the preceding item is optional". So the decimal example can be simplified to:

^\-?[0-9]{0,}\.?[0-9]{0,}$

The special character "*" is equivalent to {0,}; both mean "zero or more of the preceding item". Finally, the character "+" is equivalent to {1,}, meaning "one or more of the preceding item". The four examples above can therefore be written as follows:

^[a-zA-Z0-9_]+$ // all strings of one or more letters, digits, or underscores
^[0-9]+$ // all strings of one or more digits (non-negative integers)
^\-?[0-9]+$ // all integers
^\-?[0-9]*\.?[0-9]*$ // all decimals

Of course, this does not technically reduce the complexity of regular expressions, but it can make them easier to read.
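As a rough check that the shorthand forms behave the same as the brace forms, the sketch below compares each pair on a small set of sample strings. It assumes Python's re module; the sample strings and the helper name accepted are illustrative and not part of the original article:

```python
import re

# (brace form, shorthand form) for the four examples above.
pairs = [
    (r"^[a-zA-Z0-9_]{1,}$", r"^[a-zA-Z0-9_]+$"),
    (r"^[0-9]{1,}$", r"^[0-9]+$"),
    (r"^\-{0,1}[0-9]{1,}$", r"^\-?[0-9]+$"),
    (r"^\-{0,1}[0-9]{0,}\.{0,1}[0-9]{0,}$", r"^\-?[0-9]*\.?[0-9]*$"),
]

samples = ["user_01", "42", "-7", "3.14", "-0.5", ".5", "abc", "1.2.3", "", "-"]

def accepted(pattern):
    """Return the sample strings that the pattern matches."""
    return [s for s in samples if re.search(pattern, s)]

# Each shorthand pattern should accept exactly what its brace form accepts.
all_equivalent = all(accepted(verbose) == accepted(short)
                     for verbose, short in pairs)

# The decimal pattern is permissive: it also accepts "" and "-".
decimals = accepted(pairs[3][1])

print(all_equivalent)
print(decimals)
```

Agreement on a handful of samples is not a proof of equivalence, but it illustrates that "?", "*", and "+" are only shorter spellings of {0,1}, {0,}, and {1,}.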
