PHP Regular Expressions: A Tutorial on the Essentials


Some examples of PHP regular expressions

Regular expression    Description

/\b([a-z]+) \1\b/gi    Matches a word that appears twice in a row. (The g flag comes from JavaScript; PHP has no such modifier, and preg_match_all() plays that role.)

/(\w+):\/\/([^\/:]+)(:\d*)?([^#]*)/    Splits a URL into protocol, domain, port, and relative path

/^(?:Chapter|Section) [1-9][0-9]{0,1}$/    Matches a heading such as "Chapter 1" or "Section 12"

/[-a-z]/    The 26 lowercase letters a to z, plus the hyphen

/ter\b/    Matches "chapter", but not "terminal"

/\Bapt/    Matches "chapter", but not "aptitude"

/Windows(?=95|98|NT)/    Matches "Windows" in "Windows95", "Windows98", or "WindowsNT". After a match is found, the next search starts right after "Windows".
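A minimal sketch of how three of the patterns above might be exercised with PHP's PCRE functions. The sample strings and variable names are made up for illustration, and the URL pattern is written with "~" as the delimiter instead of "/", which is a choice made here rather than something the original shows.

```php
<?php
// Doubled word: \1 must repeat whatever group 1 captured.
// PHP has no "g" modifier, so preg_match_all() collects every match.
$text = 'the cost of of gasoline is going up up';
preg_match_all('/\b([a-z]+) \1\b/i', $text, $doubled);
print_r($doubled[0]); // the two repeated pairs: "of of" and "up up"

// URL parts. "~" is the delimiter here, so "/" needs no escaping.
preg_match(
    '~(\w+)://([^/:]+)(:\d*)?([^#]*)~',
    'http://www.example.com:80/docs/index.html', // illustrative URL
    $url
);
print_r(array_slice($url, 1)); // protocol, host, port (with colon), path

// Lookahead: "Windows" matches only when 95, 98 or NT follows,
// and the version itself is not part of the match.
preg_match('/Windows(?=95|98|NT)/', 'Windows98', $win);
$noMatch = preg_match('/Windows(?=95|98|NT)/', 'Windows3.1');
var_dump($win[0], $noMatch); // "Windows" and 0
```

Note that the port group keeps its leading colon, because the colon sits inside the parentheses.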

This section describes the matching rules for PHP regular expressions

  1. Basic Pattern Matching

Everything starts with the basics. A pattern, the most basic element of a regular expression, is a set of characters that describes the characteristics of a string. Patterns can be simple, made up of ordinary characters, or very complex, using special characters to represent a range of characters, repetition, or context. For example:

^once

This pattern contains the special character ^, which indicates that the pattern matches only strings that start with "once". For example, it matches the string "once upon a time" but does not match "There once was a man from New York". Just as the ^ symbol marks the beginning, the $ symbol matches strings that end with the given pattern.

bucket$

This pattern matches "Who kept all of his cash in a bucket" but does not match "buckets". Using ^ and $ together requires an exact match (the string is the same as the pattern). For example:

^bucket$

This matches only the string "bucket". If a pattern includes neither ^ nor $, it matches any string that contains the pattern. For example, the pattern

once

matches the string

There once was a man from New York

Who kept all of his cash in a bucket.
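The anchoring rules described so far can be checked with preg_match(), which returns 1 on a match and 0 otherwise. This is only a sketch; the array name and its labels are illustrative.

```php
<?php
$line1 = 'There once was a man from New York';
$line2 = 'Who kept all of his cash in a bucket';

$checks = [
    'once at the start'      => preg_match('/^once/', 'once upon a time') === 1, // true
    'once in the middle, ^'  => preg_match('/^once/', $line1) === 1,             // false
    'once in the middle'     => preg_match('/once/', $line1) === 1,              // true
    'bucket at the end'      => preg_match('/bucket$/', $line2) === 1,           // true
    'buckets, $'             => preg_match('/bucket$/', 'buckets') === 1,        // false
    'exactly bucket'         => preg_match('/^bucket$/', 'bucket') === 1,        // true
    'a bucket, ^ and $'      => preg_match('/^bucket$/', 'a bucket') === 1,      // false
];
var_export($checks);
```

The patterns are written in single quotes so that PHP does not try to interpolate the $ sign.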

The letters in the pattern (o-n-c-e) are literal characters, that is, they stand for themselves; the same is true of digits. Some slightly more complex characters, such as punctuation and whitespace (spaces, tabs, and so on), require escape sequences. All escape sequences begin with a backslash (\). The escape sequence for a tab is \t. So to test whether a string starts with a tab, you can use this pattern:

^\t

Similarly, \n represents a newline and \r represents a carriage return. Other special symbols can also be preceded by a backslash: the backslash itself is written as \\, the period as \., and so on.
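A short sketch of these escape sequences in PHP. The patterns are single-quoted so that the backslashes reach the regex engine unchanged; the sample strings and variable names are assumptions made for illustration.

```php
<?php
// \t inside the pattern is interpreted by the regex engine as a tab.
$startsWithTab = preg_match('/^\t/', "\tindented line"); // 1
$noLeadingTab  = preg_match('/^\t/', 'not indented');    // 0

// \. is a literal period; an unescaped . would accept any character.
$literalDot    = preg_match('/^3\.14$/', '3.14');        // 1
$notADot       = preg_match('/^3\.14$/', '3x14');        // 0

// A literal backslash is \\ in the regex, which takes four
// backslashes in PHP source because PHP also unescapes \\ once.
$hasBackslash  = preg_match('/\\\\/', 'C:\temp');        // 1

var_dump($startsWithTab, $noLeadingTab, $literalDot, $notADot, $hasBackslash);
```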

2. Character clusters

In web applications, regular expressions are typically used to validate user input. When a user submits a form, you need to determine whether the phone number, address, email address, credit card number, and so on are valid, and ordinary literal characters are not sufficient for that.

We need a more flexible way to describe the pattern we want: the character cluster. To create a character cluster that represents all the vowels, put all the vowel characters inside a pair of square brackets:

[AaEeIiOoUu]

This pattern matches any vowel character, but can only represent one character. A hyphen can represent a range of characters, such as:

[a-z]  // Matches any lowercase letter

[A-Z]  // Matches any uppercase letter

[a-zA-Z]  // Matches any letter

[0-9]  // Matches any digit

[0-9\.\-]  // Matches any digit, period, or minus sign

[ \f\r\t\n]  // Matches any whitespace character

Again, each of these represents only a single character, which is an important point. If you want to match a string consisting of one lowercase letter followed by one digit, such as "z2", "t6" or "g7", but not "ab2", "r2d3" or "B52", use this pattern:

^[a-z][0-9]$

Although [a-z] covers a range of 26 letters, it stands for exactly one character here: the pattern matches only strings whose first character is a lowercase letter and whose second character is a digit.
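As a quick check of the one-cluster-one-character rule, the following sketch runs the pattern against the strings mentioned above and counts vowels with the vowel cluster. The variable names are illustrative.

```php
<?php
// One lowercase letter followed by one digit, and nothing else.
$pattern = '/^[a-z][0-9]$/';
$results = [];
foreach (['z2', 't6', 'g7', 'ab2', 'r2d3', 'B52'] as $candidate) {
    $results[$candidate] = preg_match($pattern, $candidate) === 1;
}
var_export($results); // z2, t6, g7 => true; ab2, r2d3, B52 => false

// A cluster stands for a single character, so each match is one vowel.
$vowelCount = preg_match_all('/[AaEeIiOoUu]/', 'Regular Expression');
echo "\n", $vowelCount, "\n"; // 7
```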

We saw earlier that ^ marks the beginning of a string, but it has a second meaning. Used as the first character inside square brackets, it means "not" or "exclude", and is often used to rule out certain characters. Taking the previous example again, we can require that the first character not be a digit:

^[^0-9][0-9]$

This pattern matches "&5", "g7" and "-2", but does not match "12" or "66". Here are a few examples that exclude specific characters:

[^a-z]  // All characters except lowercase letters

[^\\\/\^]  // All characters except the backslash (\), the slash (/), and the caret (^)

[^"']  // All characters except double quotes (") and single quotes (')
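A small sketch of negated clusters in PHP, assuming the sample inputs shown in the comments. preg_replace() is used in the second part only to make the effect of [^a-z] visible.

```php
<?php
// First character: anything but a digit. Second character: a digit.
$pattern = '/^[^0-9][0-9]$/';
$negated = [];
foreach (['&5', 'g7', '-2', '12', '66'] as $candidate) {
    $negated[$candidate] = preg_match($pattern, $candidate) === 1;
}
var_export($negated); // &5, g7, -2 => true; 12, 66 => false

// Deleting every character that is NOT a lowercase letter.
$lettersOnly = preg_replace('/[^a-z]/', '', 'r2-d2 Unit');
echo "\n", $lettersOnly, "\n"; // rdnit
```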

The special character "." (dot, period) matches any character except a newline. So the pattern "^.5$" matches any two-character string that ends with the digit 5 and begins with any character other than a newline. The pattern "." on its own matches any string except the empty string and a string that contains only newlines.
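The behaviour of the dot can be sketched as follows, using PHP's default PCRE settings (no modifiers). The variable names are illustrative.

```php
<?php
$twoChars = '/^.5$/';
$letterThenFive  = preg_match($twoChars, 'a5');   // 1
$threeChars      = preg_match($twoChars, '125');  // 0: one character too many
$newlineThenFive = preg_match($twoChars, "\n5");  // 0: . does not match a newline
$fiveAlone       = preg_match($twoChars, '5');    // 0: nothing left for the 5

// A bare dot needs at least one character that is not a newline.
$emptyString  = preg_match('/./', '');      // 0
$onlyNewlines = preg_match('/./', "\n\n");  // 0
$anyText      = preg_match('/./', 'x');     // 1

var_dump($letterThenFive, $threeChars, $newlineThenFive, $fiveAlone);
var_dump($emptyString, $onlyNewlines, $anyText);
```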

PHP's regular expressions have some built-in common character clusters, which are listed below:

Character cluster    Meaning

[[:alpha:]]    Any letter

[[:digit:]]    Any digit

[[:alnum:]]    Any letter or digit

[[:space:]]    Any whitespace character

[[:upper:]]    Any uppercase letter

[[:lower:]]    Any lowercase letter

[[:punct:]]    Any punctuation mark

[[:xdigit:]]    Any hexadecimal digit, equivalent to [0-9a-fA-F]
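These named clusters are also accepted by PHP's PCRE functions when written inside a bracket expression. The sketch below assumes the preg_* functions rather than the older POSIX ereg functions, which were removed in PHP 7; the sample strings are made up.

```php
<?php
$isAlpha  = preg_match('/^[[:alpha:]]+$/', 'Hello');     // 1
$notAlpha = preg_match('/^[[:alpha:]]+$/', 'Hello1');    // 0
$isHex    = preg_match('/^[[:xdigit:]]+$/', 'DEADbeef'); // 1
$notHex   = preg_match('/^[[:xdigit:]]+$/', 'xyz');      // 0

// Count uppercase letters, strip punctuation, split on whitespace.
$upperCount = preg_match_all('/[[:upper:]]/', 'PHP Regex');         // 4
$noPunct    = preg_replace('/[[:punct:]]/', '', 'Hello, world!');   // "Hello world"
$words      = preg_split('/[[:space:]]+/', "a b\tc\nd");            // a, b, c, d

var_dump($isAlpha, $notAlpha, $isHex, $notHex, $upperCount, $noPunct);
print_r($words);
```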

 3. Identification of repeated occurrences

By now you know how to match a single letter or digit, but more often you will want to match a word or a number. A word consists of several letters, and a number consists of several digits. Curly braces ({}) following a character or character cluster specify how many times the preceding item is repeated.

Pattern    Meaning

^[a-zA-Z_]$    A single letter or underscore

^[[:alpha:]]{3}$    Any 3-letter word

^a$    The letter a

^a{4}$    aaaa

^a{2,4}$    aa, aaa, or aaaa

^a{1,3}$    a, aa, or aaa

^a{2,}$    A string made up of two or more a's

^a{2,}    Strings such as aardvark and aaab, but not apple

a{2,}    Strings such as baad and aaa, but not Nantucket

\t{2}    Two tabs

.{2}    Any two characters

These examples show the three different uses of curly braces. A single number, {x}, means "the preceding character or cluster appears exactly x times"; a number followed by a comma, {x,}, means "the preceding item appears x or more times"; two comma-separated numbers, {x,y}, mean "the preceding item appears at least x times but no more than y times". We can extend these patterns to match more words and numbers:
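The three brace forms can be checked with a small table of cases. This is a sketch only: each row holds a pattern, a subject, and the result expected from the descriptions above, and the names $cases and $mismatches are illustrative.

```php
<?php
$cases = [
    ['/^a{4}$/',           'aaaa',      true],
    ['/^a{4}$/',           'aaa',       false],
    ['/^a{2,4}$/',         'aaa',       true],
    ['/^a{2,4}$/',         'aaaaa',     false],
    ['/^a{2,}/',           'aardvark',  true],
    ['/^a{2,}/',           'apple',     false],
    ['/a{2,}/',            'baad',      true],
    ['/a{2,}/',            'Nantucket', false],
    ['/^[[:alpha:]]{3}$/', 'cat',       true],
    ['/\t{2}/',            "\t\t",      true],
    ['/.{2}/',             'x',         false],
];

// Collect every case whose actual result differs from the expected one.
$mismatches = [];
foreach ($cases as [$pattern, $subject, $expected]) {
    if ((preg_match($pattern, $subject) === 1) !== $expected) {
        $mismatches[] = [$pattern, $subject];
    }
}
echo count($mismatches) === 0 ? "all cases behave as described\n" : "unexpected results\n";
```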

^[a-zA-Z0-9_]{1,}$  // All strings made up of one or more letters, digits, or underscores

^[0-9]{1,}$  // All non-negative integers (one or more digits)

^-{0,1}[0-9]{1,}$  // All integers

^-{0,1}[0-9]{0,}\.{0,1}[0-9]{0,}$  // All decimals

The last example is not easy to read, is it? Look at it this way: the string begins (^) with an optional minus sign (-{0,1}), followed by zero or more digits ([0-9]{0,}), an optional decimal point (\.{0,1}), zero or more digits again ([0-9]{0,}), and nothing else ($). A simpler notation is introduced below.

The special character "?" is equivalent to {0,1}; both mean "zero or one of the preceding item", or "the preceding item is optional". So the example above can be simplified to:

^-?[0-9]{0,}\.?[0-9]{0,}$

The special character "*" is equivalent to {0,}; both mean "zero or more of the preceding item". Finally, the character "+" is equivalent to {1,}, meaning "one or more of the preceding item". So the four examples above can be written as:

^[a-zA-Z0-9_]+$  // All strings made up of one or more letters, digits, or underscores

^[0-9]+$  // All non-negative integers (one or more digits)

^-?[0-9]+$  // All integers

^-?[0-9]*\.?[0-9]*$  // All decimals

This does not, of course, technically reduce the complexity of regular expressions, but it makes them easier to read.
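To see that the shorthand forms behave the same as the brace forms, the sketch below runs the long and short decimal patterns over the same inputs and compares the results. The input list and variable names are assumptions made for illustration.

```php
<?php
$long  = '/^-{0,1}[0-9]{0,}\.{0,1}[0-9]{0,}$/';
$short = '/^-?[0-9]*\.?[0-9]*$/';

$samples = ['3.14', '-0.5', '42', '.5', '1.2.3', 'abc', '-', ''];
$accepted = [];
$sameResult = true;
foreach ($samples as $sample) {
    $a = preg_match($long, $sample);
    $b = preg_match($short, $sample);
    $sameResult = $sameResult && ($a === $b);
    if ($b === 1) {
        $accepted[] = $sample;
    }
}
var_dump($sameResult); // true: both notations accept exactly the same strings
print_r($accepted);    // 3.14, -0.5, 42, .5, and also "-" and ""

// The integer pattern requires at least one digit, so it is stricter.
$integerOk    = preg_match('/^-?[0-9]+$/', '-17'); // 1
$integerEmpty = preg_match('/^-?[0-9]+$/', '-');   // 0
```

Because every part of the decimal pattern is optional, it also accepts a lone minus sign and the empty string; requiring at least one digit, as the integer pattern does with +, avoids that.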

That completes this tutorial on the essentials of PHP regular expressions. I hope it has strengthened your understanding of them.
