PHP regular expression tutorial: the basics


Regular expressions are now widely used in software: in operating systems such as *nix (Linux, Unix, and so on) and HP, in development environments such as PHP, C#, and Java, and in many applications.

Regular expressions let you implement powerful functionality in a simple way.

The price of being concise, effective, and powerful is that regular expression code is harder to read and harder to learn.

Example: ^.+@.+\..+$

Code like this scared me off many times, and it probably scares off many other people too.

After working through this tutorial, you will be able to use such code freely.
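To make the example pattern less intimidating, here is a minimal sketch of how it could be used in PHP. It is only a loose shape check (something, an @, something, a dot, something), not a real e-mail validator, and the helper name looksLikeEmail is made up for this illustration.

```php
<?php
// Hypothetical helper, named for this illustration only.
function looksLikeEmail(string $candidate): bool
{
    // ^ start, .+ one or more characters, @ literal, \. literal dot, $ end.
    // preg_match() returns 1 on a match, 0 on no match, false on error.
    return preg_match('/^.+@.+\..+$/', $candidate) === 1;
}

foreach (['user@example.com', 'user@localhost', 'example.com'] as $candidate) {
    echo $candidate, ': ', looksLikeEmail($candidate) ? 'match' : 'no match', PHP_EOL;
}
```

The pattern is written in single quotes so that PHP passes the backslash in \. through to the regex engine unchanged.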

Regular Expression history

The ancestors of regular expressions can be traced back to early research on how the human nervous system works. Warren McCulloch and Walter Pitts, two neuroscientists, developed a mathematical way of describing these neural networks.

In 1956, a mathematician named Stephen Kleene, building on the earlier work of McCulloch and Pitts, published a paper titled "Representation of Events in Nerve Nets" that introduced the concept of regular expressions. A regular expression was an expression describing what he called "the algebra of regular sets", hence the term "regular expression".

Later, this work was applied to early research on computational search algorithms by Ken Thompson, the principal inventor of Unix. The first practical application of regular expressions was the qed editor on Unix.

Since then, regular expressions have been an important part of text-based editors and search tools.

A regular expression describes a pattern for matching strings. It can be used to check whether a string contains a certain substring, to replace matched substrings, or to retrieve substrings that meet certain conditions from a string.
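The three uses just listed map onto three PHP functions: preg_match, preg_replace, and preg_match_all. The following is a small sketch; the sample text and the date pattern are invented for illustration.

```php
<?php
$text = 'Order 66 shipped on 2024-05-17, order 67 on 2024-06-02.';
$datePattern = '/[0-9]{4}-[0-9]{2}-[0-9]{2}/';

// 1. Check whether the string contains a matching substring.
$hasDate = preg_match($datePattern, $text) === 1;          // true

// 2. Replace every matched substring.
$masked = preg_replace($datePattern, '[date]', $text);
// 'Order 66 shipped on [date], order 67 on [date].'

// 3. Retrieve all substrings that match.
preg_match_all($datePattern, $text, $matches);
$dates = $matches[0];                                      // ['2024-05-17', '2024-06-02']

echo $masked, PHP_EOL;
print_r($dates);
```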

This section describes the definition of regular expressions.

When listing a directory, the *.txt in dir *.txt or ls *.txt is not a regular expression, because the * there means something different from the * in a regular expression.

A regular expression is a text pattern made up of ordinary characters (such as the letters a to z) and special characters (called metacharacters). It acts as a template that is matched against the string being searched.
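The difference between a shell wildcard and a regular expression can be sketched in PHP with fnmatch (shell-style matching) and preg_match (regular expressions). The file name is an invented example.

```php
<?php
$file = 'notes.txt';

// Shell-style wildcard: * on its own means "any run of characters".
$globMatch = fnmatch('*.txt', $file);                    // true

// Regular expression: * only repeats the item before it, so "any run of
// characters" is written as .* and the literal dot has to be escaped.
$regexMatch = preg_match('/^.*\.txt$/', $file) === 1;    // true

var_dump($globMatch, $regexMatch);
```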

1. Common characters

Common characters are all printable and non-printable characters that are not explicitly designated as metacharacters. This includes all uppercase and lowercase letters, all digits, all punctuation marks, and some symbols.

2. Non-printable characters

Character    Meaning

\cx    Matches the control character specified by x. For example, \cM matches Control-M, the carriage return character. x is normally a letter (A-Z or a-z).
\f    Matches a form feed. Equivalent to \x0c and \cL.
\n    Matches a linefeed. Equivalent to \x0a and \cJ.
\r    Matches a carriage return. Equivalent to \x0d and \cM.
\s    Matches any whitespace character, including spaces, tabs, and form feeds. Equivalent to [ \f\n\r\t\v].
\S    Matches any non-whitespace character. Equivalent to [^ \f\n\r\t\v].
\t    Matches a tab. Equivalent to \x09 and \cI.
\v    Matches a vertical tab. Equivalent to \x0b and \cK. (In PHP's PCRE engine, \v matches any vertical whitespace character, not only the vertical tab.)
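As a quick sketch of the escapes in the table, the following PHP snippet splits and normalizes a string that contains tabs and line breaks. The sample text is invented. Note the quoting: the subject is double-quoted so PHP turns \t, \r, and \n into real control characters, while the patterns are single-quoted so the backslashes reach the regex engine untouched.

```php
<?php
$text = "name:\tAda\r\nrole:\tengineer\n";

// Split on runs of tabs, carriage returns, and linefeeds.
$fields = preg_split('/[\t\r\n]+/', $text, -1, PREG_SPLIT_NO_EMPTY);
// ['name:', 'Ada', 'role:', 'engineer']

// \s+ matches any run of whitespace; collapse each run to one space.
$collapsed = preg_replace('/\s+/', ' ', trim($text));
// 'name: Ada role: engineer'

// \x09 and \t describe the same character.
$sameTab = preg_match('/^\x09$/', "\t") === 1;   // true

print_r($fields);
echo $collapsed, PHP_EOL;
```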

3. Special characters

Special characters are characters with a special meaning, such as the * in *.txt, which simply stands for any string. If you want to find files with a * in the file name, you need to escape the *, that is, put a \ in front of it: ls \*.txt. Regular expressions have the following special characters.

Special character    Description

$    Matches the end position of the input string. If multiline mode is set (the m modifier in PHP), $ also matches at the end of each line, that is, before a line break. To match a literal $, use \$.

( )    Mark the start and end of a subexpression. Subexpressions can be captured for later use. To match these characters, use \( and \).

*    Matches the preceding subexpression zero or more times. To match a literal *, use \*.

+    Matches the preceding subexpression one or more times. To match a literal +, use \+.

.    Matches any single character except the linefeed \n. To match a literal ., use \.

[    Marks the start of a bracket expression. To match a literal [, use \[.

?    Matches the preceding subexpression zero or one time, or marks a quantifier as non-greedy. To match a literal ?, use \?.

\    Marks the next character as a special character, a literal character, a back reference, or an octal escape. For example, 'n' matches the character "n", while '\n' matches a linefeed. The sequence '\\' matches "\", and '\(' matches "(".

^    Matches the start position of the input string, except inside a bracket expression, where it negates the character set. To match a literal ^, use \^.

{    Marks the start of a quantifier expression. To match a literal {, use \{.

|    Specifies a choice between two items. To match a literal |, use \|.
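The escaping rules in the table above can be sketched as follows. The sample string is invented; preg_quote is PHP's built-in function for adding the backslashes automatically.

```php
<?php
$price = 'Total: $4.50 (incl. tax)';

// Unescaped, "." matches any character, so the pattern also accepts "4x50".
$loose = preg_match('/4.50/', '4x50') === 1;                      // true

// Escaped, every special character is taken literally.
$strict = preg_match('/\$4\.50 \(incl\. tax\)/', $price) === 1;   // true

// preg_quote() inserts the backslashes; the second argument makes it
// escape the pattern delimiter as well.
$quoted = preg_quote('$4.50 (incl. tax)', '/');
$viaQuote = preg_match('/' . $quoted . '/', $price) === 1;        // true

echo $quoted, PHP_EOL;   // \$4\.50 \(incl\. tax\)
```

Using preg_quote is usually the safer choice when the literal text comes from user input rather than from the source code.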

A regular expression is constructed in the same way as a mathematical expression: larger expressions are built by combining smaller expressions with metacharacters and operators. A component of a regular expression can be a single character, a character set, a character range, a choice between characters, or any combination of these.

4. Quantifiers

A quantifier specifies how many times a given component of a regular expression must occur for a match. There are six quantifiers: *, +, ?, {n}, {n,}, and {n,m}.
The *, +, and ? quantifiers are greedy: they match as much text as possible. Adding a ? after a quantifier makes it non-greedy, so that it matches as little as possible.
Regular expressions have the following quantifiers:

Character    Description

*    Matches the preceding subexpression zero or more times. For example, zo* matches "z" and "zoo". * is equivalent to {0,}.

+    Matches the preceding subexpression one or more times. For example, 'zo+' matches "zo" and "zoo", but not "z". + is equivalent to {1,}.

?    Matches the preceding subexpression zero or one time. For example, "do(es)?" matches "do" or "does". ? is equivalent to {0,1}.

{n}    n is a non-negative integer. Matches exactly n times. For example, 'o{2}' does not match the 'o' in "Bob", but matches the two o's in "food".

{n,}    n is a non-negative integer. Matches at least n times. For example, 'o{2,}' does not match the 'o' in "Bob", but matches all the o's in "foooood". 'o{1,}' is equivalent to 'o+', and 'o{0,}' is equivalent to 'o*'.

{n,m}    m and n are non-negative integers, where n <= m. Matches at least n and at most m times. For example, "o{1,3}" matches the first three o's in "fooooood". 'o{0,1}' is equivalent to 'o?'. Note that there must be no space between the comma and the two numbers.
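The examples from the table, plus the greedy and non-greedy behavior described earlier, can be checked with a short PHP sketch. The HTML-like sample string is invented for illustration.

```php
<?php
// Counted repetition: o{2} needs two consecutive o's.
$inBob  = preg_match('/o{2}/', 'Bob') === 1;     // false
$inFood = preg_match('/o{2}/', 'food') === 1;    // true

// o{1,3} takes at most three o's per match.
preg_match('/o{1,3}/', 'fooooood', $run);
$firstRun = $run[0];                             // 'ooo'

// Optional group: do(es)? accepts both spellings.
$matchesDo   = preg_match('/^do(es)?$/', 'do') === 1;     // true
$matchesDoes = preg_match('/^do(es)?$/', 'does') === 1;   // true

// Greedy .* runs to the last </b>; non-greedy .*? stops at the first one.
$html = '<b>one</b> and <b>two</b>';
preg_match('#<b>.*</b>#', $html, $greedy);
preg_match('#<b>.*?</b>#', $html, $lazy);

echo $greedy[0], PHP_EOL;   // <b>one</b> and <b>two</b>
echo $lazy[0], PHP_EOL;     // <b>one</b>
```

The last two patterns use # as the delimiter so that the / in </b> does not need escaping.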

5. Anchors

Anchors describe the boundary of a string or a word. ^ and $ mark the start and the end of a string respectively. \b describes the boundary before or after a word, and \B a non-word boundary. Quantifiers cannot be applied to anchors.
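A brief PHP sketch of the four anchors, using invented sample words:

```php
<?php
// ^ and $ tie the pattern to the start and end of the string.
$startsWithCat = preg_match('/^cat/', 'category') === 1;      // true
$endsWithCat   = preg_match('/cat$/', 'tomcat') === 1;        // true

// \b requires a word boundary on that side of "cat".
$wholeWord  = preg_match('/\bcat\b/', 'a cat sat') === 1;     // true
$partOfWord = preg_match('/\bcat\b/', 'category') === 1;      // false

// \B requires the opposite: "cat" must sit inside a longer word.
$insideWord = preg_match('/\Bcat\B/', 'concatenate') === 1;   // true

var_dump($startsWithCat, $endsWithCat, $wholeWord, $partOfWord, $insideWord);
```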

6. Alternation

Enclose all the alternatives in parentheses and separate adjacent alternatives with |. Parentheses have a side effect, however: the matched text is captured and stored. You can put ?: before the first alternative to eliminate this side effect.

?: is one non-capturing element. There are two more, ?= and ?!, which carry additional meaning. The former is a positive lookahead: it matches at any position where the text that follows matches the pattern in the parentheses. The latter is a negative lookahead: it matches at any position where the text that follows does not match that pattern.
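The difference between capturing groups, non-capturing groups, and the two lookaheads shows up in what preg_match stores in its result array. This is a small sketch with invented sample strings.

```php
<?php
// Capturing group: the alternative that matched is stored in index 1.
preg_match('/(cat|dog) food/', 'dog food', $captured);
// ['dog food', 'dog']

// Non-capturing group (?:...): same match, nothing stored for the group.
preg_match('/(?:cat|dog) food/', 'dog food', $plain);
// ['dog food']

// Positive lookahead (?=...): " 10" must follow, but is not part of the match.
preg_match('/Windows(?= 10)/', 'Windows 10', $ahead);
// ['Windows']

// Negative lookahead (?!...): matches only when " 10" does not follow.
$rejects10 = preg_match('/Windows(?! 10)/', 'Windows 10') === 1;   // false
$accepts7  = preg_match('/Windows(?! 10)/', 'Windows 7') === 1;    // true

print_r($captured);
print_r($plain);
print_r($ahead);
```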

7. Back references

Putting parentheses around a regular expression or part of one causes the matching text to be stored in a temporary buffer. Each captured submatch is stored in the order in which its group appears, from left to right, in the pattern. Buffers are numbered consecutively starting from 1, up to a maximum of 99 subexpressions. Each buffer can be accessed with '\n', where n is one or two decimal digits identifying the buffer.

You can use the non-capturing metacharacters '?:', '?=', or '?!' to prevent a match from being saved.
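As an illustration of back references, the following PHP sketch finds and removes an accidentally doubled word. The sample sentence is invented. Inside the pattern the first group is referenced as \1, and in the replacement string of preg_replace it is referenced as $1.

```php
<?php
$sentence = 'it was the the best of times';

// ([a-z]+) captures a word; \1 then requires the same word again.
$doubled = '/\b([a-z]+) \1\b/';

preg_match($doubled, $sentence, $found);
// $found[0] is the whole match, $found[1] the first captured group.

// Keep only one copy of the repeated word.
$fixed = preg_replace($doubled, '$1', $sentence);

echo $found[0], PHP_EOL;   // the the
echo $fixed, PHP_EOL;      // it was the best of times
```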

That concludes this article. The PHP regular expression basics covered above are very useful in practice, and the material will be updated and extended later. Please stay tuned.

This article is an English version of an article originally published in Chinese on aliyun.com.
