PHP Regular Expression Complete Tutorial: Basics

Source: Internet
Author: User

Regular expressions are now used widely across software. You can find them in operating systems such as *nix (Linux, UNIX, etc.) and HP's systems, in development environments such as PHP, C#, and Java, and in many applications.

Regular expressions let you implement powerful features in a simple way.

Because regular expressions are designed to be compact and efficient without losing power, their syntax is dense and can be hard to learn.

Example: ^.+@.+\..+$
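To make that example less intimidating, here is a minimal PHP sketch that runs the pattern ^.+@.+\..+$ through preg_match(). The sample strings are made up for illustration. The pattern only checks the rough shape "something@something.something"; it is not a full e-mail validator (PHP's filter_var() with FILTER_VALIDATE_EMAIL is stricter).

```php
<?php
// The example pattern wrapped in "/" delimiters. It requires, across the
// whole string: one or more characters, "@", one or more characters,
// a literal "." (escaped as \.), and one or more characters.
$pattern = '/^.+@.+\..+$/';

// Hypothetical sample inputs.
$inputs = ['user@example.com', 'user.example.com', 'user@example'];

foreach ($inputs as $input) {
    // preg_match() returns 1 on a match, 0 on no match, false on error.
    $ok = preg_match($pattern, $input) === 1;
    echo $input, ' => ', $ok ? 'looks like an address' : 'rejected', PHP_EOL;
}
```

Only the first input is accepted: the second has no "@", and the third has no dot after the "@".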

Patterns like this scared me off more than once, and they have probably scared many other people away too.

Once you finish this tutorial, you will be able to use such patterns freely.

History of regular expressions

The "ancestors" of regular expressions can be traced back to early studies of how the human nervous system works. Two neuroscientists, Warren McCulloch and Walter Pitts, developed a mathematical way of describing these neural networks.

In 1956, the mathematician Stephen Kleene, building on the earlier work of McCulloch and Pitts, published a paper titled "Representation of Events in Nerve Nets and Finite Automata", which introduced the concept of regular expressions. He used regular expressions to describe what he called "the algebra of regular sets", hence the term "regular expression".

Later, this work was applied in early research on computational search algorithms by Ken Thompson, the principal inventor of Unix. The first practical application of regular expressions was Thompson's QED editor.

Since then, regular expressions have remained an important part of text-based editors and search tools.


In this section we introduce the definition of a regular expression.

A regular expression describes a string-matching pattern. It can be used to check whether a string contains a given substring, to replace the substrings that match, or to extract the substrings that meet some condition.
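The three uses just listed (check, replace, extract) map directly onto PHP's preg_match(), preg_replace(), and preg_match_all(). A minimal sketch, with a made-up sample sentence:

```php
<?php
$text = 'Order 17 shipped on 2023-05-04; order 42 is pending.';

// 1. Check: does the string contain a substring matching the pattern?
$hasDate = preg_match('/[0-9]{4}-[0-9]{2}-[0-9]{2}/', $text) === 1;

// 2. Replace: substitute every matching substring (here, each run of digits).
$masked = preg_replace('/[0-9]+/', '#', $text);

// 3. Extract: collect every match; $matches[1] holds the parenthesized part.
preg_match_all('/order ([0-9]+)/i', $text, $matches);
$orderIds = $matches[1];

var_dump($hasDate);    // bool(true)
echo $masked, PHP_EOL; // Order # shipped on #-#-#; order # is pending.
print_r($orderIds);
```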

When you list a directory, the *.txt in dir *.txt or ls *.txt is not a regular expression, because * there means something different from * in a regular expression.
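The difference is easy to see in PHP, which offers both styles: fnmatch() understands shell wildcards, while preg_match() expects a regular expression. In a regex, * repeats the item before it, so the wildcard "any run of characters" becomes .* and the literal dot must be escaped as \. (feeding "*.txt" to preg_match() directly fails to compile, because the leading * has nothing to repeat). The file names below are hypothetical.

```php
<?php
$files = ['notes.txt', 'notes.txt.bak', 'readme.md'];

foreach ($files as $name) {
    // Shell-style wildcard: "*" alone means "any string".
    $globMatch = fnmatch('*.txt', $name);

    // Regex equivalent: ".*" is any run of characters, "\." a literal dot,
    // and ^ ... $ make the pattern cover the whole name.
    $regexMatch = preg_match('/^.*\.txt$/', $name) === 1;

    printf("%-14s fnmatch=%s regex=%s\n", $name,
        var_export($globMatch, true), var_export($regexMatch, true));
}
```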

A regular expression is a text pattern made up of ordinary characters (such as the letters a through z) and special characters called metacharacters. The regular expression serves as a template that is matched against the string being searched.

1. Ordinary characters

Ordinary characters are all printable and non-printable characters that are not explicitly designated as metacharacters. They include all uppercase and lowercase letters, all digits, all punctuation marks, and some other symbols.

2. Non-printing characters

Character   Meaning

\cx   Matches the control character indicated by x. For example, \cM matches Control-M, that is, a carriage return character. The value of x must be in A-Z or a-z; otherwise, c is treated as a literal 'c' character.
\f   Matches a form feed character. Equivalent to \x0c and \cL.
\n   Matches a newline character. Equivalent to \x0a and \cJ.
\r   Matches a carriage return character. Equivalent to \x0d and \cM.
\s   Matches any whitespace character, including space, tab, form feed, and so on. Equivalent to [ \f\n\r\t\v].
\S   Matches any non-whitespace character. Equivalent to [^ \f\n\r\t\v].
\t   Matches a tab character. Equivalent to \x09 and \cI.
\v   Matches a vertical tab character. Equivalent to \x0b and \cK. (In PHP's PCRE engine, \v is broader: it matches any vertical whitespace character, including \n and \r.)
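The following minimal sketch exercises several of these escapes in PHP. Patterns are written in single-quoted strings so that PHP passes the backslashes through to PCRE unchanged, while the subject strings use double quotes so that "\r", "\n", "\t", and "\f" become the actual control characters. The sample text is made up.

```php
<?php
// Single-quoted patterns: PCRE receives \cM, \x0a, etc. exactly as written.
$checks = [
    'control-M (carriage return)' => preg_match('/^\cM$/', "\r"),
    'hex 0a (newline)'            => preg_match('/^\x0a$/', "\n"),
    'control-I (tab)'             => preg_match('/^\cI$/', "\t"),
    'form feed'                   => preg_match('/^\f$/', "\f"),
];

$sample = "alpha\tbeta\ngamma  delta";

// \s matches any whitespace character, so \s+ splits on each run of it.
$words = preg_split('/\s+/', $sample);

// \S+ matches each run of non-whitespace characters.
preg_match_all('/\S+/', $sample, $m);

// PCRE treats \v as a class of vertical whitespace, so it also matches "\n".
$vMatchesNewline = preg_match('/\v/', "\n");

print_r($checks);
print_r($words);
```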

3. Special characters

Special characters are characters with a special meaning, such as the * in "*.txt" above, which simply means "any string". To find a file with * in its name, you must escape the * by putting a \ in front of it: ls \*.txt. Regular expressions have the following special characters.

Special character   Description

$   Matches the end of the input string. If multiline mode is enabled (the Multiline property of a RegExp object, or the m modifier in PHP), $ also matches at the end of each line, that is, before a newline. To match the $ character itself, use \$.

( )   Mark the start and end of a subexpression. The subexpression can be captured for later use. To match these characters, use \( and \).

*   Matches the preceding subexpression zero or more times. To match the * character, use \*.

+   Matches the preceding subexpression one or more times. To match the + character, use \+.

.   Matches any single character except the newline character \n. To match a literal dot, use \. instead.

[   Marks the beginning of a bracket expression. To match [, use \[.

?   Matches the preceding subexpression zero times or once, or marks a qualifier as non-greedy. To match the ? character, use \?.

\   Marks the next character as a special character, a literal character, a backreference, or an octal escape. For example, 'n' matches the character 'n', while '\n' matches a newline character. The sequence '\\' matches '\', and '\(' matches '('.

^   Matches the start of the input string, unless it is used inside a bracket expression, where it negates the character set. To match the ^ character itself, use \^.

{   Marks the beginning of a qualifier expression. To match {, use \{.

|   Indicates a choice between two items. To match |, use \|.
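Escaping in practice, as a minimal PHP sketch: an unescaped + acts as a qualifier, while \+ matches a literal plus sign. When the literal text comes from elsewhere, PHP's preg_quote() escapes every special character for you; its second argument also escapes the "/" delimiter. The price string and receipt text are made up.

```php
<?php
// Unescaped, "+" is a qualifier: "a+b" means one or more "a" followed by "b".
$asQualifier = preg_match('/a+b/', 'a+b');   // 0: the "a" is followed by "+", not "b"

// Escaped with a backslash, "\+" matches a literal plus sign.
$asLiteral = preg_match('/a\+b/', 'a+b');    // 1

// preg_quote() escapes $, ., (, ), ? and the other special characters,
// so the whole string is matched literally.
$price   = 'Total: $5.00 (approx.)?';
$pattern = '/' . preg_quote($price, '/') . '/';
$found   = preg_match($pattern, 'Receipt -- Total: $5.00 (approx.)? Thanks');

echo $pattern, PHP_EOL;
```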

Regular expressions are built the same way as mathematical expressions: small expressions are combined with metacharacters and operators into larger ones. A component of a regular expression can be a single character, a character set, a character range, a choice between characters, or any combination of these.
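Because patterns in PHP are ordinary strings, this composition can be done literally by concatenating small pieces. The component names and the date format below are illustrative choices, not part of the original text; the resulting pattern only checks the shape of a YYYY-MM-DD date, not whether the day exists (2024-02-31 passes).

```php
<?php
// Small components (illustrative names).
$digit = '[0-9]';                        // a character range
$year  = $digit . '{4}';                 // a range plus a qualifier
$month = '(0[1-9]|1[0-2])';              // a choice between alternatives
$day   = '(0[1-9]|[12][0-9]|3[01])';

// Combine them into one larger expression, anchored at both ends.
$isoDate = '/^' . $year . '-' . $month . '-' . $day . '$/';

var_dump(preg_match($isoDate, '2024-02-29')); // int(1)
var_dump(preg_match($isoDate, '2024-13-01')); // int(0)
```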

4. Qualifiers

A qualifier specifies how many times a given component of a regular expression must occur for a match to succeed. There are six kinds in all: *, +, ?, {n}, {n,}, and {n,m}.
The *, +, and ? qualifiers are greedy: they match as much text as possible. Adding a ? after them makes them non-greedy, so they produce a minimal match.
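Greedy versus non-greedy matching, as a minimal PHP sketch. The HTML-like sample string is made up, and this is only a demonstration of the qualifier behavior; regular expressions are not a robust way to parse real HTML.

```php
<?php
$html = '<b>bold</b> and <i>italic</i>';

// Greedy: ".+" first consumes everything, then backs off only as far as the
// last ">", so the match runs to the end of the string.
preg_match('/<.+>/', $html, $greedy);

// Non-greedy: ".+?" takes as little as possible, stopping at the first ">".
preg_match('/<.+?>/', $html, $lazy);

echo $greedy[0], PHP_EOL; // <b>bold</b> and <i>italic</i>
echo $lazy[0], PHP_EOL;   // <b>
```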
The qualifiers for regular expressions are:

Character   Description

*   Matches the preceding subexpression zero or more times. For example, zo* matches "z" and "zoo". * is equivalent to {0,}.

+   Matches the preceding subexpression one or more times. For example, 'zo+' matches "zo" and "zoo" but not "z". + is equivalent to {1,}.

?   Matches the preceding subexpression zero times or once. For example, "do(es)?" matches both "do" and "does". ? is equivalent to {0,1}.

{n}   n is a non-negative integer. Matches exactly n times. For example, 'o{2}' does not match the 'o' in "Bob" but matches the two o's in "food".

{n,}   n is a non-negative integer. Matches at least n times. For example, 'o{2,}' does not match the 'o' in "Bob" but matches all the o's in "foooood". 'o{1,}' is equivalent to 'o+', and 'o{0,}' is equivalent to 'o*'.

{n,m}   m and n are non-negative integers with n <= m. Matches at least n and at most m times. For example, "o{1,3}" matches the first three o's in "fooooood". 'o{0,1}' is equivalent to 'o?'. Note that there must be no space between the comma and the two numbers.
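The examples from the qualifier table can be checked directly in PHP. This minimal sketch runs each pattern against its sample word and records the text that matched (null when nothing matched); the array layout is just a convenient choice for the demonstration.

```php
<?php
// Each entry: [pattern, subject], taken from the examples in the table.
$examples = [
    ['/zo*/',     'z'],
    ['/zo+/',     'zoo'],
    ['/do(es)?/', 'does'],
    ['/o{2}/',    'food'],
    ['/o{2,}/',   'foooood'],
    ['/o{1,3}/',  'fooooood'],
];

$results = [];
foreach ($examples as [$pattern, $subject]) {
    // preg_match() stores the whole match in $m[0].
    $results[$pattern] = preg_match($pattern, $subject, $m) === 1 ? $m[0] : null;
}
print_r($results);

// Negative cases from the table: "zo+" needs at least one "o",
// and "Bob" contains only a single "o".
var_dump(preg_match('/zo+/', 'z'));   // int(0)
var_dump(preg_match('/o{2}/', 'Bob')); // int(0)
```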

5. Locators

Locators describe the boundaries of a string or a word: ^ and $ mark the beginning and end of the string, \b marks a word boundary (the position just before or after a word), and \B marks a non-word boundary. Qualifiers cannot be applied to locators.
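A minimal PHP sketch of the locators, using a made-up sentence. It also shows PHP's m modifier, which lets ^ and $ match at the start and end of each line instead of only at the ends of the whole string.

```php
<?php
$text = 'cat concat cats cat.';

// \b on both sides: only the standalone word "cat" counts (twice here).
$whole = preg_match_all('/\bcat\b/', $text);

// \B before "cat": it must be preceded by a word character, as in "concat".
$inside = preg_match_all('/\Bcat\b/', $text);

// ^ and $ anchor to the start and end of the whole string.
$startsWithCat = preg_match('/^cat/', $text); // 1
$endsWithCat   = preg_match('/cat$/', $text); // 0: the string ends with "."

// With the m modifier, ^ and $ also match around each "\n".
$lines       = preg_match_all('/^[a-z]+$/m', "one\ntwo\nthree"); // 3
$noMultiline = preg_match_all('/^[a-z]+$/', "one\ntwo\nthree");  // 0
```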

6. Alternation

Enclose all the alternatives in parentheses and separate adjacent alternatives with |. Parentheses have a side effect, however: the matched text is captured (cached) for later use. Placing ?: before the first alternative, as in (?:cat|dog), removes this side effect.
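The capture side effect is visible in the match array PHP returns. In this minimal sketch (sample sentence made up), the plain group adds an extra entry holding the chosen alternative, while the (?:...) group does not.

```php
<?php
// Capturing group: the chosen alternative is also stored as submatch 1.
preg_match('/(cat|dog)s/', 'I like dogs', $capturing);

// Non-capturing group: same alternation, but nothing extra is stored.
preg_match('/(?:cat|dog)s/', 'I like dogs', $nonCapturing);

print_r($capturing);    // holds "dogs" and "dog"
print_r($nonCapturing); // holds only "dogs"
```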

Here ?: is one of the non-capturing elements. There are two more, ?= and ?!, which carry additional meaning. The former is a positive lookahead: it matches the search string at any position where the regular expression pattern inside the parentheses begins to match. The latter is a negative lookahead: it matches the search string at any position where that pattern does not begin to match.
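A minimal PHP sketch of both lookaheads. The "Windows" strings are illustrative samples. Note that the lookahead only tests what follows; the text it checks is not part of the match.

```php
<?php
// Positive lookahead (?=...): match "Windows" only when 95, 98 or NT follows.
preg_match('/Windows(?=95|98|NT)/', 'WindowsNT', $m);
// $m[0] is just "Windows": the "NT" was checked but not consumed.

// Negative lookahead (?!...): match "Windows" only when none of them follows.
$negOnNT = preg_match('/Windows(?!95|98|NT)/', 'WindowsNT'); // 0
$negOnXP = preg_match('/Windows(?!95|98|NT)/', 'WindowsXP'); // 1
```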

7. Back references

Adding parentheses around a regular expression pattern, or around part of one, causes the corresponding match to be stored in a temporary buffer. Captured submatches are stored in the order in which they appear in the pattern, from left to right. Buffer numbers start at 1 and continue consecutively up to a maximum of 99 captured subexpressions. Each buffer can be accessed with '\n', where n is a one- or two-digit decimal number that identifies a particular buffer.
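A classic use of backreferences is finding doubled words. In this minimal PHP sketch (sample sentence made up), (\w+) stores a word in buffer 1 and \1 requires the same text to appear again. In the replacement string, PHP refers to the same buffer as $1.

```php
<?php
$text = 'Is is the cost of of gasoline going up up?';

// \w is a word character; \1 repeats whatever buffer 1 captured.
// The i modifier makes the repeat case-insensitive, so "Is is" also counts.
$pattern = '/\b(\w+)\s+\1\b/i';

// Find every doubled word.
preg_match_all($pattern, $text, $m);

// Collapse each doubled word to a single copy.
$fixed = preg_replace($pattern, '$1', $text);

echo $fixed, PHP_EOL; // Is the cost of gasoline going up?
```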

You can use the non-capturing metacharacters '?:', '?=', or '?!' to prevent the corresponding match from being saved.

That concludes this article. I hope the PHP regular expression material above proves useful; follow-up articles will continue to expand on regular expressions, so please stay tuned.
