PHP Regular Expressions (PHP tutorial)

1. Introduction
Regular expressions are now widely used in software of all kinds: operating systems such as *nix (Linux, Unix, and so on) and HP's, development environments such as PHP, C#, and Java, and many applications all bear their mark.

Regular expressions make powerful functionality achievable with very compact notation. That compactness comes at a price: the patterns are hard to read and not easy to learn, so some effort is required. Once you have the basics and a reference at hand, though, they are fairly simple and effective to use.

Example: ^.+@.+\..+$

This kind of code used to scare me a lot, and it probably scares plenty of other people away too. By the end of this article you should be able to use such patterns with confidence.
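
As a first taste, here is a minimal sketch of how a pattern like ^.+@.+\..+$ (something, an @, something, a literal dot, something) might be applied in PHP. The sample strings and variable names are illustrative assumptions, and the pattern is only a loose "looks like an e-mail address" check, not a real validator.

```php
<?php
// PHP's preg_* functions need delimiters around the pattern; "/" is used here.
$pattern = '/^.+@.+\..+$/';

// Hypothetical sample inputs.
$candidates = ['user@example.com', 'no-at-sign.example.com', 'user@localhost'];

$results = [];
foreach ($candidates as $candidate) {
    // preg_match() returns 1 on a match, 0 on no match, and false on error.
    $results[$candidate] = preg_match($pattern, $candidate) === 1;
}

var_dump($results);
```

Only the first candidate matches: the second has no @, and the third has no dot after the @.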

2. History of regular expressions
The "ancestors" of regular expressions can be traced back to early research on how the human nervous system works. Two neuroscientists, Warren McCulloch and Walter Pitts, developed a mathematical way of describing these neural networks.

In 1956, a mathematician named Stephen Kleene, building on the earlier work of McCulloch and Pitts, published a paper titled "Representation of Events in Nerve Nets and Finite Automata", which introduced the concept of regular expressions. Regular expressions were the expressions used to describe what he called "the algebra of regular sets", hence the term "regular expression".

Later, this work was found to be applicable to early research on computational search algorithms by Ken Thompson, the principal inventor of Unix. The first practical application of regular expressions was the QED editor on Unix.

As they say, the rest is history. Regular expressions have been an important part of text editors and search tools ever since.


3. Regular expression definitions
A regular expression describes a string-matching pattern. It can be used to check whether a string contains a given substring, to replace a matched substring, or to extract from a string the substrings that satisfy some condition.

Regular expressions are text patterns made up of ordinary characters (such as the letters a through z) and special characters (called metacharacters). A regular expression acts as a template that is matched against the string being searched.
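
The three uses named above (check, replace, extract) map naturally onto PHP's preg_match(), preg_replace(), and preg_match_all(). The following is a small sketch under assumed inputs; the sample sentence and the date-like pattern are made up for illustration.

```php
<?php
// Hypothetical sample text.
$text = 'Order 66 shipped on 2024-05-17; order 67 ships on 2024-06-02.';

// Four digits, a hyphen, two digits, a hyphen, two digits.
$datePattern = '/[0-9]{4}-[0-9]{2}-[0-9]{2}/';

// 1. Check: does the string contain a date-like substring?
$hasDate = preg_match($datePattern, $text) === 1;

// 2. Replace: mask every date-like substring.
$masked = preg_replace($datePattern, 'YYYY-MM-DD', $text);

// 3. Extract: collect all date-like substrings.
preg_match_all($datePattern, $text, $matches);
$dates = $matches[0];

var_dump($hasDate);
echo $masked, PHP_EOL;
print_r($dates);
```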

3.1 Ordinary characters
Ordinary characters are all printable and non-printable characters that are not explicitly designated as metacharacters. This includes all uppercase and lowercase letters, all digits, all punctuation marks, and some symbols.

3.2 Non-printable characters

Character   Meaning
\cx         Matches the control character indicated by x. For example, \cM matches Control-M, a carriage return. The value of x must be a letter, a-z or A-Z. Otherwise, some engines treat c as a literal 'c' character.
\f          Matches a form feed (page break). Equivalent to \x0c and \cL.
\n          Matches a line feed (newline). Equivalent to \x0a and \cJ.
\r          Matches a carriage return. Equivalent to \x0d and \cM.
\s          Matches any whitespace character, including spaces, tabs, form feeds, and so on. Equivalent to [ \f\n\r\t\v].
\S          Matches any non-whitespace character. Equivalent to [^ \f\n\r\t\v].
\t          Matches a tab. Equivalent to \x09 and \cI.
\v          Matches a vertical tab. Equivalent to \x0b and \cK. Note that in PHP's PCRE functions \v matches any vertical whitespace character (including \n and \r), so use \x0b there to match only a vertical tab.
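
A short sketch of a few of these escapes in PHP follows. The input string and variable names are assumptions for illustration. Note the quoting: the subject uses a double-quoted string so that PHP itself produces the control characters, while the patterns are single-quoted so the escapes reach the regex engine untouched.

```php
<?php
// Double quotes: PHP turns \t, \r, and \n into real control characters.
$line = "name\tvalue\r\n";

$hasTab    = preg_match('/\t/', $line) === 1;    // tab written as \t
$hasTabHex = preg_match('/\x09/', $line) === 1;  // the same tab written as \x09
$hasCtrlM  = preg_match('/\cM/', $line) === 1;   // carriage return written as \cM

// \s+ matches a run of whitespace, so it works as a field separator.
$fields = preg_split('/\s+/', trim($line));

// \S+ matches a run of non-whitespace; take the first one.
$firstWord = preg_match('/\S+/', $line, $m) === 1 ? $m[0] : null;

print_r($fields);
```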


3.3 Special characters
A special character is a character with a special meaning, such as the * in the file pattern "*.txt", which simply stands for any string. To find a file whose name contains a literal *, you need to escape the * by putting a \ in front of it: ls \*.txt. Regular expressions have the following special characters.

Character   Description
$           Matches the end of the input string. If multiline mode is set (the Multiline property of a RegExp object; the m modifier in PHP), $ also matches at the end of each line, before the line break. To match the $ character itself, use \$.
( )         Mark the start and end of a subexpression. The subexpression can be captured for later use. To match these characters, use \( and \).
*           Matches the preceding subexpression zero or more times. To match the * character, use \*.
+           Matches the preceding subexpression one or more times. To match the + character, use \+.
.           Matches any single character except the newline character \n. To match ., use \.
[           Marks the start of a bracket expression. To match [, use \[.
?           Matches the preceding subexpression zero or one time, or marks a quantifier as non-greedy. To match the ? character, use \?.
\           Marks the next character as a special character, a literal character, a backreference, or an octal escape.
^           Matches the start of the input string, unless it is used inside a bracket expression, where it means the listed characters are not accepted (the set is negated). To match the ^ character itself, use \^.
{           Marks the start of a quantifier expression. To match {, use \{.
|           Indicates a choice between two alternatives. To match |, use \|.
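
The effect of escaping can be sketched in PHP as follows. The file names are hypothetical; preg_grep() and preg_quote() are standard PHP functions, the latter being a convenient way to escape every special character in a literal string.

```php
<?php
// Hypothetical file names.
$fileNames = ['notes.txt', 'a*b.txt', 'notes_txt'];

// Unescaped, "." matches any character, so "notes_txt" matches too.
$loose = array_values(preg_grep('/^notes.txt$/', $fileNames));

// Escaped, "\." matches only a literal dot.
$strict = array_values(preg_grep('/^notes\.txt$/', $fileNames));

// "\*" matches a literal asterisk.
$starred = array_values(preg_grep('/\*/', $fileNames));

// preg_quote() adds the backslashes for you; the second argument
// names the delimiter so that it is escaped as well.
$quoted = preg_quote('a*b.txt', '/');

echo $quoted, PHP_EOL;
```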

3.4 Quantifiers (qualifiers)

Quantifiers specify how many times a given component of a regular expression must occur for the match to succeed. There are six of them: *, +, ?, {n}, {n,}, and {n,m}. The *, +, and ? quantifiers are greedy, meaning they match as much text as possible; appending a ? to one of them makes the match non-greedy (minimal).
The quantifiers of a regular expression are:

Character   Description
*           Matches the preceding subexpression zero or more times. For example, zo* matches "z" and "zoo". * is equivalent to {0,}.
+           Matches the preceding subexpression one or more times. For example, 'zo+' matches "zo" and "zoo", but not "z". + is equivalent to {1,}.
?           Matches the preceding subexpression zero or one time. For example, "do(es)?" matches "do", or the "do" in "does". ? is equivalent to {0,1}.
{n}         n is a non-negative integer. Matches exactly n times. For example, 'o{2}' does not match the 'o' in "Bob", but matches the two o's in "food".
{n,}        n is a non-negative integer. Matches at least n times.
{n,m}       m and n are non-negative integers, where n <= m. Matches at least n times and at most m times.
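
The examples from the table, plus the greedy versus non-greedy distinction, can be checked with a short PHP sketch. Anchors (^ and $) are added to some of the patterns so that the whole string has to match; the HTML snippet is an assumed sample input.

```php
<?php
// * : zero or more; + : one or more; ? : zero or one.
$star = [preg_match('/^zo*$/', 'z'), preg_match('/^zo*$/', 'zoo')];        // both match
$plus = [preg_match('/^zo+$/', 'z'), preg_match('/^zo+$/', 'zoo')];        // only "zoo" matches
$opt  = [preg_match('/^do(es)?$/', 'do'), preg_match('/^do(es)?$/', 'does')]; // both match

// {n}: exactly n occurrences in a row.
$exact = [preg_match('/o{2}/', 'Bob'), preg_match('/o{2}/', 'food')];

// Greedy versus non-greedy on the same (hypothetical) input.
$html = '<b>one</b> and <b>two</b>';
preg_match('/<b>.*<\/b>/', $html, $greedy);  // .* takes as much as it can
preg_match('/<b>.*?<\/b>/', $html, $lazy);   // .*? takes as little as it can

echo $greedy[0], PHP_EOL;
echo $lazy[0], PHP_EOL;
```

The greedy pattern runs from the first <b> to the last </b>, while the non-greedy one stops at the first </b>.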

3.5 Anchors (locators)

Anchors describe the boundaries of a string or a word: ^ and $ refer to the start and end of the string, \b describes the boundary at the front or back of a word, and \B describes a non-word boundary. You cannot apply quantifiers to anchors.
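
A brief PHP sketch of the four anchors, using an assumed sample string in which "cat" appears both as a standalone word and inside a longer word:

```php
<?php
// Hypothetical sample text.
$text = 'cat concatenate cat';

// ^ and $ pin the match to the start and the end of the string.
$startsWithCat = preg_match('/^cat/', $text) === 1;
$endsWithCat   = preg_match('/cat$/', $text) === 1;
$isExactlyCat  = preg_match('/^cat$/', $text) === 1;

// \b requires a word boundary at that position; \B requires that there is none.
$wholeWords = preg_match_all('/\bcat\b/', $text);  // "cat" as a standalone word
$insideWord = preg_match_all('/\Bcat\B/', $text);  // "cat" buried inside a word

var_dump($startsWithCat, $endsWithCat, $isExactlyCat, $wholeWords, $insideWord);
```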


3.6 Alternation (selection)

Enclose all the alternatives in parentheses and separate adjacent alternatives with |. Parentheses have a side effect, however: the matched text is captured (cached). Placing ?: at the start of the group, before the first alternative, removes this side effect.
?: is one of the non-capturing elements; the other two are ?= and ?!, which carry additional meaning. The former is a positive lookahead: it matches at any position where the pattern inside the parentheses matches the text that follows. The latter is a negative lookahead: it matches at any position where the pattern inside the parentheses does not match the text that follows.
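
The difference between a capturing group, a non-capturing group, and the two lookaheads can be sketched in PHP like this. The words and version strings are sample inputs chosen for illustration.

```php
<?php
// Capturing group: the chosen alternative is stored as element 1.
preg_match('/^(gray|grey)hound$/', 'greyhound', $capturing);

// Non-capturing group: same alternation, but nothing extra is stored.
preg_match('/^(?:gray|grey)hound$/', 'greyhound', $nonCapturing);

// Positive lookahead: "Windows" only when followed by " 95" or " 98".
// The text inside the lookahead is checked but not included in the match.
preg_match('/Windows(?= 95| 98)/', 'Windows 98', $ahead);
$aheadOnXp = preg_match('/Windows(?= 95| 98)/', 'Windows XP');

// Negative lookahead: "Windows" only when NOT followed by those versions.
$notAheadOnXp = preg_match('/Windows(?! 95| 98)/', 'Windows XP');

print_r($capturing);
print_r($nonCapturing);
print_r($ahead);
```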

3.7 Backreferences

Putting parentheses around a regular expression pattern, or part of a pattern, causes the matched text to be stored in a temporary buffer. Each captured submatch is stored in the order in which it is encountered in the pattern, from left to right. Buffer numbers start at 1 and run up to a maximum of 99 subexpressions. Each buffer can be accessed with '\n', where n is a one- or two-digit decimal number identifying a particular buffer (for example, \1 for the first buffer).
You can use the non-capturing metacharacters '?:', '?=', or '?!' to prevent the related match from being saved.
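
A common use of backreferences is finding doubled words. The sketch below assumes a made-up sentence; it uses \1 inside the pattern and $1 in the replacement string to refer to the first buffer.

```php
<?php
// Hypothetical sample text with two doubled words.
$text = 'This is is a test of of backreferences.';

// (\w+) captures a word into buffer 1; \1 must then match the same text again.
$pattern = '/\b(\w+) \1\b/';

preg_match_all($pattern, $text, $matches);
$repeated = $matches[1];  // contents of buffer 1 for each match

// In the replacement string, $1 refers to the same captured text.
$fixed = preg_replace($pattern, '$1', $text);

print_r($repeated);
echo $fixed, PHP_EOL;
```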

4. Operator precedence

Operators of equal precedence are evaluated from left to right; operators of different precedence are evaluated from highest to lowest. The precedence of the operators, from highest to lowest, is as follows:


Operator                     Description
\                            Escape character
(), (?:), (?=), []           Parentheses and square brackets
*, +, ?, {n}, {n,}, {n,m}    Quantifiers
^, $, \anymetacharacter      Anchors and sequences
|                            Alternation ("or")
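
Two consequences of this ordering are easy to trip over, and a small PHP sketch makes them visible. The patterns and inputs are illustrative assumptions: alternation binds most loosely, so anchors attach to individual alternatives unless a group is used, and a quantifier applies only to the single item before it.

```php
<?php
// Alternation has the lowest precedence, so this reads as (^ab) OR (cd$).
$loose = '/^ab|cd$/';
// Grouping binds tighter, so both anchors now apply to either alternative.
$strict = '/^(?:ab|cd)$/';

$looseOnAbx  = preg_match($loose, 'abx');   // matches via the "^ab" alternative
$strictOnAbx = preg_match($strict, 'abx');  // fails: whole string must be "ab" or "cd"

// Quantifiers bind tighter than sequence: in "ab+" the + applies only to "b".
$abPlusOnAbbb  = preg_match('/^ab+$/', 'abbb');
$abPlusOnAbab  = preg_match('/^ab+$/', 'abab');
$groupedOnAbab = preg_match('/^(?:ab)+$/', 'abab');

var_dump($looseOnAbx, $strictOnAbx, $abPlusOnAbbb, $abPlusOnAbab, $groupedOnAbab);
```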


Excerpt from Lee's column

