PHP Regular Expression


1. Introduction
Regular expressions are now widely used in software: in operating systems such as *nix (Linux, Unix, etc.) and HP, in development environments such as PHP, C#, and Java, and in a great many applications.

Regular expressions can accomplish a lot in a very compact form. The price of being compact, efficient, and powerful is that regular expression code is hard to read and hard to learn, so it takes some effort. Once you have the basics and keep a reference at hand, using them is relatively simple and effective.

Example: ^.+@.+\...+$

Code like this has scared me off many times, and it probably scares off many other people too. Read on, and you will be able to use such patterns with confidence.
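As a rough illustration, the example pattern can be read piece by piece and tried out in PHP. This is only a sketch: it assumes the spaces in the example are stray and removes them, it wraps the pattern in '/' delimiters for preg_match(), and the helper name looksLikeAddress is made up for this illustration.

```php
<?php
// The example pattern with its stray spaces removed, wrapped in '/' delimiters.
// Read left to right:
//   ^     start of the string
//   .+    one or more characters
//   @     a literal at sign
//   .+    one or more characters
//   \.    a literal dot
//   ..+   at least two more characters
//   $     end of the string
$pattern = '/^.+@.+\...+$/';

// Hypothetical helper, named here only for illustration.
function looksLikeAddress(string $text, string $pattern): bool
{
    return preg_match($pattern, $text) === 1;
}

var_dump(looksLikeAddress('user@example.com', $pattern)); // bool(true)
var_dump(looksLikeAddress('user@localhost', $pattern));   // bool(false): no dot after the @
var_dump(looksLikeAddress('example.com', $pattern));      // bool(false): no @
```

The pattern is a loose shape check, not a real e-mail validator.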

2. Regular Expression history
The "ancestor" of regular expressions can be traced back to early research on how the human nervous system works. Warren McCulloch and Walter Pitts, two neuroscientists, developed a mathematical way of describing these neural networks.

In 1956, the mathematician Stephen Kleene, building on the earlier work of McCulloch and Pitts, published a paper titled "Representation of Events in Nerve Nets and Finite Automata" that introduced the concept of regular expressions. Regular expressions were the expressions used to describe what he called "the algebra of regular sets", hence the term "regular expression".

Later, this work was found to apply to early research on computational search algorithms by Ken Thompson, the principal inventor of Unix. The first practical application of regular expressions was the qed editor.

As they say, the rest is history. Regular expressions have been an important part of text editors and search tools ever since.


3. Regular Expression Definition
A regular expression describes a pattern for matching strings. It can be used to check whether a string contains a given substring, to replace matched substrings, or to extract the substrings that meet certain conditions.

A regular expression is a text pattern made up of ordinary characters (such as the letters a to z) and special characters (called metacharacters). It serves as a template that is matched against the string being searched.
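The three uses named above (check, replace, retrieve) map onto PHP's preg_match(), preg_replace(), and preg_match_all(). The following is a minimal sketch; the sample string and variable names are invented for illustration.

```php
<?php
$text = 'room 12, floor 3';

// 1. Check: does the string contain a run of digits?
$hasDigits = preg_match('/[0-9]+/', $text) === 1;   // true

// 2. Replace: swap every run of digits for a placeholder.
$masked = preg_replace('/[0-9]+/', '#', $text);     // "room #, floor #"

// 3. Retrieve: collect every run of digits.
preg_match_all('/[0-9]+/', $text, $matches);
$numbers = $matches[0];                             // ["12", "3"]
```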

3.1 Ordinary characters
Ordinary characters are all printable and non-printable characters that are not explicitly designated as metacharacters. This includes all uppercase and lowercase letters, all digits, all punctuation marks, and some symbols.

3.2 Non-printable characters

Character    Meaning
\cx          Matches the control character specified by x. For example, \cM matches Control-M, the carriage return character. The value of x must be in the range A-Z or a-z. Otherwise, c is treated as a literal 'c' character.
\f           Matches a form feed. Equivalent to \x0c and \cL.
\n           Matches a line feed. Equivalent to \x0a and \cJ.
\r           Matches a carriage return. Equivalent to \x0d and \cM.
\s           Matches any whitespace character, including space, tab, and form feed. Equivalent to [ \f\n\r\t\v].
\S           Matches any non-whitespace character. Equivalent to [^ \f\n\r\t\v].
\t           Matches a tab. Equivalent to \x09 and \cI.
\v           Matches a vertical tab. Equivalent to \x0b and \cK. (In PHP's PCRE engine, \v matches any vertical whitespace character, which includes the vertical tab.)
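A few of these escapes can be checked directly in PHP. This is a sketch assuming the preg_* functions; the sample strings and variable names are illustrative.

```php
<?php
// Double-quoted PHP strings turn "\r", "\t" and "\f" into the actual
// control characters. The single-quoted patterns reach the regex engine as-is.
$crViaControl  = preg_match('/\cM/', "\r");   // 1: \cM is a carriage return
$tabViaControl = preg_match('/\cI/', "\t");   // 1: \cI is a tab
$ffViaHex      = preg_match('/\x0c/', "\f");  // 1: \x0c is a form feed
$vtViaEscape   = preg_match('/\v/', "\x0b");  // 1: \v matches a vertical tab

// \s+ splits on any run of whitespace.
$words = preg_split('/\s+/', "alpha \t beta\r\ngamma");  // ["alpha", "beta", "gamma"]

// \S replaces every non-whitespace character and leaves the whitespace alone.
$masked = preg_replace('/\S/', '#', "a b\tc");           // "# #\t#"
```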
 

3.3 Special characters
Special characters are characters that carry a special meaning, such as the * in *.txt, which, simply put, stands for any string. To find a file with a literal * in its name, you need to escape the *, that is, put a \ in front of it: ls \*.txt. Regular expressions have the following special characters.

Character    Description
$            Matches the end position of the input string. In multiline mode (the m modifier in PHP), $ also matches immediately before a newline. To match the $ character itself, use \$.
( )          Mark the start and end of a subexpression. The subexpression can be captured for later use. To match these characters, use \( and \).
*            Matches the preceding subexpression zero or more times. To match the * character, use \*.
+            Matches the preceding subexpression one or more times. To match the + character, use \+.
.            Matches any single character except the newline \n. To match ., use \.
[            Marks the start of a bracket expression. To match [, use \[.
?            Matches the preceding subexpression zero or one time, or marks a quantifier as non-greedy. To match the ? character, use \?.
\            Marks the next character as a special character, a literal character, a back reference, or an octal escape.
^            Matches the start position of the input string, unless it is used inside a bracket expression, where it negates the character set. To match the ^ character, use \^.
{            Marks the start of a quantifier expression. To match {, use \{.
|            Specifies a choice between two items. To match |, use \|.
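The effect of escaping can be seen in a short PHP sketch. The sample strings and variable names are illustrative; preg_quote() is PHP's built-in function for escaping every special character in a string.

```php
<?php
// Unescaped, '.' matches any character, so this pattern is too permissive.
$loose  = preg_match('/^file.txt$/', 'fileXtxt');    // 1
// Escaped, '\.' matches only a literal dot.
$strict = preg_match('/^file\.txt$/', 'fileXtxt');   // 0

// '\$' is a literal dollar sign; the final '$' is the end-of-string anchor.
$price = preg_match('/^\$[0-9]+\.[0-9]{2}$/', '$19.99');  // 1

// preg_quote() escapes the special characters for you.
$quoted = preg_quote('*.txt', '/');                  // \*\.txt
$found  = preg_match('/' . $quoted . '/', 'backup *.txt list');  // 1
```

Patterns are written in single quotes here so that PHP passes the backslashes through to the regex engine unchanged.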
 

3.4 Quantifiers

A quantifier specifies how many times a given component of a regular expression must occur for a match. There are six kinds: *, +, ?, {n}, {n,}, and {n,m}. The quantifiers *, +, and ? are greedy: they match as much text as possible. Appending a ? to a quantifier makes it non-greedy, so that it matches as little as possible.
Regular expressions have the following quantifiers:

Character    Description
*            Matches the preceding subexpression zero or more times. For example, zo* matches "z" and "zoo". * is equivalent to {0,}.
+            Matches the preceding subexpression one or more times. For example, zo+ matches "zo" and "zoo", but not "z". + is equivalent to {1,}.
?            Matches the preceding subexpression zero or one time. For example, do(es)? matches "do" and "does". ? is equivalent to {0,1}.
{n}          n is a non-negative integer. Matches exactly n times. For example, o{2} does not match the "o" in "Bob", but matches two consecutive o characters.
{n,}         n is a non-negative integer. Matches at least n times.
{n,m}        Both m and n are non-negative integers, where n <= m. Matches at least n times and at most m times.
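The quantifiers in the table, and the difference between greedy and non-greedy matching, can be sketched in PHP as follows. The test strings (including "food" and the HTML fragment) are chosen only for illustration.

```php
<?php
// Anchored with ^ and $ so each test covers the whole string.
$star1 = preg_match('/^zo*$/', 'z');         // 1: zero "o" is allowed
$star2 = preg_match('/^zo*$/', 'zoo');       // 1
$plus1 = preg_match('/^zo+$/', 'zoo');       // 1
$plus2 = preg_match('/^zo+$/', 'z');         // 0: at least one "o" is required
$opt1  = preg_match('/^do(es)?$/', 'do');    // 1
$opt2  = preg_match('/^do(es)?$/', 'does');  // 1

// {n} and {n,m}
$twoO1  = preg_match('/o{2}/', 'Bob');             // 0: only one "o"
$twoO2  = preg_match('/o{2}/', 'food');            // 1: two in a row
$range1 = preg_match('/^[0-9]{2,4}$/', '123');     // 1
$range2 = preg_match('/^[0-9]{2,4}$/', '12345');   // 0: five digits is too many

// Greedy: .+ runs to the last ">" it can reach.
preg_match('/<.+>/', '<b>bold</b>', $greedy);      // $greedy[0] is "<b>bold</b>"
// Non-greedy: .+? stops at the first ">".
preg_match('/<.+?>/', '<b>bold</b>', $lazy);       // $lazy[0] is "<b>"
```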
 

3.5 Anchors

Anchors describe the boundary of a string or a word. ^ and $ refer to the start and end of a string respectively, \b describes the boundary at the start or end of a word, and \B denotes a non-word boundary. Quantifiers cannot be applied to anchors.
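A short PHP sketch of the four anchors; the sample words and variable names are illustrative.

```php
<?php
// \b requires a word boundary on each side, so only the whole word matches.
$wholeWord   = preg_match('/\bcat\b/', 'the cat sat');   // 1
$insideWord  = preg_match('/\bcat\b/', 'concatenate');   // 0

// \B requires the opposite: "cat" must sit inside a longer word.
$nonBoundary = preg_match('/\Bcat\B/', 'concatenate');   // 1

// ^ and $ tie the pattern to the start and end of the string.
$atStart = preg_match('/^cat/', 'catalog');   // 1
$atEnd   = preg_match('/cat$/', 'catalog');   // 0
$exact   = preg_match('/^cat$/', 'cat');      // 1
```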


3.6 Alternation

Enclose all the alternatives in parentheses and separate adjacent alternatives with |. Parentheses have a side effect, however: the matched text is captured (cached). Placing ?: before the first alternative removes this side effect.
?: is one of the non-capturing elements. The other two are ?= and ?!, which carry additional meaning. The former is a positive lookahead: it matches the search string at any position where the pattern inside the parentheses begins to match. The latter is a negative lookahead: it matches the search string at any position where that pattern does not match.
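The difference between capturing and non-capturing groups, and the two kinds of lookahead, can be sketched in PHP like this. The sample strings are invented for illustration.

```php
<?php
// Capturing group: the chosen alternative is stored as element 1.
preg_match('/(cat|dog)s/', 'two dogs', $capturing);
// $capturing is ["dogs", "dog"]

// Non-capturing group: same match, nothing extra stored.
preg_match('/(?:cat|dog)s/', 'two dogs', $nonCapturing);
// $nonCapturing is ["dogs"]

// Positive lookahead: digits only where " USD" follows.
// The " USD" itself is not part of the match.
preg_match('/[0-9]+(?= USD)/', '200 EUR or 100 USD', $usd);
// $usd[0] is "100"

// Negative lookahead: a whole number that is NOT followed by " USD".
// The \b anchors stop the engine from settling for part of "100".
preg_match('/\b[0-9]+\b(?! USD)/', '100 USD or 200 EUR', $notUsd);
// $notUsd[0] is "200"
```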

3.7 Back references

Placing parentheses around a regular expression pattern, or around part of one, causes the matched text to be stored in a temporary buffer. Each captured submatch is stored in the order in which it is encountered, from left to right in the pattern. Buffer numbers start at 1 and run consecutively up to a maximum of 99 subexpressions. Each buffer can be accessed with '\n', where n is one or two decimal digits that identify the buffer.
You can use the non-capturing metacharacters '?:', '?=', or '?!' to avoid storing the match.
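A common use of back references is finding a word that is accidentally repeated. The following PHP sketch assumes lowercase words separated by a single space; the sample sentence and variable names are illustrative.

```php
<?php
// ([a-z]+) captures a word into buffer 1; \1 then requires the same
// word to appear again right after one space.
$pattern = '/\b([a-z]+) \1\b/';

preg_match($pattern, 'this is is a test', $m);
// $m[0] is "is is" (the whole match), $m[1] is "is" (buffer 1)

// In the replacement string, $1 refers to the same buffer.
$fixed = preg_replace($pattern, '$1', 'this is is a test');
// "this is a test"
```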

4. Operator precedence

Operators of equal precedence are evaluated from left to right. Operators of different precedence are evaluated from highest to lowest. The operators, from highest to lowest precedence, are:

Operator                       Description
\                              Escape character
(), (?:), (?=), []             Parentheses and square brackets
*, +, ?, {n}, {n,}, {n,m}      Quantifiers
^, $, \anymetacharacter        Anchors and sequences
|                              Alternation ("or")
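Two consequences of this ordering are easy to trip over: | binds more loosely than anchors and sequences, and a quantifier binds more tightly than a sequence. A small PHP sketch, with sample strings chosen for illustration:

```php
<?php
// | has the lowest precedence, so this reads as (^cat) or (dog$).
$loose      = preg_match('/^cat|dog$/', 'catalog');       // 1: "^cat" alone matches
// Grouping makes both anchors apply to each alternative.
$grouped    = preg_match('/^(?:cat|dog)$/', 'catalog');   // 0
$groupedDog = preg_match('/^(?:cat|dog)$/', 'dog');       // 1

// A quantifier binds tighter than a sequence: zo+ repeats only the "o".
$oRepeated  = preg_match('/^zo+$/', 'zoo');       // 1
$zozoFails  = preg_match('/^zo+$/', 'zozo');      // 0
// Parentheses make the quantifier apply to the whole "zo".
$zoRepeated = preg_match('/^(zo)+$/', 'zozo');    // 1
$zooFails   = preg_match('/^(zo)+$/', 'zoo');     // 0
```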
 

From Lee.'s column
