PHP preg_replace() regular expression string replacement - PHP Tutorial


Not all data a program handles is designed up front for a database, and some of it cannot be stored in a database structure at all.
Examples include a template engine parsing templates and the filtering of spam or sensitive content.
In these cases we usually apply our own rules with regular expressions: preg_match() to match and preg_replace() to replace.
In typical applications, however, database CRUD operations are far more common than regular expressions.
The scenarios above come down to three tasks: statistical analysis, matching, and replacement.

PHP preg_replace() regular expression replacement
Unlike JavaScript's regular expression replacement, PHP's preg_replace() by default replaces every element that matches the pattern.
preg_replace(pattern, replacement, subject, limit [default -1, meaning no limit], count [receives the number of replacements performed])
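
The parameter list above can be tried out with a short sketch. The sample text and the variable names ($text, $all, $first) are made up for illustration; only preg_replace() itself comes from PHP.

```php
<?php
// Hypothetical sample text, chosen only for illustration.
$text = 'cat bat rat';

// Default limit (-1): every match is replaced; $count receives the total.
$all = preg_replace('/[cbr]at/', 'hat', $text, -1, $count);
echo $all, "\n";    // hat hat hat
echo $count, "\n";  // 3

// limit = 1: only the first match is replaced.
$first = preg_replace('/[cbr]at/', 'hat', $text, 1, $countFirst);
echo $first, "\n";      // hat bat rat
echo $countFirst, "\n"; // 1
```

Passing a limit of 1 is the closest PHP equivalent to a JavaScript replacement without the "g" flag.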

The regular expressions in most languages are similar, but there are also slight differences.

PHP regular expression

Character    Description
\    Marks the next character as a special character, a literal, a backreference, or an octal escape. For example, "n" matches the character "n", while "\n" matches a line break. The sequence "\\" matches "\", and "\(" matches "(".
^    Matches the start of the input string. In multiline mode (the Multiline property of the RegExp object in JScript, the m modifier in PHP), ^ also matches the position after a line break.
$    Matches the end of the input string. In multiline mode, $ also matches the position before a line break.
*    Matches the preceding subexpression zero or more times. For example, "zo*" matches "z" and "zoo". * is equivalent to {0,}.
+    Matches the preceding subexpression one or more times. For example, "zo+" matches "zo" and "zoo", but not "z". + is equivalent to {1,}.
?    Matches the preceding subexpression zero or one time. For example, "do(es)?" matches "do" or "does". ? is equivalent to {0,1}.
{n}    n is a non-negative integer. Matches exactly n times. For example, "o{2}" does not match the "o" in "Bob", but matches the two o's in "food".
{n,}    n is a non-negative integer. Matches at least n times. For example, "o{2,}" does not match the "o" in "Bob", but matches all the o's in "foooood". "o{1,}" is equivalent to "o+". "o{0,}" is equivalent to "o*".
{n,m}    m and n are non-negative integers, where n <= m. Matches at least n and at most m times. For example, "o{1,3}" matches the first three o's in "fooooood". "o{0,1}" is equivalent to "o?". Note that there must be no space between the comma and the numbers.
?    When this character immediately follows any other quantifier (*, +, ?, {n}, {n,}, {n,m}), the match becomes non-greedy. Non-greedy mode matches as little of the searched string as possible, while the default greedy mode matches as much as possible. For example, in the string "oooo", "o+?" matches a single "o", while "o+" matches all the o's.
.    Matches any single character except "\n". To match any character including "\n", use a pattern such as "[\s\S]".
(pattern)    Matches pattern and captures the match. The captured match can be retrieved from the resulting Matches collection, using the SubMatches collection in VBScript or the $0…$9 properties in JScript. To match literal parentheses, use "\(" or "\)".
(?:pattern)    Matches pattern but does not capture the match, that is, it is a non-capturing match and is not stored for later use. This is useful for combining parts of a pattern with the "or" character (|). For example, "industr(?:y|ies)" is a more concise expression than "industry|industries".
(?=pattern)    Positive lookahead: matches the search string at any point where a string matching pattern begins. This is a non-capturing match. For example, "Windows(?=95|98|NT|2000)" matches "Windows" in "Windows2000", but not "Windows" in "Windows3.1". Lookaheads do not consume characters: after a match, the search for the next match begins immediately after the last match, not after the characters that made up the lookahead.
(?!pattern)    Negative lookahead: matches the search string at any point where a string not matching pattern begins. This is a non-capturing match. For example, "Windows(?!95|98|NT|2000)" matches "Windows" in "Windows3.1", but not "Windows" in "Windows2000".
(?<=pattern)    Positive lookbehind: like positive lookahead, but in the opposite direction. For example, "(?<=95|98|NT|2000)Windows" matches "Windows" in "2000Windows", but not "Windows" in "3.1Windows".
x|y    Matches x or y. For example, "z|food" matches "z" or "food". "(z|f)ood" matches "zood" or "food".
[xyz]    Character set. Matches any one of the enclosed characters. For example, "[abc]" matches the "a" in "plain".
[^xyz]    Negative character set. Matches any character not enclosed. For example, "[^abc]" matches "p", "l", "i", and "n" in "plain".
[a-z]    Character range. Matches any character in the specified range. For example, "[a-z]" matches any lowercase letter from "a" to "z". Note: a hyphen denotes a range only when it appears inside a character class between two characters; at the start of the class it stands for a literal hyphen.
[^a-z]    Negative character range. Matches any character not in the specified range. For example, "[^a-z]" matches any character that is not in the range "a" to "z".
\b    Matches a word boundary, that is, the position between a word and a space. For example, "er\b" matches the "er" in "never", but not the "er" in "verb".
\B    Matches a non-word boundary. "er\B" matches the "er" in "verb", but not the "er" in "never".
\cx    Matches the control character indicated by x. For example, \cM matches Control-M, the carriage return character. The value of x must be in the range A-Z or a-z. Otherwise, c is treated as a literal "c" character.
\d    Matches a digit character. Equivalent to [0-9].
\D    Matches a non-digit character. Equivalent to [^0-9].
\f    Matches a form feed. Equivalent to \x0c and \cL.
\n    Matches a line feed. Equivalent to \x0a and \cJ.
\r    Matches a carriage return. Equivalent to \x0d and \cM.
\s    Matches any whitespace character, including spaces, tabs, and form feeds. Equivalent to [ \f\n\r\t\v].
\S    Matches any non-whitespace character. Equivalent to [^ \f\n\r\t\v].
\t    Matches a tab. Equivalent to \x09 and \cI.
\v    Matches a vertical tab. Equivalent to \x0b and \cK.
\w    Matches any word character, including the underscore. Equivalent to "[A-Za-z0-9_]".
\W    Matches any non-word character. Equivalent to "[^A-Za-z0-9_]".
\xn    Matches n, where n is a hexadecimal escape value. Hexadecimal escape values must be exactly two digits long. For example, "\x41" matches "A". "\x041" is equivalent to "\x04" followed by "1". This allows ASCII codes to be used in regular expressions.
\num    Matches num, where num is a positive integer: a backreference to a captured match. For example, "(.)\1" matches two consecutive identical characters.
\n    Identifies either an octal escape value or a backreference. If \n is preceded by at least n captured subexpressions, n is a backreference. Otherwise, if n is an octal digit (0-7), n is an octal escape value.
\nm    Identifies either an octal escape value or a backreference. If \nm is preceded by at least nm captured subexpressions, nm is a backreference. If \nm is preceded by at least n captures, n is a backreference followed by the literal m. If neither condition holds and n and m are both octal digits (0-7), \nm matches the octal escape value nm.
\nml    Matches the octal escape value nml when n is an octal digit (0-3) and both m and l are octal digits (0-7).
\un    Matches n, where n is a Unicode character expressed as four hexadecimal digits. For example, \u00A9 matches the copyright symbol (©). This is the JScript form; PHP's PCRE does not support \u, and the equivalent there is \x{00A9} together with the u modifier.
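
A few rows of the table can be checked directly in PHP. The following is only a sketch that reuses the table's own examples (greedy versus non-greedy quantifiers, positive lookahead, and a backreference); the variable names are arbitrary.

```php
<?php
// Greedy vs. non-greedy quantifier on the string "oooo".
preg_match('/o+/', 'oooo', $greedy);  // $greedy[0] is "oooo"
preg_match('/o+?/', 'oooo', $lazy);   // $lazy[0] is "o"

// Positive lookahead: "Windows" only when followed by a listed version.
$hit  = preg_match('/Windows(?=95|98|NT|2000)/', 'Windows2000'); // 1
$miss = preg_match('/Windows(?=95|98|NT|2000)/', 'Windows3.1');  // 0

// Backreference: (.)\1 finds two consecutive identical characters.
preg_match('/(.)\1/', 'food', $pair); // $pair[0] is "oo"

echo $greedy[0], ' ', $lazy[0], ' ', $hit, ' ', $miss, ' ', $pair[0], "\n";
```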

The table above gives a fairly complete overview of regular expression syntax. The characters listed in it have special meanings and no longer stand for themselves. For example, in a regular expression "+" does not mean the plus sign; it means "match one or more times". To make "+" stand for a literal plus sign, put the escape character "\" in front of it, that is, write "\+".
The string 1+1=2 is written as the regular expression: 1\+1=2
The regular expression 1+1=2 instead matches one or more 1s followed by 1=2, that is:
11=2 matches the regular expression 1+1=2
111=2 matches the regular expression 1+1=2
1111=2 matches the regular expression 1+1=2
......
In other words, every metacharacter has a specific meaning, and to use it as a literal character you must escape it with "\". Escaping a non-alphanumeric character that is not a metacharacter is harmless.
The string 1+1=2 can therefore also be written as the regular expression 1\+1\=2
Letters and digits must not be escaped this way: a backslash before a digit or a letter forms a backreference or a character class (\1, \2, \d, ...), so \1\+\1\=\2 does not match 1+1=2. Escaping everything is not recommended in any case.
Regular expressions must also be enclosed in delimiters. In JavaScript the delimiter is "/". In PHP "/" is the most common delimiter, but "#" can be used as well, and the whole pattern, delimiters included, is written inside a quoted string.
If the regular expression itself contains the delimiter character, that character must be escaped.
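
Escaping by hand is easy to get wrong, so here is a small sketch using PHP's built-in preg_quote(), which escapes metacharacters and, through its second argument, the delimiter as well. The sample strings are invented for illustration.

```php
<?php
// Hypothetical literal that should be matched verbatim.
$literal = '1+1=2';

// preg_quote() escapes metacharacters; the second argument names the delimiter to escape too.
$pattern = '/' . preg_quote($literal, '/') . '/';
echo $pattern, "\n"; // /1\+1\=2/

$matchesLiteral = preg_match($pattern, 'so 1+1=2 holds');  // 1
$unescaped      = preg_match('/1+1=2/', 'so 1+1=2 holds'); // 0: "1+" means "one or more 1s"
$repeatedOnes   = preg_match('/1+1=2/', '111=2');          // 1
```

Building the pattern this way is mainly useful when the literal text comes from a variable rather than being typed into the source.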

PHP regular expression delimiters
In most languages, regular expressions use "/" as the delimiter. In PHP you can also use "#". If a string contains many "/" characters and "/" is the delimiter, every one of them has to be escaped, whereas with "#" as the delimiter no escaping is needed, which is more concise.
<?php
$weigeti = 'W3CSchool online tutorial URL is http://www.jbxue.com/, can you replace this URL with the correct one?';
// The requirement is to replace http://www.jbxue.com/ with http://e.jbxue.com/w3c/
// "." is a metacharacter and must be escaped (escaping ":" is harmless but optional); "/" is the delimiter, so every "/" inside the pattern must be escaped too.
echo preg_replace('/http\:\/\/www\.jbxue\.com\//', 'http://e.jbxue.com/w3c/', $weigeti);
// With "#" as the delimiter, "/" is no longer a delimiter and does not need to be escaped.
echo preg_replace('#http\://www\.jbxue\.com/#', 'http://e.jbxue.com/w3c/', $weigeti);
// Both calls output the same result: [W3CSchool online tutorial URL is http://e.jbxue.com/w3c/, can you replace this URL with the correct one?]
?>
These two examples show that when a pattern contains many "/" characters, either "/" or "#" works as the delimiter, but "#" makes the code more concise. We still recommend "/" as the delimiter, because JavaScript and some other languages accept only "/", which keeps patterns portable across languages.
PHP regular expression modifier

Modifiers are placed after the closing delimiter "/" of the PHP regular expression, before the closing quotation mark of the pattern string.
i    Case-insensitive matching.
m    Multiline: each line is matched independently (^ and $ match at line boundaries). If the string contains no line break such as "\n", the result is the same as without the modifier.
s    Makes the metacharacter "." match line breaks ("\n") as well. Without it, "." does not match "\n".
x    Ignores unescaped whitespace in the pattern.
e    Runs eval() on the replacement for each match (preg_replace() only). Deprecated in PHP 5.5 and removed in PHP 7.0.
A    Anchored: the match is constrained to begin at the start of the target string.
D    Makes $ match only at the very end of the string. Without D, if the string ends with a line break such as "\n", $ also matches just before that final line break. D is ignored when the m modifier is set.
S    Studies the pattern: extra analysis that can speed up non-anchored matching.
U    Ungreedy: quantifiers become non-greedy by default, and adding "?" after a quantifier makes it greedy again.
X    Enables additional PCRE features that are incompatible with Perl.
u    Treats the pattern and the subject string as UTF-8. The original author (E-dimensional technology) reports running into a bug with this modifier and advises caution (the URL of the bug report is missing).
If you know JavaScript regular expressions, you will be familiar with the "g" modifier, which means matching all elements that satisfy the pattern. PHP's regular expression replacement already replaces all matching elements, so PHP has no "g" modifier.
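
The effect of several modifiers is easiest to see side by side. The sketch below uses made-up sample strings and covers only i, x, U, A, and D; it is meant as an illustration, not an exhaustive reference.

```php
<?php
// i: case-insensitive matching.
$i = preg_match('/php/i', 'PHP Tutorial');                // 1

// x: unescaped whitespace inside the pattern is ignored.
$x = preg_match('/ (\d{4}) - (\d{2}) /x', '2024-01', $m); // 1, $m[1] is "2024"

// U: quantifiers become non-greedy by default.
preg_match('/<.+>/',  '<b>bold</b>', $greedy);            // $greedy[0] is "<b>bold</b>"
preg_match('/<.+>/U', '<b>bold</b>', $lazy);              // $lazy[0] is "<b>"

// A: the match must begin at the start of the subject.
$a = preg_match('/Tutorial/A', 'PHP Tutorial');           // 0

// D: "$" no longer matches before a trailing line break.
$withoutD = preg_match('/^abc$/',  "abc\n");              // 1
$withD    = preg_match('/^abc$/D', "abc\n");              // 0
```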

PHP regular expressions: Chinese characters and case-insensitivity
PHP preg_replace() is case-sensitive by default and treats strings as sequences of single bytes. To match case-insensitively, or to handle Chinese and other multibyte characters as whole characters, add the corresponding modifier i or u.
<?php
$weigeti = 'W3CSchool online tutorial URL: http://www.jbxue.com/w3cschool/';
echo preg_replace('/W3CSchool/', 'w3c', $weigeti);
// Case-sensitive: only the exact spelling is replaced. Output: [w3c online tutorial URL: http://www.jbxue.com/w3cschool/]
echo preg_replace('/W3CSchool/i', 'w3c', $weigeti);
// Case-insensitive: both occurrences are replaced. Output: [w3c online tutorial URL: http://www.jbxue.com/w3c/]
echo preg_replace('/URL/u', '', $weigeti);
// The u modifier treats pattern and subject as UTF-8, which is what Chinese text requires. Output: [W3CSchool online tutorial : http://www.jbxue.com/w3cschool/]
?>
In PHP both case-insensitivity and multibyte characters such as Chinese require a modifier. JavaScript regular expressions only need one for case: the i modifier makes matching case-insensitive there as well, but JavaScript does not have to be told that the text is UTF-8 Chinese or contains other special characters and can match Chinese directly.

PHP regular expressions and line breaks: an example
When a PHP regular expression meets a line break, it treats the line break as an ordinary character in the middle of the string. The metacharacter "." does not match "\n", so strings that contain line breaks need special care.

<?php
$weigeti = "Jbxue.com\nIS\nLOVING\nYOU";
// Goal: reduce $weigeti to Jbxue.com
echo preg_replace('/^[A-Z].*[A-Z]$/', '', $weigeti);
// The pattern matches text that starts and ends with an uppercase letter. $weigeti starts with J, which satisfies [A-Z], and ends with U, which also satisfies [A-Z], but "." cannot match "\n", so nothing is matched.
// Output: the unchanged string [Jbxue.com IS LOVING YOU], with each word on its own line
echo preg_replace('/^[A-Z].*[A-Z]$/s', '', $weigeti);
// With the s modifier "." can match "\n", so the whole string is matched and the output is empty.
// Output: []
echo preg_replace('/^[A-Z].*[A-Z]$/m', '', $weigeti);
// With the m modifier each line is matched independently. This is equivalent to:
/*
$preg_m = preg_replace('/^[A-Z].*[A-Z]$/m', '', $weigeti);
$p = '/^[A-Z].*[A-Z]$/';
$a = preg_replace($p, '', 'Jbxue.com');
$b = preg_replace($p, '', 'IS');
$c = preg_replace($p, '', 'LOVING');
$d = preg_replace($p, '', 'YOU');
$preg_m === $a . "\n" . $b . "\n" . $c . "\n" . $d;
*/
// Output: [Jbxue.com] followed by the three line breaks, which are not removed
?>

When you later use PHP to scrape a website's content and batch-replace it with regular expressions, remember that the fetched content contains line breaks, and take care when writing your patterns.
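
For scraped content, a common first step is to normalize line endings and then decide deliberately whether "." may cross line breaks. The fragment below is a sketch with a made-up HTML snippet; $html and the other variable names are hypothetical.

```php
<?php
// Hypothetical scraped fragment with mixed line endings.
$html = "<p>first</p>\r\n   \r\n<p>second\nline</p>";

// Normalize "\r\n" and lone "\r" to "\n".
$norm = preg_replace('/\r\n?/', "\n", $html);

// Without s, "." stops at line breaks, so the two-line paragraph is missed.
$withoutS = preg_match_all('/<p>(.*?)<\/p>/', $norm, $n);  // 1

// With s, "." crosses line breaks and both paragraphs are captured.
$withS = preg_match_all('/<p>(.*?)<\/p>/s', $norm, $m);    // 2
print_r($m[1]); // "first" and "second\nline"
```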
Executing a function on matches in PHP regular expression replacement
PHP regular expression replacement can use the modifier e, which stands for eval(), to execute a function on each match. This modifier was deprecated in PHP 5.5 and removed in PHP 7.0, where preg_replace_callback() serves the same purpose, so the example below runs only on older PHP versions.
<?php
$weigeti = 'W3CSchool online tutorial URL: http://WWW.JBXUE.COM/, you Jbzj!?';
// Convert the URL above to lowercase
echo preg_replace('/(http\:[\/\w\.\-]+\/)/e', 'strtolower("$1")', $weigeti);
// With the e modifier, the PHP function strtolower() is executed on the matched URL.
// Output: [W3CSchool online tutorial URL: http://www.jbxue.com/, you Jbzj!?]
?>
As the code shows, the function call strtolower() in the replacement is written inside quotation marks, yet eval() still executes it.
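
Because the e modifier no longer exists in PHP 7 and later, the same lowercase conversion can be sketched with preg_replace_callback(). The subject string with an uppercase host name is an assumption made so that the effect is visible; the pattern is the one used in this section.

```php
<?php
// Assumed sample subject with an uppercase host name, so the conversion has a visible effect.
$weigeti = 'W3CSchool online tutorial URL: http://WWW.JBXUE.COM/, you Jbzj!?';

// The callback receives the match array; $m[1] is the first capture group.
$result = preg_replace_callback(
    '/(http\:[\/\w\.\-]+\/)/',
    function (array $m): string {
        return strtolower($m[1]);
    },
    $weigeti
);

echo $result, "\n"; // W3CSchool online tutorial URL: http://www.jbxue.com/, you Jbzj!?
```

Unlike the e modifier, the callback never evaluates the matched text as PHP code, which avoids code injection through the subject string.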

Backreferences to matched groups in regular expression replacement
If you know JavaScript, you will be familiar with $1 $2 $3 ... and so on. In PHP these can be used as backreferences in the replacement as well, and \1 \2 can also be used for the same purpose.
The idea of a backreference is this: the pattern matches one large segment, parentheses cut it into several smaller matched elements, and each element can be referenced in the replacement by the position of its parentheses.
<?php
$weigeti = 'W3CSchool online tutorial URL: http://www.jbxue.com/, Jbzj! yet?';
// The text between the URL and the trademark must consist of non-word characters for [^\w\-\!]+ to match.
echo preg_replace('/.+(http\:[\w\-\/\.]+\/)[^\w\-\!]+([\w\-\!]+).+/', '$1', $weigeti);
echo preg_replace('/.+(http\:[\w\-\/\.]+\/)[^\w\-\!]+([\w\-\!]+).+/', '\1', $weigeti);
echo preg_replace('/.+(http\:[\w\-\/\.]+\/)[^\w\-\!]+([\w\-\!]+).+/', '\\1', $weigeti);
// All three output [http://www.jbxue.com/]; in a single-quoted string '\1' and '\\1' are the same replacement.
echo preg_replace('/^(.+) URL: (http\:[\w\-\/\.]+\/)[^\w\-\!]+([\w\-\!]+).+$/', 'Topic: $1
URL: $2
Trademark: $3', $weigeti);
/*
Topic: W3CSchool online tutorial
URL: http://www.jbxue.com/
Trademark: Jbzj!
*/
// With nested parentheses, groups are numbered by the position of their opening parenthesis, so the outer group comes first.
echo preg_replace('/^((.+) URL: (http\:[\w\-\/\.]+\/)[^\w\-\!]+([\w\-\!]+).+)$/', 'Original: $1
Topic: $2
URL: $3
Trademark: $4', $weigeti);
/*
Original: W3CSchool online tutorial URL: http://www.jbxue.com/, Jbzj! yet?
Topic: W3CSchool online tutorial
URL: http://www.jbxue.com/
Trademark: Jbzj!
*/
?>
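
One detail of backreferences in the replacement is worth a sketch: when a reference is followed directly by a digit, the ${n} form keeps the two apart. The inputs below are invented for illustration.

```php
<?php
// Swapping two captured words with $1 and $2.
$swapped = preg_replace('/(\w+) (\w+)/', '$2 $1', 'hello world'); // "world hello"

// '${1}0' means "group 1 followed by a literal 0".
$braced = preg_replace('/(\d)/', '${1}0', 'item7');               // "item70"

// '$10' is read as group 10, which does not exist here, so nothing is inserted.
$ambiguous = preg_replace('/(\d)/', '$10', 'item7');              // "item"

echo $swapped, "\n", $braced, "\n", $ambiguous, "\n";
```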
