PHP preg_replace() regular replacement
The data our programs need to process is not always designed with a database in mind, and sometimes it cannot be stored in a database structure at all: for example, templates parsed by a template engine, or text being filtered for spam and sensitive information.
In such cases we generally use regular expressions: preg_match() for matching and preg_replace() for replacement.
In typical applications, however, the work is mostly simple database CRUD, so there are few occasions to use them.
As described above, regular expressions serve two scenarios: statistical analysis, which uses matching, and data processing, which uses replacement.
Unlike JavaScript's regular replacement, PHP's preg_replace() replaces every element that matches the pattern by default.
The code is as follows:
preg_replace(regular expression, replacement, subject string, maximum number of replacements [default -1, meaning unlimited], variable that receives the number of replacements made)
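A minimal sketch of the default replace-all behaviour and of the optional limit and count arguments; the sample string and variable names are made up for illustration.

```php
<?php
$text = 'cat, cat, cat';

// By default ($limit = -1) every match is replaced; no "g" flag is needed.
$all = preg_replace('/cat/', 'dog', $text);

// $limit = 1 replaces only the first match; $count receives how many replacements were made.
$first = preg_replace('/cat/', 'dog', $text, 1, $count);

echo $all, PHP_EOL;   // dog, dog, dog
echo $first, PHP_EOL; // dog, cat, cat
echo $count, PHP_EOL; // 1
```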
The regular expressions for most languages are similar, but there are subtle differences.
PHP Regular Expressions
Character | Meaning
\ | Marks the next character as a special character, a literal character, a back reference, or an octal escape. For example, "n" matches the character "n", "\n" matches a newline, "\\" matches "\", and "\(" matches "(".
^ | Matches the start of the input string. If the m (multiline) modifier is set, ^ also matches the position right after each "\n".
$ | Matches the end of the input string (by default also the position just before a final "\n"; see modifier D). If the m modifier is set, $ also matches the position before each "\n".
* | Matches the preceding subexpression zero or more times. For example, "zo*" matches "z" and "zoo". * is equivalent to {0,}.
+ | Matches the preceding subexpression one or more times. For example, "zo+" matches "zo" and "zoo", but not "z". + is equivalent to {1,}.
? | Matches the preceding subexpression zero or one time. For example, "do(es)?" matches "do" or "does". ? is equivalent to {0,1}.
{n} | n is a non-negative integer. Matches exactly n times. For example, "o{2}" does not match the "o" in "Bob", but matches the two o's in "food".
{n,} | n is a non-negative integer. Matches at least n times. For example, "o{2,}" does not match the "o" in "Bob", but matches all the o's in "foooood". "o{1,}" is equivalent to "o+", and "o{0,}" is equivalent to "o*".
{n,m} | n and m are non-negative integers with n <= m. Matches at least n and at most m times. For example, "o{1,3}" matches the first three o's in "fooooood". "o{0,1}" is equivalent to "o?". Note that there must be no spaces around the comma.
? | When this character immediately follows another quantifier (*, +, ?, {n}, {n,}, {n,m}), the match becomes non-greedy. A non-greedy quantifier matches as little of the searched string as possible, while the default greedy quantifier matches as much as possible. For example, in the string "oooo", "o+?" matches a single "o", while "o+" matches all the o's.
. | Matches any single character except "\n". To match any character including "\n", use a pattern such as "[\s\S]" (or set the s modifier).
(pattern) | Matches pattern and captures the match. In PHP, captured groups appear in the $matches array filled in by preg_match() and can be referenced as $1, \1 and so on in a preg_replace() replacement. To match a literal parenthesis, use "\(" or "\)".
(?:pattern) | Matches pattern without capturing it: a non-capturing group that is not stored for later use. It is useful for combining parts of a pattern with the "or" character "|". For example, "industr(?:y|ies)" is a shorter expression than "industry|industries".
(?=pattern) | Positive lookahead: matches at any position where a string matching pattern begins. It does not capture. For example, "Windows(?=95|98|NT|2000)" matches "Windows" in "Windows2000" but not "Windows" in "Windows3.1". A lookahead consumes no characters: after a match, the next search starts right after the last match, not after the text checked by the lookahead.
(?!pattern) | Negative lookahead: matches at any position where a string matching pattern does not begin. It does not capture. For example, "Windows(?!95|98|NT|2000)" matches "Windows" in "Windows3.1" but not "Windows" in "Windows2000".
(?<=pattern) | Positive lookbehind: like the positive lookahead, but looking in the opposite direction. For example, "(?<=95|98|NT|2000)Windows" matches "Windows" in "2000Windows" but not "Windows" in "3.1Windows".
(?<!pattern) | Negative lookbehind: like the negative lookahead, but looking in the opposite direction. For example, "(?<!95|98|NT|2000)Windows" matches "Windows" in "3.1Windows" but not "Windows" in "2000Windows".
x|y | Matches x or y. For example, "z|food" matches "z" or "food", and "(z|f)ood" matches "zood" or "food".
[xyz] | Character set. Matches any one of the enclosed characters. For example, "[abc]" matches the "a" in "plain".
[^xyz] | Negated character set. Matches any character that is not enclosed. For example, "[^abc]" matches "p", "l", "i" and "n" in "plain".
[a-z] | Character range. Matches any character in the specified range. For example, "[a-z]" matches any lowercase letter from "a" to "z". Note: a hyphen denotes a range only inside a character class and between two characters; at the start of the class it stands for a literal hyphen.
[^a-z] | Negated character range. Matches any character not in the specified range. For example, "[^a-z]" matches any character that is not a lowercase letter from "a" to "z".
\b | Matches a word boundary, that is, the position between a word character and a non-word character. For example, "er\b" matches the "er" in "never" but not the "er" in "verb".
\B | Matches a non-word boundary. "er\B" matches the "er" in "verb" but not the "er" in "never".
\cx | Matches the control character indicated by x. For example, \cM matches Control-M, a carriage return. x should be a letter A-Z or a-z.
\d | Matches a digit. Equivalent to [0-9].
\D | Matches a non-digit. Equivalent to [^0-9].
\f | Matches a form feed. Equivalent to \x0c and \cL.
\n | Matches a newline (line feed). Equivalent to \x0a and \cJ.
\r | Matches a carriage return. Equivalent to \x0d and \cM.
\s | Matches any whitespace character, including spaces, tabs, form feeds and so on. Equivalent to [ \f\n\r\t\x0b].
\S | Matches any non-whitespace character. Equivalent to [^ \f\n\r\t\x0b].
\t | Matches a tab. Equivalent to \x09 and \cI.
\v | In PCRE, and therefore in PHP, matches any vertical whitespace character, including \n, \r, \f and the vertical tab \x0b (\cK). To match only a vertical tab, use \x0b.
\w | Matches any word character, including the underscore. Equivalent to "[A-Za-z0-9_]".
\W | Matches any non-word character. Equivalent to "[^A-Za-z0-9_]".
\xn | Matches n, where n is a hexadecimal escape value of exactly two digits. For example, "\x41" matches "A", and "\x041" is equivalent to "\x04" followed by "1". This lets you write ASCII characters by their codes.
\num | Matches num, where num is a positive integer: a back reference to a captured match. For example, "(.)\1" matches two consecutive identical characters.
\n (n a digit) | Identifies an octal escape value or a back reference. If \n is preceded by at least n captured subexpressions, n is a back reference. Otherwise, if n is an octal digit (0-7), \n is an octal escape value.
\nm | Identifies an octal escape value or a back reference. If \nm is preceded by at least nm captured subexpressions, nm is a back reference. If it is preceded by at least n captures, n is a back reference followed by the literal m. If neither holds and both n and m are octal digits (0-7), \nm matches the octal escape value nm.
\nml | If n is an octal digit (0-7) and m and l are both octal digits (0-7), matches the octal escape value nml.
\x{hhhh} | PCRE does not support the \uNNNN escape. To match a Unicode character by code point in PHP, use \x{...} together with the u modifier. For example, /\x{00A9}/u matches the copyright symbol (©).
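A short sketch exercising a few rows of the table (non-greedy quantifier, lookahead, back reference, word boundary) with preg_match(); the sample strings are arbitrary.

```php
<?php
// Greedy vs non-greedy: "o+" takes every "o", "o+?" takes as few as possible.
preg_match('/o+/', 'oooo', $m);
$greedy = $m[0];   // "oooo"
preg_match('/o+?/', 'oooo', $m);
$lazy = $m[0];     // "o"

// Lookahead: match "Windows" only when it is followed by 95, 98, NT or 2000.
$ahead2000 = preg_match('/Windows(?=95|98|NT|2000)/', 'Windows2000'); // 1
$ahead31   = preg_match('/Windows(?=95|98|NT|2000)/', 'Windows3.1');  // 0

// Back reference: (.)\1 finds two identical characters in a row.
preg_match('/(.)\1/', 'hello', $m);
$double = $m[0];   // "ll"

// Word boundary: "er\b" matches in "never" but not in "verb".
$never = preg_match('/er\b/', 'never'); // 1
$verb  = preg_match('/er\b/', 'verb');  // 0
```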
The table above is a comprehensive explanation of regular expression syntax. Characters that have a special meaning in a regular expression (metacharacters) no longer stand for themselves. For example, in a regular expression "+" does not mean a plus sign; it means "match one or more times". If you want "+" to mean a plus sign, escape it with a preceding "\", that is, "\+".
The code is as follows:
The regular expression for the literal string 1+1=2 is: 1\+1=2
The regular expression 1+1=2, on the other hand, means one or more "1"s followed by "1=2", so it matches:
11=2
111=2
1111=2
......
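The difference can be checked with preg_match(); this sketch tests the escaped and the unescaped pattern against the sample strings above.

```php
<?php
// "\+" is a literal plus sign; a bare "+" means "one or more of the previous item".
$literal = '/^1\+1=2$/';
$quantifier = '/^1+1=2$/';

$results = [];
foreach (['1+1=2', '11=2', '111=2'] as $s) {
    // Store [literal match?, quantifier match?] for each subject string.
    $results[$s] = [preg_match($literal, $s), preg_match($quantifier, $s)];
    printf("%s literal:%d quantifier:%d\n", $s, $results[$s][0], $results[$s][1]);
}
```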
In other words, every metacharacter has a special meaning, and to use it for its literal meaning you must escape it with a preceding "\". Escaping a non-alphanumeric character that has no special meaning is also allowed: it still matches itself.
The code is as follows:
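The 1+1=2 regular expression can also be written as: 1\+1\=2
Here "=" is escaped even though it has no special meaning. This works, but escaping characters unnecessarily is not recommended. Never escape letters or digits this way: \1 and \2 are back references, and sequences such as \d and \w are character classes, so a pattern like \1\+\1\=\2 would not work as intended.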
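Rather than escaping by hand, PHP's preg_quote() escapes the metacharacters of a literal string, and its optional second argument also escapes the delimiter you plan to use. A minimal sketch:

```php
<?php
$needle = '1+1=2';

// Build a pattern that matches $needle literally, using "/" as the delimiter.
$pattern = '/^' . preg_quote($needle, '/') . '$/';
echo $pattern, PHP_EOL;

$ok  = preg_match($pattern, '1+1=2'); // the literal string matches
$bad = preg_match($pattern, '11=2');  // "+" is no longer a quantifier, so this fails
```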
A regular expression must be enclosed in delimiters. In JavaScript the delimiter is "/". In PHP "/" is the most common choice, but "#" can also be used (PHP accepts any non-alphanumeric, non-backslash, non-whitespace character), and because the pattern is passed as a string, it must additionally be enclosed in quotation marks.
If the regular expression contains the delimiter character, that character must be escaped.
PHP Regular Expression delimiter
Most languages use "/" as the regular expression delimiter; in PHP you can also use "#". If the pattern contains many "/" characters, delimiting it with "/" forces you to escape every one of them, while with "#" they need no escaping, which is more concise.
The code is as follows:
<?php
$weigeti = 'The URL of the W3cschool online tutorial is http://e.jb51.net/, can you replace the URL with the correct one?';
// Goal: replace http://e.jb51.net/ with http://e.jb51.net/w3c/.
// "." is a regex metacharacter, so it must be escaped; "/" is the delimiter,
// so every "/" inside the pattern must be escaped as well.
echo preg_replace('/http:\/\/e\.jb51\.net\//', 'http://e.jb51.net/w3c/', $weigeti), PHP_EOL;
// With "#" as the delimiter, "/" is an ordinary character and needs no escaping.
echo preg_replace('#http://e\.jb51\.net/#', 'http://e.jb51.net/w3c/', $weigeti), PHP_EOL;
// Both lines output: "The URL of the W3cschool online tutorial is http://e.jb51.net/w3c/, can you replace the URL with the correct one?"
?>
The two PHP regular replacements above show that when a pattern contains many "/" characters, either "/" or "#" works as the delimiter, and "#" makes the code look more concise. Even so, E-dimensional Technology recommends sticking with "/" as the delimiter: languages such as JavaScript accept only "/", so making it a habit carries over to other languages.
PHP Regular Expression modifiers
A modifier is placed after the closing delimiter (for example "/") of a PHP regular expression, before the closing quotation mark of the pattern string.
The code is as follows:
i  Ignore case: matching does not distinguish uppercase from lowercase.
m  Multiline: "^" and "$" also match at the start and end of each line. If the string contains no newline characters such as "\n", the pattern behaves as normal.
s  Dotall: the "." metacharacter also matches the newline "\n". Without s, "." cannot match "\n".
x  Extended: unescaped whitespace in the pattern is ignored.
e  Evaluates the replacement string with eval() as PHP code, so a function can be run on the matched elements. (Deprecated in PHP 5.5.0 and removed in PHP 7.0.0; use preg_replace_callback() instead.)
A  Anchored: the match may only start at the beginning of the target string.
D  Dollar end only: "$" matches only at the very end of the string. Without D, if the string ends with a newline "\n", "$" also matches just before it. D is ignored if the modifier m is set.
S  Spends extra time analyzing the pattern to speed up matching; mainly useful for non-anchored patterns.
U  Ungreedy: quantifiers become non-greedy by default, and adding "?" after a quantifier makes it greedy again.
X  Enables PCRE features that are incompatible with Perl; for example, a backslash followed by a letter that has no special meaning becomes an error.
u  Treats the pattern and the subject as UTF-8. Use it when matching UTF-8 text such as Chinese; invalid UTF-8 input makes the match fail. E-dimensional Technology reports having run into a bug when using it. The bug URL:
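A sketch of three of the less familiar modifiers (U, D and x); the HTML snippet and the other sample strings are illustrative only.

```php
<?php
$html = '<b>one</b><b>two</b>';

// Greedy by default: ".*" runs to the last "</b>".
preg_match('/<b>.*<\/b>/', $html, $m);
$greedy = $m[0];        // "<b>one</b><b>two</b>"

// U inverts greediness, so ".*" now stops at the first "</b>" ...
preg_match('/<b>.*<\/b>/U', $html, $m);
$ungreedy = $m[0];      // "<b>one</b>"

// ... and under U a trailing "?" makes the quantifier greedy again.
preg_match('/<b>.*?<\/b>/U', $html, $m);
$greedyAgain = $m[0];   // "<b>one</b><b>two</b>"

// D: without it, "$" also matches just before a final "\n".
$withoutD = preg_match('/abc$/', "abc\n");  // 1
$withD    = preg_match('/abc$/D', "abc\n"); // 0

// x: unescaped whitespace inside the pattern is ignored.
$extended = preg_match('/a b c/x', 'abc');  // 1
```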
If you know JavaScript regular expressions, you are probably familiar with the "g" modifier, which makes a pattern match every qualifying element. PHP regular replacement already replaces every matching element, so PHP has no "g" modifier.
PHP regular expressions: Chinese text and ignoring case
By default PHP preg_replace() is case-sensitive and works on bytes rather than characters, so only ASCII text is handled character by character. To match case-insensitively, add the modifier i; to match Chinese (UTF-8) characters correctly, add the modifier u.
The code is as follows:
<?php
$weigeti = 'W3CSchool online tutorial URL: http://www.jb51.net/w3school/';
// The case differs ("W3CSchool" vs "w3cschool"), so nothing is replaced.
// Output: "W3CSchool online tutorial URL: http://www.jb51.net/w3school/"
echo preg_replace('/w3cschool/', '', $weigeti), PHP_EOL;
// Ignore case: the replacement is performed.
// Output: " online tutorial URL: http://www.jb51.net/w3school/"
echo preg_replace('/w3cschool/i', '', $weigeti), PHP_EOL;
// u treats pattern and subject as UTF-8, which matters when they contain Chinese text.
// Output: "W3CSchool online tutorial: http://www.jb51.net/w3school/"
echo preg_replace('/ URL/u', '', $weigeti), PHP_EOL;
?>
In PHP, both case and Chinese (multibyte) text need attention. JavaScript regular expressions only care about case: ignoring case is likewise done with the modifier i, but JavaScript does not need to be told that the text is UTF-8 Chinese; it matches Chinese directly.
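A minimal sketch of why u matters for Chinese text: without it, PCRE sees the individual bytes of a UTF-8 character rather than the character itself. It assumes the PHP file is saved as UTF-8.

```php
<?php
$char = '中'; // one Chinese character, stored as 3 bytes in UTF-8

// Without u, "." matches a single byte, so "^.$" cannot cover all 3 bytes.
$bytes = preg_match('/^.$/', $char);
// With u, the pattern and subject are read as UTF-8 characters.
$chars = preg_match('/^.$/u', $char);

echo strlen($char), ' bytes, without u: ', $bytes, ', with u: ', $chars, PHP_EOL;
```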
PHP regular expression newline examples
A PHP regular expression treats a newline as an ordinary character in the middle of the string. Because the "." metacharacter cannot match "\n" by default, strings that contain newlines come with several pitfalls.
The code is as follows:
<?php
$weigeti = "jb51.net\nis\nloving\nyou";
// Remove, with an empty replacement, text that starts and ends with a lowercase letter.
// No modifier: "^" and "$" anchor to the whole string, and "." cannot match "\n",
// so ".*" cannot get from the leading "j" to the trailing "u"; nothing is replaced.
echo preg_replace('/^[a-z].*[a-z]$/', '', $weigeti);
// Output: the unchanged string "jb51.net\nis\nloving\nyou"
// Modifier s: "." can match "\n", so the whole string matches and is removed.
echo preg_replace('/^[a-z].*[a-z]$/s', '', $weigeti);
// Output: ""
// Modifier m: "^" and "$" also match at the start and end of every line,
// so each line is matched and replaced on its own.
echo preg_replace('/^[a-z].*[a-z]$/m', '', $weigeti);
// This is equivalent to:
/*
$preg_m = preg_replace('/^[a-z].*[a-z]$/m', '', $weigeti);
$p = '/^[a-z].*[a-z]$/';
$a = preg_replace($p, '', 'jb51.net');
$b = preg_replace($p, '', 'is');
$c = preg_replace($p, '', 'loving');
$d = preg_replace($p, '', 'you');
$preg_m === $a . "\n" . $b . "\n" . $c . "\n" . $d; // true
*/
// Output: "\n\n\n" (every line is removed; only the three newline characters remain)
?>
Whenever you scrape a site's content with PHP and batch-replace it with regular expressions, the fetched content will almost certainly contain newlines, so pay close attention to them when writing the replacement.
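PHP regular expressions: executing a function on the match
PHP regular replacement can use the modifier e, which passes the replacement through eval() so that a function is executed on the matched content. The e modifier was deprecated in PHP 5.5.0 and removed in PHP 7.0.0; on current PHP versions, use preg_replace_callback() instead.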
The code is as follows:
<?php
// Requires PHP < 7.0: the e modifier was removed in PHP 7.0.0.
$weigeti = 'W3CSchool online tutorial URL: http://WWW.JB51.NET/, have you jbzj!?';
// Convert the URL above to lowercase
echo preg_replace('/(http\:[\/\w\.\-]+\/)/e', 'strtolower("$1")', $weigeti);
// With the modifier e, the replacement string is evaluated as PHP code,
// so strtolower() runs on the matched URL ($1).
// Output: "W3CSchool online tutorial URL: http://www.jb51.net/, have you jbzj!?"
?>
As the code above shows, even though strtolower() sits inside quotation marks in the replacement string, it is still executed by eval().
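On PHP 7.0 and later, where e no longer exists, the same conversion can be written with preg_replace_callback(), which calls a PHP function with the array of matches instead of eval()-ing a string. A sketch reusing the same pattern; the sample string with an uppercase host is illustrative.

```php
<?php
$weigeti = 'W3CSchool online tutorial URL: http://WWW.JB51.NET/, have you jbzj!?';

// The callback receives the match array; $m[1] is the first captured group (the URL).
$result = preg_replace_callback(
    '/(http\:[\/\w\.\-]+\/)/',
    function ($m) {
        return strtolower($m[1]);
    },
    $weigeti
);

echo $result, PHP_EOL;
// W3CSchool online tutorial URL: http://www.jb51.net/, have you jbzj!?
```

Only the matched URL is lowercased; the rest of the string, including "W3CSchool", is left untouched.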
Back references to matched groups in regular replacement
If you know JavaScript, you are surely familiar with back references such as $1, $2 ...; in PHP these can be used as back-reference parameters as well. PHP also accepts \1 for a back reference (written '\1' or '\\1' in a single-quoted PHP string).
The idea of a back reference: the pattern matches one large fragment, and parentheses inside the regular expression cut it into several smaller captured groups; each captured group can then be referenced in the replacement by the order of its parentheses.
The code is as follows:
<?php
$weigeti = 'w3cschool online tutorial URL: http://www.jb51.net/, have you jbzj!?';
// $1, '\1' and '\\1' all refer to the first captured group (the URL).
// ".*\s" skips everything up to the last whitespace before the trademark.
echo preg_replace('/.+(http\:[\w\-\/\.]+\/).*\s([\w\-\!]+).+/', '$1', $weigeti), PHP_EOL;
echo preg_replace('/.+(http\:[\w\-\/\.]+\/).*\s([\w\-\!]+).+/', '\1', $weigeti), PHP_EOL;
echo preg_replace('/.+(http\:[\w\-\/\.]+\/).*\s([\w\-\!]+).+/', '\\1', $weigeti), PHP_EOL;
// All three output "http://www.jb51.net/"
echo preg_replace('/^(.+)URL: (http\:[\w\-\/\.]+\/).*\s([\w\-\!]+).+$/', 'Column: $1<br> URL: $2<br> Trademark: $3', $weigeti);
/*
Column: w3cschool online tutorial
URL: http://www.jb51.net/
Trademark: jbzj!
*/
// Nested parentheses: groups are numbered by their opening parenthesis, so the outer group is $1.
echo preg_replace('/^((.+)URL: (http\:[\w\-\/\.]+\/).*\s([\w\-\!]+).+)$/', 'Original: $1<br> Column: $2<br> URL: $3<br> Trademark: $4', $weigeti);
/*
Original: w3cschool online tutorial URL: http://www.jb51.net/, have you jbzj!?
Column: w3cschool online tutorial
URL: http://www.jb51.net/
Trademark: jbzj!
*/
?>
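To see the numbering of nested groups directly, preg_match() fills its $matches array in the same order; this sketch uses a made-up key=value string.

```php
<?php
// Index 0 is the whole match; groups follow in the order of their opening parentheses,
// so the outer group ((\w+)=(\w+)) is 1, and the inner groups are 2 and 3.
preg_match('/((\w+)=(\w+))/', 'key=value', $m);
print_r($m);
```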