Not all the data a program has to process is designed up front with a database in mind, and some of it cannot be stored in a database structure at all.
Examples include parsing templates in a template engine and filtering spam or sensitive information.
In such cases we usually write a regular expression that encodes our rules, then match with preg_match() or replace with preg_replace().
In typical applications, which do little more than database CRUD, there is rarely an occasion to work with regular expressions.
The scenarios mentioned above fall into two groups: statistical analysis, which uses matching, and processing, which uses replacement.
PHP's preg_replace() performs regular expression replacement. Unlike JavaScript's replace(), which replaces only the first match unless the g flag is given, preg_replace() replaces every match by default.
preg_replace(pattern, replacement, subject, limit (maximum number of replacements; default -1, meaning no limit), count (variable that receives the number of replacements performed))
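The effect of the last two parameters can be seen in a minimal sketch. The sample string and variable names are made up for illustration only.

```php
<?php
// Hypothetical sample text, used only for illustration.
$text = 'a-b-c-d';

// Default limit (-1): every match is replaced; $count receives the total.
$all = preg_replace('/-/', '+', $text, -1, $count);
echo $all, ' (', $count, " replacements)\n";        // a+b+c+d (3 replacements)

// limit = 1: only the first match is replaced.
$first = preg_replace('/-/', '+', $text, 1, $countFirst);
echo $first, ' (', $countFirst, " replacement)\n";  // a+b-c-d (1 replacement)
```

The count argument is passed by reference, so it does not need to be initialised before the call.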
The regular expressions in most languages are similar, but there are subtle differences.
PHP regular expression syntax
Character |
Description |
\ |
Marks the next character as a special character, a literal character, a backreference, or an octal escape. For example, "n" matches the character "n", while "\n" matches a newline character. The sequence "\\" matches "\" and "\(" matches "(". |
^ |
Matches the start position of the input string. If the m (multiline) modifier is set, ^ also matches the position immediately after each newline. |
$ |
Matches the end position of the input string. If the m (multiline) modifier is set, $ also matches the position immediately before each newline. |
* |
Matches the preceding subexpression zero or more times. For example, "zo*" can match "z" and "zoo". * is equivalent to {0,}. |
+ |
Matches the preceding subexpression one or more times. For example, "zo+" can match "zo" and "zoo", but cannot match "z". + is equivalent to {1,}. |
? |
Matches the preceding subexpression zero times or once. For example, "do(es)?" can match "do" or "does". ? is equivalent to {0,1}. |
{n} |
n is a non-negative integer. Matches exactly n times. For example, "o{2}" cannot match the "o" in "Bob", but can match the two o's in "food". |
{n,} |
n is a non-negative integer. Matches at least n times. For example, "o{2,}" cannot match the "o" in "Bob", but can match all the o's in "foooood". "o{1,}" is equivalent to "o+". "o{0,}" is equivalent to "o*". |
{n,m} |
m and n are non-negative integers with n <= m. Matches at least n times and at most m times. For example, "o{1,3}" matches the first three o's in "fooooood". "o{0,1}" is equivalent to "o?". Note that there must be no space between the comma and the two numbers. |
? |
When this character immediately follows any other quantifier (*, +, ?, {n}, {n,}, {n,m}), the match becomes non-greedy. A non-greedy pattern matches as little of the searched string as possible, while the default greedy pattern matches as much as possible. For example, for the string "oooo", "o+?" matches a single "o", while "o+" matches all the o's. |
. |
Matches any single character except "\n". To match any character including "\n", use a pattern such as "[\s\S]" or set the s modifier. |
(pattern) |
Matches pattern and captures the match. In PHP the captured text is available in the $matches array filled by preg_match(), and as $1 ... $9 (or \1 ... \9) in the replacement string of preg_replace(). To match a literal parenthesis, use "\(" or "\)". |
(?:pattern) |
Matches pattern but does not capture the match, that is, it is a non-capturing group and nothing is stored for later use. This is useful for combining parts of a pattern with the alternation character "|". For example, "industr(?:y|ies)" is a shorter expression than "industry|industries". |
(?=pattern) |
Positive lookahead: matches the search string at any position where a string matching pattern begins. This is a non-capturing match, that is, the match is not stored for later use. For example, "Windows(?=95|98|NT|2000)" can match "Windows" in "Windows2000", but cannot match "Windows" in "Windows3.1". A lookahead does not consume characters: after a match, the search for the next match begins immediately after the last match, not after the characters that made up the lookahead. |
(?!pattern) |
Negative lookahead: matches the search string at any position where a string not matching pattern begins. This is a non-capturing match, that is, the match is not stored for later use. For example, "Windows(?!95|98|NT|2000)" can match "Windows" in "Windows3.1", but cannot match "Windows" in "Windows2000". |
(?<=pattern) |
Positive lookbehind: similar to positive lookahead, but in the opposite direction. For example, "(?<=95|98|NT|2000)Windows" can match "Windows" in "2000Windows", but cannot match "Windows" in "3.1Windows". |
(?<!pattern) |
Negative lookbehind: similar to negative lookahead, but in the opposite direction. For example, "(?<!95|98|NT|2000)Windows" can match "Windows" in "3.1Windows", but cannot match "Windows" in "2000Windows". |
x|y |
Matches x or y. For example, "z|food" can match "z" or "food". "(z|f)ood" matches "zood" or "food". |
[xyz] |
Character set. Matches any one of the enclosed characters. For example, "[abc]" can match the "a" in "plain". |
[^xyz] |
Negated character set. Matches any character that is not enclosed. For example, "[^abc]" can match "p", "l", "i", and "n" in "plain". |
[a-z] |
Character range. Matches any character in the specified range. For example, "[a-z]" matches any lowercase letter from "a" through "z". Note: the hyphen denotes a range only when it appears inside a character class and between two characters. At the beginning of a character class it stands for the hyphen itself. |
[^a-z] |
Negated character range. Matches any character not in the specified range. For example, "[^a-z]" matches any character that is not in the range "a" through "z". |
\b |
Matches a word boundary, that is, the position between a word character and a non-word character (such as a space) or the start or end of the string. For example, "er\b" can match the "er" in "never", but cannot match the "er" in "verb". |
\B |
Matches a non-word boundary. "er\B" can match the "er" in "verb", but cannot match the "er" in "never". |
\cx |
Matches the control character indicated by x. For example, \cM matches Control-M, the carriage return character. x is normally a letter in the range A-Z or a-z. |
\d |
Matches a digit character. Equivalent to [0-9]. |
\D |
Matches a non-digit character. Equivalent to [^0-9]. |
\f |
Matches a form feed character. Equivalent to \x0c and \cL. |
\n |
Matches a line feed character. Equivalent to \x0a and \cJ. |
\r |
Matches a carriage return character. Equivalent to \x0d and \cM. |
\s |
Matches any white-space character, including space, tab, form feed, and so on. Equivalent to [ \f\n\r\t\x0b] in current PCRE versions. |
\S |
Matches any non-white-space character. Equivalent to [^ \f\n\r\t\x0b] in current PCRE versions. |
\t |
Matches a tab character. Equivalent to \x09 and \cI. |
\v |
In JavaScript this matches a vertical tab (\x0b, \cK). In PHP's PCRE engine, \v matches any vertical white-space character, including line feed, vertical tab, form feed, and carriage return; write \x0b or \cK to match the vertical tab alone. |
\w |
Matches any word character, including the underscore. Equivalent to "[A-Za-z0-9_]". |
\W |
Matches any non-word character. Equivalent to "[^A-Za-z0-9_]". |
\xn |
Matches n, where n is a hexadecimal escape value. The hexadecimal escape value must be exactly two digits long. For example, "\x41" matches "A". "\x041" is equivalent to "\x04" followed by "1". This allows ASCII codes to be used in regular expressions. |
\num |
Matches num, where num is a positive integer. It is a backreference to a captured match. For example, "(.)\1" matches two consecutive identical characters. |
\n |
Identifies either an octal escape value or a backreference. \n is a backreference if at least n subexpressions have been captured before it. Otherwise, if n is an octal digit (0-7), n is an octal escape value. |
\nm |
Identifies either an octal escape value or a backreference. \nm is a backreference if at least nm subexpressions have been captured before it. If at least n subexpressions have been captured before \nm, n is a backreference followed by the literal m. If neither condition holds and both n and m are octal digits (0-7), \nm matches the octal escape value nm. |
\nml |
If n is an octal digit (0-7) and both m and l are octal digits (0-7), the octal escape value nml is matched. |
\un |
Matches n, where n is a Unicode character written as four hexadecimal digits. For example, in JavaScript \u00A9 matches the copyright symbol (©). PHP's PCRE engine does not support \u; in PHP write \x{00A9} together with the u modifier instead. |
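A few of the table entries can be tried out directly with preg_match(). The following is a small sketch using the table's own example strings; the variable names are arbitrary.

```php
<?php
// Greedy versus non-greedy quantifiers on the string "oooo".
preg_match('/o+/', 'oooo', $greedy);    // $greedy[0] is "oooo"
preg_match('/o+?/', 'oooo', $lazy);     // $lazy[0] is "o"

// Positive lookahead: "Windows" only when followed by one of the listed versions.
$ahead2000 = preg_match('/Windows(?=95|98|NT|2000)/', 'Windows2000'); // 1
$ahead31   = preg_match('/Windows(?=95|98|NT|2000)/', 'Windows3.1');  // 0

// Negative lookahead: the opposite condition.
$notAhead31 = preg_match('/Windows(?!95|98|NT|2000)/', 'Windows3.1'); // 1

// Backreference: (.)\1 finds two identical consecutive characters.
preg_match('/(.)\1/', 'food', $pair);   // $pair[0] is "oo"

// Word boundary: er\b matches at the end of "never" but not inside "verb".
$never = preg_match('/er\b/', 'never'); // 1
$verb  = preg_match('/er\b/', 'verb');  // 0
```

The patterns are written in single-quoted strings so that PHP passes the backslashes through to the regular expression engine unchanged.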
The table above gives a fairly complete overview of regular expression syntax. The characters listed in it have special meanings in a pattern and no longer stand for themselves. For example, "+" in a regular expression is not a plus sign; it means "match one or more times". To make "+" stand for a literal plus sign, escape it with a preceding "\", that is, write "\+".
String 1+1=2      regular expression: 1\+1=2
The unescaped regular expression 1+1=2 instead means one or more "1" followed by "1=2", so it matches:
String 11=2       regular expression: 1+1=2
String 111=2      regular expression: 1+1=2
String 1111=2     regular expression: 1+1=2
......
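The difference between the escaped and the unescaped plus sign can be checked with preg_match(). This is a minimal sketch; the ^ and $ anchors are added here so that only whole-string matches count.

```php
<?php
// Escaped plus sign: matches the literal text "1+1=2".
$literal = preg_match('/^1\+1=2$/', '1+1=2');   // 1

// Unescaped plus sign: "1+" means one or more "1", followed by "1=2".
$noPlus = preg_match('/^1+1=2$/', '1+1=2');     // 0, the "+" character itself is not matched
$ones   = preg_match('/^1+1=2$/', '111=2');     // 1
```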
In other words, every regular expression metacharacter has a specific meaning, and to use one as a literal character it must be escaped with a preceding "\". Escaping a non-alphanumeric character that is not a metacharacter is harmless. Letters and digits must not be escaped, however, because sequences such as "\1" (backreference) and "\d" (digit) have meanings of their own.
For the string 1+1=2 the regular expression can also be written: 1\+1\=2
Here "=" is escaped although it does not need to be; this is allowed, but not recommended.
A regular expression must be enclosed in delimiters. In JavaScript the delimiter is "/". In PHP "/" is the most common delimiter, "#" is also frequently used, and the whole expression including its delimiters must additionally be wrapped in quotes.
If the regular expression itself contains the delimiter character, that character must be escaped.
PHP regular expression delimiters
Most languages use "/" as the regular expression delimiter. In PHP you can also use "#". If the pattern contains many "/" characters, every one of them must be escaped when "/" is the delimiter, whereas with "#" no escaping is needed, which is more concise.
<?php
$weigeti = 'W3CSCHOOL online tutorial URL is http://www.jb51.net/, can you replace this URL with the correct one?';
// The requirement is to replace http://www.jb51.net/ with http://e.jb51.net/w3c/
// "." is a regex metacharacter and must be escaped; "/" is the delimiter here, so every "/" in the pattern must be escaped too (":" is escaped as well, which is harmless)
echo preg_replace('/http\:\/\/www\.jb51\.net\//', 'http://e.jb51.net/w3c/', $weigeti);
// With "#" as the delimiter, "/" no longer has a delimiter meaning and does not need to be escaped
echo preg_replace('#http\://www\.jb51\.net/#', 'http://e.jb51.net/w3c/', $weigeti);
// Both calls output the same text: "W3CSCHOOL online tutorial URL is http://e.jb51.net/w3c/, can you replace this URL with the correct one?"
?>
With the two PHP regular expression replacements above, we can see that when the pattern contains many "/" characters, either "/" or "#" works as the delimiter, but "#" makes the code more concise. However, e-dimensional technology suggests that you still use "/" as the delimiter, because languages such as JavaScript accept only "/", and a consistent habit carries over to other languages.
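When the text to match is only known at run time, escaping by hand is error-prone. One option, sketched below, is PHP's preg_quote(), whose optional second argument also escapes the chosen delimiter. The sample sentence and variable names are assumptions made for illustration.

```php
<?php
// Hypothetical literal text that should be matched exactly as written.
$needle = 'http://www.jb51.net/';

// preg_quote() escapes regex metacharacters; the second argument
// additionally escapes the delimiter used around the pattern.
$pattern = '/' . preg_quote($needle, '/') . '/';
echo $pattern, "\n";

$result = preg_replace($pattern, 'http://e.jb51.net/w3c/', 'Visit http://www.jb51.net/ today');
echo $result, "\n"; // Visit http://e.jb51.net/w3c/ today
```

This keeps the "/" delimiter habit recommended above without hand-written backslashes.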
PHP Regular Expression modifiers
Modifiers are placed after the closing delimiter of the PHP regular expression, before the closing quotation mark.
i  ignore case: matching is case-insensitive
m  multiline: each line is matched independently (^ and $ match at line boundaries); it has no effect if the string contains no newline such as \n
s  lets the metacharacter . match the newline character \n; if not set, . cannot match \n
x  ignores unescaped whitespace in the pattern
e  evaluates the replacement string as PHP code, as with eval(), for each match (preg_replace() only; deprecated in PHP 5.5 and removed in PHP 7.0)
A  anchored: the match must start at the beginning of the target string
D  makes $ match only at the very end of the string; without D, $ also matches just before a trailing newline such as \n. If the modifier m is set, the modifier D is ignored
S  studies the pattern in more detail to speed up repeated matching of non-anchored patterns
U  ungreedy: quantifiers are non-greedy by default, and adding "?" after a quantifier makes it greedy again
X  enables additional PCRE features that are incompatible with Perl
u  treats the pattern and the subject string as UTF-8. e-dimensional technology advises caution with it in a UTF-8 environment, because according to their survey using it triggers a bug. Bug URL:
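Several of these modifiers can be observed with short preg_match() calls. The following is only a sketch; the sample strings are invented for illustration.

```php
<?php
// i: case-insensitive matching.
$i = preg_match('/php/i', 'PHP');                 // 1

// x: unescaped whitespace in the pattern is ignored, which allows spacing for readability.
$x = preg_match('/ \d+ - \d+ /x', '10-20');       // 1

// U: quantifiers become non-greedy by default.
preg_match('/<.+>/',  '<b>bold</b>', $greedy);    // $greedy[0] is "<b>bold</b>"
preg_match('/<.+>/U', '<b>bold</b>', $ungreedy);  // $ungreedy[0] is "<b>"

// D: "$" matches only at the very end, not before a trailing newline.
$withoutD = preg_match('/^abc$/',  "abc\n");      // 1
$withD    = preg_match('/^abc$/D', "abc\n");      // 0
```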
If you know JavaScript regular expressions, you are probably familiar with the modifier "g", which means that all matches are processed. PHP's preg_replace() already replaces every match, so PHP has no counterpart to JavaScript's "g" modifier.
PHP regular expressions: Chinese characters and ignoring case
PHP's preg_replace() is case-sensitive and, by default, works byte by byte, so only single-byte (ASCII) characters are handled as whole characters. To match case-insensitively, or to handle multi-byte characters such as Chinese correctly, add the modifier i or u respectively.
<?php
$weigeti = 'W3CSCHOOL online tutorial URL: http://www.jb51.net/w3school/';
echo preg_replace('/w3cschool/', 'W3C consortium', $weigeti);
// Case differs, so nothing is replaced; outputs "W3CSCHOOL online tutorial URL: http://www.jb51.net/w3school/"
echo preg_replace('/w3cschool/i', 'W3C consortium', $weigeti);
// Case is ignored and the replacement is performed; outputs "W3C consortium online tutorial URL: http://www.jb51.net/w3school/"
echo preg_replace('/ URL/u', '', $weigeti);
// UTF-8 mode is forced and the replacement is performed; outputs "W3CSCHOOL online tutorial: http://www.jb51.net/w3school/"
?>
In PHP, both letter case and multi-byte characters such as Chinese need explicit handling. JavaScript regular expressions are also case-sensitive, and ignoring case is likewise done with the modifier i, but JavaScript does not need to be told that the text contains UTF-8 Chinese or other special characters; it can match Chinese directly.
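The byte-versus-character difference behind the u modifier can be made visible with a single Chinese character. This sketch assumes the source file is saved as UTF-8; the sample strings are made up.

```php
<?php
// Assumes this source file is saved as UTF-8.
$char = '中';   // one character, three bytes in UTF-8

// Without u the engine works on bytes, so "." matches a single byte only.
$bytes = preg_match('/^.$/', $char);    // 0
// With u the engine works on UTF-8 characters.
$chars = preg_match('/^.$/u', $char);   // 1

// Case-insensitive replacement combined with u.
$result = preg_replace('/w3cschool/iu', 'W3C', 'W3CSCHOOL 在线教程');
echo $result, "\n"; // W3C 在线教程
```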
PHP regular expression newline example
When a PHP regular expression meets a newline, the newline is treated as an ordinary character inside the string, but the wildcard . cannot match \n, so strings containing newlines hold many pitfalls.
<?php
$weigeti = "jb51.net\nis\nloving\nyou";
// Goal: replace the whole of $weigeti above with an empty string
echo preg_replace('/^[a-z].*[a-z]$/', '', $weigeti);
// The pattern requires the subject to start and end with a lowercase letter. $weigeti starts with "j" and ends with "u", both in [a-z], but . cannot match \n, so there is no match
// Outputs the unchanged string "jb51.net\nis\nloving\nyou"
echo preg_replace('/^[a-z].*[a-z]$/s', '', $weigeti);
// With the modifier s, . can match \n, so the whole string matches and is replaced
// Outputs ""
echo preg_replace('/^[a-z].*[a-z]$/m', '', $weigeti);
// With the modifier m, each line separated by \n is matched independently. It is equivalent to:
/*
$preg_m = preg_replace('/^[a-z].*[a-z]$/m', '', $weigeti);
$p = '/^[a-z].*[a-z]$/';
$a = preg_replace($p, '', 'jb51.net');
$b = preg_replace($p, '', 'is');
$c = preg_replace($p, '', 'loving');
$d = preg_replace($p, '', 'you');
$preg_m === $a . "\n" . $b . "\n" . $c . "\n" . $d;
*/
// Every line is emptied and only the newlines remain: outputs "\n\n\n"
?>
When you crawl a web site's content with PHP and apply regular expressions for batch replacement, it is easy to overlook that the captured content contains newlines, so pay close attention to this when using regular expression replacement.
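PHP regular expression match executing a function
In PHP regular expression replacement, the modifier e causes the replacement string to be evaluated as with eval(), so a function can be executed on each match. The e modifier was deprecated in PHP 5.5 and removed in PHP 7.0.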
<?php
$weigeti = 'W3CSCHOOL online tutorial URL: HTTP://WWW.JB51.NET/, have you jbzj!?';
// Convert the URL above to lowercase
// Note: the e modifier only works before PHP 7.0
echo preg_replace('/(http\:[\/\w\.\-]+\/)/ei', 'strtolower("$1")', $weigeti);
// With the modifier e, the PHP function strtolower() is executed on the matched URL
// Outputs "W3CSCHOOL online tutorial URL: http://www.jb51.net/, have you jbzj!?"
?>
As the code above shows, although the function call strtolower() is written inside a quoted string, it is still executed by eval().
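Because the e modifier is no longer available in PHP 7.0 and later, the same effect is usually achieved with preg_replace_callback(). The following sketch lowercases a URL that way; the sample string with an upper-case URL is an assumption made for illustration.

```php
<?php
// Hypothetical sample string with an upper-case URL.
$weigeti = 'W3CSCHOOL online tutorial URL: HTTP://WWW.JB51.NET/, jbzj!?';

// The callback receives the array of matches; its return value replaces the match.
$result = preg_replace_callback(
    '/(http\:[\/\w\.\-]+\/)/i',
    function (array $m) {
        return strtolower($m[1]);
    },
    $weigeti
);
echo $result, "\n"; // W3CSCHOOL online tutorial URL: http://www.jb51.net/, jbzj!?
```

Unlike the e modifier, the callback is ordinary PHP code, so matched text is never evaluated as code.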
Backreferences to matched groups in regular expression replacement
If you are familiar with JavaScript, you will know the backreferences $1, $2, $3, and so on; PHP accepts the same notation in the replacement string. In PHP you can also write \1 or \\1 for a backreference.
The idea of a backreference is this: a large fragment is matched, and parentheses cut it internally into several smaller matched groups; each group can then be referred to in the replacement by its number, counted in the order of the opening parentheses.
<?php
$weigeti = 'W3CSCHOOL online tutorial URL: http://www.jb51.net/, jbzj!?';
echo preg_replace('/.+(http\:[\w\-\/\.]+\/)[^\w\-\!]+([\w\-\!]+).+/', '$1', $weigeti);
echo preg_replace('/.+(http\:[\w\-\/\.]+\/)[^\w\-\!]+([\w\-\!]+).+/', '\1', $weigeti);
echo preg_replace('/.+(http\:[\w\-\/\.]+\/)[^\w\-\!]+([\w\-\!]+).+/', '\\1', $weigeti);
// All three calls above output "http://www.jb51.net/"
echo preg_replace('/^(.+) URL: (http\:[\w\-\/\.]+\/)[^\w\-\!]+([\w\-\!]+).+$/', 'Section: $1<br>URL: $2<br>Trademark: $3', $weigeti);
/*
Section: W3CSCHOOL online tutorial
URL: http://www.jb51.net/
Trademark: jbzj!
*/
// With nested parentheses, the outer group is counted first
echo preg_replace('/^((.+) URL: (http\:[\w\-\/\.]+\/)[^\w\-\!]+([\w\-\!]+).+)$/', 'Original: $1<br>Section: $2<br>URL: $3<br>Trademark: $4', $weigeti);
/*
Original: W3CSCHOOL online tutorial URL: http://www.jb51.net/, jbzj!?
Section: W3CSCHOOL online tutorial
URL: http://www.jb51.net/
Trademark: jbzj!
*/
?>
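One detail worth knowing when writing backreferences in the replacement string: a reference directly followed by a digit is ambiguous, and PHP offers the ${n} form to separate the two. $0 (or \0) refers to the whole match. A brief sketch, with made-up input strings:

```php
<?php
// Hypothetical input, for illustration only.
$version = 'v2';

// '$11' would be read as group 11; '${1}1' is group 1 followed by a literal "1".
$result = preg_replace('/v(\d)/', 'v${1}1', $version);
echo $result, "\n"; // v21

// $0 stands for the entire matched text.
$wrapped = preg_replace('/\d+/', '[$0]', 'a1b22');
echo $wrapped, "\n"; // a[1]b[22]
```

The replacement strings are single-quoted so that PHP does not try to interpolate $0 or ${1} as variables.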