PHP Regular Expressions
General Patterns
Delimiters. A pattern is wrapped in a pair of delimiters. "/" is the usual choice for both the opening and the closing delimiter, but you can also use "#".
When should you use "#"? Generally when the string being matched contains many "/" characters, as a URI does, because every "/" inside a "/"-delimited pattern has to be escaped.
The code using the "/" delimiter is as follows.
<?php
$regex = '/^http:\/\/([\w.]+)\/([\w]+)\/([\w]+)\.html$/i';
$str = 'http://www.youku.com/show_page/id_abcdefg.html';
$matches = array();
if (preg_match($regex, $str, $matches)) {
    var_dump($matches);
}
echo "\n";
$matches[0] from preg_match() contains the text that matched the entire pattern.
The code using the "#" delimiter is as follows. This time, "/" does not need to be escaped!
<?php
$regex = '#^http://([\w.]+)/([\w]+)/([\w]+)\.html$#i';
$str = 'http://www.youku.com/show_page/id_abcdefg.html';
$matches = array();
if (preg_match($regex, $str, $matches)) {
    var_dump($matches);
}
echo "\n";
Modifiers: used to change the behavior of a regular expression.
In '/^http:\/\/([\w.]+)\/([\w]+)\/([\w]+)\.html/i', the final "i" is a modifier that makes the match case-insensitive. Another commonly used modifier is "x", which makes the engine ignore whitespace inside the pattern.
Example code:
<?php
$regex = '/HELLO/';
$str = 'hello word';
$matches = array();
if (preg_match($regex, $str, $matches)) {
    echo 'No i: match succeeded!', "\n";
}
if (preg_match($regex . 'i', $str, $matches)) {
    echo 'Yes i: match succeeded!', "\n";
}
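The "x" modifier mentioned above is easiest to see in a pattern laid out over several lines. The following is a minimal sketch; the date format and the sample string are assumptions chosen for illustration and do not come from the original examples.

```php
<?php
// With the "x" modifier, whitespace and "#" comments inside the pattern are ignored.
$regex = '/
    ^(\d{4})    # year
    -(\d{2})    # month
    -(\d{2})$   # day
/x';

// Hypothetical sample input.
$matched = preg_match($regex, '2024-01-31', $parts);

var_dump($matched);
print_r($parts);
```

Because the whitespace is ignored, a literal space inside an "x" pattern has to be written as "\ " or "\s".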
Character class: in [\w], the part enclosed in square brackets is the character class.
Quantifiers: in [\w]{3,5}, [\w]*, or [\w]+, the symbols after [\w] are quantifiers. Their meanings are as follows.
{3,5} means 3 to 5 characters, {3,} means 3 or more, {0,5} means at most 5, and {3} means exactly three. (Write the lower bound explicitly: older PCRE versions treat {,5} as literal text.)
* means zero or more.
+ means one or more.
Caret
^:
> Inside a character class (for example, [^\w]) it negates the class, selecting every character that is not listed.
> At the start of an expression it anchors the match to the start of the string (/^n/i matches a string starting with n or N).
Note: the backslash "\" is the escape character. It is used to escape special symbols such as "." and "/".
Delimiters: a regular expression generally has the following form:
/love/
The part between the "/" delimiters is the pattern to be matched in the target string.
Metacharacters: characters that have a special meaning in regular expressions. They can specify how the leading character (the character immediately before the metacharacter) may occur in the target string.
Frequently used metacharacters include "+", "*", and "?".
The "+" metacharacter specifies that its leading character must appear one or more times in a row in the target string.
The "*" metacharacter specifies that its leading character may appear zero or more times in a row.
The "?" metacharacter specifies that its leading character may appear zero times or once.
Next, let's look at metacharacters in use.
/fo+/
Because this regular expression contains the "+" metacharacter (the "o" before it is the leading character), it matches "fool", "fo", and any other string in the target in which the letter "f" is followed by one or more consecutive "o" characters.
Besides these metacharacters, you can state exactly how many times a pattern may occur. For example,
/jim{2,6}/
This regular expression specifies that the character m must appear 2 to 6 times in a row, so it matches strings such as "jimmy" or "jimmmy".
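The two patterns above can be tried out with preg_match(). This is a small sketch; the test strings are illustrative, and the third argument simply captures the matched text so that it can be inspected.

```php
<?php
// "+" applies to the "o" directly before it: one or more "o" after "f".
$fool = preg_match('/fo+/', 'fool', $m1);   // matches, $m1[0] is "foo"
$noO  = preg_match('/fo+/', 'f');           // no "o" at all, so no match

// "{2,6}" applies to the "m": two to six consecutive "m" after "ji".
$jimmy = preg_match('/jim{2,6}/', 'jimmy', $m2);   // matches, $m2[0] is "jimm"
$jim   = preg_match('/jim{2,6}/', 'jim');          // only one "m", so no match

var_dump($fool, $m1[0], $noO, $jimmy, $m2[0], $jim);
```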
Several other important metacharacters:
\s: matches a single whitespace character, including tab and line break;
\S: matches any single character that is not whitespace;
\d: matches a digit from 0 to 9;
\w: matches a letter, digit, or underscore;
\W: matches any character that \w does not match;
.: matches any character except a line break.
(Note: \s and \S, and \w and \W, can be regarded as inverses of each other.)
Next, let's see through an example how to use these metacharacters in a regular expression.
/\s+/
This regular expression matches one or more whitespace characters in the target string.
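As a quick illustration of the shorthand classes, the sketch below uses the standard functions preg_split(), preg_replace(), and preg_match_all(). The sample strings are made up for this example.

```php
<?php
// \s+ : split on any run of spaces, tabs, or line breaks.
$words = preg_split('/\s+/', "one  two\tthree\nfour");

// \D : remove everything that is not a digit.
$digits = preg_replace('/\D/', '', 'Tel: 555-0199');

// \w+ : collect runs of letters, digits, and underscores.
preg_match_all('/\w+/', 'a_b c-d', $m);
$tokens = $m[0];

print_r($words);
echo $digits, "\n";
print_r($tokens);
```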
In addition to the metacharacters described above, regular expressions have another kind of special character: the anchor.
Anchors: specify where in the target string the pattern must occur.
Commonly used anchors include "^", "$", "\b", and "\B".
The "^" anchor specifies that the pattern must occur at the start of the target string.
The "$" anchor specifies that the pattern must occur at the end of the target string.
The "\b" anchor specifies that the pattern must occur at a word boundary, that is, at the start or end of a word.
The "\B" anchor specifies that the pattern must not occur at a word boundary, that is, the position is not the start or end of a word. In the same way, "^" and "$", and "\b" and "\B", can be regarded as two complementary pairs. For example:
/^hell/
Because this regular expression contains the "^" anchor, it matches a target string that starts with "hell", such as "hell", "hello", or "hellhound".
/ar$/
Because this regular expression contains the "$" anchor, it matches a target string that ends with "ar", such as "car", "bar", or "ar".
/\bbom/
Because this pattern starts with "\b", it matches words in the target string that start with "bom", such as "bomb" or "bom".
/man\b/
Because this pattern ends with "\b", it matches words in the target string that end with "man", such as "human", "woman", or "man".
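The four anchor examples can be checked side by side. The following sketch pairs each pattern with one string that should match and one that should not; the test strings are assumptions chosen for illustration.

```php
<?php
// Each entry is the return value of preg_match(): 1 for a match, 0 for none.
$results = [
    'hello starts with hell' => preg_match('/^hell/', 'hello'),
    'shell starts with hell' => preg_match('/^hell/', 'shell'),
    'car ends with ar'       => preg_match('/ar$/', 'car'),
    'art ends with ar'       => preg_match('/ar$/', 'art'),
    'a word starts with bom' => preg_match('/\bbom/', 'a bomb'),
    'bom inside a word'      => preg_match('/\bbom/', 'abomb'),
    'a word ends with man'   => preg_match('/man\b/', 'woman'),
    'man inside a word'      => preg_match('/man\b/', 'manly'),
];

print_r($results);
```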
To let users write patterns flexibly, regular expressions allow a range of characters to be specified instead of one specific character. For example:
/[A-Z]/
This regular expression matches any uppercase letter from A to Z.
/[a-z]/
This regular expression matches any lowercase letter from a to z.
/[0-9]/
This regular expression matches any digit from 0 to 9.
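To see what each range picks out, the sketch below runs the three range patterns over one made-up sample string with preg_match_all().

```php
<?php
$text = 'Hello World 42';   // hypothetical sample input

preg_match_all('/[A-Z]/', $text, $upper);   // uppercase letters only
preg_match_all('/[a-z]/', $text, $lower);   // lowercase letters only
preg_match_all('/[0-9]/', $text, $digit);   // digits only

echo implode('', $upper[0]), "\n";
echo implode('', $lower[0]), "\n";
echo implode('', $digit[0]), "\n";
```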
/([a-z][A-Z][0-9])+/
This regular expression matches any string made up of a lowercase letter, an uppercase letter, and a digit in that order, repeated one or more times, such as "aB0". Note that you can use "()" in a regular expression to group parts of a pattern together.
"()": the items it contains must all appear together in the target string. The regular expression above therefore cannot match a string such as "abc", because the second character of "abc" is not uppercase and the last character is a letter rather than a digit.
To express an "or" operation like the one in programming logic, choosing one of several alternative patterns, use the pipe character "|". For example:
/to|too|2/
This regular expression matches "to", "too", or "2" in the target string.
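One detail worth knowing about alternation in PCRE is that the alternatives are tried from left to right, so the order in which they are written matters. The sketch below shows this with an illustrative input string; the reordered second pattern is an addition for comparison and is not part of the original text.

```php
<?php
// "to" is listed first, so it wins even where "too" would also fit.
preg_match_all('/to|too|2/', 'too 2 to', $inOrder);
print_r($inOrder[0]);

// Listing the longer alternative first lets "too" match in full.
preg_match_all('/too|to|2/', 'too 2 to', $longestFirst);
print_r($longestFirst[0]);
```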
Negation: "[^]". Unlike the anchor "^" described above, the "[^]" negated class matches any character that is not listed inside the brackets. For example:
/[^A-C]/
This pattern matches any character in the target string except A, B, and C. In general, when "^" appears as the first character inside "[]", it is a negation operator; when "^" appears outside "[]", it is an anchor.
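The two roles of "^" can be compared directly. This is a minimal sketch with made-up strings: the first call uses "^" inside brackets as a negation, the others use it outside brackets as an anchor.

```php
<?php
// Inside brackets "^" negates: remove everything that is not A, B, or C.
$onlyAbc = preg_replace('/[^A-C]/', '', 'ABCDEabc');

// Outside brackets "^" anchors: the string must start with A, B, or C.
$startsWithC = preg_match('/^[A-C]/', 'Cab');
$startsWithD = preg_match('/^[A-C]/', 'Dab');

var_dump($onlyAbc, $startsWithC, $startsWithD);
```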
Finally, the escape character "\" makes a metacharacter match literally. For example:
/Th\*/
This regular expression matches "Th*" in the target string rather than "The".
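The effect of escaping "*" can be sketched as follows. preg_quote() is the standard PHP function for escaping a literal string before placing it in a pattern; the test strings are illustrative.

```php
<?php
$escapedStar   = preg_match('/Th\*/', 'Th*');   // literal "*": matches
$escapedOnThe  = preg_match('/Th\*/', 'The');   // no literal "*": no match
$unescapedStar = preg_match('/Th*/', 'The');    // "h*" is zero or more "h": matches

// preg_quote() adds the backslashes for you; the second argument is the delimiter.
$quoted = preg_quote('Th*', '/');
$viaQuote = preg_match('/' . $quoted . '/', 'Th*');

var_dump($escapedStar, $escapedOnThe, $unescapedStar, $quoted, $viaQuote);
```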
Practical experience
Let's talk about ^ and $. They match the start and the end of the string respectively. Examples:
"^The": the string must start with "The";
"of despair$": the string must end with "of despair";
So,
"^abc$": a string that starts and ends with abc. In fact, only "abc" itself matches;
"notice": matches any string containing "notice";
As the last example shows, if you use neither of the two characters, the pattern (regular expression) can appear anywhere in the string being tested, because you have not locked it to either end.
Next, let's talk about '*', '+', and '?'.
They specify how many times a character, or a sequence of characters, may occur. They mean:
'*': "zero or more", equivalent to {0,}
'+': "one or more", equivalent to {1,}
'?': "zero or one", equivalent to {0,1}
Here are some examples:
"ab*": the same as ab{0,}. Matches a string with an a followed by zero or more b ("a", "ab", "abbb", etc.);
"ab+": the same as ab{1,}. As above, but at least one b must be present ("ab", "abbb", etc.);
"ab?": the same as ab{0,1}. The b may be absent or appear once;
"a?b+$": matches a string that ends with zero or one a followed by one or more b.
Key point: '*', '+', and '?' apply only to the character immediately before them.
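The point that a quantifier applies only to the character directly before it can be checked with a few anchored patterns. This is a sketch with illustrative strings; the grouped pattern '(ab)*' is included only to contrast with 'ab*' and is an addition to the examples in the text.

```php
<?php
$cases = [
    ['/^ab*$/',   'a'],      // zero b is allowed
    ['/^ab*$/',   'abbb'],   // several b are allowed
    ['/^ab+$/',   'a'],      // "+" needs at least one b
    ['/^ab?$/',   'abb'],    // "?" allows at most one b
    ['/^ab*$/',   'abab'],   // "*" repeats only the b, not "ab"
    ['/^(ab)*$/', 'abab'],   // grouping makes "*" repeat the whole "ab"
    ['/a?b+$/',   'xabbb'],  // ends with an optional a plus one or more b
];

$results = [];
foreach ($cases as [$pattern, $subject]) {
    $results[] = preg_match($pattern, $subject);
}

print_r($results);
```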
You can also give the number of repetitions in braces, for example:
"ab{2}": a must be followed by exactly two b ("abb");
"ab{2,}": a must be followed by two or more b (such as "abb" or "abbbb");
"ab{3,5}": a must be followed by 3 to 5 b ("abbb", "abbbb", or "abbbbb").
You can also put several characters in parentheses so that a quantifier applies to all of them, for example:
"a(bc)*": matches an a followed by zero or more "bc";
"a(bc){1,5}": matches an a followed by one to five "bc";
There is also the character '|', which acts as an OR operator:
"hi|hello": matches strings containing "hi" or "hello";
"(b|cd)ef": matches strings containing "bef" or "cdef";
"(a|b)*c": matches strings containing any number (including zero) of a or b, followed by a c;
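A dot ('.') stands for any single character except "\n".
What if we want to match any single character including "\n"?
Use the "s" modifier, which lets the dot match a line break, or a class such as '[\s\S]'. (Inside brackets the dot is a literal dot, so a pattern like '[\n.]' matches only a line break or a literal ".".)
"a.[0-9]": an a, followed by any single character, followed by a digit from 0 to 9;
"^.{3}$": a string of exactly three characters of any kind.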
The content enclosed in brackets matches only a single character.
"[ab]": matches a or b (the same as "a|b");
"[a-d]": matches a single character from 'a' to 'd' (the same as "a|b|c|d" and "[abcd]");
Generally, we use [a-zA-Z] to match a letter of either case:
"^[a-zA-Z]": matches a string starting with an uppercase or lowercase letter;
"[0-9]%": matches a string containing a digit followed by a percent sign;
",[a-zA-Z0-9]$": matches a string that ends with a comma followed by a digit or a letter;
You can also list the characters you do not want inside the brackets by starting the list with '^': "%[^a-zA-Z]%" matches a string containing a non-letter character between two percent signs.
Key point: when ^ is the first character inside the brackets, it excludes the characters listed in the brackets.
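The bracket examples above can be verified with preg_match(). The following is a short sketch; every subject string is an assumption chosen to show one matching or non-matching case.

```php
<?php
$startsWithLetter = preg_match('/^[a-zA-Z]/', 'Hello');        // starts with a letter
$startsWithDigit  = preg_match('/^[a-zA-Z]/', '9lives');       // starts with a digit
$digitPercent     = preg_match('/[0-9]%/', 'up 5% today');     // digit followed by "%"
$commaAlnumEnd    = preg_match('/,[a-zA-Z0-9]$/', 'a,b,9');    // ends with ",9"
$nonLetterBetween = preg_match('/%[^a-zA-Z]%/', '%7%');        // non-letter between "%"
$letterBetween    = preg_match('/%[^a-zA-Z]%/', '%a%');        // letter between "%"

var_dump($startsWithLetter, $startsWithDigit, $digitPercent,
         $commaAlnumEnd, $nonLetterBetween, $letterBetween);
```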
For PHP to interpret them literally, characters with a special meaning, such as "^", ".", "[", "$", "(", ")", "|", "*", "+", "?", "{", and "\", must be preceded by "\".
Do not forget that characters inside brackets are an exception to this rule: inside brackets, most special characters lose their special meaning (in PCRE, "\", "^", "-", and "]" remain special there). For example, "[*+?{}.]" matches a string containing any of these characters.
Also, as the regex manual tells us, if the list contains ']', it is best to make it the first character in the list (possibly after a '^'). If it contains '-', it is best to put it at the beginning or the end, or as the second endpoint of a range; in [a-d-0-9], for example, the '-' in the middle is treated literally.
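These bracket rules can be sketched in PCRE as follows. The subject strings are made up, and the behavior shown is that of PHP's preg_ functions, which may differ in detail from the POSIX functions discussed elsewhere in this text.

```php
<?php
// Metacharacters listed inside brackets are matched literally.
preg_match_all('/[*+?{}.]/', 'a*b+c?d.', $special);

// "]" placed first in the class is a literal "]".
$bracketFirst = preg_match('/^[]a]+$/', 'a]a');

// "-" placed last in the class is a literal "-".
preg_match_all('/[a-d-]/', 'b-z', $hyphenLast);

// "-" directly after a range is also literal: a to d, "-", or 0 to 9.
$afterRange = preg_match('/^[a-d-0-9]+$/', 'b-7');

print_r($special[0]);
print_r($hyphenLast[0]);
var_dump($bracketFirst, $afterRange);
```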
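After reading the examples above, you should understand {n,m}. Note that n and m must be non-negative integers and n must not be greater than m. The preceding item is then matched at least n times and at most m times. For example, "p{1,5}" matches at most five consecutive p characters: in "pvpppppp" it first matches the single leading "p", and from the run of six p's after the "v" it takes the first five.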
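To see exactly what "p{1,5}" picks out of "pvpppppp", the sketch below collects every match with preg_match_all(). The quantifier is greedy, so each match takes as many p characters as it can, up to five.

```php
<?php
preg_match_all('/p{1,5}/', 'pvpppppp', $m);

// Three matches: the lone "p" before the "v", then five of the six "p"
// after it, then the one "p" that is left over.
print_r($m[0]);
```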
Next we will talk about \b and \B.
\b matches a word boundary. For example, 've\b' matches the "ve" in "love" but not the "ve" in "very".
\B is the opposite of \b: it matches any position that is not a word boundary.
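The contrast between the two can be checked on the words "love" and "very". A minimal sketch:

```php
<?php
$loveB    = preg_match('/ve\b/', 'love');   // "ve" ends the word: match
$veryB    = preg_match('/ve\b/', 'very');   // "ve" is followed by "r": no match
$loveNotB = preg_match('/ve\B/', 'love');   // a boundary follows "ve": no match
$veryNotB = preg_match('/ve\B/', 'very');   // no boundary after "ve": match

var_dump($loveB, $veryB, $loveNotB, $veryNotB);
```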
Other regular expression functions
Extracting a string
ereg() and eregi() let you extract part of a string with a regular expression (see the manual for details). Note that the ereg family was deprecated in PHP 5.3 and removed in PHP 7.0, so use the preg_ functions in new code. For example, to extract the file name from a path or URL,
the following code does what you need:
ereg("([^\\/]*)$", $pathOrUrl, $regs);
echo $regs[1];
Advanced replacement
ereg_replace() and eregi_replace() are also very useful. Suppose we want to replace every run of spaces or tabs between words with a comma:
ereg_replace("[ \t]+", ",", trim($str));
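Since the ereg functions are not available in PHP 7 and later, the same two tasks can be sketched with preg_match() and preg_replace(). The sample path and sample string are assumptions for illustration; the patterns are PCRE translations of the ones above, not code from the original.

```php
<?php
// Hypothetical input values.
$pathOrUrl = 'http://www.example.com/docs/manual.html';
$str = "  apple \t banana   cherry ";

// Capture everything after the last "/" (PCRE needs delimiters around the pattern).
preg_match('/([^\/]*)$/', $pathOrUrl, $regs);
$fileName = $regs[1];

// Replace each run of spaces or tabs with a single comma.
$csv = preg_replace('/[ \t]+/', ',', trim($str));

echo $fileName, "\n";
echo $csv, "\n";
```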
The PCRE functions are as follows:
preg_match() and preg_match_all()
preg_quote()
preg_split()
preg_grep()
preg_replace()
Their exact behavior is described in the PHP manual. Below are some regular expressions collected over time:
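As a quick orientation, the sketch below calls each of the functions listed above once. The inputs are made up for illustration; the PHP manual remains the reference for the full signatures and flags.

```php
<?php
$first  = preg_match('/\d+/', 'a1b22c333', $one);       // stops at the first match
$count  = preg_match_all('/\d+/', 'a1b22c333', $all);   // finds every match
$quoted = preg_quote('a.b*c');                          // escapes "." and "*"
$parts  = preg_split('/[\s,]+/', 'a, b  c');            // split on commas and spaces
$nums   = preg_grep('/^\d+$/', ['1', 'a', '22']);       // keep matching elements (keys kept)
$clean  = preg_replace('/\s+/', ' ', "a \n b");         // collapse whitespace

var_dump($first, $one[0], $count, $all[0], $quoted, $parts, $nums, $clean);
```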
Matching action attributes
$str = ''; // the HTML source to search
$match = '';
preg_match_all('/\s+action="(?!http:)(.*?)"\s/', $str, $match);
print_r($match);
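With some input filled in, the action-attribute pattern behaves as sketched below. The HTML fragment is a made-up example; the negative lookahead (?!http:) skips attributes whose value already starts with "http:".

```php
<?php
// Hypothetical HTML: one relative action and one absolute action.
$str = '<form method="post" action="/search.php" name="f"></form> '
     . '<form action="http://www.example.com/go" id="g"></form>';

$match = [];
preg_match_all('/\s+action="(?!http:)(.*?)"\s/', $str, $match);

// $match[1] holds the captured attribute values.
print_r($match[1]);
```

Note that the trailing \s in the pattern requires whitespace after the closing quote, so an action attribute that is the last attribute in its tag is not matched.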
Use callback in Regular Expressions
/**
 * Replace some string by callback function
 */
function callback_replace() {
    $url = 'http://esfang.house.sina.com.cn';
    $str = '';
    // The /e modifier was removed in PHP 7.0; preg_replace_callback() takes its place.
    $str = preg_replace_callback('/(?<=\saction=")(?!http:)(.*?)(?="\s)/', function ($m) use ($url) {
        return search($url, $m[1]);
    }, $str);
    echo $str;
}
function search($url, $match) {
    return $url . '/' . $match;
}
Regular Expression matching with assertions
$match = '';
$str = 'xxxxxx.com.cn <b>bold font</b> <p>Paragraph text</p>';
preg_match_all('/(?<=<(\w{1})>).*(?=<\/\1>)/', $str, $match);
echo "matches the content in HTML tags without attributes:";
print_r($match);
Replace the address in the HTML source code
// The /e modifier was removed in PHP 7.0, so a callback takes its place.
$form_html = preg_replace_callback('/(?<=\saction="|\ssrc="|\shref=")(?!http:|javascript)(.*?)(?="\s)/', function ($m) use ($url) {
    return add_url($url, $m[1]);
}, $form_html);
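The text does not show add_url(), so the sketch below supplies a hypothetical version that joins a base URL and a relative path, and runs the replacement with preg_replace_callback() on a made-up HTML fragment. The base URL and the fragment are also assumptions.

```php
<?php
// Hypothetical helper: the original calls add_url() without defining it.
function add_url(string $url, string $path): string
{
    return rtrim($url, '/') . '/' . ltrim($path, '/');
}

$url = 'http://www.example.com';   // assumed base URL
$form_html = '<a href="news/1.html" class="x">n</a> '
           . '<img src="http://cdn.example.com/a.png" alt="">';

// Lookbehind: the value must follow action=", src=", or href=".
// Negative lookahead: skip values that start with "http:" or "javascript".
$pattern = '/(?<=\saction="|\ssrc="|\shref=")(?!http:|javascript)(.*?)(?="\s)/';

$form_html = preg_replace_callback($pattern, function (array $m) use ($url) {
    return add_url($url, $m[1]);
}, $form_html);

echo $form_html, "\n";
```

Only the relative href is rewritten; the src value already starts with "http:" and is left alone.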
Metacharacters
In the examples above, the symbols ^, \d, and $ each carry a specific matching meaning. We call them metacharacters. Common metacharacters are as follows:
Metacharacter | Description
. | Matches any character except a line break
\w | Matches a letter, digit, or underscore
\s | Matches any whitespace character
\d | Matches a digit
\b | Matches the start or end of a word
^ | Matches the start of the string
$ | Matches the end of the string
[x] | Matches the character x; for example, [abc] matches a, b, or c
\W | The opposite of \w: matches any character that is not a letter, digit, or underscore
\S | The opposite of \s: matches any non-whitespace character
\D | The opposite of \d: matches any non-digit character
\B | The opposite of \b: matches a position that is not the start or end of a word
[^x] | Matches any character except x; for example, [^abc] matches any character other than a, b, or c