Basic syntax of PHP regular expressions, summarized in detail
First, let's take a look at two special characters, '^' and '$'. They match the start and the end of the string, respectively. For example:
"^The": matches a string that starts with "The";
"of despair$": matches a string that ends with "of despair";
"^abc$": matches a string that both starts and ends with "abc", so in fact only "abc" itself matches;
"notice": matches a string that contains "notice" anywhere.
As the last example shows, if you use neither of these two characters, the pattern (the regular expression) may appear anywhere in the string being tested; you have not anchored it to either end.
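The anchoring rules above can be tried out with a short sketch. It assumes a current PHP version, where the ereg functions no longer exist, so it uses preg_match() with "/" delimiters; the helper name matches() is hypothetical.

```php
<?php
// Hypothetical helper: true when $subject matches the expression $regex.
function matches(string $regex, string $subject): bool
{
    // preg_match() needs delimiters around the expression; "/" is used here.
    return preg_match('/' . $regex . '/', $subject) === 1;
}

var_dump(matches('^The', 'The end'));             // "The" at the start
var_dump(matches('^The', 'At The end'));          // "The" not at the start
var_dump(matches('of despair$', 'cry of despair'));
var_dump(matches('^abc$', 'abc'));                // only "abc" itself
var_dump(matches('^abc$', 'abcabc'));
var_dump(matches('notice', 'take notice of it')); // unanchored: anywhere
```

Because the helper simply wraps the expression in "/" delimiters, it only works for expressions that contain no "/" themselves.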
Next come the characters '*', '+', and '?', which specify how many times a character or a sequence of characters may occur. They mean "zero or more", "one or more", and "zero or one", respectively. Here are some examples:
"ab*": matches an a followed by zero or more b ("a", "ab", "abbb", etc.);
"ab+": same as above, but with at least one b ("ab", "abbb", etc.);
"ab?": matches an a followed by zero or one b;
"a?b+$": matches a string that ends with zero or one a followed by one or more b.
You can also give explicit bounds in braces to limit the number of occurrences:
"ab{2}": matches an a followed by exactly two b ("abb");
"ab{2,}": at least two b ("abb", "abbbb", etc.);
"ab{3,5}": three to five b ("abbb", "abbbb", or "abbbbb").
Note that you must always specify the first number of a range (i.e., "{0,2}", not "{,2}"). Note also that '*', '+', and '?' are equivalent to the ranges "{0,}", "{1,}", and "{0,1}", respectively.
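The quantifiers can be checked with a small sketch. It assumes preg_match() rather than ereg(), and the helper matchesWhole() is a hypothetical name; it anchors the expression at both ends so that the counts are tested exactly.

```php
<?php
// Hypothetical helper: true when the whole of $subject matches $regex.
function matchesWhole(string $regex, string $subject): bool
{
    // "^" and "$" pin the expression to both ends of the subject.
    return preg_match('/^(' . $regex . ')$/', $subject) === 1;
}

var_dump(matchesWhole('ab*', 'a'));          // zero b is allowed
var_dump(matchesWhole('ab+', 'a'));          // at least one b is required
var_dump(matchesWhole('ab?', 'abb'));        // at most one b is allowed
var_dump(matchesWhole('ab{2}', 'abb'));      // exactly two b
var_dump(matchesWhole('ab{2,}', 'abbbb'));   // two or more b
var_dump(matchesWhole('ab{3,5}', 'abb'));    // too few b
var_dump(matchesWhole('ab{3,5}', 'abbbbb')); // five b is the upper bound
```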
To quantify a sequence of characters, put it in parentheses, for example:
"a(bc)*": matches an a followed by zero or more "bc";
"a(bc){1,5}": matches an a followed by one to five "bc".
There is also the character '|', which works as an OR operator:
"hi|hello": matches a string containing "hi" or "hello";
"(b|cd)ef": matches a string containing "bef" or "cdef";
"(a|b)*c": matches a string containing any number (including zero) of a or b characters followed by a c.
A dot ('.') stands for any single character:
".[0-9]": matches a string containing any character followed by a digit (from here on, "a string containing" is left implicit);
"^.{3}$": a string of exactly three characters.
A bracket expression matches a single character:
"[ab]": matches a single a or b (the same as "a|b");
"[a-d]": matches a single character from 'a' to 'd' (the same as "a|b|c|d" and "[abcd]");
"^[a-zA-Z]": a string that starts with a letter;
"[0-9]%": a digit followed by a percent sign;
",[a-zA-Z0-9]$": a string that ends with a comma followed by a digit or letter.
You can also list the characters you do not want inside brackets; just put a '^' first in the brackets (i.e., "%[^a-zA-Z]%" matches a string containing a non-letter between two percent signs).
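A short sketch can show which part of a string each of these constructs picks out. It assumes preg_match(); the helper firstMatch() is a hypothetical name, and it returns the matched text or null when nothing matches.

```php
<?php
// Hypothetical helper: the first text matched by $pattern, or null.
function firstMatch(string $pattern, string $subject): ?string
{
    return preg_match($pattern, $subject, $m) === 1 ? $m[0] : null;
}

var_dump(firstMatch('/hi|hello/', 'say hello'));     // alternation
var_dump(firstMatch('/(b|cd)ef/', 'xcdefx'));        // grouped alternation
var_dump(firstMatch('/(a|b)*c/', 'ababc'));          // repeated group
var_dump(firstMatch('/^.{3}$/', 'abcd'));            // not exactly 3 chars
var_dump(firstMatch('/[0-9]%/', 'up 5% today'));     // digit then "%"
var_dump(firstMatch('/%[^a-zA-Z]%/', '%7%'));        // negated class
var_dump(firstMatch('/%[^a-zA-Z]%/', '%a%'));        // a letter is rejected
```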
To be taken literally, the special characters "^.[$()|*+?{\" must each be preceded by a backslash ('\'). In php3 strings the backslash itself must also be escaped; for example, the regular expression "(\$|¥)[0-9]+" should be called as ereg("(\\$|¥)[0-9]+", $str) (I don't know whether php4 behaves the same).
Do not forget that bracket expressions are an exception to this rule: inside brackets, all special characters, including the backslash ('\'), lose their special meaning (i.e., "[*\+?{}.]" matches a string containing any of these characters). This describes the POSIX syntax used by ereg; in Perl-compatible (preg) patterns the backslash still escapes inside brackets. Also, as the regex manual tells us: "To include a literal ']' in the list, make it the first character (following a possible '^'). To include a literal '-', make it the first or last character, or the second endpoint of a range" (i.e., the middle '-' in "[a-d-0-9]" is valid).
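When the text to be matched literally comes from a variable, escaping by hand is error-prone. The sketch below assumes the preg functions and uses PHP's built-in preg_quote() to add the backslashes; the sample strings are made up for illustration.

```php
<?php
// A literal that contains several special characters: $ . ( )
$literal = '$5.00 (approx.)';

// preg_quote() escapes every special character; the second argument
// names the delimiter so that it would be escaped as well.
$quoted  = preg_quote($literal, '/');
$pattern = '/' . $quoted . '/';

$hit  = preg_match($pattern, 'Price: $5.00 (approx.) each'); // literal text found
$miss = preg_match($pattern, 'Price: $5x00 (approx.) each'); // "." is not a wildcard

// Inside brackets these characters need no escaping at all.
$anySpecial = preg_match('/[*+?{}.]/', '2+2');

var_dump($quoted, $hit, $miss, $anySpecial);
```

Single-quoted PHP strings are used throughout so that PHP itself does not interpret "$" or most backslashes before the regex engine sees them.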
For completeness, I should mention collating sequences, character classes, and equivalence classes. I will not go into them here, though, since this article does not need them; you can find more information in the regex man pages.
How to build a pattern to match currency input
Now let's use what we have learned to do something useful: build a pattern that checks whether the input is a number representing an amount of money. We assume there are four ways to write the amount: "10000.00" and "10,000.00", or, without the fractional part, "10000" and "10,000". Let's start building the pattern:
The number must not start with 0, but a rule like that would also reject a single "0", so we handle it as a special case:
^(0|[1-9][0-9]*)$
This reads: "only 0, or a number that does not start with 0, matches". We can also allow a minus sign in front of the number:
^(0|-?[1-9][0-9]*)$
This means: "0, or a number that does not start with 0 and may be preceded by a minus sign." OK, now let's be less strict and allow a leading 0. Let's also drop the minus sign, since we do not need it to represent money. We now add a part that matches the fraction:
^[0-9]+(\.[0-9]+)?$
This implies that the matched string must start with at least one digit. Note that with the pattern above "10." does not match; only "10" and "10.2" do. (Do you know why?)
^[0-9]+(\.[0-9]{2})?$
Here we have required exactly two decimal places. If you think this is too strict, you can change it to:
^[0-9]+(\.[0-9]{1,2})?$
This allows one or two decimal places. Now we add commas (one every three digits) for readability, which can be expressed as follows:
^[0-9]{1,3}(,[0-9]{3})*(\.[0-9]{1,2})?$
Do not forget that the '+' can be replaced by '*' if you want to allow blank strings as input (why?). Also, do not forget the backslash '\', which can cause errors in PHP strings (a common mistake). Now that we can validate the string, we can remove all commas with str_replace(",", "", $money), treat the result as a double, and use it in calculations.
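Putting the steps together, a minimal sketch of the validate-then-convert idea might look as follows. It assumes preg_match() in place of ereg(), and the function name parseMoney() is hypothetical. Because the comma pattern alone rejects "10000.00", the sketch also accepts the earlier comma-free pattern, and it adds the D modifier so that a trailing line break is not accepted.

```php
<?php
// Hypothetical helper: the amount as a float, or null when the input is invalid.
function parseMoney(string $money): ?float
{
    // With thousands separators: 10,000 or 10,000.00
    $withCommas = '/^[0-9]{1,3}(,[0-9]{3})*(\.[0-9]{1,2})?$/D';
    // Without separators: 10000 or 10000.00
    $plain      = '/^[0-9]+(\.[0-9]{1,2})?$/D';

    if (preg_match($withCommas, $money) !== 1 && preg_match($plain, $money) !== 1) {
        return null;
    }
    // Strip the commas, then treat the rest as a floating-point number.
    return (float) str_replace(',', '', $money);
}

var_dump(parseMoney('10,000.00'));
var_dump(parseMoney('10000'));
var_dump(parseMoney('10.'));   // no digit after the point
var_dump(parseMoney('1,00'));  // comma group must have three digits
```

For real financial calculations a float is rarely the right type; it is used here only because the text converts the value to a double.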
Construct a regular expression for checking email
OK, let's move on to validating an email address. A complete email address has three parts: the POP3 user name (everything to the left of the '@'), the '@' itself, and the server name (everything that remains). The user name may contain uppercase and lowercase letters, digits, periods ('.'), minus signs ('-'), and underscores ('_'). The server name follows the same rule, except that underscores are not allowed.
Neither the user name nor the server name may begin or end with a period. Two periods may not be adjacent either; there must be at least one character between them. Now let's see how to write a pattern for the user name. A single bracket expression repeated one or more times, "^[_a-zA-Z0-9-]+$", does not allow periods yet, so we add them as follows:
^[_a-zA-Z0-9-]+(\.[_a-zA-Z0-9-]+)*$
This means: "at least one valid character (anything allowed except '.'), followed by zero or more groups that each consist of a dot and at least one valid character."
To simplify it, we can use eregi() instead of ereg(). eregi() is case-insensitive, so we do not need to specify both ranges, "a-z" and "A-Z"; one of them is enough:
^[_a-z0-9-]+(\.[_a-z0-9-]+)*$
The server name is the same, but without the underscore:
^[a-z0-9-]+(\.[a-z0-9-]+)*$
Done. Now you only need to join the two parts with "@":
^[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*$
This is the complete pattern for validating an email address. You only need to call
eregi('^[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*$', $email)
and you will know whether the string is a valid email address.
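The eregi() function was removed in PHP 7, so on a current PHP the same check might be sketched with preg_match() and the "i" modifier instead. The function name looksLikeEmail() and the sample addresses are hypothetical; the pattern is the one built above, wrapped in "/" delimiters, with "D" added so that a trailing line break is rejected.

```php
<?php
// Hypothetical helper: applies the pattern built in the text.
function looksLikeEmail(string $email): bool
{
    $pattern = '/^[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*$/iD';
    return preg_match($pattern, $email) === 1;
}

var_dump(looksLikeEmail('John.Doe@example.com'));   // mixed case is fine with "i"
var_dump(looksLikeEmail('.john@example.com'));      // leading period
var_dump(looksLikeEmail('john..doe@example.com'));  // adjacent periods
var_dump(looksLikeEmail('john@exa_mple.com'));      // underscore in server name
```

Note that this pattern accepts a server name without any dot, such as "john@example"; it checks the shape described in the text, not whether the address can actually receive mail.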
Other uses of regular expressions
Extracting part of a string
ereg() and eregi() have a feature that lets you extract part of a string through a regular expression (see the manual for details). For example, to extract the file name from a path or URL, the following code is what you need:
ereg("([^\\/]*)$", $pathOrUrl, $regs);
echo $regs[1];
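On a PHP version without ereg(), the same extraction might be sketched with preg_match(), which fills its third argument in the same way. The function name fileNameOf() and the sample paths are hypothetical; "#" is chosen as the delimiter so that the slash inside the brackets needs no escaping.

```php
<?php
// Hypothetical helper: the part of $pathOrUrl after the last "/" or "\".
function fileNameOf(string $pathOrUrl): string
{
    // [^\\/]* = any run of characters that are neither backslash nor slash;
    // "$" forces that run to sit at the end of the string.
    if (preg_match('#([^\\\\/]*)$#', $pathOrUrl, $regs) === 1) {
        return $regs[1];
    }
    return '';
}

var_dump(fileNameOf('http://example.com/dir/file.txt'));
var_dump(fileNameOf('C:\\dir\\file.txt'));
var_dump(fileNameOf('dir/'));   // nothing after the last slash
```

In the single-quoted PHP string, four backslashes become two in the pattern, which the regex engine reads as one literal backslash.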
Advanced replacement
ereg_replace() and eregi_replace() are also very useful. For example, to replace every run of separating whitespace with a comma:
ereg_replace("[ \n\r\t]+", ",", trim($str));
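The same replacement can be sketched with preg_replace(), assuming a PHP version where ereg_replace() is no longer available. The function name commaSeparate() is hypothetical.

```php
<?php
// Hypothetical helper: turns each run of whitespace into a single comma.
function commaSeparate(string $str): string
{
    // trim() first, so leading and trailing whitespace does not become a comma.
    return preg_replace('/[ \n\r\t]+/', ',', trim($str));
}

var_dump(commaSeparate("  a b\t\tc\r\nd "));
```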
PHP is widely used for Web back-end CGI development. A script usually produces some result from data the user submits, but if that data is wrong, problems follow; for example, someone's birthday might be entered as "February 30"! How can we check whether such input is correct? PHP supports regular expressions, which make this kind of data matching easy.
2. What is a regular expression:
In short, regular expressions are a powerful tool for pattern matching and replacement. You can find them in almost all UNIX/Linux-based software tools and in scripting languages such as Perl and PHP. Client-side JavaScript supports them as well. Regular expressions have become a common concept and tool, widely used by all kinds of technical people.
A Linux website once put it like this: "If you ask a Linux fan what he likes most, he may answer regular expressions. If you ask him what he fears most, then apart from tedious installation and configuration, he will certainly say regular expressions."
As that suggests, regular expressions look complex and intimidating, and most PHP beginners skip this topic and move on. However, regular expressions in PHP let you find strings that match a pattern, test whether a string meets a condition, and replace matching strings with a specified string. It would be a pity not to learn them.
3. Basic syntax of a regular expression:
A regular expression has three parts: delimiter, expression, and modifiers.
The delimiter can be any character that is not a letter, digit, backslash, or whitespace (for example "/" or "!"); the most commonly used delimiter is "/". The expression consists of special characters (described below) and ordinary strings; for example, "[a-z0-9_-]+@[a-z0-9_.-]+" matches a simple email string. Modifiers turn a feature or mode on or off. The following is an example of a complete regular expression:
/hello.+?hello/is
In the regular expression above, "/" is the delimiter, the expression lies between the two "/", and the string "is" after the second "/" holds the modifiers.
If the expression itself contains the delimiter, escape it with the escape character "\", as in "/hello.+?\/hello/is". Besides delimiters, the escape character also introduces special characters: every special character written as a letter must be preceded by "\"; for example, "\d" stands for any digit.
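A small sketch of the delimiter rule, using a made-up subject string: the first pattern escapes the slash, the second avoids the problem by choosing "#" as the delimiter, and the third drops the "s" modifier to show its effect.

```php
<?php
$subject = "Hello world\n/HELLO again";

// "/" as delimiter: the slash inside the expression must be escaped.
$withSlash = preg_match('/hello.+?\/hello/is', $subject);

// "#" as delimiter: the slash needs no escaping.
$withHash = preg_match('#hello.+?/hello#is', $subject);

// Without the "s" modifier "." stops at the line break, so nothing matches.
$noDotAll = preg_match('#hello.+?/hello#i', $subject);

var_dump($withSlash, $withHash, $noDotAll);
```

Choosing a delimiter that does not occur in the expression usually reads better than escaping, especially for URLs and file paths.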
4. Special characters of the regular expression:
Special characters in regular expressions include metacharacters and positioning characters.
A metacharacter is a special character that describes how the character before it (its leading character) may occur in the matched text. Each metacharacter is a single character, but metacharacters can be combined to build larger constructs.
Metacharacters:
Braces: braces specify exactly how many times the preceding character may occur. For example, "/pre{1,5}/" matches "pre", "pree", and "preeeee": "pr" followed by one to five "e". To allow zero to five "e", write "/pre{0,5}/"; the short form "{,5}" is not reliable, since older PCRE versions treat it as literal text.
Plus sign: "+" matches the preceding character one or more times. For example, "/ac+/" matches "act", "account", and "acccc": an "a" followed by one or more "c". "+" is equivalent to "{1,}".
Asterisk: "*" matches the preceding character zero or more times. For example, "/ac*/" matches "app", "acp", and "accp": an "a" followed by zero or more "c". "*" is equivalent to "{0,}".
Question mark: "?" matches the preceding character zero or one time. For example, "/ac?/" matches "a", "acp", and "acwp": an "a" followed by zero or one "c". "?" also has another very important role in regular expressions, namely controlling "greedy mode".
Two more important special characters are the brackets "[ ]". They match any one of the characters listed inside them; for example, "/[az]/" matches a single "a" or "z". If you change the expression to "/[a-z]/", it matches any single lowercase letter, such as "a" or "b".
If "^" appears first inside "[ ]", the expression matches any character that is not listed; for example, "/[^a-z]/" matches any character other than a lowercase letter. In addition, the following predefined classes can be used inside brackets, as in "/[[:alpha:]]/":
[:alpha:]: matches any letter
[:alnum:]: matches any letter or digit
[:digit:]: matches any digit
[:space:]: matches whitespace characters
[:upper:]: matches any uppercase letter
[:lower:]: matches any lowercase letter
[:punct:]: matches any punctuation mark
[:xdigit:]: matches any hexadecimal digit
In addition, the following characters take on special meanings when preceded by the escape character:
\s: matches a single whitespace character.
\S: matches any single character that is not whitespace.
\d: matches a digit from 0 to 9; equivalent to "/[0-9]/".
\w: matches a letter, digit, or underscore; equivalent to "/[a-zA-Z0-9_]/".
\W: matches any character that \w does not match; equivalent to "/[^a-zA-Z0-9_]/".
\D: matches any character that is not a decimal digit.
.: matches any character except a line break; with the modifier "s", "." matches any character at all.
These special characters make it easy to express fairly complicated patterns. For example, "/\d0000/" matches any digit followed by four zeros, such as "10000" or "90000".
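A few of these shorthand characters in use, as a sketch with made-up sample strings. It relies only on the standard functions preg_match(), preg_match_all(), and preg_replace().

```php
<?php
// \d+ : collect every run of digits.
preg_match_all('/\d+/', 'Order 66 shipped 2 items', $m);
$numbers = $m[0];

// \D : remove everything that is not a digit.
$digitsOnly = preg_replace('/\D/', '', '(555) 123-4567');

// \s+ : collapse any run of whitespace into one space.
$collapsed = preg_replace('/\s+/', ' ', "a \t\n b");

// \w : letters, digits, and underscore only.
$isWord = preg_match('/^\w+$/', 'user_name42');

// A predefined class inside brackets: hexadecimal digits only.
$isHex = preg_match('/^[[:xdigit:]]+$/', 'DEADbeef');

var_dump($numbers, $digitsOnly, $collapsed, $isWord, $isHex);
```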
Positioning characters:
Positioning characters are another very important kind of character in regular expressions. They describe where in the matched text a pattern must occur.
^: the pattern must occur at the beginning of the text (this differs from its meaning inside "[ ]").
$: the pattern must occur at the end of the text.
"/^he/": matches strings that start with "he", such as "hello" and "height";
"/he$/": matches strings that end with "he", such as "she";
"/ he/": a literal space in the pattern simply matches a space, so this matches "he" preceded by a space (a word starting with "he" inside a sentence); unlike "^", it does not match at the very start of the string;
"/he /": likewise matches "he" followed by a space, which is not the same as "$";
"/^he$/": matches only the string "he".
Parentheses:
Besides grouping, parentheses ( ) can record the text they match and store it so that it can be read back later. For example:
/^([a-zA-Z0-9_-]+)@([a-zA-Z0-9_-]+)(\.[a-zA-Z0-9_-]+)$/
This records the user name of a mail address and its server address (in the form username@server.com). To read a recorded string later, use "escape character + record number". For example, "\1" refers to the text matched by the first "([a-zA-Z0-9_-]+)", "\2" to the second "([a-zA-Z0-9_-]+)", and "\3" to the third group "(\.[a-zA-Z0-9_-]+)". However, "\" is itself a special character in PHP strings and needs to be escaped, so in PHP code "\1" should be written as "\\1".
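The recording behaviour can be sketched as follows. The third group here carries a "+" so that a suffix such as ".com" matches in full, which is an assumption about what the pattern is meant to do; the duplicated-word example at the end is made up to show a reference used inside the pattern itself.

```php
<?php
$pattern = '/^([a-zA-Z0-9_-]+)@([a-zA-Z0-9_-]+)(\.[a-zA-Z0-9_-]+)$/';

// preg_match() stores each recorded group in $m[1], $m[2], $m[3].
preg_match($pattern, 'username@server.com', $m);

// In the replacement string, "\\2" (written in PHP) reaches the engine as \2.
$swapped = preg_replace($pattern, '\\2\\3 hosts \\1', 'username@server.com');

// A reference inside the pattern: \1 must repeat what group 1 matched.
preg_match('/\b(\w+) \1\b/', 'it was was fine', $dup);

var_dump($m[1], $m[2], $m[3], $swapped, $dup[1]);
```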
Other special symbols:
"|": the or symbol "|" works like "or" in PHP, but it is a single "|", not the double "||" used in PHP! Either side can be a single character or a longer string; for example, "/abcd|dcba/" matches "abcd" or "dcba".
5. Greedy mode:
As mentioned under metacharacters, "?" has another important role, "greedy mode". What is "greedy mode"?
Suppose we want to match a string that starts with the letter "a" and ends with the letter "b", but the text contains many "b" after the "a", for example "a bbbbbbbbbbbbbbbbb". Will the regular expression stop at the first "b" or at the last "b"? In greedy mode it matches up to the last "b"; otherwise it matches only up to the first "b".
An expression that uses greedy mode (the default) looks like this:
/a.+b/
An expression that does not use greedy mode looks like this:
/a.+?b/ or /a.+b/U
The second form uses the modifier U; for details, see the following section.
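The difference is easy to see by printing what each form matches. The sketch below uses a shortened version of the sample string and three patterns chosen for illustration: a plain quantifier, the same quantifier followed by "?", and the plain quantifier with the U modifier.

```php
<?php
$subject = 'a bbbb';

// Greedy (the default): ".+" takes as much as it can, up to the last "b".
preg_match('/a.+b/', $subject, $greedy);

// Non-greedy via "?": ".+?" takes as little as it can, up to the first "b".
preg_match('/a.+?b/', $subject, $lazy);

// Non-greedy via the U modifier: same effect without touching the expression.
preg_match('/a.+b/U', $subject, $ungreedy);

var_dump($greedy[0], $lazy[0], $ungreedy[0]);
```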
6. Modifiers:
Modifiers can change many features of a regular expression so that it better fits your needs (note: modifiers are case sensitive, so "e" is not the same as "E"). The modifiers are as follows:
i: the regular expression becomes case insensitive, that is, "a" and "A" are treated the same.
m: by default "^" and "$" refer to the start and end of the whole string; with "m" they refer to every line of the string, so each line starts at "^" and ends at "$".
s: by default "." means any character except a line break; with "s" it means any character, line breaks included!
x: whitespace in the expression is ignored unless it is escaped.
e: only meaningful for replacement; the replacement string is evaluated as PHP code. This modifier was removed in PHP 7, where preg_replace_callback takes its place.
A: the expression must match at the start of the string. For example, "/a/A" matches "abcd".
D: the opposite of "m"; with this modifier "$" matches only at the absolute end of the string, not before a final line break. Without it, "$" also matches just before a trailing line break.
U: similar to the question mark; it switches "greedy mode".
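Each modifier's effect can be observed with a one-line test. This is a sketch with made-up subjects; the "e" modifier is left out because it no longer exists in current PHP versions.

```php
<?php
// i: case-insensitive matching.
$caseless = preg_match('/hello/i', 'HeLLo');

// m: "^" and "$" match at the start and end of every line.
$lineCount = preg_match_all('/^\w+$/m', "one\ntwo\nthree", $lines);

// s: "." also matches a line break.
$dotPlain = preg_match('/a.b/', "a\nb");
$dotAll   = preg_match('/a.b/s', "a\nb");

// x: unescaped whitespace in the pattern is ignored.
$extended = preg_match('/ \d{3} - \d{4} /x', '555-1234');

// A: the match must begin at the start of the subject.
$anchoredHit  = preg_match('/a/A', 'abcd');
$anchoredMiss = preg_match('/b/A', 'abcd');

// D: "$" matches only at the very end, not before a final line break.
$dollarDefault = preg_match('/^abc$/', "abc\n");
$dollarEndOnly = preg_match('/^abc$/D', "abc\n");

var_dump($caseless, $lineCount, $dotPlain, $dotAll, $extended,
         $anchoredHit, $anchoredMiss, $dollarDefault, $dollarEndOnly);
```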
7. PCRE-related regular expression functions:
PHP provides several Perl-compatible regular expression functions, covering pattern matching, replacement, and counting matches:
1. preg_match:
Function format: int preg_match(string pattern, string subject, array [matches]);
This function searches the string subject for a match to the expression pattern. If [matches] is given, the whole matched text is stored in [matches][0], [matches][1] holds the first string recorded with parentheses ( ), [matches][2] the second, and so on. The function returns 1 if a match for pattern is found in subject, and 0 otherwise.
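A minimal sketch of how the matches array is filled, using a made-up date format as the pattern. The function name parseIsoDate() is hypothetical.

```php
<?php
// Hypothetical helper: picks the first YYYY-MM-DD date out of $text.
function parseIsoDate(string $text): ?array
{
    // preg_match() returns 1 on a match and 0 otherwise.
    if (preg_match('/(\d{4})-(\d{2})-(\d{2})/', $text, $matches) !== 1) {
        return null;
    }
    return [
        'full'  => $matches[0], // the whole matched text
        'year'  => $matches[1], // first parenthesised group
        'month' => $matches[2], // second group
        'day'   => $matches[3], // third group
    ];
}

var_dump(parseIsoDate('Released on 2004-07-13.'));
var_dump(parseIsoDate('no date here'));
```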
2. preg_replace:
Function format: mixed preg_replace(mixed pattern, mixed replacement, mixed subject);
This function replaces every string that matches the expression pattern with replacement. If replacement needs to contain part of what pattern matched, record that part with "( )" in pattern and read it in replacement with "\1" (written "\\1" in a PHP string).
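As a sketch, the following reorders every date in a made-up sentence by recording three groups and reading them back in a different order in the replacement.

```php
<?php
$text = 'Due 2004-07-13 and 2005-01-02';

// Groups 1..3 hold year, month, day; the replacement reads them as \3, \2, \1.
$european = preg_replace('/(\d{4})-(\d{2})-(\d{2})/', '\\3/\\2/\\1', $text);

// "$3/$2/$1" is an equivalent way to write the same references.
$sameResult = preg_replace('/(\d{4})-(\d{2})-(\d{2})/', '$3/$2/$1', $text);

var_dump($european, $sameResult);
```

Both replacement strings are single-quoted so that PHP does not try to expand "$3" as a variable.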
3. preg_split:
Function format: array preg_split(string pattern, string subject, int [limit]);
This function works like the function split. The difference is that split accepts only simple regular expressions for splitting a string, while preg_split uses fully Perl-compatible regular expressions. The third parameter "limit" is the maximum number of pieces to return.
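A brief sketch of preg_split() on a made-up list, with and without the limit parameter. The PREG_SPLIT_NO_EMPTY flag in the last call is a standard option that drops empty pieces.

```php
<?php
$list = 'apple, banana  cherry,date';

// Split on any run of commas and whitespace.
$all = preg_split('/[\s,]+/', $list);

// limit = 2: at most two pieces; the rest of the string stays in the last one.
$firstTwo = preg_split('/[\s,]+/', $list, 2);

// -1 means "no limit"; the flag removes empty pieces at the edges.
$clean = preg_split('/[\s,]+/', ' a, b ', -1, PREG_SPLIT_NO_EMPTY);

var_dump($all, $firstTwo, $clean);
```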
4. preg_grep:
Function format: array preg_grep(string pattern, array input);
This function works much like preg_match, but it tests every element of the array input and returns a new array containing the elements that match.
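A short sketch of preg_grep() on a made-up array. Note that the returned array keeps the original keys; the PREG_GREP_INVERT flag, a standard option, returns the elements that do not match instead.

```php
<?php
$input = ['12', 'abc', '7', '4x'];

// Keep only the elements that consist entirely of digits.
$numeric = preg_grep('/^\d+$/', $input);

// Invert the test: keep the elements that do NOT match.
$others = preg_grep('/^\d+$/', $input, PREG_GREP_INVERT);

var_dump($numeric, array_values($numeric), $others);
```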
The following is an example that checks whether the format of an email address is correct:
if (!emailIsRight('y10k@fffff')) echo 'incorrect';
The above program will output "correct" followed by "incorrect".
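Since only part of the example is shown, here is a minimal sketch of what such a check could look like. The body of emailIsRight(), its pattern, and the address someone@example.com are assumptions made for illustration; only the function name and the address y10k@fffff come from the text.

```php
<?php
// Assumed implementation: the pattern below is illustrative, not the original.
function emailIsRight(string $email): bool
{
    // user name, "@", one or more "label." parts, then a final label of letters
    $pattern = '/^[_a-z0-9.-]+@([a-z0-9][a-z0-9-]*\.)+[a-z]{2,}$/iD';
    return preg_match($pattern, $email) === 1;
}

if (emailIsRight('someone@example.com')) {
    echo "correct\n";
}
if (!emailIsRight('y10k@fffff')) {
    echo "incorrect\n";   // no dot in the server name
}
```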
8. Differences between Perl-compatible regular expressions in PHP and Perl/ereg regular expressions:
Although they are called "Perl Compatible Regular Expressions", PHP's regular expressions differ somewhat from Perl's. For example, the modifier "g" means "match all" in Perl, but PHP does not support this modifier.
There are also differences from the ereg family of functions. ereg is another set of regular expression functions in PHP, but it is much weaker than preg:
1. ereg neither needs nor accepts delimiters and modifiers, so it is far less flexible than preg.
2. About ".": in regular expressions the dot generally means any character except a line break, but in ereg "." means any character, line breaks included! If you want "." to include line breaks in preg, add "s" to the modifiers.
3. ereg is greedy by default and this cannot be changed, which causes a lot of trouble for replacement and matching.
4. Speed: this may worry many people. Does preg pay for its power with speed? Don't worry, preg is much faster than ereg. I ran a test program:
Time test result:
preg_replace used time: 5
ereg_replace used time: 15
str_replace used time: 2
str_replace is faster than ereg_replace because it does not need to do any pattern matching.
9. PHP 3.0 support for preg:
preg support is included by default in PHP 4.0 but not in 3.0. To use the preg functions in 3.0, you must load the php3_pcre.dll file: add "extension = php3_pcre.dll" to the extension section of php.ini and restart PHP.
Regular expressions are also often used to implement UBB code. Many PHP forums do it this way (such as zForum zphp.com or vB vbullent.com), but the actual code is rather long.