A Quick Guide to PHP Regular Expressions

Last Update: 2018-12-05 Source: Internet

Author: User

Tags ereg


1. Getting Started

In short, regular expressions are a powerful tool for pattern matching and replacement. They appear in almost all UNIX-based tools, such as the vi editor, the Perl and PHP scripting languages, and the awk and sed shell programs. Client-side scripting languages such as JavaScript also support them. Regular expressions have clearly gone beyond any single language or system and become a widely accepted concept.
Regular expressions let you build a pattern out of a series of special characters and then compare that pattern with a target object, such as a data file, program input, or form input on a web page. Depending on whether the target contains the pattern, the program can then take the appropriate action.
For example, one of the most common uses of regular expressions is checking whether an email address entered online has a valid format. If the address matches the regular expression, the form data is processed normally; otherwise, the user is prompted to re-enter a valid email address. Regular expressions therefore play an important role in the validation logic of web applications.

2. Basic syntax

Now that we have a basic idea of what regular expressions are for, let's look at their syntax.
The regular expression format is generally as follows:
/Love/
The part between the "/" delimiters is the pattern to search for in the target object; simply place the pattern between the two slashes. To make patterns more flexible, regular expressions provide special "metacharacters": characters that have a special meaning inside a pattern. Some of them specify how many times the preceding character (the character immediately before the metacharacter) may appear in the target object.
The most frequently used metacharacters of this kind are "+", "*", and "?". The "+" metacharacter requires the preceding character to appear one or more times in a row, "*" allows it to appear zero or more times, and "?" allows it to appear zero times or once.
Next, let's look at how these metacharacters are applied.
/fo+/
Because this regular expression contains the "+" metacharacter, it matches strings such as "fool", "fo", or "football", in which the letter "f" is followed by one or more consecutive "o" characters.
/eg*/
Because this regular expression contains the "*" metacharacter, it matches strings such as "easy", "ego", or "egg", in which the letter "e" is followed by zero or more consecutive "g" characters.
/Wil?/
Because this regular expression contains the "?" metacharacter, it matches "Win" or "Wilson" in the target object: "Wi" followed by zero or one "l".
In addition to these metacharacters, you can specify exactly how often a character may occur. For example,
/Jim{2,6}/
This regular expression requires the character "m" to appear 2 to 6 times in a row after "Ji". Therefore, it matches strings such as "Jimmy" or "Jimmmm".
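To experiment with these quantifiers, here is a small sketch in JavaScript (the language of the browser example later in this article) rather than PHP. It assumes these simple patterns behave the same way in both; check is a hypothetical helper name.

```javascript
// check() is a hypothetical helper: for each sample, does it contain the pattern?
function check(pattern, samples) {
  return samples.map((s) => pattern.test(s));
}

// "+": "o" must appear one or more times after "f".
const plus = check(/fo+/, ["fool", "fo", "football", "f"]);
// "*": "g" may appear zero or more times after "e".
const star = check(/eg*/, ["easy", "ego", "egg"]);
// "?": "l" may appear zero times or once after "Wi".
const optional = check(/Wil?/, ["Win", "Wilson"]);
// {2,6}: "m" must appear 2 to 6 times in a row after "Ji".
const range = check(/Jim{2,6}/, ["Jimmy", "Jimmmm", "Jim"]);

console.log(plus, star, optional, range);
```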
Now that you know the basics, let's look at several other important metacharacters.
\s: matches a single whitespace character, including tab and line break;
\S: matches any single character that is not whitespace;
\d: matches a digit from 0 to 9;
\w: matches a letter, a digit, or an underscore;
\W: matches any character not matched by \w;
.: matches any character except a line break.
(Note: \s and \S are inverses of each other, as are \w and \W.)
Next, let's look at some examples of these metacharacters in use.
/\s+/
This regular expression matches one or more consecutive whitespace characters in the target object.
/\d000/
This regular expression matches any digit followed by three zeros. Given a complex financial statement, we could use it to find amounts such as 1000 or 9000 yuan (note that it also matches the "2000" inside "12000").
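The shorthand classes can be tried out the same way. In this JavaScript sketch the sample text is invented, and the g flag makes match() return every occurrence instead of only the first.

```javascript
const text = "Rent: 2000 yuan\tFees: 350 yuan\nBonus: 12000 yuan";

// \s+ finds each run of whitespace (spaces, tabs, line breaks).
const gaps = text.match(/\s+/g);

// \d000 finds a digit followed by three zeros.
const thousands = text.match(/\d000/g);

// \w matches letters, digits and underscore; \W matches everything else.
const wordChars = "a_1-b".match(/\w/g).join("");
const nonWord = "a_1-b".match(/\W/g).join("");

console.log(gaps.length, thousands, wordChars, nonWord);
```

The second "2000" comes from inside "12000", a reminder that \d000 means "a digit followed by 000" rather than "an amount in the thousands".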

In addition to the metacharacters described above, regular expressions have another kind of special character, the anchor, which specifies where in the target object the pattern must appear.
Commonly used anchors include "^", "$", "\b", and "\B". The "^" anchor requires the pattern to appear at the start of the target string, and "$" requires it to appear at the end. The "\b" anchor requires the pattern to begin or end at a word boundary, that is, at the start or end of a word, while "\B" requires a position that is not a word boundary, so that the match continues inside a word. Thus "^" and "$" form one complementary pair (start versus end), and "\b" and "\B" are exact opposites. For example:
/^hell/
Because this regular expression contains the "^" anchor, it matches strings that start with "hell", such as "hell", "hello", or "hellhound".
/ar$/
Because this regular expression contains the "$" anchor, it matches strings that end with "ar", such as "car", "bar", or "ar".

/\bbom/
Because this pattern starts with "\b", it matches "bom" at the start of a word, as in "bomb" or "bom".
/man\b/
Because this pattern ends with "\b", it matches "man" at the end of a word, as in "human", "woman", or "man".
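A quick JavaScript check of the four anchor examples follows; the word list is made up, and filter() keeps the words each pattern accepts.

```javascript
const samples = ["hello", "hellhound", "shell", "car", "bar", "cart", "bomb", "abomb", "woman", "manly"];

// ^: "hell" must be at the start of the string.
const startsWithHell = samples.filter((s) => /^hell/.test(s));
// $: "ar" must be at the end of the string.
const endsWithAr = samples.filter((s) => /ar$/.test(s));
// \b before "bom": "bom" must start a word ("abomb" fails).
const wordStartsBom = samples.filter((s) => /\bbom/.test(s));
// \b after "man": "man" must end a word ("manly" fails).
const wordEndsMan = samples.filter((s) => /man\b/.test(s));

console.log(startsWithHell, endsWithAr, wordStartsBom, wordEndsMan);
```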
To make patterns more flexible, regular expressions also let you specify a range of characters instead of one specific character. For example:
/[A-Z]/
This regular expression matches any uppercase letter from A to Z.
/[a-z]/
This regular expression matches any lowercase letter from a to z.
/[0-9]/
This regular expression matches any digit from 0 to 9.
/([a-z][a-z][0-9])+/
This regular expression matches strings made of letters and digits in the form letter, letter, digit, such as "ab0". Note that "()" groups characters into a unit, and everything in the group must appear together, in order, in the target object. Therefore this regular expression cannot match a string such as "abc", because its last character is a letter rather than a digit.
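A JavaScript sketch of the range and group patterns follows. Anchoring the group pattern with ^ and $ is our addition, so that the whole string, not just part of it, must consist of letter-letter-digit groups.

```javascript
// Single-character ranges: does the string contain at least one such character?
const hasUpper = /[A-Z]/.test("abc");
const hasLower = /[a-z]/.test("abc");
const hasDigit = /[0-9]/.test("abc");

// The group ([a-z][a-z][0-9]) must match as a unit; + repeats the whole unit.
// ^ and $ (added here) force the entire string to be made of such units.
const groups = /^([a-z][a-z][0-9])+$/;
const groupResults = ["ab0", "ab0cd9", "abc"].map((s) => groups.test(s));

console.log(hasUpper, hasLower, hasDigit, groupResults);
```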

To implement an "or" operation like the one in programming logic, choosing among several alternative patterns, use the pipe character "|". For example:
/to|too|2/
This regular expression matches "to", "too", or "2" in the target object.
Another common operator is the negated character class "[^...]". Unlike the "^" anchor described above, "[^...]" lists characters that must not appear at that position. For example:
/[^A-C]/
This regular expression matches any character other than A, B, and C. In general, "^" is a negation operator when it is the first character inside "[]"; outside "[]", it is an anchor.
Finally, use the escape character "\" to include a metacharacter in a pattern as an ordinary character. For example:
/th\*/
This regular expression matches the literal string "th*" in the target object, rather than "th", "thh", and so on.
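The alternation, negation, and escaping rules can be checked with a short JavaScript sketch; the sample strings are illustrative.

```javascript
// Alternation: any one of the listed alternatives may match ("two" contains none of them).
const alternation = ["to", "too", "2", "two"].map((s) => /to|too|2/.test(s));

// Negated class: succeeds if the string has at least one character other than A, B or C.
const negated = ["ABCA", "ABD"].map((s) => /[^A-C]/.test(s));

// Escaped "*" is a literal asterisk; unescaped, "th*" means "t" plus zero or more "h".
const escaped = [/th\*/.test("th*"), /th\*/.test("thhh"), /th*/.test("t")];

console.log(alternation, negated, escaped);
```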

3. Usage examples

① In PHP, the ereg() function can be used for pattern matching. (ereg() and the related eregi() and ereg_replace() functions were deprecated in PHP 5.3 and removed in PHP 7.0; on current PHP versions, use preg_match() and preg_replace(), whose patterns need delimiters such as "/".) Its basic form is:

ereg(pattern, string)
Here, pattern is the regular expression and string is the target string to search. For example, the following PHP code verifies an email address:

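<?php
// Assumes $email holds the address entered by the user.
// ereg() exists only up to PHP 5.x; on PHP 7+ use preg_match('/.../', $email).
if (ereg('^([a-zA-Z0-9_-])+@([a-zA-Z0-9_-])+(\.[a-zA-Z0-9_-])+$', $email)) {
    echo "Your email address is correct!";
} else {
    echo "Please try again!";
}
?>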

② JavaScript 1.2 introduced a powerful RegExp object for regular expression matching. Its test() method checks whether the target string contains the pattern and returns true or false accordingly.

We can use JavaScript to write the following script to verify the validity of the email address entered by the user.

<html>
<head>
<script type="text/javascript">
<!-- Start hiding
function verifyAddress(obj)
{
  var email = obj.email.value;
  var pattern = /^([a-zA-Z0-9_-])+@([a-zA-Z0-9_-])+(\.[a-zA-Z0-9_-])+$/;
  var flag = pattern.test(email);
  if (flag)
  {
    alert("Your email address is correct!");
    return true;
  }
  else
  {
    alert("Please try again!");
    return false;
  }
}
// Stop hiding -->
</script>
</head>
<body>
<form onsubmit="return verifyAddress(this);">
<input name="email" type="text" />
<input type="submit" />
</form>
</body>
</html>

Many people find regular expressions a headache. Here I explain them in plain language, drawing on my own understanding and on several online articles, and share what I have learned.

To begin, ^ and $ are worth mentioning: they match the start and the end of the string, respectively. For example:

"^The": the string must start with "The";
"of despair$": the string must end with "of despair";

So,
"^abc$": the string must both start and end with "abc"; in fact, only "abc" itself matches;
"notice": matches any string that contains "notice".

As the last example shows, if you use neither of these two characters, the pattern can appear anywhere in the string being tested; it is not anchored to either end.
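These anchoring rules are easy to confirm in JavaScript, where (without the m flag) ^ and $ refer to the start and end of the whole string; the sample sentences are invented.

```javascript
const anchored = {
  startsWithThe: /^The/.test("The end of the road"),
  endsWithDespair: /of despair$/.test("a story of despair"),
  exactlyAbc: /^abc$/.test("abc"),
  abcTwice: /^abc$/.test("abcabc"),      // starts and ends with abc, but is not just "abc"
  theInMiddle: /^The/.test("Not The start"),
  // Without anchors, the pattern may appear anywhere.
  containsNotice: /notice/.test("please take notice of this"),
};
console.log(anchored);
```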

Next, let's talk about '*', '+', and '?'.
They indicate how many times a character may occur:
"zero or more" is equivalent to {0,}
"one or more" is equivalent to {1,}
"zero or one" is equivalent to {0,1}

Here are some examples:

"ab*": synonymous with ab{0,}; matches "a" followed by zero or more "b" characters ("a", "ab", "abbb", etc.);
"ab+": synonymous with ab{1,}; the same as above, but at least one "b" must be present ("ab", "abbb", etc.);
"ab?": synonymous with ab{0,1}; "a" followed by no "b" or exactly one "b";
"a?b+$": matches a string that ends with an optional "a" followed by one or more "b" characters.

Key point: '*', '+', and '?' apply only to the single character immediately before them.

You can also specify the number of repetitions in braces, for example:

"ab{2}": "a" must be followed by exactly two "b" characters ("abb");
"ab{2,}": "a" must be followed by two or more "b" characters ("abb", "abbbb", etc.);
"ab{3,5}": "a" must be followed by three to five "b" characters ("abbb", "abbbb", or "abbbbb").

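The equivalences and counts above can be verified with a JavaScript sketch. Anchoring the patterns with ^ and $ is our addition so that whole strings are compared; without anchors, "ab*" would accept any string containing an "a".

```javascript
// Each shorthand quantifier next to its brace equivalent (anchors added).
const pairs = [
  [/^ab*$/, /^ab{0,}$/],
  [/^ab+$/, /^ab{1,}$/],
  [/^ab?$/, /^ab{0,1}$/],
];
const inputs = ["a", "ab", "abb", "abbb"];

// true if every pair gives the same answer for every input.
const equivalent = pairs.every(([short, braces]) =>
  inputs.every((s) => short.test(s) === braces.test(s))
);

// "a?b+$": optional "a", then one or more "b", at the end of the string.
const optionalA = ["b", "ab", "xabbb", "aa"].filter((s) => /a?b+$/.test(s));

// Exact and bounded counts; candidates hold 2, 3, 5 and 6 "b" characters.
const exactlyTwo = inputs.filter((s) => /^ab{2}$/.test(s));
const twoOrMore = inputs.filter((s) => /^ab{2,}$/.test(s));
const candidates = [2, 3, 5, 6].map((n) => "a" + "b".repeat(n));
const threeToFive = candidates.filter((s) => /^ab{3,5}$/.test(s));

console.log(equivalent, optionalA, exactlyTwo, twoOrMore, threeToFive);
```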
Quantifiers can also apply to several characters grouped in parentheses, for example:

"a(bc)*": matches "a" followed by zero or more copies of "bc";
"a(bc){1,5}": "a" followed by one to five copies of "bc";

There is also the character '|', which works like an OR operation:

"hi|hello": matches strings containing "hi" or "hello";
"(b|cd)ef": matches strings containing "bef" or "cdef";
"(a|b)*c": matches any number (including zero) of "a" or "b" characters, followed by "c".

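A JavaScript sketch of the grouping and alternation examples follows. Adding ^ and $ to three of the patterns is our choice, so that the whole string must fit; "a" + "bc".repeat(n) builds a string with n copies of "bc".

```javascript
const groupResults = {
  // Anchored: the whole string must be "a" plus copies of "bc".
  zeroOrMoreBc: ["a", "abc", "abcbc"].map((s) => /^a(bc)*$/.test(s)),
  oneToFiveBc: [0, 1, 5, 6].map((n) => /^a(bc){1,5}$/.test("a" + "bc".repeat(n))),
  // Unanchored: the alternative may appear anywhere.
  hiOrHello: ["hi there", "hello", "hey"].map((s) => /hi|hello/.test(s)),
  befOrCdef: ["bef", "cdef", "bdef"].map((s) => /(b|cd)ef/.test(s)),
  // Anchored: any mix of "a" and "b", then a final "c".
  aOrBThenC: ["c", "ababc", "abd"].map((s) => /^(a|b)*c$/.test(s)),
};
console.log(groupResults);
```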
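A period ('.') matches any single character except "\n".

What if we want to match any single character, including "\n"?

Note that '[\n.]' does not do this: inside brackets the period is an ordinary character, so '[\n.]' matches only a newline or a literal period. In PHP's preg_* functions and in JavaScript, a class such as '[\s\S]' (any whitespace or non-whitespace character) matches any character, including "\n".

"a.[0-9]": matches "a" followed by any single character and then a digit from 0 to 9;
"^.{3}$": matches a string of exactly three characters.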

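The following JavaScript sketch checks these claims about the period, including what '[\n.]' really matches. The s (dotAll) flag it uses is JavaScript's counterpart of the s modifier in PHP's preg_* functions; the sample strings are invented.

```javascript
const text = "line1\nline2";

// "." does not match "\n", so "1.l" cannot span the line break.
const dotCrossesNewline = /1.l/.test(text);
// Inside brackets "." is literal: [\n.] matches only a newline or a period.
const bracketDot = [/^[\n.]$/.test("."), /^[\n.]$/.test("x")];
// [\s\S] means "whitespace or non-whitespace", i.e. any character at all.
const anyCharCrossesNewline = /1[\s\S]l/.test(text);
// The s (dotAll) flag, available since ES2018, also lets "." match "\n".
const dotAllCrossesNewline = /1.l/s.test(text);

// "a.[0-9]" and "^.{3}$" from the examples above.
const aDotDigit = /a.[0-9]/.test("ax7");
const threeChars = ["abc", "ab", "abcd"].map((s) => /^.{3}$/.test(s));

console.log(dotCrossesNewline, bracketDot, anyCharCrossesNewline, dotAllCrossesNewline, aDotDigit, threeChars);
```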
Brackets match exactly one character from the set they enclose:

"[ab]": matches a or b (the same as "a|b");
"[a-d]": matches a single character from 'a' to 'd' (the same as "a|b|c|d" and "[abcd]");

Generally, we use [a-zA-Z] to match a single letter of either case:

"^[a-zA-Z]": matches a string that starts with an uppercase or lowercase letter;
"[0-9]%": matches a string containing a digit followed by a percent sign, such as "5%";
",[a-zA-Z0-9]$": matches a string that ends with a comma followed by a letter or a digit;

You can also list characters you do not want: just put '^' as the first character inside the brackets. For example, "%[^a-zA-Z]%" matches a string containing two percent signs with a single non-letter character between them.

Key point: when ^ is the first character inside brackets, it excludes the characters listed in the brackets.

To match special characters such as '.', '*', '+', '?', or '(' literally, you must escape them by putting a backslash ('\') in front of them.

Do not forget that bracket expressions are an exception to this rule: inside brackets, most special characters lose their special meaning, so "[*\+?{}.]" matches strings containing any of these characters.

Also, as the regex manual tells us: "To include a literal ']' in the list, make it the first character (following an optional '^'). To include a literal '-', make it the first or last character, or the second endpoint of a range." For example, the '-' between the two ranges in [a-d-0-9] is treated as a literal hyphen.
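The bracket rules can be tested with the JavaScript sketch below; the sample strings are invented. One difference from the POSIX rules quoted above: in JavaScript, "[]" is an empty class, so a literal "]" is escaped as "\]" rather than placed first.

```javascript
// Each entry maps sample strings to whether the pattern accepts them.
const bracketResults = {
  aOrB: ["a", "b", "c"].map((s) => /^[ab]$/.test(s)),
  aToD: ["a", "d", "e"].map((s) => /^[a-d]$/.test(s)),
  startsWithLetter: ["Zed", "9lives"].map((s) => /^[a-zA-Z]/.test(s)),
  digitThenPercent: ["5%", "x%"].map((s) => /[0-9]%/.test(s)),
  endsCommaAlnum: ["a,b", "a,b!"].map((s) => /,[a-zA-Z0-9]$/.test(s)),
  // [^a-zA-Z]: exactly one non-letter between the two percent signs.
  nonLetterInPercents: ["%1%", "%a%"].map((s) => /%[^a-zA-Z]%/.test(s)),
  // Outside brackets, "." must be escaped to mean a literal period.
  escapedDot: ["a.b", "axb"].map((s) => /^a\.b$/.test(s)),
  // Inside brackets, * + ? { } and . are ordinary characters.
  specials: ["2*3", "abc"].map((s) => /[*\+?{}.]/.test(s)),
  // The "-" between two ranges is a literal hyphen.
  dashBetweenRanges: ["-", "c", "7", "x"].map((s) => /^[a-d-0-9]$/.test(s)),
  // JavaScript-specific: "]" is escaped, because "[]" would be an empty class.
  closingBracket: /^[\]]$/.test("]"),
};
console.log(bracketResults);
```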

After reading the examples above, you should understand {n,m}. Note that n and m must be non-negative integers, and n must not be greater than m. The pattern then matches at least n and at most m times. For example, "p{1,5}" matches at most five consecutive p characters, so in "pppppppp" it matches only the first five.

What about the sequences that start with \?

\b matches a word boundary, that is, the position between a word character and a non-word character (or the start or end of the string). For example, 've\b' matches the "ve" in "love" but not the "ve" in "very".

\B is the opposite of \b above, so I will not give a separate example.
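A short JavaScript check of \b and \B, plus the {n,m} example using a string of eight p characters (our sample):

```javascript
// \b after "ve": the word must end right there.
const boundary = ["love", "very"].map((s) => /ve\b/.test(s));
// \B after "ve": another word character must follow.
const nonBoundary = ["love", "very"].map((s) => /ve\B/.test(s));
// {1,5} takes at most five characters, so only the first five p's are matched.
const firstFive = "p".repeat(8).match(/p{1,5}/)[0];

console.log(boundary, nonBoundary, firstFive.length);
```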

For the other escape sequences that start with \, see http://www.phpv.net.sixxs.org/article.php/251.

Now let's build an application: a pattern that matches currency amounts.

We want to check whether the input is a number that represents an amount of money. We accept four formats: "10000.00" and "10,000.00", or, without a fractional part, "10000" and "10,000". Let's build the matching pattern step by step:

^[1-9][0-9]*$

This requires the value to start with a digit other than 0. But it also means that a single "0" fails the test. The solution is:

^(0|[1-9][0-9]*)$

This matches either a lone "0" or a number that does not start with 0. We can also allow a minus sign before the number:

^(0|-?[1-9][0-9]*)$

This means: 0, or a number that does not start with 0 and may be preceded by a minus sign. Now let's be less strict and allow numbers to start with 0. Let's also drop the minus sign, since we don't need negative numbers to represent money. Next, we add a pattern to match the fractional part:

^[0-9]+(\.[0-9]+)?$

This requires the string to start with at least one digit. Note, however, that "10." does not match this pattern; only "10" and "10.2" do. Do you see why?

^[0-9]+(\.[0-9]{2})?$

Here we require exactly two digits after the decimal point. If you think this is too strict, change it to:

^[0-9]+(\.[0-9]{1,2})?$

This allows one or two decimal places. Now we add commas (between every three digits) to improve readability:

^[0-9]{1,3}(,[0-9]{3})*(\.[0-9]{1,2})?$

Note that this version requires the commas, so "10000.00" no longer matches.

Don't forget that '+' can be replaced with '*' if you want to allow an empty string as input. Also, don't forget that backslashes ('\') can cause errors in PHP strings (a common mistake).

Once the string has been validated, we can remove all the commas with str_replace(",", "", $money), treat the result as a double, and use it in calculations.
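To see how each step of the currency pattern behaves, the JavaScript sketch below runs them all on the same inputs and then converts valid amounts to numbers, the way the str_replace() step does in PHP. The sample inputs and moneyPattern are our assumptions, not the article's: moneyPattern adds a plain [0-9]+ alternative so that both "10000.00" and "10,000.00" pass. parseMoney is a hypothetical name.

```javascript
// The article's currency patterns, step by step.
const steps = {
  noLeadingZero: /^[1-9][0-9]*$/,
  allowZero: /^(0|[1-9][0-9]*)$/,
  allowMinus: /^(0|-?[1-9][0-9]*)$/,
  anyFraction: /^[0-9]+(\.[0-9]+)?$/,
  oneOrTwoPlaces: /^[0-9]+(\.[0-9]{1,2})?$/,
  withCommas: /^[0-9]{1,3}(,[0-9]{3})*(\.[0-9]{1,2})?$/,
};
const inputs = ["0", "-25", "10000", "10.", "10.2", "10.255", "10,000.00"];

// For each step, record which inputs it accepts.
const table = {};
for (const [name, re] of Object.entries(steps)) {
  table[name] = inputs.map((s) => re.test(s));
}

// Assumption, not in the article: accept plain or comma-grouped integer parts.
const moneyPattern = /^([0-9]+|[0-9]{1,3}(,[0-9]{3})*)(\.[0-9]{1,2})?$/;

// parseMoney (hypothetical): validate, remove commas like str_replace(",", "", $money), convert.
function parseMoney(input) {
  if (!moneyPattern.test(input)) return null;
  return Number(input.replace(/,/g, ""));
}

const parsed = ["10000.00", "10,000.00", "10000", "10,000", "10.", "1,00"].map((s) => parseMoney(s));
console.log(table, parsed);
```

The withCommas row shows why the extra alternative helps: the comma-grouped pattern rejects "10000", because after at most three digits it expects a comma or a decimal point.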

Constructing a regular expression to check email addresses

A complete email address has three parts:

1. the username (everything to the left of '@')
2. '@'
3. the server name (the rest)

The username can contain uppercase and lowercase letters, digits, periods ('.'), hyphens ('-'), and underscores ('_'). The server name follows the same rule, except that underscores are not allowed.

Neither the username nor the server name may start or end with a period, and two periods may not be adjacent: there must be at least one character between them. Now let's write a matching pattern for the username:

^[_a-zA-Z0-9-]+$

This does not allow periods yet. We add them:

^[_a-zA-Z0-9-]+(\.[_a-zA-Z0-9-]+)*$

This means: at least one allowed character (other than '.'), followed by zero or more groups, each consisting of a period and at least one allowed character.

To simplify, we can use eregi() instead of ereg(). Because eregi() is case-insensitive, we do not need to specify both ranges "a-z" and "A-Z"; one is enough:

^[_a-z0-9-]+(\.[_a-z0-9-]+)*$

The server name is similar, but without the underscore:

^[a-z0-9-]+(\.[a-z0-9-]+)*$

Now just join the two parts:

^[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*$

This is the complete email validation pattern. You only need to call:

eregi('^[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*$', $email)

to check whether the email address is valid.
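The final pattern can be tried in JavaScript, where the i flag plays the role of eregi()'s case-insensitivity; the sample addresses are invented.

```javascript
// The article's final pattern; the i flag makes a-z also cover A-Z.
const emailRe = /^[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*$/i;

const addresses = [
  "John.Smith@Example.COM",  // mixed case, period in the username
  "j_s@mail.example.org",    // underscore in the username
  ".john@example.com",       // leading period
  "john..smith@example.com", // two periods in a row
  "john@exa_mple.com",       // underscore in the server name
  "john@localhost",          // no period in the server name
];
const verdicts = addresses.map((s) => emailRe.test(s));
console.log(verdicts);
```

The last result shows that the pattern accepts a server name with no period at all, because the (\.[a-z0-9-]+)* group may repeat zero times; using + instead of * for that group would require at least one period.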

Other regular expression functions

Extracting substrings

ereg() and eregi() can also extract the parts of a string that match a regular expression (see the manual for details). For example, to extract the file name from a path or URL, use:

ereg("([^\/]*)$", $pathorurl, $regs);
echo $regs[1];
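A JavaScript counterpart of the extraction example is sketched below. fileName is a hypothetical helper, and excluding the backslash as well as the slash is our choice so that Windows-style paths also work; the capture group plays the role of $regs[1].

```javascript
// Capture the run of characters after the last "/" or "\" at the end of the string.
function fileName(pathOrUrl) {
  const match = pathOrUrl.match(/([^\\/]*)$/);
  // The pattern can match an empty string at the end, so match is never null.
  return match[1];
}

console.log(fileName("http://example.com/img/logo.png"));
console.log(fileName("C:\\dir\\file.txt"));
console.log(fileName("/var/www/")); // empty string: nothing follows the last "/"
```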

Advanced replacement

ereg_replace() and eregi_replace() are also very useful. For example, to trim a string and then replace every run of line breaks and tabs with a single comma:

ereg_replace("[\n\r\t]+", ",", trim($str));
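The same replacement in JavaScript, as a sketch: joinLines is a hypothetical name, and String.prototype.trim() stands in for PHP's trim() (both remove surrounding whitespace).

```javascript
// Trim, then collapse each run of newline, carriage-return and tab characters into one comma.
function joinLines(str) {
  return str.trim().replace(/[\n\r\t]+/g, ",");
}

console.log(joinLines("  apple\r\nbanana\t\tcherry\n")); // apple,banana,cherry
```

Spaces are not in the class, so text such as "a b" keeps its inner space.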

Finally, here is another regular expression used to check email addresses, written as a concatenated PHP string:

'^[-!#$%&\'*+\./0-9=?A-Z^_`a-z{|}~]+'.'@'.'[-!#$%&\'*+\/0-9=?A-Z^_`a-z{|}~]+\.'.'[-!#$%&\'*+\./0-9=?A-Z^_`a-z{|}~]+$'

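For testing, the same pattern can be written as one JavaScript regex literal. This translation is ours, not the original author's: the PHP quote escaping and concatenation disappear, and "\\" inside each class stands for the literal backslash that the PHP string places there. The sample addresses are invented.

```javascript
// Username and last part: letters, digits, the symbols ! # $ % & ' * + - / = ? ^ _ ` { | } ~, "\" and ".".
// The middle class (before the required "\.") has no ".", so a period must separate it from the last part.
const atextEmail =
  /^[-!#$%&'*+\\.\/0-9=?A-Z^_`a-z{|}~]+@[-!#$%&'*+\\\/0-9=?A-Z^_`a-z{|}~]+\.[-!#$%&'*+\\.\/0-9=?A-Z^_`a-z{|}~]+$/;

const atextVerdicts = [
  "o'brien+tag@example.com",     // apostrophe and plus are allowed
  "first.last@mail.example.com", // periods in username and last part
  "user@localhost",              // no period after "@"
  "user name@example.com",       // space is not allowed
].map((s) => atextEmail.test(s));
console.log(atextVerdicts);
```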
This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
