Probing into the learning of regular expressions in PHP


Last Update:2017-01-19 Source: Internet

Author: User



1. Introduction

In short, regular expressions are a powerful tool for pattern matching and substitution. They are found in almost all Unix-based tools, such as the vi editor, the Perl and PHP scripting languages, and the awk and sed utilities. Client-side scripting languages such as JavaScript also support them. Regular expressions have thus gone beyond any single language or system and become a widely accepted concept and feature.

Regular expressions let the user build a matching pattern from a series of special characters, compare that pattern with a target such as a data file, program input, or a web form field, and then run the appropriate code depending on whether the target contains the pattern.

For example, one of the most common uses of regular expressions is to check that an e-mail address entered online is correctly formatted. If the address matches the regular expression, the form the user filled out is processed normally; if it does not, a prompt asks the user to re-enter a valid address. Regular expressions thus play an important role in the decision logic of web applications.

2. Basic syntax

Now that we have a rough idea of what regular expressions do, let's look at their syntax.
A regular expression generally looks like this:
/love/
The part between the "/" delimiters is the pattern to be matched in the target. The user simply places the pattern to search for between the two "/" delimiters. To make patterns more flexible, regular expressions provide special "metacharacters": characters that have a special meaning in a regular expression and can specify how the preceding character (the one immediately before the metacharacter) may appear in the target.

The most commonly used metacharacters are "+", "*", and "?". The "+" metacharacter requires the preceding character to appear one or more times in a row, "*" requires it to appear zero or more times in a row, and "?" requires it to appear zero times or once.
Let's look at some concrete uses of these metacharacters.
/fo+/
Because this expression contains the "+" metacharacter, it matches the letter f followed by one or more letters o, as in "fool", "fo", or "football".
/eg*/
Because this expression contains the "*" metacharacter, it matches the letter e followed by zero or more letters g, as in "easy", "ego", or "egg".
/wil?/
Because this expression contains the "?" metacharacter, it matches "wi" followed by zero or one letter l, as in "win" or "wilson".
Besides these metacharacters, the user can specify exactly how many times a character may appear. For example:
/jim{2,6}/
This expression requires the character m to appear 2 to 6 times in a row, so it matches strings such as "jimmy" or "jimmmmmy".
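As a rough sketch, the quantifier patterns above can be tried in current PHP with preg_match(), used here as a stand-in for the matching the text describes. The helper name matches_pattern() is invented for this illustration.

```php
<?php
// Hypothetical helper: true when $pattern is found anywhere in $subject.
function matches_pattern(string $pattern, string $subject): bool
{
    // preg_match() returns 1 on a match, 0 on no match, false on error.
    return preg_match($pattern, $subject) === 1;
}

$checks = [
    ['/fo+/', 'football'],    // "f" followed by one or more "o"
    ['/fo+/', 'fact'],        // no "o" directly after "f"
    ['/eg*/', 'easy'],        // "e" followed by zero "g"
    ['/wil?/', 'win'],        // "wi" followed by zero "l"
    ['/jim{2,6}/', 'jimmy'],  // "ji" followed by two "m"
    ['/jim{2,6}/', 'jim'],    // only one "m", so too few
];

foreach ($checks as [$pattern, $subject]) {
    printf("%-12s %-10s %s\n", $pattern, $subject,
        matches_pattern($pattern, $subject) ? 'match' : 'no match');
}
```

Note that these patterns are case-sensitive, which is why the sample strings are written in lowercase.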
Now that we have a first idea of how to use regular expressions, let's look at several other important metacharacters.
\s: matches a single whitespace character, including tabs and line breaks;
\S: matches any single character that is not whitespace;
\d: matches a digit from 0 to 9;
\w: matches a letter, digit, or underscore;
\W: matches any character that \w does not match;
. : matches any character except a line break.
(Note: \s and \S, as well as \w and \W, are inverses of each other.)
Let's see how to use these metacharacters in regular expressions.
/\s+/
This expression matches one or more whitespace characters in the target.
/\d000/
If we have a complex financial statement in hand, this expression makes it easy to find every amount made up of a digit followed by three zeros, that is, amounts in round thousands such as 3000.
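The following sketch shows these character classes at work, assuming the preg_* functions of current PHP. The sample strings are made up for illustration.

```php
<?php
// \s+ : collapse each run of whitespace (spaces, tabs, line breaks) to one space.
$normalized = preg_replace('/\s+/', ' ', "total:\t 5000\n dollars");

// \d000 : a digit followed by three zeros.
preg_match_all('/\d000/', 'rent 3000, food 450, car 12000', $found);

// \w+ : runs of letters, digits, or underscores; \W matches everything else.
preg_match_all('/\w+/', 'user_name = 42;', $words);

echo $normalized, "\n";
print_r($found[0]);
print_r($words[0]);
```

In "12000" the pattern finds "2000" rather than the whole number, because \d stands for exactly one digit.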

Besides the metacharacters described above, regular expressions have another kind of special character: the locator (also called an anchor). A locator specifies where in the target the pattern must appear.

The most commonly used locators are "^", "$", "\b", and "\B". The "^" locator requires the pattern to appear at the beginning of the target string, and "$" requires it to appear at the end. The "\b" locator requires the pattern to sit at a word boundary, that is, at the beginning or end of a word, while "\B" requires it to sit inside a word, away from both its beginning and its end. We can therefore think of "^" and "$", and likewise "\b" and "\B", as two pairs of complementary locators. For example:
/^hell/
Because this expression contains the "^" locator, it matches a string that begins with "hell", such as "hello" or "hellhound".
/ar$/
Because this expression contains the "$" locator, it matches a string that ends with "ar", such as "car", "bar", or "ar".
/\bbom/
Because this pattern starts with the "\b" locator, it matches a word that begins with "bom", such as "bomb" or "bom".
/man\b/
Because this pattern ends with the "\b" locator, it matches a word that ends with "man", such as "human", "woman", or "man".
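A small sketch of the locators, again assuming preg_match() from current PHP; the test strings are invented. Each pattern is paired with one string it matches and one it does not.

```php
<?php
// Each row: pattern, subject. preg_match() returns 1 for a match, 0 otherwise.
$cases = [
    ['/^hell/', 'hellhound'],     // begins with "hell"
    ['/^hell/', 'shell'],         // "hell" is not at the start
    ['/ar$/',   'a red car'],     // ends with "ar"
    ['/ar$/',   'cart'],          // "ar" is not at the end
    ['/\bbom/', 'a bomb squad'],  // "bom" starts a word
    ['/\bbom/', 'abomination'],   // "bom" is inside a word
    ['/man\b/', 'a woman sang'],  // "man" ends a word
    ['/man\b/', 'many'],          // "man" is followed by a word character
    ['/\Bman/', 'woman'],         // \B: "man" does not start the word
];

foreach ($cases as [$pattern, $subject]) {
    echo $pattern, ' on "', $subject, '": ', preg_match($pattern, $subject), "\n";
}
```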
To make patterns easier to write, regular expressions let the user specify a range of characters instead of a single specific character. For example:
/[A-Z]/
This expression matches any uppercase letter from A to Z.
/[a-z]/
This expression matches any lowercase letter from a to z.
/[0-9]/
This expression matches any digit from 0 to 9.
/([a-z][A-Z][0-9])+/
This expression matches any string made up of a lowercase letter, an uppercase letter, and a digit, in that order, repeated one or more times, such as "aB0". Note that "()" groups characters together in a regular expression, and everything inside the "()" must appear together in the target. The expression therefore does not match a string such as "abc", whose last character is a letter rather than a digit.
To express an "or" operation similar to the one in programming logic, use the pipe character "|" to choose among several alternative patterns. For example:
/to|too|2/
This expression matches "to", "too", or "2" in the target.
Another commonly used operator is the negation "[^]". Unlike the "^" locator described earlier, the negation "[^]" matches any character that is not listed inside the brackets. For example:
/[^A-C]/
This expression matches any character in the target except A, B, and C. In general, "^" acts as a negation operator when it is the first character inside "[]", and as a locator when it appears outside "[]".
Finally, use the escape character "\" when a metacharacter itself must be matched literally. For example:
/th\*/
This expression matches "th*" in the target, not "the".
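The range, grouping, alternation, negation, and escape rules above can be checked with a short sketch like the following. It assumes preg_match() from current PHP, and the array keys are only labels chosen for this example.

```php
<?php
$results = [
    // Grouping: a lowercase letter, an uppercase letter, and a digit.
    'group on aB0'    => preg_match('/([a-z][A-Z][0-9])+/', 'aB0'),
    'group on abc'    => preg_match('/([a-z][A-Z][0-9])+/', 'abc'),
    // Alternation: any one of the listed alternatives is enough.
    'or on 2'         => preg_match('/to|too|2/', 'me 2'),
    'or on three'     => preg_match('/to|too|2/', 'three'),
    // Negation: at least one character outside the range A-C.
    'negation on ABC' => preg_match('/[^A-C]/', 'ABC'),
    'negation on ABD' => preg_match('/[^A-C]/', 'ABD'),
    // Escaping: \* is a literal asterisk, not a quantifier.
    'escape on th*'   => preg_match('/th\*/', 'th*'),
    'escape on the'   => preg_match('/th\*/', 'the'),
];

print_r($results);
```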

3. Usage examples

① In PHP, the ereg() function performs pattern matching. (ereg() was deprecated in PHP 5.3 and removed in PHP 7.0; current PHP versions provide preg_match() for the same purpose.) ereg() is called as follows:

ereg(pattern, string)
Here pattern is the regular expression and string is the target to be searched. Returning to e-mail validation, the PHP code looks like this:

<?php
// Accept addresses of the form name@host.domain
if (ereg("^([a-zA-Z0-9_-])+@([a-zA-Z0-9_-])+(\.[a-zA-Z0-9_-])+", $email)) {
    echo "Your email address is correct!";
} else {
    echo "Please try again!";
}
?>
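For readers on PHP 7 or later, where ereg() no longer exists, the same check might look like the sketch below. It swaps in preg_match() and wraps the pattern in "/" delimiters; the function name is_email_like() and the sample addresses are assumptions made for this example.

```php
<?php
// Hypothetical helper built on preg_match() instead of the removed ereg().
function is_email_like(string $email): bool
{
    // Same pattern as in the text, wrapped in "/" delimiters for preg_match().
    $pattern = '/^([a-zA-Z0-9_-])+@([a-zA-Z0-9_-])+(\.[a-zA-Z0-9_-])+/';
    return preg_match($pattern, $email) === 1;
}

foreach (['user@example.com', 'user@example', 'not an email'] as $email) {
    echo $email, ': ',
        is_email_like($email) ? 'Your email address is correct!' : 'Please try again!',
        "\n";
}
```

The pattern has no "$" at the end, so it only checks that the string starts like an address; text after the matched part is not examined.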

② JavaScript 1.2 provides a powerful RegExp object that can be used to match regular expressions. Its test() method checks whether the target contains the pattern and returns true or false accordingly.

The same technique can be used in JavaScript to verify the e-mail address a user enters.

Many people probably find regular expressions a headache. Here I combine what I know with some articles from the Internet and try to explain them in a way anyone can understand, to share what I have learned.

To begin, we have to talk about ^ and $. They match the start and the end of a string, respectively:

"^The": the string must begin with "The";
"of despair$": the string must end with "of despair";

So:
"^abc$": the string must both begin and end with "abc", so only "abc" itself matches;
"notice": matches any string that contains "notice".

As the last example shows, if you use neither of these two characters, the pattern (regular expression) may appear anywhere in the string being tested; it is not anchored to either end.

Next come '*', '+', and '?'.
They specify how many times a character, or a sequence of characters, may appear:
'*' means "zero or more", equivalent to {0,};
'+' means "one or more", equivalent to {1,};
'?' means "zero or one", equivalent to {0,1}.

Here are some examples:

"ab*": the same as ab{0,}; matches an a followed by zero or more b's ("a", "ab", "abbb", etc.);
"ab+": the same as ab{1,}; as above, but with at least one b ("ab", "abbb", etc.);
"ab?": the same as ab{0,1}; an a followed by zero or one b;
"a?b+$": matches a string that ends with zero or one a followed by one or more b's.

Key point: '*', '+', and '?' apply only to the character immediately before them.

You can also limit the number of occurrences with curly braces, for example:

"ab{2}": an a followed by exactly two b's ("abb");
"ab{2,}": an a followed by two or more b's ("abb", "abbbb", etc.);
"ab{3,5}": an a followed by three to five b's ("abbb", "abbbb", or "abbbbb").
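To see which part of a string each of these patterns actually picks up, a sketch along the following lines can help. It assumes preg_match() from current PHP; first_match() is a hypothetical helper that returns the matched text, or null when nothing matches.

```php
<?php
// Hypothetical helper: the text matched by $pattern in $subject, or null.
function first_match(string $pattern, string $subject): ?string
{
    return preg_match($pattern, $subject, $m) === 1 ? $m[0] : null;
}

var_dump(first_match('/ab*/', 'xabbby'));        // a followed by zero or more b
var_dump(first_match('/ab+/', 'xay'));           // needs at least one b
var_dump(first_match('/ab?/', 'abbb'));          // a followed by at most one b
var_dump(first_match('/a?b+$/', 'xxabbb'));      // optional a, then b's, at the end
var_dump(first_match('/ab{2}/', 'ab'));          // needs two b's, finds only one
var_dump(first_match('/ab{3,5}/', 'abbbbbbb'));  // takes no more than five b's
```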

Now let's put several characters in parentheses:

"a(bc)*": an a followed by zero or more copies of "bc";
"a(bc){1,5}": an a followed by one to five copies of "bc".

There is also the '|' character, which acts as an OR operator:

"hi|hello": matches a string containing "hi" or "hello";
"(b|cd)ef": matches a string containing "bef" or "cdef";
"(a|b)*c": matches a string containing any number (including zero) of a's or b's followed by a c.

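A period ('.') stands for any single character except "\n".

What if you want to match any single character, including "\n"?

Use a pattern such as '(.|\n)'. (The often-quoted '[\n.]' does not do this: inside brackets the period loses its special meaning, so that pattern matches only a line break or a literal period.)

"a.[0-9]": an a, then any one character, then a digit from 0 to 9;
"^.{3}$": a string of exactly three characters.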

A bracket expression matches exactly one character:

"[ab]": matches a single a or b (the same as "a|b");
"[a-d]": matches a single character from 'a' to 'd' (the same as "a|b|c|d" and "[abcd]").

We usually write [a-zA-Z] to match a single letter of either case:

"^[a-zA-Z]": matches a string that begins with a letter;
"[0-9]%": matches a string containing a digit followed by a percent sign;
",[a-zA-Z0-9]$": matches a string that ends with a comma followed by a letter or digit.

You can also list the characters you do not want inside the brackets; just put '^' first. For example, "%[^a-zA-Z]%" matches a string containing a single non-letter character between two percent signs.

Important: when '^' is the first character inside brackets, the characters listed in the brackets are excluded.
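The bracket expressions above can be tried out with a sketch such as this one, which assumes preg_match() from current PHP. The helper matched_text() and the sample strings are invented for illustration.

```php
<?php
// Hypothetical helper: the text matched by $pattern in $subject, or null.
function matched_text(string $pattern, string $subject): ?string
{
    return preg_match($pattern, $subject, $m) === 1 ? $m[0] : null;
}

var_dump(matched_text('/^[a-zA-Z]/', 'Hello'));     // begins with a letter
var_dump(matched_text('/^[a-zA-Z]/', '9lives'));    // begins with a digit
var_dump(matched_text('/[0-9]%/', 'up 5% today'));  // digit followed by "%"
var_dump(matched_text('/,[a-zA-Z0-9]$/', 'a,b'));   // ends with comma + letter
var_dump(matched_text('/%[^a-zA-Z]%/', '%7%'));     // non-letter between "%"
var_dump(matched_text('/%[^a-zA-Z]%/', '%a%'));     // letter between "%"
```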

For these special characters to be taken literally, you must put a backslash ('\') in front of them to escape them.

Don't forget that the characters inside brackets are an exception to this rule: in the POSIX syntax used by ereg(), all special characters inside brackets, including the backslash ('\'), lose their special meaning. "[*\+?{}.]" therefore matches a string containing any one of these characters.

Also, as the regex manual tells us: if the list contains ']', it is best to make it the first character (possibly right after '^'). If it contains '-', it is best to put it first or last, or to make it the second endpoint of a range; in [a-d-0-9], for example, the '-' in the middle is taken literally.

After the examples above, you should understand {n,m}. Note that neither n nor m can be negative, and n must not be greater than m. The pattern then matches at least n and at most m occurrences. For example, "p{1,5}" matches at most five p's at a time: in "pvpppppp" it matches the lone p before the v, and only the first five of the six p's that follow.
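A quick way to watch {n,m} at work is to collect every match in a string. The sketch below assumes preg_match_all() from current PHP, which lists all non-overlapping matches from left to right.

```php
<?php
// p{1,5} takes between one and five consecutive p's for each match.
preg_match_all('/p{1,5}/', 'pvpppppp', $matches);

print_r($matches[0]);
```

The run of six p's is split into a match of five followed by a match of one, because a single match can never take more than five.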

Now for the sequences that begin with \.

\b matches a word boundary. For example, 've\b' matches the "ve" in "love" but not the "ve" in "very".

\B is exactly the opposite of \b above, so I won't give an example.

For other syntax that begins with \, see http://www.phpv.net/article.php/251.

OK, let's build an application: a pattern that matches an amount of money typed in by the user.

We want a pattern that checks whether the input is a number representing money. We assume there are four ways to write an amount: "10000.00" and "10,000.00", or, without the decimal part, "10000" and "10,000". Let's start building the pattern:

^[1-9][0-9]*$

This matches any number that begins with a digit other than 0. But it also means that a single "0" fails the test. Here is the fix:

^(0|[1-9][0-9]*)$

Now only 0, or a number that does not begin with 0, matches. We can also allow a minus sign before the number:

^(0|-?[1-9][0-9]*)$

That is: 0, or a number that does not begin with 0 and may have a minus sign in front of it. Now let's be less strict and allow a leading 0. Let's also drop the minus sign, since we don't need it for money. Next we add the decimal part:

^[0-9]+(\.[0-9]+)?$

This means the string must begin with at least one digit. Note that with this pattern "10." does not match; only "10" and "10.2" do. (Do you know why?)

^[0-9]+(\.[0-9]{2})?$

Here we require exactly two digits after the decimal point. If you think this is too strict, you can change it to:

^[0-9]+(\.[0-9]{1,2})?$

This allows one or two digits after the decimal point. Now we add commas (every three digits) for readability:

^[0-9]{1,3}(,[0-9]{3})*(\.[0-9]{1,2})?$

Don't forget that '+' can be replaced by '*' if you want to allow an empty string, and remember that the backslash '\' may need to be escaped inside a PHP string (a common mistake).

Once the string has been validated, we remove all the commas with str_replace(",", "", $money), convert the result to a double, and can then do arithmetic with it.
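Putting the validation and conversion steps together might look like the sketch below. It assumes preg_match() from current PHP, and parse_money() is a hypothetical helper name. Because the comma pattern alone rejects "10000", the sketch accepts a string that matches either the plain pattern or the comma pattern from the text.

```php
<?php
// Hypothetical helper: returns the amount as a float, or null if invalid.
function parse_money(string $input): ?float
{
    // Two patterns from the text: plain digits, or digits grouped by commas.
    $plain   = '/^[0-9]+(\.[0-9]{1,2})?$/';
    $grouped = '/^[0-9]{1,3}(,[0-9]{3})*(\.[0-9]{1,2})?$/';

    if (preg_match($plain, $input) !== 1 && preg_match($grouped, $input) !== 1) {
        return null; // not a valid amount
    }

    // Remove the commas, then treat the result as a floating-point number.
    return (float) str_replace(',', '', $input);
}

foreach (['10000.00', '10,000.00', '10000', '10,000', '10.', '1,00'] as $input) {
    var_dump(parse_money($input));
}
```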

One more example:

Building a regular expression to check an email address

A complete email address has three parts:

1. User name (everything to the left of the '@')
2. '@'
3. Server name (everything that remains)

The user name may contain uppercase and lowercase letters, digits, periods ('.'), minus signs ('-'), and underscores ('_'). The server name follows the same rule, except that underscores are not allowed.

Neither the user name nor the server name may begin or end with a period, and two periods may not appear in a row: there must be at least one other character between them. Now let's write the pattern for the user name:

^[_a-zA-Z0-9-]+$

This does not yet allow periods. Let's add them:

^[_a-zA-Z0-9-]+(\.[_a-zA-Z0-9-]+)*$

This means: start with at least one valid character other than a period, followed by zero or more groups that each begin with a period.

To simplify, we can use eregi() instead of ereg(). eregi() is case-insensitive, so instead of the two ranges "a-z" and "A-Z" we only need to specify one:

^[_a-z0-9-]+(\.[_a-z0-9-]+)*$

The server name is the same, but without the underscore:

^[a-z0-9-]+(\.[a-z0-9-]+)*$

Good. Now just join the two parts with "@":

^[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*$

This is the complete email validation pattern. All we need to do is call:

eregi("^[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*$", $email)

and we know whether the string is a valid email address.
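On PHP 7 and later, where eregi() is no longer available, a roughly equivalent check can be sketched with preg_match() and the "i" (case-insensitive) modifier. The function name is_valid_email() and the sample addresses are assumptions for this example.

```php
<?php
// Hypothetical helper: applies the final pattern from the text with preg_match().
function is_valid_email(string $email): bool
{
    // The trailing "i" makes the match case-insensitive, as eregi() did.
    $pattern = '/^[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*$/i';
    return preg_match($pattern, $email) === 1;
}

$samples = [
    'John.Doe@Example.com',    // mixed case is accepted
    '.john@example.com',       // user name begins with a period
    'john..doe@example.com',   // two periods in a row
    'john@exa_mple.com',       // underscore in the server name
];

foreach ($samples as $email) {
    echo $email, ': ', is_valid_email($email) ? 'valid' : 'invalid', "\n";
}
```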

Other uses of regular expressions

Extracting strings

ereg() and eregi() have a feature that lets you extract part of a string with a regular expression (see the manual). For example, to extract the file name from a path or URL, the following code is all you need:

ereg("([^\\/]*)$", $pathOrUrl, $regs);
echo $regs[1];

Advanced substitution

ereg_replace() and eregi_replace() are also very useful. Suppose we want to replace every run of whitespace between words with a comma:

ereg_replace("[ \n\r\t]+", ",", trim($str));
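The same extraction can be sketched with preg_match() for current PHP versions. The helper name file_name_of() and the sample paths are invented for illustration.

```php
<?php
// Hypothetical helper: everything after the last "/" or "\" in the input.
function file_name_of(string $pathOrUrl): string
{
    // "~" is used as the delimiter so "/" needs no escaping.
    // [^\\\\/] in a single-quoted PHP string reaches the regex engine as
    // [^\\/]: any character except a backslash or a forward slash.
    preg_match('~([^\\\\/]*)$~', $pathOrUrl, $regs);
    return $regs[1];
}

echo file_name_of('http://example.com/docs/manual.pdf'), "\n";
echo file_name_of('C:\temp\report.txt'), "\n";
```

If the input ends with a slash, the captured group is simply an empty string.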
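A comparable substitution for current PHP versions, sketched with preg_replace() in place of ereg_replace(). The character class here includes a space so that ordinary spaces between words are replaced too, and the sample string is made up.

```php
<?php
$str = "  apples  oranges\tpears\nplums ";

// trim() removes leading and trailing whitespace first, so the result
// does not start or end with a comma.
$csv = preg_replace('/[ \n\r\t]+/', ',', trim($str));

echo $csv, "\n";
```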

Finally, here is another regular expression for checking email addresses, left for you to analyze:

"^[-!#$%&\'*+\\./0-9=?A-Z^_`a-z{|}~]+".'@'.
"[-!#$%&\'*+\\/0-9=?A-Z^_`a-z{|}~]+\.".
"[-!#$%&\'*+\\./0-9=?A-Z^_`a-z{|}~]+$"

If you can read it easily, this article has achieved its purpose.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
