A fast learning method for regular expressions in PHP

Last Update:2015-04-24 Source: Internet

Author: User



1. Introduction

Simply put, regular expressions are a powerful tool for pattern matching and substitution. They appear in almost all UNIX-based tools, such as the vi editor, scripting languages like Perl and PHP, and the awk and sed programs. Client-side scripting languages such as JavaScript support them as well. Regular expressions have thus outgrown any single language or system and become a widely accepted concept and feature.
With regular expressions, the user builds a matching pattern from a series of special characters and then compares that pattern against a target such as a data file, program input, or the form input of a Web page. The program then takes the appropriate action depending on whether the target contains the pattern.
For example, one of the most common uses of regular expressions is to verify that an e-mail address entered by a user online is in the correct format. If the address passes the regular expression check, the form the user filled in is processed normally. If the address does not match the pattern, a prompt asks the user to enter a correct e-mail address. Regular expressions therefore play an important role in the decision logic of Web applications.

2. Basic syntax

After this first look at what regular expressions are for, let's look at their syntax.
A regular expression generally takes the following form:
/love/
The part between the "/" delimiters is the pattern to be matched in the target object. The user simply places the pattern to search for between the "/" delimiters. To let the user build patterns more flexibly, regular expressions provide special "metacharacters". A metacharacter is a special character in a regular expression that specifies how its leading character (that is, the character in front of the metacharacter) may occur in the target object.
The more commonly used metacharacters are "+", "*", and "?". The "+" metacharacter requires its leading character to appear one or more times in a row in the target object, the "*" metacharacter requires its leading character to appear zero or more times, and the "?" metacharacter requires its leading character to appear zero or one time.
Below, let's look at how these metacharacters are used in practice.
/fo+/
Because the above regular expression contains the "+" metacharacter, it matches strings in the target object such as "fool", "fo", or "football", in which the letter f is followed by one or more consecutive letters o.
/eg*/
Because the above regular expression contains the "*" metacharacter, it matches strings in the target object such as "easy", "ego", or "egg", in which the letter e is followed by zero or more consecutive letters g.
/wil?/
Because the above regular expression contains the "?" metacharacter, it matches strings in the target object such as "win" or "wilson", in which the letters wi are followed by zero or one letter l.
In addition to metacharacters, the user can specify exactly how many times a pattern may appear in the matching object. For example:
/jim{2,6}/
The regular expression above specifies that the character m may appear 2 to 6 times in a row in the matching object, so it matches strings such as "jimmy" or "jimmmmmy".
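The quantifier examples above can be tried out directly. The following is a minimal sketch, assuming a current PHP version and the PCRE function preg_match(), which this article does not itself discuss; the helper name matches() is hypothetical.

```php
<?php
// Hypothetical helper: true if $pattern is found anywhere in $subject.
function matches(string $pattern, string $subject): bool
{
    return preg_match($pattern, $subject) === 1;
}

var_dump(matches('/fo+/', 'football'));   // true: "f" followed by two "o"
var_dump(matches('/fo+/', 'fat'));        // false: no "o" after "f"
var_dump(matches('/eg*/', 'easy'));       // true: "e" followed by zero "g"
var_dump(matches('/wil?/', 'win'));       // true: "wi" followed by zero "l"
var_dump(matches('/jim{2,6}/', 'jimmy')); // true: two "m" in a row
var_dump(matches('/jim{2,6}/', 'jim'));   // false: only one "m"
```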
Now that we have a first idea of how to use regular expressions, let's look at some other important metacharacters.
\s: matches a single whitespace character, including tab and line break;
\S: matches any single character that is not whitespace;
\d: matches a digit from 0 to 9;
\w: matches a letter, digit, or underscore;
\W: matches any character that \w does not match;
. : matches any character except a line break.
(Note: \s and \S, and likewise \w and \W, are inverses of each other.)
Below, we'll use some examples to see how these metacharacters work in regular expressions.
/\s+/
The preceding regular expression matches one or more whitespace characters in the target object.
/\d000/
If we have a complex financial statement in hand, we can easily find all amounts in the thousands of dollars with this regular expression.
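As a rough illustration of these character classes, the sketch below applies them to a made-up line of text. It assumes PHP's PCRE functions preg_split(), preg_match_all(), and preg_replace(); the sample data and variable names are invented for the example.

```php
<?php
// Made-up sample text; any string would do.
$report = "rent 3000\tfood 450 travel 12000";

// \s+ splits on runs of whitespace (spaces, tabs, line breaks).
$fields = preg_split('/\s+/', $report);

// \d000 finds a digit followed by three zeros, even inside a longer number.
preg_match_all('/\d000/', $report, $found);

// \W strips everything that is not a letter, digit, or underscore.
$wordCharsOnly = preg_replace('/\W/', '', 'a_b-c!');

print_r($fields);    // rent, 3000, food, 450, travel, 12000
print_r($found[0]);  // 3000, 2000 (the "2000" comes from inside "12000")
echo $wordCharsOnly; // a_bc
```

Note that /\d000/ also matches inside "12000"; anchoring the pattern would be needed to match whole amounts only.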

In addition to the metacharacters described above, regular expressions have another special kind of character, the locator. A locator specifies where the matching pattern must appear in the target object. The more commonly used locators are "^", "$", "\b", and "\B". The "^" locator requires the pattern to appear at the beginning of the target string, and the "$" locator requires it to appear at the end. The "\b" locator requires the pattern to sit at a word boundary, that is, at the beginning or end of a word, while the "\B" locator requires the match to lie inside a word, away from both of its boundaries. We can therefore think of "^" and "$", and likewise "\b" and "\B", as two pairs of locators with opposite effects. For example:
/^hell/
Because the preceding regular expression contains the "^" locator, it matches strings in the target object that begin with "hell", such as "hell", "hello", or "hellhound".
/ar$/
Because the preceding regular expression contains the "$" locator, it matches strings in the target object that end with "ar", such as "car", "bar", or "ar".

/\bbom/
Because the regular expression pattern above starts with the "\b" locator, it matches words in the target object that start with "bom", such as "bomb" or "bom".
/man\b/
Because the regular expression pattern above ends with the "\b" locator, it matches words in the target object that end with "man", such as "human", "woman", or "man".
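The four locator examples can be checked with a short script. This is only a sketch, assuming PHP's preg_match(); the test strings are chosen for illustration.

```php
<?php
// preg_match() returns 1 when the pattern is found and 0 when it is not.
$results = [
    'hellhound' => preg_match('/^hell/', 'hellhound'), // 1: starts with "hell"
    'shell'     => preg_match('/^hell/', 'shell'),     // 0: "hell" is not at the start
    'car'       => preg_match('/ar$/', 'car'),         // 1: ends with "ar"
    'art'       => preg_match('/ar$/', 'art'),         // 0: "ar" is not at the end
    'a bomb'    => preg_match('/\bbom/', 'a bomb'),    // 1: "bom" starts a word
    'abomb'     => preg_match('/\bbom/', 'abomb'),     // 0: "bom" is inside a word
    'human'     => preg_match('/man\b/', 'human'),     // 1: "man" ends a word
    'many'      => preg_match('/man\b/', 'many'),      // 0: "man" is followed by "y"
];
print_r($results);
```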
To make it easier to build matching patterns, regular expressions let the user specify a range of characters rather than one specific character. For example:
/[A-Z]/
The regular expression above matches any uppercase letter in the range A to Z.
/[a-z]/
The regular expression above matches any lowercase letter in the range a to z.
/[0-9]/
The regular expression above matches any digit in the range 0 to 9.
/([a-z][A-Z][0-9])+/
The above regular expression matches any string made up of one or more repetitions of a lowercase letter, an uppercase letter, and a digit, such as "aB0". Note that you can use "()" in a regular expression to group characters together. The contents of the "()" must appear together in the target object. The expression therefore does not match a string such as "abc", because the last character in "abc" is a letter rather than a digit.
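To see what the grouped pattern actually captures, here is a small sketch, assuming PHP's preg_match() with its optional third argument; the test strings are invented.

```php
<?php
$pattern = '/([a-z][A-Z][0-9])+/';

var_dump(preg_match($pattern, 'aB0')); // int(1)
var_dump(preg_match($pattern, 'abc')); // int(0): no uppercase letter or digit

// With a third argument, preg_match() also reports what was matched.
preg_match($pattern, 'xxaB0cD1yy', $m);
echo $m[0], "\n"; // aB0cD1 (two repetitions of the group)
echo $m[1], "\n"; // cD1    (the group keeps its last repetition)
```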
If we want an "or" operation in a regular expression, similar to logical OR in programming, we can use the pipe symbol "|" to match any one of several alternative patterns. For example:
/to|too|2/
The above regular expression will match "to", "too", or "2" in the target object.
Another commonly used operator in regular expressions is the negation "[^]". Unlike the "^" locator described earlier, the negation "[^]" specifies that the characters listed in the brackets must not appear at that position in the target object. For example:
/[^A-C]/
The above pattern matches any character in the target object except A, B, and C. In general, "^" acts as a negation operator when it appears as the first character inside "[]", and as a locator when it appears outside "[]".
Finally, when the user needs to include a metacharacter literally in the pattern, the escape character "\" can be used. For example:
/th\*/
The regular expression above matches "th*" in the target object, not "the" and so on.
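Alternation, negation, and escaping can be tried together. The sketch below assumes PHP's PCRE functions preg_match(), preg_replace(), and preg_quote(); the sample strings are made up.

```php
<?php
// Alternation: any one of the alternatives is enough.
var_dump(preg_match('/to|too|2/', 'me 2'));  // int(1)
var_dump(preg_match('/to|too|2/', 'three')); // int(0)

// Negated class: delete every character that is not A, B, or C.
$kept = preg_replace('/[^A-C]/', '', 'ABCDEF-abc');
echo $kept, "\n"; // ABC

// Escaping: "\*" is a literal asterisk, not a quantifier.
var_dump(preg_match('/th\*/', 'th*')); // int(1)
var_dump(preg_match('/th\*/', 'the')); // int(0)

// preg_quote() adds the backslashes for you.
$quoted = preg_quote('th*');
echo $quoted, "\n"; // th\*
```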

3. Usage examples

① In PHP, you can use the ereg() function for pattern matching. The ereg() function is used in the following format:

ereg(pattern, string)

Here, pattern is the regular expression pattern and string is the target object to be searched. Taking e-mail address validation again as the example, the PHP code looks like this:

<?php
// ereg() was removed in PHP 7.0, so this code requires PHP 5.
if (ereg("^([a-zA-Z0-9_-])+@([a-zA-Z0-9_-])+(\.[a-zA-Z0-9_-])+", $email)) {
    echo "Your email address is correct!";
} else {
    echo "Please try again!";
}
?>
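The ereg() family was deprecated in PHP 5.3 and removed in PHP 7.0. On current PHP versions the same check can be sketched with preg_match(), as below. The function name isEmailLike() is hypothetical, and the pattern is written in the spirit of the one above, wrapped in "/" delimiters.

```php
<?php
// Hypothetical helper: applies an e-mail-shaped pattern with preg_match().
function isEmailLike(string $email): bool
{
    $pattern = '/^([a-zA-Z0-9_-])+@([a-zA-Z0-9_-])+(\.[a-zA-Z0-9_-])+/';
    return preg_match($pattern, $email) === 1;
}

$email = 'user@example.com';
if (isEmailLike($email)) {
    echo "Your email address is correct!";
} else {
    echo "Please try again!";
}
```

Because the pattern has no "$" at the end, it only checks that the string starts like an e-mail address; trailing characters are not examined.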

② JavaScript 1.2 provides a powerful RegExp object that can be used to match regular expressions. Its test() method checks whether the target object contains the matching pattern and returns true or false accordingly.

We can use JavaScript to write the following script to verify the validity of the e-mail address entered by the user.

<script language="JavaScript1.2">
<!-- start hiding
function verifyAddress(obj)
{
    var email = obj.email.value;
    var pattern = /^([a-zA-Z0-9_-])+@([a-zA-Z0-9_-])+(\.[a-zA-Z0-9_-])+/;
    var flag = pattern.test(email);
    if (flag)
    {
        alert("Your email address is correct!");
        return true;
    }
    else
    {
        alert("Please try again!");
        return false;
    }
}
// stop hiding -->
</script>
<body>
<form onsubmit="return verifyAddress(this);">
<input name="email" type="text"/>
<input type="submit"/>
</form>
</body>

Many people probably find regular expressions a headache. Here I combine my own understanding with some articles from the Internet and try to explain them in a way that ordinary people can follow, to share my learning experience with everyone.

To begin with, we still have to mention ^ and $. They match the start and the end of a string respectively, as the following examples illustrate:

"^the": the string must begin with "the";
"of despair$": the string must end with "of despair";

So:
"^abc$": the string must both start and end with "abc", so in fact only "abc" itself matches;
"notice": matches any string that contains "notice";

As the last example shows, if you use neither of the two characters, the pattern (regular expression) can appear anywhere in the string being checked; you have not locked it to either end.

Next come '*', '+', and '?'.
They specify how many times a character may appear in sequence. They mean, respectively:
"zero or more", equivalent to {0,}
"one or more", equivalent to {1,}
"zero or one", equivalent to {0,1}

Here are some examples:

"ab*": synonymous with ab{0,}; matches an a followed by zero or more b's ("a", "ab", "abbb", etc.);
"ab+": synonymous with ab{1,}; same as above, but at least one b must be present ("ab", "abbb", etc.);
"ab?": synonymous with ab{0,1}; there may be no b or exactly one b;
"a?b+$": matches a string that ends with zero or one a followed by one or more b's.

Important: '*', '+', and '?' apply only to the character immediately in front of them.
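The equivalences above, and the rule that a quantifier binds only to the preceding character, can be sketched as follows. The example assumes PHP's preg_match(); the helper firstMatch() is a hypothetical name introduced only for this illustration.

```php
<?php
// Hypothetical helper: returns the first matched text, or null if nothing matches.
function firstMatch(string $pattern, string $subject): ?string
{
    return preg_match($pattern, $subject, $m) === 1 ? $m[0] : null;
}

// Each shorthand and its brace form find the same text.
var_dump(firstMatch('/ab*/', 'abbbc'), firstMatch('/ab{0,}/', 'abbbc')); // "abbb" twice
var_dump(firstMatch('/ab+/', 'ac'), firstMatch('/ab{1,}/', 'ac'));       // NULL twice
var_dump(firstMatch('/ab?/', 'abbb'), firstMatch('/ab{0,1}/', 'abbb'));  // "ab" twice

// The quantifier applies only to the "b" right in front of it...
var_dump(firstMatch('/ab+/', 'ababab'));   // "ab"
// ...unless parentheses group several characters together.
var_dump(firstMatch('/(ab)+/', 'ababab')); // "ababab"
```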

You can also give the number of occurrences explicitly in curly braces, for example:

"ab{2}": the a must be followed by exactly two b's, no fewer ("abb");
"ab{2,}": the a must be followed by two or more b's ("abb", "abbbb", etc.);
"ab{3,5}": the a must be followed by 3 to 5 b's ("abbb", "abbbb", or "abbbbb").

Now let's put several characters into parentheses, for example:

"a(bc)*": matches an a followed by zero or more "bc";
"a(bc){1,5}": matches an a followed by one to five "bc";

There is also the character '|', which acts as an OR operation:

"hi|hello": matches a string containing "hi" or "hello";
"(b|cd)ef": matches a string containing "bef" or "cdef";
"(a|b)*c": matches a string containing any number (including zero) of a's or b's, in any mix, followed by a c;

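A dot ('.') stands for any single character except "\n".

What if you want to match any single character, including "\n"?

Use a pattern such as '(.|\n)'. A bracket expression like '[\n.]' does not do this, because inside brackets the dot loses its special meaning and matches only a literal period.

"a.[0-9]": an a, followed by any one character, followed by a digit from 0 to 9;
"^.{3}$": a string of exactly three characters.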
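How the dot treats line breaks can be checked with a few calls. This sketch assumes PHP's preg_match(); the "s" modifier shown here is a PCRE feature and not part of the ereg() functions this article describes.

```php
<?php
$text = "a\nb";

// "." does not match a line break by default.
$plainDot = preg_match('/a.b/', $text);          // 0
// An alternation of "." and "\n" covers every single character.
$dotOrNewline = preg_match('/a(.|\n)b/', $text); // 1
// PCRE only: the "s" modifier makes "." match line breaks too.
$dotAll = preg_match('/a.b/s', $text);           // 1
// Inside brackets the dot is literal, so only "\n" or "." can sit between a and b.
$bracket = preg_match('/a[\n.]b/', 'axb');       // 0
// Exactly three characters between start and end.
$threeChars = preg_match('/^.{3}$/', 'abc');     // 1

var_dump($plainDot, $dotOrNewline, $dotAll, $bracket, $threeChars);
```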

A bracket expression matches exactly one character:

"[ab]": matches a single a or b (the same as "a|b");
"[a-d]": matches a single character from 'a' to 'd' (the same as "a|b|c|d" and "[abcd]");

In general, we use [a-zA-Z] to specify a letter of either case:

"^[a-zA-Z]": matches a string that begins with a letter;
"[0-9]%": matches a string containing something of the form x%, where x is a digit;
",[a-zA-Z0-9]$": matches a string that ends with a comma followed by a digit or letter;

You can also list the characters you do not want inside the brackets; just put '^' first. For example, "%[^a-zA-Z]%" matches a string containing a non-letter character between two percent signs.

Important: when ^ is the first character inside the brackets, it excludes the characters listed in the brackets.
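The bracket patterns above can be exercised as follows. This is an illustrative sketch that assumes PHP's preg_match(); the test strings are invented.

```php
<?php
$checks = [
    preg_match('/^[a-zA-Z]/', 'Hello'),    // 1: begins with a letter
    preg_match('/^[a-zA-Z]/', '9 lives'),  // 0: begins with a digit
    preg_match('/[0-9]%/', 'save 5% now'), // 1: a digit followed by "%"
    preg_match('/,[a-zA-Z0-9]$/', 'a,b'),  // 1: ends with a comma plus a letter
    preg_match('/%[^a-zA-Z]%/', '%1%'),    // 1: a non-letter between percent signs
    preg_match('/%[^a-zA-Z]%/', '%a%'),    // 0: a letter between percent signs
];
print_r($checks);
```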

For these special characters to be taken literally, you must put a backslash ('\') in front of them, and for PHP to interpret the pattern correctly, some characters in the PHP string must be escaped as well.

Do not forget that the characters inside brackets are an exception to this rule: inside brackets, all special characters, including the backslash ('\'), lose their special meaning. For example, "[*\+?{}.]" matches a string containing any of these characters.

Also, as the regex manual tells us: "If the list contains ']', it is best to make it the first character in the list (possibly following a '^'). If it contains '-', it is best to put it first or last, or to make it the second endpoint of a range, so that the '-' in the middle of [a-d-0-9] is valid."

After the examples above, you should understand {n,m}. Note that neither n nor m can be negative, and n must not be greater than m. The pattern then matches at least n and at most m times. For example, "p{1,5}" matches a run of one to five p's: in "pvpppppp" it matches the lone leading p, and within the run of six p's it matches only the first five.
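To see exactly which pieces "p{1,5}" picks out of that string, here is a small sketch that assumes PHP's preg_match_all(), which collects every non-overlapping match.

```php
<?php
// Collect every match of one to five consecutive "p" characters.
preg_match_all('/p{1,5}/', 'pvpppppp', $m);
print_r($m[0]); // "p", "ppppp", "p"
```

The run of six p's is split into a match of five (the upper limit) and a leftover match of one.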

Now let's look at some sequences that begin with a backslash.

\b: according to the book, this matches a word boundary. For example, 've\b' matches the "ve" in "love" but not the "ve" in "very".

\B is exactly the opposite of \b above, so I will not give an example.
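For readers who would still like an example of the pair, the following sketch assumes PHP's preg_match() and reuses the words "love" and "very" from the text.

```php
<?php
// \b: "ve" must be followed by a word boundary.
$loveB = preg_match('/ve\b/', 'love'); // 1: "ve" ends the word
$veryB = preg_match('/ve\b/', 'very'); // 0: "ve" is followed by "r"

// \B: "ve" must NOT be followed by a word boundary.
$loveNotB = preg_match('/ve\B/', 'love'); // 0
$veryNotB = preg_match('/ve\B/', 'very'); // 1

var_dump($loveB, $veryB, $loveNotB, $veryNotB);
```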

You can also go to http://www.phpv.net/article.php/251 to see the other syntax that starts with \.

Okay, let's work through an application: how to build a pattern that matches an amount of money.

We want a pattern that checks whether the input is a number representing money. We accept four ways of writing an amount: "10000.00" and "10,000.00", or, without a decimal part, "10000" and "10,000". Now let's start building the pattern:

^[1-9][0-9]*$

This matches any number that starts with a digit other than 0. But it also means that a single "0" cannot pass the test. Here is how to fix that:

^(0|[1-9][0-9]*)$

"Only 0 and numbers that do not start with 0 match." We can also allow a minus sign in front of the number:

^(0|-?[1-9][0-9]*)$

That is: 0, or a number that does not start with 0 and may have a minus sign in front of it. OK, now let's be less strict and allow numbers to start with 0. Let's also drop the minus sign, because we do not need it to represent money. Now we extend the pattern to match the fractional part:

^[0-9]+(\.[0-9]+)?$

This means the matched string must begin with at least one digit. But note that with the pattern above "10." does not match; only "10" and "10.2" do. Do you know why?

^[0-9]+(\.[0-9]{2})?$

Here we require exactly two digits after the decimal point. If you think that is too strict, you can change it to:

^[0-9]+(\.[0-9]{1,2})?$

This allows one or two digits after the decimal point. Now we add commas for readability (one every three digits), which gives:

^[0-9]{1,3}(,[0-9]{3})*(\.[0-9]{1,2})?$

Don't forget that '+' can be replaced by '*' if you want to allow an empty string as input. Also don't forget that the backslash '\' can cause errors inside a PHP string (a very common mistake).

Now that we can validate the string, we strip all the commas with str_replace(",", "", $money), then treat the value as a double, and we can do arithmetic with it.
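The validate-then-convert step can be sketched as below. The example assumes PHP's preg_match() in place of ereg(); the constant MONEY_PATTERN and the function parseMoney() are hypothetical names introduced only for this illustration.

```php
<?php
// Final pattern from the walkthrough: 1-3 digits, optional ",ddd" groups,
// and an optional fractional part of one or two digits.
const MONEY_PATTERN = '/^[0-9]{1,3}(,[0-9]{3})*(\.[0-9]{1,2})?$/';

// Hypothetical helper: returns the amount as a float, or null if the input is rejected.
function parseMoney(string $input): ?float
{
    if (preg_match(MONEY_PATTERN, $input) !== 1) {
        return null;
    }
    // Remove the thousands separators before converting.
    return (float) str_replace(',', '', $input);
}

var_dump(parseMoney('10,000.00')); // float(10000)
var_dump(parseMoney('999.5'));     // float(999.5)
var_dump(parseMoney('10.'));       // NULL: a dot must be followed by digits
var_dump(parseMoney('10000.00'));  // NULL: see the note below
```

As written, the final pattern requires the commas once a number has more than three digits, so "10000.00" is rejected; accepting both spellings would need an extra alternative in the pattern.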

One more example:

Building a regular expression to check e-mail addresses

A full e-mail address has three parts:

1. The user name (everything to the left of '@')
2. '@'
3. The server name (the rest of the address)

The user name can contain uppercase and lowercase letters, Arabic numerals, periods ('.'), minus signs ('-'), and underscores ('_'). The server name follows the same rule, except of course for the underscore.

Now, the user name cannot start or end with a period, and neither can the server name. You also cannot have two consecutive periods; there must be at least one character between them. So let's see how to write a matching pattern for the user name:

^[_a-zA-Z0-9-]+$

This does not yet allow periods. Let's add them:

^[_a-zA-Z0-9-]+(\.[_a-zA-Z0-9-]+)*$

This means: start with at least one valid character (other than '.'), followed by zero or more groups that each begin with a period.

To simplify, we can use eregi() instead of ereg(). Because eregi() is case-insensitive, we do not need to specify both ranges "a-z" and "A-Z"; one is enough:

^[_a-z0-9-]+(\.[_a-z0-9-]+)*$

The server name works the same way, but without the underscore:

^[a-z0-9-]+(\.[a-z0-9-]+)*$

Good. Now just join the two parts with "@":

^[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*$

This is the complete e-mail validation pattern. You only need to call:

eregi("^[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*$", $email)

and you have your e-mail check.
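On PHP versions without eregi(), the same pattern can be tried with preg_match() and the "i" modifier for case-insensitive matching. This is a sketch only; the function name looksLikeEmail() is hypothetical.

```php
<?php
// Hypothetical helper: the user-name and server-name patterns joined by "@",
// with the PCRE "i" modifier standing in for eregi()'s case-insensitivity.
function looksLikeEmail(string $email): bool
{
    $pattern = '/^[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*$/i';
    return preg_match($pattern, $email) === 1;
}

var_dump(looksLikeEmail('John.Doe@Example.com'));  // true
var_dump(looksLikeEmail('.john@example.com'));     // false: starts with a period
var_dump(looksLikeEmail('john..doe@example.com')); // false: two periods in a row
var_dump(looksLikeEmail('john_doe@my_host.com'));  // false: underscore in server name
var_dump(looksLikeEmail('john@example'));          // true: the pattern does not require a dot
```

The last case shows that this pattern accepts a server name without any period, which may or may not be what an application wants.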

Other uses of regular expressions

Extracting a string

ereg() and eregi() have a feature that lets the user extract part of a string with a regular expression (see the manual for the details). For example, to extract the file name from a path or URL, the following code is what you need:

ereg("([^\\/]*)$", $pathOrUrl, $regs);
echo $regs[1];
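A comparable extraction with preg_match() might look like the sketch below. The function name fileNameOf() is hypothetical, and "#" is used as the delimiter so that "/" needs no escaping inside the pattern.

```php
<?php
// Hypothetical helper: returns everything after the last "/" or "\".
function fileNameOf(string $pathOrUrl): string
{
    // [^\\/]* : a run of characters that are neither a backslash nor a slash,
    // anchored to the end of the string by "$".
    preg_match('#([^\\\\/]*)$#', $pathOrUrl, $regs);
    return $regs[1];
}

echo fileNameOf('http://example.com/docs/index.html'), "\n"; // index.html
echo fileNameOf('C:\dir\file.txt'), "\n";                    // file.txt
```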

Advanced replacement

ereg_replace() and eregi_replace() are also very useful. Suppose we want to replace all the separator characters with commas:

ereg_replace("[\n\r\t]+", ",", trim($str));
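With the PCRE functions, the same replacement can be sketched using preg_replace(). The sample string and variable names are made up for the example.

```php
<?php
// Made-up input with line breaks, a carriage return, and a tab as separators.
$str = "one\ntwo\r\nthree\tfour\n";

// trim() drops the trailing line break; each remaining run of \n, \r, or \t
// is replaced by a single comma.
$csv = preg_replace('/[\n\r\t]+/', ',', trim($str));

echo $csv; // one,two,three,four
```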

Finally, here is another regular expression for checking e-mail addresses, left for readers of this article to analyze:

"^[-!#$%&\'*+\\./0-9=?A-Z^_`a-z{|}~]+'.'@'.'[-!#$%&\'*+\\/0-9=?A-Z^_`a-z{|}~]+\.'.'[-!#$%&\'*+\\./0-9=?A-Z^_`a-z{|}~]+$"

If you can read it easily, then this article has achieved its purpose.


This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
