Regular expressions provide a powerful way to work with text. With regular expressions, you can perform complex validation of user input, parse user input and file content, and reformat strings. PHP gives you a simple way to use both POSIX and PCRE regular expressions. This tutorial discusses the differences between POSIX and PCRE, and describes how to use regular expressions with PHP V5.
Before you begin
Find out what to expect from this tutorial and how to get the most out of it.
About this tutorial
Regular expressions provide a powerful way to work with text. With regular expressions, you can perform complex validation of user input, parse user input and text content, and reformat strings.
Goal
This tutorial focuses on simple ways to use POSIX and PCRE regular expressions, giving you a good command of regular expressions in PHP. We explore the differences between POSIX and PCRE, and show you how to use regular expressions with PHP V5. By working through this tutorial, you will learn how, when, and why to use regular expressions.
System Requirements
You can complete this tutorial on any Microsoft Windows or UNIX-like system that has PHP installed, including Mac OS X and Linux. Because everything covered here is built into PHP, you need only install PHP on your system; no additional software is required.
Getting started
What is a regular expression?
A few years ago, I had to implement an interesting validation on a field in a Web form. Users entered a phone number in the field, and the number was then printed in the user's ad exactly as typed. According to the requirements, a U.S. phone number could be entered in a few ways, such as (555) 555-5555 or 555-555-5555, but a number like 555-5555 was not acceptable.
You might wonder why not strip out all non-numeric characters and simply verify that 10 digits remain. That approach works, but it does not stop users from entering something like !555?333-3333.
From a Web developer's point of view, this was an interesting challenge. I could have written routines to check for each format, but I wanted a solution flexible enough to cope if another format, such as 555.555.5555, were approved later.
This is where regular expressions (regex for short) come in. I had cut and pasted them into applications before, but had never really worked through their hard-to-read syntax. A regex is much like a mathematical expression. When you see an expression such as 2x2=4, you think "2 times 2 equals 4." Regular expressions are similar. By the end of this tutorial, when you see a regex such as ^b$, you will say to yourself, "The beginning of a line, followed by b, followed by the end of the line." You will also see how easy it is to use regular expressions in PHP.
When to use regex
Use a regex for search and replace operations when there are rules to follow but you do not know the exact characters to find or replace. In the phone number example above, the rules define the format of the phone number but not the digits it contains. The same holds for many kinds of user input. A U.S. state abbreviation can be restricted to two uppercase letters from A to Z. You can also use a regular expression to restrict text or form input to letters of the alphabet, regardless of case or length.
When not to use regex
Regular expressions are powerful, but they have drawbacks. One is the skill required to read and write them. If you decide to include a regular expression in your application, comment it thoroughly. That way, if someone else needs to change the expression, they can do so without breaking functionality. Also, if you are not familiar with regular expressions, you may find them difficult to debug.
To avoid these problems, do not use a regular expression when a simpler built-in function solves the problem well.
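As a rough illustration of that advice, the sketch below handles three common tasks with plain string functions and no regex at all. The sample strings and variable names are invented for this example, and `ctype_digit()` assumes the ctype extension is enabled, which is the default in standard PHP builds.

```php
<?php
// Illustrative sketch: tasks that need no regular expression.
// Sample data and variable names are invented for this example.
$haystack = 'The quick red fox';

// Literal search: strpos() returns false when the text is absent.
$found = strpos($haystack, 'red') !== false;

// Literal replacement: str_replace() swaps exact text.
$swapped = str_replace('red', 'brown', $haystack);

// Digits-only check (assumes the ctype extension is available).
$allDigits = ctype_digit('5555555555');

var_dump($found, $swapped, $allDigits);
```

Once the rule becomes a format rather than a literal string, as with the phone number, a regex is usually the better tool.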
POSIX and PCRE
PHP supports two regular expression implementations: Portable Operating System Interface (POSIX) and Perl-Compatible Regular Expressions (PCRE). The two implementations offer different features, but both are equally simple to use in PHP. Which style you use depends on your past experience with regex and on the features you need. There is some evidence that PCRE expressions are slightly faster than POSIX expressions, but in most applications the difference is not noticeable. Note that the POSIX functions (the ereg family) were deprecated in PHP V5.3 and removed in PHP V7.0, so the POSIX examples in this tutorial require PHP V5.
In the examples in this tutorial, the syntax of each regex function is included in a comment. In the function syntax, regex is the regular expression parameter and string is the string being searched. Parameters in brackets are optional; because this tutorial focuses on the basics, it does not cover every optional parameter.
Regular expression syntax
Although the POSIX and PCRE implementations differ in their support for certain features and character classes, their basic syntax is the same. Every regular expression is made up of one or more characters, special characters (sometimes called metacharacters), character classes, and groups of characters.
POSIX and PCRE use the same wildcard character, which in a regex means "anything here." The wildcard is a period or dot (.). To match a literal period, escape it with a backslash (\): \. The same goes for the other special characters discussed later, such as line anchors and quantifiers. If a character has a special meaning in a regular expression, it must be escaped to be matched literally.
A line anchor is a special metacharacter that matches the beginning or end of a line but does not capture any text (see Table 1). For example, if a line begins with the letter a, the line anchor in the expression ^a does not capture the letter a; it matches the beginning of the line.
Table 1. Line anchors
Anchor | Description
^ | Matches the beginning of a line
$ | Matches the end of a line
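To make the anchors concrete, here is a minimal sketch using `preg_match()`, the PCRE function covered later in this tutorial. The sample strings and variable names are invented for illustration; `preg_match()` returns 1 when the pattern matches and 0 when it does not.

```php
<?php
// Illustrative sketch of ^ and $; sample strings are invented.
$startsWithA = preg_match('/^a/', 'apple');   // "a" at the start of the line
$aNotAtStart = preg_match('/^a/', 'banana');  // contains "a", but not at the start
$unanchored  = preg_match('/a/', 'banana');   // no anchor: "a" anywhere is enough
$exactlyB    = preg_match('/^b$/', 'b');      // start, "b", end
$tooLong     = preg_match('/^b$/', 'bb');     // extra character before the end

var_dump($startsWithA, $aNotAtStart, $unanchored, $exactlyB, $tooLong);
```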
A quantifier applies to the expression immediately before it (see Table 2). With quantifiers, you can specify how many times an expression may be found in a single search. For example, the expression a+ finds the letter a one or more times.
Table 2. Quantifiers
Quantifier | Description
? | The preceding expression can be found 0 or 1 times
+ | The preceding expression can be found 1 or more times
* | The preceding expression can be found any number of times (including 0)
{n} | The preceding expression must be found exactly n times
{n,m} | The preceding expression can be found between n and m times
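The repetition operators in Table 2 can be tried out with a short sketch like the one below. It uses `preg_match()` (PCRE, covered later in this tutorial); the patterns, sample strings, and variable names are invented for illustration, and each pattern is anchored so the whole string must consist of the repeated letter.

```php
<?php
// Illustrative sketch: each repetition operator applied to the letter "a".
// The patterns and sample strings are invented for this example.
$patterns = array(
    'a?'     => '/^a?$/',      // zero or one "a"
    'a+'     => '/^a+$/',      // one or more "a"
    'a*'     => '/^a*$/',      // zero or more "a"
    'a{3}'   => '/^a{3}$/',    // exactly three "a"
    'a{2,4}' => '/^a{2,4}$/',  // between two and four "a"
);
$samples = array('', 'a', 'aa', 'aaa', 'aaaaa');

$results = array();
foreach ($patterns as $label => $regex) {
    foreach ($samples as $sample) {
        // preg_match() returns 1 on a match and 0 otherwise.
        $results[$label][$sample] = preg_match($regex, $sample);
    }
}

// Print, for each operator, the sample strings it accepted.
foreach ($results as $label => $row) {
    $matched = array_keys(array_filter($row));
    printf("%-7s matches: %s\n", $label, json_encode($matched));
}
```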
In a regex, capturing text and referring to it in search and replace operations is a very useful feature (see Table 3). With captures, you can search for duplicate words and for closing HTML and XML tags. If you use captures when replacing, you can place the text that was found inside the replacement string. An example that replaces an e-mail address with a hyperlink is shown later.
Table 3. Grouping and capturing
Character | Description
() | Groups characters and captures the matched text
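The duplicate-word search mentioned above might look something like the following sketch. It uses PCRE functions that are introduced later in this tutorial; the sample sentence and variable names are invented for illustration.

```php
<?php
// Illustrative sketch: (\w+) captures a word, and \1 inside the pattern
// requires the same text to appear again. The sample text is invented.
$text = 'This is is a test of the the capture feature.';

// Find a word followed by white space and the same word again.
preg_match_all('/\b(\w+)\s+\1\b/', $text, $matches);

// $matches[0] holds the full matches, $matches[1] the captured word.
print_r($matches[1]);

// In a replacement string, $1 inserts the captured text,
// so each doubled word collapses to a single copy.
$fixed = preg_replace('/\b(\w+)\s+\1\b/', '$1', $text);
print($fixed . "\n");
```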
POSIX character classes
POSIX regular expressions follow a standard that makes them usable in many regex implementations (see Table 4). For example, if you write a POSIX regular expression, you can use it in PHP, with the grep command, and in many editors that support regular expressions.
Table 4. POSIX character classes
Character class | Description
[:alpha:] | Matches any letter
[:digit:] | Matches any digit
[:space:] | Matches any white space
POSIX matching
Two functions use POSIX regular expressions to search strings: ereg() and eregi().
ereg()
The ereg() function searches a string for a given regular expression. If no match is found, FALSE is returned, so you can test the result as follows:
Listing 1. ereg() function
    <?php
    $phonenbr = "555-555-5555";
    // Syntax is ereg(regex, string [, out_captures_array])
    if (ereg("[-[:digit:]]{12}", $phonenbr)) {
        print("Found match!\n");
    } else {
        print("No match found!\n");
    }
    ?>
The regular expression [-[:digit:]]{12} finds 12 characters that are digits or hyphens. That is rather loose for phone numbers, so you could rewrite it as ^[0-9]{3}-[0-9]{3}-[0-9]{4}$. (In a regex, [0-9] and [[:digit:]] are effectively the same; you may prefer [0-9] because it is shorter.) The alternative is clearly more precise. It looks for the beginning of the line (^), followed by a group of three digits ([0-9]{3}), a hyphen (-), another group of three digits, another hyphen, a group of four digits, and then the end of the line ($). Spelling out an expression in words like this gives you an idea of how complex a regular expression must be to handle the problem, and helps you predict the kind of data the expression will find or replace.
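For readers on a PHP version where the ereg functions are unavailable, the stricter expression can be tried with `preg_match()` instead. The sketch below is one way to do that; the helper name `is_us_phone()` and the test values are made up for this example and are not part of the original listing.

```php
<?php
// Hypothetical helper, not from the original article: validates the
// 555-555-5555 format with PCRE instead of ereg().
function is_us_phone($input)
{
    // ^ and $ anchor the pattern so extra characters are rejected.
    return preg_match('/^[0-9]{3}-[0-9]{3}-[0-9]{4}$/', $input) === 1;
}

foreach (array('555-555-5555', '555-5555', '!555?333-3333') as $candidate) {
    printf("%-15s %s\n", $candidate, is_us_phone($candidate) ? 'valid' : 'invalid');
}
```

In PCRE, $ also matches just before a final newline at the end of the string; adding the D modifier after the closing delimiter tightens this if it matters for your input.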
eregi()
The eregi() function is similar to ereg(), except that it is not case-sensitive. Like ereg(), it returns an integer when a match is found, but you will most likely use it in a conditional statement, as follows:
Listing 2. eregi() function
    <?php
    $str = "Hello world!";
    // Syntax is eregi(regex, string [, out_captures_array])
    if (eregi("hello", $str)) {
        print("Found match!\n");
    } else {
        print("No match found!\n");
    }
    ?>
When this example is executed, it prints Found match! because hello is found by a search that ignores case. If you used ereg() instead, the search would fail.
POSIX replacement
The ereg_replace() and eregi_replace() functions replace text using POSIX regular expressions.
ereg_replace()
You can use the ereg_replace() function to make case-sensitive replacements with POSIX regular expression syntax. The following example shows how to replace an e-mail address within a string with a hyperlink:
Listing 3. ereg_replace() function
    <?php
    $origstr = "My e-mail address is: first.last@example.com";
    // Syntax is: ereg_replace(regex, replacestr, string)
    $newstr = ereg_replace("([.[:alpha:][:digit:]]+@[.[:alpha:][:digit:]]+)",
        "<a href=\"mailto:\\1\">\\1</a>", $origstr);
    print("$newstr\n");
    ?>
This is an incomplete version of a regular expression for matching e-mail addresses, but it shows how powerful ereg_replace() is compared with ordinary replacement functions such as str_replace(). With regular expressions, you define rules for the search rather than searching for literal characters.
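The same replacement can be sketched with `preg_replace()`, which is covered later in this tutorial. This version is an adaptation rather than the original listing: it swaps the POSIX classes for explicit ranges (a-z, A-Z, 0-9), adds the PCRE delimiters, and keeps the expression just as incomplete for real-world e-mail addresses.

```php
<?php
// Sketch of the e-mail hyperlink replacement using preg_replace().
// Assumption: explicit ranges stand in for [:alpha:] and [:digit:].
$origstr = 'My e-mail address is: first.last@example.com';

$newstr = preg_replace(
    '/([.a-zA-Z0-9]+@[.a-zA-Z0-9]+)/',   // capture the whole address
    '<a href="mailto:$1">$1</a>',        // $1 inserts the captured address
    $origstr
);

print($newstr . "\n");
```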
eregi_replace()
Apart from ignoring case, the eregi_replace() function is exactly the same as ereg_replace():
Listing 4. eregi_replace() function
    <?php
    $origstr = "1 BANANA, 2 BANANA, 3 BANANA";
    // Syntax is: eregi_replace(regex, replacestr, string)
    $newstr = eregi_replace("banana", "pear", $origstr);
    print("New string is: '$newstr'\n");
    ?>
This example replaces banana with pear, and the replace operation ignores case.
PCRE character class
Because the PCRE syntax supports shorter character classes and more features, it is more powerful than POSIX syntax. Table 5 lists some of the character classes that are supported in PCRE and not in POSIX expressions.
Table 5. PCRE character classes
Character class | Description
\b | Word boundary; finds the beginning and end of a word
\d | Matches any digit
\s | Matches any white space, such as a tab or space
\t | Matches a tab character
\w | Matches any word character (a letter, digit, or underscore)
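A brief sketch of the Table 5 classes in use follows. In PHP source, each class is written with a backslash (for example, \d). The sample string and variable names are invented for this example, and the functions used are PCRE functions described in the sections that follow.

```php
<?php
// Illustrative sketch; the sample string is invented for this example.
// Double quotes are used here so that \t becomes a real tab character.
$sample = "Order 66\tships in 3 days";

$digits = preg_match_all('/\d/', $sample);            // number of single digits
$words  = preg_split('/\s+/', $sample);               // split on runs of white space
$hasTab = preg_match('/\t/', $sample) === 1;          // does the string contain a tab?
preg_match_all('/\b\w+\b/', $sample, $wordMatches);   // whole words between boundaries

var_dump($digits, $hasTab);
print_r($words);
print_r($wordMatches[0]);
```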
PCRE Matching
The PCRE matching functions in PHP are similar to the POSIX matching functions, but if you are used to POSIX expressions, one feature of the PCRE functions may feel awkward: they require the expression to begin and end with a delimiter. In most examples, the delimiter is a forward slash (/), which you see at the beginning and end of the expression inside the quotation marks. Keep in mind that the delimiter is not part of the expression.
After the closing delimiter, you can add a modifier that changes the behavior of the expression. For example, the i modifier makes the regular expression case-insensitive. This is an important difference from the POSIX approach, where you call a different function depending on whether you need case sensitivity.
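The sketch below illustrates the delimiter and the i modifier with `preg_match()`. The strings and variable names are invented for this example. It also shows that a delimiter other than the slash can be chosen, which helps when the pattern itself contains slashes.

```php
<?php
// Illustrative sketch: the slashes are delimiters, not part of the pattern.
$str = 'Hello world!';

$caseSensitive   = preg_match('/hello/', $str);   // no modifier: case must match
$caseInsensitive = preg_match('/hello/i', $str);  // i modifier after the closing delimiter

// Using # as the delimiter avoids escaping the slashes in the pattern.
$hasPath = preg_match('#^/usr/local/#', '/usr/local/bin/php');

var_dump($caseSensitive, $caseInsensitive, $hasPath);
```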
preg_grep()
The preg_grep() function returns an array containing all the elements of another array that match a regular expression. This function is useful if you have a large set of values and want to search it for matches. Here is an example:
Listing 5. preg_grep() function
    <?php
    $array = array("1", "3", "ABC", "XYZ", "n");
    // Syntax is preg_grep(regex, inputarray);
    $grep_array = preg_grep("/^\d+$/", $array);
    print_r($grep_array);
    ?>
In this example, the regular expression ^\d+$ finds all the array elements that consist of one or more digits (\d+) between the beginning (^) and the end ($) of the line.
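Two details of `preg_grep()` are easy to miss: the returned array keeps the keys of the input array, and the optional `PREG_GREP_INVERT` flag returns the elements that do not match. The following sketch shows both, using an input list invented for this example.

```php
<?php
// Illustrative sketch; the input values are invented for this example.
$array = array('1', '3', 'ABC', 'XYZ', '42');

// Matching elements keep their original keys (0, 1, and 4 here).
$numeric = preg_grep('/^\d+$/', $array);

// PREG_GREP_INVERT returns the elements that do NOT match.
$nonNumeric = preg_grep('/^\d+$/', $array, PREG_GREP_INVERT);

// array_values() renumbers the result from 0 if gaps in the keys matter.
$reindexed = array_values($numeric);

print_r($numeric);
print_r($nonNumeric);
print_r($reindexed);
```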
preg_match()
The preg_match() function uses PCRE to find a match in a string. It requires two parameters: the regex and the string. You can optionally provide an array to be populated with the matches, a flag that modifies the behavior of the match operation, and the position in the string at which to start matching (offset). Here is an example:
Listing 6. preg_match() function
    <?php
    $string = "ABCDEFGH";
    $regex = "/^[a-z]+$/i";
    // Syntax is preg_match(regex, string [, out_matches [, flags [, offset]]]);
    if (preg_match($regex, $string)) {
        printf("Pattern '%s' found in string '%s'\n", $regex, $string);
    } else {
        printf("No match found in string '%s'!\n", $string);
    }
    ?>
In this example, the regular expression ^[a-z]+$ searches for one or more occurrences ([a-z]+) of any letter from a to z between the beginning (^) and the end ($) of the line. The i modifier makes the search ignore case.
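The optional parameters of `preg_match()` can be explored with a sketch such as the following. The sample string and variable names are invented for this example; `PREG_OFFSET_CAPTURE` is a standard PHP flag that makes each match entry an array of the matched text and its byte offset.

```php
<?php
// Illustrative sketch of the optional preg_match() parameters.
$string = 'abc123def456';

// out_matches: $matches[0] receives the text of the first match.
preg_match('/\d+/', $string, $matches);

// flags: with PREG_OFFSET_CAPTURE each entry is array(text, byte offset).
preg_match('/\d+/', $string, $withOffset, PREG_OFFSET_CAPTURE);

// offset: start searching at byte 6, which skips the first run of digits.
preg_match('/\d+/', $string, $later, 0, 6);

print_r($matches);
print_r($withOffset);
print_r($later);
```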
preg_match_all()
The preg_match_all() function builds an array of all the matches found in a string. The following example builds an array containing all the words in a sentence:
Listing 7. preg_match_all() function
    <?php
    $string = "The quick red fox jumped over the lazy brown dog";
    $re = "/\b\w+\b/";
    // Syntax is preg_match_all(regex, string, return_array [, flags [, offset]])
    preg_match_all($re, $string, $arrayout);
    print_r($arrayout);
    ?>
The regular expression \b\w+\b finds word characters occurring one or more times (\w+) between word boundaries (\b). Each word is placed in an element of the output array, $arrayout.
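It may help to note that the output array is nested: index 0 holds the list of full matches, and the function itself returns the number of matches. The sketch below, with variable names chosen for this example, pulls the word list out of that structure.

```php
<?php
// Illustrative sketch of the structure preg_match_all() fills in.
$string = 'The quick red fox jumped over the lazy brown dog';

// The return value is the number of full matches found.
$count = preg_match_all('/\b\w+\b/', $string, $arrayout);

// With the default flags, $arrayout[0] is the list of full matches.
$words = $arrayout[0];

printf("%d words found; the fourth is '%s'\n", $count, $words[3]);
```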
PCRE replacement
PCRE replacement in PHP is similar to POSIX replacement, except that you use preg_replace() instead of ereg_replace() and eregi_replace().
preg_replace()
The preg_replace() function performs replacements using PCRE. It requires three parameters: the regular expression, the replacement expression, and the original string. You can also optionally provide the maximum number of replacements to make and a variable that is populated with the number of replacements performed. Here is an example:
Listing 8. preg_replace() function
    <?php
    $orig_string = "5555555555";
    printf("Original string is '%s'\n", $orig_string);
    $re = "/^(\d{3})(\d{3})(\d{4})$/";
    // Syntax is preg_replace(regex, replacement, string [, limit [, out_count]]);
    $new_string = preg_replace($re, "(\\1) \\2-\\3", $orig_string);
    printf("New string is '%s'\n", $new_string);
    ?>
This example gives a quick demonstration of capturing text and using back references such as \\1. A back reference inserts whatever text matched inside the corresponding parentheses; in this case, \\1 refers to the first group, (\d{3}).
In this example, you could use substr to break the phone number apart, but if the input varied even slightly, it would be harder to rely on substr to capture the correct text.
If the string is in the form (555)5555555, you can change the expression to ^\(?(\d{3})\)?(\d{3})(\d{4})$ to allow for the optional parentheses.
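A sketch of that modified expression in use is shown below. The helper name `format_us_phone()` and its choice to return null for unrecognized input are assumptions made for this example; the `limit` and `out_count` arguments are the optional preg_replace() parameters described above.

```php
<?php
// Hypothetical helper, not from the original article: reformats a
// 10-digit phone number, with or without parentheses around the area code.
function format_us_phone($input)
{
    // \(? and \)? make the literal parentheses optional.
    $re = '/^\(?(\d{3})\)?(\d{3})(\d{4})$/';

    // A limit of -1 means "no limit"; $count receives the number of replacements.
    $result = preg_replace($re, '($1) $2-$3', $input, -1, $count);

    // A count of 0 means the input did not fit the pattern.
    return $count === 1 ? $result : null;
}

foreach (array('5555555555', '(555)5555555', '555-5555') as $input) {
    var_dump(format_us_phone($input));
}
```

Because each parenthesis is optional on its own, this expression also accepts unbalanced input such as (5555555555; a stricter pattern would be needed to rule that out.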
Conclusion
PHP provides two types of syntax for regular expressions: POSIX and PCRE. This tutorial provides a high-level overview of the main functions in PHP that support POSIX and PCRE regular expressions.
With regular expressions, you can define rules for powerful search and replace operations that go far beyond searching for and replacing literal text.
Resources
Learn
The original English version of this tutorial is available on the developerWorks worldwide site.
Regular-Expressions.info provides information about regular expressions.
PHP: Regular Expression Functions (Perl-Compatible) - Manual is the online PHP documentation for PCRE.
Regular Expression Functions (POSIX Extended) is the online PHP documentation for POSIX.
Visit the PHP project resources on developerWorks to learn more about PHP.
For developerWorks tutorials on programming with PHP, see "Learning PHP, Part 1," Part 2, and Part 3.
Access to products and technologies
Download the latest version of PHP from php.net.
The Regular Expression Library has a large repository of regular expressions.
About the author
Nathan A. Good is an author, software engineer, and system administrator in the Twin Cities area of Minnesota. His books include PHP 5 Recipes: A Problem-Solution Approach (Apress, 2005), co-authored with Lee Babin and others; Regular Expression Recipes for Windows Developers: A Problem-Solution Approach (Apress, 2005); Regular Expressions: A Problem-Solution Approach (Apress, 2005); and Professional Red Hat Enterprise Linux 3 (Wrox, 2004), co-authored with Kapil Sharma and others.