POSIX Regular Expressions for PHP: Learning Notes

Source: Internet
Author: User

1 Basic knowledge

A regular expression is a way of describing a pattern in a piece of text. The exact (literal) matches used so far are themselves a form of regular expression. For example, the earlier searches for terms such as "shop" and "delivery" were regular expression searches.

In PHP, matching a regular expression is more like a strstr() match than an equality comparison, because you are matching a string somewhere within another string (anywhere within it, unless you specify otherwise). For example, the string "shop" matches the regular expression "shop". It also matches the regular expressions "h", "ho", and so on.
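The substring behaviour described above can be sketched as follows. This is a minimal illustration, not the original article's code: the ereg() functions that implemented POSIX patterns were removed in PHP 7, so the sketch assumes preg_match() with / delimiters, and the helper name contains_pattern is hypothetical.

```php
<?php
// Hypothetical helper: true if $pattern matches anywhere inside $subject.
function contains_pattern(string $pattern, string $subject): bool
{
    // preg_match() returns 1 on a match, 0 on no match, false on error.
    return preg_match('/' . $pattern . '/', $subject) === 1;
}

var_dump(contains_pattern('shop', 'my local shop')); // bool(true): found inside the string
var_dump(contains_pattern('ho', 'shop'));            // bool(true): a shorter pattern also matches
var_dump(contains_pattern('shop', 'delivery'));      // bool(false): no occurrence
```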

In addition to matching characters exactly, you can use special characters to give an expression a meta-meaning. For example, with special characters you can specify that a pattern must occur at the beginning or end of a string, that part of a pattern may be repeated, or that a character in a pattern must be of a particular type. You can also match literal occurrences of the special characters themselves. The following sections cover each of these variations in turn.

2 Character sets and classes

Character sets immediately make regular expressions more powerful than exact matching. A character set can be used to match any character of a particular type; in effect, it is a kind of wildcard.

First, you can use the period (.) as a wildcard that stands for any single character except a newline (\n). For example, the regular expression:

.at

matches "cat", "sat", and "mat", among others. This kind of wildcard matching is often used for file name matching in operating systems.

With regular expressions, however, you can be more specific about the type of character to match by stating the set to which the character must belong. In the previous example, the regular expression matches "cat" and "mat", but it also matches "#at". To restrict the first character to a letter from a to z, specify it as follows:

[a-z]at

Anything enclosed in square brackets ([]) is a character class: a set of characters to which a matched character must belong. Note that the expression in square brackets matches only a single character.
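To see the difference between the wildcard and a character class, here is a small sketch. It assumes preg_grep() and / delimiters in place of the removed ereg() functions; the word list is made up for illustration.

```php
<?php
// Made-up sample words, including one that starts with a non-letter.
$words = ['cat', 'sat', 'mat', '#at'];

// The dot accepts any single character before "at".
$anyChar = array_values(preg_grep('/.at/', $words));
// The class [a-z] accepts only a lowercase letter before "at".
$lowerOnly = array_values(preg_grep('/[a-z]at/', $words));

print_r($anyChar);   // cat, sat, mat, #at
print_r($lowerOnly); // cat, sat, mat
```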

You can list the members of a set explicitly. For example:

[aeiou]

means any vowel.

You can also describe a range using a hyphen, as in the earlier example, or a set of ranges:

[a-zA-Z]

This set of ranges stands for any alphabetic character, uppercase or lowercase.

You can also use a set to specify that a character must not be a member of it. For example:

[^a-z]

matches any character that is not between a and z. The caret (^) means "not" when it is placed inside the square brackets. It has a different meaning when used outside square brackets, which is covered in more detail later.
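The three kinds of class above (explicit set, set of ranges, negated set) can be tried out with a short sketch. It assumes preg_match() with / delimiters rather than the removed ereg(); the sample strings are illustrative only.

```php
<?php
// Each call checks whether the subject contains at least one character from the class.
$vowelInRhythm = preg_match('/[aeiou]/', 'rhythm'); // 0: none of a, e, i, o, u occurs
$letterInMixed = preg_match('/[a-zA-Z]/', '1234x'); // 1: "x" is a letter
$nonLowerPlain = preg_match('/[^a-z]/', 'shop');    // 0: every character is in a-z
$nonLowerBang  = preg_match('/[^a-z]/', 'shop!');   // 1: "!" is outside a-z

var_dump($vowelInRhythm, $letterInMixed, $nonLowerPlain, $nonLowerBang);
```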

3 Repetition

You will often want to specify that a particular string or character class may occur more than once. Two special characters in a regular expression do this. The symbol "*" means that the pattern can be repeated zero or more times, and the symbol "+" means that the pattern can be repeated one or more times. Each symbol must be placed directly after the part of the expression it applies to.

For example:

[[:alnum:]]+

means "at least one alphanumeric character."
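The difference between "+" and "*" shows up when the subject contains no matching character at all. A minimal sketch, assuming preg_match() with / delimiters (PCRE also understands the [[:alnum:]] class) in place of the removed ereg():

```php
<?php
// + needs at least one alphanumeric character and takes as many as it can.
$plusFound = preg_match('/[[:alnum:]]+/', '--- abc123 ---', $plusMatch);
// With no alphanumeric character in the subject, + cannot match at all.
$plusMissing = preg_match('/[[:alnum:]]+/', '---');
// * is also satisfied by zero characters, so it matches the empty string.
$starFound = preg_match('/[[:alnum:]]*/', '---', $starMatch);

var_dump($plusMatch[0]); // string(6) "abc123"
var_dump($plusMissing);  // int(0)
var_dump($starMatch[0]); // string(0) ""
```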

4 Sub-expressions

It is often useful to split an expression into sub-expressions so that you can represent, for example, "at least one of these strings followed by exactly one of those." You can do this with parentheses, just as you would in an arithmetic expression.

For example:

(very )*large

matches "large", "very large", "very very large", and so on.
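A short sketch of the grouped repetition, assuming preg_match() with / delimiters instead of the removed ereg(). The third argument receives the matched text, which makes it easy to see how much of the subject the group consumed.

```php
<?php
$pattern = '/(very )*large/';

// Zero repetitions of the group: only "large" is matched.
preg_match($pattern, 'a large shop', $none);
// Two repetitions of the group precede "large".
preg_match($pattern, 'a very very large shop', $twice);

echo $none[0], "\n";  // large
echo $twice[0], "\n"; // very very large
```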

5 Counted sub-expressions

You can use a numeric expression in curly braces ({}) to specify how many times something may be repeated. You can give an exact number of repetitions ({3} means exactly 3 repetitions), a range of repetitions ({2,4} means from 2 to 4 repetitions), or an open-ended range ({2,} means at least 2 repetitions).

For example:

(very ){1,3}

matches "very ", "very very ", and "very very very ".
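The counted form can be checked with a sketch like the following, again assuming preg_match() with / delimiters in place of the removed ereg(). Note that the upper bound caps how much is matched even when the subject contains more repetitions.

```php
<?php
$pattern = '/(very ){1,3}/';

$short = preg_match($pattern, 'very large', $one);
$long  = preg_match($pattern, 'very very very very large', $capped);
$none  = preg_match($pattern, 'large');

var_dump($one[0]);    // string(5) "very "
var_dump($capped[0]); // string(15) "very very very " (stops after three repetitions)
var_dump($none);      // int(0): at least one repetition is required
```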

6 Anchoring to the beginning or end of a string

The pattern [a-z] matches any string that contains a lowercase alphabetic character. It does not matter whether the string is one character long or contains just one matching character somewhere in a longer string.

You can also specify whether a particular sub-expression should appear at the start, at the end, or both. This is useful when you want to make sure that only the search term, and nothing else, appears in the string.

The caret (^) is used at the start of a regular expression to indicate that the pattern must appear at the beginning of the searched string, and the dollar sign ($) is used at the end of a regular expression to indicate that the pattern must appear at the end of the string.

For example, this pattern matches "bob" at the start of a string:

^bob

This pattern matches "com" at the end of a string:

com$

Finally, this pattern matches a string that consists of exactly one character from a to z:

^[a-z]$
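The three anchored patterns can be exercised as follows. This is a sketch that assumes preg_match() with / delimiters instead of the removed ereg(); the subject strings are made up. The patterns are written in single quotes so that PHP leaves the $ alone.

```php
<?php
$bobAtStart = preg_match('/^bob/', 'bob smith');   // 1: "bob" begins the string
$bobLater   = preg_match('/^bob/', 'smith bob');   // 0: "bob" is not at the start
$comAtEnd   = preg_match('/com$/', 'example.com'); // 1: "com" ends the string
$comEarlier = preg_match('/com$/', 'commerce');    // 0: "com" is not at the end
$oneLetter  = preg_match('/^[a-z]$/', 'q');        // 1: exactly one lowercase letter
$twoLetters = preg_match('/^[a-z]$/', 'qq');       // 0: more than one character

var_dump($bobAtStart, $bobLater, $comAtEnd, $comEarlier, $oneLetter, $twoLetters);
```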

7 Branches

You can use a vertical bar (|) in a regular expression to represent a choice. For example, if you want to match com, edu, or net, you can use this expression:

com|edu|net
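A brief sketch of branching, assuming preg_match() with / delimiters in place of the removed ereg(); the host names are illustrative only.

```php
<?php
$pattern = '/com|edu|net/';

// The third argument receives whichever alternative was found.
$found   = preg_match($pattern, 'www.example.edu', $branch);
$missing = preg_match($pattern, 'www.example.org');

var_dump($branch[0]); // string(3) "edu"
var_dump($missing);   // int(0): none of the alternatives occurs
```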

8 Matching special characters

If you want to match one of the special characters mentioned in this section, such as ., {, or $, you must put a backslash (\) in front of it. To match a backslash itself, write it as two backslashes (\\).

In PHP, regular expression patterns are best enclosed in single-quoted strings. Putting them in double-quoted strings adds unnecessary complications, because PHP itself also uses the backslash to escape special characters, including the backslash.

To match a backslash in a pattern, you need two backslashes to indicate that it is a literal backslash rather than an escape character.

For the same reason, a literal backslash in a double-quoted PHP string must also be written as two backslashes. The somewhat confusing result is that a PHP string representing a regular expression that matches a literal backslash needs four backslashes: the PHP interpreter reduces the four to two, and the regular expression interpreter then reduces the two to one.

The dollar sign is also a special character both in double-quoted PHP strings and in regular expressions. To match a literal $ in a pattern held in a double-quoted string, you need "\\\$": the PHP interpreter reduces it to \$, and the regular expression interpreter reads that as a literal $ character.
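The two layers of escaping can be made visible by inspecting the strings PHP actually builds. This is a minimal sketch; it assumes preg_match() with / delimiters instead of the removed ereg(), and the variable names are hypothetical.

```php
<?php
// Double quotes: PHP reduces the four backslashes to two,
// and the regex engine reads those two as one literal backslash.
$backslashPattern = "\\\\";

// Double quotes need "\\\$"; single quotes can hold \$ as written.
$dollarDouble = "\\\$";
$dollarSingle = '\$';

var_dump(strlen($backslashPattern));       // int(2)
var_dump($dollarDouble === $dollarSingle); // bool(true)

// 'a\\b' is the three-character string a\b.
$backslashFound = preg_match('/' . $backslashPattern . '/', 'a\\b');
$dollarFound    = preg_match('/' . $dollarSingle . '/', 'price: $5');
var_dump($backslashFound, $dollarFound);   // int(1) int(1)
```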

9 Applying regular expressions in the smart form

In the smart form application, there are at least two uses for regular expressions. The first is to detect particular terms in customer feedback, which regular expressions let you do more intelligently. With a string function, matching "shop", "customer service", or "retail" takes three separate searches. With a regular expression, you can match all three at once:

shop|customer service|retail
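As a sketch, the single-search version might look like this. The helper name mentions_retail and the sample feedback strings are hypothetical, and preg_match() with / delimiters is assumed in place of the removed ereg().

```php
<?php
// Hypothetical helper: true if the feedback mentions any of the three terms.
function mentions_retail(string $feedback): bool
{
    return preg_match('/shop|customer service|retail/', $feedback) === 1;
}

var_dump(mentions_retail('The customer service was slow')); // bool(true)
var_dump(mentions_retail('I visited your shop yesterday')); // bool(true)
var_dump(mentions_retail('My delivery arrived late'));      // bool(false)
```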

The second use is to validate customer e-mail addresses in the application, which requires encoding the standard format of an e-mail address as a regular expression. The format consists of some alphanumeric or punctuation characters, followed by an "@" symbol, followed by a string of alphanumeric characters and hyphens, followed by a "." (dot), followed by more alphanumeric characters and hyphens, and possibly more dots, up to the end of the string. It is encoded as follows:

^[a-zA-Z0-9_\-.]+@[a-zA-Z0-9\-]+\.[a-zA-Z0-9\-.]+$

The sub-expression ^[a-zA-Z0-9_\-.]+ means "start the string with at least one letter, number, underscore, hyphen, or dot, or some combination of these." Note that inside a character class the dot loses its special wildcard meaning and matches only a literal dot.

The symbol "@" matches a literal "@".

The sub-expression [a-zA-Z0-9\-]+ matches the first part of the host name, which consists of alphanumeric characters and hyphens. Notice that the hyphen is escaped, because it is a special character inside square brackets.

The character combination "\." matches a literal ".". Because this dot is outside a character class, it must be escaped so that it matches only a dot.

The sub-expression [a-zA-Z0-9\-.]+$ matches the rest of the domain name, which consists of letters, numbers, and hyphens, and can contain more dots if needed, up to the end of the string.

It is easy to see that an invalid e-mail address will sometimes still conform to this regular expression. It is almost impossible to catch every invalid address, but a check like this improves the situation. The expression can be refined in many ways; for example, you can list all valid top-level domains (TLDs). Be careful when making a check more restrictive, though, because a validation function that rejects 1% of valid data is far more troublesome than one that lets through 10% of invalid data.
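The e-mail check can be sketched as a small function. The function name looks_like_email is hypothetical, and preg_match() with / delimiters is assumed in place of the removed ereg(). The pattern is held in a single-quoted string so the backslashes reach the regular expression engine unchanged.

```php
<?php
// Hypothetical helper: true if $address fits the pattern described above.
function looks_like_email(string $address): bool
{
    $pattern = '/^[a-zA-Z0-9_\-.]+@[a-zA-Z0-9\-]+\.[a-zA-Z0-9\-.]+$/';
    return preg_match($pattern, $address) === 1;
}

var_dump(looks_like_email('user.name@example.com')); // bool(true)
var_dump(looks_like_email('user@example'));          // bool(false): no dot after the host name
var_dump(looks_like_email('a@b..'));                 // bool(true), although this is not a valid address
```

The last call shows the kind of invalid address that still conforms to the pattern.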
