PHP PCRE regular expressions: character classes (square brackets) and alternation (|)

Source: Internet
Author: User
Tags: character classes, alphanumeric characters
PHP extensions, text processing -- PCRE regular expression syntax 6 -- Character classes (square brackets) and alternation (|)

Character class (square brackets)

An opening square bracket starts a character class, and a closing square bracket ends it. A closing square bracket on its own has no special meaning. If a closing square bracket needs to be a member of a character class, write it as the first character of the class (or the second, directly after ^, if the class is negated), or escape it with a backslash.
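As a minimal sketch of the rule above, the following uses preg_match() with three ways of placing a literal "]" inside a class. The variable names are illustrative only.

```php
<?php
// A literal "]" is an ordinary class member when it comes first or is escaped.
$first   = '/[]ab]/';    // "]" listed first
$negated = '/[^]ab]/';   // "]" listed directly after the negating ^
$escaped = '/[ab\]]/';   // "]" escaped with a backslash

$r1 = preg_match($first, ']');    // 1: "]" is a member
$r2 = preg_match($negated, ']');  // 0: "]" is excluded by the negated class
$r3 = preg_match($negated, 'x');  // 1: "x" is not "]", "a" or "b"
$r4 = preg_match($escaped, ']');  // 1: escaped "]" is a member

var_dump($r1, $r2, $r3, $r4);
```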

A character class matches a single character in the subject string. That character must be in the set defined by the class, unless the class begins with ^, in which case the character must not be in the set. If ^ itself needs to be a member of a character class, make sure it is not the first character of the class, or escape it.

For example, the character class [aeiou] matches any lowercase vowel, while [^aeiou] matches any character that is not a lowercase vowel. Note that ^ is only a convenient notation for specifying the characters in a class by listing the ones that are not. It is not an assertion: it still consumes one character from the subject string, and it fails if the current match point is at the end of the string.
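The point that a negated class still consumes a character can be checked with a short sketch. The negative lookahead (?!...) in the last line is not covered in this section; it is included here only as a contrast, because a lookahead is an assertion and consumes nothing.

```php
<?php
// A negated class must match (and consume) exactly one character.
$a = preg_match('/q[^aeiou]/', 'qx');    // 1: "x" is not a lowercase vowel
$b = preg_match('/q[^aeiou]/', 'qa');    // 0: "a" is excluded
$c = preg_match('/q[^aeiou]/', 'q');     // 0: nothing left to consume after "q"

// Contrast: a negative lookahead asserts without consuming a character.
$d = preg_match('/q(?![aeiou])/', 'q');  // 1: succeeds at the end of the string

var_dump($a, $b, $c, $d);
```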

When case-insensitive (caseless) matching is set, any letters in a class represent both their uppercase and lowercase versions. For example, a caseless [aeiou] matches both "a" and "A", and a caseless [^aeiou] does not match "A", whereas a case-sensitive version would.
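A small sketch of this behaviour, using the i modifier to request caseless matching:

```php
<?php
// The i modifier makes letters in a class stand for both cases.
$plain    = preg_match('/[aeiou]/', 'A');    // 0: case-sensitive, "A" is not listed
$caseless = preg_match('/[aeiou]/i', 'A');   // 1: "A" matches the listed "a"

// Negation follows the same rule.
$negPlain    = preg_match('/[^aeiou]/', 'A');   // 1: "A" is not a lowercase vowel
$negCaseless = preg_match('/[^aeiou]/i', 'A');  // 0: "A" counts as a vowel here

var_dump($plain, $caseless, $negPlain, $negCaseless);
```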

The newline character is never treated specially in a character class, whatever the setting of the PCRE_DOTALL or PCRE_MULTILINE options. A class such as [^a] always matches a newline.
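The sketch below compares a negated class with the dot metacharacter on a subject that consists of a single newline. It assumes the usual PHP modifier letters, where s corresponds to PCRE_DOTALL and m to PCRE_MULTILINE.

```php
<?php
$newline = "\n";

// A negated class matches a newline regardless of the modifiers in use.
$classDefault   = preg_match('/[^a]/', $newline);    // 1
$classMultiline = preg_match('/[^a]/m', $newline);   // 1

// The dot, by contrast, depends on the s modifier.
$dotDefault = preg_match('/./', $newline);           // 0
$dotAll     = preg_match('/./s', $newline);          // 1

var_dump($classDefault, $classMultiline, $dotDefault, $dotAll);
```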

In a character class, a hyphen (minus sign, -) can be used to specify a range from one character to another. For example, [d-m] matches any letter between d and m, inclusive. If a literal hyphen is needed in a character class, it must be escaped with a backslash or placed where it cannot be interpreted as a range, typically at the start or end of the class.
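A brief sketch of an inclusive range and of the three ways to include a literal hyphen. The subject strings are arbitrary examples.

```php
<?php
// [d-m] is inclusive at both ends: "d" and "m" both match.
$count = preg_match_all('/[d-m]/', 'abcdefmnop', $m);
// $m[0] holds the matched characters: d, e, f, m

// A literal hyphen: escaped, or placed first or last in the class.
$escaped = preg_match('/[a\-z]/', '-');   // 1
$first   = preg_match('/[-az]/', '-');    // 1
$last    = preg_match('/[az-]/', '-');    // 1
$noRange = preg_match('/[a\-z]/', 'm');   // 0: no range, only "a", "-" and "z"

var_dump($count, $m[0], $escaped, $first, $last, $noRange);
```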

A literal closing square bracket cannot be used as the end character of a range. For example, the pattern [W-]46] is interpreted as a class containing the two characters W and -, followed by the literal string "46]", so it matches "W46]" or "-46]". However, if the closing bracket is escaped with a backslash, it is interpreted as the end of the range, so [W-\]46] is interpreted as a single class containing all characters in the range W to ], plus 4 and 6. The octal or hexadecimal representation of ] can also be used as the end of a range.
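The two readings can be compared directly. In this sketch the patterns are anchored with ^ and $ (an addition for illustration) so that each test shows exactly what the whole pattern accepts.

```php
<?php
// Unescaped "]" ends the class: [W-] followed by the literal text "46]".
$lit1 = preg_match('/^[W-]46]$/', 'W46]');   // 1
$lit2 = preg_match('/^[W-]46]$/', '-46]');   // 1

// Escaped "]" ends the range: one class holding W..] plus "4" and "6".
$rangeX   = preg_match('/^[W-\]46]$/', 'X');     // 1: "X" lies between "W" and "]"
$range4   = preg_match('/^[W-\]46]$/', '4');     // 1
$rangeStr = preg_match('/^[W-\]46]$/', 'W46]');  // 0: the class matches one character

// The hexadecimal form of "]" (0x5d) also works as a range end.
$hex = preg_match('/^[W-\x5d]$/', 'Z');          // 1

var_dump($lit1, $lit2, $rangeX, $range4, $rangeStr, $hex);
```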

Ranges operate in ASCII collating sequence. They can also be used for characters specified numerically, for example [\000-\037]. If a range that includes letters is used when case-insensitive matching is set, it matches the letters in either case. For example, [W-c] is equivalent to [][\^_`wxyzabc] matched case-insensitively, and if character tables for the "fr" (French) locale are in use, [\xc8-\xcb] matches accented E characters in both cases.
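The following sketch exercises a numeric range and the caseless [W-c] range. It deliberately leaves out the locale example, since that depends on which locales are installed on the system.

```php
<?php
// Octal escapes as range endpoints: ASCII control characters 0 to 31.
$tab   = preg_match('/[\000-\037]/', "\t");   // 1: tab is character 9
$space = preg_match('/[\000-\037]/', ' ');    // 0: space is character 32

// [W-c] spans W X Y Z [ \ ] ^ _ ` a b c in ASCII order.
$plainX    = preg_match('/^[W-c]$/', 'x');    // 0: "x" sorts after "c"
$caselessX = preg_match('/^[W-c]$/i', 'x');   // 1: matches via "X"
$caselessB = preg_match('/^[W-c]$/i', 'B');   // 1: matches via "b"
$caselessD = preg_match('/^[W-c]$/i', 'd');   // 0: neither "d" nor "D" is in range

var_dump($tab, $space, $plainX, $caselessX, $caselessB, $caselessD);
```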

The character types \d, \D, \s, \S, \w, and \W can also appear inside a character class, where they add the characters they match to the class. For example, [\dABCDEF] matches any hexadecimal digit (with uppercase letters). A circumflex can conveniently be combined with the uppercase character types to define a more restricted set than the matching lowercase type. For example, [^\W_] matches any letter or digit, but not the underscore.
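Both examples from the paragraph above can be tried as follows. The anchors and the + quantifier are added here only so that whole strings are tested.

```php
<?php
// \d inside a class adds the digits to the listed letters.
$hexUpper = preg_match('/^[\dABCDEF]+$/', '1A3F');   // 1
$hexLower = preg_match('/^[\dABCDEF]+$/', 'beef');   // 0: lowercase is not listed

// [^\W_]: "not a non-word character and not underscore", i.e. letters and digits.
$alnum      = preg_match('/^[^\W_]+$/', 'abc123');    // 1
$underscore = preg_match('/^[^\W_]+$/', 'abc_123');   // 0

var_dump($hexUpper, $hexLower, $alnum, $underscore);
```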

All non-alphanumeric characters other than \, -, ^ (at the start), and the terminating ] are non-special in a character class, but escaping them does no harm. The pattern delimiter is always special and must be escaped when it is used inside the expression.
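A short sketch of delimiter handling inside a class. It uses preg_quote() with its second argument to escape the delimiter when a class is built from a string; building a class this way is an illustration, not something the text above prescribes.

```php
<?php
// With "/" as the delimiter, a "/" inside the class must be escaped.
$escaped    = preg_match('/[a\/b]/', '/');   // 1
// Choosing another delimiter avoids the escape.
$otherDelim = preg_match('#[a/b]#', '/');    // 1
// Escaping non-special characters inside a class is harmless.
$harmless   = preg_match('/[\.\+]/', '+');   // 1

// preg_quote() escapes regex metacharacters and the given delimiter.
$quoted = preg_quote('/.+', '/');                   // \/\.\+
$built  = preg_match('/[' . $quoted . ']/', '/');   // 1

var_dump($escaped, $otherDelim, $harmless, $quoted, $built);
```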

Perl supports the POSIX notation for character classes, which uses names enclosed by [: and :] within the enclosing square brackets. PCRE also supports this notation. For example, [01[:alpha:]%] matches "0", "1", any alphabetic character, or "%".
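The example class can be run against a sample string to see which characters it picks out. The subject string is made up for this sketch.

```php
<?php
// [:alpha:] is used inside an ordinary class, next to literal members.
$count = preg_match_all('/[01[:alpha:]%]/', '0 1 2 a Z % #', $m);
// $m[0] holds: 0, 1, a, Z, %   ("2", "#" and the spaces are not members)

var_dump($count, $m[0]);
```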

Alternation (|)

The vertical bar character separates alternative patterns. For example, the pattern gilbert|sullivan matches either "gilbert" or "sullivan". Any number of alternatives may appear, and an empty alternative is permitted (it matches the empty string). The alternatives are tried in turn from left to right, and the first one that succeeds is used. If the alternatives are inside a subpattern (a parenthesized group), "succeeds" means that the rest of the main pattern matches as well as the alternative in the subpattern.
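The left-to-right rule and the meaning of "succeeds" inside a subpattern can be seen in a small sketch. The patterns a|ab and (un|)tidy are invented for illustration.

```php
<?php
$basic = preg_match('/gilbert|sullivan/', 'sullivan');   // 1

// Left to right: "a" is tried first and succeeds, so "ab" is never tried.
preg_match('/a|ab/', 'ab', $first);
// $first[0] is "a"

// Inside a group, the rest of the pattern must match too, so the engine
// backtracks from "a" to "ab" in order to reach the trailing "c".
preg_match('/^(a|ab)c$/', 'abc', $group);
// $group[1] is "ab"

// An empty alternative matches the empty string.
$withPrefix    = preg_match('/^(un|)tidy$/', 'untidy');   // 1
$withoutPrefix = preg_match('/^(un|)tidy$/', 'tidy');     // 1

var_dump($basic, $first[0], $group[1], $withPrefix, $withoutPrefix);
```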
