PHP PCRE Regular Expressions: Character Classes (Square Brackets) and Alternation (|)

Source: Internet
Author: User

Character class (square brackets)

An opening square bracket begins a character class, and a closing square bracket ends it. A closing square bracket on its own has no special meaning. If a closing square bracket is needed as a member of the class, it must be the first data character in the class (the second, following the ^, if the class is negated) or be escaped with a backslash.
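The placement rule for a literal ] can be checked with a short sketch. It uses Python's `re` module as a stand-in for PCRE; the two engines are not identical, but they treat a leading or escaped ] the same way.

```python
import re

first = re.compile(r"[]ab]")     # "]" as the first data character is a literal member
escaped = re.compile(r"[ab\]]")  # an escaped "]" is a literal member anywhere in the class
negated = re.compile(r"[^]ab]")  # right after a leading "^", "]" is again literal

for pattern in (first, escaped, negated):
    # Show which of the sample strings each class accepts as a single character.
    print(pattern.pattern, [s for s in ["]", "a", "c"] if pattern.fullmatch(s)])
```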

A character class matches a single character in the subject string, and that character must belong to the set defined by the class, unless the class begins with ^, in which case the character must not belong to that set. If ^ itself needs to be a member of the class, make sure it is not the first character of the class, or escape it with a backslash.
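A minimal sketch of the two roles of ^, again using Python's `re` as a stand-in for PCRE (the behavior shown here is the same in both).

```python
import re

not_a = re.compile(r"[^a]")           # leading "^": negates the class
caret_or_a = re.compile(r"[a^]")      # "^" not first: an ordinary member
escaped_caret = re.compile(r"[\^a]")  # escaped "^": ordinary member even in first position

print(not_a.fullmatch("a"))                # None: "a" is excluded
print(bool(caret_or_a.fullmatch("^")))     # True
print(bool(escaped_caret.fullmatch("^")))  # True
print(escaped_caret.fullmatch("b"))        # None: this class is not negated
```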

For example, the character class [aeiou] matches any lowercase vowel, while [^aeiou] matches any character that is not a lowercase vowel. Note that ^ is simply a convenient way to specify the characters in the class by listing those that are not in it. It is not an assertion: the class still consumes one character from the subject string, so the match fails if the current position is at the end of the string.
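The point that a negated class still consumes a character can be seen in this sketch, written with Python's `re` as a stand-in for PCRE.

```python
import re

print(re.findall(r"[aeiou]", "regular"))  # ['e', 'u', 'a']

# x[^aeiou] needs exactly one non-vowel character after "x".
pattern = re.compile(r"x[^aeiou]")

print(pattern.search("xy").group())  # 'xy': "y" is consumed by the class
print(pattern.search("xa"))          # None: "a" is a vowel
print(pattern.search("x"))           # None: no character left to consume
```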

When case-insensitive matching is set, any letter in a class stands for both its uppercase and lowercase versions. For example, a case-insensitive [aeiou] matches "A" as well as "a", and a case-insensitive [^aeiou] does not match "A", whereas a case-sensitive [^aeiou] would.
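A small sketch of caseless classes. It uses Python's `re`, where `re.IGNORECASE` plays the role of PCRE's caseless option (the `i` modifier in PHP); Python is only a stand-in here.

```python
import re

vowel = re.compile(r"[aeiou]", re.IGNORECASE)
non_vowel = re.compile(r"[^aeiou]", re.IGNORECASE)

print(bool(vowel.fullmatch("A")))            # True
print(bool(non_vowel.fullmatch("A")))        # False: "A" counts as a vowel when caseless
print(bool(re.fullmatch(r"[^aeiou]", "A")))  # True: case-sensitive version
```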

Newline characters have no special meaning inside a character class, regardless of the PCRE_DOTALL or PCRE_MULTILINE options. A class such as [^a] always matches a newline.
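This sketch contrasts a negated class with the dot. It uses Python's `re` as a stand-in for PCRE, with `re.DOTALL` and `re.MULTILINE` in place of PCRE_DOTALL and PCRE_MULTILINE (the `s` and `m` modifiers in PHP).

```python
import re

newline = "\n"
print(bool(re.fullmatch(r"[^a]", newline)))                # True, no flag needed
print(bool(re.fullmatch(r".", newline)))                   # False: "." skips newline by default
print(bool(re.fullmatch(r".", newline, re.DOTALL)))        # True: DOTALL only changes "."
print(bool(re.fullmatch(r"[^a]", newline, re.MULTILINE)))  # True: class is unaffected
```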

Inside a character class, the minus (hyphen) character - can be used to specify a range of characters. For example, [d-m] matches any character from d to m, inclusive. If a literal hyphen is needed in a class, it must be escaped with a backslash or placed where it cannot be interpreted as a range, typically as the first or last character of the class.
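Ranges and literal hyphens, sketched with Python's `re` as a stand-in for PCRE (the rules shown agree in both engines).

```python
import re

print(re.findall(r"[d-m]", "abcdefmnz"))  # ['d', 'e', 'f', 'm']: both ends included
print(re.findall(r"[-az]", "a-z"))        # ['a', '-', 'z']: "-" first is literal
print(re.findall(r"[az-]", "a-z"))        # ['a', '-', 'z']: "-" last is literal
print(re.findall(r"[a\-z]", "a-b-z"))     # ['a', '-', '-', 'z']: escaped "-", so no range
```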

A literal ] cannot be used as the end point of a range. For example, the pattern [W-]46] is interpreted as a class containing the two characters "W" and "-", followed by the literal string "46]", so it matches "W46]" or "-46]". However, if the ] is escaped with a backslash, it is interpreted as the end of the range, so [W-\]46] is interpreted as a single class containing the range from W to ] plus the separate characters 4 and 6. The octal or hexadecimal representation of ] can also be used as the end point of a range.
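Both readings of the bracket can be checked in a sketch that uses Python's `re` as a stand-in for PCRE; Python parses these particular patterns the same way.

```python
import re

# "]" right after "-" closes the class: {W, -} followed by the literal "46]".
unescaped = re.compile(r"[W-]46]")
print(bool(unescaped.fullmatch("W46]")))  # True
print(bool(unescaped.fullmatch("-46]")))  # True
print(bool(unescaped.fullmatch("Z")))     # False

# An escaped "]" ends the range W..] instead; 4 and 6 are separate members.
escaped = re.compile(r"[W-\]46]")
print([c for c in "WZ[]46-" if escaped.fullmatch(c)])  # ['W', 'Z', '[', ']', '4', '6']

# Octal \135 and hex \x5d both denote "]" and can end the range as well.
for end in (r"\135", r"\x5d"):
    print(bool(re.fullmatch("[W-" + end + "]", "\\")))  # True: backslash lies between W and ]
```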

Ranges operate in ASCII collating sequence. They can also be used with characters specified numerically, for example [\000-\037]. If a range that includes letters is used with case-insensitive matching, it matches those letters in either case. For example, [W-c] is equivalent to [][\\^_`wxyzabc] when matched case-insensitively, and if the character tables for the "fr" (French) locale are in use, [\xc8-\xcb] matches the accented E characters in both cases.
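A sketch of numeric and caseless ranges, using Python's `re` as a stand-in for PCRE. The locale-dependent "fr" example is not reproduced, since Python does not use PCRE's character tables.

```python
import re

# Octal range: the ASCII control characters 0x00-0x1F.
control = re.compile(r"[\000-\037]")
print(bool(control.fullmatch("\t")))  # True: TAB is 0x09
print(bool(control.fullmatch(" ")))   # False: SPACE is 0x20

# [W-c] spans W X Y Z [ \ ] ^ _ ` a b c in ASCII order.
print(bool(re.fullmatch(r"[W-c]", "C")))                 # False: "C" (0x43) is below "W"
print(bool(re.fullmatch(r"[W-c]", "C", re.IGNORECASE)))  # True: "c" is in the range
print(bool(re.fullmatch(r"[W-c]", "z", re.IGNORECASE)))  # True: "Z" is in the range
```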

The character types \d, \D, \s, \S, \w, and \W can also appear in a character class, adding the characters they match to the class. For example, [\dABCDEF] matches any hexadecimal digit written with upper-case letters. A ^ can conveniently be combined with the upper-case character types to specify a more restricted set than the corresponding lower-case type: for example, [^\W_] matches any letter or digit, but not the underscore.
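Character types inside a class, sketched with Python's `re` as a stand-in for PCRE. Python's \w is Unicode-aware by default, which makes no difference for the ASCII samples used here.

```python
import re

hex_digit = re.compile(r"[\dABCDEF]+")
print(bool(hex_digit.fullmatch("3FA0")))  # True
print(bool(hex_digit.fullmatch("3fa0")))  # False: lower-case letters are not in the class

# [^\W_]: neither a non-word character nor "_", i.e. letters and digits only.
print(re.findall(r"[^\W_]", "a_1-B"))  # ['a', '1', 'B']
# [^\w_] is a different class: it matches only non-word characters.
print(re.findall(r"[^\w_]", "a_1-B"))  # ['-']
```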

All non-alphanumeric characters other than \, -, ^ (at the start) and the terminating ] are non-special inside a character class, but escaping them does no harm. The pattern delimiter is always special and must be escaped when it is used inside the expression.
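This sketch shows that escaping non-special punctuation inside a class changes nothing. It uses Python's `re` as a stand-in for PCRE. Python patterns have no delimiter, so the delimiter rule is not illustrated.

```python
import re

plain = re.compile(r"[.+*?()$]")
escaped = re.compile(r"[\.\+\*\?\(\)\$]")

sample = "a.b+c*d?(e)$"
print(plain.findall(sample))                            # ['.', '+', '*', '?', '(', ')', '$']
print(plain.findall(sample) == escaped.findall(sample))  # True: same class either way
```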

Perl supports the POSIX notation for character classes, which uses names enclosed by [: and :] inside the enclosing square brackets. PCRE supports this notation as well. For example, [01[:alpha:]%] matches "0", "1", any alphabetic character, or "%".
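Python's `re` module does not understand POSIX class names, so a pattern such as [01[:alpha:]%] has to be rewritten before it can be used there. The sketch below uses a hypothetical helper, `expand_posix_classes`, that substitutes ASCII ranges for a few names, assuming the "C" locale. It is naive text replacement and does not handle negated names or names appearing outside a class.

```python
import re

# Hypothetical mapping: ASCII ("C" locale) equivalents for a few POSIX classes.
POSIX_ASCII = {
    "[:alpha:]": "A-Za-z",
    "[:digit:]": "0-9",
    "[:alnum:]": "A-Za-z0-9",
    "[:upper:]": "A-Z",
    "[:lower:]": "a-z",
    "[:xdigit:]": "0-9A-Fa-f",
    "[:space:]": r" \t\n\r\f\v",
}


def expand_posix_classes(pattern: str) -> str:
    """Hypothetical helper: replace POSIX class names with explicit ASCII ranges."""
    for name, ranges in POSIX_ASCII.items():
        pattern = pattern.replace(name, ranges)
    return pattern


converted = expand_posix_classes("[01[:alpha:]%]")
print(converted)  # [01A-Za-z%]
print([s for s in ["0", "1", "2", "q", "%"] if re.fullmatch(converted, s)])  # ['0', '1', 'q', '%']
```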

Alternation (|)

The vertical bar character separates alternative patterns. For example, the pattern gilbert|sullivan matches either "gilbert" or "sullivan". Any number of alternatives may appear, and an empty alternative is allowed (it matches the empty string). The matching process tries each alternative in turn, from left to right, and uses the first one that succeeds. If the alternatives are inside a subpattern (a parenthesized group), "succeeds" means that both the alternative in the subpattern and the rest of the main pattern match.
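Left-to-right alternation, an empty alternative, and backtracking into a group, sketched with Python's `re` as a stand-in for PCRE (both are backtracking engines and agree on these cases).

```python
import re

print(re.findall(r"gilbert|sullivan", "gilbert and sullivan"))  # ['gilbert', 'sullivan']

# The first alternative that succeeds wins, even if a later one would match more.
print(re.match(r"a|ab", "abc").group())  # 'a'

# Inside a group, "a" followed by "c" fails on "b", so the engine backtracks and tries "ab".
print(re.match(r"(a|ab)c", "abc").group(1))  # 'ab'

# The empty alternative lets "u" be skipped.
print([w for w in ["color", "colour", "colouur"] if re.fullmatch(r"colo(u|)r", w)])  # ['color', 'colour']
```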
