Linux Regular Expressions: POSIX Character Classes

Source: Internet
Author: User
Tags: character classes, alphanumeric characters

POSIX standardizes the meaning of regular expression characters and operators. The standard defines two flavors of regular expressions: Basic Regular Expressions (BRE), used by grep and sed, and Extended Regular Expressions (ERE), used by egrep and awk.
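To make the BRE/ERE distinction concrete, here is a minimal sketch that runs the same pattern through both flavors. It assumes a POSIX-style grep, where `grep -E` is the ERE mode that egrep provides; the names `sample`, `bre_result`, and `ere_result` are illustrative only.

```shell
#!/bin/sh
# Two sample lines: one without a plus sign, one with a literal plus sign.
sample() { printf 'ab\na+b\n'; }

# BRE (plain grep): "+" is an ordinary character, so only "a+b" matches.
bre_result=$(sample | grep 'a+b')

# ERE (grep -E): "+" means "one or more of the preceding item",
# so the pattern matches "ab" but not the line with a literal "+".
ere_result=$(sample | grep -E 'a+b')

echo "BRE: $bre_result"   # prints: BRE: a+b
echo "ERE: $ere_result"   # prints: ERE: ab
```

The same pattern text selects different lines, which is why it matters which flavor a given tool uses.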

To accommodate non-English environments, the POSIX standard extends character classes so that they can match characters outside the English alphabet. For example, the French è is a letter, but the typical character class [a-z] does not match it. The standard also provides for sequences of characters that should be treated as a single unit when matching and sorting (collating) string data.
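The limitation described above can be observed with a small sketch. It pins the locale to C, where è is not a lowercase letter, and counts the lines matched by [a-z]. The helper name `count_az` is hypothetical. Results in other locales depend on which locales are installed, so no output is claimed for them.

```shell
#!/bin/sh
# count_az (hypothetical helper): print how many of the given lines contain
# at least one character matched by [a-z] in the C locale.
# "|| true" keeps the exit status at 0 when grep finds no matching line.
count_az() {
    printf '%s\n' "$@" | LC_ALL=C grep -c '[a-z]' || true
}

count_az 'e' 'è'   # prints: 1  (only the plain "e" line matches)
```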

POSIX also changes some commonly used terminology. What we have been calling a "character class" is called a "bracket expression" in the POSIX standard. Within a bracket expression, besides literal characters (such as a, !, and so on), the following additional components can appear:
• Character classes. A POSIX character class consists of a keyword enclosed by [: and :]. The keywords describe different classes of characters, such as alphabetic characters and control characters.
• Collating symbols. A collating symbol is a multi-character sequence that should be treated as a single unit. It consists of the characters enclosed by [. and .].
• Equivalence classes. An equivalence class lists a set of characters that should be considered equivalent, such as e and è. It consists of a named element from the locale, enclosed by [= and =].

All three of these constructs must appear inside the square brackets of a bracket expression. For example, [[:alpha:]!] matches any single letter or the exclamation point. [[.ch.]] matches the collating element ch, but does not match the letter c or the letter h alone. In a French locale, [[=e=]] might match any of e, è, or é. The following table lists the character classes and the characters they match.
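As a small illustration of the [[:alpha:]!] example, the following sketch keeps only the lines that consist of exactly one letter or one exclamation point. The function name `single_alpha_or_bang` is hypothetical, and `LC_ALL=C` is an assumption made so that [:alpha:] covers only A-Z and a-z.

```shell
#!/bin/sh
# single_alpha_or_bang (hypothetical helper): read lines on stdin and print
# those that are exactly one letter or one "!".
# -x requires the pattern to match the whole line.
single_alpha_or_bang() {
    LC_ALL=C grep -x '[[:alpha:]!]' || true
}

printf 'a\n!\n7\nab\n' | single_alpha_or_bang
# prints:
# a
# !
```

Note that the class keyword sits inside the outer brackets: [:alpha:] on its own is not a character class, while [[:alpha:]] is.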

Class        Description
[:alnum:]    Alphanumeric characters
[:alpha:]    Alphabetic characters (letters)
[:cntrl:]    Control characters
[:digit:]    Numeric characters
[:graph:]    Printable, visible characters (excludes spaces and control characters)
[:lower:]    Lowercase letters
[:print:]    Like [:graph:], but also includes the space character
[:punct:]    Punctuation characters
[:space:]    All whitespace characters (newline, space, tab, and so on)
[:upper:]    Uppercase letters
[:xdigit:]   Hexadecimal digits (0-9, a-f, A-F)
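To see several of these classes at work, here is a minimal sketch that uses sed to replace each character with a marker for the class it belongs to. The helper name `mask` and the marker characters are illustrative choices, and the C locale is assumed so that the classes cover ASCII only.

```shell
#!/bin/sh
# mask (hypothetical helper): rewrite each input character according to
# its POSIX class. The substitutions run in order, one class at a time.
mask() {
    LC_ALL=C sed -e 's/[[:digit:]]/9/g' \
                 -e 's/[[:upper:]]/U/g' \
                 -e 's/[[:lower:]]/l/g' \
                 -e 's/[[:punct:]]/./g' \
                 -e 's/[[:space:]]/_/g'
}

echo 'Ab 12,x' | mask   # prints: Ul_99.l
```

The [:space:] substitution runs last on purpose: its replacement, the underscore, is itself a punctuation character and would otherwise be rewritten by the [:punct:] step.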

 

As vendors fully implement the POSIX standard, these features are gradually making their way into commercial versions of sed and awk. GNU awk and GNU sed support the character class notation, but not the other two bracket notations. Check your local system documentation to see whether they are available to you.
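Besides reading the documentation, a quick probe can show whether the local tools accept the character class notation. The following is a rough sketch, not an exhaustive test: the function names are hypothetical, and it checks only [[:alpha:]] against a couple of ASCII inputs.

```shell
#!/bin/sh
# Each probe succeeds only if the tool treats [[:alpha:]] as "any letter".

grep_has_classes() {
    echo a | grep -q '[[:alpha:]]' && ! echo 1 | grep -q '[[:alpha:]]'
}

sed_has_classes() {
    [ "$(printf 'a1\n' | sed 's/[[:alpha:]]/X/')" = "X1" ]
}

awk_has_classes() {
    # Count the lines matched by the class; exactly one ("a") is expected.
    [ "$(printf 'a\n1\n' | awk '/[[:alpha:]]/ { n++ } END { print n + 0 }')" = "1" ]
}

for tool in grep sed awk; do
    if "${tool}_has_classes"; then
        echo "$tool: character classes supported"
    else
        echo "$tool: character classes NOT supported"
    fi
done
```

A tool that does not understand the notation typically reads [[:alpha:]] as an ordinary bracket list followed by a literal ], so the single-letter input fails to match and the probe reports no support.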

Because these features are not yet widely available, the scripts on this site do not rely on them, and we will continue to use the term "character class" to refer to a list of characters in square brackets.
