Regular Expressions: Character Ranges
A hyphen (-) inside a bracketed character class specifies a range of characters. For example, the range of all uppercase English letters can be specified as:
[A-Z]
A range of digits can be specified as:
[0-9]
A character class like this helps solve the problem of matching chapter references in a document. Consider the following regular expression:
[cC]hapter [1-9]
It matches the string "chapter" or "Chapter" followed by a space and then any single digit from 1 to 9. Each of the following lines matches this pattern:
You will find the information in chapter 9
And chapter 12.
Chapter 4 contains a summary at the end.
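As a rough check of the three lines above, here is a minimal sketch in Python. It assumes that Python's re module treats this simple pattern the same way sed and awk do; the variable names are illustrative only.

```python
import re

# The pattern from the text: "chapter" or "Chapter", a space, then one digit 1-9.
pattern = re.compile(r"[cC]hapter [1-9]")

lines = [
    "You will find the information in chapter 9",
    "And chapter 12.",
    "Chapter 4 contains a summary at the end.",
]

# re.search finds the first match anywhere in the line; every line matches.
matches = [pattern.search(line).group() for line in lines]
print(matches)  # ['chapter 9', 'chapter 1', 'Chapter 4']
```

Note that the second line matches only as far as "chapter 1", because the class [1-9] consumes exactly one character.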
For this task, the match on the second line can be considered a false alarm, because the pattern matches only the "1" of "12". You could add a space after "[1-9]" to avoid matching two-digit numbers. You could also specify a class of characters that must not appear at that position, as we will see in the next section. Multiple ranges can be specified in one class, and ranges can be mixed with individual characters:
[0-9a-z?,.;:'"]
This expression matches "any single character that is a digit, a lowercase letter, a question mark, a comma, a period, a semicolon, a colon, a single quote, or a double quotation mark." Remember that each character class matches a single character. If you specify multiple classes, you are describing multiple consecutive characters, for example:
[a-zA-Z][.?!]
This expression matches "any lowercase or uppercase letter followed by a period, a question mark, or an exclamation point."
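The difference between one class and two consecutive classes can be sketched in Python as follows. This is an illustration only, assuming Python's re module handles these classes like sed and awk; the sample strings and variable names are made up for the example.

```python
import re

# One class mixing ranges and individual characters: matches ONE character.
mixed = re.compile(r'''[0-9a-z?,.;:'"]''')
single_chars = mixed.findall("Q: a1?")
print(single_chars)  # [':', 'a', '1', '?']  (uppercase Q and the space are not in the class)

# Two classes in a row: matches TWO consecutive characters,
# a letter immediately followed by sentence-ending punctuation.
letter_then_punct = re.compile(r"[a-zA-Z][.?!]")
pairs = letter_then_punct.findall("Is it done? Yes. Chapter 4!")
print(pairs)  # ['e?', 's.']  ("4!" is skipped because 4 is not a letter)
```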
If the close bracket (]) appears as the first character in the class (or as the first character after a circumflex (^); see the next section), it is interpreted as a member of the class. A hyphen loses its special meaning if it is the first or last character in a class. Therefore, to match arithmetic operators, we place the hyphen (-) first in the following example:
[-+*/]
In awk, you can also use a backslash to escape a hyphen or close bracket wherever it occurs in the class, but the syntax is messier.
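The placement rules for the hyphen and the close bracket can be tried out with a short Python sketch. It assumes Python's re module follows the same placement conventions for these two characters as sed and awk; the test strings are invented for illustration.

```python
import re

# Hyphen first: it is a literal member of the class, alongside + * and /.
operators = re.compile(r"[-+*/]")
found_ops = operators.findall("3 - 4 * 2 / 1 + 7")
print(found_ops)  # ['-', '*', '/', '+']

# Hyphen in the middle: "+-/" is read as the range from "+" to "/",
# which in ASCII covers + , - . / but not *.
accidental_range = re.compile(r"[+-/]")
found_range = accidental_range.findall("a,b.c*d")
print(found_range)  # [',', '.']

# Close bracket first: it is a literal member rather than the end of the class.
bracket_or_x = re.compile(r"[]x]")
found_bracket = bracket_or_x.findall("a]xb")
print(found_bracket)  # [']', 'x']
```

The second pattern shows why position matters: moving the hyphen between "+" and "/" silently turns the class into a range that no longer contains "*".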
It is interesting to try to match a date with a regular expression. There are two possible formats:
MM-DD-YY
MM/DD/YY
The following regular expression indicates the range of values allowed at each character position:
[0-1][0-9][-/][0-3][0-9][-/][0-9][0-9]
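To see what this pattern accepts and rejects, here is a small Python sketch. It assumes Python's re module interprets the pattern like sed and awk; the sample dates are invented, and fullmatch is used so that the whole string must fit the pattern.

```python
import re

# One class per character position: two digits, a delimiter, two digits,
# a delimiter, two digits. The hyphen comes first in [-/] so it is literal.
date = re.compile(r"[0-1][0-9][-/][0-3][0-9][-/][0-9][0-9]")

samples = ["12-25-99", "07/04/76", "12.25.99", "12-25/99", "19/39/00"]
results = {s: bool(date.fullmatch(s)) for s in samples}

for sample, ok in results.items():
    print(sample, ok)
# 12-25-99 True
# 07/04/76 True
# 12.25.99 False   (a period is not in the delimiter class)
# 12-25/99 True    (each delimiter position is checked independently)
# 19/39/00 True    (each position is range-checked, not the date as a whole)
```

The last two results show the limits of the approach: the pattern constrains each character position separately, so mixed delimiters and impossible values such as month 19 still match.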
In the date pattern, either "-" or "/" can serve as the delimiter. Placing the hyphen first in the class [-/] ensures that it is interpreted literally, as a hyphen, and not as indicating a range.