Regular Expression Basics
Two function libraries are available for regular expressions:
1. PCRE, a regular expression library compatible with the Perl language.
2. POSIX.
Syntax Rules
One: Atoms
Atoms are the most basic building blocks of a regular expression, and every pattern contains at least one atom. Any printable or nonprinting character that is not acting as a metacharacter is an atom.
1) Ordinary characters as atoms
These include uppercase and lowercase letters and all digits.
2) Special characters and metacharacters as atoms
A character that has a special meaning can be used as an atom if it is escaped with a backslash (\).
3) Nonprinting characters as atoms
Nonprinting characters are formatting control characters that are not displayed in a string.
Nonprinting characters commonly used in regular expressions
Atom | Meaning |
\f | Match a page break (form feed) |
\n | Match a line break (line feed) |
\r | Match a carriage return |
\t | Match a tab |
\v | Match a vertical tab |
Note:
'/\r\n/' matches a line break in a string on Windows, where a line ends with a carriage return followed by a line feed.
'/\n/' matches a line break in a string on Linux, where a line ends with a line feed only.
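The difference between the two line-ending conventions can be sketched as follows. This is a minimal illustration that assumes Python's `re` module as a stand-in for the PCRE-style patterns in the text, so the `/` delimiters are dropped. The sample strings are made up for the example.

```python
import re

# Hypothetical sample strings: one with Windows line endings, one with Linux line endings.
windows_text = "first line\r\nsecond line"
linux_text = "first line\nsecond line"

# \r\n matches only the two-character Windows line ending.
has_crlf_windows = re.search(r"\r\n", windows_text) is not None
has_crlf_linux = re.search(r"\r\n", linux_text) is not None

# \r?\n makes the carriage return optional, so it accepts both conventions.
lines_windows = re.split(r"\r?\n", windows_text)
lines_linux = re.split(r"\r?\n", linux_text)

print(has_crlf_windows, has_crlf_linux)
print(lines_windows, lines_linux)
```

A pattern such as `\r?\n` is one common way to handle text that may come from either system.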
4) Generic character types as atoms
Generic character types commonly used in regular expressions
Atom | Meaning |
\d | Match any decimal digit, equivalent to [0-9] |
\D | Match any character except a decimal digit, equivalent to [^0-9] |
\s | Match any whitespace character, equivalent to [ \f\n\r\t\v] |
\S | Match any character except whitespace, equivalent to [^ \f\n\r\t\v] |
\w | Match any letter, digit, or underscore, equivalent to [0-9a-zA-Z_] |
\W | Match any character except a letter, digit, or underscore, equivalent to [^0-9a-zA-Z_] |
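The character types in the table can be tried out with a short sketch. It assumes Python's `re` module rather than PCRE, and it passes `re.ASCII` so that `\d`, `\w`, and `\s` are limited to the ASCII sets listed above (without that flag, Python also matches Unicode digits and letters). The sample string is invented for illustration.

```python
import re

# Hypothetical sample text for illustration.
sample = "Order_42 shipped on 2024-01-15"

# \d+ collects runs of decimal digits.
digits = re.findall(r"\d+", sample, flags=re.ASCII)

# \w+ collects runs of letters, digits, and underscores.
words = re.findall(r"\w+", sample, flags=re.ASCII)

# \W finds every single character that is NOT a letter, digit, or underscore.
non_word = re.findall(r"\W", sample, flags=re.ASCII)

# \s+ splits the text on runs of whitespace.
pieces = re.split(r"\s+", sample, flags=re.ASCII)

print(digits)    # ['42', '2024', '01', '15']
print(words)     # ['Order_42', 'shipped', 'on', '2024', '01', '15']
print(non_word)  # [' ', ' ', ' ', '-', '-']
print(pieces)    # ['Order_42', 'shipped', 'on', '2024-01-15']
```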
5) A custom atom table ([]) as an atom
An atom table defines a set of atoms that are interchangeable at one position in the pattern, for example:
'/[ja]sp/' matches both "jsp" and "asp".
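As a rough check of the `[ja]sp` example, the sketch below uses Python's `re` module (an assumption, since the pattern in the text is written with PCRE-style delimiters). The candidate list is hypothetical.

```python
import re

# [ja] accepts exactly one character, either "j" or "a", followed by "sp".
pattern = re.compile(r"[ja]sp")

# Hypothetical candidate strings.
candidates = ["jsp", "asp", "php", "jasp"]

# fullmatch requires the whole string to match the pattern.
matched = [name for name in candidates if pattern.fullmatch(name)]

print(matched)  # ['jsp', 'asp']
```

The string "jasp" is rejected as a whole because the atom table stands for a single character, not for both letters in sequence.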
Two: Metacharacters
Metacharacters are characters with a special meaning that are used to construct regular expressions, such as '*', '+', and '?'.
Metacharacters generally do not appear alone; they modify atoms. Escaping a metacharacter with \ removes its special meaning, so that it matches literally.
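The effect of escaping can be seen in a small sketch, again assuming Python's `re` module as a stand-in. The sample text is made up, and `re.escape` is a Python helper, not something the text itself describes.

```python
import re

# Hypothetical sample text.
text = "1+1=2 and 11=2"

# Unescaped, + is a metacharacter: "1+1" means one or more "1" followed by "1".
unescaped = re.findall(r"1+1", text)

# Escaped, \+ is a literal plus sign.
escaped = re.findall(r"1\+1", text)

# re.escape adds the backslashes automatically for a literal search string.
auto_escaped = re.search(re.escape("1+1"), text).group()

print(unescaped)     # ['11']
print(escaped)       # ['1+1']
print(auto_escaped)  # 1+1
```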
Metacharacters of regular expressions
Metacharacter | Meaning |
* | Match the preceding atom 0, 1, or more times |
+ | Match the preceding atom 1 or more times |
? | Match the preceding atom 0 or 1 times |
. | Match any character except a line break |
| | Match any one of two or more alternative branches |
{n} | Match the preceding atom exactly n times |
{n,} | Match the preceding atom at least n times |
{n,m} | Match the preceding atom at least n times and at most m times |
^ or \A | Match the start position of the input string |
$ or \Z | Match the end position of the input string |
\b | Match a word boundary |
\B | Match a position that is not a word boundary |
[] | Match any one of the atoms listed in the square brackets |
[^] | Match any one character other than the atoms listed in the square brackets |
() | Treat the enclosed pattern as a single unit: one large atom made of several individual atoms |
1. Quantifiers
Quantifiers specify how many times a given atom must appear for a match to succeed. There are six quantifiers: "*", "+", "?", "{n}", "{n,}", and "{n,m}". They differ only in the number of repetitions they allow.
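The six repetition forms can be compared side by side. The sketch below assumes Python's `re` module; the helper `matching` and the candidate strings are hypothetical names introduced only for this illustration.

```python
import re

# Hypothetical candidates: "a", then zero to three "b", then "c".
candidates = ["ac", "abc", "abbc", "abbbc"]

def matching(pattern):
    """Return the candidates that the pattern matches in full."""
    return [s for s in candidates if re.fullmatch(pattern, s)]

star = matching(r"ab*c")             # b repeated 0 or more times
plus = matching(r"ab+c")             # b repeated 1 or more times
optional = matching(r"ab?c")         # b repeated 0 or 1 times
exactly_two = matching(r"ab{2}c")    # b repeated exactly 2 times
at_least_two = matching(r"ab{2,}c")  # b repeated at least 2 times
one_to_two = matching(r"ab{1,2}c")   # b repeated 1 to 2 times

print(star)          # ['ac', 'abc', 'abbc', 'abbbc']
print(plus)          # ['abc', 'abbc', 'abbbc']
print(optional)      # ['ac', 'abc']
print(exactly_two)   # ['abbc']
print(at_least_two)  # ['abbc', 'abbbc']
print(one_to_two)    # ['abc', 'abbc']
```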
2. Boundary restrictions
Boundary restrictions limit a match to the bounds of a string or word, which gives more precise results. The metacharacters "^" and "$" refer to the beginning and end of the string, respectively, and "\b" describes the leading or trailing boundary of each word in the string.
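A short sketch of these boundary metacharacters follows. It assumes Python's `re` module and an invented sample string. Only "^", "$", "\b", and "\B" are shown, because the exact behavior of the "\A" and "\Z" style anchors differs between regex engines.

```python
import re

# Hypothetical sample text: "cat" appears as a word twice and inside "concatenate" once.
text = "cat concatenate cat"

# ^ anchors the match to the start of the string, $ to the end.
starts_with_cat = re.search(r"^cat", text) is not None
ends_with_cat = re.search(r"cat$", text) is not None

# \b requires a word boundary on each side, so only whole words match.
whole_word_positions = [m.start() for m in re.finditer(r"\bcat\b", text)]

# \B requires a non-boundary on each side, so only the embedded "cat" matches.
embedded_positions = [m.start() for m in re.finditer(r"\Bcat\B", text)]

print(starts_with_cat, ends_with_cat)  # True True
print(whole_word_positions)            # [0, 16]
print(embedded_positions)              # [7]
```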
3. Period
"." matches any single character in the target except a line break, including nonprinting characters.
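The behavior of the period can be checked with a minimal sketch, assuming Python's `re` module. The `re.DOTALL` flag shown at the end is specific to Python and is included only to show that the line-break exception is a default that an engine may let you switch off.

```python
import re

# The period matches a nonprinting character such as a tab.
tab_match = re.search(r"a.b", "a\tb") is not None

# By default the period does not match a line break.
newline_match = re.search(r"a.b", "a\nb") is not None

# With Python's DOTALL flag, the period matches a line break as well.
dotall_match = re.search(r"a.b", "a\nb", re.DOTALL) is not None

print(tab_match, newline_match, dotall_match)  # True False True
```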
4. Pattern selector (|)
"|" has the lowest precedence of all the metacharacters and separates multiple alternative patterns.
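Because "|" binds more loosely than everything else, anchors and other atoms attach to a single branch unless the branches are grouped. The sketch below illustrates this with Python's `re` module (an assumption) and a hypothetical word list.

```python
import re

# Hypothetical word list.
words = ["cat", "dog", "catfish", "hotdog"]

# Without grouping, the pattern reads as "^cat" OR "dog$":
# starts with "cat", or ends with "dog".
loose = [w for w in words if re.search(r"^cat|dog$", w)]

# With grouping, both anchors apply to the whole alternation:
# the string is exactly "cat" or exactly "dog".
strict = [w for w in words if re.search(r"^(cat|dog)$", w)]

print(loose)   # ['cat', 'dog', 'catfish', 'hotdog']
print(strict)  # ['cat', 'dog']
```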
5. Pattern unit
The metacharacters "()" combine multiple atoms into one large atom that is treated as a single unit.
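To see why a pattern unit matters, compare a quantifier applied to one atom with the same quantifier applied to a group. This is a minimal sketch that assumes Python's `re` module; the test strings are invented.

```python
import re

# Without parentheses, + applies only to "b": one "a" followed by one or more "b".
single_atom_repeat = re.fullmatch(r"ab+", "abbb") is not None
single_atom_on_pairs = re.fullmatch(r"ab+", "ababab") is not None

# With parentheses, "ab" is one large atom, so + repeats the whole pair.
group_on_pairs = re.fullmatch(r"(ab)+", "ababab") is not None
group_on_run = re.fullmatch(r"(ab)+", "abbb") is not None

print(single_atom_repeat, single_atom_on_pairs)  # True False
print(group_on_pairs, group_on_run)              # True False
```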
Three: Pattern Modifiers
Pattern modifiers:
Modifier | Function |
i | Match the pattern case-insensitively |
x | Ignore whitespace in the pattern unless it is escaped |
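Python's `re` module does not take trailing modifier letters, but its `re.IGNORECASE` and `re.VERBOSE` flags behave much like the i and x modifiers described here. The sketch below relies on that correspondence, which is an assumption made for illustration; the sample strings are hypothetical.

```python
import re

# Without a modifier, matching is case sensitive.
case_sensitive = re.search(r"php", "Learning PHP") is not None

# re.IGNORECASE plays the role of the i modifier.
case_insensitive = re.search(r"php", "Learning PHP", re.IGNORECASE) is not None

# re.VERBOSE plays the role of the x modifier: unescaped whitespace in the pattern is ignored.
spaced_pattern = re.compile(r"\d{3} - \d{4}", re.VERBOSE)
ignores_pattern_spaces = spaced_pattern.fullmatch("555-1234") is not None
requires_no_spaces = spaced_pattern.fullmatch("555 - 1234") is None

# An escaped space still matches a literal space.
escaped_space_pattern = re.compile(r"\d{3}\ \d{4}", re.VERBOSE)
matches_literal_space = escaped_space_pattern.fullmatch("555 1234") is not None

print(case_sensitive, case_insensitive)            # False True
print(ignores_pattern_spaces, requires_no_spaces)  # True True
print(matches_literal_space)                       # True
```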