Original address: https://swtch.com/~src/regexp/regexp1.html
Regular Expression Matching Can Be Simple And Fast
Regular Expressions
A regular expression is a notation for describing a set of strings. When a particular string is in the set described by a regular expression, we often say that the regular expression matches the string.
The simplest regular expression is a single literal character. Except for the special metacharacters *+?()|, characters match themselves. To match a metacharacter, escape it with a backslash: \+ matches a literal plus sign.
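As a rough illustration of literals and escaping, here is a small sketch using Python's re module. Python is an assumption here (the article describes egrep-style syntax, and Python's dialect is a superset of it), and the helper name matches is made up for this example.

```python
import re

def matches(pattern: str, text: str) -> bool:
    """Return True if the whole of text matches pattern (hypothetical helper)."""
    return re.fullmatch(pattern, text) is not None

# A literal character matches itself.
print(matches(r"a", "a"))     # True
# An escaped metacharacter matches the literal character.
print(matches(r"\+", "+"))    # True
# Unescaped, + means repetition, so the pattern a+ does not match the text "a+".
print(matches(r"a+", "a+"))   # False
print(matches(r"a+", "aaa"))  # True
```

re.fullmatch is used rather than re.search so that "matches" means the entire string is in the set described by the expression, which is the sense used in the text.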
Two regular expressions can be alternated or concatenated to form a new regular expression: if e1 matches s and e2 matches t, then e1|e2 matches s or t, and e1e2 matches st.
The metacharacters *, +, and ? are repetition operators: e1* matches a sequence of zero or more (possibly different) strings, each of which matches e1; e1+ matches one or more; e1? matches zero or one.
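The three ways of combining expressions can be checked with a few sample strings. This is a minimal sketch, again assuming Python's re module; the helper matches and the sample strings are chosen only for illustration.

```python
import re

def matches(pattern: str, text: str) -> bool:
    """Return True if the whole of text matches pattern (hypothetical helper)."""
    return re.fullmatch(pattern, text) is not None

# Alternation: s|t matches "s" or "t", but not "st".
alternation = [matches(r"s|t", x) for x in ("s", "t", "st")]

# Concatenation: st matches only "st".
concatenation = [matches(r"st", x) for x in ("st", "s", "t")]

# Repetition applied to e1 = (a|b). Each repeated piece may be a
# different string, as long as each piece matches e1.
samples = ("", "a", "abba")
star = [matches(r"(a|b)*", x) for x in samples]  # zero or more
plus = [matches(r"(a|b)+", x) for x in samples]  # one or more
opt = [matches(r"(a|b)?", x) for x in samples]   # zero or one

print(alternation)    # [True, True, False]
print(concatenation)  # [True, False, False]
print(star)           # [True, True, True]
print(plus)           # [False, True, True]
print(opt)            # [True, True, False]
```

The "abba" case shows what "possibly different" means: the four pieces a, b, b, a are not all the same string, yet each one matches (a|b).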
The operator precedence, from weakest to strongest binding, is first alternation, then concatenation, and finally the repetition operators. As in arithmetic expressions, explicit parentheses can be used to force a different meaning. For example, ab|cd is equivalent to (ab)|(cd), and ab* is equivalent to a(b*).
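The precedence rules can be spot-checked by comparing how two patterns behave on the same strings. The sketch below assumes Python's re module; same_on_samples is a hypothetical helper, and agreeing on a handful of samples is evidence of equivalence, not a proof.

```python
import re

def same_on_samples(p1: str, p2: str, samples) -> bool:
    """True if p1 and p2 accept exactly the same strings among samples."""
    return all(
        (re.fullmatch(p1, s) is None) == (re.fullmatch(p2, s) is None)
        for s in samples
    )

samples = ["", "a", "ab", "cd", "abd", "acd", "abb", "abab", "abcd"]

# Concatenation binds tighter than alternation.
print(same_on_samples(r"ab|cd", r"(ab)|(cd)", samples))  # True
print(same_on_samples(r"ab|cd", r"a(b|c)d", samples))    # False

# Repetition binds tighter than concatenation.
print(same_on_samples(r"ab*", r"a(b*)", samples))        # True
print(same_on_samples(r"ab*", r"(ab)*", samples))        # False
```

The two False results show the parenthesized alternatives really do mean something different: a(b|c)d accepts "abd" but not "ab", and (ab)* accepts "abab" but not "abb".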
The syntax described so far is a subset of the traditional Unix egrep regular expression syntax. This subset suffices to describe all regular languages: loosely speaking, a regular language is a set of strings that can be matched in a single pass through the text using only a fixed amount of memory. Newer regular expression engines (notably Perl and those that have copied it) have added many new operators and escape sequences. These additions make regular expressions more concise, and sometimes more cryptic, but usually not more powerful: these fancy new regular expressions almost always have longer equivalents using the traditional syntax.
One common regular expression extension that does provide additional power is the backreference. A backreference such as \1 or \2 matches the string matched by an earlier parenthesized expression, and only that string: (cat|dog)\1 matches catcat and dogdog, but not catdog or dogcat. In the theoretical sense of the term, regular expressions with backreferences are not regular expressions. The power that backreferences add comes at great cost: in the worst case, the best known implementations require exponential search algorithms, like the one Perl uses. Perl and the other languages cannot now remove backreference support, of course, but they could use much faster algorithms when given regular expressions that do not contain backreferences. This article is about those faster algorithms.
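The backreference example can be reproduced directly. This is a minimal sketch assuming Python's re module, which supports \1-style backreferences; the variable names are chosen for this example only.

```python
import re

words = ["catcat", "dogdog", "catdog", "dogcat"]

# \1 must repeat exactly the text that group 1 matched.
with_backref = re.compile(r"(cat|dog)\1")
backref_results = {w: with_backref.fullmatch(w) is not None for w in words}

# Without the backreference, the second group is chosen independently.
without_backref = re.compile(r"(cat|dog)(cat|dog)")
plain_results = {w: without_backref.fullmatch(w) is not None for w in words}

print(backref_results)
# {'catcat': True, 'dogdog': True, 'catdog': False, 'dogcat': False}
print(plain_results)
# {'catcat': True, 'dogdog': True, 'catdog': True, 'dogcat': True}
```

The contrast between the two patterns is the point: the plain version only needs to know which alternatives are allowed at each position, while the backreference version has to remember what was actually matched earlier.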