Regular Expression Learning Notes

Source: Internet
Author: User

Original address: https://swtch.com/~src/regexp/regexp1.html

Regular Expression Matching Can Be Simple and Fast

Regular expressions

A regular expression is a notation for describing a set of character strings. When a particular string is in the set described by a regular expression, we often say that the regular expression matches the string.

The simplest regular expression is a single literal character. Except for the special metacharacters *+?()|, characters match themselves. To match a metacharacter, escape it with a backslash: \+ matches a literal plus sign.
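
The literal and escape rules can be tried out with a short sketch. It uses Python's re module as a stand-in engine, which is an assumption (the source does not name an implementation), and the helper name matches is hypothetical. Python treats more characters as special than the subset described here, so the examples stay within that subset.

```python
import re

def matches(pattern: str, text: str) -> bool:
    """Return True if the entire text matches the pattern."""
    return re.fullmatch(pattern, text) is not None

# An ordinary character matches itself and nothing else.
print(matches(r"a", "a"))      # True
print(matches(r"a", "b"))      # False

# Unescaped, + is a repetition operator; escaped, it is a literal plus sign.
print(matches(r"a+", "aaa"))   # True
print(matches(r"a\+", "a+"))   # True
print(matches(r"a\+", "aaa"))  # False
```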

Two regular expressions can be alternated or concatenated to form a new regular expression: if e1 matches s and e2 matches t, then e1|e2 matches s or t, and e1e2 matches st.
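
A minimal sketch of alternation and concatenation, again assuming Python's re module as the engine. The sub-expressions "ab" and "cd" and the helper matches are illustrative choices, not taken from the source.

```python
import re

def matches(pattern: str, text: str) -> bool:
    """Return True if the entire text matches the pattern."""
    return re.fullmatch(pattern, text) is not None

e1, e2 = "ab", "cd"  # hypothetical sub-expressions: e1 matches "ab", e2 matches "cd"

alternation = "(" + e1 + ")|(" + e2 + ")"    # e1|e2
concatenation = "(" + e1 + ")(" + e2 + ")"   # e1e2

print(matches(alternation, "ab"))      # True  (s)
print(matches(alternation, "cd"))      # True  (t)
print(matches(alternation, "abcd"))    # False
print(matches(concatenation, "abcd"))  # True  (st)
print(matches(concatenation, "ab"))    # False
```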

The metacharacters *, +, and ? are repetition operators: e1* matches a sequence of zero or more (possibly different) strings, each of which matches e1; e1+ matches one or more; e1? matches zero or one.
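
The three repetition operators can be compared side by side. This is a sketch under the assumption that Python's re module stands in for the engine; the sub-expression a|b is chosen only to show that the repeated strings may differ from one another.

```python
import re

def matches(pattern: str, text: str) -> bool:
    """Return True if the entire text matches the pattern."""
    return re.fullmatch(pattern, text) is not None

# (a|b)* : zero or more strings, each matching a|b (they need not be the same).
print(matches("(a|b)*", ""))      # True
print(matches("(a|b)*", "abba"))  # True

# (a|b)+ : one or more.
print(matches("(a|b)+", ""))      # False
print(matches("(a|b)+", "abba"))  # True

# (a|b)? : zero or one.
print(matches("(a|b)?", ""))      # True
print(matches("(a|b)?", "a"))     # True
print(matches("(a|b)?", "ab"))    # False
```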

The operator precedence, from weakest to strongest binding, is alternation first, then concatenation, and finally the repetition operators. As in arithmetic expressions, explicit parentheses can be used to force a different meaning. For example, ab|cd is equivalent to (ab)|(cd), and ab* is equivalent to a(b*).
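
The precedence rules can be spot-checked by comparing patterns on a handful of inputs. This sketch assumes Python's re module as the engine; the sample strings and the helper same_on_samples are hypothetical. Agreement on a few samples is only evidence of equivalence, not a proof.

```python
import re

def matches(pattern: str, text: str) -> bool:
    """Return True if the entire text matches the pattern."""
    return re.fullmatch(pattern, text) is not None

samples = ["", "a", "ab", "cd", "abd", "acd", "abcd", "abbb", "abab"]

def same_on_samples(p: str, q: str) -> bool:
    """Return True if p and q accept exactly the same sample strings."""
    return all(matches(p, s) == matches(q, s) for s in samples)

# Concatenation binds tighter than alternation.
print(same_on_samples("ab|cd", "(ab)|(cd)"))  # True
print(same_on_samples("ab|cd", "a(b|c)d"))    # False

# Repetition binds tighter than concatenation.
print(same_on_samples("ab*", "a(b*)"))        # True
print(same_on_samples("ab*", "(ab)*"))        # False
```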

The syntax described so far is a subset of the traditional Unix egrep regular expression syntax. This subset suffices to describe all regular languages: loosely speaking, a regular language is a set of strings that can be matched in a single pass through the text using only a fixed amount of memory. Newer regular expression engines (notably Perl and those that have copied it) have added many new operators and escape sequences. These additions make regular expressions more concise, and sometimes more cryptic, but usually not more powerful: these fancy new regular expressions almost always have longer equivalents using the traditional syntax.
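
As an illustration of a more concise operator with a longer traditional equivalent, the sketch below compares counted repetition (braces, which are outside the subset described above) with spelled-out patterns. Python's re module is assumed as the engine, and the pattern pairs are examples chosen here, not taken from the source.

```python
import re

def matches(pattern: str, text: str) -> bool:
    """Return True if the entire text matches the pattern."""
    return re.fullmatch(pattern, text) is not None

# (concise form, equivalent using only literals, ?, and *)
pairs = [
    (r"a{2,3}", r"aaa?"),  # two or three a's
    (r"a{2,}", r"aaa*"),   # two or more a's
]

inputs = ["a" * n for n in range(6)]  # "", "a", "aa", ..., "aaaaa"

for concise, traditional in pairs:
    agree = all(matches(concise, s) == matches(traditional, s) for s in inputs)
    print(concise, traditional, agree)  # agree is True for both pairs
```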

One common regular expression extension that does provide additional power is the backreference. A backreference such as \1 or \2 matches the string matched by an earlier parenthesized expression, and only that string: (cat|dog)\1 matches catcat and dogdog, but not catdog or dogcat. In the theoretical sense of the term, regular expressions with backreferences are not regular expressions. The power that backreferences add comes at great cost: in the worst case, the best known implementations require exponential search algorithms, like the one Perl uses. Perl and the other languages cannot now remove backreference support, of course, but they could use much faster algorithms when given regular expressions that do not contain backreferences. This article is about those faster algorithms.
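
The backreference example can be reproduced directly. This sketch assumes Python's re module, which supports \1-style backreferences; the helper matches is hypothetical.

```python
import re

def matches(pattern: str, text: str) -> bool:
    """Return True if the entire text matches the pattern."""
    return re.fullmatch(pattern, text) is not None

# \1 must repeat exactly what the first parenthesized group matched.
pattern = r"(cat|dog)\1"

for text in ["catcat", "dogdog", "catdog", "dogcat"]:
    print(text, matches(pattern, text))
# catcat True
# dogdog True
# catdog False
# dogcat False
```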
