Preface:
Six months ago I became interested in regular expressions. I found a lot of material on the Internet and read many tutorials, and eventually discovered that the tutorial bundled with RegexBuddy, a regular expression tool, was very well written. It is the best regular expression tutorial I have seen, so I had long wanted to translate it. That wish was not fulfilled until the May 1 long holiday, and this article is the result. As for the title, using "simple" may seem clichéd. But after reading through the original text, I felt that only "simple" accurately conveys my experience of the tutorial, so I could not avoid the cliché.
This article is a translation of the tutorial that Goyvaerts wrote for RegexBuddy. Copyright belongs to the original author. Reprinting is welcome, but out of respect for the work of the original author and the translator, please credit the source. Thank you!
1. What is a regular expression
Basically, a regular expression is a pattern used to describe a certain amount of text. "Regex" is short for "regular expression". This article uses <<regex>> to denote a specific regular expression.
A piece of literal text is the most basic pattern: it simply matches the same text.
2. Different regular expression engines
A regular expression engine is a piece of software that can process regular expressions. Typically, the engine is part of a larger application. In the software world, different regular expression engines are not fully compatible with each other. This tutorial focuses on the Perl 5 style engine, which is the most widely used. We will also mention some differences from other engines. Many modern engines are similar but not exactly the same, for example the .NET regex library and the JDK regex package.
3. Literal characters
The most basic regular expression consists of a single literal character, such as <<a>>, which matches the first occurrence of the character "a" in the string. For example, in the string "Jack is a boy", the "a" after the "J" is matched, and the second "a" is not.
The regular expression can also match the second "a", but only if you tell the regular expression engine to start searching after the first match. In a text editor, you can use "Find Next". In a programming language, there is usually a function that lets you continue searching forward from the position where the previous match ended.
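The "Find Next" idea can be sketched in code. This is a minimal illustration using Python's `re` module, which is an assumption on our part: the tutorial targets Perl 5 style engines, and Python's engine is similar but not identical. `search` reports the first match, and passing the end of that match as the start position resumes the search from there.

```python
import re

text = "Jack is a boy"
pattern = re.compile(r"a")

# First match: the "a" right after "J".
first = pattern.search(text)
first_pos = first.start()

# "Find Next": resume searching where the previous match ended.
second = pattern.search(text, first.end())
second_pos = second.start()

print(first_pos, second_pos)  # prints: 1 8
```

To visit every match in turn, `pattern.finditer(text)` performs the same stepping automatically.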
Similarly, <<cat>> matches "cat" in "About cats and dogs". This is equivalent to telling the regular expression engine to find a <<c>>, immediately followed by an <<a>>, immediately followed by a <<t>>.
Note that regular expression engines are case sensitive by default. <<cat>> does not match "Cat" unless you tell the engine to ignore case.
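A short sketch of both points, again assuming Python's `re` module as a stand-in for a Perl 5 style engine. The `re.IGNORECASE` flag is Python's way of telling the engine to ignore case; other engines use their own switches.

```python
import re

text = "About cats and dogs"

# <<cat>> finds "cat" inside "cats".
cat_span = re.search(r"cat", text).span()

# Matching is case sensitive by default, so <<cat>> does not match "Cat".
sensitive = re.search(r"cat", "Cat")

# re.IGNORECASE tells the engine to ignore case.
insensitive = re.search(r"cat", "Cat", re.IGNORECASE)

print(cat_span, sensitive, insensitive.group())  # prints: (6, 9) None Cat
```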
• Special characters
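Some characters are reserved for special purposes and cannot be used directly as literal characters. The list below shows 12 symbols: the 11 reserved characters plus the closing bracket "]", which is special only as the partner of "[":
[ ] \ ^ $ . | ? * + ( )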
These special characters are also called metacharacters.
If you want to use any of these characters as a literal character in a regular expression, you need to escape it with a backslash "\". For example, to match "1+1=2", the correct expression is <<1\+1=2>>.
Note that <<1+1=2>> is also a valid regular expression. However, it does not match "1+1=2". Instead it matches "111=2" in "123+111=234", because "+" here has a special meaning (repeat one or more times).
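The difference between the escaped and unescaped "+" can be checked directly. The sketch below assumes Python's `re` module. `re.escape` is a Python helper that adds the backslashes for you; it is not part of the original tutorial.

```python
import re

# Escaped "+": matches the literal text "1+1=2".
escaped = re.search(r"1\+1=2", "1+1=2")

# Unescaped "+": "one or more 1s, followed by 1=2".
unescaped_a = re.search(r"1+1=2", "1+1=2")        # no match
unescaped_b = re.search(r"1+1=2", "123+111=234")  # matches "111=2"

# re.escape builds a pattern that matches its argument literally.
auto_pattern = re.escape("1+1=2")
auto = re.fullmatch(auto_pattern, "1+1=2")

print(escaped.group(), unescaped_a, unescaped_b.group())
# prints: 1+1=2 None 111=2
```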
In programming languages, note that some special characters are processed by the compiler before the string is passed to the regular expression engine. So the regular expression <<1\+1=2>> must be written in C++ as "1\\+1=2". To match "C:\temp", you need the regular expression <<C:\\temp>>, which in C++ becomes "C:\\\\temp".
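The same double escaping shows up in other languages. As an illustration in Python rather than C++ (an assumption made to keep the examples in one language), ordinary string literals need the doubled backslashes, while raw string literals (`r"..."`) let you write the pattern exactly as the engine should see it.

```python
import re

# Ordinary string literals: Python turns each "\\" into one backslash
# before the regex engine sees the pattern.
plus_pattern = "1\\+1=2"     # the engine receives 1\+1=2
path_pattern = "C:\\\\temp"  # the engine receives C:\\temp

# Raw string literals leave backslashes untouched.
same_plus = plus_pattern == r"1\+1=2"
same_path = path_pattern == r"C:\\temp"

path = "C:\\temp"            # the actual text C:\temp
found = re.search(path_pattern, path)

print(same_plus, same_path, found.group())  # prints: True True C:\temp
```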
• Non-printable characters
You can use special character sequences to represent certain non-printable characters:
<<\t>> represents a tab (0x09)
<<\r>> represents a carriage return (0x0D)
<<\n>> represents a line feed (0x0A)
Note that Windows text files use "\r\n" to end a line, while UNIX uses "\n".
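A small sketch of these sequences, assuming Python's `re` module. The sample strings are made up for illustration.

```python
import re

windows_text = "first\r\nsecond\r\n"
unix_text = "first\nsecond\n"

# <<\t>> matches a tab character (0x09).
tab = re.search(r"\t", "name\tvalue")

# <<\r\n>> matches the Windows line ending; the UNIX text has only <<\n>>.
crlf_windows = re.findall(r"\r\n", windows_text)
crlf_unix = re.findall(r"\r\n", unix_text)
lf_unix = re.findall(r"\n", unix_text)

print(tab.start(), len(crlf_windows), len(crlf_unix), len(lf_unix))
# prints: 4 2 0 2
```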