Objective
Regular expressions are cumbersome but powerful. Once you learn to apply them, they will not only make you more efficient but also give you a real sense of achievement. If you read this material carefully and keep a reference at hand while practicing, mastering regular expressions is not a problem.
1. Intro
Regular expressions are now widely used in many kinds of software. You can find them in operating systems such as *nix (Linux, UNIX, etc.) and HP systems, in development environments such as PHP, C#, and Java, and in many applications.
Regular expressions let you achieve powerful results in a compact way. The price of being compact and powerful is that regular expression code is hard to read and not easy to learn, so it takes some effort. Once you have learned the basics and have a reference to consult, using them becomes relatively simple and effective.
Example: ^.+@.+\..+$
Code like this scared me off again and again, and it has probably scared away many others too. Read on, and you will be able to use such code freely.
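To make the example less intimidating, here is a minimal sketch in Python's standard re module. It assumes the intended pattern is ^.+@.+\..+$ ("one or more characters, an @, one or more characters, a literal dot, one or more characters"); this is a loose plausibility check, not a full email validator, and the helper name looks_like_email is hypothetical.

```python
import re

# Assumed reading of the example pattern. The raw string keeps the
# backslash intact, so \. reaches the regex engine as "a literal dot".
EMAIL_LIKE = re.compile(r"^.+@.+\..+$")

def looks_like_email(text: str) -> bool:
    """Hypothetical helper: True if text looks like something@something.something."""
    return EMAIL_LIKE.match(text) is not None

print(looks_like_email("user@example.com"))  # True
print(looks_like_email("user@example"))      # False: no dot after the @
print(looks_like_email("@example.com"))      # False: nothing before the @
```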
Note: Section 7 of this article may seem to repeat earlier material. Its purpose is to re-describe parts of the earlier tables so that they are easier to understand.
2. History of Regular Expressions
The "ancestors" of regular expressions can be traced back to early research into how the human nervous system works. Two neuroscientists, Warren McCulloch and Walter Pitts, developed a mathematical way of describing these neural networks.
In 1956, the mathematician Stephen Kleene, building on the earlier work of McCulloch and Pitts, published a paper titled "Representation of Events in Nerve Nets and Finite Automata", which introduced the concept of regular expressions. A regular expression is an expression describing what he called "the algebra of regular sets", hence the term "regular expression".
This work was later applied in early research on computational search algorithms by Ken Thompson, the principal inventor of Unix. The first practical application of regular expressions was Thompson's QED text editor.
As the saying goes, the rest is history. From then until now, regular expressions have been an important part of text-based editors and search tools.
3. Definition of regular expression
A regular expression describes a string-matching pattern. It can be used to check whether a string contains a particular substring, to replace matching substrings, or to extract the substrings that satisfy a condition.
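The three uses named above (check, replace, extract) map directly onto three functions of Python's standard re module. The following is a minimal sketch; the sample text and the digit pattern are illustrative choices, not taken from the article.

```python
import re

text = "Order 42 shipped on 2024-05-01; order 7 is pending."

# 1. Check: does the string contain a match at all?
has_number = re.search(r"\d+", text) is not None

# 2. Replace: substitute every matching substring (here, each digit).
masked = re.sub(r"\d", "#", text)

# 3. Extract: collect every substring that matches.
numbers = re.findall(r"\d+", text)

print(has_number)  # True
print(masked)      # Order ## shipped on ####-##-##; order # is pending.
print(numbers)     # ['42', '2024', '05', '01', '7']
```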
When listing a directory, the *.txt in dir *.txt or ls *.txt is not a regular expression, because * there means something different from * in a regular expression.
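The difference can be seen in Python, using the standard fnmatch module as a stand-in for shell wildcard matching (an assumption: real shells differ in details such as hidden files). A minimal sketch:

```python
import fnmatch
import re

filename = "notes.txt"

# Wildcard: * means "any run of characters".
print(fnmatch.fnmatchcase(filename, "*.txt"))      # True

# Regex equivalent: .* for "anything", \. for a literal dot.
print(bool(re.fullmatch(r".*\.txt", filename)))    # True

# fnmatch can show the regular expression it builds from the wildcard.
print(fnmatch.translate("*.txt"))

# As a regex, a leading * has nothing to repeat, so compiling fails.
try:
    re.compile("*.txt")
except re.error as exc:
    print("not a valid regex:", exc)
```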
A regular expression is a text pattern made up of ordinary characters (for example, the letters a through z) and special characters called metacharacters. The regular expression serves as a template that is matched against the string being searched.
3.1 Ordinary characters
Ordinary characters are all printable and non-printable characters that are not explicitly designated as metacharacters. This includes all uppercase and lowercase letters, all digits, all punctuation marks, and some other symbols.
3.2 Non-printing characters
Character    Meaning
\cx    Matches the control character indicated by x. For example, \cM matches Control-M, that is, a carriage return. The value of x must be in the range A-Z or a-z; otherwise, c is treated as a literal 'c' character.
\f    Matches a form feed. Equivalent to \x0c and \cL.
\n    Matches a newline (line feed). Equivalent to \x0a and \cJ.
\r    Matches a carriage return. Equivalent to \x0d and \cM.
\s    Matches any whitespace character, including space, tab, form feed, and so on. Equivalent to [ \f\n\r\t\v].
\S    Matches any non-whitespace character. Equivalent to [^ \f\n\r\t\v].
\t    Matches a tab. Equivalent to \x09 and \cI.
\v    Matches a vertical tab. Equivalent to \x0b and \cK.
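A few rows of this table can be checked in Python's re module. This is a sketch under the assumption that Python is an acceptable stand-in for the engines the article describes; note two Python-specific differences shown in the code: \s also matches Unicode whitespace unless re.ASCII is set, and \cX is not supported at all.

```python
import re

sample = "a\tb\nc\r\nd"

# \S+ collects the runs of non-whitespace between whitespace characters.
print(re.findall(r"\S+", sample))  # ['a', 'b', 'c', 'd']

# \t and \x09 are two spellings of the same tab character.
print(bool(re.fullmatch(r"\t", "\t")), bool(re.fullmatch(r"\x09", "\t")))  # True True

# The whitespace class from the table, written out explicitly.
ascii_space = re.compile(r"[ \f\n\r\t\v]")
print(len(ascii_space.findall(sample)))  # 4

# Python's \s is Unicode-aware: it also matches a no-break space (U+00A0)
# unless the re.ASCII flag restricts it to the ASCII set above.
print(bool(re.fullmatch(r"\s", "\u00a0")))            # True
print(bool(re.fullmatch(r"\s", "\u00a0", re.ASCII)))  # False

# Python has no \cX escape; the pattern is rejected at compile time.
try:
    re.compile(r"\cM")
except re.error as exc:
    print("unsupported:", exc)
```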
3.3 Special characters
Special characters are characters with a special meaning, such as the * in "*.txt" above, which simply means "any string". To refer to a literal * (for example, a file actually named *.txt), you must escape it by putting a \ in front of it: ls \*.txt. Regular expressions have the following special characters.
Special character    Description
$    Matches the end of the input string. If the Multiline property of the RegExp object is set, $ also matches the position before '\n' or '\r'. To match the $ character itself, use \$.
( )    Mark the start and end of a subexpression. The subexpression's match can be captured for later use. To match these characters, use \( and \).
*    Matches the preceding subexpression zero or more times. To match the * character, use \*.
+    Matches the preceding subexpression one or more times. To match the + character, use \+.
.    Matches any single character except the newline \n. To match ., use \.
[    Marks the start of a bracket expression. To match [, use \[.
?    Matches the preceding subexpression zero or one time, or indicates a non-greedy qualifier. To match the ? character, use \?.
\    Marks the next character as a special character, a literal character, a backreference, or an octal escape. For example, 'n' matches the character 'n', while '\n' matches a newline. The sequence '\\' matches '\', and '\(' matches '('.
^    Matches the start of the input string, except inside a bracket expression, where it negates the character set (matches any character not in the set). To match the ^ character itself, use \^.
{    Marks the start of a qualifier expression. To match {, use \{.
|    Indicates a choice between two alternatives. To match |, use \|.
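A minimal Python sketch of escaping and of the bracket-negation rule for ^. It uses the standard re.escape helper, which backslash-escapes regex metacharacters in a literal string; the sample strings are illustrative.

```python
import re

# Escaped, $ and . are literal characters instead of an anchor and a wildcard.
print(re.findall(r"\$\d+\.\d\d", "Total: $15.99, tax $1.20"))  # ['$15.99', '$1.20']

# Inside brackets, a leading ^ negates the set: "any character that is not a digit".
print(re.findall(r"[^0-9]", "a1b2"))      # ['a', 'b']

# | chooses between alternatives; \| matches a literal pipe.
print(re.findall(r"cat|dog", "cat|dog"))  # ['cat', 'dog']
print(re.findall(r"\|", "cat|dog"))       # ['|']

# re.escape adds the backslashes for you when matching a literal string.
literal = "a+b(c)?"
print(bool(re.fullmatch(re.escape(literal), literal)))  # True
```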
Regular expressions are constructed the same way as mathematical expressions: small expressions are combined with metacharacters and operators to build larger ones. The components of a regular expression can be a single character, a character set, a character range, a choice between characters, or any combination of these.
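As an illustration of building a larger expression from smaller ones, here is a Python sketch that assembles a date pattern from three pieces by plain string concatenation. The pieces are hypothetical examples; they use the non-capturing group (?:...) described in section 3.6, and the result does not check days per month (it accepts 2024-02-31).

```python
import re

# Small building blocks, each a valid pattern on its own.
year = r"\d{4}"                            # exactly four digits
month = r"(?:0[1-9]|1[0-2])"               # 01 to 12
day = r"(?:0[1-9]|[12]\d|3[01])"           # 01 to 31

# Combine them with literal '-' characters into a larger expression.
iso_date = re.compile(year + "-" + month + "-" + day)

print(bool(iso_date.fullmatch("2024-05-01")))  # True
print(bool(iso_date.fullmatch("2024-13-01")))  # False: month 13 is rejected
print(bool(iso_date.fullmatch("2024-05-32")))  # False: day 32 is rejected
```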
3.4 Qualifiers
A qualifier specifies how many times a given component of a regular expression must occur for a match to succeed. There are six kinds in total: *, +, ?, {n}, {n,}, and {n,m}.
The *, +, and ? qualifiers are greedy: they match as much text as possible. Adding a ? after them makes them non-greedy, so they perform a minimal match instead.
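The difference between greedy and non-greedy matching is easiest to see on a string with two possible end points. A minimal Python sketch, using an HTML-like string purely as sample text (regular expressions are not a general HTML parser):

```python
import re

html = "<h1>Title</h1>"

# Greedy: .* runs as far as it can, so one match spans both tags.
print(re.findall(r"<.*>", html))   # ['<h1>Title</h1>']

# Non-greedy: .*? stops at the first '>' that lets the match succeed.
print(re.findall(r"<.*?>", html))  # ['<h1>', '</h1>']
```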
The qualifiers for regular expressions are:
Character    Description
*    Matches the preceding subexpression zero or more times. For example, zo* matches "z" and "zoo". * is equivalent to {0,}.
+    Matches the preceding subexpression one or more times. For example, 'zo+' matches "zo" and "zoo", but not "z". + is equivalent to {1,}.
?    Matches the preceding subexpression zero times or once. For example, "do(es)?" matches "do" or "does". ? is equivalent to {0,1}.
{n}    n is a non-negative integer. Matches exactly n times. For example, 'o{2}' does not match the 'o' in "Bob", but matches the two o's in "food".
{n,}    n is a non-negative integer. Matches at least n times. For example, 'o{2,}' does not match the 'o' in "Bob", but matches all the o's in "foooood". 'o{1,}' is equivalent to 'o+'. 'o{0,}' is equivalent to 'o*'.
{n,m}    m and n are non-negative integers, where n <= m. Matches at least n and at most m times. For example, 'o{1,3}' matches the first three o's in "fooooood". 'o{0,1}' is equivalent to 'o?'. Note that there must be no space between the comma and the numbers.
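The examples in the table can be replayed with Python's re.search, which returns the leftmost match. A minimal sketch; first_match is a hypothetical helper introduced only for this demonstration.

```python
import re

def first_match(pattern: str, text: str):
    """Hypothetical helper: the leftmost match of pattern in text, or None."""
    m = re.search(pattern, text)
    return m.group() if m else None

examples = [
    ("zo*", "z"),            # * allows zero o's
    ("zo*", "zoo"),
    ("zo+", "z"),            # + needs at least one o, so no match
    ("do(es)?", "does"),     # ? makes the group optional
    ("o{2}", "Bob"),         # only one o, so no match
    ("o{2}", "food"),
    ("o{2,}", "foooood"),    # greedy: takes all five o's
    ("o{1,3}", "fooooood"),  # stops after three o's
]

for pattern, text in examples:
    print(f"{pattern!r:10} in {text!r:12} -> {first_match(pattern, text)!r}")
```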
3.5 Locators
Locators describe the boundaries of a string or a word. ^ and $ denote the start and end of a string, \b denotes a word boundary (the position before or after a word), and \B denotes a non-word boundary. Qualifiers cannot be applied to locators.
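A short Python sketch of the four locators on one sample string; the positions and results below follow from where the word boundaries fall in that string.

```python
import re

text = "cat concat cats cat."

# \b...\b: "cat" only as a whole word.
print(re.findall(r"\bcat\b", text))                      # ['cat', 'cat']

# \B: "cat" that does NOT start at a word boundary (here, inside "concat").
print([m.start() for m in re.finditer(r"\Bcat", text)])  # [7]

# ^ and $ anchor to the start and end of the whole string.
print(bool(re.search(r"^cat", text)), bool(re.search(r"cat$", text)))  # True False
```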
3.6 Alternation
Enclose all the alternatives in parentheses and separate adjacent alternatives with |. Parentheses have a side effect, however: the corresponding match is captured (cached). Placing ?: right after the opening parenthesis eliminates this side effect.
?: is one of the non-capturing elements. There are two more, ?= and ?!, which carry additional meaning. The former is a positive lookahead: it matches the search string at any position where the pattern inside the parentheses begins to match. The latter is a negative lookahead: it matches the search string at any position where that pattern does not begin to match. Neither lookahead consumes any characters.
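A minimal Python sketch of capturing versus non-capturing groups and of the two lookaheads. The "Windows" strings are illustrative sample data.

```python
import re

# Capturing group: the chosen alternative is stored.
print(re.match(r"(cat|dog)s", "dogs").groups())    # ('dog',)

# Non-capturing group (?:...): same match, nothing stored.
print(re.match(r"(?:cat|dog)s", "dogs").groups())  # ()

text = "Windows98 Windows2000"

# Positive lookahead: "Windows" only where 95, 98 or NT follows.
print(re.search(r"Windows(?=95|98|NT)", text).start())  # 0

# Negative lookahead: "Windows" only where none of them follows.
print(re.search(r"Windows(?!95|98|NT)", text).start())  # 10

# Lookaheads consume nothing: the match itself is just "Windows".
print(re.search(r"Windows(?=95|98|NT)", text).group())  # Windows
```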
3.7 Backreferences
Adding parentheses around a regular expression pattern, or part of one, causes the corresponding match to be stored in a temporary buffer. Each captured submatch is stored in the order in which it is encountered, from left to right, in the pattern. Buffer numbers start at 1 and run consecutively up to a maximum of 99 captured subexpressions. Each buffer can be accessed with '\n', where n is a one- or two-digit decimal number that identifies a particular buffer.
You can use the non-capturing metacharacters '?:', '?=', or '?!' to avoid saving the corresponding match.
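A classic use of a backreference is finding doubled words. This Python sketch captures a word into buffer 1 and uses \1 to require the same text again; the same \1 syntax in the replacement string refers back to the captured word. The matching is case-sensitive, so "The the" would not be caught without the re.IGNORECASE flag.

```python
import re

# (\w+) stores a word in buffer 1; \1 requires the same text to repeat.
doubled = re.compile(r"\b(\w+) \1\b")

text = "the the cost of of gas"
print(doubled.findall(text))     # ['the', 'of']
print(doubled.sub(r"\1", text))  # the cost of gas
```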
4. Operator precedence
Operators of equal precedence are evaluated from left to right; operators of different precedence are evaluated from highest to lowest. The precedence of the operators, from highest to lowest, is as follows:
Operator    Description
\    Escape character
(), (?:), (?=), []    Parentheses and square brackets
*, +, ?, {n}, {n,}, {n,m}    Qualifiers
^, $, \anymetacharacter    Positions and sequence
|    Alternation ("or")
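Two consequences of this precedence order, sketched in Python (whose re module groups these operators the same way): | binds most loosely, so anchors attach to only one alternative unless you add a group, and qualifiers bind more tightly than sequence, so ab+ repeats only the b.

```python
import re

# | has the lowest precedence: this means (^cat) OR (dog$).
loose = re.compile(r"^cat|dog$")
# A group makes both anchors apply to both alternatives.
strict = re.compile(r"^(?:cat|dog)$")

print(bool(loose.search("hotdog")))   # True: "dog$" alone matches
print(bool(strict.search("hotdog")))  # False

# Qualifiers bind tighter than sequence: ab+ repeats only b.
print(bool(re.fullmatch(r"ab+", "abbb")))      # True
print(bool(re.fullmatch(r"ab+", "abab")))      # False
print(bool(re.fullmatch(r"(?:ab)+", "abab")))  # True
```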