Regular expression page 1/2

Source: Internet
Author: User
Tags: printable characters

Preface
Regular expressions are cumbersome, but they are powerful. Putting them to use once you have learned them improves your efficiency and brings a real sense of accomplishment. If you read this material carefully and refer back to it when applying them, mastering regular expressions is not a problem.
1. Introduction
Regular expressions are now widely used in software, including operating systems such as *nix (Linux, Unix, etc.) and HP, and development environments such as PHP, C#, and Java; traces of them can be found in a great many applications.
Regular expressions achieve powerful results with very compact notation. That combination of brevity and power is also what makes regular expression code hard to read and hard to learn, so some effort is required. Once you have gotten started, and with a reference at hand, they are relatively simple and effective to use.
Example: ^.+@.+\..+$
Code like this has scared me off many times, and it probably scares many other people away too. Keep reading, and you will be able to apply such code freely.
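As a rough illustration, the pattern ^.+@.+\..+$ can be read piece by piece: ^ and $ tie the pattern to the start and end of the string, .+ means one or more of any character, @ is a literal at-sign, and \. is a literal dot. The sketch below uses Python, which is an assumption (the article names no particular language), and the sample strings and the helper name looks_like_email are made up for the example.

```python
import re

# The pattern from the example: something, "@", something, ".", something.
EMAIL_LIKE = re.compile(r"^.+@.+\..+$")

def looks_like_email(text):
    """Return True if text has the rough shape name@host.tld."""
    return EMAIL_LIKE.search(text) is not None

for sample in ["user@example.com", "user@example", "@example.com"]:
    print(sample, looks_like_email(sample))
```

The pattern only checks the overall shape; it is far too loose to validate real e-mail addresses.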
Note: Part 1 here may seem a bit repetitive. Its aim is to describe the content of the previous table again so that it is easier to understand.
2. Regular Expression history
The "ancestors" of regular expressions can be traced back to early research on how the human nervous system works. Warren McCulloch, a neurophysiologist, and Walter Pitts, a logician, developed a mathematical way of describing these neural networks.
In 1956, a mathematician named Stephen Kleene, building on the earlier work of McCulloch and Pitts, published a paper titled "Representation of Events in Nerve Nets and Finite Automata" that introduced the concept of regular expressions. Regular expressions were expressions used to describe what he called "the algebra of regular sets", hence the term "regular expression".
Later, this work was found to be applicable to early research on computational search algorithms by Ken Thompson, the principal inventor of Unix. The first practical application of regular expressions was in the qed editor.
As they say, the rest is history. Regular expressions have been an important part of text-based editors and search tools ever since.
3. Regular Expression Definition
A regular expression describes a pattern for matching strings. It can be used to check whether a string contains a certain substring, to replace matched substrings, or to retrieve substrings that meet certain conditions from a string.
When listing a directory, the *.txt in dir *.txt or ls *.txt is not a regular expression, because the * there does not mean the same thing as the * in a regular expression.
A regular expression is a text pattern made up of ordinary characters (such as the characters a to z) and special characters (called metacharacters). It acts as a template that matches a character pattern against the string being searched.
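The three uses named above (checking, replacing, retrieving) and the contrast with shell wildcards can be sketched in Python as follows. The language choice, the sample file names, and the variable names are assumptions made for illustration; re and fnmatch are standard-library modules.

```python
import fnmatch
import re

text = "report.txt notes.md summary.txt"

# 1. Check whether the string contains a substring that fits the pattern.
has_txt = re.search(r"\w+\.txt", text) is not None

# 2. Replace every matched substring.
renamed = re.sub(r"\.txt\b", ".bak", text)

# 3. Retrieve all substrings that meet the condition.
txt_files = re.findall(r"\w+\.txt", text)

# Contrast with a shell-style wildcard: in "*.txt" the * stands for any
# run of characters by itself, whereas a regex * repeats the item before it.
glob_hit = fnmatch.fnmatch("report.txt", "*.txt")
regex_hit = re.fullmatch(r".*\.txt", "report.txt") is not None

print(has_txt, renamed, txt_files, glob_hit, regex_hit)
```

Note that the wildcard *.txt has to be written as .*\.txt to express roughly the same thing as a regular expression.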
3.1 Ordinary characters
Ordinary characters are all printable and non-printable characters that are not explicitly designated as metacharacters. This includes all uppercase and lowercase letters, all digits, all punctuation marks, and some symbols.
3.2 Non-printable characters
\cx Matches the control character specified by x. For example, \cM matches a Control-M or carriage return character. The value of x must be in the range A-Z or a-z. Otherwise, c is treated as a literal 'c' character.
\f Matches a form feed. Equivalent to \x0c and \cL.
\n Matches a line feed. Equivalent to \x0a and \cJ.
\r Matches a carriage return. Equivalent to \x0d and \cM.
\s Matches any whitespace character, including spaces, tabs, and form feeds. Equivalent to [ \f\n\r\t\v].
\S Matches any non-whitespace character. Equivalent to [^ \f\n\r\t\v].
\t Matches a tab. Equivalent to \x09 and \cI.
\v Matches a vertical tab. Equivalent to \x0b and \cK.
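The equivalences in the list above can be checked with a short Python sketch. Python is an assumption here, and its re module differs from the engine described in two ways worth stating: the \cx form is left out because Python's re does not appear to support it, and \s on text strings also covers other Unicode whitespace unless the re.ASCII flag is given.

```python
import re

# Each escape is paired with the hexadecimal form given in the list above.
pairs = {
    r"\f": "\x0c",  # form feed
    r"\n": "\x0a",  # line feed
    r"\r": "\x0d",  # carriage return
    r"\t": "\x09",  # tab
    r"\v": "\x0b",  # vertical tab
}
all_equivalent = all(re.fullmatch(p, ch) is not None for p, ch in pairs.items())

# With re.ASCII, \s is limited to space, \t, \n, \r, \f and \v.
sample = "a b\tc\nd"
blanks = re.findall(r"\s", sample, flags=re.ASCII)
non_blanks = re.findall(r"\S", sample, flags=re.ASCII)

print(all_equivalent, blanks, non_blanks)
```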
3.3 Special characters
Special characters are characters that carry a special meaning, such as the * in *.txt, which, put simply, stands for any string. To find a file with a * in its name, you need to escape the *, that is, put a \ before it: ls \*.txt. Regular expressions have the following special characters.
 
Special character Description
$ Matches the end of the input string. If the Multiline property of the RegExp object is set, $ also matches the position before '\n' or '\r'. To match the $ character itself, use \$.
( ) Mark the start and end of a subexpression. The subexpression can be captured for later use. To match these characters, use \( and \).
* Matches the preceding subexpression zero or more times. To match the * character, use \*.
+ Matches the preceding subexpression one or more times. To match the + character, use \+.
. Matches any single character except the line feed \n. To match ., use \.
[ Marks the start of a bracket expression. To match [, use \[.
? Matches the preceding subexpression zero or one time, or indicates a non-greedy quantifier. To match the ? character, use \?.
\ Marks the next character as a special character, a literal character, a backreference, or an octal escape. For example, 'n' matches the character 'n', while '\n' matches a line feed. The sequence '\\' matches "\", while '\(' matches "(".
^ Matches the start of the input string, unless it is used inside a bracket expression, in which case it negates the character set. To match the ^ character itself, use \^.
{ Marks the start of a quantifier expression. To match {, use \{.
| Indicates a choice between two items. To match |, use \|.
A regular expression is constructed the same way as a mathematical expression: smaller expressions are combined with metacharacters and operators to create larger ones. A component of a regular expression can be a single character, a set of characters, a range of characters, a choice between characters, or any combination of these.
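As a minimal sketch of both ideas, building a larger expression from smaller ones and escaping a special character to match it literally, consider the Python example below. The language, the CSS-like sample text, and the names DIGITS, UNIT, and LENGTH are assumptions chosen for illustration; re.escape is a standard-library helper that adds the backslashes for you.

```python
import re

# Small building blocks (names are illustrative).
DIGITS = r"[0-9]+"       # a character range plus a repetition operator
UNIT = r"(?:px|em|%)"    # a choice between alternatives
LENGTH = DIGITS + UNIT   # a larger expression built from smaller ones

lengths = re.findall(LENGTH, "width: 100px; margin: 2em; top: 50%")

# A special character must be escaped to be matched literally.
stars = re.findall(r"\*", "a*b*c")
literal = re.escape("*.txt")  # lets the library add the backslashes
star_name = re.fullmatch(literal, "*.txt") is not None
other_name = re.fullmatch(literal, "a.txt") is not None

print(lengths, stars, star_name, other_name)
```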
 
3.4 Quantifiers
A quantifier specifies how many times a given component of a regular expression must occur for a match to succeed. There are six kinds: *, +, ?, {n}, {n,}, and {n,m}.
The quantifiers *, +, and ? are greedy because they match as much text as possible. Appending a ? to any of them makes the match non-greedy, or minimal.
Regular expressions have the following quantifiers:
 
Character Description
* Matches the preceding subexpression zero or more times. For example, zo* matches "z" and "zoo". * is equivalent to {0,}.
+ Matches the preceding subexpression one or more times. For example, 'zo+' matches "zo" and "zoo", but not "z". + is equivalent to {1,}.
? Matches the preceding subexpression zero or one time. For example, "do(es)?" matches "do" and "does". ? is equivalent to {0,1}.
{n} n is a non-negative integer. Matches exactly n times. For example, 'o{2}' does not match the 'o' in "Bob", but matches the two o's in "food".
{n,} n is a non-negative integer. Matches at least n times. For example, 'o{2,}' does not match the 'o' in "Bob", but matches all the o's in "foooood". 'o{1,}' is equivalent to 'o+'. 'o{0,}' is equivalent to 'o*'.
{n,m} m and n are non-negative integers, where n <= m. Matches at least n and at most m times. For example, "o{1,3}" matches the first three o's in "fooooood". 'o{0,1}' is equivalent to 'o?'. Note that there must be no space between the comma and the numbers.
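The examples in the table above, together with the greedy versus non-greedy distinction, can be tried out with a short Python sketch. Python and the helper name full are assumptions made for illustration; the patterns and most sample strings come from the table, and the quoted-text sample at the end is made up.

```python
import re

def full(pattern, text):
    """True if the whole text matches the pattern."""
    return re.fullmatch(pattern, text) is not None

star = (full(r"zo*", "z"), full(r"zo*", "zoo"))               # zero or more
plus = (full(r"zo+", "zo"), full(r"zo+", "zoo"), full(r"zo+", "z"))
optional = (full(r"do(es)?", "do"), full(r"do(es)?", "does"))  # zero or one

exact_bob = re.search(r"o{2}", "Bob")                # no two o's in a row
exact_food = re.search(r"o{2}", "food").group()
at_least = re.search(r"o{2,}", "foooood").group()
bounded = re.search(r"o{1,3}", "fooooood").group()

# Greedy versus non-greedy: the same text, without and with a trailing ?.
quoted = 'say "hi" and "bye"'
greedy = re.search(r'".+"', quoted).group()
lazy = re.search(r'".+?"', quoted).group()

print(star, plus, optional, exact_bob, exact_food, at_least, bounded)
print(greedy, "|", lazy)
```

The greedy form runs from the first quotation mark to the last one, while the non-greedy form stops at the nearest closing quotation mark.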
3.5 Anchors
Anchors describe the boundary of a string or a word. ^ and $ refer to the start and end of a string respectively, \b describes the boundary before or after a word, and \B denotes a non-word boundary. Quantifiers cannot be applied to anchors.
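A small Python sketch of the boundary markers described above follows. The language and the sample text are assumptions chosen for illustration.

```python
import re

text = "cat concat category"

whole_word = re.findall(r"\bcat\b", text)   # "cat" as a standalone word
word_start = re.findall(r"\bcat\w*", text)  # words that begin with "cat"
word_end = re.findall(r"\Bcat\b", text)     # "cat" ending a longer word

starts = re.search(r"^cat", text) is not None  # ^ : start of the string
ends = re.search(r"cat$", text) is not None    # $ : end of the string

print(whole_word, word_start, word_end, starts, ends)
```

Only the first word counts as a whole-word match, and the text does not end in "cat", so the $ test fails.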
3.6 Alternation
Enclose all the alternatives in parentheses and separate adjacent alternatives with |. Parentheses have a side effect, however: the match is captured and stored. To eliminate this side effect, put ?: at the start of the group, before the first alternative.
?: is one of the non-capturing constructs. There are two others, ?= and ?!, which carry additional meaning. The former is a positive lookahead: it matches the search string at any position where the pattern in parentheses begins to match. The latter is a negative lookahead: it matches the search string at any position where the pattern in parentheses does not match.
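The difference between a capturing group, a non-capturing group, and the two lookahead forms can be sketched in Python as below. The language, the sample strings, and the variable names are assumptions made for illustration.

```python
import re

# Capturing group: the chosen alternative is stored as group 1.
m = re.search(r"(gray|grey) cat", "a grey cat")
captured = m.groups()

# Non-capturing group: the same text matches, but nothing is stored.
m2 = re.search(r"(?:gray|grey) cat", "a grey cat")
not_captured = m2.groups()

versions = "Windows 95, Windows 3.1, Windows NT"

# Positive lookahead: "Windows " only where 95, 98 or NT follows.
followed = [x.start() for x in re.finditer(r"Windows (?=95|98|NT)", versions)]

# Negative lookahead: "Windows " only where none of those follows.
not_followed = [x.start() for x in re.finditer(r"Windows (?!95|98|NT)", versions)]

print(captured, not_captured, followed, not_followed)
```

In both lookahead cases the matched text is just "Windows "; the version that follows is checked but not consumed.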
3.7 Backreferences
Placing parentheses around a regular expression pattern, or part of one, causes the matched text to be stored in a temporary buffer. Each captured submatch is stored in the order in which it is encountered, from left to right, in the pattern. Buffer numbers start at 1 and run consecutively up to a maximum of 99 subexpressions. Each buffer can be accessed with '\n', where n is a one- or two-digit decimal number that identifies a specific buffer.
You can use the non-capturing metacharacters '?:', '?=', or '?!' to suppress saving of the match.
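A common use of a stored submatch is finding a word that is accidentally repeated. The Python sketch below is one way to do it, under the assumption that a "word" is a run of \w characters; the language, the sample sentence, and the name doubled are illustrative.

```python
import re

# (\w+) captures a word into buffer 1; \1 requires the same text again.
doubled = re.compile(r"\b(\w+)\s+\1\b")

text = "this is is a test of the the parser"
repeats = [m.group(1) for m in doubled.finditer(text)]

# The captured text can also be reused in a replacement string.
fixed = doubled.sub(r"\1", text)

print(repeats)
print(fixed)
```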
4. Operator precedence
Operations with the same precedence are evaluated from left to right. Operations with different precedence are evaluated from highest to lowest. The precedence of the operators, from highest to lowest, is as follows:
 
Operator Description
\ Escape character
(), (?:), (?=), [] Parentheses and square brackets
*, +, ?, {n}, {n,}, {n,m} Quantifiers
^, $, \anymetacharacter Anchors and sequences
| "Or" operation
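Two consequences of this ordering are easy to get wrong: | binds more loosely than anchors and sequences, and a repetition operator binds more tightly than a sequence. The Python sketch below illustrates both; the language, patterns, and sample strings are assumptions chosen for the example.

```python
import re

samples = ["ab", "abxx", "xxcd", "cd"]

# | has the lowest precedence, so this reads as (^ab) or (cd$).
loose = r"^ab|cd$"
# Grouping first makes both anchors apply to the whole choice.
tight = r"^(?:ab|cd)$"

loose_hits = [s for s in samples if re.search(loose, s)]
tight_hits = [s for s in samples if re.search(tight, s)]

# Repetition binds more tightly than sequence: ab+ is a(b+), not (ab)+.
seq = re.fullmatch(r"ab+", "abbb") is not None
seq_on_pairs = re.fullmatch(r"ab+", "abab") is not None
grouped = re.fullmatch(r"(?:ab)+", "abab") is not None

print(loose_hits, tight_hits, seq, seq_on_pairs, grouped)
```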
