Basic regular expressions every programmer should master

Source: Internet
Author: User
From: xuejinyoulan

Source: it168
What is a regular expression?
A regular expression is a string written in a special pattern syntax. It is mainly used to describe and parse text. Many programmers (even some very good ones) ignore regular expressions or never use them. I think this is a shame, because regular expressions often make a problem easy to solve. Once you have mastered them, you will find they can solve countless real-world problems.

Regular expressions work like filename wildcards in Windows or *nix systems, where you can use * or ? to specify a set of files. Regular expressions, however, use their special characters, called metacharacters, to describe patterns far more precisely.

A regular expression treats most characters as literals. The regular expression mike, for example, matches only the character sequence m-i-k-e, in that order. In addition, regular expressions use an extended set of metacharacters to express very complex text matching.
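As a minimal sketch of literal matching, the snippet below uses Python's standard re module. The article does not name a language, so Python and the sample strings are assumptions made purely for illustration.

```python
import re

# A pattern made only of literal characters matches that exact sequence.
pattern = re.compile(r"mike")

# Hypothetical sample strings, chosen for illustration.
samples = ["mike", "carmike", "Mike", "m-i-k-e"]

# search() looks for the sequence anywhere in the string.
# Matching is case-sensitive by default, so "Mike" is not a hit.
hits = [s for s in samples if pattern.search(s)]
print(hits)  # ['mike', 'carmike']
```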

Metacharacters: ^ [ ] ( ) { } . * ? \ | + $ and -
I know they look terrible, but once you understand them, you will find them rather lovable.

Line anchors: '^' and '$'
The '^' (read as: caret) and '$' (read as: dollar) metacharacters represent the beginning and the end of a line of text, respectively. As in the previous example, the regular expression mike matches the character sequence m-i-k-e, but it matches that sequence anywhere in a line (for example, it matches "I'm mike" or "carmike"). The '^' character anchors the match to the start of a line, so ^mike only finds lines that start with mike. Similarly, the expression mike$ only finds m-i-k-e at the end of a line (so it still matches 'carmike').

If we use the two line anchors together, we can search multi-line text for lines that consist of exactly a given string. For example, the expression ^mike$ matches only a line that contains the word mike and nothing else. Similarly, the expression ^$ is useful for finding empty lines (where the start of the line is immediately followed by its end).
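The anchors described above can be tried out with a short sketch in Python's re module (an assumption, since the article is language-neutral). In Python, ^ and $ match at every line boundary only when the re.MULTILINE flag is set; the sample text is made up for illustration.

```python
import re

# Hypothetical five-line text; the fourth line is empty.
text = "mike\nI'm mike\ncarmike\n\nmike was here"

# re.MULTILINE makes ^ and $ match at the start and end of every line,
# not only at the start and end of the whole string.
starts_with = re.findall(r"^mike.*$", text, flags=re.MULTILINE)
ends_with = re.findall(r"^.*mike$", text, flags=re.MULTILINE)
whole_line = re.findall(r"^mike$", text, flags=re.MULTILINE)
empty_lines = re.findall(r"^$", text, flags=re.MULTILINE)

print(starts_with)       # ['mike', 'mike was here']
print(ends_with)         # ['mike', "I'm mike", 'carmike']
print(whole_line)        # ['mike']
print(len(empty_lines))  # 1
```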

Character classes: '[]'
Square brackets define a character class, which matches any one of the characters listed inside it. Suppose you want to match the word 'grey' and also its alternative spelling 'gray'. A character class lets you match both: the regular expression gr[ea]y is interpreted as "match a g, followed by an r, followed by either an e or an a, followed by a y".

If you use [^...] instead of [...], the class matches any character that is not listed. The leading ^ negates the list: instead of listing the characters you want to include, you list the ones you want to exclude. Note that the ^ (caret) used here has a different meaning outside a character class, where it matches the start of a line (see the previous section).
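A small sketch of both forms of character class, again assuming Python's re module; the input strings are invented for the example.

```python
import re

# gr[ea]y: a g, an r, then exactly one of e or a, then a y.
spellings = re.findall(r"gr[ea]y", "grey gray groy griy")
print(spellings)  # ['grey', 'gray']

# [^u] is a negated class: any single character that is not u.
# A negated class still has to consume one character, so the q at
# the very end of "iraq" does not match q[^u].
q_not_u = re.findall(r"q[^u]", "quit qatar iraq")
print(q_not_u)  # ['qa']
```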

Metacharacter inside a character class: '-'
Inside a character class, the metacharacter '-' (dash) indicates a range of characters. Consider the character class [0123456789abcdefABCDEF]. Using '-', we can write it as [0-9a-fA-F], which is much more convenient. Note that '-' is a metacharacter only inside a character class. Everywhere else it simply matches an ordinary '-' character and has no other meaning.

But wait, I see a hand going up with a question: what if '-' is the first character in a character class, as in [-A-F]? Good question. This is an exception: when '-' is the first character in a character class, it is treated as an ordinary character rather than a metacharacter (it cannot denote a range, because a range needs both a start and an end character), so it simply matches a literal '-'. Here is another exception: '?' and '.' are metacharacters in most contexts, but not inside a character class. In a class such as [-0-9.?], they stand for ordinary characters, and the only metacharacter is the '-' between 0 and 9.
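The range and the leading-dash exception can be checked with a short sketch, assuming Python's re module. The + quantifier after each class (one or more) is only there to collect runs of characters, and the test strings are hypothetical.

```python
import re

# [0-9a-fA-F] is shorthand for listing all 22 hexadecimal digits.
hex_digits = re.compile(r"[0-9a-fA-F]+")
is_hex = hex_digits.fullmatch("DeadBeef42") is not None
not_hex = hex_digits.fullmatch("xyz") is None
print(is_hex, not_hex)  # True True

# In [-0-9.?] the leading '-' is a literal dash, and '.' and '?' are
# literal too. Only the '-' between 0 and 9 denotes a range.
tokens = re.findall(r"[-0-9.?]+", "temp: -12.5? ok")
print(tokens)  # ['-12.5?']
```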

Matching any character with a dot: '.'
The '.' metacharacter (usually read as dot or point) matches any single character (in most tools, any character except a newline). It comes in handy when you want to allow any character at a given position in a string. Again, inside a character class, '.' is not a metacharacter. Are you starting to see the pattern? Which characters count as metacharacters differs inside and outside a character class.
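The following sketch contrasts the dot outside and inside a character class. It assumes Python's re module, where the dot does not match a newline unless the re.DOTALL flag is given; the sample strings are made up.

```python
import re

# Outside a class, '.' stands for any single character.
any_middle = re.findall(r"c.t", "cat cot c-t ct cart")
print(any_middle)  # ['cat', 'cot', 'c-t']

# Inside a class, '.' is just a literal dot.
literal_dot = re.findall(r"c[.]t", "cat c.t")
print(literal_dot)  # ['c.t']

# By default Python's '.' does not match a newline.
across_newline = re.search(r"a.b", "a\nb")
print(across_newline)  # None
```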

Alternation metacharacter: '|'
The '|' metacharacter (read as pipe) means "or". It lets you combine several expressions into one, which then matches whatever any one of them matches. These subexpressions are called alternatives.

For example, mike and michael are two separate regular expressions, but written as mike|michael they form one regular expression that matches either word.

Parentheses can be used to limit the scope of the alternatives. With parentheses we can achieve the same result as the regular expression above in fewer characters: the regular expression mi(ke|chael) matches mike or michael. In real programs I still use the first form, though. It is longer, but it is easier to understand and therefore easier to maintain.
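To see that the long and the parenthesized form accept the same words, here is a minimal sketch assuming Python's re module. The name list, including the non-matching 'mitch', is hypothetical.

```python
import re

long_form = re.compile(r"mike|michael")
short_form = re.compile(r"mi(ke|chael)")

# Hypothetical test names; 'mitch' should be rejected by both forms.
names = ["mike", "michael", "mitch"]

# fullmatch() requires the whole string to match the pattern.
long_results = [long_form.fullmatch(n) is not None for n in names]
short_results = [short_form.fullmatch(n) is not None for n in names]

print(long_results)   # [True, True, False]
print(short_results)  # [True, True, False]
```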

Optional match: '?'
The '?' metacharacter (read as: question mark) means optional. It is placed after a character that may or may not appear at that position in the matched text. Note that '?' applies only to the item immediately before it.

If I want to match both the British and the American spelling of the word 'flavor', I use the regular expression flavou?r, which is interpreted as: "match an f, followed by an l, followed by an a, followed by a v, followed by an o, followed by an optional u, followed by an r".
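A quick sketch of the optional u, assuming Python's re module; the misspelling 'flavouur' is added only to show that '?' allows at most one occurrence.

```python
import re

# u? means the u may appear zero times or once, but not twice.
found = re.findall(r"flavou?r", "flavor flavour flavouur")
print(found)  # ['flavor', 'flavour']
```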

Quantifiers: '+' and '*'
Like the '?' character, the '+' (read as plus) and '*' (read as star) metacharacters affect how many times the preceding item (the character just before the symbol) may be matched; '?' is equivalent to allowing the preceding item zero times or once. The '+' metacharacter matches the preceding item one or more times, while '*' matches it any number of times, including zero.

If I wanted to track the score of a football match by counting how many times the commentator says 'goal', I would use the regular expression go+al, which matches 'goal' and also the 'gooooooooooooooooooal' of a passionate broadcaster (but certainly not 'gal').
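The difference between '+' and '*' shows up on the word 'gal'. The sketch below assumes Python's re module, and the commentary string is invented for illustration.

```python
import re

plus = re.compile(r"go+al")   # one or more o
star = re.compile(r"go*al")   # zero or more o

words = ["gal", "goal", "gooooal"]
plus_results = [plus.fullmatch(w) is not None for w in words]
star_results = [star.fullmatch(w) is not None for w in words]
print(plus_results)  # [False, True, True]
print(star_results)  # [True, True, True]

# Counting goals in a hypothetical commentary transcript.
commentary = "goal! what a gooooal, said the gal"
goal_count = len(plus.findall(commentary))
print(goal_count)  # 2
```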

The three metacharacters '?', '+', and '*' are generally called quantifiers, because they affect the quantity of the preceding item.

Interval quantifier: '{}'
The '{minimum,maximum}' metacharacter sequence lets you specify the minimum and maximum number of times an item may be matched. For example, go{1,5}al restricts the example above to between 1 and 5 occurrences of the letter o. Likewise, {0,1} is equivalent to the '?' metacharacter.
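A minimal sketch of the interval quantifier, assuming Python's re module. The helper goal_with() is hypothetical and only builds test words with a given number of o's.

```python
import re

bounded = re.compile(r"go{1,5}al")

def goal_with(n: int) -> str:
    """Hypothetical helper: build 'g' + n times 'o' + 'al'."""
    return "g" + "o" * n + "al"

# Only 1 to 5 o's are accepted; 0 and 6 are rejected.
accepted = [n for n in range(0, 8) if bounded.fullmatch(goal_with(n))]
print(accepted)  # [1, 2, 3, 4, 5]

# {0,1} behaves like '?'.
same_as_optional = re.findall(r"flavou{0,1}r", "flavor flavour")
print(same_as_optional)  # ['flavor', 'flavour']
```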

Escape character: '\'
The '\' metacharacter (read as: backslash) strips a metacharacter of its special meaning so that you can match it as an ordinary character. For example, to match the character '?' or '\', put a '\' in front of it, writing '\?' or '\\'.
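Escaping can be sketched as follows, assuming Python's re module. Raw strings (the r prefix) keep Python itself from interpreting the backslashes. re.escape() is a Python convenience that the article does not mention; it is shown here only as one way to escape a whole string at once.

```python
import re

# \? matches a literal question mark.
question_marks = re.findall(r"\?", "really? yes?")
print(question_marks)  # ['?', '?']

# \\ matches a single literal backslash.
backslashes = re.findall(r"\\", r"C:\temp\file")
print(len(backslashes))  # 2

# Unescaped, "1+1=2?" is read as a pattern with quantifiers and
# does not match its own text; re.escape() turns it into a literal.
literal = "1+1=2?"
unescaped_ok = re.fullmatch(literal, literal) is not None
escaped_ok = re.fullmatch(re.escape(literal), literal) is not None
print(unescaped_ok, escaped_ok)  # False True
```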

If '\' is used before a non-metacharacter, its meaning depends on the language in which you are using regular expressions, so you must consult the corresponding manual. The most common flavor is Perl-Compatible Regular Expressions (PCRE); the perldoc page for Perl regular expressions documents the syntax. PCRE-style syntax is widely used and is available, with minor variations, in PHP, Ruby, ECMAScript/JavaScript, and many other languages.

Grouping with parentheses: '()'
Most regular expression tools let you use parentheses to mark off a specific part of an expression. For example, we can use the regular expression http://([^/]+) to match the domain name of a URL. Let's break this regular expression down to see how it works.

The start of this expression is straightforward: it must match the literal sequence h-t-t-p-:-/-/. This initial sequence is followed by parentheses, which capture the characters matched by the subexpression they enclose. In this example, the subexpression is [^/]+, which matches any character except '/' one or more times. For a URL like 'http://immike.net/blog/some-blog-post', 'immike.net' is the text captured by the parenthesized expression.
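The breakdown above can be reproduced with a short sketch, assuming Python's re module, where group(0) is the whole match and group(1) is the text captured by the first pair of parentheses.

```python
import re

url = "http://immike.net/blog/some-blog-post"

# [^/]+ consumes characters up to, but not including, the next '/'.
match = re.search(r"http://([^/]+)", url)

whole = match.group(0)   # everything the pattern matched
domain = match.group(1)  # only the parenthesized part
print(whole)   # http://immike.net
print(domain)  # immike.net
```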

Want to learn more?
In this article, I have only introduced a small part of what regular expressions can do. If you want to learn more, read Jeffrey Friedl's masterpiece, Mastering Regular Expressions. Friedl writes very well: the book is easy to understand and enjoyable to read, and it is by no means a dry textbook.
