Regular Expression Quick Start (reprint)

Source: Internet
Author: User

Let's start with a brief introduction to regular expressions:

When writing a program or web page that handles strings, you often need to find strings that match some complex rule. Regular expressions are the tool for describing such rules. In other words, a regular expression is code that records a text rule.

Here is what the cryptic characters in a regular expression mean:

1. Commonly used metacharacters
Code    Description
.       Matches any character except a line break
\w      Matches a letter, digit, or underscore (in Unicode-aware engines, also a Chinese character)
\s      Matches any whitespace character
\d      Matches a digit
\b      Matches the beginning or end of a word
^       Matches the start of the string
$       Matches the end of the string

Now let's try to read a few expressions:

\bhello\b looks for the word hello: first the beginning of a word (\b), then the string hello, and finally the end of the word (\b).

010-\d\d\d\d\d\d\d\d matches a Beijing landline number: first 010-, then eight digits (\d).

^\d{18}$ matches, for example, an 18-digit ID number: the start of the string (^), then 18 digits (\d{18}), and the end of the string ($).
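
The three expressions above can be tried out with a short sketch. It assumes JavaScript, and the sample strings are invented for illustration only.

```javascript
// Sketch only: the sample strings are made up for illustration.
const wordHello = /\bhello\b/;
const beijingPhone = /010-\d\d\d\d\d\d\d\d/;
const eighteenDigits = /^\d{18}$/;

console.log(wordHello.test("say hello to them"));         // true: hello stands alone as a word
console.log(wordHello.test("othello"));                    // false: no word boundary before hello
console.log(beijingPhone.test("call 010-12345678"));       // true: 010- followed by eight digits
console.log(eighteenDigits.test("123456789012345678"));    // true: exactly 18 digits
console.log(eighteenDigits.test("1234567890123456789"));   // false: 19 digits
```

Note that ^ and $ make the last pattern reject longer input, while the phone pattern without anchors can match anywhere inside a string.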

2. Commonly used quantifiers
Code    Description
*       Repeat zero or more times
+       Repeat one or more times
?       Repeat zero or one time
{n}     Repeat exactly n times
{n,}    Repeat n or more times
{n,m}   Repeat n to m times

\ba\w*\b matches a word that begins with the letter a: first the beginning of a word (\b), then the letter a, then any number of letters or digits (\w*), and finally the end of the word (\b).

windows\d+ matches windows followed by one or more digits, such as windows7 or windows10; \d+ matches one or more digits.

010-\d{8} also matches a Beijing landline number. It means the same as 010-\d\d\d\d\d\d\d\d above but is more convenient to write; \d{8} means a digit repeated eight times in a row.
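
A minimal JavaScript sketch of these quantifiers, assuming an invented sample sentence, might look like this:

```javascript
// Sketch only: sampleText is an invented sentence.
const sampleText = "an apple a day; windows7 and windows10, but not windows";

// The g flag makes match() return every match instead of only the first.
const wordsStartingWithA = sampleText.match(/\ba\w*\b/g);
const windowsVersions = sampleText.match(/windows\d+/g);

console.log(wordsStartingWithA); // ["an", "apple", "a", "and"]
console.log(windowsVersions);    // ["windows7", "windows10"]; plain "windows" has no digit

// \d{8} is shorthand for writing \d eight times.
const longForm = /010-\d\d\d\d\d\d\d\d/;
const shortForm = /010-\d{8}/;
console.log(longForm.test("010-12345678"), shortForm.test("010-12345678")); // true true
```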

3. Commonly used negated codes
Code      Description
\W        Matches any character that is not a letter, digit, underscore, or Chinese character
\S        Matches any character that is not whitespace
\D        Matches any character that is not a digit
\B        Matches a position that is not the beginning or end of a word
[^x]      Matches any character except x
[^aeiou]  Matches any character except the vowels a, e, i, o, u

"s[^"]+" matches a string in quotation marks that begins with s.

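The negated codes can be sketched in JavaScript as follows. The sample strings are assumptions chosen for illustration, and in JavaScript \w and \W only consider ASCII letters, digits, and underscore.

```javascript
// Sketch only: the sample strings are invented.
const mixed = "a1 b2_c3!";

console.log(mixed.replace(/\W/g, "")); // "a1b2_c3": drops the space and the !
console.log(mixed.replace(/\D/g, "")); // "123": keeps only the digits
console.log(mixed.match(/\S+/g));      // ["a1", "b2_c3!"]: runs of non-whitespace

// \B requires a position inside a word, so "at" must not start the word.
console.log(/\Bat/.test("cat")); // true
console.log(/\Bat/.test("at"));  // false

// [^aeiou] matches everything except the listed vowels.
const vowelsOnly = "education".replace(/[^aeiou]/g, "");
console.log(vowelsOnly); // "euaio"

// A quoted string that begins with s.
const quoted = 'say "sun" and "moon"'.match(/"s[^"]+"/)[0];
console.log(quoted); // "sun" including its quotation marks
```
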
4. Common grouping syntax
Code          Description
(exp)         Matches exp and captures the text into an automatically numbered group
(?<name>exp)  Matches exp and captures the text into a group named name; may also be written (?'name'exp)
(?:exp)       Matches exp without capturing the text or assigning a group number
(?=exp)       Matches the position just before exp
(?<=exp)      Matches the position just after exp
(?!exp)       Matches a position that is not followed by exp
(?<!exp)      Matches a position that is not preceded by exp

\b\w*h(?!e)\w*\b is more complex, but with the help of the table above it can still be read. In detail: the beginning of a word (\b), then zero or more word characters (\w*; inside a word these are letters), then the letter h, then a position that is not followed by e ((?!e)), then zero or more further letters (\w*), up to the end of the word (\b). So we are looking for words that contain an h not followed by e, such as him and honey, while words such as hello and help are excluded.
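
The grouping constructs can be sketched in JavaScript as below. This assumes a runtime with ES2018 regular expressions (named groups and lookbehind); JavaScript accepts only the (?<name>exp) spelling, not (?'name'exp). All sample strings are invented.

```javascript
// Sketch only: sample strings are invented; requires ES2018 regex support.

// Words containing an h that is not followed by e.
const hWithoutE = /\b\w*h(?!e)\w*\b/g;
const matchedWords = "him honey hello help".match(hWithoutE);
console.log(matchedWords); // ["him", "honey"]

// Named groups are read from the groups property of the match.
const dateMatch = /(?<year>\d{4})-(?<month>\d{2})/.exec("released 2016-11");
console.log(dateMatch.groups.year, dateMatch.groups.month); // 2016 11

// (?:...) groups without capturing, so (c) is group number 1.
const nonCapturing = /(?:ab)+(c)/.exec("ababc");
console.log(nonCapturing[1]); // "c"

// Lookahead: the part of a word that comes before a trailing "ing".
const stem = "singing".match(/\b\w+(?=ing\b)/)[0];
console.log(stem); // "sing"

// Lookbehind: digits right after a dollar sign, and digits not after one.
const afterDollar = "cost: $42".match(/(?<=\$)\d+/)[0];
const notAfterDollar = "pay $42 for 7 items".match(/(?<!\$)\b\d+/g);
console.log(afterDollar, notAfterDollar); // 42 ["7"]
```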

5. Lazy quantifiers
Code     Description
*?       Repeat any number of times, but as few times as possible
+?       Repeat one or more times, but as few times as possible
??       Repeat zero or one time, but as few times as possible
{n,m}?   Repeat n to m times, but as few times as possible
{n,}?    Repeat n or more times, but as few times as possible

When a regular expression contains a quantifier that allows repetition, the default behavior is to match as many characters as possible. For example, a.*b matches the longest string that starts with a and ends with b. If you use it to search aabab, it matches the entire string aabab. This is called greedy matching. Searching with a.*?b instead matches aab (first to third characters) and ab (fourth to fifth characters); this is called lazy matching.
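
The difference described above can be checked with a small JavaScript sketch on the same string:

```javascript
// Greedy versus lazy matching on the string from the text.
const greedyInput = "aabab";

const greedyMatches = greedyInput.match(/a.*b/g);
const lazyMatches = greedyInput.match(/a.*?b/g);

console.log(greedyMatches); // ["aabab"]: .* grabs as much as it can
console.log(lazyMatches);   // ["aab", "ab"]: .*? stops at the first b that works
```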

6. Other commonly used symbols
Code    Description
\.      Escape. The period . is a metacharacter, so on its own it cannot match a literal period because it is read as something else; put \ in front of it to cancel the special meaning. Other metacharacters such as *, ?, and + must be escaped the same way.
[]      Character class. For example, [0-9] matches the digits 0 to 9, equivalent to \d; [a-z] matches lowercase letters; [.?!] matches the punctuation marks ., ?, and !.
()      Group. Each group automatically gets a group number, counted from left to right: the first group is 1, the second is 2, and so on.
|       Branch (alternation): match either the expression on the left or the one on the right.
i       Case-insensitive match. A flag on a regular expression literal; see the example below.
g       Global match. A flag on a regular expression literal; see the example below.

(\d{1,3}\.){3}\d{1,3} is a simple IP address expression: \d{1,3} matches one to three digits, (\d{1,3}\.){3} matches one to three digits plus a period (this whole unit is the group) repeated three times, and finally comes another one to three digits (\d{1,3}).

\b(\w+)\b\s+\1\b can be used to match repeated words such as go go: first a word of one or more letters (\b(\w+)\b), which is captured in group 1, then one or more whitespace characters (\s+), then the content captured in group 1, that is, the word just matched (\1), and the end of the word (\b).

^\d{17}(\d|[xX])$ can be used to validate an ID number: the start of the string (^), then 17 digits (\d{17}), then a digit (\d) or (|) the letter x or X ([xX]), and the end of the string ($).
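
As a rough JavaScript sketch of escaping, groups, back-references, and branches (the sample values are invented, and the ID-like number is not a real one):

```javascript
// Sketch only: all sample values are invented.

// An escaped period matches only a literal period.
console.log(/a\.b/.test("a.b"), /a\.b/.test("axb"), /a.b/.test("axb")); // true false true

// Simple IP check: it does not verify that each part is at most 255.
const simpleIp = /(\d{1,3}\.){3}\d{1,3}/;
console.log(simpleIp.test("192.168.0.1"));     // true
console.log(simpleIp.test("999.999.999.999")); // true, because the pattern is only a simple check

// \1 refers back to whatever group 1 captured.
const repeatedWord = /\b(\w+)\b\s+\1\b/;
const repeated = repeatedWord.exec("let's go go now");
console.log(repeated[0], "|", repeated[1]); // go go | go

// 17 digits followed by a digit or the letter x or X.
const idNumber = /^\d{17}(\d|[xX])$/;
console.log(idNumber.test("12345678901234567X")); // true
console.log(idNumber.test("1234567890123456XX")); // false: only 16 digits before the letters
```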

The /i and /g flags are easiest to understand through a replacement example.

The example replaces Microsoft in a string with W3school. When the regular expression is /microsoft/i, the result is: Welcome to W3school! We are proud to announce that Microsoft has one of the largest Web Developers sites in the world. Only the first Microsoft is replaced: the i flag makes the match ignore case, and without the g flag the replacement stops after the first match.

If we add the g flag and change /microsoft/i to /microsoft/gi, the result becomes: Welcome to W3school! We are proud to announce that W3school has one of the largest Web Developers sites in the world. Every Microsoft in the text is replaced by W3school; this is global matching. Note that /microsoft/g on its own is case-sensitive and would not match the capitalized Microsoft at all.
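
The original listing is not preserved in this copy, so the following is only a sketch of the comparison. The input sentence is an assumption reconstructed from the results quoted above, and the variable names are hypothetical.

```javascript
// Assumed input: reconstructed from the quoted results, not the original listing.
const sentence = "Welcome to Microsoft! We are proud to announce that Microsoft has one of the largest Web Developers sites in the world.";

// i: ignore case. Without g, replace() changes only the first match.
const firstOnly = sentence.replace(/microsoft/i, "W3school");

// g and i together: ignore case and replace every match.
const everyMatch = sentence.replace(/microsoft/gi, "W3school");

// g alone stays case-sensitive, so lowercase "microsoft" finds nothing here.
const caseSensitive = sentence.replace(/microsoft/g, "W3school");

console.log(firstOnly);
console.log(everyMatch);
console.log(caseSensitive === sentence); // true: nothing was replaced
```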

Of course, a lot of syntax is not covered here, but you should now be able to read many regular expressions, and studying the rest should be relatively easy. If something is still unclear, I may simply not have explained it well. You can search for more articles; every author explains things differently, and you can see whose style you absorb faster and better.

If anything above is wrong, I sincerely apologize. Please point it out promptly so it can be corrected. Thank you.

Reprinted from: http://www.cnblogs.com/realcare/p/6028622.html
