Quick Start for regular expressions

Source: Internet
Author: User

Let's start with a brief introduction to regular expressions:

When writing a program or web page that processes strings, you often need to find strings that match certain complex rules. Regular expressions are the tool used to describe these rules. In other words, a regular expression is code that records a text pattern.

Here is what the cryptic-looking characters in a regular expression mean:

1. Commonly used metacharacters
Code    Description
.       Matches any character except a line break
\w      Matches a letter, digit, or underscore (Unicode-aware engines such as .NET also match Chinese characters)
\s      Matches any whitespace character
\d      Matches a digit
\b      Matches the beginning or end of a word
^       Matches the start of the string
$       Matches the end of the string

Let's try to read a few examples:

\bhello\b looks for the word hello. It matches the beginning of a word (\b), then the letters hello, and finally the end of the word (\b).

010-\d\d\d\d\d\d\d\d matches a Beijing landline number. It matches 010- first, then eight digits (one \d for each).

^\d{18}$ matches an 18-digit ID number. It matches the start of the string (^), then 18 digits (\d{18}; {18} is explained in the next section), and then the end of the string ($).
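The three examples above can be tried with the minimal JavaScript sketch below. It assumes Node.js or a browser console, where console.log is available. The sample strings, including the phone number and the 18-digit number, are made-up test data.

```javascript
// The three example patterns from the text.
const wordHello = /\bhello\b/;
const beijingPhone = /010-\d\d\d\d\d\d\d\d/;
const eighteenDigits = /^\d{18}$/;

// \b requires a word boundary on both sides of "hello".
console.log(wordHello.test("say hello to her")); // true
console.log(wordHello.test("helloworld"));       // false: no boundary after "hello"

// 010- followed by eight digits somewhere in the string
// (unanchored, so extra trailing digits are not rejected).
console.log(beijingPhone.test("Tel: 010-12345678")); // true
console.log(beijingPhone.test("010-1234567"));       // false: only seven digits

// ^ and $ force the whole string to be exactly 18 digits.
console.log(eighteenDigits.test("123456789012345678"));  // true
console.log(eighteenDigits.test("1234567890123456789")); // false: 19 digits
```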

2. Commonly used quantifiers
Code    Description
*       Repeat zero or more times
+       Repeat one or more times
?       Repeat zero or one time
{n}     Repeat exactly n times
{n,}    Repeat n or more times
{n,m}   Repeat n to m times

\ba\w*\b matches a word that begins with the letter a. It matches the beginning of a word (\b), then the letter a, then any number of letters, digits, or underscores (\w*), and finally the end of the word (\b).

windows\d+ matches windows followed by one or more digits (\d+), such as windows7 or windows10. Matching is case-sensitive by default, so Windows7 would not match.

010-\d{8} also matches a Beijing landline number. It means the same as 010-\d\d\d\d\d\d\d\d above but is more concise, because \d{8} means a digit repeated exactly eight times.
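The minimal JavaScript sketch below runs the quantifier examples on made-up sample strings. It assumes console.log is available, as in Node.js or a browser.

```javascript
// \ba\w*\b: whole words that start with the letter a.
const aWords = "an apple and a banana".match(/\ba\w*\b/g);
console.log(aWords); // [ 'an', 'apple', 'and', 'a' ]

// windows\d+: "windows" followed by one or more digits (case-sensitive).
const versions = "windows7 windows10 windows".match(/windows\d+/g);
console.log(versions); // [ 'windows7', 'windows10' ]

// \d{8} is shorthand for eight \d in a row, so both forms agree.
const longForm = /010-\d\d\d\d\d\d\d\d/;
const shortForm = /010-\d{8}/;
for (const s of ["010-12345678", "010-1234567"]) {
  console.log(s, longForm.test(s), shortForm.test(s));
}
// 010-12345678 true true
// 010-1234567 false false
```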

3. Commonly used negated codes
Code      Description
\W        Matches any character that is not a letter, digit, underscore, or Chinese character (the opposite of \w)
\S        Matches any non-whitespace character
\D        Matches any non-digit character
\B        Matches a position that is not the beginning or end of a word
[^x]      Matches any character except x
[^aeiou]  Matches any character except the vowels a, e, i, o, u

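"s[^"]+" matches a double-quoted string whose content begins with s, such as "sure". It matches a quotation mark, the letter s, one or more characters that are not quotation marks ([^"]+), and a closing quotation mark.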

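The negated codes can be tried with the sketch below, which uses made-up sample strings and assumes console.log is available (Node.js or a browser). Each of \D, \S, and [^...] matches a single character that is not in the given set.

```javascript
// \D: any non-digit. Removing all of them leaves only the digits.
const digitsOnly = "010-1234 5678".replace(/\D/g, "");
console.log(digitsOnly); // 01012345678

// \S: any non-whitespace character.
const visible = "a b\tc".match(/\S/g);
console.log(visible); // [ 'a', 'b', 'c' ]

// [^aeiou]: anything except a vowel. Removing those keeps only the vowels.
const vowels = "regular".replace(/[^aeiou]/g, "");
console.log(vowels); // eua

// "s[^"]+": a quote, the letter s, one or more non-quote characters, a closing quote.
const quoted = 'She said "sure" and then "maybe".'.match(/"s[^"]+"/g);
console.log(quoted); // [ '"sure"' ]
```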
4. Commonly used grouping syntax
Code          Description
(exp)         Matches exp and captures the text into an automatically numbered group
(?<name>exp)  Matches exp and captures the text into a group named name; this can also be written (?'name'exp)
(?:exp)       Matches exp without capturing the text and without assigning a group number
(?=exp)       Matches a position that is followed by exp
(?<=exp)      Matches a position that is preceded by exp
(?!exp)       Matches a position that is not followed by exp
(?<!exp)      Matches a position that is not preceded by exp
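Most rows of the grouping table can be illustrated with the JavaScript sketch below. It assumes an engine with ES2018 named groups and lookbehind, such as a current Node.js. JavaScript accepts only the (?<name>exp) spelling of named groups, not (?'name'exp). The sample strings are made up.

```javascript
const date = "2024-05-17";

// (exp): groups are numbered 1, 2, 3 from left to right.
const numbered = date.match(/(\d{4})-(\d{2})-(\d{2})/);
console.log(numbered[1], numbered[2], numbered[3]); // 2024 05 17

// (?<name>exp): the same captures, read back by name.
const named = date.match(/(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/);
console.log(named.groups.year, named.groups.month, named.groups.day); // 2024 05 17

// (?:exp) groups "ab" for the + but captures nothing, so (\w) is group 1.
const nonCapturing = "abab-x".match(/(?:ab)+-(\w)/);
console.log(nonCapturing.length, nonCapturing[1]); // 2 x

// Lookarounds match positions, so "$" and "px" are not part of the match.
const afterDollar = "price: $42".match(/(?<=\$)\d+/)[0];
const beforePx = "100px 200em".match(/\d+(?=px)/)[0];
console.log(afterDollar, beforePx); // 42 100

// (?<!x)\d: digits that are not directly preceded by x.
const notAfterX = "q1 q2 x3".match(/(?<!x)\d/g);
console.log(notAfterX); // [ '1', '2' ]
```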

\b\w*h(?!e)\w*\b is a little more complex, but the table above makes it readable. It matches the beginning of a word (\b), then zero or more letters (\w*; inside a word these can only be word characters), then the letter h, then a position that is not followed by e ((?!e)), then zero or more letters (\w*), and finally the end of the word (\b). In other words, it finds words containing an h that is not followed by e, such as him and honey, and it excludes words such as hello and help.
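Here is a small JavaScript check of the lookahead example, run on a made-up sentence containing the four words from the text. It assumes console.log is available.

```javascript
// \b\w*h(?!e)\w*\b: words containing an h that is not followed by e.
const pattern = /\b\w*h(?!e)\w*\b/g;
const found = "hello him honey help".match(pattern);
console.log(found); // [ 'him', 'honey' ]

// Checking word by word shows why hello and help are rejected.
for (const word of ["hello", "him", "honey", "help"]) {
  // A regex without the g flag is used here, so lastIndex never carries over between test() calls.
  console.log(word, /\b\w*h(?!e)\w*\b/.test(word));
}
// hello false, him true, honey true, help false
```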

5. Lazy quantifiers
Code      Description
*?        Repeat any number of times, but as few times as possible
+?        Repeat one or more times, but as few times as possible
??        Repeat zero or one time, but as few times as possible
{n,m}?    Repeat n to m times, but as few times as possible
{n,}?     Repeat n or more times, but as few times as possible

When a regular expression contains a quantifier that allows repetition, it normally matches as many characters as possible. For example, a.*b matches the longest string that starts with a and ends with b, so searching aabab with it matches the entire string aabab. This is called greedy matching. The lazy version a.*?b matches the shortest such string instead, so searching aabab finds aab (characters 1 to 3) and then ab (characters 4 to 5). This is called lazy matching.
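The aabab example above can be reproduced with the JavaScript sketch below. It assumes console.log is available, and the g flag is used so that every match is returned.

```javascript
const s = "aabab";
// Greedy: .* grabs as much as it can, then backs off only as far as the last b.
const greedy = s.match(/a.*b/g);
// Lazy: .*? grabs as little as it can, stopping at the first b that works.
const lazy = s.match(/a.*?b/g);
console.log(greedy); // [ 'aabab' ]
console.log(lazy);   // [ 'aab', 'ab' ]
```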

6. Other commonly used symbols
Code    Description
\.      Escape. The period is a metacharacter meaning "any character", so you cannot use it to match a literal period. Put \ in front to remove its special meaning: \. matches a period. Other metacharacters such as *, ? and + must be escaped the same way (\*, \?, \+).
[]      Character class. For example, [0-9] matches a digit from 0 to 9 (equivalent to \d), [a-z] matches a lowercase letter, and [.?!] matches one of the punctuation marks ., ? or !.
()      Grouping. Each group automatically gets a number. Counting from left to right, the first group is 1, the second is 2, and so on. (\d{1,3}\.){3}\d{1,3} is a simple IP address expression. In it, \d{1,3} matches 1 to 3 digits, (\d{1,3}\.){3} repeats "1 to 3 digits followed by a period" three times, and a final \d{1,3} matches the last part. Because it is simple, it also accepts impossible addresses such as 999.999.999.999. \b(\w+)\b\s+\1\b finds a repeated word such as go go. It matches a word of one or more word characters, \b(\w+)\b, and captures it into group 1. Then it matches one or more whitespace characters (\s+), then the text captured in group 1, that is, the same word again (\1), and finally the end of the word (\b).
|       Alternation (branches). ^\d{17}(\d|[xX])$ can check an 18-character ID number. It matches the start of the string (^), then 17 digits (\d{17}), then either a digit (\d) or (|) the letter x or X ([xX]), and then the end of the string ($).
i       Case-insensitive matching, written as a flag after the closing slash, as in /exp/i. For an example, see below.
g       Global matching: find every match instead of stopping at the first. For an example, see below.
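The first four rows of the table can be checked one by one with the JavaScript sketch below, which assumes console.log is available. The ID-style numbers are made up. The IP pattern is anchored with ^ and $ here so the whole string must match, which the table's version is not.

```javascript
// \. matches only a literal period; an unescaped . matches any character.
const anyChar = /^a.c$/.test("abc");      // true
const literalDot = /^a\.c$/.test("abc");  // false: b is not a period
console.log(anyChar, literalDot);

// [.?!] is a character class: any one of the three punctuation marks.
const punctuation = "Hi! Ok? Done.".match(/[.?!]/g);
console.log(punctuation); // [ '!', '?', '.' ]

// [A-z] is a common mistake: the range also covers [ \ ] ^ _ and the backtick.
console.log(/[A-z]/.test("_"), /[a-z]/.test("_")); // true false

// The simple IP pattern from the text; it does not check number ranges.
const ip = /^(\d{1,3}\.){3}\d{1,3}$/;
console.log(ip.test("192.168.0.1"), ip.test("999.999.999.999")); // true true

// \1 refers back to the text captured by group 1, so this finds a doubled word.
const repeated = "let us go go now".match(/\b(\w+)\b\s+\1\b/);
console.log(repeated[0], "/", repeated[1]); // go go / go

// | offers alternatives: 17 digits, then either a digit or x/X.
const idPattern = /^\d{17}(\d|[xX])$/;
console.log(idPattern.test("12345678901234567X"), idPattern.test("12345678901234567Y")); // true false
```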

Here is how /i and /g are used, illustrated with a short piece of JavaScript:

    <script type="text/javascript">
    var str = "Welcome to microsoft! ";
    str = str + "We are proud to announce that Microsoft has ";
    str = str + "one of the largest Web Developers sites in the world.";
    // i: case-insensitive; without g, replace() changes only the first match
    document.write(str.replace(/microsoft/i, "W3school"));
    </script>

The code above replaces microsoft in the string with W3school. With the regular expression /microsoft/i, the result is: Welcome to W3school! We are proud to announce that Microsoft has one of the largest Web Developers sites in the world. Only the first occurrence is replaced. The i flag makes the match case-insensitive, but without g, replace() stops after the first match.

Change the regular expression to /microsoft/gi and the result becomes: Welcome to W3school! We are proud to announce that W3school has one of the largest Web Developers sites in the world. Every occurrence is now replaced, which is global matching. Note that /microsoft/g on its own is case-sensitive, so it would replace only the lowercase microsoft and leave Microsoft unchanged.
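The sketch below puts the three flag combinations side by side on the same text as the example above. It uses console.log instead of document.write, so it can run outside a browser, for instance in Node.js.

```javascript
// Same text as the example above.
let str = "Welcome to microsoft! ";
str = str + "We are proud to announce that Microsoft has ";
str = str + "one of the largest Web Developers sites in the world.";

const onlyI = str.replace(/microsoft/i, "W3school"); // first match, any case
const onlyG = str.replace(/microsoft/g, "W3school"); // every match, exact case only
const both = str.replace(/microsoft/gi, "W3school"); // every match, any case

console.log(onlyI);
console.log(onlyG); // same as onlyI here: "Microsoft" has a capital M, so /g alone skips it
console.log(both);  // both occurrences replaced
```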

Of course, many constructs are not covered here. Still, with these basics you should be able to read many regular expressions, and further study will be easier. If something is still unclear, I may not be able to explain it any better. Try searching for other articles, because every author explains things differently, and you may find one that works better for you.

If anything above is wrong, I apologize. Please point it out so it can be corrected. Thank you.
