30-minute introductory regular expression learning notes

Source: Internet
Author: User

There is one known problem in these notes:

In the section on negative zero-width assertions, the goal is to match a word that contains a q not followed by u.
The pattern \b\w*q(?!u)\w*\b cannot handle a word such as qsqu.
The q in "qs" already satisfies the negative zero-width assertion, so the word matches even though it also contains "qu".


Study notes:


\b  # metacharacter; matches only a position, not a character
\bhi\b  # finds exactly the word hi
\bhi\b.*\blucy\b  # hi followed, somewhere later, by the word Lucy
0\d{2}-\d{8}  # matches a phone number such as 012-12345678
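A minimal Python sketch of the three patterns above, written as \bhi\b, \bhi\b.*\blucy\b, and 0\d{2}-\d{8}. The sample strings are made up for illustration, and Python's re module is assumed here even though the notes follow .NET syntax; for these simple patterns the two behave the same.

```python
import re

text = "hi, this is history; hi there Lucy"

# \bhi\b matches only the standalone word, not the "hi" inside "this" or "history".
standalone = re.findall(r"\bhi\b", text)
print(standalone)  # ['hi', 'hi']

# "hi" followed later by the word "lucy"; IGNORECASE lets it match "Lucy".
greeting = re.search(r"\bhi\b.*\blucy\b", text, re.IGNORECASE)
print(greeting is not None)  # True

# 0, two more digits, a hyphen, then eight digits.
phone = re.search(r"0\d{2}-\d{8}", "call 012-12345678 today")
print(phone.group(0))  # 012-12345678
```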


Metacharacters
Table 1. Common metacharacters
Code    Description
.       Matches any character except a newline
\w      Matches a letter, digit, underscore, or Chinese character
\s      Matches any whitespace character
\d      Matches a digit
\b      Matches the beginning or end of a word
^       Matches the start of the string
$       Matches the end of the string


Example:
\ba\w*\b  # a word starting with a
\d+  # one or more digits
\b\w{6}\b  # a word of exactly 6 characters
^\d{5,12}$  # the whole string is a 5- to 12-digit number
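The examples above can be tried in Python as follows. This is only a sketch: the sample sentences and the helper name is_5_to_12_digits are hypothetical, and in Python $ also matches just before a trailing newline, which the helper does not guard against.

```python
import re

# Words that start with "a".
words_with_a = re.findall(r"\ba\w*\b", "an apple a day banana")
print(words_with_a)  # ['an', 'apple', 'a']

# Words of exactly six word characters.
six_letter = re.findall(r"\b\w{6}\b", "regex engine called python")
print(six_letter)  # ['engine', 'called', 'python']

def is_5_to_12_digits(s: str) -> bool:
    # ^ and $ anchor the pattern so the whole string must consist of digits.
    return re.search(r"^\d{5,12}$", s) is not None

print(is_5_to_12_digits("12345"))     # True
print(is_5_to_12_digits("12345abc"))  # False
```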


Character escapes
To match a metacharacter itself, escape it with \ (for example, \. matches a literal dot and \\ matches a literal backslash).
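A short Python sketch of escaping. The sample strings are invented; re.escape is the standard-library helper that escapes the metacharacters in a string for you.

```python
import re

# An unescaped dot matches any character, so "abc" is also found.
any_char = re.findall(r"a.c", "a.c abc")
print(any_char)  # ['a.c', 'abc']

# An escaped dot matches only a literal dot.
literal_dot = re.findall(r"a\.c", "a.c abc")
print(literal_dot)  # ['a.c']

# \\ in the pattern matches one literal backslash.
backslashes = re.findall(r"\\", r"C:\Windows")
print(len(backslashes))  # 1

# re.escape builds a pattern that matches the given text literally.
escaped = re.escape("1+1=2")
print(re.fullmatch(escaped, "1+1=2") is not None)  # True
```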


Repeat
Table 2. Common quantifiers
Code/syntax    Description
*              Repeat 0 or more times
+              Repeat 1 or more times
?              Repeat 0 or 1 time
{n}            Repeat exactly n times
{n,}           Repeat n or more times
{n,m}          Repeat n to m times


Example:
Windows\d+  # Windows followed by one or more digits
^\w+  # matches the first word of the whole string
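The quantifiers can be checked quickly in Python. The inputs below are illustrative only, and the colou?r and \b\d{2,3}\b patterns are extra examples that do not appear in the notes.

```python
import re

# + requires at least one digit, so the bare "Windows" is skipped.
versions = re.findall(r"Windows\d+", "Windows98 Windows Windows2000")
print(versions)  # ['Windows98', 'Windows2000']

# ^\w+ picks up the first word of the string.
first_word = re.search(r"^\w+", "hello world").group(0)
print(first_word)  # hello

# ? makes the preceding item optional.
spellings = re.findall(r"colou?r", "color colour")
print(spellings)  # ['color', 'colour']

# {2,3} limits the repeat count to between 2 and 3.
two_or_three = re.findall(r"\b\d{2,3}\b", "1 22 333 4444")
print(two_or_three)  # ['22', '333']
```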


Character class
[...]  # matches any one of the characters listed inside the brackets, e.g. [aeiou] matches any one vowel
[0-9]  # same as \d
[a-z0-9A-Z_]  # same as \w (for English text only)
\(?0\d{2}[)-]?\d{8}  # matches (010)12345678 or 010-12345678
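A hedged Python check of the phone pattern, written here as \(?0\d{2}[)-]?\d{8}. The test numbers are made up. Because [)-]? allows at most one separator character, a number written as (010)-12345678 is not accepted.

```python
import re

phone = re.compile(r"\(?0\d{2}[)-]?\d{8}")

candidates = ["(010)12345678", "010-12345678", "01012345678", "(010)-12345678"]
# fullmatch requires the whole candidate to fit the pattern.
results = {s: phone.fullmatch(s) is not None for s in candidates}

for s, ok in results.items():
    print(s, ok)
```

The pattern is loose: both the opening parenthesis and the separator are optional independently, so an unbalanced form such as (010-12345678 is accepted as well.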


Branching conditions
0\d{2}-\d{8}|0\d{3}-\d{7}  # matches if either alternative matches; pay attention to the order, because alternatives are tried left to right and the first one that succeeds wins
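Why order matters can be seen with a pair of alternatives where one is a prefix of the other. The 5-digit and 9-digit code patterns below are an illustrative assumption, not taken from the notes; the last two lines use the phone pattern, written as 0\d{2}-\d{8}|0\d{3}-\d{7}.

```python
import re

code = "12345-6789"

# The short alternative comes first and wins, so the "-6789" part is never tried.
short_first = re.search(r"\d{5}|\d{5}-\d{4}", code).group(0)
print(short_first)  # 12345

# Putting the longer alternative first gives the full match.
long_first = re.search(r"\d{5}-\d{4}|\d{5}", code).group(0)
print(long_first)  # 12345-6789

# The two phone alternatives do not overlap, so either order works for them.
phone = re.compile(r"0\d{2}-\d{8}|0\d{3}-\d{7}")
three_digit_area = phone.fullmatch("010-12345678") is not None
four_digit_area = phone.fullmatch("0376-2233445") is not None
print(three_digit_area, four_digit_area)  # True True
```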


Group
(\d{1,3}\.){3}\d{1,3}  # simple (loose) match for an IP address
((2[0-4]\d|25[0-5]|[01]?\d\d?)\.){3}(2[0-4]\d|25[0-5]|[01]?\d\d?)  # strict match for an IP address
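The difference between the loose and strict IP patterns shows up on out-of-range numbers. A small Python sketch with made-up addresses:

```python
import re

loose = re.compile(r"(\d{1,3}\.){3}\d{1,3}")
strict = re.compile(
    r"((2[0-4]\d|25[0-5]|[01]?\d\d?)\.){3}(2[0-4]\d|25[0-5]|[01]?\d\d?)"
)

bad = "256.300.888.999"
good = "192.168.1.255"

# The loose pattern only counts digits, so impossible octets slip through.
loose_accepts_bad = loose.fullmatch(bad) is not None
# The strict pattern restricts every octet to 0-255.
strict_accepts_bad = strict.fullmatch(bad) is not None
strict_accepts_good = strict.fullmatch(good) is not None

print(loose_accepts_bad, strict_accepts_bad, strict_accepts_good)  # True False True
```

In real Python code the standard ipaddress module is usually a simpler way to validate addresses; the regular expression is shown only to illustrate grouping and repetition.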


Negation


Table 3. Common negated codes
Code/syntax    Description
\W             Matches any character that is not a letter, digit, underscore, or Chinese character
\S             Matches any character that is not whitespace
\D             Matches any character that is not a digit
\B             Matches a position that is not the beginning or end of a word
[^x]           Matches any character except x
[^aeiou]       Matches any character except the letters a, e, i, o, u


\S+  # a string that contains no whitespace characters
<a[^>]+>  # a string enclosed in angle brackets that starts with a
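A brief Python sketch of the negated codes, using the patterns \S+ and <a[^>]+> plus a negated character class. The sample inputs are invented.

```python
import re

# \S+ splits the text into runs of non-whitespace characters.
tokens = re.findall(r"\S+", "a  bc\tdef")
print(tokens)  # ['a', 'bc', 'def']

# [^>]+ consumes everything up to, but not including, the closing bracket.
anchors = re.findall(r"<a[^>]+>", '<a href="x">link</a> <b>')
print(anchors)  # ['<a href="x">']

# A negated class can exclude several things at once: vowels and whitespace here.
consonants = re.findall(r"[^aeiou\s]", "regex is fun")
print(consonants)  # ['r', 'g', 'x', 's', 'f', 'n']
```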


Backreferences
A backreference repeats the search for text that an earlier group has already matched. For example, \b(\w+)\b\s+\1\b matches a repeated word such as "go go".
Table 4. Common grouping syntax
Classification          Code/syntax      Description
Capture                 (exp)            Matches exp and captures the text into an automatically numbered group
                        (?<name>exp)     Matches exp and captures the text into a group named name; can also be written (?'name'exp)
                        (?:exp)          Matches exp without capturing the text and without assigning a group number
Zero-width assertion    (?=exp)          Matches the position just before exp
                        (?<=exp)         Matches the position just after exp
                        (?!exp)          Matches a position that is not followed by exp
                        (?<!exp)         Matches a position that is not preceded by exp
Comment                 (?#comment)      Has no effect on how the regular expression is processed; it only provides a comment for people to read
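A Python sketch of capturing, named, and non-capturing groups. Note one flavor difference: the table uses .NET syntax, while Python's re module writes a named group as (?P<name>exp) and a named backreference as (?P=name). The sample sentences are made up.

```python
import re

# \1 refers back to whatever group 1 captured, so this finds a doubled word.
repeated = re.search(r"\b(\w+)\b\s+\1\b", "it is is fine")
print(repeated.group(0))  # is is

# The same idea with a named group (Python spelling).
named = re.search(r"\b(?P<word>\w+)\s+(?P=word)\b", "go go now")
print(named.group("word"))  # go

# (?:...) groups without capturing, so only (c) is numbered.
non_capturing = re.search(r"(?:ab)+(c)", "ababc")
print(non_capturing.groups())  # ('c',)
```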


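Zero-width assertions
Like \b, ^, and $, these specify a position and consume no characters.
(?=exp)  # asserts that exp follows this position
(?<=exp)  # asserts that exp precedes this position
\b\w+(?=ing\b)  # the part before "ing" in a word that ends with ing
(?<=\bre)\w+\b  # the part after "re" in a word that starts with re
(?<=\s)\d+(?=\s)  # digits delimited by whitespace on both sides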
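The three lookaround examples, written as \b\w+(?=ing\b), (?<=\bre)\w+\b, and (?<=\s)\d+(?=\s), can be tried in Python as sketched below. The sample sentences are illustrative. Python requires the expression inside a lookbehind to have a fixed width, which all three patterns satisfy.

```python
import re

# Lookahead: the "ing" must follow but is not part of the match.
stems = re.findall(r"\b\w+(?=ing\b)", "I'm singing while you're dancing.")
print(stems)  # ['sing', 'danc']

# Lookbehind: the "re" must precede but is not part of the match.
tails = re.findall(r"(?<=\bre)\w+\b", "reading a book")
print(tails)  # ['ading']

# Both together: digits with whitespace on each side; "c6" is skipped.
numbers = re.findall(r"(?<=\s)\d+(?=\s)", "a 12 b 345 c6")
print(numbers)  # ['12', '345']
```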




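Negative zero-width assertions
(?!exp)  # asserts that exp does not follow this position
(?<!exp)  # asserts that exp does not precede this position
\b\w*q(?!u)\w*\b  # a word containing a q that is not followed by u; (?!u) only matches a position and consumes no characters
~ Problem: this still matches qsqu, because the first q is not followed by u.
\d{3}(?!\d)  # three digits not followed by another digit
(?<![a-z])\d{7}  # seven digits not preceded by a lowercase letter
(?<=<(\w+)>).*(?=<\/\1>)  # the content inside a simple HTML tag that has no attributes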
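The qsqu problem can be reproduced in Python, and one possible tightening (a suggestion added here, not part of the original tutorial) is to reject the word up front if "qu" appears anywhere in it. The sample text is made up. The HTML example is rewritten with capturing groups, because Python's re module rejects the variable-width lookbehind (?<=<(\w+)>).

```python
import re

sample = "Iraq queen qsqu Benq"

# The naive pattern accepts qsqu: its first q is followed by s, not u.
naive = re.findall(r"\b\w*q(?!u)\w*\b", sample)
print(naive)  # ['Iraq', 'qsqu', 'Benq']

# Assumed fix: a lookahead at the word start forbids "qu" anywhere in the word.
strict = re.findall(r"\b(?!\w*qu)\w*q\w*\b", sample)
print(strict)  # ['Iraq', 'Benq']

# Three digits not followed by another digit.
three_digits = re.findall(r"\d{3}(?!\d)", "123 4567")
print(three_digits)  # ['123', '567']

# Seven digits not preceded by a lowercase letter.
seven_digits = re.findall(r"(?<![a-z])\d{7}", "a1234567 B7654321")
print(seven_digits)  # ['7654321']

# Tag content via groups instead of lookarounds; \1 must repeat the tag name.
tag = re.search(r"<(\w+)>(.*)</\1>", "<b>bold text</b>")
print(tag.group(2))  # bold text
```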
Comments
Method one: (?#the comment text goes here)
Method two: enable the "ignore pattern whitespace" option; a comment can then be added after a # in the pattern
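Both comment styles exist in Python as well; the second corresponds to the re.VERBOSE flag. A minimal sketch using the phone pattern 0\d{2}-\d{8}, with comment text invented for illustration:

```python
import re

# Method one: inline (?#...) comments are ignored by the engine.
inline = re.compile(r"0\d{2}(?#area code)-\d{8}(?#local number)")

# Method two: with VERBOSE, unescaped whitespace is ignored and # starts a comment.
verbose = re.compile(
    r"""
    0\d{2}   # area code: 0 followed by two digits
    -        # separator
    \d{8}    # local number: eight digits
    """,
    re.VERBOSE,
)

number = "012-12345678"
inline_ok = inline.fullmatch(number) is not None
verbose_ok = verbose.fullmatch(number) is not None
print(inline_ok, verbose_ok)  # True True
```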


Greed and laziness
Adding ? after a quantifier makes it lazy: it matches as few characters as possible while still letting the overall match succeed.
Table 5. Lazy quantifiers
Code/syntax    Description
*?             Repeat any number of times, but as few times as possible
+?             Repeat 1 or more times, but as few times as possible
??             Repeat 0 or 1 time, but as few times as possible
{n,m}?         Repeat n to m times, but as few times as possible
{n,}?          Repeat n or more times, but as few times as possible
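Greedy and lazy matching side by side in Python. The input "aabab" and the small HTML string are illustrative examples only.

```python
import re

# Greedy: .* grabs as much as possible, so the match runs to the last b.
greedy = re.findall(r"a.*b", "aabab")
print(greedy)  # ['aabab']

# Lazy: .*? stops at the first b that lets the match succeed.
lazy = re.findall(r"a.*?b", "aabab")
print(lazy)  # ['aab', 'ab']

# The same effect on tags: greedy spans both tags, lazy finds each one.
greedy_tags = re.findall(r"<.+>", "<b>x</b>")
lazy_tags = re.findall(r"<.+?>", "<b>x</b>")
print(greedy_tags)  # ['<b>x</b>']
print(lazy_tags)    # ['<b>', '</b>']
```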


Processing options
Table 6. Common processing options
Name                                          Description
IgnoreCase (ignore case)                      Matching is not case-sensitive.
Multiline (multi-line mode)                   Changes the meaning of ^ and $ so that they match at the beginning and end of any line.
Singleline (single-line mode)                 Changes the meaning of . so that it matches every character (including the newline \n).
IgnorePatternWhitespace (ignore whitespace)   Ignores unescaped whitespace in the expression and enables comments marked by #.
ExplicitCapture (explicit capture)            Captures only groups that have been explicitly named.
Note: single-line mode and multi-line mode can be used at the same time.
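The option names in the table are .NET's. As an assumption for illustration, the sketch below uses the closest Python equivalents: re.IGNORECASE, re.MULTILINE, re.DOTALL (single-line mode), and re.VERBOSE (ignore pattern whitespace). The sample text is made up.

```python
import re

text = "first line\nSecond line"

# Without MULTILINE, ^ matches only at the start of the whole string.
default_starts = re.findall(r"^\w+", text)
multiline_starts = re.findall(r"^\w+", text, re.MULTILINE)
print(default_starts)    # ['first']
print(multiline_starts)  # ['first', 'Second']

# Without DOTALL, . stops at the newline, so this match fails.
without_dotall = re.search(r"first.*Second", text) is not None
with_dotall = re.search(r"first.*Second", text, re.DOTALL) is not None
print(without_dotall, with_dotall)  # False True

# IGNORECASE lets "second" match "Second".
ignore_case = re.findall(r"second", text, re.IGNORECASE)
print(ignore_case)  # ['Second']

# Both modes together: . crosses the newline, so one match covers everything.
combined = re.findall(r"^.+$", text, re.MULTILINE | re.DOTALL)
print(combined == [text])  # True
```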


Balancing groups/recursive matching
This part is hard to follow; I only understand it roughly.
(?'group')  names the captured content group and pushes it onto the stack
(?'-group')  pops the capture most recently pushed onto the stack; if the stack is already empty, this group fails to match
(?(group)yes|no)  if a capture named group exists on the stack, continue matching the yes part; otherwise continue matching the no part
(?!)  zero-width negative lookahead; because it has no subexpression, any attempt to match it always fails
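Python's re module has no balancing groups, so the sketch below is not a regular expression. It is a plain loop, under the assumption that a simple counter can stand in for the capture stack, showing what the push, pop, and final check do when testing whether angle brackets are balanced. The function name is_balanced is hypothetical.

```python
def is_balanced(text: str, open_ch: str = "<", close_ch: str = ">") -> bool:
    """Return True if every open_ch has a matching close_ch in order."""
    depth = 0  # number of captures currently "on the stack"
    for ch in text:
        if ch == open_ch:
            depth += 1  # like (?'group'...): push a capture
        elif ch == close_ch:
            if depth == 0:
                return False  # like (?'-group'...) on an empty stack: fail
            depth -= 1  # pop the most recent capture
    # Like (?(group)(?!)): anything left on the stack means failure.
    return depth == 0

print(is_balanced("xx <aa <bbb> <bbb> aa> yy"))  # True
print(is_balanced("<a"))                         # False
print(is_balanced("a>"))                         # False
```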






Other things not covered above
Table 7. Syntax not discussed in detail
Code/syntax        Description
\a                 Alarm character (printing it makes the computer beep)
\b                 Usually a word boundary, but represents a backspace when used inside a character class
\t                 Tab
\r                 Carriage return
\v                 Vertical tab
\f                 Form feed (page break)
\n                 Newline
\e                 Escape
\0nn               The ASCII character whose octal code is nn
\xnn               The ASCII character whose hexadecimal code is nn
\unnnn             The Unicode character whose hexadecimal code is nnnn
\cN                ASCII control character; for example, \cC represents Ctrl+C
\A                 Beginning of the string (similar to ^, but not affected by the Multiline option)
\Z                 End of the string or end of the last line (not affected by the Multiline option)
\z                 End of the string (similar to $, but not affected by the Multiline option)
\G                 The position where the current search starts
\p{name}           The Unicode character class named name, such as \p{IsGreek}
(?>exp)            Greedy subexpression (an atomic group: once matched, it is not backtracked into)
(?<x-y>exp)        Balancing group
(?im-nsx:exp)      Changes the processing options inside the subexpression exp
(?im-nsx)          Changes the processing options for the rest of the expression
(?(exp)yes|no)     Treats exp as a zero-width positive lookahead; if it matches at this position, uses yes as the expression for this group, otherwise uses no
(?(exp)yes)        Same as above, with an empty expression as no
(?(name)yes|no)    If the group named name has captured something, uses yes as the expression; otherwise uses no
(?(name)yes)       Same as above, with an empty expression as no
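A few of these escapes can be tried in Python, with one caveat stated as an assumption about flavor differences: the table describes .NET, whereas Python's \Z matches only at the very end of the string, so it behaves like the \z row above rather than the \Z row. The sample text is made up.

```python
import re

text = "one\ntwo\n"

# ^ follows the MULTILINE flag, \A does not.
every_line = re.findall(r"^\w+", text, re.MULTILINE)
start_only = re.findall(r"\A\w+", text, re.MULTILINE)
print(every_line)  # ['one', 'two']
print(start_only)  # ['one']

# $ also matches just before a final newline; Python's \Z does not.
dollar_end = re.search(r"two$", text) is not None
strict_end = re.search(r"two\Z", text) is not None
print(dollar_end, strict_end)  # True False

# Inside a character class, \b means backspace rather than word boundary.
backspace = re.search(r"[\b]", "a\bb") is not None
print(backspace)  # True

# \xnn and \unnnn name characters by their hexadecimal code.
by_code = re.fullmatch(r"\x41\u0042", "AB") is not None
print(by_code)  # True
```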

















