In note (1), I covered the basic use of regular expressions in Java. This note looks at how matching actually works. Below is a reference of regular-expression symbols that I collected from the Internet:
Complete symbol reference
Character Description
\ Marks the next character as a special character, a literal, a backreference, or an octal escape. For example, 'n' matches the character "n", while '\n' matches a newline. The sequence '\\' matches "\" and '\(' matches "(".
^ Matches the start of the input string. If the Multiline property of the RegExp object is set (in Java, the Pattern.MULTILINE flag), ^ also matches the position after '\n' or '\r'.
$ Matches the end of the input string. If the Multiline property of the RegExp object is set (in Java, the Pattern.MULTILINE flag), $ also matches the position before '\n' or '\r'.
* Matches the preceding subexpression zero or more times. For example, 'zo*' matches "z" and "zoo". * is equivalent to {0,}.
+ Matches the preceding subexpression one or more times. For example, 'zo+' matches "zo" and "zoo", but not "z". + is equivalent to {1,}.
? Matches the preceding subexpression zero or one time. For example, 'do(es)?' matches "do" or the "do" in "does". ? is equivalent to {0,1}.
{n} n is a non-negative integer. Matches exactly n times. For example, 'o{2}' does not match the 'o' in "Bob", but matches the two o's in "food".
{n,} n is a non-negative integer. Matches at least n times. For example, 'o{2,}' does not match the 'o' in "Bob", but matches all the o's in "foooood". 'o{1,}' is equivalent to 'o+'. 'o{0,}' is equivalent to 'o*'.
{n,m} m and n are non-negative integers, where n <= m. Matches at least n and at most m times. For example, 'o{1,3}' matches the first three o's in "fooooood". 'o{0,1}' is equivalent to 'o?'. Note that there must be no space between the comma and the numbers.
? When this character immediately follows any other quantifier (*, +, ?, {n}, {n,}, {n,m}), the match is non-greedy. A non-greedy pattern matches as little of the searched string as possible, whereas the default greedy pattern matches as much as possible. For example, in the string "oooo", 'o+?' matches a single "o", while 'o+' matches all the o's.
. Matches any single character except "\n". To match any character including '\n', use a pattern such as '[\s\S]' (a dot inside square brackets is a literal dot, so '[.\n]' does not work). In Java, the Pattern.DOTALL flag also makes . match line terminators.
(pattern) Matches pattern and captures the match. The captured match can be retrieved from the resulting Matches collection, using the SubMatches collection in VBScript or the $0…$9 properties in JScript; in Java, use Matcher.group(n). To match the parenthesis characters themselves, use '\(' or '\)'.
(?:pattern) Matches pattern but does not capture the match; that is, it is a non-capturing match and is not stored for later use. This is useful for combining parts of a pattern with the "or" character (|). For example, 'industr(?:y|ies)' is a more economical expression than 'industry|industries'.
(?=pattern) Positive lookahead: matches the search string at any position where a string matching pattern begins. This is a non-capturing match; the match is not stored for later use. For example, 'Windows (?=95|98|NT|2000)' matches "Windows" in "Windows 2000", but not "Windows" in "Windows 3.1". Lookaheads do not consume characters: after a match occurs, the search for the next match begins immediately after the last match, not after the characters that made up the lookahead.
(?!pattern) Negative lookahead: matches the search string at any position where a string not matching pattern begins. This is a non-capturing match; the match is not stored for later use. For example, 'Windows (?!95|98|NT|2000)' matches "Windows" in "Windows 3.1", but not "Windows" in "Windows 2000". Lookaheads do not consume characters: after a match occurs, the search for the next match begins immediately after the last match, not after the characters that made up the lookahead.
x|y Matches either x or y. For example, 'z|food' matches "z" or "food". '(z|f)ood' matches "zood" or "food".
[xyz] A character set. Matches any one of the enclosed characters. For example, '[abc]' matches the 'a' in "plain".
[^xyz] A negated character set. Matches any character not enclosed. For example, '[^abc]' matches the 'p' in "plain".
[a-z] A character range. Matches any character in the specified range. For example, '[a-z]' matches any lowercase letter in the range 'a' to 'z'.
[^a-z] A negated character range. Matches any character not in the specified range. For example, '[^a-z]' matches any character that is not in the range 'a' to 'z'.
\b Matches a word boundary, that is, the position between a word and a space. For example, 'er\b' matches the 'er' in "never", but not the 'er' in "verb".
\B Matches a non-word boundary. 'er\B' matches the 'er' in "verb", but not the 'er' in "never".
\cx Matches the control character indicated by x. For example, \cM matches a Control-M or carriage return character. The value of x must be in the range A-Z or a-z. Otherwise, c is treated as a literal 'c' character.
\d Matches a digit character. Equivalent to [0-9].
\D Matches a non-digit character. Equivalent to [^0-9].
\f Matches a form-feed character. Equivalent to \x0c and \cL.
\n Matches a newline (line feed) character. Equivalent to \x0a and \cJ.
\r Matches a carriage return character. Equivalent to \x0d and \cM.
\s Matches any whitespace character, including spaces, tabs, and form feeds. Equivalent to [ \f\n\r\t\v].
\S Matches any non-whitespace character. Equivalent to [^ \f\n\r\t\v].
\t Matches a tab character. Equivalent to \x09 and \cI.
\v Matches a vertical tab character. Equivalent to \x0b and \cK.
\w Matches any word character, including the underscore. Equivalent to '[A-Za-z0-9_]'.
\W Matches any non-word character. Equivalent to '[^A-Za-z0-9_]'.
\xn Matches n, where n is a hexadecimal escape value. Hexadecimal escape values must be exactly two digits long. For example, '\x41' matches "A". '\x041' is equivalent to '\x04' followed by "1". This allows ASCII codes to be used in regular expressions.
\num Matches num, where num is a positive integer. This is a reference back to a captured match. For example, '(.)\1' matches two consecutive identical characters.
\n Identifies either an octal escape value or a backreference. If \n is preceded by at least n captured subexpressions, n is a backreference. Otherwise, if n is an octal digit (0-7), n is an octal escape value.
\nm Identifies either an octal escape value or a backreference. If \nm is preceded by at least nm captured subexpressions, nm is a backreference. If \nm is preceded by at least n captured subexpressions, n is a backreference followed by the literal m. If neither condition holds and n and m are octal digits (0-7), \nm matches the octal escape value nm.
\nml Matches the octal escape value nml when n is an octal digit (0-3) and m and l are octal digits (0-7).
\un Matches n, where n is a Unicode character expressed as four hexadecimal digits. For example, \u00A9 matches the copyright symbol (©).
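To check a few of the entries above in Java, here is a minimal sketch using java.util.regex. The class name RegexSymbolsDemo and the helpers firstMatch and allMatches are hypothetical names introduced only for this illustration. Note that inside a Java string literal every backslash of the pattern has to be doubled, so the pattern er\b is written "er\\b".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexSymbolsDemo {

    // Hypothetical helper: first substring of input matched by regex, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    // Hypothetical helper: every non-overlapping match of regex in input, in order.
    static List<String> allMatches(String regex, int flags, String input) {
        Matcher m = Pattern.compile(regex, flags).matcher(input);
        List<String> found = new ArrayList<>();
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // ^ matches only at the start of input unless MULTILINE is set.
        System.out.println(allMatches("^\\w+", 0, "one\ntwo"));                 // [one]
        System.out.println(allMatches("^\\w+", Pattern.MULTILINE, "one\ntwo")); // [one, two]

        // Quantifiers: {2} needs two o's in a row, {1,3} takes at most three.
        System.out.println(firstMatch("o{2}", "Bob"));        // null
        System.out.println(firstMatch("o{1,3}", "fooooood")); // ooo

        // Greedy versus non-greedy.
        System.out.println(firstMatch("o+", "oooo"));  // oooo
        System.out.println(firstMatch("o+?", "oooo")); // o

        // The dot skips the newline unless DOTALL is set; [\s\S] always matches it.
        System.out.println("a\nb".matches("a.b"));        // false
        System.out.println("a\nb".matches("a[\\s\\S]b")); // true
        System.out.println(Pattern.compile("a.b", Pattern.DOTALL).matcher("a\nb").matches()); // true

        // Capturing group read back with group(1); non-capturing group in an alternation.
        Matcher m = Pattern.compile("do(es)?").matcher("does");
        if (m.matches()) {
            System.out.println(m.group(1)); // es
        }
        System.out.println("industries".matches("industr(?:y|ies)")); // true

        // Lookahead: the version number is checked but is not part of the match.
        System.out.println("[" + firstMatch("Windows (?=95|98|NT|2000)", "Windows 2000") + "]"); // [Windows ]
        System.out.println(firstMatch("Windows (?!95|98|NT|2000)", "Windows 2000"));             // null

        // Word boundaries.
        System.out.println(firstMatch("er\\b", "never")); // er
        System.out.println(firstMatch("er\\b", "verb"));  // null
        System.out.println(firstMatch("er\\B", "verb"));  // er

        // Backreference: two consecutive identical characters.
        System.out.println(firstMatch("(.)\\1", "food")); // oo

        // Hexadecimal escape handled by the regex engine.
        System.out.println("A".matches("\\x41")); // true
    }
}
```

The find() loop in allMatches plays the role of the "g" flag in the JScript-style notation, while Pattern flags such as MULTILINE and DOTALL replace the properties of the RegExp object.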
The following are some examples:
Regular expression Description
/\b([a-z]+) \1\b/gi Matches the position where a word occurs twice in a row.
/(\w+):\/\/([^/:]+)(:\d*)?([^# ]*)/ Breaks a URL down into protocol, domain, port, and relative path.
/^(?:Chapter|Section) [1-9][0-9]{0,1}$/ Matches a chapter or section heading.
/[-a-z]/ Matches the 26 letters a to z plus the hyphen (-).
/ter\b/ Matches the "ter" in "chapter", but not in "terminal".
/\Bapt/ Matches the "apt" in "chapter", but not in "aptitude".
/Windows(?=95|98|NT)/ Matches the "Windows" in Windows95, Windows98, or WindowsNT. After a match is found, the search for the next match starts right after "Windows".
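The examples above use the /pattern/flags notation of JScript. A rough Java translation might look like the sketch below, where the "i" flag becomes Pattern.CASE_INSENSITIVE and the "g" flag becomes repeated calls to find(). The class name RegexExamplesDemo, its method names, and the sample URL are assumptions made for illustration only.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexExamplesDemo {

    // A word followed by a space and the same word again.
    static final Pattern REPEATED_WORD =
            Pattern.compile("\\b([a-z]+) \\1\\b", Pattern.CASE_INSENSITIVE);

    // Groups: 1 = protocol, 2 = domain, 3 = port (with its colon), 4 = relative path.
    static final Pattern URL =
            Pattern.compile("(\\w+)://([^/:]+)(:\\d*)?([^# ]*)");

    // "Chapter" or "Section" followed by a number from 1 to 99.
    static final Pattern HEADING =
            Pattern.compile("^(?:Chapter|Section) [1-9][0-9]{0,1}$");

    // Returns the first doubled word in text, or null if there is none.
    static String firstRepeatedWord(String text) {
        Matcher m = REPEATED_WORD.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    // Returns {protocol, domain, port, path}; the port entry is null when the URL has none.
    static String[] splitUrl(String url) {
        Matcher m = URL.matcher(url);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3), m.group(4) };
    }

    static boolean isHeading(String line) {
        return HEADING.matcher(line).matches();
    }

    // Returns the index just past the matched "Windows", or -1 if nothing matches.
    static int lookaheadEnd(String text) {
        Matcher m = Pattern.compile("Windows(?=95|98|NT)").matcher(text);
        return m.find() ? m.end() : -1;
    }

    public static void main(String[] args) {
        System.out.println(firstRepeatedWord("the cost of of gasoline")); // of

        // Hypothetical sample URL.
        System.out.println(Arrays.toString(splitUrl("http://www.example.com:80/docs/index.html")));
        // [http, www.example.com, :80, /docs/index.html]

        System.out.println(isHeading("Chapter 12"));  // true
        System.out.println(isHeading("Chapter 100")); // false

        System.out.println(Pattern.compile("ter\\b").matcher("chapter").find());  // true
        System.out.println(Pattern.compile("ter\\b").matcher("terminal").find()); // false
        System.out.println(Pattern.compile("\\Bapt").matcher("chapter").find());  // true
        System.out.println(Pattern.compile("\\Bapt").matcher("aptitude").find()); // false

        // The lookahead is not consumed: the match ends after "Windows", before "95".
        System.out.println(lookaheadEnd("Windows95")); // 7
    }
}
```

Because the match for Windows(?=95|98|NT) ends at index 7, the next find() call resumes at "95", which is what "the search for the next match starts right after Windows" means in practice.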