Regular expressions are hard to write, and their dense syntax makes them hard for beginners to approach. To address this, I would like to invite you to develop HyperScriptExpression (HSE for short) with me. HSE is written entirely in a markup-style syntax built from tags. For example:

The regular expression \d{2}-\d{5} is equivalent to the HSE: <rep=2><digit></rep>-<rep=5><digit></rep>.
The regular expression <(.*)>.*<\/\1> is equivalent to the HSE: <rem(><*><any></*><)><*><any></*></<rem=1>.
The regular expression ^Chapter [1-9][0-9]{0,1} is equivalent to the HSE: <@start>Chapter <in>1-9</in><rep=0,1><in>0-9</in></rep>.
The regular expression (\w)+[@]{1}((\w)+[\.]){1,3}(\w)+ is equivalent to the HSE: <+><word></+><rep=1>@</rep><rep=1,3><word+>.</rep><word+>.

HSE syntax and description
<> |
Escape symbols. A literal < or > in the pattern must be replaced by its escaped form. |
<@start> |
Matches the start position of the input string. If the HSE object's Multiline attribute is set, <@start> also matches the position after <crlf> or <cr>. |
<@over> |
Matches the end position of the input string. If the HSE object's Multiline attribute is set, <@over> also matches the position before <crlf> or <cr>. |
<*></*> |
Matches the enclosed subexpression zero or more times. For example, 'z<*>o</*>' matches "z" and "zoo". Equivalent to <least=0></least>. |
<+></+> |
Matches the enclosed subexpression one or more times. For example, 'z<+>o</+>' matches "zo" and "zoo", but not "z". Equivalent to <least=1></least>. |
<sel></sel> |
Matches the enclosed subexpression zero times or once. For example, 'do<sel>es</sel>' matches "do" or "does". Equivalent to <rep=0,1></rep>. |
<rep=n></rep> |
n is a non-negative integer. Matches exactly n times. For example, '<rep=2>o</rep>' does not match the "o" in "Bob", but matches the two o's in "food". |
<least=n></least> |
n is a non-negative integer. Matches at least n times. For example, '<least=2>o</least>' does not match the "o" in "Bob", but matches all the o's in "foooood". |
<rep=m,n></rep> |
m and n are both non-negative integers, with n >= m. Matches at least m times and at most n times. For example, '<rep=1,3>o</rep>' matches the first three o's in "fooooood". Note that there must be no spaces between the comma and the two numbers. |
NG attribute |
When the NG attribute is added to any of the quantifier tags above (<*>, <+>, <sel>, <rep=n>, <least=n>, <rep=m,n>), matching becomes non-greedy. Non-greedy mode matches as little of the searched string as possible, whereas the default greedy mode matches as much as possible. For example, against the string "oooo", '<+ NG>o</+>' matches a single "o", while '<+>o</+>' matches all of the o's. |
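For readers who already know conventional regular expressions, the quantifier tags above can be read against their usual counterparts. The following is a minimal sketch using Python's re module. The mapping in EQUIVALENTS is an assumption inferred from the descriptions above, the helper first_match is hypothetical, and no HSE engine is involved: the HSE strings serve only as dictionary keys.

```python
import re

# Assumed regex counterparts of the HSE quantifier examples above.
# This table is an illustration, not part of any HSE implementation.
EQUIVALENTS = {
    "z<*>o</*>": r"zo*",
    "z<+>o</+>": r"zo+",
    "do<sel>es</sel>": r"do(?:es)?",
    "<rep=2>o</rep>": r"o{2}",
    "<least=2>o</least>": r"o{2,}",
    "<rep=1,3>o</rep>": r"o{1,3}",
    "<+>o</+>": r"o+",        # greedy (default)
    "<+ NG>o</+>": r"o+?",    # non-greedy
}


def first_match(hse, text):
    """Return the first substring matched by the assumed regex for `hse`, or None."""
    m = re.search(EQUIVALENTS[hse], text)
    return m.group() if m else None


print(first_match("z<*>o</*>", "zoo"))               # zoo
print(first_match("z<+>o</+>", "z"))                 # None
print(first_match("<rep=2>o</rep>", "Bob"))          # None
print(first_match("<rep=2>o</rep>", "food"))         # oo
print(first_match("<rep=1,3>o</rep>", "fooooood"))   # ooo
print(first_match("<+>o</+>", "oooo"))               # oooo
print(first_match("<+ NG>o</+>", "oooo"))            # o
```

The last two lines show the difference the NG attribute is described as making: the greedy form takes every "o", the non-greedy form stops after the first.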
<anything> or <any> |
Matches any single character except <crlf>. To match any character including <crlf>, use a pattern such as '<in><any><crlf></in>'. |
<rem(>p</)> or <rem>p</rem> |
Matches p and captures the match. The captured match can be retrieved from the resulting Matches collection through the SubMatches collection or the $0...$9 attributes. |
<(>pattern</)> |
Matches pattern without capturing the result: this is a non-capturing match and is not stored for later use. It is useful when combining parts of a pattern with the <or> tag. For example, 'industr<(>y<or>ies</)>' is a more compact expression than 'industry<or>industries'. |
<eq>pattern</eq> |
Positive lookahead: matches the search string at any position where a string matching pattern begins. This is a non-capturing match, that is, the match is not stored for later use. For example, 'Windows <eq>95<or>98<or>NT<or>2000</eq>' matches "Windows" in "Windows 2000", but not "Windows" in "Windows 3.1". A lookahead does not consume characters: after a match, the search for the next match begins immediately after the last match, not after the characters that made up the lookahead. |
<neq>pattern</neq> |
Negative lookahead: matches the search string at any position where a string that does not match pattern begins. This is a non-capturing match, that is, the match is not stored for later use. |
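The statement that a lookahead checks text without consuming it is easier to see in a running example. The sketch below uses Python's re module and assumes that <eq>...</eq> corresponds to (?=...) and <neq>...</neq> to (?!...). The literal space after "Windows" is also an assumption about how the example pattern was originally written.

```python
import re

# Assumed regex counterparts of the lookahead tags (not produced by an HSE engine):
#   Windows <eq>95<or>98<or>NT<or>2000</eq>    ~  Windows (?=95|98|NT|2000)
#   Windows <neq>95<or>98<or>NT<or>2000</neq>  ~  Windows (?!95|98|NT|2000)
POSITIVE = re.compile(r"Windows (?=95|98|NT|2000)")
NEGATIVE = re.compile(r"Windows (?!95|98|NT|2000)")

text = "Windows 2000"
m = POSITIVE.search(text)
matched = m.group()        # only "Windows " is part of the match
remaining = text[m.end():] # "2000" was inspected but not consumed

print(repr(matched))                           # 'Windows '
print(repr(remaining))                         # '2000'
print(POSITIVE.search("Windows 3.1"))          # None
print(NEGATIVE.search("Windows 3.1").group())  # Windows
print(NEGATIVE.search("Windows 2000"))         # None
```

Because the version number stays outside the match, a following search or replacement still sees it.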
x<or>y |
Matches x or y. For example, 'z<or>food' matches "z" or "food". '<(>z<or>f</)>ood' matches "zood" or "food". |
<in></in> |
Character set. Matches any one of the enclosed characters. For example, '<in>abc</in>' matches the "a" in "plain". |
<nin></nin> |
Negated character set. Matches any character that is not enclosed. For example, '<nin>abc</nin>' matches the "p" in "plain". |
<in>a-z</in> |
Character range. Matches any character in the specified range. |
<nin>a-z</nin> |
Negated character range. Matches any character that is not within the specified range. |
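The "plain" examples for character sets can be checked against ordinary regular expressions. This is a small sketch in Python's re module, assuming that <in>...</in> corresponds to [...] and <nin>...</nin> to [^...]. The sample string "HSE v2" is made up for illustration.

```python
import re

# Assumed regex counterparts of the character-set tags:
#   <in>abc</in>   ~ [abc]     <nin>abc</nin> ~ [^abc]
#   <in>a-z</in>   ~ [a-z]     <nin>a-z</nin> ~ [^a-z]
in_set = re.search(r"[abc]", "plain").group()
not_in_set = re.search(r"[^abc]", "plain").group()
in_range = re.findall(r"[a-z]", "HSE v2")
not_in_range = re.findall(r"[^a-z]", "HSE v2")

print(in_set)        # a
print(not_in_set)    # p
print(in_range)      # ['v']
print(not_in_range)  # ['H', 'S', 'E', ' ', '2']
```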
<border></border> |
Matches a word boundary, that is, the position between a word and a space. |
<nborder></nborder> |
Matches a non-word boundary, the opposite of <border>. |
<control=x> |
Matches the control character specified by x. For example, <control=M> matches Control-M, that is, a carriage return. The value of x must be in the range A-Z or a-z; otherwise, <control> is treated as the <nothing> character. |
<digit> |
Matches a digit character. The shorthand forms <digit+>, <digit*>, and <digit?> may also be used; the same applies to the character-class tags below. |
<ndigit> |
Matches a non-digit character. |
<page> |
Matches a form feed. |
<crlf> |
Matches a line feed. |
<cr> |
Matches a carriage return. |
<blank> |
Matches any whitespace character, including spaces, tabs, and form feeds. |
<nblank> |
Matches any non-whitespace character. |
<tab> |
Matches a tab. |
<vtab> |
Matches a vertical tab. |
<word> |
Matches any word character, including the underscore. |
<nword> |
Matches any non-word character. |
<hex=n> |
Matches n, where n is a hexadecimal escape value. The hexadecimal escape value must be exactly two digits long. |
<call=num> |
Matches num, where num is a positive integer: a back-reference to a captured match. For example, '<rem><any></rem><call=1>' matches two consecutive identical characters. |
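The back-reference example can be tried with a conventional regular expression. The sketch below assumes that <rem>...</rem> corresponds to a capturing group and <call=1> to the back-reference \1 in Python's re module. The test words are chosen only for illustration.

```python
import re

# Assumed regex counterpart of <rem><any></rem><call=1>:
# capture any one character, then require the same character again.
DOUBLED = re.compile(r"(.)\1")

first = DOUBLED.search("hello").group()
all_pairs = [m.group() for m in DOUBLED.finditer("bookkeeper")]

print(first)                  # ll
print(all_pairs)              # ['oo', 'kk', 'ee']
print(DOUBLED.search("abc"))  # None
```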
<oct=n> |
Matches n, where n is an octal escape value. |
<unicode=n> |
Matches n, where n is a Unicode character expressed as four hexadecimal digits. For example, <unicode=00A9> (the regular expression \u00A9) matches the copyright symbol (©). |
<nothing> or <nil> |
Matches the empty string. Used in alternation. For example, <nothing><or>a<or>b<or>c<or>d matches a, b, c, d, or nothing (the empty string). |
<total> |
The entire string must match. For example, <total>HS<in>DEF</in> matches "HSD", but does not match the "HSD" in "HSDB". |
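Since every tag in the reference above has a conventional regular-expression counterpart, one way to prototype HSE is to translate it into an existing regex dialect. The following is a minimal sketch of that idea in Python, covering only a small subset of the tags. The function name hse_to_regex, the mapping tables, and the treatment of whitespace as literal text are all assumptions made for illustration; this is not the HSE implementation.

```python
import re

# Hypothetical mapping from a subset of HSE tags to Python regex fragments.
SIMPLE = {
    "@start": "^", "@over": "$",
    "digit": r"\d", "ndigit": r"\D",
    "word": r"\w", "nword": r"\W",
    "any": ".", "or": "|",
    "in": "[", "/in": "]", "nin": "[^", "/nin": "]",
    "rem": "(", "/rem": ")",
}
QUANT_OPEN = {"*": "*", "+": "+", "sel": "?"}
QUANT_CLOSE = {"/*", "/+", "/sel", "/rep", "/least"}

# A token is either a tag such as <rep=2> or a run of literal text.
TOKEN = re.compile(r"<([^<>]+)>|([^<>]+)")


def hse_to_regex(hse):
    """Translate a subset of HSE into a Python regex string (illustrative only)."""
    out = []
    stack = []      # quantifiers waiting for their closing tag
    in_set = False  # True between <in>/<nin> and the matching close
    for tag, text in TOKEN.findall(hse):
        if text:
            # Inside a character set, keep ranges such as 1-9 untouched.
            out.append(text if in_set else re.escape(text))
            continue
        name, _, arg = tag.partition("=")
        if name in SIMPLE:
            out.append(SIMPLE[name])
            if name in ("in", "nin"):
                in_set = True
            elif name in ("/in", "/nin"):
                in_set = False
        elif name in QUANT_OPEN:
            stack.append(QUANT_OPEN[name])
            out.append("(?:")
        elif name == "rep":
            stack.append("{%s}" % arg)      # <rep=n> or <rep=m,n>
            out.append("(?:")
        elif name == "least":
            stack.append("{%s,}" % arg)     # <least=n>
            out.append("(?:")
        elif name in QUANT_CLOSE:
            out.append(")" + stack.pop())
        elif name == "call":
            out.append("(?:\\%s)" % arg)    # back-reference to group `arg`
        else:
            raise ValueError("unsupported tag: <%s>" % tag)
    return "".join(out)


zip_like = hse_to_regex("<rep=2><digit></rep>-<rep=5><digit></rep>")
chapter = hse_to_regex("<@start>Chapter <in>1-9</in><rep=0,1><in>0-9</in></rep>")
doubled = hse_to_regex("<rem><any></rem><call=1>")

print(bool(re.fullmatch(zip_like, "12-34567")))         # True
print(re.search(chapter, "Chapter 12: Tags").group())   # Chapter 12
print(re.search(doubled, "hello").group())              # ll
```

The sketch wraps every quantified body in a non-capturing group, so a tag pair such as <rep=2>...</rep> always applies to everything it encloses. Escaped < and > characters, the NG attribute, shorthand forms such as <word+>, and the remaining tags are left out.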