Call for Joint Development of a Regular Expression Alternative: HyperScriptExpression

Source: Internet
Author: User

Regular expressions are hard to write, and their complex syntax makes them hard for learners to approach. To solve this problem, I would like to invite you to develop HyperScriptExpression (HSE for short) with me. HSE expresses the whole regular-expression syntax as markup tags, for example:
The regular expression \d{2}-\d{5} is written in HSE as <rep=2><digit></rep>-<rep=5><digit></rep>
The regular expression <(.*)>.*<\/\1> is written in HSE as <rem(><*><any></*><)><*><any></*></<rem=1>
The regular expression ^Chapter[1-9][0-9]{0,1} is written in HSE as <@start>Chapter<in>1-9</in><rep=0,1><in>0-9</in></rep>
The regular expression (\w)+[@]{1}((\w)+[\.]){1,3}(\w)+ is written in HSE as <+><word></+><rep=1>@</rep><rep=1,3><word+>.</rep><word+>
HSE syntax notes
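The regular-expression side of the four examples above can be checked with any regex engine. Below is a minimal sketch using Python's standard re module; the sample input strings are illustrative assumptions chosen for this sketch, not part of the proposal.

```python
import re

# Regular expressions from the examples above, each paired with an
# illustrative input string (the inputs are assumptions for this sketch).
EXAMPLES = [
    (r"\d{2}-\d{5}", "12-34567"),
    (r"<(.*)>.*<\/\1>", "<b>bold</b>"),          # matching open/close tag pair
    (r"^Chapter[1-9][0-9]{0,1}", "Chapter12"),
    (r"(\w)+[@]{1}((\w)+[\.]){1,3}(\w)+", "user@mail.example.com"),
]


def check_examples():
    """Return (pattern, matched text or None) for every example."""
    results = []
    for pattern, text in EXAMPLES:
        m = re.search(pattern, text)
        results.append((pattern, m.group(0) if m else None))
    return results


if __name__ == "__main__":
    for pattern, matched in check_examples():
        print(f"{pattern!r:42} -> {matched!r}")
```

Each pattern matches its whole sample string here; in the tag-pair example the back-reference \1 forces the closing tag name to equal the captured opening name ("b").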
<> Escape symbols. <Replaced by <.> Replace with>
<@start> Matches the start of the input string. If the Multiline property of the HSE object is set, <@start> also matches the position after <crlf> or <cr>.
<@over> Matches the end of the input string. If the Multiline property of the HSE object is set, <@over> also matches the position before <crlf> or <cr>.
<*></*> Matches the enclosed subexpression zero or more times. For example, 'z<*>o</*>' matches "z" and "zoo". Equivalent to <least=0></least>.
<+></+> Matches the enclosed subexpression one or more times. For example, 'z<+>o</+>' matches "zo" and "zoo" but not "z". Equivalent to <least=1></least>.
<sel></sel> Matches the enclosed subexpression zero times or once. For example, 'do<sel>es</sel>' matches the "do" in "do" or "does". Equivalent to <rep=0,1></rep>.
<rep=n></rep> n is a non-negative integer. Matches exactly n times. For example, '<rep=2>o</rep>' does not match the 'o' in "Bob" but matches the two o's in "food".
<least=n></least> n is a non-negative integer. Matches at least n times. For example, '<least=2>o</least>' does not match the 'o' in "Bob" but matches all the o's in "foooood".
<rep=m,n></rep> m and n are non-negative integers with m <= n. Matches at least m and at most n times. For example, '<rep=1,3>o</rep>' matches the first three o's in "fooooood". Note that no spaces are allowed around the comma.
NG attribute When added to any quantifier tag (<*>, <+>, <sel>, <rep>, <least>), makes the match non-greedy. Non-greedy mode matches as little of the searched string as possible, whereas the default greedy mode matches as much as possible. For example, in the string "oooo", '<+ NG>o</+>' matches a single "o", while '<+>o</+>' matches all the o's.
<anything> or <any> Matches any single character except <crlf>. To match any character including <crlf>, use a pattern such as '<in><any><crlf></in>'.
<rem(>p</)> or
<rem>p</rem>
Matches p and captures the match. Captured matches can be retrieved from the resulting Matches collection, using the SubMatches collection or the $0...$9 properties.
<(>pattern</)> Matches pattern but does not capture the result; this is a non-capturing match that is not stored for later use. It is useful when combining parts of a pattern with the "or" operator (<or>). For example, 'industr<(>y<or>ies</)>' is a more concise expression than 'industry<or>industries'.
<eq>pattern</eq> Positive lookahead: matches the search string at any point where a string matching pattern begins. This is a non-capturing match; it is not stored for later use. For example, 'Windows <eq>95<or>98<or>NT<or>2000</eq>' matches "Windows" in "Windows 2000" but not "Windows" in "Windows 3.1". Lookaheads do not consume characters: after a match occurs, the search for the next match begins immediately after the last match, not after the characters that make up the lookahead.
<neq>pattern</neq> Negative lookahead: matches the search string at any point where a string not matching pattern begins. This is a non-capturing match; it is not stored for later use.
x<or>y Matches either x or y. For example, 'z<or>food' matches "z" or "food", while '<(>z<or>f</)>ood' matches "zood" or "food".
<in></in> Character set. Matches any one of the enclosed characters. For example, '<in>abc</in>' matches the 'a' in "plain".
<nin></nin> Negated character set. Matches any character that is not enclosed. For example, '<nin>abc</nin>' matches the 'p' in "plain".
<in>a-z</in> Character range. Matches any character in the specified range.
<nin>a-z</nin> Negated character range. Matches any character not in the specified range.
<border></border> Matches a word boundary, that is, the position between a word and a space.
<nborder></nborder> Matches a non-word boundary (the opposite of <border>).
<control=x> Matches the control character specified by x. For example, <control=M> matches Control-M, that is, a carriage return. The value of x must be in the range A-Z or a-z; otherwise, <control> is treated as <nothing>.
<digit> Matches a digit character. The forms <digit+>, <digit*>, and <digit?> may also be used; the same applies to the tags below.
<ndigit> Matches a non-digit character.
<page> Matches a form feed.
<crlf> Matches a line feed.
<cr> Matches a carriage return.
<blank> Matches any whitespace character, including spaces, tabs, and form feeds.
<nblank> Matches any non-whitespace character.
<tab> Matches a tab.
<vtab> Matches a vertical tab.
<word> Matches any word character, including the underscore.
<nword> Matches any non-word character.
<hex=n> Matches n, where n is a hexadecimal escape value. Hexadecimal escape values must be exactly two digits long.
<call=num> Matches num, where num is a positive integer: a back-reference to a captured match. For example, '<rem><any></rem><call=1>' matches two consecutive identical characters.
<oct=n> Matches n, where n is an octal escape value.
<unicode=n> Matches n, where n is a Unicode character expressed as four hexadecimal digits. For example, \u00A9 matches the copyright symbol (©).
<nothing> or
<nil>
Matches the empty string; used in alternations. For example, <nothing><or>a<or>b<or>c<or>d matches a, b, c, d, or nothing (the empty string).
<total> The entire string must match. For example, <total>HS<in>DEF</in> matches "HSD" but does not match the "HSD" in "HSDB".
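The equivalences listed above are regular enough that HSE can be translated mechanically: each paired tag becomes an opening fragment plus a closing fragment, so a stack is enough to match </rep>, </+>, </)> and the rest with their openers. The following Python sketch illustrates that idea for a subset of the tags. It is hypothetical: the function name hse_to_regex, the lookup tables, and the handling of case and spacing are my own assumptions rather than part of the HSE proposal, and Python's re module may differ in details (for example how $ treats a trailing newline) from whatever engine HSE would ultimately target.

```python
import re

# Hypothetical sketch: translate a SUBSET of HSE into a Python `re` pattern.
# Assumptions (not from the HSE proposal): tag names are case-insensitive,
# tags have no inner spaces except the NG attribute (e.g. <+ NG>), and
# <border>, <control=x>, <oct=n> and <any> inside <in> are not handled.

# Tags that stand for a single regex token.
SINGLE = {
    "any": ".", "anything": ".", "digit": r"\d", "ndigit": r"\D",
    "word": r"\w", "nword": r"\W", "blank": r"\s", "nblank": r"\S",
    "tab": r"\t", "vtab": r"\v", "page": r"\f", "crlf": r"\n", "cr": r"\r",
    "@start": "^", "@over": "$", "or": "|", "nothing": "", "nil": "",
}

# Paired tags: opening name -> (prefix, accepted closing names, suffix).
GROUPS = {
    "*": ("(?:", {"/*"}, ")*"), "+": ("(?:", {"/+"}, ")+"),
    "sel": ("(?:", {"/sel"}, ")?"),
    "rem": ("(", {"/rem", "/)"}, ")"), "rem(": ("(", {"/)"}, ")"),
    "(": ("(?:", {"/)"}, ")"), "eq": ("(?=", {"/eq"}, ")"),
    "neq": ("(?!", {"/neq"}, ")"), "in": ("[", {"/in"}, "]"),
    "nin": ("[^", {"/nin"}, "]"),
}
LAZY_OK = {"*", "+", "sel"}  # GROUPS entries that accept the NG attribute

# A token is either a <tag> or a run of literal text.
TOKEN = re.compile(r"<([^<>]*)>|([^<]+)")


def _class_escape(text):
    """Escape literal text inside <in>/<nin>, keeping '-' so a-z stays a range."""
    return re.sub(r"([\\\[\]^])", r"\\\1", text)


def hse_to_regex(hse):
    """Translate an HSE string into an equivalent Python regex string."""
    out, stack, total, pos = [], [], False, 0
    while pos < len(hse):
        m = TOKEN.match(hse, pos)
        if m is None:
            raise ValueError(f"stray '<' at position {pos}")
        pos = m.end()
        if m.group(2) is not None:                       # literal text
            in_set = bool(stack) and not stack[-1][0].isdisjoint({"/in", "/nin"})
            text = m.group(2)
            out.append(_class_escape(text) if in_set else re.escape(text))
            continue
        body = re.sub(r"\s*=\s*", "=", m.group(1).strip())
        if not body:
            raise ValueError("empty tag <>")
        if body.startswith("/"):                         # closing tag
            name = body.lower().replace(" ", "")
            if not stack or name not in stack[-1][0]:
                raise ValueError(f"unexpected closing tag <{body}>")
            out.append(stack.pop()[1])
            continue
        parts = body.split()
        key, _, arg = parts[0].lower().partition("=")
        lazy = "?" if "ng" in (p.lower() for p in parts[1:]) else ""
        if key == "total":
            total = True
        elif key in SINGLE:
            out.append(SINGLE[key])
        elif key[-1:] in "+*?" and key[:-1] in SINGLE:   # e.g. <digit+>
            out.append(SINGLE[key[:-1]] + key[-1])
        elif key == "call":                              # back-reference
            out.append("(?:\\%d)" % int(arg))
        elif key == "hex":
            out.append("\\x" + arg)
        elif key == "unicode":
            out.append("\\u" + arg)
        elif key == "rep":                               # <rep=n> or <rep=m,n>
            out.append("(?:")
            stack.append(({"/rep"}, "){" + arg + "}" + lazy))
        elif key == "least":                             # <least=n>
            out.append("(?:")
            stack.append(({"/least"}, "){" + arg + ",}" + lazy))
        elif key in GROUPS:
            prefix, closes, suffix = GROUPS[key]
            out.append(prefix)
            stack.append((closes, suffix + (lazy if key in LAZY_OK else "")))
        else:
            raise ValueError(f"unsupported tag <{body}>")
    if stack:
        raise ValueError("unclosed tag(s)")
    pattern = "".join(out)
    return r"\A(?:" + pattern + r")\Z" if total else pattern
```

For example, hse_to_regex("<rep=2><digit></rep>-<rep=5><digit></rep>") produces (?:\d){2}\-(?:\d){5}, and hse_to_regex("<+ NG>o</+>") produces the lazy (?:o)+?. Wrapping every quantified body in a non-capturing group keeps the translation simple at the cost of slightly noisier output.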
