Summary of the differences between JavaScript regular expressions and those of other languages

Source: Internet
Author: User

Preface

I recently discovered that JavaScript regular expressions differ in a few places from those of other languages and tools. You will almost never write, or need, the constructs described below, but they are still good to know about.

The code examples in this article were run in an ES5-compatible JavaScript environment. Older environments, such as versions of IE before IE9 and versions of Firefox before Fx4, may behave differently from what I describe below.

1. Empty character class

A character class that contains no characters, [], is called an empty character class (empty char class). You have probably never heard the term, because in other languages this is illegal, and no documentation or tutorial bothers to describe illegal syntax. Here is how other languages and tools reject it:

$ echo | grep '[]'
grep: Unmatched [ or [^

$ echo | sed '/[]/'
sed: -e expression #1, char 4: unterminated address regex

$ echo | awk '/[]/'
awk: cmd. line:1: /[]/
awk: cmd. line:1: ^ unterminated regexp
awk: cmd. line:1: error: Unmatched [ or [^: /[]/

$ echo | perl -ne '/[]/'
Unmatched [ in regex; marked by <-- HERE in m/[ <-- HERE ]/ at -e line 1.

$ echo | ruby -ne '/[]/'
-e:1: empty char-class: /[]/

$ python -c 'import re;re.match("[]", "")'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "E:\Python\lib\re.py", line 137, in match
    return _compile(pattern, flags).match(string)
  File "E:\Python\lib\re.py", line 244, in _compile
    raise error, v # invalid expression
sre_constants.error: unexpected end of regular expression

In JavaScript, the empty character class is a legal regex component, but its effect is "never match": any match that has to go through it fails. It is equivalent to an empty negative lookahead (?!):

js> "whatever\n".match(/[]/g)    // empty character class, never matches
null
js> "whatever\n".match(/(?!)/g)  // empty negative lookahead, never matches
null

Obviously, this construct has no practical use in JavaScript.
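A minimal sketch of the two "never match" constructs above as a standalone script. It assumes a modern engine such as Node.js; any ES5-conformant engine should give the same results.

```javascript
// The empty character class [] and the empty negative lookahead (?!)
// are both legal in JavaScript, and neither can ever match.
const text = "whatever\n";

const viaEmptyClass = text.match(/[]/g);       // null: no character belongs to []
const viaEmptyLookahead = text.match(/(?!)/g); // null: (?!) fails at every position

console.log(viaEmptyClass, viaEmptyLookahead); // null null

// Not even the empty string matches.
console.log(/[]/.test(""), /(?!)/.test(""));   // false false
```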

2. Negated empty character class

A negated character class that contains no characters, [^], can be called a negated empty character class (negative empty char class) or an empty negated character class (empty negative char class); both names are my own coinage. Like the empty character class above, it is illegal in other languages:

$ echo | grep '[^]'
grep: Unmatched [ or [^

$ echo | sed '/[^]/'
sed: -e expression #1, char 5: unterminated address regex

$ echo | awk '/[^]/'
awk: cmd. line:1: /[^]/
awk: cmd. line:1: ^ unterminated regexp
awk: cmd. line:1: error: Unmatched [ or [^: /[^]/

$ echo | perl -ne '/[^]/'
Unmatched [ in regex; marked by <-- HERE in m/[ <-- HERE ^]/ at -e line 1.

$ echo | ruby -ne '/[^]/'
-e:1: empty char-class: /[^]/

$ python -c 'import re;re.match("[^]", "")'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "E:\Python\lib\re.py", line 137, in match
    return _compile(pattern, flags).match(string)
  File "E:\Python\lib\re.py", line 244, in _compile
    raise error, v # invalid expression
sre_constants.error: unexpected end of regular expression

In JavaScript, the negated empty character class is a legal regex component whose effect is the exact opposite of the empty character class: it matches any character, including the newline "\n". It is equivalent to the common idioms [\s\S] and [\w\W]:

js> "whatever\n".match(/[^]/g)     // negated empty character class, matches any character
["w", "h", "a", "t", "e", "v", "e", "r", "\n"]
js> "whatever\n".match(/[\s\S]/g)  // complementary character class, matches any character
["w", "h", "a", "t", "e", "v", "e", "r", "\n"]

Note that it cannot be called "always matches", because a character class must consume exactly one character. If the target string is empty, or all of its characters have already been consumed by the part of the pattern to the left, the match fails. For example:

js> /abc[^]/.test("abc")  // nothing follows the c, so the match fails
false
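The following sketch (again assuming a modern engine such as Node.js) compares [^] with [\s\S] and with the dot, and shows that [^] still needs a character to consume:

```javascript
const text = "whatever\n";

// [^] and [\s\S] both match every character, including "\n".
const negatedEmpty = text.match(/[^]/g);
const complement = text.match(/[\s\S]/g);
console.log(JSON.stringify(negatedEmpty)); // ["w","h","a","t","e","v","e","r","\n"]
console.log(negatedEmpty.join("") === complement.join("")); // true

// The dot, by contrast, skips the newline.
console.log(text.match(/./g).length); // 8

// [^] must consume exactly one character, so it fails at the end of the input.
console.log(/abc[^]/.test("abc"));  // false
console.log(/abc[^]/.test("abc!")); // true
```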

To see what a true "always matches" pattern looks like, see an article I translated earlier: "Empty" regular expressions.

3. []] and [^]]

Simply put, in Perl and in the regular expressions of some Linux commands, if a closing bracket immediately follows the opening bracket of a character class, as in []], that closing bracket is treated as an ordinary character, so []] can only match "]". In JavaScript, []] is parsed as an empty character class followed by a literal closing bracket, and because the empty class never matches, the whole pattern matches nothing. The same goes for [^]]: in JavaScript it matches any character (a negated empty character class) followed by a closing bracket, for example "a]" or "b]", while in other languages it matches any single character other than "]".

$ perl -e 'print "]" =~ /[]]/'
1

$ js -e 'print(/[]]/.test("]"))'
false

$ perl -e 'print "x" =~ /[^]]/'
1

$ js -e 'print(/[^]]/.test("x"))'
false
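A short sketch of how JavaScript parses these two patterns. The escaped forms [\]] and [^\]] at the end are my own suggestion for getting the Perl meaning in JavaScript, not part of the comparison above. Note that the unescaped lone "]" is only accepted without the u flag.

```javascript
// In JavaScript, [] closes immediately, so the following "]" is a literal.
// /[]]/  = empty class (never matches) + literal "]"  -> never matches
// /[^]]/ = any character + literal "]"
console.log(/[]]/.test("]"));                        // false (Perl: true)
console.log(/[^]]/.test("x"));                       // false (Perl: true)
console.log(JSON.stringify("a]b]".match(/[^]]/g))); // ["a]","b]"]

// Escaping the bracket gives the Perl meaning in JavaScript too.
console.log(/[\]]/.test("]"));  // true: a class containing only "]"
console.log(/[^\]]/.test("x")); // true: any character except "]"
console.log(/[^\]]/.test("]")); // false
```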

4. The $ anchor

Some beginners think that $ matches the newline character "\n". This is wrong: $ is a zero-width assertion, so it can never match an actual character, only a position. The difference I want to point out arises in non-multiline mode. You might think that in non-multiline mode $ simply matches the position after the last character, but it is not that simple. In most other languages, if the last character of the target string is a newline "\n", $ also matches the position just before that newline; that is, it matches the positions on both sides of a trailing newline. Many languages provide both \Z and \z, and if you know the difference between them, you will see that in other languages (Perl, Python, PHP, Java, C#, ...) a non-multiline $ is equivalent to Perl's \Z, whereas in JavaScript a non-multiline $ is equivalent to \z (it matches only the very end of the string, whether or not the last character is a newline). Ruby is a special case: its ^ and $ always behave as in multiline mode, so $ matches the position before every newline as well as the end of the string. This is also covered in Yu Sheng's book "Regular Expression Guide".

$ perl -E 'print "whatever\n" =~ s/$/REPLACED/rg'          // global replacement
whateverREPLACED   // the position before the newline is replaced
REPLACED           // the position after the newline is replaced

$ js -e 'print("whatever\n".replace(/$/g, "REPLACED"))'    // global replacement
whatever
REPLACED           // only the position after the newline is replaced
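If you need the Perl-style behavior in JavaScript, one option (my own assumption, not something prescribed above) is the lookahead (?=\n?$): an optional trailing newline followed by the real end of the input. A sketch:

```javascript
const text = "whatever\n";

// JavaScript: non-multiline $ only matches the very end (like Perl's \z).
const jsStyle = text.replace(/$/g, "X");
console.log(JSON.stringify(jsStyle));   // "whatever\nX"

// Emulating Perl's non-multiline $ (i.e. \Z): also match before a final "\n".
const perlStyle = text.replace(/(?=\n?$)/g, "X");
console.log(JSON.stringify(perlStyle)); // "whateverX\nX"

// Without a trailing newline, there is only one end position.
console.log("whatever".replace(/(?=\n?$)/g, "X")); // whateverX
```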

5. The dot "."

In JavaScript regular expressions, the dot metacharacter "." matches any character except the four line terminators (\r carriage return, \n line feed, \u2028 line separator, \u2029 paragraph separator), whereas in other common languages only the newline \n is excluded.
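A quick check of which characters the dot rejects. The s (dotAll) flag used on the last line was added in ES2018, after the ES5 environment this article assumes, so treat it as a note for newer engines:

```javascript
// The four ECMAScript line terminators that "." does not match.
const lineTerminators = ["\n", "\r", "\u2028", "\u2029"];
const singleDot = /^.$/;

for (const ch of lineTerminators) {
  const code = "U+" + ch.charCodeAt(0).toString(16).toUpperCase().padStart(4, "0");
  console.log(code, singleDot.test(ch)); // false for all four
}

// Other control characters are still matched, e.g. tab and NEL (U+0085).
console.log(singleDot.test("\t"), singleDot.test("\u0085")); // true true

// Matching any character: [^] or [\s\S] in any engine, or the s flag (ES2018+).
console.log(/^[^]$/.test("\r"), /^.$/s.test("\u2028")); // true true
```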

6. Forward references

We all know that regular expressions have back references: a backslash followed by a number refers to the string already matched by an earlier capturing group, so that it can be matched again or used in a replacement (where it is written with $ instead of \). But there is a special case: what happens if you use a back reference to a capturing group that has not started yet, because its opening parenthesis has not been reached? Take the regex /(\2(a)){2}/: (a) is the second capturing group, but to its left is a \2 that refers to its match. Since a regex is matched from left to right, this is where the title of this section, forward reference, comes from; it is not a rigorous term. Now think about what the following JavaScript code returns:

js> /(\2(a)){2}/.exec("aaa")
???

Before answering, look at how other languages behave. In other languages, this pattern is essentially useless:

$ echo aaa | grep '(\2(a)){2}'
grep: Invalid back reference

$ echo aaa | sed -r '/(\2(a)){2}/'
sed: -e expression #1, char 12: Invalid back reference

$ echo aaa | awk '/(\2(a)){2}/'

$ echo aaa | perl -ne 'print /(\2(a)){2}/'

$ echo aaa | ruby -ne 'print $_ =~ /(\2(a)){2}/'

$ python -c 'import re;print re.match("(\2(a)){2}", "aaa")'
None

awk reports no error because it does not support this kind of back reference: there, \2 is interpreted as the character with ASCII code 2. Perl, Ruby, and Python report no error either. I don't know why they were designed this way (presumably they all follow Perl), but the effect is the same: in this situation the match can never succeed.

JavaScript, however, not only reports no error but actually matches successfully. Compare the result with the answer you just came up with:

js> /(\2(a)){2}/.exec("aaa")
["aa", "a", "a"]

In case you have forgotten what exec returns: the first element is the complete matched string, i.e. RegExp["$&"], and the following elements are the contents matched by each capturing group, i.e. RegExp.$1 and RegExp.$2. Why does the match succeed, and what does the matching process look like? My understanding is as follows:

First the engine enters the first capturing group (the leftmost opening parenthesis). The first item to match is \2, but the second capturing group (a) has not had its turn yet, so the value of RegExp.$2 is still undefined. Therefore \2 matches an empty string to the left of the first a in the target string, that is, a "position", just like ^ and other zero-width assertions. The important point is that this match succeeds. Moving on, the second capturing group (a) matches the first a in the target string, so RegExp.$2 becomes "a"; then the first capturing group ends (at the rightmost closing parenthesis), and RegExp.$1 also becomes "a". Next comes the quantifier {2}, which means a new round of matching (\2(a)) starts after the first a in the target string. The key question is whether \2 now refers to "a", the value RegExp.$2 was given at the end of the first round. The answer is no: RegExp.$1 and RegExp.$2 are reset to undefined, so \2, just as in the first round, matches an empty string (as if it were not written at all). The second a in the target string is then matched, RegExp.$1 and RegExp.$2 become "a" again, and RegExp["$&"] becomes the complete matched string, the first two a's: "aa".
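The behavior described above can be observed directly. This sketch assumes an ES5-conformant engine such as current Node.js. The second pattern, /\1(a)/, is my own minimal example of the same rule: a reference to a group that has not captured anything yet matches the empty string.

```javascript
// Forward reference inside a quantified group.
const result = /(\2(a)){2}/.exec("aaa");
console.log(JSON.stringify(result)); // ["aa","a","a"]
console.log(result.index);           // 0

// The same rule in its smallest form: \1 comes before group 1 has captured,
// so it matches the empty string and (a) does all the work.
const minimal = /\1(a)/.exec("a");
console.log(JSON.stringify(minimal)); // ["a","a"]
```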

In earlier versions of Firefox (3.6), a new round of quantifier matching did not clear the values of the existing capturing groups, so in the second round \2 would match the second a, giving:

js> /(\2(a)){2}/.exec("aaa")
["aaa", "aa", "a"]
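One way to picture what the old Firefox 3.6 behavior amounts to is to unroll the quantifier by hand. In the unrolled pattern below (my own illustration, not from the original tests), the second copy's \2 refers to the first copy's group, which has already captured "a", so nothing is reset:

```javascript
// Quantified form: captures are cleared before the second iteration (ES5 behavior).
const quantified = /(\2(a)){2}/.exec("aaa");

// Hand-unrolled form: groups 1-2 are the first copy, groups 3-4 the second.
// The second \2 still sees "a" from the first copy, so it consumes a character.
const unrolled = /(\2(a))(\2(a))/.exec("aaa");

console.log(JSON.stringify(quantified)); // ["aa","a","a"]
console.log(JSON.stringify(unrolled));   // ["aaa","a","a","aa","a"]
```

The full match "aaa" and the second copy's outer group "aa" line up with the ["aaa", "aa", "a"] that Firefox 3.6 reported.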

Also, whether a capturing group has ended depends on whether its closing parenthesis has been reached. In /(a\1){3}/, the first capturing group has already begun matching when \1 is used, but it has not finished yet, so this is also a forward reference, and \1 still matches the empty string:

js> /(a\1){3}/.exec("aaa")
["aaa", "a"]
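To contrast an open group with a closed one, here is a small sketch (the second pattern is my own example): once the closing parenthesis has been passed, \1 refers to real text again.

```javascript
// \1 inside its own, still-open group: matches the empty string each time.
const openGroup = /(a\1){3}/.exec("aaa");
console.log(JSON.stringify(openGroup));   // ["aaa","a"]

// \1 after the group has closed: it must match the captured "a" again.
const closedGroup = /(a)\1/.exec("aaa");
console.log(JSON.stringify(closedGroup)); // ["aa","a"]
```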

One more example:

js> /(?:(f)(o)(o)|(b)(a)(r))*/.exec("foobar")
["foobar", undefined, undefined, undefined, "b", "a", "r"]

* is a quantifier. After the first round of matching: $1 is "f", $2 is "o", $3 is "o", and $4, $5, and $6 are undefined.

When the second round begins, all captured values are reset to undefined.

After the second round: $1, $2, and $3 are undefined, $4 is "b", $5 is "a", and $6 is "r".

$& is assigned "foobar" and the match ends.
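The per-round reset can be checked by printing each group. This sketch assumes a current engine such as Node.js; note that JSON.stringify writes undefined array elements as null.

```javascript
const m = /(?:(f)(o)(o)|(b)(a)(r))*/.exec("foobar");

// JSON.stringify shows undefined array slots as null.
console.log(JSON.stringify(m)); // ["foobar",null,null,null,"b","a","r"]

// Groups 1-3 were set by the "foo" round, then cleared when the "bar" round began.
for (let i = 1; i <= 6; i++) {
  console.log("$" + i + ":", m[i]);
}
```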

Summary

That covers the ways JavaScript regular expressions differ from those of other languages. I hope this article is helpful for your study and work.

