When writing regular expressions, don't escape blindly

Source: Internet
Author: User

In JavaScript, backslash escape sequences are used in two places: string literals and regular expression literals. In a string literal, backslash escape sequences take the following forms:

1. \ followed by one of: single quotation mark ('), double quotation mark ("), backslash (\), b, f, n, r, t, v

2. \ followed by a line terminator sequence; the three common line terminator sequences are carriage return, line feed, and carriage return + line feed

3. \ followed by 0

4. \ followed by 1 to 3 octal digits

5. \ followed by x and two hexadecimal digits

6. \ followed by u and four hexadecimal digits

7. \ followed by u and {one or more hexadecimal digits}

8. \ followed by a single character that matches none of the forms above

This article does not discuss the first seven forms. The eighth form is actually an invalid escape; \o is one example. Some compiled languages reject such an escape outright with a compilation error, for example Java.

JSON also reports an error:

JSON.parse(String.raw`"\o"`) // SyntaxError: Unexpected token o in JSON at position 2

Scripting languages usually do not report an error. They have two choices. One is not to treat \ as an escape character at all, but as an ordinary literal backslash, as Python does:

>>> "\o"
'\\o'  # len("\o") is 2

The other is to discard the \ and keep only the character that follows it, as JavaScript does:

js> "\o"
"o" // "\o".length is 1
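The two behaviours above can be checked with a small sketch. The names dropped, raw, and parsesAsJson are illustrative and not from the original article; String.raw is used only to get a backslash into the text without doubling it.

```javascript
// An invalid escape in an ordinary string literal: the backslash is discarded.
const dropped = "\o";
// String.raw keeps the backslash, so this string has two characters.
const raw = String.raw`\o`;

// Hypothetical helper: reports whether a piece of text is accepted by JSON.parse.
function parsesAsJson(text) {
  try {
    JSON.parse(text);
    return true;
  } catch (e) {
    return false;
  }
}

console.log(dropped === "o");                    // true
console.log(raw.length);                         // 2
console.log(parsesAsJson(String.raw`"\o"`));     // false: JSON rejects the invalid escape
console.log(parsesAsJson(String.raw`"\\o"`));    // true: a doubled backslash is valid JSON
```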

Which approach is better? I know of one benefit of keeping the backslash, and it relates to regular expressions. In languages without regex literals, or when people choose not to use regex literals, or when a regex has to be built dynamically from a string variable, people sometimes forget to double the backslashes, for example:

"www\.taobao\.com" // in a language that does not keep the backslash, such as JavaScript, the regex built from this string can wrongly match another domain name, such as "wwwataobao.com"

Some languages warn you about this pitfall:

$ awk -f 'www\.taobao\.com'
awk: warning: escape sequence '\.' treated as plain '.'
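The forgotten double backslash described above can be sketched in JavaScript as follows. This is a minimal illustration using the standard RegExp constructor; the variable names and the added ^ and $ anchors are assumptions made for the example.

```javascript
// The string literal drops both backslashes, so the pattern is ^www.taobao.com$
// and each dot matches any character.
const careless = new RegExp("^www\.taobao\.com$");
// Doubled backslashes survive as single backslashes in the pattern.
const doubled = new RegExp("^www\\.taobao\\.com$");
// A regex literal needs no doubling.
const literal = /^www\.taobao\.com$/;

console.log(careless.source);                   // ^www.taobao.com$
console.log(careless.test("wwwataobao.com"));   // true: the wrong domain matches
console.log(doubled.test("wwwataobao.com"));    // false
console.log(doubled.test("www.taobao.com"));    // true
console.log(doubled.source === literal.source); // true
```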

I am not sure what the benefit of dropping the backslash is in other languages, but in JavaScript I do know of one: writing a string that contains </script> inside an inline <script> tag:

<script>document.write("<script src=foo.js><\/script>") // \/ is actually an invalid escape, but without the \, this </script> would be mistaken by the HTML parser for the end tag</script>

The main part of this article starts here. Backslash escape sequences in regex literals are more complex than in string literals, and take the following forms:

1. \ followed by /

2. \ followed by one of ^, $, \, ., *, +, ?, (, ), [, ], {, }, |

3. \ followed by c and any one letter

4. \ outside [], followed by b or B

5. \ inside [], followed by - or b

6. \ followed by one of d, D, s, S, w, W

7. \ followed by one of f, n, r, t, v

8. \ followed by 0

9. \ outside [], followed by 1 to 3 decimal digits

10. \ followed by x and two hexadecimal digits

11. \ followed by u and four hexadecimal digits

12. \ followed by u and {one or more hexadecimal digits}

13. \ followed by a single character that matches none of the forms above

Some of these forms are the same as in string literals, some are different, and some look the same but behave differently. Our focus is again the last case, the invalid escape. Many people cannot remember which symbols need escaping in a regex and which do not. The double quote ", for example, does not need escaping in a regex. If you write /\"/ anyway, the JavaScript regex engine normally just drops the redundant backslash for you:

/^\"$/.test('"') // true

However, if the regex has Unicode mode turned on, the same escape is a syntax error:

/\"/u // SyntaxError: Invalid escape

You might ask why anyone would turn on Unicode mode. The reason is that Unicode mode handles characters outside the BMP properly, for example:

"𠮷野家".match(/./g) // ["\uD842", "\uDFB7", "野", "家"]
"𠮷野家".match(/./ug) // ["𠮷", "野", "家"]

The specific advantages are summarized elsewhere. In short, if compatibility is not a concern, adding /u by default is always best practice.
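The difference in how . treats a character outside the BMP can be sketched as follows. The example writes the character as the code point escape \u{20BB7} so that it does not depend on file encoding; the variable names are illustrative.

```javascript
// U+20BB7 lies outside the BMP, so it occupies two UTF-16 code units.
const text = "\u{20BB7}ab";

// Without the u flag, "." matches one code unit at a time.
const withoutU = text.match(/./g);
// With the u flag, "." matches one whole code point at a time.
const withU = text.match(/./gu);

console.log(text.length);               // 4
console.log(withoutU.length);           // 4: the surrogate pair is split in two
console.log(withU.length);              // 3
console.log(withU[0] === "\u{20BB7}");  // true
console.log(/^.$/.test("\u{20BB7}"));   // false
console.log(/^.$/u.test("\u{20BB7}"));  // true
```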

Another character that is often escaped incorrectly is the hyphen. The hyphen is a metacharacter only inside square brackets, and that is the only place where it may need escaping. If you escape it outside square brackets, you also get an error in Unicode mode:

/\-/u // SyntaxError: Invalid escape

In addition, starting with Firefox 46 and Chrome 53, the regex in the pattern attribute of an HTML form control is always compiled in Unicode mode, so the pattern attribute of the input below is invalid. Open the demo below, open the developer tools, and move the mouse pointer over the input; an error message appears in the developer tools console:

<input pattern="\-" value="foo">
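What the browser does with such a pattern can be approximated in plain JavaScript. This is only a sketch under stated assumptions: it assumes the attribute value is compiled with the u flag, as the text above describes, and anchored as ^(?:...)$ so that it must match the whole value. The function name checkPattern is hypothetical, and real browsers may differ in detail.

```javascript
// Hypothetical helper that mimics pattern-attribute validation.
// Returns true or false for a usable pattern, and null when the pattern
// does not compile in Unicode mode (the case where the browser logs an error).
function checkPattern(pattern, value) {
  let regex;
  try {
    regex = new RegExp("^(?:" + pattern + ")$", "u");
  } catch (e) {
    return null;
  }
  return regex.test(value);
}

console.log(checkPattern("\\-", "foo"));      // null: \- is an invalid escape with /u
console.log(checkPattern("-", "-"));          // true
console.log(checkPattern("[a-z]+", "foo"));   // true
console.log(checkPattern("[a-z]+", "foo1"));  // false
```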

Because this change is not backwards compatible, some developers found that code which used to work suddenly started reporting errors:

Escaped @ and %: http://stackoverflow.com/questions/36953775/firefox-error-unable-to-check-input-because-the-pattern-is-not-a-valid-regexp

Escaped !: https://input.mozilla.org/en-US/dashboard/response/5898357

Escaped -: http://stackoverflow.com/questions/39895209/html-input-pattern-not-working

Escaped ': https://bugs.chromium.org/p/chromium/issues/detail?id=667713

As you can see, some people want to escape anything that looks like punctuation, because they are not familiar with regexes and do not know which symbols are metacharacters. That used to be harmless, but from now on it is not.
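One way to see which punctuation escapes survive in Unicode mode is to try them all. The sketch below tests every ASCII punctuation character after a backslash, outside square brackets, with and without the u flag. The helper name compiles and the punctuation list are illustrative, and the result reflects engines that implement the legacy (Annex B) regex syntax, as browsers and Node.js do.

```javascript
// All 32 ASCII punctuation characters.
const punctuation = "!\"#$%&'()*+,-./:;<=>?@[\\]^_`{|}~";

// Hypothetical helper: does this pattern source compile with these flags?
function compiles(source, flags) {
  try {
    new RegExp(source, flags);
    return true;
  } catch (e) {
    return false;
  }
}

// Keep the characters that may follow a backslash in each mode.
const allowedWithU = [...punctuation].filter((ch) => compiles("\\" + ch, "u"));
const allowedWithoutU = [...punctuation].filter((ch) => compiles("\\" + ch, ""));

console.log(allowedWithU.join(" "));     // $ ( ) * + . / ? [ \ ] ^ { | }
console.log(allowedWithoutU.length);     // 32: every escape is tolerated without /u
```

With /u, only the regex syntax characters and / may be escaped outside square brackets; everything else, including ", ', @, %, ! and -, is rejected.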

Unicode mode is like a strict mode for regexes: it forbids many poor, bug-prone ways of writing a pattern. The /u flag forbids one more form that involves \ escapes, namely \ followed by a non-zero decimal digit.

In non-Unicode mode, when \ is followed by a decimal number other than 0 and a capturing group with that number exists, the escape sequence is a backreference to that group:

/(f)(.)\2/.test("foo") // true

If no such capturing group exists and the number is less than 8, the sequence is treated as an octal escape sequence:

/^\2$/.test("\2") // true

If no such capturing group exists and the number is 8 or greater, the backslash is discarded and only the digit remains:

/^\8$/.test("8") // true

In other words, the same escape notation can be interpreted in three different ways. A moment of inattention causes a bug, and the code is hard to read. Unicode mode therefore forbids the latter two cases: \ followed by a non-zero decimal digit can only be a backreference to a capturing group, and if the corresponding group does not exist, a syntax error is reported:

/\2/u // SyntaxError: Invalid escape
/\8/u // SyntaxError: Invalid escape
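The three readings and the stricter behaviour under /u can be put side by side in a short sketch. The variable and function names are illustrative; the octal case is tested against "\u0002" rather than "\2" so that the snippet also runs in strict mode, where octal escapes in string literals are not allowed.

```javascript
// 1. Group 2 exists: \2 is a backreference to whatever (.) captured.
const backreference = /(f)(.)\2/.test("foo");
// 2. No group 2 and the digit is below 8: \2 is the octal escape for U+0002.
const octal = /^\2$/.test("\u0002");
// 3. No group 8 and the digit is 8 or 9: the backslash is dropped.
const identity = /^\8$/.test("8");

console.log(backreference, octal, identity); // true true true

// Hypothetical helper: does this pattern source compile in Unicode mode?
function isValidWithU(source) {
  try {
    new RegExp(source, "u");
    return true;
  } catch (e) {
    return false;
  }
}

console.log(isValidWithU("(f)(.)\\2")); // true: the group exists
console.log(isValidWithU("\\2"));       // false: no group 2
console.log(isValidWithU("\\8"));       // false: no group 8
```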

Summary: this article lists several escape forms that are invalid in regex Unicode mode. Take it as a warning not to escape every punctuation mark you see when writing a regex, and to learn the rules precisely.

A higher bar: in fact, regex Unicode mode is not as strict as I had hoped. For example, most regex metacharacters are not metacharacters inside square brackets and do not need escaping there, yet Unicode mode does not forbid escaping them:

/[\?\+\*]/u // no error
/[?+*]/u // this is how it should be written; it is more readable than the line above
