Matching Special Characters in Regular Expressions with Unicode Escapes

Source: Internet
Author: User

First, note that the code in this article is written for ES6 and must be modified before it can run under ES5. That said, the article uses few ES6 features, and because V8 did not support the u flag at the time of writing, the final implementation relies mostly on ES5 knowledge.

Originally I only wanted to record how to use Unicode escapes in a regular expression to match special characters. While writing, I received a message pointing out that V8 did not support the u flag, so I turned to studying how to convert a string into UTF-16 escape sequences. In the process I found that ES5 regular expressions do not directly support code points above 0xFFFF, so the conversion below also handles such characters.

I ran into a practical need to match special characters with a regular expression. For example, given a piece of text such as 'AB*cd$hi me]\nseg$me*ntfault\nhello, world', the user can choose * or $ as the separator for splitting the string.

In JavaScript, $ and * are special characters in regular expressions and cannot be written directly when they are meant literally. They must be escaped and written as /\$/ or /\*/.
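As a small illustration of the escaping described above, the following sketch splits a sample string on a literal `$` and on a literal `*`. The sample text and variable names are made up for this example.

```javascript
// Sample text using '$' and '*' as separators (illustrative only).
const text = 'ab*cd$hi';

// '$' and '*' are regex metacharacters, so a backslash makes them literal.
const byDollar = text.split(/\$/); // split on a literal '$'
const byStar = text.split(/\*/);   // split on a literal '*'

// Inside a character class, neither character needs escaping.
const byEither = text.split(/[$*]/);

console.log(byDollar); // [ 'ab*cd', 'hi' ]
console.log(byStar);   // [ 'ab', 'cd$hi' ]
console.log(byEither); // [ 'ab', 'cd', 'hi' ]
```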

We need to write a regular expression based on the user's choice and encapsulate it into a function:

// Naive approach: put a backslash in front of whatever the user typed.
function reg(input) {
  return new RegExp(`\\${input}`)
}

This looks fine at first: with every character escaped, the special characters can be matched. Reality is less kind, however. When the user enters a character such as n or t, the resulting regular expression is /\n/ or /\t/, which matches a newline or a tab instead of the letter and so defeats the user's intention.
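The pitfall can be reproduced in a few lines. This is a minimal sketch; `naiveReg` is a name chosen for this example and uses string concatenation so that it also runs in ES5 engines.

```javascript
// Prefix the user's input with a backslash, as in the naive approach above.
function naiveReg(input) {
  return new RegExp('\\' + input);
}

// Works for a metacharacter: the pattern is \$ and matches a literal '$'.
console.log(naiveReg('$').test('a$b')); // true

// Breaks for an ordinary letter: the pattern is \n, which means "newline".
console.log(naiveReg('n').test('n'));   // false
console.log(naiveReg('n').test('\n'));  // true

// Same problem with 't': the pattern \t matches a tab.
console.log(naiveReg('t').test('\t'));  // true
```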

A common alternative is to list every special character that needs escaping and handle each one explicitly. This takes more effort, and a match can silently fail if a special character is missing from the list.
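For comparison, the list-based approach is often written roughly as follows. This is a widely used pattern rather than the method this article settles on, and `escapeRegExp` is a hypothetical helper name; the character list is an assumption and is exactly the part that can be incomplete.

```javascript
// Escape only the characters that are known regex metacharacters.
// '$&' in the replacement string stands for the matched character.
function escapeRegExp(input) {
  return input.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

console.log(escapeRegExp('a.b*c')); // a\.b\*c
console.log(escapeRegExp('n'));     // n (ordinary letters are left alone)

// The escaped string can be passed straight to the RegExp constructor.
console.log(new RegExp(escapeRegExp('$*')).test('x$*y')); // true
```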

This is where Unicode comes in. In JavaScript, a character can also be written as a Unicode escape. For example, 'a' can be written as '\u{61}', and '你' can be written as '\u{4f60}'.

For more information about Unicode, see "Unicode and JavaScript".

ES5 provides the charCodeAt() method, which returns the UTF-16 code unit at the specified index. For a code point above 0xFFFF it returns only one half of the surrogate pair, so ES2015 added a new method, codePointAt(), which returns the full code point. Both methods return a number, which toString(16) converts to hexadecimal notation.
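The difference between the two methods shows up only for characters outside the Basic Multilingual Plane. The sketch below uses U+20BB7 as the sample character, built with `String.fromCodePoint` to avoid any encoding issues in the source text.

```javascript
var bmp = 'a';                              // U+0061, a single code unit
var astral = String.fromCodePoint(0x20bb7); // U+20BB7, needs a surrogate pair

// For a BMP character both methods agree.
console.log(bmp.charCodeAt(0).toString(16));  // "61"
console.log(bmp.codePointAt(0).toString(16)); // "61"

// An astral character occupies two UTF-16 code units.
console.log(astral.length);                      // 2
console.log(astral.charCodeAt(0).toString(16));  // "d842" (high surrogate)
console.log(astral.charCodeAt(1).toString(16));  // "dfb7" (low surrogate)
console.log(astral.codePointAt(0).toString(16)); // "20bb7" (full code point)
```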

The encapsulated functions are as follows:

// Build a \u{...} escape for the first code point of s.
function toUnicode(s) {
  return `\\u{${s.codePointAt(0).toString(16)}}`
}
// toUnicode('$') -> '\u{24}'

Now rewrite the reg function:

// The 'u' flag is required for the \u{...} escape syntax.
function reg(input) {
  return new RegExp(`${toUnicode(input)}`, 'u')
}

Unfortunately, V8 did not support the RegExp u flag when this was written (support has since been added). Had it been supported, the article could end here. Either way, the point of this section is the idea of escaping special characters through Unicode escapes.
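Whether a given engine accepts the u flag can be checked at run time, because constructing a RegExp with an unsupported flag throws. The sketch below pairs such a check with the code point approach; `supportsUnicodeFlag`, `toCodePointEscape`, and `regU` are hypothetical names used only for this example.

```javascript
// Feature check: an unknown flag makes the RegExp constructor throw.
function supportsUnicodeFlag() {
  try {
    new RegExp('.', 'u');
    return true;
  } catch (e) {
    return false;
  }
}

// Escape the first code point of s as \u{...}.
function toCodePointEscape(s) {
  return '\\u{' + s.codePointAt(0).toString(16) + '}';
}

// Build a regex that matches the input character literally.
function regU(input) {
  return new RegExp(toCodePointEscape(input), 'u');
}

if (supportsUnicodeFlag()) {
  console.log(regU('$').test('a$b')); // true: \u{24} is a literal '$'
  console.log(regU('n').test('n'));   // true: \u{6e} is the letter 'n'
  console.log(regU('n').test('\n'));  // false: no longer read as a newline
}
```

When the check returns false, the code-unit escape form without the u flag remains the fallback.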

Even without the u flag, a programmer with some ambition need not stop here. The approach can be completed by other means.

// Escape the first one or two UTF-16 code units of s as \uXXXX sequences.
function toUnicode(s) {
  var a = `\\u${utf(s.charCodeAt(0).toString(16))}`
  // A second code unit exists when s is a surrogate pair.
  if (s.charCodeAt(1))
    a = `${a}\\u${utf(s.charCodeAt(1).toString(16))}`
  return a
}

// Left-pad a hex string with zeros to exactly four digits.
function utf(s) {
  return Array.from('0000').concat(Array.from(s)).slice(-4).join('')
}
// var is used here instead of let because the code is pasted directly
// into the Chrome console to view the result.
// Test it:
// toUnicode('a') --> "\u0061"
// toUnicode('𠮷') --> "\ud842\udfb7"

function reg(input) {
  return new RegExp(`${toUnicode(input)}`)
}
// Test:
// reg('$').test('$') --> true
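The same idea extends to inputs longer than one character by escaping every UTF-16 code unit in turn. The following is a sketch written in ES5 style; `toUnicodeAll` and `regAll` are names invented for this example, not part of the code above.

```javascript
// Escape every UTF-16 code unit of s as a four-digit \uXXXX sequence.
function toUnicodeAll(s) {
  var out = '';
  for (var i = 0; i < s.length; i++) {
    var hex = s.charCodeAt(i).toString(16);
    out += '\\u' + ('0000' + hex).slice(-4); // zero-pad to four hex digits
  }
  return out;
}

// Build a regex (no 'u' flag needed) that matches the input literally.
function regAll(input) {
  return new RegExp(toUnicodeAll(input));
}

console.log(toUnicodeAll('a'));           // \u0061
console.log(regAll('$*').test('a$*b'));   // true
console.log(regAll('n').test('\n'));      // false
console.log(regAll('\t').test('a\tb'));   // true
```

Because a surrogate pair is simply two consecutive code units, characters above 0xFFFF are covered by the same loop without special handling.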

