Examining JavaScript regular expression performance through a trim prototype function

Last Update:2013-10-16 Source: Internet

Author: User

Tags net regex expression engine


Generally, trim is implemented with a regular expression such as the following:
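The snippet that belonged here did not survive. Judging from variant C in the cause analysis further down, which is described as only adding multiline matching, the starting point was presumably the following. Treat it as a reconstruction, not a quotation from the original article.

```javascript
// Presumed original trim (reconstructed from variant C minus the m flag;
// this is an assumption, the source snippet is missing).
String.prototype.trim = function () {
    // Strip whitespace at the start of the string, or at the end of the string.
    return this.replace(/^[\s\t]+|[\s\t]+$/g, '');
};

console.log('[' + '  hello world \t'.trim() + ']');
```

Note that \t is already covered by \s, so [\s\t] behaves the same as \s; the character class is kept here only because the article's variants use it.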

On a large string of arbitrary length, this approach turns out to be resource-intensive: it is inefficient, sometimes intolerably so.
[Demo page lost in extraction: a text area to be filled with a long run of space or tab characters.]
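A rough way to observe the slowdown is to time a regex-based trim on inputs with a growing run of whitespace in the middle of the string. The sketch below assumes the pattern /^[\s\t]+|[\s\t]+$/g (taken from variant C later in the article, without the m flag); `regexTrim` and `timeTrim` are hypothetical helper names, and the timings will differ between engines and machines.

```javascript
// Assumed pattern; a standalone function is used instead of patching String.prototype.
function regexTrim(str) {
    return str.replace(/^[\s\t]+|[\s\t]+$/g, '');
}

// Time one trim of "a", then runLength spaces, then "b".
// Nothing should be removed, because the whitespace is neither leading nor trailing.
function timeTrim(runLength) {
    const input = 'a' + ' '.repeat(runLength) + 'b';
    const t0 = Date.now();
    const out = regexTrim(input);
    return { ms: Date.now() - t0, unchanged: out === input };
}

for (const n of [1000, 2000, 4000, 8000]) {
    console.log(n + ' spaces: ' + timeTrim(n).ms + ' ms');
}
```

If the engine backtracks as described in this article, doubling the run length should increase the time by noticeably more than a factor of two.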
The explanation brings to mind the discussion in Mastering Regular Expressions of how NFA and DFA engines differ. JavaScript, Perl, PHP, Java, and .NET all use NFA engines.
The difference between the DFA and NFA mechanisms has five consequences:
1. A DFA scans each character of the text only once, so it is fast but offers fewer features. An NFA may consume characters and then give them back, so it is slower but feature-rich, which is why it is so widely used. Today's major regular expression engines, such as those in Perl, Ruby, Python's re module, Java, and the .NET regex library, are all NFA engines.
2. Only NFA engines support features such as lazy quantifiers and backreferences.
3. An NFA is eager to report a match: the leftmost alternative that succeeds wins, so the best (longest) match is occasionally missed. A DFA follows the rule "the longest leftmost match wins".
4. An NFA uses greedy quantifiers by default (for example /.*/ and /\w+/): the pattern is repeated as many times as possible, consuming characters until no more can match. An NFA gives quantifiers priority.
5. An NFA can fall into deep recursion through excessive backtracking, and then performs poorly.
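Items 2 to 4 can be observed directly in any JavaScript engine. The inputs below are made up for illustration only.

```javascript
// Item 3: the first alternative that succeeds wins, even though a longer one exists.
const eager = 'tonight'.match(/to|tonight/)[0];   // "to"

// Item 4: quantifiers are greedy by default and take as much as they can.
const greedy = '<a><b>'.match(/<.*>/)[0];         // "<a><b>"

// Item 2: a lazy quantifier stops at the first possible match.
const lazy = '<a><b>'.match(/<.*?>/)[0];          // "<a>"

// Item 2: a backreference (\1) must repeat what group 1 captured.
const hasDoubledLetter = /(\w)\1/.test('hello');  // true, because of "ll"

console.log(eager, greedy, lazy, hasDoubledLetter);
```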

Backtracking
When an NFA finds that it has consumed too much, it gives characters back one at a time and retries the match. This process is called backtracking. Because of it, NFA matching scans the text repeatedly, especially when the regular expression is poorly written, and the efficiency loss is considerable. Understanding this helps in writing efficient regular expressions.
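The cost of backtracking can be made visible with a toy model. The sketch below is not how any real engine is implemented; `countSteps` is a hypothetical helper that imitates how a backtracking matcher would evaluate /\s+$/, counting one step per character consumed and one step per attempt to match the end anchor.

```javascript
// Toy model of a backtracking evaluation of /\s+$/ (an illustration, not a real engine).
function countSteps(text) {
    const isSpace = (ch) => /\s/.test(ch);
    let steps = 0;
    // The engine tries every start position from left to right.
    for (let start = 0; start < text.length; start++) {
        // Greedy phase: \s+ consumes as many whitespace characters as it can.
        let end = start;
        while (end < text.length && isSpace(text[end])) {
            end++;
            steps++;
        }
        // Backtracking phase: test $ and, on failure, give one character back.
        while (end > start) {
            steps++;
            if (end === text.length) {
                return { matched: true, steps: steps };
            }
            end--;
        }
    }
    return { matched: false, steps: steps };
}

console.log(countSteps('x   '));   // whitespace really is at the end
console.log(countSteps('   x'));   // whitespace run followed by a non-space
```

In this model a run of n whitespace characters at the very end costs n + 1 steps, while the same run followed by a single non-space character costs n(n + 1) steps, because every start position inside the run consumes the rest of the run and then gives it all back.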

Locate/analyze the cause
Take the trim prototype method above. Setting aside for the moment whether the results are still correct, testing shows several changes that reduce the amount of backtracking in the JavaScript NFA engine.
A. Remove the quantifier from the trailing alternative. Change it to:
String.prototype.trim = function () {
    return this.replace(/^[\s\t]+|[\s\t]$/g, '');
};

B. Remove the match at the end of the string. Change it to:
String.prototype.trim = function () {
    return this.replace(/^[\s\t]+/g, '');
};

C. Add multiline matching. Change it to:
String.prototype.trim = function () {
    return this.replace(/^[\s\t]+|[\s\t]+$/mg, '');
};
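The three variants are performance experiments, and they do not all return the same result. A small comparison on a made-up two-line input shows the differences; the function names `variantA`, `variantB`, and `variantC` are hypothetical, and standalone functions are used here so that the built-in trim is not overwritten.

```javascript
const variantA = (s) => s.replace(/^[\s\t]+|[\s\t]$/g, '');    // no quantifier at the tail
const variantB = (s) => s.replace(/^[\s\t]+/g, '');            // no tail match at all
const variantC = (s) => s.replace(/^[\s\t]+|[\s\t]+$/mg, '');  // multiline flag added

const sample = '  hello  \n  world  ';

// A removes only one trailing whitespace character.
console.log(JSON.stringify(variantA(sample)));  // "hello  \n  world "
// B leaves the trailing whitespace untouched.
console.log(JSON.stringify(variantB(sample)));  // "hello  \n  world  "
// C trims the start and end of every line, not just of the whole string.
console.log(JSON.stringify(variantC(sample)));  // "hello\nworld"
```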

Combining these three changes with the NFA notes at the start of the article, we can roughly identify the cause of the trim performance problem:
1. Quantifiers are matched first.
2. With the end-of-string anchor, the JavaScript regular expression engine may keep backtracking and fall into deep recursion. If the string is large enough, a stack overflow may occur.
3. Multiline matching works without a significant performance cost, even though, from the point of view of whoever writes the expression, multiline mode has many more whitespace runs to replace than single-line mode. So the second conclusion should be correct.
Improvement
First, the tests establish that the regular expression matching the start of the string is not the bottleneck; the performance problem appears when matching the end. Therefore, you can combine a regular expression with a conventional loop to improve trim performance.
For example:
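The example that followed here in the original was lost. A minimal sketch of the idea, assuming the leading whitespace is still removed with a regular expression and the trailing whitespace with a plain backward loop, might look like this; `fastTrim` is a hypothetical name and this is not the author's code.

```javascript
// Hybrid trim: regex for the head, plain loop for the tail (illustrative sketch).
function fastTrim(str) {
    // Leading whitespace: the pattern is anchored at ^, so it is tried only once.
    const s = String(str).replace(/^\s+/, '');
    // Trailing whitespace: walk backward from the end instead of using /\s+$/.
    const whitespace = /\s/;
    let end = s.length;
    while (end > 0 && whitespace.test(s.charAt(end - 1))) {
        end--;
    }
    return s.slice(0, end);
}

console.log('[' + fastTrim(' \t hello   world \n') + ']');
```

The loop inspects each trailing character exactly once and never revisits whitespace in the middle of the string, so there is nothing for the engine to backtrack over.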
