The dot (decimal point) in regular expressions

Last Update: 2013-10-16 Source: Internet

Author: User


Some details
For most languages and tools that use a traditional NFA engine, such as .NET, "." matches any character except the line feed "\n". Java is slightly stricter by default: its "." also excludes other line terminators, such as the carriage return "\r".
JavaScript, however, is somewhat special: because different browsers use different parsing engines, the matching range of "." also differs. In browsers built on the Trident kernel, such as IE, "." likewise matches any character except the line feed "\n", but in browsers with other kernels, such as Firefox, Opera, and Chrome, "." matches any character except the carriage return "\r" and the line feed "\n".

Some guesses about this detail
The code is as follows:
<script type="text/javascript">
document.write(/./.test("\r") + "<br/>");
document.write(/./.test("\n") + "<br/>");
</script>
// Output in IE
true
false
// Output in Firefox, Opera, and Chrome
false
false

After a rough test, Trident, Presto, and Gecko all appear to use a traditional NFA engine. WebKit at least supports the behavior of a traditional NFA engine but does not act exactly like one, so it is probably either a traditional NFA engine with advanced optimizations or a DFA/NFA hybrid engine.
Windows uses both "\r" and "\n" in line breaks, while UNIX uses only "\n". My guess is therefore that the other browser engines did not originate on Windows and so treat "\r" as a line break as well, which is why "." does not match "\r" in them. I have not researched this in depth; these are only guesses.

Common application mistakes
Note:
Do not try to use "[.\n]" to match any character. Inside a character class the dot is a literal, so this pattern matches only a decimal point or a line feed. "(.|\n)" does work, but it is generally avoided because it is less readable and less efficient. The usual choices are "[\s\S]", or "." combined with the (?s) matching mode (Singleline).
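The difference between these forms is easy to check. The following is a minimal JavaScript sketch, in the same language as the browser test above. The sample string `text` and the helper `countMatches` are made up for illustration, and the `s` (dotAll) flag, which JavaScript uses in place of an inline "(?s)", assumes an ES2018-capable engine.

```javascript
// Hypothetical sample: two lines separated by a line feed, with one literal dot.
const text = "ab.\ncd";

// Hypothetical helper: counts how many matches a global pattern finds.
function countMatches(pattern, input) {
  const matches = input.match(pattern);
  return matches === null ? 0 : matches.length;
}

// Inside a character class the dot is literal, so only "." and "\n" match.
const dotOrNewline = countMatches(/[.\n]/g, text);

// These three forms match every character here, including the line feed.
const alternation = countMatches(/(.|\n)/g, text);
const complementPair = countMatches(/[\s\S]/g, text);
const dotAllFlag = countMatches(/./gs, text);

console.log(dotOrNewline, alternation, complementPair, dotAllFlag); // 2 6 6 6
```

The string has six characters, so the three "any character" forms each find six matches, while "[.\n]" finds only the dot and the line feed.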

Example
Requirement: match the content in the <td> tag
Source string: <td>This is a test line.
Another line.</td>
Matching result: <td>This is a test line.
Another line.</td>
Regular Expression 1: <td>[\s\S]*</td>
Regular Expression 2: (?s)<td>.*</td>
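A minimal JavaScript sketch of this example follows. The variable names are illustrative only. Regular Expression 2 is written with the `s` (dotAll) flag instead of an inline "(?s)", which assumes an ES2018-capable engine.

```javascript
// Source string from the example: the <td> content spans two lines.
const source = "<td>This is a test line.\nAnother line.</td>";

// Regular Expression 1: [\s\S] matches any character, line breaks included.
const withClassPair = source.match(/<td>[\s\S]*<\/td>/);

// Regular Expression 2: "." with the s (dotAll) flag in place of (?s).
const withDotAll = source.match(/<td>.*<\/td>/s);

// Without the flag, "." cannot cross the line feed, so the match fails.
const withPlainDot = source.match(/<td>.*<\/td>/);

console.log(withClassPair[0] === source); // true
console.log(withDotAll[0] === source);    // true
console.log(withPlainDot);                // null
```

Both expressions return the whole source string, while the plain "." version finds nothing because the closing tag is on the second line.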
Matching Efficiency Test
The following is the test string, that is, the content entered in richTextBox1.Text (taken from the CSDN homepage):
<link href="images/favicon.ico" rel="shortcut icon" />
<title>CSDN.NET - the leading IT technology community in China, providing the most comprehensive information dissemination and service platform for IT professionals</title>
<script language='javascript' type='text/javascript' src='http://www.csdn.net/ggmm/csdn_ggmm.js'></script><script type="text/javascript" src="http://counter.csdn.net/a/js/AreaCounter.js"></script>
<script type="text/javascript">

Test code:
The code is as follows:
// Requires: using System.Collections.Generic; using System.Diagnostics;
//           using System.Text; using System.Text.RegularExpressions;
string yourStr = richTextBox1.Text;
StringBuilder src = new StringBuilder(4096);
// Repeat the input 10,000 times to build a large test string.
for (int i = 0; i < 10000; i++)
{
    src.Append(yourStr);
}
string strData = src.ToString();
// The five candidate patterns for matching any character.
List<Regex> reg = new List<Regex>();
reg.Add(new Regex(@"[\s\S]"));
reg.Add(new Regex(@"[\w\W]"));
reg.Add(new Regex(@"[\d\D]"));
reg.Add(new Regex(@"(.|\n)"));
reg.Add(new Regex(@"(?s)."));
string test = string.Empty;
Stopwatch stopW = new Stopwatch();
// Time how long each pattern takes to replace every match with an empty string.
foreach (Regex re in reg)
{
    stopW.Reset();
    stopW.Start();
    test = strData;
    test = re.Replace(test, "");
    stopW.Stop();
    richTextBox2.Text += "regular expression: " + re.ToString().PadRight(10) + "execution time: " + stopW.ElapsedMilliseconds.ToString() + " ms";
    richTextBox2.Text += "\n-----------------------------------------\n";
}

Test results:
The tests were run in two groups. Memory usage before the program ran was 921 MB.
The first group uses no quantifier, so only one character is replaced at a time. The execution times are as follows; memory usage was 938 MB.
Test results (without quantifiers)
Regular Expression: [\s\S] execution time: 2651 ms
---------------------------------------
Regular Expression: [\w\W] execution time: 2515 ms
---------------------------------------
Regular Expression: [\d\D] execution time: 2187 ms
---------------------------------------
Regular Expression: (.|\n) execution time: 2470 ms
---------------------------------------
Regular Expression: (?s). execution time: 1969 ms

The second group uses a quantifier, so all characters are replaced in a single match. The execution times are as follows; memory usage was 1128 MB.
Test results (with quantifiers)
Regular Expression: [\s\S]+ execution time: 249 ms
---------------------------------------
Regular Expression: [\w\W]+ execution time: 348 ms
---------------------------------------
Regular Expression: [\d\D]+ execution time: 198 ms
---------------------------------------
Regular Expression: (.|\n)+ execution time: 879 ms
---------------------------------------
Regular Expression: (?s).+ execution time: 113 ms
---------------------------------------

Test result analysis:
The most efficient approach in both groups is "." in Singleline mode, that is, "(?s).".
The second fastest is "[\d\D]", while "(.|\n)" is by far the slowest once a quantifier is added.
The matching efficiency of "[\s\S]" falls in the middle with a quantifier, but it is the form used most often.

Note: Different languages use different engines, and even engines of the same type apply different optimizations to regular expressions. The performance conclusions above may therefore apply only to .NET.
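To see how another engine behaves, the comparison can be repeated there. The following is a rough JavaScript sketch of the with-quantifier group, not a port of the .NET test: the sample data, the helper `timeReplacements`, and the round count are all assumptions chosen for illustration, and the `s` flag stands in for "(?s)". The timings it prints depend on the engine and machine, so no expected numbers are given.

```javascript
// Hypothetical test data: one short HTML line repeated many times ("\n" only, no "\r").
const sampleLine = '<link href="images/favicon.ico" rel="shortcut icon" />\n';
const data = sampleLine.repeat(500);

// The five "match any character" forms from the article, each with a quantifier.
const patterns = [
  /[\s\S]+/g,
  /[\w\W]+/g,
  /[\d\D]+/g,
  /(.|\n)+/g,
  /.+/gs, // s flag used in place of (?s)
];

// Hypothetical helper: times `rounds` replace-all calls for each pattern.
function timeReplacements(regexes, input, rounds) {
  return regexes.map((re) => {
    const start = Date.now();
    let output = input;
    for (let i = 0; i < rounds; i++) {
      output = input.replace(re, "");
    }
    return {
      pattern: re.toString(),
      ms: Date.now() - start,
      remaining: output.length, // 0 means every character was matched
    };
  });
}

const results = timeReplacements(patterns, data, 50);
for (const r of results) {
  console.log(r.pattern.padEnd(14) + r.ms + " ms, " + r.remaining + " chars left");
}
```

The `remaining` field is a sanity check that each pattern really consumed the whole input. Note that in JavaScript "(.|\n)" would leave any "\r" characters behind, which is why the sample data uses "\n" only.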

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
