Some details
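In most languages and tools that use a traditional NFA engine, such as .NET, "." matches any character except the line feed "\n". (Java is slightly stricter: by default its "." also excludes "\r", "\u0085", "\u2028", and "\u2029", unless the UNIX_LINES flag is set.)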
JavaScript, however, is a special case. Different browsers use different parsing engines, so the matching range of "." also differs. In browsers built on the Trident engine, such as IE, "." likewise matches any character except the line feed "\n". In browsers built on other engines, such as Firefox, Opera, and Chrome, "." matches any character except the carriage return "\r" and the line feed "\n".
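For comparison with current engines, the ECMAScript specification defines four line terminators: "\n", "\r", "\u2028", and "\u2029". A standards-compliant "." rejects all four unless the "s" (dotAll) flag is set. The sketch below checks this in whatever engine runs it. It assumes a modern browser or Node.js, and the results will not reproduce the legacy IE behavior described above.

```javascript
// Probe which characters "." refuses to match in the current engine.
// Assumes a modern, standards-compliant engine with the `s` (dotAll) flag (ES2018+).
const candidates = {
  "\\n": "\n",
  "\\r": "\r",
  "\\u2028": "\u2028",
  "\\u2029": "\u2029",
  "\\t": "\t",
  "a": "a",
};

for (const [label, ch] of Object.entries(candidates)) {
  // /./ fails only on line terminators; /./s accepts every character.
  console.log(label.padEnd(8), /./.test(ch), /./s.test(ch));
}
```

In a compliant engine the first column is false for the four line terminators and true for "\t" and "a". The second column is true everywhere.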
Some guesses about this detail
The test code is as follows:
<script type="text/javascript">
document.write(/./.test("\r") + "<br/>");
document.write(/./.test("\n") + "<br/>");
</script>
// Output in IE
true
false
// Output in Firefox, Opera, and Chrome
false
false
Based on rough testing, Trident, Presto, and Gecko all appear to use a traditional NFA engine. WebKit at least supports the features of a traditional NFA engine, but it does not behave exactly like one. My guess is that it is either a traditional NFA engine with advanced optimizations or a hybrid DFA/NFA engine.
Windows ends lines with "\r\n", while UNIX uses only "\n". My guess is that the other browser engines did not originate on Windows and therefore treat "\r" differently, which is why their "." does not match "\r". I have not researched this in depth; these are only guesses.
Common mistakes in practice
Note:
Do not try to use "[.\n]" to match any character. Inside a character class, "." is a literal period, so this expression matches only a single period or a line feed. "(.|\n)" does work, but it is rarely used because it is less readable and less efficient. The usual choices are "[\s\S]", or "." combined with the (?s) (Singleline) matching mode.
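The pitfall in the note above can be checked directly. The sketch below tests each candidate "any character" pattern against four sample characters. It is written in JavaScript, which has no inline (?s), so the "s" (dotAll) flag stands in for Singleline mode. This assumes an ES2018+ engine such as a current Node.js or browser.

```javascript
// Compare candidate "match any character" patterns on a few sample characters.
// Assumes an ES2018+ engine, where the `s` (dotAll) flag is available.
const patterns = {
  "[.\\n]": /^[.\n]$/,    // character class: only a literal "." or a line feed
  "(.|\\n)": /^(.|\n)$/,  // "." (no line terminators) or an explicit line feed
  "[\\s\\S]": /^[\s\S]$/, // whitespace or non-whitespace: every character
  ". (s flag)": /^.$/s,   // dotAll: "." also matches line terminators
};
const samples = ["a", ".", "\n", "\r"];

for (const [name, re] of Object.entries(patterns)) {
  const results = samples.map((ch) => re.test(ch));
  console.log(name.padEnd(11), results.join(" "));
}
```

"[.\n]" accepts only "." and "\n". In JavaScript, "(.|\n)" still rejects "\r", because "." excludes "\r" there. "[\s\S]" and the "s" flag accept all four samples.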
Example
Requirement: match a <td> tag together with its content, which spans more than one line
Source string: <td>This is a test line.
Another line.</td>
Matching result: <td>This is a test line.
Another line.</td>
Regular expression 1: <td>[\s\S]*</td>
Regular expression 2: (?s)<td>.*</td>
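The two expressions can be tried in JavaScript with the sketch below. JavaScript has no inline (?s), so expression 2 uses the "s" (dotAll) flag instead. The source string is written without spaces around the tags, which is an assumption about the original text. The third match shows what happens when neither technique is used.

```javascript
// Source string from the example: a <td> whose content spans two lines.
const source = "<td>This is a test line.\nAnother line.</td>";

// Expression 1: [\s\S] matches any character, including "\n".
const m1 = source.match(/<td>[\s\S]*<\/td>/);
// Expression 2: JavaScript spells (?s) as the `s` (dotAll) flag.
const m2 = source.match(/<td>.*<\/td>/s);
// Without dotAll, "." cannot cross the "\n", so nothing matches.
const m3 = source.match(/<td>.*<\/td>/);

console.log(m1[0] === source); // true
console.log(m2[0] === source); // true
console.log(m3);               // null
```

Both working forms return the whole string. The greedy "*" would run to the last "</td>" if the input held several cells, so a lazy "*?" is the safer choice in that case.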
Matching Efficiency Test
The test string below is the content entered in richTextBox1.Text (taken from the CSDN homepage):
<link href="images/favicon.ico" rel="shortcut icon"/>
<title>CSDN.NET - the leading IT technology community in China, providing the most comprehensive information dissemination and service platform for IT professionals</title>
<script language='javascript' type='text/javascript' src='http://www.csdn.net/ggmm/csdn_ggmm.js'></script> <script type="text/javascript" src="http://counter.csdn.net/a/js/AreaCounter.js"></script>
<script type="text/javascript">
Test code:
// Requires: using System.Collections.Generic; using System.Diagnostics;
//           using System.Text; using System.Text.RegularExpressions;
// Build a large input by appending the test string 10000 times.
string yourStr = richTextBox1.Text;
StringBuilder src = new StringBuilder(4096);
for (int i = 0; i < 10000; i++)
{
    src.Append(yourStr);
}
string strData = src.ToString();

// Patterns for the first group; the second group appends "+" to each one.
List<Regex> reg = new List<Regex>();
reg.Add(new Regex(@"[\s\S]"));
reg.Add(new Regex(@"[\w\W]"));
reg.Add(new Regex(@"[\d\D]"));
reg.Add(new Regex(@"(.|\n)"));
reg.Add(new Regex(@"(?s)."));

string test = string.Empty;
Stopwatch stopW = new Stopwatch();
foreach (Regex re in reg)
{
    stopW.Reset();
    stopW.Start();
    test = strData;
    // Regex.Replace replaces every match with "", deleting all matched characters.
    test = re.Replace(test, "");
    stopW.Stop();
    richTextBox2.Text += "Regular expression: " + re.ToString().PadRight(10) + "execution time: " + stopW.ElapsedMilliseconds.ToString() + " ms";
    richTextBox2.Text += "\n---------------------------------------\n";
}
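The same measurement can be sketched in JavaScript to show how such a benchmark is structured. This is a port under several assumptions. The input is a short hypothetical stand-in for richTextBox1.Text, repeated far fewer than 10000 times. Date.now() does the timing. The "g" flag is required because, unlike .NET's Regex.Replace, String.prototype.replace otherwise removes only the first match. The "s" flag replaces the inline (?s). The stand-in contains only "\n" line breaks, because in JavaScript "(.|\n)" would leave any "\r" behind. Timings from this sketch say nothing about the .NET figures.

```javascript
// Hypothetical stand-in for richTextBox1.Text; the CSDN page text is not reproduced here.
const sample = '<link href="images/favicon.ico" rel="shortcut icon"/>\n<title>CSDN.NET</title>\n';
// Much smaller than the original 10000 repetitions so the sketch runs quickly.
const data = sample.repeat(200);

// `g` is needed to replace every match; `s` (dotAll) stands in for .NET's inline (?s).
const patterns = [
  /[\s\S]/g, /[\w\W]/g, /[\d\D]/g, /(.|\n)/g, /./gs,      // one character per match
  /[\s\S]+/g, /[\w\W]+/g, /[\d\D]+/g, /(.|\n)+/g, /.+/gs, // a whole run per match
];

// Time one replace-all call; return the leftover text and the elapsed milliseconds.
function timeReplace(re, input) {
  const start = Date.now();
  const output = input.replace(re, "");
  return { output, ms: Date.now() - start };
}

for (const re of patterns) {
  const { output, ms } = timeReplace(re, data);
  const label = `/${re.source}/${re.flags}`;
  console.log(`${label.padEnd(12)} time: ${ms} ms, characters left: ${output.length}`);
}
```

Every pattern should leave zero characters. If one does not, it is not a true "any character" pattern for this input.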
Test results:
The test was run in two groups. Memory usage before the program ran was 921 MB.
The first group uses no quantifiers, so each match replaces a single character. The execution times are as follows; memory usage was 938 MB.
Regular expression: [\s\S]    execution time: 2651 ms
---------------------------------------
Regular expression: [\w\W]    execution time: 2515 ms
---------------------------------------
Regular expression: [\d\D]    execution time: 2187 ms
---------------------------------------
Regular expression: (.|\n)    execution time: 2470 ms
---------------------------------------
Regular expression: (?s).     execution time: 1969 ms
The second group adds the "+" quantifier, so each match consumes a whole run of characters and everything is replaced in far fewer matches. The execution times are as follows; memory usage was 1128 MB.
Test results (with quantifiers)
Regular expression: [\s\S]+   execution time: 249 ms
---------------------------------------
Regular expression: [\w\W]+   execution time: 348 ms
---------------------------------------
Regular expression: [\d\D]+   execution time: 198 ms
---------------------------------------
Regular expression: (.|\n)+   execution time: 879 ms
---------------------------------------
Regular expression: (?s).+    execution time: 113 ms
---------------------------------------
Test result analysis:
Singleline mode ((?s).) is the most efficient in both groups.
"[\d\D]" comes second in both groups. With quantifiers, "(.|\n)+" is by far the slowest (879 ms).
With quantifiers, "[\s\S]+" falls in the middle, yet it is the form most commonly used. Without quantifiers, all five patterns fall between about 2.0 and 2.7 seconds, and "[\s\S]" was actually the slowest.
Note: different languages use different engines, and even engines of the same type apply their own optimizations to regular expressions. The performance conclusions above may therefore apply only to .NET.