Let's look at two approaches to detecting a string's length in bytes: a for loop and a regular expression.
Method 1: detecting a string's byte length with a for loop:
var lenfor = function (str) {
    var bytelen = 0, len;
    if (str) {
        // read length only after the check, so null/undefined do not throw
        len = str.length;
        for (var i = 0; i < len; i++) {
            // character codes above 255 are counted as 2 bytes, the rest as 1
            if (str.charCodeAt(i) > 255) {
                bytelen += 2;
            } else {
                bytelen++;
            }
        }
        return bytelen;
    } else {
        return 0;
    }
};
Usage:
var strlength = lenfor(str);
Method 2: detecting a string's byte length with a for loop:
function len(str) {
    var i, sum = 0;
    for (i = 0; i < str.length; i++) {
        // codes 0..255 count as 1 byte, everything else as 2
        if (str.charCodeAt(i) >= 0 && str.charCodeAt(i) <= 255)
            sum = sum + 1;
        else
            sum = sum + 2;
    }
    return sum;
}
Method 3: detecting a string's byte length with a regular expression:
This code is more concise, but according to the test below it is not very efficient, so the functions above are preferable.
var lenreg = function (str) {
    // replace every character outside \x00-\xff with two asterisks, then count
    return str.replace(/[^\x00-\xff]/g, '**').length;
};
var strlength2 = lenreg(str);
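To see what the regular expression does, the sketch below prints the intermediate string: [^\x00-\xff] matches any character whose code is above 0xFF, and each match becomes two asterisks, so the final length equals the for-loop count. It also checks that the three methods agree; the sample string is made up for illustration, and Methods 1 and 2 are written in compact but equivalent forms.

```javascript
// Method 1 (compact): code units above 255 count as 2.
var lenfor = function (str) {
    var bytelen = 0;
    for (var i = 0; i < str.length; i++) {
        bytelen += str.charCodeAt(i) > 255 ? 2 : 1;
    }
    return bytelen;
};

// Method 2 (compact): codes in 0..255 count as 1, everything else as 2.
function len(str) {
    var sum = 0;
    for (var i = 0; i < str.length; i++) {
        var code = str.charCodeAt(i);
        sum += (code >= 0 && code <= 255) ? 1 : 2;
    }
    return sum;
}

// Method 3: replace each character above 0xFF with '**', then take the length.
var lenreg = function (str) {
    return str.replace(/[^\x00-\xff]/g, '**').length;
};

var sample = 'JS 检测 bytes';
console.log(sample.replace(/[^\x00-\xff]/g, '**')); // JS **** bytes
console.log(lenfor(sample), len(sample), lenreg(sample)); // 13 13 13
```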
I used the following snippet to test the two functions above (lenreg and lenfor), mainly to compare their running times:
var s = '...'; // a long string, not listed here
function a() {
    var timestart, timeend;
    // time the regular-expression version
    timestart = new Date();
    var s1 = lenreg(s);
    timeend = new Date();
    var t1 = (timeend - timestart) * 1000;
    // time the for-loop version
    timestart = new Date();
    var s2 = lenfor(s);
    timeend = new Date();
    var t2 = (timeend - timestart) * 1000;
    alert('lenreg: ' + s1 + ' time: ' + t1 + '\nlenfor: ' + s2 + ' time: ' + t2);
}
window.onload = function () {
    a();
};
When this code runs in the browser, an alert box appears with two lines: the first shows the byte length computed by the regular expression and its running time; the second shows the byte length computed by the for loop and its running time. Both times are the elapsed milliseconds multiplied by 1000.
I got two different results:
First run:
lenreg: 25824, time: 20000
lenfor: 25824, time: 10000
Second run:
lenreg: 48795, time: 15000
lenfor: 48795, time: 25000
Note that both tests used exactly the same string.
Why are the results so different? What did I quietly change? As I said earlier (section 3 of this article), "a Chinese character occupies 2 bytes (depending on the encoding)". How many bytes a Chinese character occupies really does depend on the encoding: 2 bytes in GB2312 and 3 bytes in UTF-8, while ISO-8859-1 cannot represent Chinese characters at all. A page declared as ISO-8859-1 therefore decodes each byte of a multi-byte character as a separate character, so the JavaScript string itself is different.
Yes, the problem is the document encoding: the first run used charset=UTF-8 and the second used charset=iso-8859-1.
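The effect of the charset declaration can be reproduced outside the browser. The sketch below assumes the page file itself was saved as UTF-8 (the article does not say) and decodes its bytes as strict ISO-8859-1, one character per byte. Browsers actually treat the iso-8859-1 label as windows-1252, which maps some bytes in the 0x80 to 0x9F range to characters above 255, so real browser numbers can differ somewhat. decodeAsLatin1 is a hypothetical helper name.

```javascript
// Method 1 (compact): code units above 255 count as 2.
var lenfor = function (str) {
    var bytelen = 0;
    for (var i = 0; i < str.length; i++) {
        bytelen += str.charCodeAt(i) > 255 ? 2 : 1;
    }
    return bytelen;
};

// Hypothetical helper: strict ISO-8859-1 decoding, where each byte becomes
// exactly one character with the same code (0..255).
function decodeAsLatin1(bytes) {
    var out = '';
    for (var i = 0; i < bytes.length; i++) {
        out += String.fromCharCode(bytes[i]);
    }
    return out;
}

var text = '中文';
var utf8Bytes = new TextEncoder().encode(text); // E4 B8 AD E6 96 87: 3 bytes per character

var asUtf8 = text;                        // page declared charset=utf-8
var asLatin1 = decodeAsLatin1(utf8Bytes); // page declared charset=iso-8859-1

console.log(asUtf8.length, lenfor(asUtf8));     // 2 4
console.log(asLatin1.length, lenfor(asLatin1)); // 6 6
```

The same source text yields a string of a different length with different character codes, so both the reported byte length and the work each function does change.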
Chinese web pages generally do not use charset=iso-8859-1 (Chinese text comes out garbled); they use charset=UTF-8 or GB2312. So the case that matters is the first one. Let's look at it again:
lenreg: 25824, time: 20000
lenfor: 25824, time: 10000
As shown above, the regular-expression version takes twice as long as the for loop! (Over repeated runs it is not always exactly double, but in most runs it is.)
Why?
str.replace(/[^\x00-\xff]/g, '**').length;
Look at this statement from the lenreg function. My personal guess is that the problem lies here: replace has to traverse the whole string and also build a new string in which every double-byte character becomes '**', and only then is length read from the result. The for loop makes a single pass and builds nothing. That should be the cause, but I am not completely sure; on the surface, at least, that is how the analysis goes.
So is the regular-expression algorithm simply more complex? Or does the code above fail to take full advantage of regular expressions? I don't know yet and need to think about it further, so I'll leave it as an open question. ^_^