Let's look at code that detects the byte length of a string, first with a for loop and then with a regular expression.
Method one: detecting a string's byte length with a for loop:
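var lenfor = function (str) {
    // Empty, null, or undefined input has a byte length of 0
    if (!str) {
        return 0;
    }
    var bytelen = 0, len = str.length;
    for (var i = 0; i < len; i++) {
        // Code units above 255 are counted as 2 bytes, everything else as 1
        if (str.charCodeAt(i) > 255) {
            bytelen += 2;
        } else {
            bytelen++;
        }
    }
    return bytelen;
};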
Usage:
var strlength = lenfor(str);
Method two: detecting a string's byte length with a for loop:
function len(str) {
    var i, code, sum = 0;
    for (i = 0; i < str.length; i++) {
        code = str.charCodeAt(i);
        // Code units 0-255 count as 1 byte, all others as 2
        if (code >= 0 && code <= 255) {
            sum = sum + 1;
        } else {
            sum = sum + 2;
        }
    }
    return sum;
}
Method three: detecting a string's byte length with a regular expression:
This code is more concise, but according to the test below it is less efficient, so you may prefer the functions above.
var lenreg = function (str) {
    // Replace every character outside 0x00-0xff with two placeholders, then measure
    return str.replace(/[^\x00-\xff]/g, '**').length;
};
var strlength2 = lenreg(str);
I tested lenfor and lenreg with the following snippet, mainly to measure elapsed time:
var s = '...'; // a long string, not listed here
function a() {
    var timestart, timeend;
    timestart = new Date();
    var s1 = lenreg(s);
    timeend = new Date();
    var t1 = (timeend - timestart) * 1000; // elapsed milliseconds x 1000
    timestart = new Date();
    var s2 = lenfor(s);
    timeend = new Date();
    var t2 = (timeend - timestart) * 1000;
    alert('lenreg:' + s1 + ' time:' + t1 + '\nlenfor:' + s2 + ' time:' + t2);
}
window.onload = function () {
    a();
};
When the page loads, the code above pops up an alert with two lines: the first shows the byte length found by the regular expression and the time it took (x1000); the second shows the byte length found by the for loop and the time it took (x1000).
I got two different results:
First result:
lenreg:25824 time:20000
lenfor:25824 time:10000
Second result:
lenreg:48795 time:15000
lenfor:48795 time:25000
Note that both tests used the same string.
Why is the difference so large? What did I quietly change? As I mentioned above, "Chinese characters occupy 2 bytes (depending on encoding)" (the third paragraph of this article): the number of bytes a Chinese character occupies depends on the encoding. In GB2312 a Chinese character occupies 2 bytes, and in UTF-8 it typically occupies 3, although the functions above always count 2. ISO-8859-1 is a single-byte encoding that cannot represent Chinese characters at all, so when a page is declared as ISO-8859-1 each byte of a Chinese character is most likely read as a separate character, and the script sees a different, longer string.
Yes, the problem is the encoding of the document. In the first case the page was declared as charset=utf-8, and in the second as charset=iso-8859-1.
Chinese web pages generally do not use charset=iso-8859-1 (Chinese text comes out garbled); they use charset=utf-8 or GB2312 instead. So the case that matters is the first one:
lenreg:25824 time:20000
lenfor:25824 time:10000
As shown above, the regular expression takes twice as long as the for loop! (Over many runs the ratio was not always exactly two, but in most runs it was.)
Why, then?
str.replace(/[^\x00-\xff]/g, '**').length;
Look at the statement above (from the lenreg function). As I understand it, this is where the problem lies: replace has to traverse the string and build a new string before its length can be read, whereas the for loop traverses the string once and builds nothing. Reading length is cheap because it is a stored property, so the extra cost would come from replace. That is probably the cause, but I'm not certain.
I can't confirm that this understanding is accurate, but on the surface it appears to be the case.
So does using a regular expression really make the algorithm more expensive? Or does the code above simply fail to take full advantage of regular expressions? I don't have a clear answer yet, and this needs further investigation. I'll leave the question open for now. ^_^