GNU strlen source code analysis

Source: Internet
Author: User

Calling the string functions of the C standard library directly is risky: a moment of carelessness can cause memory errors. This week I wrote a small safe string library in my spare time. After testing it, however, I found that my implementation had serious performance problems.
I first made a simple performance comparison on Solaris. The data obtained is shown below (taking strlen as an example):
When the length of the input string is 10:
strlen execution time: 32762 milliseconds
my_strlen execution time: 491836 milliseconds
When the length of the input string is 20:
strlen execution time: 35075 milliseconds
my_strlen execution time: 770397 milliseconds
Clearly, strlen in the standard library takes far less time than my_strlen, and its cost does not grow linearly with the string length, while the cost of my_strlen grows noticeably. As you can probably guess, my_strlen uses the traditional implementation: it checks byte by byte whether the current character is '\0', which matches the test results. Wanting to get to the bottom of this, I found the source code of GNU's C standard library implementation of strlen on the Internet, to see what techniques glibc uses to achieve such performance. To be honest, I am still a beginner at performance optimization, and this will be a direction for my future efforts.
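The article does not show my_strlen itself. The following is only a minimal sketch of the traditional byte-by-byte approach described above, written under the assumption that the original looked roughly like this. Its running time grows with every additional character, which is the behaviour seen in the measurements.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reconstruction: the article does not list my_strlen.
   This is the traditional method it describes, testing one byte per
   iteration until the terminating '\0' is found. */
size_t my_strlen(const char *s)
{
    const char *p = s;

    assert(s != NULL);
    while (*p != '\0')
        ++p;
    return (size_t)(p - s);
}
```

For a 10-character string the loop body runs 10 times, and for a 20-character string it runs 20 times, so the cost is proportional to the length.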
Download the complete glibc source package. The package is really not small. Find strlen.c in the string subdirectory; this is the source code of the strlen implementation used by most UNIX platforms, Linux platforms, and the vast majority of GNU software. The code was written by Torbjorn Granlund (who also implemented memcpy), with help and comments from Jim Blandy and Dan Sahlin. Including comments, the glibc strlen is nearly 130 lines long, and it is worth reading carefully if anything is unclear. The following is an excerpt of the strlen source code; my understanding of it follows afterwards:
 1  /* Return the length of the null-terminated string STR.  Scan for
 2     the null terminator quickly by testing four bytes at a time.  */
 3  size_t strlen (str) const char *str;
 4  {
 5    const char *char_ptr;
 6    const unsigned long int *longword_ptr;
 7    unsigned long int longword, magic_bits, himagic, lomagic;
 8
 9    /* Handle the first few characters by reading one character at a time.
10       Do this until CHAR_PTR is aligned on a longword boundary.  */
11
12    for (char_ptr = str; ((unsigned long int) char_ptr
13                          & (sizeof (longword) - 1)) != 0;
14         ++char_ptr)
15      if (*char_ptr == '\0')
16        return char_ptr - str;
17
18    /* All these elucidatory comments refer to 4-byte longwords,
19       but the theory applies equally well to 8-byte longwords.  */
20
21    longword_ptr = (unsigned long int *) char_ptr;
22
23    himagic = 0x80808080L;
24    lomagic = 0x01010101L;
25
26    if (sizeof (longword) > 8)
27      abort ();
28
29    /* Instead of the traditional loop which tests each character,
30       we will test a longword at a time.  The tricky part is testing
31       if *any of the four* bytes in the longword in question are zero.  */
32
33    for (;;)
34      {
35        longword = *longword_ptr++;
36
37        if (((longword - lomagic) & himagic) != 0)
38          {
39            /* Which of the bytes was the zero?  If none of them were, it was
40               a misfire; continue the search.  */
41
42            const char *cp = (const char *) (longword_ptr - 1);
43
44            if (cp[0] == 0)
45              return cp - str;
46            if (cp[1] == 0)
47              return cp - str + 1;
48            if (cp[2] == 0)
49              return cp - str + 2;
50            if (cp[3] == 0)
51              return cp - str + 3;
52            if (sizeof (longword) > 4)
53              {
54                if (cp[4] == 0)
55                  return cp - str + 4;
56                if (cp[5] == 0)
57                  return cp - str + 5;
58                if (cp[6] == 0)
59                  return cp - str + 6;
60                if (cp[7] == 0)
61                  return cp - str + 7;
62              }
63          }
64      }
65  }
From the author's comment at the top of this code, we can roughly understand how this strlen works: it tests four bytes at a time instead of the traditional one byte at a time. Once the principle is known, two problems remain to be solved:
1) The C standard library must be portable and run correctly on most system architectures. Because each comparison reads 4 bytes at once (as an unsigned long int), memory alignment has to be considered: the address of the first character of the input string may not be 4-byte aligned.
2) Testing four bytes at once and finding out whether one of them is zero requires a trick.
Lines 12 to 21 of the code solve the first problem:
  for (char_ptr = str; ((unsigned long int) char_ptr
                        & (sizeof (longword) - 1)) != 0;
       ++char_ptr)
    if (*char_ptr == '\0')
      return char_ptr - str;
  /* All these elucidatory comments refer to 4-byte longwords,
     but the theory applies equally well to 8-byte longwords.  */
  longword_ptr = (unsigned long int *) char_ptr;
The author uses a for loop to find the first 4-byte-aligned address inside the input string. Since that address is aligned to 4, the cast in the last line is safe. The aligned address could be computed directly with a rounding formula, but a '\0' may occur within this range, so comparing character by character is unavoidable. On many strictly aligned architectures (such as Sun's SPARC platform), the compiler generally places strings at aligned addresses anyway. As a result, when strlen actually runs, the body of this for loop rarely executes even once.
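To make the alignment arithmetic concrete, here is a small sketch of the rounding calculation mentioned above. The helper name bytes_until_aligned is hypothetical and not part of glibc. It only reports how many bytes lie before the next aligned address; strlen still has to examine those bytes one at a time, because any of them may be the terminator.

```c
#include <stddef.h>
#include <stdint.h>

/* Number of bytes between p and the next address that is a multiple of
   align. Assumes align is a power of two, such as sizeof (unsigned long).
   bytes_until_aligned is a hypothetical helper, not a glibc function. */
size_t bytes_until_aligned(const void *p, size_t align)
{
    uintptr_t addr = (uintptr_t)p;
    uintptr_t misalign = addr & (align - 1);  /* same mask as the for-loop test */

    return misalign == 0 ? 0 : (size_t)(align - misalign);
}
```

A pointer that is already aligned yields 0, which corresponds to the case where the prologue loop does not execute at all.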
The second problem is solved by a bit-manipulation technique. The author sets two mask variables:
himagic = 0x80808080L;
lomagic = 0x01010101L;
A single conditional expression is then used to detect a zero byte among the four bytes: ((longword - lomagic) & himagic) != 0
Let us expand himagic and lomagic bit by bit:
himagic 1000 0000 1000 0000 1000 0000 1000 0000
lomagic 0000 0001 0000 0001 0000 0001 0000 0001
There seems to be no theory to follow for such code; it is best understood by trying it out. At first, I constructed a longword that contains no zero byte, for example:
longword 1000 0001 1000 0001 1000 0001 1000 0001. Evaluating the conditional expression on it gives a result that satisfies != 0. Is there a problem with the author's logic? No. The expression is only a quick filter: as the comment on lines 39 and 40 says, when it fires the code checks each byte in turn, and if none of them is zero it was a misfire and the search continues. A misfire happens only when some byte has its highest bit set (a value of 0x81 or above), so it costs time but never produces a wrong length. Now recall what strlen is usually given. In the common case, every character of the input string is within the ASCII range [0,127], that is, the highest bit of each byte is 0, so longword looks like this:
longword 0xxx xxxx 0xxx xxxx 0xxx xxxx 0xxx xxxx
On this premise, consider two cases:
When longword contains no zero byte, for example:
longword 0000 0001 0000 0001 0000 0001 0000 0001
the result of the calculation is 0, so the condition is not met.
When longword contains a zero byte, for example:
longword 0000 0000 0000 0001 0000 0001 0000 0001
the highest bit of the top byte of the result must be 1, which satisfies the != 0 condition, and the zero byte is detected. In other words, a zero byte has to borrow when lomagic is subtracted: 0x00 - 0x01 becomes 0xFF, so its highest bit changes from 0 to 1, and the AND with himagic cannot be 0. This is how a zero byte is detected.
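The cases discussed above can be checked with a short sketch. It uses fixed 32-bit words and the hypothetical names maybe_has_zero_byte and has_zero_byte, neither of which appears in the glibc listing. The first function is the expression from the listing. The second is a well-known refinement of the same trick that additionally masks with ~w, shown here only for comparison and not taken from the listing.

```c
#include <stdint.h>

#define HIMAGIC UINT32_C(0x80808080)
#define LOMAGIC UINT32_C(0x01010101)

/* The test from the listing: nonzero when some byte of w MAY be zero.
   It never misses a zero byte, but a byte of 0x81 or above also
   triggers it (the "misfire" case). */
uint32_t maybe_has_zero_byte(uint32_t w)
{
    return (w - LOMAGIC) & HIMAGIC;
}

/* Refinement (not from the listing): masking with ~w discards bytes
   whose high bit was already set before the subtraction, so the result
   is nonzero only when w really contains a zero byte. */
uint32_t has_zero_byte(uint32_t w)
{
    return (w - LOMAGIC) & ~w & HIMAGIC;
}
```

With w = 0x01010101 both functions return 0. With w = 0x00010101 both return a nonzero value. With w = 0x81818181 the first function fires although no byte is zero, while the second returns 0.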
This method also applies to 64-bit platforms. The code excerpt above omits the special handling for 64-bit platforms in order to keep the logic clear and easy to read.
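As an illustration of how the same idea can cover both word sizes, here is a self-contained sketch. The name word_strlen is hypothetical, and the way the masks are widened for 8-byte words is one possible approach, not a quotation of the omitted glibc code. Two assumptions are worth stating: unsigned long is 4 or 8 bytes wide, and, like the listing, the function may read a few bytes past the terminator, always within the aligned word that contains it. It loads each word with memcpy instead of the pointer cast used in the listing.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* word_strlen is a hypothetical name for this sketch.
   Assumption: sizeof (unsigned long) is 4 or 8. */
size_t word_strlen(const char *str)
{
    const char *p = str;
    unsigned long word, himagic, lomagic;
    size_t i;

    /* Prologue: test single bytes until p is aligned on a word boundary. */
    while (((uintptr_t)p & (sizeof word - 1)) != 0) {
        if (*p == '\0')
            return (size_t)(p - str);
        ++p;
    }

    himagic = 0x80808080UL;
    lomagic = 0x01010101UL;
    if (sizeof word > 4) {
        /* Repeat the 32-bit pattern in the upper half. Two 16-bit shifts
           avoid an out-of-range shift when unsigned long has 32 bits. */
        himagic = ((himagic << 16) << 16) | himagic;
        lomagic = ((lomagic << 16) << 16) | lomagic;
    }

    for (;;) {
        memcpy(&word, p, sizeof word);   /* load one aligned word */
        if (((word - lomagic) & himagic) != 0) {
            /* Possibly a zero byte: find out which one. */
            for (i = 0; i < sizeof word; ++i)
                if (p[i] == '\0')
                    return (size_t)(p + i - str);
            /* Misfire caused by a byte of 0x81 or above: keep searching. */
        }
        p += sizeof word;
    }
}
```

The inner byte loop plays the role of the cp[0] to cp[7] checks in the listing, so strings containing non-ASCII bytes still give the correct length and only take the slower path more often.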

 

From: http://blog.csdn.net/hashmat/article/details/6054046

Further reading: http://blog.csdn.net/dog250/article/details/5302947

http://blog.csdn.net/dog250/article/details/5302948
