1 Why is the gets() function still in our code?
Okay, it finally happened. We have run into a very serious and very common kind of problem: a buffer overflow. Its impact is huge, and fixing it will be difficult, slow, and costly. I suspect that many software product managers around the world are now asking their programmers, "Why did you never warn me?", and that many of those programmers are answering, "I did warn you. Why didn't you listen?"
Software development always involves a conflict between the correct solution and the quick solution, and the conflict is even sharper in the security field. Over the next few weeks, let's talk about this conflict. Two opposing points matter for our discussion:
No matter how perfect your solution is, it is useless if no one uses it.
No matter what the reason is, if you do not use the perfect solution, all the thought that went into it is wasted, because it never makes it into your code.
Let's start with a seemingly trite example: the gets() function in the C standard library. The function is declared as follows:
    char *gets(char *str);
The gets() function takes a single parameter, a pointer. It reads characters from the standard input stream into a contiguous block of memory that starts at the position str points to. Reading stops when end-of-file (EOF) or a line break is encountered. When a line break (\n) is read, it is not stored in the block. When reading ends, gets() automatically appends a null character after the last character stored. The programmer thus obtains a null-terminated C string read from standard input. If a whole line was read, the line break at its end has been removed.
This function is convenient, but it has limitations. C programmers have often used it to read standard input. The following code is a typical use:
    char input[100];
    printf("Yes or no?\n");
    gets(input);
    /* And so on... */
Over the past 30 years, many people in the C programming community have come to realize that the gets() function is insecure and cannot be fixed without changing its interface. The reason is easy to see. The function's only parameter is a pointer to the memory in which the data read is to be stored, so gets() has no way of knowing how large that memory is. If standard input supplies a long enough run of characters without a line break, gets() will write past the end of the specified memory area, and the programmer can do nothing about it.
Besides the insecurity of gets(), there is also a pitfall with its companion, fgets(). The prototype of this function is as follows:
    char *fgets(char *str, int num, FILE *stream);
Here str is a pointer to the memory area in which the data read is stored, num is an integer giving the size of that memory area, and stream is a file pointer specifying where to read from. At first glance you might think, as I did at the time, that the insecure code above can simply be rewritten with fgets() to avoid the buffer overflow.
The code is as follows:
    char input[100];
    printf("Yes or no?\n");
    fgets(input, 100, stdin);
    /* And so on... */
However, gets() and fgets() do not behave the same way. The fgets() function also stops at a line break, but it stores the line break in memory, whereas gets() leaves it out. A simple substitution therefore does not produce exactly the same behavior. To keep the code secure and the behavior identical, we need to check the characters stored in memory and, if there is a line break at the end, delete it.
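The difference in line-break handling can be checked directly. The following is a minimal sketch, not part of the original article: the helper name fgets_keeps_newline is hypothetical, and it uses tmpfile() as a stand-in for standard input so that the input is known in advance.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes `text` to a temporary file, rewinds it, and
 * reads the first line back into buf with fgets().
 * Returns 1 if the stored line ends with '\n', 0 if it does not,
 * and -1 on any I/O failure. */
int fgets_keeps_newline(const char *text, char *buf, int size)
{
    FILE *fp = tmpfile();
    if (fp == NULL)
        return -1;

    if (fputs(text, fp) == EOF) {
        fclose(fp);
        return -1;
    }
    rewind(fp);

    char *line = fgets(buf, size, fp);
    fclose(fp);
    if (line == NULL)
        return -1;

    size_t len = strlen(buf);
    return len > 0 && buf[len - 1] == '\n';
}
```

Given the input "yes\n", the buffer holds the four characters y, e, s and \n, where gets() would have stored only "yes". If the input ends without a line break, fgets() stores no line break either, so code that strips it must not assume it is always there.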
So, off the top of our heads, we might come up with the following code. It looks safe, and it looks as if it behaves the same way as the gets() function.
The code is as follows:
    /* This code doesn't work! */
    char input[100];
    printf("Yes or no?\n");
    fgets(input, 100, stdin);
    char *last = input + strlen(input) - 1;
    if (*last == '\n')
        *last = '\0';
    /* And so on... */
However, although the code has become more complex, it still hides a problem that may crash the program or open a security hole. The code ignores the return value of fgets(). If standard input is already at end-of-file when the call is made, fgets() reads no characters and returns a null pointer. The C standard leaves the array unchanged in that case, so input is still uninitialized here. Even if input happens to hold an empty string, with input[0] set to the null character, strlen(input) returns 0 and the last pointer points to the character before the input array. Either way the behavior of this code is undefined, because there is no telling what that character is.
Here is a small exercise: try to fix the code yourself.
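One possible answer to the exercise, offered as a sketch rather than as the author's own solution: check the return value of fgets() before touching the buffer, and locate the line break with strcspn() so that no pointer arithmetic can step outside the array. The helper name read_line is hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reads one line from `stream` into buf (capacity
 * `size` bytes) and strips the trailing line break, if there is one.
 * Returns buf on success, or NULL if nothing could be read. */
char *read_line(char *buf, int size, FILE *stream)
{
    if (size <= 0 || fgets(buf, size, stream) == NULL)
        return NULL;

    /* strcspn() returns the index of the first '\n', or the string length
     * if there is none, so this index is always inside the string. */
    buf[strcspn(buf, "\n")] = '\0';
    return buf;
}
```

The snippet from the article would then become `if (read_line(input, sizeof input, stdin) == NULL) { /* handle end of input */ }`, with the failure case handled explicitly instead of falling through to strlen().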
At a company where I used to work, a former manager was a very cautious person and required that the gets() function be removed from all of our local C libraries. Because of this requirement, we often had to rewrite code obtained from elsewhere. So conversations like the following were not surprising.
A: About the code you sent me: have you looked through it? We need to rewrite parts of it to remove the calls to the gets() function.
B: Why can't the gets() function appear in the code?
A: <long explanation; 5421 words omitted here>
B: Ha, interesting.
A: If you need it, we are happy to send you the modified code.
B: Okay, I'd like that. Send it to me. But I can tell you right now that we cannot do anything with it for the time being, because we are only allowed to modify the code when a customer finds and reports a problem.
Although the gets() function has long been recognized as insecure, it remained in the C89 and C99 standards and was only removed in the C11 standard. That, however, is only a removal from the language standard. When I checked some of my own code, I found that gets() was still being used. More interesting still, as far as I understand C, the C standard library offers no safe and equally convenient replacement for gets().
If you have read this far, could you answer the following questions?
Before reading this article, did you know that the gets() function is insecure?
Are there any restrictions on using the gets() function where you work?
Have you ever written code specifically to avoid using the gets() function?
What else would you like to know about the gets() function?
We will continue this discussion next week.
2 Practice: How to solve the security problem of the gets() function
2.1 Toolchain security warning
By default, the GNU toolchain emits a warning for code that calls the gets() function.
For example, the following code:
    #include <stdio.h>

    int main(void)
    {
        char c[5];
        gets(c);
        puts(c);
    }
Building it produces the following message:
    gets_warn.c:(.text+0xd): warning: the 'gets' function is dangerous and should not be used.
2.2 Secure gets() implementation
In the C11 standard (ISO/IEC 9899:2011), the gets() function was removed and a new function, gets_s(), was introduced in the optional Annex K.
C11 K.3.5.4.1 The gets_s function
The code is as follows:
    #define __STDC_WANT_LIB_EXT1__ 1
    #include <stdio.h>
    char *gets_s(char *s, rsize_t n);
At present, this part of the standard is not implemented in the GNU toolchain, so the gets_s() function is not available with GCC. It is not available with Clang on that toolchain either.
Therefore, the most common practice is probably to implement a replacement yourself. The following is one way to do it:
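    #include <stdio.h>
    #include <string.h>

    /* Note: this keeps the int size used by fgets(); the standard
       gets_s() takes an rsize_t instead. */
    char *gets_s(char *str, int num)
    {
        if (fgets(str, num, stdin) != NULL) {
            size_t len = strlen(str);
            /* Remove the trailing line break, as gets() would. */
            if (len > 0 && str[len - 1] == '\n')
                str[len - 1] = '\0';
            return str;
        }
        return NULL;
    }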
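One thing the replacement above does not report is a line longer than the buffer: fgets() stores what fits and leaves the rest in the stream, where the next read picks it up as if it were a new line. The sketch below shows one way to detect this and discard the excess. The name read_line_checked and its return codes are assumptions made for illustration, not part of any standard.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reads one line from `stream` into buf.
 * Returns  0 if the complete line was stored (line break stripped),
 *          1 if the line was too long; the excess is read and discarded,
 *         -1 if size is too small, or on end-of-file or error before
 *            any character was read. */
int read_line_checked(char *buf, int size, FILE *stream)
{
    if (size < 2 || fgets(buf, size, stream) == NULL)
        return -1;

    size_t len = strlen(buf);
    if (len > 0 && buf[len - 1] == '\n') {
        buf[len - 1] = '\0';
        return 0;
    }

    /* No line break was stored: either the line did not fit, or it fit
     * exactly and only its line break (or end-of-file) remains. Consume
     * the rest of the line to find out which. */
    int ch;
    int truncated = 0;
    while ((ch = getc(stream)) != EOF && ch != '\n')
        truncated = 1;
    return truncated;
}
```

A caller can then treat a return value of 1 as invalid input instead of silently processing half a line.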
2.3 Other functions in the C standard library with security risks
The gets() function is an extreme case, but it is not the only insecure function. Because C does not check array bounds and relies heavily on pointers, many other functions are easy to misuse, and such misuse can be exploited by attackers and poses a security risk.
strcpy: strncpy is recommended.
strcat: strncat is recommended.
sprintf: snprintf is recommended.
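The bounded variants still need some care. The following sketch, with hypothetical helper names, shows two details worth knowing: snprintf() returns the length the full result would have had, which makes truncation detectable, and strncpy() does not write a terminating null character when the source is too long, so the caller has to add one.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: builds "<greeting>, <name>!" in dest without ever
 * writing more than dest_size bytes.
 * Returns 1 if the whole string fit, 0 if it was truncated or on error. */
int format_greeting(char *dest, size_t dest_size,
                    const char *greeting, const char *name)
{
    int needed = snprintf(dest, dest_size, "%s, %s!", greeting, name);
    return needed >= 0 && (size_t)needed < dest_size;
}

/* Hypothetical helper: copies src into dest with strncpy() and terminates
 * the result explicitly, since strncpy() does not do so on truncation. */
void copy_truncated(char *dest, size_t dest_size, const char *src)
{
    if (dest_size == 0)
        return;
    strncpy(dest, src, dest_size - 1);
    dest[dest_size - 1] = '\0';
}
```

With a 6-byte buffer, format_greeting() given "Hi" and "Bob" stores "Hi, B" and returns 0, and copy_truncated() given "overflow" stores "overf".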
If you want to implement string-handling functions of your own, an interface like the following is recommended. The caller must always pass the size of the destination buffer:
    size_t foobar(char *dest, size_t buf_size, /* operands here */);
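As an illustration of this interface shape, here is a small sketch. The function copy_upper and its return convention (the length of the full result, in the style of snprintf()) are assumptions chosen for the example, not something the article prescribes.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical instance of the interface: copies src to dest in upper case.
 * Writes at most buf_size bytes, including the terminating null character,
 * and returns the length of the full result. A return value greater than
 * or equal to buf_size means the output was truncated. */
size_t copy_upper(char *dest, size_t buf_size, const char *src)
{
    size_t i = 0;
    for (; src[i] != '\0'; i++) {
        /* Store the character only while room remains for the terminator. */
        if (i + 1 < buf_size)
            dest[i] = (char)toupper((unsigned char)src[i]);
    }
    if (buf_size > 0)
        dest[i < buf_size ? i : buf_size - 1] = '\0';
    return i;
}
```

Because the function knows buf_size, it can never write past the end of dest, and the caller can compare the return value with the buffer size to decide whether to retry with a larger buffer.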
Microsoft provides suggestions on how to securely use the C language standard library interface in MSDN. If you are interested, please refer to https://msdn.microsoft.com/en-us/library/bb288454.aspx.