Data type integer overflow in C language

Last Update:2017-01-13 Source: Internet

Author: User

Tags coding standards error handling

What is integer overflow?
Most C programmers have run into integer overflow at some point. It comes in two kinds: unsigned integer overflow and signed integer overflow.

For unsigned integers, the C standard defines the result: a value that does not fit is reduced modulo 2^(8 * sizeof(type)). In other words, if an unsigned char (1 byte, 8 bits) overflows, the stored value is the true result modulo 256. For example:

unsigned char x = 0xff;
printf("%d\n", ++x);
The code above prints 0, because 0xff + 1 is 256, and 256 modulo 2^8 is 0.
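
The wraparound rule is easy to check directly. The following is a minimal sketch, not code from the original article; the helper names next_uchar and add_wrapping are made up for the illustration.

```c
#include <limits.h>

/* Increment an unsigned char. The addition is done in unsigned int and the
 * conversion back reduces the result modulo UCHAR_MAX + 1. */
unsigned char next_uchar(unsigned char x)
{
    return (unsigned char)(x + 1u);
}

/* Unsigned int addition wraps modulo UINT_MAX + 1; this is well defined. */
unsigned int add_wrapping(unsigned int a, unsigned int b)
{
    return a + b;
}
```

Calling next_uchar(0xff) yields 0, and add_wrapping(UINT_MAX, 5u) yields 4, on any conforming compiler.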

For signed integers, the C standard says overflow is "undefined behavior", which means the compiler is free to do whatever it likes. Most compilers simply let the value wrap around. For example:

signed char x = 0x7f; // Note: 0xff would be -1, because the highest bit is 1, which marks a negative number.
printf("%d\n", ++x);
On such compilers the code above prints -128: 0x7f + 0x01 gives 0x80, which is binary 1000 0000. The sign bit is 1 and all other bits are 0, which is the smallest negative value, -128. (Strictly speaking, a char operand is promoted to int first, so here the out-of-range conversion back to signed char is implementation-defined; the arithmetic itself becomes undefined for int and wider types.)

Also, do not assume that a signed overflow always produces a negative number. It does not have to. For example:

signed char x = 0x7f;
signed char y = 0x05;
signed char r = x * y;
printf("%d\n", r);
The code above prints 123.

None of this should be new to you.
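
Where the 123 comes from may not be obvious. The sketch below is an illustration under the assumption of 8-bit chars, with a made-up helper name: it computes the product in int, as C does after integer promotion, and keeps only the low 8 bits.

```c
/* Return the low 8 bits of the int product a * b. On a typical
 * two's-complement compiler this is the bit pattern stored back
 * into a signed char. */
unsigned char low_byte_of_product(signed char a, signed char b)
{
    int wide = a * b;              /* both operands are promoted to int */
    return (unsigned char)wide;    /* conversion to unsigned is modulo 256 */
}
```

For 0x7f * 0x05 the int product is 635 (0x27b), and its low byte is 0x7b, that is, 123. Because bit 7 of that byte is 0, the stored signed char is positive.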

Hazards of integer overflow
The danger of integer overflow is described below.

Example 1: an integer overflow causes an endless loop

......
......
short len = 0;
......
while (len < MAX_LEN) {
    len += readFromInput(fd, buf);
    buf += len;
}
Many programmers like to write code like this (I have seen it many times). MAX_LEN may be a fairly large integer such as 32767. A short is 16 bits wide, with a range of -32768 to 32767. The addition in the while loop can therefore overflow len, and because len is signed it may become negative, so the condition len < MAX_LEN stays true and the loop never ends.
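
One way to avoid the problem is to keep the running total in a size_t and to cap every read at the space that remains. This is only a sketch: the read_fn callback type, read_all, and demo_reader are hypothetical stand-ins for the article's readFromInput, whose real signature is not shown.

```c
#include <stddef.h>

#define MAX_LEN 32767

/* Hypothetical reader: returns bytes read, 0 at end of input,
 * or a negative value on error. */
typedef int (*read_fn)(void *ctx, char *dst, size_t cap);

/* Fill buf (capacity MAX_LEN) and return the number of bytes stored. */
size_t read_all(read_fn reader, void *ctx, char *buf)
{
    size_t len = 0;                       /* size_t cannot go negative */
    while (len < MAX_LEN) {
        int n = reader(ctx, buf + len, MAX_LEN - len);
        if (n <= 0)
            break;                        /* end of input or error */
        if ((size_t)n > MAX_LEN - len)
            break;                        /* reader broke its contract */
        len += (size_t)n;
    }
    return len;
}

/* Demo reader for illustration: serves at most 1000 bytes per call
 * from a byte budget that ctx points to. */
int demo_reader(void *ctx, char *dst, size_t cap)
{
    size_t *remaining = ctx;
    size_t n = *remaining < cap ? *remaining : cap;
    if (n > 1000)
        n = 1000;
    for (size_t i = 0; i < n; i++)
        dst[i] = 'x';
    *remaining -= n;
    return (int)n;
}
```

The loop ends either when the input is exhausted or when exactly MAX_LEN bytes have been stored, and the write position is always buf + len rather than a pointer that is advanced by a cumulative total.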

Example 2: overflow during integer type conversion

int copy_something(char *buf, int len)
{
#define MAX_LEN 256
    char mybuf[MAX_LEN];
    ......
    ......

    if (len > MAX_LEN) { // <---- [1]
        return -1;
    }

    memcpy(mybuf, buf, len);
    return 0;
}
In the example above, the if statement at [1] looks fine. However, len is a signed int, while memcpy takes a size_t length, which is an unsigned type, so len is converted to unsigned at the call. If we pass a negative len, it passes the check at [1] but turns into a huge positive number inside memcpy, and mybuf overflows. The data after the mybuf buffer is then overwritten.
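
A possible repair is to reject negative lengths before the value ever reaches memcpy. The sketch below is not the article's code: copy_checked is a made-up name, and the destination is passed in by the caller so the result can be inspected.

```c
#include <string.h>

#define MAX_LEN 256

/* Copy len bytes from buf into out, which must hold MAX_LEN bytes.
 * Returns 0 on success and -1 if len is negative or too large. */
int copy_checked(char *out, const char *buf, int len)
{
    if (len < 0 || len > MAX_LEN)
        return -1;
    memcpy(out, buf, (size_t)len);   /* the cast is now known to be safe */
    return 0;
}
```

Declaring the parameter as size_t in the first place would remove the need for the negative check, at the price of changing the function's interface.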

Example 3: allocate memory
A classic example of a heap overflow caused by an integer overflow is the OpenSSH Challenge-Response SKEY/BSD_AUTH remote buffer overflow vulnerability. The problematic code below is taken from the input_userauth_info_response() function in auth2-chall.c in the OpenSSH source:

nresp = packet_get_int();
if (nresp > 0) {
    response = xmalloc(nresp * sizeof(char *));
    for (i = 0; i < nresp; i++)
        response[i] = packet_get_string(NULL);
}
In the code above, nresp is of type size_t (size_t is generally unsigned int or unsigned long). This is a packet decoding routine: a packet typically carries a length field followed by the data. Suppose we craft a length of 1073741825. On a 32-bit system a pointer occupies 4 bytes and the maximum unsigned int is 0xffffffff, so any count above 0xffffffff / 4 overflows the multiplication; here we use 0x40000000 + 1. nresp receives this value, and nresp * sizeof(char *) is 1073741825 * 4 = 0x100000004, which wraps modulo 2^32 to 4. The call is therefore xmalloc(4), yet the for loop that follows runs 1073741825 (0x40000001) times. Long before it finishes, user data has overwritten the 4 bytes that xmalloc returned and everything after them, including program code and function pointers, which lets an attacker rewrite the program logic. For more information, see Survey of Protections from Buffer-Overflow Attacks.
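
The arithmetic can be reproduced with fixed-width types, and the usual defence is to test the count against SIZE_MAX / element size before multiplying. Both helpers below are illustrative sketches with made-up names; they are not taken from OpenSSH.

```c
#include <stddef.h>
#include <stdint.h>

/* The byte count a 32-bit multiplication produces for `count`
 * pointers of 4 bytes each (wraps modulo 2^32). */
uint32_t wrapped_alloc_size(uint32_t count)
{
    return (uint32_t)(count * UINT32_C(4));
}

/* Overflow-safe size computation: stores count * elem_size in *out and
 * returns 0, or returns -1 if the product does not fit in size_t. */
int checked_array_size(size_t count, size_t elem_size, size_t *out)
{
    if (elem_size != 0 && count > SIZE_MAX / elem_size)
        return -1;
    *out = count * elem_size;
    return 0;
}
```

wrapped_alloc_size(1073741825) returns 4, matching the xmalloc(4) described above, while checked_array_size refuses any count that would wrap.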

Example 4: security problems caused by buffer overflow

int func(char *buf1, unsigned int len1,
         char *buf2, unsigned int len2)
{
    char mybuf[256];

    if ((len1 + len2) > 256) { // <--- [1]
        return -1;
    }

    memcpy(mybuf, buf1, len1);
    memcpy(mybuf + len1, buf2, len2);

    do_some_stuff(mybuf);

    return 0;
}
In the example above, we want to copy the contents of buf1 and buf2 into mybuf, and we check at [1] that len1 + len2 does not exceed 256. However, if len1 + len2 overflows, the unsigned rules reduce it modulo 2^32, so the condition at [1] can be false even though the real sum is enormous. (Note: optimization does not rescue this code. Unsigned wraparound is well defined, so the compiler keeps the check at [1] even with -O, but the check cannot see the wrap.) For example, try len1 = 0x104 and len2 = 0xfffffffc: the sum wraps to 0x100 (256), which passes the check.
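
The check can be rewritten so that no intermediate value ever wraps: compare each length against the space that is still available instead of adding the lengths first. This is a sketch with a made-up function name, and the destination buffer is supplied by the caller for the sake of the example.

```c
#include <string.h>

#define BUF_SIZE 256u

/* Concatenate buf1 and buf2 into out, which must hold BUF_SIZE bytes.
 * Returns 0 on success and -1 if the two pieces do not fit. */
int concat_checked(char *out, const char *buf1, unsigned int len1,
                   const char *buf2, unsigned int len2)
{
    /* BUF_SIZE - len1 cannot wrap because len1 <= BUF_SIZE is tested first. */
    if (len1 > BUF_SIZE || len2 > BUF_SIZE - len1)
        return -1;
    memcpy(out, buf1, len1);
    memcpy(out + len1, buf2, len2);
    return 0;
}
```

With len1 = 0x104 and len2 = 0xfffffffc the function returns -1 instead of copying.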

There are many more examples like these. Integer overflow matters most where user input is involved: if an attacker can exploit it, the result is a serious security problem.

Compiler behavior
Before discussing how to check for integer overflow correctly, let's look at how compilers behave. Please bear with me if this seems long-winded.

Compiler optimization
Checking for integer overflow, or checking that an integer variable holds a valid value, can be surprisingly troublesome. With the optimization options -O, -O2 or -O3, the compiler essentially assumes that your program never overflows a signed integer or a pointer, and it may optimize away the overflow checks in your code.

Here is another example of compiler optimization. Suppose we have the following code (a fairly common pattern):

int len;
char *data;

if (data + len < data) {
    printf("invalid len\n");
    exit(-1);
}
In the code above, len and data are used together. We worry that len is invalid or that data + len overflows, so we write the if statement as a check. With -O this code behaves as written, but with -O2 the entire if block is optimized away.

You can write a small program, compile it with gcc (my version is 4.4.7; remember to add -O2 and -g), load it in gdb, and print the assembly with the disass /m command. You will see the following (note that no assembly at all is generated for the if block; the compiler has quietly removed it):

7           int len = 10;
8           char *data = (char *)malloc(len);
   0x00000000004004d4 <+4>:     mov    $0xa,%edi
   0x00000000004004d9 <+9>:     callq  0x4003b8 <malloc@plt>

9
10          if (data + len < data) {
11              printf("invalid len\n");
12              exit(-1);
13          }
14
15      }
   0x00000000004004de <+14>:    add    $0x8,%rsp
   0x00000000004004e2 <+18>:    retq
To fix this, cast the char * to uintptr_t or size_t. Put simply, convert the pointer to an unsigned integer type, whose arithmetic is well defined, so the if block can no longer be optimized away:

if ((uintptr_t)data + len < (uintptr_t)data) {
    ......
}
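
When the size of the buffer is known, another option is to avoid pointer arithmetic in the check altogether and compare the length against the space that is left. This is a sketch under that assumption; range_in_buffer is a made-up helper, not something the article defines.

```c
#include <stddef.h>

/* Returns 1 if the range [offset, offset + len) lies inside a buffer
 * of buf_size bytes, and 0 otherwise. No pointer arithmetic is needed. */
int range_in_buffer(size_t buf_size, size_t offset, int len)
{
    if (len < 0)
        return 0;                    /* negative lengths are never valid */
    if (offset > buf_size)
        return 0;                    /* start is already out of bounds */
    return (size_t)len <= buf_size - offset;
}
```

Every operand here is an integer with well-defined behavior, so the compiler has no grounds to discard the test.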
For details, see the C99 specification, ISO/IEC 9899:1999, section 6.5.6, paragraph 8, which defines adding an integer to or subtracting one from a pointer. If the result goes out of bounds, the behavior is undefined.

The key sentence says that the result is defined only while the pointer stays within the array (or one past its last element); anything beyond that is undefined. That leaves the outcome to the compiler, which may well choose to optimize the code away. This is where we meet "undefined", the great devil of the C language. It is a place where wild beasts roam, so be careful.

Aside: a compiler Easter egg
As mentioned above, undefined behavior leaves the outcome entirely to the compiler. GCC 1.17 even shipped an Easter egg for it (see Wikipedia).

Below is the source code of the Easter egg that GCC 1.17 on Unix triggered in such a case (here, on meeting a #pragma directive). It tries to launch the games NetHack or Rogue, or the Towers of Hanoi simulation in Emacs. If none of them can be found, it prints a memorable error message.

execl("/usr/games/hack", "#pragma", 0); // try to run the game NetHack
execl("/usr/games/rogue", "#pragma", 0); // try to run the game Rogue
// try to run the Towers of Hanoi simulation in Emacs
execl("/usr/new/emacs", "-f", "hanoi", "9", "-kill", 0);
execl("/usr/local/emacs", "-f", "hanoi", "9", "-kill", 0); // same as above
fatal("You are in a maze of twisty compiler features, all different");
Check integer overflow correctly
Having seen how compilers behave, you should understand the rule: check before the overflow happens, because afterwards it is too late.

Let's look at a piece of code:

void foo(int m, int n)
{
    size_t s = m + n;
    .......
}
The code above carries two risks: 1) a signed-to-unsigned conversion, and 2) integer overflow. You have seen both in the earlier examples. So do not put any checking code after s = m + n; by then it is too late. Undefined behavior has already occurred, and as the old maps say, "here be dragons": nothing is under your control any more. (Note: some beginners think that because size_t is unsigned, m and n are converted to unsigned before the addition. They are not. m and n are signed int, so m + n is computed as a signed int; only the result is converted to size_t and assigned to s.)
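
The order of operations described in the note can be observed with a sum that is small and negative, so that no overflow is involved at all. This is an illustrative sketch; sum_as_size is a made-up name.

```c
#include <stddef.h>

/* Mirrors `size_t s = m + n;`. The caller must make sure that
 * m + n itself does not overflow int. */
size_t sum_as_size(int m, int n)
{
    return (size_t)(m + n);   /* added as int first, then converted */
}
```

sum_as_size(-3, 2) computes -1 as an int and converts it to the largest size_t value, (size_t)-1. Had the operands been converted before the addition, the result would have been the same here only because unsigned arithmetic wraps, which is exactly why the signed-to-unsigned step is easy to overlook.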

For example, the following code is wrong:

void foo(int m, int n)
{
    size_t s = m + n;
    if (m > 0 && n > 0 && (SIZE_MAX - m < n)) {
        // error handling...
    }
}
In the code above, note the test (SIZE_MAX - m < n). Why not write m + n > SIZE_MAX? Because if m + n overflows, the result is truncated and the comparison can never detect it. Also, in this expression m and n are converted to unsigned.

But the code above is still wrong, because:

1) The check comes too late. The undefined behavior has already happened before the if (and you cannot know what it did).

2) As mentioned above, (SIZE_MAX - m < n) may be optimized away by the compiler.

3) SIZE_MAX is the maximum value of size_t, and size_t is 64 bits wide on a 64-bit system. To be rigorous, the limit should be INT_MAX or UINT_MAX.

Therefore, the correct code should be as follows:

void foo(int m, int n)
{
    size_t s = 0;
    if (m > 0 && n > 0 && (UINT_MAX - m < n)) {
        // error handling...
        return;
    }
    s = (size_t)m + (size_t)n;
}
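
A stricter variant is sketched below for comparison. It is an assumption about what a caller might want rather than the article's method: it rejects negative inputs outright and limits the sum to INT_MAX, and it reports the outcome through a return code. The name checked_size_sum is made up.

```c
#include <limits.h>
#include <stddef.h>

/* Stores m + n in *out and returns 0, or returns -1 if either input
 * is negative or the sum would exceed INT_MAX. */
int checked_size_sum(int m, int n, size_t *out)
{
    if (m < 0 || n < 0)
        return -1;             /* a negative value would become huge as size_t */
    if (m > INT_MAX - n)
        return -1;             /* tested before the addition is performed */
    *out = (size_t)(m + n);
    return 0;
}
```

The subtraction INT_MAX - n cannot overflow because n is known to be non-negative at that point, so every step of the check is well defined.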
The Apple secure coding guide (PDF) shows, on page 1, a code sample that multiplies n * m and converts the product to size_t.

If n and m are both signed int, that code is wrong; the correct form is like the example above. At the very least, n and m should each be cast to size_t before the multiplication. Otherwise n * m may already have overflowed, which is undefined, and converting an undefined result to size_t is meaningless. (If m and n are unsigned int, the product can still wrap. The code is only correct when m and n are themselves size_t.)

In any case, the Apple security coding standards are definitely worth reading.

Checking for overflow and underflow
The previous code only checks for overflow in the positive direction, not in the negative direction. Here is how to check both:

Addition is straightforward:

#include <limits.h>

void f(signed int si_a, signed int si_b) {
    signed int sum;
    if (((si_b > 0) && (si_a > (INT_MAX - si_b))) ||
        ((si_b < 0) && (si_a < (INT_MIN - si_b)))) {
        /* Handle error */
        return;
    }
    sum = si_a + si_b;
}
Multiplication is far more involved (the code below looks almost excessive):

void func(signed int si_a, signed int si_b)
{
    signed int result;
    if (si_a > 0) { /* si_a is positive */
        if (si_b > 0) { /* si_a and si_b are positive */
            if (si_a > (INT_MAX / si_b)) {
                /* Handle error */
            }
        } else { /* si_a positive, si_b nonpositive */
            if (si_b < (INT_MIN / si_a)) {
                /* Handle error */
            }
        } /* si_a positive, si_b nonpositive */
    } else { /* si_a is nonpositive */
        if (si_b > 0) { /* si_a is nonpositive, si_b is positive */
            if (si_a < (INT_MIN / si_b)) {
                /* Handle error */
            }
        } else { /* si_a and si_b are nonpositive */
            if ((si_a != 0) && (si_b < (INT_MAX / si_a))) {
                /* Handle error */
            }
        } /* End if si_a and si_b are nonpositive */
    } /* End if si_a is nonpositive */

    result = si_a * si_b;
}
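
If portability beyond GCC and Clang is not required, the same checks can be delegated to compiler builtins. The sketch below assumes GCC 5 or later, or a recent Clang; __builtin_add_overflow and __builtin_mul_overflow are compiler extensions, not standard C, and the wrapper names are made up.

```c
#include <limits.h>

/* Stores a + b in *out and returns 0, or returns -1 on overflow. */
int checked_add(int a, int b, int *out)
{
    /* The builtin returns nonzero when the exact result does not fit. */
    return __builtin_add_overflow(a, b, out) ? -1 : 0;
}

/* Stores a * b in *out and returns 0, or returns -1 on overflow. */
int checked_mul(int a, int b, int *out)
{
    return __builtin_mul_overflow(a, b, out) ? -1 : 0;
}
```

These wrappers cover both directions, including the awkward case INT_MIN * -1, without the nested conditions of the hand-written version.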
More security code to prevent integer overflow during operations

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
