A Guide to Undefined Behavior in C and C++, Part 1

Author: John Regehr

Original: http://blog.regehr.org/archives/226

When tools such as the bounds-checking GCC, Purify, and Valgrind first appeared, it was interesting to run an arbitrary UNIX utility under them. The checker output showed that these programs, despite working perfectly well, executed a large number of memory safety errors, such as use of uninitialized data and out-of-bounds array accesses. Just running grep or whatever would trigger tens or hundreds of these errors.

What was going on? Basically, incidental properties of the C/UNIX execution environment made these errors (usually) benign. For example, blocks returned by malloc() generally contain some padding bytes before and/or after them. The padding can absorb out-of-bounds stores, as long as they do not land too far outside the allocated area. Was it worth eliminating these bugs? Yes. First, an execution environment with different properties, such as a malloc() for an embedded system that provides less padding, could turn a benign near-miss array write into a nasty heap corruption bug. Second, under different circumstances the same benign bug could probably cause a crash or corruption even in the same execution environment. Developers generally find these arguments convincing, and these days most UNIX programs are relatively Valgrind-clean.

Tools for finding integer undefined behaviors are less mature than memory-unsafety checkers. Bad integer behaviors in C and C++ include signed overflow, division by zero, and shifting past the bit width. These have become a more serious problem in recent years because:

    • Integer flaws are a source of serious security problems.
    • C compilers have become considerably more aggressive in exploiting integer undefined behaviors to generate efficient code.
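
As a rough illustration of what it means to check for these three behaviors, here is a minimal sketch of the preconditions involved. The function names are hypothetical and this is not how Peng Li's tool is implemented. The shift predicate only looks at the shift count, and it assumes int has no padding bits.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* True when a + b would overflow a signed int. The test itself never overflows. */
bool add_overflows(int a, int b)
{
    if (b > 0) return a > INT_MAX - b;
    if (b < 0) return a < INT_MIN - b;
    return false;
}

/* True when a / b is undefined: division by zero, or INT_MIN / -1. */
bool div_is_undefined(int a, int b)
{
    return b == 0 || (a == INT_MIN && b == -1);
}

/* True when shifting an int by count is undefined because the count is
   negative or at least the width of int. */
bool shift_count_is_undefined(int count)
{
    return count < 0 || count >= (int)(sizeof(int) * CHAR_BIT);
}

/* Addition with its precondition stated instead of silently assumed. */
int add_checked(int a, int b)
{
    assert(!add_overflows(a, b));
    return a + b;
}
```

Each predicate is evaluated before the operation, because once the undefined operation has executed there is nothing reliable left to test.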

Recently, my student Peng Li implemented a checking tool for integer undefined behaviors. Using it, we found that many programs contain these bugs. For example, more than half of the SPECINT 2006 benchmarks execute integer undefined behaviors of one kind or another. In many ways, the situation for integer bugs today resembles the situation for memory bugs around 1995. To be clear, integer checking tools do exist, but they do not seem to be widely used, and a number of them operate on binaries, which is too late. You have to look at the source code before the compiler has had a chance to exploit, and thus eliminate, operations with undefined behavior.

The rest of this post looks at a few integer undefined behaviors that we found in LLVM, a medium-sized (~800 KLOC) open-source C++ code base. Of course, I am not picking on LLVM here: it is very high-quality code. The idea is that by looking at some problems that were lurking undetected in this well-tested code, we can hopefully learn how to avoid writing these bugs in the future.

As a random note, if we treat the LLVM code as C++0x rather than C++98, a large number of additional shift-related undefined behaviors appear. I will talk about the new shift restrictions (which are identical to those in C99) in a subsequent post.

I have cleaned up the tool's output slightly to improve readability.

Integer Overflow #1

Error message:

 
    UNDEFINED at <BitcodeWriter.cpp, (740:29)> :
    Operator: -
    Reason: Signed Subtraction Overflow
    left (int64): 0
    right (int64): -9223372036854775808

Code:

 
    int64_t V = IV->getSExtValue();
    if (V >= 0)
      Record.push_back(V << 1);
    else
      Record.push_back((-V << 1) | 1);  <------ bad line

On the two's complement machines that all modern C/C++ implementations target, negating an int whose value is INT_MIN (or, in this case, INT64_MIN) is undefined behavior. The fix is to add an explicit check for this case.
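
A minimal sketch of such an explicit check follows. It is an illustration only, not the change that LLVM adopted, and the helper name checked_negate is hypothetical. What the caller does when the check fails (report an error, use a special encoding) is left open.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: stores -v in *out and returns true, or returns false
   and leaves *out untouched when v is INT64_MIN, the one value whose
   negation does not fit in int64_t. */
bool checked_negate(int64_t v, int64_t *out)
{
    assert(out != NULL);
    if (v == INT64_MIN)
        return false;
    *out = -v;
    return true;
}
```

The comparison against INT64_MIN happens before the negation, so the undefined operation is never executed.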

Do compilers take advantage of this undefined behavior? They do:

 
    [regehr@gamow ~]$ cat negate.c
    int foo (int x) __attribute__ ((noinline));
    int foo (int x)
    {
      if (x < 0) x = -x;
      return x >= 0;
    }

    #include <limits.h>
    #include <stdio.h>

    int main (void)
    {
      printf ("%d\n", -INT_MIN);
      printf ("%d\n", foo(INT_MIN));
      return 0;
    }
    [regehr@gamow ~]$ gcc -O2 negate.c -o negate
    negate.c: In function 'main':
    negate.c:13:19: warning: integer overflow in expression [-Woverflow]
    [regehr@gamow ~]$ ./negate
    -2147483648
    1

In C compiler doublethink, -INT_MIN is both negative and non-negative. If the first true AI is written in C or C++, I expect it to immediately deduce that freedom is slavery, love is hate, and peace is war.
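
One way to avoid handing the compiler this contradiction is to do the negation in unsigned arithmetic, which wraps instead of being undefined. The sketch below is a suggestion rather than anything from the tool or from LLVM. The names magnitude and abs_checked are hypothetical, and the code assumes unsigned int has exactly one more value bit than int, as on mainstream platforms.

```c
#include <assert.h>
#include <limits.h>

/* Magnitude of x as an unsigned value. The subtraction is performed on
   unsigned operands, so it wraps modulo 2^N and is well defined even
   when x is INT_MIN. */
unsigned magnitude(int x)
{
    return x < 0 ? 0u - (unsigned)x : (unsigned)x;
}

/* Signed absolute value with its precondition stated: the caller must
   not pass INT_MIN, because -INT_MIN does not fit in an int. */
int abs_checked(int x)
{
    assert(x != INT_MIN);
    return x < 0 ? -x : x;
}
```

With magnitude(), the question "is the result non-negative?" no longer arises, since the result type cannot represent a negative value.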

Integer Overflow #2

Error message:

 
    UNDEFINED at <InitPreprocessor.cpp, (173:39)> :
    Operator: -
    Reason: Signed Subtraction Overflow
    left (int64): -9223372036854775808
    right (int64): 1

Code:

 
    MaxVal = (1LL << (TypeWidth - 1)) - 1;

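In C/C++ it is illegal to compute the maximum signed integer value this way: as the error message shows, the left operand of the subtraction is already INT64_MIN, so subtracting 1 overflows. There are better ways, such as creating a vector of all 1s and then clearing the high-order bit.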
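
The all-ones approach can be sketched as follows. This is one possible reading of the suggestion, with a hypothetical function name, and it assumes the type width is between 1 and 64 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Largest value representable in a two's complement signed integer of
   `width` bits. All arithmetic is done on uint64_t, so nothing overflows. */
int64_t max_signed_for_width(unsigned width)
{
    assert(width >= 1 && width <= 64);
    /* A vector of `width` one bits; the shift count is at most 63. */
    uint64_t ones = UINT64_MAX >> (64 - width);
    /* Clear the high-order (sign) bit of that vector. */
    uint64_t max = ones & ~(UINT64_C(1) << (width - 1));
    /* max is at most INT64_MAX, so the conversion is value preserving. */
    return (int64_t)max;
}
```

For a width of 8 this yields 127, and for a width of 64 it yields INT64_MAX, without ever forming a signed value that is out of range.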

Integer Overflow #3

Error message:

 
    UNDEFINED at <TargetData.cpp, (629:28)> :
    Operator: *
    Reason: Signed Multiplication Overflow
    left (int64): 142998016075267841
    right (int64): 129

Code:

 
    Result += arrayIdx * (int64_t)getTypeAllocSize(Ty);

Here the allocated size is plausible, but the array index is far out of bounds for any conceivable array.
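
A defensive version of this kind of offset computation might look like the sketch below. It relies on __builtin_mul_overflow and __builtin_add_overflow, which are GCC and Clang extensions (GCC 5 and later) rather than standard C, and the function name add_scaled_index is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Adds index * elem_size to *result. Returns false and leaves *result
   unchanged if either the multiplication or the addition would overflow. */
bool add_scaled_index(int64_t *result, int64_t index, int64_t elem_size)
{
    int64_t product;
    int64_t sum;
    assert(result != NULL);
    /* Both builtins return true when the mathematical result does not fit. */
    if (__builtin_mul_overflow(index, elem_size, &product))
        return false;
    if (__builtin_add_overflow(*result, product, &sum))
        return false;
    *result = sum;
    return true;
}
```

Called with the operands from the error message above, the function returns false instead of executing an undefined multiplication.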

Shift Past Bit Width #1

Error message:

 
    UNDEFINED at <InstCombineCalls.cpp, ()> :
    Operator: <<
    Reason: Unsigned Left Shift Error: Right operand is negative or is greater than or equal to the width of the promoted left operand
    left (uint32): 1
    right (uint32): 63

Code:

    unsigned Align = 1u << std::min(BitWidth - 1, TrailZ);

This is just an outright bug: BitWidth is set to 64, but it should have been 32.
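
One way to make this class of mistake harder to write is to derive the clamp from the type that is actually being shifted instead of from a separately maintained width. The following is a minimal sketch with a hypothetical function name, not the LLVM fix, and it assumes unsigned int has no padding bits.

```c
#include <assert.h>
#include <limits.h>

/* Power of two implied by `trailing_zeros` known-zero low bits, clamped
   so that the shift count is always below the width of the 1u operand. */
unsigned align_from_trailing_zeros(unsigned trailing_zeros)
{
    const unsigned width = (unsigned)(sizeof(unsigned) * CHAR_BIT);
    unsigned shift = trailing_zeros < width - 1 ? trailing_zeros : width - 1;
    assert(shift < width);
    return 1u << shift;
}
```

With a 32-bit unsigned int, a count of 63 is clamped to 31, so the shift from the error message is no longer undefined.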

Shift Past Bit Width #2

Error message:

 
    UNDEFINED at <Instructions.h, (233:15)> :
    Operator: <<
    Reason: Signed Left Shift Error: Right operand is negative or is greater than or equal to the width of the promoted left operand
    left (int32): 1
    right (int32): 32

Code:

 
    return (1 << (getSubclassDataFromInstruction() >> 1)) >> 1;

When getSubclassDataFromInstruction() returns a value in the range 128-131, the right argument to the left shift (getSubclassDataFromInstruction() >> 2) evaluates to 32. Shifting, in either direction, by the bit width or more is an error. Therefore, this function requires getSubclassDataFromInstruction() to return a value no greater than 127.
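
A requirement like this can be made explicit in the code instead of being left implicit. The sketch below is hypothetical: it takes the already extracted shift count as its argument, and it assumes, from the shape of the expression above, that the field stores log2(alignment) + 1, with 0 meaning that no alignment is recorded. It also assumes unsigned int has no padding bits.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Decodes an alignment from `encoded`, assumed to hold log2(alignment) + 1.
   Returns false and leaves *align untouched when the shift would be
   undefined because `encoded` is at least the width of unsigned int. */
bool decode_alignment(unsigned encoded, unsigned *align)
{
    const unsigned width = (unsigned)(sizeof(unsigned) * CHAR_BIT);
    assert(align != NULL);
    if (encoded >= width)
        return false;
    *align = (1u << encoded) >> 1;
    return true;
}
```

Using 1u rather than 1 also keeps the shift in unsigned arithmetic, so a large but valid count does not shift a bit into the sign position.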

Conclusion

It is basically evil to make certain program actions wrong while giving developers no way to tell whether their code performs these actions and, if so, where. One of C's design points was "trust the programmer." This is fine, but there's trust and then there's trust. I mean, I trust my 5-year-old, but I still don't let him cross a busy street by himself. Creating a large piece of safety-critical or security-critical code in C or C++ is the programming equivalent of crossing an eight-lane freeway blindfolded.
