Discovering Buffer Overflow Vulnerabilities with Integer Range Constraints
Author: Brief
E-mail: Brief # fz5fz.org
Web: http://www.fz5fz.org
Web: http://www.safechina.net
Date: 05-03-2004
-- [Contents
1-Preface
2-Prototype Overview
3-Integer Range Constraint Generation
4-Integer Range Constraint Solving
5-Conclusion
6-References
7-About
-- [1-Preface
Nothing here is really new; treat this article as a set of reading notes.
Static-analysis-based techniques for finding buffer overflow vulnerabilities work mainly on source code in a high-level language or on assembly code. They can uncover potential security problems before or after the software is released, although in practice they are mostly useful for open-source systems or for the developers of the system themselves. This article discusses how the buffer overflow problem can be abstracted into an integer range analysis problem, and it covers C source code only. For the details of finding buffer overflows with integer range constraints, see [1].
Most readers already know something about buffer overflows, so the background and principles are not covered in detail here. Chinese-speaking readers can find related technical articles at [2] and [3], and the latest buffer overflow vulnerability reports at [4].
-- [2-Prototype Overview
Vulnerability discovery based on static source code analysis can catch many security problems during development, and it can also find vulnerabilities in existing open-source systems. It relies on program analysis of source code, which draws on compiler techniques such as syntactic and semantic analysis.
The integer range constraint prototype combines source code analysis, mathematical modeling, and system security. It first parses the C source code, then generates integer range constraints, and finally solves those constraints to single out the code that has security problems. For a brief introduction, see [5].
A prototype has to be of some practical use, and this one is a compromise that the authors of [1] reached after weighing their practical needs. It has accuracy problems of two kinds: false positives, which are warnings about code that is in fact safe, and false negatives, which are real vulnerabilities that go unreported. The design trades some precision for scalability, which makes it feasible to analyze large real-world systems.
The prototype introduces two new ideas:
(1) treat C strings as an abstract data type;
(2) treat each buffer as a pair of integer ranges: the allocated size and the size in use.
The number of bytes allocated for a string variable s is written alloc(s), and the number of bytes currently in use is written len(s), where s is a string variable defined in the program.
If the length in use, len(s), is greater than the allocated size, alloc(s), a buffer overflow vulnerability may exist. Whether it is exploitable is set aside for now, since that is a hard problem for automated analysis. Besides the length of a buffer, its location must also be taken into account, that is, both the length and the position of the buffer range associated with the variable. In this way, finding buffer overflows becomes a problem of tracking integer ranges.
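As a concrete illustration of the two quantities, the following minimal C sketch computes the length in use for a string and checks it against an allocated size. The helper names len_in_use and fits are hypothetical and not part of the prototype, and the length is assumed to count the terminating '\0'.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Bytes a string occupies, counting the terminating '\0'
 * (the role played by len(s) in the text). */
size_t len_in_use(const char *s)
{
    assert(s != NULL);
    return strlen(s) + 1;
}

/* Returns 1 when len(s) <= alloc, i.e. the string fits in a buffer
 * of alloc bytes, and 0 when copying it there would overflow. */
int fits(size_t alloc, const char *s)
{
    return len_in_use(s) <= alloc;
}
```

For a buffer declared as char buf[8], fits(8, "1234567") holds because the string needs exactly 8 bytes, while fits(8, "12345678") does not because it needs 9.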
The work is divided into two parts: generating integer range constraints for the string operations, and then solving those constraints quickly and accurately to produce the final vulnerability report.
For the detailed mathematical description and the definition of the constraint language, see [1].
-- [3-Integer Range Constraint Generation
The BANE toolkit [6] is used to analyze the C source code. Integer range constraints are generated by walking the parse tree of the C code. For convenience, the length of a string includes the terminating '\0', so the safety condition is len(s) ≤ alloc(s). A table of matching rules is defined for string operations: when a matching statement appears in the source code, the corresponding integer range constraint is looked up in the table.
The C statements covered by these rules include: char s[n], strlen(s), strcpy(dst, src), strncpy(dst, src, n), s = "foo", p = malloc(n), p = strdup(s), strcat(s, suffix), strncat(s, suffix, n), p = getenv(...), gets(s), fgets(s, n, ...), sprintf(dst, "%s", src), sprintf(dst, "%d", n), snprintf(dst, n, "%s", src), p[n] = '\0', p = strchr(s, c), h = gethostbyname(...), and so on.
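The table lookup can be pictured with a small C sketch. Everything here is an assumption made for illustration: the Rule type, the lookup_constraint function, and the textual form of each constraint ("X in Y" meaning that range X must be contained in range Y). The rows are modeled on the kind of rule described above; the authoritative table is the one in [1].

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One row of a hypothetical rule table: a statement shape and the
 * range constraint emitted when that shape is seen in the source. */
typedef struct {
    const char *statement;
    const char *constraint;
} Rule;

/* Illustrative rows only; not copied from the prototype. */
static const Rule RULES[] = {
    { "char s[n]",            "n in alloc(s)" },
    { "strcpy(dst, src)",     "len(src) in len(dst)" },
    { "strncpy(dst, src, n)", "min(len(src), n) in len(dst)" },
    { "p = malloc(n)",        "n in alloc(p)" },
    { "gets(s)",              "[1, inf] in len(s)" },
    { "fgets(s, n, ...)",     "[1, n] in len(s)" },
};

/* Returns the constraint text for a statement shape,
 * or NULL when no rule matches. */
const char *lookup_constraint(const char *statement)
{
    assert(statement != NULL);
    for (size_t i = 0; i < sizeof RULES / sizeof RULES[0]; i++) {
        if (strcmp(RULES[i].statement, statement) == 0)
            return RULES[i].constraint;
    }
    return NULL;
}
```

A real generator would match parse tree nodes rather than strings; the string keys only keep the sketch short.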
To keep the analysis scalable, the authors use a flow-insensitive analysis, which ignores control flow and the order of statements in the code. It is easy to see that this can make the analysis imprecise. For example, a loop may execute the same statements repeatedly, which a flow-insensitive analysis cannot model, so some results may be wrong and others may be lost. For more on flow sensitivity, see [7].
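A small example may help show where ignoring control flow costs precision. In the sketch below, the function guarded_copy and the 8-byte buffer are made up for illustration. The copy is guarded by a length check, so it cannot overflow at run time. An analysis that ignores control flow does not see the guard: it only records that the length of src flows into buf, and if nothing else bounds the length of src it would presumably warn here, which is a false positive.

```c
#include <assert.h>
#include <string.h>

/* Copies src into a local 8-byte buffer only when it fits.
 * Returns 1 if the copy was made and 0 if src was too long. */
int guarded_copy(const char *src)
{
    char buf[8];                          /* alloc(buf) = 8 */

    assert(src != NULL);
    if (strlen(src) + 1 <= sizeof buf) {  /* guard: len(src) <= alloc(buf) */
        /* Safe at run time because of the guard above. A flow-insensitive
         * analysis sees only "len(src) flows into len(buf)". */
        strcpy(buf, src);
        return 1;
    }
    return 0;
}
```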
However, the prototype handles strcat() specially: every occurrence of strcat() is flagged as a potential security vulnerability.
Now consider how the prototype interprets the results. Let the range of len(s) be [a, b] and the range of alloc(s) be [c, d]. There are three possible cases:
(1) if b ≤ c, string s can never overflow its buffer;
(2) if a > d, string s always overflows its buffer;
(3) otherwise the two ranges overlap (for example, a < c < b < d), no definite conclusion can be drawn, and a potential buffer overflow vulnerability may exist.
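The three cases translate directly into a comparison of range endpoints. The following is a minimal sketch; the Range and Verdict types and the function name classify are assumptions made for this example, not names from the prototype.

```c
#include <assert.h>

/* A closed integer range [lo, hi]. */
typedef struct { long lo, hi; } Range;

typedef enum {
    VERDICT_SAFE,      /* case (1): b <= c */
    VERDICT_OVERFLOW,  /* case (2): a > d  */
    VERDICT_POSSIBLE   /* case (3): the ranges overlap */
} Verdict;

/* len is [a, b], the range of len(s); alloc is [c, d], the range of alloc(s). */
Verdict classify(Range len, Range alloc)
{
    assert(len.lo <= len.hi && alloc.lo <= alloc.hi);
    if (len.hi <= alloc.lo)
        return VERDICT_SAFE;
    if (len.lo > alloc.hi)
        return VERDICT_OVERFLOW;
    return VERDICT_POSSIBLE;
}
```

With len(s) in [1, 5] and alloc(s) fixed at 8 the verdict is safe; with len(s) in [1, 100] and the same allocation, only a possible overflow can be reported.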
To keep the prototype simple and easy to implement, the authors leave out much of the analysis of pointer operations. Operations that are not analyzed include pointer aliasing, double pointers, array pointers, and function pointers. Structures, however, are so widely used in C that the prototype does support analyzing them.
-- [4-Integer Range Constraint Solving
Once buffer overflow detection has been reduced to a system of integer range constraints, a solver is needed to decide whether an overflow can occur. The authors use simple graph-theoretic techniques to build an efficient algorithm for solving the constraints. Since the graph theory is awkward to present here, it is not analyzed in depth.
Solving the constraint system yields bounds for each variable, but no information about relationships between variables. This lightens the analysis and is easy to implement, but it also misses some security problems. Simple security problems are often caused by misuse of a single variable, whereas complex ones tend to arise from complicated relationships between variables; this is another limitation of the prototype.
First, a directed graph is built. Each vertex corresponds to a variable in the program, and each constraint generated for a variable is represented as a directed edge. The solver resolves the constraints by propagating information through the graph, and ends up with a simplified graph that carries the information relevant to vulnerability discovery.
If the resulting directed graph is acyclic, it is topologically sorted, and information is then propagated along the edges in sorted order.
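Topological sorting applies when the constraint graph has no cycles, and that case can be sketched as follows. This is a minimal illustration, not the solver from [1]: the Graph layout, the fixed array sizes, and all function names are assumptions, and the only constraint form modeled is "the range of one variable must be contained in the range of another". Vertices are processed in topological order (Kahn's algorithm), and each edge enlarges the target range just enough to contain the source range.

```c
#include <assert.h>

#define MAX_VARS  16
#define MAX_EDGES 32

/* A closed range [lo, hi]; empty != 0 means "nothing known yet". */
typedef struct { long lo, hi; int empty; } Range;

typedef struct {
    int   n_vars, n_edges;
    Range val[MAX_VARS];                   /* current range per variable */
    int   from[MAX_EDGES], to[MAX_EDGES];  /* edge e: val[from] in val[to] */
} Graph;

/* Smallest range that contains both a and b. */
static Range join(Range a, Range b)
{
    if (a.empty) return b;
    if (b.empty) return a;
    Range r = { a.lo < b.lo ? a.lo : b.lo, a.hi > b.hi ? a.hi : b.hi, 0 };
    return r;
}

/* Propagates ranges in topological order.
 * Returns 0 on success and -1 if the graph contains a cycle. */
int solve_acyclic(Graph *g)
{
    int indeg[MAX_VARS] = {0};
    int queue[MAX_VARS];
    int head = 0, tail = 0;

    assert(g->n_vars <= MAX_VARS && g->n_edges <= MAX_EDGES);
    for (int e = 0; e < g->n_edges; e++)
        indeg[g->to[e]]++;
    for (int v = 0; v < g->n_vars; v++)
        if (indeg[v] == 0) queue[tail++] = v;

    while (head < tail) {
        int v = queue[head++];
        /* Scanning every edge keeps the sketch short; adjacency lists
         * would be needed to make the pass linear in the graph size. */
        for (int e = 0; e < g->n_edges; e++) {
            if (g->from[e] != v) continue;
            int w = g->to[e];
            g->val[w] = join(g->val[w], g->val[v]);
            if (--indeg[w] == 0) queue[tail++] = w;
        }
    }
    return tail == g->n_vars ? 0 : -1;
}

/* Usage: fgets(src, 100, ...) followed by strcpy(buf, src). */
Range solve_example(void)
{
    Graph g = { .n_vars = 2, .n_edges = 1 };
    g.val[0] = (Range){ 1, 100, 0 };  /* len(src) in [1, 100] */
    g.val[1] = (Range){ 0, 0, 1 };    /* len(buf): nothing known yet */
    g.from[0] = 0; g.to[0] = 1;       /* len(src) in len(buf) */
    solve_acyclic(&g);
    return g.val[1];
}
```

In the usage example the range [1, 100] reaches len(buf); comparing it with the allocated size of buf then decides whether a warning is issued.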
The authors considered three approaches for directed graphs that contain cycles:
(1) rule out programs whose constraint systems contain cycles;
(2) introduce a relaxation (widening) operation on the variables involved in a cycle, which avoids infinite ascending chains;
(3) use domain-specific knowledge of the constraint language to handle cyclic constraint systems directly.
The third approach was chosen because of the drawbacks of the first two: the first is impractical, since real programs always contain many loops and recursive calls, and the second produces some imprecise results.
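To illustrate why cycles are a problem, consider a constraint that refers to itself, such as one that a strcat(s, suffix) call inside a loop might produce: the length of s plus some k > 0 must again be contained in the length of s. Naive propagation raises the upper bound forever. The sketch below shows the relaxation idea of scheme (2), which the prototype did not adopt. The function names, the round limit, and the use of LONG_MAX to stand for infinity are all assumptions made for this example.

```c
#include <assert.h>
#include <limits.h>

#define RANGE_INF LONG_MAX   /* stands in for an unbounded upper end */

typedef struct { long lo, hi; } Range;

/* One propagation step for the self-referencing constraint
 * "len + k is contained in len": the upper bound grows by k. */
static Range step(Range len, long k)
{
    if (k > 0 && len.hi != RANGE_INF)
        len.hi += k;
    return len;
}

/* Iterates until a fixed point is reached. If the bound is still
 * growing after max_rounds steps, it is relaxed to infinity so that
 * the iteration terminates instead of climbing forever. */
Range solve_with_relaxation(Range len, long k, int max_rounds)
{
    assert(max_rounds >= 0);
    for (int i = 0; i < max_rounds; i++) {
        Range next = step(len, k);
        if (next.hi == len.hi)
            return len;          /* fixed point: nothing changed */
        len = next;
    }
    len.hi = RANGE_INF;          /* relaxation: give up on a finite bound */
    return len;
}
```

The result is sound but coarse: every variable on such a cycle ends up with an unbounded length, which matches the remark above that this scheme produces imprecise results.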
In the end, they arrive at an algorithm that can solve constraint systems containing cycles in linear time.
-- [5-Conclusion
In terms of performance, the prototype still has plenty of room for improvement, although it is already a usable system. It produces too much debugging output, which creates a great deal of work for the later manual review. Future work could try to reduce the analysis time.
The biggest limitation is that imprecise range analysis leads to too many false positives. This is a problem common to many vulnerability discovery prototypes and tools: the precision of the range analysis is not well handled when the abstract model is built. Even so, the prototype yields a lot of practically useful information and is far more efficient than a fully manual audit. Given enough time for the integer range analysis, the accuracy would be much better.
Another remaining problem is false negatives. At present the false negative rate cannot be measured well. In addition, the output of the constraint solver does not give auditors sufficiently detailed information about each vulnerability.
As an automated static analysis prototype for buffer overflow vulnerabilities, it is already quite good. It still has much to improve, but it can reduce the time and effort that manual audits require. In my view, improving the precision of vulnerability discovery takes not only a deep understanding of how buffer overflows work but also a strong ability to abstract, so that practical and effective ideas and methods can be developed.
-- [6-References
[1] David Wagner et al., "A First Step Towards Automated Detection of Buffer Overrun Vulnerabilities".
http://www.isoc.org/isoc/conferences/ndss/2000/proceedings/039.pdf
[2] xfocus
http://www.xfocus.net/articles/
[3] nsfocus
http://www.nsfocus.net/index.php?act=magazine
[4] securityfocus
http://www.securityfocus.com/archive/1/
[5] David Wagner et al., "Towards Automatic Detection of Buffer Overrun Vulnerabilities: A First Step".
http://www.isoc.org/isoc/conferences/ndss/2000/proceedings/slides/01.pdf
[6] A. Aiken et al., "A Toolkit for Constructing Type- and Constraint-Based Program Analyses".
http://theory.stanford.edu/~aiken/publications/papers/tic98.ps
[7] Jeffrey S. Foster et al., "Flow-Sensitive Type Qualifiers".
http://www.cs.umd.edu/~jfoster/papers/pldi02.pdf
-- [7-About
About us:
Fz5fz focuses on studying and researching network and system security and on in-depth analysis and discussion of programming techniques, with a commitment to original work and to sharing it.
Fz5fz home: http://www.fz5fz.org