BitBlaze (III): Vine, the static analysis component

3 Vine: the static analysis component

This section describes Vine, the static analysis component of the BitBlaze platform: its intermediate language (IL), its front end, its back end, and its implementation.

3.1 Vine overview

Figure 2 shows the high-level architecture of Vine. Vine is divided into a platform-specific front end and a platform-independent back end. At the core of Vine is a platform-independent intermediate language (IL). The IL is a small language designed to be faithful to the semantics of assembly language. The front end translates assembly into the IL, and all back-end analyses operate on the IL. Analyses can therefore be written independently of the machine architecture and do not need to handle a complex instruction set such as x86 directly. The design is also extensible: users can build their own IL-based analyses on top of the core components that Vine provides.

 

The Vine front end currently supports translating x86 and ARMv4 to the IL. It uses a set of third-party libraries to parse different binary formats and produce assembly. The assembly is then translated into the IL in a syntax-directed manner.

The Vine back end supports a variety of core program analysis utilities. It can build different graphs, such as control flow graphs and program dependence graphs. The back end also contains an optimization framework, which is typically used to simplify a specific set of instructions. It provides program verification capabilities as well, including symbolic execution, weakest-precondition calculation, and an interface to decision procedures. Finally, Vine can emit C code through its back-end code generator.

To combine static and dynamic analysis, Vine also provides an interface for consuming execution traces produced by a dynamic analysis component such as TEMU. An execution trace can be translated into the IL for further analysis.

3.2 The Vine intermediate language

The IL is both the target language of translation and the language on which back-end program analyses operate. Its semantics are designed to be faithful to assembly language. Table 1 shows the IL.

The base types of the IL are 1-, 8-, 16-, 32-, and 64-bit registers (that is, n-bit vectors) and memories. A memory type is qualified by its endianness, which is one of little (for example, little-endian memory on the x86 architecture), big (for example, big-endian memory on PowerPC), or norm (normalized memory, explained later in this section). A memory type is also qualified by its index type, which must be a register type. For example, mem_t(little, reg32_t) is the type of a little-endian memory addressed by 32-bit addresses.

Vine has three kinds of values. First, Vine has numbers n of a register type τ_reg. Second, Vine has memory values {na1 → nv1, na2 → nv2, ...}, where nai denotes a number used as an address and nvi denotes the value stored at that address. Finally, Vine has a distinguished value ⊥ (bottom). This value is not exposed to users; it is used internally to indicate a failed execution.

Expressions in Vine are side-effect free. The IL supports binary operations (where & and | are the bitwise operators), unary operations, constants, let bindings, and casts. A cast changes the width of a value. For example, on x86 the low 8 bits of eax are al, so an instruction that reads al is expressed in the IL as a cast that extracts the low 8 bits of eax.
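As a rough illustration of a width-changing cast, the following Python sketch models registers as unsigned integers of a fixed bit width. The helper names cast_low and cast_high are hypothetical and only mirror the idea described above; they are not Vine's API.

```python
def cast_low(value: int, from_bits: int, to_bits: int) -> int:
    """Keep the low `to_bits` bits of a `from_bits`-wide value."""
    assert 0 < to_bits <= from_bits
    return value & ((1 << to_bits) - 1)


def cast_high(value: int, from_bits: int, to_bits: int) -> int:
    """Keep the high `to_bits` bits of a `from_bits`-wide value."""
    assert 0 < to_bits <= from_bits
    value &= (1 << from_bits) - 1
    return value >> (from_bits - to_bits)


eax = 0x12345678                 # arbitrary example value
al = cast_low(eax, 32, 8)        # low 8 bits of eax
upper16 = cast_high(eax, 32, 16) # high 16 bits of eax
```

Because the cast is a pure function of its input, it can appear anywhere inside a larger expression without affecting other state.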

In Vine, both load and store operations are pure. Loads are normally pure in any language, but stores usually are not. In Vine, each store expression names the memory it updates and yields a new memory. For example, in mem1 = store(mem0, a, y), mem1 is identical to mem0 except that address a holds the value y. The advantage of pure memory operations is that one can tell syntactically which memory is being read or modified. This makes it possible to compute an SSA (static single assignment) form in which memories, as well as scalars, have a unique static definition site.
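The idea of a pure store can be sketched in Python by treating a memory as an immutable mapping. This is only a model of the semantics described above, under the assumption that a memory maps addresses to values; the function names store and load follow the text, but the implementation is illustrative.

```python
from types import MappingProxyType


def store(mem, addr, value):
    """Return a new memory equal to `mem` except that `addr` holds `value`."""
    updated = dict(mem)
    updated[addr] = value
    return MappingProxyType(updated)   # read-only view: memories are values


def load(mem, addr):
    """Read the value at `addr`; reading never changes the memory."""
    return mem[addr]


mem0 = MappingProxyType({0x1000: 0x11})
mem1 = store(mem0, 0x1004, 0x22)       # mem1 = store(mem0, a, y)
mem2 = store(mem1, 0x1000, 0x33)       # mem0 and mem1 are left untouched
```

Each store produces a new name (mem1, mem2), so an analysis can see from the names alone which memory state an operation refers to.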

A Vine program is a sequence of variable declarations followed by a sequence of instructions. There are seven kinds of instructions, including assignments, jumps, conditional jumps, and labels. The target of every jump and conditional jump must be a valid label; otherwise execution terminates. Note that a jump to an undefined location likewise terminates the program. A program can stop at any time by issuing a halt statement. Vine also provides assert, which behaves like assert in C: the asserted expression must be true, or execution terminates.

A special in Vine corresponds to a call to an externally defined procedure or function. The id of a special indicates which kind of special it is. The semantics of a special are left to the analysis; its operational semantics are not defined. Specials are treated as a kind of instruction so that the points where a call may affect the analysis result are marked explicitly. A typical way to handle a special is to replace it with a summary of the function, written in the IL, that is appropriate for the analysis.

 

Normalized memory

 

A machine's byte order is determined by the order in which the hardware stores the bytes of a multi-byte value. Little-endian machines store the low-order byte first, and big-endian machines store the high-order byte first. x86 is little-endian, while PowerPC is big-endian.

An analysis of memory accesses must take endianness into account. Consider the assembly in Figure 3a. The mov instruction on line 2 writes 4 bytes to memory in little-endian order (because x86 is little-endian). After line 2 executes, memory is as shown in Figure 3b. Lines 2 and 3 set ebx = eax + 3. Lines 4 and 5 write the 16-bit value 0x1122 to the address in ebx. An analysis of these lines must recognize that the write on line 4 overwrites the last byte written on line 1, as shown in Figure 3c.

A memory is normalized for a b-byte addressable machine if every load and store is exactly b bytes wide and b-byte aligned. For example, x86 is byte addressable, so normalized memory for x86 means that every load and store operates on a single byte. Figure 4 shows the normalized form of the write on line 1 of Figure 3a. Note that line 7 operates on the current memory, mem6.

Normalized memory simplifies analyses that reason about memory. Normalization makes explicit, at the syntactic level, the memory updates that are otherwise implicit in an endianness-dependent access. The Vine back end provides a utility for normalizing memory.
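A minimal sketch of normalization for a byte-addressable machine is shown below in Python. It splits one multi-byte store into single-byte stores and then replays the overlapping writes discussed above. The base address 0x1000 and the 32-bit value 0xAABBCCDD are illustrative assumptions, and normalize_store and apply_stores are hypothetical helpers, not Vine functions.

```python
def normalize_store(addr, value, nbytes, endian="little"):
    """Split one nbytes-wide store into a list of (address, byte) stores."""
    data = value.to_bytes(nbytes, endian)
    return [(addr + i, b) for i, b in enumerate(data)]


def apply_stores(mem, stores):
    """Apply byte stores functionally, returning a new memory dict."""
    new_mem = dict(mem)
    for a, b in stores:
        new_mem[a] = b
    return new_mem


base = 0x1000                                    # assumed value of eax
first = normalize_store(base, 0xAABBCCDD, 4)     # 32-bit little-endian write
second = normalize_store(base + 3, 0x1122, 2)    # 16-bit write at eax + 3
mem1 = apply_stores({}, first)
mem2 = apply_stores(mem1, second)
```

After normalization, the overlap is visible without any endianness reasoning: both lists contain a store to address base + 3, and the later one wins.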

3.3 The Vine front end

The Vine front end is responsible for translating binary code into the IL. The front end also interfaces with libraries such as the GNU Binary File Descriptor (libbfd) library to parse the low-level details of binary files.

Translating binary code into the IL takes three steps:

-Step 1. The binary is disassembled. Vine currently interfaces with three disassemblers: IDA Pro, a commercial disassembler; the disassembler of Kruegel et al., which can disassemble obfuscated x86 code; and our own linear-sweep disassembler built on GNU libopcodes.

-Step 2. The disassembled code is passed to VEX, a third-party library that translates assembly into the VEX intermediate language (VEX IL). VEX is part of the Valgrind dynamic instrumentation tool. The VEX IL is a RISC-like language, so, like Vine, it has only a few instruction types. However, the VEX IL by itself is not suitable for program analysis, because it does not make the side effects of instructions explicit. This step mainly serves to simplify the generation of Vine IL: it produces a rough IL, and the third step only has to deal with the side effects of instructions.

-Step 3. The VEX IL is translated into the Vine IL.

After translation, all side effects of an assembly instruction are expressed explicitly as Vine instructions, so a single assembly instruction may be translated into a sequence of Vine instructions. For example, the x86 instruction add eax, 0x2 is translated into the following Vine instructions:

 

    tmp1 = EAX; EAX = EAX + 2;
    // eflags calculation
    CF:reg1_t = (EAX < tmp1);
    tmp2 = cast(low, EAX, reg8_t);
    PF = (!cast(low,
        ((((tmp2 >> 7) ^ (tmp2 >> 6)) ^ ((tmp2 >> 5) ^ (tmp2 >> 4))) ^
         (((tmp2 >> 3) ^ (tmp2 >> 2)) ^ ((tmp2 >> 1) ^ tmp2))), reg1_t));
    AF = (1 == (16 & (EAX ^ (tmp1 ^ 2))));
    ZF = (EAX == 0);
    SF = (1 == (1 & (EAX >> 31)));
    OF = (1 == (1 & (((tmp1 ^ (2 ^ 0xFFFFFFFF)) & (tmp1 ^ EAX)) >> 31)));

 

The translated instructions make all side effects of the add instruction explicit, including the six EFLAGS bits that the operation may update.
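To make the flag computations concrete, here is a small Python sketch that computes the same six flags for a 32-bit addition. It follows the standard x86 definitions of the flags rather than reproducing the IL text literally (for example, it reads the auxiliary carry directly from bit 4 of result ^ dst ^ src). The function name add32_flags is a hypothetical helper.

```python
MASK32 = 0xFFFFFFFF


def add32_flags(dst: int, src: int):
    """Return (result, flags) for a 32-bit x86 ADD of src to dst."""
    dst &= MASK32
    src &= MASK32
    result = (dst + src) & MASK32
    low8 = result & 0xFF
    flags = {
        "CF": int(result < dst),                     # unsigned overflow
        "PF": int(bin(low8).count("1") % 2 == 0),    # even parity of low byte
        "AF": ((result ^ dst ^ src) >> 4) & 1,       # carry out of bit 3
        "ZF": int(result == 0),
        "SF": (result >> 31) & 1,
        # signed overflow: operands share a sign that the result does not
        "OF": (((dst ^ src ^ MASK32) & (dst ^ result)) >> 31) & 1,
    }
    return result, flags


small = add32_flags(1, 2)
wrap = add32_flags(0xFFFFFFFE, 2)
signed = add32_flags(0x7FFFFFFF, 2)
```

The three calls exercise an ordinary addition, an unsigned wrap-around to zero, and a signed overflow from the largest positive value.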

Besides binaries, Vine can also translate an instruction trace into the IL. Conditional branches in the trace are translated into assert statements. Vine and TEMU are designed to work together, so the trace files that TEMU generates can be read and used by Vine.

3.4 The Vine back end

In the Vine back end, analyses are written over the Vine IL, and Vine provides the basic building blocks. The analyses and utilities provided by the back end are described below.

Evaluator. Vine provides an evaluator that implements the operational semantics of the Vine IL. With the evaluator, an IL program can be executed directly, without compiling it back down to assembly.
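The following Python sketch shows what an evaluator for a tiny IL-like language might look like. The statement encoding (tuples such as ("assign", var, expr) and ("cjmp", cond, if_true, if_false)) is an assumption made for this example and is not the Vine IL syntax; it only covers assignments, labels, jumps, conditional jumps, assert, and halt.

```python
def evaluate(program, env=None, max_steps=10_000):
    """Run a list of statement tuples; return the final variable environment.

    Raises RuntimeError for a failed assert or a jump to an undefined label.
    """
    env = dict(env or {})
    labels = {s[1]: i for i, s in enumerate(program) if s[0] == "label"}

    def goto(name):
        if name not in labels:
            raise RuntimeError(f"jump to undefined label {name!r}")
        return labels[name]

    pc = 0
    for _ in range(max_steps):
        if pc >= len(program):
            return env
        stmt = program[pc]
        kind = stmt[0]
        if kind == "assign":            # ("assign", var, expr)
            env[stmt[1]] = stmt[2](env)
            pc += 1
        elif kind == "label":           # ("label", name)
            pc += 1
        elif kind == "jmp":             # ("jmp", target)
            pc = goto(stmt[1])
        elif kind == "cjmp":            # ("cjmp", cond, if_true, if_false)
            pc = goto(stmt[2] if stmt[1](env) else stmt[3])
        elif kind == "assert":          # ("assert", cond)
            if not stmt[1](env):
                raise RuntimeError("assertion failed")
            pc += 1
        elif kind == "halt":            # ("halt",)
            return env
        else:
            raise ValueError(f"unknown statement {kind!r}")
    raise RuntimeError("step limit exceeded")


# Sum 1..4 with a loop built from labels and a conditional jump.
program = [
    ("assign", "i", lambda e: 0),
    ("assign", "total", lambda e: 0),
    ("label", "loop"),
    ("cjmp", lambda e: e["i"] < 4, "body", "done"),
    ("label", "body"),
    ("assign", "i", lambda e: e["i"] + 1),
    ("assign", "total", lambda e: e["total"] + e["i"]),
    ("jmp", "loop"),
    ("label", "done"),
    ("assert", lambda e: e["total"] == 10),
    ("halt",),
]
result = evaluate(program)
```

The step limit is a design choice of this sketch so that a non-terminating input program cannot hang the evaluator.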

Graphs. Vine can build control flow graphs (CFGs), data dependence graphs, and program dependence graphs.

One problem in building a CFG is determining the successors of an indirect jump (a jump to a computed address). Resolving indirect jumps usually requires a program analysis such as VSA (value set analysis), and such an analysis in turn requires a CFG, so there is a potential circular dependency. Note that an indirect jump may target any location, including the heap or code that has not been disassembled.

One solution is to add a designated node to the CFG that serves as the successor of every unresolved indirect jump. A CFG-based analysis then knows that the subsequent state is unknown. For example, a data-flow analysis can widen all facts to the lattice bottom at that point. A better approach is to run an indirect-jump resolution analysis first and build a more precise CFG. Vine provides one such analysis, VSA.
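The designated-node approach can be sketched as follows in Python. The block encoding (a dict from block name to its terminator) and the terminator kinds "jmp", "cjmp", "ijmp", and "halt" are assumptions made for this example, not Vine data structures.

```python
INDIRECT = "<unresolved indirect jump>"


def build_cfg(blocks):
    """Build successor sets from a {name: terminator} mapping.

    A terminator is ("jmp", target), ("cjmp", t, f), ("ijmp",), or ("halt",).
    """
    cfg = {name: set() for name in blocks}
    cfg[INDIRECT] = set()
    for name, term in blocks.items():
        kind = term[0]
        if kind == "jmp":
            cfg[name].add(term[1])
        elif kind == "cjmp":
            cfg[name].update(term[1:3])
        elif kind == "ijmp":
            cfg[name].add(INDIRECT)     # target is only known at run time
        # "halt" has no successors
    return cfg


blocks = {
    "entry": ("cjmp", "dispatch", "exit"),
    "dispatch": ("ijmp",),
    "exit": ("halt",),
}
cfg = build_cfg(blocks)
```

An analysis walking this graph can check whether a node has INDIRECT as a successor and treat everything after it conservatively.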

Static single assignment (SSA). Vine supports conversion both to and from SSA form. SSA form makes many analyses easier to write, because every variable is statically defined exactly once. Both memories and scalars are converted to SSA form. Memories are converted so that an analysis can tell syntactically which memory is the state before a write and which is the state after it, without having to track this itself.
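For straight-line code, SSA conversion amounts to renaming each definition with a fresh version number. The Python sketch below does this for a list of (dest, op, args) statements, treating the memory variable like any scalar. It is a simplification that assumes no branches, so no phi functions are needed; the statement format is an assumption for this example.

```python
def to_ssa(statements):
    """Rename a straight-line list of (dest, op, args) into SSA form.

    Each definition of a variable gets a fresh version; uses refer to the
    latest version. Variables used before any definition get version 0.
    """
    version = {}
    out = []

    def use(arg):
        if isinstance(arg, str):
            return f"{arg}_{version.setdefault(arg, 0)}"
        return arg                      # constants are left unchanged

    for dest, op, args in statements:
        new_args = tuple(use(a) for a in args)     # rename uses first
        version[dest] = version.get(dest, 0) + 1   # then create a new def
        out.append((f"{dest}_{version[dest]}", op, new_args))
    return out


code = [
    ("x", "add", ("x", 1)),
    ("mem", "store", ("mem", "x", 7)),
    ("y", "load", ("mem", "x")),
    ("x", "add", ("x", "y")),
]
ssa = to_ssa(code)
```

In the result, the load reads from mem_1 while the store consumed mem_0, so the memory state before and after the write is distinguishable by name.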

Chopping. Given a source node and a sink node, a program chop is a graph containing the statements that cause definitions at the source to affect uses at the sink. For example, chopping can be used to restrict a subsequent analysis to the part of the code that is relevant to the given source and sink, rather than the whole program.
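One simple way to approximate a chop, assuming a dependence graph is already available, is to intersect the nodes reachable forward from the source with the nodes that reach the sink. The Python sketch below does exactly that; the graph and its node names are made up for illustration.

```python
def reachable(graph, start):
    """Return all nodes reachable from `start` (inclusive) in `graph`."""
    seen, stack = {start}, [start]
    while stack:
        for nxt in graph.get(stack.pop(), ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def chop(deps, source, sink):
    """Nodes that lie on some dependence path from `source` to `sink`."""
    reverse = {}
    for node, succs in deps.items():
        for s in succs:
            reverse.setdefault(s, set()).add(node)
    return reachable(deps, source) & reachable(reverse, sink)


# Edges point from a definition to the statements that depend on it.
deps = {
    "read_input": {"parse", "log"},
    "parse": {"index"},
    "index": {"write_mem"},
    "init_table": {"write_mem"},
    "log": set(),
}
relevant = chop(deps, "read_input", "write_mem")
```

Here "log" is dropped because it never reaches the sink, and "init_table" is dropped because it does not depend on the source.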

Data flow and optimizations. Vine provides a generic data-flow engine that works on user-defined lattices, and several data-flow analyses are implemented on top of it. Vine currently implements global value numbering, constant propagation and folding, dead-code elimination, live-variable analysis, integer range analysis, and value set analysis (VSA). VSA is a data-flow analysis that over-approximates the set of values each variable may hold at each program point. Value set analysis is useful for resolving indirect jumps. It can also be used for alias analysis: two memory accesses may alias if the intersection of their value sets is non-empty.
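The alias check can be illustrated with a deliberately simplified Python sketch in which a value set is a plain finite set of concrete addresses. A real VSA implementation uses a more compact abstract domain; the helper names and the explicit access sizes are assumptions for this example.

```python
def touched_bytes(addresses, size):
    """All byte addresses that an access of `size` bytes may touch."""
    return {a + i for a in addresses for i in range(size)}


def may_alias(addrs_a, size_a, addrs_b, size_b):
    """True if the two accesses may overlap on at least one byte."""
    return bool(touched_bytes(addrs_a, size_a) & touched_bytes(addrs_b, size_b))


overlap = may_alias({0x1000}, 4, {0x1003}, 2)           # shares byte 0x1003
disjoint = may_alias({0x1000, 0x1010}, 4, {0x1004}, 4)  # no common byte
```

Because value sets over-approximate, a True answer means the accesses may alias, while a False answer means they cannot.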

Optimizations are used to simplify or speed up later analyses. For example, simplifying a query with program optimizations first can halve the time the STP decision procedure takes to return an answer.

C code generator. Vine can generate valid C code from the IL. This lets Vine act as a rudimentary decompiler: assembly is first translated into the IL, and C code is then generated from it. It also provides a way to compile Vine programs: translate the IL into C and compile the result with a C compiler.

The C code generator implements memories in the IL as arrays. A store becomes a store into the array, and a load becomes a load from the array. The generated C code therefore simulates real memory. For example, if a program vulnerable to a buffer overflow is lifted into Vine, translated to C, and compiled, the overflow is still simulated within the corresponding C array, but it does not cause a real buffer overflow.

Program verification. Vine currently supports formal program verification in two ways. First, Vine can convert the IL into Dijkstra's Guarded Command Language (GCL) and compute the weakest precondition of the resulting GCL program. The weakest precondition of a program with respect to a predicate Q is the most general condition such that any input satisfying it is guaranteed to terminate in a state satisfying Q. Only acyclic programs are currently supported; that is, the GCL while construct is not supported.

Second, Vine interfaces with decision procedures. Vine can write out expressions (for example, weakest preconditions) in CVC syntax. Vine can also interface with the STP decision procedure directly by calling into the STP library.
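The weakest-precondition rules for loop-free guarded commands can be sketched in Python as follows. Unlike Vine, which produces a formula that can be handed to a decision procedure, this sketch represents predicates as Python functions over a concrete state, so the resulting precondition can only be evaluated on given inputs. The tuple encoding of statements is an assumption for this example.

```python
def wp(stmt, post):
    """Weakest precondition of a loop-free GCL statement w.r.t. `post`.

    Predicates and expressions are functions from a state dict to a value.
    """
    kind = stmt[0]
    if kind == "skip":
        return post
    if kind == "assign":                # ("assign", var, expr)
        _, var, expr = stmt
        return lambda s: post({**s, var: expr(s)})
    if kind == "assume":                # ("assume", cond): cond implies post
        cond = stmt[1]
        return lambda s: (not cond(s)) or post(s)
    if kind == "assert":                # ("assert", cond): cond and post
        cond = stmt[1]
        return lambda s: cond(s) and post(s)
    if kind == "seq":                   # ("seq", a, b): wp(a, wp(b, post))
        return wp(stmt[1], wp(stmt[2], post))
    if kind == "choice":                # ("choice", a, b): both branches
        left, right = wp(stmt[1], post), wp(stmt[2], post)
        return lambda s: left(s) and right(s)
    raise ValueError(f"unsupported statement {kind!r}")


# if x < 0 then y := -x else y := x, encoded with assume and choice
abs_prog = ("choice",
            ("seq", ("assume", lambda s: s["x"] < 0),
                    ("assign", "y", lambda s: -s["x"])),
            ("seq", ("assume", lambda s: s["x"] >= 0),
                    ("assign", "y", lambda s: s["x"])))
pre = wp(abs_prog, lambda s: s["y"] >= 0)
inc_pre = wp(("assign", "x", lambda s: s["x"] + 1), lambda s: s["x"] > 0)
```

For abs_prog the precondition holds for every x, while inc_pre holds exactly when x + 1 > 0, which matches substituting the assigned expression into the postcondition.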

3.5 Implementation of Vine

Vine is implemented in C++ and OCaml. The front end is written mainly in C++ and consists of about 17,200 lines of code. The back end is written in OCaml and consists of about 40,000 lines of code. The front end and back end are connected through IDL-generated stubs.

The front end uses Valgrind's VEX to help translate instructions, GNU BFD to parse executable objects, and GNU libopcodes to pretty-print the disassembly.

In addition to the instructions in Table 1, the implemented Vine IL has several further constructs:

-The Vine IL has a constructor for comments. It is used to print each disassembled instruction and to hold user-defined annotations.

-The Vine IL supports variable scoping via blocks.

-The Vine IL has a constructor for qualifying code with user-defined attributes.

 
