1. Introduction to LLVM and Valgrind
LLVM (Low Level Virtual Machine) is a compiler framework started by Chris Lattner at the University of Illinois at Urbana-Champaign. After Lattner joined Apple, LLVM became the compiler infrastructure officially supported by Apple. Compared with GCC, LLVM performs better in many respects. LLVM received the ACM Software System Award in 2012.
Valgrind is a software development tool for memory debugging, memory leak detection, and performance analysis. It is maintained by a group of developers from around the world, and its original author won a second Google-O'Reilly Open Source Award in 2006 for his work on it.
2. LLVM structure and LLVM IR (intermediate representation)
LLVM acts as the compiler back end, so it is not responsible for converting the source language into LLVM IR. That is done by the compiler front end of each language: Clang is the front end for C, C++, and Objective-C, and llvm-gcc can be used as the front end for Fortran. Once the front end has compiled the source into LLVM IR, LLVM optimizes the IR and, depending on the target platform (x86, ARM, etc.), either compiles it into machine code for that platform or executes it through a JIT. The structure of LLVM is as follows:
LLVM IR is the connecting link in the whole process. It is independent of both the source language and the execution platform. It is code in SSA (static single assignment) form, its instructions resemble RISC instructions, and it is designed as three-address code.
Here is an example. The source code of the program:
    int mian()
    {
        int a = 10;
        int b = 20;
        int c = a + b;
        int d = b / a;
        return 0;
    }
The corresponding LLVM intermediate code:
    ; ModuleID = 'test.ll'
    target datalayout = "e-p:32:32:32-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:32:64-f32:32:32-f64:32:64-v64:64:64-v128:128:128-a0:0:64-f80:32:32-n8:16:32-S128"
    target triple = "i386-pc-linux-gnu"

    ; Function Attrs: nounwind
    define i32 @_Z4mianv() #0 {
    entry:
      %a = alloca i32, align 4
      %b = alloca i32, align 4
      %c = alloca i32, align 4
      %d = alloca i32, align 4
      store i32 10, i32* %a, align 4      ; a = 10
      store i32 20, i32* %b, align 4      ; b = 20
      %0 = load i32* %a, align 4
      %1 = load i32* %b, align 4
      %add = add nsw i32 %0, %1           ; a + b
      store i32 %add, i32* %c, align 4    ; c = a + b
      %2 = load i32* %b, align 4
      %3 = load i32* %a, align 4
      %div = sdiv i32 %2, %3              ; b / a
      store i32 %div, i32* %d, align 4
      ret i32 0
    }

    attributes #0 = { nounwind "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf"="true" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "unsafe-fp-math"="false" "use-soft-float"="false" }
The listing above shows the basic layout of the intermediate code. It begins with information about the target platform and ends with a list of attributes; neither part is the focus of LLVM IR. Between them, LLVM IR can be divided into three kinds of content: global variables, functions, and symbol table entries (the example above contains only a function). The listing also shows the three-address form of LLVM IR. Types are written in a form such as 'i32', and no specific registers are named, which is how the IR stays independent of the target platform.
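To make the three-address, single-assignment idea concrete, here is a minimal C++ sketch that flattens a small expression tree into instructions of the form dst = a op b, giving every result a fresh temporary. The names Expr, Instr, leaf, binary, lower, and show are hypothetical helpers invented for this illustration. They are not part of the LLVM API, and real LLVM IR additionally attaches a type such as i32 to every value.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical expression tree: a leaf names a variable,
// an inner node applies a binary operator to two subtrees.
struct Expr {
    std::string name;                  // variable name (leaf only)
    char op = 0;                       // '+', '-', '*', '/' (inner node only)
    std::unique_ptr<Expr> lhs, rhs;
};

std::unique_ptr<Expr> leaf(std::string n) {
    auto e = std::make_unique<Expr>();
    e->name = std::move(n);
    return e;
}

std::unique_ptr<Expr> binary(char op, std::unique_ptr<Expr> l, std::unique_ptr<Expr> r) {
    auto e = std::make_unique<Expr>();
    e->op = op;
    e->lhs = std::move(l);
    e->rhs = std::move(r);
    return e;
}

// One three-address instruction: dst = a op b.
struct Instr {
    std::string dst, a, b;
    char op;
};

// Emits instructions for `e` into `out` and returns the name holding its value.
// Every result gets a fresh temporary, so each name is assigned exactly once.
std::string lower(const Expr& e, std::vector<Instr>& out) {
    if (e.op == 0) return e.name;      // a leaf is already a named value
    assert(e.lhs && e.rhs);
    std::string a = lower(*e.lhs, out);
    std::string b = lower(*e.rhs, out);
    std::string dst = "%t" + std::to_string(out.size());
    out.push_back({dst, a, b, e.op});
    return dst;
}

// Renders an instruction as text, e.g. "%t0 = a * b".
std::string show(const Instr& i) {
    return i.dst + " = " + i.a + " " + i.op + " " + i.b;
}
```

For the expression a * b + c, lower emits %t0 = a * b followed by %t1 = %t0 + c. Each instruction has at most two operands and one result, and no temporary is ever written twice.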
3. Valgrind structure and VEX IR
Unlike LLVM, Valgrind does not need the source code of the program; it works directly on the executable file. So far it supports most mainstream platforms (x86, ARM, AMD64). When Valgrind runs an executable, it effectively adds its own virtual layer underneath the program. The structure diagram is as follows:
Valgrind first initializes the VEX binary translation engine. The VEX front end then translates the binary code of the executable into the VEX intermediate representation (IR). The IR is then optimized, and the Valgrind tool instruments it. Finally, the VEX back end translates the VEX IR into machine code, which is executed. The internal structure diagram of Valgrind is as follows:
The form of Valgrind's intermediate code is shown in the following figure:
On the left is the assembly code corresponding to the machine code, and on the right is the VEX IR. As the figure shows, VEX IR is an intermediate representation in two-address form, and VEX translates each machine instruction into a group of IR statements inside a VEX IR basic block. VEX IR is an instruction set similar to a RISC instruction set. It is also in SSA (static single assignment) form, which means it can have an unlimited number of variables. Every variable has a type, and there are no implicit type conversions.
Valgrind's analysis tools instrument the VEX IR with whatever instructions they need. Once instrumentation is complete, the VEX back end first allocates registers for the SSA variables, then converts the VEX IR into machine code, which is then executed. The conversion proceeds as follows:
VEX IR intermediate representation:
Assign registers to the variables:
Translate into machine code:
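The translate, instrument, and execute flow described in this section can be sketched in C++ roughly as follows. This is a toy model and not the VEX API: the Op, Stmt, Machine, instrument, and run names are assumptions invented here, the "tool" only counts memory accesses, and the "back end" interprets the statements instead of allocating registers and emitting machine code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, heavily simplified IR; the real VEX IR is far richer.
enum class Op { Const, Load, Add, Store, CountAccess };

struct Stmt {
    Op op;
    int dst;   // temporary written by Const, Load, Add; unused otherwise
    int a, b;  // Const: a = value. Load: a = address. Add: temporaries a and b.
               // Store: a = address, b = temporary holding the value.
};

// "Tool" pass: insert a counting statement before every memory access.
std::vector<Stmt> instrument(const std::vector<Stmt>& in) {
    std::vector<Stmt> out;
    for (const Stmt& s : in) {
        if (s.op == Op::Load || s.op == Op::Store)
            out.push_back({Op::CountAccess, -1, 0, 0});
        out.push_back(s);
    }
    assert(out.size() >= in.size());  // instrumentation only adds statements
    return out;
}

struct Machine {
    std::vector<int> temps;
    std::vector<int> memory;
    int accesses = 0;
    Machine(std::size_t nTemps, std::size_t nWords) : temps(nTemps), memory(nWords) {}
};

// Stand-in for the back end: interprets the block instead of emitting machine code.
void run(const std::vector<Stmt>& block, Machine& m) {
    for (const Stmt& s : block) {
        switch (s.op) {
            case Op::Const:       m.temps.at(s.dst) = s.a; break;
            case Op::Load:        m.temps.at(s.dst) = m.memory.at(s.a); break;
            case Op::Add:         m.temps.at(s.dst) = m.temps.at(s.a) + m.temps.at(s.b); break;
            case Op::Store:       m.memory.at(s.a) = m.temps.at(s.b); break;
            case Op::CountAccess: ++m.accesses; break;
        }
    }
}
```

A block that loads two words, adds them, and stores the sum grows from four statements to seven after instrument, and running it leaves accesses at 3. The program's own result is unchanged, which is the property an instrumentation pass has to preserve.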
4. The similarities between the two
1) Both are intermediate forms of a program: they represent how the program exists during a particular phase, and both can faithfully express the meaning of the program.
2) Both are in SSA form and can have an unlimited number of variables. A register allocation algorithm assigns suitable registers to these variables only when machine code is generated.
3) Both use RISC-like instruction sets: they provide only simple, easy-to-use instructions, and complex operations have to be built by combining them.
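Point 2) says that registers are assigned only when machine code is generated. A minimal sketch of that step is shown below, assuming a simplified linear-scan strategy chosen here purely for illustration; it is not the allocator that LLVM or VEX actually uses. Interval, kSpilled, and allocate are hypothetical names, and temporaries are assumed to be numbered 0 to n-1.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Live range of one SSA temporary: defined at instruction `start`,
// last used at instruction `end`. In SSA form there is exactly one definition.
struct Interval {
    int temp;
    int start;
    int end;
};

constexpr int kSpilled = -1;  // marker: no register available, value lives in memory

// Simplified linear scan: visit intervals in order of their start, and give each
// the first register whose previous occupant is no longer needed.
// Returns a vector where element `t` is the register chosen for temporary `t`.
std::vector<int> allocate(std::vector<Interval> ivs, int numRegs) {
    assert(numRegs >= 0);
    std::sort(ivs.begin(), ivs.end(),
              [](const Interval& x, const Interval& y) { return x.start < y.start; });
    std::vector<int> reg(ivs.size(), kSpilled);
    std::vector<int> busyUntil(numRegs, -1);  // last use of the value held in each register
    for (const Interval& iv : ivs) {
        assert(iv.temp >= 0 && iv.temp < static_cast<int>(ivs.size()));
        for (int r = 0; r < numRegs; ++r) {
            // A register can be reused by the instruction that performs the last
            // read of its old value, because operands are read before the result
            // is written.
            if (busyUntil[r] <= iv.start) {
                busyUntil[r] = iv.end;
                reg[iv.temp] = r;
                break;
            }
        }
    }
    return reg;
}
```

For three temporaries with live ranges [0,2], [1,2], and [2,3] and two registers, the sketch assigns registers 0, 1, and 0: the third temporary reuses register 0 because the first one is dead by then. With a single register, the second temporary is marked as spilled.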
5. The differences between the two
1) LLVM IR is derived from source code: the compiler front end translates the source language into it. VEX IR is obtained by translating the machine code of an executable file, with no need for the source program.
2) LLVM IR is in three-address form, while VEX IR is in two-address form. The advantage of LLVM IR is that types are written in a form such as 'i32' and no registers are named, which makes it independent of the target platform. VEX IR is tied to the platform.
3) As the intermediate language of a compiler, LLVM IR is very powerful. In program analysis, LLVM IR is currently used mainly for static analysis, while VEX IR is used for dynamic analysis. Static analysis on LLVM IR does not require inserting extra instructions into it, whereas VEX IR has to be instrumented with additional instructions for its checks.
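The contrast in point 3) can be illustrated with a small C++ sketch that looks for divisions by zero in two ways. Everything in it is an assumption made for illustration: DivInstr is a made-up toy instruction, and staticZeroDivisions and dynamicZeroDivisions are hypothetical functions, not anything provided by LLVM or Valgrind.

```cpp
#include <cassert>
#include <vector>

// Hypothetical toy instruction: result = dividend / divisor, where the divisor is
// either a literal constant or a value read from an input slot at run time.
struct DivInstr {
    int dividend;
    bool divisorIsConst;
    int divisor;  // literal value if divisorIsConst, otherwise an index into the inputs
};

// Static analysis: inspect the code without running it and without changing it.
// Only divisions by a literal zero are visible; input-dependent divisors are unknown.
int staticZeroDivisions(const std::vector<DivInstr>& prog) {
    int found = 0;
    for (const DivInstr& d : prog)
        if (d.divisorIsConst && d.divisor == 0) ++found;
    return found;
}

// Dynamic analysis: run the code with a check inserted before every division.
// It reports what actually happens for this particular input, and nothing else.
int dynamicZeroDivisions(const std::vector<DivInstr>& prog, const std::vector<int>& inputs) {
    int found = 0;
    for (const DivInstr& d : prog) {
        assert(d.divisorIsConst ||
               (d.divisor >= 0 && d.divisor < static_cast<int>(inputs.size())));
        int divisor = d.divisorIsConst ? d.divisor : inputs[d.divisor];
        if (divisor == 0) {  // the inserted check: report and skip the division
            ++found;
            continue;
        }
        int result = d.dividend / divisor;  // the original operation
        (void)result;
    }
    return found;
}
```

The static pass never executes anything, so it sees only the literal zero. The dynamic pass also catches a zero that arrives through the input, but only on runs where that input actually occurs.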
6. Summary
Both LLVM IR and VEX IR are good intermediate representations. They have a lot in common, and because their goals differ, they also differ in some respects. As LLVM becomes an increasingly popular compiler framework, LLVM IR will be recognized by more people as an intermediate code representation. VEX IR, as the intermediate representation of the dynamic analysis tool Valgrind, is accepted mainly by specialists in related fields.