I. Introduction
Libemu is a library, written in C, for detecting and emulating x86 shellcode.
It supports:
1. x86 instruction parsing, register simulation, and FPU simulation
2. Static analysis, dynamic analysis, and Win32 API hooking
You can use libemu to:
1. Determine whether a string is shellcode
2. Obtain the instruction execution flow graph (similar to what IDA and other debugging tools produce)
Libemu can be used in IDSs, honeypots, and other security products.
II. Usage
The following is an example of using libemu.
/* libemu test */
#include <stdio.h>
#include <stdint.h>
#include <emu/emu.h>
#include <emu/emu_shellcode.h>
#include <emu/emu_memory.h>

struct emu *emu;

char shellcode[] =
    "\xbe\x1f\x5e\x89\x76\x09\x31\xc0\x88\x46\x08\x89\x46\x0d\xb0\x0b"
    "\x89\xf3\x8d\x4e\x09\x8d\x56\x0d\xcd\x80\x31\xdb\x89\xd8\x40\xcd"
    "\x80\xe8\xdc\xff\xff\xff\x2f\x62\x69\x6e\x2f\x6c\x73\x00\xc9\xc3";

int main()
{
    emu = emu_new();
    /* A non-negative return value means the buffer is suspected to be shellcode. */
    if (emu_shellcode_test(emu, (uint8_t *)shellcode, 48) >= 0) {
        fprintf(stderr, "suspecting shellcode\n");
    }
    emu_free(emu);
    return 0;
}
When the preceding example runs, it prints "suspecting shellcode", indicating that the string was identified as usable shellcode. The string is in fact Linux shellcode whose effect is to execute "/bin/ls" in the current directory.
III. Implementation Principle
Libemu works by parsing and simulating x86 machine instructions. Unlike bochs and qemu, libemu is only a simulator, not a virtual machine: it performs a simple simulation of memory and the CPU, not a full system simulation.
3.1 A basic assumption
A basic assumption of libemu is that if a string is a piece of shellcode, it must contain a "call" (0xe8) or "fnstenv" (0xd9) instruction (GetPC code).
This follows from how shellcode is written. Shellcode generally needs to locate its own address, and that is hard to do without call/ret or a floating-point instruction such as fnstenv. For example:
    jmp    0x2a
    popl   %esi
    movl   %esi,0x9(%esi)
    movb   $0x0,0x8(%esi)
    movl   $0x0,0xd(%esi)
    movl   $0xb,%eax
    movl   %esi,%ebx
    leal   0x9(%esi),%ecx
    leal   0xd(%esi),%edx
    int    $0x80
    movl   $0x1, %eax
    movl   $0x0, %ebx
    int    $0x80
    call   -0x2f
    .string "/bin/ksh"
The assembly above performs two system calls, execve and exit. The address of the string "/bin/ksh" must be passed to execve, but that address is unknown when the shellcode is written.
The "call" instruction solves this. Executing a call pushes EIP onto the stack (more precisely, the address of the instruction following the call, which in this shellcode is the address of the string "/bin/ksh") and then jumps to the "popl %esi" instruction. That pop loads the address just pushed onto the stack into the ESI register.
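The call/pop mechanism can be checked by hand on a byte string. The following is a minimal sketch, not libemu code: decode_call_rel32 and example_shellcode are names introduced here for illustration, and the bytes are copied from the usage example earlier in this article. The sketch decodes a near call (0xe8 followed by a 32-bit little-endian displacement) and reports the offset that would be pushed as the return address and the offset the call jumps to.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bytes of the usage example shown earlier (48 bytes). */
const uint8_t example_shellcode[] = {
    0xbe, 0x1f, 0x5e, 0x89, 0x76, 0x09, 0x31, 0xc0,
    0x88, 0x46, 0x08, 0x89, 0x46, 0x0d, 0xb0, 0x0b,
    0x89, 0xf3, 0x8d, 0x4e, 0x09, 0x8d, 0x56, 0x0d,
    0xcd, 0x80, 0x31, 0xdb, 0x89, 0xd8, 0x40, 0xcd,
    0x80, 0xe8, 0xdc, 0xff, 0xff, 0xff, 0x2f, 0x62,
    0x69, 0x6e, 0x2f, 0x6c, 0x73, 0x00, 0xc9, 0xc3
};

/* Hypothetical helper (not part of libemu): decode the near call at
 * offset `off`. On success, store the offset pushed as return address
 * in *ret_off and the jump target in *target_off, and return 0.
 * Return -1 if there is no complete call at `off` or if the target
 * lies outside the buffer. */
int decode_call_rel32(const uint8_t *buf, size_t len, size_t off,
                      size_t *ret_off, size_t *target_off)
{
    assert(buf != NULL && ret_off != NULL && target_off != NULL);

    if (len < 5 || off > len - 5 || buf[off] != 0xe8)
        return -1;

    /* The displacement follows the opcode in little-endian order. */
    uint32_t raw = (uint32_t)buf[off + 1]
                 | ((uint32_t)buf[off + 2] << 8)
                 | ((uint32_t)buf[off + 3] << 16)
                 | ((uint32_t)buf[off + 4] << 24);

    /* Sign-extend the 32-bit displacement without relying on
     * implementation-defined conversions. */
    int64_t rel = (raw & 0x80000000u) ? (int64_t)raw - 4294967296LL
                                      : (int64_t)raw;

    int64_t ret = (int64_t)off + 5;   /* instruction after the call */
    int64_t target = ret + rel;       /* displacement is relative to it */
    if (target < 0 || target >= (int64_t)len)
        return -1;

    *ret_off = (size_t)ret;
    *target_off = (size_t)target;
    return 0;
}
```

For the call at offset 33 of example_shellcode, the displacement is 0xffffffdc (-36), so the pushed return offset is 38, where the bytes of "/bin/ls" begin, and the jump target is offset 2, which holds 0x5e (pop esi).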
Another shellcode technique is the delta offset. Once a shellcode is written, its instructions and data sit at fixed positions relative to each other. The only difference between the development machine and the attacked machine is the address at which the shellcode is loaded (a different EIP). If the addresses used by the shellcode are hard-coded as the actual addresses on the development machine, the corresponding addresses on the attacked machine can be computed once the delta offset is known.
The problem therefore reduces to obtaining the EIP value on the attacked machine. You can do this:
    call delta
delta:
    pop ebp
Or:
fpu_addr:
    fnop
    call GetPhAddr
    sub ebp,fpu_addr

GetPhAddr:
    sub esp,16
    fnstenv [esp-12]
    pop ebp
    add esp,12
    ret
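The arithmetic behind the delta offset is small enough to write out. The following is a minimal sketch with hypothetical names (delta_offset, rebase) and made-up addresses. It mirrors the "sub ebp,fpu_addr" step above: the run-time address recovered by the GetPC code minus the build-time address of the same label gives the delta, which is then added to every other build-time address.

```c
#include <assert.h>
#include <stdint.h>

/* Delta between the address where a label actually runs and the address
 * it had when the shellcode was built. Unsigned arithmetic wraps modulo
 * 2^32, which matches 32-bit x86 register arithmetic. */
uint32_t delta_offset(uint32_t runtime_addr, uint32_t build_addr)
{
    uint32_t d = runtime_addr - build_addr;
    /* Invariant: adding the delta back yields the run-time address. */
    assert((uint32_t)(build_addr + d) == runtime_addr);
    return d;
}

/* Translate any build-time address into its run-time counterpart. */
uint32_t rebase(uint32_t build_time_addr, uint32_t delta)
{
    return build_time_addr + delta;
}
```

As an illustration with invented values, a label built at 0x00401000 and found running at 0x7ffd2000 gives a delta of 0x7fbd1000, so data built at 0x00401020 is found at 0x7ffd2020.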
3.2 Three basic actions
1. Simulate memory and CPU
Memory: simulated with two-level page tables (copy-on-write).
CPU: simulated registers, segments, the current instruction, and its description.
2. Static analysis
Check whether the bytes decode as legal x86 instructions.
Trace the instruction flow: follow sequential execution, jumps, and conditional jumps to obtain the execution flow graph.
Determine whether the data addresses referenced by the instructions fall outside the range controlled by the shellcode.
3. Dynamic execution
During execution, determine whether the register values and memory addresses used by each instruction are controlled by the shellcode. This step relies on the execution flow graph obtained from static analysis.
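To make the memory model from action 1 concrete, here is a minimal sketch of a two-level page table over a 32-bit address space. It is not libemu's implementation. The names (sim_memory, sim_read_byte, sim_write_byte) and the 10/10/12 bit split are assumptions chosen for illustration, and it shows the simpler allocate-on-first-write behaviour rather than full copy-on-write.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define SIM_DIR_ENTRIES   1024u  /* indexed by the top 10 address bits */
#define SIM_TABLE_ENTRIES 1024u  /* indexed by the middle 10 bits */
#define SIM_PAGE_BYTES    4096u  /* indexed by the low 12 bits */

/* Level 1: directory of tables. Level 2: table of page pointers. */
struct sim_memory {
    uint8_t **directory[SIM_DIR_ENTRIES];
};

struct sim_memory *sim_memory_new(void)
{
    return calloc(1, sizeof(struct sim_memory));
}

/* Read one byte. Memory that was never written reads as zero and
 * causes no allocation. */
uint8_t sim_read_byte(const struct sim_memory *m, uint32_t addr)
{
    assert(m != NULL);
    uint8_t **table = m->directory[addr >> 22];
    if (table == NULL)
        return 0;
    uint8_t *page = table[(addr >> 12) & 0x3ffu];
    if (page == NULL)
        return 0;
    return page[addr & 0xfffu];
}

/* Write one byte, allocating the table and the page on first write.
 * Returns 0 on success, -1 if allocation fails. */
int sim_write_byte(struct sim_memory *m, uint32_t addr, uint8_t value)
{
    assert(m != NULL);
    uint8_t ***table = &m->directory[addr >> 22];
    if (*table == NULL) {
        *table = calloc(SIM_TABLE_ENTRIES, sizeof(uint8_t *));
        if (*table == NULL)
            return -1;
    }
    uint8_t **page = &(*table)[(addr >> 12) & 0x3ffu];
    if (*page == NULL) {
        *page = calloc(SIM_PAGE_BYTES, 1);
        if (*page == NULL)
            return -1;
    }
    (*page)[addr & 0xfffu] = value;
    return 0;
}

void sim_memory_free(struct sim_memory *m)
{
    if (m == NULL)
        return;
    for (uint32_t i = 0; i < SIM_DIR_ENTRIES; i++) {
        if (m->directory[i] == NULL)
            continue;
        for (uint32_t j = 0; j < SIM_TABLE_ENTRIES; j++)
            free(m->directory[i][j]);
        free(m->directory[i]);
    }
    free(m);
}
```

The point of the two levels is that a sparse 4 GB address space costs almost nothing until the simulated code writes to it. Only the pages actually touched are backed by real memory.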
3.3 Criterion for identifying shellcode
Can the string be dynamically executed as x86 instructions?
IV. Pseudocode
emu_shellcode_test(XXX, uint8_t *data, uint16_t size)
{
    if (data contains neither 0xe8 nor 0xd9) {
        this is safe;
        return;
    }
    find the first legal instruction in data and perform static analysis;
    if (static analysis fails) {    // illegal encoding or data outside the controlled range
        this is safe;
        return;
    }
    perform dynamic analysis;
    if (the instructions in data can be executed dynamically for n consecutive steps) {
        this is suspected shellcode;
        return;
    }
}
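The first test in the pseudocode is a plain byte scan. The following is a minimal sketch of that pre-filter. The function name find_getpc_candidate is hypothetical, and this is not how libemu itself is written. It returns the offset of the first byte equal to 0xe8 or 0xd9, or -1 when neither is present, in which case the pseudocode treats the data as safe.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Return the offset of the first byte that could start GetPC code:
 * 0xe8 (call) or 0xd9 (first byte of fnstenv). Return -1 if the
 * buffer contains neither. */
long find_getpc_candidate(const uint8_t *data, size_t size)
{
    assert(data != NULL || size == 0);

    for (size_t i = 0; i < size; i++) {
        if (data[i] == 0xe8 || data[i] == 0xd9)
            return (long)i;
    }
    return -1;
}
```

A hit only marks a candidate. The same byte values also occur in ordinary data, which is why the pseudocode goes on to static and dynamic analysis instead of stopping here.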
V. Limitations
Platform: limited to x86.
Performance: relatively low. The NIDS demo provided with libemu has a throughput of 2 Mbps.
False positives and false negatives: for example, shellcode does not necessarily contain "call".
Detection method: libemu does not actually detect attacks; it only judges whether a string looks like shellcode.
Encoding/encryption: libemu is powerless against encoded or encrypted data.