Detecting Android emulators using cache behavior
0x00
Currently, Android emulators are usually detected through specific system values, for example the results of getDeviceId() and getLine1Number(), and the fields recorded by the android.os.Build class. I happened to come across a proposal from a researcher abroad to tell emulators and real devices apart by their cache behavior. The author gave few details: there is only a simple PoC, the Evaluation section is empty, and no experiments were run, so it was unclear whether the method really works. This article therefore walks through the whole detection method, from principle to implementation.
0x01 ARM and x86
Because most Android phones use the ARM architecture, let's first look at how the cache differs between ARM and x86. The differences are shown in Figure 1.
Figure 1: differences between ARM and x86 cache
We can see that there can be several levels of cache between the CPU and memory; here they are L1 and L2. The purpose of a cache is to speed up access: recently used instructions and data are kept close to the CPU so that they do not have to be fetched from slow memory. In this simplified view, the x86 cache behaves as one unified block, whereas ARM splits the L1 cache into two parallel parts, the I-Cache and the D-Cache. A memory structure that separates instruction storage from data storage is called the Harvard architecture; one that combines instruction memory and data memory is called the von Neumann architecture.
This is where the problem arises. When instructions and data are cached separately, the two caches are not kept synchronized. The value at a given address can be updated in one cache without being updated in the other. For example, if you write data through the data cache, the instruction cache does not see the write.
The emulator shipped with the Android SDK is based on QEMU, an open-source processor emulator (see the Wikipedia article on QEMU for details). The emulator does not model separate caches; in effect it has a single unified cache.
This gives us the idea of using the cache to detect emulators.
0x02 ideas
Let's take a look at the flowchart:
Figure 2: Detection ideas
The left side shows what happens on a real device, and the right side shows what happens on the emulator. The steps and their consequences are described below.
Step 1:
Execute the instruction at some address, which we will call $address. On a real device the instruction is loaded into the I-Cache; on the emulator it goes straight into the single cache (because the emulator has one unified cache).
Step 2:
Write a new instruction to $address. Note the difference: on a real device the new instruction is written to the D-Cache, while on the emulator it is written directly to the unified cache.
Step 3:
Execute the instruction at $address again. On a real device the instruction is fetched from the I-Cache, so the instruction from step 1 is executed. The emulator fetches directly from the unified cache, so it executes the new instruction from step 2.
Of course, the instruction cache on a real device may be flushed in between, but in experiments the probability of this is fairly small.
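The three steps can be sketched with a toy model in portable C. This is only an illustration under simplifying assumptions: it tracks a single address, and the names toy_cpu, toy_fetch, toy_store and toy_three_steps are made up for this sketch rather than taken from the PoC.

```c
#include <stdint.h>

/* Toy model, not real hardware: it tracks a single address ($address). */
enum cache_kind { SPLIT_CACHE, UNIFIED_CACHE };

struct toy_cpu {
    enum cache_kind kind;
    uint32_t data_view;    /* word that loads and stores see at $address */
    uint32_t icache_line;  /* copy held by the I-Cache (split model only) */
    int icache_valid;      /* nonzero once the I-Cache holds a copy */
};

/* Instruction fetch at $address (used by step 1 and step 3). */
uint32_t toy_fetch(struct toy_cpu *cpu) {
    if (cpu->kind == UNIFIED_CACHE)
        return cpu->data_view;            /* one cache: always the latest word */
    if (!cpu->icache_valid) {             /* first fetch fills the I-Cache */
        cpu->icache_line = cpu->data_view;
        cpu->icache_valid = 1;
    }
    return cpu->icache_line;              /* later fetches hit the stale copy */
}

/* Data store to $address (step 2): the I-Cache is never invalidated here. */
void toy_store(struct toy_cpu *cpu, uint32_t insn) {
    cpu->data_view = insn;
}

/* Run the three steps and return the instruction word executed in step 3. */
uint32_t toy_three_steps(enum cache_kind kind, uint32_t old_insn, uint32_t new_insn) {
    struct toy_cpu cpu = { kind, old_insn, 0, 0 };
    (void)toy_fetch(&cpu);        /* step 1: execute the old instruction */
    toy_store(&cpu, new_insn);    /* step 2: overwrite it through the data side */
    return toy_fetch(&cpu);       /* step 3: execute $address again */
}
```

Calling toy_three_steps(SPLIT_CACHE, old, new) gives back the old word, while UNIFIED_CACHE gives back the new one, which is exactly the difference the detection relies on. The model deliberately ignores cache eviction, so it does not cover the case where the I-Cache line is flushed between the steps.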
0x03 show me the code
First, we design a piece of code that rewrites the instruction at a specific address. Because we need to return to that address and execute it again, we can implement this with a loop. The code is as follows:
#!cpp
__asm __volatile (
    "stmfd sp!, {r4-r8, lr}\n"
    "mov r6, #0\n"          /* r6 counts loop iterations (for debugging) */
    "mov r7, #0\n"          /* initial value for r7 */
    "mov r8, pc\n"          /* with the ldr below: r8 = address of the "new instruction" */
    "mov r4, #0\n"          /* initial value for r4 */
    "add r7, #1\n"          /* the "new instruction" that will overwrite $address */
    "ldr r5, [r8]\n"        /* r5 = encoding of "add r7, #1" */
    "code:\n"
    "add r4, #1\n"          /* this is $address; the old instruction adds 1 to r4 */
    "mov r8, pc\n"          /* these three instructions write "add r7, #1" */
    "sub r8, #12\n"         /* over the "add r4, #1" at $address */
    "str r5, [r8]\n"
    "add r6, #1\n"          /* count this iteration */
    "cmp r4, #10\n"         /* loop bound on r4 */
    "bge out\n"
    "cmp r7, #10\n"         /* loop bound on r7 */
    "bge out\n"
    "b code\n"              /* loop back, 10 iterations in total */
    "out:\n"
    "mov r0, r4\n"          /* return the value of r4 */
    "ldmfd sp!, {r4-r8, pc}\n"
);
The comments explain the details. In short, if r4 is 10, the old instruction was executed every time, so the code is running on a real device. If r4 is 1, the new instruction was executed, so the code is running on the emulator.
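To see where the values 10 and 1 come from, the loop can be traced in plain C. This is a rough sketch, not the PoC itself: simulate_loop and its stale_fetch flag are hypothetical names, and the flag simply assumes that a split cache keeps serving the old instruction for the whole run.

```c
struct loop_result {
    int r4;  /* incremented by the old instruction "add r4, #1" */
    int r7;  /* incremented by the new instruction "add r7, #1" */
    int r6;  /* number of loop iterations */
};

/* stale_fetch != 0 models a split cache (fetch keeps seeing the old
 * instruction); stale_fetch == 0 models a unified cache. */
struct loop_result simulate_loop(int stale_fetch) {
    struct loop_result r = { 0, 0, 0 };
    int patched = 0;              /* has "str r5, [r8]" run yet? */

    r.r7 += 1;                    /* "add r7, #1" runs once before the loop */
    for (;;) {
        if (patched && !stale_fetch)
            r.r7 += 1;            /* $address now executes the new instruction */
        else
            r.r4 += 1;            /* $address still executes the old instruction */
        patched = 1;              /* the store overwrites $address */
        r.r6 += 1;
        if (r.r4 >= 10) break;    /* cmp r4, #10 / bge out */
        if (r.r7 >= 10) break;    /* cmp r7, #10 / bge out */
    }
    return r;
}
```

With stale_fetch set, the trace ends with r4 = 10 and r7 = 1; with it cleared, r4 stays at 1 (only the first pass runs the old instruction) and r7 climbs to 10. Both cases take 10 iterations.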
Here we run into a problem: we do not have permission to write to the code segment. The solution is to mmap a block of memory, copy the compiled machine code into it, and then jump there to execute it.
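#!cpp
#include <string.h>
#include <sys/mman.h>

void (*call)(void);
#define PROT  (PROT_EXEC | PROT_WRITE | PROT_READ)
#define FLAGS (MAP_ANONYMOUS | MAP_FIXED | MAP_SHARED)

/* Machine code of the assembly routine above (ARM, little-endian). */
char code[] =
    "\xF0\x41\x2D\xE9\x00\x60\xA0\xE3\x00\x70\xA0\xE3\x0F\x80\xA0\xE1"
    "\x00\x40\xA0\xE3\x01\x70\x87\xE2\x00\x50\x98\xE5\x01\x40\x84\xE2"
    "\x0F\x80\xA0\xE1\x0C\x80\x48\xE2\x00\x50\x88\xE5\x01\x60\x86\xE2"
    "\x0A\x00\x54\xE3\x02\x00\x00\xAA\x0A\x00\x57\xE3\x00\x00\x00\xAA"
    "\xF5\xFF\xFF\xEA\x04\x00\xA0\xE1\xF0\x81\xBD\xE8";

/* Map one readable, writable, executable page at a fixed address. */
void *exec = mmap((void *)0x10000000, (size_t)4096, PROT, FLAGS, -1, (off_t)0);
/* sizeof(code) already includes the trailing NUL, so no +1 is needed. */
memcpy(exec, code, sizeof(code));
/* Jump to the copied routine. */
call = (void (*)(void))exec;
call();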
This allocates a block of memory, copies the machine code of the assembly routine into it, and then jumps to that memory to execute it. Afterwards we can read the value of r4.
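The snippet above does not check whether mmap succeeded. A slightly more defensive variant might look like the sketch below. It is an assumption-laden illustration: map_code is a hypothetical helper, and unlike the PoC it lets the kernel choose the address (no MAP_FIXED at 0x10000000) and uses MAP_PRIVATE instead of MAP_SHARED.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1   /* expose MAP_ANONYMOUS under strict -std= modes */
#endif
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/* Hypothetical helper: map an anonymous region with the requested
 * protection (plus write access for the copy) and copy len bytes of
 * machine code into it. Returns NULL on failure. */
void *map_code(const void *code, size_t len, int prot) {
    if (code == NULL || len == 0)
        return NULL;

    void *mem = mmap(NULL, len, prot | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return NULL;              /* e.g. the system refuses the mapping */

    memcpy(mem, code, len);       /* copy exactly len bytes, no over-read */
    return mem;
}
```

On an ARM target one would pass PROT_READ | PROT_EXEC, cast the result to void (*)(void) and call it, as in the original snippet. The page has to stay writable because the routine patches itself, which is why the helper always adds PROT_WRITE. No cache flush is issued after the copy here, mirroring the original snippet.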
#!cpp
__asm __volatile ("mov %0, r0\n" : "=r"(a) ::);
This puts r0, that is, the value of r4, into the variable a. We then return different values depending on a, which makes it easy to check the result in the application.
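The mapping from the counter to a return value could be as simple as the following sketch. The enum and the function name classify_r4 are made up for illustration, and treating any value other than 10 or 1 as "unknown" is my own assumption (for example, if the I-Cache line is flushed part-way through the loop).

```c
/* Hypothetical result codes for the caller. */
enum run_env {
    ENV_UNKNOWN = 0,      /* assumption: anything that is neither 10 nor 1 */
    ENV_REAL_DEVICE = 1,  /* r4 == 10: the old instruction ran every time */
    ENV_EMULATOR = 2      /* r4 == 1: the new instruction took over */
};

/* Classify the environment from the value of r4 returned in r0. */
enum run_env classify_r4(int r4) {
    if (r4 == 10)
        return ENV_REAL_DEVICE;
    if (r4 == 1)
        return ENV_EMULATOR;
    return ENV_UNKNOWN;
}
```

Keeping a separate "unknown" value lets the application decide for itself how to treat a borderline run instead of forcing it into one of the two cases.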
0x04 debugging
For the debugging method, see Dr. Zheng's article "Seven Weapons of Android Dynamic Debugging: Peacock Plume - IDA Pro".
For debugging, the code from the previous section is compiled into a .so shared library. The return value is r0, which holds the value of r4 (the variable a); the application can then determine which environment it is running in from the return value.
Set a breakpoint just before the jump to 0x10000000, then step in with F7.
Once inside, set a breakpoint at mov r0, r4 and run to it with F9. At this point the value of r4 is 10, which is the result of the test on a real device. You can see that the original add r4, #1 in memory has been changed to add r7, #1, but the instruction actually executed is still add r4, #1.
The result on the emulator is as follows. The value of r4 is 1 and r7 is 10, so the new instruction was executed on the emulator:
0x05 Test
I don't know whether this works on other devices. You can download the code from https://github.com/leonnewton/cache_test for testing.