Exploiting the BadIRET vulnerability
In Linux kernel versions before 3.17.5, arch/x86/kernel/entry_64.S does not properly handle faults associated with the SS (stack segment) register. This allows a local user to escalate privileges by triggering an IRET instruction that leads to an access to the GS base address from the wrong address space. The vulnerability, CVE-2014-9322, was fixed by the Linux kernel community on November 23, 2014, and no public exploit code, or even related discussion, appeared in the weeks that followed. Just as people were about to forget the threat, Rafal Wojtczuk's analysis, Exploiting "BadIRET" vulnerability (unfinished translation, see Note 1), came as a reminder not to let the defense of your servers slip. Rafal did his research and testing on 64-bit Fedora 20 with kernel 3.11.10-301, and found that the bug can be exploited even with SMEP enabled; SMAP, by contrast, does stop the exploit. It is also a reminder to enterprises and individuals who neglect routine security operations and maintenance that known vulnerabilities must be fixed, because you never know where your adversaries are buying and selling digital weapons.
Note 1:
Author: Rafal Wojtczuk, Feb 2 2015
Original: Exploiting "BadIRET" vulnerability (CVE-2014-9322, Linux kernel privilege escalation)
http://labs.bromium.com/2015/02/02/exploiting-badiret-vulnerability-cve-2014-9322-linux-kernel-privilege-escalation/
Translator: Shawn the R0ck (other contributors: append your name here)

--[ 0. Intro

CVE-2014-9322 is described as follows:

-------------------------------------------------------------------
arch/x86/kernel/entry_64.S in the Linux kernel before 3.17.5 does not properly handle faults associated with the SS (stack segment) register, which allows local users to gain privileges by triggering an IRET instruction that leads to access to a GS base address from the wrong address space.
-------------------------------------------------------------------

The vulnerability was fixed by the community on November 23, 2014 [2]. So far I have seen neither public exploit code nor a detailed discussion. In this article I will try to explain the nature of the vulnerability and the exploitation steps. Unfortunately, I cannot quote the whole Intel Software Developer's Manual (SDM) [3]; readers who are unfamiliar with some of the terms can look them up there.

All experiments were done on 64-bit Fedora 20 with kernel 3.11.10-301. The whole discussion is specific to 64-bit.

Summary of results:

1. On the tested kernel, the vulnerability can be exploited completely and reliably.
2. SMEP [4] does not prevent arbitrary code execution; SMAP [5] does prevent it.

--[ 1. Digression: kernel, usermode, iret

(See the figure in the original article: http://labs.bromium.com/2015/02/02/exploiting-badiret-vulnerability-cve-2014-9322-linux-kernel-privilege-escalation/)

--[ 2. The vulnerability

In a few cases, the iret instruction raises an exception when the Linux kernel uses it to return to user space. The exception handler returns execution to the bad_iret function.
bad_iret does the following:

-------------------------------------------------------------------
    /* So pretend we completed the iret and took the #GPF in user mode. */
    pushq $0
    SWAPGS
    jmp general_protection
-------------------------------------------------------------------

As the comment explains, the code flow that follows should be identical to the case where a general protection exception (#GP) happens in user space: it simply jumps to the #GP handler. This works well for most of the exceptions that iret can raise, e.g. #GP.

The problematic case is the #SS exception. If a vulnerable kernel (that is, before 3.17.5) also has the "espfix" feature (introduced around 3.16), bad_iret executes its "push" instruction on a read-only stack, which causes a page fault that turns into a double fault. I have not analysed this scenario. From now on we focus only on kernels before 3.16, without "espfix".

The root of the vulnerability is that the #SS exception handler does not fit the "pretend-it-was-#GP-in-userspace" [6] scheme: compared with the #GP handler, the #SS handler executes one extra swapgs instruction. If you are not familiar with swapgs, read the next section; otherwise you can skip it.

--[ 3. Digression: the swapgs instruction

When memory is accessed with a gs segment prefix, the following actually happens:

1. The BASE_ADDRESS value is retrieved from the hidden part of the segment register.
2. The memory at linear address LOGICAL_ADDRESS + BASE_ADDRESS is dereferenced (Shawn: given char *p, *p is a dereference).

The base address is initially derived from the GDT (or LDT). However, there are situations where the GS segment base is changed on the fly, without involving the GDT. Quoting the Intel SDM:

-------------------------------------------------------------------
"SWAPGS exchanges the current GS base register value with the value contained in MSR address C0000102H (IA32_KERNEL_GS_BASE). The SWAPGS instruction is a privileged instruction intended for use by system software. (...) The kernel can then use the GS prefix on normal memory references to access [per-cpu] kernel data structures."
-------------------------------------------------------------------

At boot, the Linux kernel allocates a fixed-size structure for each CPU to hold crucial data.
Then, for each CPU, it loads IA32_KERNEL_GS_BASE with the address of that CPU's structure. The usual pattern in, for example, a system call handler is therefore:

1. swapgs (the GS base now points to kernel memory)
2. access the per-cpu kernel data structures through memory instructions with a gs prefix
3. swapgs (undoes the previous swapgs; the GS base points to user space memory again)
4. return to user space

Naturally, kernel code must ensure that whenever it wants to access per-cpu data with the gs prefix, the number of swapgs instructions executed by the kernel since entry from usermode is odd (so that the gs base points to kernel memory).

--[ 4. Triggering the vulnerability

By now it should be obvious that the vulnerability is grave: because of the extra swapgs in the vulnerable code path, the kernel will try to access important data structures through a wrong GS base address, one that the user can control.

When does the iret instruction raise an #SS exception? Interestingly, the Intel SDM is incomplete on this point (Shawn: Big Brother?). In its description of the iret instruction it says:

-------------------------------------------------------------------
64-Bit Mode Exceptions:
#SS(0)  If an attempt to pop a value off the stack violates the SS limit.
        If an attempt to pop a value off the stack causes a non-canonical address to be referenced. (Shawn: in 64-bit mode only canonical addresses can be accessed.)
-------------------------------------------------------------------

Neither condition can be forced to happen in kernel mode. However, the iret pseudocode in the same manual shows another case, when the segment defined by the return frame is not present:

-------------------------------------------------------------------
    IF stack segment is not present
        THEN #SS(SS selector); FI;
-------------------------------------------------------------------

So, in user space, we need to set the ss register to a selector that refers to a segment that is not present.
Getting such a selector into ss is not straightforward. We cannot just use:

-------------------------------------------------------------------
    mov $nonpresent_segment_selector, %eax
    mov %ax, %ss
-------------------------------------------------------------------

because the second instruction raises #GP. Setting ss through a debugger (i.e. ptrace) is not allowed; similarly, the sys_sigreturn system call does not set this register on 64-bit systems (it might work on 32-bit). The solution is:

1. thread A: create a custom segment X in the LDT through the sys_modify_ldt system call
2. thread B: ss := X_selector
3. thread A: invalidate X through sys_modify_ldt
4. thread B: wait for a hardware interrupt

Two threads (both in the same process) are needed because the return from a system call (including sys_modify_ldt) is done with the sysret instruction, which hardcodes the ss value. If we invalidated X in the same thread that executed "ss := X", the ss setting would be undone.

Running the above code results in a kernel panic. To do something more meaningful we need to control the user space gs base; it can be set through the arch_prctl(ARCH_SET_GS) system call.

--[ 5. Achieving write primitive

If we run the above code, the #SS handler runs fine (meaning it does not touch memory at the gs base) and returns into bad_iret, which in turn jumps to the #GP exception handler. That handler runs fine for a while and then calls this function:

-------------------------------------------------------------------
289 dotraplinkage void
290 do_general_protection(struct pt_regs *regs, long error_code)
291 {
292         struct task_struct *tsk;
...
306         tsk = current;
307         if (!user_mode(regs)) {
                ... it is not reached
317         }
318
319         tsk->thread.error_code = error_code;
320         tsk->thread.trap_nr = X86_TRAP_GP;
321
322         if (show_unhandled_signals && unhandled_signal(tsk, SIGSEGV) &&
323                         printk_ratelimit()) {
324                 pr_info("%s[%d] general protection ip:%lx sp:%lx error:%lx",
325                         tsk->comm, task_pid_nr(tsk),
326                         regs->ip, regs->sp, error_code);
327                 print_vma_addr(" in ", regs->ip);
328                 pr_cont("\n");
329         }
330         force_sig_info(SIGSEGV, SEND_SIG_PRIV, tsk);
331 exit:
332         exception_exit(prev_state);
333 }
-------------------------------------------------------------------

It is far from obvious from the C code, but the assignment to tsk from the current macro is a memory read with a gs prefix. Line 306 is actually:

-------------------------------------------------------------------
0xffffffff8164b79d: mov %gs:0xc780,%rbx
-------------------------------------------------------------------

This gets interesting. We control the current pointer, which points to the huge data structure that describes the whole Linux process. In particular, the lines

-------------------------------------------------------------------
319         tsk->thread.error_code = error_code;
320         tsk->thread.trap_nr = X86_TRAP_GP;
-------------------------------------------------------------------

are writes to addresses (at a fixed offset from the start of task_struct) that we control. Note that the written values themselves cannot be controlled (they are the constants 0 and 0xd respectively), but this should not be a problem. Game over?

Not quite. Say we want to overwrite some important kernel data structure at X. We would take these steps:

1. Prepare user space memory at FAKE_PERCPU and set the gs base to it.
2. Make the location FAKE_PERCPU + 0xc780 hold the pointer FAKE_CURRENT_WITH_OFFSET, such that FAKE_CURRENT_WITH_OFFSET = X - offsetof(struct task_struct, thread.error_code).
3. Trigger the vulnerability.

do_general_protection will then indeed write to X. But soon afterwards it accesses other fields of the current task_struct again; for example, unhandled_signal() dereferences a pointer taken from task_struct. We have no control over what lies beyond X, so the result is a page fault in the kernel. How can we cope with this? The options are:

1. Do nothing. Unlike Windows, the Linux kernel is quite permissive when an unexpected page fault occurs in kernel mode.
   If possible, the kernel kills the current process and tries to continue (Windows bluescreens immediately). This does not work here: the result is massive kernel data corruption. My guess is that after the current process is killed the swapgs imbalance persists, which leads to more page faults in the context of other processes.

2. Use the "tsk->thread.error_code = error_code" write to overwrite the IDT entry of the page fault handler. The page fault that follows (triggered by, say, unhandled_signal()) would then run our code. This technique has been successful on a few occasions before, but it does not work here, for two reasons:
   * Linux makes the IDT read-only
   * even if the IDT were writable, we do not control the value written: it is 0 or 0xd. SMEP/SMAP is also a problem.

3. We can try a race. Say the "tsk->thread.error_code = error_code" write facilitates code execution, e.g. allows us to control a code pointer P that is called via SOME_SYSCALL. Then we can trigger our vulnerability on CPU 0 while CPU 1 runs SOME_SYSCALL in a loop. The idea is that we get code execution via CPU 1 before damage is done on CPU 0, and e.g. hook the page fault handler so that CPU 0 can do no more harm. I tried this approach a couple of times, with no luck; perhaps with a different vulnerability the timings would be different and it would work better.

4. Throw in the towel on the "tsk->thread.error_code = error_code" write.

With some disgust, we will follow the last option. We point current at user space and set the pointers in it so that read dereferences hit memory we control. Naturally, we then inspect the code that follows, looking for more write dereferences.

--[ 6. Achieving write primitive continued, aka life after do_general_protection

The next opportunity is the function called by do_general_protection():

-------------------------------------------------------------------
int
force_sig_info(int sig, struct siginfo *info, struct task_struct *t)
{
        unsigned long int flags;
        int ret, blocked, ignored;
        struct k_sigaction *action;

        spin_lock_irqsave(&t->sighand->siglock, flags);
        action = &t->sighand->action[sig-1];
        ignored = action->sa.sa_handler == SIG_IGN;
        blocked = sigismember(&t->blocked, sig);
        if (blocked || ignored) {
                action->sa.sa_handler = SIG_DFL;
                if (blocked) {
                        sigdelset(&t->blocked, sig);
                        recalc_sigpending_and_wake(t);
                }
        }
        if (action->sa.sa_handler == SIG_DFL)
                t->signal->flags &= ~SIGNAL_UNKILLABLE;
        ret = specific_send_sig_info(sig, info, t);
        spin_unlock_irqrestore(&t->sighand->siglock, flags);

        return ret;
}
-------------------------------------------------------------------

sighand, a member of task_struct, is a pointer that we can set to any value. This means that the lines

-------------------------------------------------------------------
        action = &t->sighand->action[sig-1];
        action->sa.sa_handler = SIG_DFL;
-------------------------------------------------------------------

are another chance for a write primitive to an arbitrary location. Again, we cannot control the written value: SIG_DFL is the constant 0.

This finally works, with a little twist. Suppose we want to overwrite the kernel address X. To this end we prepare a forged task_struct (in particular its sighand field) so that X equals the address of t->sighand->action[sig-1].sa.sa_handler. But note the line a little earlier:

-------------------------------------------------------------------
        spin_lock_irqsave(&t->sighand->siglock, flags);
-------------------------------------------------------------------

t->sighand->siglock lies at a constant offset from t->sighand->action[sig-1].sa.sa_handler, so the kernel will call spin_lock_irqsave on some address, say X + SPINLOCK, whose content we do not control. What happens then? There are two possibilities:

1. The memory at X + SPINLOCK looks like an unlocked spinlock. spin_lock_irqsave completes immediately, and the final spin_unlock_irqrestore undoes the writes made by spin_lock_irqsave.

2. The memory at X + SPINLOCK looks like a locked spinlock. If we do not intervene, spin_lock_irqsave will loop forever waiting for the spinlock. This is worrying. To get around this obstacle we need another assumption: we need to know the content of the memory at X + SPINLOCK. This is acceptable; as we will see later, X will be placed in the kernel .data section. We then proceed as follows:

   * First, prepare FAKE_CURRENT so that t->sighand->siglock points to a locked spinlock in user space, at SPINLOCK_USERMODE.
   * force_sig_info() will hang in spin_lock_irqsave.
   * At this point another user space thread, running on another CPU, changes t->sighand so that t->sighand->action[sig-1].sa.sa_handler becomes our overwrite target, and then unlocks SPINLOCK_USERMODE.
   * spin_lock_irqsave returns.
   * force_sig_info() reloads t->sighand and performs the desired write.

Careful readers are encouraged to ask why the second approach cannot be used when X + SPINLOCK is initially unlocked.

This is not all: we also need to prepare a few more fields in FAKE_CURRENT so that as little code as possible is executed. I will spare you the details; this blog post is long enough already.

What happens next? force_sig_info() returns, and do_general_protection() returns. The iret that follows raises #SS again (because the user space ss value on the stack still refers to a not-present segment), but this time the extra swapgs in the #SS handler restores the balance, cancelling the effect of the earlier incorrect swapgs. do_general_protection() is invoked and operates on the real task_struct instead of the forged FAKE_CURRENT. Finally, the current task is sent SIGSEGV and another process is scheduled to run. The system remains stable.

(See the figure in the original article.)
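The lock hand-off from section 6 can be imitated entirely in user space. The sketch below is a simulation under stated assumptions, not the author's code: fake_sighand, fake_task, victim and run_handoff are hypothetical names, the structures are drastically simplified stand-ins for sighand_struct and task_struct, a plain atomic integer replaces the kernel spinlock, and the victim_spinning flag exists only to make the demonstration deterministic (nothing like it exists in the kernel). One thread follows the access pattern of force_sig_info(): it locks through t->sighand, then re-reads t->sighand for the write and the unlock. The other thread retargets the pointer and only then releases the user-mode lock.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct fake_sighand {
    atomic_int siglock;  /* 1 = looks locked, 0 = looks unlocked */
    long sa_handler;     /* stands in for action[sig-1].sa.sa_handler */
};

struct fake_task {
    _Atomic(struct fake_sighand *) sighand;
    atomic_int victim_spinning;  /* demo-only synchronisation flag */
};

/* Same access pattern as force_sig_info(): take the lock through
 * t->sighand, then reload t->sighand for the write and the unlock. */
static void *victim(void *arg)
{
    struct fake_task *t = arg;
    assert(t != NULL);
    struct fake_sighand *sh = atomic_load(&t->sighand);
    int expected = 0;

    atomic_store(&t->victim_spinning, 1);
    while (!atomic_compare_exchange_weak(&sh->siglock, &expected, 1)) {
        expected = 0;                  /* still locked: keep spinning */
        sched_yield();
    }
    sh = atomic_load(&t->sighand);     /* reloaded: may point elsewhere now */
    sh->sa_handler = 0;                /* the constant SIG_DFL */
    atomic_store(&sh->siglock, 0);     /* unlock also uses the new pointer */
    return NULL;
}

/* Returns 1 if the write landed in `target` and left `decoy` untouched. */
static int run_handoff(void)
{
    struct fake_sighand decoy, target;
    struct fake_task task;
    pthread_t tid;

    atomic_init(&decoy.siglock, 1);    /* plays SPINLOCK_USERMODE, locked */
    decoy.sa_handler = 0x1111;
    atomic_init(&target.siglock, 1);   /* plays X + SPINLOCK, looks locked */
    target.sa_handler = 0x2222;        /* plays X */
    atomic_init(&task.sighand, &decoy);
    atomic_init(&task.victim_spinning, 0);

    if (pthread_create(&tid, NULL, victim, &task) != 0)
        return 0;
    while (!atomic_load(&task.victim_spinning))
        sched_yield();                 /* victim now holds the decoy pointer */
    atomic_store(&task.sighand, &target);  /* retarget first ... */
    atomic_store(&decoy.siglock, 0);       /* ... then release the lock */
    pthread_join(tid, NULL);

    return target.sa_handler == 0 &&
           atomic_load(&target.siglock) == 0 &&
           decoy.sa_handler == 0x1111;
}
```

run_handoff() returns 1: the zero ends up in the target even though the lock that was actually waited on belonged to the decoy, and the final unlock clears the target's lock word, which looked locked from the start. The order of the two stores in the helper matters; releasing the lock before retargeting the pointer would let the victim reload the old pointer and write to the decoy instead.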
[1] CVE-2014-9322
    http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2014-9322
[2] Upstream fix
    http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=6f442be2fb22be02cafa606f1769fa1e6f894441
[3] Intel Software Developer's Manuals
    http://www.intel.com/content/www/us/en/processors/architectures-software-developer-manuals.html
[4] SMEP
    http://vulnfactory.org/blog/2011/06/05/smep-what-is-it-and-how-to-beat-it-on-linux/
[5] SMAP
    http://lwn.net/Articles/517475
[6] "pretend-it-was-#GP-in-userspace"
    https://lists.debian.org/debian-kernel/2014/12/msg00083.html