The mysterious case of the Linux page table isolation patches
Errata and supplement in this article
TL;DR: There is currently an embargoed security bug (LCTT note: it has since been partially disclosed) that affects almost all CPU architectures that implement virtual memory, and hardware changes are required to fix it completely. Urgent development of a software mitigation is under way; it recently landed in the Linux kernel, and a similar mitigation began appearing in the NT kernel in November. In the worst case, the software fix causes huge slowdowns in typical workloads. There are hints that the attack affects virtualization environments, including Amazon EC2 and Google Compute Engine, and further hints that the exact attack may involve a new Rowhammer variant (LCTT note: Rowhammer is a DRAM security vulnerability described by Google's security team; it is briefly introduced later in the article).
I generally don't pay much attention to security issues, but this bug made me curious. The people who usually write about this kind of topic seem to be too busy, and those who know the details are staying silent. So I spent a few hours on New Year's Day digging for more information about the mystery, and pieced together what I found.
Note that this is very much an exercise in connecting dots, so most of what follows is speculation until the embargo is lifted. From what I have seen, including the vendors involved, plenty of arguments and drama can be expected on the day it is lifted.
LWN
The story begins with the LWN article "The current state of kernel page-table isolation", published on December 20. Its tone makes the urgency of the work evident: core kernel developers rushed to join the development of the KAISER patch series, which was first published in July by a group of researchers from TU Graz in Austria.
The purpose of these patches is conceptually simple: to prevent a variety of attacks by unmapping kernel-space pages from the process page table while the process is running in user space, which greatly hinders attempts to identify kernel virtual addresses from unprivileged user-space code.
The group's paper describing KAISER, "KASLR is Dead: Long Live KASLR", specifically mentions in its abstract removing all knowledge of the kernel address space from the memory management hardware while user code is active on the CPU.
What makes this patch set remarkable is that it touches a core, fundamental pillar of the kernel (and its interface with user space), and that it is obviously being rushed through with the highest priority. With memory management changes in Linux, the first mention of a change usually comes long before the change is merged, and the change usually goes through many rounds of review, rejection, and argument along the way.
The KAISER (now KPTI) series went from introduction to merge in less than three months.
ASLR Overview
On the surface, these patches are designed to ensure that Address Space Layout Randomization (ASLR) remains effective. ASLR is a security feature of modern operating systems that tries to introduce as many random bits as possible into the addresses of commonly mapped objects.
For example, when /usr/bin/python is invoked, the dynamic linker arranges for the system C library, the heap, the thread stack, and the main executable to all receive randomly assigned address ranges:
$ bash -c 'grep heap /proc/$$/maps'
019de000-01acb000 rw-p 00000000 00:00 0 [heap]
$ bash -c 'grep heap /proc/$$/maps'
023ac000-02499000 rw-p 00000000 00:00 0 [heap]
Notice how the start and end addresses of the bash process's heap change between the two runs.
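As a rough illustration of the comparison above, the following Python sketch parses the two sample lines from the transcript and compares the heap ranges. The helper names (heap_range, read_own_heap_range) are made up for this example, and read_own_heap_range assumes a Linux system with /proc mounted.

```python
import re

# Matches the address range at the start of a /proc/<pid>/maps line,
# e.g. "019de000-01acb000 rw-p 00000000 00:00 0 [heap]".
MAPS_RANGE = re.compile(r"^([0-9a-f]+)-([0-9a-f]+)\s")


def heap_range(maps_text):
    """Return (start, end) of the [heap] mapping, or None if there is none."""
    for line in maps_text.splitlines():
        if line.rstrip().endswith("[heap]"):
            match = MAPS_RANGE.match(line)
            if match:
                return int(match.group(1), 16), int(match.group(2), 16)
    return None


def read_own_heap_range(path="/proc/self/maps"):
    """Linux only (assumption): heap range of the current process."""
    with open(path) as maps_file:
        return heap_range(maps_file.read())


# The two sample lines from the transcript above.
run1 = heap_range("019de000-01acb000 rw-p 00000000 00:00 0 [heap]")
run2 = heap_range("023ac000-02499000 rw-p 00000000 00:00 0 [heap]")

print("run 1:", [hex(a) for a in run1], "size", hex(run1[1] - run1[0]))
print("run 2:", [hex(a) for a in run2], "size", hex(run2[1] - run2[0]))
```

In the two sample runs the heap has the same size (0xed000 bytes) and only its base address moves, which is the behavior ASLR is meant to produce.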
The effect is as follows. Suppose a buffer management bug lets an attacker overwrite a memory address that the program later uses in its control flow, so that the attacker can redirect control to a buffer whose contents they choose. With ASLR, it becomes much harder for the attacker to fill that buffer with machine code that does what they want (for example, calling the system() C library function), because the address of that function differs from one run to the next.
This is a simple example. ASLR is designed to protect against many scenarios like it, including preventing attackers from learning the addresses of program data that could be used to modify control flow or to launch an attack.
KASLR is simply ASLR applied to the kernel itself: on each reboot, the address ranges belonging to the kernel are randomized, so that an attacker who diverts control flow while running in kernel mode cannot guess the addresses of the functions and structures needed to achieve their goal, such as locating the current process's data and elevating the active UID from an unprivileged user to root.
Bad news: the software mitigation is expensive
Previously, Linux mapped kernel memory into the same page tables as user memory. The main reason is that when the user's code triggers a system call or a fault, or when an interrupt fires, the virtual memory layout of the running process does not need to change.
Because the virtual memory layout does not need to change, there is also no need to flush the highly performance-sensitive CPU caches that depend on that layout (flushing them reduces CPU performance), chiefly the Translation Lookaside Buffer (TLB) (LCTT note: the TLB caches translations from virtual addresses to physical addresses).
With the page table splitting patches merged, the kernel has to flush these caches every time the kernel begins executing and every time user code resumes executing. For most workloads, the effective loss of the TLB around every system call leads to a clearly visible slowdown: @grsecurity measured a simple case in which the Linux du -s command slowed down by 50% on a recent AMD CPU.
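To get a feel for why per-system-call overhead matters, here is a small Python sketch that times a cheap system call in a tight loop and compares it with a plain function call. This is only an illustration: interpreter overhead dominates the numbers, the choice of os.getppid() and the iteration count are assumptions of this example, and the sketch does not isolate the cost of a TLB flush. Running the same script on the same machine with and without page table isolation would be one way to observe the difference.

```python
import os
import time


def time_per_call(func, iterations=200_000):
    """Average wall-clock seconds per call of func over a tight loop."""
    start = time.perf_counter()
    for _ in range(iterations):
        func()
    return (time.perf_counter() - start) / iterations


def noop():
    """Baseline: a Python call that never enters the kernel."""
    return None


# os.getppid() enters the kernel on every call, so each iteration pays
# the user/kernel transition cost discussed above.
syscall_cost = time_per_call(os.getppid)
baseline_cost = time_per_call(noop)

print(f"getppid(): {syscall_cost * 1e9:.0f} ns per call")
print(f"no-op:     {baseline_cost * 1e9:.0f} ns per call")
```

A workload such as du -s spends most of its time making system calls, so any fixed cost added to every kernel entry and exit shows up almost directly in its total run time.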
34C3
At this year's CCC conference, you can find another TU Graz researcher describing a pure JavaScript ASLR attack. It works by carefully timing the operation of the CPU's memory management unit as it traverses the page tables that describe the virtual memory layout. Through a combination of high-precision timing and selective eviction of CPU cache lines, a JavaScript program running in a web browser can recover the virtual address of a JavaScript object, which enables subsequent attacks that exploit browser memory management bugs. (LCTT note: the author later said that the CCC talk linked above has nothing to do with the KAISER patches and that he had made a mistake.)
So, on the surface, we have the group that wrote the KAISER patches also demonstrating a technique for unmasking ASLR-protected addresses. Moreover, the technique, demonstrated in JavaScript, could readily be redeployed against an operating system kernel.
Virtual Memory Overview
In general, when machine code tries to load from, store to, or jump to a memory address, a modern CPU must first translate that virtual address into a physical address. It does so by walking a series of arrays managed by the operating system (called page tables) that describe the mapping between virtual addresses and the physical memory installed in the machine.
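The translation described above can be sketched as a toy model in Python. The layout used here is an assumption that the article does not state: x86-64 style 4-level paging with 4 KiB pages, where a 48-bit virtual address splits into four 9-bit table indices and a 12-bit page offset. Nested dicts stand in for the page tables, and map_page and translate are hypothetical helpers.

```python
PAGE_SHIFT = 12   # assumed 4 KiB pages: low 12 bits are the page offset
INDEX_BITS = 9    # assumed 512 entries per table
LEVELS = 4        # assumed four levels of page tables


def split_virtual_address(vaddr):
    """Split an address into per-level indices (top level first) and offset."""
    offset = vaddr & ((1 << PAGE_SHIFT) - 1)
    indices = []
    for level in range(LEVELS - 1, -1, -1):
        shift = PAGE_SHIFT + level * INDEX_BITS
        indices.append((vaddr >> shift) & ((1 << INDEX_BITS) - 1))
    return indices, offset


def map_page(root, vaddr, frame):
    """Install a mapping from the page containing vaddr to a physical frame."""
    indices, _ = split_virtual_address(vaddr)
    node = root
    for index in indices[:-1]:
        node = node.setdefault(index, {})
    node[indices[-1]] = frame


def translate(root, vaddr):
    """Walk the tables; return the physical address, or None on a fault."""
    indices, offset = split_virtual_address(vaddr)
    node = root
    for index in indices:
        node = node.get(index)
        if node is None:
            return None  # no mapping at this level: a page fault
    return (node << PAGE_SHIFT) | offset


root = {}
map_page(root, 0x7F1234567ABC, frame=0x42)
print(hex(translate(root, 0x7F1234567ABC)))  # 0x42abc
print(translate(root, 0x1000))               # None (unmapped)
```

Each translation touches one entry per level, which is why the CPU caches the results in the TLB rather than repeating the walk on every memory access.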
Virtual memory may be the single most important robustness feature of modern operating systems. It is what prevents, for example, a dying process from crashing the operating system, a web browser bug from crashing your desktop environment, or one virtual machine running in Amazon EC2 from making changes to another virtual machine on the same host.
The attack exploits the fact that the CPU maintains a large number of caches. By carefully manipulating the contents of those caches, it is possible to infer which addresses the memory management unit accesses as it walks the various levels of the page tables, because an uncached access takes longer (in real time) than a cached access. By detecting which page table elements are accessed, it is possible to recover most of the bits of the virtual address that the MMU (LCTT note: Memory Management Unit) is busy resolving.
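The timing principle can be illustrated with a deliberately simplified simulation in Python. Nothing here touches real hardware: ToyCache, the latency constants, and the threshold are invented for this example, and a real attack has to cope with noise, limited timer precision, and cache associativity.

```python
HIT_COST = 1      # assumed latency units for a cached access
MISS_COST = 100   # assumed latency units for an uncached access


class ToyCache:
    """A toy cache that only tracks which lines are currently present."""

    def __init__(self):
        self.lines = set()

    def evict_all(self):
        self.lines.clear()

    def access(self, line):
        """Return the simulated latency and leave the line cached."""
        cost = HIT_COST if line in self.lines else MISS_COST
        self.lines.add(line)
        return cost


def recover_touched_lines(cache, victim, candidates, threshold=50):
    """Evict everything, let the victim run, then time each candidate line."""
    cache.evict_all()
    victim(cache)
    return [line for line in candidates if cache.access(line) < threshold]


SECRET_LINE = 37  # the line the "victim" touches; unknown to the attacker


def victim(cache):
    # Stands in for the MMU touching a page table entry during a walk.
    cache.access(SECRET_LINE)


print(recover_touched_lines(ToyCache(), victim, range(64)))  # [37]
```

The attacker never reads the victim's data; the only signal is which probe came back fast, and that alone reveals which line the victim touched.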
Evidence of motivation, but not of panic
We have found a motivation, but so far nothing that justifies the panic with which this work was introduced. ASLR in general is an imperfect mitigation and very much a last line of defense: barely six months go by without even a non-security-minded person reading news of some new way to unmask ASLR-protected pointers, and it has been this way for as long as ASLR has existed.
Fixing ASLR alone is not enough to explain the high priority behind this work.
Evidence that it is a hardware security bug
Reading through the patches, several things stand out.
First, as @grsecurity pointed out, some comments in the code have been redacted, and the main documentation file describing the work is entirely missing from the Linux source tree.
Examining the code, it is structured as a runtime patch that is applied at boot only when the kernel detects an affected system, using the same kind of mechanism that applies mitigations for the notorious Pentium F00F bug.
More clues: Microsoft has also implemented page table splitting
A quick dig through the FreeBSD source code suggests that, so far, other free operating systems have not implemented page table splitting. However, as Alex Ionescu noted on Twitter, the work is not limited to Linux: public NT kernels have implemented the same technique since November.
Guess: Rowhammer
Digging further into the work of the TU Graz researchers, we find "When rowhammer only knocks once", an announcement on December 4 of a new variant of the Rowhammer attack:
In this paper, we present novel Rowhammer attack and exploitation primitives, showing that even a combination of all defenses is ineffective. Our new attack technique breaks previous assumptions on the requirements for triggering the Rowhammer bug.
As a quick recap, Rowhammer is a class of problem fundamental to most (all?) kinds of commodity DRAM, such as the memory in an ordinary computer. By precisely manipulating one area of memory, it is possible to corrupt the contents of a physically related (but logically independent) area of memory. As a result, Rowhammer can be used to flip bits in memory that unprivileged user code should not be able to access, such as bits that describe how much access that code has to the rest of the system.
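To make concrete why a single flipped bit can matter, the following Python sketch flips one bit in a value laid out like an x86-64 page table entry. The layout is an assumption of this example rather than something the article specifies: bit 0 marks the entry present, bit 1 writable, and bit 2 accessible from user mode. The code only manipulates an integer; it does not perform or model the Rowhammer effect itself.

```python
PTE_PRESENT = 1 << 0    # assumed x86-64 layout: entry is valid
PTE_WRITABLE = 1 << 1   # assumed x86-64 layout: page may be written
PTE_USER = 1 << 2       # assumed x86-64 layout: user mode may access it
FRAME_SHIFT = 12


def make_pte(frame, flags):
    """Build a toy page table entry from a frame number and flag bits."""
    return (frame << FRAME_SHIFT) | flags


def user_accessible(pte):
    return bool(pte & PTE_PRESENT) and bool(pte & PTE_USER)


def flip_bit(value, bit):
    """Model a single-bit disturbance error by toggling one bit."""
    return value ^ (1 << bit)


# A kernel-only mapping: present and writable, but not user-accessible.
kernel_pte = make_pte(0x1234, PTE_PRESENT | PTE_WRITABLE)
flipped_pte = flip_bit(kernel_pte, 2)

print(user_accessible(kernel_pte))   # False
print(user_accessible(flipped_pte))  # True
```

A flip in the frame-number bits would be just as damaging, since it silently points the mapping at a different physical page.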
I found this Rowhammer work particularly interesting, not only because its release came so close to the page table splitting patches, but also because a Rowhammer attack requires a target: you must know the physical address of the bit you are trying to flip, and a first step toward learning a physical address may be learning a virtual address, as in the KASLR unmasking work.
Speculation: it affects major cloud providers
On the kernel mailing list we can see, in addition to the names of subsystem maintainers, e-mail addresses belonging to employees of Intel, Amazon, and Google. The presence of the two largest cloud providers is particularly interesting, because it gives us a strong hint that the work may be driven by virtualization security.
This leads to even more guesswork: the RAM of virtual machines, and the virtual memory addresses those virtual machines use, are ultimately represented as large contiguous arrays on the host. Those arrays, especially when a host has only two tenants, are assigned by the memory allocators in Xen and the Linux kernel, which probably behave in very predictable ways.
My favorite guess: it is a privilege escalation attack
Putting all of this together, it is not hard to make a prediction: it would not be surprising if 2018 starts with the disclosure of such a bug, or of something similarly systemic, which would explain both the urgency of the work and the many interested parties named on the patch set's CC list.
One final detail: although I could not find it again while reading through the patch set, some of the code specifically marks either paravirtual or HVM Xen as unaffected.
Grab the popcorn: 2018 is going to be interesting
It is entirely possible that these guesses are nowhere near reality, but one thing is certain: when all of this is made public, it will be a very exciting week.
Via: http://pythonsweetness.tumblr.com/post/169166980422/the-mysterious-case-of-the-linux-page-table
Author: python sweetness Translator: qhwdw Proofreader: wxy
This article was originally translated by LCTT and is proudly presented by Linux China.