Linux kernel space-Understanding high-end memory

Last Update:2018-02-12 Source: Internet

Author: User


The Linux operating system and its drivers run in kernel space, while applications run in user space. The two cannot simply pass data to each other through pointers. Linux uses virtual memory, so user-space data may have been swapped out, and when kernel code dereferences a user-space pointer, the corresponding data may not be in physical memory.

Linux Kernel address mapping model

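The x86 CPU uses a segmentation-plus-paging ("segment-page") address translation model. An address in process code is a logical address. Segmentation first turns it into a linear address, and paging then turns the linear address into the physical address that is actually accessed.

In short: logical address -> segmentation -> linear address -> paging -> physical address.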
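To make the paging step concrete, here is a minimal user-space sketch of the classic 32-bit x86 two-level walk without PAE: 4 KB pages, a 10-bit page-directory index, a 10-bit page-table index and a 12-bit offset. The helper names (`pgd_index`, `pte_index`, `offset_in_page`, `translate`) are hypothetical and are not the kernel's definitions. The sketch also makes one assumption so it can run outside the kernel: a directory entry stores an index into a `tables` array, whereas real hardware stores the physical address of a page table. For example, 0xC0101234 splits into directory index 768, table index 257 and offset 0x234.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT   12
#define PTRS_PER_TBL 1024
#define PTE_PRESENT  0x1u

/* Split a 32-bit linear address into its three fields (non-PAE layout). */
static inline uint32_t pgd_index(uint32_t lin)      { return lin >> 22; }                     /* bits 31..22 */
static inline uint32_t pte_index(uint32_t lin)      { return (lin >> PAGE_SHIFT) & 0x3FFu; }  /* bits 21..12 */
static inline uint32_t offset_in_page(uint32_t lin) { return lin & 0xFFFu; }                  /* bits 11..0 */

/* Software walk of a two-level table. Assumption for this sketch only:
 * bits 31..12 of a directory entry hold an index into tables[].
 * Returns 0 and stores *phys on success, -1 where the CPU would fault. */
int translate(const uint32_t page_dir[PTRS_PER_TBL],
              uint32_t tables[][PTRS_PER_TBL],
              uint32_t lin, uint32_t *phys)
{
    assert(phys != NULL);
    uint32_t pde = page_dir[pgd_index(lin)];
    if (!(pde & PTE_PRESENT))
        return -1;                                   /* no page table: page fault */
    uint32_t pte = tables[pde >> PAGE_SHIFT][pte_index(lin)];
    if (!(pte & PTE_PRESENT))
        return -1;                                   /* page not present: page fault */
    *phys = (pte & ~0xFFFu) | offset_in_page(lin);   /* frame base + page offset */
    return 0;
}
```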

Linux Kernel address space partitioning

Typically, the 32-bit Linux virtual address space is split so that 0 to 3 GB belongs to user space and 3 GB to 4 GB belongs to kernel space. Note that this describes the 32-bit layout. The 64-bit kernel partitions its address space differently.

The origin of high-end memory in the Linux kernel

When kernel modules or kernel threads access memory, the addresses in their code are logical addresses. To reach real physical memory, each logical address needs a one-to-one mapping to a physical address. For example, logical address 0xC0000003 corresponds to physical address 0x3, 0xC0000004 corresponds to 0x4, and so on. The relationship between logical and physical addresses is:

Physical address = logical address - 0xC0000000

Logical Address	Physical Address
0xc0000000	0x0
0xc0000001	0x1
0xc0000002	0x2
0xc0000003	0x3
...	...
0xe0000000	0x20000000
...	...
0xFFFFFFFF	0x3FFFFFFF

Under this simple mapping, the kernel's logical address space 0xC0000000 to 0xFFFFFFFF corresponds to the physical range 0x0 to 0x3FFFFFFF, so the kernel can reach only 1 GB of physical memory. If the machine has 8 GB of RAM, the kernel can access only the first 1 GB, and the remaining 7 GB are unreachable, because every kernel address is already used to map the physical range 0x0 to 0x3FFFFFFF. How, then, would the kernel access the memory at physical address 0x40000001? Code needs a logical address to touch memory, but the whole range 0xC0000000 to 0xFFFFFFFF is already used up, so physical memory at or above 0x40000000 cannot be accessed.
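The arithmetic above can be checked with a small sketch. It assumes only the plain 3G/1G linear mapping just described. The name `phys_to_logical` is hypothetical, and the sketch shows that any physical address at or beyond 1 GB has no logical address left.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define KERNEL_BASE    0xC0000000ull   /* start of kernel space in the 3G/1G split */
#define ADDR_SPACE_END 0x100000000ull  /* 4 GB: end of the 32-bit address space */

/* Amount of physical memory the linear window can cover: 1 GB. */
static inline uint64_t direct_map_limit(void) { return ADDR_SPACE_END - KERNEL_BASE; }

/* Returns 1 and stores the kernel logical address if phys fits the window,
 * 0 if the address lies beyond it (no logical address is left for it). */
int phys_to_logical(uint64_t phys, uint32_t *logical)
{
    assert(logical != NULL);
    if (phys >= direct_map_limit())
        return 0;
    *logical = (uint32_t)(phys + KERNEL_BASE);
    return 1;
}
```

With 8 GB installed, every physical address from 0x40000000 up to 0x1FFFFFFFF takes the `return 0` path.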

Clearly, the kernel cannot devote the entire address space 0xC0000000 to 0xFFFFFFFF to this simple linear mapping. Instead, on x86 Linux divides physical memory into three zones: ZONE_DMA, ZONE_NORMAL and ZONE_HIGHMEM. ZONE_HIGHMEM is high-end memory, and this is where the concept of high-end memory comes from.

On x86 the three zones are:

ZONE_DMA: 0 to 16 MB

ZONE_NORMAL: 16 MB to 896 MB

ZONE_HIGHMEM: 896 MB to the end of physical memory
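As a sketch of those boundaries, the hypothetical function `zone_of` below classifies a physical address. The enum names mirror the kernel's, but this is not kernel code. It assumes the classic 32-bit x86 limits listed above. The sanity check uses PAE's 36-bit (64 GB) physical-address ceiling.

```c
#include <assert.h>
#include <stdint.h>

enum zone_type { ZONE_DMA, ZONE_NORMAL, ZONE_HIGHMEM };

#define MB (1024ull * 1024ull)

enum zone_type zone_of(uint64_t phys)
{
    assert(phys < (64ull << 30));   /* 32-bit x86 with PAE addresses at most 64 GB */
    if (phys < 16 * MB)
        return ZONE_DMA;            /* ISA DMA can only reach the first 16 MB (24-bit addresses) */
    if (phys < 896 * MB)
        return ZONE_NORMAL;         /* permanently mapped in the kernel's linear window */
    return ZONE_HIGHMEM;            /* no permanent kernel mapping */
}
```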

Understanding Linux kernel high-end memory

We have explained where high-end memory comes from. Linux divides memory into ZONE_DMA, ZONE_NORMAL and ZONE_HIGHMEM. The kernel logical addresses reserved for reaching high-end memory run from 0xF8000000 to 0xFFFFFFFF, which covers offsets 896 MB to 1024 MB of the kernel's 1 GB window. How does the kernel reach all of physical memory through only this 128 MB of address space?

When the kernel wants to access physical memory above 896 MB, it finds a free logical address range of the required size within 0xF8000000 to 0xFFFFFFFF and borrows it for a while. It uses this logical range to map the physical memory it wants to access, which means filling in the kernel page-table entries (PTEs). It uses the mapping temporarily and returns the range when it is done. Others can then use the same address range to reach other physical memory, so a limited address space gives access to all of physical memory.

For example, suppose the kernel wants to access the 1 MB of physical memory starting at 2 GB, i.e. the physical range 0x80000000 to 0x800FFFFF. Before the access, it finds a free 1 MB range of logical addresses, say 0xF8700000 to 0xF87FFFFF, and maps that range onto physical memory 0x80000000 to 0x800FFFFF. The mapping is:

Logical Address	Physical Address
0xf8700000	0x80000000
0xf8700001	0x80000001
0xf8700002	0x80000002
...	...
0xf87fffff	0x800fffff

When the kernel has finished accessing physical memory 0x80000000 to 0x800FFFFF, it releases the kernel linear range 0xF8700000 to 0xF87FFFFF. Other processes or code can then use the addresses 0xF8700000 to 0xF87FFFFF to reach other physical memory.
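The borrow, map and release cycle can be modeled in a few lines. This is a user-space sketch only. `struct temp_map`, `borrow`, `lookup` and `give_back` are hypothetical names. A real kernel edits page-table entries and flushes the TLB rather than filling in a struct.

```c
#include <assert.h>
#include <stdint.h>

/* One borrowed window inside the high area 0xF8000000-0xFFFFFFFF. */
struct temp_map {
    uint32_t virt_start;   /* borrowed kernel logical address */
    uint64_t phys_start;   /* physical memory it currently points at */
    uint32_t size;         /* window length in bytes */
    int      in_use;       /* 1 while someone holds "the key" */
};

void borrow(struct temp_map *m, uint32_t virt, uint64_t phys, uint32_t size)
{
    assert(!m->in_use);    /* the window must have been given back first */
    assert(virt >= 0xF8000000u && size > 0 && size - 1 <= 0xFFFFFFFFu - virt);
    m->virt_start = virt;
    m->phys_start = phys;
    m->size = size;
    m->in_use = 1;
}

/* Translate an address inside the window; returns 0 if it is not mapped. */
int lookup(const struct temp_map *m, uint32_t virt, uint64_t *phys)
{
    if (!m->in_use || virt < m->virt_start || virt - m->virt_start >= m->size)
        return 0;
    *phys = m->phys_start + (virt - m->virt_start);
    return 1;
}

void give_back(struct temp_map *m) { m->in_use = 0; }
```

Used with the numbers from the example, `borrow(&w, 0xF8700000u, 0x80000000ull, 0x100000u)` makes 0xF87FFFFF resolve to 0x800FFFFF. After `give_back`, the same window can be pointed at a different physical range.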

This description gives the basic idea of high-end memory. The kernel borrows a piece of address space, sets up a temporary mapping, and releases it afterwards. Because the address space can be reused, all physical memory can be reached.

At this point someone will ask: what if a kernel thread or module holds a range of logical addresses for a long time and never releases it? In that case the kernel's high-end address space becomes increasingly scarce. If ranges are never released, physical memory that has no mapping stays inaccessible.

Some office buildings in Tsim Sha Tsui, Hong Kong, have few restrooms, and those restrooms are kept locked. A customer who wants to use one gets the key from the front desk and returns it afterwards. In this way, even a single restroom can serve all customers. However, if one customer keeps occupying the restroom and never returns the key, nobody else can use it. Linux manages high-end memory with a similar idea.

Linux kernel high-end memory partitioning
The kernel divides the high-end address space into three parts: VMALLOC_START to VMALLOC_END, PKMAP_BASE to FIXADDR_START, and FIXADDR_START to 4 GB.

For high-end memory, the corresponding struct page can be obtained with alloc_page() or similar functions. To access the physical memory behind that page, however, the page must first be given a linear address. This is because the CPU only issues linear addresses, which the MMU translates through page tables, so the CPU cannot touch memory that has no page-table mapping. In other words, the kernel must find a linear address range for the high-end page. This process is called high-end memory mapping.

Corresponding to these three parts, there are three ways to map high-end memory:
Mapping into the "kernel dynamic mapping space" (noncontiguous memory allocation)
This case is simple. When vmalloc() requests memory in the kernel dynamic mapping space, the pages it obtains may come from high-end memory (see the vmalloc() implementation). In that case, high-end memory ends up mapped into the kernel dynamic mapping space.

Persistent kernel mapping (permanent kernel mapping)
How does the kernel find a linear address for a high-end page obtained with alloc_page()?
The kernel reserves a dedicated linear range for this purpose, from PKMAP_BASE to FIXADDR_START, to map high-end memory. On 2.6 kernels this range is 4 GB - 8 MB to 4 GB - 4 MB. It is called the "kernel permanent mapping space" or "permanent kernel mapping space". This range uses the same page directory as the rest of the address space: for the kernel that directory is swapper_pg_dir, and for ordinary processes it is the directory pointed to by the CR3 register. The space is usually 4 MB, so a single page table is enough, and the kernel locates that page table through pkmap_page_table. kmap() maps a page into this space. Because the space is 4 MB, at most 1024 pages can be mapped at the same time. Pages that are no longer needed should therefore be released from this space (unmapped) with kunmap(), which frees the linear address that corresponded to the page.
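A simplified user-space model of this slot pool may help. It assumes 4 KB pages, PKMAP_BASE = 4 GB - 8 MB and 1024 slots, as stated above. `model_kmap` and `model_kunmap` are hypothetical names, and pages are identified by page-frame number. The real kmap() differs in at least two ways. It returns the permanent address directly for low-memory pages. It also sleeps when no slot is free, whereas this sketch returns 0 in that case.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12
#define LAST_PKMAP 1024             /* 4 MB / 4 KB: one page table's worth */
#define PKMAP_BASE 0xFF800000u      /* 4 GB - 8 MB */

struct pkmap_slot { uint32_t pfn; int count; };   /* count == 0 means free */
static struct pkmap_slot pkmap[LAST_PKMAP];

/* Reuse the slot if the page is already mapped; otherwise take a free slot.
 * Returns the slot's virtual address, or 0 if the area is full. */
uint32_t model_kmap(uint32_t pfn)
{
    int free_slot = -1;
    for (int i = 0; i < LAST_PKMAP; i++) {
        if (pkmap[i].count > 0 && pkmap[i].pfn == pfn) {
            pkmap[i].count++;
            return PKMAP_BASE + ((uint32_t)i << PAGE_SHIFT);
        }
        if (pkmap[i].count == 0 && free_slot < 0)
            free_slot = i;
    }
    if (free_slot < 0)
        return 0;
    pkmap[free_slot].pfn = pfn;
    pkmap[free_slot].count = 1;
    return PKMAP_BASE + ((uint32_t)free_slot << PAGE_SHIFT);
}

/* Drop one reference; the slot becomes reusable when the count hits zero. */
void model_kunmap(uint32_t vaddr)
{
    uint32_t i = (vaddr - PKMAP_BASE) >> PAGE_SHIFT;
    assert(vaddr >= PKMAP_BASE && i < LAST_PKMAP && pkmap[i].count > 0);
    pkmap[i].count--;
}
```

The model shows why an unreleased mapping matters. Each slot that is never passed to `model_kunmap` stays occupied, and with no free slots left `model_kmap` fails, much like the unreturned restroom key.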

Temporary mapping (temporary kernel mapping)
The kernel reserves some linear space between FIXADDR_START and FIXADDR_TOP for special purposes. This space is called the "fixed mapping space", and part of it is used for temporary mappings of high-end memory.

This space has the following characteristics:
(1) Each CPU occupies its own piece of the space.
(2) Each CPU's piece is divided into several small slots. Each slot is one page in size and is dedicated to one purpose. The purposes are the km_type values defined in kmap_types.h.

To create a temporary mapping, the caller specifies the purpose of the mapping. The purpose selects the corresponding slot, and that slot's address becomes the mapping address. Consequently, a new temporary mapping for the same purpose on the same CPU overwrites the previous one. Temporary mappings are created with kmap_atomic().
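The slot arithmetic can be sketched as follows. It follows the pattern of 2.6-era i386 kmap_atomic(): the index is type + KM_TYPE_NR * cpu, and the address is FIXADDR_TOP minus that fixmap index in pages. FIXADDR_TOP uses the 2.6 i386 value 0xFFFFF000. FIX_KMAP_BEGIN, KM_TYPE_NR and NR_CPUS are made-up illustrative values, because the real ones depend on kernel version and configuration. `atomic_slot_vaddr` is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT     12
#define FIXADDR_TOP    0xFFFFF000u
#define FIX_KMAP_BEGIN 16u   /* hypothetical first fixmap index used by kmap */
#define KM_TYPE_NR     8u    /* hypothetical number of km_type purposes */
#define NR_CPUS        4u    /* hypothetical CPU count */

/* Fixmap slots grow downward from FIXADDR_TOP, one page per index. */
static inline uint32_t fix_to_virt(uint32_t idx) { return FIXADDR_TOP - (idx << PAGE_SHIFT); }

/* Each CPU owns KM_TYPE_NR consecutive one-page slots, one per purpose,
 * so (cpu, type) always selects the same address. */
uint32_t atomic_slot_vaddr(uint32_t cpu, uint32_t type)
{
    assert(cpu < NR_CPUS && type < KM_TYPE_NR);
    return fix_to_virt(FIX_KMAP_BEGIN + type + KM_TYPE_NR * cpu);
}
```

Because a given (cpu, type) pair always yields the same address, a second mapping with the same purpose on the same CPU reuses the slot and replaces the earlier mapping. This is the overwrite behavior described above.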

See also: Linux high-end memory mapping.

Questions:

1. Does user space (a process) have a concept of high-end memory?

No. User processes have no concept of high-end memory; high-end memory exists only in kernel space. A 32-bit user process can address at most 3 GB of virtual address space, while the kernel can access all physical memory.

2. Is there high-end memory in the 64-bit kernel?

Currently there is no high-end memory in the 64-bit Linux kernel. The 64-bit kernel address space is large enough (well over 512 GB) to map all installed physical memory directly. High-end memory would reappear only if a machine had more physical memory than the kernel address space can map.

3. How much physical memory can a user process access? How much can kernel code access?

On a 32-bit system, a user process can address at most 3 GB, and kernel code can access all physical memory.

On a 64-bit system, a user process can address more than 512 GB, and kernel code can access all physical memory.

4. How does high-end memory relate to physical, logical and linear addresses?

High-end memory is a matter of the kernel's logical (virtual) address space. It exists because there are not enough kernel logical addresses to map all physical memory permanently. It has no direct relationship with physical addresses as such.

5. Why not give the whole address space to the kernel?

If the whole address space were given to the kernel, how would user processes use memory? How could the kernel guarantee that its own memory use and that of user processes do not conflict?

(1) Set aside Linux's handling of segmentation for the moment. In protected mode, whether the CPU runs in user mode or kernel mode, the addresses accessed by executing instructions are virtual addresses. The MMU reads the control register CR3 as a pointer to the current page directory. Following the paging mechanism (see the relevant documentation), it translates each virtual address into the physical address that the CPU actually accesses.

(2) On 32-bit Linux, each process has a 4 GB address space. When a process accesses an address in its virtual memory, how is that kept separate from the virtual spaces of other processes? The answer is that each process has its own page directory (PGD). Linux stores the pointer to that directory in mm->pgd, where mm is the process's memory descriptor (struct mm_struct) referenced from its task_struct. Each time a process is scheduled (schedule()), the kernel loads CR3 with that process's PGD pointer (switch_mm()).

(3) When a new process is created, the kernel allocates a new page directory (PGD) for it. It then copies the kernel-range directory entries from the kernel's own page directory, swapper_pg_dir, into the corresponding positions of the new PGD. The call chain is:
do_fork() -> copy_mm() -> mm_init() -> pgd_alloc() -> get_pgd_fast() -> get_pgd_slow() -> memcpy(pgd + USER_PTRS_PER_PGD, swapper_pg_dir + USER_PTRS_PER_PGD, (PTRS_PER_PGD - USER_PTRS_PER_PGD) * sizeof(pgd_t))
As a result, every process's page directory has two parts. The first part, for "user space", maps the process's own 3 GB of virtual addresses (0x00000000 to 0xBFFFFFFF). The second part, for "system space", maps the 1 GB of virtual addresses from 0xC0000000 to 0xFFFFFFFF. The second part is identical in every process's page directory in Linux. So from each process's point of view, it has 4 GB of virtual space: the lower 3 GB is its own user space, and the top 1 GB is system space shared by all processes and the kernel.
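The memcpy in that call chain can be modeled in user space. The sketch uses non-PAE i386 values: 1024 directory entries, each covering 4 MB, with user space below 0xC0000000, so the first 768 entries belong to user space. `init_new_pgd` is a hypothetical name. `pgd_t` is simplified to a plain 32-bit integer, whereas the kernel wraps it in a struct. Clearing the user-space entries is part of this sketch because a fresh process directory must not inherit user mappings.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PTRS_PER_PGD      1024
#define PGDIR_SHIFT       22                           /* each entry covers 4 MB */
#define TASK_SIZE         0xC0000000u                  /* end of user space */
#define USER_PTRS_PER_PGD (TASK_SIZE >> PGDIR_SHIFT)   /* 768 user entries */

typedef uint32_t pgd_t;   /* simplified; not the kernel's definition */

/* Empty user part, shared kernel part (entries 768..1023). */
void init_new_pgd(pgd_t *pgd, const pgd_t *swapper_pg_dir)
{
    assert(pgd != swapper_pg_dir);
    memset(pgd, 0, USER_PTRS_PER_PGD * sizeof(pgd_t));
    memcpy(pgd + USER_PTRS_PER_PGD, swapper_pg_dir + USER_PTRS_PER_PGD,
           (PTRS_PER_PGD - USER_PTRS_PER_PGD) * sizeof(pgd_t));
}
```

Since only the top 256 entries are copied, every process ends up pointing at the same kernel page tables, which is why the 3 GB to 4 GB part looks identical in all of them.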

(4) Now suppose we have the following scenario:
In process A, the computer's network host name is set with the system call sethostname(const char *name, size_t len).
This scenario necessarily involves passing data from user space to kernel space. Here name is an address in user space, and the system call copies its contents into a location in the kernel. Let us look at some details of this process. A system call places its arguments in the registers EBX, ECX, EDX, ESI and EDI (at most five arguments; this scenario has two, name and len). It puts the system call number in EAX, and process A then enters system space through the interrupt instruction "int $0x80". The trap gate set up for system calls has privilege level 3, and the process's current privilege level (3) is numerically less than or equal to it. The process can therefore enter system space and execute system_call(), the handler installed for int $0x80. Because system_call() is kernel code running at privilege level 0, the CPU switches to the kernel stack, i.e. the system-space stack of process A. When the kernel creates the task_struct of a new process, it allocates two contiguous pages (8 KB) and uses roughly 1 KB at the bottom for the task_struct, for example with #define alloc_task_struct() ((struct task_struct *) __get_free_pages(GFP_KERNEL,1)). The rest of the 8 KB serves as the process's system-space stack. So when the process switches from user space to system space, the stack pointer ESP becomes alloc_task_struct() + 8192. This layout is also why the kernel can define current (see its implementation) as a macro that derives the task_struct address of the running process from ESP.
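The trick behind current can be shown with plain integer arithmetic. In the 2.4-era i386 kernel, the task_struct sits at the bottom of an 8 KB-aligned, 8 KB block, so clearing the low 13 bits of any ESP inside that block yields the task_struct address. The sketch below models this in user space. `task_from_esp` and `initial_kernel_esp` are hypothetical names, and addresses are plain integers.

```c
#include <assert.h>
#include <stdint.h>

#define KERNEL_STACK_SIZE 8192u   /* two pages: task_struct + kernel stack */

/* Any ESP inside the block (i.e. after at least one push) maps back
 * to the task_struct at the block's base. */
static inline uintptr_t task_from_esp(uintptr_t esp)
{
    return esp & ~(uintptr_t)(KERNEL_STACK_SIZE - 1);
}

/* ESP on entry from user space: the top of the 8 KB block. */
static inline uintptr_t initial_kernel_esp(uintptr_t task)
{
    assert(task % KERNEL_STACK_SIZE == 0);   /* two-page allocations are 8 KB aligned */
    return task + KERNEL_STACK_SIZE;
}
```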
Each time the process enters system space from user space, the CPU has already pushed onto the kernel stack the user stack segment SS, the user stack pointer ESP, EFLAGS, the user CS and EIP. system_call() then pushes EAX, and SAVE_ALL pushes ES, DS, EAX, EBP, EDI, ESI, EDX, ECX and EBX in that order. Finally, system_call() calls the entry at sys_call_table + 4*%eax, which in this scenario is sys_sethostname().
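Because the stack grows downward, the last value pushed ends up at the lowest address. The pushes listed above therefore form a frame whose fields appear in reverse push order. The struct below is a simplified model of that frame. Its field names follow the 2.4 i386 struct pt_regs, but every field here is a uint32_t, and `struct frame` and `syscall_args` are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct frame {
    uint32_t ebx, ecx, edx, esi, edi, ebp, eax;  /* pushed by SAVE_ALL (ebx last) */
    uint32_t xds, xes;                            /* pushed by SAVE_ALL (es first) */
    uint32_t orig_eax;                            /* pushed by system_call() */
    uint32_t eip, xcs, eflags, esp, xss;          /* pushed by the CPU (ss first) */
};

static_assert(offsetof(struct frame, ebx) == 0, "ebx is pushed last, so it is lowest");
static_assert(offsetof(struct frame, xss) == 14 * sizeof(uint32_t), "ss is pushed first, so it is highest");

/* i386 convention: number in eax, first arguments in ebx and ecx. */
void syscall_args(const struct frame *f, uint32_t *nr, uint32_t *arg1, uint32_t *arg2)
{
    *nr = f->orig_eax;
    *arg1 = f->ebx;   /* for sethostname: name */
    *arg2 = f->ecx;   /* for sethostname: len */
}
```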

(5) In sys_sethostname(), after some protective checks, the kernel calls copy_from_user(to, from, n). Here to points into kernel space at system_utsname.nodename (for example 0xe625a000), and from points into user space (for example 0x8010fe00). Process A has entered the kernel and is running in system space, so the MMU translates virtual addresses to physical addresses through process A's PGD, and the data is copied from user space into system space. Before copying, the kernel checks that the user address and length are valid. It does not, however, check whether the whole range starting at the user address has actually been mapped. If an address in that range is unmapped or lacks the required read or write permission, it is treated as a bad address: a page fault occurs, and the page-fault handler deals with it. The call path is: copy_from_user() -> generic_copy_from_user() -> access_ok() + __copy_user_zeroing().
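The up-front check can be sketched as a simple range test under the 3G/1G split. This models the idea behind access_ok(). `range_ok` is a hypothetical name, and the real macro compares against the current address limit (adjustable with set_fs()) rather than a fixed constant. Note what the sketch does not do: it never asks whether the pages are mapped, which is left to the page-fault handler during the copy.

```c
#include <assert.h>
#include <stdint.h>

#define TASK_SIZE 0xC0000000u   /* user space ends here in the 3G/1G split */

/* 1 if [addr, addr + size) lies entirely in user space, else 0.
 * The sum is done in 64 bits so a range that wraps past 4 GB is rejected. */
int range_ok(uint32_t addr, uint32_t size)
{
    uint64_t end = (uint64_t)addr + size;
    return end <= TASK_SIZE;
}
```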

(6) Summary:
* Each process has a 0 to 4 GB address space.
* In user mode a process can access only 0 to 3 GB; the 3 GB to 4 GB range is accessible only in kernel mode.
* A process enters kernel mode through system calls.
* The 3 GB to 4 GB part of every process's virtual space is the same.
* Going from user mode to kernel mode does not change CR3, but it does switch the stack.

Linux simplifies segmentation so that a virtual address always equals the corresponding linear address, so the Linux virtual address space is also 0 to 4 GB. The Linux kernel divides this 4 GB space into two parts. The highest 1 GB (virtual addresses 0xC0000000 to 0xFFFFFFFF) is used by the kernel and is called "kernel space". The lower 3 GB (virtual addresses 0x00000000 to 0xBFFFFFFF) is used by each process and is called "user space". Because every process can enter the kernel through system calls, the Linux kernel is shared by all processes in the system. Thus, from the point of view of a particular process, it has 4 GB of virtual space.
Linux uses two protection levels: level 0 for the kernel and level 3 for user programs. Each process has its own private user space (0 to 3 GB), which is invisible to the other processes in the system. The highest 1 GB of virtual kernel space is shared by all processes and the kernel.
1. Mapping of virtual kernel space to physical space
Kernel space holds the kernel's code and data, while a process's user space holds the user program's code and data. Both kernel space and user space are virtual spaces. Readers may ask: when the system boots, are the kernel code and data not loaded into physical memory? Why are they in virtual memory as well? The answer has to do with how the kernel image is compiled and linked, which the discussion below makes clear.
Although kernel space occupies the top 1 GB of every virtual address space, its mapping onto physical memory always starts at the lowest physical address (0x00000000). For kernel space the address mapping is a simple linear one. The offset between a linear address and its physical address is 0xC0000000, which the Linux code calls PAGE_OFFSET.

Let us look at the description and definition of kernel-space address mapping in include/asm-i386/page.h:
/*
 * This handles the memory map. We could make this a config
 * option, but too many people screw it up, and too few need
 * it.
 *
 * A __PAGE_OFFSET of 0xC0000000 means that the kernel has
 * a virtual address space of one gigabyte, which limits the
 * amount of physical memory you can use to about 950MB.
 *
 * If you want more physical memory than this then see the CONFIG_HIGHMEM4G
 * and CONFIG_HIGHMEM64G options in the kernel configuration.
 */

#define __PAGE_OFFSET (0xC0000000)
......
#define PAGE_OFFSET ((unsigned long)__PAGE_OFFSET)
#define __pa(x) ((unsigned long)(x)-PAGE_OFFSET)
#define __va(x) ((void *)((unsigned long)(x)+PAGE_OFFSET))
Note from the source that if your physical memory exceeds about 950 MB, you need to build the kernel with the CONFIG_HIGHMEM4G or CONFIG_HIGHMEM64G option; that case is not considered here. If physical memory is below 950 MB, then for kernel space a virtual address x corresponds to the physical address x - PAGE_OFFSET, and a physical address x corresponds to the virtual address x + PAGE_OFFSET.
To stress it once more: the macro __pa() converts a kernel-space virtual address to a physical address and must never be applied to a user-space address. User-space address mapping is far more complex.
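As a quick check of these macros, here is a user-space re-creation under the 3G/1G assumption. It also checks the kernel image's load address (physical 0x00100000, virtual 0xC0100000). `pa` and `va` are hypothetical stand-ins for __pa() and __va(). They use uint32_t so the arithmetic is 32-bit on any host. The asserts encode the rule that these conversions apply only to directly mapped kernel addresses. The 1 GB bound is a simplification, since in practice only low memory (up to about 896 MB) is directly mapped.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_OFFSET 0xC0000000u

static inline uint32_t pa(uint32_t vaddr)
{
    assert(vaddr >= PAGE_OFFSET);   /* never valid for user-space addresses */
    return vaddr - PAGE_OFFSET;
}

static inline uint32_t va(uint32_t paddr)
{
    assert(paddr < 0x40000000u);    /* must fall inside the 1 GB kernel window */
    return paddr + PAGE_OFFSET;
}
```

The same conversion is what the CR3 update needs: the kernel holds the page directory's virtual address, but CR3 must receive `pa()` of it.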
2. Kernel image
In the following description, the kernel's code and data are called the kernel image. At boot, the Linux kernel image is loaded at physical address 0x00100000, i.e. starting at 1 MB (the first 1 MB is reserved). During normal operation, however, the whole kernel image should live in virtual kernel space. So when the linker links the kernel image, it adds an offset of PAGE_OFFSET to every symbol address, and the kernel image therefore starts at address 0xC0100000 in kernel space.
For example, a process's page directory PGD (part of the kernel's data structures) lives in kernel space. During a process switch, CR3 must be pointed at the new process's page directory PGD. The directory's starting address is a kernel-space virtual address, but CR3 needs a physical address, so __pa() is used to convert it. include/asm-i386/mmu_context.h contains this statement:
asm volatile("movl %0,%%cr3": :"r" (__pa(next->pgd)));
This line of inline assembly converts the starting address of the next process's page directory, next->pgd, to a physical address with __pa(), places it in a register, and writes it to CR3 with a mov instruction. After this statement executes, CR3 points to the page directory PGD of the new process next.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
