Kernel space and user space
Modern operating systems use virtual memory. On a 32-bit operating system the addressable (virtual) space is 4 GB (2^32 bytes). The core of the operating system is the kernel: it is independent of ordinary applications, can access the protected memory space, and has access to the underlying hardware devices. To ensure that user processes cannot operate on the kernel directly, and so to keep the kernel secure, the operating system divides the virtual address space into two parts: kernel space and user space. On 32-bit Linux, the highest 1 GB (virtual addresses 0xC0000000 to 0xFFFFFFFF) is used by the kernel and is called kernel space, while the lower 3 GB (virtual addresses 0x00000000 to 0xBFFFFFFF) is used by each process and is called user space. Every process can enter the kernel through system calls. In Linux, each process has its own independent user space, while kernel space is shared: on a process switch the user space changes and the kernel space stays the same.
With user space and kernel space separated, the internal structure of a Linux system can be divided into three layers, from bottom to top: hardware -> kernel space -> user space, as shown in the following figure:
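As a minimal sketch of the 3 GB / 1 GB split just described, the C helpers below classify a 32-bit virtual address. The macro and function names are illustrative assumptions, not kernel APIs, and the boundary is the classic 32-bit x86 value quoted in the text.

```c
#include <assert.h>
#include <stdint.h>

/* Start of kernel space in the classic 32-bit 3G/1G split (from the text). */
#define KERNEL_SPACE_START 0xC0000000u

/* Returns 1 if the 32-bit virtual address lies in kernel space, else 0. */
static inline int is_kernel_address(uint32_t vaddr)
{
    return vaddr >= KERNEL_SPACE_START;
}

/* Size in bytes of user space: everything below the boundary (3 GB). */
static inline uint64_t user_space_size(void)
{
    return (uint64_t)KERNEL_SPACE_START;
}

/* Size in bytes of kernel space: the rest of the 4 GB range (1 GB). */
static inline uint64_t kernel_space_size(void)
{
    return (1ull << 32) - KERNEL_SPACE_START;
}

/* Offset of a kernel-space address from the start of kernel space. */
static inline uint32_t kernel_space_offset(uint32_t vaddr)
{
    assert(is_kernel_address(vaddr)); /* only meaningful for kernel addresses */
    return vaddr - KERNEL_SPACE_START;
}
```

For example, is_kernel_address(0xBFFFFFFF) is 0 and is_kernel_address(0xC0000000) is 1, matching the two ranges given above.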
User state and kernel state
When a process executes a system call and traps into kernel code, it is said to be in the kernel running state (kernel state). The processor is then executing kernel code at the highest privilege level (level 0). While the process is in kernel state, the kernel code being executed uses the kernel stack of the current process. Each process has its own kernel stack.
When a process is executing the user's own code, it is said to be in the user running state (user state). The processor is then executing user code at the lowest privilege level (level 3). If a user program is interrupted while it is running, it can also be said, figuratively, to be in the kernel state of the process, because the interrupt handler uses the kernel stack of the current process.
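The switch from user state to kernel state can be observed from an ordinary program. The sketch below, which assumes a Linux system with glibc, issues the getpid system call through the generic syscall() wrapper; the function name pid_via_syscall is made up for illustration.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* exposes syscall() in <unistd.h> on glibc */
#endif
#include <assert.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Runs in user state until syscall() traps into the kernel; the kernel
 * executes the getpid handler in kernel state and then returns to user state.
 */
static inline long pid_via_syscall(void)
{
    long pid = syscall(SYS_getpid);
    assert(pid > 0); /* getpid cannot fail for a running process */
    return pid;
}
```

The value returned should equal what the getpid() library function reports, since both end up in the same kernel code.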
Statement: The above two parts are adapted from: http://www.cnblogs.com/Anker/p/3269106.html
Logical address, linear address and physical address
Before explaining high memory and memory mapping, let us review what logical, linear and physical addresses are. If you already know, you can skip this part.
Logical Address
A logical address is the offset part of an address, generated by a program and relative to a segment. For example, when programming with pointers in C you can read the value of a pointer (for example one obtained with the & operator); this value is in fact a logical address. It is relative to the data segment of your current process and has no direct relation to an absolute physical address. Only in Intel real mode does the logical address correspond directly to the physical address, because real mode has no protected-mode segment tables or paging and the CPU performs no table-based address translation (the physical address is simply segment x 16 + offset). In Intel protected mode, the logical address is the offset within the code segment limit at which the program executes (assuming the code segment and data segment are identical). Application programmers only deal with logical addresses; the segmentation and paging mechanisms are completely transparent to them and concern only system programmers. Although an application programmer can operate on memory directly, this is limited to the memory segments the operating system has assigned to the program.
Linear Address
A linear address is the intermediate layer in the translation from logical address to physical address. Program code generates a logical address, that is, an offset within a segment; adding the base address of the corresponding segment yields a linear address. If paging is enabled, the linear address is translated again to produce a physical address. If paging is not enabled, the linear address is the physical address. The Intel 80386 has a linear address space of 4 GB (2^32 bytes, addressed over a 32-bit address bus).
Physical Address
A physical address is the address signal that appears on the CPU's external address bus to address physical memory; it is the final result of address translation. If paging is enabled, linear addresses are translated into physical addresses using the entries in the page directory and page tables. If paging is not enabled, the linear address is used directly as the physical address.
Virtual Address
Virtual memory means that a computer presents far more memory than it actually has. It therefore lets programmers write and run programs that are much larger than the system's real memory, which allows many large projects to run on systems with limited memory resources. A fitting analogy: to get a train from Shanghai to Beijing you do not need track the whole way. A long enough stretch (say 3 km) is sufficient, provided you move the rails from behind the train to the front quickly enough; the train then runs as if on a complete track. This is the task that virtual memory management has to accomplish. In the Linux 0.11 kernel, each program (process) is given a virtual memory space with a total capacity of 64 MB, so the logical address range of a program is 0x0000000 to 0x4000000.
Logical addresses are sometimes also called virtual addresses because, like the virtual memory space, they are unrelated to the actual physical memory capacity. The "gap" between logical and physical addresses is 0xC0000000, because the virtual address -> linear address -> physical address mapping differs by exactly this value, which is specified by the operating system. The translation from a logical (virtual) address to a linear address is performed automatically by the CPU's segmentation mechanism. If paging is not enabled, the linear address is the physical address. If paging is enabled, system software has to take part in translating linear addresses to physical addresses, which it does by setting up the page directory and page table entries.
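The two translation steps can be sketched in C as follows. This is a toy model under stated assumptions: the 10/10/12-bit split is the classic two-level 32-bit x86 layout, but the structures, the ENTRY_PRESENT flag value and all function names are illustrative, and the tables live in ordinary arrays rather than in hardware-defined formats.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT    12u
#define PAGE_MASK     0xFFFu
#define ENTRY_PRESENT 0x1u /* assumed "present" flag in a toy page table entry */

/* Segmentation step: linear address = segment base + logical offset. */
static inline uint32_t logical_to_linear(uint32_t segment_base, uint32_t offset)
{
    return segment_base + offset;
}

/* Toy page table: 1024 entries holding a frame base address plus flag bits. */
struct page_table { uint32_t entry[1024]; };

/* Toy page directory: 1024 pointers to page tables (NULL means not present). */
struct page_directory { struct page_table *table[1024]; };

/*
 * Paging step: walk directory -> table -> frame.
 * Returns 0 and stores the physical address on success, -1 if unmapped.
 */
static inline int linear_to_physical(const struct page_directory *dir,
                                     uint32_t linear, uint32_t *physical)
{
    uint32_t dir_index   = linear >> 22;                    /* top 10 bits */
    uint32_t table_index = (linear >> PAGE_SHIFT) & 0x3FFu; /* middle 10 bits */
    uint32_t page_offset = linear & PAGE_MASK;              /* low 12 bits */
    const struct page_table *pt;
    uint32_t pte;

    assert(dir != NULL && physical != NULL);
    pt = dir->table[dir_index];
    if (pt == NULL)
        return -1;
    pte = pt->entry[table_index];
    if (!(pte & ENTRY_PRESENT))
        return -1;
    *physical = (pte & ~PAGE_MASK) | page_offset;
    return 0;
}
```

With a segment base of 0x08000000 and an offset of 0x1234, the linear address is 0x08001234; if the matching page table entry points at frame 0x00ABC000, the walk yields the physical address 0x00ABC234.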
Statement: The above section is excerpted from: http://blog.csdn.net/do2jiang/article/details/4512417
The origin of high memory
On a traditional 32-bit x86 Linux system, when kernel module code or a kernel thread accesses memory, the addresses in the code are logical addresses, and a one-to-one address mapping is needed to reach the real physical address. If the logical address is 0xC0000003, the corresponding physical address is 0x3; if the logical address is 0xC0000004, the corresponding physical address is 0x4. The relationship between physical address and logical address is therefore:
Physical address = logical address - 0xC0000000
Note from this translation rule for the kernel address space that the kernel's virtual addresses are at the high end, but the physical memory they map to is at the low end. The kernel can access logical addresses 0xC0000000-0xFFFFFFFF, which correspond to physical addresses 0x00000000-0x3FFFFFFF, 1 GB of memory in total. In other words, if the computer has more than 1 GB of physical memory, the part above 1 GB cannot be reached by the kernel through this mapping. The concept of high memory was introduced to solve this problem.
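The fixed-offset mapping can be written out as a pair of C helpers. This is only a sketch: PAGE_OFFSET is defined here with the value from the text, and the function names are illustrative stand-ins rather than the kernel's own conversion routines.

```c
#include <assert.h>
#include <stdint.h>

/* Start of kernel space in the 3G/1G split; value taken from the text. */
#define PAGE_OFFSET 0xC0000000u

/* Kernel logical address -> physical address in the one-to-one mapping. */
static inline uint32_t kernel_virt_to_phys(uint32_t vaddr)
{
    assert(vaddr >= PAGE_OFFSET); /* only kernel-space addresses qualify */
    return vaddr - PAGE_OFFSET;
}

/* Returns 1 if the physical address is below 1 GB and so directly mappable. */
static inline int phys_is_directly_mappable(uint64_t paddr)
{
    return paddr < (1ull << 32) - PAGE_OFFSET;
}

/* Physical address -> kernel logical address; valid only below 1 GB. */
static inline uint32_t phys_to_kernel_virt(uint32_t paddr)
{
    assert(phys_is_directly_mappable(paddr));
    return paddr + PAGE_OFFSET;
}
```

Here kernel_virt_to_phys(0xC0000003) gives 0x3, as in the example above, while phys_is_directly_mappable(0x50000000) is 0: that physical address lies above 1 GB and cannot be reached through this mapping.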
Because the 1 GB of kernel space cannot all be used for one-to-one mapping, the Linux kernel divides kernel space into three zones: ZONE_DMA, ZONE_NORMAL and ZONE_HIGHMEM. The three zones are laid out as follows:
ZONE_DMA | first 16 MB of memory |
ZONE_NORMAL | 16 MB - 896 MB |
ZONE_HIGHMEM | 896 MB - end (1 GB) |
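The table can be turned into a small classifier. The sketch below uses the 16 MB and 896 MB boundaries from the table; the enum, macro and function names are hypothetical and chosen only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

#define MB (1024u * 1024u)

/* Illustrative zone identifiers; not the kernel's own enum. */
enum mem_zone { MEM_ZONE_DMA, MEM_ZONE_NORMAL, MEM_ZONE_HIGHMEM };

/* Classifies an offset into the 1 GB range using the table's boundaries. */
static inline enum mem_zone zone_for_offset(uint32_t offset)
{
    if (offset < 16u * MB)
        return MEM_ZONE_DMA;
    if (offset < 896u * MB)
        return MEM_ZONE_NORMAL;
    return MEM_ZONE_HIGHMEM;
}

/* Human-readable name for a zone identifier. */
static inline const char *zone_name(enum mem_zone zone)
{
    static const char *const names[] = {
        "ZONE_DMA", "ZONE_NORMAL", "ZONE_HIGHMEM"
    };
    assert((unsigned)zone < 3u); /* guard against out-of-range values */
    return names[zone];
}
```

For instance, an offset of 100 MB falls in ZONE_NORMAL, and any offset of 896 MB or more falls in ZONE_HIGHMEM.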
Understanding high memory
The previous section said that high memory exists to solve the problem that the kernel cannot access memory above 1 GB. So how exactly is this achieved? In outline it is simple: when the kernel needs to access memory above 1 GB, for example the 1 MB range 0x50000000-0x500FFFFF, it only has to temporarily request 1 MB of address space in the ZONE_HIGHMEM area and map it onto the memory that needs to be accessed. When the kernel has finished, it releases the 1 MB of address space it requested, which completes the access to memory above 1 GB.
Memory Mappings (mmap)
mmap Basic Concepts
mmap is a method of memory-mapping files: it maps a file or other object into the address space of a process, establishing a one-to-one correspondence between the file's disk addresses and a range of virtual addresses in the process's virtual address space. Once this mapping is established, the process can read and write that memory through pointers, and the system automatically writes dirty pages back to the corresponding file on disk. Operations on the file are thus completed without calling system calls such as read and write. Conversely, changes made to this region in kernel space are directly reflected in user space, which makes it possible to share files between different processes. As shown in the following illustration:
As the diagram above shows, the virtual address space of a process is made up of multiple virtual memory areas. A virtual memory area is a homogeneous interval in the process's virtual address space, that is, a contiguous address range with the same characteristics. The text segment (code segment), initialized data segment, BSS segment, heap, stack and memory mappings shown in the illustration are each a separate virtual memory area. The address space that serves memory mappings lies in the free region between the heap and the stack.
The Linux kernel uses the vm_area_struct structure to represent one virtual memory area. Because different virtual memory areas differ in function and internal mechanism, a process uses multiple vm_area_struct structures to represent its different types of areas. The vm_area_struct structures are linked together in a linked list or a tree so that the process can access them quickly, as shown in the following illustration:
The vm_area_struct structure contains the start and end addresses of the area and other related information, together with a vm_ops pointer that leads to all the system call functions that can be used on the area. Everything a process needs in order to operate on a virtual memory area can therefore be obtained from its vm_area_struct. What the mmap function does is create a new vm_area_struct structure and connect it to the physical disk address of the file. The next section gives the specific steps.
mmap Memory Mapping principle
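On a running Linux system, the virtual memory areas of a process can be inspected through /proc/self/maps, where each line begins with the start and end address of one area. The following sketch, whose function name is an assumption made for this example, finds the area containing a given address.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Scans /proc/self/maps for the area of the calling process that contains
 * addr. Returns 1 and stores the bounds if found, 0 if no area contains
 * addr, and -1 if the file cannot be opened.
 */
static inline int find_region(const void *addr,
                              unsigned long *start, unsigned long *end)
{
    char line[512];
    unsigned long lo, hi;
    unsigned long target = (unsigned long)(uintptr_t)addr;
    int found = 0;
    FILE *maps;

    assert(start != NULL && end != NULL);
    maps = fopen("/proc/self/maps", "r");
    if (maps == NULL)
        return -1;
    while (fgets(line, sizeof line, maps) != NULL) {
        /* Each entry starts with "start-end" in hexadecimal. */
        if (sscanf(line, "%lx-%lx", &lo, &hi) != 2)
            continue;
        if (target >= lo && target < hi) {
            *start = lo;
            *end = hi;
            found = 1;
            break;
        }
    }
    fclose(maps);
    return found;
}
```

Passing the address of a local variable should report the bounds of the stack area, while the address of a function should fall inside the text segment of the executable.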
The implementation of an mmap memory mapping can be divided into three phases:
Phase 1: The process starts the mapping and creates a virtual mapping area for it in its virtual address space
1. The process calls the library function mmap in user space. Prototype: void *mmap(void *start, size_t length, int prot, int flags, int fd, off_t offset);
2. A free, contiguous range of virtual addresses that meets the requirements is located in the virtual address space of the current process.
3. A vm_area_struct structure is allocated for this virtual area, and each field of the structure is initialized.
4. The new virtual area structure (vm_area_struct) is inserted into the process's list or tree of virtual address areas.
Phase 2: The kernel-space system call function mmap (different from the user-space function) is called to establish the one-to-one mapping between the file's physical addresses and the process's virtual addresses
1. After the new virtual address area has been allocated for the mapping, the file pointer of the file to be mapped is used to find the corresponding file descriptor in the file descriptor table, and the file descriptor leads to the file structure (struct file) of that file in the kernel's set of open files. Each file structure maintains the information related to its open file.
2. Through the file structure, the file_operations module is reached and the kernel function mmap is called. Its prototype is int mmap(struct file *filp, struct vm_area_struct *vma), which differs from the user-space library function.
3. The kernel mmap function locates the file's physical address on disk through the virtual file system inode module.
4. The page table is established with the remap_pfn_range function, which realizes the mapping between the file addresses and the virtual address area. At this point, the virtual addresses do not yet have any data associated with main memory.
Phase 3: The process accesses the mapped area, which triggers a page fault and causes the file contents to be copied into physical memory (main memory)
Note: The first two phases only create the virtual area and complete the address mapping; no file data is copied into main memory. The file is actually read only when the process performs a read or write.
1. The read or write by the process accesses the mapped range of its virtual address space, and the page table lookup finds that the address is not backed by a physical page. Because only the address mapping has been established and the data on disk has not yet been copied into memory, a page fault is raised.
2. The page fault handler performs a series of checks and, once it has determined that the operation is legal, the kernel starts demand paging.
3. Demand paging first looks for the required page in the swap cache; if it is not there, the nopage function is called to load the missing page from disk into main memory.
4. The process can then read or write this main memory. If a write changes the contents, the system automatically writes the dirty pages back to the corresponding disk address after some time, which completes the write to the file.
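The fault-on-first-access behaviour of phase 3 can be illustrated with a small user-space simulation. Everything here is a toy model with hypothetical names: an array stands in for the file on disk, another for main memory, and a per-page flag plays the role of the page table's present bit. It does not use any real kernel interface.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SIM_PAGE_SIZE 4096u
#define SIM_PAGES     4u

/* Toy mapping: the pages start out absent and are loaded on first access. */
struct sim_mapping {
    const unsigned char *backing;                   /* stands in for the file on disk */
    unsigned char frames[SIM_PAGES][SIM_PAGE_SIZE]; /* stands in for main memory */
    int present[SIM_PAGES];                         /* "present" bit per page */
    unsigned faults;                                /* number of simulated page faults */
};

/* Phases 1 and 2: record the mapping, but copy no data yet. */
static inline void sim_map(struct sim_mapping *m, const unsigned char *backing)
{
    assert(m != NULL && backing != NULL);
    m->backing = backing;
    memset(m->present, 0, sizeof m->present);
    m->faults = 0;
}

/* Phase 3: an access to an absent page "faults" and loads that page. */
static inline unsigned char sim_read(struct sim_mapping *m, size_t offset)
{
    size_t page = offset / SIM_PAGE_SIZE;

    assert(m != NULL && page < SIM_PAGES);
    if (!m->present[page]) {
        memcpy(m->frames[page], m->backing + page * SIM_PAGE_SIZE,
               SIM_PAGE_SIZE);
        m->present[page] = 1;
        m->faults++;
    }
    return m->frames[page][offset % SIM_PAGE_SIZE];
}
```

After sim_map() the fault counter is zero; the first read from a page raises it by one, and further reads from the same page leave it unchanged, which mirrors the note above that no data is copied until the process actually touches the mapping.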
Note: Modified dirty pages are not written back to the file immediately but after a delay. msync() can be called to force synchronization so that the written content is saved to the file immediately.
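Seen from user space, the whole sequence might look like the sketch below, which assumes a POSIX system. It writes to a file through a shared mapping, forces write-back with msync(), and then reads the data back with read(). The function name and its parameters are made up for this example, and the path is supplied by the caller.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L /* for ftruncate() under strict C modes */
#endif
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Stores len bytes of msg in the file at path through a shared mapping,
 * flushes them with msync(), then reads them back into out with read().
 * Returns 0 on success and -1 on any failure.
 */
static inline int write_via_mmap(const char *path, const char *msg,
                                 size_t len, char *out)
{
    int fd, rc = -1;
    char *map;

    assert(path != NULL && msg != NULL && out != NULL && len > 0);
    fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    if (ftruncate(fd, (off_t)len) != 0) /* the file must cover the mapping */
        goto out_close;
    map = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED)
        goto out_close;
    memcpy(map, msg, len);              /* plain memory write, no write() call */
    if (msync(map, len, MS_SYNC) == 0 && /* write the dirty pages back now */
        read(fd, out, len) == (ssize_t)len)
        rc = 0;
    munmap(map, len);
out_close:
    close(fd);
    return rc;
}
```

MAP_SHARED is what makes the stores visible in the file; with MAP_PRIVATE the writes would stay in a private copy and would never be written back.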
Statement: The above chapter is excerpted from: http://www.cnblogs.com/huxiao-tee/p/4660352.html
vm_struct and vm_area_struct
A brief explanation of the two structures vm_struct and vm_area_struct is in order. First, both vm_struct and vm_area_struct represent a contiguous range of virtual addresses, which may be discontiguous when mapped to the physical address space. Second, the virtual addresses represented by vm_area_struct are used by processes, while the virtual addresses represented by vm_struct are used by the kernel. As described above, the kernel address space is divided into three parts: ZONE_DMA, ZONE_NORMAL and ZONE_HIGHMEM. The first two parts are mapped one-to-one to physical addresses, ZONE_HIGHMEM manages memory above 1 GB by temporary borrowing and mapping, and the kernel virtual addresses used by vm_struct are the addresses in the ZONE_HIGHMEM part.
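To make the idea of a contiguous virtual range concrete, here is a simplified user-space model of an address-sorted list of areas. The struct vma_sketch and both functions are hypothetical stand-ins and carry far less information than the kernel's real vm_area_struct.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for a virtual memory area: [start, end) plus a link. */
struct vma_sketch {
    unsigned long start;
    unsigned long end;       /* first address past the area */
    struct vma_sketch *next; /* next area, in ascending address order */
};

/* Inserts area into the address-sorted list and returns the new head. */
static inline struct vma_sketch *vma_insert(struct vma_sketch *head,
                                            struct vma_sketch *area)
{
    struct vma_sketch **link = &head;

    assert(area != NULL && area->start < area->end);
    while (*link != NULL && (*link)->start < area->start)
        link = &(*link)->next;
    area->next = *link;
    *link = area;
    return head;
}

/* Returns the area containing addr, or NULL if addr is unmapped. */
static inline struct vma_sketch *vma_lookup(struct vma_sketch *head,
                                            unsigned long addr)
{
    for (; head != NULL; head = head->next)
        if (addr >= head->start && addr < head->end)
            return head;
    return NULL;
}
```

A linear list keeps the sketch short. A lookup that has to stay fast with many areas would use a tree, as the text mentions for the kernel.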
Http://blog4jimmy.com/2018/01/348.html