I have already discussed how the kernel manages physical memory. The kernel is the core of the operating system, however, so besides its own memory it must also manage the address space of each process. Linux uses virtual memory: all processes share physical memory in a virtualized way. The process address space consists of the linear address range available to the process and, more importantly, the intervals within that range that the kernel allows the process to use. Generally, each process has a unique address space, and the address spaces of different processes are independent of each other. Processes can, however, choose to share an address space; such processes are called threads.
The kernel represents the address space of a process with a memory descriptor, the mm_struct structure, which is defined in <linux/sched.h> as follows:
struct mm_struct {
    struct vm_area_struct *mmap;            /* list of memory areas */
    struct rb_root mm_rb;                   /* red-black tree of VMAs */
    struct vm_area_struct *mmap_cache;      /* last used memory area */
    unsigned long free_area_cache;          /* 1st address space hole */
    pgd_t *pgd;                             /* page global directory */
    atomic_t mm_users;                      /* address space users */
    atomic_t mm_count;                      /* primary usage counter */
    int map_count;                          /* number of memory areas */
    struct rw_semaphore mmap_sem;           /* memory area semaphore */
    spinlock_t page_table_lock;             /* page table lock */
    struct list_head mmlist;                /* list of all mm_structs */
    unsigned long start_code;               /* start address of code */
    unsigned long end_code;                 /* final address of code */
    unsigned long start_data;               /* start address of data */
    unsigned long end_data;                 /* final address of data */
    unsigned long start_brk;                /* start address of heap */
    unsigned long brk;                      /* final address of heap */
    unsigned long start_stack;              /* start address of stack */
    unsigned long arg_start;                /* start of arguments */
    unsigned long arg_end;                  /* end of arguments */
    unsigned long env_start;                /* start of environment */
    unsigned long env_end;                  /* end of environment */
    unsigned long rss;                      /* pages allocated */
    unsigned long total_vm;                 /* total number of pages */
    unsigned long locked_vm;                /* number of locked pages */
    unsigned long def_flags;                /* default access flags */
    unsigned long cpu_vm_mask;              /* lazy TLB switch mask */
    unsigned long swap_address;             /* last scanned address */
    unsigned dumpable:1;                    /* can this mm core dump? */
    int used_hugetlb;                       /* used hugetlb pages? */
    mm_context_t context;                   /* arch-specific data */
    int core_waiters;                       /* thread core dump waiters */
    struct completion *core_startup_done;   /* core start completion */
    struct completion core_done;            /* core end completion */
    rwlock_t ioctx_list_lock;               /* AIO I/O list lock */
    struct kioctx *ioctx_list;              /* AIO I/O list */
    struct kioctx default_kioctx;           /* AIO default I/O context */
};
The mm_users field records the number of processes using this address space; for example, if two threads share the address space, mm_users is 2. The mm_count field is the primary reference count of the structure. All the users counted by mm_users together hold a single reference in mm_count, so as long as mm_users is nonzero they contribute exactly 1 to mm_count. When mm_users drops to 0, mm_count is decremented; when mm_count in turn reaches 0, no references to the mm_struct remain and the structure is freed. The kernel keeps the two counters separate so that it can distinguish the primary usage count (mm_count) from the number of processes using the address space (mm_users). The mmap and mm_rb fields describe the same objects: all the memory areas in the address space. The difference is that the former links them in a list, while the latter organizes them in a red-black tree. All mm_struct structures are linked in a doubly linked list through their mmlist fields. The first element of the list is the init_mm memory descriptor, which represents the address space of the init process. The list must be protected from concurrent access with the mmlist_lock lock, which is defined in kernel/fork.c. The total number of memory descriptors is kept in the global variable mmlist_nr, which is also defined in kernel/fork.c.
The process descriptor I described earlier contains an mm field, which holds the memory descriptor used by the process, so current->mm points to the memory descriptor of the current process. During fork(), the copy_mm() function copies the memory descriptor of the parent to the child; the child's mm_struct is allocated from the mm_cachep slab cache by the allocate_mm() macro in kernel/fork.c. Generally, each process has its own unique mm_struct.
As mentioned above, Linux does not really distinguish between processes and threads. The only difference is whether the address space is shared, which is requested with the CLONE_VM flag. To the kernel, a thread is simply a process that shares certain resources with other processes. When this flag is set, matters become simpler: allocate_mm() is not called, and copy_mm() just points the child's mm field at the memory descriptor of its parent, as shown below:
if (clone_flags & CLONE_VM) {
    /*
     * current is the parent process and
     * tsk is the child process during a fork()
     */
    atomic_inc(&current->mm->mm_users);
    tsk->mm = current->mm;
}
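The share-or-copy decision can be modeled in user space. The following is only a sketch, not kernel code: toy_mm, toy_task, TOY_CLONE_VM, and toy_copy_mm() are hypothetical names, a plain int stands in for atomic_t, and malloc() stands in for the slab allocation.

```c
#include <assert.h>
#include <stdlib.h>

#define TOY_CLONE_VM 0x100UL  /* hypothetical stand-in for CLONE_VM */

struct toy_mm {
    int mm_users;   /* tasks sharing this address space */
    int map_count;  /* number of memory areas (copied on fork) */
};

struct toy_task {
    struct toy_mm *mm;
};

/* Returns 0 on success, -1 if allocation fails. */
int toy_copy_mm(unsigned long clone_flags,
                struct toy_task *parent, struct toy_task *child)
{
    assert(parent != NULL && child != NULL && parent->mm != NULL);

    if (clone_flags & TOY_CLONE_VM) {
        /* Thread: share the parent's descriptor and count one more user. */
        parent->mm->mm_users++;
        child->mm = parent->mm;
        return 0;
    }

    /* Ordinary fork: the child gets its own descriptor. */
    struct toy_mm *mm = malloc(sizeof(*mm));
    if (mm == NULL)
        return -1;
    *mm = *parent->mm;   /* duplicate the parent's settings */
    mm->mm_users = 1;    /* the new descriptor has a single user */
    child->mm = mm;
    return 0;
}
```

With the flag set, parent and child end up with the same pointer and the user count grows by one; without it, the child owns a separate descriptor whose user count starts at 1.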
Finally, when the process exits, the kernel calls the exit_mm() function, which calls mmput() to decrement the mm_users count in the memory descriptor. If that count reaches 0, mmdrop() is called to decrement the mm_count usage counter. If that counter also reaches 0, the free_mm() macro is invoked to return the mm_struct to the mm_cachep slab cache through the kmem_cache_free() function.
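The interplay of the two counters on exit can be sketched in user space. This is a simplified model under stated assumptions: toy_mm, toy_mmput(), and toy_mmdrop() are hypothetical names, plain ints replace atomic_t, and a freed flag marks the point where the kernel would hand the structure back to its cache.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_mm {
    int mm_users;  /* tasks using the address space */
    int mm_count;  /* primary reference count; all users together hold one */
    bool freed;    /* set where the kernel would free the structure */
};

/* Drop one primary reference; release the descriptor when none remain. */
void toy_mmdrop(struct toy_mm *mm)
{
    assert(mm != NULL && mm->mm_count > 0);
    if (--mm->mm_count == 0)
        mm->freed = true;
}

/* Drop one user; the last user gives up the users' shared reference. */
void toy_mmput(struct toy_mm *mm)
{
    assert(mm != NULL && mm->mm_users > 0);
    if (--mm->mm_users == 0)
        toy_mmdrop(mm);
}
```

Starting from mm_users = 2 and mm_count = 1, the first toy_mmput() only lowers mm_users; the second lowers mm_users to 0, which drops mm_count to 0 and marks the descriptor as freed.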
A kernel thread, however, has no process address space and therefore no memory descriptor: the mm field in the process descriptor of a kernel thread is NULL. A kernel thread still needs some of that data, such as the page tables. To avoid wasting memory on a memory descriptor and page tables for each kernel thread, and to avoid wasting processor cycles switching to a new address space, a kernel thread simply uses the memory descriptor of whichever task ran before it. Recall the process scheduling discussion: when a process is scheduled, the address space referenced by the mm field of its process descriptor is loaded, and the active_mm field of the process descriptor is updated to point to the new address space. A kernel thread has no address space, so its mm field is NULL. When a kernel thread is scheduled, the kernel sees that mm is NULL, keeps the address space of the previous process loaded, and updates the active_mm field of the kernel thread's process descriptor to refer to the memory descriptor of the previous process. The kernel thread can then use the page tables of the previous process when it needs them. Because kernel threads do not access user-space memory, they use only the part of the address space that describes kernel memory, and that part is the same for all processes.
A memory area is described by the vm_area_struct structure, which is defined in <linux/mm.h>. In the kernel, memory areas are often called virtual memory areas, or VMAs. A VMA describes a single contiguous interval within a given address space. The kernel manages each memory area as a separate memory object, and each memory area has consistent properties. The structure is as follows:
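The borrowing of active_mm can be illustrated with a small user-space model. It is a sketch only: toy_task, toy_mm, and toy_switch_mm() are hypothetical names, and the real context-switch path does considerably more work.

```c
#include <assert.h>
#include <stddef.h>

struct toy_mm { int id; };

struct toy_task {
    struct toy_mm *mm;         /* NULL for a kernel thread */
    struct toy_mm *active_mm;  /* address space actually in use */
};

/* Returns 1 if a new address space had to be loaded, 0 if one was borrowed. */
int toy_switch_mm(const struct toy_task *prev, struct toy_task *next)
{
    assert(prev != NULL && next != NULL && prev->active_mm != NULL);

    if (next->mm == NULL) {
        /* Kernel thread: keep using the previous task's address space. */
        next->active_mm = prev->active_mm;
        return 0;
    }
    /* User process: its own address space becomes the active one. */
    next->active_mm = next->mm;
    return 1;
}
```

Switching from a user process to a kernel thread leaves the thread's mm as NULL while its active_mm points at the previous process's descriptor; switching on to another user process loads that process's own descriptor.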
struct vm_area_struct {
    struct mm_struct *vm_mm;               /* associated mm_struct */
    unsigned long vm_start;                /* VMA start, inclusive */
    unsigned long vm_end;                  /* VMA end, exclusive */
    struct vm_area_struct *vm_next;        /* list of VMAs */
    pgprot_t vm_page_prot;                 /* access permissions */
    unsigned long vm_flags;                /* flags */
    struct rb_node vm_rb;                  /* VMA's node in the tree */
    union {     /* links to address_space->i_mmap or i_mmap_nonlinear */
        struct {
            struct list_head list;
            void *parent;
            struct vm_area_struct *head;
        } vm_set;
        struct prio_tree_node prio_tree_node;
    } shared;
    struct list_head anon_vma_node;        /* anon_vma entry */
    struct anon_vma *anon_vma;             /* anonymous VMA object */
    struct vm_operations_struct *vm_ops;   /* associated ops */
    unsigned long vm_pgoff;                /* offset within file */
    struct file *vm_file;                  /* mapped file, if any */
    void *vm_private_data;                 /* private data */
};
Each VMA structure corresponds to a unique interval in the process address space. The vm_mm field points to the mm_struct associated with the VMA. If two independent processes map the same file into their respective address spaces, each has its own vm_area_struct to describe its memory area. If two threads share an address space, however, they also share all of its vm_area_struct structures.
The vm_flags field holds the VMA flags. They describe the behavior of, and provide information about, the pages contained in the memory area, and they tell the kernel how those pages should be handled, as shown in the following table:
The table above is detailed enough that I will not go through it. The vm_ops field in vm_area_struct points to the table of operations associated with a given memory area, which the kernel uses to manipulate the VMA. The vm_area_struct acts as a generic object representing any type of memory area, and the operations table describes the specific methods for a particular instance of the object. The operations table is represented by the vm_operations_struct structure, which is defined in <linux/mm.h> as follows:
struct vm_operations_struct {
    void (*open)(struct vm_area_struct *);
    void (*close)(struct vm_area_struct *);
    struct page *(*nopage)(struct vm_area_struct *, unsigned long, int);
    int (*populate)(struct vm_area_struct *, unsigned long, unsigned long, pgprot_t, unsigned long, int);
};
open: called when the given memory area is added to an address space.
close: called when the given memory area is removed from an address space.
nopage: called by the page fault handler when a page that is not present in physical memory is accessed.
populate: called through the remap_file_pages() system call to prefault a new mapping, that is, to set it up in advance so that later accesses do not trigger page faults.
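The "generic object plus operations table" pattern can be shown with a small user-space sketch. All names here (toy_vma, toy_vm_ops, toy_vma_open(), toy_vma_close(), counting_ops) are hypothetical, and only the open and close hooks are modeled.

```c
#include <assert.h>
#include <stddef.h>

struct toy_vma;

struct toy_vm_ops {
    void (*open)(struct toy_vma *);
    void (*close)(struct toy_vma *);
};

struct toy_vma {
    const struct toy_vm_ops *ops;  /* may be NULL, or have NULL members */
    int open_count;                /* private state used by the hooks below */
};

/* Generic code: call the hook only if this kind of area provides one. */
void toy_vma_open(struct toy_vma *vma)
{
    assert(vma != NULL);
    if (vma->ops != NULL && vma->ops->open != NULL)
        vma->ops->open(vma);
}

void toy_vma_close(struct toy_vma *vma)
{
    assert(vma != NULL);
    if (vma->ops != NULL && vma->ops->close != NULL)
        vma->ops->close(vma);
}

/* One specific kind of area: it counts how many times it is open. */
static void counting_open(struct toy_vma *vma)  { vma->open_count++; }
static void counting_close(struct toy_vma *vma) { vma->open_count--; }

const struct toy_vm_ops counting_ops = {
    .open  = counting_open,
    .close = counting_close,
};
```

The generic callers never need to know which kind of area they hold: an area with counting_ops has its counter adjusted, while an area with no operations table is left untouched.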
As you may recall, the mmap and mm_rb fields in the memory descriptor independently refer to all the memory area objects associated with that descriptor. They contain pointers to exactly the same vm_area_struct structures and differ only in how those structures are organized. The mmap field organizes them as a linked list in which all areas are sorted by ascending address. It points to the first memory area in the list, and the vm_next pointer of the last VMA in the list is NULL. The mm_rb field links all memory area objects in a red-black tree and points to the root node of that tree. Each vm_area_struct in the address space is attached to the tree through its vm_rb field. I will not go into the structure of red-black trees here; I may discuss them in detail in a later article. The kernel keeps both structures for the same set of memory areas because the linked list makes it easy to traverse all areas, while the red-black tree makes it fast to locate the area that covers a particular address. You can use the /proc file system and the pmap tool to view the address space of a given process and the memory areas it contains. I will not go into detail here.
The kernel also provides an API for operating on memory areas, which is declared in <linux/mm.h>:
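Each line of /proc/<pid>/maps describes one memory area, so counting its lines gives the number of areas in a process. The helper below is a minimal sketch; count_memory_areas() is a hypothetical name, and the caller is assumed to have opened the file already.

```c
#include <assert.h>
#include <stdio.h>

/* Counts newline-terminated lines; in a maps file each line is one area. */
long count_memory_areas(FILE *maps)
{
    assert(maps != NULL);

    long lines = 0;
    int c;
    while ((c = fgetc(maps)) != EOF) {
        if (c == '\n')
            lines++;
    }
    return lines;
}
```

On Linux, a program can inspect itself by passing the stream returned by fopen("/proc/self/maps", "r"), after checking that the call did not return NULL.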
(1) find_vma(), defined in mm/mmap.c. This function searches the given address space for the first memory area whose vm_end is greater than addr. In other words, it finds the first memory area that either contains addr or starts at an address greater than addr. If no such area exists, the function returns NULL; otherwise, it returns a pointer to the vm_area_struct of the matching memory area.
(2) find_vma_prev(), defined in mm/mmap.c and declared in <linux/mm.h>. It works in the same way as find_vma(), but it also returns the last VMA before addr.
(3) find_vma_intersection(), defined in <linux/mm.h>. It returns the first VMA that overlaps the specified address interval. This function is an inline function.
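The lookup rule (first area whose vm_end is greater than addr) is easy to get wrong, so here is a user-space sketch of it. The names toy_vma, toy_find_vma(), and toy_find_vma_intersection() are hypothetical, and a plain walk over the sorted list replaces the kernel's cached lookup and red-black tree search.

```c
#include <assert.h>
#include <stddef.h>

struct toy_vma {
    unsigned long vm_start;    /* inclusive */
    unsigned long vm_end;      /* exclusive */
    struct toy_vma *vm_next;   /* next area, sorted by ascending address */
};

/* First area with vm_end > addr, or NULL; addr may lie below vm_start. */
struct toy_vma *toy_find_vma(struct toy_vma *head, unsigned long addr)
{
    for (struct toy_vma *vma = head; vma != NULL; vma = vma->vm_next) {
        if (vma->vm_end > addr)
            return vma;
    }
    return NULL;
}

/* First area overlapping [start, end), or NULL if there is none. */
struct toy_vma *toy_find_vma_intersection(struct toy_vma *head,
                                          unsigned long start,
                                          unsigned long end)
{
    assert(start < end);

    struct toy_vma *vma = toy_find_vma(head, start);
    if (vma != NULL && end <= vma->vm_start)
        vma = NULL;
    return vma;
}
```

Note that a non-NULL result from toy_find_vma() does not mean the address is mapped: an address in the hole between two areas yields the higher area, so a caller that needs containment must also compare the address with vm_start.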
The following two functions are very important. They are responsible for creating and removing address intervals.
The kernel uses the do_mmap() function to create a new linear address interval. If the new interval is adjacent to an existing one and the two have the same access permissions, they are merged into a single memory area. If they cannot be merged, a new VMA is created. In either case, do_mmap() adds an address interval to the address space of the process. The function is declared in <linux/mm.h> as follows:
unsigned long do_mmap(struct file *file, unsigned long addr, unsigned long len, unsigned long prot, unsigned long flag, unsigned long offset)
The function maps the file specified by file, starting at offset offset and extending for len bytes. If file is NULL and offset is 0, the mapping is not backed by a file and is called an anonymous mapping. If a file and an offset are given, the mapping is called a file-backed mapping. The prot parameter specifies the access permissions for pages in the memory area. These permissions are defined in <asm/mman.h>, as follows:
The flag parameter specifies the VMA flags, which are also defined in <asm/mman.h>, as follows:
If any argument to do_mmap() is invalid, the function returns a negative value. Otherwise, it finds a suitable interval in virtual memory. If possible, the new interval is merged with an adjacent area; otherwise, the kernel allocates a new vm_area_struct from the vm_area_cachep slab cache and uses the vma_link() function to add the new memory area to the linked list and red-black tree of the address space. The total_vm field in the memory descriptor is then updated, and finally the function returns the start address of the newly created interval. From user space, the functionality of do_mmap() is available through the mmap() system call. It is covered in detail in Advanced Programming in the UNIX Environment, so I will not go further into it here.
What has been created must also be removed, and the do_munmap() function does this: it removes an address interval from a given process address space. The function is declared in <linux/mm.h> as follows:
int do_munmap(struct mm_struct *mm, unsigned long start, size_t len)
The first parameter specifies the address space from which the interval is removed. The interval starts at address start and is len bytes long. On success, 0 is returned; otherwise, a negative error code is returned. The corresponding user-space system call is munmap().
The last topic is the page table.
We know that applications operate on virtual memory that is mapped onto physical memory, but the processor operates directly on physical memory. Therefore, when an application accesses a virtual address, the address must first be translated to a physical address before the processor can resolve the request. This translation is done by looking up page tables. In outline, the virtual address is split into segments, and each segment is used as an index into a page table; the page table entry points either to the next-level page table or to the final physical page. Linux uses three levels of page tables to complete the translation. On most architectures, the page table lookup is performed by hardware. The following figure shows how a physical address is found through the page tables:
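From user space, the pair of system calls can be exercised with an anonymous mapping. This is a minimal sketch assuming a Linux or similar POSIX system that provides MAP_ANONYMOUS; anonymous_mapping_roundtrip() is a hypothetical helper name.

```c
#define _DEFAULT_SOURCE   /* exposes MAP_ANONYMOUS under strict glibc modes */
#include <assert.h>
#include <string.h>
#include <sys/mman.h>

/*
 * Maps len bytes of anonymous memory, writes to it, reads it back,
 * and unmaps it. Returns 0 on success, -1 on any failure.
 */
int anonymous_mapping_roundtrip(size_t len)
{
    assert(len > 0);

    /* No file backs this mapping: fd is -1 and the offset is 0. */
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    memset(p, 0xAB, len);
    int ok = ((unsigned char *)p)[len - 1] == 0xAB;

    /* munmap() removes the interval again; it returns 0 on success. */
    if (munmap(p, len) != 0)
        return -1;
    return ok ? 0 : -1;
}
```

The prot and flags arguments here play the roles of the prot and flag parameters described for do_mmap(); passing a file descriptor and offset instead of -1 and 0 would request a file-backed mapping.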
In the figure above, the top-level page table is the page global directory (PGD), the second-level page table is the page middle directory (PMD), and the last level is simply called the page table, whose entries (PTEs) point to physical pages. The structures corresponding to the page tables are defined in <asm/page.h>. To speed up lookups, most architectures implement a translation lookaside buffer (TLB), a hardware cache of virtual-to-physical address mappings. When a virtual address is accessed, the processor first checks whether the mapping from that virtual address to a physical address is cached in the TLB. If it is, the physical address is returned immediately; otherwise, the page tables are searched for the required physical address.
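A three-level walk can be modeled with a toy address layout. This sketch does not reflect any real architecture: the 2-bit indexes, the 12-bit page offset, and the names toy_pgd_table, toy_pmd_table, toy_pte_table, and toy_translate() are all assumptions chosen to keep the example small.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy layout: 2 index bits per level, 12-bit page offset. */
#define TOY_PAGE_SHIFT 12
#define TOY_LEVEL_BITS 2
#define TOY_ENTRIES    (1u << TOY_LEVEL_BITS)
#define TOY_INDEX_MASK (TOY_ENTRIES - 1)
#define TOY_NO_PAGE    UINT32_MAX   /* marks an unmapped entry */

struct toy_pte_table { uint32_t frame[TOY_ENTRIES]; };            /* last level */
struct toy_pmd_table { struct toy_pte_table *pte[TOY_ENTRIES]; }; /* middle */
struct toy_pgd_table { struct toy_pmd_table *pmd[TOY_ENTRIES]; }; /* top */

/* Walks PGD -> PMD -> PTE. Returns 0 and fills *phys, or -1 if unmapped. */
int toy_translate(const struct toy_pgd_table *pgd, uint32_t vaddr,
                  uint32_t *phys)
{
    assert(pgd != NULL && phys != NULL);

    /* Split the virtual address into three indexes and a page offset. */
    uint32_t offset  = vaddr & ((1u << TOY_PAGE_SHIFT) - 1);
    uint32_t pte_idx = (vaddr >> TOY_PAGE_SHIFT) & TOY_INDEX_MASK;
    uint32_t pmd_idx = (vaddr >> (TOY_PAGE_SHIFT + TOY_LEVEL_BITS))
                       & TOY_INDEX_MASK;
    uint32_t pgd_idx = (vaddr >> (TOY_PAGE_SHIFT + 2 * TOY_LEVEL_BITS))
                       & TOY_INDEX_MASK;

    const struct toy_pmd_table *pmd = pgd->pmd[pgd_idx];
    if (pmd == NULL)
        return -1;
    const struct toy_pte_table *pte = pmd->pte[pmd_idx];
    if (pte == NULL)
        return -1;
    uint32_t frame = pte->frame[pte_idx];
    if (frame == TOY_NO_PAGE)
        return -1;

    /* Physical address = frame number followed by the unchanged offset. */
    *phys = (frame << TOY_PAGE_SHIFT) | offset;
    return 0;
}
```

A TLB would sit in front of this function as a small cache keyed by the virtual page number, so that a hit skips the three dependent memory lookups.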