Linux memory address space management: mm_struct

Last Update:2017-05-21 Source: Internet

Author: User

http://blog.csdn.net/yusiguyuan/article/details/39520933

Memory management in Linux is a large topic. This article starts with how the kernel manages the virtual address space of a process. (The code discussed is from kernel 2.6.32.60.)

Whether it is a kernel thread or a user process, to the kernel a task is simply an instance of the task_struct data structure. task_struct is called the process descriptor because it records the entire context of the process. It points to another data structure, the memory descriptor mm_struct, which abstracts and describes everything Linux needs to manage the address space of a process.

mm_struct is defined in include/linux/mm_types.h, and its fields describe the layout of the process address space. Each process has its own independent mm_struct, so each process sees an abstract, flat, independent 32-bit or 64-bit address space. Processes can store different data at the same virtual address in their own address spaces without interfering with one another. Tasks that share one address space are called threads.

The main layout fields are:

    [start_code, end_code] is the address range of the code segment.
    [start_data, end_data] is the address range of the data segment.
    [start_brk, brk] are the start of the heap and the current heap pointer, respectively.
    start_stack is the start address of the stack segment (mm_struct has no matching end_stack field).
    mmap_base is the start address of the memory mapping segment.

Then why does the mmap segment have no end address? And what is the BSS segment used for? BSS holds all the uninitialized global variables, so they only need to be anonymously mapped to the zero page instead of being mapped from the disk file when the program is loaded. This reduces the size of ELF binaries and makes program loading more efficient. Why, then, is there no address range for the BSS segment in mm_struct?

In addition to these, mm_struct also defines several important fields:

    atomic_t mm_users;      /* How many users with user space? */
    atomic_t mm_count;      /* How many references to "struct mm_struct" (users count as 1) */

These two counters look similar at first glance. How does Linux use them differently? Reading the code is the best explanation.

    static int copy_mm(unsigned long clone_flags, struct task_struct * tsk)
    {
            struct mm_struct * mm, *oldmm;
            int retval;
            ...
            tsk->mm = NULL;
            tsk->active_mm = NULL;

            /*
             * Are we cloning a kernel thread?
             *
             * We need to steal a active VM for that..
             */
            oldmm = current->mm;
            if (!oldmm)
                    return 0;

            if (clone_flags & CLONE_VM) {
                    atomic_inc(&oldmm->mm_users);
                    mm = oldmm;
                    goto good_mm;
            }

Whether we call fork, vfork, or clone, the kernel eventually calls do_fork. The difference is that vfork, and clone when it is used to create a thread, pass the CLONE_VM flag down to copy_mm. The flag indicates that parent and child run in the same virtual address space (what Linux calls a lightweight process, or thread), and therefore also share the same physical memory (page frames).

In copy_mm, if the CLONE_VM flag is set, parent and child share the address space and the same memory descriptor, so the kernel only needs to increment mm_users by 1. In other words, mm_users is the number of threads referencing the address space; it is a thread-level counter.

What about mm_count? Understanding mm_count is a little more involved.

To Linux, user processes and kernel threads are both task_struct instances. The only difference is that a kernel thread has no process address space and no mm descriptor, so the tsk->mm field of a kernel thread is NULL. During a context switch, the scheduler looks at tsk->mm to decide whether the task being scheduled is a user process or a kernel thread. Although a kernel thread never accesses a user process address space, it still needs a page table to access the kernel's own space. Fortunately, the kernel part of the address space is identical for every user process, so a kernel thread can "borrow" the page table from the mm of the previously scheduled user process to access kernel addresses. The borrowed mm is recorded in active_mm.

In short, for a kernel thread, tsk->mm == NULL marks it as a kernel thread, and tsk->active_mm is the mm borrowed from the previous user process, whose page table it uses to access kernel space. For a user process, tsk->mm == tsk->active_mm.

To support this special case, mm_struct introduces a second counter, mm_count. As described above, mm_users is the number of threads that share or reference the process address space, while mm_count is the number of kernel threads referencing the address space, plus 1 (all user-space users together count as one reference).

For example, if process A has 3 threads, the mm_users value in A's mm_struct is 3 but mm_count is 1, so mm_count is a process-level counter. What is the point of maintaining two counters? Consider this scenario: the kernel schedules A and then switches to kernel thread B. B borrows A's mm descriptor to access kernel space, so mm_count becomes 2. Meanwhile another CPU core schedules A, and process A exits. Now mm_users drops to 0 and mm_count drops to 1. The kernel does not destroy the mm_struct just because mm_users == 0; it releases the mm_struct only when mm_count == 0, because only then is the address space used by no user process and referenced by no kernel thread.
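The scenario above can be replayed with a small user-space model. This is only a sketch under stated assumptions: struct toy_mm and every toy_* function are hypothetical names invented for illustration, plain integers stand in for the kernel's atomic_t, and setting a freed flag stands in for actually releasing the descriptor.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for the two counters in mm_struct (not kernel code). */
struct toy_mm {
    int mm_users;   /* user-space users (threads) of the address space */
    int mm_count;   /* references to the struct; all users count as 1 */
    bool freed;     /* set when the descriptor would be released */
};

/* Mirrors mm_init(): both counters start at 1. */
static void toy_mm_init(struct toy_mm *mm)
{
    mm->mm_users = 1;
    mm->mm_count = 1;
    mm->freed = false;
}

/* A new thread shares the address space (the CLONE_VM path). */
static void toy_add_user(struct toy_mm *mm)
{
    assert(!mm->freed);
    mm->mm_users++;
}

/* A kernel thread borrows the mm as its active_mm. */
static void toy_borrow(struct toy_mm *mm)
{
    assert(!mm->freed);
    mm->mm_count++;
}

/* Drop one reference to the struct; release it when none remain. */
static void toy_drop(struct toy_mm *mm)
{
    if (--mm->mm_count == 0)
        mm->freed = true;
}

/* A user thread exits; the last one gives up the users' shared reference. */
static void toy_put_user(struct toy_mm *mm)
{
    if (--mm->mm_users == 0)
        toy_drop(mm);
}

/*
 * Replays the scenario from the text. Returns true if the descriptor
 * survives the exit of its last user and is released only once the
 * borrowing kernel thread lets go of it.
 */
static bool toy_scenario(void)
{
    struct toy_mm mm;
    bool alive_while_borrowed;

    toy_mm_init(&mm);      /* process A, first thread: users 1, count 1 */
    toy_add_user(&mm);     /* second thread */
    toy_add_user(&mm);     /* third thread: users 3, count 1 */
    toy_borrow(&mm);       /* kernel thread B borrows: count 2 */
    toy_put_user(&mm);
    toy_put_user(&mm);
    toy_put_user(&mm);     /* A exits: users 0, count 1 */
    alive_while_borrowed = !mm.freed;
    toy_drop(&mm);         /* B switches away: count 0 */
    return alive_while_borrowed && mm.freed;
}
```

Calling toy_scenario() walks through the same counter values as the prose: 3 and 1 for a three-thread process, 3 and 2 while the kernel thread borrows the descriptor, 0 and 1 after the process exits, and release only when the count reaches 0.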

    static struct mm_struct * mm_init(struct mm_struct * mm, struct task_struct *p)
    {
            atomic_set(&mm->mm_users, 1);
            atomic_set(&mm->mm_count, 1);

When an mm instance is initialized, both mm_users and mm_count are set to 1.

    /*
     * context_switch - switch to the new MM and the new
     * thread's register state.
     */
    static inline void
    context_switch(struct rq *rq, struct task_struct *prev,
                   struct task_struct *next)
    {
            struct mm_struct *mm, *oldmm;

            prepare_task_switch(rq, prev, next);
            trace_sched_switch(rq, prev, next);
            mm = next->mm;
            oldmm = prev->active_mm;
            ...
            if (unlikely(!mm)) {
                    next->active_mm = oldmm;
                    atomic_inc(&oldmm->mm_count);
                    enter_lazy_tlb(oldmm, next);
            } else
                    switch_mm(oldmm, mm, next);

The code above is a small part of the context switch performed by the Linux scheduler. Look at the branch starting with unlikely(!mm): if the task being switched to is a kernel thread, next->active_mm = oldmm borrows the mm descriptor of the previous process and assigns it to active_mm. The key point is that the mm_count of the borrowed mm descriptor is incremented by 1.
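The borrowing step can be modeled outside the kernel as follows. This is a minimal sketch, not the scheduler's code: toy_mm, toy_task, and toy_context_switch are hypothetical names, a plain int replaces the atomic counter, and no page tables are switched. The branch that returns the borrowed reference when a kernel thread is switched out is not part of the excerpt above; it is added here on the assumption that a borrow must eventually be undone.

```c
#include <assert.h>
#include <stddef.h>

struct toy_mm {
    int mm_count;               /* references to the descriptor */
};

struct toy_task {
    struct toy_mm *mm;          /* NULL for a kernel thread */
    struct toy_mm *active_mm;   /* address space actually in use */
};

/* Returns the mm whose page tables are in use after the switch. */
static struct toy_mm *toy_context_switch(struct toy_task *prev,
                                         struct toy_task *next)
{
    struct toy_mm *mm = next->mm;
    struct toy_mm *oldmm = prev->active_mm;

    assert(oldmm != NULL);      /* something must be loaded already */

    if (mm == NULL) {
        /* Kernel thread: borrow the previous address space. */
        next->active_mm = oldmm;
        oldmm->mm_count++;
    } else {
        /* User process: a real switch_mm() would load its page tables. */
        next->active_mm = mm;
    }

    if (prev->mm == NULL) {
        /* Assumed cleanup: a kernel thread gives back what it borrowed. */
        prev->active_mm = NULL;
        oldmm->mm_count--;
    }
    return next->active_mm;
}
```

Switching from a user task to a kernel thread raises mm_count of the user task's mm by one and leaves the same mm active; switching back lowers it again and clears the kernel thread's active_mm.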

Next, let's look at how mm_struct is handled when a process forks.

    /*
     *  Ok, this is the main fork-routine.
     *
     * It copies the process, and if successful kick-starts
     * it and waits for it to finish using the VM if required.
     */
    long do_fork(unsigned long clone_flags,
                  unsigned long stack_start,
                  struct pt_regs *regs,
                  unsigned long stack_size,
                  int __user *parent_tidptr,
                  int __user *child_tidptr)
    {
            ...
            p = copy_process(clone_flags, stack_start, regs, stack_size,
                             child_tidptr, NULL, trace);

do_fork calls copy_process.

    /*
     * This creates a new process as a copy of the old one,
     * but does not actually start it yet.
     *
     * It copies the registers, and all the appropriate
     * parts of the process environment (as per the clone
     * flags). The actual kick-off is left to the caller.
     */
    static struct task_struct *copy_process(unsigned long clone_flags,
                                            unsigned long stack_start,
                                            struct pt_regs *regs,
                                            unsigned long stack_size,
                                            int __user *child_tidptr,
                                            struct pid *pid,
                                            int trace)
    {
            ...
            if ((retval = copy_mm(clone_flags, p)))
                    goto bad_fork_cleanup_signal;

copy_process calls copy_mm, which we analyze below.

    681  static int copy_mm(unsigned long clone_flags, struct task_struct * tsk)
    682  {
    683          struct mm_struct * mm, *oldmm;
    684          int retval;
    685
    686          tsk->min_flt = tsk->maj_flt = 0;
    687          tsk->nvcsw = tsk->nivcsw = 0;
    688  #ifdef CONFIG_DETECT_HUNG_TASK
    689          tsk->last_switch_count = tsk->nvcsw + tsk->nivcsw;
    690  #endif
    691
    692          tsk->mm = NULL;
    693          tsk->active_mm = NULL;
    694
    695          /*
    696           * Are we cloning a kernel thread?
    697           *
    698           * We need to steal a active VM for that..
    699           */
    700          oldmm = current->mm;
    701          if (!oldmm)
    702                  return 0;
    703
    704          if (clone_flags & CLONE_VM) {
    705                  atomic_inc(&oldmm->mm_users);
    706                  mm = oldmm;
    707                  goto good_mm;
    708          }
    709
    710          retval = -ENOMEM;
    711          mm = dup_mm(tsk);
    712          if (!mm)
    713                  goto fail_nomem;
    714
    715  good_mm:
    716          /* Initializing for Swap token stuff */
    717          mm->token_priority = 0;
    718          mm->last_interval = 0;
    719
    720          tsk->mm = mm;
    721          tsk->active_mm = mm;
    722          return 0;
    723
    724  fail_nomem:
    725          return retval;
    726  }

Lines 692-693: the mm and active_mm of the child process or thread are initialized to NULL.

Lines 700-708: as described above, if a thread is being created, the new thread shares the mm of the creating process, so the copy that follows is not needed.
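The share-or-copy decision in copy_mm can be sketched in user space like this. All names here (toy_mm, toy_task, toy_dup_mm, toy_copy_mm, TOY_CLONE_VM and its value) are hypothetical and exist only for illustration. malloc stands in for the kernel's allocator, and the private copy duplicates only the descriptor, not any page tables.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define TOY_CLONE_VM 0x100UL    /* hypothetical flag value for this sketch */

struct toy_mm {
    int mm_users;
    int mm_count;
    unsigned long start_code, end_code;   /* example layout fields */
};

struct toy_task {
    struct toy_mm *mm;
    struct toy_mm *active_mm;
};

/* Copy the parent's descriptor, then reset the per-instance counters. */
static struct toy_mm *toy_dup_mm(const struct toy_mm *oldmm)
{
    struct toy_mm *mm = malloc(sizeof(*mm));

    if (mm == NULL)
        return NULL;
    memcpy(mm, oldmm, sizeof(*mm));
    mm->mm_users = 1;
    mm->mm_count = 1;
    return mm;
}

/* Returns 0 on success, -1 if the private copy could not be allocated. */
static int toy_copy_mm(unsigned long clone_flags,
                       const struct toy_task *parent, struct toy_task *child)
{
    struct toy_mm *oldmm = parent->mm;
    struct toy_mm *mm;

    child->mm = NULL;
    child->active_mm = NULL;

    if (oldmm == NULL)              /* parent is a kernel thread */
        return 0;

    if (clone_flags & TOY_CLONE_VM) {
        oldmm->mm_users++;          /* thread: share the descriptor */
        mm = oldmm;
    } else {
        mm = toy_dup_mm(oldmm);     /* process: private copy */
        if (mm == NULL)
            return -1;
    }

    child->mm = mm;
    child->active_mm = mm;
    return 0;
}
```

With TOY_CLONE_VM set, the child ends up pointing at the parent's descriptor and only mm_users changes. Without it, the child gets a separate descriptor with the same layout fields and fresh counters, which the caller is responsible for freeing.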

The key step is dup_mm(tsk) on line 711.

    621  /*
    622   * Allocate a new mm structure and copy contents from the
    623   * mm structure of the passed in task structure.
    624   */
    625  struct mm_struct *dup_mm(struct task_struct *tsk)
    626  {
    627          struct mm_struct *mm, *oldmm = current->mm;
    628          int err;
    629
    630          if (!oldmm)
    631                  return NULL;
    632
    633          mm = allocate_mm();
    634          if (!mm)
    635                  goto fail_nomem;
    636
    637          memcpy(mm, oldmm, sizeof(*mm));
    638
    639          /* Initializing for Swap token stuff */
    640          mm->token_priority = 0;
    641          mm->last_interval = 0;
    642
    643          if (!mm_init(mm, tsk))
    644                  goto fail_nomem;
    645
    646          if (init_new_context(tsk, mm))
    647                  goto fail_nocontext;
    648
    649          dup_mm_exe_file(oldmm, mm);
    650
    651          err = dup_mmap(mm, oldmm);
    652          if (err)
    653                  goto free_pt;
    654
    655          mm->hiwater_rss = get_mm_rss(mm);
    656          mm->hiwater_vm = mm->total_vm;
    657
    658          if (mm->binfmt && !try_module_get(mm->binfmt->module))
    659                  goto free_pt;
    660
    661          return mm;

Line 633: the mm_struct object is allocated from the slab allocator.

Line 637: the parent's mm_struct is copied into the child's, so every field of the child's mm initially has the same value as in the parent's mm.

copy_mm exists mainly to implement Unix copy-on-write (COW) semantics. In principle, parent and child only need identical values in layout fields such as start_data and end_data; the remaining fields (such as mm_users) must be re-initialized, which is done mainly in mm_init.

    449  static struct mm_struct * mm_init(struct mm_struct * mm, struct task_struct *p)
    450  {
    451          atomic_set(&mm->mm_users, 1);
    452          atomic_set(&mm->mm_count, 1);
    453          init_rwsem(&mm->mmap_sem);
    454          INIT_LIST_HEAD(&mm->mmlist);
    455          mm->flags = (current->mm) ?
    456                  (current->mm->flags & MMF_INIT_MASK) : default_dump_filter;
    457          mm->core_state = NULL;
    458          mm->nr_ptes = 0;
    459          set_mm_counter(mm, file_rss, 0);
    460          set_mm_counter(mm, anon_rss, 0);
    461          spin_lock_init(&mm->page_table_lock);
    462          mm->free_area_cache = TASK_UNMAPPED_BASE;
    463          mm->cached_hole_size = ~0UL;
    464          mm_init_aio(mm);
    465          mm_init_owner(mm, p);
    466
    467          if (likely(!mm_alloc_pgd(mm))) {
    468                  mm->def_flags = 0;
    469                  mmu_notifier_mm_init(mm);
    470                  return mm;
    471          }
    472
    473          free_mm(mm);
    474          return NULL;
    475  }

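Of particular interest is mm_alloc_pgd on lines 467-471. It allocates the page global directory, the top level of the child's own page table; the page table is what translates logical addresses to physical addresses. The entries for the parent's user mappings are then copied into this new page table by dup_mmap (line 651 of dup_mm).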

The result of the copy is that parent and child each have their own page table, but the entries in the two page tables have the same values. That is, the same logical address in the parent's and the child's independent address spaces maps to the same physical address. Writable private pages are mapped read-only in both, so the first write by either process faults and gets a private copy of the page. This is how the copy-on-write (COW) semantics between parent and child are implemented.
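The visible effect of these semantics can be checked from user space with a short POSIX program. This is a hedged sketch: shared_value and cow_demo are names chosen for this example, and the program can only observe the isolation that COW guarantees, not the page sharing or the page fault itself.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int shared_value = 1;   /* same virtual address in parent and child */

/*
 * Forks, lets the child overwrite shared_value, and reports what the
 * parent sees afterwards. Returns the parent's value, or -1 on error.
 */
static int cow_demo(void)
{
    pid_t pid;
    int status = 0;

    assert(shared_value == 1);  /* precondition for this demo */

    pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: the write lands in the child's own copy of the page. */
        shared_value = 42;
        _exit(shared_value == 42 ? 0 : 1);
    }

    /* Parent: wait for the child, then read the same address. */
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return -1;
    return shared_value;
}
```

On success cow_demo() returns 1: the child saw its own write of 42, while the parent's value at the same virtual address is unchanged.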

In fact, the biggest saving of vfork over fork is that it avoids copying the page table.

In the 2.6 kernel, fork loses performance because it copies the page table, so the kernel community has discussed implementing shared page tables (http://lwn.net/Articles/149888/).
