Mastering Linux Kernel Design Ideas (13): Memory Management, the Process Address Space

"Copyright notice: please respect the original work and retain the source when reproducing it: blog.csdn.net/shallnet. This article is for academic exchange only and must not be used for commercial purposes."

The process address space consists of the virtual memory that a process can address. On Linux this virtual address space spans 0 to 4 GB (note: this section uses 32-bit systems as the example throughout). The Linux kernel divides this 4 GB space into two parts. The highest 1 GB (virtual addresses 0xC0000000 to 0xFFFFFFFF) is reserved for the kernel and is called "kernel space".

The lower 3 GB (virtual addresses 0x00000000 to 0xBFFFFFFF) is used by each process and is called "user space".
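The 3 GB / 1 GB split described above can be illustrated with a small user-space sketch. The names KERNEL_SPACE_START, is_kernel_address() and is_user_address() are made up for this example and are not kernel APIs; the boundary value is the 32-bit one given in the text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed 32-bit 3G/1G split, as described in the text. */
#define KERNEL_SPACE_START UINT32_C(0xC0000000)

/* The boundary sits exactly 3 GB above address 0. */
static_assert(KERNEL_SPACE_START == 3u * (1u << 30), "boundary is at 3 GB");

/* Returns true if addr lies in the top 1 GB (kernel space). */
static bool is_kernel_address(uint32_t addr)
{
    return addr >= KERNEL_SPACE_START;
}

/* Returns true if addr lies in the lower 3 GB (user space). */
static bool is_user_address(uint32_t addr)
{
    return !is_kernel_address(addr);
}
```

With these helpers, 0xBFFFFFFF is classified as a user-space address and 0xC0000000 as the first kernel-space address.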

Because every process can enter the kernel through system calls, the Linux kernel is shared by all processes in the system. So, from the point of view of an individual process, each process has a virtual address space of 4 GB.

Although a process can address 4 GB of virtual memory, this does not mean it may access all of that address space. Virtual memory must first be mapped to physical storage (memory or disk space) before it can actually be used.

A process may only access legal addresses. If a process accesses an illegal address, the kernel terminates the process and reports a "segmentation fault".
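This behavior can be observed from user space. The following is a minimal sketch, not something from the original article: a child process maps one page, unmaps it again so that the address is no longer legal, and then writes to it. The parent checks whether the child was killed by SIGSEGV. The function name unmapped_access_is_fatal() is hypothetical.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <signal.h>
#include <stdbool.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Forks a child that touches an address with no mapping behind it and
 * reports whether the kernel killed the child with SIGSEGV.
 */
static bool unmapped_access_is_fatal(void)
{
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0);

    pid_t pid = fork();
    assert(pid >= 0);

    if (pid == 0) {
        /* Child: map one page, then unmap it so the address becomes illegal. */
        void *p = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED || munmap(p, (size_t)page) != 0)
            _exit(2);
        *(volatile char *)p = 1;   /* no memory area covers p any more */
        _exit(0);                  /* not reached if the access faults */
    }

    /* Parent: inspect how the child terminated. */
    int status = 0;
    if (waitpid(pid, &status, 0) != pid)
        return false;
    return WIFSIGNALED(status) && WTERMSIG(status) == SIGSEGV;
}
```

Doing the illegal access in a child keeps the calling process alive, so the result can be checked instead of crashing the whole program.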

Where is the legal address space within virtual memory? Let's look at the layout of the process virtual address space:


The stack sits at the top of the virtual address space, and the data segment and code segment sit at the bottom. The empty part in between is space that can be allocated dynamically while the process runs. It holds mapped kernel address space content, dynamically requested address space, shared library code and data, and so on.
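A rough way to see this layout on a running system is to sample one address from each region. The sketch below is an illustration only; struct layout_sample and sample_layout() are hypothetical names, and the exact addresses vary from run to run and between architectures.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

static int initialized_global = 1;   /* lives in the data segment */

struct layout_sample {
    uintptr_t code;    /* address of a function (code segment) */
    uintptr_t data;    /* address of an initialized global */
    uintptr_t heap;    /* address returned by malloc() */
    uintptr_t stack;   /* address of a local variable */
};

/* Collects one sample address from each region of the calling process. */
static struct layout_sample sample_layout(void)
{
    int local = 0;
    void *heap_block = malloc(16);
    assert(heap_block != NULL);

    struct layout_sample s;
    s.code  = (uintptr_t)&sample_layout;
    s.data  = (uintptr_t)&initialized_global;
    s.heap  = (uintptr_t)heap_block;
    s.stack = (uintptr_t)&local;

    free(heap_block);
    return s;
}
```

On a typical Linux system one would expect the stack sample to be the highest of the four addresses and the code sample to be near the bottom, in line with the description above.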

In the virtual address space, only addresses that are mapped to physical storage are legal. Each contiguous piece of legal address space corresponds to a separate virtual memory area (VMA). The address space of a process is made up of these memory areas.

Linux uses a fairly complex data structure to track the virtual addresses of a process. The process address space is represented by a memory descriptor, the mm_struct structure, which is defined in the <include/linux/mm_types.h> file:

    struct mm_struct {
        struct vm_area_struct * mmap;       /* list of VMAs */
        struct rb_root mm_rb;
        struct vm_area_struct * mmap_cache; /* last find_vma result */
        unsigned long (*get_unmapped_area) (struct file *filp,
                    unsigned long addr, unsigned long len,
                    unsigned long pgoff, unsigned long flags);
        void (*unmap_area) (struct mm_struct *mm, unsigned long addr);
        unsigned long mmap_base;            /* base of mmap area */
        unsigned long task_size;            /* size of task vm space */
        unsigned long cached_hole_size;     /* if non-zero, the largest hole below free_area_cache */
        unsigned long free_area_cache;      /* first hole of size cached_hole_size or larger */
        pgd_t * pgd;
        atomic_t mm_users;                  /* How many users with user space? */
        atomic_t mm_count;                  /* How many references to "struct mm_struct" (users count as 1) */
        int map_count;                      /* number of VMAs */
        struct rw_semaphore mmap_sem;
        spinlock_t page_table_lock;         /* Protects page tables and some counters */

        struct list_head mmlist;            /* List of maybe swapped mm's. These are globally strung
                                             * together off init_mm.mmlist, and are protected
                                             * by mmlist_lock
                                             */

        /* Special counters, in some configurations protected by the
         * page_table_lock, in other configurations by being atomic.
         */
        mm_counter_t _file_rss;
        mm_counter_t _anon_rss;

        unsigned long hiwater_rss;          /* High-watermark of RSS usage */
        unsigned long hiwater_vm;           /* High-water virtual memory usage */

        unsigned long total_vm, locked_vm, shared_vm, exec_vm;
        unsigned long stack_vm, reserved_vm, def_flags, nr_ptes;
        unsigned long start_code, end_code, start_data, end_data;
        unsigned long start_brk, brk, start_stack;
        unsigned long arg_start, arg_end, env_start, env_end;

        unsigned long saved_auxv[AT_VECTOR_SIZE]; /* for /proc/PID/auxv */

        struct linux_binfmt *binfmt;

        cpumask_t cpu_vm_mask;

        /* Architecture-specific MM context */
        mm_context_t context;

        /* Swap token stuff */
        /*
         * Last value of global fault stamp as seen by this process.
         * In other words, this value gives an indication of how long
         * it has been since this task got the token.
         * Look at mm/thrash.c
         */
        unsigned int faultstamp;
        unsigned int token_priority;
        unsigned int last_interval;

        unsigned long flags;                /* Must use atomic bitops to access the bits */

        struct core_state *core_state;      /* coredumping support */
    #ifdef CONFIG_AIO
        spinlock_t ioctx_lock;
        struct hlist_head ioctx_list;
    #endif
    #ifdef CONFIG_MM_OWNER
        /*
         * "owner" points to a task that is regarded as the canonical
         * user/owner of this mm. All of the following must be true in
         * order for it to be changed:
         *
         * current == mm->owner
         * current->mm != mm
         * new_owner->mm == mm
         * new_owner->alloc_lock is held
         */
        struct task_struct *owner;
    #endif

    #ifdef CONFIG_PROC_FS
        /* store ref to file /proc/<pid>/exe symlink points to */
        struct file *exe_file;
        unsigned long num_exe_file_vmas;
    #endif
    #ifdef CONFIG_MMU_NOTIFIER
        struct mmu_notifier_mm *mmu_notifier_mm;
    #endif
    };

The first member of the structure, mmap, is the list of memory areas. Each memory area is represented by a struct vm_area_struct:

    /*
     * This struct defines a memory VMM memory area. There is one of these
     * per VM-area/task. A VM area is any part of the process virtual memory
     * space that has a special rule for the page-fault handlers (ie a shared
     * library, the executable area etc).
     */
    struct vm_area_struct {
        struct mm_struct * vm_mm;       /* The address space we belong to. */
        unsigned long vm_start;         /* Our start address within vm_mm. */
        unsigned long vm_end;           /* The first byte after our end address
                                           within vm_mm. */

        /* linked list of VM areas per task, sorted by address */
        struct vm_area_struct *vm_next;

        pgprot_t vm_page_prot;          /* Access permissions of this VMA. */
        unsigned long vm_flags;         /* Flags, see mm.h. */

        struct rb_node vm_rb;

        /*
         * For areas with an address space and backing store,
         * linkage into the address_space->i_mmap prio tree, or
         * linkage to the list of like vmas hanging off its node, or
         * linkage of vma in the address_space->i_mmap_nonlinear list.
         */
        union {
            struct {
                struct list_head list;
                void *parent;           /* aligns with prio_tree_node parent */
                struct vm_area_struct *head;
            } vm_set;

            struct raw_prio_tree_node prio_tree_node;
        } shared;

        /*
         * A file's MAP_PRIVATE vma can be in both i_mmap tree and anon_vma
         * list, after a COW of one of the file pages. A MAP_SHARED vma
         * can only be in the i_mmap tree. An anonymous MAP_PRIVATE, stack
         * or brk vma (with NULL file) can only be in an anon_vma list.
         */
        struct list_head anon_vma_node; /* Serialized by anon_vma->lock */
        struct anon_vma *anon_vma;      /* Serialized by page_table_lock */

        /* Function pointers to deal with this struct. */
        const struct vm_operations_struct *vm_ops;

        /* Information about our backing store: */
        unsigned long vm_pgoff;         /* Offset (within vm_file) in PAGE_SIZE
                                           units, *not* PAGE_CACHE_SIZE */
        struct file * vm_file;          /* File we map to (can be NULL). */
        void * vm_private_data;         /* was vm_pte (shared mem) */
        unsigned long vm_truncate_count;/* truncate_count or restart_addr */

    #ifndef CONFIG_MMU
        struct vm_region *vm_region;    /* NOMMU mapping region */
    #endif
    #ifdef CONFIG_NUMA
        struct mempolicy *vm_policy;    /* NUMA policy for the VMA */
    #endif
    };

The vm_area_struct structure describes one independent memory range over a contiguous interval of the process address space. Each memory area is represented by one such structure, and the structures are connected in a linked list through vm_next, sorted by address. In addition to the linked list, Linux also organizes the vm_area_struct structures in the red-black tree mm_rb. With this tree, Linux can quickly locate the memory area for a given virtual address.


The members vm_start and vm_end give the start address and the end address of the memory interval (vm_end is the first byte after the interval), so their difference is the length of the interval.
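The length calculation and the address-sorted list can be sketched in user space as follows. struct vma_sketch is a deliberately reduced stand-in for vm_area_struct that keeps only three fields; vma_length() and total_mapped() are hypothetical helpers, not kernel functions.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified user-space stand-in for the kernel's vm_area_struct. */
struct vma_sketch {
    unsigned long vm_start;       /* first address of the area */
    unsigned long vm_end;         /* first byte after the area */
    struct vma_sketch *vm_next;   /* next area, sorted by address */
};

/* Length of one area in bytes: vm_end is exclusive. */
static unsigned long vma_length(const struct vma_sketch *vma)
{
    assert(vma != NULL && vma->vm_end >= vma->vm_start);
    return vma->vm_end - vma->vm_start;
}

/* Walks the list and sums the lengths of all areas. */
static unsigned long total_mapped(const struct vma_sketch *head)
{
    unsigned long total = 0;
    for (const struct vma_sketch *v = head; v != NULL; v = v->vm_next)
        total += vma_length(v);
    return total;
}
```

For an area running from 0x001e3000 to 0x00201000, vma_length() yields 0x1e000 bytes, which is 120 KB.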
The member vm_mm points to the mm_struct that the area belongs to. So if two different processes map the same file into their own address spaces, each of them has its own vm_area_struct to identify its memory area. Two threads that share an address space have only one vm_area_struct for the area, because they use the same process address space.


vm_flags describes the behavior of, and information about, the pages contained in the memory area. It reflects the rules the kernel must follow when handling those pages.
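As an illustration of how such flags are read, the sketch below turns the four lowest vm_flags bits into the permission string format used by /proc/<pid>/maps. The VM_* values match the kernel's <linux/mm.h>; the helper format_vm_perms() is hypothetical and exists only for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Low vm_flags bits as defined in the kernel's <linux/mm.h>. */
#define VM_READ   0x00000001UL
#define VM_WRITE  0x00000002UL
#define VM_EXEC   0x00000004UL
#define VM_SHARED 0x00000008UL

/* Writes a 4-character permission string such as "r-xp" into out[5]. */
static void format_vm_perms(unsigned long vm_flags, char out[5])
{
    assert(out != NULL);
    out[0] = (vm_flags & VM_READ)   ? 'r' : '-';
    out[1] = (vm_flags & VM_WRITE)  ? 'w' : '-';
    out[2] = (vm_flags & VM_EXEC)   ? 'x' : '-';
    out[3] = (vm_flags & VM_SHARED) ? 's' : 'p';   /* shared or private */
    out[4] = '\0';
}
```

A code area with VM_READ | VM_EXEC set is rendered as "r-xp", and a private writable data area with VM_READ | VM_WRITE as "rw-p".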

Take the process with PID 17192 on the author's system as an example.

    # cat /proc/17192/maps     // display all memory areas in the process address space
    001e3000-00201000 r-xp 00000000 fd:00 789547     /lib/ld-2.12.so
    00201000-00202000 r--p 0001d000 fd:00 789547     /lib/ld-2.12.so
    00202000-00203000 rw-p 0001e000 fd:00 789547     /lib/ld-2.12.so
    00209000-00399000 r-xp 00000000 fd:00 789548     /lib/libc-2.12.so
    00399000-0039a000 ---p 00190000 fd:00 789548     /lib/libc-2.12.so
    0039a000-0039c000 r--p 00190000 fd:00 789548     /lib/libc-2.12.so
    0039c000-0039d000 rw-p 00192000 fd:00 789548     /lib/libc-2.12.so
    0039d000-003a0000 rw-p 00000000 00:00 0
    08048000-08049000 r-xp 00000000 fd:00 1191771    /home/allen/myprojects/blog/conn_user_kernel/test/a.out
    08049000-0804a000 rw-p 00000000 fd:00 1191771    /home/allen/myprojects/blog/conn_user_kernel/test/a.out
    b7755000-b7756000 rw-p 00000000 00:00 0
    b776d000-b776e000 rw-p 00000000 00:00 0
    b776e000-b776f000 r-xp 00000000 00:00 0          [vdso]
    bfc9f000-bfcb4000 rw-p 00000000 00:00 0
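Each line of this output has the form start-end, permissions, file offset, device, inode and an optional path. A minimal parsing sketch with sscanf() might look like the following; struct maps_entry and parse_maps_line() are hypothetical names, and the 255-character path limit is an assumption of this example.

```c
#include <assert.h>
#include <stdio.h>

struct maps_entry {
    unsigned long start;      /* start address of the area */
    unsigned long end;        /* first byte after the area */
    char perms[5];            /* e.g. "r-xp" */
    unsigned long offset;     /* offset into the mapped file */
    unsigned int dev_major;   /* device major number (hex in the file) */
    unsigned int dev_minor;   /* device minor number (hex in the file) */
    unsigned long inode;      /* inode number, 0 for anonymous areas */
    char path[256];           /* backing file or a tag like [vdso]; may be empty */
};

/* Parses one line of /proc/<pid>/maps; returns 0 on success, -1 on error. */
static int parse_maps_line(const char *line, struct maps_entry *e)
{
    assert(line != NULL && e != NULL);
    e->path[0] = '\0';
    int n = sscanf(line, "%lx-%lx %4s %lx %x:%x %lu %255[^\n]",
                   &e->start, &e->end, e->perms, &e->offset,
                   &e->dev_major, &e->dev_minor, &e->inode, e->path);
    /* The path is optional, so 7 or 8 converted fields are both fine. */
    return (n >= 7) ? 0 : -1;
}
```

Feeding the first line of the listing to parse_maps_line() gives start 0x001e3000, end 0x00201000, permissions "r-xp" and the path /lib/ld-2.12.so. Anonymous areas leave path empty.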

    # pmap 17192
    17192:   ./a.out
    001e3000    120K r-x--  /lib/ld-2.12.so    // this line and the next two are the code segment, data segment, and BSS segment of the dynamic linker ld.so
    00201000      4K r----  /lib/ld-2.12.so
    00202000      4K rw---  /lib/ld-2.12.so
    00209000   1600K r-x--  /lib/libc-2.12.so    // this line and the following ones are the code segment, data segment, and BSS segment of the C library libc.so
    00399000      4K -----  /lib/libc-2.12.so
    0039a000      8K r----  /lib/libc-2.12.so
    0039c000      4K rw---  /lib/libc-2.12.so
    0039d000     12K rw---    [anon]
    08048000      4K r-x--  /home/allen/myprojects/blog/conn_user_kernel/test/a.out    // code segment of the executable
    08049000      4K rw---  /home/allen/myprojects/blog/conn_user_kernel/test/a.out    // data segment of the executable
    b7755000      4K rw---    [anon]
    b776d000      4K rw---    [anon]
    b776e000      4K r-x--    [anon]
    bfc9f000     84K rw---    [stack]    // stack segment
    total     1860K

The vm_ops field in the structure points to the table of operations associated with the memory area. The kernel uses the methods in this table to operate on the VMA. The operations table is represented by the vm_operations_struct structure, defined in the <include/linux/mm.h> file:

    /*
     * These are the virtual MM functions - opening of an area, closing and
     * unmapping it (needed to keep files on disk up-to-date etc), pointer
     * to the functions called when a no-page or a wp-page exception occurs.
     */
    struct vm_operations_struct {
        /* called when the specified memory area is added to an address space */
        void (*open)(struct vm_area_struct * area);
        /* called when the specified memory area is removed from an address space */
        void (*close)(struct vm_area_struct * area);
        /* called by the page fault handler when a page that is not in
         * physical memory is accessed */
        int (*fault)(struct vm_area_struct *vma, struct vm_fault *vmf);

        /* notification that a previously read-only page is about to become
         * writable, if an error is returned it will cause a SIGBUS */
        int (*page_mkwrite)(struct vm_area_struct *vma, struct vm_fault *vmf);

        /* called by access_process_vm when get_user_pages() fails, typically
         * for use by special VMAs that can switch between memory and hardware
         */
        int (*access)(struct vm_area_struct *vma, unsigned long addr,
                      void *buf, int len, int write);
    #ifdef CONFIG_NUMA
        ...
    #endif
    };

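The operations-table pattern itself is plain C: a structure of function pointers that generic code calls through when a method is present. The user-space sketch below shows the idea with two callbacks only. All names in it (area_sketch, area_ops_sketch, counting_open and so on) are invented for illustration and do not exist in the kernel.

```c
#include <assert.h>
#include <stddef.h>

struct area_sketch;   /* forward declaration */

/* Simplified stand-in for vm_operations_struct: a table of callbacks. */
struct area_ops_sketch {
    void (*open)(struct area_sketch *area);
    void (*close)(struct area_sketch *area);
};

struct area_sketch {
    const struct area_ops_sketch *ops;   /* may be NULL, like vm_ops */
    int users;                           /* hypothetical per-area state */
};

/* Example callbacks that whoever owns the area might supply. */
static void counting_open(struct area_sketch *area)  { area->users++; }
static void counting_close(struct area_sketch *area) { area->users--; }

static const struct area_ops_sketch counting_ops = {
    .open  = counting_open,
    .close = counting_close,
};

/* Generic code calls through the table only if a method is provided. */
static void area_opened(struct area_sketch *area)
{
    assert(area != NULL);
    if (area->ops && area->ops->open)
        area->ops->open(area);
}

static void area_closed(struct area_sketch *area)
{
    assert(area != NULL);
    if (area->ops && area->ops->close)
        area->ops->close(area);
}
```

The generic functions never need to know what kind of area they are dealing with. An area without a table, or with a missing method, is simply skipped.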
In the kernel, given a virtual address belonging to a process, it is often necessary to find the memory interval it belongs to and the corresponding vm_area_struct structure. This is done by find_vma(), which searches the red-black tree.

The function is defined in <mm/mmap.c>:

    /* Look up the first VMA which satisfies  addr < vm_end,  NULL if none. */
    struct vm_area_struct *find_vma(struct mm_struct *mm, unsigned long addr)
    {
        struct vm_area_struct *vma = NULL;

        if (mm) {
            /* First check the most recently used memory area to see
             * whether the cached VMA contains the required address. */
            /* (The hit rate is close to 35%.) */
            vma = mm->mmap_cache;
            /* If the cached VMA does not contain the address,
             * search the red-black tree. */
            if (!(vma && vma->vm_end > addr && vma->vm_start <= addr)) {
                struct rb_node * rb_node;

                rb_node = mm->mm_rb.rb_node;
                vma = NULL;

                while (rb_node) {
                    struct vm_area_struct * vma_tmp;

                    vma_tmp = rb_entry(rb_node,
                            struct vm_area_struct, vm_rb);

                    if (vma_tmp->vm_end > addr) {
                        vma = vma_tmp;
                        if (vma_tmp->vm_start <= addr)
                            break;
                        rb_node = rb_node->rb_left;
                    } else
                        rb_node = rb_node->rb_right;
                }
                if (vma)
                    mm->mmap_cache = vma;
            }
        }
        return vma;
    }

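The search rule used by find_vma(), "the first area whose end lies above addr", can be tried out in user space. The sketch below swaps the red-black tree for a binary search over a sorted array, which is a different data structure with the same lookup rule. struct range_sketch, find_range() and range_contains() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

struct range_sketch {
    unsigned long start;   /* inclusive */
    unsigned long end;     /* exclusive */
};

/*
 * Returns the first range with addr < end, or NULL if there is none.
 * The ranges must be sorted by address and must not overlap.
 */
static const struct range_sketch *
find_range(const struct range_sketch *r, size_t n, unsigned long addr)
{
    assert(r != NULL || n == 0);
    size_t lo = 0, hi = n;   /* search window [lo, hi) */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (r[mid].end > addr)
            hi = mid;        /* candidate; look for an earlier one */
        else
            lo = mid + 1;    /* ends at or before addr; go right */
    }
    return (lo < n) ? &r[lo] : NULL;
}

/* The result only contains addr if its start is not above addr. */
static int range_contains(const struct range_sketch *r, unsigned long addr)
{
    return r != NULL && r->start <= addr && addr < r->end;
}
```

As with find_vma(), a non-NULL result does not prove that the address is mapped. If addr falls into a gap, the next higher range is returned, so the caller still has to compare addr with the start of the result.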
When a program image starts running, the executable image must be mapped into the virtual address space of the process. If the process uses any shared libraries, those libraries must be mapped into the virtual address space of the process as well.

Note that Linux does not load the whole image into physical memory at this point. Instead, the executable file is merely linked into the virtual address space of the process. As the program runs, the parts of the program that are referenced are loaded into physical memory by the operating system. This method of linking an image into a process address space is called "memory mapping".
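User programs can request the same kind of mapping through the mmap() system call. The following is a minimal sketch under a few assumptions: it creates its own temporary file under /tmp (the file name template is made up), maps it read-only, and copies the bytes out through the mapping instead of calling read(). read_via_mmap() is a hypothetical helper.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Writes msg to a temporary file, maps the file read-only and copies the
 * mapped bytes into out. Returns 0 on success, -1 on failure.
 */
static int read_via_mmap(const char *msg, char *out, size_t out_size)
{
    assert(msg != NULL && out != NULL);
    size_t len = strlen(msg);
    if (len == 0 || len >= out_size)
        return -1;

    char path[] = "/tmp/mmap_sketch_XXXXXX";   /* hypothetical file name */
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);   /* the file lives on until fd is closed */

    int rc = -1;
    if (write(fd, msg, len) == (ssize_t)len) {
        /* No read() call: mmap() only sets up the mapping. */
        char *p = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
        if (p != MAP_FAILED) {
            memcpy(out, p, len);   /* first access goes through a page fault */
            out[len] = '\0';
            munmap(p, len);
            rc = 0;
        }
    }
    close(fd);
    return rc;
}
```

The call to mmap() itself returns quickly because it only records the mapping. The file contents are brought in when memcpy() first touches the mapped page.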

When the executable image is mapped into the virtual address space of the process, a set of vm_area_struct structures is generated to describe the start and end of each virtual memory interval. Each vm_area_struct represents one part of the executable image: it may be executable code, initialized variables, or uninitialized data. All of this is implemented in the function do_mmap(). As the vm_area_struct structures are generated, Linux also initializes the standard operations for the virtual memory intervals they describe.

    static inline unsigned long do_mmap(struct file *file, unsigned long addr,
        unsigned long len, unsigned long prot,
        unsigned long flag, unsigned long offset)
    {
        unsigned long ret = -EINVAL;
        if ((offset + PAGE_ALIGN(len)) < offset)
            goto out;
        if (!(offset & ~PAGE_MASK))
            ret = do_mmap_pgoff(file, addr, len, prot, flag, offset >> PAGE_SHIFT);
    out:
        return ret;
    }

This function adds a new range of addresses to the address space of the process. It is defined in <include/linux/mm.h>.


Meaning of the function's parameters:

file: the file to be mapped.
offset: the offset within the file. A file is not necessarily mapped all at once; often only part of it is mapped, and offset gives the starting position of that part.
len: the length of the part of the file to be mapped.
addr: an address in the virtual address space; the search for a free virtual area starts from this address.
prot: the access permissions for the pages in the virtual area. Possible flags are PROT_READ, PROT_WRITE, PROT_EXEC, and PROT_NONE. The first three have the same meaning as the flags VM_READ, VM_WRITE, and VM_EXEC. PROT_NONE means the process has none of these three access permissions.
flag: other flags for the virtual area.
do_mmap() calls the do_mmap_pgoff() function, which does the main work of memory mapping. That function is considerably longer; see the <mm/mmap.c> file for its implementation.
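The correspondence between the PROT_* values passed by the caller and the VM_* bits stored in a memory area can be sketched as a simple bit-by-bit translation. prot_to_vm_flags() is a hypothetical helper written for this example and is not the kernel's own conversion routine. The VM_* values are those from <linux/mm.h>.

```c
#include <assert.h>
#include <sys/mman.h>

/* Low vm_flags bits as defined in the kernel's <linux/mm.h>. */
#define VM_READ   0x00000001UL
#define VM_WRITE  0x00000002UL
#define VM_EXEC   0x00000004UL

/* Translates PROT_* bits into the matching VM_* bits, one by one. */
static unsigned long prot_to_vm_flags(int prot)
{
    assert((prot & ~(PROT_READ | PROT_WRITE | PROT_EXEC)) == 0);
    unsigned long flags = 0;   /* PROT_NONE maps to no access bits */
    if (prot & PROT_READ)
        flags |= VM_READ;
    if (prot & PROT_WRITE)
        flags |= VM_WRITE;
    if (prot & PROT_EXEC)
        flags |= VM_EXEC;
    return flags;
}
```

Passing PROT_NONE yields 0, meaning none of the three access bits is set for the area.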

Mapping a file into virtual memory only establishes a mapping relationship. The mapping between virtual memory pages and physical pages has not been established yet. When an executable image is mapped into the virtual memory of a process and starts running, the data being accessed is very likely not in physical memory, because only a small portion of the virtual memory interval has been loaded into physical memory. The processor then reports a page fault and its cause to Linux, and the kernel must load the page into physical memory from the disk image or from the swap file (if the page was swapped out). This is the demand paging mechanism.
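One way to observe demand paging from user space is the mincore() system call, which reports which pages of a mapping are currently resident in physical memory. The sketch below is an illustration under assumptions: it uses an anonymous mapping instead of an executable image, handles at most 64 pages, and the helper names count_resident() and touch_and_count() are hypothetical.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Counts how many of the npages pages starting at addr are resident. */
static long count_resident(void *addr, size_t npages, size_t page_size)
{
    unsigned char vec[64];
    assert(addr != NULL && npages <= sizeof vec);
    if (mincore(addr, npages * page_size, vec) != 0)
        return -1;
    long resident = 0;
    for (size_t i = 0; i < npages; i++)
        resident += vec[i] & 1;   /* bit 0: page is in physical memory */
    return resident;
}

/*
 * Maps npages anonymous pages, touches each one and returns the number
 * resident afterwards (or -1 on error). The count taken before touching
 * is stored in *before.
 */
static long touch_and_count(size_t npages, long *before)
{
    assert(before != NULL);
    size_t page_size = (size_t)sysconf(_SC_PAGESIZE);
    size_t len = npages * page_size;
    char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    *before = count_resident(p, npages, page_size);
    for (size_t i = 0; i < npages; i++)
        p[i * page_size] = 1;          /* first write faults the page in */
    long after = count_resident(p, npages, page_size);

    munmap(p, len);
    return after;
}
```

On a typical Linux system the count before touching is expected to be lower than the count afterwards, since the pages are only backed by physical memory once they have been accessed. The exact numbers depend on the kernel and its configuration.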

