[Reprint] A detailed description of mmap in Linux memory management

Last Update:2015-12-11 Source: Internet

Author: User

Tags access properties prepare reserved


Reprinted from http://blog.chinaunix.net/uid-26669729-id-3077015.html

One. The mmap system call

1. System call mmap()

mmap() maps a file or other object into memory. The file is mapped onto a whole number of pages; if the file size is not a multiple of the page size, the unused remainder of the last page is zero-filled. munmap() performs the opposite operation: it removes the mapping for a given address range.

Once a file has been mapped into a process with mmap(), the process can read and write the file directly through the returned virtual address, without calling read(), write() or other system calls. Note, however, that a mapping cannot extend the file: anything written to the mapped region beyond the current file size is not written to the file.
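As a minimal sketch of the two points above, the following C function edits a file through a mapping and then checks that a store past the end of the file did not enlarge it. The function name mmap_edit_demo and the scratch path under /tmp are hypothetical names chosen for illustration, and the check assumes a Linux system with a unified page cache.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: returns 0 if a store through the mapping changed the
 * file contents while a store past end-of-file left the file size alone. */
int mmap_edit_demo(void)
{
    char path[] = "/tmp/mmap_edit_XXXXXX";   /* hypothetical scratch file */
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);                 /* the file lives until fd is closed */

    const char msg[] = "hello";
    size_t len = strlen(msg);
    if (write(fd, msg, len) != (ssize_t)len) {
        close(fd);
        return -1;
    }

    char *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) {
        close(fd);
        return -1;
    }

    p[0] = 'H';      /* ordinary store, no write() call involved */
    p[len] = '!';    /* lands in the padding of the last page, past end-of-file */

    char buf[16] = {0};
    struct stat st;
    assert(len < sizeof buf);
    int ok = pread(fd, buf, sizeof buf - 1, 0) == (ssize_t)len  /* still 5 bytes */
          && strcmp(buf, "Hello") == 0                          /* edit is visible */
          && fstat(fd, &st) == 0
          && st.st_size == (off_t)len;                          /* size unchanged */

    munmap(p, len);
    close(fd);
    return ok ? 0 : -1;
}
```

Calling mmap_edit_demo() from a main() function should return 0 on a typical Linux system; the '!' byte exists only in the in-memory page and never becomes part of the file.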

One obvious benefit of communicating through shared memory is efficiency: processes read and write the memory directly, and no extra copy of the data is needed. Communication through pipes or message queues requires four copies of the data between kernel and user space, whereas shared memory needs only two: one from the input file into the shared memory area, and one from the shared memory area to the output file. In practice, processes that share memory do not map a region, exchange a small amount of data, unmap it, and then set up a new region for the next exchange. Instead, the shared region is kept until communication is finished, so the data stays in shared memory and is not written back to the file in the meantime. The content of the shared memory is typically written back to the file when the mapping is removed. This is why communication through shared memory is so efficient.

For file-backed mappings, the st_atime of the mapped file may be updated at any time between the mmap() call and the corresponding munmap(). If st_atime has not been updated by then, it is updated by the first reference to a mapped page. For a file mapping established with PROT_WRITE and MAP_SHARED, st_ctime and st_mtime are updated after a write to the mapped region and before a subsequent msync() with the MS_SYNC or MS_ASYNC flag, if one occurs.

Usage:

#include <sys/mman.h>

void *mmap(void *start, size_t length, int prot, int flags,
           int fd, off_t offset);

int munmap(void *start, size_t length);

Return Description:

On success, mmap() returns a pointer to the mapped area and munmap() returns 0. On failure, mmap() returns MAP_FAILED [whose value is (void *) -1] and munmap() returns -1. errno is set to one of the following values:

EACCES: Access error (for example, the file's open mode conflicts with the requested protection)

EAGAIN: The file is locked, or too much memory is locked

EBADF: fd is not a valid file descriptor

EINVAL: One or more parameters are invalid

ENFILE: The system has reached its limit on open files

ENODEV: The file system on which the specified file resides does not support memory mapping

ENOMEM: Insufficient memory, or the process has exceeded its maximum number of mappings

EPERM: Insufficient privileges; operation not permitted

ETXTBSY: The MAP_DENYWRITE flag was specified, but the file is open for writing

In addition, accessing a mapped region may raise the following signals:

SIGSEGV: Attempted write to a region mapped read-only

SIGBUS: Attempted access to a part of the mapped region that does not correspond to the file (for example, beyond the end of the file)
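The return convention above is easy to get wrong, because failure is reported as MAP_FAILED rather than NULL. The sketch below shows one way to check it; try_mmap is a hypothetical wrapper written for this illustration, not part of any library.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

/* Hypothetical wrapper: attempts a read-only mapping and returns 0 on
 * success, or the errno value reported by mmap() on failure. */
int try_mmap(size_t length, int flags, int fd)
{
    void *p = mmap(NULL, length, PROT_READ, flags, fd, 0);
    if (p == MAP_FAILED) {      /* compare with MAP_FAILED, not NULL */
        int err = errno;        /* save it before another call overwrites it */
        fprintf(stderr, "mmap: %s\n", strerror(err));
        return err;
    }
    int rc = munmap(p, length);
    assert(rc == 0);            /* unmapping a region we just mapped */
    (void)rc;
    return 0;
}
```

On Linux, try_mmap(4096, MAP_SHARED, -1) is expected to report EBADF because a file-backed mapping was requested without a valid descriptor, and a zero length is expected to report EINVAL.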

Parameters:

start: The starting address of the mapped area. Usually NULL, which lets the kernel choose the address.

length: The size of the mapped area, in bytes.

prot: The desired memory protection; it must not conflict with the open mode of the file. It is either PROT_NONE or the bitwise OR of one or more of the other values below:

PROT_EXEC // Page contents can be executed

PROT_READ // Page contents can be read

PROT_WRITE // Pages can be written

PROT_NONE // Pages cannot be accessed

flags: Specifies the type of the mapped object, mapping options, and whether the mapped pages can be shared. Its value is a combination of one or more of the following bits:

MAP_FIXED // Use exactly the specified start address. If the memory area given by start and length overlaps an existing mapping, the overlapping part of the existing mapping is discarded. If the specified address cannot be used, the call fails. The start address must fall on a page boundary.

MAP_SHARED // Share the mapping with all other processes that map this object. Writes to the shared area are equivalent to writing to the file, but the file may not actually be updated until msync() or munmap() is called.

MAP_PRIVATE // Create a private copy-on-write mapping. Writes to the memory area do not affect the original file. This flag and MAP_SHARED are mutually exclusive; exactly one of the two must be used.

MAP_DENYWRITE // This flag is ignored.

MAP_EXECUTABLE // This flag is ignored as well.

MAP_NORESERVE // Do not reserve swap space for this mapping. When swap space is reserved, modifications to the mapped area are guaranteed to succeed. When it is not reserved and memory runs short, a modification to the mapped area can cause a segmentation violation signal.

MAP_LOCKED // Lock the pages of the mapped area so that they are not swapped out of memory.

MAP_GROWSDOWN // Used for stacks; tells the kernel VM system that the mapping may grow downward.

MAP_ANONYMOUS // Anonymous mapping; the mapped area is not associated with any file.

MAP_ANON // Synonym for MAP_ANONYMOUS; deprecated.

MAP_FILE // Compatibility flag; ignored.

MAP_32BIT // Place the mapping in the low 2 GB of the process address space. Ignored when MAP_FIXED is specified. Currently this flag is supported only on the x86-64 platform.

MAP_POPULATE // Prepare the page tables in advance; for a file mapping this is done by reading ahead in the file. Later accesses to the mapped area will not be blocked by page faults.

MAP_NONBLOCK // Only meaningful together with MAP_POPULATE. Do not perform read-ahead; create page table entries only for pages that are already in memory.

fd: A valid file descriptor. If MAP_ANONYMOUS is set, the value should be -1 for compatibility.

offset: The offset within the mapped object at which the mapping starts. It must be a multiple of the page size.
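To make two of the parameters above concrete, here is a small sketch that exercises MAP_PRIVATE and the offset argument. The function name private_mapping_demo and the scratch file are hypothetical; the code assumes Linux behaviour, where an offset that is not page-aligned is rejected with EINVAL.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: returns 0 if a write through a MAP_PRIVATE mapping
 * stayed out of the file and a misaligned offset was rejected. */
int private_mapping_demo(void)
{
    char path[] = "/tmp/mmap_priv_XXXXXX";   /* hypothetical scratch file */
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);
    if (write(fd, "abc", 3) != 3) {
        close(fd);
        return -1;
    }

    /* offset 1 is not a multiple of the page size, so this should fail */
    void *bad = mmap(NULL, 3, PROT_READ, MAP_PRIVATE, fd, 1);
    int misaligned_rejected = (bad == MAP_FAILED && errno == EINVAL);
    if (bad != MAP_FAILED)
        munmap(bad, 3);

    char *p = mmap(NULL, 3, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
    if (p == MAP_FAILED) {
        close(fd);
        return -1;
    }
    p[0] = 'X';                  /* copy-on-write: only this process sees it */

    char buf[4] = {0};
    ssize_t n = pread(fd, buf, 3, 0);
    assert(n <= 3);
    int ok = misaligned_rejected
          && n == 3
          && memcmp(p, "Xbc", 3) == 0      /* the mapping shows the change */
          && memcmp(buf, "abc", 3) == 0;   /* the file does not */

    munmap(p, 3);
    close(fd);
    return ok ? 0 : -1;
}
```

Replacing MAP_PRIVATE with MAP_SHARED in the second call would make the store visible in the file, which is the difference the two flags are meant to express.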

2. System call munmap()

#include <sys/mman.h>

int munmap(void *addr, size_t len);
This call removes a mapping from the process address space. addr is the address returned by mmap(), and len is the size of the mapped area. Once the mapping has been removed, accessing the formerly mapped addresses causes a segmentation fault.

3. System call msync()

#include <sys/mman.h>

int msync(void *addr, size_t len, int flags);
In general, changes that a process makes to shared content in the mapped area are not written back to the disk file immediately; the write-back often happens only after munmap() is called. Calling msync() makes the content of the file on disk match the content of the shared memory area.
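A minimal sketch of forcing the write-back described above, assuming a one-page scratch file; msync_demo is a hypothetical name used only here.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: writes through a shared mapping and flushes it with
 * msync(). Returns the msync() result (0 on success) or -1 on setup errors. */
int msync_demo(void)
{
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0);

    char path[] = "/tmp/mmap_sync_XXXXXX";   /* hypothetical scratch file */
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);
    if (ftruncate(fd, page) != 0) {          /* give the file one full page */
        close(fd);
        return -1;
    }

    char *p = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
                   MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) {
        close(fd);
        return -1;
    }

    strcpy(p, "flushed");
    /* MS_SYNC blocks until the dirty page has been written back;
     * MS_ASYNC would only schedule the write-back and return at once. */
    int rc = msync(p, (size_t)page, MS_SYNC);

    munmap(p, (size_t)page);
    close(fd);
    return rc;
}
```

The addr argument passed to msync() must be page-aligned, which is automatically true for an address returned by mmap().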

Two. Two ways to share memory with the mmap() system call:

(1) Memory mapping backed by an ordinary file: this works between any processes. You open or create a file and then call mmap(). Typical calling code looks like this:

fd = open(name, flag, mode);
if (fd < 0)
    ...
ptr = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

Communicating through shared memory set up with mmap() in this way has a number of characteristics and caveats to be aware of.

(2) Anonymous memory mapping: this works between related processes. Because of the special relationship between parent and child, the parent process calls mmap() first and then calls fork(). After fork(), the child inherits the parent's address space as it was after the anonymous mapping was created, including the address returned by mmap(), so parent and child can communicate through the mapped area. Note that this is not ordinary inheritance. In general, a child process keeps its own separate copies of the variables inherited from the parent, whereas the area at the address returned by mmap() is shared by parent and child.
For related processes, anonymous memory mapping is the best way to implement shared memory. You do not have to specify a file at all; you only set the appropriate flag (MAP_ANONYMOUS).
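The parent-child scheme described above can be sketched as follows. The function name share_with_child is hypothetical; the code assumes that MAP_SHARED | MAP_ANONYMOUS is available, as it is on Linux.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: the child stores 42 in the shared page and in an
 * ordinary variable. Returns what the parent reads from the shared page,
 * or -1 on error or if the ordinary variable turned out to be shared. */
int share_with_child(void)
{
    int *shared = mmap(NULL, sizeof *shared, PROT_READ | PROT_WRITE,
                       MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (shared == MAP_FAILED)
        return -1;
    *shared = 0;
    volatile int private_copy = 0;   /* ordinary variable, duplicated by fork() */

    pid_t pid = fork();
    if (pid < 0) {
        munmap(shared, sizeof *shared);
        return -1;
    }
    if (pid == 0) {                  /* child */
        private_copy = 42;           /* changes only the child's copy */
        *shared = 42;                /* visible to the parent */
        _exit(0);
    }

    int status = 0;
    pid_t waited = waitpid(pid, &status, 0);
    assert(waited == pid);
    (void)waited;

    int value = *shared;
    munmap(shared, sizeof *shared);
    return (private_copy == 0) ? value : -1;
}
```

The parent reads the child's value from the mapped page, while its own private_copy is still 0, which is the contrast between the shared area and ordinary inherited variables.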

Three. The principle of mmap memory mapping

The ultimate goal of the mmap system call is to map a device or file into the virtual address space of a user process so that the process can read and write the file directly. The work can be divided into the following three steps:

1. Find a contiguous range in the user's virtual address space that satisfies the request, in preparation for the mapping (done by the kernel's mmap system call)

On 32-bit x86, each process has 3 GB of user virtual address space. This does not mean that the process can use any address within those 3 GB, because virtual address space can only be used once it has been mapped onto physical storage (RAM or disk space).

So how does the kernel manage the 3 GB of virtual address space of each process? In a nutshell, the image file produced by compiling and linking a user program has a code segment and a data segment (including the data and BSS sections), with the code segment below and the data segment above. The data segment contains all statically allocated data, that is, global variables and local variables declared static. These are basic requirements of the process and are allocated when the image of a running process is set up. The space used by the stack is a basic requirement too, so it is also allocated when the process is created, as shown in Figure 3.1:

Figure 3.1 Partition of process virtual space

In the kernel, each such area is represented by a struct vm_area_struct. It describes a contiguous range of virtual address space with uniform access properties, whose size is an integer multiple of the physical page size. You can use cat /proc/<pid>/maps to see the memory areas of a process, where <pid> is the process ID. Each line of the output corresponds to one vm_area_struct of the process.
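The same information can be read from inside a program through /proc/self/maps. The following sketch scans that file for the region containing a given address; addr_is_mapped is a hypothetical helper, and it relies on each line starting with a "start-end" pair of hexadecimal addresses.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: returns 1 if addr lies inside one of the regions
 * listed in /proc/self/maps, 0 if not, and -1 if the file cannot be opened. */
int addr_is_mapped(const void *addr)
{
    FILE *f = fopen("/proc/self/maps", "r");
    if (f == NULL)
        return -1;

    unsigned long target = (unsigned long)(uintptr_t)addr;
    char line[512];
    int found = 0;

    while (fgets(line, sizeof line, f) != NULL) {
        unsigned long start, end;
        /* every region line begins with "start-end" in hexadecimal */
        if (sscanf(line, "%lx-%lx", &start, &end) != 2)
            continue;
        assert(start <= end);
        if (target >= start && target < end) {   /* end is exclusive */
            found = 1;
            break;
        }
    }
    fclose(f);
    return found;
}
```

Passing the address of a string literal or a local variable should find a region, while a null pointer should not, since no region covers address 0 in an ordinary process.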

Here is the definition of struct vm_area_struct (as found in older kernels that still organized the areas in an AVL tree):

#include <linux/mm_types.h>
/* This struct defines a memory VMM memory area. */
struct vm_area_struct {
    struct mm_struct *vm_mm;            /* VM area parameters */
    unsigned long vm_start;
    unsigned long vm_end;
    /* Linked list of VM areas per task, sorted by address */
    struct vm_area_struct *vm_next;
    pgprot_t vm_page_prot;
    unsigned long vm_flags;
    /* AVL tree of VM areas per task, sorted by address */
    short vm_avl_height;
    struct vm_area_struct *vm_avl_left;
    struct vm_area_struct *vm_avl_right;
    /* For areas with an address space and backing store */
    struct vm_area_struct *vm_next_share;
    struct vm_area_struct **vm_pprev_share;
    struct vm_operations_struct *vm_ops;
    unsigned long vm_pgoff;             /* Offset in PAGE_SIZE units, *not* PAGE_CACHE_SIZE */
    struct file *vm_file;
    unsigned long vm_raend;
    void *vm_private_data;              /* was vm_pte (shared mem) */
};

In general, the virtual address space used by a process is not contiguous, and the access properties of its parts may differ. The virtual address space of a process is therefore described by multiple vm_area_struct structures. When there are only a few of them, the vm_area_struct structures are kept in ascending address order in a singly linked list (each one points to the next through its vm_next pointer). When there are many, a linked list alone would slow down lookups. To address this, vm_area_struct also has the three members vm_avl_height (tree height), vm_avl_left (left child) and vm_avl_right (right child), which form an AVL tree and speed up the search for a vm_area_struct.
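The linked-list lookup can be modelled in user space as follows. This is a simplified sketch, not kernel code: struct area, find_area and area_contains are hypothetical names that mirror only the vm_start, vm_end and vm_next members, and the lookup follows the convention of returning the first area that ends above the address.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified user-space stand-in for vm_area_struct (hypothetical). */
struct area {
    unsigned long start;   /* first address of the area */
    unsigned long end;     /* first address after the area */
    struct area *next;     /* next area in ascending address order */
};

/* Walks the sorted list and returns the first area whose end lies above
 * addr, or NULL if there is none. The result may start above addr, in
 * which case addr falls into a gap between areas. Cost is O(n), which is
 * what the tree organisation is meant to avoid. */
struct area *find_area(struct area *head, unsigned long addr)
{
    for (struct area *a = head; a != NULL; a = a->next) {
        assert(a->start < a->end);
        if (addr < a->end)
            return a;
    }
    return NULL;
}

/* Returns 1 if addr actually lies inside area a, 0 otherwise. */
int area_contains(const struct area *a, unsigned long addr)
{
    return a != NULL && a->start <= addr && addr < a->end;
}
```

A caller that needs to know whether an address is really mapped combines both functions, because find_area alone also succeeds for addresses in the hole below the returned area.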

If a vm_area_struct describes a file mapping, its vm_file member points to the file structure of the mapped file, and vm_pgoff is the offset within that file at which the area starts, measured in physical pages.

Figure 3.2 Process virtual address

The job of the mmap system call is therefore to prepare such a range of virtual address space, set up a vm_area_struct for it, and pass that structure to the specific device driver.

2. Establish the mapping between the virtual address space and the physical addresses of the file or device (done by the device driver)

The second step in establishing a file mapping is to map the virtual addresses to specific physical addresses, which is done by modifying the page tables of the process. The mmap method is a member of the file_operations structure:

int (*mmap)(struct file *, struct vm_area_struct *);

There are two ways to build the page tables in Linux:

(1) Use remap_pfn_range to create all the page tables at once.

int remap_pfn_range(struct vm_area_struct *vma, unsigned long virt_addr, unsigned long pfn, unsigned long size, pgprot_t prot);

Return value:

Returns 0 on success and a negative error value on failure.
Parameter description:

vma: The virtual memory area (VMA) created for the user process, into which the page range is mapped.

virt_addr: The user virtual address at which remapping should begin. The function builds the page tables for the virtual address range from virt_addr to virt_addr + size.

pfn: The page frame number corresponding to the physical address to which the virtual address should be mapped. The page frame number is simply the physical address shifted right by PAGE_SHIFT bits. For most uses, the vm_pgoff member of the VMA structure contains exactly the value you need. The function affects physical addresses from (pfn << PAGE_SHIFT) to (pfn << PAGE_SHIFT) + size.

size: The size of the area being remapped, in bytes.

prot: The "protection" requested for the new VMA. The driver can (and should) use the value found in vma->vm_page_prot.
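The relationship between pfn, size and the affected physical range is plain shift arithmetic, sketched below in user-space C. DEMO_PAGE_SHIFT and the three function names are hypothetical, and the value 12 assumes 4 KiB pages; the real PAGE_SHIFT depends on the architecture and kernel configuration.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_PAGE_SHIFT 12   /* assumption: 4 KiB pages */

/* Physical address of the first byte of page frame pfn. */
uint64_t pfn_to_phys(uint64_t pfn)
{
    return pfn << DEMO_PAGE_SHIFT;
}

/* Page frame number that contains physical address phys. */
uint64_t phys_to_pfn(uint64_t phys)
{
    return phys >> DEMO_PAGE_SHIFT;
}

/* First physical address after a range of size bytes starting at pfn,
 * i.e. (pfn << PAGE_SHIFT) + size in the notation used above. */
uint64_t remap_range_end(uint64_t pfn, uint64_t size)
{
    /* whole pages only, as a mapping covers complete pages */
    assert(size % ((uint64_t)1 << DEMO_PAGE_SHIFT) == 0);
    return pfn_to_phys(pfn) + size;
}
```

With these assumptions, the physical address 0xA0000 (640 KB) corresponds to page frame 0xA0, and a range of 0x60000 bytes starting there ends at 0x100000 (1 MB).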

(2) Use the nopage VMA method to create page table entries one page at a time.

struct page *(*nopage)(struct vm_area_struct *vma, unsigned long address, int *type);

Return value:

Returns a valid mapped page on success and NULL on failure.

Parameter description:

address: The user-space virtual address that caused the fault, as passed in from user space.

(3) Restrictions on use:

remap_pfn_range cannot map conventional memory. It can map only reserved pages and physical addresses above the top of physical memory. Physical addresses above the top of physical memory are treated like reserved pages, so the submodules of the memory management system do not manage them. The range between 640 KB and 1 MB consists of reserved pages and may be mapped, and device I/O memory can be mapped as well. If you want to map memory obtained from kmalloc() into user space, you can mark the corresponding pages as reserved with mem_map_reserve().

3. What happens when the newly mapped pages are actually accessed (done by the page fault handler)

(1) Pages in the page cache and swap cache: a physical page of an accessed file resides in the page cache or swap cache, and all information about a page is described by a struct page. One field of struct page is the pointer mapping, which points to a struct address_space. All pages in the page cache or swap cache are identified by an address_space structure plus an offset.

(2) Correspondence between a file and its address_space structure: when a file is opened, the kernel creates a struct inode in memory whose i_mapping field points to an address_space structure. A file therefore corresponds to one address_space structure, and an address_space plus an offset identifies one page in the page cache or swap cache. When some data is addressed, the corresponding page is therefore easy to find from the file and the offset of the data within the file.

(3) When a process calls mmap(), the kernel only adds an area of the requested size to the process address space and sets its access flags; it does not yet map that space to physical pages. The first access to the area therefore triggers a page fault.
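The lazy behaviour described in (3) can be observed from user space by counting minor page faults around the first touch of a fresh mapping. This is a rough sketch: faults_on_first_touch is a hypothetical helper, it uses an anonymous mapping for simplicity, and the exact count depends on the kernel configuration (for example, huge pages can cover several small pages with one fault) and on whether the platform reports ru_minflt at all.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/resource.h>
#include <unistd.h>

/* Hypothetical helper: maps npages anonymous pages, touches each one once,
 * and returns how many minor page faults the process took while doing so.
 * Returns -1 on error. */
long faults_on_first_touch(size_t npages)
{
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0);
    size_t len = npages * (size_t)page;

    /* mmap() only reserves address space here; no physical page yet */
    volatile char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    struct rusage before, after;
    if (getrusage(RUSAGE_SELF, &before) != 0) {
        munmap((void *)p, len);
        return -1;
    }

    for (size_t i = 0; i < npages; i++)
        p[i * (size_t)page] = 1;       /* first access to each page */

    if (getrusage(RUSAGE_SELF, &after) != 0) {
        munmap((void *)p, len);
        return -1;
    }

    munmap((void *)p, len);
    return after.ru_minflt - before.ru_minflt;
}
```

On a typical Linux system the returned count is greater than zero even though mmap() itself had already succeeded, which shows that the physical pages are attached only when they are first used.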

(4) For a shared memory mapping, the page fault handler first looks for the target page in the swap cache (the physical page matching the address_space and the offset). If the page is found, its address is returned directly. If it is not found, the handler checks whether the page is in the swap area and, if so, swaps it in. If neither is the case, the handler allocates a new physical page and inserts it into the page cache. Finally, the page table of the process is updated.

Note: when an ordinary file is mapped (a non-shared mapping), the page fault handler first looks for the corresponding page in the page cache, based on the address_space and the data offset. If the page is not found, the file data has not yet been read into memory, so the handler reads the page from disk and returns its address, and the page table of the process is updated.

(5) When several processes map the same shared memory area, each of them establishes its own mapping between linear and physical addresses in the same way. Regardless of the address returned to each process, what is actually accessed is the set of physical pages that back the same shared memory area.

