A careful analysis of mmap: what it is, why to use it, and how to use it (repost)

Source: Internet
Author: User

mmap basic concepts

mmap is a method of memory-mapping files. It maps a file or other object into the address space of a process, establishing a one-to-one correspondence between disk addresses in the file and a range of virtual addresses in the process's virtual address space. Once this mapping is in place, the process can read and write that memory through ordinary pointers, and the system automatically writes dirty pages back to the corresponding location of the file on disk. File operations are thus completed without calling system calls such as read and write. Conversely, changes that the kernel makes to this region are directly visible in user space, which makes it possible for different processes to share a file. As shown in the figure below:

As the figure shows, the virtual address space of a process is made up of multiple virtual memory areas. A virtual memory area is a homogeneous interval in the process's virtual address space, that is, a contiguous address range with the same characteristics. The text segment (code segment), initialized data segment, BSS segment, heap, stack, and memory-mapped region shown in the figure are each a separate virtual memory area. The address space that serves memory mappings lies in the free region between the heap and the stack.

The Linux kernel uses the vm_area_struct structure to represent a single virtual memory area. Because different virtual memory areas differ in function and internal mechanism, a process uses multiple vm_area_struct structures to represent its different types of areas. The vm_area_struct structures are linked together in a linked list or a tree so that the process can look them up quickly, as shown in the figure:

mmap memory mapping principle

The implementation of mmap memory mapping can be divided into three stages:

(i) The process initiates the mapping, and a virtual mapping area is created for it in the virtual address space
    1. The process calls the library function mmap in user space. Prototype: void *mmap(void *start, size_t length, int prot, int flags, int fd, off_t offset);
    2. The kernel searches the virtual address space of the current process for a free, contiguous range of virtual addresses that satisfies the request.
    3. A vm_area_struct structure is allocated for this virtual area, and each of its fields is initialized.
    4. The new vm_area_struct is inserted into the process's list or tree of virtual memory areas.
(ii) The kernel-space mmap function (distinct from the user-space function) is called to establish the one-to-one mapping between the file's physical addresses and the process's virtual addresses
    1. After the new virtual area has been allocated, the kernel takes the file descriptor being mapped, looks it up in the file descriptor table, and follows it to the file structure (struct file) of that file in the kernel's set of open files. Each file structure holds the information associated with one open file.
    2. Through the file structure, the kernel reaches the file_operations table and calls the kernel function mmap, whose prototype is int mmap(struct file *filp, struct vm_area_struct *vma), which is different from the user-space library function.
    3. The kernel mmap function locates the file's physical address on disk through the inode of the virtual file system.
    4. The remap_pfn_range function builds the page table entries that tie the file addresses to the virtual address area. At this point, the virtual addresses do not yet have any data in main memory associated with them.
(iii) The process accesses the mapped area, which triggers a page fault and causes the file contents to be copied into physical memory (main memory)

Note: The first two stages only create the virtual area and complete the address mapping. They do not copy any file data into main memory. The file is actually read only when the process performs a read or write on the mapping.

    1. The process's read or write touches an address in the mapped range. The page table lookup finds that this address is not backed by a physical page: only the address mapping has been established so far, and the data on disk has not yet been copied into memory, so a page fault is raised.
    2. The page fault handler performs a series of checks and, once it has determined that the access is legal, the kernel starts the demand-paging process.
    3. The paging process first looks for the required page in the swap cache. If it is not there, it calls the nopage function to load the missing page from disk into main memory.
    4. The process can then read or write that main memory. If a write changes the contents, the system automatically writes the dirty page back to the corresponding disk address after some time, which completes the write to the file.

Note: Modified dirty pages are not written back to the file immediately; there is a delay. You can call msync() to force synchronization so that the changes are saved to the file right away.
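
Stage (iii) can be observed from user space. The following is a minimal sketch, assuming Linux with glibc, that uses mincore() to ask whether a page is resident before and after the first access. It uses an anonymous mapping rather than a file so that the result does not depend on what is already cached; the function names are hypothetical and not part of the original article.

```c
#define _DEFAULT_SOURCE   /* assumption: glibc; exposes mincore and MAP_ANONYMOUS */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Returns 1 if the page starting at addr is resident in memory, 0 if it is
 * not, -1 on error. addr must be page-aligned, as mincore requires. */
static int page_is_resident(void *addr, size_t page)
{
    unsigned char vec = 0;
    if (mincore(addr, page, &vec) != 0)
        return -1;
    return vec & 1;
}

/* Hypothetical demo: records the residency of one mapped page before and
 * after the first write to it. Returns 0 on success, -1 on error. */
int demand_paging_demo(int *before, int *after)
{
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0);

    /* Stages (i) and (ii): only the virtual area is created here. */
    char *p = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    *before = page_is_resident(p, (size_t)page);  /* typically 0 on Linux */
    p[0] = 1;                                     /* stage (iii): page fault */
    *after = page_is_resident(p, (size_t)page);   /* the page is now backed */

    munmap(p, (size_t)page);
    return (*before < 0 || *after < 0) ? -1 : 0;
}
```

The value reported before the first access depends on the kernel; only the value after the write is something to rely on.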

The difference between mmap and regular file operations

If you are not familiar with Linux file systems, please refer to my previous blog post, "read the file reading and writing process from the kernel file system". We start with a brief review of what happens during a regular file operation (a call to read, fread, and so on):

    1. The process issues a request to read the file.
    2. The kernel looks up the process's file descriptor table, finds the file's entry in the kernel's set of open files, and from it locates the file's inode.
    3. Through the inode's address_space, the kernel checks whether the requested file page is already in the page cache. If it is, the contents of that page are returned directly.
    4. If it is not, the kernel uses the inode to locate the file's disk address and copies the data from disk into the page cache. The read-page process is then started again, and the data in the page cache is delivered to the user process.

In summary, regular file operations use the page cache to improve read/write efficiency and to protect the disk. As a result, reading a file first brings the file page from disk into the page cache. Because the page cache lives in kernel space and cannot be addressed directly by a user process, the page must then be copied from the page cache into the process's user-space memory. The data is therefore copied twice before the process obtains the file contents. Writes work the same way: the kernel cannot use the user-space buffer directly, so the data must first be copied into the corresponding kernel-space memory and then written back to disk (deferred write-back), which again takes two copies.

When a file is accessed through mmap instead, the two steps of creating the new virtual memory area and establishing the mapping between the file's disk addresses and that area involve no copying of file data at all. When the process later accesses the data and finds that it is not in memory, a page fault is raised and, through the mapping already established, the data is copied only once: from disk into memory that the process can use directly in user space.

In summary, a regular file operation needs two copies: from disk to the page cache, and from the page cache to user memory. With mmap, only one copy is needed, from disk into memory that is mapped into user space. In short, the key point of mmap is that user space and kernel space work on the same data directly, which eliminates the tedious copying of data between the two spaces. This is why mmap is more efficient.
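
The difference between the two access paths can be sketched as follows. This is an illustration under assumptions, not a benchmark: the function names are hypothetical, and the self-check uses a three-byte temporary file under /tmp. With read(), the kernel copies each chunk into buf; with mmap, the loop reads the mapped pages directly.

```c
#define _DEFAULT_SOURCE   /* assumption: glibc; exposes mkstemp */
#include <assert.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Sums every byte of the file using read(): each chunk is copied by the
 * kernel into buf before the process can look at it. */
long sum_with_read(int fd)
{
    assert(fd >= 0);
    unsigned char buf[4096];
    long sum = 0;
    ssize_t n;

    if (lseek(fd, 0, SEEK_SET) < 0)
        return -1;
    while ((n = read(fd, buf, sizeof buf)) > 0)
        for (ssize_t i = 0; i < n; i++)
            sum += buf[i];
    return n < 0 ? -1 : sum;
}

/* Sums every byte of the file through a mapping: no user-space buffer and
 * no read() call, only pointer accesses. */
long sum_with_mmap(int fd)
{
    assert(fd >= 0);
    struct stat st;
    if (fstat(fd, &st) < 0)
        return -1;
    if (st.st_size == 0)
        return 0;                      /* mmap rejects a zero length */

    size_t len = (size_t)st.st_size;
    const unsigned char *p = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
    if (p == MAP_FAILED)
        return -1;

    long sum = 0;
    for (size_t i = 0; i < len; i++)
        sum += p[i];
    munmap((void *)p, len);
    return sum;
}

/* Hypothetical self-check: both paths must agree on a small temp file. */
int compare_read_and_mmap(void)
{
    char path[] = "/tmp/mmap_sum_XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);                      /* the file lives until fd is closed */

    int ok = write(fd, "abc", 3) == 3
          && sum_with_read(fd) == 'a' + 'b' + 'c'
          && sum_with_mmap(fd) == 'a' + 'b' + 'c';
    close(fd);
    return ok ? 0 : -1;
}
```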

mmap benefits summary

From the discussion above, the advantages of mmap can be summarized as follows:

    1. Reading the file avoids the extra copy out of the page cache, which reduces the number of data copies, and replaces I/O calls with ordinary memory reads and writes, which improves file read efficiency.

    2. It provides an efficient way for user space and kernel space to interact. Modifications made on either side are reflected directly in the mapped area and are therefore seen promptly by the other side.

    3. It provides a way for processes to share memory and communicate with each other. Whether they are parent and child or unrelated processes, they can map the same file, or the same anonymous region, into their own user space. Each process can then communicate and share data by modifying the mapped area.

In addition, if process A and process B both map area C, then the first time A reads C, a page is copied from disk into memory through the page fault. When B later reads the same page of C, it also takes a page fault, but nothing needs to be copied from disk again: the file data already held in memory is used directly.

    4. It can be used for efficient transfer of large volumes of data. Insufficient memory is one of the constraints on large-data operations, and the usual solution is to use disk space to make up for the shortage. That, however, causes a great deal of file I/O, which badly hurts efficiency. mmap handles this problem well. In other words, mmap lets disk space stand in for memory.
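
The sharing described in point 3 can be sketched for the parent-child case. This minimal example assumes Linux or another system that supports MAP_ANONYMOUS; the function name is hypothetical. The child writes into a shared anonymous mapping and the parent reads the value back after the child exits.

```c
#define _DEFAULT_SOURCE   /* assumption: glibc; exposes MAP_ANONYMOUS */
#include <assert.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns the value the parent reads after the child has written `value`
 * into a shared anonymous mapping, or -1 on error. */
int share_int_with_child(int value)
{
    assert(value >= 0);                /* -1 is reserved for errors */

    int *shared = mmap(NULL, sizeof *shared, PROT_READ | PROT_WRITE,
                       MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (shared == MAP_FAILED)
        return -1;
    *shared = 0;

    pid_t pid = fork();
    if (pid < 0) {
        munmap(shared, sizeof *shared);
        return -1;
    }
    if (pid == 0) {                    /* child: write and exit */
        *shared = value;
        _exit(0);
    }

    int status;
    int result = -1;
    /* waitpid also orders the child's write before the parent's read. */
    if (waitpid(pid, &status, 0) == pid && WIFEXITED(status))
        result = *shared;
    munmap(shared, sizeof *shared);
    return result;
}
```

With MAP_PRIVATE instead of MAP_SHARED, the child's write would go to its own copy of the page and the parent would still read 0.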

mmap related functions

Function prototype

void *mmap(void *start, size_t length, int prot, int flags, int fd, off_t offset);

Return value

On success, mmap() returns a pointer to the mapped area. On failure, mmap() returns MAP_FAILED (whose value is (void *)-1), and errno is set to one of the following values:

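EACCES: access error
EAGAIN: the file has been locked, or too much memory has been locked
EBADF: fd is not a valid file descriptor
EINVAL: one or more arguments are invalid
ENFILE: the system-wide limit on open files has been reached
ENODEV: the file system that holds the specified file does not support memory mapping
ENOMEM: not enough memory, or the process has exceeded its maximum number of mappings
EPERM: insufficient privileges, operation not permitted
ETXTBSY: the file is open for writing and the MAP_DENYWRITE flag was specified

In addition, the following signals (not errno values) may be delivered when the mapped area is accessed:

SIGSEGV: an attempt was made to write to a read-only area
SIGBUS: an attempt was made to access a memory area that does not belong to the process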
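
A minimal sketch of checking the return value follows. The wrapper name map_readonly is hypothetical. The point it shows is that failure must be detected by comparing against MAP_FAILED, not NULL, and that errno then carries one of the codes listed above.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

/* Maps len bytes of fd read-only. Returns 0 and stores the address in *out
 * on success; on failure returns the errno value set by mmap. */
int map_readonly(int fd, size_t len, void **out)
{
    assert(out != NULL);

    void *p = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
    if (p == MAP_FAILED)               /* compare with MAP_FAILED, not NULL */
        return errno;

    *out = p;
    return 0;
}
```

For example, calling map_readonly(-1, 4096, &p) fails with EBADF, because -1 is not a valid file descriptor and the mapping is not anonymous.
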
Parameters

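start: The starting address of the mapped area. It is usually NULL, which lets the kernel choose the address.

length: The length of the mapped area, in bytes

prot: The desired memory protection. It must not conflict with the mode in which the file was opened. It is either PROT_NONE or the bitwise OR of one or more of the other values below:

PROT_EXEC: pages may be executed
PROT_READ: pages may be read
PROT_WRITE: pages may be written
PROT_NONE: pages may not be accessed

flags: Specifies the type of the mapped object, the mapping options, and whether the mapped pages can be shared. Its value is a combination of one or more of the following bits:

MAP_FIXED: Use exactly the specified start address. If the memory area given by start and len overlaps existing mappings, the overlapped part is discarded. If the specified address cannot be used, the operation fails. The start address must fall on a page boundary.
MAP_SHARED: Share the mapping with all other processes that map this object. Writing to the shared area is equivalent to writing to the file. The file may not actually be updated until msync() or munmap() is called.
MAP_PRIVATE: Create a private copy-on-write mapping. Writes to the area do not affect the original file. This flag and MAP_SHARED are mutually exclusive; only one of them may be used.
MAP_DENYWRITE: This flag is ignored.
MAP_EXECUTABLE: This flag is ignored as well.
MAP_NORESERVE: Do not reserve swap space for this mapping. When swap space is reserved, modifications to the mapped area are guaranteed to be possible. When it is not reserved and memory runs short, modifying the mapped area can raise a segmentation violation signal.
MAP_LOCKED: Lock the pages of the mapped area so that they are not swapped out of memory.
MAP_GROWSDOWN: Used for stacks. Tells the kernel VM system that the mapped area can grow downward.
MAP_ANONYMOUS: Anonymous mapping. The mapped area is not associated with any file.
MAP_ANON: Alias of MAP_ANONYMOUS, no longer used.
MAP_FILE: Compatibility flag, ignored.
MAP_32BIT: Place the mapped area in the low 2 GB of the process address space. Ignored when MAP_FIXED is specified. This flag is currently supported only on x86-64.
MAP_POPULATE: Prepare the page tables for a file mapping by reading ahead. Later accesses to the mapped area will not be blocked by page faults.
MAP_NONBLOCK: Meaningful only together with MAP_POPULATE. Do not read ahead; only create page table entries for pages that are already in memory.

fd: A valid file descriptor. If MAP_ANONYMOUS is set, the value should be -1 for compatibility.

offset: The offset within the mapped object at which the mapping starts. It must be a multiple of the page size.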
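
Because offset has to be page-aligned, mapping a range that starts at an arbitrary file position takes a small adjustment. The sketch below shows one way to do it, assuming Linux with glibc; struct file_view and the view_* functions are hypothetical names introduced for illustration. The offset is rounded down to a page boundary and the difference is added back to the returned pointer.

```c
#define _DEFAULT_SOURCE   /* assumption: glibc; exposes mkstemp and pwrite */
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

struct file_view {
    void  *base;     /* page-aligned address returned by mmap */
    size_t map_len;  /* length passed to mmap, needed again for munmap */
    char  *data;     /* first byte the caller asked for */
};

/* Maps len bytes of fd starting at an arbitrary offset. Returns 0 or -1. */
int view_open(struct file_view *v, int fd, off_t offset, size_t len)
{
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0 && offset >= 0);

    off_t aligned = offset - (offset % page);   /* round down to a page */
    size_t slack = (size_t)(offset - aligned);

    void *p = mmap(NULL, len + slack, PROT_READ, MAP_PRIVATE, fd, aligned);
    if (p == MAP_FAILED)
        return -1;

    v->base = p;
    v->map_len = len + slack;
    v->data = (char *)p + slack;
    return 0;
}

void view_close(struct file_view *v)
{
    munmap(v->base, v->map_len);
}

/* Hypothetical self-check: writes "hello" at file offset 5000 and reads it
 * back through a view that starts at that unaligned offset. */
int view_demo(void)
{
    char path[] = "/tmp/mmap_view_XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);

    struct file_view v;
    int ok = pwrite(fd, "hello", 5, 5000) == 5
          && view_open(&v, fd, 5000, 5) == 0;
    if (ok) {
        ok = memcmp(v.data, "hello", 5) == 0;
        view_close(&v);
    }
    close(fd);
    return ok ? 0 : -1;
}
```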

Related functions

int munmap(void *addr, size_t len);

On success, munmap() returns 0. On failure, munmap() returns -1 and errno is set, as with mmap().

The call removes a mapping from the process's address space. addr is the address returned by mmap(), and len is the size of the mapped area.

Once the mapping has been removed, accessing the formerly mapped addresses causes a segmentation fault.

int msync(void *addr, size_t len, int flags);

In general, changes that a process makes to shared content in the mapped area are not written straight back to the disk file. The write-back often happens only after munmap() is called.

You can call msync() to make the contents of the file on disk match the contents of the shared memory area.
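
A minimal sketch of this pattern follows, assuming Linux with glibc. The function names are hypothetical. A file is modified through a MAP_SHARED mapping, msync() with MS_SYNC is called to wait for the write-back, and the mapping is removed with munmap().

```c
#define _DEFAULT_SOURCE   /* assumption: glibc; exposes mkstemp and pread */
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Overwrites the first len bytes of the file behind fd through a shared
 * mapping and waits for the write-back. The file must already be at least
 * len bytes long. Returns 0 on success, -1 on error. */
int store_and_sync(int fd, const char *text, size_t len)
{
    assert(len > 0);                   /* mmap rejects a zero length */

    char *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED)
        return -1;

    memcpy(p, text, len);              /* dirties the page in memory */
    int rc = msync(p, len, MS_SYNC);   /* returns once write-back is done */
    munmap(p, len);
    return rc;
}

/* Hypothetical self-check: the change made through the mapping must be
 * visible when the file is read back with pread(). */
int store_and_sync_demo(void)
{
    char path[] = "/tmp/mmap_sync_XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);

    char readback[5] = {0};
    int ok = write(fd, "hello", 5) == 5
          && store_and_sync(fd, "HELLO", 5) == 0
          && pread(fd, readback, 5, 0) == 5
          && memcmp(readback, "HELLO", 5) == 0;
    close(fd);
    return ok ? 0 : -1;
}
```

The read-back only confirms that the file contents match the mapping. It cannot prove that the data has reached the physical disk; that guarantee comes from msync() returning 0.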

mmap usage details
    1. A key point when using mmap is that mappings are made in whole pages: the size of the mapped area is always an integer multiple of the physical page size (page_size, typically 4 KB on 32-bit systems), and a requested length that is not a multiple is rounded up to one. The reason is that the page is the smallest unit of memory management, and both the process's virtual address space and memory mappings are handled in units of pages. To match this, the mapping from disk into the virtual address space must also be done in pages.
    2. The kernel tracks the size of the underlying object (the file) behind the mapping, and the process may legally access those bytes that lie both within the current file size and within the mapped area. In other words, if the file keeps growing, the process can legally access any data that falls within the mapped range, regardless of how large the file was when the mapping was created. See "Scenario three" for details.
    3. Once the mapping is established, it remains in place even if the file is closed, because what is mapped is the disk address, not the file itself, and the mapping does not depend on the file handle. Also, the valid address space that can be used for interprocess communication is not strictly limited by the size of the mapped file, because mapping is done page by page.
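
Points 1 and 3 can be sketched in a few lines. This is an illustration assuming Linux with glibc, with hypothetical function names: round_up_to_page computes the size the kernel actually maps for a given length, and read_after_close reads through a mapping after the file descriptor has been closed.

```c
#define _DEFAULT_SOURCE   /* assumption: glibc; exposes mkstemp */
#include <assert.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Rounds len up to the next multiple of the system page size. */
size_t round_up_to_page(size_t len)
{
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0);
    size_t ps = (size_t)page;
    return (len + ps - 1) / ps * ps;
}

/* Returns the first byte of a temp file, read through a mapping after the
 * descriptor has been closed, or -1 on error. */
int read_after_close(void)
{
    char path[] = "/tmp/mmap_close_XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);
    if (write(fd, "Z", 1) != 1) {
        close(fd);
        return -1;
    }

    char *p = mmap(NULL, 1, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);                         /* the mapping does not need fd */
    if (p == MAP_FAILED)
        return -1;

    int c = p[0];                      /* still valid after close() */
    munmap(p, 1);
    return c;
}
```

With 4096-byte pages, round_up_to_page(5000) gives 8192, which is the figure used in the scenarios of this section.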

With this background, let's look at what happens in specific cases where the size is not an integer multiple of the page size:

Scenario one: A file is 5000 bytes long, and mmap maps 5000 bytes into virtual memory starting from the beginning of the file.

Analysis: Because a physical page is 4096 bytes, the virtual address area of the process must cover whole pages even though the mapped file is only 5000 bytes. After mmap executes, 8192 bytes are actually mapped into the virtual memory area, and bytes 5000~8191 are filled with zeros. The resulting correspondence is shown in the figure:

In this situation:

    1. Reading or writing the first 5000 bytes (0~4999) accesses the contents of the file.
    2. Reading bytes 5000~8191 returns all zeros. Writing to bytes 5000~8191 does not cause an error, but what is written is not saved to the original file.
    3. Reading or writing beyond byte 8191 raises a SIGSEGV error.
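
Points 1 and 2 of scenario one can be checked with a short program. The sketch below assumes Linux with glibc and a writable /tmp; the function name is hypothetical. It fills a 5000-byte file with 'x', maps 5000 bytes, and verifies that the rest of the last page reads as zeros. The page size is queried at run time, so the 8192 of the text applies when pages are 4096 bytes.

```c
#define _DEFAULT_SOURCE   /* assumption: glibc; exposes mkstemp */
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Returns 1 if the file bytes read back as written and the tail of the
 * last page is zero-filled, 0 if not, -1 on a setup error. */
int tail_is_zero_filled(void)
{
    enum { FILE_SIZE = 5000 };
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0);
    size_t ps = (size_t)page;
    size_t mapped = (FILE_SIZE + ps - 1) / ps * ps;  /* 8192 with 4 KiB pages */

    char path[] = "/tmp/mmap_tail_XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);

    char buf[FILE_SIZE];
    memset(buf, 'x', sizeof buf);
    if (write(fd, buf, sizeof buf) != (ssize_t)sizeof buf) {
        close(fd);
        return -1;
    }

    unsigned char *p = mmap(NULL, FILE_SIZE, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);
    if (p == MAP_FAILED)
        return -1;

    /* Bytes 0~4999 come from the file. */
    int ok = p[0] == 'x' && p[FILE_SIZE - 1] == 'x';
    /* Bytes 5000 up to the end of the last page are readable and zero. */
    for (size_t i = FILE_SIZE; ok && i < mapped; i++)
        ok = (p[i] == 0);

    munmap(p, FILE_SIZE);
    return ok;
}
```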

Scenario two: A file is 5000 bytes long, and mmap maps 15000 bytes into virtual memory starting from the beginning of the file, that is, the mapping is larger than the original file.

Analysis: Because the file is 5000 bytes long, it occupies two physical pages, as in scenario one. Both pages can be read and written, but anything beyond byte 5000 is not reflected in the original file. The program asked for a 15000-byte mapping, yet the file only covers two physical pages, so bytes 8192~14999 cannot be read or written, and accessing them raises an exception. As shown in the figure:

In this situation:

    1. The process can read and write the first 5000 mapped bytes (0~4999) normally, and changes made by writes are reflected in the original file after some time.
    2. For bytes 5000~8191, the process can read and write without error. However, the content is 0 before anything is written, and what is written is not reflected in the file.
    3. For bytes 8192~14999, the process cannot read or write, and the access raises a SIGBUS error.
    4. For bytes beyond 15000, the process cannot read or write, and the access raises a SIGSEGV error. (Strictly, because the mapping is rounded up to whole pages, this boundary falls at the end of the last mapped page.)
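
The SIGBUS case can be observed without crashing the process by catching the signal. The sketch below is one possible way to do that, assuming Linux with glibc; all function names are hypothetical. It installs temporary handlers, probes one byte at a time, and uses sigsetjmp/siglongjmp to return from a faulting access. To stay independent of the page size, it maps the pages that cover a 5000-byte file plus one extra page instead of exactly 15000 bytes.

```c
#define _DEFAULT_SOURCE   /* assumption: glibc; exposes mkstemp, sigsetjmp */
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf probe_env;
static volatile sig_atomic_t probe_signal;

static void on_fault(int sig)
{
    probe_signal = sig;
    siglongjmp(probe_env, 1);          /* abandon the faulting access */
}

/* Reads one byte at addr. Returns 0 if the read succeeded, otherwise the
 * number of the signal (SIGBUS or SIGSEGV) that the access raised. */
static int probe_read(const volatile unsigned char *addr)
{
    struct sigaction sa, old_bus, old_segv;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_fault;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGBUS, &sa, &old_bus);
    sigaction(SIGSEGV, &sa, &old_segv);

    probe_signal = 0;
    if (sigsetjmp(probe_env, 1) == 0)  /* 1: restore signal mask on jump */
        (void)*addr;                   /* may fault and jump back */

    sigaction(SIGBUS, &old_bus, NULL);
    sigaction(SIGSEGV, &old_segv, NULL);
    return probe_signal;
}

/* Hypothetical demo: maps one page more than a 5000-byte file covers and
 * probes three addresses. Returns 0 on success, -1 on a setup error. */
int probe_oversized_mapping(int *in_file, int *in_tail, int *past_file)
{
    const size_t file_size = 5000;
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0);
    size_t ps = (size_t)page;
    size_t file_pages = (file_size + ps - 1) / ps * ps;  /* 8192 with 4 KiB */
    size_t map_len = file_pages + ps;

    char path[] = "/tmp/mmap_probe_XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);
    if (ftruncate(fd, (off_t)file_size) != 0) {
        close(fd);
        return -1;
    }

    unsigned char *p = mmap(NULL, map_len, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);
    if (p == MAP_FAILED)
        return -1;

    *in_file   = probe_read(p);               /* byte 0: inside the file */
    *in_tail   = probe_read(p + file_size);   /* zero-filled tail */
    *past_file = probe_read(p + file_pages);  /* page with no file behind it */

    munmap(p, map_len);
    return 0;
}
```

Jumping out of a signal handler is only appropriate for a probe like this one. Real programs normally avoid the fault by keeping accesses within the file size.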

Scenario three: A file has an initial size of 0, and mmap is used to map 1000 * 4 KB, that is, 1000 physical pages or about 4 MB of space. mmap returns the pointer ptr.

Analysis: If the mapping is read or written right after it is created, the file size is 0 and there is no valid physical page behind it, so a SIGBUS error is raised, just as in scenario two.

However, if the file size is increased before each read or write through ptr, then operations on ptr within the file size are legal. For example, if the file grows by 4096 bytes, ptr can operate on the range ptr ~ [(char *)ptr + 4095]. As long as the file grows within the 1000 physical pages (the mapped range), ptr can operate on a range of the same size.

This makes it convenient to extend the file at any time and write to it as needed, without wasting space.
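
Scenario three can be sketched as follows, assuming Linux with glibc and a writable /tmp. The function name is hypothetical, and ftruncate() is used here as one way to grow the file; the original text does not name a specific call. The mapping is created while the file is still empty, the file is then extended by one page, and only after that is the mapping written to.

```c
#define _DEFAULT_SOURCE   /* assumption: glibc; exposes mkstemp and pread */
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Returns 0 if data written through the mapping after growing the file
 * can be read back from the file, -1 otherwise. */
int grow_then_write(void)
{
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0);
    size_t map_len = 1000 * (size_t)page;

    char path[] = "/tmp/mmap_grow_XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);

    /* The file is empty: the mapping succeeds, but touching ptr at this
     * point would raise SIGBUS. */
    char *ptr = mmap(NULL, map_len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (ptr == MAP_FAILED) {
        close(fd);
        return -1;
    }

    int rc = -1;
    char readback[4] = {0};
    if (ftruncate(fd, (off_t)page) == 0) {     /* grow the file by one page */
        memcpy(ptr, "data", 4);                /* now inside the file size */
        if (msync(ptr, (size_t)page, MS_SYNC) == 0
            && pread(fd, readback, sizeof readback, 0) == (ssize_t)sizeof readback
            && memcmp(readback, "data", sizeof readback) == 0)
            rc = 0;
    }

    munmap(ptr, map_len);
    close(fd);
    return rc;
}
```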
