Linux mmap File Memory Mapping Mechanism

Source: Internet
Author: User

Any discussion of file mapping inevitably involves virtual memory (SVR4 VM). In fact, file mapping is the central concept of virtual memory. On the one hand, file mapping gives users a convenient facility: once a file is mapped into part of the process's own address space, it can be read and written with ordinary memory access instructions. On the other hand, it also serves as the kernel's basic organizing principle. The kernel regards the whole address space as a collection of mappings of different objects, such as files.

In the traditional file access method, a file is first opened with the open system call. It is then accessed sequentially or randomly with read, write, lseek, and similar calls. This method is inefficient because every I/O operation requires a system call. Moreover, if several processes access the same file, each process keeps its own copy of the data in its own address space, wasting memory. Suppose instead that the file's pages are mapped into the process's address space by some mechanism. Then creating the mapping only requires setting up a few memory-management data structures. When the process first accesses a page, a page fault occurs, and the kernel reads the page into memory and updates the page table to point to it. This approach also makes it easy for several processes to share a single copy of the same page.
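To make the contrast with read()-based access concrete, here is a minimal sketch. It assumes a Linux/POSIX system, and `count_byte_mmap` is a hypothetical helper, not part of any library. It maps a whole file read-only and scans it with plain memory loads. No read() call is made; pages are brought in by page faults on first access.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: count occurrences of byte c in the file at path.
 * The file is mapped once; after that every access is an ordinary memory
 * read, and pages are brought in by page faults instead of read() calls.
 * Returns -1 on error. */
long count_byte_mmap(const char *path, char c)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }
    if (st.st_size == 0) {          /* mmap() rejects a zero length */
        close(fd);
        return 0;
    }

    size_t len = (size_t)st.st_size;
    const char *p = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                      /* the mapping stays valid after close() */
    if (p == MAP_FAILED)
        return -1;

    long n = 0;
    for (size_t i = 0; i < len; i++)   /* first touch of each page faults it in */
        if (p[i] == c)
            n++;

    munmap((void *)p, len);
    return n;
}
```

For example, for a file containing `hello world`, `count_byte_mmap(path, 'l')` returns 3.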

VM is designed in an object-oriented way, and the objects here are memory objects. A memory object is a software abstraction that describes the mapping between a memory region and its backing store. The system can use several kinds of backing store, such as swap space, local or remote files, and the frame buffer. The VM system handles them uniformly with the same set of operations, such as reading a page in or writing a page back, while each kind of backing store implements these operations in its own way. In other words, the system defines one unified interface, and each backing store provides its own implementation. Seen this way, the address space of a process is a set of mappings onto different data objects. The valid addresses are exactly those mapped to some data object. These objects provide persistent backing store for the pages mapped onto them, and the mapping lets the process address the objects directly.
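The "one interface, many backing stores" idea can be sketched in C with a table of function pointers. This is a toy model under stated assumptions: `struct mem_object`, `struct backing_ops`, `page_in`, `page_out`, and the fixed `PAGE_SZ` are hypothetical names for illustration, not kernel structures. The generic code only calls through the ops table. A file-backed object copies pages to and from its contents, while a swap-backed (anonymous) object reads as zeros until a page has been written out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PAGE_SZ 4096  /* page size assumed by this model */

struct mem_object;

/* The unified interface every kind of backing store implements. */
struct backing_ops {
    int (*read_page)(struct mem_object *obj, size_t pgoff, unsigned char *frame);
    int (*write_page)(struct mem_object *obj, size_t pgoff, const unsigned char *frame);
};

struct mem_object {
    const struct backing_ops *ops;  /* which backing store this object uses */
    unsigned char *store;           /* npages * PAGE_SZ bytes of backing storage */
    unsigned char *present;         /* swap only: 1 if the page was written out */
    size_t npages;
};

/* File-backed object: pages are copied to and from the file contents. */
static int file_read(struct mem_object *o, size_t pg, unsigned char *frame)
{
    if (pg >= o->npages)
        return -1;
    memcpy(frame, o->store + pg * PAGE_SZ, PAGE_SZ);
    return 0;
}

static int file_write(struct mem_object *o, size_t pg, const unsigned char *frame)
{
    if (pg >= o->npages)
        return -1;
    memcpy(o->store + pg * PAGE_SZ, frame, PAGE_SZ);
    return 0;
}

/* Swap-backed (anonymous) object: a page never written out reads as zeros. */
static int swap_read(struct mem_object *o, size_t pg, unsigned char *frame)
{
    if (pg >= o->npages)
        return -1;
    if (!o->present[pg])
        memset(frame, 0, PAGE_SZ);
    else
        memcpy(frame, o->store + pg * PAGE_SZ, PAGE_SZ);
    return 0;
}

static int swap_write(struct mem_object *o, size_t pg, const unsigned char *frame)
{
    if (pg >= o->npages)
        return -1;
    memcpy(o->store + pg * PAGE_SZ, frame, PAGE_SZ);
    o->present[pg] = 1;
    return 0;
}

const struct backing_ops file_backing = { file_read, file_write };
const struct backing_ops swap_backing = { swap_read, swap_write };

/* Generic VM code: it never needs to know which backing store is in use. */
int page_in(struct mem_object *o, size_t pg, unsigned char *frame)
{
    return o->ops->read_page(o, pg, frame);
}

int page_out(struct mem_object *o, size_t pg, const unsigned char *frame)
{
    return o->ops->write_page(o, pg, frame);
}
```

Adding a new kind of backing store in this model means writing two functions and one ops table; `page_in` and `page_out` stay unchanged.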

Note that the VM architecture is independent of Unix itself. All Unix semantics, such as the text, data, and stack segments, can be built on top of the basic VM system. The VM architecture is also independent of memory management policy, which is left to the operating system. Several choices are irrelevant to the concept of memory objects: which swapping and demand-paging algorithms are used, whether memory is managed by segmentation or paging, and how virtual addresses are translated to physical addresses (Linux uses a three-level page table).

The following describes the implementation of VM in Linux.

Each process has an mm_struct (memory management structure), an abstract description of the process's virtual address space. It holds management information about that space, such as start_code, end_code, start_data, end_data, start_brk, and brk. It also contains a pointer to the process's list of virtual memory areas (struct vm_area_struct), which is kept sorted by virtual address.

In Linux, a process's address space is divided into several areas (vmas). Each vma corresponds to a contiguous range of the virtual address space and is an independent unit that can be shared and protected; the vma is the memory object described above.

The structure of vm_area_struct is shown below. Its first half contains public members that do not depend on the vma type, such as the pointer to the owning mm_struct and the address range. Its second half contains type-specific members. The most important of these is vm_ops, which points to a vm_operations_struct vector table. This table is a set of virtual functions that defines an interface independent of the vma type, and each concrete subclass (that is, each vma type) implements the operations in it. They include open, close, unmap, protect, sync, nopage, wppage, and swapout.

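struct vm_area_struct {
    /* Public members, independent of the vma type */
    struct mm_struct * vm_mm;                /* address space this area belongs to */
    unsigned long vm_start;                  /* first address of the area */
    unsigned long vm_end;                    /* first address after the area */
    struct vm_area_struct * vm_next;         /* next area, sorted by address */
    pgprot_t vm_page_prot;                   /* access permissions for the pages */
    unsigned long vm_flags;                  /* VM_READ, VM_WRITE, VM_SHARED, VM_LOCKED, ... */
    short vm_avl_height;                     /* AVL tree used to speed up lookups */
    struct vm_area_struct * vm_avl_left;
    struct vm_area_struct * vm_avl_right;
    struct vm_area_struct * vm_next_share;   /* list of areas mapping the same file */
    struct vm_area_struct ** vm_pprev_share;

    /* Type-specific members */
    struct vm_operations_struct * vm_ops;    /* operations vector table */
    unsigned long vm_pgoff;                  /* offset in the mapped file, in pages */
    struct file * vm_file;                   /* mapped file, NULL for anonymous areas */
    unsigned long vm_raend;                  /* read-ahead bookkeeping */
    void * vm_private_data;                  /* private data for the area's owner */
};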

vm_ops: open, close, nopage, swapin, swapout, ...
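The following sketch shows how an address-sorted vma list and the vm_ops table work together when a fault occurs. Everything in it is a simplified, hypothetical stand-in: `struct vma`, `struct vma_ops`, `handle_fault`, and the `faults` counter are illustrative names, not the kernel's definitions. The lookup follows the convention of the kernel's find_vma(), which returns the first area whose end lies above the address. The caller must therefore still check that the address is not in a hole below that area.

```c
#include <assert.h>
#include <stddef.h>

struct vma;

/* Hypothetical, cut-down operations table: only the fault handler. */
struct vma_ops {
    int (*nopage)(struct vma *area, unsigned long addr);
};

/* Hypothetical, cut-down vma: only what the lookup needs. */
struct vma {
    unsigned long vm_start;        /* first address of the area */
    unsigned long vm_end;          /* first address past the area */
    struct vma *vm_next;           /* next area, sorted by address */
    const struct vma_ops *vm_ops;  /* NULL for anonymous areas */
    int faults;                    /* counter used by the demo handler */
};

/* Like find_vma(): first area with vm_end > addr, or NULL. */
struct vma *find_vma(struct vma *list, unsigned long addr)
{
    for (struct vma *v = list; v != NULL; v = v->vm_next)
        if (addr < v->vm_end)
            return v;
    return NULL;
}

/* Returns -1 if no area covers addr (the kernel would send SIGSEGV). */
int handle_fault(struct vma *list, unsigned long addr)
{
    struct vma *v = find_vma(list, addr);
    if (v == NULL || addr < v->vm_start)
        return -1;
    if (v->vm_ops == NULL || v->vm_ops->nopage == NULL)
        return 0;  /* anonymous area: the kernel would map a zero-filled page */
    return v->vm_ops->nopage(v, addr);  /* type-specific handling */
}

/* Demo nopage handler that just counts how often it is called. */
static int count_nopage(struct vma *area, unsigned long addr)
{
    (void)addr;
    area->faults++;
    return 0;
}

static const struct vma_ops count_ops = { count_nopage };
```

A linear walk keeps the sketch short. The vm_avl_* fields in the real structure exist precisely to avoid this linear scan when a process has many areas.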

Having introduced the basic concepts of VM, we can now look at the mmap and munmap system calls. An mmap call essentially creates a new memory object, that is, a vma. Its prototype is:

void *mmap(void *start, size_t length, int prot, int flags, int fd, off_t offset);

Here, start is the requested starting address of the mapping and length is its length. Unless MAP_FIXED is set in flags, start is normally treated only as a hint. The kernel instead picks a free region of sufficient length in the process address space. fd is the file descriptor of the file to be mapped, and offset is the position in that file where the mapping begins (it must be a multiple of the page size). prot specifies the protection of the mapping and can be PROT_NONE or a bitwise OR of PROT_EXEC, PROT_READ, and PROT_WRITE. flags specifies the mapping type and can include MAP_FIXED, MAP_PRIVATE, and MAP_SHARED. Exactly one of MAP_PRIVATE and MAP_SHARED must be specified.

MAP_PRIVATE creates a copy-on-write mapping. If several processes map the same file, they share the same physical pages only as long as nobody writes to them. When a process tries to modify a page, it gets its own private copy. Any modification made to that copy is invisible to other processes and is not written back to the file. With MAP_SHARED, all processes use the same copy whether or not it is modified, so any modification made to the page by one process is visible to the others.
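The difference between the two flags can be observed within a single process by mapping the same file twice. This is a minimal sketch assuming Linux, where the page cache is unified, so pread() sees writes made through a MAP_SHARED mapping even before msync. `compare_mappings` is a hypothetical helper. A write through the private mapping stays in that process's copy, while a write through the shared mapping reaches the file.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical demo: map the same file twice, once MAP_SHARED and once
 * MAP_PRIVATE, and write one byte through each. fd must be open O_RDWR on a
 * file at least 2 bytes long. On return, *shared_sees is byte 0 as seen
 * through the shared mapping, and file_bytes holds bytes 0..1 read back from
 * the file with pread(). Returns 0 on success, -1 on error. */
int compare_mappings(int fd, char *shared_sees, char file_bytes[2])
{
    const size_t len = 2;
    char *shared = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (shared == MAP_FAILED)
        return -1;
    char *priv = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
    if (priv == MAP_FAILED) {
        munmap(shared, len);
        return -1;
    }

    priv[0] = 'P';    /* copy-on-write: this process now has a private copy */
    shared[1] = 'S';  /* written to the shared page, visible to all sharers */

    *shared_sees = shared[0];
    int rc = pread(fd, file_bytes, 2, 0) == 2 ? 0 : -1;

    munmap(priv, len);
    munmap(shared, len);
    return rc;
}
```

For a file that initially contains `ab`, the shared mapping still sees `a` at byte 0 and the file ends up as `aS`.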

The mmap system call is implemented roughly as follows:

1. Locate the file to be mapped (through the file descriptor fd and the file system);

2. Check permissions: the mapping's permissions cannot exceed the mode in which the file was opened. For example, if the file was opened read-only, a writable shared mapping (PROT_WRITE with MAP_SHARED) cannot be created;

3. Create a vma object and initialize it;

4. Call the mmap operation of the mapped file, which fills in the vma's vm_ops vector table;

5. Link the vma into the process's vma list, merging it with the preceding or following vma when they can be combined;

6. If the mapping requires VM_LOCKED (its pages must not be swapped out), fault the pages in immediately so that the mapped pages are read into memory.
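The permission check in step 2 can be observed from user space. This sketch assumes Linux/POSIX, and `try_writable_map` is a hypothetical helper. It requests a writable mapping of a file descriptor and reports the errno on failure. On a read-only descriptor, MAP_SHARED with PROT_WRITE fails with EACCES. MAP_PRIVATE with PROT_WRITE succeeds, because writes only ever go to private copy-on-write pages.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: try to create a writable one-page mapping of fd with
 * the given flags (MAP_SHARED or MAP_PRIVATE).
 * Returns 0 if mmap() succeeded, otherwise the errno value it set. */
int try_writable_map(int fd, int flags)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    void *p = mmap(NULL, page, PROT_READ | PROT_WRITE, flags, fd, 0);
    if (p == MAP_FAILED)
        return errno;
    munmap(p, page);
    return 0;
}
```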

munmap(void *start, size_t length):

This call can be regarded as the inverse of mmap. It removes the mappings in the length-byte range beginning at start in the process's address space. If the range does not correspond exactly to one vma, vmas may have to be split or removed. Unmapping the middle of a vma leaves two vmas, and a range that spans several vmas affects all of them.
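A vma split can be provoked from user space by unmapping a hole in the middle of a mapping. This sketch assumes Linux/glibc. `map_and_punch_hole` and `page_is_mapped` are hypothetical helpers, and the second one relies on mincore() failing with ENOMEM for unmapped memory. Three anonymous pages are mapped as one area and the middle page is unmapped. The two remaining pages stay usable as separate areas.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* 1 if the page at addr is mapped, 0 otherwise: mincore() fails with
 * ENOMEM when the range contains unmapped memory. addr must be page aligned. */
int page_is_mapped(void *addr)
{
    unsigned char vec;
    return mincore(addr, (size_t)sysconf(_SC_PAGESIZE), &vec) == 0;
}

/* Map three anonymous pages as a single area, then unmap the middle page,
 * which forces the kernel to split the vma in two. Returns the base address
 * (NULL on failure) and stores the page size in *page. */
char *map_and_punch_hole(size_t *page)
{
    *page = (size_t)sysconf(_SC_PAGESIZE);
    char *base = mmap(NULL, 3 * *page, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED)
        return NULL;
    if (munmap(base + *page, *page) != 0) {
        munmap(base, 3 * *page);
        return NULL;
    }
    base[0] = 1;           /* the lower remainder is still usable */
    base[2 * *page] = 2;   /* and so is the upper remainder */
    return base;
}
```

Calling munmap on the full original range afterwards is allowed even though it now contains a hole.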

msync(void *start, size_t length, int flags):

Writes modifications made in the mapped area back to the backing store. munmap does not guarantee that modified pages have been written back by the time it returns. Without msync, therefore, the modifications may not yet have reached the backing store after munmap. flags can be MS_SYNC, MS_ASYNC, or MS_INVALIDATE, and MS_SYNC and MS_ASYNC are mutually exclusive. With MS_SYNC, the call returns only after the write-back is complete. With MS_ASYNC, it returns as soon as the write-back has been scheduled. MS_INVALIDATE invalidates other mappings of the same file so that they see the newly written contents. The system call is implemented by calling the sync operation of the mapped file.
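Here is a minimal sketch of the usual write, msync, unmap sequence, assuming Linux/POSIX; `write_through_mapping` is a hypothetical helper. The test below only reads the data back with pread(), which shows the file contents changed. It cannot by itself prove that the data reached the disk; that guarantee is what MS_SYNC adds.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: copy msg into the start of a MAP_SHARED mapping of fd
 * and flush it with msync(MS_SYNC) before unmapping. The file must already
 * be at least strlen(msg) bytes long (and msg non-empty).
 * Returns 0 on success, -1 on error. */
int write_through_mapping(int fd, const char *msg)
{
    size_t len = strlen(msg);
    char *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED)
        return -1;
    memcpy(p, msg, len);
    /* MS_SYNC: return only after the dirty page has been written back. */
    int rc = msync(p, len, MS_SYNC);
    munmap(p, len);
    return rc;
}
```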

brk(void *end_data_segment):

Extends the data segment of the process to the address given by end_data_segment. This system call is implemented in much the same way as mmap: it creates a vma and sets its attributes. Before that, however, it checks that the address is valid. For example, the address must not be below mm->end_code, and the range between mm->brk and end_data_segment must not overlap an existing mapping. The vma created by brk has no mapped file, so it is similar to a vma created by an anonymous mapping. The C library function malloc is built on brk, although glibc's malloc typically uses mmap instead for large allocations.
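From user space, the program break is usually moved with the sbrk() wrapper around brk. Here is a minimal sketch, assuming Linux/glibc (sbrk is declared when _DEFAULT_SOURCE is defined); `grow_and_shrink_break` is a hypothetical helper. It extends the data segment, uses the new memory, and shrinks it back, checking that the break moves by exactly the requested amount.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: grow the data segment by inc bytes with sbrk(), use
 * the new memory, then shrink it back. Returns 0 if the program break moved
 * exactly as expected, -1 otherwise. No malloc() calls may happen in
 * between, because malloc also moves the break. */
int grow_and_shrink_break(intptr_t inc)
{
    char *old_brk = sbrk(0);              /* current end of the data segment */
    if (sbrk(inc) == (void *)-1)          /* kernel extends the brk area */
        return -1;
    if ((char *)sbrk(0) != old_brk + inc)
        return -1;
    memset(old_brk, 0xAB, (size_t)inc);   /* new range is ordinary anonymous memory */
    if (sbrk(-inc) == (void *)-1)         /* give the memory back */
        return -1;
    return (char *)sbrk(0) == old_brk ? 0 : -1;
}
```

Real programs should leave the break to malloc; calling sbrk directly alongside malloc is only safe when, as here, the break is restored before malloc runs again.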
