Linux Interprocess Communication -- mmap() Shared Memory (II)


How the kernel guarantees that every process addresses the same physical pages of a shared memory region

1. Pages in the page cache and swap cache: a physical page holding data of an accessed file resides in the page cache or the swap cache, and all the information about a page is described by a struct page. One field of struct page is a pointer named mapping, which points to a struct address_space. Every page in the page cache or swap cache is thus distinguished by an address_space structure plus an offset.

2. Correspondence between a file and its address_space: once a particular file is opened, the kernel builds a struct inode for it in memory, whose i_mapping field points to an address_space structure. Thus one file corresponds to one address_space, and an address_space together with an offset determines a page in the page cache or swap cache. Therefore, when addressing data, the corresponding page is easily found from the given file and the data's offset within it.
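To make points 1 and 2 concrete, here is a heavily simplified sketch of the structures involved. Only the fields discussed above appear; the real kernel definitions carry many more:

/* Simplified sketches; the real structures live in the kernel's
 * <linux/fs.h> and <linux/mm_types.h> and have many more fields. */
struct address_space;                 /* one per open file's cache */

struct page {
    struct address_space *mapping;    /* which file's cache this page belongs to */
    unsigned long         index;      /* offset (in pages) within that file */
    /* ... */
};

struct inode {
    struct address_space *i_mapping;  /* the file's address_space */
    /* ... */
};

Given an open file and an offset into it, the kernel can therefore locate the cached page from the pair (inode->i_mapping, offset).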

3. When a process calls mmap(), the kernel merely creates a region of the requested size in the process's address space and records the access permissions; it does not establish the mapping from that region to physical pages. Hence the first access to the region raises a page fault.

4. For a shared memory mapping, the page fault handler first looks for the target page in the swap cache (the physical page matching the address_space and offset) and returns its address if found. If it is not found, the handler checks whether the page is in the swap area; if so, it swaps the page in. If neither case applies, the handler allocates a new physical page and inserts it into the page cache. Finally, the process's page table is updated.
Note: for a mapping of an ordinary file (a non-shared mapping), the page fault handler first looks for the page in the page cache based on the address_space and the data's offset. If it is not found, the file's data has not yet been read into memory, so the handler reads the corresponding page from disk, returns its address, and the process's page table is updated.
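The decision tree of point 4 can be summarized in code. Below is a minimal, compilable model of it; every helper is an invented stand-in for kernel machinery (swap cache lookup, swap-in, page allocation), not a real kernel API:

#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

struct page { char data[1]; };

/* Stubs standing in for the kernel's real lookups. */
static struct page *lookup_swap_cache(long offset)       { return NULL; }
static bool offset_is_in_swap_area(long offset)          { return false; }
static struct page *swap_in(long offset)                 { return calloc(1, sizeof(struct page)); }
static struct page *alloc_page_into_page_cache(long off) { return calloc(1, sizeof(struct page)); }

static struct page *shared_mapping_fault(long offset)
{
    struct page *p = lookup_swap_cache(offset);  /* 1: already in the swap cache? */
    if (p)
        return p;
    if (offset_is_in_swap_area(offset))          /* 2: swapped out to disk? */
        return swap_in(offset);
    return alloc_page_into_page_cache(offset);   /* 3: allocate a fresh page */
}

int main(void)
{
    struct page *p = shared_mapping_fault(0);
    printf("got page %p; the handler would now update the process page table\n", (void *)p);
    free(p);
    return 0;
}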

5. When all processes map the same shared memory region, the situation is the same for each: once the mapping between linear addresses and physical addresses is established, whatever address each process is returned, what is actually accessed is the same set of physical pages backing the shared region.
Note: a shared memory region can be viewed as a file in the special filesystem shm, whose mount point is backed by the swap area.

The mmap() system call enables shared memory between processes by having them map the same ordinary file. Once an ordinary file is mapped into a process's address space, the process can access it like ordinary memory, without calling read(), write(), and so on.

Note: mmap() was not actually designed solely for shared memory. It provides, in its own right, a different way of accessing ordinary files: processes can operate on them as if reading and writing memory. POSIX and System V shared memory IPC exist purely for sharing, whereas shared memory is just one of mmap()'s main applications.

The mmap() system call establishes shared memory in two ways:

(1) Memory mapping backed by an ordinary file: applicable to any processes. Here you open or create a file and then call mmap(); typical calling code is as follows:

fd = open(name, flag, mode);
if (fd < 0)
    ...

ptr = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

Communication regions that share memory through mmap() this way have many characteristics and points to note; we demonstrate them in the example below.
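As a concrete illustration of method (1), here is a minimal, self-contained sketch. The file name "shared.dat" is an arbitrary choice, and the child deliberately performs its own open() and mmap() instead of inheriting the parent's mapping, to mimic what an unrelated process would do:

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

#define LEN 4096

static char *map_shared_file(const char *name)
{
    int fd = open(name, O_RDWR | O_CREAT, 0644);
    if (fd < 0) { perror("open"); _exit(1); }
    if (ftruncate(fd, LEN) < 0) { perror("ftruncate"); _exit(1); }  /* size the backing file */
    char *p = mmap(NULL, LEN, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) { perror("mmap"); _exit(1); }
    close(fd);                      /* the mapping stays valid after close() */
    return p;
}

int main(void)
{
    if (fork() == 0) {              /* writer: its own open() + mmap() */
        strcpy(map_shared_file("shared.dat"), "hello via the page cache");
        _exit(0);
    }
    wait(NULL);
    printf("reader sees: %s\n", map_shared_file("shared.dat"));
    return 0;
}

Both mappings resolve, through the file's address_space, to the same physical pages, which is exactly the mechanism described in the previous section.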

(2) Anonymous memory mapping, provided through a special file: applicable to related processes. Because of the special affinity between parent and child processes, the parent calls mmap() first and then calls fork(). After fork(), the child inherits the parent's address space, including the anonymously mapped region, and also inherits the address returned by mmap(), so parent and child can communicate through the mapped region. Note that this is not the usual inheritance: in general, a child process maintains its own copies of variables inherited from the parent, but the region at the address returned by mmap() is maintained by parent and child together.
For related processes, the best way to implement shared memory is anonymous memory mapping: no specific file needs to be given, only the appropriate flags, as the sketch below shows.
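A minimal sketch of method (2): the parent calls mmap() with MAP_SHARED | MAP_ANONYMOUS before fork(), and both processes then touch the same physical pages (MAP_ANONYMOUS is the modern flag; older systems mapped /dev/zero instead):

#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    char *shared = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                        MAP_SHARED | MAP_ANONYMOUS, -1, 0);   /* no file: fd is -1 */
    if (shared == MAP_FAILED) { perror("mmap"); return 1; }

    if (fork() == 0) {                      /* the child inherits the mapping */
        strcpy(shared, "written by the child");
        _exit(0);
    }
    wait(NULL);
    printf("parent reads: %s\n", shared);   /* same physical pages */
    munmap(shared, 4096);
    return 0;
}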

When shared memory comes up, the following methods generally come to mind:
1. Multiple threads. Memory is shared between threads; more precisely, threads of the same process use the same address space, rather than sharing memory across different address spaces.
2. Memory shared between parent and child processes. The parent process mmap()s a region of anonymous memory with MAP_SHARED | MAP_ANONYMOUS; after fork(), that memory is shared among its descendants. This kind of sharing is used less often because it is limited by the parent-child relationship between processes.
3. mmap() on a file. Multiple processes mmap() the same file, in effect sharing the memory of the file's page cache. But a file involves disk reads and writes, which makes it cumbersome for shared memory, hence memory-only files with no disk behind them, namely the tmpfs and shmem we discuss next.

tmpfs is a virtual filesystem; files created in it live in memory and disappear when the machine restarts.
shmem is a set of IPC facilities: through the corresponding IPC system call shmget(), one creates a piece of shared memory identified by a given key, and a process that needs this memory attaches it with the shmat() system call.
Although these are two different interfaces, the implementation in the kernel is the same: shmem internally mounts a tmpfs partition (invisible to users), and shmget() obtains a file named "SYSV${key}" under that partition. shmat() is then equivalent to mmap()ing this file.
So from here on we treat tmpfs and shmem as the same thing.
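As a concrete illustration of the shmget()/shmat() interface just described, here is a minimal sketch; the key 0x1234 and the message are arbitrary choices:

#include <stdio.h>
#include <string.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    int id = shmget((key_t)0x1234, 4096, IPC_CREAT | 0666);  /* create/find by key */
    if (id < 0) { perror("shmget"); return 1; }

    if (fork() == 0) {                       /* the child attaches on its own, */
        char *p = shmat(id, NULL, 0);        /* as any unrelated process could */
        strcpy(p, "hello through shmem");
        shmdt(p);
        _exit(0);
    }
    wait(NULL);
    char *p = shmat(id, NULL, 0);
    if (p == (void *)-1) { perror("shmat"); return 1; }
    printf("reader sees: %s\n", p);
    shmdt(p);
    shmctl(id, IPC_RMID, NULL);              /* remove the segment when done */
    return 0;
}

Internally, per the description above, this is the same page cache machinery as mmap()ing a tmpfs file.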

tmpfs/shmem is something between a file and anonymous memory.
On the one hand it has the properties of a file and can be manipulated like one: it has its own inode and its own page cache.
On the other hand it also has the properties of anonymous memory: since there is no backing medium such as a disk, when memory is scarce the kernel cannot simply discard these pages from the page cache; they need to be swapped out (see "Analysis of Linux page reclaim").

Reading and writing tmpfs/shmem memory means accessing the memory represented by the pages in its page cache, which is no different from an ordinary file mapping:
If the corresponding location in the process address space is not yet mapped, a mapping to the corresponding page in the page cache is established;
If the corresponding location in the page cache has not yet been assigned a page, one is allocated. Since there is no source data on disk, the newly allocated page is zero-filled (in particular, when a read() system call touches a location that has not been assigned a page, no new page is allocated; the shared zero_page is used instead);
If the page in the page cache has been reclaimed, it is restored first;

Regarding the third "if": tmpfs/shmem differs from ordinary files both in how the page is reclaimed and in how it is restored.
When a page is reclaimed, as with ordinary files, the kernel finds every page table that maps the page via the prio_tree reverse mapping and clears the corresponding PTEs.
The difference is that an ordinary file's page can simply be discarded once it is in sync with the disk (if the page is dirty, it is written back first), whereas a tmpfs/shmem page needs to be swapped out.
Note that when an anonymous page is swapped out, the PTEs mapping it are not cleared; instead, the corresponding swap_entry is written into each PTE, so the kernel knows where the page was swapped out to; otherwise it could not swap the page back in when it is needed.
And a tmpfs/shmem page? Its PTEs in the page tables are cleared, and the swap_entry is stored in the corresponding slot of the page cache's radix_tree.

When the next access triggers a page fault, the page must be restored.
Restoring an ordinary file's page works the same as when the page was never allocated: a new page is allocated, then the corresponding data is read from disk according to the mapped location;
For tmpfs/shmem, the mapped location leads to the corresponding slot in the radix_tree, the swap_entry is fetched from it, the page is swapped in, and the new page is put back into the page cache;

This raises a question: how do we know whether a slot in the page cache's radix_tree holds an ordinary page or a swap_entry left behind by swap-out?
If it is a swap_entry, the value in the slot carries the RADIX_TREE_EXCEPTIONAL_ENTRY tag (whose value is 2): the swap_entry value is shifted left by two bits, OR'ed with RADIX_TREE_EXCEPTIONAL_ENTRY, and stored in the slot.
That is, if (slot & RADIX_TREE_EXCEPTIONAL_ENTRY) != 0, the slot holds a swap_entry whose value is slot >> 2; otherwise the slot is a pointer to the page, which may of course be NULL, meaning the page has not been allocated yet.
Clearly, then, the low two bits of a page pointer must be 0 (page structures are aligned, so they are), or they could collide with the RADIX_TREE_EXCEPTIONAL_ENTRY tag; and a swap_entry value can be at most 30 bits or 62 bits (on 32-bit and 64-bit machines respectively), or the two-bit left shift would overflow.
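The encoding is easy to demonstrate with a small standalone program. The constant below mirrors the kernel's RADIX_TREE_EXCEPTIONAL_ENTRY (value 2, as stated above), but this is a userspace model of the scheme, not kernel source:

#include <stdint.h>
#include <stdio.h>

#define EXCEPTIONAL_ENTRY 0x2UL   /* bit 1 tags a non-pointer slot value */
#define EXCEPTIONAL_SHIFT 2

/* Pack a swap_entry into a slot value the way described above. */
static uintptr_t swap_entry_to_slot(uintptr_t swap_entry)
{
    return (swap_entry << EXCEPTIONAL_SHIFT) | EXCEPTIONAL_ENTRY;
}

int main(void)
{
    uintptr_t slot = swap_entry_to_slot(0xabcd);

    if (slot & EXCEPTIONAL_ENTRY)     /* tagged: the slot holds a swap_entry */
        printf("swap_entry = %#lx\n", (unsigned long)(slot >> EXCEPTIONAL_SHIFT));
    else if (slot == 0)               /* NULL: no page allocated yet */
        printf("no page\n");
    else                              /* otherwise: a pointer to the page */
        printf("page pointer\n");
    return 0;
}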

Finally, a figure (not reproduced here) illustrates the reclaim and restore flow for anonymous pages, file-mapped pages, and tmpfs/shmem pages.


