Mmap Implementation Analysis

Source: Internet
Author: User

This article does not cover how to use the mmap function, which is well documented on the Internet; instead, it analyzes the kernel implementation. In essence, mmap allocates (or finds) a suitable vma for the current process and then sets the corresponding page-fault handler for that vma.

Based on the flags, an mmap mapping is either anonymous or non-anonymous (file-backed), and either shared or private. These two dimensions give four kinds of mapping.

(1) Anonymous shared mapping: fd is -1; it can be used for communication between parent and child processes.

(2) Anonymous private mapping: for example, when malloc allocates a large block of memory (larger than 128 KB).

(3) Non-anonymous shared mapping: commonly used for inter-process communication.

(4) Non-anonymous private mapping: for example, used when a program loads shared libraries (.so files) at startup; it has copy-on-write semantics.
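
From user space, the four kinds of mapping listed above differ only in the flags and the fd passed to mmap. The following is a minimal sketch, not part of the original article: the helper names try_mapping and demo_four_mappings are made up for illustration, and a scratch file from tmpfile() stands in for a real file.

```c
#define _GNU_SOURCE   /* for MAP_ANONYMOUS under strict -std modes */
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

#define MAP_LEN 4096  /* one page on most systems; an assumption for brevity */

/* Try one mapping with the given flags and fd; return 0 on success, -1 on failure. */
static int try_mapping(int flags, int fd)
{
    void *p = mmap(NULL, MAP_LEN, PROT_READ | PROT_WRITE, flags, fd, 0);
    if (p == MAP_FAILED)
        return -1;
    munmap(p, MAP_LEN);
    return 0;
}

/* Create all four kinds of mapping; return how many succeeded, or -1 if the
 * scratch file could not be prepared. */
int demo_four_mappings(void)
{
    int ok = 0;
    FILE *tmp = tmpfile();               /* scratch file for the file-backed cases */
    if (!tmp)
        return -1;
    int fd = fileno(tmp);
    if (ftruncate(fd, MAP_LEN) != 0) {   /* the file must cover the mapped range */
        fclose(tmp);
        return -1;
    }

    ok += try_mapping(MAP_SHARED  | MAP_ANONYMOUS, -1) == 0;  /* (1) anonymous shared */
    ok += try_mapping(MAP_PRIVATE | MAP_ANONYMOUS, -1) == 0;  /* (2) anonymous private */
    ok += try_mapping(MAP_SHARED,  fd) == 0;                  /* (3) non-anonymous shared */
    ok += try_mapping(MAP_PRIVATE, fd) == 0;                  /* (4) non-anonymous private */

    fclose(tmp);
    return ok;
}
```

Calling demo_four_mappings() from a main function and printing the result is enough to try it out; on a typical Linux system all four calls should succeed.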

Next, let's look at how these cases differ inside the kernel.

In the kernel, mmap is implemented mainly by the sys_mmap_pgoff function, which is defined in mm/mmap.c.

    SYSCALL_DEFINE6(mmap_pgoff, unsigned long, addr, unsigned long, len,
                    unsigned long, prot, unsigned long, flags,
                    unsigned long, fd, unsigned long, pgoff)
    {
        struct file *file = NULL;
        unsigned long retval = -EBADF;
        if (!(flags & MAP_ANONYMOUS)) {    /* non-anonymous (file-backed) mapping */
            audit_mmap_fd(fd, flags);
            if (unlikely(flags & MAP_HUGETLB))
                return -EINVAL;
            file = fget(fd);    /* look up the struct file for this fd */
            if (!file)
                goto out;
            if (is_file_hugepages(file))
                len = ALIGN(len, huge_page_size(hstate_file(file)));
        } else if (flags & MAP_HUGETLB) {
            /*......*/
        }
        flags &= ~(MAP_EXECUTABLE | MAP_DENYWRITE);
        retval = vm_mmap_pgoff(file, addr, len, prot, flags, pgoff);
        if (file)
            fput(file);
    out:
        return retval;
    }

Most of the work in this function is done by vm_mmap_pgoff, whose main logic in turn is a call to do_mmap_pgoff. The implementation of do_mmap_pgoff is shown below.

do_mmap_pgoff

    unsigned long do_mmap_pgoff(struct file *file, unsigned long addr,
                                unsigned long len, unsigned long prot,
                                unsigned long flags, unsigned long pgoff,
                                unsigned long *populate)
    {
        struct mm_struct *mm = current->mm;
        struct inode *inode;
        /*......*/
        /* Obtain the address to map to. We verify (or select) it and ensure
         * that it represents a valid section of the address space.
         */
        addr = get_unmapped_area(file, addr, len, pgoff, flags);
        if (addr & ~PAGE_MASK)
            return addr;
        /*......*/
        addr = mmap_region(file, addr, len, vm_flags, pgoff);
        /*......*/
        return addr;
    }

This function first uses get_unmapped_area to find (or validate) a suitable address range, and then calls mmap_region to set up the vma for that range. Let's take a look at the implementation of mmap_region.
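
The check `addr & ~PAGE_MASK` in do_mmap_pgoff works because get_unmapped_area returns either a page-aligned address or a negative error code cast to unsigned long, and a small negative errno value is never page-aligned. The arithmetic can be sketched in user space as follows, assuming a 4096-byte page; the TOY_ names and looks_like_error are made up for this illustration.

```c
#include <errno.h>

#define TOY_PAGE_SIZE 4096UL                   /* assumed page size */
#define TOY_PAGE_MASK (~(TOY_PAGE_SIZE - 1))   /* mirrors the kernel's PAGE_MASK */

/* Return nonzero if the value has any of its low (offset-within-page) bits
 * set, which means it cannot be a page-aligned address and must instead be
 * an error code such as (unsigned long)-ENOMEM. */
int looks_like_error(unsigned long addr)
{
    return (addr & ~TOY_PAGE_MASK) != 0;
}
```

For example, (unsigned long)-ENOMEM has low bits 0xff4 and is reported as an error, whereas an aligned address such as 0x40001000 is not.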

mmap_region

    unsigned long mmap_region(struct file *file, unsigned long addr,
                              unsigned long len, vm_flags_t vm_flags, unsigned long pgoff)
    {
        struct mm_struct *mm = current->mm;
        struct vm_area_struct *vma, *prev;
        int correct_wcount = 0;
        int error;
        struct rb_node **rb_link, *rb_parent;
        unsigned long charged = 0;
        struct inode *inode = file ? file_inode(file) : NULL;
        /*......*/
        if (file) {    /* non-anonymous (file-backed) mapping */
            if (vm_flags & (VM_GROWSDOWN | VM_GROWSUP))
                goto free_vma;
            if (vm_flags & VM_DENYWRITE) {
                error = deny_write_access(file);
                if (error)
                    goto free_vma;
                correct_wcount = 1;
            }
            vma->vm_file = get_file(file);
            error = file->f_op->mmap(file, vma);    /* call the mmap method of the file's file system */
            if (error)
                goto unmap_and_free_vma;
            addr = vma->vm_start;
            pgoff = vma->vm_pgoff;
            vm_flags = vma->vm_flags;
        } else if (vm_flags & VM_SHARED) {    /* shared anonymous mapping */
            if (unlikely(vm_flags & (VM_GROWSDOWN | VM_GROWSUP)))
                goto free_vma;
            error = shmem_zero_setup(vma);
            if (error)
                goto free_vma;
        }    /* otherwise: private anonymous mapping */
        file = vma->vm_file;
        /*......*/
    }

If an fd is passed in, the mmap method of the corresponding file system is called. Taking the ext4 file system as an example, its mmap method is ext4_file_mmap.

ext4_file_mmap

    static int ext4_file_mmap(struct file *file, struct vm_area_struct *vma)
    {
        struct address_space *mapping = file->f_mapping;
        if (!mapping->a_ops->readpage)
            return -ENOEXEC;
        file_accessed(file);
        vma->vm_ops = &ext4_file_vm_ops;
        return 0;
    }

As we can see, this function essentially just sets vma->vm_ops to the operations table of the file system in question:

    static const struct vm_operations_struct ext4_file_vm_ops = {
        .fault        = filemap_fault,
        .page_mkwrite = ext4_page_mkwrite,
        .remap_pages  = generic_file_remap_pages,
    };
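
Because only vm_ops is installed at mmap time, the file's data is not read until the mapping is first touched. The sketch below shows the user-space side of this: it writes a string to a scratch file, maps the file privately, and reads the first byte through the mapping. The function name first_byte_via_mmap is made up, and the scratch file from tmpfile() may well live on a file system other than ext4, in which case that file system's fault handler is used rather than the one shown above.

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

/* Write a short string to a scratch file, map the file privately, and return
 * the first byte as seen through the mapping (or -1 on error). */
int first_byte_via_mmap(void)
{
    static const char msg[] = "hello";
    int result = -1;
    FILE *tmp = tmpfile();
    if (!tmp)
        return -1;
    int fd = fileno(tmp);
    if (write(fd, msg, sizeof msg) == (ssize_t)sizeof msg) {
        char *p = mmap(NULL, sizeof msg, PROT_READ, MAP_PRIVATE, fd, 0);
        if (p != MAP_FAILED) {
            result = p[0];   /* the first access faults the page in from the file */
            munmap(p, sizeof msg);
        }
    }
    fclose(tmp);
    return result;
}
```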

For an anonymous mapping (no fd is passed in) with the shared flag set, shmem_zero_setup is called.

shmem_zero_setup

    int shmem_zero_setup(struct vm_area_struct *vma)
    {
        struct file *file;
        loff_t size = vma->vm_end - vma->vm_start;
        file = shmem_file_setup("dev/zero", size, vma->vm_flags);
        if (IS_ERR(file))
            return PTR_ERR(file);
        if (vma->vm_file)
            fput(vma->vm_file);
        vma->vm_file = file;
        vma->vm_ops = &shmem_vm_ops;
        return 0;
    }

As we can see, vma->vm_ops is set to shmem_vm_ops, the operations table of the tmpfs file system.

    static const struct vm_operations_struct shmem_vm_ops = {
        .fault       = shmem_fault,
    #ifdef CONFIG_NUMA
        .set_policy  = shmem_set_policy,
        .get_policy  = shmem_get_policy,
    #endif
        .remap_pages = generic_file_remap_pages,
    };
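
Since a shared anonymous mapping is backed by a tmpfs file rather than by process-private pages, a parent and its child see the same memory after fork. A minimal user-space sketch of that parent-child communication follows; the function name value_from_child and the value 42 are arbitrary choices for illustration.

```c
#define _GNU_SOURCE   /* for MAP_ANONYMOUS under strict -std modes */
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

/* The child stores a value in a shared anonymous mapping; the parent returns
 * what it reads back after the child has exited (-1 on error). */
int value_from_child(void)
{
    int *shared = mmap(NULL, sizeof *shared, PROT_READ | PROT_WRITE,
                       MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (shared == MAP_FAILED)
        return -1;
    *shared = 0;

    pid_t pid = fork();
    if (pid < 0) {
        munmap(shared, sizeof *shared);
        return -1;
    }
    if (pid == 0) {          /* child: write and leave immediately */
        *shared = 42;
        _exit(0);
    }
    waitpid(pid, NULL, 0);   /* parent: wait so the write is complete */
    int seen = *shared;
    munmap(shared, sizeof *shared);
    return seen;
}
```

With MAP_PRIVATE in place of MAP_SHARED, the child's write would land in its own copy-on-write page and the parent would still read 0.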

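The overall mmap call flow is as follows:

    sys_mmap_pgoff
      -> vm_mmap_pgoff
        -> do_mmap_pgoff
          -> get_unmapped_area
          -> mmap_region
            -> file->f_op->mmap    (non-anonymous mapping, e.g. ext4_file_mmap)
            -> shmem_zero_setup    (shared anonymous mapping)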

Note that mmap only allocates virtual address space for the process; it does not actually establish a mapping between virtual memory and physical memory. That mapping is established later, in the page-fault handler.
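
One way to observe this from user space is to count the minor page faults a process takes while touching a freshly mapped region for the first time. The sketch below uses getrusage for this; the function name faults_while_touching is made up, and the exact count depends on the system (on a typical Linux system one would expect roughly one fault per touched page, but this is not guaranteed).

```c
#define _GNU_SOURCE   /* for MAP_ANONYMOUS under strict -std modes */
#include <stddef.h>
#include <sys/mman.h>
#include <sys/resource.h>
#include <unistd.h>

/* Map npages of private anonymous memory, write one byte to each page, and
 * return how many minor page faults the process took meanwhile (-1 on error). */
long faults_while_touching(size_t npages)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0 || npages == 0)
        return -1;
    size_t len = npages * (size_t)page;
    struct rusage before, after;

    char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;                      /* mmap itself backs no pages yet */

    if (getrusage(RUSAGE_SELF, &before) != 0) {
        munmap(p, len);
        return -1;
    }
    for (size_t i = 0; i < npages; i++)
        ((volatile char *)p)[i * (size_t)page] = 1;   /* first touch of each page */
    if (getrusage(RUSAGE_SELF, &after) != 0) {
        munmap(p, len);
        return -1;
    }

    munmap(p, len);
    return after.ru_minflt - before.ru_minflt;
}
```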

The page-fault handling path is as follows:


    int handle_pte_fault(struct mm_struct *mm,
                         struct vm_area_struct *vma, unsigned long address,
                         pte_t *pte, pmd_t *pmd, unsigned int flags)
    {
        pte_t entry;
        spinlock_t *ptl;
        /*......*/
        entry = *pte;
        if (!pte_present(entry)) {
            if (pte_none(entry)) {
                if (vma->vm_ops)
                    return do_linear_fault(mm, vma, address,
                                           pte, pmd, flags, entry);
                /* private anonymous mapping */
                return do_anonymous_page(mm, vma, address,
                                         pte, pmd, flags);
            }
        }

        return 0;
    }


As we can see, do_anonymous_page is called only when vma->vm_ops is not set. Judging by its name, one might expect this function to handle shared anonymous mappings as well; however, as the code analyzed above shows, vma->vm_ops is also set for shared anonymous mappings. Only one case leaves it unset: the private anonymous mapping.

To sum up, we have the following conclusions:

(1) Non-anonymous shared mapping: the page-fault handler of the corresponding file system is called.

(2) Non-anonymous private mapping: the page-fault handler of the corresponding file system is called.

(3) Anonymous shared mapping: the page-fault handler of the tmpfs file system is called.

(4) Anonymous private mapping: do_anonymous_page is called. This is currently the only kind of mapping that supports THP (transparent huge pages).
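
The four conclusions above can be condensed into a toy user-space model of the two decision points: which vm_ops is installed at mmap time, and which handler is chosen at fault time. Every name in this sketch is hypothetical, and the structures are drastically simplified stand-ins for the kernel's; it illustrates the dispatch logic only.

```c
#include <stddef.h>

/* Toy stand-ins for the kernel structures; every name here is hypothetical. */
enum toy_handler { TOY_FS_FAULT, TOY_SHMEM_FAULT, TOY_ANON_PAGE };

struct toy_vm_ops { enum toy_handler fault; };
struct toy_vma    { const struct toy_vm_ops *vm_ops; };

static const struct toy_vm_ops toy_fs_ops    = { TOY_FS_FAULT };
static const struct toy_vm_ops toy_shmem_ops = { TOY_SHMEM_FAULT };

/* mmap time: mirrors the three-way branch in mmap_region. */
void toy_mmap_region(struct toy_vma *vma, int has_file, int shared)
{
    if (has_file)
        vma->vm_ops = &toy_fs_ops;      /* installed by file->f_op->mmap */
    else if (shared)
        vma->vm_ops = &toy_shmem_ops;   /* installed by shmem_zero_setup */
    else
        vma->vm_ops = NULL;             /* private anonymous: left unset */
}

/* Fault time: mirrors the vm_ops check in handle_pte_fault. */
enum toy_handler toy_handle_fault(const struct toy_vma *vma)
{
    if (vma->vm_ops)
        return vma->vm_ops->fault;
    return TOY_ANON_PAGE;               /* the do_anonymous_page path */
}
```

Note that in this model, as in the kernel code shown earlier, the shared/private flag only matters when no file is involved: both file-backed cases end up in the file system's handler.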

In addition, both POSIX shared memory and System V shared memory are implemented on top of tmpfs. For details, see http://hustcat.github.io/shared-memory-tmpfs/. Note, however, that there are actually two tmpfs instances in the kernel. One is mounted internally by the kernel at boot and is used for shared anonymous mappings and System V shared memory. The other is mounted with mount, has a default size of half the system memory, and is used for POSIX shared memory.
