Kernel and Binder mmap implementation


1. Introduction

In user space, mmap is called through the following C library wrapper:

	void *mmap(void *addr, size_t size, int prot, int flags, int fd, long offset);

It then enters the kernel through a system call.
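
As a concrete illustration, here is a minimal user-space sketch that maps the binder device. The 1 MB size and read-only protection are illustrative choices, not the exact values libbinder uses.

	/* Minimal user-space sketch: mapping /dev/binder. This mmap call
	 * enters the kernel via __NR_mmap2 on ARM and eventually reaches
	 * binder_mmap() through file->f_op->mmap. */
	#include <fcntl.h>
	#include <stdio.h>
	#include <sys/mman.h>

	int main(void)
	{
		int fd = open("/dev/binder", O_RDWR);
		if (fd < 0)
			return 1;

		void *base = mmap(NULL, 1024 * 1024, PROT_READ,
				  MAP_PRIVATE | MAP_NORESERVE, fd, 0);
		if (base == MAP_FAILED) {
			perror("mmap");
			return 1;
		}
		printf("binder buffer mapped at %p\n", base);
		return 0;
	}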

 

2. Kernel mmap implementation

1) The call enters the kernel through a system call. The system call number is defined in arch/arm/include/asm/unistd.h:

	#define __NR_mmap2 (__NR_SYSCALL_BASE + 192)

2) The software interrupt is triggered

The interrupt service routine is at ENTRY(vector_swi) in arch/arm/kernel/entry-common.S. The handler registered for __NR_mmap2 is sys_mmap2 (see the call table in arch/arm/kernel/calls.S).

3) Implementation of sys_mmap2

It is implemented in arch/arm/kernel/entry-common.S as follows:

/*
 * Note: off_4k (r5) is always units of 4K.  If we can't do the requested
 * offset, we return EINVAL.
 */
sys_mmap2:
#if PAGE_SHIFT > 12
		tst	r5, #PGOFF_MASK
		moveq	r5, r5, lsr #PAGE_SHIFT - 12
		streq	r5, [sp, #4]
		beq	sys_mmap_pgoff
		mov	r0, #-EINVAL
		mov	pc, lr
#else
		str	r5, [sp, #4]
		b	sys_mmap_pgoff
#endif

4) Call sys_mmap_pgoff

It is declared in include/linux/syscalls.h as follows:

asmlinkage long sys_mmap_pgoff(unsigned long addr, unsigned long len,
			       unsigned long prot, unsigned long flags,
			       unsigned long fd, unsigned long pgoff);

5) sys_mmap_pgoff implementation

It is implemented in mm/mmap.c as follows:

SYSCALL_DEFINE6(mmap_pgoff, unsigned long, addr, unsigned long, len,
		unsigned long, prot, unsigned long, flags,
		unsigned long, fd, unsigned long, pgoff)
{
	struct file *file = NULL;
	unsigned long retval = -EBADF;

	if (!(flags & MAP_ANONYMOUS)) {
		audit_mmap_fd(fd, flags);
		if (unlikely(flags & MAP_HUGETLB))
			return -EINVAL;
		file = fget(fd);
		if (!file)
			goto out;
	} else if (flags & MAP_HUGETLB) {
		struct user_struct *user = NULL;
		/*
		 * VM_NORESERVE is used because the reservations will be
		 * taken when vm_ops->mmap() is called
		 * A dummy user value is used because we are not locking
		 * memory so no accounting is necessary
		 */
		len = ALIGN(len, huge_page_size(&default_hstate));
		file = hugetlb_file_setup(HUGETLB_ANON_FILE, len, VM_NORESERVE,
						&user, HUGETLB_ANONHUGE_INODE);
		if (IS_ERR(file))
			return PTR_ERR(file);
	}

	flags &= ~(MAP_EXECUTABLE | MAP_DENYWRITE);

	down_write(&current->mm->mmap_sem);
	retval = do_mmap_pgoff(file, addr, len, prot, flags, pgoff);
	up_write(&current->mm->mmap_sem);

	if (file)
		fput(file);
out:
	return retval;
}

This function obtains an available region of virtual address space from the current process (a vm_area_struct, set up inside mmap_region), then calls file->f_op->mmap(file, vma), handing the mapping over to whichever driver supports mmap.
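
As a hedged sketch of that dispatch point, a character driver wires its handler into file_operations like this (the mydrv_* names are hypothetical; binder registers binder_mmap the same way):

	#include <linux/fs.h>
	#include <linux/mm.h>
	#include <linux/module.h>

	/* Hypothetical driver hook: mmap_region() reaches this through
	 * file->f_op->mmap(file, vma) once the vma has been set up. */
	static int mydrv_mmap(struct file *filp, struct vm_area_struct *vma)
	{
		/* Configure vma->vm_ops and establish mappings here,
		 * as binder_mmap() does below. */
		return 0;
	}

	static const struct file_operations mydrv_fops = {
		.owner = THIS_MODULE,
		.mmap  = mydrv_mmap,
	};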

The following uses the binder driver as an example.

3. Binder mmap implementation

The binder driver's mmap handler is binder_mmap. The implementation code is as follows:

static int binder_mmap(struct file *filp, struct vm_area_struct *vma)
{
	int ret;
	struct vm_struct *area;
	struct binder_proc *proc = filp->private_data;
	const char *failure_string;
	struct binder_buffer *buffer;

	if ((vma->vm_end - vma->vm_start) > SZ_4M)
		vma->vm_end = vma->vm_start + SZ_4M;

	binder_debug(BINDER_DEBUG_OPEN_CLOSE,
		     "binder_mmap: %d %lx-%lx (%ld K) vma %lx pagep %lx\n",
		     proc->pid, vma->vm_start, vma->vm_end,
		     (vma->vm_end - vma->vm_start) / SZ_1K, vma->vm_flags,
		     (unsigned long)pgprot_val(vma->vm_page_prot));

	if (vma->vm_flags & FORBIDDEN_MMAP_FLAGS) {
		ret = -EPERM;
		failure_string = "bad vm_flags";
		goto err_bad_arg;
	}
	vma->vm_flags = (vma->vm_flags | VM_DONTCOPY) & ~VM_MAYWRITE;

	if (proc->buffer) {
		ret = -EBUSY;
		failure_string = "already mapped";
		goto err_already_mapped;
	}

	area = get_vm_area(vma->vm_end - vma->vm_start, VM_IOREMAP);
	if (area == NULL) {
		ret = -ENOMEM;
		failure_string = "get_vm_area";
		goto err_get_vm_area_failed;
	}
	proc->buffer = area->addr;
	proc->user_buffer_offset = vma->vm_start - (uintptr_t)proc->buffer;

#ifdef CONFIG_CPU_CACHE_VIPT
	if (cache_is_vipt_aliasing()) {
		while (CACHE_COLOUR((vma->vm_start ^ (uint32_t)proc->buffer))) {
			printk(KERN_INFO "binder_mmap: %d %lx-%lx maps %p bad alignment\n",
			       proc->pid, vma->vm_start, vma->vm_end, proc->buffer);
			vma->vm_start += PAGE_SIZE;
		}
	}
#endif
	proc->pages = kzalloc(sizeof(proc->pages[0]) * ((vma->vm_end - vma->vm_start) / PAGE_SIZE), GFP_KERNEL);
	if (proc->pages == NULL) {
		ret = -ENOMEM;
		failure_string = "alloc page array";
		goto err_alloc_pages_failed;
	}
	proc->buffer_size = vma->vm_end - vma->vm_start;

	vma->vm_ops = &binder_vm_ops;
	vma->vm_private_data = proc;

	if (binder_update_page_range(proc, 1, proc->buffer, proc->buffer + PAGE_SIZE, vma)) {
		ret = -ENOMEM;
		failure_string = "alloc small buf";
		goto err_alloc_small_buf_failed;
	}
	buffer = proc->buffer;
	INIT_LIST_HEAD(&proc->buffers);
	list_add(&buffer->entry, &proc->buffers);
	buffer->free = 1;
	binder_insert_free_buffer(proc, buffer);
	proc->free_async_space = proc->buffer_size / 2;
	barrier();
	proc->files = get_files_struct(current);
	proc->vma = vma;

	/*printk(KERN_INFO "binder_mmap: %d %lx-%lx maps %p\n",
		 proc->pid, vma->vm_start, vma->vm_end, proc->buffer);*/

	return 0;

err_alloc_small_buf_failed:
	kfree(proc->pages);
	proc->pages = NULL;
err_alloc_pages_failed:
	vfree(proc->buffer);
	proc->buffer = NULL;
err_get_vm_area_failed:
err_already_mapped:
err_bad_arg:
	printk(KERN_ERR "binder_mmap: %d %lx-%lx %s failed %d\n",
	       proc->pid, vma->vm_start, vma->vm_end, failure_string, ret);
	return ret;
}

1) Obtain kernel virtual address space:

	struct vm_struct *area;
	area = get_vm_area(vma->vm_end - vma->vm_start, VM_IOREMAP);

Based on the vma passed in (a vm_area_struct, describing a region of the process's address space that will be mapped against kernel space), get_vm_area reserves a contiguous region of the same size in the kernel's vmalloc area. Its descriptor is a vm_struct, and that structure is added to vm_list for unified management.
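
A minimal sketch of that reservation step, under a hypothetical helper name:

	#include <linux/vmalloc.h>

	/* Hypothetical helper: reserve a contiguous kernel virtual range in
	 * the vmalloc area without backing it with physical pages yet,
	 * exactly the reservation binder_mmap performs. */
	static void *reserve_kernel_range(unsigned long size)
	{
		struct vm_struct *area = get_vm_area(size, VM_IOREMAP);

		return area ? area->addr : NULL;
	}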

 

2) Save the starting address of the kernel virtual address space for later use:

	proc->buffer = area->addr;

 

3) Calculate and save the offset between the start of the process's user-space virtual address range and the start of the kernel virtual address range, for later use:

	proc->user_buffer_offset = vma->vm_start - (uintptr_t)proc->buffer;
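
Because the offset is constant, one physical page ends up visible at two virtual addresses. A hypothetical helper (not in the binder source, using the binder_proc fields shown above) illustrates the arithmetic:

	/* Hypothetical helper: convert a kernel vmalloc-area address inside
	 * proc->buffer into the user-space address that maps the same
	 * physical page. */
	static void __user *binder_kern_to_user(struct binder_proc *proc,
						void *kaddr)
	{
		return (void __user *)((uintptr_t)kaddr + proc->user_buffer_offset);
	}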

 

4) Allocate the array of physical page pointers (struct page *):

	proc->pages = kzalloc(sizeof(proc->pages[0]) * ((vma->vm_end - vma->vm_start) / PAGE_SIZE), GFP_KERNEL);

 

5) binder_update_page_range

It works as follows:

a) Allocates physical pages.

b) Establishes page table mappings for both the user-space vma and the vmalloc-area range.

Before this step, virtual address ranges exist in both user space and kernel space, but neither can be accessed, because no physical memory backs them yet.

Additional background:

a) struct page tracks whether a physical page frame is in use. All page structures are stored in a global array called mem_map.

b) Each process's task_struct contains a pointer to an mm_struct. The process's mm_struct holds the page directory pointer (pgd) of the process's executable image, and also refers to a set of vm_area_struct structures, each describing one virtual address region of the process.
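
For illustration, here is a sketch of walking those structures on a 2.6-era kernel, where mm->mmap heads the linked vma list:

	#include <linux/kernel.h>
	#include <linux/mm.h>
	#include <linux/sched.h>

	/* Sketch (2.6-era layout): print every vm_area_struct of a task by
	 * following task_struct -> mm_struct -> the vma list. */
	static void dump_vmas(struct task_struct *tsk)
	{
		struct vm_area_struct *vma;

		if (!tsk->mm)	/* kernel threads have no mm */
			return;

		for (vma = tsk->mm->mmap; vma; vma = vma->vm_next)
			printk(KERN_INFO "vma %lx-%lx flags %lx\n",
			       vma->vm_start, vma->vm_end, vma->vm_flags);
	}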

The call in binder_mmap is:

	binder_update_page_range(proc, 1, proc->buffer, proc->buffer + PAGE_SIZE, vma);

proc->buffer points to the start of the reserved vmalloc region, and both vma (vm_area_struct) and area (vm_struct) are available from the previous steps. The binder_update_page_range implementation code is as follows:

 

static int binder_update_page_range(struct binder_proc *proc, int allocate,
				    void *start, void *end,
				    struct vm_area_struct *vma)
{
	void *page_addr;
	unsigned long user_page_addr;
	struct vm_struct tmp_area;
	struct page **page;
	struct mm_struct *mm;

	binder_debug(BINDER_DEBUG_BUFFER_ALLOC,
		     "binder: %d: %s pages %p-%p\n", proc->pid,
		     allocate ? "allocate" : "free", start, end);

	if (end <= start)
		return 0;

	if (vma)
		mm = NULL;
	else
		mm = get_task_mm(proc->tsk);

	if (mm) {
		down_write(&mm->mmap_sem);
		vma = proc->vma;
	}

	if (allocate == 0)
		goto free_range;

	if (vma == NULL) {
		printk(KERN_ERR "binder: %d: binder_alloc_buf failed to "
		       "map pages in userspace, no vma\n", proc->pid);
		goto err_no_vma;
	}

	for (page_addr = start; page_addr < end; page_addr += PAGE_SIZE) {
		int ret;
		struct page **page_array_ptr;
		page = &proc->pages[(page_addr - proc->buffer) / PAGE_SIZE];

		BUG_ON(*page);
		/* allocate a physical page */
		*page = alloc_page(GFP_KERNEL | __GFP_ZERO);
		if (*page == NULL) {
			printk(KERN_ERR "binder: %d: binder_alloc_buf failed "
			       "for page at %p\n", proc->pid, page_addr);
			goto err_alloc_page_failed;
		}
		tmp_area.addr = page_addr;
		tmp_area.size = PAGE_SIZE + PAGE_SIZE /* guard page? */;
		page_array_ptr = page;
		/* For the kernel virtual address, allocate the corresponding
		 * pud, pmd and pte entries and fill them in, so that the
		 * address can reach its physical storage unit through
		 * pgd/pud/pmd/pte. */
		ret = map_vm_area(&tmp_area, PAGE_KERNEL, &page_array_ptr);
		if (ret) {
			printk(KERN_ERR "binder: %d: binder_alloc_buf failed "
			       "to map page at %p in kernel\n",
			       proc->pid, page_addr);
			goto err_map_kernel_failed;
		}
		user_page_addr =
			(uintptr_t)page_addr + proc->user_buffer_offset;
		/* Insert the page into the user-space vma at the user virtual
		 * address, so that accesses to the page starting at
		 * user_page_addr reach the same physical page. */
		ret = vm_insert_page(vma, user_page_addr, page[0]);
		if (ret) {
			printk(KERN_ERR "binder: %d: binder_alloc_buf failed "
			       "to map page at %lx in userspace\n",
			       proc->pid, user_page_addr);
			goto err_vm_insert_page_failed;
		}
		/* vm_insert_page does not seem to increment the refcount */
	}
	if (mm) {
		up_write(&mm->mmap_sem);
		mmput(mm);
	}
	return 0;

free_range:
	for (page_addr = end - PAGE_SIZE; page_addr >= start;
	     page_addr -= PAGE_SIZE) {
		page = &proc->pages[(page_addr - proc->buffer) / PAGE_SIZE];
		if (vma)
			zap_page_range(vma, (uintptr_t)page_addr +
				       proc->user_buffer_offset,
				       PAGE_SIZE, NULL);
err_vm_insert_page_failed:
		unmap_kernel_range((unsigned long)page_addr, PAGE_SIZE);
err_map_kernel_failed:
		__free_page(*page);
		*page = NULL;
err_alloc_page_failed:
		;
	}
err_no_vma:
	if (mm) {
		up_write(&mm->mmap_sem);
		mmput(mm);
	}
	return -ENOMEM;
}

 

a) map_vm_area: maps the kernel virtual addresses to physical memory, i.e. builds the page table entries for the contiguous range in the vmalloc area. It needs the vm_struct parameter (supplying the virtual addresses) and the page parameter (used to fill in the ptes); this completes the kernel-side mapping.

b) vm_insert_page: updates the page tables of the vma, which is what implements the mmap semantics on the user side.

c) When binder_mmap calls binder_update_page_range(proc, 1, proc->buffer, proc->buffer + PAGE_SIZE, vma), only a single page is allocated. This saves memory by allocating on demand: the process's virtual range and the vmalloc kernel range are both reserved in full up front, which costs no physical memory, while actual physical pages are obtained only as they are needed.
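
A simplified sketch of that on-demand growth (binder_grow_buffer is a hypothetical name; the real binder_alloc_buf passes a range to binder_update_page_range in the same way when carving out a buffer):

	/* Hypothetical wrapper: back just the range [start, end) of the
	 * reserved mapping with physical pages, as binder_alloc_buf() does
	 * when a transaction needs buffer space. */
	static int binder_grow_buffer(struct binder_proc *proc,
				      void *start, void *end)
	{
		/* vma == NULL: binder_update_page_range() will take the
		 * task's mmap_sem and use proc->vma itself. */
		return binder_update_page_range(proc, 1, start, end, NULL);
	}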

proc->vma is the user-space region of the calling process;

proc->files is the files_struct of the calling process;

proc->buffer_size is the length of the mapping (at most 4 MB); the usable part is this minus sizeof(struct binder_buffer);

proc->pages is the array of pointers to the allocated physical pages; at first only one entry (one page) is populated, but the array is sized for the whole range;

proc->buffer is the starting address of the contiguous kernel mapping area;

proc->user_buffer_offset is the start of the user-space mapping minus the start of the contiguous kernel mapping.
