Linux malloc analysis: from user space to kernel space

Source: Internet
Author: User


This article introduces how malloc is implemented and how it extends the heap, and analyzes how the mapping between virtual addresses and physical addresses is established.

Original article by Ordeder: http://blog.csdn.net/ordeder/article/details/41654509

1. Background

1.1 Process user space

 

 

Figure 1 (source: http://www.open-open.com/lib/view/open1409716051963.html)

 

A process's task_struct manages its user address space through an mm_struct (the task_struct.mm field). mm_struct is defined as follows (abridged):

 

```c
struct mm_struct {
    struct vm_area_struct *mmap;   /* list of VMAs */
    ...
    pgd_t *pgd;                    /* page global directory, used for address mapping */
    atomic_t mm_users;             /* How many users with user space? */
    atomic_t mm_count;             /* How many references to "struct mm_struct" (users count as 1) */
    int map_count;                 /* number of VMAs */
    ...
    /* layout of user space: code, data, heap and stack segments */
    unsigned long start_code, end_code, start_data, end_data;
    unsigned long start_brk, brk, start_stack;
    unsigned long arg_start, arg_end, env_start, env_end;
    unsigned long rss, total_vm, locked_vm;
    ...
};
```

 

The start_xxx and end_xxx fields describe where each segment of the process's user space (code, data, heap, stack, arguments, environment) lies. For the heap, start_brk is its starting address and the heap grows upward; brk records the current top of the heap, so expanding the heap means moving brk up. The address ranges the process actually uses, including dynamically allocated space, are mapped, and each such range is recorded as a struct vm_area_struct in the mmap linked list.
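The movement of brk can be observed from user space with sbrk(), glibc's wrapper around the brk system call; sbrk(0) returns the current program break, which corresponds to mm_struct.brk. The following is a minimal sketch assuming Linux with glibc; measure_brk_growth is a hypothetical helper name, and it assumes nothing else moves the break between its calls.

```c
#define _DEFAULT_SOURCE            /* exposes sbrk() in glibc's <unistd.h> */
#include <assert.h>
#include <stdint.h>
#include <unistd.h>

/* Moves the program break (this process's mm_struct.brk) up by inc bytes,
 * measures how far it moved, then moves it back.
 * Returns the measured distance, or -1 if the kernel refused to grow the heap. */
long measure_brk_growth(intptr_t inc)
{
    char *before = sbrk(0);        /* current top of the heap */
    if (sbrk(inc) == (void *)-1)
        return -1;
    char *after = sbrk(0);         /* new top of the heap */
    sbrk(-inc);                    /* shrink the heap back */
    assert((char *)sbrk(0) == before);
    return (long)(after - before);
}
```

Calling measure_brk_growth(4096) should report 4096. Calling sbrk() directly is only reasonable in an experiment like this one, because malloc manages the same break.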

1.2 Address mapping

For the mapping between virtual addresses and physical addresses, see: http://blog.csdn.net/ordeder/article/details/41630945

 

2. malloc and free

malloc is the interface used to extend the heap in user space. It is a glibc library function, not a system call (the kernel has no sys_malloc()); it wraps the relevant system calls, such as brk (glibc uses mmap for large requests). The kernel operations involved in malloc can therefore be discussed at two levels: the user-space level and the kernel-space level.
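One way to see that malloc is built on the brk heap, rather than being a system call of its own, is to compare a returned pointer with the current program break. The sketch below assumes Linux, glibc, a single thread (so the main arena, which grows with brk, serves the request) and the usual layout in which the brk heap lies below the mmap area; below_program_break is a hypothetical helper. glibc serves large requests (above M_MMAP_THRESHOLD, 128 KiB by default) with mmap instead of brk, so the check is only meaningful for small allocations.

```c
#define _DEFAULT_SOURCE            /* exposes sbrk() in glibc's <unistd.h> */
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

/* Returns true if p lies below the current program break, i.e. inside the
 * brk-managed heap. Assumes the heap sits below the mmap area. */
bool below_program_break(const void *p)
{
    assert(p != NULL);
    /* compare as integers: relational operators on unrelated pointers are undefined */
    return (uintptr_t)p < (uintptr_t)sbrk(0);
}
```

For example, a 64-byte malloc() in a single-threaded program should satisfy below_program_break().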

2.1 User layer

 

The malloc source code is available at: http://repo.or.cz/w/glibc.git/blob/HEAD:/malloc/malloc.c

malloc and free work at the user layer and give programs a more convenient interface for managing the heap. Their main job is to maintain a linked list of free heap buffers (chunks). Each chunk is described by the following data structure:

 

```c
struct malloc_chunk {
    INTERNAL_SIZE_T prev_size;   /* Size of previous chunk (if free). */
    INTERNAL_SIZE_T size;        /* Size in bytes, including overhead. */

    struct malloc_chunk *fd;     /* double links -- used only if free. */
    struct malloc_chunk *bk;

    /* Only used for large blocks: pointer to next larger size. */
    struct malloc_chunk *fd_nextsize;   /* double links -- used only if free. */
    struct malloc_chunk *bk_nextsize;
};
```
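Because chunk sizes are always multiples of 8 or more, glibc stores three flag bits in the low bits of the size field: PREV_INUSE (0x1), IS_MMAPPED (0x2) and NON_MAIN_ARENA (0x4). The following standalone sketch mirrors that encoding; the struct chunk_header type and the helper functions are hypothetical re-creations for illustration, not glibc's own definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Flag bits kept in the low bits of the size field (same values as glibc). */
#define PREV_INUSE     0x1      /* the previous chunk is in use */
#define IS_MMAPPED     0x2      /* the chunk was obtained with mmap() */
#define NON_MAIN_ARENA 0x4      /* the chunk belongs to a non-main arena */
#define SIZE_BITS      (PREV_INUSE | IS_MMAPPED | NON_MAIN_ARENA)

/* Hypothetical copy of the first two malloc_chunk fields. */
struct chunk_header {
    size_t prev_size;           /* size of the previous chunk, if it is free */
    size_t size;                /* chunk size, with the flags in the low 3 bits */
};

static_assert(sizeof(struct chunk_header) == 2 * sizeof(size_t),
              "header is exactly prev_size + size");

/* Real size of the chunk: mask off the flag bits. */
size_t chunk_size(const struct chunk_header *c)
{
    return c->size & ~(size_t)SIZE_BITS;
}

bool prev_inuse(const struct chunk_header *c) { return (c->size & PREV_INUSE) != 0; }
bool is_mmapped(const struct chunk_header *c) { return (c->size & IS_MMAPPED) != 0; }
```

For example, a size field of 0x31 describes a 48-byte chunk whose previous neighbour is in use.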

 

In a simplified view of the free list, each chunk begins with a head (the malloc_chunk structure above), and the memory that follows the size field is the chunk's data area.



[Malloc]

Each time a process calls malloc, malloc first searches the heap's free buffer list for a block large enough and hands it to the process (two common strategies for choosing the block are first fit and best fit). If no chunk in the free list satisfies the request, malloc calls brk() to extend the process's heap, creates a new chunk in the newly extended heap space and adds it to the free list. In effect, the process buys memory from the kernel wholesale: the amount obtained may be much larger than what is actually needed right now.

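The address returned by malloc is the start of the chunk's data area, that is, chunk + sizeof(chunk header). In glibc the header is the prev_size and size fields (2 * SIZE_SZ bytes), and this conversion is the chunk2mem macro.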
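The conversion between a chunk and the address malloc returns is plain pointer arithmetic. A minimal sketch, assuming the two-field header (prev_size, size) described above; struct chunk, chunk_to_mem and mem_to_chunk are hypothetical names modeled on glibc's chunk2mem and mem2chunk macros:

```c
#include <assert.h>
#include <stddef.h>

/* Header of the simplified model: the two size fields kept in front of the
 * user data (hypothetical type, same layout as glibc's first two fields). */
struct chunk {
    size_t prev_size;   /* size of the previous chunk, if it is free */
    size_t size;        /* size of this chunk, including the header */
};

/* Chunk -> user pointer: skip the header (cf. glibc's chunk2mem). */
void *chunk_to_mem(struct chunk *c)
{
    return (char *)c + sizeof(struct chunk);
}

/* User pointer -> chunk: step back over the header (cf. glibc's mem2chunk). */
struct chunk *mem_to_chunk(void *mem)
{
    assert(mem != NULL);    /* a real free() returns early for NULL before this step */
    return (struct chunk *)((char *)mem - sizeof(struct chunk));
}
```

free(p) performs the reverse step first: it recovers the header from the user pointer before putting the chunk back on the free list.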

Simple pseudocode for a first-fit malloc:

```
chunk free_list

malloc(size)
    // first fit: return the first free chunk that is large enough
    foreach (chunk in free_list)
        if (chunk.size >= size)
            remove chunk from free_list
            return chunk + sizeof(chunk)
    // no free chunk fits: buy memory from the kernel "wholesale"
    old_brk = brk
    sys_brk(brk + size + sizeof(chunk))
    newchunk = (chunk) old_brk
    newchunk.size = size
    ...
    return newchunk + sizeof(chunk)
```
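The pseudocode above can be turned into a small runnable first-fit allocator. This is a toy sketch assuming Linux with glibc and a single thread: toy_malloc, toy_free and struct block are hypothetical, the header layout is not glibc's, blocks are never split, and memory is never returned to the kernel.

```c
#define _DEFAULT_SOURCE            /* exposes sbrk() in glibc's <unistd.h> */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/* Toy block header (hypothetical layout, not glibc's malloc_chunk). */
struct block {
    size_t size;                   /* usable bytes after the header */
    struct block *next;            /* next free block; valid only while free */
};

#define ALIGNMENT 16

static size_t align_up(size_t n)
{
    return (n + ALIGNMENT - 1) & ~(size_t)(ALIGNMENT - 1);
}

#define HDR_SIZE align_up(sizeof(struct block))

static struct block *free_list;    /* singly linked list of free blocks */

void *toy_malloc(size_t size)
{
    size = align_up(size ? size : 1);

    /* First fit: take the first free block that is large enough. */
    for (struct block **pp = &free_list; *pp != NULL; pp = &(*pp)->next) {
        struct block *b = *pp;
        if (b->size >= size) {
            *pp = b->next;         /* unlink it from the free list */
            return (char *)b + HDR_SIZE;
        }
    }

    /* Nothing fits: grow the heap with sbrk(). Request ALIGNMENT - 1 extra
     * bytes so the block can start on an aligned address inside the new space. */
    char *old_brk = sbrk((intptr_t)(ALIGNMENT - 1 + HDR_SIZE + size));
    if (old_brk == (char *)-1)
        return NULL;
    struct block *b = (struct block *)align_up((uintptr_t)old_brk);
    b->size = size;
    assert((uintptr_t)b % ALIGNMENT == 0);
    return (char *)b + HDR_SIZE;
}

/* The article's free(): recover the header and push it onto the free list. */
void toy_free(void *p)
{
    if (p == NULL)
        return;
    struct block *b = (struct block *)((char *)p - HDR_SIZE);
    b->next = free_list;
    free_list = b;
}
```

Freeing a 24-byte request and then asking for 16 bytes returns the same block, because first fit takes the first free block that is large enough; a best-fit search would instead scan the whole list for the smallest such block.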

[Free]

free reclaims heap space. Freed blocks are not returned to the kernel immediately; instead, the chunk corresponding to the block is marked free and added to the free list. If a chunk at an adjacent address is already in the free list, the two can be merged. This reduces memory fragmentation, so that later requests for large blocks can still be satisfied.

A simple free in pseudocode, which adds the released block back to the free list:

 

```
free(addr)
    pchunk = addr - sizeof(chunk)
    insert_to_freelist(pchunk)
```
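Merging adjacent free chunks can be illustrated by modeling the free list as address ranges sorted by start address. This is a simplified sketch, not glibc's method: glibc finds neighbours through the prev_size field and the PREV_INUSE bit rather than a sorted array, and struct range, struct free_list and free_range are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_FREE 64

/* A free range [start, start + size). */
struct range { size_t start, size; };

/* Free ranges kept sorted by start address. */
struct free_list { struct range r[MAX_FREE]; size_t n; };

/* Inserts a freed range and merges it with any neighbour it touches. */
void free_range(struct free_list *fl, size_t start, size_t size)
{
    size_t i = 0;
    while (i < fl->n && fl->r[i].start < start)     /* find the sorted position */
        i++;

    if (i > 0 && fl->r[i - 1].start + fl->r[i - 1].size == start) {
        i--;                                        /* merge into the previous range */
        fl->r[i].size += size;
    } else {
        assert(fl->n < MAX_FREE);
        for (size_t j = fl->n; j > i; j--)          /* open a slot at i */
            fl->r[j] = fl->r[j - 1];
        fl->r[i] = (struct range){ start, size };
        fl->n++;
    }

    /* Merge with the next range if this one now ends where it starts. */
    if (i + 1 < fl->n && fl->r[i].start + fl->r[i].size == fl->r[i + 1].start) {
        fl->r[i].size += fl->r[i + 1].size;
        for (size_t j = i + 1; j + 1 < fl->n; j++)  /* close the gap */
            fl->r[j] = fl->r[j + 1];
        fl->n--;
    }
}
```

For example, freeing [0, 16) and [32, 48) leaves two ranges; freeing [16, 32) afterwards closes the gap and all three merge into a single 48-byte range starting at 0.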


2.2 Kernel layer

 

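As described above, when malloc's free chunk list cannot satisfy a request, malloc extends the heap with sys_brk(); this is where execution actually enters kernel space.
sys_brk() involves the following main operations:
1. Move the heap's upper boundary brk in mm_struct up to newbrk, recording the new range in a VMA (merged with the existing heap VMA where possible): vma.start = brk, vma.end = newbrk.
2. Map physical memory for the new virtual range, page by page from vma.start to vma.end:
```
addr = vma.start
do {
    handle_mm_fault(mm, vma, addr, ...)
    addr += PAGE_SIZE
} while (addr < vma.end)
```
In the 2.4 kernel this loop is make_pages_present(), which do_brk() calls only when the area is locked (VM_LOCKED). Normally the pages are mapped lazily instead: the first access to each page triggers a page fault, and the fault handler calls handle_mm_fault() for that address.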
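Whether the pages of a newly extended heap are mapped immediately can be checked with mincore(2), which reports which pages of a range are resident. A minimal sketch, assuming Linux with glibc, a single thread and no memory locking (no mlockall); brk_residency and resident_pages are hypothetical helpers. On a typical system the new pages should report as not resident until the first write faults one in.

```c
#define _DEFAULT_SOURCE            /* exposes sbrk() and mincore() in glibc */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

#define NPAGES 4

/* Counts how many of the NPAGES pages starting at page-aligned addr are resident. */
static int resident_pages(void *addr, long page)
{
    unsigned char vec[NPAGES];
    if (mincore(addr, NPAGES * (size_t)page, vec) != 0)
        return -1;
    int count = 0;
    for (int i = 0; i < NPAGES; i++)
        count += vec[i] & 1;       /* bit 0 set: page is in memory */
    return count;
}

/* Grows the heap by NPAGES fresh pages, records how many are resident before
 * and after one write, then shrinks the heap back. Returns 0 on success. */
int brk_residency(int *before, int *after)
{
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0);
    uintptr_t cur = (uintptr_t)sbrk(0);
    uintptr_t start = (cur + (uintptr_t)page - 1) & ~((uintptr_t)page - 1);
    intptr_t grow = (intptr_t)(start - cur) + (intptr_t)NPAGES * page;

    if (sbrk(grow) == (void *)-1)
        return -1;
    char *p = (char *)start;       /* first page that was not mapped before */
    *before = resident_pages(p, page);
    p[0] = 1;                      /* write fault: the kernel maps at least this page */
    *after = resident_pages(p, page);
    sbrk(-grow);                   /* give the pages back */
    return 0;
}
```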

 

The handle_mm_fault function maps the memory page containing addr to a physical page, that is, it converts a piece of virtual space into physical space:

1. Walk the process's page tables (starting at mm->pgd) to find the PTE for addr, allocating a page-table page if one is missing (pte_alloc);

2. Allocate a physical page with alloc_page();

3. Set the PTE for addr to the start address of the physical page, together with its protection bits (set_pte).
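These steps can be simulated in user space with a toy two-level page table for 32-bit addresses (10 bits of pgd index, 10 bits of pte index, 12 bits of page offset). This is an illustrative sketch only: struct toy_mm, toy_pte_alloc and toy_handle_fault are hypothetical, and frame numbers come from a counter instead of a real page allocator.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define PAGE_SHIFT   12
#define PTRS_PER_PGD 1024
#define PTRS_PER_PTE 1024
#define PTE_PRESENT  0x1u            /* toy "present" flag in bit 0 */

typedef uint32_t toy_pte_t;

/* Toy address space: the pgd is an array of pointers to pte pages. */
struct toy_mm {
    toy_pte_t *pgd[PTRS_PER_PGD];
    uint32_t next_pfn;               /* next free frame number (fake allocator) */
};

/* Like pte_alloc(): return the pte slot for addr, allocating the pte page on demand. */
static toy_pte_t *toy_pte_alloc(struct toy_mm *mm, uint32_t addr)
{
    toy_pte_t **slot = &mm->pgd[addr >> (PAGE_SHIFT + 10)];      /* top 10 bits */
    if (*slot == NULL) {
        *slot = calloc(PTRS_PER_PTE, sizeof(toy_pte_t));          /* empty pte page */
        if (*slot == NULL)
            return NULL;
    }
    return *slot + ((addr >> PAGE_SHIFT) & (PTRS_PER_PTE - 1));  /* middle 10 bits */
}

/* Like handle_mm_fault() for an anonymous page. */
int toy_handle_fault(struct toy_mm *mm, uint32_t addr)
{
    toy_pte_t *pte = toy_pte_alloc(mm, addr);        /* find (or build) the pte */
    if (pte == NULL)
        return -1;
    if (!(*pte & PTE_PRESENT)) {
        uint32_t pfn = mm->next_pfn++;               /* stands in for alloc_page() */
        assert(pfn < (1u << 20));                    /* frame numbers are 20 bits wide */
        *pte = (pfn << PAGE_SHIFT) | PTE_PRESENT;    /* like set_pte() */
    }
    return 0;
}
```

A second fault on an address in the same page finds the present bit already set and allocates no new frame.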


2.3 virtual address and physical address

 

When a process reads a heap address vaddr, the virtual address is mapped to its physical page as follows:

 

1. For the user-space virtual address vaddr, the MMU walks the page tables (pgd, pmd, pte) to the page table entry pte, which records the physical address paddr of the page.
2. With 32-bit addresses and 4 KiB pages, the upper 20 bits of paddr in the page table entry are the physical page number: index = paddr >> PAGE_SHIFT (PAGE_SHIFT = 12). Conversely, shifting index left by 12 bits (appending 12 zero bits) gives the start address of the physical page.
3. Using the physical page number, the kernel finds the descriptor of the physical page, mem_map[index]. For the page structure, see http://blog.csdn.net/ordeder/article/details/41630945.
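The arithmetic of steps 2 and 3 can be written out as follows, assuming 32-bit addresses and 4 KiB pages (PAGE_SHIFT = 12). The struct page stand-in, the 1024-entry mem_map array and the helper names are hypothetical, modeled loosely on the kernel's pte_pfn(), pfn_to_page() and page_to_pfn().

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12
#define PAGE_SIZE  (1u << PAGE_SHIFT)
#define PAGE_MASK  (~(PAGE_SIZE - 1))      /* 0xfffff000 */
#define NR_FRAMES  1024                    /* toy amount of physical memory */

struct page { int count; };                /* hypothetical stand-in for struct page */
static struct page mem_map[NR_FRAMES];     /* one descriptor per physical frame */

/* Frame number = upper 20 bits of the PTE; the low 12 bits hold flags. */
uint32_t pte_to_pfn(uint32_t pte) { return pte >> PAGE_SHIFT; }

/* Physical address = frame start address + offset of vaddr within its page. */
uint32_t translate(uint32_t pte, uint32_t vaddr)
{
    return (pte & PAGE_MASK) | (vaddr & ~PAGE_MASK);
}

/* Frame number -> descriptor, and back again by pointer subtraction. */
struct page *pfn_to_page(uint32_t pfn)
{
    assert(pfn < NR_FRAMES);
    return &mem_map[pfn];
}

uint32_t page_to_pfn(const struct page *pg) { return (uint32_t)(pg - mem_map); }
```

For a PTE value of 0x00345067 the frame number is 0x345, and the virtual address 0x0804a123 translates to the physical address 0x00345123.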

 

3. Conclusion

 

1. malloc and free act like a user-space memory pool, especially free, which keeps freed memory for reuse instead of returning it to the kernel.

2. Heap expansion works by moving brk; vm_area_struct records the address ranges in use in the virtual address space.

3. The mapping between each process's virtual and physical addresses is determined by its page tables, rooted at mm->pgd; they record which physical page number each virtual page number maps to.

References

Linux Kernel Source Code Scenario Analysis (book)

http://blog.csdn.net/kobbee9/article/details/7397010

http://www.open-open.com/lib/view/open1409716051963.html

Appendix

 

```c
#define pgd_offset(mm, address) ((mm)->pgd + pgd_index(address))

int handle_mm_fault(struct mm_struct *mm, struct vm_area_struct *vma,
                    unsigned long address, int write_access)
{
    int ret = -1;
    pgd_t *pgd;
    pmd_t *pmd;

    pgd = pgd_offset(mm, address);
    pmd = pmd_alloc(pgd, address);
    if (pmd) {
        pte_t *pte = pte_alloc(pmd, address);
        if (pte)
            ret = handle_pte_fault(mm, vma, address, write_access, pte);
    }
    return ret;
}

/* With 32-bit addresses the pmd level has no meaning: it is folded into the pgd */
extern inline pmd_t *pmd_alloc(pgd_t *pgd, unsigned long address)
{
    return (pmd_t *) pgd;
}

/* Build (or look up) the pte entry for the page containing address */
extern inline pte_t *pte_alloc(pmd_t *pmd, unsigned long address)
{
    address = (address >> PAGE_SHIFT) & (PTRS_PER_PTE - 1);
    if (pmd_none(*pmd)) {
        pte_t *page = get_pte_fast();
        if (!page)
            return get_pte_slow(pmd, address);
        pmd_set(pmd, page);
        return page + address;
    }
    if (pmd_bad(*pmd)) {
        __bad_pte(pmd);
        return NULL;
    }
    return (pte_t *) __pmd_page(*pmd) + address;
}

/* Allocate the physical page */
static inline int handle_pte_fault(struct mm_struct *mm,
                                   struct vm_area_struct *vma, unsigned long address,
                                   int write_access, pte_t *pte)
{
    pte_t entry;

    entry = *pte;
    if (!pte_present(entry)) {
        ...
        if (pte_none(entry))
            return do_no_page(mm, vma, address, write_access, pte); /* page missing: allocate a physical page */
        ...
    }
    ...
    return 1;
}

static int do_no_page(struct mm_struct *mm, struct vm_area_struct *vma,
                      unsigned long address, int write_access, pte_t *page_table)
{
    struct page *new_page;
    pte_t entry;

    /* anonymous mapping (not backed by a file): map a physical page */
    if (!vma->vm_ops || !vma->vm_ops->nopage)
        return do_anonymous_page(mm, vma, page_table, write_access, address);
    /* the rest handles file-backed mappings */
    ...
}

/*
 * The page pointer can be used to calculate the physical address of the page:
 * physical address = (page pointer - mem_map) * page size + physical memory start address
 */

/* Anonymous mapping: back the address with a page of physical memory */
static int do_anonymous_page(struct mm_struct *mm, struct vm_area_struct *vma,
                             pte_t *page_table, int write_access, unsigned long addr)
{
    struct page *page = NULL;
    pte_t entry = pte_wrprotect(mk_pte(ZERO_PAGE(addr), vma->vm_page_prot));

    if (write_access) {
        page = alloc_page(GFP_HIGHUSER);    /* may allocate from high memory */
        if (!page)
            return -1;
        clear_user_highpage(page, addr);
        entry = pte_mkwrite(pte_mkdirty(mk_pte(page, vma->vm_page_prot)));
        mm->rss++;
        flush_page_to_ram(page);
    }
    set_pte(page_table, entry);             /* *page_table = entry; */
    /* No need to invalidate - it was non-present before */
    update_mmu_cache(vma, addr, entry);
    return 1;                               /* Minor fault */
}

/* Starting physical address of dynamically allocatable memory */
#define __MEMORY_START CONFIG_MEMORY_START

void flush_page_to_ram(struct page *pg)
{
    unsigned long phys;

    /* Physical address of this page */
    phys = (pg - mem_map) * PAGE_SIZE + __MEMORY_START;
    __flush_page_to_ram(phys_to_virt(phys));
}

#define __virt_to_phys(vpage) ((vpage) - PAGE_OFFSET + PHYS_OFFSET)
#define __phys_to_virt(ppage) ((ppage) + PAGE_OFFSET - PHYS_OFFSET)
```
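The physical-address formula in flush_page_to_ram() and the __virt_to_phys/__phys_to_virt macros are simple offset arithmetic. The sketch below uses made-up constants for illustration (RAM starting at physical 0x0C000000 and mapped at kernel virtual 0xC0000000, with PHYS_OFFSET assumed equal to the RAM start); real values depend on the architecture and board, and the helper names are hypothetical.

```c
#include <assert.h>

/* Assumed example constants; real values are per architecture/board. */
#define PAGE_SHIFT   12
#define PAGE_SIZE    (1ul << PAGE_SHIFT)
#define MEMORY_START 0x0C000000ul   /* physical start of allocatable RAM (assumed) */
#define PAGE_OFFSET  0xC0000000ul   /* kernel virtual address of that RAM (assumed) */
#define PHYS_OFFSET  MEMORY_START   /* assumed: RAM start is the linear-map base */
#define NR_PAGES     256

struct page { unsigned long flags; };   /* hypothetical stand-in for struct page */
static struct page mem_map[NR_PAGES];

/* As in flush_page_to_ram(): descriptor index * page size + RAM start. */
unsigned long page_to_phys_addr(const struct page *pg)
{
    assert(pg >= mem_map && pg < mem_map + NR_PAGES);
    return (unsigned long)(pg - mem_map) * PAGE_SIZE + MEMORY_START;
}

/* Linear mapping: kernel virtual and physical addresses differ by a constant. */
unsigned long virt_to_phys_addr(unsigned long v) { return v - PAGE_OFFSET + PHYS_OFFSET; }
unsigned long phys_to_virt_addr(unsigned long p) { return p + PAGE_OFFSET - PHYS_OFFSET; }
```

With these constants, the descriptor mem_map[3] describes the physical page at 0x0C003000, whose kernel virtual address is 0xC0003000.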


 

