Linux malloc analysis: from user space to kernel space

This article describes how malloc is implemented and how it expands the heap, and analyzes how the mapping from virtual addresses to physical addresses is established.

Original article by ordeder. Original link: http://blog.csdn.net/ordeder/article/details/41654509

1 Background Knowledge

1.1 Process user space

Figure 1: Source http://www.open-open.com/lib/view/open1409716051963.html

The user space of a process is managed by the mm_struct that its task_struct points to (task_struct.mm). mm_struct is defined as follows:

    struct mm_struct {
        struct vm_area_struct *mmap;    /* list of VMAs */
        ...
        pgd_t *pgd;                     /* for address mapping */
        atomic_t mm_users;              /* How many users with user space? */
        atomic_t mm_count;              /* How many references to "struct mm_struct" (users count as 1) */
        int map_count;                  /* number of VMAs */
        ...
        /* Describes the segment layout of user space: data segment, code segment, stack segment */
        unsigned long start_code, end_code, start_data, end_data;
        unsigned long start_brk, brk, start_stack;
        unsigned long arg_start, arg_end, env_start, env_end;
        unsigned long rss, total_vm, locked_vm;
        ...
    };

The start_xxx and end_xxx fields in the structure describe the addresses of the segments in the process's user space. For the heap, start_brk is the starting address, and the heap grows upward; brk records the current top of the heap as it expands. The address ranges that the process requests dynamically (the variables in use) must be mapped, and these ranges are recorded in the linked list struct vm_area_struct *mmap.
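
The movement of brk can be observed from user space. The following is a minimal sketch, assuming Linux with glibc, where sbrk(0) returns the current program break (the user-space view of the brk field). The helper name grow_heap_by is hypothetical and exists only for this illustration.

```c
#define _DEFAULT_SOURCE   /* exposes sbrk() in the glibc headers */
#include <unistd.h>

/*
 * Hypothetical helper: raise the program break by `bytes`, report how far
 * it actually moved, then lower it again. Returns -1 if the kernel refuses.
 */
long grow_heap_by(long bytes)
{
    char *old_top = sbrk(0);            /* current top of the heap */
    if (sbrk(bytes) == (void *)-1)      /* ask the kernel to raise brk */
        return -1;
    char *new_top = sbrk(0);            /* top of the heap after growth */
    long moved = (long)(new_top - old_top);
    sbrk(-bytes);                       /* hand the space back */
    return moved;
}
```

Calling grow_heap_by(4096) in a single-threaded program should report that the break moved by exactly one 4 KB page. Real programs should leave the break to malloc; moving it by hand is only useful for observing how the heap grows.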

1.2 Address mapping

Mapping of virtual addresses and physical addresses: http://blog.csdn.net/ordeder/article/details/41630945

2 malloc and free

malloc is the function interface for extending the heap in user space. It is a C library function, part of glibc, that wraps the relevant system call (brk()); it is not itself a system call (the kernel has no sys_malloc()). The operations that malloc involves can therefore be discussed at two levels: user space and kernel space.

2.1 User Layer

The source code of malloc is available at http://repo.or.cz/w/glibc.git/blob/HEAD:/malloc/malloc.c

malloc and free work at the user level and give the user a convenient interface for managing the heap. Their main job is to maintain a linked list of free heap buffers. Each buffer can be described by the following data structure:

    struct malloc_chunk {
        INTERNAL_SIZE_T prev_size;          /* Size of previous chunk (if free). */
        INTERNAL_SIZE_T size;               /* Size in bytes, including overhead. */

        struct malloc_chunk *fd;            /* double links -- used only if free. */
        struct malloc_chunk *bk;

        /* Only used for large blocks: pointer to next larger size. */
        struct malloc_chunk *fd_nextsize;   /* double links -- used only if free. */
        struct malloc_chunk *bk_nextsize;
    };

In a simplified picture of the free buffer list, each list head is the malloc_chunk structure described above, and the memory region that immediately follows it, whose length is given by size, is the data area of that chunk.



"malloc"

Whenever a process calls malloc, it first looks for a sufficiently large chunk among the heap buffers already allocated to the process (the block can be chosen with one of two algorithms: first fit or best fit). If the free chunk list cannot satisfy the request, malloc expands the heap of the process by invoking the brk() system call, builds a new chunk on the newly expanded heap space, and adds it to the free list. This amounts to the process requesting memory from the system wholesale (the size may be much larger than what is actually needed).

The address that malloc returns is the first address in the chunk that is used to store data. In the simplified model used here this is chunk + sizeof(chunk); in glibc itself the data area starts right after the prev_size and size fields, because the fd and bk pointers are used only while a chunk is free.
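
The pointer arithmetic between a chunk header and the address handed to the caller can be sketched as follows. This is an illustration of the simplified model only: struct toy_chunk and both function names are hypothetical and do not reproduce glibc's real layout or macros.

```c
#include <stddef.h>

/* Simplified chunk header for illustration; not glibc's real layout. */
struct toy_chunk {
    size_t prev_size;   /* size of the previous chunk (if free) */
    size_t size;        /* size of this chunk's data area in bytes */
};

/* Address handed to the caller: the first byte after the header. */
void *toy_chunk_to_mem(struct toy_chunk *c)
{
    return (char *)c + sizeof(struct toy_chunk);
}

/* Inverse step, as free needs it: recover the header from a user pointer. */
struct toy_chunk *toy_mem_to_chunk(void *mem)
{
    return (struct toy_chunk *)((char *)mem - sizeof(struct toy_chunk));
}
```

Because the header sits directly in front of the data area, free can find the bookkeeping for a block from nothing more than the pointer that malloc returned.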


Simple pseudocode for a first-fit malloc:

    chunk free_list

    malloc(size)
        foreach (chunk in free_list)
            if (chunk.size >= size)
                return chunk + sizeof(chunk)
        // the free buffers cannot satisfy the request, so request memory wholesale from the system
        oldbrk = brk
        sys_brk(brk + (size + sizeof(chunk)))    // returns the new top of the heap
        newchunk = (chunk) oldbrk                // the new chunk starts at the old top
        newchunk.size = size
        ...
        return newchunk + sizeof(chunk)


"free"

The free operation reclaims heap space, but a reclaimed chunk is not returned to the kernel immediately. Instead, the chunk is marked as idle and added to the free list. If chunks at adjacent addresses are also on the free list, they can be merged, which reduces memory fragmentation and allows later large requests to be satisfied.

Simple pseudocode for free, which adds the freed address range to the free list:

    free(addr)
        pchunk = addr - sizeof(chunk)
        insert_to_freelist(pchunk)
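
The malloc and free pseudocode can be turned into a small runnable sketch. It makes several assumptions that are not part of the original: a static 4 KB arena stands in for the heap, the arena_brk offset plays the role of brk (so no system call is made), and the names toy_malloc and toy_free are hypothetical. Splitting of oversized chunks and merging of neighbours are left out to keep it short.

```c
#include <stddef.h>

#define ARENA_SIZE 4096
#define HDR        16    /* header size, also the alignment of every chunk */

struct chunk {
    size_t size;          /* usable bytes that follow the header */
    struct chunk *next;   /* next free chunk; meaningful only while free */
};
_Static_assert(sizeof(struct chunk) <= HDR, "header must fit in HDR bytes");

/* A static arena stands in for the heap; arena_brk plays the role of brk. */
static union { max_align_t align; unsigned char bytes[ARENA_SIZE]; } arena;
static size_t arena_brk = 0;
static struct chunk *free_list = NULL;

void *toy_malloc(size_t size)
{
    if (size > ARENA_SIZE - HDR)
        return NULL;                                 /* can never fit */
    size = (size + HDR - 1) & ~(size_t)(HDR - 1);    /* round up to 16 bytes */

    /* First fit: take the first free chunk that is large enough. */
    for (struct chunk **link = &free_list; *link; link = &(*link)->next) {
        if ((*link)->size >= size) {
            struct chunk *hit = *link;
            *link = hit->next;                       /* unlink from the free list */
            return (unsigned char *)hit + HDR;
        }
    }

    /* Nothing fits: grow the "heap" (a real malloc would call brk here). */
    if (arena_brk > ARENA_SIZE - HDR - size)
        return NULL;
    struct chunk *c = (struct chunk *)(arena.bytes + arena_brk);
    arena_brk += HDR + size;
    c->size = size;
    c->next = NULL;
    return (unsigned char *)c + HDR;
}

void toy_free(void *mem)
{
    if (!mem)
        return;
    struct chunk *c = (struct chunk *)((unsigned char *)mem - HDR);
    c->next = free_list;    /* mark the chunk idle by pushing it on the free list */
    free_list = c;
}
```

After toy_free(p), a later toy_malloc of the same or a smaller size returns p again without touching arena_brk, which is the user-space reuse described above: freed memory goes back to the allocator's list, not to the kernel.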


2.2 Kernel Layer

As described above, when malloc's free chunk list cannot satisfy a request, the heap is expanded through sys_brk(); only at that point does execution actually enter kernel space.
The main operations involved in sys_brk() are:
1. The upper bound of the heap, brk in mm_struct, is extended to newbrk: that is, a VMA is requested with vma.start = brk and vma.end = newbrk.
2. Physical memory is mapped for this virtual range, page by page from vma.start to vma.end. Note that Linux normally performs this step lazily, on the page fault raised by the first access to each page; the range is populated up front, as in the loop below, only when the pages are locked in memory (VM_LOCKED):

    addr = vma.start
    do {
        handle_mm_fault(mm, vma, addr, ...)
        addr += PAGE_SIZE
    } while (addr < vma.end)

The handle_mm_fault function maps a physical page to the virtual page in which addr resides, establishing the translation from virtual space to physical space:

1. Allocate a physical page through alloc_page;

2. Locate the PTE for addr in the process's PGD mapping;

3. Set the PTE for addr to the start address of the physical page.
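
The three steps can be imitated in user space with a toy two-level page table. This is a sketch under stated assumptions, not kernel code: it assumes a 32-bit address split into 10 + 10 + 12 bits, uses a simple counter in place of alloc_page, and every name in it (struct toy_mm, toy_handle_fault, toy_populate, PTE_PRESENT) is hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>

#define PAGE_SHIFT   12
#define PTRS_PER_PGD 1024
#define PTRS_PER_PTE 1024
#define PTE_PRESENT  0x1u

/* Toy address space: a page directory whose slots point to PTE tables. */
struct toy_mm {
    uint32_t *pgd[PTRS_PER_PGD];   /* NULL until a PTE table is needed */
    uint32_t next_frame;           /* stand-in for alloc_page: next free frame number */
};

/* Map the page containing addr. Returns 1 if a page was mapped,
 * 0 if it was already present, -1 on allocation failure. */
int toy_handle_fault(struct toy_mm *mm, uint32_t addr)
{
    uint32_t pgd_idx = addr >> 22;
    uint32_t pte_idx = (addr >> PAGE_SHIFT) & (PTRS_PER_PTE - 1);

    /* Locate the PTE for addr, creating the PTE table on first use. */
    if (!mm->pgd[pgd_idx]) {
        mm->pgd[pgd_idx] = calloc(PTRS_PER_PTE, sizeof(uint32_t));
        if (!mm->pgd[pgd_idx])
            return -1;
    }
    uint32_t *pte = &mm->pgd[pgd_idx][pte_idx];
    if (*pte & PTE_PRESENT)
        return 0;

    uint32_t frame = mm->next_frame++;             /* "allocate" a physical page */
    *pte = (frame << PAGE_SHIFT) | PTE_PRESENT;    /* record its start address in the PTE */
    return 1;
}

/* Walk a range page by page, like the loop over vma.start..vma.end.
 * Returns the number of pages newly mapped, or -1 on failure. */
int toy_populate(struct toy_mm *mm, uint32_t start, uint32_t end)
{
    int mapped = 0;
    for (uint32_t addr = start; addr < end; addr += 1u << PAGE_SHIFT) {
        int r = toy_handle_fault(mm, addr);
        if (r < 0)
            return -1;
        mapped += r;
    }
    return mapped;
}
```

With a zero-initialized struct toy_mm, toy_populate(&mm, start, end) maps one frame per page in the range, and a second fault on an already mapped address returns 0 without consuming another frame.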


2.3 Virtual address and Physical address

When the process reads a heap address vaddr, the virtual address is translated to a physical page as follows.


1. The MMU walks the page tables (PGD, PMD, PTE) with the user-space virtual address vaddr and finds the corresponding page table entry (PTE), which records the physical address paddr.
2. The upper 20 bits of paddr in the page table entry are the physical page number: index = paddr >> PAGE_SHIFT. Equivalently, index followed by twelve 0 bits is the start address of the physical page.
3. With the physical page number, the kernel's descriptor of the physical page can be found at mem_map[index]. For the page structure, see http://blog.csdn.net/ordeder/article/details/41630945.
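
The bit arithmetic in steps 2 and 3 can be checked with a few lines of C. This sketch assumes 32-bit addresses and 4 KB pages (PAGE_SHIFT of 12); the function names are hypothetical, and the low 12 bits of the PTE are assumed to hold flag bits.

```c
#include <stdint.h>

#define PAGE_SHIFT 12
#define PAGE_SIZE  (1u << PAGE_SHIFT)
#define PAGE_MASK  (~(PAGE_SIZE - 1))

/* Physical page number stored in a 32-bit PTE: its upper 20 bits. */
uint32_t pte_to_pfn(uint32_t pte)
{
    return pte >> PAGE_SHIFT;
}

/* Start address of the physical page: the page number followed by twelve 0 bits. */
uint32_t pfn_to_page_start(uint32_t pfn)
{
    return pfn << PAGE_SHIFT;
}

/* Full translation: page start taken from the PTE, offset taken from vaddr. */
uint32_t translate(uint32_t pte, uint32_t vaddr)
{
    return (pte & PAGE_MASK) | (vaddr & (PAGE_SIZE - 1));
}
```

For a made-up PTE value of 0x12345067, the page number is 0x12345, the page starts at 0x12345000, and the virtual address 0x0804a123 translates to 0x12345123. In the kernel, that page number is the index used for mem_map[index].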

3 Summary

1 malloc and free operate on a memory pool in user space, which is especially evident in the implementation of free.

2 The expansion of the heap is based on the movement of brk. vm_area_struct records the address ranges that are in use in the virtual space.

3 The mapping from virtual addresses to physical addresses for each process is determined by the process's mm.pgd, which records the mapping from virtual page numbers to physical page numbers.

Reference

Kernel Source Scenario Analysis

http://blog.csdn.net/kobbee9/article/details/7397010

http://www.open-open.com/lib/view/open1409716051963.html

Appendix

    #define pgd_offset(mm, address) ((mm)->pgd + pgd_index(address))

    int handle_mm_fault(struct mm_struct *mm, struct vm_area_struct *vma,
                        unsigned long address, int write_access)
    {
        int ret = -1;
        pgd_t *pgd;
        pmd_t *pmd;

        pgd = pgd_offset(mm, address);
        pmd = pmd_alloc(pgd, address);
        if (pmd) {
            /* the PMD level is empty here, so the PTE is reached through the pgd entry for address */
            pte_t *pte = pte_alloc(pmd, address);
            if (pte)
                ret = handle_pte_fault(mm, vma, address, write_access, pte);
        }
        return ret;
    }

    /* With 32-bit addresses the PMD level is meaningless */
    extern inline pmd_t *pmd_alloc(pgd_t *pgd, unsigned long address)
    {
        return (pmd_t *) pgd;
    }

    /* Build the PTE index entry for the page that contains address */
    extern inline pte_t *pte_alloc(pmd_t *pmd, unsigned long address)
    {
        address = (address >> PAGE_SHIFT) & (PTRS_PER_PTE - 1);

        if (pmd_none(*pmd)) {
            pte_t *page = get_pte_fast();

            if (!page)
                return get_pte_slow(pmd, address);
            pmd_set(pmd, page);
            return page + address;
        }
        if (pmd_bad(*pmd)) {
            __bad_pte(pmd);
            return NULL;
        }
        return (pte_t *) __pmd_page(*pmd) + address;
    }

    /* Assign a physical page to the page that contains address */
    static inline int handle_pte_fault(struct mm_struct *mm,
            struct vm_area_struct *vma, unsigned long address,
            int write_access, pte_t *pte)
    {
        pte_t entry;

        entry = *pte;
        if (!pte_present(entry)) {
            ...
            if (pte_none(entry))
                /* page missing: allocate a physical page */
                return do_no_page(mm, vma, address, write_access, pte);
            ...
        }
        ...
        return 1;
    }

    static int do_no_page(struct mm_struct *mm, struct vm_area_struct *vma,
            unsigned long address, int write_access, pte_t *page_table)
    {
        struct page *new_page;
        pte_t entry;

        /* anonymous physical mapping (for virtual storage space) */
        if (!vma->vm_ops || !vma->vm_ops->nopage)
            return do_anonymous_page(mm, vma, page_table, write_access, address);
        /* what follows handles file-backed pages, not shown */
        ...
    }

    /* With the page pointer, the physical address of the page can be computed:
       physical address = (page pointer - mem_map) * page size + start address of physical memory */

    /* Anonymous mapping from virtual storage to physical memory */
    static int do_anonymous_page(struct mm_struct *mm, struct vm_area_struct *vma,
            pte_t *page_table, int write_access, unsigned long addr)
    {
        struct page *page = NULL;
        pte_t entry = pte_wrprotect(mk_pte(ZERO_PAGE(addr), vma->vm_page_prot));

        if (write_access) {
            page = alloc_page(GFP_HIGHUSER);    /* allocate from high memory */
            if (!page)
                return -1;
            clear_user_highpage(page, addr);
            entry = pte_mkwrite(pte_mkdirty(mk_pte(page, vma->vm_page_prot)));
            mm->rss++;
            flush_page_to_ram(page);
        }
        set_pte(page_table, entry);     /* i.e. *page_table = entry; */
        /* No need to invalidate - it was non-present before */
        update_mmu_cache(vma, addr, entry);
        return 1;       /* Minor fault */
    }

    /* start address of the physical memory available for dynamic allocation */
    #define __MEMORY_START CONFIG_MEMORY_START

    void flush_page_to_ram(struct page *pg)
    {
        unsigned long phys;

        /* Physical address of this page */
        phys = (pg - mem_map) * PAGE_SIZE + __MEMORY_START;
        __flush_page_to_ram(phys_to_virt(phys));
    }

    #define __virt_to_phys(vpage) ((vpage) - PAGE_OFFSET + PHYS_OFFSET)
    #define __phys_to_virt(ppage) ((ppage) + PAGE_OFFSET - PHYS_OFFSET)


