Linux memory management: kmalloc and __get_free_page()

Source: Internet
Author: User

To allocate memory dynamically in a device driver, use kmalloc rather than malloc, or request whole pages directly with __get_free_pages. Release the memory with kfree or free_pages, respectively.

Linux provides a sophisticated memory management system for processors that have an MMU (memory management unit, the hardware that assists the operating system with memory management, for example by translating virtual addresses to physical addresses). On a 32-bit system, the memory a process can address reaches 4 GB.

The 4 GB address space of a process is divided into two parts: user space and kernel space. User space occupies the addresses from 0 to 3 GB (PAGE_OFFSET, which equals 0xC0000000 on x86), and kernel space occupies 3 GB to 4 GB.

Within kernel space, the range from 3 GB to VMALLOC_START is the physical memory mapping area (it contains the kernel image, the physical page frame table mem_map, and so on). For example, if the VMware virtual machine we use has 160 MB of memory, then 3 GB to 3 GB + 160 MB is mapped to physical memory. The vmalloc area follows the physical memory mapping area. On that 160 MB system, VMALLOC_START lies at about 3 GB + 160 MB plus an 8 MB gap, which is left between the physical memory mapping area and VMALLOC_START to catch out-of-bounds accesses. VMALLOC_END is close to 4 GB (the system reserves a small region at the very top for dedicated page mappings).
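The layout arithmetic above can be checked with a small user-space sketch. It is not kernel code: the helper names are hypothetical, and the vmalloc start is simplified to "end of the mapping area plus the 8 MB gap", whereas the kernel's own macro also applies alignment.

```c
#include <assert.h>
#include <stdint.h>

/* Values taken from the text: 32-bit x86 with a 3 GB / 1 GB split. */
#define LAYOUT_PAGE_OFFSET 0xC0000000u          /* start of kernel space */
#define LAYOUT_VMALLOC_GAP (8u * 1024u * 1024u) /* 8 MB guard gap */

/* Hypothetical helper: end of the physical memory mapping area for a
 * machine whose ram_mb megabytes of RAM are all directly mapped. */
static uint32_t layout_direct_map_end(uint32_t ram_mb)
{
    /* Keep the 32-bit sums below 4 GB so they cannot wrap around. */
    assert(ram_mb <= 1000u);
    return LAYOUT_PAGE_OFFSET + ram_mb * 1024u * 1024u;
}

/* Hypothetical helper: approximate start of the vmalloc area
 * (simplified: no alignment is applied). */
static uint32_t layout_vmalloc_start(uint32_t ram_mb)
{
    return layout_direct_map_end(ram_mb) + LAYOUT_VMALLOC_GAP;
}
```

For the 160 MB example, layout_direct_map_end(160) is 0xCA000000 (3 GB + 160 MB) and layout_vmalloc_start(160) is 0xCA800000 (8 MB higher).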

Memory returned by kmalloc and __get_free_page lies in the physical memory mapping area and is physically contiguous. Its virtual addresses differ from the real physical addresses only by a fixed offset, so the conversion is simple. virt_to_phys() converts a kernel virtual address to a physical address:

#define __pa(x) ((unsigned long)(x) - PAGE_OFFSET)

extern inline unsigned long virt_to_phys(volatile void *address)
{
        return __pa(address);
}

This conversion simply subtracts 3 GB (PAGE_OFFSET = 0xC0000000) from the virtual address.

The inverse function is phys_to_virt(), which converts a physical address to a kernel virtual address:

#define __va(x) ((void *)((unsigned long)(x) + PAGE_OFFSET))

extern inline void *phys_to_virt(unsigned long address)
{
        return __va(address);
}

Both virt_to_phys() and phys_to_virt() are defined in include/asm-i386/io.h.
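The two conversions can be tried outside the kernel with plain 32-bit integers. This is only a simulation of the arithmetic: the sim_ names are hypothetical, addresses are modeled as uint32_t values rather than pointers, and the range checks are an addition for the sketch, not something the kernel macros perform.

```c
#include <assert.h>
#include <stdint.h>

#define SIM_PAGE_OFFSET 0xC0000000u /* 3 GB, as on 32-bit x86 */

/* Simulated virt_to_phys(): subtract the fixed offset. */
static uint32_t sim_virt_to_phys(uint32_t vaddr)
{
    /* Only kernel-space addresses make sense here. */
    assert(vaddr >= SIM_PAGE_OFFSET);
    return vaddr - SIM_PAGE_OFFSET;
}

/* Simulated phys_to_virt(): add the fixed offset. */
static uint32_t sim_phys_to_virt(uint32_t paddr)
{
    /* At most 1 GB fits between 3 GB and 4 GB. */
    assert(paddr < 0x40000000u);
    return paddr + SIM_PAGE_OFFSET;
}
```

For instance, the kernel virtual address 0xC0100000 corresponds to the physical address 0x00100000 (1 MB), and converting back yields the original virtual address. No page table is consulted at any point.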

 

1. kmalloc() allocates physically contiguous memory and is used for small allocations.

2. __get_free_page() allocates physically contiguous memory in whole pages.

The notes below explain why these functions allocate physically contiguous memory, and whether the address they return is physical or virtual.

kmalloc() is implemented on top of the slab allocator, an efficient mechanism for allocating small pieces of memory. The slab allocator is not independent, however: it carves finer-grained memory for its callers out of pages obtained from the page allocator. In other words, the system first uses the page allocator to allocate physically contiguous memory in units of pages, and kmalloc() then subdivides that memory according to the caller's needs.
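The idea of carving small objects out of a page can be illustrated with a toy user-space allocator. This is a deliberately reduced sketch and not how the kernel's slab code is written: the toy_ names are hypothetical, a static array stands in for a page obtained from the page allocator, and there is no free path, no caching, and no per-CPU handling.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_PAGE_SIZE 4096u

/* One toy "slab": a single page cut into equal-sized objects. */
struct toy_slab {
    unsigned char page[TOY_PAGE_SIZE]; /* stands in for one physical page */
    size_t obj_size;                   /* size of each object in bytes */
    size_t next_free;                  /* index of the next unused object */
};

static void toy_slab_init(struct toy_slab *s, size_t obj_size)
{
    assert(obj_size > 0 && obj_size <= TOY_PAGE_SIZE);
    s->obj_size = obj_size;
    s->next_free = 0;
}

/* Hand out the next object, or NULL once the page is used up. */
static void *toy_slab_alloc(struct toy_slab *s)
{
    size_t capacity = TOY_PAGE_SIZE / s->obj_size;

    if (s->next_free >= capacity)
        return NULL;
    return s->page + (s->next_free++) * s->obj_size;
}
```

With 128-byte objects the page holds 32 of them, and consecutive allocations are exactly 128 bytes apart inside the same page. Because every object lives inside a page that is itself contiguous, each object is contiguous as well.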

To confirm this, look at the implementation of kmalloc(). The work is done in __do_kmalloc(), which ends by calling __cache_alloc() to allocate from a slab. kmem_cache_alloc() and similar functions call the same function. Following the call path of __cache_alloc(), we find that cache_grow() uses kmem_getpages() to allocate physical pages. alloc_pages_node(), called inside kmem_getpages(), eventually uses __alloc_pages() to return a struct page, the structure the system uses to describe a physical page. As stated above, a slab is built on physical pages, so the memory that kmalloc() hands out is physical memory.

__get_free_page() is the low-level allocation function that the page allocator offers to callers, and it allocates physically contiguous memory. It is built on the buddy system, in which the smallest unit of allocation is a page. Looking at the implementation, __get_free_page() is only a thin wrapper: all it does is call __alloc_pages() unconditionally to allocate physical memory. As noted in the discussion of kmalloc() above, the slab allocator also obtains its physical pages through __alloc_pages(). So from which region does this function take physical pages?

To answer that, we have to read the implementation. __alloc_pages() calls get_page_from_freelist() several times, trying to take a zone from the zonelist and return an available struct page from it (the call branches differ depending on the flags). From this we know that a physical page is handed out from a zone in the zonelist (an array of zone structures). How, then, are the zonelist and its zones tied to physical pages, and how are they initialized?

Look next at free_area_init_nodes(), which is called indirectly by zone_sizes_init() during system initialization. zone_sizes_init() fills in three zones, including ZONE_DMA and ZONE_NORMAL, and passes them to free_area_init_nodes() as a parameter. That function sets up a pglist_data structure, which contains the zonelist/zone structures and the struct page array describing physical pages, and at the end passes it to free_area_init_node(). There, calculate_node_totalpages() fills in the relevant fields of pglist_data, alloc_node_mem_map() initializes the struct page entries in pglist_data, and finally free_area_init_core() associates pglist_data with the zonelist.

This analysis clarifies how __get_free_page() allocates physical memory. It also raises new questions: how are the physical pages it allocates mapped, and where? For that we have to look at the boot code related to virtual memory management (VMM).
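Because the buddy system works in whole pages, and multi-page requests are made in power-of-two blocks identified by an "order", a byte count has to be rounded up before it reaches the page allocator. The sketch below shows that rounding in user space. It assumes a 4 KB page size, and sketch_order_for_size is a hypothetical name used for illustration, not the kernel's own helper.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096u /* assumption: 4 KB pages, as on x86 */

/* Smallest order such that (1 << order) pages hold `size` bytes. */
static unsigned int sketch_order_for_size(size_t size)
{
    size_t pages;
    unsigned int order = 0;

    assert(size > 0);
    /* Round the byte count up to whole pages. */
    pages = (size + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE;
    /* Then round the page count up to a power of two. */
    while (((size_t)1 << order) < pages)
        order++;
    return order;
}
```

A request of up to 4096 bytes needs order 0 (one page), which is the case __get_free_page() covers. A request of 4097 bytes already needs order 1 (two pages), and five pages' worth of data needs order 3 (eight pages).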

Before looking at the VMM-related boot code, consider virt_to_phys() and phys_to_virt() once more. As their names imply, they convert a virtual address to a physical address and a physical address to a virtual address. The implementations are very simple: the former calls __pa(address) to convert a virtual address to a physical one, and the latter calls __va(address) to convert a physical address to a virtual one. Here are the two macros, __pa and __va:

#define __pa(x) ((unsigned long)(x) - PAGE_OFFSET)

#define __va(x) ((void *)((unsigned long)(x) + PAGE_OFFSET))

We can see that the conversion only adds or subtracts PAGE_OFFSET, which is defined as 0xC0000000 on x86. This raises another question. Anyone who has written a Linux driver knows that after allocating memory with kmalloc() or __get_free_page(), you must call virt_to_phys() to obtain the correct physical address. Why is this step necessary? Didn't we just allocate physical memory? Why is a conversion still needed after the allocation? And if a virtual address is returned, why, according to the analysis of virt_to_phys() above, can the translation be done with PAGE_OFFSET alone? Doesn't converting between virtual and physical addresses require a page table lookup? With these questions in mind, let's look at the VMM-related boot code.

Start from start_kernel(), the kernel's boot entry point, and look for the VMM-related parts. The first function of interest is setup_arch(). It calls paging_init() to initialize and map the hardware page tables (8 MB of memory has already been mapped before this point, which is not covered here). paging_init() in turn calls pagetable_init() to map the kernel's physical memory and initialize the related structures. pagetable_init() first performs the PAE/PSE/PGE-related checks and settings, and then calls kernel_physical_mapping_init() to map the kernel's physical memory. In that function we can clearly see that pgd_idx is computed from PAGE_OFFSET as the starting address. In other words, the loop that maps all physical addresses starts at PAGE_OFFSET. Reading on, we see that after each PMD is initialized, every address calculation is likewise an increment on top of PAGE_OFFSET. The analysis makes it clear that physical memory is mapped into the virtual address space starting at PAGE_OFFSET.

This answers all of the questions above. The physical pages allocated by kmalloc() and __get_free_page() are mapped to virtual addresses starting at PAGE_OFFSET, so there is a one-to-one relationship between physical addresses and virtual addresses. Because of this mapping, allocating a virtual address above PAGE_OFFSET in the kernel is the same as allocating a physical address. This holds only within a certain range, namely between PAGE_OFFSET and VMALLOC_START, where the latter is the starting address of the memory handed out by vmalloc().

It also explains why virt_to_phys() and phys_to_virt() can convert between virtual and physical addresses simply by subtracting or adding PAGE_OFFSET: the mapping is fixed, so no page table lookup is needed. And it answers the initial question: kmalloc() and __get_free_page() allocate physical memory but return a virtual address. Because of the mapping, you subtract PAGE_OFFSET from the returned address to obtain the real physical address.
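The linear mapping that the boot code sets up can be mimicked with integer arithmetic. The following is a user-space sketch with hypothetical map_ names. It assumes 4 KB pages and the non-PAE layout in which each page directory entry covers 4 MB (a shift of 22 bits); the actual boot code handles more cases than this.

```c
#include <assert.h>
#include <stdint.h>

#define MAP_PAGE_OFFSET 0xC0000000u /* 3 GB */
#define MAP_PAGE_SHIFT  12u         /* assumption: 4 KB pages */
#define MAP_PGDIR_SHIFT 22u         /* assumption: non-PAE, 4 MB per entry */

/* Index of the page directory entry that covers a virtual address. */
static uint32_t map_pgd_index(uint32_t vaddr)
{
    return vaddr >> MAP_PGDIR_SHIFT;
}

/* Virtual address at which physical page frame `pfn` is mapped. */
static uint32_t map_vaddr_of_pfn(uint32_t pfn)
{
    /* At most 1 GB of physical memory fits above PAGE_OFFSET. */
    assert(pfn < (0x40000000u >> MAP_PAGE_SHIFT));
    return MAP_PAGE_OFFSET + (pfn << MAP_PAGE_SHIFT);
}
```

Under these assumptions the mapping starts at page directory index 768, since 0xC0000000 >> 22 = 768. Physical page frame 0 appears at 0xC0000000, frame 1 at 0xC0001000, and frame 1024 (physical 4 MB) at 0xC0400000, which falls under directory index 769.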

 
