Memory under Linux

The MMU consists of one chip or a group of chips; its function is to map logical addresses to physical addresses, i.e., to perform address translation (on modern processors the MMU is part of the CPU).

Machine instructions still specify the address of an operand or of an instruction as a logical address.

Each logical address consists of a segment selector (16 bits) and a relative offset (32 bits) within the segment.

The only purpose of a segmentation register is to hold a segment selector.

The MMU consists of two parts: a segmentation unit and a paging unit. The segmentation unit translates a logical address into a linear address, and the paging unit translates the linear address into a physical address.
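
As a rough illustration (a minimal toy sketch, not kernel code; the struct and function names are made up for the example), the segmentation stage can be modeled like this:

```c
#include <stdint.h>

/* Toy model of the MMU's segmentation stage: a logical address
 * (selector:offset) becomes a linear address by adding the base of
 * the segment that the selector refers to. All names are invented. */
struct segment_descriptor {
    uint32_t base;   /* linear address where the segment starts */
    uint32_t limit;  /* segment size, used for bounds checking */
};

static uint32_t segmentation_unit(const struct segment_descriptor *desc,
                                  uint32_t offset)
{
    /* A real MMU also verifies offset <= limit and access rights. */
    return desc->base + offset;
}
```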

Reads and writes on a RAM chip must be performed serially, so a hardware circuit called a memory arbiter is inserted between the bus and each RAM chip.

There is only one Global Descriptor Table (GDT) in the whole system. It can be stored anywhere in memory, but the CPU must know where it starts (its address is kept in the gdtr register). The GDT stores segment descriptors as well as other kinds of descriptors; each entry is 64 bits long, and the table is globally visible to every task.

The first entry of the GDT is always set to 0. This ensures that a logical address with a null segment selector is considered invalid and therefore raises a processor exception.

Segmentation can assign a different linear address space to each process, while paging can map the same linear address space onto different physical address spaces.

All processes running in User Mode use the same pair of segments to address instructions and data; these two segments are called the user code segment and the user data segment.

All these segments start at 0x00000000. An important consequence is that under Linux a logical address coincides with the linear address: the offset field of a logical address always equals the value of the corresponding linear address.

When saving a pointer to an instruction or to a data structure, the kernel does not need to store a segment selector along with it, because the cs register already holds the current segment selector.

Each processor has a Task State Segment (TSS). The corresponding linear address space is a small subset of the linear address space of the kernel data segment.

Only a few entries of the GDT may depend on the process the CPU is running (the LDT and TLS segment descriptors).

A key task of the paging unit is to check the requested access type against the access rights of the linear address; if the memory access is not valid, it generates a page fault.

Linear addresses are grouped into fixed-length intervals called pages; contiguous linear addresses within a page are mapped onto contiguous physical addresses.

The paging unit divides all of RAM into fixed-length page frames.

The data structures that map linear addresses to physical addresses are called page tables.

Page tables are stored in main memory and must be properly initialized by the kernel before the paging unit is enabled.
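
A sketch of how the classic two-level x86 scheme decodes a 32-bit linear address (the shift and mask values are the standard x86 ones; the helper names are illustrative):

```c
#include <stdint.h>

/* 32-bit x86 two-level paging: the top 10 bits of a linear address
 * index the Page Directory, the middle 10 bits index a Page Table,
 * and the low 12 bits are the offset inside the 4 KB page. */
#define PAGE_SHIFT   12
#define PTRS_PER_PTE 1024

static inline uint32_t pgd_index(uint32_t lin)
{
    return lin >> 22;                                /* bits 31-22 */
}

static inline uint32_t pte_index(uint32_t lin)
{
    return (lin >> PAGE_SHIFT) & (PTRS_PER_PTE - 1); /* bits 21-12 */
}

static inline uint32_t page_offset(uint32_t lin)
{
    return lin & ((1u << PAGE_SHIFT) - 1);           /* bits 11-0 */
}
```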

The two-level scheme reduces memory usage by requiring page tables only for those virtual memory regions that a process actually uses.

Extended paging is used to translate large ranges of contiguous linear addresses into the corresponding physical addresses; in these cases the kernel does not use intermediate page tables, thereby saving memory and preserving TLB entries.

Unlike the three access rights of a segment (read, write, and execute), a page has only two (read and write).

Because the linear address space of user processes must be preserved, the kernel cannot directly address more than 1 GB of RAM.

Only the kernel can modify the page tables of a process; therefore, a process running in User Mode cannot use physical addresses.

The number of paging levels used depends on the CPU type.

Hardware caches are based on the well-known locality principle, which applies both to program code and to data structures.

The hardware cache is inserted between the paging unit and main memory.

It includes a cache memory and a cache controller. The cache memory stores the actual lines of memory; the cache controller stores an array of entries, one for each line of the cache memory.

The execution speed of the system is usually limited by the rate at which the CPU can fetch instructions and data from memory.

On a cache hit, the cache controller behaves differently depending on the access type.

For write accesses, there are two basic policies: (1) write-back and (2) write-through.

Write-back updates only the cache line, without immediately changing the contents of RAM, and is therefore faster.

Only when the CPU executes an instruction requiring cache entries to be flushed, or when a FLUSH hardware signal occurs (usually after a cache miss), does the cache controller write the cache line back into RAM.

Each processor in a multiprocessor system has a separate hardware cache, so additional hardware circuitry is required to keep the cache contents synchronized.

The Translation Lookaside Buffer (TLB) is a cache used to speed up linear address translation. The first time a linear address is used, the corresponding physical address is computed by walking the page tables in slow RAM; the physical address is then stored in a TLB entry so that further references to the same linear address can be translated quickly.

In a multiprocessor system, each CPU has its own TLB.

Linux's process handling relies heavily on paging.

During the initialization phase, the kernel must build a physical address map specifying which physical address ranges are usable by the kernel and which are unavailable (either because they map hardware device I/O shared memory or because the corresponding page frames contain BIOS data).

The macro PAGE_OFFSET marks the boundary between user space and kernel space in a process's linear address space. The macro PAGE_SHIFT specifies the page size as a power of two.
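
The kernel's __pa() and __va() macros convert between linear and physical addresses in the directly mapped region; a simplified sketch, assuming the default 32-bit x86 split with PAGE_OFFSET = 0xC0000000:

```c
/* Simplified forms of the kernel's __pa()/__va() macros. They are
 * valid only for addresses inside the direct mapping of physical
 * memory that begins at PAGE_OFFSET (not for vmalloc or highmem). */
#define PAGE_OFFSET 0xC0000000UL

/* linear (kernel virtual) -> physical */
#define __pa(vaddr) ((unsigned long)(vaddr) - PAGE_OFFSET)

/* physical -> linear (kernel virtual) */
#define __va(paddr) ((void *)((unsigned long)(paddr) + PAGE_OFFSET))
```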

The kernel maintains a set of page tables for its own use, rooted at the so-called master kernel Page Global Directory. After system initialization, this set of page tables is never used directly by any process or kernel thread.

When the kernel image has just been loaded into memory, the CPU is still running in real mode, so paging is not yet enabled.

The provisional Page Global Directory is stored in the swapper_pg_dir variable.

The final mapping provided by the kernel page tables must translate linear addresses starting at PAGE_OFFSET into physical addresses starting at 0.

The master kernel Page Global Directory is still stored in the swapper_pg_dir variable; it is initialized by the paging_init() function.

The initial part of the fourth gigabyte of kernel linear addresses maps the system's physical memory. However, at least 128 MB of linear addresses are always reserved, because the kernel uses them to implement noncontiguous memory allocation and fix-mapped linear addresses.

Dereferencing a pointer variable costs one more memory access than dereferencing an immediate constant address; this is the motivation for fix-mapped linear addresses, which can be used like constants.


Some portions of RAM are permanently assigned to the kernel and used to store kernel code and static kernel data structures.

The rest of RAM is called dynamic memory; it is a valuable resource not only for processes but also for the kernel itself.

The kernel must keep track of the current state of each page frame.

A page frame is not free when it holds, for instance, data of a User Mode process, data cached by software, a dynamically allocated kernel data structure, buffered data of a device driver, or code of a kernel module.

The state information of a page frame is kept in a page descriptor of type struct page; all page descriptors are stored in the mem_map array.

In the NUMA model, the time a given CPU needs to access different memory units may vary, so the system's physical memory is partitioned into several nodes; within a single node, the time required by any given CPU to access any page is the same. Each node has a descriptor of type pg_data_t.

Each page descriptor has links to its memory node and to the zone inside that node that contains the corresponding page frame.

When the kernel invokes a memory allocation function, it must indicate the zones from which the requested page frames may be taken.

Atomic requests (GFP_ATOMIC) never block: if there is not enough free memory, the allocation simply fails.

The amount of reserved memory, in kilobytes, is stored in the min_free_kbytes variable; its initial value is set during kernel initialization.

The gfp_mask parameter is a set of flags that specify how to look for free page frames.
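
A hedged sketch of how these allocator entry points are used from kernel code; alloc_pages(), __get_free_page(), and the GFP flags are real kernel interfaces, while example_page_alloc() is an illustrative wrapper:

```c
#include <linux/gfp.h>
#include <linux/mm.h>

/* Sketch: requesting page frames from the zoned page frame allocator.
 * gfp_mask tells the allocator how it may behave and which zones it
 * may draw from; order is log2 of the number of contiguous frames. */
static void example_page_alloc(void)
{
    struct page *pages;
    unsigned long addr;

    /* 2^2 = 4 contiguous page frames; GFP_KERNEL may block. */
    pages = alloc_pages(GFP_KERNEL, 2);
    if (pages)
        __free_pages(pages, 2);   /* release with the same order */

    /* GFP_ATOMIC never blocks, so it is usable in interrupt context;
     * the price is that the request fails if memory is scarce. */
    addr = __get_free_page(GFP_ATOMIC);
    if (addr)
        free_page(addr);
}
```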

The linear address where the direct mapping of physical memory ends, i.e., where high memory begins, is stored in the high_memory variable.

Allocator functions that return the linear address of the assigned page frames do not work for high memory, that is, for page frames in the ZONE_HIGHMEM zone.

Page frames in high memory can be allocated only through the alloc_pages() function and its shortcut alloc_page().

These functions do not return the linear address of the first allocated page frame, because if the page frame belongs to high memory, such a linear address may not exist at all. Instead, they return the linear address of the page descriptor of the first allocated page frame.

Page descriptors, once allocated in low memory during the kernel initialization phase, never change.

The kernel can use three different mechanisms to map page frames into high memory, called permanent kernel mappings, temporary kernel mappings, and noncontiguous memory allocation.

Establishing a permanent kernel mapping may block the current process; this happens when no free page table entry exists. Therefore, permanent kernel mappings cannot be used in interrupt handlers or in deferrable functions.

Establishing a temporary kernel mapping never blocks the current process; its disadvantage is that only a very small number of temporary kernel mappings can exist at the same time.

Permanent kernel mappings use a dedicated page table within the kernel page tables, whose address is stored in the pkmap_page_table variable. The number of entries in this page table is given by the LAST_PKMAP macro, and the linear addresses it maps start at PKMAP_BASE.

To record the association between high-memory page frames and the linear addresses of their permanent kernel mappings, the kernel uses the page_address_htable hash table.

The page_address() function returns the linear address associated with a page frame; if the page frame is in high memory and not mapped, it returns NULL.

The kmap() function establishes a permanent kernel mapping; if the page frame really belongs to high memory, it calls kmap_high().
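
A hedged usage sketch of the permanent kernel mapping API; kmap(), kunmap(), alloc_page(), and GFP_HIGHUSER are real kernel interfaces, while example_kmap() is illustrative:

```c
#include <linux/highmem.h>
#include <linux/gfp.h>
#include <linux/string.h>

/* Sketch: permanently mapping a high-memory page frame. kmap() may
 * block waiting for a free entry in pkmap_page_table, so it must not
 * be called from interrupt handlers or deferrable functions. */
static void example_kmap(void)
{
    struct page *page = alloc_page(GFP_HIGHUSER); /* may be in highmem */
    void *vaddr;

    if (!page)
        return;

    vaddr = kmap(page);          /* calls kmap_high() for highmem pages */
    memset(vaddr, 0, PAGE_SIZE); /* the frame is now addressable */
    kunmap(page);                /* release the pkmap entry */

    __free_page(page);
}
```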

Any page frame in high memory can be mapped into the kernel address space through a "window", i.e., a reserved page table entry. The number of windows left for temporary kernel mappings is quite small.

The kernel must ensure that the same window is never used by two different kernel control paths at the same time.

Therefore, each symbol in the km_type enumeration can be used only by one kernel component and is named after that component. The last symbol, KM_TYPE_NR, does not designate a linear address by itself; it yields the number of different windows available to each CPU.

To establish a temporary kernel mapping, the kernel calls the kmap_atomic() function.
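
A short hedged sketch of temporary kernel mappings; note that in the 2.6-era API described here, kmap_atomic() also took a km_type slot argument (e.g. KM_USER0), while later kernels dropped it:

```c
#include <linux/highmem.h>
#include <linux/string.h>

/* Sketch: temporary kernel mapping. kmap_atomic() never blocks, so
 * it is safe in interrupt handlers, but the mapping must be released
 * quickly so that the window can be reused. */
static void zero_highmem_page(struct page *page)
{
    char *vaddr = kmap_atomic(page); /* older API: kmap_atomic(page, KM_USER0) */

    memset(vaddr, 0, PAGE_SIZE);
    kunmap_atomic(vaddr);            /* older API: kunmap_atomic(vaddr, KM_USER0) */
}
```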

Linux uses the well-known buddy system algorithm to limit external fragmentation. All free page frames are grouped into 11 lists of blocks that contain 1, 2, 4, 8, 16, 32, 64, 128, 256, 512, and 1024 contiguous page frames, respectively. The largest request, 1024 page frames, corresponds to a contiguous 4 MB block of RAM.

The LRU lists consist of the active list and the inactive list, which together span all the pages belonging to User Mode process address spaces or to the page cache; the former holds the pages that have been accessed recently, while the latter holds the pages that have not been accessed for some time.

The main data structures used by each buddy system are the mem_map array and the free_area array. The k-th element of free_area identifies all the free blocks of size 2^k: its free_list field is the head of a list that links together the page descriptors of the first page frame of every free block of that size, while the pointers to adjacent list elements are stored in the lru field of the page descriptors. Finally, the private field of the first page descriptor of a free block of size 2^k stores the order of the block, that is, the number k.
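
The index arithmetic behind splitting and coalescing blocks is simple; a sketch of the idea (the helper names are illustrative, while the XOR/AND trick mirrors the logic in mm/page_alloc.c):

```c
/* Sketch of the buddy system's index arithmetic: a free block of
 * order k starting at page frame index idx has its buddy at
 * idx ^ (1 << k), and the coalesced block of order k + 1 starts at
 * idx & ~(1 << k). */
static unsigned long buddy_index(unsigned long page_idx, unsigned int order)
{
    return page_idx ^ (1UL << order);
}

static unsigned long combined_index(unsigned long page_idx, unsigned int order)
{
    return page_idx & ~(1UL << order);
}
```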

The __rmqueue() function is used to find a free block in a zone.

The function takes two parameters: the address of the zone descriptor and the order of the request.

__rmqueue() assumes that the caller has already disabled local interrupts and acquired the zone->lock spin lock, which protects the buddy system data structures.

The __free_pcppages_bulk() function releases page frames back to the buddy system according to its coalescing strategy.

To improve system performance, each zone defines a per-CPU page frame cache. Each per-CPU cache contains some pre-allocated page frames to be used to satisfy single page frame requests issued by the local CPU.

The main data structure implementing the per-CPU page frame cache is an array of per_cpu_pageset data structures stored in the pageset field of the zone descriptor.

The buffered_rmqueue() function allocates page frames in a specified zone; for single page frame requests, it uses the per-CPU page frame cache.

To release a single page frame to the per-CPU page frame cache, the kernel uses the free_hot_page() and free_cold_page() functions.

The zone allocator is the front end of the kernel page frame allocator; it must locate a zone containing enough free page frames to satisfy the request.

The buddy system algorithm uses the page frame as its basic memory unit, which is appropriate for requests of large blocks of memory.

Kernel functions, however, tend to repeatedly request memory areas of the same type and of much smaller size.

The slab allocator groups objects into caches; each cache is a "reserve" of objects of the same type.

The memory area containing a cache is divided into slabs; each slab consists of one or more contiguous page frames that contain both allocated objects and free objects.

Each cache is described by a data structure of type kmem_cache.

Caches are divided into two types, general and specific: general caches are used by the slab allocator for its own purposes (for instance, the kmem_cache cache itself), while specific caches are used by the remaining parts of the kernel.

The general caches are set up during system initialization by calling kmem_cache_init() and kmem_cache_sizes_init().

Specific caches are created by the kmem_cache_create() function.
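
A hedged usage sketch of the specific-cache API; kmem_cache_create(), kmem_cache_alloc(), kmem_cache_free(), and kmem_cache_destroy() are real kernel interfaces, while struct my_object and the surrounding code are invented for the example:

```c
#include <linux/slab.h>
#include <linux/errno.h>

/* Sketch: creating and using a specific slab cache. */
struct my_object {
    int  id;
    char name[32];
};

static struct kmem_cache *my_cache;

static int example_slab_usage(void)
{
    struct my_object *obj;

    my_cache = kmem_cache_create("my_object_cache",
                                 sizeof(struct my_object),
                                 0,                  /* default alignment */
                                 SLAB_HWCACHE_ALIGN, /* align to cache lines */
                                 NULL);              /* no constructor */
    if (!my_cache)
        return -ENOMEM;

    obj = kmem_cache_alloc(my_cache, GFP_KERNEL);
    if (obj) {
        obj->id = 1;
        kmem_cache_free(my_cache, obj); /* back to the cache, not to RAM */
    }

    kmem_cache_destroy(my_cache);       /* all slabs must be free by now */
    return 0;
}
```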

To avoid wasting memory, the kernel must destroy all of a cache's slabs before destroying the cache itself. The kmem_cache_shrink() function destroys all the slabs in a cache by repeatedly calling slab_destroy().

The names of all general and specific caches can be obtained at run time by reading the /proc/slabinfo file.

When the slab allocator creates a new slab, it relies on the page frame allocator to obtain a group of contiguous free page frames; for this purpose, it calls the kmem_getpages() function.

In the reverse operation, the page frames assigned to a slab can be released by calling the kmem_freepages() function.

A newly created cache does not contain any slabs, and therefore contains no free objects.

New slabs are assigned to a cache only when both of the following conditions hold: (1) a request to allocate an object has been issued, and (2) the cache does not contain any free objects.

The slab allocator assigns a new slab to a cache by calling the cache_grow() function. This function calls kmem_getpages() to obtain a group of page frames from the zoned page frame allocator to host the slab, then calls alloc_slabmgmt() to get a new slab descriptor.

The lru field of a page descriptor is used by the buddy system functions only while the page frame is free and involved in the buddy system. As far as the buddy system is concerned, a page frame handled by the slab allocator is not free, so the slab allocator may reuse that field; in addition, it sets the PG_slab flag of the page.

Each object has a descriptor of type kmem_bufctl_t. Object descriptors are stored in an array placed right after the corresponding slab descriptor.

An object descriptor is simply an unsigned integer, which is meaningful only when the object is free.

It contains the index of the next free object in the slab, thereby implementing a simple list of the free objects inside the slab.
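
A minimal user-space model of this index-linked free list (BUFCTL_END, the array size, and the function names here are illustrative, not the kernel's actual definitions):

```c
/* Toy model of a slab's free-object list: each entry of the bufctl
 * array holds the index of the next free object, so the list needs
 * no pointers inside the objects themselves. */
#define NR_OBJS    4
#define BUFCTL_END (~0u)             /* marks the end of the free list */

static unsigned int bufctl[NR_OBJS]; /* one descriptor per object */
static unsigned int free_head;       /* index of the first free object */

static void slab_init_free_list(void)
{
    unsigned int i;

    for (i = 0; i < NR_OBJS - 1; i++)
        bufctl[i] = i + 1;           /* each entry points to the next */
    bufctl[NR_OBJS - 1] = BUFCTL_END;
    free_head = 0;
}

static int slab_alloc_index(void)
{
    unsigned int idx = free_head;

    if (idx == BUFCTL_END)
        return -1;                   /* no free objects in this slab */
    free_head = bufctl[idx];         /* unlink the object */
    return (int)idx;
}
```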

The objects managed by the slab allocator can be aligned in memory: they are stored in memory cells whose starting physical addresses are multiples of a given constant, usually a power of 2. This constant is called the alignment factor.

The largest alignment factor allowed by the slab allocator is 4096, the page frame size.

The same hardware cache line can map many different blocks of RAM.

Objects of the same size tend to be stored at the same offset within a cache.

Thus, objects with the same offset in different slabs are, in the end, likely to be mapped into the same cache line.

The array field of the cache descriptor is a set of pointers to array_cache data structures, one element for each CPU in the system. Each array_cache data structure is a descriptor of the local cache of free objects for that CPU.

The local cache descriptor does not contain the address of the local cache itself; the local cache stores pointers to freed objects rather than the objects themselves.

When a new slab cache is created, the kmem_cache_create() function determines the size of the local caches, allocates them, and stores their pointers in the array field of the cache descriptor.

A new object can be obtained by calling the kmem_cache_alloc() function.

Infrequent requests for memory areas are handled through a group of general caches whose objects have geometrically distributed sizes.

Objects of this kind are obtained by calling the kmalloc() function.
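
A brief hedged sketch of kmalloc() in use; kmalloc() and kfree() are real kernel interfaces, and the 128-byte comment assumes the classic power-of-two general caches:

```c
#include <linux/slab.h>

/* Sketch: kmalloc() rounds the request up to the nearest general
 * cache size (32, 64, 128, ... bytes) and allocates from that cache. */
static void example_kmalloc(void)
{
    char *buf = kmalloc(100, GFP_KERNEL); /* likely served by the
                                             128-byte general cache */
    if (!buf)
        return;

    /* ... use buf ... */

    kfree(buf); /* returns the object to its general cache */
}
```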

A memory reserve can be used only to satisfy atomic memory allocation requests issued by interrupt handlers or by code inside critical regions. A memory pool, by contrast, is a reserve of dynamic memory that can be used only by a specific kernel component, namely the owner of the pool.

A memory pool is often layered on top of the slab allocator, but in general it can be used to allocate any kind of dynamic memory, from whole page frames to small memory areas allocated with kmalloc().

When the memory elements are slab objects, the alloc and free methods are commonly implemented by the mempool_alloc_slab() and mempool_free_slab() functions; in this case, the pool_data field of the mempool_t object stores the address of the slab cache descriptor.

The mempool_create() function creates a new memory pool; it receives the number of reserved memory elements min_nr, the addresses of the functions implementing the alloc and free methods, and an arbitrary value for the pool_data field.

To allocate an element from a memory pool, the kernel calls the mempool_alloc() function, passing it the address of the mempool_t object and the memory allocation flags.
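
A hedged usage sketch of the mempool API; mempool_create(), mempool_alloc_slab(), mempool_free_slab(), mempool_alloc(), mempool_free(), and mempool_destroy() are real kernel interfaces, while my_pool and the parameter values are invented for the example:

```c
#include <linux/mempool.h>
#include <linux/slab.h>
#include <linux/errno.h>

/* Sketch: a memory pool layered on an existing slab cache. The pool
 * keeps min_nr elements in reserve so that allocations can succeed
 * even when the slab allocator is out of memory. */
static mempool_t *my_pool;

static int example_mempool(struct kmem_cache *cache)
{
    void *elem;

    my_pool = mempool_create(16,                 /* min_nr reserved elements */
                             mempool_alloc_slab, /* alloc method */
                             mempool_free_slab,  /* free method */
                             cache);             /* stored in pool_data */
    if (!my_pool)
        return -ENOMEM;

    elem = mempool_alloc(my_pool, GFP_ATOMIC);   /* may fall back on the reserve */
    if (elem)
        mempool_free(elem, my_pool);

    mempool_destroy(my_pool);
    return 0;
}
```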

Mapping a memory area onto a group of contiguous page frames is the best choice, because it fully exploits the hardware cache and yields lower average access times.

A safety interval of size VMALLOC_OFFSET is inserted between the end of the physical memory mapping and the first noncontiguous memory area; its purpose is to "capture" out-of-bounds memory accesses.

Each noncontiguous memory area is associated with a descriptor of type vm_struct.

The get_vm_area() function looks for a free range of linear addresses between VMALLOC_START and VMALLOC_END. It uses two parameters: the byte size of the memory area to be created and a flag specifying the type of the area.

map_vm_area() does not touch the page tables of the current process. Therefore, a page fault occurs when a process in Kernel Mode accesses the noncontiguous memory area, because the entries in the process's page tables that correspond to the area are null.

The page fault handler, however, checks the faulting address against the master kernel page tables (the init_mm.pgd Page Global Directory and its child page tables). Once the handler finds that the master kernel page tables contain a non-null entry for the address, it copies its value into the corresponding process page table entry and resumes normal execution of the process.

The kernel never reclaims the Page Upper Directories, Page Middle Directories, and page tables rooted at the master kernel Page Global Directory.
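
A brief hedged sketch of noncontiguous allocation in use; vmalloc() and vfree() are real kernel interfaces, while example_vmalloc() is illustrative:

```c
#include <linux/vmalloc.h>

/* Sketch: vmalloc() returns memory that is contiguous in the linear
 * address space but possibly scattered across physical page frames.
 * It may block, so it cannot be used in interrupt context. */
static void example_vmalloc(void)
{
    char *buf = vmalloc(1 << 20);  /* 1 MB, physically noncontiguous */

    if (!buf)
        return;

    /* ... use buf ... */

    vfree(buf);  /* unmaps and frees the underlying page frames */
}
```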


DMA ignores the paging unit and accesses the address bus directly, so a requested buffer must reside in contiguous page frames.

Frequent modification of the page tables inevitably increases the average number of memory accesses, because it forces the CPU to flush the contents of the translation lookaside buffer (TLB).

In summary: __get_free_pages() or alloc_pages() obtain page frames from the zoned page frame allocator; kmem_cache_alloc() or kmalloc() use the slab allocator to allocate objects from specific or general caches; and vmalloc() or vmalloc_32() obtain a noncontiguous memory area. If the request can be satisfied, these functions return a page descriptor address or a linear address (that is, the starting address of the allocated dynamic memory).

