Linux Memory Management Learning Notes - Memory Addressing


I recently started wanting to learn more about the Linux kernel. My main references are "Understanding the Linux Kernel", "Professional Linux Kernel Architecture", and the kernel source code. My experience is limited, so I can only analyze a limited amount of content here; I will study the rest in more depth after working through these notes.

1. Memory addresses

Logical address: the address used in machine-language instructions to specify an operand or an instruction; it consists of a segment selector and an offset within the segment.

Linear address: a single 32-bit unsigned integer, translated to a physical address by the paging unit.

Physical address: the address placed on the CPU's address pins to address the memory chips.

2. From logical address to linear address

2.1 Segment Selector and segment register

Logical address = segment selector (16 bits) + offset within the segment (32 bits)

Index: the position of the segment descriptor in the GDT or LDT.

TI: table indicator; the segment descriptor is in the GDT (TI = 0) or in the LDT (TI = 1).

RPL: requested privilege level; when the selector is loaded into the CS register it indicates the current privilege level of the CPU.

To conveniently locate the segment selector, the processor provides six segment registers to hold the segment selector.

CS, SS, DS, ES, FS, GS

CS: code segment register, points to the segment containing program instructions.

SS: stack segment register, points to the segment containing the current program stack.

DS: data segment register, points to the segment containing static or global data.

The RPL field of the selector held in the CS register gives the current privilege level (CPL) of the CPU: 0 in kernel mode, 3 in user mode.
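
To make the selector layout concrete, here is a small self-contained sketch (not kernel code) that decodes the Index, TI, and RPL fields of a 16-bit selector; the helper name and example values are invented for illustration:

#include <stdint.h>
#include <stdio.h>

/* Decode the three fields of an x86 segment selector (illustrative only):
 *   bits 15..3  Index - position of the descriptor in the GDT/LDT
 *   bit  2      TI    - 0 = GDT, 1 = LDT
 *   bits 1..0   RPL   - requested privilege level
 */
static void decode_selector(uint16_t sel)
{
    unsigned index = sel >> 3;
    unsigned ti    = (sel >> 2) & 1;
    unsigned rpl   = sel & 3;
    printf("selector 0x%04x: index=%u table=%s rpl=%u\n",
           sel, index, ti ? "LDT" : "GDT", rpl);
}

int main(void)
{
    decode_selector(0x0010);  /* index 2, GDT, RPL 0 */
    decode_selector(0x0023);  /* index 4, GDT, RPL 3 */
    return 0;
}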

2.2 Segment Descriptor

Each segment is represented by an 8-byte segment descriptor that describes the characteristics of the segment.

Segment descriptors are stored in the Global Descriptor Table (GDT) or in a Local Descriptor Table (LDT).

Usually only one GDT is defined; in addition to the segments in the GDT, a process may have its own LDT if it needs to create additional segments.

The address and size of the GDT in main memory are held in the GDTR control register, and the address and size of the currently used LDT are held in the LDTR register.

To speed up the translation, each segment register has an associated non-programmable register: whenever a segment selector is loaded into a segment register, the corresponding segment descriptor is also loaded from memory into this hidden register, so later accesses to that segment do not need to re-read the descriptor from the GDT or LDT in main memory.

Base: segment base address (the linear address of the first byte of the segment).

G: granularity flag; if clear (0) the segment size is expressed in bytes, otherwise in multiples of 4096 bytes.

Limit: Segment Length

S: system flag; if 0 this is a system segment that stores critical data structures such as the LDT, otherwise it is an ordinary code or data segment.

DPL: descriptor privilege level; the privilege level required to access this segment.

2.3 Translation of logical addresses
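
The translation works as follows: the TI bit selects the GDT or LDT, the Index (multiplied by 8, the descriptor size) locates the segment descriptor in that table, the Limit and access rights are checked, and the descriptor's Base is added to the 32-bit offset to form the linear address. A minimal illustrative sketch (no privilege or limit checks; the struct and function names are invented):

#include <stdint.h>
#include <stdio.h>

/* Illustrative model of the descriptor fields needed here. */
struct seg_desc { uint32_t base; uint32_t limit; };

/* Translate (selector, offset) -> linear address, ignoring privilege checks
 * and limit granularity. gdt/ldt are the descriptor tables in use. */
static uint32_t logical_to_linear(uint16_t sel, uint32_t offset,
                                  const struct seg_desc *gdt,
                                  const struct seg_desc *ldt)
{
    const struct seg_desc *table = (sel & 0x4) ? ldt : gdt;  /* TI bit   */
    const struct seg_desc *desc  = &table[sel >> 3];         /* Index    */
    /* A real CPU would raise a fault on offset > limit or on a privilege
     * violation; both checks are omitted here. */
    return desc->base + offset;
}

int main(void)
{
    struct seg_desc gdt[4] = { {0}, {0}, { .base = 0x00000000, .limit = 0xffffffff } };
    /* Selector 0x10: index 2, TI = 0 (GDT), RPL = 0 -> flat segment above. */
    printf("linear = 0x%08x\n", logical_to_linear(0x10, 0xc0100000, gdt, NULL));
    return 0;
}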

3. Implementation in Linux

3.1 Segmentation in Linux

Linux uses segmentation in a very limited way.

There are segment descriptors for the four main Linux segments: user code, user data, kernel code, and kernel data.

The corresponding segment selectors are defined by the macros __USER_CS, __USER_DS, __KERNEL_CS, and __KERNEL_DS.

To address the kernel code segment, the kernel simply loads the value produced by the __KERNEL_CS macro into the CS register.

All four segments have a base address of 0x00000000 and span the whole address space, from 0 to 2^32 - 1.

Consequently, logical addresses in Linux coincide with linear addresses: the offset field of a logical address always equals the corresponding linear address.

3.2 The Linux GDT

There is one GDT per CPU, and all the GDTs are stored in the cpu_gdt_table array.

Each GDT contains 18 segment descriptors and 14 null, unused, or reserved entries. The 18 descriptors are:

3 Thread-Local Storage (TLS) segments: thread-private data.

4 user-mode and kernel-mode code and data segments.

TSS segment: task state segment, one per CPU. All the task state segments are stored in the init_tss array.

Its G flag is cleared and its Limit field is 0xeb, i.e., the segment is 236 bytes long. Its DPL is 0, so user-mode access is not allowed.

This segment is used to save the contents of the CPU registers during a process context switch.

LDT segment: normally points to the segment containing the default LDT (most user-mode programs do not use an LDT, so a single default LDT is defined and shared by most processes).

The modify_ldt() system call lets a process create its own local descriptor table (used, for example, by Wine); in that case the LDT segment is changed accordingly.

Double fault TSS: a special TSS segment used to handle double fault exceptions.

3 segments related to Advanced Power Management (APM).

5 segments related to Plug and Play (PnP) BIOS services.

4. From linear address to physical address

The CR3 control register holds the physical address of the page directory currently in use.

Page directory entries and page table entries have the same structure:

Present flag: if 1, the page or page table is in main memory; if 0, it is not.

If an address is accessed while the Present flag of the relevant page directory entry or page table entry is 0, the paging unit stores the linear address in the CR2 register and raises exception 14, the page fault.

Field (20 bits): in a page directory entry this field points to the page frame containing a page table; in a page table entry it points to the page frame containing a page of data.

PCD/PWT flags: control how the hardware cache handles the page or page table (see below).

Accessed flag: set by the paging unit every time the corresponding page frame is addressed.

Dirty flag: applies only to page table entries; set every time the page frame is written to.

Read/Write flag: the access rights of the page or page table.

User/Supervisor flag: the privilege level required to access the page or page table.

Page Size flag: applies only to page directory entries; if set to 1 the entry refers directly to a 2 MB or 4 MB page frame (huge pages).

Global flag: applies only to page table entries; it prevents frequently used pages (global pages) from being flushed out of the TLB. (This flag only works when the PGE, Page Global Enable, flag of the CR4 register is set.)
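
To make the layout concrete, here is a small self-contained sketch of the flag bits of a 32-bit (non-PAE) x86 entry; the bit positions follow the x86 layout, while the macro and function names are invented for this example:

#include <stdint.h>
#include <stdio.h>

/* x86 (non-PAE) page table / page directory entry flag bits. */
#define PTE_PRESENT   (1u << 0)
#define PTE_RW        (1u << 1)
#define PTE_USER      (1u << 2)
#define PTE_PWT       (1u << 3)
#define PTE_PCD       (1u << 4)
#define PTE_ACCESSED  (1u << 5)
#define PTE_DIRTY     (1u << 6)
#define PTE_PSE       (1u << 7)   /* Page Size, page directory entries only */
#define PTE_GLOBAL    (1u << 8)

/* The upper 20 bits hold the page frame number. */
static void dump_entry(uint32_t e)
{
    printf("frame=0x%05x %s %s %s %s %s\n",
           e >> 12,
           (e & PTE_PRESENT)  ? "present"  : "not-present",
           (e & PTE_RW)       ? "rw"       : "ro",
           (e & PTE_USER)     ? "user"     : "supervisor",
           (e & PTE_ACCESSED) ? "accessed" : "-",
           (e & PTE_DIRTY)    ? "dirty"    : "-");
}

int main(void)
{
    dump_entry(0x00123000 | PTE_PRESENT | PTE_RW | PTE_USER | PTE_ACCESSED);
    return 0;
}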

For 2 MB pages, pass kernel parameters at boot:

hugepages=1024

For 1GB pages:

default_hugepagesz=1G hugepagesz=1G hugepages=4

The huge page sizes supported by the CPU can be determined from the CPU flags (for example in /proc/cpuinfo): if pse is present, 2 MB huge pages are supported; if pdpe1gb is present, 1 GB huge pages are supported.

For 2 MB pages there is also the option of allocating huge pages after the system has booted:

echo 1024 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages

mkdir /mnt/huge
mount -t hugetlbfs nodev /mnt/huge

Whenever a file is created in the /mnt/huge/ directory, it is mapped into memory with 2 MB as the basic paging unit. Note that files on hugetlbfs do not support the read()/write() system calls; they are normally accessed through memory mapping.
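
A minimal sketch of such memory-mapped access (it assumes the mount shown above and enough reserved 2 MB huge pages; error handling is kept short and the file name is arbitrary):

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define LENGTH (4 * 2 * 1024 * 1024UL)   /* 4 x 2 MB huge pages */

int main(void)
{
    int fd = open("/mnt/huge/testfile", O_CREAT | O_RDWR, 0644);
    if (fd < 0) { perror("open"); return 1; }

    /* read()/write() are not supported on hugetlbfs files; map them instead. */
    char *p = mmap(NULL, LENGTH, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) { perror("mmap"); return 1; }

    memset(p, 0x5a, LENGTH);             /* touch the huge pages */

    munmap(p, LENGTH);
    close(fd);
    unlink("/mnt/huge/testfile");
    return 0;
}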

In practice, to use huge pages transparently an application can also be linked against libhugetlbfs. That library overrides common memory-related functions such as malloc()/free() so that the application's data is placed in huge-page-backed memory, improving memory performance.

4.2 Accelerating linear address translation

To reduce the speed mismatch between the CPU and RAM, hardware caches were introduced between them, exploiting the principle of locality.

Cache line: the amount of data transferred between the cache and main memory in a single transfer.

PCD (Page Cache Disable): whether the hardware cache is enabled or disabled when data in the page frame is accessed.

PWT (Page Write-Through): whether a write-through or write-back policy is used when data is written to the page frame.

Linux enables caching for all page frames and always selects the write-back policy.

TLB (Translation Lookaside Buffer): when a linear address is translated for the first time, the CPU walks the page tables in (slow) RAM and stores the result in its TLB; later references to the same linear address hit the TLB directly, which speeds up addressing. On multiprocessor systems the TLBs of the different CPUs do not need to be kept synchronized, because the same linear address may map to different physical addresses on different CPUs.

5. Paging in Linux

On 32-bit systems the Page Upper Directory is eliminated; the Page Middle Directory corresponds to the page directory of hardware paging, the Page Table corresponds to the hardware page table, and the Page Global Directory corresponds to the PDPT (when PAE is enabled).

In fact, the paging mechanism achieves the following two design goals:

    1. Assign distinct physical address spaces to different processes, preventing addressing errors.
    2. Decouple pages from page frames. A page frame is actual physical storage in RAM, while a page is just a block of data. The paging mechanism allows a page to be stored in one page frame, saved away, and later placed into a different page frame. This is the basis of the virtual memory implementation in Linux.

Let us now look at how page tables are actually handled in Linux (the code flow); analyzing one or two functions per part is enough to understand the mechanisms.

5.1 Linear address fields

PAGE_SHIFT: specifies the number of bits in the Offset field.

#define PAGE_SHIFT 13

PMD_SHIFT: specifies the total number of bits covered by the Offset and Table fields; it has two definitions, one for the two-level and one for the three-level page table layout, determined in the same way as PAGE_SHIFT.

There are also PUD_SHIFT, PGDIR_SHIFT, PTRS_PER_PTE, PTRS_PER_PMD, and similar constants that describe the bits covered by the various directories and tables and how many entries each level holds.
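
As a rough illustration of how such constants carve up a linear address, here is a self-contained sketch for the classic two-level x86 case (4 KB pages, no PAE; the macro names mirror the kernel's, but the code itself is only for illustration):

#include <stdint.h>
#include <stdio.h>

/* Two-level x86 paging without PAE: 10 + 10 + 12 bits. */
#define PAGE_SHIFT    12
#define PGDIR_SHIFT   22
#define PTRS_PER_PTE  1024
#define PTRS_PER_PGD  1024

int main(void)
{
    uint32_t addr = 0xc0123456;
    unsigned pgd_index = (addr >> PGDIR_SHIFT) & (PTRS_PER_PGD - 1);
    unsigned pte_index = (addr >> PAGE_SHIFT)  & (PTRS_PER_PTE - 1);
    unsigned offset    = addr & ((1u << PAGE_SHIFT) - 1);
    printf("pgd=%u pte=%u offset=0x%x\n", pgd_index, pte_index, offset);
    return 0;
}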

5.2 Page table handling

Only the 32-bit case is discussed here; the 64-bit case is similar. Page protection flags are represented by the pgprot_t type.

pte_t, pmd_t, pud_t, and pgd_t describe the formats of page table entries, page middle directory entries, page upper directory entries, and page global directory entries:

#ifdef CONFIG_64BIT_PHYS_ADDR
#ifdef CONFIG_CPU_MIPS32
typedef struct { unsigned long pte_low, pte_high; } pte_t;
#define pte_val(x)	((x).pte_low | ((unsigned long long)(x).pte_high << 32))
#define __pte(x)	({ pte_t __pte = {(x), ((unsigned long long)(x)) >> 32}; __pte; })
#else
typedef struct { unsigned long long pte; } pte_t;
#define pte_val(x)	((x).pte)
#define __pte(x)	((pte_t) { (x) })
#endif
#else
typedef struct { unsigned long pte; } pte_t;
#define pte_val(x)	((x).pte)
#define __pte(x)	((pte_t) { (x) })
#endif
typedef struct page *pgtable_t;

As you can see, if 64-bit physical addresses are not configured, the entry defaults to a single unsigned long suitable for 32-bit hardware; if 64-bit physical addresses are configured, there is a separate definition for the case where the CPU is a 32-bit MIPS, splitting the entry into pte_low and pte_high.

The kernel also provides functions and macros that read or set the page flags; take pte_user() and pte_wrprotect() as examples:

static inline int pte_user(pte_t pte)	{ return pte_val(pte) & __PAGE_PROT_USER; }

pte_wrprotect() is likewise defined as an inline function:

static inline pte_t pte_wrprotect(pte_t pte)
{
	pte_val(pte) &= ~(_PAGE_WRITE | _PAGE_SILENT_WRITE);
	return pte;
}

The definition of _PAGE_SILENT_WRITE differs depending on whether the system is 32-bit or 64-bit.
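
To see the idiom outside the kernel, here is a small self-contained mock (the flag values are invented for the demo; only the pattern of wrapping the entry in a struct and manipulating it through pte_val()/__pte() and helpers such as pte_wrprotect() is the point):

#include <stdio.h>

/* Userspace mock of the kernel idiom: the entry is wrapped in a struct so it
 * can only be manipulated through pte_val()/__pte() and helper functions.
 * Flag values below are invented for this demo. */
typedef struct { unsigned long pte; } pte_t;
#define pte_val(x)   ((x).pte)
#define __pte(x)     ((pte_t) { (x) })

#define _PAGE_PRESENT 0x01
#define _PAGE_WRITE   0x02

static inline pte_t pte_wrprotect(pte_t pte)
{
    pte_val(pte) &= ~_PAGE_WRITE;     /* clear the write-permission flag */
    return pte;
}

int main(void)
{
    pte_t e = __pte(0x12345000 | _PAGE_PRESENT | _PAGE_WRITE);
    e = pte_wrprotect(e);
    printf("entry=0x%lx writable=%d\n", pte_val(e),
           !!(pte_val(e) & _PAGE_WRITE));
    return 0;
}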

5.3 Page table operations

This part mostly deals with obtaining the addresses of the various entries of each page table for a given virtual address and is not analyzed in full; here we only look at one function, pgd_alloc(mm).

pgd_alloc(mm) allocates a new page global directory; if PAE is enabled, the three corresponding page middle directories are also allocated. The call chain is fairly involved, so the data structures and the full flow are not described in detail here; they will be analyzed thoroughly in later notes on memory management. For now we only straighten out the overall idea.

The call starts at:

static inline pgd_t *pgd_alloc(struct mm_struct *mm)
{
	return (pgd_t *)get_zeroed_page(GFP_KERNEL);
}

The mm argument is ignored on 80x86 systems. get_zeroed_page() in turn calls __get_free_pages(), a more involved function that returns a 32-bit linear address and therefore, as the comment below notes, cannot be used for highmem pages.
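
For reference, get_zeroed_page() is little more than a wrapper that adds the __GFP_ZERO flag and requests a single page (order 0); a sketch along the lines of mm/page_alloc.c, so the exact form may differ between kernel versions:

unsigned long get_zeroed_page(gfp_t gfp_mask)
{
	/* __GFP_ZERO asks the allocator to return a zero-filled page. */
	return __get_free_pages(gfp_mask | __GFP_ZERO, 0);
}

__get_free_pages() itself looks like this: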

unsigned long __get_free_pages(gfp_t gfp_mask, unsigned int order)
{
	struct page *page;

	/*
	 * __get_free_pages() returns a 32-bit address, which cannot represent
	 * a highmem page
	 */
	VM_BUG_ON((gfp_mask & __GFP_HIGHMEM) != 0);

	page = alloc_pages(gfp_mask, order);
	if (!page)
		return 0;
	return (unsigned long) page_address(page);
}

The alloc_pages() call then ends up executing the following function:

struct page *alloc_pages_current(gfp_t gfp, unsigned order)
{
	struct mempolicy *pol = get_task_policy(current);
	struct page *page;
	unsigned int cpuset_mems_cookie;

	if (!pol || in_interrupt() || (gfp & __GFP_THISNODE))
		pol = &default_policy;

retry_cpuset:
	cpuset_mems_cookie = get_mems_allowed();

	/*
	 * No reference counting needed for current->mempolicy
	 * nor system default_policy
	 */
	if (pol->mode == MPOL_INTERLEAVE)
		page = alloc_page_interleave(gfp, order, interleave_nodes(pol));
	else
		page = __alloc_pages_nodemask(gfp, order,
				policy_zonelist(gfp, pol, numa_node_id()),
				policy_nodemask(gfp, pol));

	if (unlikely(!put_mems_allowed(cpuset_mems_cookie) && !page))
		goto retry_cpuset;

	return page;
}

In this function, get_task_policy() first obtains the memory allocation policy of the current task (which determines the memory node to allocate from); if there is no special policy, the default policy is used, and memory is then allocated accordingly.

static inline unsigned int get_mems_allowed(void)
{
	return read_seqcount_begin(&current->mems_allowed_seq);
}

Then, depending on the policy, the page is allocated either with alloc_page_interleave() or with __alloc_pages_nodemask(), which is the core of page allocation.

Finally the successfully allocated page is returned; that is the whole flow. Note that the chain started from get_zeroed_page(), so the page handed back is zero-filled.

Process Page Table:

Each process has its own page global directory and its own set of page tables. When a process switch occurs, the contents of CR3 are saved in the memory descriptor of the previously running process (task_struct->mm->pgd),

and the PGD address of the next process is loaded into the CR3 register.
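
As a rough sketch (not the kernel's actual switch_mm() code), the paging side of a process switch boils down to re-pointing CR3 at the new PGD; on x86 the kernel does this with helpers such as write_cr3()/load_cr3(), where __pa() converts the PGD's kernel virtual address into the physical address CR3 expects:

/* Illustrative only: what "load the next PGD into CR3" amounts to. */
static void switch_page_tables(struct task_struct *next)
{
	write_cr3(__pa(next->mm->pgd));
}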

The kernel provides a rich API to view the status and values of page tables

The linear address space of a process is divided into two parts:

0-3 GB: addressable in both user mode and kernel mode

3-4 GB: addressable only in kernel mode

The first part of a process's page global directory contains the entries that map linear addresses below 3 GB;

the remaining entries are the same for all processes and are equal to the corresponding entries of the master kernel page table.
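
On 32-bit x86 with the common default 3 GB / 1 GB split this boundary corresponds to the PAGE_OFFSET constant; the values below are the usual defaults (the actual, configurable definitions live in the arch headers):

#define PAGE_OFFSET	0xC0000000UL	/* linear addresses at or above this are kernel-only */
#define TASK_SIZE	PAGE_OFFSET	/* top of the user-mode address space */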

Kernel Page Table:

The kernel maintains a set of page tables for its own use, rooted at the master kernel page global directory; they are set up when the system is initialized.

The initialization of a kernel page table is divided into two phases:

In the first phase, the CPU is still in real mode and paging is not yet enabled; the kernel creates a limited address space, just enough to hold its code segment, data segment, the initial page tables, and some room for dynamic data structures in RAM. By initializing a provisional page global directory and provisional page tables, the kernel obtains the provisional kernel page table, which is stored in swapper_pg_dir. Assuming that the kernel segments and these data structures fit into the first 8 MB of RAM, the provisional tables map both the user-range linear addresses 0x00000000-0x007fffff and the kernel-range linear addresses 0xc0000000-0xc07fffff onto the first 8 MB of physical RAM, i.e. 0x00000000-0x007fffff. In the second phase, the linear addresses starting at 0xc0000000 are mapped onto physical addresses starting at 0, completing the final kernel page table.

That is the approximate process; the detailed code implementation still needs to be worked through.

6. Handling the hardware cache and the TLB

Hardware cache synchronization is performed automatically by the processor.

TLB synchronization, on the other hand, is done by the kernel, because it is the kernel that decides whether a mapping from a linear address to a physical address is still valid.

A function running on one processor sends an inter-processor interrupt (IPI) to the other CPUs, forcing them to run the appropriate function to flush their TLBs.

A process switch generally causes the TLB entries to be flushed, except in the following cases:

Switching between two regular processes that use the same set of page tables (threads sharing one mm_struct).

Switching between a regular process and a kernel thread (a kernel thread directly reuses the mm_struct of the previously running process).

To avoid useless TLB flushes on multiprocessor systems, the kernel uses a technique called lazy TLB mode.
