Logical address, physical address, linear address

Source: Internet
Author: User

Logical addresses (Logical address)
Refers to the offset portion of a segment-relative address generated by a program.
For example, when you read the value of a pointer variable itself (the & operation) in C pointer programming, what you get is actually a logical address. It is relative to the address of the data segment of your current process and has nothing to do with the absolute physical address. Only in Intel real mode does the logical address map directly onto a physical address: real mode has no paging and no descriptor-based segmentation, so the CPU simply computes segment × 16 + offset and performs no further address translation.
In protected mode, the logical address is the offset within the segment limit of the code segment the program is executing (assuming the code segment and the data segment are identical).
Application programmers only have to deal with logical addresses; the segmentation and paging mechanisms are completely transparent to them and concern only system programmers. Although application programmers can manipulate memory directly, they can only operate on the memory segments that the operating system has assigned to them.



If you are a programmer, the logical address should be easy to understand. When we write C code, we often talk about the offset of a member from the start of a struct, the entry offset of a function, the first address of an array, and so on. All of these concepts are relative to your program, not to the whole operating system. That is, a logical address is relative to the specific program you compile (or rather the process, since the program actually runs as a process). The entry address of the compiled program can be regarded as the first address, and a logical address can usually be thought of as the offset, assigned by the compiler, relative to this first address; in other words, a relative address value with the first address as its starting point.

When we double-click an executable program, we give the operating system the entry point from which the program runs. The shell then passes the location of the executable file to the kernel. Inside the kernel, a new process is forked, and the new process is first allocated the corresponding memory area. Here a famous concept comes in, called copy on write, that is, copying only at the time of writing. It is not explained in detail here. In short, after the new process has been forked, it gets its own complete PCB structure and then calls the exec function to load the code from disk into the memory area. At this point the PCB of the process is added to the queue of runnable processes, and when the CPU is scheduled to this process, it actually executes.


We can regard the entry address of the program as the starting point of its logical addresses, that is, the start address of the program. The locations of the data and code that the program uses later, expressed relative to this starting address (and arranged by the compiler beforehand), make up what we call logical addresses. A logical address is relative to a specific program (in fact a process, so it is a relative address of the program as it actually runs). This understanding may miss some nuances, but it is much better and more practical than some of the vague, unintelligible descriptions on the web. Once you have a deeper understanding of addresses, you can supplement or correct it.

In a word, the logical address is relative to the application.



Historical background of logical address generation:

Tracing it back to the origin: on Intel's 8-bit 8080 CPU, the data bus (DB) was 8 bits wide and the address bus (AB) was 16 bits wide. The 16-bit address information therefore also had to travel over the 8-bit data bus, be held temporarily in the registers along the data path, and be stored in CPU registers and memory. But because the AB width was an exact integer multiple of the DB width, no contradiction arose.

When Intel moved up to 16-bit machines, the design of the 8086/8088 CPU was constrained by the IC integration, packaging, and pin technology of the day and could not exceed 40 pins. The designers also felt that the 8-bit machine's addressing capability of 2^16 = 64 KB was too small, yet going straight to an integer multiple of 16, such as AB = 32 bits, was not achievable. So the AB could only be widened by 4 lines for the time being, to 20 lines. With 2^20 = 1 MB, the addressing capability increased 16-fold. This, however, created a mismatch between the 20-bit AB and the 16-bit DB: 20-bit address information could neither be transmitted on the DB nor stored in 16-bit CPU registers and memory cells. This is how the segmented organization of the CPU came about.
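
As a rough illustration of the segment structure that resulted, the sketch below shows the well-known way the 8086 combines two 16-bit values into one 20-bit address: the segment value is shifted left by 4 bits and the offset is added. The text above does not spell out this formula, and the function name real_mode_address is made up for this example.

```python
def real_mode_address(segment: int, offset: int) -> int:
    """Combine a 16-bit segment and a 16-bit offset into a 20-bit address.

    Hypothetical helper for illustration; it models 8086 address formation.
    """
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset must each fit in 16 bits")
    # Shift the segment left by 4 bits (multiply by 16), add the offset,
    # and keep only the 20 bits that the 8086 address bus can carry.
    return ((segment << 4) + offset) & 0xFFFFF


if __name__ == "__main__":
    print(hex(real_mode_address(0x1234, 0x0010)))
```

Two 16-bit registers are enough to reach the whole 1 MB space this way, which is why the 20-bit address never has to fit into a single register.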




Linear addresses (Linear address)
This is the intermediate layer in the transformation between the logical address and the physical address. The program code generates a logical address, that is, an offset within a segment; adding the base address of the corresponding segment produces a linear address.
If the paging mechanism is enabled, the linear address is then transformed again to produce a physical address. If paging is not enabled, the linear address is the physical address directly. The linear address space of the Intel 80386 is 4 GB (2 to the power of 32, that is, 32 address lines).

Linear Address:

We know that every computer has a CPU (we take the single-CPU case here; multi-CPU should be the same), and that in the end every instruction and every piece of data is processed by this CPU. The registers associated with the CPU are storage devices that temporarily hold related information. So, from the CPU's point of view, we can roughly divide the devices and components of a computer into two categories: storage devices that hold data or instructions (such as registers and memory), and paths that carry data or instructions (such as address lines and data lines). The essence of a linear address is "the address that the CPU sees".

Tracing it back, we find that the linear address is a product of the development of Intel's x86 architecture. When 32-bit CPUs appeared, their addressable range reached 4 GB, a huge number relative to the memory sizes of the time; we usually did not have that much memory. So a gap opened between the 4 GB space the CPU can see and the actual capacity of memory. The linear address is used to describe this 4 GB space that is visible to the CPU.

We know that in a multi-process operating system each process has its own separate address space and its own resources, but at any particular moment only one process is running on the CPU. At that moment, the 4 GB space the CPU sees is the one occupied by this process, and that is the linear address space. Everything the CPU does is done within this linear space. It is called a linear space presumably because a continuous space laid out in a line is easier to picture. It is in fact the addressable range of the CPU.


On 32-bit Linux, this 4 GB is divided into two parts: 0-3 GB is user space (also called user mode), and 3-4 GB is kernel space (also called kernel mode). Operating-system code, that is, the kernel's code and data, is mapped into kernel space, and user processes are mapped into user space. How the system translates a linear address into actual physical memory is another topic; you can find articles about it everywhere on the Internet, so I will not be wordy here. For x86, it comes down to nothing more than segmented management and paged management.

Physical addresses (physical address)
This is the address signal that appears on the CPU's external address bus to address physical memory; it is the final result of address translation.
If the paging mechanism is enabled, the linear address is transformed into a physical address using the entries in the page directory and the page table. If paging is not enabled, the linear address is the physical address directly.

Virtual memory (Virtual memory)
Refers to an amount of memory presented by the computer that is much larger than the actual memory.
It therefore allows programmers to write and run programs that need far more memory than the actual system has. This enables many large projects to be implemented on systems with limited memory resources. A very apt analogy: you do not need a full-length track to run a train from Shanghai to Beijing. You only need rails that are long enough (say 3 km) to complete the task. The method is to move the rails from behind the train to the front of it immediately; as long as you do this fast enough, the train runs as if on a complete track. This is the task that virtual memory management has to accomplish. In the Linux 0.11 kernel, each program (process) is allotted a virtual memory space with a total capacity of 64 MB, so the program's logical address range is 0x0000000 to 0x4000000.

Sometimes we also refer to logical addresses as virtual addresses because, like the concept of a virtual memory space, a logical address is independent of the actual physical memory capacity.

The "gap" between the logical address and the physical address is 0xC0000000 (this refers to the kernel's directly mapped memory on 32-bit x86 Linux), because the mapping from virtual address to linear address to physical address differs by exactly this value. This value is specified by the operating system.
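
The fixed gap can be sketched as simple arithmetic. This is a minimal sketch that assumes the statement refers to the kernel's directly mapped region on 32-bit x86 Linux with the 3 GB / 1 GB split described earlier; the constant name PAGE_OFFSET mirrors the kernel's name for this value, while the two helper functions are hypothetical and exist only for this illustration.

```python
PAGE_OFFSET = 0xC0000000  # 3 GB: where kernel space starts in the 4 GB linear space


def kernel_virt_to_phys(vaddr: int) -> int:
    """Subtract the fixed gap to get the physical address (illustrative helper)."""
    if not (PAGE_OFFSET <= vaddr <= 0xFFFFFFFF):
        raise ValueError("not an address in the 3-4 GB kernel range")
    return vaddr - PAGE_OFFSET


def phys_to_kernel_virt(paddr: int) -> int:
    """Add the fixed gap to get the kernel virtual address (illustrative helper)."""
    if not (0 <= paddr <= 0xFFFFFFFF - PAGE_OFFSET):
        raise ValueError("physical address cannot be reached through the fixed gap")
    return paddr + PAGE_OFFSET


if __name__ == "__main__":
    print(hex(kernel_virt_to_phys(0xC0100000)))
```

User-space addresses do not follow this fixed rule; they go through the per-process page tables discussed later.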



How a virtual address is converted to a physical address depends on the architecture. In general there are two approaches, segmentation and paging. Taking today's x86 CPUs as an example, both segmentation and paging are supported. The Memory Management Unit (MMU) is responsible for the conversion from virtual addresses to physical addresses. A logical address has the form segment identifier + offset within the segment, and the MMU converts the logical address into a linear address by looking up the segment table. If the CPU has not turned on paging, the linear address is the physical address; if paging is on, the MMU also has to look up the page table to translate the linear address into a physical address:
Logical address ----(segment table)---> Linear address ----(page table)---> Physical address
Different logical addresses can map to the same linear address, and different linear addresses can map to the same physical address, so the relationship is many-to-one. In addition, after a page has been swapped out and loaded back in, the same linear address may end up at a different physical address, so this many-to-one mapping also changes over time.


One, logical address to linear address

The memory address that appears in a machine language instruction is a logical address. It must be converted to a linear address and then converted by the MMU (the memory management unit in the CPU) to a physical address before it can be accessed.

We write the simplest Hello World program, compile it with gcc, and then disassemble it to see an instruction like the following:

mov 0x80495b0,%eax

The memory address here, 0x80495b0, is a logical address; it must be added to the base address of the implicit DS data segment to form a linear address. In other words, 0x80495b0 is an offset within the DS data segment of the current task.

In x86 protected mode, the segment information (base linear address, length, permissions, and so on) forms a segment descriptor that takes 8 bytes, so it cannot be stored directly in a segment register (a segment register is only 2 bytes). Intel's design is to store the segment descriptors centrally in the GDT or LDT, while the segment register holds the index of the segment descriptor within the GDT or LDT.

In Linux, the logical address is equal to the linear address. Why? Because the linear addresses of all Linux segments (user code segment, user data segment, kernel code segment, kernel data segment) start at 0x00000000 and have a length of 4 GB, so linear address = logical address + 0x00000000; that is, the logical address equals the linear address.
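
The addition described above can be sketched in a few lines. This is only an illustration: FLAT_SEGMENT_BASE and logical_to_linear are names invented for this example, and the 32-bit mask is an assumption that models a 4 GB linear address space.

```python
FLAT_SEGMENT_BASE = 0x00000000  # base of every Linux segment, per the text above


def logical_to_linear(offset: int, segment_base: int = FLAT_SEGMENT_BASE) -> int:
    """Add the segment base to the in-segment offset (illustrative helper)."""
    # Mask to 32 bits to stay inside the 4 GB linear address space.
    return (segment_base + offset) & 0xFFFFFFFF


if __name__ == "__main__":
    # With a base of 0, the logical and linear addresses are the same number.
    print(hex(logical_to_linear(0x80495b0)))
```

With a nonzero base, for example logical_to_linear(0x10, segment_base=0x1000), the two addresses would differ, which is the general case that the flat layout avoids.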

As a result, Linux uses only the GDT, for user tasks and kernel tasks alike, and does not use the LDT. Segment descriptors 12 and 13 in the GDT are __KERNEL_CS and __KERNEL_DS, and segment descriptors 14 and 15 are __USER_CS and __USER_DS. Kernel tasks use __KERNEL_CS and __KERNEL_DS, and all user tasks share __USER_CS and __USER_DS, which means there is no need to assign segment descriptors to each task separately. Although the kernel segment descriptors and the user segment descriptors have the same starting linear address and length, their DPL (descriptor privilege level) differs: the DPL of __KERNEL_CS and __KERNEL_DS is 0 (highest privilege), and the DPL of __USER_CS and __USER_DS is 3.

When debugging a program with GDB, use info reg to display the value of the current register:

CS 0x73 115

SS 0x7b 123

DS 0x7b 123

ES 0x7b 123

You can see that the DS value is 0x7b, which in binary is 00000000 01111011. The TI field is 0, which means the GDT is used, and the GDT index is 01111, or 15 in decimal, corresponding to the __USER_DS user data segment descriptor in the GDT.
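
The decoding above can be reproduced with a few bit operations. This sketch relies on the standard x86 selector layout (index in bits 15..3, TI in bit 2, requested privilege level in bits 1..0); the RPL field is not discussed in the text and is included only for completeness. The function name decode_selector is made up for this example.

```python
def decode_selector(selector: int) -> dict:
    """Split a 16-bit x86 segment selector into its three fields."""
    if not 0 <= selector <= 0xFFFF:
        raise ValueError("a segment selector is 16 bits wide")
    return {
        "index": selector >> 3,     # descriptor index within the GDT or LDT
        "ti": (selector >> 2) & 1,  # table indicator: 0 = GDT, 1 = LDT
        "rpl": selector & 0b11,     # requested privilege level
    }


if __name__ == "__main__":
    for name, value in (("CS", 0x73), ("DS", 0x7B)):
        print(name, decode_selector(value))
```

Applied to the register dump above, CS (0x73) decodes to index 14 and DS (0x7b) to index 15, both with TI = 0, which matches the GDT slots listed earlier for the user code and data segments.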

As can be seen from the above, Linux runs on top of x86's segmentation mechanism but cleverly bypasses segmentation.

Linux primarily implements memory management in a paging manner.

Two, linear address to physical address

As said above, in Linux the logical address equals the linear address. So how does a linear address correspond to a physical address? As everyone knows, it is through the paging mechanism; specifically, through a page table lookup.

Strictly speaking, paging is a mechanism provided by the CPU; Linux merely implements memory management according to the rules of this mechanism.

In protected mode, the highest bit of control register CR0, the PG bit, controls whether paging is in effect. If PG=1, paging is enabled and a page table lookup is required to convert a linear address to a physical address. If PG=0, paging is disabled and the linear address is used directly as the physical address.
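
The PG test is a single bit check. The sketch below treats CR0 as a plain 32-bit integer (real code would have to read the register in kernel mode); the names CR0_PG and paging_enabled are chosen for this example, and the sample CR0 values are made up.

```python
CR0_PG = 1 << 31  # the highest bit of the 32-bit CR0 register


def paging_enabled(cr0: int) -> bool:
    """Return True when the PG bit is set in a CR0 value (illustrative helper)."""
    return bool(cr0 & CR0_PG)


if __name__ == "__main__":
    print(paging_enabled(0x80000011))  # sample value with the top bit set
    print(paging_enabled(0x00000011))  # same low bits, top bit clear
```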

The rationale of paging is to divide memory into fixed-size units, each called a page; each page covers 4 KB of address space (to simplify the analysis, we do not consider extended paging). The start address of every page is therefore 4 KB aligned. To convert to a physical address, we have to give the CPU a lookup table from linear addresses to physical addresses for the current task, namely the page table. Note that, to give every task its own flat virtual memory, each task has its own page directory and page tables.

To save the memory occupied by page tables, x86 translates a linear address into a physical address through a two-level lookup: first the page directory, then the page table.

A 32-bit linear address is divided into 3 parts:

The highest 10 bits (Directory) are the index into the page directory, the middle 10 bits (Table) are the index into the page table, and the lowest 12 bits (Offset) are the byte offset within the physical page.
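
The three-way split can be written as shifts and masks. This is a minimal sketch of the 10 / 10 / 12 layout described above; the function name split_linear_address is invented for this example.

```python
def split_linear_address(addr: int) -> tuple:
    """Return (directory index, table index, page offset) for a 32-bit address."""
    if not 0 <= addr <= 0xFFFFFFFF:
        raise ValueError("a linear address is 32 bits wide")
    directory = (addr >> 22) & 0x3FF  # highest 10 bits
    table = (addr >> 12) & 0x3FF      # middle 10 bits
    offset = addr & 0xFFF             # lowest 12 bits
    return directory, table, offset


if __name__ == "__main__":
    d, t, o = split_linear_address(0x80495B0)
    print(d, t, hex(o))
```

Ten bits give 1024 possible values for each index, and twelve bits give 4096 possible offsets, which lines up with the 4 KB page size.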

The page directory is 4 KB in size (exactly one page) and contains 1024 entries of 4 bytes (32 bits) each; what each entry stores is the physical address of a page table. If the page table for a page directory entry has not been allocated yet, the physical address is filled with 0.

A page table is also 4 KB in size and likewise contains 1024 entries of 4 bytes each; each entry holds the physical start address of the final physical page.
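
The sizes above fit together arithmetically, as the short calculation below tries to show. The variable names are chosen for this sketch only; the numbers all come from the description of the page directory and page tables.

```python
ENTRY_SIZE = 4            # bytes per page directory or page table entry
ENTRIES_PER_TABLE = 1024  # entries in the page directory and in each page table
PAGE_SIZE = 4096          # bytes per page (4 KB)

# One table of 1024 four-byte entries is exactly one page.
table_size = ENTRY_SIZE * ENTRIES_PER_TABLE

# One page table maps 1024 pages, i.e. 4 MB of linear address space.
span_per_table = ENTRIES_PER_TABLE * PAGE_SIZE

# The page directory points to up to 1024 page tables, i.e. 4 GB in total.
total_span = ENTRIES_PER_TABLE * span_per_table

# Memory needed if every page table existed: 1024 tables plus the directory.
full_tables_bytes = ENTRIES_PER_TABLE * table_size + table_size

if __name__ == "__main__":
    print(table_size, span_per_table, total_span, full_tables_bytes)
```

A fully populated set of tables would cost a little over 4 MB per task, which suggests why leaving unused page directory entries at 0 saves memory.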

For each active task, a page directory must first be allocated and its physical address stored in the CR3 register. Page tables can be allocated in advance or allocated when they are first used.

Let us again take the address in mov 0x80495b0,%eax as an example and analyze how the linear address is converted to a physical address.

As said earlier, in Linux the logical address equals the linear address, so the linear address we want to convert is 0x80495b0. The conversion is performed automatically by the CPU; what Linux has to do is prepare the page directory and page tables needed for the conversion (assume here that the page directory and page tables have already been allocated physical memory; that allocation process is quite complex and will be analyzed later).

The kernel first fills the physical address of the page directory table of the current task into the CR3 register.

The linear address 0x80495b0 in binary is 0000 1000 0000 0100 1001 0101 1011 0000. The highest 10 bits, 0000 1000 00, are 32 in decimal, so the CPU looks at entry 32 of the page directory, which holds the physical address of the page table. The middle 10 bits, 00 0100 1001, are 73 in decimal; entry 73 of the page table holds the physical start address of the final physical page. Adding the lowest 12 bits of the linear address, the offset, to the physical page base address, the CPU finds the physical memory cell that the linear address finally corresponds to.
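
The walk just described can be simulated with ordinary Python lists. This is a simplified sketch, not how the hardware or Linux stores its tables: entries here hold bare physical addresses with 0 meaning "not assigned" (real entries also carry flag bits in their low 12 bits), and the physical addresses 0x00100000 and 0x00345000 are made up for the example, as is the function name translate.

```python
def translate(linear: int, page_directory: list, page_tables: dict) -> int:
    """Walk a two-level table and return the physical address.

    page_directory: 1024 entries, each the physical address of a page table.
    page_tables: maps a page table's physical address to its 1024 entries.
    """
    dir_index = (linear >> 22) & 0x3FF
    table_index = (linear >> 12) & 0x3FF
    offset = linear & 0xFFF

    table_phys = page_directory[dir_index]
    if table_phys == 0:
        raise LookupError("page directory entry is empty")

    frame_base = page_tables[table_phys][table_index]
    if frame_base == 0:
        raise LookupError("page table entry is empty")

    # Physical page base address plus the byte offset within the page.
    return frame_base + offset


# Build a tiny example: only directory entry 32 and table entry 73 are filled.
page_directory = [0] * 1024
page_table = [0] * 1024
page_directory[32] = 0x00100000  # made-up physical address of the page table
page_table[73] = 0x00345000      # made-up physical address of the page frame
page_tables = {0x00100000: page_table}

if __name__ == "__main__":
    print(hex(translate(0x80495B0, page_directory, page_tables)))
```

With these made-up table contents, the offset 0x5b0 is carried over unchanged into the physical address, while the upper 20 bits are replaced by whatever the tables say.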

We know that a user process in Linux can address linear addresses in the range 0-3 GB. Does that mean the page tables for all 3 GB of virtual memory must be set up in advance? In general, physical memory is much smaller than 3 GB, and many processes run at the same time, so it is impossible to build page tables covering 3 GB of linear addresses for every process in advance. Linux uses a mechanism of the CPU to solve this problem. After a process is created, the entries of its page directory can be filled with 0. When the CPU looks up an entry and finds that its content is 0, it raises a page fault exception and the process pauses. The Linux kernel then allocates a physical page through a series of complex algorithms and fills the address of the physical page into the table, and the process resumes execution. The process is of course kept in the dark throughout; as far as it can tell, it has been accessing physical memory normally.
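
The fill-on-fault idea can be modelled in a toy simulation. Everything here is an assumption made for illustration: the class name ToyMMU, the trivial bump allocator standing in for the kernel's "series of complex algorithms", and the starting frame address 0x00400000. It only shows the control flow of finding a 0 entry, allocating, filling the table, and continuing.

```python
PAGE_SIZE = 4096


class ToyMMU:
    """Toy model of demand paging; not how Linux implements it."""

    def __init__(self, first_free_frame: int = 0x00400000):
        self.page_directory = [0] * 1024  # 0 means "no page table yet"
        self.page_tables = {}             # directory index -> list of 1024 entries
        self.next_free_frame = first_free_frame
        self.page_faults = 0

    def _allocate_frame(self) -> int:
        # Stand-in for the kernel's allocator: hand out frames one after another.
        frame = self.next_free_frame
        self.next_free_frame += PAGE_SIZE
        return frame

    def translate(self, linear: int) -> int:
        dir_index = (linear >> 22) & 0x3FF
        table_index = (linear >> 12) & 0x3FF
        offset = linear & 0xFFF
        faulted = False

        if self.page_directory[dir_index] == 0:
            # Empty directory entry: allocate a page table on demand.
            faulted = True
            self.page_directory[dir_index] = self._allocate_frame()
            self.page_tables[dir_index] = [0] * 1024

        table = self.page_tables[dir_index]
        if table[table_index] == 0:
            # Empty table entry: allocate the physical page on demand.
            faulted = True
            table[table_index] = self._allocate_frame()

        if faulted:
            self.page_faults += 1  # count one fault per faulting access
        return table[table_index] + offset


if __name__ == "__main__":
    mmu = ToyMMU()
    print(hex(mmu.translate(0x80495B0)), mmu.page_faults)
    print(hex(mmu.translate(0x80495B4)), mmu.page_faults)
```

In this model the second access to the same page does not fault, because the first access already filled in the entries; the caller of translate never sees the difference, which is the sense in which the process is kept in the dark.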
