Linux Virtual Memory


Virtual memory

Virtual memory is an elegant interaction of hardware exceptions, hardware address translation, main memory, disk files, and kernel software that provides each process with a large, uniform, and private address space. With one clean mechanism, virtual memory provides three important capabilities:

(1) It uses main memory efficiently by treating it as a cache for an address space stored on disk, keeping only the active areas in main memory and transferring data back and forth between disk and main memory as needed.

(2) It simplifies memory management by providing each process with a uniform address space.

(3) It protects the address space of each process from corruption by other processes.

Physical and virtual addressing

Physical addressing

The main memory of a computer system is organized as an array of M contiguous byte-sized cells. Each byte has a unique physical address (PA). The first byte has address 0, the next byte has address 1, the next has address 2, and so on. Given this simple organization, the most natural way for a CPU to access memory is to use physical addresses; this approach is known as physical addressing.

Virtual addressing

With virtual addressing, the CPU accesses main memory by generating a virtual address (VA), which is converted to the appropriate physical address before being sent to memory. The task of converting a virtual address to a physical address is known as address translation. Like exception handling, address translation requires close cooperation between the CPU hardware and the operating system. Dedicated hardware on the CPU chip called the memory management unit (MMU) translates virtual addresses on the fly, using a lookup table stored in main memory whose contents are managed by the operating system.


Address space

An address space is an ordered set of nonnegative integer addresses: {0, 1, 2, ...}.

If the integers in the address space are consecutive, we say it is a linear address space. In a system with virtual memory, the CPU generates virtual addresses from an address space of N = 2^n addresses, called the virtual address space: {0, 1, 2, ..., N-1}.

The size of an address space is characterized by the number of bits needed to represent its largest address. For example, a virtual address space with N = 2^n addresses is called an n-bit address space. Modern systems typically support 32-bit or 64-bit virtual address spaces.

A system also has a physical address space that corresponds to the M bytes of physical memory in the system: {0, 1, 2, ..., M-1}.

M is not required to be a power of 2, but to simplify the discussion we assume that M = 2^m.

The concept of an address space is important because it makes a clean distinction between data objects (bytes) and their attributes (addresses). Once we recognize this distinction, we can generalize and allow each data object to have multiple independent addresses, each chosen from a different address space. This is the basic idea of virtual memory: each byte of main memory has a virtual address chosen from the virtual address space and a physical address chosen from the physical address space.

Virtual memory as a tool for caching

Conceptually, virtual memory is organized as an array of N contiguous byte-sized cells stored on disk. Each byte has a unique virtual address that serves as an index into the array. The contents of the array on disk are cached in main memory. As with other caches in the memory hierarchy, the data on disk (the lower level) is partitioned into blocks that serve as transfer units between the disk and main memory (the upper level). The VM system handles this by partitioning virtual memory into fixed-size blocks called virtual pages (VPs). Each virtual page is P = 2^p bytes in size. Similarly, physical memory is partitioned into physical pages (PPs), also P bytes in size (physical pages are also called page frames).
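
As a concrete illustration, here is a minimal sketch (ordinary user code, not kernel code) of how a virtual address splits into a virtual page number and a page offset. It assumes 4 KiB pages (p = 12), and the address value is arbitrary.

    #include <stdint.h>
    #include <stdio.h>

    /* With a page size of P = 2^p bytes, the low p bits of a virtual
       address are the offset within the page and the remaining high bits
       are the virtual page number (VPN). We assume p = 12 (4 KiB pages). */
    #define PAGE_SHIFT 12
    #define PAGE_SIZE  (1UL << PAGE_SHIFT)   /* P = 4096 bytes */
    #define PAGE_MASK  (PAGE_SIZE - 1)

    int main(void)
    {
        uint64_t va = 0x7f3a12345678ULL;     /* an arbitrary virtual address */
        uint64_t vpn    = va >> PAGE_SHIFT;  /* which virtual page */
        uint64_t offset = va & PAGE_MASK;    /* byte within that page */

        printf("va=%#llx -> vpn=%#llx offset=%#llx\n",
               (unsigned long long)va,
               (unsigned long long)vpn,
               (unsigned long long)offset);
        return 0;
    }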

At any one time, the collection of virtual pages is divided into three disjoint subsets:

    • Unallocated: pages that the VM system has not yet allocated (or created). Unallocated pages have no data associated with them and therefore occupy no space on disk. (malloc or mmap has not been called for them.)
    • Cached: allocated pages that are currently cached in physical memory. (malloc or mmap has been called and the page has been referenced by the program.)
    • Uncached: allocated pages that are not cached in physical memory. (malloc or mmap has been called but the page has not yet been referenced.)

For example, suppose virtual pages 0 and 3 have not yet been allocated; they do not yet exist on disk. Virtual pages 1, 4, and 6 are cached in physical memory. Pages 2, 5, and 7 are allocated but not currently cached in main memory; they exist only on disk.

Page table

As with any cache, the virtual memory system must have some way to determine whether a virtual page is cached somewhere in DRAM. If so, the system must determine which physical page it is cached in. If there is a miss, the system must determine where the virtual page is stored on disk, select a victim page in physical memory, and copy the virtual page from disk into DRAM, replacing the victim.

These capabilities are provided by a combination of operating system software, address translation hardware in the MMU (memory management unit), and a data structure stored in physical memory known as a page table, which maps virtual pages to physical pages. A page table is an array of page table entries (PTEs).
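
The following is a toy model of that lookup, assuming a single-level page table indexed directly by virtual page number and 4 KiB pages; real page tables are multi-level and their entries carry many more fields, so the names here are purely illustrative.

    #include <stdbool.h>
    #include <stdint.h>

    /* Toy page table entry: a valid bit plus a physical page number. */
    struct pte {
        bool     valid;   /* is the virtual page currently cached in DRAM? */
        uint64_t ppn;     /* physical page number, meaningful only if valid */
    };

    #define PAGE_SHIFT 12
    #define PAGE_MASK  ((1UL << PAGE_SHIFT) - 1)

    /* Conceptual translation step: on a hit, combine the physical page
       number with the page offset; on a miss, a real MMU would raise a
       page fault and let the kernel page the data in from disk. */
    int translate(const struct pte *page_table, uint64_t va, uint64_t *pa)
    {
        uint64_t vpn = va >> PAGE_SHIFT;
        const struct pte *e = &page_table[vpn];

        if (!e->valid)
            return -1;    /* miss: would trigger a page fault */

        *pa = (e->ppn << PAGE_SHIFT) | (va & PAGE_MASK);
        return 0;
    }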

Linux Virtual memory system

Linux maintains a separate virtual address space for each process.

Kernel virtual memory contains the code and data structures in the kernel. Some areas of kernel virtual memory are mapped to physical pages that are shared by all processes; for example, every process shares the kernel's code and global data structures. Other areas of kernel virtual memory contain data that differs for each process: for example, the page tables, the stack that the kernel uses when executing code in the context of the process (the kernel stack), and the various data structures that record the current organization of the virtual address space.



1. Linux virtual memory areas (Windows has a similar concept, called regions)

Linux organizes virtual memory as a collection of areas (also called segments). An area is a contiguous chunk of existing (allocated) virtual memory whose pages are related in some way: for example, the code segment, data segment, heap, shared library segment, and user stack are all distinct areas. Every existing virtual page belongs to some area, and a virtual page that is not part of an area does not exist and cannot be referenced by the process. The notion of an area is important because it allows the virtual address space to have gaps: the kernel does not keep track of virtual pages that do not exist, and such pages do not consume memory, disk space, or any other kernel resources.

The kernel maintains a distinct task structure (task_struct in the source code) for each process in the system. The elements of the task structure either contain or point to all of the information the kernel needs to run the process (for example, the PID, a pointer to the user stack, the name of the executable object file, and the program counter).

One of the entries in task_struct points to an mm_struct that characterizes the current state of the virtual memory. The field pgd points to the base of the level-1 page table (the page global directory), and mmap points to a list of vm_area_struct (area structures), each of which characterizes an area of the current virtual address space. When the kernel runs this process, it stores pgd in the CR3 control register.

A specific area structure contains the following fields:

    • vm_start: Points to the beginning of the area.
    • vm_end: Points to the end of the area.
    • vm_prot: Describes the read/write permissions for all of the pages contained in the area.
    • vm_flags: Describes (among other things) whether the pages in the area are shared with other processes or private to this process.
    • vm_next: Points to the next area structure in the list.
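
To make these fields concrete, here is a minimal C sketch of simplified stand-ins for mm_struct and vm_area_struct, together with the list search that the page fault handler performs in step 1 below; these are illustrative types, not the actual kernel definitions.

    #include <stddef.h>

    struct vm_area {                 /* simplified stand-in for vm_area_struct */
        unsigned long   vm_start;    /* first address of the area */
        unsigned long   vm_end;      /* one past the last address of the area */
        unsigned long   vm_prot;     /* read/write/execute permissions */
        unsigned long   vm_flags;    /* shared with other processes, or private */
        struct vm_area *vm_next;     /* next area structure in the list */
    };

    struct mm {                      /* simplified stand-in for mm_struct */
        struct vm_area *mmap;        /* head of the list of area structures */
        void           *pgd;         /* base of the level-1 page table */
    };

    /* Walk the area list and check whether addr lies inside some area's
       [vm_start, vm_end) range; returning NULL corresponds to the
       segmentation-fault case in step 1 of the fault handler. */
    struct vm_area *find_area(struct mm *mm, unsigned long addr)
    {
        for (struct vm_area *a = mm->mmap; a != NULL; a = a->vm_next)
            if (addr >= a->vm_start && addr < a->vm_end)
                return a;            /* legal: the address belongs to this area */
        return NULL;                 /* not in any area */
    }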
2. Linux page fault exception handling

Suppose the MMU triggers a page fault while trying to translate a virtual address A. The exception transfers control to the kernel's page fault handler, which then performs the following steps:

1) Is virtual address A legal? In other words, does A lie within an area defined by some area structure? To answer this question, the fault handler searches the list of area structures, comparing A with the vm_start and vm_end of each one. If A is not legal, the fault handler triggers a segmentation fault (the virtual address is not mapped), which terminates the process.

2) Is the attempted memory access legal? In other words, does the process have permission to read, write, or execute the pages in this area? For example, was the page fault caused by a store instruction attempting to write to a read-only page in the code segment? Was it caused by a process running in user mode attempting to read a word from kernel virtual memory? If the attempted access is not legal, the fault handler triggers a protection exception, which terminates the process.

3) At this point, the kernel knows that the page fault resulted from a legal operation on a legal virtual address, and it handles the fault as follows: it selects a victim page, swaps the victim out if it has been modified, swaps the new page in, and updates the page table. When the fault handler returns, the CPU restarts the faulting instruction, which sends A to the MMU again. This time, the MMU translates A normally, without generating a page fault.

Memory mapping (Windows has a similar mechanism, also called memory mapping)

Linux (and some other forms of Unix) initializes the contents of a virtual memory area by associating the area with an object on disk, a process known as memory mapping. A virtual memory area can be mapped to one of two types of objects:

(1) A regular file in the Unix file system: an area can be mapped to a contiguous section of a regular disk file, such as an executable object file. The file section is divided into page-sized pieces, each containing the initial contents of one virtual page. Because pages are brought in on demand, these virtual pages do not actually occupy physical memory until the CPU first touches the page (that is, issues a virtual address that falls within that page's portion of the address space). If the area is larger than the file section, the rest of the area is filled with zeros.

(2) An anonymous file: an area can also be mapped to an anonymous file, created by the kernel, that contains all binary zeros. The first time the CPU touches a virtual page in such an area, the kernel finds an appropriate victim page in physical memory, swaps the victim out if it has been modified, overwrites the victim page with binary zeros, and updates the page table to mark the page as resident. Notice that no data is actually transferred between disk and memory; for this reason, pages in areas mapped to anonymous files are sometimes called demand-zero pages.

In either case, once a virtual page is initialized, it is swapped back and forth between memory and a dedicated swap file maintained by the kernel. The swap file is also known as the swap space or the swap area.
An important point to note: at any moment, the swap space bounds the total number of virtual pages that can be allocated by the currently running processes.

Look at shared objects again

An object can be mapped into an area of virtual memory either as a shared object or as a private object. If a process maps a shared object into an area of its virtual address space, then any writes the process makes to that area are visible to any other process that has also mapped the shared object into its virtual memory, and the changes are also reflected in the original object on disk. (This is one mechanism for interprocess communication.) On the other hand, changes made to an area mapped to a private object are not visible to other processes, and any writes the process makes to the area are not reflected in the object on disk. A virtual memory area into which a shared object is mapped is called a shared area; similarly, there are private areas.

The key point about shared objects is that even if an object is mapped into multiple shared areas, physical memory needs to hold only one copy of it.

(Figure: a shared object. Note that the physical pages backing it need not be contiguous.)
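
To illustrate the IPC point above, here is a minimal sketch (names and sizes arbitrary) in which a parent and child communicate through a shared anonymous mapping created before fork; with MAP_PRIVATE instead of MAP_SHARED, the parent would not see the child's write.

    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void)
    {
        /* One page of shared, demand-zero memory, visible to both processes. */
        char *buf = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                         MAP_SHARED | MAP_ANONYMOUS, -1, 0);
        if (buf == MAP_FAILED) {
            perror("mmap");
            return 1;
        }

        if (fork() == 0) {                  /* child writes into the shared area */
            strcpy(buf, "hello from the child");
            return 0;
        }

        wait(NULL);                         /* parent waits, then reads the change */
        printf("parent sees: %s\n", buf);
        munmap(buf, 4096);
        return 0;
    }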


Private objects are mapped into virtual memory using a clever technique known as copy-on-write. For each process that maps a private object, the page table entries for the corresponding private area are flagged as read-only, and the area structure is flagged as private copy-on-write.


Look at the fork function again

When the current process calls the fork function, the kernel creates various data structures for the new process and assigns it a unique PID. To create virtual memory for the new process, it makes exact copies of the current process's mm_struct, area structures, and page tables. It flags every page in both processes as read-only and flags every area structure in both processes as private copy-on-write.

When fork returns in the new process, the new process has an exact copy of the virtual memory that existed when fork was called. When either process later performs a write, the copy-on-write mechanism creates a new page, thereby preserving the abstraction of a private address space for each process.
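
A minimal sketch of this behavior (the global variable and its values are arbitrary): the child's write triggers a copy-on-write fault on its own copy of the page, so the parent never sees the change.

    #include <stdio.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int x = 1;                          /* lives in a private copy-on-write area */

    int main(void)
    {
        if (fork() == 0) {              /* child */
            x++;                        /* first write: copy-on-write makes a new page */
            printf("child:  x = %d\n", x);   /* prints 2 */
            return 0;
        }
        wait(NULL);
        printf("parent: x = %d\n", x);  /* still 1: the child's write was private */
        return 0;
    }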

Look at the execve function again.

Assume that a program running in the current process makes the following call:

execve("a.out", NULL, NULL);

The execve function loads and runs the program contained in the executable object file a.out within the current process, effectively replacing the current program with the a.out program. Loading and running a.out requires the following steps:

    • Delete the existing user areas. Delete the existing area structures in the user portion of the current process's virtual address space.
    • Map the private areas. Create new area structures for the text, data, bss, and stack areas of the new program. All of these new areas are private copy-on-write. The text and data areas are mapped to the text and data sections of the a.out file. The bss area is demand-zero, mapped to an anonymous file whose size is given in a.out. The stack and heap areas are also demand-zero.
    • Map the shared areas. If the a.out program was linked with shared objects, such as the standard C library libc.so, then these objects are dynamically linked into the program and then mapped into the shared area of the user's virtual address space.
    • Set the program counter (PC). The last thing execve does is set the program counter in the current process's context to point to the entry point in the text area.
The next time the process is scheduled, it will begin execution at this entry point. Linux swaps in code and data pages as they are needed.

User-level memory mapping with the mmap function

Linux processes can use the mmap function to create new areas of virtual memory and to map objects into these areas:
    #include <unistd.h>
    #include <sys/mman.h>

    void *mmap(void *start, size_t length, int prot, int flags, int fd, off_t offset);
    /* Returns: pointer to the mapped area if OK, MAP_FAILED (-1) on error */

The mmap function asks the kernel to create a new virtual memory area, preferably one starting at address start, and to map a contiguous chunk of the object specified by file descriptor fd into that new area. The chunk is length bytes long and begins at offset bytes from the start of the file. The start address is merely a hint and is usually specified as NULL.




The munmap function deletes areas of virtual memory:

    #include <unistd.h>
    #include <sys/mman.h>

    int munmap(void *start, size_t length);
    /* Returns: 0 if OK, -1 on error */
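
As a brief usage sketch of these two functions (the file name is illustrative): map an existing file read-only, touch its first byte through the mapping, then delete the mapping.

    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = open("data.txt", O_RDONLY);     /* any existing, non-empty file */
        if (fd < 0) { perror("open"); return 1; }

        struct stat st;
        if (fstat(fd, &st) < 0 || st.st_size == 0) { close(fd); return 1; }

        /* Map the whole file, read-only and private, at an address chosen
           by the kernel (start = NULL). */
        char *p = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
        if (p == MAP_FAILED) { perror("mmap"); close(fd); return 1; }

        printf("first byte of the file: %c\n", p[0]);

        munmap(p, st.st_size);                   /* delete the mapped area */
        close(fd);
        return 0;
    }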
Summary

Virtual memory is an abstraction of main memory. Processors that support virtual memory reference main memory using a form of indirection known as virtual addressing: the processor generates a virtual address, which is translated into a physical address before being sent to main memory. Translating addresses from a virtual address space to a physical address space requires close cooperation between hardware and software. Dedicated hardware translates virtual addresses using page tables whose contents are supplied by the operating system.

Virtual memory provides three important capabilities. First, it automatically caches the most recently used contents of the virtual address space, which is stored on disk, in main memory. The block in a virtual memory cache is called a page. A reference to a page that resides on disk triggers a page fault, which transfers control to a fault handler in the operating system. The fault handler copies the page from disk into the main memory cache, writing back the evicted page if necessary. Second, virtual memory simplifies memory management, which in turn simplifies linking, sharing data between processes, allocating memory to processes, and loading programs. Finally, virtual memory simplifies memory protection by adding protection bits to each page table entry.

The process of address translation must be integrated with the operation of any hardware caches in the system. Most page table entries are located in the L1 cache, but an on-chip cache of page table entries called the TLB usually eliminates even the cost of accessing page table entries in L1.

Modern systems initialize chunks of virtual memory by associating them with chunks of files on disk, a process known as memory mapping. Memory mapping provides an efficient mechanism for sharing data, creating new processes, and loading programs. Applications can use the mmap function to manually create and delete areas of the virtual address space. However, most programs rely on a dynamic memory allocator such as malloc, which manages an area of the virtual address space called the heap. A dynamic memory allocator is an application-level program, but it feels more like a system-level program because it operates on memory directly, without much help from the type system. There are two kinds of allocators: explicit allocators require applications to explicitly free their memory blocks, while implicit allocators (garbage collectors) automatically free any unused blocks that are unreachable.
