Operating System Learning Notes: Virtual Memory


I. Introduction

The various memory management policies of the operating system all serve the same purpose: to keep multiple processes in memory at the same time and so allow multiprogramming. However, these policies require that the entire process be in memory before it can execute. Dynamic loading mitigates this limitation, but it requires the programmer to apply it carefully and spend extra work.

Virtual memory, by contrast, allows a process to execute with only part of it in memory. One notable advantage is that a program can be larger than physical memory. Virtual memory also abstracts memory into one huge array, separating the logical memory the user sees from physical memory, so the programmer is not constrained by the limits of physical storage. In short, virtual memory presents the programmer with a memory space that is much larger and more contiguous than physical memory, while in fact it maps onto fragmented physical memory and even onto disk.



However, virtual memory is not easy to implement, and improper use can greatly reduce performance.

II. Demand Paging

1. Basic Concepts

A page is brought into memory only when it is needed.

This scheme requires hardware support to distinguish which pages are in memory and which are on disk, represented by a valid/invalid bit. When a page-table entry's bit is set to valid, the page is legal and in memory; when it is invalid, the page is either illegal, or legal but not currently in memory.

When a process tries to access a page that has not been paged into memory, a page-fault trap occurs. It is handled in the following steps (sketched in code below):

1) Check an internal table for the process, usually kept with the PCB, to determine whether the reference is legal.

2) If the reference is illegal, terminate the process; otherwise, page it in:

3) Find a free frame.

4) Schedule a disk operation to read the desired page into the newly allocated frame.

5) When the disk read completes, update the internal table and the page table (set the valid bit) to show that the page is now in memory.

6) Restart the instruction that was interrupted by the trap.
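
Below is a self-contained toy model of these six steps in C. Every type, size, and helper here is invented for illustration; a real kernel's fault path is far more involved:

#include <stdio.h>
#include <stdbool.h>

/* Toy model of demand-paging fault handling (illustrative only). */
#define NPAGES 8
#define NFRAMES 4

struct pte { bool valid; int frame; };
static struct pte page_table[NPAGES];
static bool frame_used[NFRAMES];

static int find_free_frame(void) {            /* step 3 */
    for (int f = 0; f < NFRAMES; f++)
        if (!frame_used[f]) { frame_used[f] = true; return f; }
    return -1;                                /* would trigger page replacement */
}

static void handle_fault(int page) {
    if (page < 0 || page >= NPAGES) {         /* steps 1-2: legality check */
        printf("illegal reference: terminate process\n");
        return;
    }
    int frame = find_free_frame();            /* step 3 */
    printf("disk read: page %d -> frame %d\n", page, frame); /* step 4 */
    page_table[page].frame = frame;           /* step 5: update page table */
    page_table[page].valid = true;
    printf("restart faulting instruction\n"); /* step 6 */
}

int main(void) {
    handle_fault(2);    /* legal page, brought in on demand */
    handle_fault(42);   /* out of range: illegal reference */
    return 0;
}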


2. Demand-Paging Performance

Keeping the page-fault rate low is critical for demand paging.
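
To see why, here is the standard effective-access-time formula (a textbook result, not stated in the original notes). With page-fault probability p, memory-access time t_ma, and page-fault service time t_fault:

    \text{EAT} = (1 - p) \cdot t_{ma} + p \cdot t_{fault}

Since t_fault is on the order of milliseconds while t_ma is on the order of nanoseconds (say 8 ms versus 200 ns), even a tiny p dominates the total, so the fault rate must be kept very small.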

Another consideration is the handling of swap space. Disk I/O to swap space is usually faster than to the file system, because swap space is allocated in large chunks and does not use file lookups or indirect allocation. Better performance can therefore be obtained by copying the entire file image into swap space when the process starts and paging from swap space thereafter.

Another option is to demand-page from the file system at first, but write replaced pages to swap space, so that subsequent page-ins read from swap space. This approach ensures that only the needed pages are ever read from the file system while still obtaining reasonable performance.


III. Copy-on-Write

Some processes, such as a child created by fork(), do not need demand paging at first; instead, the child initially shares pages with its parent. When the child needs to modify a page, the page is copied and the copy is modified. This is copy-on-write.
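
As a small illustration of the effect, here is a minimal sketch assuming a POSIX system (the kernel's copy-on-write itself is invisible to the program; we only observe its result):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/wait.h>

int main(void) {
    static char buf[4096];              /* about one page worth of data */
    strcpy(buf, "parent data");

    pid_t pid = fork();
    if (pid == 0) {                     /* child: the write forces a page copy */
        strcpy(buf, "child data");
        printf("child sees:  %s\n", buf);
        exit(0);
    }
    wait(NULL);
    printf("parent sees: %s\n", buf);   /* still "parent data" */
    return 0;
}

After the child's write, the kernel transparently gives the child its own copy of the shared page, so the parent's data is unchanged.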

When a page must be copied on a write, the question is where the free page comes from. Many operating systems provide a pool of free pages for this purpose.


IV. Page Replacement

Memory is sometimes over-allocated: a process may need more pages than the memory allocated to it, and memory does not hold process pages alone; I/O buffering also consumes a great deal of it. Demand for memory can thus outstrip supply on every front. When a process then takes a page fault, the operating system gets ready to bring in the required page, only to find there is no free frame to allocate.

1. Page Replacement

In this case, the operating system could terminate the faulting process, or swap out some unlucky process entirely. More often, it performs page replacement:

If there is no free frame, find a frame that is not currently being used, free it, and use it to hold the page the process faulted on (that is, the page to be brought in).

If the swapped-out page has been modified, it must also be written back to disk. Performance can be improved by keeping a modify bit (dirty bit): only dirty pages need to be written back.


Page replacement algorithms:

2. FIFO Page Replacement

The simplest page replacement algorithm: replace the oldest page. A FIFO queue manages all the pages in memory; the page at the head of the queue is replaced, and a newly paged-in page is added to the tail.

FIFO is easy to understand and implement, but its performance is not always good. The replaced page may still be in active use, causing a fault immediately after replacement and a request to bring it right back.
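
A minimal FIFO simulator, as a sketch (the reference string and frame count are arbitrary choices for illustration):

#include <stdio.h>
#include <string.h>

#define NFRAMES 3

int main(void) {
    int refs[] = {7, 0, 1, 2, 0, 3, 0, 4, 2, 3, 0, 3, 2};
    int nrefs = sizeof(refs) / sizeof(refs[0]);
    int frames[NFRAMES];
    int next = 0, faults = 0;           /* next: head of the FIFO, replaced first */
    memset(frames, -1, sizeof(frames)); /* -1 marks an empty frame */

    for (int i = 0; i < nrefs; i++) {
        int hit = 0;
        for (int j = 0; j < NFRAMES; j++)
            if (frames[j] == refs[i]) { hit = 1; break; }
        if (!hit) {                     /* fault: evict the oldest page */
            frames[next] = refs[i];
            next = (next + 1) % NFRAMES;
            faults++;
        }
    }
    printf("page faults: %d\n", faults);
    return 0;
}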


3. Optimal Replacement

Replace the page that will not be used for the longest time (not the page unused longest in the past, but the one predicted to remain unused longest in the future). This algorithm has the lowest possible page-fault rate.

The problem with this algorithm is that it is hard to implement: it requires future knowledge of the reference string.


4. LRU Page Replacement

An approximation of optimal replacement. The key difference between optimal replacement and FIFO is the time each looks at: FIFO uses the time a page was brought in, while optimal replacement cares about when a page will next be used in the future. If we use the recent past as an approximation of the near future, we can replace the page that has not been used for the longest time: guess the future from the past. This is the least-recently-used (LRU) algorithm.

LRU can be implemented with counters, or with a stack: whenever a page is used, move it to the top of the stack; unused pages sink toward the bottom, and the page at the bottom is the LRU victim.
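
A counter-based LRU sketch under the same toy setup as the FIFO example (illustrative only): each frame remembers the "time" of its last use, and the victim is the frame with the smallest timestamp:

#include <stdio.h>
#include <string.h>

#define NFRAMES 3

int main(void) {
    int refs[] = {7, 0, 1, 2, 0, 3, 0, 4, 2, 3, 0, 3, 2};
    int nrefs = sizeof(refs) / sizeof(refs[0]);
    int frames[NFRAMES], last_use[NFRAMES];
    int faults = 0;
    memset(frames, -1, sizeof(frames));
    memset(last_use, 0, sizeof(last_use));

    for (int t = 0; t < nrefs; t++) {
        int slot = -1;
        for (int j = 0; j < NFRAMES; j++)
            if (frames[j] == refs[t]) { slot = j; break; }
        if (slot < 0) {                          /* fault: pick the LRU victim */
            slot = 0;
            for (int j = 1; j < NFRAMES; j++)
                if (last_use[j] < last_use[slot]) slot = j;
            frames[slot] = refs[t];
            faults++;
        }
        last_use[slot] = t + 1;                  /* record the most recent use */
    }
    printf("page faults: %d\n", faults);
    return 0;
}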


5. Approximate LRU Page Replacement

Few computer systems provide enough hardware support for true LRU page replacement. Many systems, however, approximate it by means of a reference bit:

Each entry in the page table has an associated reference bit, and whenever the page is referenced, the hardware sets its reference bit.

Initially, all the reference bits are cleared to zero; as the process runs, many are set to 1. By examining the reference bits, we can tell which pages have been used and which have not. This information is the basis of the approximate LRU algorithms.

There are several approximate LRU replacement algorithms:

1) Additional-Reference-Bits Algorithm

In addition to its reference bit, each page keeps an 8-bit history byte, which the operating system updates at regular intervals: the page's reference bit is shifted into the high-order bit of the byte, the other bits shift right, and the original lowest bit is squeezed out. The page whose byte holds the smallest value is then the one to replace.
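
The periodic shift step can be written in a line or two. A hedged sketch (in a real system the reference bits would be read from, and cleared in, the page table by the OS):

#include <stdint.h>
#include <stdio.h>

/* One aging interval of the additional-reference-bits algorithm. */
void age_pages(uint8_t history[], uint8_t ref_bit[], int npages) {
    for (int i = 0; i < npages; i++) {
        /* shift the current reference bit into the high-order position */
        history[i] = (uint8_t)((history[i] >> 1) | (ref_bit[i] << 7));
        ref_bit[i] = 0;                 /* clear for the next interval */
    }
    /* the page with the smallest history value is the replacement candidate */
}

int main(void) {
    uint8_t history[2] = {0, 0};
    uint8_t ref[2] = {1, 0};            /* only page 0 was referenced */
    age_pages(history, ref, 2);
    printf("history: %02x %02x\n", history[0], history[1]); /* 80 00 */
    return 0;
}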


2) Second-Chance Algorithm

When a victim page is selected, check its reference bit. If it is 0, replace the page directly. If it is 1, give the page a second chance: clear its reference bit, reset its arrival time, and move on to the next candidate. A page given a second chance will not be replaced until all the other pages have been examined.
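
A sketch of the usual "clock" implementation of second chance: the frames are arranged in a circle and a hand sweeps over them (all names here are illustrative; the arrival-time reset is not modeled):

#include <stdio.h>

/* Sweep the circular frame list; a set reference bit buys one more pass. */
int pick_victim(int ref_bit[], int nframes, int *hand) {
    for (;;) {
        if (ref_bit[*hand] == 0) {          /* no recent use: evict it */
            int victim = *hand;
            *hand = (*hand + 1) % nframes;
            return victim;
        }
        ref_bit[*hand] = 0;                 /* second chance: clear and move on */
        *hand = (*hand + 1) % nframes;
    }
}

int main(void) {
    int ref[4] = {1, 1, 0, 1};
    int hand = 0;
    printf("victim frame: %d\n", pick_victim(ref, 4, &hand)); /* frame 2 */
    return 0;
}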


3) Enhanced Second-Chance Algorithm

Consider the reference bit and the modify bit together as an ordered pair:

(0,0) neither recently used nor modified: the best page to replace; take it without hesitation.

(0,1) not recently used but modified: must be written back to disk before replacement; think twice.

(1,0) recently used but not modified: likely to be used again soon.

(1,1) recently used and modified: likely to be used again soon, and must be written back to disk before replacement; the worst choice.

Replace the first page encountered in the lowest nonempty class.


6. Counting-Based Page Replacement

Keep a reference counter for each page; this yields two schemes:

1) Least-Frequently-Used (LFU) Algorithm

Replace the page with the smallest count, on the theory that an actively used page should have a large reference count. A problem arises: a page may be used heavily at first and then never again, yet its high count keeps it resident. A workaround is to shift each count register right by one bit at regular intervals, forming an exponentially decaying average usage count.


2) Most-Frequently-Used (MFU) Algorithm

Replace the page with the largest count, on the theory that the page with the smallest count was probably just brought in and has not yet been used.


7. Page-Buffering Algorithms

Keep a free frame buffer pool.

1) Maintain a list of modified pages. Whenever the paging device is idle, pick a modified page, write it to disk, and clear its modify bit. This scheme increases the number of clean pages and reduces the chance that a victim page must be written out at replacement time.

2) Keep a pool of free frames, but remember which page was in each frame. If a page is needed again before its old frame is reused, it can be reclaimed straight from the pool with no disk I/O.


8. Applications and Page Replacement

In some cases, an application that uses virtual memory through the operating system ends up worse off. Databases are an example: a database provides its own memory management and I/O buffering, and understands its own memory usage and disk usage far better than the operating system does. For this reason, some operating systems allow special programs to use the disk as a large array of logical blocks without going through the operating system's file system.


V. Frame Allocation

How should a fixed amount of free memory be divided among processes?

The simple approach is to keep all frames on a free-frame list and allocate one whenever a page fault occurs. When a process terminates, its frames are returned to the free-frame list.

Frame allocation is constrained in several ways. For example, the total allocated cannot exceed the number of available frames, and each process must receive at least some minimum number. One reason for guaranteeing the minimum is performance: more page faults slow down execution. Moreover, a page fault may occur before an instruction completes, forcing the instruction to be re-executed, so a process must hold enough frames for any single instruction to finish. Having enough frames therefore matters.

The minimum number of frames per process is determined by the architecture, and the maximum number is determined by the amount of available physical memory.

1. Frame Allocation Algorithms

1) Equal allocation: give every process the same number of frames.

2) Proportional allocation by process size (see the formula after this list).

3) Allocation by process priority.

4) A combination of size and priority.
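
For scheme 2), the standard proportional-allocation formula (textbook form, not spelled out in the original notes) is:

    a_i = \frac{s_i}{S} \times m

where s_i is the size of process p_i, S = \sum_i s_i is the total size of all processes, and m is the number of available frames. For example, with m = 62 frames and two processes of sizes 10 and 127, the allocations come out to about 4 and 57 frames.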


2. Global and Local Allocation

Global replacement allows a process to select a victim from the set of all frames, regardless of whether a frame is currently allocated to another process; that is, one process can take frames from another, for example a high-priority process taking frames from a low-priority one. Local replacement requires each process to select only from its own allocated frames.

Global replacement usually gives better throughput and is the more commonly used method.


VI. Thrashing

If a process does not have the frames it needs, it page-faults quickly and must replace some page. But if all its pages are in active use, the replaced page is needed again immediately and another fault follows at once. This constant, frequent page faulting is called thrashing.

Thrashing causes serious performance problems. The operating system watches CPU utilization at all times, and if utilization is too low, it introduces a new process into the system. Now suppose global replacement is in use, so pages are taken regardless of which process they belong to. A process needing more frames starts faulting and grabs frames from other processes; the robbed processes fault in turn and leave the ready queue to wait for the paging device. CPU utilization drops, the CPU scheduler notices and admits more processes, the newly admitted processes are starved for frames and grab them even more fiercely, the paging-device queue grows longer, CPU utilization falls further, and the scheduler responds by admitting still more processes...

In the end the processes are busy mainly with paging, and the system gets no work done.

Local replacement can limit thrashing, but it does not solve the problem completely: a thrashing process still ties up the paging device and slows down processes that are not thrashing.

1. Working Set Model

To prevent thrashing, a process must get as many frames as it needs. The operating system tracks each process's working set (the set of pages referenced in the most recent Δ memory references, where Δ is the working-set window) and allocates it at least that many frames. If frames are still left over, another process can be started. If the sum of the working sets of all processes exceeds the total number of available frames, one process is suspended and its frames are given to the others; the suspended process is restarted later. This is the working-set model. The difficulty lies in tracking the working set.

2. Page-Fault Frequency Policy

Besides the working-set model, another scheme for preventing thrashing is the page-fault frequency (PFF) policy.

If a process's page-fault frequency is too high, it needs more frames; if the frequency is too low, the process has frames to spare and can give some up. Set upper and lower bounds on the acceptable page-fault rate and allocate or reclaim frames dynamically to keep each process between them.

As with the working-set model, if a process needs more frames but none can be allocated, the process should be suspended and its frames released to other processes whose page-fault frequency is also high.


VII. Memory-Mapped Files

Typically, each file access requires a system call and a disk access. There is an alternative: use virtual memory techniques to treat file I/O as ordinary memory accesses, so that accessing a file looks just like accessing memory.

1. Basic mechanism

Map the file's disk blocks to one or more pages of memory. The first accesses cause page faults, which read the file's contents into physical memory. Thereafter, reads and writes of the file are handled as routine memory accesses, manipulating the file through memory operations rather than the read() and write() system calls.

Note that a write to the file may not reach the disk immediately: the data is written back when the dirty page is replaced, when the operating system checks periodically, or when the file is closed.
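
A minimal POSIX example of the mechanism (the file name notes.txt is hypothetical): the file is mapped, modified through a pointer, and flushed, with no read() or write() calls:

#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void) {
    int fd = open("notes.txt", O_RDWR);
    if (fd < 0) { perror("open"); return 1; }

    struct stat st;
    fstat(fd, &st);

    char *p = mmap(NULL, st.st_size, PROT_READ | PROT_WRITE,
                   MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) { perror("mmap"); return 1; }

    if (st.st_size > 0)
        p[0] = '#';                  /* write to the file as plain memory */

    msync(p, st.st_size, MS_SYNC);   /* force dirty pages back to disk */
    munmap(p, st.st_size);
    close(fd);
    return 0;
}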

If a file is shared by multiple processes, each process maps it into its own virtual memory, and the data is shared: a modification made by any process is visible to the others. Alternatively, the mapping may be copy-on-write, so each writer modifies its own copy. Mutex locks may also be used to coordinate access.

2. Shared memory of WIN32 API

A file on disk is placed into a process's virtual address space, and a region of that address space is created to "hold" the file; this region is called a file view (it lives in the process's virtual memory). The system creates a file mapping object (held in physical memory) to maintain the mapping. When multiple processes need to read and write the same file, their file views all map to the same file mapping object, which saves memory, keeps the data synchronized, and achieves data sharing.

3. Memory-Mapped I/O

By mapping an I/O device into memory, a program reads and writes the device simply by reading and writing that region of memory, with no need to operate the device directly. For example, if each point on the screen corresponds to a memory address, a program can control the display just by writing to memory.


VIII. Allocation of Kernel Memory

When a user-mode process needs additional memory, pages can be taken from the kernel's list of free page frames, and those frames may be scattered anywhere in physical memory. Kernel memory, however, is usually allocated from a separate free-memory pool, for two main reasons:

1) The kernel allocates memory for data structures of widely varying sizes, so it must be frugal and minimize fragmentation waste. This matters especially because the kernel code and data of many operating systems are not subject to the paging system.

2) Some hardware interacts directly with physical memory, without going through the virtual-memory interface, so the memory it uses must reside in physically contiguous pages.

Two methods are used to manage kernel memory:

1. Buddy System

Memory is allocated from a physically contiguous, fixed-size segment in power-of-2 sizes, such as 4 KB, 8 KB, and so on. The advantage is that adjacent "buddies" can quickly be coalesced into larger segments; the drawback is internal fragmentation, since every request is rounded up to the next power of 2.
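
A sketch of the power-of-2 rounding that causes this fragmentation (an illustrative helper, not a real allocator):

#include <stdio.h>

/* Buddy-system sizing: round a request up to the next power of two. */
static size_t buddy_round(size_t request) {
    size_t block = 1;
    while (block < request)
        block <<= 1;                 /* double until the request fits */
    return block;
}

int main(void) {
    /* a 33 KB request consumes a 64 KB block: ~31 KB wasted internally */
    printf("33 KB -> %zu KB block\n", buddy_round(33 * 1024) / 1024);
    return 0;
}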


2. Slab Allocation

Blocks of memory sized to match each kernel object's data structure are allocated in advance, waiting to be handed out when requested.

Specifically, each kernel object type has a cache, and the cache contains several slabs (blocks of memory of the appropriate size). A slab is in one of three states: full, empty, or partial. An allocation is satisfied from a partial slab first, then from an empty slab, and if neither suffices, a new slab is allocated from physically contiguous pages.

Advantages:

1) Block sizes exactly match the kernel objects' requirements, so there is no fragmentation.

2) Objects are prepared in advance, so requests are satisfied quickly.


IX. Other Considerations

1. Prepaging

A notable feature of pure demand paging is the large number of page faults when a process starts. The prepaging policy brings all the pages that will be needed into memory at once, before they are referenced. The key question is whether the cost of prepaging is less than the cost of servicing the page faults it avoids.


2. Page Size

Whether to use large pages or small pages is a genuine question.

1) Large pages reduce the size of the page table.

2) Small pages reduce internal fragmentation and make better use of memory.

3) For I/O, large pages look better, since seek and rotational latency dominate transfer time and one large transfer beats many small ones. But not necessarily: small pages track locality more precisely, so less unneeded data is brought in and total I/O may actually drop, which argues for small pages.

4) On the other hand, large pages reduce the number of page faults.

......

So, now you tell me: should we use large pages or small pages?


3. TLB Reach

The TLB speeds up memory access. Without it, every data fetch would need two memory accesses: one to the page table to obtain the physical address, and one to fetch the data itself.

The TLB holds only a small subset of the page-table entries. When translating a logical address to a physical one, the TLB is searched first: if the entry is found, the physical address is immediately at hand; if not, the relevant page-table entry is brought into the TLB (replacing another entry by some replacement algorithm), and then the physical address is formed.

It is important to increase the TLB hit ratio.

The hit ratio can be raised by increasing the number of TLB entries, but at a cost: the associative memory used to build the TLB is expensive and power-hungry. Another way is to increase the page size, or to provide multiple page sizes.
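
The quantity being increased here is the TLB reach (a standard textbook definition, not named in the original notes):

    \text{TLB reach} = (\text{number of TLB entries}) \times (\text{page size})

For example, 64 entries covering 4 KB pages give a reach of only 256 KB; raising the page size or supporting multiple page sizes increases the reach without adding expensive entries.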


4. Inverted Page Table

The inverted page table saves memory, but when a page referenced by a process is not in memory, an external page table (one per process) is still needed to locate it, since the inverted table holds entries only for pages currently in physical memory. Fortunately, the external table is consulted only on a page fault, so it need not be kept complete in memory and can itself be paged out.


5. Program Structure

We usually write programs without thinking about memory at all. But sometimes a little knowledge of memory behavior can improve performance:

For example, given a 128 × 128 two-dimensional array whose data is stored row by row, how do we traverse it with high performance?

int i, j;
int data[128][128];

If our outer loop runs over columns:

for (int j=0;j<128;j++) for
     (int i=0;i<128;i++)
           data[i][j] = 0;

If the page size is exactly 128 words, each row occupies one page, and the code above touches a different page on every iteration of the inner loop, modifying just one element per page touched. If the process is allocated fewer than 128 frames, this generates 128 × 128 = 16,384 page faults.

But if you write this:

for (int i=0;i<128;i++) for
     (int j=0;j<128;j++)
           data[i][j] = 0;


then all the elements on one page are modified before moving on to the next page, so only 128 page faults are generated in total.


6. I/O Interlock

Sometimes a page must be locked in memory.

Under global replacement, a process makes an I/O request, joins the waiting queue for the I/O device, and yields the CPU to other processes. Those processes take page faults, and with global replacement one of them replaces the page holding the waiting process's I/O buffer: the page is swapped out. When the I/O device finally reaches the request and transfers data to the specified address, the frame is now holding a different page belonging to another process.

There are usually two ways to solve this problem:

1) Never do I/O directly to user memory: to perform I/O, copy the data between user memory and system memory, and do I/O only on system memory. The extra copy makes the overhead high.

2) Give each physical frame a lock bit that allows the page to be locked in memory. A locked frame cannot be selected for replacement; when the I/O completes, the page is unlocked.

Lock bits have other uses as well: operating-system kernel pages are usually locked, and a newly loaded page of a low-priority process can stay locked until it has been used at least once, so it is not replaced before the process ever gets to run.
