Operating System Optimization Based on Multi-Core Platforms


I. Development of Multi-Core and Multi-Thread Technology


1) The speed gap between memory and the processor forces the CPU to waste time stalled on memory accesses while it waits for data. There are two basic physical remedies: enlarge the cache and raise the clock frequency.

However, cache is expensive and subject to physical limits, and while a higher clock frequency lets more operations complete in the same time, it also brings problems of its own: dependencies between instructions and the relative latency of memory both grow.

2) As a result, the goal became not just to push past physical limits for raw speed, but to increase throughput through parallelism in space and time. Users care most about the response time of interactive programs, while administrators care about the number of tasks completed per unit time; ultimately both goals converge on completing the most tasks in the shortest time.


Since the original bottleneck is that the processor blocks waiting on memory accesses and wastes hardware resources, the CPU can do other work during this latency, for example fetch instructions from another instruction stream and run a ready process or thread.
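The same principle can be illustrated in software with POSIX threads: while one thread is stalled on a long-latency operation, another ready thread keeps the CPU busy. The sketch below is only a software analogy of the hardware mechanism; the sleep() call stands in for a memory stall.

    #include <pthread.h>
    #include <stdio.h>
    #include <unistd.h>

    /* A thread stalled on a long-latency operation; sleep() stands in
     * for a memory or I/O wait. */
    static void *stalled(void *arg) {
        (void)arg;
        sleep(1);                          /* the simulated stall */
        puts("stalled thread: data arrived");
        return NULL;
    }

    /* A ready instruction stream that keeps the CPU busy meanwhile. */
    static void *worker(void *arg) {
        (void)arg;
        volatile unsigned long sum = 0;
        for (unsigned long i = 0; i < 100000000UL; i++)
            sum += i;                      /* useful work during the stall */
        printf("worker thread: sum = %lu\n", sum);
        return NULL;
    }

    int main(void) {
        pthread_t t1, t2;
        pthread_create(&t1, NULL, stalled, NULL);
        pthread_create(&t2, NULL, worker, NULL);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        return 0;
    }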


II. Problems Arising from the Differences Between Multi-Core/Multi-Thread Programs and Traditional Programs


1. Programs written for a traditional system can run on a CMT system with few changes, but some threads will inevitably interfere with the execution of other threads. How can this interference be handled?

2. What should be done when load balancing conflicts with keeping the cache hot? For example, grouping threads on one core to reduce resource contention elsewhere leaves that core with a high load.

3. When should a process, and the threads it contains, be scheduled?

4. How does multi-core scheduling cooperate with multi-thread scheduling in the kernel? How is a task scheduled onto a core, and how is a thread then scheduled within that core?

5. When there are few tasks, should only some CPUs be kept busy, avoiding the redundant work of scheduling and migration, or should the load still be balanced across all of them?


III. Solaris Multi-Core Support
1) Thread Architecture in Solaris

A. Threads divide into user level and kernel level. User-level thread operations can be handled either inside or outside the kernel, but handling them outside the kernel carries context-switch overhead of its own. The kernel is better placed to use the CPUs concurrently, and it can also act as the scheduler for multiple user threads.
B. Every process needs at least one thread as the body that executes its instructions, and it is threads that are dispatched onto the CPUs. A user-level thread runs by being bound to an LWP (lightweight process), which is in turn associated with a kernel thread (see the sketch after this list). An LWP is not necessarily created along with its process; it can be created on demand when first needed. User-level threads are schedulable entities with independent priorities, managed separately from kernel priority scheduling and invisible to the kernel.
An LWP exists in the kernel and records thread state, but a kernel-level thread does not necessarily have an LWP; kernel service threads, for example, run without one.
C. User-level threads are managed by a scheduler thread that the thread library maintains for each process.
Each thread is linked both into the list of the process it belongs to and into the list of kernel-level threads; the kernel-level thread determines which CPU the thread runs on and when it is scheduled.
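On Solaris, this bound/unbound distinction is visible through the POSIX contention-scope attribute: a PTHREAD_SCOPE_SYSTEM thread is permanently bound to an LWP and scheduled by the kernel, while a PTHREAD_SCOPE_PROCESS thread is multiplexed onto LWPs by the user-level library. A minimal sketch:

    #include <pthread.h>
    #include <stdio.h>

    static void *run(void *name) {
        printf("%s running\n", (const char *)name);
        return NULL;
    }

    int main(void) {
        pthread_attr_t attr;
        pthread_t bound, unbound;

        pthread_attr_init(&attr);

        /* Bound thread: tied to its own LWP and scheduled by the kernel. */
        pthread_attr_setscope(&attr, PTHREAD_SCOPE_SYSTEM);
        pthread_create(&bound, &attr, run, "bound (system scope)");

        /* Unbound thread: multiplexed onto LWPs by the user-level library,
         * with its priority managed in user space, invisible to the kernel.
         * Some platforms support only system scope and return an error here. */
        if (pthread_attr_setscope(&attr, PTHREAD_SCOPE_PROCESS) != 0 ||
            pthread_create(&unbound, &attr, run, "unbound (process scope)") != 0)
            puts("process scope unsupported on this platform");
        else
            pthread_join(unbound, NULL);

        pthread_join(bound, NULL);
        pthread_attr_destroy(&attr);
        return 0;
    }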

2) Solaris uses scheduling classes to define the rules by which the kernel schedules and executes processes. There are six classes in total (TS, IA, FSS, FX, SYS, and RT); threads of different classes can be scheduled on the same core, and the mix can adapt over time. Each class keeps its own list from which threads are selected.
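Solaris exposes these classes through the priocntl(1) command and priocntl(2) system call. As a portable illustration of the same idea, the POSIX API below distinguishes scheduling policies (SCHED_OTHER, SCHED_RR, SCHED_FIFO), each with its own priority range, much as each Solaris class has its own dispatch table; this is a POSIX analogue, not the priocntl interface itself.

    #include <sched.h>
    #include <stdio.h>

    static void show(const char *name, int policy) {
        /* Each policy, like each Solaris class, has its own priority range. */
        printf("%-11s priority range: %d..%d\n", name,
               sched_get_priority_min(policy),
               sched_get_priority_max(policy));
    }

    int main(void) {
        show("SCHED_OTHER", SCHED_OTHER);   /* timeshare-like policy  */
        show("SCHED_RR",    SCHED_RR);      /* round-robin real-time  */
        show("SCHED_FIFO",  SCHED_FIFO);    /* fixed-priority real-time */

        /* Switching policies is the POSIX analogue of moving a process
         * between scheduling classes. */
        struct sched_param sp = { .sched_priority = 0 };
        if (sched_setscheduler(0, SCHED_OTHER, &sp) != 0)
            perror("sched_setscheduler");
        return 0;
    }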

3) Load Balancing)
In order to reduce unnecessary scheduling between each core, each core has its own scheduling queue, and Its thread can re-assign the time slice size, and then return it to the queue after use.
The kernel can seize other time slices and multiple scheduling classes at the same time. However, an indirect function call scheduling code is required for each decision.
There is a kernel abstraction that represents the logical CPI and shares physical resources such as cache and sockets interfaces.
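A toy sketch of the per-core queue idea (all names here are hypothetical, for illustration only): each core owns its queue, a dequeued thread receives a fresh time slice, and it is re-enqueued when the slice is exhausted, so no cross-core scheduling is needed.

    #include <stdio.h>

    #define NCPU 4
    #define QLEN 8

    /* Hypothetical model: one circular run queue per core. */
    struct thread { int id; int timeslice; };
    struct runq   { struct thread *q[QLEN]; int head, tail; };

    static struct runq cpu_queue[NCPU];

    static void enqueue(int cpu, struct thread *t) {
        struct runq *rq = &cpu_queue[cpu];
        rq->q[rq->tail++ % QLEN] = t;
    }

    static struct thread *dequeue(int cpu) {
        struct runq *rq = &cpu_queue[cpu];
        if (rq->head == rq->tail)
            return NULL;                    /* queue empty */
        struct thread *t = rq->q[rq->head++ % QLEN];
        t->timeslice = 10;                  /* fresh slice on dispatch */
        return t;
    }

    int main(void) {
        struct thread a = { 1, 0 }, b = { 2, 0 };
        enqueue(0, &a);     /* both threads stay on core 0's queue, */
        enqueue(0, &b);     /* so no cross-core scheduling occurs   */
        struct thread *t;
        while ((t = dequeue(0)) != NULL)
            printf("core 0 runs thread %d with slice %d\n",
                   t->id, t->timeslice);
        return 0;
    }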

4) Hot-Cache Mechanism (making full use of data already in the cache)
Threads that share data are placed on the same logical CPU;
threads whose working sets can share cache contents are likewise placed on the same logical CPU.
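On Solaris, an application can cooperate with this mechanism by binding threads that share data to one processor with processor_bind(2). A minimal sketch, assuming logical CPU 0 exists and is online:

    #include <sys/types.h>
    #include <sys/processor.h>
    #include <sys/procset.h>
    #include <stdio.h>

    int main(void) {
        processorid_t old;

        /* Bind the calling LWP to logical CPU 0 so threads that share
         * data can stay on the same (cache-warm) processor.  CPU id 0
         * is an assumption; a real program would query available CPUs. */
        if (processor_bind(P_LWPID, P_MYID, 0, &old) != 0) {
            perror("processor_bind");
            return 1;
        }
        printf("bound to CPU 0 (previous binding: %d)\n", (int)old);
        return 0;
    }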


IV. Scheduling in Linux 2.6 and Later

1) Support for SMP and Load Balancing
When a task is created, it is placed on the run queue of a given logical CPU. Since the task's future running time is unknown, this initial placement may turn out badly, so tasks can be redistributed: runnable processes or threads on a heavily loaded CPU are rescheduled onto a more lightly loaded CPU until the load is balanced.
In practice, roughly every 200 ms the scheduler checks whether the CPU loads are balanced, and redistributes tasks if they are not.
The obvious downside is that a migrated task must reload its data into the destination CPU's cold cache.
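On Linux, a program can override the balancer's placement and pin itself to a CPU with sched_setaffinity(2), trading load balancing for a warm cache. A minimal sketch, pinning to CPU 0:

    #define _GNU_SOURCE
    #include <sched.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void) {
        cpu_set_t mask;

        /* Restrict this process to CPU 0, preventing the periodic
         * balancer from migrating it (and cooling its cache). */
        CPU_ZERO(&mask);
        CPU_SET(0, &mask);
        if (sched_setaffinity(0, sizeof(mask), &mask) != 0) {
            perror("sched_setaffinity");
            return 1;
        }
        printf("pid %d pinned to CPU 0\n", (int)getpid());
        return 0;
    }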

2) Hot Cache
Each CPU has its own run queue, and each task stays closely tied to its CPU, which makes better use of the hot cache: the data a running task needs is already in that CPU's cache and can be read directly. The cache is on-chip memory, which raises access speed and reduces latency.
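This affinity can be observed with the glibc call sched_getcpu(3): on a lightly loaded system, repeated calls from the same task usually report the same CPU, which is exactly the placement the 2.6 scheduler tries to preserve. A minimal check:

    #define _GNU_SOURCE
    #include <sched.h>
    #include <stdio.h>

    int main(void) {
        /* Report which CPU's run queue (and cache) the task is using.
         * Staying on one CPU keeps its cached data hot. */
        for (int i = 0; i < 5; i++)
            printf("iteration %d on CPU %d\n", i, sched_getcpu());
        return 0;
    }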

