Implementing an SMP system requires cooperation between hardware and software. On the hardware side, the CPUs that make up an SMP system must support inter-processor communication, and the hardware must provide a mechanism that keeps cache contents consistent across CPUs. On the software side, the operating system must work with the hardware to schedule processes across the CPUs and to handle external interrupts.
1. Synchronization and mutual exclusion between processors
Inter-process synchronization ultimately comes down to mutually exclusive operations on critical resources. On a single-processor system, mutual exclusion is guaranteed as long as no process scheduling can occur during the operation on the critical resource, and either no interrupt occurs or any interrupt that does occur does not touch the object being operated on. Even in extreme cases (for example, when interrupts may not be disabled), mutual exclusion is still guaranteed if the operation on the critical resource completes in a single instruction, because an interrupt can occur only between instructions, never in the middle of one.
In general, mutual exclusion is guaranteed whenever operations on the critical resource are atomic. Single-processor systems rely on exactly this: an operation that completes in a single instruction is treated as an "atomic operation". In an SMP system, however, multiple processors run independently, so even a single-instruction operation can be disturbed by another processor. Compared with a single-processor system, SMP therefore requires a finer "resolution" for mutually exclusive operations, and some operations that are atomic on a single processor are no longer atomic under SMP.
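The loss of atomicity described above can be illustrated with a small deterministic simulation. This is only a sketch: `sim_cpu`, `step`, and `run_schedule` are hypothetical names, and the three-step load/add/store split is a simplified model of how a read-modify-write instruction is carried out, not a description of any specific CPU.

```c
#include <assert.h>

/* Micro-operations of one "increment memory" instruction. */
enum { LOAD, ADD, STORE, DONE };

struct sim_cpu {
    int reg;   /* private register of this simulated CPU */
    int next;  /* next micro-operation to execute */
};

/* Execute one micro-operation of the increment on the shared word. */
static void step(struct sim_cpu *c, int *mem)
{
    switch (c->next) {
    case LOAD:  c->reg = *mem; break;
    case ADD:   c->reg += 1;   break;
    case STORE: *mem = c->reg; break;
    default:    return;        /* increment already finished */
    }
    c->next++;
}

/*
 * Two simulated CPUs, A and B, each increment a shared word once.
 * Each letter in `schedule` lets that CPU run one micro-operation.
 * Returns the final value of the shared word.
 */
int run_schedule(const char *schedule)
{
    int mem = 0;
    struct sim_cpu a = { 0, LOAD }, b = { 0, LOAD };

    for (const char *p = schedule; *p; p++) {
        assert(*p == 'A' || *p == 'B');
        step(*p == 'A' ? &a : &b, &mem);
    }
    return mem;
}
```

With the schedule "AAABBB" each increment runs to completion before the other starts, and both updates survive. With "ABABAB" both CPUs load the old value before either stores, so one update is lost.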
Solution: A single read or a single write is inherently atomic. The problem lies with instructions that both read and write memory and therefore need two or more micro-operations to complete. For these, the i386 CPU provides a way to lock the bus for the duration of an instruction. The CPU chip has a LOCK pin; if a "lock" prefix is placed before an instruction in assembly code, the resulting machine code causes the CPU to pull the LOCK pin low while executing that instruction, thereby locking the bus. Other CPUs on the same bus then cannot access memory through the bus until the instruction completes. There is one special case: when the CPU executes xchg with a memory operand, it locks the bus automatically, without an explicit "lock" prefix in the program. The xchg instruction swaps the contents of a memory location with the contents of a register, so it is often used for operations on kernel semaphores.
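The atomic swap described above is the usual building block for a simple lock. The following is a minimal sketch using portable C11 atomics rather than i386 assembly; the `demo_spin_*` names are hypothetical and this is not the kernel's actual lock implementation. On x86, compilers typically implement `atomic_exchange` with an xchg instruction, but that is a property of the compiler and target, not something the code itself guarantees.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

typedef struct {
    atomic_int locked;   /* 0 = free, 1 = held */
} demo_spinlock;

void demo_spin_init(demo_spinlock *l)
{
    atomic_init(&l->locked, 0);
}

/*
 * Atomically store 1 and fetch the previous value in one indivisible step.
 * If the previous value was 0, the caller now owns the lock.
 */
bool demo_spin_trylock(demo_spinlock *l)
{
    return atomic_exchange_explicit(&l->locked, 1, memory_order_acquire) == 0;
}

void demo_spin_lock(demo_spinlock *l)
{
    while (!demo_spin_trylock(l))
        ;   /* busy-wait until the current holder releases the lock */
}

void demo_spin_unlock(demo_spinlock *l)
{
    int was = atomic_exchange_explicit(&l->locked, 0, memory_order_release);
    assert(was == 1);   /* releasing a lock that is not held is a caller bug */
    (void)was;
}
```

Because the read of the old value and the write of the new value cannot be separated, two CPUs that race for the lock can never both observe it as free.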
2. Consistency between cache and memory
In an SMP system, caching is more complicated than in a single-processor system, because a CPU does not know when other CPUs will change the contents of memory. A cache can handle writes in two modes. In "write-through" mode, the cache might as well not exist for writes: every write goes directly to memory, and the cache is in effect used only for reads, so efficiency is relatively low. In "write-back" mode, a write goes to the cache first; the cache hardware later writes the cache line back to memory automatically when the line is replaced, or software actively "flushes" the relevant cache lines. Therefore, after changing the contents of a buffer page and before starting a DMA operation that writes it to disk, you must first flush the relevant cache lines, because the changed contents may not yet have been written back to the buffer in memory.
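The difference between the two write modes, and why a flush is needed before DMA, can be sketched with a toy software model. Everything here is an assumption made for illustration: `toy_cache`, `cache_write`, `cache_flush`, and `dma_read` are hypothetical names, and each memory word is treated as its own cache line.

```c
#include <assert.h>
#include <stdbool.h>

#define MEM_WORDS 8

enum write_policy { WRITE_THROUGH, WRITE_BACK };

struct toy_cache {
    enum write_policy policy;
    int  memory[MEM_WORDS];   /* main memory: what a DMA device sees */
    int  line[MEM_WORDS];     /* cached copy, one line per word */
    bool valid[MEM_WORDS];
    bool dirty[MEM_WORDS];    /* line modified but not yet written back */
};

/* CPU write: always updates the cache, memory only in write-through mode. */
void cache_write(struct toy_cache *c, int addr, int value)
{
    assert(addr >= 0 && addr < MEM_WORDS);
    c->line[addr] = value;
    c->valid[addr] = true;
    if (c->policy == WRITE_THROUGH)
        c->memory[addr] = value;
    else
        c->dirty[addr] = true;
}

/* Software flush: write every dirty line back to memory. */
void cache_flush(struct toy_cache *c)
{
    for (int i = 0; i < MEM_WORDS; i++) {
        if (c->valid[i] && c->dirty[i]) {
            c->memory[i] = c->line[i];
            c->dirty[i] = false;
        }
    }
}

/* A DMA device bypasses the cache and reads memory directly. */
int dma_read(const struct toy_cache *c, int addr)
{
    assert(addr >= 0 && addr < MEM_WORDS);
    return c->memory[addr];
}
```

In write-back mode, `dma_read` returns the stale value until `cache_flush` has run, which is the situation a driver must avoid before starting a transfer to disk.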
Intel CPUs, starting with the P6 family (Pentium Pro), provide "memory type range registers" (MTRRs). With these registers, different address ranges of memory can be configured as cached or uncached, and as using write-through or write-back mode for writes.
Caching can also change the order of memory operations. Imagine two observers: one watches the order in which the cache inside the CPU is accessed, the other watches the order in which memory is accessed. The two orders may differ considerably. The former is the order written in the program, called "program ordering". The latter is the order actually seen outside the processor, that is, on the system bus, called "processor ordering". Without a cache the two are identical; with a cache, it depends on the specific situation and operations. If processor ordering always matches program ordering, the system is said to have "strong ordering"; if processor ordering sometimes differs from program ordering, it has "weak ordering". On a single-processor system the difference does not matter, but on an SMP system it can become a problem.
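Where ordering matters between CPUs, software has to state the required order explicitly. The following is a minimal sketch using C11 release/acquire atomics; the `mailbox` type and its functions are hypothetical, and C11 atomics are used here as a portable stand-in rather than as the mechanism the kernel itself uses.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct mailbox {
    int payload;         /* ordinary data written by the producer */
    atomic_bool ready;   /* flag that publishes the payload */
};

/* Producer: write the data, then set the flag. */
void mailbox_publish(struct mailbox *m, int value)
{
    assert(m);
    m->payload = value;
    /* release: the payload store may not become visible after the flag store */
    atomic_store_explicit(&m->ready, true, memory_order_release);
}

/* Consumer: returns true and fills *out only once the flag has been observed. */
bool mailbox_try_consume(struct mailbox *m, int *out)
{
    assert(m && out);
    /* acquire: the payload load may not be performed before the flag load */
    if (!atomic_load_explicit(&m->ready, memory_order_acquire))
        return false;
    *out = m->payload;
    return true;
}
```

Without the release/acquire pair, a weakly ordered system would be allowed to make the flag visible before the payload, and a consumer on another CPU could read stale data.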
On a single-processor system, DMA operations are started by the device driver, so the driver knows when to discard cache line contents and when to flush them. In an SMP system, however, each CPU may change memory contents asynchronously. Each CPU knows only when it changes memory itself; it does not know when other CPUs change memory, nor whether its local cache has become inconsistent with memory. Conversely, each CPU's own writes to memory may leave the caches of other CPUs inconsistent.
Solution: Only cached data raises a consistency problem, because instructions are generally read-only and are not modified while running. In the Pentium CPU, Intel provides a mechanism that automatically keeps cached data consistent with memory, called "snooping". Each CPU has dedicated hardware that, once the cache is enabled, constantly monitors memory operations on the system bus. Because every memory operation must go through the system bus, no actual memory access escapes this monitoring. If a write from another CPU is observed and the local cache holds the target of that write, the corresponding cache line is automatically discarded, so the data must be reloaded into the cache the next time it is needed, which brings the two back into agreement. In this way, cache-memory consistency in an SMP system is transparent to software.
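The invalidate-on-write behavior of snooping can be modeled in a few lines. This is a toy software model under stated assumptions, not a description of the real hardware protocol: `snoop_system`, `cpu_read`, and `cpu_write` are hypothetical names, writes are assumed to go straight to memory, and each word is its own cache line.

```c
#include <assert.h>
#include <stdbool.h>

#define NCPUS     2
#define MEM_WORDS 4

struct snoop_system {
    int  memory[MEM_WORDS];
    int  cached[NCPUS][MEM_WORDS];   /* per-CPU cached copies */
    bool valid[NCPUS][MEM_WORDS];    /* per-CPU line valid bits */
};

/* Read through the local cache, loading the line from memory on a miss. */
int cpu_read(struct snoop_system *s, int cpu, int addr)
{
    assert(cpu >= 0 && cpu < NCPUS && addr >= 0 && addr < MEM_WORDS);
    if (!s->valid[cpu][addr]) {
        s->cached[cpu][addr] = s->memory[addr];
        s->valid[cpu][addr] = true;
    }
    return s->cached[cpu][addr];
}

/* Write to memory over the bus; every other CPU snoops the write. */
void cpu_write(struct snoop_system *s, int cpu, int addr, int value)
{
    assert(cpu >= 0 && cpu < NCPUS && addr >= 0 && addr < MEM_WORDS);
    s->memory[addr] = value;
    s->cached[cpu][addr] = value;
    s->valid[cpu][addr] = true;
    for (int other = 0; other < NCPUS; other++) {
        if (other != cpu)
            s->valid[other][addr] = false;   /* discard the stale line */
    }
}
```

After CPU 0 writes a word that CPU 1 has cached, CPU 1's line is marked invalid, so its next read misses and reloads the new value from memory without any action by software.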
3. Interrupt handling
On a single-processor system there is only one CPU, and it responds to and handles all interrupt requests. An SMP system cannot simply dedicate one CPU to handling all interrupt requests: the other CPUs would then receive no clock interrupts, so a process stuck in an endless loop on one of them would never make a system call, and that CPU would never get a chance to schedule another process. On the other hand, if all CPUs are to take turns handling interrupts, or whichever CPU is idle is to handle them, how are interrupt requests to be distributed? This has to be solved by software and hardware together.
The traditional i386 processor uses the 8259A interrupt controller. In general, the 8259A connects multiple external interrupt sources to a single CPU. If an SMP system kept the 8259A, the external interrupt sources would have to be divided statically into groups, with each group wired to one 8259A and each 8259A wired one-to-one to a CPU, so interrupt requests could not be distributed dynamically. Intel therefore designed a more general interrupt controller for the Pentium, the Advanced Programmable Interrupt Controller (APIC). Because a CPU often needs to send interrupt requests to other CPUs in the system ("inter-processor interrupts"), each CPU must have its own local APIC. Starting with the Pentium, Intel has integrated the local APIC into the CPU chip, but an SMP system also needs an external, global APIC, forming the structure shown in the figure.
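The contrast between static wiring and dynamic distribution can be sketched as two routing functions. This is purely illustrative: `route_static` and `route_dynamic` are hypothetical names, the modulo grouping and the "fewest pending interrupts" policy are assumptions chosen for the example, and neither reflects how the 8259A or the APIC actually selects a destination.

```c
#include <assert.h>

#define NCPUS 4

/*
 * Static wiring: each interrupt line is permanently tied to one CPU,
 * as it would be with one 8259A per CPU. The modulo grouping is an
 * arbitrary choice for this sketch.
 */
int route_static(int irq)
{
    assert(irq >= 0);
    return irq % NCPUS;
}

/*
 * Dynamic routing: a global controller picks a CPU at delivery time.
 * Here it chooses the CPU with the fewest pending interrupts, taking
 * the lowest-numbered CPU on a tie.
 */
int route_dynamic(const int pending[NCPUS])
{
    int best = 0;
    for (int cpu = 1; cpu < NCPUS; cpu++) {
        if (pending[cpu] < pending[best])
            best = cpu;
    }
    return best;
}
```

With static routing a given interrupt line always lands on the same CPU regardless of load, whereas the dynamic function's answer changes as the per-CPU load changes.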