Out-of-order execution principles and register renaming as a solution to anti-dependencies

Source: Internet
Author: User

Reference: http://www.programmer.com.cn/14528/

A processor basically executes machine instructions in the order in which they are written in the program. Execution in program order is called in-order execution. With in-order execution, if a long-latency instruction (such as a load that reads data from memory, or a division) is followed by an instruction that uses its result, the processor stalls for a long time. That case cannot be helped, but sometimes the next instruction does not depend on the long-latency instruction and could execute as soon as its operands are available.

In that case, the order of the machine instructions can be changed: an instruction that appears later in the program may execute first, as long as it is ready. This is called out-of-order execution.

With out-of-order execution, only the instructions that cannot execute immediately because of a data dependency are delayed, which reduces the impact of data hazards.

Reservation stations

Out-of-order execution uses reservation stations, which act like a waiting room for instructions, as shown in Figure 1.

Figure 1: Out-of-order execution using reservation stations

Instructions decoded by the decode unit are not sent directly to the pipelines. Instead, each one is stored in the reservation station for its instruction type. If an operand is already in a register, it is read from the register and placed in the reservation station together with the instruction. If the operand is still being computed by an earlier instruction, identifying information for that instruction is saved instead.

The reservation station then issues instructions whose operands are all available to the pipeline for execution. An instruction whose operands are not ready cannot start, even if it comes earlier in the program. As a result, the order in which instructions leave the reservation station can differ from program order (out of order). The reservation station also monitors the results produced by the execution pipelines. When a result is an operand that a waiting instruction needs, the station captures it. Once all of its operands have arrived, the waiting instruction can execute.

Figure 1 shows a separate reservation station for each instruction type, whereas some processors use a single reservation station to control all pipelines.
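The capture-and-issue behaviour described above can be sketched in a few lines of Python. This is only an illustrative model, not how any particular processor implements it: the class names `Entry` and `ReservationStation`, the use of register names as tags, and the operand values are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    name: str                                     # label of the waiting instruction
    operands: dict = field(default_factory=dict)  # tag -> value, or None while pending

    def ready(self):
        return all(v is not None for v in self.operands.values())


class ReservationStation:
    def __init__(self):
        self.entries = []                         # kept in program order

    def insert(self, entry):
        self.entries.append(entry)

    def capture(self, tag, value):
        """Watch a pipeline result and fill any operand waiting on that tag."""
        for e in self.entries:
            if tag in e.operands and e.operands[tag] is None:
                e.operands[tag] = value

    def issue(self):
        """Remove and return the oldest entry whose operands are all available."""
        for e in self.entries:
            if e.ready():
                self.entries.remove(e)
                return e
        return None


rs = ReservationStation()
rs.insert(Entry("ADD", {"R1": None, "R5": 10}))   # R1 is still being loaded
rs.insert(Entry("SUB", {"R3": 20, "R4": 3}))      # both operands already available

first = rs.issue()      # SUB leaves first although it is later in program order
rs.capture("R1", 7)     # the load result appears on the pipeline output
second = rs.issue()     # ADD now has all operands
issue_order = [first.name, second.name]
```

In this sketch `issue_order` ends up as `["SUB", "ADD"]`, which is the reordering the text describes.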

Anti-dependency

Out-of-order execution improves efficiency by doing other work during the waiting time, but it can also cause problems. Consider the following program.

LD R1, [a] ; read memory variable a into register R1 (load)

ADD R2, R1, R5 ; add R1 and R5, save to R2

SUB R1, R5, R4 ; subtract R4 from R5, save to R1

When this program runs, reading variable a into R1 takes a long time if the load at the beginning misses in the cache. The next ADD instruction uses the value of R1 as an operand, so it cannot execute until the load has completed.

However, the values of R4 and R5 used by the following SUB instruction are already available, so SUB could execute without waiting for LD and ADD. The difficulty is that R1, an operand of ADD, is exactly where SUB writes its result, so there is an anti-dependency. As shown in Figure 2, if SUB executes before ADD, the R1 operand of ADD is no longer the result of LD but the result of SUB, and the content of R2 becomes incorrect. In addition, if LD writes its result to R1 later than SUB does, the final value of R1 is also changed by the out-of-order execution.

Figure 2: The anti-dependency problem when the SUB instruction executes before the ADD instruction

Therefore, when an anti-dependency exists, changing the execution order produces incorrect results.
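To make the two failure cases concrete, the following Python sketch runs the three instructions on a single shared register file in different completion orders. The initial values (a = 100, R4 = 3, R5 = 10) and the function name `run` are assumptions chosen only for illustration.

```python
def run(order, a=100):
    """Apply LD, ADD and SUB in the given order to one shared register file."""
    regs = {"R1": 0, "R2": 0, "R4": 3, "R5": 10}
    for name in order:
        if name == "LD":
            regs["R1"] = a                          # LD  R1, [a]
        elif name == "ADD":
            regs["R2"] = regs["R1"] + regs["R5"]    # ADD R2, R1, R5
        elif name == "SUB":
            regs["R1"] = regs["R5"] - regs["R4"]    # SUB R1, R5, R4
    return regs["R1"], regs["R2"]


in_order = run(["LD", "ADD", "SUB"])         # (7, 110): the intended result
sub_before_add = run(["LD", "SUB", "ADD"])   # (7, 17): ADD reads SUB's R1, R2 is wrong
sub_before_ld = run(["SUB", "LD", "ADD"])    # (100, 110): LD overwrites SUB's R1
```

Both reordered runs differ from the in-order result in one register, which corresponds to the two problems described above.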

Renaming -- removing anti-dependencies

To avoid anti-dependencies, out-of-order execution requires register renaming.

Renaming maps the register numbers written in the program (called "logical registers") to physical register numbers. For each instruction, the logical register that receives the result is assigned a free physical register.

LD P11, [a] ; read memory variable a into P11 (R1)

ADD P12, P11, R5 ; add P11 (R1) and R5, save to P12 (R2)

SUB P13, R5, R4 ; subtract R4 from R5, save to P13 (R1)

As shown in Figure 3, the LD instruction nominally writes its result to R1, but after renaming the result is actually stored in physical register P11. When the next ADD instruction is decoded, the mapping table records R1 = P11, so the use of R1 is changed to P11. The R2 register that receives the ADD result is mapped to the free physical register P12. The SUB result must also go to R1, which is now mapped to the free physical register P13.

In this way, although the logical register is R1 in both cases, the physical registers that hold the LD result and the SUB result are different. Therefore, no problem occurs even if SUB completes before LD. This processing is called register renaming.

Figure 3: The anti-dependency problem after renaming

Principle of register renaming

To rename registers, an out-of-order processor needs a pool of physical registers and a table that maps logical registers to physical registers. At instruction decode, a free physical register is allocated for the result and the mapping is recorded in the table. Also at decode, the table is looked up to convert the logical registers used as operands by subsequent instructions into physical registers. When instructions finish, physical registers that are no longer used must be reclaimed and returned to the free pool.
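The table and the free pool can be sketched as follows. This is a minimal illustration under stated assumptions: the class name `Renamer`, the identity mapping at start-up, and the choice to pass memory operands such as `[a]` through unchanged are all invented for the example, and the sketch does not model a stall when the free pool is empty.

```python
from collections import deque


class Renamer:
    def __init__(self, logical_regs, free_regs):
        # Assumption: each logical register starts mapped to a register of the same name.
        self.table = {r: r for r in logical_regs}
        self.free = deque(free_regs)

    def rename(self, op, dest, srcs):
        """Rename one decoded instruction; returns it with physical register names."""
        # Look up sources first, so a source equal to dest still reads the old mapping.
        new_srcs = [self.table.get(s, s) for s in srcs]
        new_dest = self.free.popleft()     # real hardware would stall if the pool is empty
        self.table[dest] = new_dest
        return (op, new_dest, new_srcs)

    def release(self, phys):
        """Return a physical register that is no longer needed to the free pool."""
        self.free.append(phys)


program = [
    ("LD", "R1", ["[a]"]),
    ("ADD", "R2", ["R1", "R5"]),
    ("SUB", "R1", ["R5", "R4"]),
]

renamer = Renamer(["R1", "R2", "R3", "R4", "R5"], ["P11", "P12", "P13"])
renamed = [renamer.rename(op, dest, srcs) for op, dest, srcs in program]
```

With these inputs, `renamed` matches the listing shown earlier: LD writes P11, ADD reads P11 and writes P12, and SUB writes P13, so the final mapping of R1 is P13.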

In addition, consider a load that is followed by another load. With in-order execution, pipelining the two loads is not impossible in theory, but in practice the resource scheduling is very difficult, so the next load is executed only after the previous load has completed. With out-of-order execution, as long as the address of the subsequent load can be determined, that load can start without waiting for the previous load to complete.

As shown in Figure 4, with out-of-order execution most of the processing of the first memory access and of the later memory accesses overlaps and proceeds in parallel. The average memory access wait time is therefore shorter than with in-order execution, and is roughly inversely proportional to the number of memory access instructions executed in parallel.

Of course, to overlap the execution of multiple memory access instructions, the unit that processes memory accesses must support pipelined execution.

Figure 4: Executing multiple loads in parallel with out-of-order execution
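A rough back-of-the-envelope model shows why the average wait shrinks. The sketch below assumes, purely for illustration, that every load takes the same fixed latency and that a pipelined memory unit can start one new load per cycle; the function name `average_wait` and the numbers are hypothetical.

```python
def average_wait(latency, n_loads, overlapped):
    """Average cycles per load under a simple fixed-latency model."""
    if overlapped:
        # Loads start one cycle apart, so the last one finishes (n_loads - 1) cycles late.
        total = latency + (n_loads - 1)
    else:
        # Each load starts only after the previous one has completed.
        total = latency * n_loads
    return total / n_loads


serial = average_wait(100, 4, overlapped=False)    # 100.0 cycles per load
parallel = average_wait(100, 4, overlapped=True)   # 25.75 cycles per load
```

Under these assumptions, four overlapped loads cost about a quarter of the serial average, in line with the inverse proportionality described above.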

Ensuring precise interrupts

However, changing the instruction order in out-of-order execution can cause another problem.

LD R1, [a] ; read memory variable a into register R1 (load)

ADD R2, R1, R5 ; add R1 and R5, save to R2

SUB R3, R3, R4 ; subtract R4 from R3, save to R3

In the example above, suppose the memory address of variable a accessed by LD misses in the TLB of the paging hardware, and the subsequent search of the page table in memory finds that the address lies on a page not allocated to the program. The LD instruction then cannot be executed.

In this case, the processor raises an illegal-access exception and notifies the operating system. The operating system performs the necessary exception handling, such as allocating physical memory to the page, and then re-executes the program from the LD instruction. If SUB had already executed and written its result to R3, SUB executes a second time when execution resumes from LD, and the value of R3 is incorrect because R4 has been subtracted from it twice.

However, suppose the registers are renamed as follows:

LD P11, [a] ; read memory variable a into P11 (R1)

ADD P12, P11, R5 ; add P11 (R1) and R5, save to P12 (R2)

SUB P13, R3, R4 ; subtract R4 from R3, save to P13 (R3)

As shown in Figure 5, when an exception occurs, the mapping table entries for P11, P12, and P13, which hold the results of LD and the instructions after it, are restored to their state before LD was executed, as if these instructions had never been executed.

Figure 5: Restoring the processor state when an exception occurs

The main purpose of renaming is to eliminate anti-dependencies. With this restore function added, instructions that have already executed can be cancelled so that interrupts are handled correctly.

In short, register renaming solves the anti-dependency problem, and when an exception occurs it also guarantees the same state as in-order execution. Out-of-order execution therefore has no effect on the results of the program.
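The restore step can be illustrated with a small Python sketch. It is a simplified model, not a description of real hardware: the mapping table is a dictionary, the checkpoint is a plain copy of the table and free pool, and the register values (R3 = 20, R4 = 3, R5 = 10) and helper names `alloc` and `execute_sub` are assumptions made for the example.

```python
table = {r: r for r in ("R1", "R2", "R3", "R4", "R5")}   # logical -> physical
free = ["P11", "P12", "P13"]                             # free physical registers
phys = {"R3": 20, "R4": 3, "R5": 10}                     # assumed register values


def alloc(logical):
    """Map a logical destination register to a fresh physical register."""
    table[logical] = free.pop(0)
    return table[logical]


def execute_sub():
    """SUB R3, R3, R4: read through the current mapping, write a new register."""
    value = phys[table["R3"]] - phys[table["R4"]]
    phys[alloc("R3")] = value


# Checkpoint taken before LD is renamed.
saved_table, saved_free = dict(table), list(free)

alloc("R1")                          # LD  -> P11, but the access faults
alloc("R2")                          # ADD -> P12, still waiting for P11
execute_sub()                        # SUB -> P13, executed early
speculative_r3 = phys[table["R3"]]   # 17, visible only through the new mapping

# Exception on LD: restore the mapping table and the free pool.
table.clear()
table.update(saved_table)
free[:] = saved_free
restored_r3 = phys[table["R3"]]      # 20, as if SUB had never executed

# After the fault is handled, the three instructions are renamed and run again.
alloc("R1")
alloc("R2")
execute_sub()
final_r3 = phys[table["R3"]]         # 17: R4 has been subtracted exactly once
```

Because the early SUB wrote to P13 rather than to the register that R3 originally mapped to, discarding the mapping is enough to undo it, and re-execution does not subtract R4 a second time.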
