This article is based on the first chapter of "Advanced FPGA Design", whose Chinese edition is titled "High-level FPGA, architecture, implementation, and optimization".
Improving timing in an FPGA is, I believe, one of the topics designers care about most. The book lists several methods, summarized here for reference.
1. Add register layers (the Chinese edition translates this as "add register hierarchy"). That is, insert registers into the critical path.
This approach increases the latency of the design: if several registers are inserted, the output arrives that many clock cycles later. It can be used only where this does not violate the design specification (which may constrain latency) and where the added latency does not affect functionality.
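As a minimal sketch of adding a register layer, assume a hypothetical multiply-add block (the module and signal names below are invented for illustration and are not taken from the book). The first module does the multiply and the add in one clock period; the second inserts a register after the multiplier, so each stage has less logic to get through, at the cost of one extra cycle of latency.

```verilog
// Hypothetical example: y = a*b + c.

// Unpipelined: multiplier and adder sit in the same register-to-register path.
module mac_flat (
    input             clk,
    input      [7:0]  a, b,
    input      [15:0] c,
    output reg [16:0] y);
  always @(posedge clk)
    y <= a * b + c;
endmodule

// Pipelined: an inserted register layer splits the path in two.
module mac_pipe (
    input             clk,
    input      [7:0]  a, b,
    input      [15:0] c,
    output reg [16:0] y);
  reg [15:0] prod_r;  // inserted register: holds a*b
  reg [15:0] c_r;     // c is delayed as well, so it stays aligned with prod_r
  always @(posedge clk) begin
    prod_r <= a * b;
    c_r    <= c;
    y      <= prod_r + c_r;  // result appears one clock later than in mac_flat
  end
endmodule
```

Note that c has to be delayed together with the product; otherwise the pipelined version would add values belonging to different input samples.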
2. Parallel structures. Turn serial processing into parallel processing. The most typical example is the multiplier.
Take a 16-bit multiplier: the most resource-efficient implementation takes 16 clock cycles to produce a result, while the largest-area implementation is the fastest and produces a result in a single cycle.
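The two extremes of the multiplier trade-off might look roughly like this. This is only a sketch under my own assumptions: the module names, the start/done handshake, and the use of an initial value for the counter are illustrative choices, not the book's code.

```verilog
// Smallest area: shift-and-add, one partial product per clock (16 clocks).
module mult16_serial (
    input             clk,
    input             start,   // pulse high for one clock to load the operands
    input      [15:0] a, b,
    output reg [31:0] p,
    output reg        done);   // high for one clock when p is valid
  reg [31:0] a_sh;             // multiplicand, shifted left each step
  reg [15:0] b_sh;             // multiplier, shifted right each step
  reg [4:0]  count;            // steps remaining
  initial count = 5'd0;        // assumes the tool honors register initial values
  always @(posedge clk) begin
    done <= 1'b0;
    if (start) begin
      a_sh  <= {16'b0, a};
      b_sh  <= b;
      p     <= 32'b0;
      count <= 5'd16;
    end else if (count != 5'd0) begin
      if (b_sh[0]) p <= p + a_sh;   // add the partial product for this bit
      a_sh  <= a_sh << 1;
      b_sh  <= b_sh >> 1;
      count <= count - 5'd1;
      if (count == 5'd1) done <= 1'b1;
    end
  end
endmodule

// Largest area: all partial products are formed and summed at once,
// so the registered result is available after a single clock.
module mult16_parallel (
    input             clk,
    input      [15:0] a, b,
    output reg [31:0] p);
  always @(posedge clk)
    p <= a * b;
endmodule
```

Designs between these two extremes are also possible, for example processing several bits of the multiplier per clock.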
3. Flatten logic structures. The Chinese edition translates this quite literally as "flatten the logical structure".
On closer reading, I think this covers two points. The first is logic replication, especially for high-fanout signals (Altera's official video material covers the details), usually done with generate or through a synthesis tool setting. The second is removing priority from the code. One thing needs saying here: today's tools are smart, and even if you write a priority structure with if/else, they can sometimes synthesize a parallel structure. Still, if a parallel structure meets your design requirements, it is safer to write a case statement.
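To make the priority point concrete, here is a small sketch with hypothetical module names. It assumes the bits of ctrl are mutually exclusive (one-hot); if they are not, the two modules behave differently. The text above recommends a case statement; this sketch uses independent if statements, which is another way of telling the tool that no priority is intended.

```verilog
// Priority-encoded: each branch is only reached if all earlier tests failed,
// so the enable for rout[3] depends on all four ctrl bits.
module regwrite_priority (
    input            clk,
    input      [3:0] ctrl,
    input            in,
    output reg [3:0] rout);
  always @(posedge clk)
    if      (ctrl[0]) rout[0] <= in;
    else if (ctrl[1]) rout[1] <= in;
    else if (ctrl[2]) rout[2] <= in;
    else if (ctrl[3]) rout[3] <= in;
endmodule

// Flattened: assumes ctrl is one-hot, so every register is enabled
// by its own ctrl bit alone and no priority chain is needed.
module regwrite_flat (
    input            clk,
    input      [3:0] ctrl,
    input            in,
    output reg [3:0] rout);
  always @(posedge clk) begin
    if (ctrl[0]) rout[0] <= in;
    if (ctrl[1]) rout[1] <= in;
    if (ctrl[2]) rout[2] <= in;
    if (ctrl[3]) rout[3] <= in;
  end
endmodule
```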
4. Register balancing.
Register balancing means moving registers along the critical path so that the logic delay is spread evenly between stages. The first way is to move them by hand, that is, change the code. The second is to configure the synthesis tool to move them automatically; use this only as a last resort, because relying on it too much makes the code less portable.
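A manual rebalancing could be sketched as follows, using a hypothetical three-input adder (names and widths are my own assumptions). In the first module both additions sit between the input registers and the output register, while the stage in front of the input registers has no logic at all. The second module moves one addition ahead of the first register stage; the latency stays at two clocks and the results are the same.

```verilog
// Unbalanced: stage 1 only captures the inputs, stage 2 does two additions.
module adder_unbalanced (
    input            clk,
    input      [7:0] A, B, C,
    output reg [9:0] Sum);
  reg [7:0] rA, rB, rC;
  always @(posedge clk) begin
    rA  <= A;
    rB  <= B;
    rC  <= C;
    Sum <= rA + rB + rC;
  end
endmodule

// Balanced: one addition per stage.
module adder_balanced (
    input            clk,
    input      [7:0] A, B, C,
    output reg [9:0] Sum);
  reg [8:0] rABSum;  // A + B, computed before the first register stage
  reg [7:0] rC;
  always @(posedge clk) begin
    rABSum <= A + B;
    rC     <= C;
    Sum    <= rABSum + rC;
  end
endmodule
```

This only helps if the path that drives A and B has enough slack to absorb the extra adder.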
5. Reorder paths.
This is one of the most interesting methods, and one that best reflects a designer's skill. The example given in the book really surprised me. Here is the code.
First version:
    module randomlogic_1 (
        output reg [7:0] Out,
        input      [7:0] A, B, C,
        input            clk,
        input            Cond1, Cond2);
      always @(posedge clk)
        if (Cond1)
          Out <= A;
        else if (Cond2 && (C < 8))
          Out <= B;
        else
          Out <= C;
    endmodule
Second version:
    module randomlogic_2 (
        output reg [7:0] Out,
        input      [7:0] A, B, C,
        input            clk,
        input            Cond1, Cond2);

      wire CondB = (Cond2 & !Cond1);

      always @(posedge clk)
        if (CondB && (C < 8))
          Out <= B;
        else if (Cond1)
          Out <= A;
        else
          Out <= C;
    endmodule
Judging from the code, version 2 seems to have a longer path than version 1, because the condition for Out <= B grows from Cond2 && (C < 8) to (Cond2 & !Cond1) && (C < 8). If the path looks longer, how can this be called an optimization? In fact it is not longer. In a schematic drawn in the style of the RTL Viewer, the critical path of version 2 really is shorter than that of version 1. The illustrations in the book show the same thing: in the view of the first version, the critical path passes through 4 components.
In the view of the second version, the critical path passes through one fewer component.
That is what surprised me, because you cannot tell it from the code alone. So you need a deeper understanding of the hardware to master this technique.
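Since the reordering changes how the conditions are written, it is worth confirming that both versions still pick the same source in every case. The following simulation-only sketch does that by brute force. The testbench and function names are hypothetical, and it models only the selection logic of the two versions rather than instantiating the modules.

```verilog
// Checks that both branch orderings select the same source for all inputs.
module reorder_equiv_tb;
  // Source chosen by version 1: 0 = A, 1 = B, 2 = C.
  function [1:0] sel_v1(input c1, input c2, input [7:0] c);
    if (c1)                 sel_v1 = 2'd0;
    else if (c2 && (c < 8)) sel_v1 = 2'd1;
    else                    sel_v1 = 2'd2;
  endfunction

  // Source chosen by version 2, with the reordered conditions.
  function [1:0] sel_v2(input c1, input c2, input [7:0] c);
    reg cb;
    begin
      cb = c2 & !c1;
      if (cb && (c < 8)) sel_v2 = 2'd1;
      else if (c1)       sel_v2 = 2'd0;
      else               sel_v2 = 2'd2;
    end
  endfunction

  integer    i, errors;
  reg  [9:0] v;  // {Cond1, Cond2, C}

  initial begin
    errors = 0;
    // 2 condition bits + 8 bits of C = 1024 input combinations
    for (i = 0; i < 1024; i = i + 1) begin
      v = i;
      if (sel_v1(v[9], v[8], v[7:0]) != sel_v2(v[9], v[8], v[7:0]))
        errors = errors + 1;
    end
    if (errors == 0) $display("PASS: both versions select the same source");
    else             $display("FAIL: %0d mismatches", errors);
    $finish;
  end
endmodule
```

The check says nothing about timing; whether the second ordering is actually faster still has to be confirmed in the schematic view or the timing report of your own tool.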
Methods of improving timing performance in FPGA