Fan-out, for a signal in an FPGA, is the number of loads that the signal's net directly drives. When this value is too large, it shows up in the FPGA directly as a large net delay, which works against timing closure. You should therefore try to avoid high fan-out when writing code. In some cases, however, the overall architecture requires it or the code cannot be modified, and other optimization methods are needed to deal with the problems that high fan-out causes. Three methods are described below.
First, consider the following example. Figure 1 shows the critical-path timing report of a transposed FIR filter. As introduced in the FIR topic of the DSP in FPGA series, the input data of a transposed-structure FIR filter has a large fan-out. As shown in Figure 1, the net delay is as high as 1.231 ns. As shown in Figure 2, the input data drives 11 DSP48E1 instances.
Figure 1
Figure 2
Without optimization, the design reaches an fmax of 206.016 MHz.
1. Register Replication
Register replication is one of the most common ways to solve the high fan-out problem. Several identical copies of a register share the loads that the original register drove, which reduces the fan-out of each copy. With a simple code change, shown in Figure 3, four copies of din_d are added: din_d0, din_d1, din_d2, and din_d3. din_d, din_d0, din_d1, and din_d2 each drive two DSP48E1 instances, and din_d3 drives three. Because the synthesizer would otherwise merge identical registers, the (* equivalent_register_removal = "no" *) attribute is added to the corresponding signals in the code to prevent this optimization.
Figure 3
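The code in Figure 3 is not reproduced in the text, so the following is only a minimal sketch of what such a replication might look like. The module name, port names, and the grouping of taps onto each output are assumptions made for illustration; only the register names and the attribute come from the description above.

```verilog
// Sketch only: module name, port names, and tap grouping are hypothetical.
module din_replicate (
    input  wire               clk,
    input  wire signed [15:0] din,
    output wire signed [15:0] din_taps_0_1,   // would feed 2 DSP48E1 instances
    output wire signed [15:0] din_taps_2_3,   // would feed 2 DSP48E1 instances
    output wire signed [15:0] din_taps_4_5,   // would feed 2 DSP48E1 instances
    output wire signed [15:0] din_taps_6_7,   // would feed 2 DSP48E1 instances
    output wire signed [15:0] din_taps_8_10   // would feed 3 DSP48E1 instances
);

    // Identical registers; the attribute asks the synthesizer not to merge them.
    (* equivalent_register_removal = "no" *) reg signed [15:0] din_d;
    (* equivalent_register_removal = "no" *) reg signed [15:0] din_d0;
    (* equivalent_register_removal = "no" *) reg signed [15:0] din_d1;
    (* equivalent_register_removal = "no" *) reg signed [15:0] din_d2;
    (* equivalent_register_removal = "no" *) reg signed [15:0] din_d3;

    // Every copy samples the same input on the same clock edge.
    always @(posedge clk) begin
        din_d  <= din;
        din_d0 <= din;
        din_d1 <= din;
        din_d2 <= din;
        din_d3 <= din;
    end

    // Each copy drives its own small group of loads (2 + 2 + 2 + 2 + 3 = 11).
    assign din_taps_0_1  = din_d;
    assign din_taps_2_3  = din_d0;
    assign din_taps_4_5  = din_d1;
    assign din_taps_6_7  = din_d2;
    assign din_taps_8_10 = din_d3;

endmodule
```

All five registers hold the same value on every cycle, so the filter's behavior is unchanged; only the load on each net is smaller. The attribute name may differ in other tool flows.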
According to the timing report in Figure 4, the fan-out on the data path is reduced to 2 and the net delay is reduced to 0.57 ns. As the design view in Figure 5 shows, four registers have been replicated to share the fan-out, as expected. After register replication, the design reaches an fmax of 252.143 MHz.
Figure 4
Figure 5
2. max_fanout Attribute
You can set the max_fanout attribute of a signal in the code to a reasonable value. When the fan-out of that signal exceeds this value in the actual design, the synthesizer automatically optimizes the signal, most commonly by register replication. The following code sets the attribute:
(* max_fanout = "3" *) reg signed [15:0] din_d;
This sets the max_fanout attribute of the din_d signal to 3. After synthesis and implementation, the timing report in Figure 6 shows a fan-out of only 2 and a corresponding net delay of only 0.61 ns, so the automatic optimization works well. As shown in the structure in Figure 7, the synthesizer automatically adds din_d_12_1, din_d_12_2, and din_d_12_3 to implement register replication. With the max_fanout attribute set, the design reaches an fmax of 257.135 MHz.
Figure 6
Figure 7
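For context, the attribute line can be placed in a module as in the minimal sketch below. The module and port names are hypothetical; the point is that only one register is written in the source, and any replicas are created by the synthesizer.

```verilog
// Sketch only: module and port names are hypothetical.
module din_max_fanout (
    input  wire               clk,
    input  wire signed [15:0] din,
    output wire signed [15:0] din_q   // would fan out to the DSP48E1 inputs
);

    // A single register in the source; the synthesizer is asked to keep
    // the fan-out of this signal at or below 3, replicating it as needed.
    (* max_fanout = "3" *) reg signed [15:0] din_d;

    always @(posedge clk) begin
        din_d <= din;
    end

    assign din_q = din_d;

endmodule
```

Compared with manual replication, the source stays unchanged apart from the attribute, but the names and grouping of the replicas are chosen by the tool.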
3. BUFG
A BUFG is a global clock buffer resource, and it can also solve problems caused by high fan-out. It is generally reserved for signals with very large fan-out, such as clocks or resets: the logic driven by such signals is spread across the whole chip, and a BUFG lets the routing be optimized from a global perspective. BUFG resources in an FPGA are limited, however; the 7k325tffg900, for example, has only 32. Using them to optimize the fan-out of ordinary signals is therefore not realistic, and they should be kept mainly for clocks. If, however, a reset signal in the design causes timing problems because of high fan-out, you can use a BUFG to optimize that signal.
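As a minimal sketch of the reset case, a BUFG primitive can be instantiated directly on the reset net. The module, instance, and signal names below are hypothetical, and the sketch assumes the reset has already been synchronized elsewhere in the design.

```verilog
// Sketch only: module, instance, and signal names are hypothetical.
module reset_bufg (
    input  wire rst_in,      // reset before buffering (assumed already synchronized)
    output wire rst_global   // reset distributed on the global routing network
);

    // Xilinx global buffer primitive: I is the input, O is the buffered output.
    BUFG u_rst_bufg (
        .I (rst_in),
        .O (rst_global)
    );

endmodule
```

Simulating this requires the vendor's primitive library, since BUFG is not defined in plain Verilog. Each instance also uses up one of the device's limited BUFG resources.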
In summary, when a signal has high fan-out, use register replication or the max_fanout attribute for ordinary signals. For reset signals, you can add a BUFG.