Discussion on clock factors affecting FPGA design

Last Update:2018-12-07 Source: Internet

Author: User


Http://www.fpga.com.cn/advance/skill/speed.htm

Http://www.fpga.com.cn/advance/skill/design_skill3.htm

The clock is the most important and most special signal in the entire circuit. Most devices in a system act on a clock edge, so the delay skew of the clock signal must be very small; otherwise the sequential logic may end up in an incorrect state. For a stable design it is therefore important to understand the factors that determine the system clock in an FPGA design and to keep clock delay as small as possible.

1.1 Setup time and hold time

Setup time (Tsu) is the time for which the data must be stable before the clock edge arrives. If the setup time is not met, the data cannot be reliably clocked into the flip-flop on the rising edge of the clock.

Hold time (Th) is the time for which the data must remain stable after the clock edge arrives. If the hold time is not met, the data likewise cannot be reliably clocked into the flip-flop. Setup time and hold time are illustrated in Figure 1.

Figure 1 Setup time and hold time

A module in an FPGA design usually contains both combinational logic and sequential logic. To ensure that data is handled reliably at the boundaries between them, a clear understanding of setup time and hold time is essential. Consider the following question about the two concepts.

Figure 2 A basic model of a synchronous design

Figure 2 shows a basic model of a synchronous design driven by a single clock. In the figure:

Tco: the clock-to-output delay of the flip-flop;

Tdelay: the delay of the combinational logic;

Tsetup: the setup time of the flip-flop;

Tpd: the delay of the clock (negligible);

T: the clock period;

T3: the setup time of D2;

T4: the hold time of D2.

Suppose the setup time of the first flip-flop D1 has a maximum of T1max and a minimum of T1min, and the delay of the combinational logic has a maximum of T2max and a minimum of T2min. What conditions must the setup time T3 and the hold time T4 of the second flip-flop D2 satisfy? Or, given T3 and T4, what is the shortest clock period (that is, the highest clock frequency) that can be allowed? This question must be considered in every design. Only by answering it can we tell whether the delay of the combinational logic in the design meets the requirements.

The following analysis uses timing diagrams. Let the input of the first flip-flop be D1 and its output Q1; let the input of the second flip-flop be D2 and its output Q2.

Data is always sampled on the rising edge of the clock. For ease of analysis we discuss two cases. In the first case, the clock delay Tpd is assumed to be zero. This is usually true in FPGA designs, which normally use a single system clock fed in through a global clock pin, so the clock delay inside the device is negligible. In this case the hold time need not be considered: every data value is held for one clock period and also experiences path delay, so the clock delay is far smaller than the data delay and the hold time is always met. Attention can focus on the setup time. If the setup time of D2 is met, the timing diagram is as shown in Figure 3.

We can see that if:

T - Tco - Tdelay > T3

That is: Tdelay < T - Tco - T3 (the signal launched from D1 passes through the combinational logic and is stable at D2 at least one setup time before the second rising edge of CLK)

then the setup requirement is met, where T is the clock period. In this case the second flip-flop captures D2 reliably on the second rising edge of the clock, as shown in Figure 3.

{D1 => setup time => hold time => flip-flop clock-to-output delay => combinational logic delay => D2 => ...}

Figure 3 Timing Diagram

If the delay of the combinational logic is too large, so that

T - Tco - Tdelay < T3 (the time left before the clock edge is shorter than the setup time of D2)

the setup requirement of the second flip-flop is not met. On the second rising edge of the clock it captures an indeterminate value, as shown in Figure 4, and the circuit will not work correctly.

Figure 4 Combinational logic delay too large; setup time not met

From this we can derive

T - Tco - T2max >= T3

This is the setup-time requirement for D2.

From the timing diagrams above we can see that the setup and hold times of D2 are independent of the setup and hold times of D1. They depend only on the combinational logic in front of D2 and on the clock-to-output delay of D1. This is an important conclusion: setup and hold requirements do not accumulate from one stage to the next.

In the second case, the clock has a delay, and both the hold time and the setup time must be considered. A significant clock delay usually arises when an asynchronous clocking scheme is used. With such a scheme it is hard to guarantee that data stays synchronized, so it is rarely used in practice. In this case, if both the setup time and the hold time are met, the output timing is as shown in Figure 5.

Figure 5 Clock delayed, but timing requirements met

As Figure 5 shows, the clock delay relaxes the setup requirement by Tpd, so the setup time of D2 must satisfy:

Tpd + T - Tco - T2max >= T3 (T3 is the setup time of D2, T2max is the maximum delay of the combinational logic, Tpd is the clock delay)

The time available for setup and the time available for hold always add up to one clock period (T). If the clock is delayed and the data delay is small, the time available for setup increases and the time available for hold decreases by the same amount. If it falls below the hold time required by D2, the correct data cannot be captured, as shown in Figure 6.

The time available for hold is T - (Tpd + T - Tco - T2min), so the hold requirement is:

T - (Tpd + T - Tco - T2min) >= T4, that is, Tco + T2min - Tpd >= T4 (T4 is the hold time of D2)

From this inequality we can see that even when Tpd = 0, that is, when the clock delay is zero, Tco + T2min >= T4 is still required. In practice, however, T2 (the path delay) is much larger than the flip-flop hold time T4, so this condition is met automatically and needs no special attention.

Figure 6 Clock delayed; hold time not met
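
The setup and hold conditions for the delayed-clock case can be checked side by side with a small worked example. The numbers below are illustrative assumptions only, not values for any real device; they simply show how a clock delay helps setup and hurts hold.

```latex
% Setup and hold conditions for D2 with clock delay T_{pd}
\begin{align*}
\text{setup:}\quad & T_{pd} + T - T_{co} - T_{2\max} \ge T_3 \\
\text{hold:}\quad  & T_{co} + T_{2\min} - T_{pd} \ge T_4
\end{align*}
% Assumed values (ns): T = 10, T_{co} = 1, T_{2\max} = 6,
% T_{2\min} = 0.5, T_3 = 0.8, T_4 = 0.4
\begin{align*}
T_{pd} = 0:\quad   & 0 + 10 - 1 - 6 = 3 \ge 0.8, \qquad 1 + 0.5 - 0 = 1.5 \ge 0.4 \\
T_{pd} = 1.5:\quad & 1.5 + 10 - 1 - 6 = 4.5 \ge 0.8, \qquad 1 + 0.5 - 1.5 = 0 < 0.4
\end{align*}
```

With these assumed numbers, a clock delay of 1.5 ns leaves more setup margin but violates the hold requirement, which matches the situation in Figure 6.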

To sum up: if clock delay is ignored, only the setup time matters; if clock delay is taken into account, the hold time needs more attention. Next we look at how to raise the clock frequency of a synchronous system in an FPGA design.

1.2 How to raise the operating clock frequency of a synchronous system

From the analysis above, the requirement on the setup time T3 of D2 in a synchronous system is:

T - Tco - T2max >= T3

It follows that T >= T3 + Tco + T2max, where T3 is the setup time (Tsetup) of D2 and T2max is the maximum delay of the combinational logic. In a design, T3 and Tco are fixed values determined by the device; the only quantity the designer can control is T2. To raise the system clock frequency, T2 should therefore be made as small as possible. The following methods can be used to reduce T2.
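
As a rough illustration of how strongly T2 drives the achievable clock frequency, the inequality can be evaluated for two assumed combinational delays. The device values used here (T3 = 0.8 ns, Tco = 1 ns) are made-up numbers for the sake of the arithmetic.

```latex
\begin{align*}
T &\ge T_3 + T_{co} + T_{2\max} \\
T_{\min} &= 0.8 + 1 + 6 = 7.8\ \text{ns}
  \quad\Rightarrow\quad f_{\max} = 1/T_{\min} \approx 128.2\ \text{MHz}
  && (T_{2\max} = 6\ \text{ns}) \\
T_{\min} &= 0.8 + 1 + 3 = 4.8\ \text{ns}
  \quad\Rightarrow\quad f_{\max} = 1/T_{\min} \approx 208.3\ \text{MHz}
  && (T_{2\max} = 3\ \text{ns})
\end{align*}
```

Under these assumptions, halving the combinational delay raises the maximum clock frequency from about 128 MHz to about 208 MHz.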

1.2.1 Reduce delay by changing the placement and routing

Taking Altera devices as an example, the timing closure floorplan in Quartus shows many blocks arranged in rows and columns. Each block represents one LAB, and each LAB contains 8 or 10 LEs. Their routing delays rank as follows: within the same LAB (fastest) < same column or same row < different row and different column. By giving the synthesis tool appropriate constraints, we can have related logic placed closer together during place and route, which reduces routing delay. The constraints should be moderate. A margin of about 5% is usually suitable: for example, a circuit that must run at 100 MHz can be constrained to 105 MHz. Over-constraining gives poor results and greatly increases compile time.

1.2.2 Reduce delay by splitting combinational logic (pipelining)

A synchronous circuit generally has more than one level of registers (see Figure 8). For the circuit to work reliably, the clock period must be at least as long as the longest delay, so shortening the longest delay path raises the operating frequency of the circuit. As shown in Figure 7, we can break a large block of combinational logic into smaller blocks and insert flip-flops between them to raise the operating frequency. This is the basic principle of pipelining.

In the upper part of Figure 8, the clock frequency is limited by the delay of the second, larger block of combinational logic. Distributing the combinational logic evenly between the stages avoids an excessive delay between any two flip-flops and removes the speed bottleneck.

Figure 7 Splitting combinational logic

Figure 8 Moving combinational logic
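
The idea in Figures 7 and 8 can be sketched in Verilog as follows. This is only a minimal illustration, assuming a hypothetical four-input adder; the module name, port names, and WIDTH parameter are invented for the example and do not come from the original figures.

```verilog
// Hypothetical example: sum = (a + b) + (c + d), split into two
// pipeline stages so that each register-to-register path contains
// only one adder instead of a chain of adders.
module pipelined_adder #(
    parameter WIDTH = 16
) (
    input  wire             clk,
    input  wire [WIDTH-1:0] a, b, c, d,
    output reg  [WIDTH+1:0] sum
);
    // Stage 1: two independent additions, each stored in a register.
    reg [WIDTH:0] sum_ab;
    reg [WIDTH:0] sum_cd;

    always @(posedge clk) begin
        sum_ab <= a + b;
        sum_cd <= c + d;
    end

    // Stage 2: the final addition, again stored in a register.
    always @(posedge clk) begin
        sum <= sum_ab + sum_cd;
    end
endmodule
```

The price of the shorter critical path is latency: in this sketch the result appears two clock edges after the inputs are applied rather than one.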

So how should combinational logic be split in a design? Good technique comes with practice, but some sound design ideas and methods are worth knowing. Most FPGAs are based on 4-input LUTs. If the condition that determines an output depends on more than four inputs, several LUTs must be cascaded, which adds a level of combinational logic delay. Reducing combinational logic therefore comes down to keeping the number of input conditions as small as possible, so that fewer LUTs are cascaded and the delay caused by the combinational logic is reduced.

The pipelining we usually hear about is a way of raising the operating frequency by cutting up large blocks of combinational logic (inserting one or more levels of D flip-flops so that there is less combinational logic between registers). For example, a 32-bit counter has a long carry chain, which inevitably lowers its operating frequency. We can split it into smaller counters, for example a 4-bit counter and an 8-bit counter: each time the 4-bit counter reaches 15, it triggers the 8-bit counter once. Cutting the counter up in this way raises the operating frequency.
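
A minimal sketch of the counter-splitting idea is shown below, using the 4-bit and 8-bit counters mentioned above to build a 12-bit count. The module and signal names are hypothetical, and the sketch assumes the counter runs freely on every clock edge. The carry from the low counter is registered one cycle early so that the high counter never depends on a long combinational carry path.

```verilog
// Hypothetical example: a 12-bit free-running counter built from a
// 4-bit low counter and an 8-bit high counter.
module split_counter (
    input  wire        clk,
    input  wire        rst,      // synchronous, active high
    output wire [11:0] count
);
    reg [3:0] lo;
    reg [7:0] hi;
    reg       carry_en;          // registered; high while lo == 15

    always @(posedge clk) begin
        if (rst) begin
            lo       <= 4'd0;
            hi       <= 8'd0;
            carry_en <= 1'b0;
        end else begin
            lo       <= lo + 4'd1;
            // Detect 14 now so the enable is valid while lo is 15.
            carry_en <= (lo == 4'd14);
            // hi increments on the same edge on which lo wraps to 0.
            if (carry_en)
                hi <= hi + 8'd1;
        end
    end

    assign count = {hi, lo};
endmodule
```

Because `carry_en` is already stable in a register when the low counter wraps, the high counter sees only a one-bit enable rather than the full carry chain.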

In a state machine, a large counter should likewise be moved outside the state machine. A counter is usually more than 4 bits wide, so if it is used together with other conditions to decide a state transition, it adds LUT cascading and therefore more combinational logic. Take a 6-bit counter as an example. Suppose we originally wanted the state to change when the counter reaches 111100. Instead, we place the counter outside the state machine and, when it reaches 111011, generate an enable signal that triggers the state transition. This reduces the combinational logic.
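
The 6-bit counter case can be sketched as follows. This is an assumed, simplified state machine with hypothetical names (`start`, `busy`, `cnt_done`); it only illustrates how the comparison is done one count early and registered, so that the state machine tests a single bit.

```verilog
// Hypothetical example: the counter sits outside the state machine
// and hands it a single registered enable bit.
module fsm_with_external_counter (
    input  wire clk,
    input  wire rst,     // synchronous, active high
    input  wire start,
    output wire busy
);
    localparam IDLE  = 1'b0,
               COUNT = 1'b1;

    reg       state;
    reg [5:0] cnt;
    reg       cnt_done;  // high during the cycle in which cnt == 6'b111100

    // Counter and early compare, kept outside the state machine.
    always @(posedge clk) begin
        if (rst || state == IDLE) begin
            cnt      <= 6'd0;
            cnt_done <= 1'b0;
        end else begin
            cnt      <= cnt + 6'd1;
            cnt_done <= (cnt == 6'b111011);
        end
    end

    // State machine: the transition depends on one bit, not six.
    always @(posedge clk) begin
        if (rst)
            state <= IDLE;
        else
            case (state)
                IDLE:    if (start)    state <= COUNT;
                COUNT:   if (cnt_done) state <= IDLE;
                default: state <= IDLE;
            endcase
    end

    assign busy = (state == COUNT);
endmodule
```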

A state machine generally consists of three blocks:

(1) an output block;

(2) a block that determines the next state;

(3) a block that stores the current state.

The three blocks are built from different kinds of logic. The output block usually contains both combinational and sequential logic. The block that determines the next state is usually purely combinational. The current state is usually stored in sequential logic. The relationship between the three blocks is shown in Figure 9.

Figure 9 Structure of a state machine

A state machine is therefore usually written in three parts, one for each block. The following is a good way to code a state machine:

/*-----------------------------------------------------
This is FSM demo program
Design name: arbiter
File name: arbiter2.v
-----------------------------------------------------*/
module arbiter2 (
  clock, // clock
  reset, // active high, synchronous reset
  req_0, // request 0
  req_1, // request 1
  gnt_0, // grant 0
  gnt_1  // grant 1
);
// ------------- Input ports -----------------------------
input clock;
input reset;
input req_0;
input req_1;
// ------------- Output ports ----------------------------
output gnt_0;
output gnt_1;
// ------------- Input ports data type -------------------
wire clock;
wire reset;
wire req_0;
wire req_1;
// ------------- Output ports data type ------------------
reg gnt_0;
reg gnt_1;
// ------------- Internal constants ----------------------
parameter SIZE = 3;
parameter IDLE = 3'b001,   // one-hot state encoding
          GNT0 = 3'b010,
          GNT1 = 3'b100;
// ------------- Internal variables ----------------------
reg  [SIZE-1:0] state;      // sequential part of the FSM
wire [SIZE-1:0] next_state; // combinational part of the FSM

// ---------- Code starts here ---------------------------
// state is passed as an argument so that next_state is
// re-evaluated whenever the current state changes
assign next_state = fsm_function(state, req_0, req_1);
// ------------ Next-state function ----------------------
function [SIZE-1:0] fsm_function;
  input [SIZE-1:0] state;
  input req_0;
  input req_1;
  begin
    case (state)
      IDLE:
        if (req_0 == 1'b1)
          fsm_function = GNT0;
        else if (req_1 == 1'b1)
          fsm_function = GNT1;
        else
          fsm_function = IDLE;
      GNT0:
        if (req_0 == 1'b1)
          fsm_function = GNT0;
        else
          fsm_function = IDLE;
      GNT1:
        if (req_1 == 1'b1)
          fsm_function = GNT1;
        else
          fsm_function = IDLE;
      default: fsm_function = IDLE;
    endcase
  end
endfunction

// ---------- State register -----------------------------
always @ (posedge clock)
begin
  if (reset == 1'b1)
    state <= IDLE;
  else
    state <= next_state;
end

// ---------- Output logic -------------------------------
always @ (posedge clock)
begin
  if (reset == 1'b1)
  begin
    gnt_0 <= #1 1'b0;
    gnt_1 <= #1 1'b0;
  end
  else
  begin
    case (state)
      IDLE:
      begin
        gnt_0 <= #1 1'b0;
        gnt_1 <= #1 1'b0;
      end
      GNT0:
      begin
        gnt_0 <= #1 1'b1;
        gnt_1 <= #1 1'b0;
      end
      GNT1:
      begin
        gnt_0 <= #1 1'b0;
        gnt_1 <= #1 1'b1;
      end
      default:
      begin
        gnt_0 <= #1 1'b0;
        gnt_1 <= #1 1'b0;
      end
    endcase
  end
end // end of output logic block

endmodule

Writing a state machine in three parts like this helps avoid excessive combinational logic.

All of the methods above reduce combinational logic through pipelining. In some cases, however, the combinational logic is hard to cut. What should we do then?

A state machine is one such example: we cannot insert pipeline stages into its state-decoding logic. If our design contains a state machine with dozens of states, its state-decoding logic becomes very large, and it is very likely to be the critical path of the design. What should we do? The same idea applies: reduce the combinational logic. We can analyze the outputs of the states, regroup them, and on that basis redefine them as a set of small state machines. The input is then decoded (with a case statement) to trigger the corresponding small state machine, so that the large state machine is cut into small ones. The ATA-6 specification (a hard disk standard), for example, has about 20 input commands, and each command corresponds to many states. Implementing this as one large state machine is unthinkable. Instead we can decode the command with a case statement and trigger the corresponding state machine, nesting the small state machines under the command decoder. In this way the module can run at a relatively high frequency.
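
The structure described above might look roughly like the sketch below. It is not taken from the ATA-6 specification: the two commands, their encodings, and the states of each small machine are all invented for illustration. The point is only that a case statement decodes the command into registered start pulses, and each small state machine decodes just its own few states.

```verilog
// Hypothetical example: a command decoder that triggers one of two
// small state machines instead of one large state machine.
module cmd_dispatch (
    input  wire       clk,
    input  wire       rst,        // synchronous, active high
    input  wire       cmd_valid,
    input  wire [1:0] cmd,        // assumed command encoding
    output wire       busy
);
    localparam CMD_READ  = 2'b01,
               CMD_WRITE = 2'b10;

    // Command decode: one registered start pulse per small machine.
    reg start_rd, start_wr;
    always @(posedge clk) begin
        start_rd <= 1'b0;
        start_wr <= 1'b0;
        if (!rst && cmd_valid && !busy) begin
            case (cmd)
                CMD_READ:  start_rd <= 1'b1;
                CMD_WRITE: start_wr <= 1'b1;
                default: ;  // unknown command: ignored
            endcase
        end
    end

    // Small state machine for the read command.
    localparam RD_IDLE = 2'd0, RD_ADDR = 2'd1, RD_DATA = 2'd2;
    reg [1:0] rd_state;
    always @(posedge clk) begin
        if (rst)
            rd_state <= RD_IDLE;
        else
            case (rd_state)
                RD_IDLE: if (start_rd) rd_state <= RD_ADDR;
                RD_ADDR: rd_state <= RD_DATA;
                RD_DATA: rd_state <= RD_IDLE;
                default: rd_state <= RD_IDLE;
            endcase
    end

    // Small state machine for the write command.
    localparam WR_IDLE = 2'd0, WR_ADDR = 2'd1, WR_DATA = 2'd2, WR_ACK = 2'd3;
    reg [1:0] wr_state;
    always @(posedge clk) begin
        if (rst)
            wr_state <= WR_IDLE;
        else
            case (wr_state)
                WR_IDLE: if (start_wr) wr_state <= WR_ADDR;
                WR_ADDR: wr_state <= WR_DATA;
                WR_DATA: wr_state <= WR_ACK;
                WR_ACK:  wr_state <= WR_IDLE;
                default: wr_state <= WR_IDLE;
            endcase
    end

    // Busy while a start pulse is pending or either machine is active.
    assign busy = start_rd | start_wr |
                  (rd_state != RD_IDLE) | (wr_state != WR_IDLE);
endmodule
```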

Conclusion: raising the operating frequency essentially means reducing register-to-register delay. The most effective way is to avoid large blocks of combinational logic, that is, to keep conditions within four inputs wherever possible and reduce the number of cascaded LUTs. We can raise the operating frequency by adding constraints, pipelining, and splitting state machines.

Pay attention to the following points when designing clocks in an FPGA:

1. Use only one clock per module wherever possible. A module here means a Verilog module or a VHDL entity. A design with multiple clock domains inevitably involves clock domain crossing; it is best to isolate the crossing in a dedicated module. This lets the synthesis tool produce better results.

2. Do not use gated clocks unless the design is a low-power design, because they make the design less stable. Where a gated clock must be used, register the gating signal on the falling edge of the clock before ANDing it with the clock. Drive the clock inputs of flip-flops from the global clock resources (BUFG) inside the FPGA rather than from signals generated by combinational logic or by other sequential logic such as a divider (source: http://www.cnblogs.com/crazybingo/archive/2010/12/08/1900388.html#).

3. Do not use the divided output of a counter as the clock of other modules. Use a clock enable instead. Otherwise, clocks scattered around the design in this way are very harmful to its reliability and greatly increase the complexity of static timing analysis.
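
Point 3 can be illustrated with a short sketch. The divide-by-4 ratio and all names here are assumptions chosen for the example. Instead of clocking `slow_count` from a divided clock, every flip-flop stays on `clk` and the slow logic is gated by a one-cycle enable pulse.

```verilog
// Hypothetical example: logic that runs at clk/4 by using a clock
// enable rather than a divided clock.
module clock_enable_example (
    input  wire       clk,
    input  wire       rst,         // synchronous, active high
    output reg  [7:0] slow_count
);
    reg [1:0] div;
    reg       ce;                  // one clk-wide pulse every 4 cycles

    // Divider: produces an enable pulse, not a clock.
    always @(posedge clk) begin
        if (rst) begin
            div <= 2'd0;
            ce  <= 1'b0;
        end else begin
            div <= div + 2'd1;
            ce  <= (div == 2'd3);
        end
    end

    // "Slow" logic: still clocked by clk, updated only when ce is high.
    always @(posedge clk) begin
        if (rst)
            slow_count <= 8'd0;
        else if (ce)
            slow_count <= slow_count + 8'd1;
    end
endmodule
```

Since there is only one clock in the module, static timing analysis has a single clock domain to deal with.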

1.4 Synchronization between different clock domains

When two modules in a design run on two different clocks, their interface operates asynchronously. The signals passing between the two modules must then be synchronized to ensure that data is handled correctly.

Different clock domains here usually means one of the following two situations (clocks from separate sources):

1. The two clocks have different frequencies;

2. The two clocks have the same frequency, but they are independent clocks with no fixed phase relationship.
The two situations are shown in Figures 10 and 11:

Figure 10 The two clocks have different frequencies

Figure 11 The two clocks have the same frequency but unrelated phases

Data passed between two clock domains is usually synchronized by different methods depending on its bit width.

1. Single-bit synchronization, where each pulse sent is at least one clock period wide

This type of synchronization is mainly used for control signals. The usual method is to register the incoming signal twice in the receiving module, using two flip-flops clocked by the receiving module's clock, as shown in Figure 12. Note the following points about this kind of synchronization.

Figure 12 A single-bit synchronizer

(1) The circuit in Figure 12 is a "one-bit synchronizer". It can synchronize only a single asynchronous signal, and the signal must be wider than one period of the local clock; otherwise the asynchronous signal may not be captured at all.

(2) Why can the circuit in Figure 12 synchronize only a single asynchronous signal? (a) Suppose two or more asynchronous signals (control or address) enter the local clock domain at the same time to control circuitry there. If each of them is synchronized with the circuit in Figure 12, a problem arises. Routing delay or other delays introduce skew between the asynchronous signals. When this skew passes through the synchronizers of Figure 12 into the local clock domain, it can turn into a very large skew or a race, and the circuitry in the local clock domain malfunctions.

This is shown in Figure 13:

Figure 13 Error when synchronizing multiple control signals

(b) If an asynchronous data bus enters the local clock domain, the circuit in Figure 12 cannot be used either. The data changes arbitrarily, and the width of its 0s and 1s bears no relation to the local clock period, so the circuit in Figure 12 may fail to capture the correct data.

(3) Note that the second flip-flop does not prevent metastability from occurring. What the circuit does is prevent metastability from propagating. That is, once the first flip-flop goes metastable (which is always possible), the metastable state is not passed on to the logic after the second flip-flop.

(4) When the first flip-flop goes metastable, it needs a recovery time to settle and leave the metastable state. If the recovery time plus the setup time of the second flip-flop (more precisely, minus the clock skew) is less than or equal to the clock period, the second flip-flop samples a stable, well-defined value and metastability does not propagate. This condition is easy to meet: the two flip-flops should be placed as close together as possible, with no combinational logic between them and little clock skew.

(5) FF2 samples the output of FF1, so the output of FF2 follows the output of FF1 with a delay of just one clock period. Note, however, that the state is called metastable precisely because, once FF1 enters it, the level FF1 settles to is unpredictable: it may be right or wrong. So although this method prevents metastability from propagating, it cannot guarantee that the data after the two flip-flops is correct. A circuit of this kind therefore lets through a certain number of wrong values, and it is suitable only for the few places that are insensitive to such errors. For sensitive circuits, a dual-port RAM or a FIFO can be used.
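
A minimal Verilog sketch of the two-flip-flop synchronizer described for Figure 12 is given below. The module and signal names are assumptions, since the figure itself is not reproduced here.

```verilog
// Hypothetical example: single-bit, two-flip-flop synchronizer.
// async_in must stay stable for longer than one clk period.
module sync_2ff (
    input  wire clk,        // clock of the receiving domain
    input  wire async_in,   // signal from another clock domain
    output wire sync_out
);
    reg ff1;  // may go metastable
    reg ff2;  // samples ff1 one period later, after it has settled

    // No combinational logic between the two flip-flops.
    always @(posedge clk) begin
        ff1 <= async_in;
        ff2 <= ff1;
    end

    assign sync_out = ff2;
endmodule
```

Only `sync_out` should be used by the rest of the receiving domain; `ff1` is never read by any other logic.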

2. A synchronizer for input pulses that may be narrower than one clock period (how can this work, if the pulse is shorter than the sampling clock?)

For case 2, the feedback circuit shown in Figure 14 is normally used. The circuit works as follows. Assume the input is active high and the clear input of the first flip-flop FF1 is active high. When the input is high, the output is also high, which is correct. When the input is low, FF1 is forcibly cleared and the output becomes zero. This ensures that the output is correct.

Figure 14 A synchronizer for input pulses that may be narrower than one clock period
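
Figure 14 is not reproduced here, so the sketch below is only an assumption about its structure, based on a commonly used pulse-catching circuit: the narrow pulse clocks FF1, two further flip-flops synchronize FF1 to the local clock, and FF1 is cleared once the pulse has been seen and the input has gone low again. All names, and the added `rst` input, are hypothetical.

```verilog
// Hypothetical example: catching a pulse narrower than one clk period.
module pulse_catch_sync (
    input  wire clk,        // clock of the receiving domain
    input  wire rst,        // active high (asynchronous for ff1,
                            // synchronous for ff2 and ff3)
    input  wire async_in,   // narrow, active-high input pulse
    output wire sync_out
);
    reg ff1, ff2, ff3;

    // Feedback clear: active once the pulse has reached ff3 and the
    // input is low again.
    wire clr = rst | (ff3 & ~async_in);

    // FF1 is clocked by the input pulse itself, so even a very short
    // pulse is stored until the feedback clears it.
    always @(posedge async_in or posedge clr) begin
        if (clr)
            ff1 <= 1'b0;
        else
            ff1 <= 1'b1;
    end

    // FF2 and FF3 bring the stored pulse into the clk domain.
    always @(posedge clk) begin
        if (rst) begin
            ff2 <= 1'b0;
            ff3 <= 1'b0;
        end else begin
            ff2 <= ff1;
            ff3 <= ff2;
        end
    end

    assign sync_out = ff3;
endmodule
```

Because `async_in` is used as a clock and `clr` is generated by combinational logic, a circuit like this needs careful timing constraints and is best kept to the few signals that really need it.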

For details on synchronizing multiple control signals, see the detailed analysis in the article on synthesis and design techniques for multi-clock systems at www.fpga.com.cn.
