Discussion on the factors affecting the clock in FPGA design (repost)

Source: Internet
Author: User
Tags: case statement, fsm

Crazy Bingo: learn to walk before you try to run...

http://www.fpga.com.cn/advance/skill/speed.htm

http://www.fpga.com.cn/advance/skill/design_skill3.htm

The clock is the most important and most special signal in the whole circuit. Most devices in the system act on a clock edge, so the clock skew must be very small; otherwise the sequential logic may end up in an erroneous state. Understanding the factors that determine the system clock in an FPGA design, and minimizing clock delay, is therefore essential to a stable design.

1.1 Setup time and hold time

The setup time (tsu) is the time for which the data must be stable before the clock edge arrives. If the setup time is not met, the data cannot be reliably captured into the flip-flop on that clock edge.

The hold time (th) is the time for which the data must remain stable after the clock edge. If the hold time is not met, the data likewise cannot be reliably captured. Setup time and hold time are illustrated in Figure 1.

Figure 1 Setup time and hold time

A single module in an FPGA design often contains both combinational logic and sequential logic. To ensure that data at the interfaces between them is processed reliably, you need a clear understanding of setup time and hold time. The following discussion works through some questions about these two concepts.

Figure 2 A basic model of a synchronous design

Figure 2 is a basic model of a synchronous design in which a single clock is used throughout. In the figure:

Tco: the data output delay (clock-to-output delay) of a flip-flop;

Tdelay: the delay of the combinational logic;

Tsetup: the setup time of a flip-flop;

Tpd: the clock delay (negligible);

T: the clock period;

T3: the setup time of D2;

T4: the hold time of D2.

Suppose the first flip-flop D1 has a maximum setup time T1max and a minimum setup time T1min, and the combinational logic has a maximum delay T2max and a minimum delay T2min. What conditions must the setup time T3 and the hold time T4 of the second flip-flop D2 satisfy? Or, given T3 and T4, what is the shortest allowable clock period? This question must be considered in every design; only by understanding it can you make sure the combinational logic delay meets the requirements.

The analysis below uses timing diagrams. The input of the first flip-flop is D1 and its output is Q1; the input of the second flip-flop is D2 and its output is Q2.

Both flip-flops sample on the rising edge of the same clock. To simplify the analysis, we consider two cases. In the first case, assume the clock delay Tpd is zero. In FPGA designs this is usually close to true, because a design generally uses a single system clock driven from a global clock pin, so the internal clock delay is negligible. In this case hold time does not need to be considered: each data value stays stable for one clock period and also incurs routing delay, and because the clock delay is much smaller than the data delay, the hold requirement is met automatically. The focus is on setup time. If the setup time of D2 is met, the timing diagram is as shown in Figure 3.

You can see that if

$$T - T_{co} - T_{delay} > T_3$$

that is,

$$T_{delay} < T - T_{co} - T_3$$

(the signal launched by D1 passes through the combinational logic and reaches D2 before the setup window of D2 opens, so the data is already stable T3 before the second clock edge), then the setup requirement is satisfied, where T is the clock period. In this case the second flip-flop reliably captures D2 on the second rising clock edge, as shown in Figure 3.

Figure 3 Timing diagram that meets the requirements (signals shown: D1, setup time, hold time, flip-flop data output delay, combinational logic delay, D2)

If the combinational logic delay is too large, so that

$$T - T_{co} - T_{delay} < T_3$$

(that is, the data reaches D2 only inside its setup window), the requirement is not met. The second flip-flop then captures an indeterminate value on the second rising clock edge, as shown in Figure 4, and the circuit will not work properly.

Figure 4 Timing that fails because the combinational logic delay is too large

It follows that, for the maximum combinational delay T2max,

$$T - T_{co} - T_{2max} \ge T_3$$

This is the setup time requirement of D2.
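The setup condition above is plain arithmetic, and the following Python sketch evaluates it for one path. The function name and the example delays are hypothetical, chosen only to show how the slack T - Tco - T2max - T3 is read: a non-negative result means the setup time of D2 is met.

```python
# Minimal sketch: setup check at D2 with the clock delay Tpd taken as zero.
# All times are in nanoseconds; the example values below are hypothetical.

def setup_slack(t_clk, t_co, t_comb_max, t_su):
    """Return T - Tco - T2max - T3; the setup requirement holds if >= 0."""
    return t_clk - t_co - t_comb_max - t_su


# 100 MHz clock (T = 10 ns), Tco = 0.5 ns, T2max = 8 ns, T3 = 0.3 ns.
slack = setup_slack(t_clk=10.0, t_co=0.5, t_comb_max=8.0, t_su=0.3)
print(f"setup slack = {slack:.2f} ns, met: {slack >= 0}")
```

Raising T2max to 9.5 ns with the same clock makes the slack negative, which is the failing case of Figure 4.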

The timing diagrams above also show that the setup and hold conditions at D2 are unrelated to the setup and hold times of D1; they depend only on the data output delay of D1 and the delay of the combinational logic between D1 and D2. This is an important conclusion: the delays do not accumulate from one register stage to the next.

In the second case, the clock has delay, so both hold time and setup time must be considered. Large clock delays typically arise in asynchronous-clock design styles, which make data synchronization hard to guarantee, so they are rarely used in practice. If both the setup time and the hold time are met, the timing is as shown in Figure 5.

Figure 5 Clock delay present, timing met

Figure 5 shows that the clock delay Tpd relaxes the setup requirement, so the setup time of D2 must satisfy

$$T_{pd} + T - T_{co} - T_{2max} \ge T_3$$

where T3 is the setup time of D2, T2max is the maximum combinational logic delay, and Tpd is the clock delay.

Because the setup margin and the hold margin always add up to one clock period T, a delayed clock combined with a small data delay increases the setup margin and correspondingly reduces the hold margin. If the hold margin drops below the hold time required by D2, the correct data cannot be captured, as shown in Figure 6.

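With the minimum data delay, the setup margin is $T_{pd} + T - T_{co} - T_{2min}$, so the hold margin that remains must satisfy

$$T - (T_{pd} + T - T_{co} - T_{2min}) \ge T_4$$

that is,

$$T_{co} + T_{2min} - T_{pd} \ge T_4$$

which is the hold time requirement of D2.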

This result also shows that even with $T_{pd} = 0$ the condition $T_{co} + T_{2min} \ge T_4$ is still required. In practice, however, T2min includes routing delay, which is far larger than the flip-flop hold time T4, so the hold time is not a concern in that case.

Figure 6 Clock delay present, hold time not met
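Both conditions for a delayed clock can be checked together. This Python sketch uses hypothetical names and delays and only evaluates the two inequalities derived above, Tpd + T - Tco - T2max >= T3 and Tco + T2min - Tpd >= T4, for the same path with and without clock delay.

```python
# Minimal sketch: setup and hold checks at D2 when the capture clock
# arrives Tpd later than the launch clock. Times in ns; values hypothetical.

def setup_slack_with_clock_delay(t_clk, t_pd, t_co, t_comb_max, t_su):
    """Tpd + T - Tco - T2max - T3; setup is met if the result is >= 0."""
    return t_pd + t_clk - t_co - t_comb_max - t_su


def hold_slack_with_clock_delay(t_pd, t_co, t_comb_min, t_h):
    """Tco + T2min - Tpd - T4; hold is met if the result is >= 0."""
    return t_co + t_comb_min - t_pd - t_h


# The same path with and without a 2 ns clock delay.
for t_pd in (0.0, 2.0):
    setup = setup_slack_with_clock_delay(10.0, t_pd, 0.5, 8.0, 0.3)
    hold = hold_slack_with_clock_delay(t_pd, 0.5, 1.0, 0.2)
    print(f"Tpd = {t_pd} ns: setup slack {setup:.2f} ns, hold slack {hold:.2f} ns")
```

With these numbers the 2 ns clock delay raises the setup slack from 1.2 ns to 3.2 ns but turns the hold slack from 1.3 ns into -0.7 ns, which is the situation of Figure 6.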

In summary, if clock delay is ignored you only need to care about setup time; if clock delay is present you must also pay attention to hold time. The following sections analyze how to raise the operating clock frequency of a synchronous system in an FPGA design.

1.2 How to raise the operating clock of a synchronous system

The analysis above shows that in a synchronous system the setup requirement T3 of D2 is

$$T - T_{co} - T_{2max} \ge T_3$$

It follows directly that $T \ge T_3 + T_{co} + T_{2max}$, where T3 is the setup time (tsu) of D2 and T2max is the combinational logic delay. In a given design, T3 and Tco are fixed values determined by the device; the only controllable term is T2max, the delay of the combinational logic. Minimizing T2max therefore raises the operating clock of the system. The following methods can be used to reduce it.
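Rearranged as T >= T3 + Tco + T2max, the requirement gives the minimum period and hence the maximum frequency. The sketch below uses hypothetical device values (T3 = 0.3 ns, Tco = 0.5 ns) and shows that shrinking T2max, the only term the designer controls, is what raises fmax.

```python
# Minimal sketch: minimum clock period and maximum frequency from
# T >= T3 + Tco + T2max. Times in ns; the device values are hypothetical.

def min_period_ns(t_su, t_co, t_comb_max):
    """Smallest clock period T that satisfies the setup requirement."""
    return t_su + t_co + t_comb_max


def fmax_mhz(t_su, t_co, t_comb_max):
    """Maximum clock frequency in MHz (1000 / T when T is in ns)."""
    return 1000.0 / min_period_ns(t_su, t_co, t_comb_max)


# T3 and Tco are fixed by the device; only T2max is under the designer's control.
for t_comb_max in (8.0, 4.0):
    print(f"T2max = {t_comb_max} ns -> "
          f"Tmin = {min_period_ns(0.3, 0.5, t_comb_max):.1f} ns, "
          f"fmax = {fmax_mhz(0.3, 0.5, t_comb_max):.1f} MHz")
```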

1.2.1 Reducing delay by changing the routing

Taking Altera devices as an example, the Timing Closure Floorplan in Quartus shows many cells arranged in rows and columns; each cell represents one LAB, and each LAB contains 8 or 10 LEs. Routing delays compare as follows: within the same LAB (fastest) < within the same row or column < across different rows and columns. By adding appropriate constraints for synthesis, related logic can be placed as close together as possible, which reduces routing delay. Keep the constraints moderate: a margin of about 5% is generally appropriate. For example, if the circuit must run at 100 MHz, constrain it to 105 MHz. Overly tight constraints do not work well and greatly increase compile time.
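The 5% over-constraint rule of thumb amounts to the arithmetic below. This is a hypothetical helper, not a Quartus feature; the resulting frequency or period would then be entered in the tool's clock constraint.

```python
# Minimal sketch: turn a required frequency plus an over-constraint margin
# into the frequency and period to constrain for.

def constraint_target(f_required_mhz, margin=0.05):
    """Return (target frequency in MHz, target period in ns)."""
    f_target = f_required_mhz * (1.0 + margin)
    return f_target, 1000.0 / f_target


f_target, period = constraint_target(100.0)
print(f"constrain to {f_target:.0f} MHz, i.e. a {period:.3f} ns period")
```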

1.2.2 Reducing delay by splitting combinational logic (pipelining)

A typical synchronous circuit has more than one stage of registers (Figure 8), and for the circuit to operate stably the clock period must satisfy the largest delay requirement, so shortening the longest delay path raises the operating frequency. As shown in Figure 7, we can decompose a large block of combinational logic into smaller pieces and insert flip-flops between them, which raises the operating frequency of the circuit. This is the basic principle of the technique known as pipelining.

In the upper part of Figure 8, the clock frequency is limited by the delay of the second, larger block of combinational logic. Redistributing the combinational logic evenly with an appropriate method avoids excessive delay between any two flip-flops and removes the speed bottleneck.

Figure 7 Splitting combinational logic

Figure 8 Moving combinational logic between stages
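The effect of splitting and balancing combinational logic (Figures 7 and 8) can be estimated numerically. In this sketch the Tco and setup defaults and the 12 ns block are hypothetical, and the split is idealized: real logic cannot always be cut at arbitrary points, and every inserted register stage adds one clock cycle of latency.

```python
# Minimal sketch: with registers between stages, the slowest stage sets
# the clock period. Tco and Tsetup defaults are hypothetical values in ns.

def pipelined_fmax_mhz(stage_delays_ns, t_co=0.5, t_su=0.3):
    """fmax in MHz of a chain whose stages are separated by registers."""
    worst_stage = max(stage_delays_ns)
    return 1000.0 / (t_co + worst_stage + t_su)


def split_evenly(total_ns, n_stages):
    """Idealized split of one combinational block into equal pieces."""
    return [total_ns / n_stages] * n_stages


unpipelined = pipelined_fmax_mhz([12.0])              # one 12 ns block
balanced = pipelined_fmax_mhz(split_evenly(12.0, 2))  # two 6 ns stages
unbalanced = pipelined_fmax_mhz([3.0, 9.0])           # same logic, badly split
print(f"{unpipelined:.1f} MHz -> {balanced:.1f} MHz (balanced), "
      f"{unbalanced:.1f} MHz (unbalanced)")
```

The unbalanced split still helps, but much less than the balanced one, which is the point of moving logic between stages in Figure 8.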

How, then, should combinational logic be split in a design? Better techniques come with practice, but some good design ideas and methods are worth mastering. Most FPGAs are based on 4-input LUTs. If an output depends on a condition with more than four inputs, it must be implemented by cascading several LUTs, which introduces an additional level of combinational delay. To reduce combinational logic, the key is to keep the number of input conditions as small as possible, so that fewer LUTs are cascaded and the combinational delay is reduced.
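The cost of wide conditions can be counted in LUT levels. This sketch assumes the condition decomposes as a tree (for example an AND of its inputs, or a comparison of a counter with a constant); arbitrary functions may need more levels. The `k` parameter lets you try LUT sizes other than the four inputs assumed in the text.

```python
import math

# Minimal sketch: LUT levels needed for an n-input condition built from
# k-input LUTs, assuming the function decomposes as a tree. Arbitrary
# functions can need more levels than this.

def lut_levels(n_inputs, k=4):
    """Depth of a balanced tree of k-input LUTs covering n_inputs."""
    if n_inputs <= k:
        return 1
    # The first level packs the inputs into ceil(n / k) LUT outputs.
    return 1 + lut_levels(math.ceil(n_inputs / k), k)


for n in (4, 6, 16, 17):
    print(f"{n:2d} inputs -> {lut_levels(n)} LUT level(s)")
```

Going from four to five inputs already doubles the depth, which is why the text recommends keeping conditions within four inputs.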

Pipelining, as it is often called, raises the operating frequency by cutting large combinational logic: one or more levels of D flip-flops are inserted to reduce the combinational logic between registers. For example, a 32-bit counter has a very long carry chain, which inevitably lowers the operating frequency. We can split it into 4-bit and 8-bit counters: each time the 4-bit counter reaches 15, it triggers one count of the 8-bit counter. This splits the counter and raises the operating frequency.
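A cycle-level Python model of the split counter, using the 4-bit and 8-bit sizes from the text. It is a behavioral sketch (the class name is hypothetical), not synthesizable code. It checks that the two cascaded parts count exactly like one 12-bit counter while the high part's enable depends only on the four bits of the low part.

```python
# Minimal cycle-level sketch of a counter split into a 4-bit low part and
# an 8-bit high part (12 bits total). The high part advances only when the
# low part is at 15, so its enable is a short 4-input compare instead of
# a carry chain through all 12 bits.

class SplitCounter:
    def __init__(self):
        self.low = 0   # 4-bit counter, counts every clock
        self.high = 0  # 8-bit counter, counts once per 16 clocks

    def tick(self):
        """One clock edge: both parts update from their pre-edge values."""
        carry_enable = self.low == 15
        self.low = (self.low + 1) % 16
        if carry_enable:
            self.high = (self.high + 1) % 256

    @property
    def value(self):
        """Combined 12-bit count, for checking against a plain counter."""
        return (self.high << 4) | self.low


counter = SplitCounter()
for _ in range(1000):
    counter.tick()
print(counter.value)  # same as a plain 12-bit counter: 1000 % 4096
```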

In a state machine, large counters should generally also be moved outside the state machine. A counter usually has more than four inputs, and using it together with other conditions as a state-transition criterion inevitably increases the number of cascaded LUT levels, and therefore the combinational logic. Take a 6-bit counter as an example: we want the state to change when the counter reaches 111100. Instead, we move the counter outside the state machine and generate an enable signal when the counter reaches 111011, one count early, so that the registered enable triggers the state transition in the cycle in which the counter reaches 111100. This reduces the combinational logic.
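The counter-outside-the-FSM technique relies on the enable being registered one count early. This hypothetical model assumes the counter increments by one every clock. The compare against 111011 happens outside the FSM, and the FSM itself sees only a 1-bit enable that is high exactly when the count is 111100.

```python
# Minimal cycle-level sketch: a 6-bit counter kept outside the FSM, with a
# registered enable loaded one count early (at 111011) so that it is high
# in exactly the cycle in which the counter holds 111100. Assumes the
# counter increments by one every clock.

def enable_cycles(n_cycles, target=0b111100):
    """Return (cycle, count) pairs for every cycle in which enable is high."""
    count, enable_reg = 0, False
    hits = []
    for cycle in range(n_cycles):
        if enable_reg:
            hits.append((cycle, count))
        # Clock edge: registers update from their pre-edge values.
        next_enable = count == target - 1  # 0b111011, compared outside the FSM
        count = (count + 1) % 64
        enable_reg = next_enable
    return hits


print(enable_cycles(130))  # [(60, 60), (124, 60)]
```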

A state machine typically contains three modules:

(1) an output module;

(2) a module that determines the next state;

(3) a module that holds the current state.

The three modules use different kinds of logic. The output module usually contains both combinational and sequential logic, the next-state module is usually purely combinational, and the current-state module is usually sequential. The relationship between the three modules is shown in Figure 9.

Figure 9 The composition of the state machine

State machines are therefore commonly written in three parts that follow these three modules. The following is an example of a good state machine design style:

/*-----------------------------------------------------
This is an FSM demo program
Design Name : arbiter
File Name   : arbiter2.v
-----------------------------------------------------*/
module arbiter2 (
  clock, // clock
  reset, // active-high, synchronous reset
  req_0, // request 0
  req_1, // request 1
  gnt_0, // grant 0
  gnt_1  // grant 1
);
//-------------Input Ports-----------------------------
input clock;
input reset;
input req_0;
input req_1;
//-------------Output Ports----------------------------
output gnt_0;
output gnt_1;
//-------------Input Ports Data Type-------------------
wire clock;
wire reset;
wire req_0;
wire req_1;
//-------------Output Ports Data Type------------------
reg gnt_0;
reg gnt_1;
//-------------Internal Constants----------------------
parameter SIZE = 3;
parameter IDLE = 3'b001, // one-hot state encoding
          GNT0 = 3'b010,
          GNT1 = 3'b100;
//-------------Internal Variables----------------------
reg  [SIZE-1:0] state;      // sequential part of the FSM
wire [SIZE-1:0] next_state; // combinational part of the FSM

//----------Code starts here---------------------------
// The current state is passed to the function explicitly, so the
// continuous assignment is re-evaluated in simulation whenever it changes.
assign next_state = fsm_function(state, req_0, req_1);
//------------fsm_function: next-state logic-----------
function [SIZE-1:0] fsm_function;
  input [SIZE-1:0] cur_state;
  input req_0;
  input req_1;
  begin
    case (cur_state)
      IDLE:
        if (req_0 == 1'b1)
          fsm_function = GNT0;
        else if (req_1 == 1'b1)
          fsm_function = GNT1;
        else
          fsm_function = IDLE;
      GNT0:
        if (req_0 == 1'b1)
          fsm_function = GNT0;
        else
          fsm_function = IDLE;
      GNT1:
        if (req_1 == 1'b1)
          fsm_function = GNT1;
        else
          fsm_function = IDLE;
      default: fsm_function = IDLE;
    endcase
  end
endfunction

//----------Current-state register (sequential)--------
always @ (posedge clock)
begin
  if (reset == 1'b1)
    state <= IDLE;
  else
    state <= next_state;
end

//----------Output Logic-------------------------------
// The #1 delays only affect simulation; synthesis ignores them.
always @ (posedge clock)
begin
  if (reset == 1'b1)
  begin
    gnt_0 <= #1 1'b0;
    gnt_1 <= #1 1'b0;
  end
  else
  begin
    case (state)
      IDLE:
      begin
        gnt_0 <= #1 1'b0;
        gnt_1 <= #1 1'b0;
      end
      GNT0:
      begin
        gnt_0 <= #1 1'b1;
        gnt_1 <= #1 1'b0;
      end
      GNT1:
      begin
        gnt_0 <= #1 1'b0;
        gnt_1 <= #1 1'b1;
      end
      default:
      begin
        gnt_0 <= #1 1'b0;
        gnt_1 <= #1 1'b0;
      end
    endcase
  end
end // end of output logic block

endmodule

State machines are usually written in this three-segment form, which avoids excessive combinational logic.

Pipelining can cut combinational logic in many situations, but in some cases the combinational logic is hard to cut. What should we do then?

A state machine is one such example: we cannot pipeline its state-decoding logic. If a design has a state machine with dozens of states, its state-decoding logic will be huge and is very likely the critical path of the design. So what can we do? The idea is the same as before: reduce the combinational logic. We can analyze the outputs of the states, reclassify them, and redefine them as a group of small state machines. A case statement decodes the input and triggers the corresponding small state machine, so the large state machine is split into several small ones. For example, the ATA-6 hard disk specification defines about 200 input commands, and each command corresponds to many states; implementing all of them in one large state machine would be unthinkable. Instead, a case statement can decode the command and trigger the corresponding state machine (a nested structure), and the module can then run at a relatively high frequency.

Summary: the essence of raising the operating frequency is reducing the register-to-register delay. The most effective way is to avoid large combinational logic, that is, to keep conditions within four inputs where possible and reduce the number of cascaded LUT levels. The operating frequency can be raised through timing constraints, pipelining, and splitting state machines.

There are several points to note when designing clocks in an FPGA:

1. A module should use only one clock where possible; "module" here means a Verilog module or a VHDL entity. In a multi-clock design, the signals that cross clock domains are best handled by a dedicated module that isolates the clock domains. This lets the synthesis tool produce better results.

2. Unless the design is a low-power design, do not use gated clocks, that is, do not drive a flip-flop's clock input with a signal generated by combinational logic or other sequential logic (such as a divider) instead of through the FPGA's internal clock resources (BUFG). Gated clocks make the design less stable. Where a gated clock must be used, register the gating control signal on the falling edge of the clock and then AND it with the clock to produce the output clock.

3. Do not use a counter-divided signal as the clock of other modules; use a clock enable instead. Otherwise clocks end up scattered throughout the design, which also greatly increases the complexity of static timing analysis.
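Point 3 can be illustrated with a cycle-level model: all registers share one clock, and a divide-by-N counter produces an enable instead of a new clock. The function name and the divide ratio are hypothetical; in an FPGA the enable would typically map onto the flip-flops' clock-enable inputs.

```python
# Minimal cycle-level sketch: everything runs on one system clock, and a
# divider counter produces a one-cycle enable every `divide_by` cycles.
# The "slow" register updates only when the enable is high, instead of
# being clocked by a divided clock.

def slow_register_history(n_cycles, divide_by=4):
    div_count = 0
    slow_reg = 0
    history = []
    for _ in range(n_cycles):
        enable = div_count == divide_by - 1
        # Clock edge: all registers update from their pre-edge values.
        div_count = (div_count + 1) % divide_by
        if enable:
            slow_reg += 1  # the clock-enabled register
        history.append(slow_reg)
    return history


print(slow_register_history(12))  # [0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 3]
```

The slow register changes once every four system clocks, exactly as if it were clocked at a quarter of the frequency, but every path is still timed against the single system clock.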

1.4 Synchronization between different clock domains

When two modules in a design use two different clocks, their interface operates asynchronously, and the two modules must be synchronized to ensure that data is processed correctly.

Different clock domains here usually means one of the following two cases (independent clock sources):

1. The two clocks have different frequencies;

2. The two clocks have the same frequency but are independent, with no fixed phase relationship.
These cases are shown in the following two figures:

Figure 10 Two clocks with very different frequencies

Figure 11 Two clocks with the same frequency but unrelated phases

Data transferred between two clock domains is usually synchronized in different ways depending on its bit width.

1. Synchronizing a single bit whose every pulse is at least one clock cycle wide

This type of synchronization is mainly used for individual control signals. The usual approach is for the receiving module to register the signal twice with its own system clock, using two flip-flops, as shown in Figure 12. The following points about this synchronizer need to be explained.

Figure 12 Synchronizer design

(1) The circuit in Figure 12 is usually called a synchronizer. It can only be used to synchronize a single asynchronous signal, and the signal's pulses must be wider than the period of the receiving clock; otherwise the asynchronous signal may not be captured.

(2) Why can the synchronizer in Figure 12 only be used for a single asynchronous signal? (a) When two or more asynchronous signals (control or address) enter this clock domain at the same time to control a circuit, synchronizing each of them with the circuit of Figure 12 causes a problem. Routing or other delays create skew between these signals, and after passing through the synchronizers of Figure 12 into this clock domain, that skew can grow large or cause a race, resulting in errors in the circuit of this clock domain.

The problem is illustrated in Figure 13:

Figure 13 Errors when synchronizing multiple control signals

(b) Likewise, an asynchronous data bus entering this clock domain cannot use the circuit of Figure 12. The data changes randomly, and the widths of its 0 and 1 levels are unrelated to the clock of this domain, so the circuit of Figure 12 may not capture the correct data.

(3) Note that the second flip-flop does not avoid metastability; more precisely, the circuit prevents metastability from propagating. In other words, if the first flip-flop becomes metastable (which is possible), the second flip-flop keeps the metastable state from propagating to the circuit after it.

(4) A metastable first-stage flip-flop needs a recovery time to settle, that is, to exit the metastable state. When this recovery time plus the setup time of the second flip-flop (more precisely, minus the clock skew) is less than or equal to the clock period, the second flip-flop samples a stable, definite value and metastability does not propagate. This condition is easy to meet: place the two flip-flops as close together as possible, with no combinational logic between them and small clock skew.

(5) FF2 samples the output of FF1, so whatever FF1 outputs, FF2 outputs one cycle later. Metastability means that once FF1 enters that state, its output level is unpredictable and may be right or wrong. Although this method prevents metastability from propagating, it does not guarantee that the data after the two flip-flops is correct, so the circuit will produce a certain amount of erroneous data; it is only suitable where occasional errors are tolerable. For error-sensitive circuits, use a dual-port RAM or a FIFO.
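The structure and latency of the Figure 12 synchronizer can be sketched as a cycle model. This is an idealization: it does not model metastability itself, only the two cascaded flip-flops with nothing between them. The function name is hypothetical.

```python
# Idealized cycle model of the two-flip-flop synchronizer of Figure 12.
# Metastability is NOT modeled: a real FF1 can resolve to either level when
# its input changes near the clock edge. The model only shows the structure
# (FF1 -> FF2, no logic between them) and the resulting latency.

def two_flop_synchronizer(samples):
    """`samples` is the async input level at each receiving-clock edge."""
    ff1 = ff2 = 0
    out = []
    for d in samples:
        # Both flip-flops update on the same edge: FF2 takes FF1's old value.
        ff1, ff2 = d, ff1
        out.append(ff2)
    return out


print(two_flop_synchronizer([0, 1, 1, 1, 0, 0, 0]))  # [0, 0, 1, 1, 1, 0, 0]
```

A pulse that rises and falls between two receiving-clock edges never appears in `samples` at all, which is why point (1) requires the input to be wider than one receiving-clock period.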

2. Synchronizing input pulses that may be narrower than one clock cycle (How is that possible? Isn't the pulse then narrower than the original clock?)

For case 2, a feedback circuit such as the one in Figure 14 is usually used. The circuit works as follows: assume the input is high; because the first flip-flop FF1 is cleared by an active-high signal, the output is also high, which is correct. If the input is low, FF1 is forced to clear, and the output is zero. This guarantees that the output is correct.

Figure 14 Synchronizer for input pulses that may be narrower than one clock cycle
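Figure 14 is not reproduced here, so the following is a hedged sketch of a common feedback-based pulse-capture scheme, not necessarily the exact circuit in the figure. The input pulse sets a sticky flag regardless of its width, two receiving-clock flip-flops synchronize the flag, and the synchronized output clears it. All names and timings are hypothetical.

```python
# Event-level sketch of a pulse-capture synchronizer with feedback clear.
# Assumption: a sticky flag is set asynchronously by each input pulse, two
# receiving-clock flip-flops synchronize it, and the synchronized output
# clears the flag. While that clear is active, new pulses are lost.

def capture_short_pulses(pulse_times, n_edges, period=10.0):
    """Return the synchronized output after each receiving-clock edge.

    pulse_times: arrival times (ns) of input pulses, which may be much
    narrower than `period`; only the arrival time matters to the flag.
    """
    flag = ff1 = ff2 = False
    out = []
    for k in range(1, n_edges + 1):
        window_start, edge = (k - 1) * period, k * period
        arrived = any(window_start < t <= edge for t in pulse_times)
        if arrived and not ff2:  # the clear (driven by ff2) blocks the set
            flag = True
        ff1, ff2 = flag, ff1     # clock edge: two-stage synchronizer
        if ff2:
            flag = False         # feedback clear from the synchronized output
        out.append(ff2)
    return out


print(capture_short_pulses([3.0], n_edges=5))  # [False, True, True, False, False]
```

A pulse arriving at t = 3 ns, however narrow, still produces a two-cycle-wide output. Pulses that arrive while the clear is active are lost, so this scheme assumes input pulses are spaced several receiving-clock cycles apart.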

For cases involving multiple control signals, refer to the detailed analysis in the PDF on design and synthesis techniques for asynchronous multi-clock systems, available at www.fpga.com.cn.

