FPGA Asynchronous Timing and Multi-Clock Modules

Source: Internet
Author: User

Http://bbs.ednchina.com/BLOG_ARTICLE_3019907.HTM

Chapter 6 Clock Domains

Many digital designs, and FPGA designs in particular, place great emphasis on using a single clock domain for the entire design whenever possible. In other words, one independent network drives the clock ports of all the flip-flops in the design. This simplifies timing analysis and avoids many of the problems associated with multiple clock domains. In practice, however, a single clock is often unrealistic because of system constraints outside the FPGA. FPGAs frequently need to:

- exchange data between systems running at different clock frequencies,
- send and receive data through several I/O interfaces,
- process asynchronous signals, and
- prototype low-power ASICs that use gated clocks.

This chapter discusses the problems that multiple clock domains and asynchronous signals cause in FPGA design, and the solutions to them, with practical guidance.

Here and throughout this chapter, a clock domain is a set of logic in which all synchronous elements share the same clock network. These elements include flip-flops, synchronous RAM blocks, pipelined multipliers, and so on. If every flip-flop in the design uses one global network, such as the FPGA's master clock input, the design has a single clock domain. If the design has two input clocks, as shown in Figure 6-1, with one used by interface 1 and the other by interface 2, the design has two clock domains.

Figure 6-1: Dual-clock domain design

Gated clocks, derived clocks, and event-driven flip-flops all fall under the clock-domain concept. As shown in Figure 6-2, even a simple gated clock creates a new clock domain. This kind of clock control is discouraged in FPGA design, where a clock enable should be used instead of gating the clock. It is nevertheless a useful illustration of what a clock domain is.

This chapter covers the following topics in detail:

- Passing signals between two different clock domains
  - How metastability arises and how it affects design reliability
  - Avoiding metastability through phase control
  - Passing a single signal between clock domains by double flopping ("two beats")
  - Using a FIFO to pass multi-bit data between clock domains
  - Partitioning synchronizer modules to improve the design's organization
- Handling gated clocks in ASIC verification prototypes
  - Creating a single clock module
  - Automatic gating removal

Figure 6-2: Clock domain created by the gated clock

6.1 Crossing Clock Domains

When a design contains multiple clock domains, the first problem to solve is passing signals between them. Clock-domain crossings are a major source of trouble for the following reasons:

1. Failures caused by signals crossing clock domains are hard to reproduce. With two asynchronous clock domains in a design, a failure usually depends on the relative timing of the two clocks' edges. Clocks from off-chip sources usually have no fixed relationship to what the device is actually doing.

2. The problem behaves differently from one technology to another. Other factors mean this does not always hold, but we often find that technologies with smaller setup and hold requirements fail statistically less often than slower technologies. High-speed processes are typically the ones with the smaller requirements. Other factors also strongly affect the likelihood of failure, for example whether the synchronizing flip-flop is implemented with an output buffer.

3. EDA tools generally do not detect or flag these problems. Static timing analysis tools analyze each clock domain separately. They analyze timing across clock domains only when explicitly told to, and only in a specific way.

4. Cross-clock-domain failures are hard to detect and hard to debug unless they are well understood. All cross-clock-domain interfaces must therefore be clearly defined and handled before any functionality is implemented.

Let's start by looking at the errors that passing signals between clock domains can cause. Consider Figure 6-3, in which a signal passes from one clock domain to another.

As shown in Figure 6-4, the low-speed clock period is twice the high-speed clock period. The interval between a rising edge of the low-speed clock and the next rising edge of the high-speed clock is constant and always equal to DC. Because of this fixed phase relationship, DC never changes (assuming no drift). In this case DC is always greater than the sum of the logic delay and the setup time of the flip-flop clocked by the high-speed clock.

Figure 6-3: Simple signal transmission between clock domains

Figure 6-4: Timing relationship between two clock domains
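The condition above can be written as a pair of inequalities. This is an idealized sketch that ignores clock skew and jitter. The following symbols are introduced here and are not from the text:

- $t_{co}$: clock-to-output delay of the launching flip-flop
- $t_{logic}$: combinational delay between the two flip-flops
- $t_{su}$ and $t_h$: setup and hold times of the capturing flip-flop
- $T_{fast}$: the high-speed clock period

$D_C$ is measured from a low-speed launch edge to the next high-speed capture edge.

$$
\begin{aligned}
\text{setup:}\quad & t_{co} + t_{logic} + t_{su} \le D_C \\
\text{hold:}\quad  & t_{co,\min} + t_{logic,\min} \ge t_h - \left(T_{fast} - D_C\right)
\end{aligned}
$$

The hold check is made against the high-speed edge just before the launch edge, which lies $T_{fast} - D_C$ earlier. Both checks fail when a launch edge lands too close to a capture edge. A small $D_C$ breaks setup, and a $D_C$ approaching $T_{fast}$ breaks hold.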

When these clocks start up, there is a fixed phase relationship between them that avoids any setup or hold violation. As long as the clocks do not drift, no timing violations occur and the device works as expected. Now consider another case: after power-up, the same clocks come up with the phase relationship shown in Figure 6-5.

Figure 6-5: Clock phase relationship that causes a timing violation

The phase relationship between the two clocks in Figure 6-5 causes timing problems, and this can happen between two clock domains of any frequency. If the two clock frequencies are unrelated, the violation does not appear in this fixed, repeatable form. Instead, it occurs intermittently as the clock edges drift past each other.

In short, clock synchronization problems in FPGA design are often non-reproducible, and they have serious consequences for the reliability of the design. We discuss solutions to these problems later. First, we need to look at what happens when setup and hold times are violated, which is the topic of the next section.

6.1.1 Metastability

A flip-flop's setup and hold times define a time window around the rising clock edge. If the data at the flip-flop's input changes inside that window, a timing violation occurs. When the setup or hold requirement is violated, a node inside the flip-flop may float within a voltage range and fail to settle to logic 0 or logic 1. This can be an internal node or the node that drives the output.

In other words, if the data is captured inside that window, the flip-flop's transistors cannot be reliably set to logic 0 or logic 1. Instead of sitting in the saturation region that corresponds to a high or low level, they hover at an intermediate level before settling. The value they eventually settle to may or may not be the correct one. This condition, shown in Figure 6-6, is called metastability.

Figure 6-6: Timing violations lead to metastable

As the waveform in Figure 6-6 shows, the input changes within the setup/hold window. As a result, the output is not a definite logic 0 or logic 1 but an intermediate level between them. If the flip-flop contains an output buffer, the metastability can appear at the output as a spurious transition while the internal signal gradually settles. How long the output stays metastable is random, and it may even remain metastable for an entire clock cycle.

If the metastable value drives combinational logic, that logic may behave incorrectly, depending on the switching thresholds of its gates. From a timing-closure point of view, the combinational delay between two flip-flops must be less than the minimum clock period. The time a metastable signal takes to settle effectively adds to that logic delay. A metastable signal can therefore cause a fatal functional failure, because downstream logic will not capture consistent values on every clock edge.

Note that it is very difficult to use simulation to evaluate how metastability affects an FPGA design:

- A purely digital simulator does not model metastability. At best, a timing check flags the violation and drives a logic "X" (unknown).
- Plain RTL simulation has no notion of setup and hold times, so no signal ever appears metastable.
- Gate-level simulation does check setup and hold. Even so, it is hard to reproduce a synchronization failure that depends on how two asynchronous signals happen to line up.

This is especially hard when the design or verification engineer was not looking for the problem from the start of the design. It is therefore important to understand how to keep a design reliable without relying on simulation to uncover its synchronization problems. There are several ways to deal with metastability, and we discuss them one by one.

6.1.2 Metastability Solution 1: Phase Control

Consider a design whose two clock domains have different periods and an arbitrary phase relationship. Suppose two conditions hold:

- at least one of the clocks is controlled by a PLL or DLL inside the FPGA, and
- one clock period is a multiple of the other, within the accuracy of the PLL or DLL.

In that case, violations can be avoided by phase alignment, as shown in Figure 6-7.

As an example, consider a signal that passes from a low-speed clock domain into another domain whose clock period is half that of the low-speed domain. As analyzed above, timing violations are likely if the phase relationship is not guaranteed. Deriving the high-speed clock from the low-speed clock with a DLL achieves phase alignment.

In Figure 6-7, the DLL adjusts the phase of the high-speed (capture) clock to align it with the low-speed (launch) clock. The time available for data to pass between the two clock domains is DC, and it is always at its maximum possible value. In this case there will be no setup violations as long as the propagation delay from the low-speed flip-flop to the high-speed flip-flop is less than the high-speed clock period. If clock skew is too large to meet the hold requirement, the signal can instead be captured on the falling edge of the high-speed clock. This works provided enough timing margin remains to meet the setup requirement.
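A minimal Verilog sketch of the falling-edge capture path just described follows. It assumes clk_fast has already been derived from clk_slow by an on-chip DLL or PLL (not modeled here), so that it runs at exactly twice the frequency with aligned rising edges. The module and port names are hypothetical. Capturing on the falling edge places the capture point half a fast period after the launch edge. That buys half a period of hold margin, at the cost of leaving only half a fast period for the setup path.

```verilog
// Sketch only: clk_fast is ASSUMED to be phase-aligned to clk_slow by a
// DLL/PLL outside this module, at exactly 2x the frequency.
module phase_aligned_capture (
    input  wire       clk_slow,   // launch (low-speed) clock
    input  wire       clk_fast,   // capture (high-speed) clock, phase-aligned
    input  wire [7:0] din,        // data produced in the slow domain
    output reg  [7:0] dout        // data as seen in the fast domain
);
    reg [7:0] tx_reg;

    // Launch register in the low-speed domain.
    always @(posedge clk_slow)
        tx_reg <= din;

    // Capture on the falling edge of the fast clock: half a fast period of
    // hold margin, half a fast period for clock-to-out + routing + setup.
    always @(negedge clk_fast)
        dout <= tx_reg;
endmodule
```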

Figure 6-7: Using the DLL to align the phase

In summary, phase control can be used when two conditions hold: one clock frequency is a multiple of the other, and one of the clocks can be controlled by the FPGA's internal PLL or DLL.

In many cases, controlling the phase relationship between clock domains is a luxury the design cannot afford. This is especially true when timing requirements are imposed by chips outside the FPGA, or when there is no definite phase relationship between the clock domains. For example, suppose the FPGA provides an interface between two systems whose input and output delay requirements are very tight. Then the clock phase of neither system can be adjusted. Situations like this are common in practice, so another method is needed, and it is discussed in the next section.

6.1.3 Metastability Solution 2: Double Flopping ("Two Beats")

When a single-bit signal crosses between two asynchronous clock domains, the two-beat technique (registering the signal twice, or double flopping) can be used. As discussed in the previous section, a setup or hold violation can leave a node inside a flip-flop hovering at an intermediate level, causing metastability. The signal then takes an unknown amount of time to move from this intermediate state to a stable state. This unknown time adds to the clock-to-output time (TCO), which eats into the delay budget of downstream paths, and it can cause a timing violation at the next stage. If the signal feeds a control branch or a decision tree, the result can be very dangerous.

Unfortunately, there is no good way to predict how long metastability will last. Nor is there a good way to pass this information to timing analysis and optimization tools. Assume the two clock domains are completely asynchronous, so phase control is impossible. Then one of the simplest ways to minimize metastability is to pass the signal through two flip-flops in series. Other texts call this a synchronization bit, a two-stage flip-flop, or a two-stage synchronizer.

In the configuration shown in Figure 6-8, the first synchronizer flip-flop (whose input is din) may go metastable. However, the signal has a chance to settle before it is latched by the second stage and seen by other logic, as shown in Figure 6-9.

Figure 6-8: Double flopping

Figure 6-9: Timing of a two-beat synchronizer

In Figure 6-9, Dsync is the output of the first flip-flop in the synchronizer and Dout is the output of the second. Dout passes the synchronized signal on only after it has settled, so downstream logic never receives the metastable value. Do not place any logic between the two synchronizer flip-flops; this gives the signal as much time as possible to return to a stable state. In summary, when a single-bit signal crosses between asynchronous clocks, a two-beat synchronizer resynchronizes it into the destination clock domain.

In theory, the output of the first flip-flop could remain metastable indefinitely. In a real system, however, a variety of factors drive it to a stable state. Imagine a ball balanced on a mountain peak: a push in any direction sends it rolling down that side. In the same way, random fluctuations caused by heat, radiation, and so on push a metastable logic gate back to a stable logic 0 or logic 1.

When the two-beat technique samples an asynchronous signal, there is no way to predict whether a transition will be captured on the current clock edge or on the next one. The technique is therefore not suitable when:

- the signal is part of a data bus, where some bits may arrive one clock cycle later than others, or
- critical data must be accurate to a single clock cycle.

For control signals that can tolerate a variation of one or more clock cycles, however, the technique works well.

For example, suppose an external event sets a bit that triggers an internal FPGA action, and such events occur very rarely, microseconds or even milliseconds apart. A few extra nanoseconds of delay do not change the behavior. Suppose the event bit feeds the control structure of a state machine through a two-beat synchronizer. The worst outcome is then that the desired transition is seen one clock cycle later. Without the synchronizer, the decision logic may decode different state transitions from the metastable value. Different parts of the state machine could then jump to different branches at the same time.
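As a sketch of this scenario, the hypothetical module below processes the asynchronous event bit in three steps:

1. It double-flops the bit.
2. It turns the synchronized rising edge into a one-cycle pulse.
3. Only then does it let the pulse steer a small state machine.

The module and signal names, the 8-cycle action length, and the synchronous reset are illustrative assumptions.

```verilog
// Hypothetical example: ext_event is a slow, asynchronous level change.
module event_controller (
    input  wire clk,
    input  wire rst,        // synchronous, active-high reset (assumption)
    input  wire ext_event,  // asynchronous to clk
    output wire busy        // high while the triggered action runs
);
    // Two-flop synchronizer: no logic between ev_meta and ev_sync.
    reg ev_meta, ev_sync, ev_sync_d;
    always @(posedge clk) begin
        ev_meta   <= ext_event;  // may go metastable
        ev_sync   <= ev_meta;    // safe for the rest of the design to use
        ev_sync_d <= ev_sync;    // one-cycle-delayed copy for edge detection
    end

    // Single-cycle pulse on a 0 -> 1 transition of the synchronized event.
    wire ev_rise = ev_sync & ~ev_sync_d;

    // Two-state FSM: only the synchronized pulse reaches the decision logic.
    localparam IDLE = 1'b0, RUN = 1'b1;
    reg       state;
    reg [2:0] count;

    always @(posedge clk) begin
        if (rst) begin
            state <= IDLE;
            count <= 3'd0;
        end else begin
            case (state)
                IDLE: if (ev_rise) begin
                          state <= RUN;
                          count <= 3'd0;
                      end
                RUN:  if (count == 3'd7)
                          state <= IDLE;   // the action lasts 8 clock cycles
                      else
                          count <= count + 3'd1;
            endcase
        end
    end

    assign busy = (state == RUN);
endmodule
```

The edge detector sits after the second synchronizer stage, so it only ever sees settled values. The extra register adds one more cycle of latency, which the example above shows is harmless for slow events.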

Besides purely digital systems, mixed-signal systems often feed asynchronous feedback signals into the FPGA, as shown in Figure 6-10.

Figure 6-10: Re-sync analog feedback

The following Verilog code implements a synchronizer that double-flops the asynchronous feedback signal:

```verilog
module analog_interface (
    ...
    output reg  fbr2,
    input  wire feedback);

    reg fbr1;

    always @(posedge clk) begin
        fbr1 <= feedback;  // first stage: may go metastable
        fbr2 <= fbr1;      // double flop: only fbr2 is used by other logic
    end

    ...
```

If the feedback signal causes a timing violation, fbr1 may remain metastable for an indeterminate time after the clock edge. Other logic must therefore use only fbr2.

Timing constraints must be specified correctly when using the two-beat technique. The path from the source clock domain into the first synchronizer register should be declared a false path so that the timing analyzer ignores it. Because the two-beat structure resynchronizes the signal, there is no meaningful synchronous path to analyze between the two clock domains. Also, as mentioned earlier, the delay between the two flip-flops should be as small as possible. This reduces the likelihood that metastability propagates to the second flip-flop.

6.1.4 Metastability Solution 3: Using a FIFO

The most common way to pass data across clock domains is a first-in, first-out (FIFO) structure. A FIFO can carry multi-bit signals between two asynchronous clock domains. Typical applications include transferring data between two standard buses and reading or writing burst-access memory. For example, Figure 6-11 shows an interface between a burst-access memory and a PCI bus.

Figure 6-11: FIFO in a PCI application

FIFOs are useful in many different applications, but here we focus only on their ability to carry burst data across clock domains.

A FIFO is much like a supermarket checkout lane. Customers arrive at the checkout at random times, while the cashier works at a roughly constant speed. Sometimes there are very few customers. At other times a burst of customers arrives at once, and because the cashier cannot serve each one immediately, they must wait in line. In abstract terms, we call this line a queue. The cashier serves customers at a more or less constant rate, regardless of how long the queue is. If customers arrive faster than the cashier can serve them, the arrangement cannot be sustained. Either the cashier's service rate must increase or fewer new customers must be allowed in.

Data transmission works the same way. Data from one clock domain may arrive at completely random intervals, sometimes in large bursts. Meanwhile, the receiving device in the other clock domain can only process data at a fixed rate. As shown in Figure 6-12, a FIFO buffers the data so that it forms a queue inside the device.

Figure 6-12: Asynchronous FIFO

With an asynchronous FIFO, the sender can send data at random intervals. The receiver takes data out of the queue and processes it at its own fixed bandwidth. Because a FIFO-based queue cannot be infinitely long, some control is needed to prevent FIFO overflow. Two options are available:

- A predefined transmit rate (bursty or not), a minimum receive rate, and a corresponding maximum queue size.

- Handshake control.

Note that the sending clock does not have to be faster than the receiving clock for the FIFO to overflow. Even if data is written at a lower clock frequency, the write side may take fewer clock cycles per word than the receiver needs to process each word. The receiver can then still fall behind. Without handshake control, you must understand the worst case in which this leads to an overflow.

If the rate at which data is written into the FIFO stays above the rate at which the receiver processes it, the system cannot be sustained. No storage device can hold unlimited data, so this problem must be solved at the system-architecture level. Typically, data arrives in bursts of bounded size, which may be periodic or aperiodic, so the FIFO depth is set by the maximum burst size. In the worst case, when the receiver drains nothing during a burst, the FIFO must be at least as deep as the burst itself. How much smaller it can be depends on the characteristics of the receiver.
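Here is a rough sizing rule for the burst case above, as a worked sketch. The symbols are introduced here, not taken from the text:

- a burst of $B$ words is written at $r_w$ words per second;
- the receiver drains $r_r < r_w$ words per second.

The few cycles of flag-synchronization latency are ignored. The burst lasts $B/r_w$, during which the receiver removes $B\,r_r/r_w$ words, so the minimum depth is

$$
D_{\min} = B - \frac{B}{r_w}\, r_r = B\left(1 - \frac{r_r}{r_w}\right)
$$

For example, consider a 100-word burst written at one word per cycle of a 100 MHz clock and read at one word per cycle of a 50 MHz clock. It needs at least $100 \times (1 - 50/100) = 50$ entries, plus some margin for synchronizer latency. If the receiver reads nothing during the burst ($r_r = 0$), the depth must cover the whole burst. In either case, the average arrival rate between bursts must not exceed $r_r$.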

In many cases, neither the burst size nor the distribution of data arrivals is well defined. Handshake control is then needed to prevent the FIFO from overflowing. As shown in Figure 6-13, handshaking is usually implemented with flag signals:

- a full flag tells the sender that the FIFO has no room for more data;
- an empty flag tells the receiver that the FIFO holds no data to process.

Managing these handshake signals may also require a state machine, as shown in Figure 6-13.

Figure 6-13: FIFO handshake control

In an FPGA, a FIFO is typically implemented by wrapping a dual-port RAM. Seemingly trivial flags such as empty and full are actually harder to implement than they look, because each side's control depends on the other side. For example, the logic driving the input must know whether the FIFO is full, which it can determine only from how much data has been read at the output. Similarly, the logic reading from the FIFO must know whether the FIFO contains any data (that is, whether it is empty). It can determine this only from the input side's write pointer.

So while we use the FIFO to transfer data between two asynchronous clock domains, implementing the FIFO itself raises the same crossing problem for its handshake flags. To pass the necessary signals between the two clock domains, we return to the two-beat technique discussed in the previous section. Figure 6-14 shows a block diagram of a simple asynchronous FIFO.

Figure 6-14: Simple block diagram of asynchronous FIFO

In Figure 6-14, generating the empty and full flags requires passing the write address and the read address asynchronously into the opposite clock domain. The difficulty is that when a multi-bit address bus is resynchronized, differences in routing delay between bits can make some bits arrive one clock cycle later than others. Because the two clock domains are asynchronous, some address bits may be captured on one clock edge and the rest on the next. Which edge captures a bit depends on whether its data is valid long enough before the clock reaches the first synchronizer flip-flop.

If this happens, the consequences are serious. When some bits of a binary address change and others do not, the receiving logic sees a completely invalid address that is neither the current address nor the previous one. For example, when a binary address changes from 0111 to 1000, all four bits toggle, and a partial capture could produce a value such as 1111 or 0000.

This problem can be solved by converting the binary address to Gray code. In a Gray code sequence, adjacent values differ in only one bit, so only one address bit changes at each increment, which avoids the problem above. If the changing bit is not captured by the next clock edge, the synchronized address simply keeps the old value. Any incorrect address (one that is neither the current address nor the previous one) is eliminated. In summary, Gray code is commonly used to pass multi-bit counter values between asynchronous clock domains, and it is especially useful in FIFOs.
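The conversions themselves are cheap. The sketch below assumes the standard reflected binary Gray code, and its module names are hypothetical. Each Gray bit is the XOR of two adjacent binary bits. Binary is recovered by XOR-reducing the Gray bits from the MSB down to each position.

```verilog
// Binary -> Gray: each Gray bit is the XOR of two adjacent binary bits.
module bin2gray #(parameter W = 4) (
    input  wire [W-1:0] bin,
    output wire [W-1:0] gray
);
    assign gray = bin ^ (bin >> 1);
endmodule

// Gray -> binary: bin[i] is the XOR of gray[W-1:i].
module gray2bin #(parameter W = 4) (
    input  wire [W-1:0] gray,
    output wire [W-1:0] bin
);
    genvar i;
    generate
        for (i = 0; i < W; i = i + 1) begin : g_bit
            assign bin[i] = ^(gray >> i);   // reduction XOR of bits W-1..i
        end
    endgenerate
endmodule
```

In a FIFO, the pointer is typically kept in binary to address the RAM. It is converted to Gray code and registered before being handed to the synchronizer in the other clock domain, so only a flip-flop output, never combinational logic, crosses the boundary.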

Note that only the read and write addresses are passed between the asynchronous clock domains. An address may therefore arrive one clock cycle later than expected, which means the empty or full flag may be released one clock cycle late. This does not, however, cause an overflow or underflow error.

- Suppose the write address is delayed on its way to the read clock domain. The read logic then simply assumes no new data has been written and that the FIFO is empty, even though a word has already been written. This slightly reduces overall throughput but cannot cause an underflow (reading from an empty FIFO).
- Suppose instead the read address is delayed on its way to the write clock domain. The write logic then assumes the FIFO has no free space, even though it is not full. This also only slightly reduces throughput and cannot cause an overflow (writing to a full FIFO).
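The following minimal asynchronous FIFO sketch follows the spirit of Figure 6-14. It has a dual-port memory, Gray-coded read and write pointers, and a two-flop synchronizer on each pointer crossing. It follows a widely used style in which each pointer carries one extra MSB so that full and empty can be told apart. It is not a vendor core, and the following are assumptions of this sketch:

- the module and port names,
- the first-word-fall-through read, and
- per-domain synchronous resets.

Each flag is computed from the other domain's synchronized and possibly stale pointer. As a result, full and empty may assert early or release late, but they never allow an overflow or underflow. This is the conservative behavior described above.

```verilog
// Minimal asynchronous FIFO sketch: DEPTH = 2**AW entries (AW >= 2).
// ASSUMES wrst / rrst are active-high resets already synchronous to
// wclk / rclk. First-word-fall-through: rdata is the oldest word while !empty.
module async_fifo #(
    parameter DW = 8,            // data width
    parameter AW = 4             // address width
) (
    input  wire          wclk, wrst, wr_en,
    input  wire [DW-1:0] wdata,
    output reg           full,
    input  wire          rclk, rrst, rd_en,
    output wire [DW-1:0] rdata,
    output reg           empty
);
    reg [DW-1:0] mem [0:(1<<AW)-1];

    // Pointers are AW+1 bits: the extra MSB distinguishes full from empty.
    reg [AW:0] wbin, wgray, rbin, rgray;
    reg [AW:0] rgray_w1, rgray_w2;   // read pointer synchronized to wclk
    reg [AW:0] wgray_r1, wgray_r2;   // write pointer synchronized to rclk

    // ---------------- write clock domain ----------------
    wire [AW:0] wbin_next  = wbin + ((wr_en && !full) ? 1'b1 : 1'b0);
    wire [AW:0] wgray_next = wbin_next ^ (wbin_next >> 1);   // binary -> Gray

    always @(posedge wclk) begin
        if (wr_en && !full)
            mem[wbin[AW-1:0]] <= wdata;
        if (wrst) begin
            wbin     <= 0;
            wgray    <= 0;
            rgray_w1 <= 0;
            rgray_w2 <= 0;
            full     <= 1'b0;
        end else begin
            wbin     <= wbin_next;
            wgray    <= wgray_next;    // registered Gray pointer crosses to rclk
            rgray_w1 <= rgray;         // stage 1: may go metastable
            rgray_w2 <= rgray_w1;      // stage 2: used by the full logic
            // Full when the next write pointer equals the synchronized read
            // pointer with its two MSBs inverted (writer one lap ahead).
            full <= (wgray_next == {~rgray_w2[AW:AW-1], rgray_w2[AW-2:0]});
        end
    end

    // ---------------- read clock domain ----------------
    wire [AW:0] rbin_next  = rbin + ((rd_en && !empty) ? 1'b1 : 1'b0);
    wire [AW:0] rgray_next = rbin_next ^ (rbin_next >> 1);

    always @(posedge rclk) begin
        if (rrst) begin
            rbin     <= 0;
            rgray    <= 0;
            wgray_r1 <= 0;
            wgray_r2 <= 0;
            empty    <= 1'b1;
        end else begin
            rbin     <= rbin_next;
            rgray    <= rgray_next;    // registered Gray pointer crosses to wclk
            wgray_r1 <= wgray;         // stage 1: may go metastable
            wgray_r2 <= wgray_r1;      // stage 2: used by the empty logic
            // Empty when the next read pointer catches up with the
            // synchronized write pointer.
            empty <= (rgray_next == wgray_r2);
        end
    end

    assign rdata = mem[rbin[AW-1:0]];
endmodule
```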

FIFOs are common enough that most FPGA vendors provide tools to generate FIFO cores automatically from the customer's requirements. Like other IP modules, these FIFOs can be instantiated manually in the design. When you use a vendor-generated FIFO in an FPGA design, you probably do not need to handle the issues discussed above yourself. Still, the same problems arise whenever data passes between asynchronous clock domains, so an advanced FPGA designer needs to understand these design practices.

6.1.5 Partitioning Synchronizer Modules

It is good design practice to partition the design at the top level so that synchronizers sit in separate modules outside all the functional modules. Each functional module can then have a single ideal clock domain (that is, the whole module uses only one clock), as shown in Figure 6-15.

Figure 6-15: Partitioned synchronizer module

There are several reasons to partition the design this way:

- Timing analysis of each functional module is easy, because the modules are fully synchronous.
- Timing exceptions for the synchronization module are easy to define.
- Placing the synchronizers and their timing exceptions at the top level, rather than scattering them through lower-level modules, greatly reduces omissions caused by human error.

Synchronization registers should therefore be partitioned separately from the functional modules. Many similar practices apply when FPGAs are used to prototype ASICs, as discussed in more detail in the next section.
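The hypothetical top level below illustrates this partitioning. core_a and core_b each run on a single clock, and the dedicated sync_block is the only logic that touches both clocks. The false-path constraint therefore needs to be applied to that one module only. All names are illustrative. A real design would also need a reset scheme, which is omitted here for brevity.

```verilog
// Only module that sees both clocks: a two-flop synchronizer into clk_b.
module sync_block (
    input  wire clk_b,
    input  wire flag_from_a,   // single-bit control signal from clock domain A
    output reg  flag_to_b      // the same flag, resynchronized to clock domain B
);
    reg flag_meta;
    always @(posedge clk_b) begin
        flag_meta <= flag_from_a;  // path into this flop is the false path
        flag_to_b <= flag_meta;
    end
endmodule

// Fully synchronous to clk_a.
module core_a (input wire clk_a, input wire start, output reg flag);
    always @(posedge clk_a) flag <= start;
endmodule

// Fully synchronous to clk_b: counts clk_b cycles while the flag is high.
module core_b (input wire clk_b, input wire flag, output reg [7:0] count);
    initial count = 8'd0;   // power-up value (FPGA register initialization)
    always @(posedge clk_b) if (flag) count <= count + 8'd1;
endmodule

// Top level: the crossing is visible in one place.
module top (
    input  wire       clk_a, clk_b, start,
    output wire [7:0] count
);
    wire flag_a, flag_b;
    core_a     u_a    (.clk_a(clk_a), .start(start), .flag(flag_a));
    sync_block u_sync (.clk_b(clk_b), .flag_from_a(flag_a), .flag_to_b(flag_b));
    core_b     u_b    (.clk_b(clk_b), .flag(flag_b), .count(count));
endmodule
```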

6.2 Gated Clocks in ASIC Prototypes

ASIC designs are generally very sensitive to power consumption, and ASIC clock trees are very flexible. Designs therefore often use gated clocks to switch off logic that is not needed at a given time. An FPGA prototype of an ASIC can reproduce its full logical functionality, but some physical properties, such as power consumption, differ between the two. It is therefore unnecessary to require the FPGA to emulate all of the ASIC's low-power optimizations. In fact, FPGA clocks are distributed on dedicated clock networks, so emulating this aspect of the ASIC is often impractical. In this section we discuss ways to handle this problem. We then cover some techniques that can be applied in the ASIC design itself to make FPGA prototyping easier. For more detail on gated clocks, refer to Chapter 3.

6.2.1 Clock Module

If an ASIC design uses many gated clocks, place all the gating operations in a dedicated clock-generation module, isolated from the functional modules, as shown in Figure 6-16.

Figure 6-16: Unified Clock Module

Placing all clock gating in a single module makes constraints simpler and makes it easier to modify the design for the FPGA prototype. For example, if the designer chooses to remove all gating elements at compile time, a single module makes this easy. We discuss this in more detail in the next section.

6.2.2 Gating Removal

There are many ways to remove clock gating from an FPGA prototype. The following code shows an obvious but cumbersome approach that removes all gating from the FPGA prototype:

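```verilog
`define FPGA
// `define ASIC      // define this instead of FPGA for the ASIC build

module clocks_block (...);

`ifdef ASIC
    assign clock_domain_1 = system_clock_1 & clock_enable_1;  // gated clock (ASIC)
`else
    assign clock_domain_1 = system_clock_1;                   // gating removed (FPGA prototype)
`endif
```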

With the code above, switching clock gating on or off requires changing only the macro definition for the FPGA prototype. The drawback is that every move between the FPGA prototype and the ASIC requires a modification, namely changing the macro definition. Many designers are uncomfortable with this because they feel they are no longer using the same RTL. A better approach is an automatic gating-removal tool, which eliminates this possible source of mistakes. Many modern synthesis tools provide this capability when given the right constraints. For example, Synplify has a "Fix gated clocks" option that automatically removes gating from the clock line and moves it into the data path. Consider the following code example:

```verilog
module clockstest (
    output reg  odat,
    input  wire iclk, ienable,
    input  wire idat);

    wire gated_clock = iclk & ienable;   // system clock gated by the enable

    always @(posedge gated_clock)
        odat <= idat;

endmodule
```

In the code above, the system clock iclk is gated by the enable signal ienable to produce a gated clock. This gated clock drives the flip-flop odat, which registers the input idat. If the fix-gated-clocks option is not enabled, the synthesis tool implements the logic directly, as shown in Figure 6-17.

Figure 6-17: Direct Clock gating

In the implementation of Figure 6-17, the gating sits on the clock line. The design now has two clock domains, which must be constrained separately and routed on clock resources separately. If gated-clock removal is enabled, however, the logic gate is simply moved into the data path, as shown in Figure 6-18.

Figure 6-18: Clock gating removal

The logic cells in most modern devices provide a clock-enable input, so the relocated gating maps directly onto that enable input. If a particular technology does not provide a clock enable on its flip-flops, this technique can still remove the clock gating. In that case, however, the gating logic adds delay to the data path.
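For comparison, below is the clock-enable form that gated-clock conversion aims for with the clockstest example, written by hand. The flip-flop stays on the ungated iclk, and ienable moves to the data path. The two versions are equivalent only if ienable changes while iclk is low; otherwise the AND gate can produce a glitch or an extra clock edge. The exact netlist a synthesis tool produces may differ, and the module name is hypothetical.

```verilog
// Hand-written clock-enable version of the gated-clock register.
// ASSUMES ienable only changes while iclk is low; otherwise the gated
// and clock-enable versions are not equivalent.
module clockstest_ce (
    output reg  odat,
    input  wire iclk, ienable,
    input  wire idat
);
    // Maps onto a flip-flop CE pin where one exists. Without a CE pin,
    // synthesis builds a 2:1 mux in front of D (odat <= ienable ? idat : odat),
    // which is the extra data-path delay mentioned above.
    always @(posedge iclk)
        if (ienable)
            odat <= idat;
endmodule
```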

6.3 Summary

- Clock synchronization problems are often non-reproducible and threaten the reliability of FPGA designs.

- Metastability can cause catastrophic failures in an FPGA.

- Phase control can be used when one clock frequency is a multiple of the other and one of the clocks can be controlled by an internal PLL or DLL.

- The two-beat (double-flop) technique can be used to synchronize single-bit signals between asynchronous clock domains.

- In a two-beat synchronizer, timing analysis should ignore the path into the first flip-flop, and the delay between the two synchronizer flip-flops should be minimized.

- FIFOs are used to pass multi-bit signals between two asynchronous clock domains.

- Gray code is used to pass counter values between two asynchronous clock domains, most often inside FIFOs.

- Synchronization registers should be partitioned separately from the functional modules.

- Avoid clock gating where possible. If it is necessary, place all gated clocks in a dedicated clock module isolated from the other functional modules.
