An Example of an FPGA Timing Problem: Protecting Asynchronous Interfaces and Glitch-Sensitive Circuits


I. Introduction to the Problematic Asynchronous Interface

The timing diagram shows a host (PC) accessing an IDE hard drive device in MDMA mode. The device-side interface is implemented in an FPGA.

The DIOR-/DIOW- signals are driven by the host. Read data is sampled by the host on the rising edge of DIOR-, and write data is sampled by the FPGA on the rising edge of DIOW-. Note that DIOR-/DIOW- are negative-logic (active-low) signals: the rising edge referred to in this article is the rising edge of the physical signal, which is drawn as a falling edge in the figure.

In the figure, TD = 70 ns, TK = 25 ns, and TG and TH are 20 ns and 10 ns respectively.

This interface looks simple. For reads, the FPGA drives data to the host under control of DIOR- and the decoded chip-select signals, and this path is implemented in combinational logic. For writes, the FPGA samples the incoming data on the rising edge of DIOW-. The timing problem occurs on the data write path.

The data write path is processed as follows:

    1. The FPGA samples the bus data on the rising edge of DIOW-, using a register group clocked by that edge to store the sampled data. This operation is asynchronous relative to the local clock.
    2. A 50 MHz clock synchronizes the DIOW- signal through a two-stage synchronizer, and a synchronous pulse corresponding to the rising edge is extracted from the synchronized signal. This step synchronizes an asynchronous signal and detects its edge.
    3. When the synchronized DIOW- rising-edge pulse is valid, the temporarily stored data is written into a synchronous FIFO. This is a synchronous operation.

In this way, data that is originally asynchronous to the 50 MHz clock inside the FPGA is first stored temporarily and then written, which brings it into the 50 MHz clock domain.


II. Symptoms of the Timing Problem

The problem shows up as data errors: when consecutive sectors are verified with a "write first, then read back" test, the data read back is occasionally inconsistent with the data written.

Checking the data stored on the media when an error occurs shows that the error is introduced in the write phase and that the read phase is not at fault.

This data error has three notable features, the first two of which are:

    1. A step of 2 appears in a data sequence that should increase continuously in steps of 1.
    2. After the error, the data sequence returns to normal.

For example, data sequence (1) is originally 15, 16, 17, 18. After an error occurs, it becomes data sequence (2): 15, 17, 17, 18.

This raises a question. Both cases involve duplicated data, so can the error pattern of sequence (2) be treated as equivalent to the error pattern of data sequence (3): 15, 16, 16, 18? No.

In fact, sequence (3) and sequence (2) are completely different. In sequence (3), the existing data 16 overwrites the expected data 17, giving 15, 16, 16, 18. In sequence (2), the upcoming data 17 overwrites the existing data 16, giving 15, 17, 17, 18.

The sequence (2) error pattern is therefore the third major feature of this timing problem.

Because I initially assumed that the error pattern was of the sequence (3) type, I looked in the wrong direction and failed to find the cause of the problem.
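The distinction between the two error patterns can be checked mechanically when comparing written and read-back data. The following is a minimal sketch, not part of the original debugging flow; the function name and the labels "foresight" and "stale" are chosen here purely for illustration.

```python
def classify_duplicate_error(expected, observed):
    """Classify the first mismatch between expected and observed data.

    Returns "foresight" if the observed word equals the NEXT expected word
    (the sequence (2) pattern), "stale" if it equals the PREVIOUS expected
    word (the sequence (3) pattern), "other" for any other mismatch, and
    "match" if the sequences agree.
    """
    for i, (exp, obs) in enumerate(zip(expected, observed)):
        if exp == obs:
            continue
        if i + 1 < len(expected) and obs == expected[i + 1]:
            return "foresight"  # later data arrived one slot early
        if i > 0 and obs == expected[i - 1]:
            return "stale"      # earlier data was kept one slot too long
        return "other"
    return "match"


written = [15, 16, 17, 18]
print(classify_duplicate_error(written, [15, 17, 17, 18]))  # foresight
print(classify_duplicate_error(written, [15, 16, 16, 18]))  # stale
```

Running such a check on the first mismatch of a failed transfer would have flagged the sequence (2) pattern immediately instead of leaving it to visual inspection.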


III. Locating the Timing Problem

One of my colleagues noticed the difference between the sequence (2) and sequence (3) error patterns and pointed out the contradiction: during a write, the sequence (2) pattern can only arise if the interface circuit "foresees" the data that is about to be written, and "foresight" cannot exist in a digital circuit.

With this in mind, I thought repeatedly about what could lie behind the apparent "foresight".

First, "foresight" is impossible, so there must be a rational cause behind this seemingly irrational behavior. Second, such a mature interface cannot be logically flawed in a way that produces "foresight", so the problem must come from how I handled the interface. Third, the error occurs randomly and in isolation, so it cannot be a logic error. It must be caused by an interface timing problem combined with an accidental event.

So what kind of accidental event, under what timing conditions, makes a normal circuit misbehave? First, two concepts need to be defined.

Temporary storage time: the interval between the moment the bus data is saved into the temporary register and the moment the temporarily stored data is written into the FIFO. Its length is determined solely by the two-stage synchronizer and ranges from 2 to 3 synchronization clock periods.

Sampling period: the interval from one rising edge of DIOW- to the next rising edge. Write DD() is sampled on the rising edge of DIOW-, and the sampling period is defined by the MDMA protocol. In a real circuit, however, "violations" of the protocol can occur, and a glitch is one such violation.

In theory, the "store first, then write" approach is sound. It is essentially a pipeline: as long as the temporary storage time is shorter than the sampling period, the circuit works correctly, even if the bus data changes quickly after the sampling instant. In this example, DIOW- is synchronized through two stages with a clock whose period is 20 ns, so the temporary storage time is 40 to 60 ns, while the sampling period is 100 to 120 ns. The requirement is met.
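The condition above is simple arithmetic and can be written down as a quick check. This is a minimal sketch under the assumptions stated in the text (2 to 3 clock periods of synchronizer latency); the function names are hypothetical.

```python
def temporary_storage_window_ns(sync_clock_period_ns, min_cycles=2, max_cycles=3):
    """Best-case and worst-case temporary storage time of a two-stage synchronizer."""
    return (min_cycles * sync_clock_period_ns, max_cycles * sync_clock_period_ns)


def write_path_is_safe(sync_clock_period_ns, sampling_period_ns):
    """The pipeline is safe if the worst-case storage time is below the sampling period."""
    _, worst_case_ns = temporary_storage_window_ns(sync_clock_period_ns)
    return worst_case_ns < sampling_period_ns


print(temporary_storage_window_ns(20))   # (40, 60)
print(write_path_is_safe(20, 100))       # True: nominal MDMA sampling period
print(write_path_is_safe(20, 27))        # False: a spurious edge shortly after the real one
```

The last call anticipates the failure case: if anything makes the FPGA see a second rising edge only a few tens of nanoseconds after the real one, the effective sampling period drops below the temporary storage time.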

However, if for some reason the sampling period shrinks below the temporary storage time (40 to 60 ns), the temporarily stored data is updated before it has been written to the FIFO. If the bus is by then already driven with the data of the next sampling period, that data is "foreseen".

Because the sampling period is determined solely by the rising edges of DIOW- as seen by the FPGA, the likely cause of a shortened sampling period is a glitch on the DIOW- signal. In other words, the correctness of the data sampling circuit is sensitive to glitches on DIOW-.

So, can a glitch satisfy the conditions above, that is, occur within 40 to 60 ns after the DIOW- rising edge at a time when the bus is already driven with new data?

Let us return to the earlier description of this interface. Because each southbridge chip is designed differently, the timing diagram given for this interface differs somewhat from actual behavior. The differences are as follows:

    1. The "X" region of Write DD() in the figure actually holds a definite value. On the motherboard we debugged, Write DD() is updated to the data of the next write cycle after the rising edge of DIOW-; that is, the new data is already on the bus before the falling edge of the next write cycle.
    2. The ratio of TD to TK drawn in the figure does not match the actual ratio. The sum of the low-level duration TD and the high-level duration TK is 100 to 120 ns, and the interval from the rising edge of DIOW- in the current write cycle to the falling edge of DIOW- in the next write cycle is only 25 to 30 ns.

According to feature 1, any glitch that appears between 20 ns and 40 to 60 ns after the rising edge of DIOW- is bound to cause the "foreseen data" timing problem. According to feature 2, the only event within that interval that could produce a glitch on DIOW- is the falling-edge transition of DIOW- itself, 25 to 30 ns after the rising edge.

The timing simulation waveform of the circuit shows this case. Between the two time cursors, the temporarily stored data is changed by a glitch on the falling edge of DIOW-. This glitch ultimately produces the sequence (2) error pattern.
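The mechanism can be reproduced with a small event-driven model. This is a behavioral sketch, not the author's timing simulation: the 100 ns write period, the bus update 20 ns after the rising edge, the glitch 27 ns after the rising edge, and the 50 ns temporary storage time are illustrative values picked from the ranges given above, and the model assumes the glitch re-clocks the temporary register without generating a synchronized write pulse of its own.

```python
def simulate_writes(words, glitch_after_word=None, period_ns=100.0,
                    bus_update_ns=20.0, glitch_offset_ns=27.0,
                    temporary_storage_ns=50.0):
    """Return the words written to the FIFO, optionally with one DIOW- glitch."""

    def bus_at(t_ns):
        # Word i is sampled at i * period; the host drives word i + 1
        # onto the bus bus_update_ns after that rising edge.
        idx = int((t_ns - bus_update_ns) // period_ns) + 1
        return words[max(0, min(idx, len(words) - 1))]

    events = []
    for i in range(len(words)):
        edge_ns = i * period_ns
        events.append((edge_ns, 0, "latch"))                        # real rising edge
        events.append((edge_ns + temporary_storage_ns, 1, "fifo"))  # synchronized write
        if i == glitch_after_word:
            events.append((edge_ns + glitch_offset_ns, 0, "latch")) # spurious edge

    dd_temp, fifo = None, []
    for t_ns, _, kind in sorted(events):
        if kind == "latch":
            dd_temp = bus_at(t_ns)   # temporary register clocked by DIOW-
        else:
            fifo.append(dd_temp)     # FIFO write in the local clock domain
    return fifo


print(simulate_writes([15, 16, 17, 18]))                       # [15, 16, 17, 18]
print(simulate_writes([15, 16, 17, 18], glitch_after_word=1))  # [15, 17, 17, 18]
```

With the glitch placed after the word 16, the temporary register is overwritten with 17 before the FIFO write takes place, which is exactly the sequence (2) pattern. Setting `temporary_storage_ns` below `bus_update_ns` in the same model leaves the FIFO contents intact even though the temporary register is still corrupted.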


IV. Solving the Timing Problem

The problem can be attacked from two directions:

    1. Shorten the temporary storage time to less than 20 ns. Temporarily stored data is unsafe in the interval from 20 ns to 40 to 60 ns after the rising edge of DIOW-. Narrowing or even eliminating this interval effectively protects the temporarily stored data.
    2. Eliminate the influence of DIOW- glitches on data sampling.

In cross-clock-domain signal processing, two-stage synchronization of the DIOW- signal is the reliable, standard approach, at the cost of a latency of 2 to 3 clock cycles. With a synchronization clock period of 20 ns, a delay of 40 to 60 ns is unavoidable. Only when the clock period is less than 6.67 ns does the synchronization latency drop below 20 ns, eliminating the unsafe storage interval.
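The 6.67 ns figure follows directly from the worst-case latency of 3 clock cycles. A minimal sketch of the arithmetic, with hypothetical function names:

```python
def max_sync_clock_period_ns(safe_window_ns=20.0, worst_case_cycles=3):
    """Upper bound on the clock period so that worst-case latency fits the safe window."""
    return safe_window_ns / worst_case_cycles


def min_sync_clock_mhz(safe_window_ns=20.0, worst_case_cycles=3):
    """Lower bound on the synchronization clock frequency, in MHz."""
    return 1000.0 / max_sync_clock_period_ns(safe_window_ns, worst_case_cycles)


print(round(max_sync_clock_period_ns(), 2))  # 6.67
print(round(min_sync_clock_mhz(), 1))        # 150.0
```

A 150 MHz clock therefore sits right at the boundary of the requirement; a faster clock would add margin.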

The fundamental solution is therefore to perform the first data synchronization with a 150 MHz (6.67 ns) clock: the temporarily stored data is transferred into the 150 MHz clock domain while it is still within its safe storage interval, and the transferred data is then synchronized into the 50 MHz clock domain.

Sampling the DIOW- signal with a 150 MHz clock places more sampling points in the same interval, which increases the probability of sampling a glitch on DIOW-. A sampled glitch is converted into a synchronous pulse by the synchronizer, and this pulse would still cause wrong data to be synchronized into the 50 MHz clock domain.

For this reason, the first sampling stage on the 150 MHz clock needs a glitch filter, so that a sampled glitch cannot pass through and its influence is eliminated.
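The text does not show the internals of the glitch filter, so the following is only a generic behavioral sketch of one common approach: the filtered output changes only after the sampled input has held the same value for a number of consecutive clock samples. The `depth` parameter and the function name are assumptions made for illustration, not the author's circuit.

```python
def glitch_filter(samples, depth=3):
    """Pass a level change only after `depth` consecutive identical samples."""
    if not samples:
        return []
    state = samples[0]       # current filtered output
    run_value = samples[0]   # value of the current run of identical samples
    run_len = 0
    out = []
    for s in samples:
        if s == run_value:
            run_len += 1
        else:
            run_value, run_len = s, 1
        if run_len >= depth:
            state = run_value
        out.append(state)
    return out


# A one-sample pulse is removed; a four-sample pulse passes, delayed by depth - 1 samples.
print(glitch_filter([0, 0, 0, 1, 0, 0, 0, 0]))
print(glitch_filter([0, 0, 0, 1, 1, 1, 1, 0, 0, 0]))
```

Each list element stands for one sample taken on the fast clock, so the shortest pulse that survives is set by `depth` multiplied by the clock period.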

The modified circuit is as follows:

The timing simulation waveforms of the modified circuit show that although dd_temp is corrupted by the glitch on the DIOW- falling edge, the data synchronized into the 150 MHz and 50 MHz clock domains is unaffected. They also show the effect of the glitch filter, and that a high-frequency clock picks up glitches more readily.


V. Ways to Avoid and Solve the Problem

  1. When an error appears in an incrementing data transfer, determine whether it is of the sequence (2) or sequence (3) type. Sequence (2) is the type with the "foreseen data" characteristic and is very rare.
  2. The faulty circuit uses a practical, standard approach to capturing data across clock domains. Even so, its first-stage temporary data register is, to some degree, sensitive to glitches.
  3. The modified circuit protects the glitch-sensitive data by raising the sampling clock frequency and shortening the temporary storage time. The cost is one additional stage of data storage.
  4. Whether the circuit used to filter out glitches is strictly correct remains open to discussion. It is very effective in practice, but it may violate standard cross-clock-domain design practice.
  5. The glitch filter mainly removes glitches on the falling edge of DIOW- and has no effect on rising-edge glitches. Rising-edge glitches do not cause data errors in this circuit.
  6. When using this glitch filter, consider the ratio between the level duration of the filtered signal and the synchronization clock period. In this example, filtering with the 20 ns clock would remove the legitimate 25 ns high level of DIOW-. With the 6.67 ns clock, only high levels shorter than 13.3 to 20 ns are removed, so the normal 25 ns high level of DIOW- passes safely.
  7. This kind of glitch should not appear on a hard disk IDE interface and may be caused by the external circuit design. In the setup where the problem occurred, the signals pass through three media between the motherboard IDE connector and the FPGA: an 80-conductor IDE cable, an adapter PCB, and a 40-conductor cable. In a normal application, only the 80-conductor IDE cable lies between the motherboard IDE connector and the FPGA.
  8. Using a signal that arrives from another chip (through a complex external circuit) as a clock, with its transitions as trigger events, is risky. The effect of false triggering caused by glitches must be taken into account.
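The ratio check described in point 6 can be expressed as a small helper. This is a sketch under the assumption, taken from the figures in point 6, that the filter is guaranteed to remove pulses shorter than 2 clock periods and guaranteed to pass pulses longer than 3 clock periods; the function name and return labels are hypothetical.

```python
def classify_pulse(pulse_ns, clock_period_ns, min_cycles=2, max_cycles=3):
    """Decide what the glitch filter does with a pulse of the given width."""
    if pulse_ns < min_cycles * clock_period_ns:
        return "filtered"   # always removed
    if pulse_ns > max_cycles * clock_period_ns:
        return "passes"     # always survives
    return "marginal"       # depends on the phase between pulse and clock


PERIOD_50_MHZ_NS = 20.0
PERIOD_150_MHZ_NS = 1000.0 / 150.0

print(classify_pulse(25.0, PERIOD_50_MHZ_NS))    # filtered: the legitimate high level is lost
print(classify_pulse(25.0, PERIOD_150_MHZ_NS))   # passes: the 25 ns high level is safe
print(classify_pulse(15.0, PERIOD_150_MHZ_NS))   # marginal: inside the 13.3 to 20 ns band
```

The same helper can be used to check any other combination of minimum legitimate pulse width and filter clock before committing to a design.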

