Common hardware faults of the DDR memory subsystem and software algorithms for diagnosing them

Source: Internet
Author: User

In U-Boot, DENX implements a rigorous DDR test program for common memory faults. It works in three phases, testing the data lines, the address lines, and the physical DDR memory cells in turn. The method is rigorous, and it needs to be: the DDR subsystem is prone to faults and hard to debug. This set of algorithms designed by DENX deserves to be called a "treasure" of DDR memory testing!

Why check data lines first?
If a data line is disconnected, nothing else can be trusted! The second step is to check the address lines. Only when both the data lines and the address lines are known to be good does testing the memory cells make sense. This order also helps to divide the problem and narrow down where a fault lies. The test sequence block diagram in the first post divides the whole process into three steps, shown as three dotted boxes.

Data line connection errors
Data lines can be connected wrongly in two ways. A line can be open (disconnected), or lines can be shorted to each other because of a routing or manufacturing defect.

How to detect connection errors of data lines
I find the data line test designed by DENX both clever and precise. It took me a while of reading before I understood it :(
For example, with two data lines you only need to write and read back pattern = 0b01 to tell whether they are shorted or open. Most embedded platforms obviously have more than two data lines. Taking a 64-bit data bus as an example, pattern = 0b1010101010101010... detects faults between odd and even bits. Once that fault is ruled out, treat every two data lines as one group (this is the key to understanding the next pattern) and use the same method to check for a short between adjacent groups. That gives the second pattern, 0b110011001100... Continue in the same way with groups of four, eight, sixteen, and thirty-two lines. Six patterns result: 0xaaaaaaaaaaaaaaaa, 0xcccccccccccccccc, 0xf0f0f0f0f0f0f0f0, 0xff00ff00ff00ff00, 0xffff0000ffff0000, 0xffffffff00000000. Writing and reading back these six patterns in turn is enough to check whether any data lines are shorted to each other.
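The grouping idea can be sketched in C as follows. This is a minimal illustration, not the U-Boot source: `group_pattern`, `shorted_readback`, and `count_failing_patterns` are hypothetical names, and the fault model (two shorted lines behaving as a wired-AND) is an assumption made only so the example can run without hardware.

```c
#include <assert.h>
#include <stdint.h>

/* Build the k-th grouping pattern for a 64-bit data bus (k = 0..5).
 * Data line `bit` is driven to 1 when bit k of its index is set, so
 * k = 0 gives 0xaaaa..., k = 1 gives 0xcccc..., up to 0xffffffff00000000. */
static uint64_t group_pattern(unsigned k)
{
    assert(k < 6);
    uint64_t p = 0;
    for (unsigned bit = 0; bit < 64; bit++) {
        if ((bit >> k) & 1u)
            p |= (uint64_t)1 << bit;
    }
    return p;
}

/* Assumed fault model: data lines a and b are shorted and act as a
 * wired-AND, so both read back 1 only if both were driven to 1. */
static uint64_t shorted_readback(uint64_t written, unsigned a, unsigned b)
{
    assert(a < 64 && b < 64);
    uint64_t both = (written >> a) & (written >> b) & 1u;
    written &= ~(((uint64_t)1 << a) | ((uint64_t)1 << b));
    written |= (both << a) | (both << b);
    return written;
}

/* Count how many of the six patterns read back wrongly when lines a and b
 * are shorted. Any non-zero result means the short was detected. */
static unsigned count_failing_patterns(unsigned a, unsigned b)
{
    unsigned failures = 0;
    for (unsigned k = 0; k < 6; k++) {
        uint64_t p = group_pattern(k);
        if (shorted_readback(p, a, b) != p)
            failures++;
    }
    return failures;
}
```

Two different line numbers always differ in at least one bit of their 6-bit index, so at least one of the six patterns drives them to opposite values. That is why six patterns are enough for 64 lines.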

How to detect a short circuit or open circuit between a data line and other signal lines on the board
Take the bitwise complement of each of the six patterns above. The resulting 12 patterns together verify that every bit can be written and read back as both 0 and 1.
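A small simulation shows why the complements matter. In the sketch below, `faulty_readback` and `find_bad_lines` are hypothetical helpers, and the stuck-at-1 / stuck-at-0 fault model is an assumption standing in for a line shorted to power or ground.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The six grouping patterns for a 64-bit data bus. */
static const uint64_t group_patterns[6] = {
    0xaaaaaaaaaaaaaaaaULL, 0xccccccccccccccccULL, 0xf0f0f0f0f0f0f0f0ULL,
    0xff00ff00ff00ff00ULL, 0xffff0000ffff0000ULL, 0xffffffff00000000ULL,
};

/* Assumed fault model: lines in stuck_high always read 1,
 * lines in stuck_low always read 0. */
static uint64_t faulty_readback(uint64_t written,
                                uint64_t stuck_high, uint64_t stuck_low)
{
    return (written | stuck_high) & ~stuck_low;
}

/* Write and read back the patterns (and, optionally, their complements).
 * Returns a mask of data lines that read back wrongly at least once. */
static uint64_t find_bad_lines(uint64_t stuck_high, uint64_t stuck_low,
                               int use_complements)
{
    assert((stuck_high & stuck_low) == 0);
    uint64_t bad = 0;
    for (size_t i = 0; i < 6; i++) {
        uint64_t p = group_patterns[i];
        bad |= faulty_readback(p, stuck_high, stuck_low) ^ p;
        if (use_complements) {
            uint64_t q = ~p;
            bad |= faulty_readback(q, stuck_high, stuck_low) ^ q;
        }
    }
    return bad;
}
```

Data line 0 is 0 in all six base patterns and data line 63 is 1 in all of them, so a line 0 stuck at 0 or a line 63 stuck at 1 goes unnoticed until the complemented patterns are added.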

What is a floating bus error?
A floating bus can "fool" the test software. If the software writes a value and reads it back immediately, the write charges the capacitance of the data lines, and the bus holds that state for a short time. When the software then reads, the bus returns the value that was just written, even if the data line is broken.

How to detect floating bus errors on data lines
The algorithm for detecting floating bus errors is not complex. Between the write and the read-back, insert another operation that writes a different value to a different address. For example, write X to location X1, write Y to location Y1, then read location X1. If X comes back, there is no floating bus error.
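The difference between a naive read-back and one with an intervening write can be shown on a simulated bus. Everything in this sketch is an assumption made for illustration: `struct sim_bus` models open data lines as bits that never reach the chip and that read back whatever was last driven on the bus, and `naive_check` and `guarded_check` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed model of a bus with broken (open) data lines. */
struct sim_bus {
    uint32_t cells[2];    /* what the memory chip really stored */
    uint32_t last_driven; /* residual value left on the bus lines */
    uint32_t open_mask;   /* 1 bits mark disconnected data lines */
};

static void bus_write(struct sim_bus *b, unsigned addr, uint32_t v)
{
    assert(addr < 2);
    b->cells[addr] = v & ~b->open_mask; /* open lines never reach the chip */
    b->last_driven = v;                 /* but the bus keeps the charge */
}

static uint32_t bus_read(struct sim_bus *b, unsigned addr)
{
    assert(addr < 2);
    /* Connected lines come from the chip, open lines float at their
     * previous level. */
    uint32_t v = (b->cells[addr] & ~b->open_mask)
               | (b->last_driven & b->open_mask);
    b->last_driven = v;
    return v;
}

/* Naive check: write X and read it straight back. Returns 1 on a match. */
static int naive_check(struct sim_bus *b, uint32_t x)
{
    bus_write(b, 0, x);
    return bus_read(b, 0) == x;
}

/* Guarded check: write X, write a different value (~X) to a different
 * address, then read X back. Returns 1 on a match. */
static int guarded_check(struct sim_bus *b, uint32_t x)
{
    bus_write(b, 0, x);
    bus_write(b, 1, ~x);
    return bus_read(b, 0) == x;
}
```

With data line 8 open, `naive_check` still reports success because the bus echoes the value just written, while `guarded_check` fails because the intervening write has changed the level left on the floating line.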

 

Address line errors
When an address line is faulty, two different locations in the address space map to the same physical storage location. Put more generally, writing to one location "changes" another location.

Address line error detection
Detecting address line errors is relatively simple. The algorithm is:
First, write each address value into that address as its content. In assembly notation, (ADDR) = ADDR. This ensures that every location holds different content.
Next, flip (toggle) one address line of the memory base address to obtain a second address, and read the content there. If it is the same as the content at the base address, that address line is faulty.
This algorithm tests only one address line at a time, which keeps it simple and effective.
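The two steps can be tried on a simulated memory. The sketch below assumes a small 256-word memory whose faulty address lines are forced to 0 before decoding. `struct sim_mem`, `cell`, and `test_address_lines` are hypothetical names introduced for the example, not U-Boot functions.

```c
#include <assert.h>
#include <stdint.h>

#define SIM_WORDS 256u /* 8 address lines */

/* Assumed model: address lines in stuck_low_lines are forced to 0,
 * so two different addresses can select the same cell. */
struct sim_mem {
    uint32_t cells[SIM_WORDS];
    uint32_t stuck_low_lines;
};

/* Decode an address the way the faulty hardware would. */
static uint32_t *cell(struct sim_mem *m, uint32_t addr)
{
    assert(addr < SIM_WORDS);
    return &m->cells[addr & ~m->stuck_low_lines];
}

/* Returns a mask of address lines that appear faulty. */
static uint32_t test_address_lines(struct sim_mem *m)
{
    uint32_t bad = 0;
    const uint32_t base = 0;

    /* Step 1: (ADDR) = ADDR, so every location holds different content. */
    for (uint32_t addr = 0; addr < SIM_WORDS; addr++)
        *cell(m, addr) = addr;

    /* Step 2: flip one address line of the base address at a time and
     * compare the content found there with the content at the base. */
    for (uint32_t line = 1; line < SIM_WORDS; line <<= 1) {
        if (*cell(m, base ^ line) == *cell(m, base))
            bad |= line;
    }
    return bad;
}
```

On real hardware the `cell` helper would be replaced by direct pointer accesses relative to the DDR base address, and the loop would cover however many address lines the part actually has.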

Memory cell errors
The data line and address line tests above catch routing or manufacturing errors, whereas the memory cell test is a real test of the DDR memory chip itself. The common memory chip fault is a stuck bit. In short: tell it to be 0 and it insists on 1, tell it to be 1 and it insists on 0. It does as it pleases and ignores orders :(
The test method is also very simple: write different patterns to as many addresses as possible, then read them back and compare. Commonly used patterns include 0x5555, 0xaaaa, and so on.
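A minimal sketch of such a pattern test is shown below, run against a simulated RAM so that a stuck bit can be injected. `struct sim_ram`, `check_pattern`, and `test_cells` are hypothetical names, the single-faulty-cell model is an assumption, and the patterns are 32-bit versions of the 0x5555 and 0xaaaa values mentioned above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SIM_CELLS 64u

/* Assumed model: one cell has bits that are stuck at 1 or stuck at 0. */
struct sim_ram {
    uint32_t cells[SIM_CELLS];
    size_t   stuck_index;   /* index of the faulty cell */
    uint32_t stuck_at_one;  /* bits of that cell that always hold 1 */
    uint32_t stuck_at_zero; /* bits of that cell that always hold 0 */
};

static void ram_write(struct sim_ram *r, size_t i, uint32_t v)
{
    assert(i < SIM_CELLS);
    if (i == r->stuck_index)
        v = (v | r->stuck_at_one) & ~r->stuck_at_zero;
    r->cells[i] = v;
}

static uint32_t ram_read(const struct sim_ram *r, size_t i)
{
    assert(i < SIM_CELLS);
    return r->cells[i];
}

/* Fill every cell with one pattern, then read back and count mismatches. */
static size_t check_pattern(struct sim_ram *r, uint32_t pattern)
{
    size_t errors = 0;
    for (size_t i = 0; i < SIM_CELLS; i++)
        ram_write(r, i, pattern);
    for (size_t i = 0; i < SIM_CELLS; i++) {
        if (ram_read(r, i) != pattern)
            errors++;
    }
    return errors;
}

/* Run the two complementary patterns and return the total mismatch count. */
static size_t test_cells(struct sim_ram *r)
{
    static const uint32_t patterns[] = { 0x55555555u, 0xaaaaaaaau };
    size_t errors = 0;
    for (size_t p = 0; p < sizeof patterns / sizeof patterns[0]; p++)
        errors += check_pattern(r, patterns[p]);
    return errors;
}
```

A bit stuck at 1 is invisible to any pattern that happens to write 1 there, which is why complementary patterns such as 0x5555 and 0xaaaa are used together.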

Several simple methods for detecting DDR faults
The DDR test algorithm above is comprehensive but slow; a full run usually takes several hours. The U-Boot command line also offers a few simple commands that catch common memory faults and can be run without recompiling anything.
1> mtest start end pattern
Note that after U-Boot starts, DDR is mapped at address 0, but the space holding U-Boot's own code, heap, and stack (starting at 0x0000000) must not be overwritten, otherwise U-Boot hangs. mtest itself executes from this space, so testing it means the test destroys its own code: the flood washing away the Dragon King's temple, as the saying goes.
2> Copy the content of NOR flash to memory, for example cp.b 0x20080000 0x7fc0 20000, and then compare with cmp.b 0x20080000 0x7fc0 20000.
3> Load a kernel image into memory, either by copying it from NOR flash or by TFTP, and run iminfo load_addr to check for CRC errors.

The first method fills the free space of the DDR with a specific pattern. The second and third methods use more random data.
Of course, the most thorough test is to run a Linux system for a long time. The methods above are better suited to locating errors when the system is unstable.
