FB-DIMM memory performance

Processor, I/O, and memory are the three key factors that determine platform performance. On a balanced platform these three should be matched to one another, with none markedly stronger or weaker than the rest; this is the central idea behind Amdahl's law. In recent years CPU performance has advanced rapidly, not only through higher clock speeds but also through Hyper-Threading, multi-core designs, and similar techniques. The PCI bus, in use for more than a decade, has been replaced by PCI Express, which offers higher I/O performance and more flexible configuration. Memory technology clearly needs to improve as well to keep pace with these changes.

First, a new memory technology must solve the capacity and bandwidth problems while maintaining acceptable latency. The DDR/DDR2 memory on existing server platforms uses a parallel design; as memory frequency increases, it becomes ever harder to suppress interference between the many parallel signal lines. The usual remedy is to reduce the number of DIMMs supported per channel, but that conflicts with servers' ever-growing demand for memory capacity. Although the capacity of individual modules keeps rising, it cannot offset the shrinking number of DIMMs per channel.
Figure: DRAM transfer rate limits the number of devices per channel
Why not improve the memory controller so it supports more channels? Each parallel memory channel interface already requires 240 pins; adding more channels would make the supporting circuitry so complex that the cost of the whole platform would become unacceptable. Second, the upgrade cost of a new memory technology cannot be too high, nor can its own cost rise significantly; otherwise it cannot reach mainstream platforms. To meet this requirement, the new technology should minimize the impact on existing memory vendors, especially memory chip manufacturers, in order to win broad support; only then can a new memory standard survive. Third, the new standard must meet the needs of servers, storage, workstations, communications, and other platforms for the next 10 years, and provide sufficient reliability, availability, and serviceability (RAS).

Advantages of FB-DIMM memory technology

With the help of serial signaling, FB-DIMM (fully buffered DIMM) technology achieves these design goals on top of the existing DDR2 memory module. As shown in the figure, it inserts a buffer component between the memory controller interface and the existing module interface. The buffer is compatible with current DDR2 specifications, so existing DDR2 memory chips can be used directly to assemble FB-DIMM modules (by the same principle, future FB-DIMMs can use DDR3 chips to meet the evolving needs of server platforms). A point-to-point serial interface replaces the current shared parallel interface between the memory controller and the buffer.

Overview of basic FB-DIMM operation

The buffer used on FB-DIMM memory is called the AMB (advanced memory buffer) chip; Intel, IDT, NEC, and other vendors have begun producing it.
The AMB's main function is to respond to memory controller commands and relay them to the DRAM; in effect it performs serial-to-parallel conversion, and its presence allows existing DRAM chips to be used directly on the FB-DIMM. This design greatly reduces resistance to adopting the new memory technology, because it requires no changes from memory chip manufacturers; by contrast, the earlier transitions from SDRAM to DDR or to Rambus required the chip vendors' support. More importantly, it leaves ample room for future memory upgrades: when DDR3 enters production, FB-DIMM can be "upgraded" simply by adopting DDR3 DRAM chips. FB-DIMM promotional materials even mention backward compatibility, meaning existing FB-DIMM platforms may be able to use DDR3-based FB-DIMMs directly; since the memory controller interacts with the DRAM chips only through the AMB, this is not impossible.

A platform using FB-DIMM technology can support up to 6 channels, with 8 DIMMs per channel and 2 ranks per DIMM. The highest-capacity FB-DIMM currently available is 4 GB, so in theory a platform can be configured with up to 192 GB of memory. Intel's released 5000-series chipsets support up to four channels with four DIMMs each, for a theoretical 64 GB; most of the servers announced so far, however, are configured with 8 DIMM slots, for a maximum of 32 GB. Data output bandwidth per FB-DIMM channel can reach 6.7 GB/s, so 6 channels provide a total of 40.2 GB/s of memory bandwidth; even the four channels of the current Intel 5000-series chipset provide 26.8 GB/s. FB-DIMM technology thus greatly improves the memory capacity and bandwidth of the server platform, laying the foundation for x86 servers to enter higher-end applications.
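The capacity and bandwidth figures above are simple products of the per-channel limits. A minimal sketch (illustrative arithmetic only, not part of the FB-DIMM specification; the function name is our own):

```python
def max_capacity_gb(channels, dimms_per_channel, dimm_gb):
    """Theoretical maximum memory capacity in GB for a given topology."""
    return channels * dimms_per_channel * dimm_gb

# Full FB-DIMM topology: 6 channels x 8 DIMMs x 4 GB modules
print(max_capacity_gb(6, 8, 4))   # 192 GB

# Intel 5000-series chipset: 4 channels x 4 DIMMs x 4 GB modules
print(max_capacity_gb(4, 4, 4))   # 64 GB

# Aggregate bandwidth at 6.7 GB/s per channel
print(round(6 * 6.7, 1))          # 40.2 GB/s over 6 channels
print(round(4 * 6.7, 1))          # 26.8 GB/s over 4 channels
```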
FB-DIMM wiring advantages

Each FB-DIMM channel requires only 69 signal lines, greatly reducing routing complexity compared with the 240 signal lines required by a parallel DDR2 channel. The figure compares board routing for DDR2 memory (left) and FB-DIMM memory (right): on a motherboard supporting DDR2 registered DIMMs, one channel needs two routing layers for signal lines plus one layer for power, whereas on a motherboard supporting FB-DIMM, two channels need only two routing layers, power included. This greatly simplifies motherboard routing, reduces design difficulty, shortens the product development cycle, and improves productivity. FB-DIMM technology is transparent to operating systems and applications, so existing operating systems and software run unchanged on hardware platforms that use it.

FB-DIMM technical details

FB-DIMM technology is, more precisely, a new memory architecture, aimed chiefly at the server platform's growing demands for memory capacity and bandwidth. Architecturally, FB-DIMM differs greatly from registered and unbuffered DIMMs. The FB-DIMM protocol adopts many new techniques, the key one being a serial link interface that carries packetized data over dedicated read and write paths. Point-to-point differential signaling and de-emphasis are central to the FBD channel links; clock recovery from the data stream is central to FBD clocking, and the FB-DIMM supports clock resynchronization and resampling. A CRC (cyclic redundancy check) travels with the data stream to ensure reliability during high-speed transmission, and failover keeps the system running through dynamic I/O lane faults.
In addition, the FB-DIMM uses a daisy-chain link: the AMB-to-AMB, AMB-to-host, and AMB-to-DRAM connections are all point-to-point. Although this chapter is titled "FB-DIMM technical details", we neither intend nor are able to cover everything in a short chapter, so we will work through a few keywords to deepen our understanding of this new memory technology.

Keyword 1: Frequency

Several frequency parameters in the FB-DIMM memory system must be understood: the reference clock frequency, the DRAM clock frequency, the DRAM data transfer rate, the channel transfer rate, and the channel unit interval. All of them stand in fixed proportion to one another. An external clock source provides the reference clock input for the AMB and the host; it is much lower than the channel and DRAM frequencies. The AMB doubles the external clock input to produce the clock signal for DRAM operation. The DRAM data transfer rate is in turn twice the DRAM clock, which is the basic DDR (double data rate) mode of operation. Channel speed is measured by the unit interval (UI), the average time between signal transitions on the FBD channel; the channel bit rate is six times the DRAM data transfer rate. So if the external clock source is 166 MHz (6 ns), the AMB supplies the DRAM with a 333 MHz (3 ns) clock, and it is easy to calculate that the FBD channel unit interval is 250 ps (1 ns = 1000 ps), i.e. a transfer rate of 4.0 Gbps.

Main frequency relationships of the FB-DIMM:
| Memory   | UI       | CLK_DRAM | CLK_REF | Channel rate |
| DDR2-533 | 312.5 ps | 266 MHz  | 133 MHz | 3.2 Gbps     |
| DDR2-667 | 250.0 ps | 333 MHz  | 166 MHz | 4.0 Gbps     |
| DDR2-800 | 208.3 ps | 400 MHz  | 200 MHz | 4.8 Gbps     |
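The fixed ratios described above (DRAM clock = 2x reference clock, data rate = 2x DRAM clock, channel rate = 6x data rate) can be sketched as follows; this is illustrative arithmetic only, and the function name is our own:

```python
def fbd_frequencies(clk_ref_mhz):
    """Derive FBD channel frequencies from the reference clock (MHz)."""
    clk_dram = 2 * clk_ref_mhz        # AMB doubles the reference clock
    data_rate = 2 * clk_dram          # DDR: data on both clock edges (MT/s)
    channel_rate = 6 * data_rate      # channel bit rate is 6x the data rate (Mbps)
    ui_ps = 1_000_000 / channel_rate  # unit interval in picoseconds
    return clk_dram, data_rate, channel_rate, ui_ps

# DDR2-800 row of the table: 200 MHz reference clock
clk_dram, data_rate, channel_rate, ui = fbd_frequencies(200)
# clk_dram = 400 MHz, data rate = 800 MT/s,
# channel rate = 4800 Mbps (4.8 Gbps), UI ≈ 208.3 ps
```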
In FB-DIMM platform design, the reference clock supplied to each AMB and to the host is not phase-adjusted. This is because clock synchronization is achieved through the memory subsystem's external reference clock combined with recovery from the channel data stream, the same approach used by the PCI Express bus. One consequence is that transmitter and receiver can fall out of synchronization when no data is being communicated; this problem is solved by enforcing a minimum transition density on the FBD channel.

Figure: Correction of clock signals

Keyword 2: Interfaces

The key component of the FB-DIMM is the AMB (advanced memory buffer) together with its interfaces: two FBD link interfaces, one DDR2 channel interface, and one SMBus interface. The FB-DIMM uses a daisy-chain topology, which provides good scalability: each channel can be expanded from a single DIMM up to 8 DIMMs. An FB-DIMM channel has two one-way communication paths, the southbound link and the northbound link, both using physical differential signaling and each carrying traffic in one direction. Southbound means from the host controller toward the DIMMs; northbound is the opposite. Southbound frames carry commands, addresses, and write data to the DRAM; the northbound link carries data read from the DRAM. The southbound link is 10 lanes wide and the northbound link 14 lanes wide. The DDR2 channel provided by the AMB connects directly to DDR2 SDRAM; it supports 2 ranks, each containing 8 banks and supporting 16 row/column requests. Each rank has 64 data signal lines and 8 parity signal lines. The AMB also supports the SMBus interface, which lets the system access the configuration registers without going through the FBD link.
However, the AMB is not an SMBus master controller, only a slave interface, and it operates at 100 kHz. This interface is very useful at system start-up and when diagnosing link faults, and it is required for the AMB.

Keyword 3: Frames and bandwidth

The southbound and northbound directions use different frame formats and different data rates: the northbound read rate is twice the southbound write rate. The main reason for this asymmetric design is that in real applications writes are far less frequent than reads.

Frame format: the southbound link consists of 10 differential pairs (20 physical signal lines) and uses a 10x12 frame format (10 lanes x 12 transfers), carrying 10x12 bits of information per DRAM clock. A southbound frame is divided into three command slots, as shown in the figure: slot A carries an address command, while slots B and C carry further commands or write data for the DRAM. Write data is transmitted in command+wdata frames on the southbound command-and-data link. Each command+wdata frame in an FBD channel carries 72 bits of data, so two command+wdata frames carry 144 bits, matching the amount of data transferred in a single DRAM clock cycle (18 bytes). A DRAM burst-of-8 transfer can be completed on a single channel, or a burst-of-4 transfer on two adjacent channels; either way, 72 bytes are delivered (64 bytes of data + 8 bytes of ECC). The theoretical throughput of the southbound command-and-data link is half that of a single DRAM channel: with DDR2-533 DRAM, for example, the southbound peak is 2.133 GB/s. The northbound link consists of 14 differential pairs (28 physical signal lines) and uses a 14x12 frame format (14 lanes x 12 transfers), carrying 14x12 bits of information per DRAM clock.
A northbound frame is divided into two sections, each of which can carry data read from the DRAM. The FBD channel's northbound data link transfers read data; each northbound data frame carries 144 bits, i.e. 18 bytes, exactly the amount of data plus ECC transferred in a single DRAM clock cycle. A single channel can complete a DRAM burst-of-8, and two adjacent channels a burst-of-4; both methods deliver 72 bytes (64 bytes of data + 8 bytes of ECC). The northbound data link therefore has the same theoretical peak throughput as a single DRAM channel: with DDR2-533 DRAM, the northbound peak is 4.267 GB/s. The theoretical peak throughput of a single FBD channel is the sum of the northbound and southbound peaks, 1.5 times that of a single DRAM channel: with DDR2-533 DRAM, a DDR2-533 channel peaks at 4.267 GB/s while an FBD-533 channel peaks at 6.4 GB/s.

Keyword 4: FBD channel latency

When the variable read latency feature is not used, every FB-DIMM in an FBD channel has the same latency as the others. As more DIMMs are added to the channel, the read latency of each DIMM increases accordingly, because the FBD channel is a chain of point-to-point interconnected DIMMs: a memory request must pass through N-1 buffers to reach the Nth buffer. The idle read latency of a channel configured with 4 DIMMs is therefore longer than that of a channel with 1 DIMM. The variable read latency feature can reduce the latency of DIMMs close to the host.
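The throughput figures quoted above follow directly from the DRAM data rate: the northbound (read) link matches a single DRAM channel, the southbound (write) path carries half of that. A minimal sketch of the arithmetic (the function name is our own):

```python
def fbd_throughput_gbs(dram_mts):
    """Peak FBD-channel throughputs in GB/s for a DRAM data rate in MT/s."""
    dram_channel = dram_mts * 8 / 1000  # 8 bytes per transfer -> GB/s
    northbound = dram_channel           # reads: full DRAM channel rate
    southbound = dram_channel / 2       # writes: half the DRAM channel rate
    return southbound, northbound, southbound + northbound

# DDR2-533 (533.33 MT/s)
south, north, total = fbd_throughput_gbs(533.33)
# south ≈ 2.133 GB/s, north ≈ 4.267 GB/s, total ≈ 6.4 GB/s
```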
Keyword 5: Hot swapping (hot-add and hot-remove)

The FBD channel itself has no mechanism for detecting that an FB-DIMM has been added to the channel; it relies on the system's memory controller to initialize the newly added module. The controller then performs a hot-add reset, after which the new FB-DIMM can work alongside the existing ones. Before an FB-DIMM is removed, the host sends a fast reset to the last FB-DIMM to take it out of the working state. Note that although this is called "hot swapping", power to the FB-DIMM slot must still be cut while a module is installed or removed; implementing that is the responsibility of the system platform.

FB-DIMM exterior and physical specifications

Figure: FB-DIMM dimensions; DDR2 DIMM dimensions

Referring to product specification documents released by Samsung, we compared the dimensions of an FB-DIMM and a registered DDR2 DIMM. A registered DDR2 DIMM measures 133.35 mm x 30 mm, while an FB-DIMM measures 133.35 mm x 30.35 mm. Although, as we have seen, the FB-DIMM actually needs only 69 effective signal lines, current FB-DIMMs still use a 240-pin connector. The key notches of the two modules sit at different positions: the FB-DIMM's notch divides the edge connector at 67 mm and 51 mm, while the DDR2 DIMM's divides it at 63 mm and 55 mm. The distribution of the memory chips also differs: on a module with 18 memory chips, a registered DDR2 DIMM usually carries 9 per side, whereas an FB-DIMM carries 8 on one side and 10 on the other, with the side holding fewer chips also carrying the AMB chip.
As the earlier description makes clear, the AMB is a very busy chip: every instruction and every byte of data entering or leaving the FB-DIMM module passes through it, and with multiple DIMMs on one channel, the AMB closest to the memory controller is the busiest of all. Its heat output is therefore not to be underestimated, and all FB-DIMM modules need a heat sink to cool the AMB. Manufacturers offer different solutions: Infineon, for example, fits a separate heat sink on the AMB alone, while Samsung uses a full-coverage heat spreader. According to data from Kingston, an FB-DIMM consumes about 3-5 watts more than a registered DDR2 DIMM, a high figure for memory: in a system with 8 FB-DIMMs installed, that is an extra 24-40 watts!

FB-DIMM is Intel's latest attempt to steer the industry on memory technology, but this time Intel's situation and strategy differ from when it promoted Rambus a few years ago. First, FB-DIMM does not touch the memory chip standards; more precisely, it is a new way of applying existing memory technology, so it does not depend on the support of semiconductor giants such as Samsung or Infineon. Second, FB-DIMM has become a JEDEC standard and has won the support of the major memory module manufacturers. Third, Intel plans to use it only on its Xeon platform, not on PCs and laptops, so even if adoption is not entirely smooth, promotion faces fewer obstacles. Of course, FB-DIMM memory also has its problems. It requires an AMB chip, and according to server vendors, FB-DIMM memory currently costs 20-30% more than mainstream registered ECC DDR2 memory. In addition, the AMB adds about 3-5 W of power consumption per module, a 20-40 W increase for the whole system, so both the power supply and the cooling require more investment.
Our IT168 evaluation center examined the performance of a Samsung FB-DIMM and an Infineon FB-DIMM on the Intel 5000P and 5000V chipsets. In terms of performance, they can meet the needs of the next-generation dual-core Xeon processors, although our test results could not distinguish the performance of 4-channel and 2-channel FB-DIMM memory subsystem configurations. In terms of compatibility, both modules worked with the different chipsets and supported the existing Windows 2003 operating system and application software well.

ECC is short for "error checking and correcting". ECC is a technique that detects and corrects errors, and ECC memory is memory that applies this technique. It is generally used in servers and graphics workstations, making the whole computer system safer and more stable in operation. To understand ECC, one cannot skip over parity. Before ECC appeared, the most widely used error-detection technique in memory was parity. In a digital circuit, the smallest unit of data is the bit, which is also the smallest unit in memory; "1" and "0" represent high and low signal levels. Eight consecutive bits make a byte. In memory without parity, each byte holds only its 8 data bits; if an error occurs in one of those storage cells, the stored data changes and the application misbehaves. With parity, each byte (8 bits) gains one additional bit for error detection. For example, suppose a byte stores the bits (1, 0, 1, 0, 1, 0, 1, 1); adding them up gives 1 + 0 + 1 + 0 + 1 + 0 + 1 + 1 = 5. If the sum is odd, even parity defines the parity bit as 1; odd parity does the reverse and sets it to 0.
When the CPU reads the data back, it adds up the eight data bits and checks whether the result is consistent with the parity bit. When the two do not match, the CPU knows an error has occurred, but parity has a drawback: it can detect that a data bit has flipped, yet it cannot determine which bit, so the error cannot be corrected. The main function of parity memory is therefore to detect errors, not to fix them. From this analysis we know that parity checks the correctness of 8 data bits by adding one check bit to them; as the number of data bits grows, the number of parity bits grows in proportion. With 16 data bits, two check bits are needed; with 32 data bits, four; and so on. Moreover, when data volumes become very large, the probability of errors rises, and a parity scheme that can only detect errors is no longer adequate. It was against this background that a new memory technology was born: ECC (error checking and correcting). ECC likewise works by adding check bits to the original data bits, but the method differs, and so does the resulting capability. Unlike parity, if the data width is 8 bits, ECC error checking and correction requires 5 added bits; thereafter, each doubling of the data width adds only one more ECC bit. That is, with 16 data bits the ECC field is 6 bits, with 32 data bits it is 7 bits, with 64 data bits it is 8 bits, and so on.
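The parity rule and the ECC check-bit counts above can be sketched as follows. The check-bit count uses the standard Hamming bound for single-error-correcting codes (2^k >= d + k + 1) plus one extra bit for double-error detection (SEC-DED), which is a common way to realize the 5/6/7/8-bit figures quoted; the function names are our own:

```python
def even_parity_bit(bits):
    """Parity bit chosen so the total count of 1s (data + parity) is even."""
    return sum(bits) % 2

def secded_check_bits(data_bits):
    """Check bits for SEC-DED protection over data_bits bits of data."""
    k = 0
    while 2 ** k < data_bits + k + 1:  # Hamming single-error-correct bound
        k += 1
    return k + 1                       # +1 bit for double-error detection

print(even_parity_bit([1, 0, 1, 0, 1, 0, 1, 1]))  # sum is 5 (odd) -> 1
for d in (8, 16, 32, 64):
    print(d, secded_check_bits(d))     # 8 -> 5, 16 -> 6, 32 -> 7, 64 -> 8
```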
In short, ECC tolerates errors in memory and corrects them, so the system can continue normal operation without interruption; it can locate and fix the single-bit errors that parity can only detect. At present some manufacturers build low-end PC servers with SDRAM that lacks ECC, so pay attention to this specification when purchasing.