Storage systems in the market today share a set of pervasive problems. Typically, data enters through the external interface and is handled by the storage engine, and each storage engine has its own set of data services that it applies to the data, such as compression, encryption, and mapping. While the engine processes the data, it must also acknowledge the I/O back to the upper-layer application. To respond as quickly as possible, virtually every storage system uses some form of write cache so that applications can get on with their work. In addition, data processing generates a great deal of metadata: every I/O that enters the system brings corresponding metadata operations with it, and that metadata in turn has to be stored to support data processing. As a result, two workloads become the bottlenecks in a storage system and need to be addressed: write caching and metadata caching, as shown in Figure 1.
Figure 1: Bottlenecks in existing storage system architectures
Both workloads demand the highest possible performance from the storage device that backs them. The ideal is for both to run at the speed of the memory tier at the top of Figure 2, while also offering the persistence of the external storage at the bottom of the figure, so that data is not lost the way it is in volatile memory. Looking down the storage hierarchy from memory to external storage, the tier these two workloads need falls into a mission-critical performance gap between memory and SSD (shown in Figure 2).
Figure 2: The performance gap
How do we deal with these challenges? The natural response is: "Use SSDs." And indeed, that is exactly what most storage systems were doing a few years ago, using SSDs for write buffering and metadata caching. But as technology has advanced, flash has become a dominant storage medium, IOPS requirements have escalated, and SSDs can no longer keep up. The real challenge is SSD write endurance. Consider the endurance these workloads require: the vertical axis of Figure 3 shows drive writes per day, while the horizontal axis increases IOPS from 50,000 to 1,000,000, so the chart shows the endurance needed to sustain each IOPS level.
Figure 3: The flash endurance puzzle
A few years ago a 50K IOPS system was high-end; today it barely qualifies as entry level, and systems now have to deliver 100K IOPS or more. Take a 100K IOPS system built on a 400 GB SSD, the blue line in Figure 3: the drive would need an endurance of about 100 drive writes per day, which flash simply cannot deliver. Even doubling or quadrupling the capacity only brings the requirement down to around 10 drive writes per day, which an SSD can barely manage, and such SSDs are expensive. Looking ahead, the trend points toward the green block in the upper-right corner of the chart. Because of its limited write endurance, flash cannot keep pace with rising IOPS.
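A quick back-of-the-envelope check makes the endurance numbers concrete. The sketch below assumes 4 KB writes sustained around the clock; the block size is an assumption for illustration, not something stated in the chart.

```c
/* Back-of-the-envelope endurance estimate (a sketch; assumes 4 KB writes
 * sustained 24/7, matching the 100K IOPS scenario described above). */
#include <stdio.h>

int main(void)
{
    const double iops         = 100000.0;   /* sustained write IOPS */
    const double io_size      = 4096.0;     /* bytes per write (assumed) */
    const double secs_per_day = 86400.0;
    const double capacity     = 400e9;      /* 400 GB SSD */

    double bytes_per_day = iops * io_size * secs_per_day;  /* ~35 TB/day */
    double dwpd          = bytes_per_day / capacity;       /* drive writes per day */

    printf("Writes per day: %.1f TB -> about %.0f DWPD on a 400 GB drive\n",
           bytes_per_day / 1e12, dwpd);
    return 0;
}
```

At 100K IOPS and 4 KB per write, roughly 35 TB lands on the drive every day, close to 90 full writes of a 400 GB device, which is in the same range as the 100 DWPD read off the blue line (the exact figure depends on the assumed block size).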
Since flash falls short, many people turn to DRAM. Can DRAM do the job? From a performance standpoint DRAM is ideal, but its weakness is that it loses data on power failure, so it has to be protected. DRAM is usually protected with an integrated UPS or a battery backup unit, but batteries bring their own problems, such as poor reliability and a life cycle shorter than the system's, which makes maintenance difficult. Batteries also take up a great deal of space: in a storage rack with battery backup, a quarter of the space is occupied by batteries. That space could clearly be put to better use.
Facing these issues, PMC created an ingenious NVRAM accelerator card. The card neatly fills the gap between the memory tier and the SSD tier, and because its performance is so strong, it does more than fill a gap: it enables several unique capabilities.
Figure 4: PMC Flashtec NVRAM accelerator card
First, the hardware: this is a standard half-height, half-length PCIe card. The compact design fits in virtually any server. PMC has also innovated at the interface layer between the card and the host. As shown in Figure 5, on the left are the native interfaces that today's applications use, all of which are block-based. The card therefore provides an NVMe interface, the native interface for block-device applications, which makes integration easy; it is also the natural interface for a write cache, where large blocks of data move in and out. In addition, the card provides memory-mapped access: its capacity is mapped directly into the application's virtual address space, so the application can treat it as native memory and access it with CPU load/store instructions, without spending any storage I/O cycles or passing through any software layer. This makes it easy for metadata applications to access the card's memory.
Figure 5: Flashtec NVRAM application model
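As a rough illustration of the block-mode usage model just described, the sketch below stages a single 4 KB write-cache entry on the card through the ordinary Linux block path. The device node name is a placeholder, and a real write cache would batch requests and submit them asynchronously to approach the IOPS figures quoted below.

```c
/* Sketch only: /dev/nvme0n1 is a placeholder for the card's namespace. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    int fd = open("/dev/nvme0n1", O_WRONLY | O_DIRECT);
    if (fd < 0)
        return 1;

    /* O_DIRECT requires aligned buffers. */
    void *buf;
    if (posix_memalign(&buf, 4096, 4096))
        return 1;
    memset(buf, 0xab, 4096);

    /* Stage one 4 KB write-cache block at offset 0 of the namespace. */
    ssize_t n = pwrite(fd, buf, 4096, 0);

    free(buf);
    close(fd);
    return n == 4096 ? 0 : 1;
}
```

The point is simply that the card is reached through the same block interface applications already use, so no new programming model is required on this path.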
With these two interfaces, one block-based and one memory-mapped, both the write-cache and the metadata workloads have a solution. So how does the card perform in block mode? Test results show that the NVRAM accelerator card delivers 1 million IOPS, roughly ten times the performance of an SSD.
Figure 6: Flashtec NVRAM performance is ten times that of an SSD
Figure 6 compares the product with a state-of-the-art PCIe SSD. As is well known, SSD performance depends largely on the flash itself: peak flash performance can be high, but because flash has limited write endurance, sustained performance is uneven. In contrast, PMC's NVRAM accelerator card consistently delivers balanced performance, up to 1 million read/write IOPS, with no endurance problems.
Now look at the memory-mapped interface. Through this interface the card goes well beyond that number: for 64 B random writes it sustains 15 million IOPS. As noted at the beginning of the article, each data buffer brings several metadata I/Os with it, so sustaining 1 million write IOPS on the data path means handling several times that many metadata operations. With 15 million random read/write IOPS available, the card handles this load with plenty of headroom.
Figure 7: Flashtec NVRAM sustains up to 15 million IOPS for 64 B random writes
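A minimal sketch of the memory-mapped usage model and this 64 B random-write pattern follows. The device node name, the mapping offset, and the record layout are all assumptions for illustration; the actual names and the persistence/flush rules come from the vendor driver.

```c
/* Sketch only: device name, offset, and record layout are assumptions;
 * persistence/flush semantics depend on the vendor driver. */
#include <fcntl.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

struct meta_entry {            /* one 64-byte metadata record (example layout) */
    uint64_t lba;
    uint64_t phys;
    uint64_t seq;
    uint8_t  pad[40];
};

int main(void)
{
    int fd = open("/dev/flashtec_nvram0", O_RDWR);   /* placeholder node name */
    if (fd < 0)
        return 1;

    size_t n_entries = 1UL << 20;                    /* map 64 MiB of card DRAM */
    size_t len = n_entries * sizeof(struct meta_entry);
    struct meta_entry *tbl = mmap(NULL, len, PROT_READ | PROT_WRITE,
                                  MAP_SHARED, fd, 0);
    if (tbl == MAP_FAILED)
        return 1;

    /* Random 64 B metadata updates issued with plain CPU stores:
     * no system call and no block I/O stack in the path. */
    for (long i = 0; i < 1000000; i++) {
        size_t slot = (size_t)rand() % n_entries;
        tbl[slot].lba  = (uint64_t)i;
        tbl[slot].phys = (uint64_t)i * 8;
        tbl[slot].seq += 1;
    }

    munmap(tbl, len);
    close(fd);
    return 0;
}
```

Because each update is just a handful of stores, the per-operation cost is set by the memory system rather than by the I/O stack, which is what makes 64 B IOPS numbers of this magnitude possible.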
The last metric is CPU utilization. Compared with a memory-based solution, the core advantage of using the NVMe driver is efficient DMA, and the NVMe protocol is very good at this. Moving data into the NVRAM card over NVMe is four times more efficient than copying it with CPU cycles. This matters: a memory-to-memory backup essentially burns CPU cycles, cycles that could otherwise go to the upper-layer application, which is where CPU resources are needed most.
Figure 8: CPU utilization compares favorably with NV-DIMM
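The difference can be sketched as follows. This is not PMC's code: the device name is a placeholder, and libaio stands in for whatever asynchronous block submission path the system uses. The contrast is that with an NV-DIMM-style mapping the CPU copies every byte itself, while a queued NVMe write lets the card's DMA engine move the data and frees the CPU for application work.

```c
/* Contrast sketch only: the device name is a placeholder, and the "NV-DIMM
 * window" is an ordinary buffer standing in for a persistent mapping.
 * Build with: cc demo.c -laio */
#define _GNU_SOURCE
#include <fcntl.h>
#include <libaio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    char *src;
    if (posix_memalign((void **)&src, 4096, 4096))
        return 1;
    memset(src, 0xab, 4096);

    /* (a) NV-DIMM style: the CPU itself spends cycles copying every byte. */
    static char nvdimm_window[4096];
    memcpy(nvdimm_window, src, 4096);

    /* (b) NVMe style: queue the write and let the card's DMA engine move it. */
    int fd = open("/dev/nvme0n1", O_WRONLY | O_DIRECT);
    if (fd < 0)
        return 1;

    io_context_t ctx = 0;
    if (io_setup(8, &ctx))
        return 1;

    struct iocb cb, *cbs[1] = { &cb };
    io_prep_pwrite(&cb, fd, src, 4096, 0);
    io_submit(ctx, 1, cbs);              /* CPU is free to do other work here */

    struct io_event ev;
    io_getevents(ctx, 1, 1, &ev, NULL);  /* reap the completion later */

    io_destroy(ctx);
    close(fd);
    free(src);
    return 0;
}
```

Figure 8 reflects exactly this difference: the cycles saved on path (b) are the ones handed back to the application.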
PMC's NVRAM accelerator card creates a unique storage tier between DRAM and SSD. The product delivers millions of small write IOPS and one million IOPS at 4 KB blocks, meeting the performance and write-endurance requirements of a wide range of applications. It uses industry-standard interfaces, PCIe, NVMe, and native memory-mapped access, which shortens time to market and lowers total cost of ownership. With today's storage market in flux and the variety of applications and data growing, the NVRAM accelerator card gives equipment manufacturers and data-center users a new option for enterprise-class, mission-critical applications.
The difference between NVRAM and NV-DIMM
There are also newer approaches such as NV-DIMMs: ordinary DIMMs with capacitors and flash added so that data survives a power outage. The mechanism is similar; the form factor is different. So why does the NVRAM card have the advantage? First, it offers larger capacity, and it occupies a PCIe slot rather than a DIMM slot. NV-DIMMs also come with a number of drawbacks. They require BIOS support: an NV-DIMM plugged into the motherboard must be distinguished from ordinary memory, so the BIOS has to treat it separately and so does the application. Mixing it with regular DIMMs is difficult, the OS has to be modified, and it has to know which regions are power-protected. They also require hardware support, both for the DRAM and for dedicated NV-DIMM slots, which puts more burden on the motherboard. Most seriously, an NV-DIMM consumes CPU cycles on memory copies, which hurts the application software.
The NVRAM card, by contrast, has a processor dedicated to moving data; it takes over from the CPU and DMAs the data into the NVRAM. With a DIMM, nothing does that for you: only the CPU can read from an ordinary DIMM and write into the NV-DIMM, and it is not worth spending that many CPU cycles just copying data.