Performance impact of the SCSI software layer

Source: Internet
Author: User

In flash storage, the market, customers, and R&D teams alike are backing the NVMe standard. One of the most important reasons is that traditional SCSI can no longer meet performance requirements and has become a major bottleneck for storage systems. Traditional SAS/SATA falls short in many respects: the software layer, transport protocol efficiency, software interface standards, chip interfaces, and the transmission link. Today Storage Old Wu will share the key performance bottleneck of the SAS/SATA interface at the software level and explain, from an R&D perspective, why the SCSI software layer is such an important bottleneck.

As is well known, the SCSI software layer is organized into three major parts:


1. SCSI upper-level driver layer. This layer implements the functions of SCSI devices; the disk driver, tape driver, and CD-ROM driver all live here. The disk driver is usually called the SD driver and implements a block device. Upward it connects to the block device layer; downward it connects to the SCSI middle layer.

2. SCSI middle layer. The middle layer mainly handles SCSI command processing, error handling, and timeout handling. Above the middle layer sit the various SCSI function drivers; below it are the SCSI low-level drivers.

3. SCSI low-level driver. The low-level driver implements SCSI data transport and the HBA driver. At this level you can implement an iSCSI transport, emulate a SCSI HBA, or implement a driver for an LSI HBA that forwards data to the actual hardware board by DMA.
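The division of labor among the three layers can be sketched in a few lines of Python. This is only an illustrative model, not kernel code: `MidLayer`, `LowLevelDriver`, `read_blocks`, and the retry count are hypothetical names and values chosen for the example. The one real detail is the layout of the 10-byte READ(10) command descriptor block that an upper-level disk driver would build.

```python
import struct

READ_10 = 0x28  # SCSI READ(10) operation code


def build_read10_cdb(lba: int, num_blocks: int) -> bytes:
    """Upper layer (sd-like): turn a block read into a 10-byte READ(10) CDB."""
    # Layout: opcode, flags, 32-bit LBA, group number, 16-bit length, control.
    return struct.pack(">BBIBHB", READ_10, 0, lba, 0, num_blocks, 0)


class LowLevelDriver:
    """Hypothetical low-level driver; a real one would hand the CDB to an HBA."""

    def __init__(self, failures_before_success: int = 0):
        self.failures_left = failures_before_success
        self.sent = []  # every CDB handed to the "hardware"

    def queue_command(self, cdb: bytes) -> bool:
        self.sent.append(cdb)
        if self.failures_left > 0:
            self.failures_left -= 1
            return False  # pretend the transport reported an error
        return True


class MidLayer:
    """Hypothetical middle layer: dispatches commands and retries on error."""

    def __init__(self, lld: LowLevelDriver, max_retries: int = 3):
        self.lld = lld
        self.max_retries = max_retries

    def execute(self, cdb: bytes) -> bool:
        # One initial attempt plus up to max_retries retries.
        for _ in range(self.max_retries + 1):
            if self.lld.queue_command(cdb):
                return True
        return False


def read_blocks(mid: MidLayer, lba: int, num_blocks: int) -> bool:
    """Entry point an upper-level driver might expose to the block layer."""
    return mid.execute(build_read10_cdb(lba, num_blocks))
```

For example, `read_blocks(MidLayer(LowLevelDriver(2)), 0, 1)` succeeds on the third attempt, because the middle layer absorbs the two simulated transport errors before the upper layer ever sees them.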

In traditional disk storage, the performance bottleneck is on the disk side. The CPU, the NUMA architecture, and software concurrency have almost no effect on storage performance. During years of developing a thin-provisioning logical volume system, Storage Old Wu tried to improve IO performance by optimizing lock contention, and the effort turned out to be in vain. For disk storage, CPU performance is more than sufficient. A single disk can sustain only a limited number of IOPS, so the interrupt load puts no pressure on the CPU, and the SCSI software layer has no effect on performance whatsoever. Disk storage is therefore a truly IO-intensive application.

With flash storage, however, everything changes. SSD performance is very high in both bandwidth and IOPS, so the storage bottleneck moves from the disk side to the CPU, the OS, and the network. With that in mind, look at the SCSI software stack. As the figure shows, each SCSI device provides only one request queue. No matter how many processing threads or CPUs the system has, all requests are queued by competing for that one queue. The request queue of a SCSI device is thus a contended resource in the system.


[Figure: 1.jpg, http://s3.51cto.com/wyfs02/M01/72/42/wKiom1XfO5Hit0-HAACku5Zy-ms543.jpg]


On an SMP system, access to a contended resource must be protected by a lock. In the Linux implementation, the request queue is protected by a spinlock. Because SSD performance is so high, all CPUs in the system are busy processing requests, and each of them must eventually compete for the request queue lock to put a request on the queue. This heavy contention sharply reduces the efficiency of every CPU, which spends much of its time spinning while it waits for the lock. As a result, overall IO processing efficiency falls, and software limits keep storage performance from improving.
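A toy model makes the cost of this contention concrete. The sketch below is a deliberately simplified worst case, not a measurement: it assumes every CPU does nothing except enqueue requests, that the lock is handed around round-robin, and that each enqueue holds the lock for a fixed number of ticks. The function names are hypothetical.

```python
def simulate_single_queue(num_cpus, requests_per_cpu, hold_ticks=1):
    """Return (total_ticks, spin_ticks) for CPUs sharing one queue lock.

    total_ticks: wall-clock ticks until every request is queued.
    spin_ticks:  CPU ticks burned by waiters spinning on the lock.
    """
    remaining = [requests_per_cpu] * num_cpus
    total_ticks = 0
    spin_ticks = 0
    owner = 0
    while any(remaining):
        # Hand the lock to the next CPU that still has work (round-robin).
        while remaining[owner] == 0:
            owner = (owner + 1) % num_cpus
        # Every other CPU with pending work spins while the owner enqueues.
        waiters = sum(
            1 for cpu, left in enumerate(remaining) if left and cpu != owner
        )
        total_ticks += hold_ticks
        spin_ticks += waiters * hold_ticks
        remaining[owner] -= 1
        owner = (owner + 1) % num_cpus
    return total_ticks, spin_ticks


def spin_fraction(num_cpus, requests_per_cpu, hold_ticks=1):
    """Fraction of all available CPU ticks that were spent spinning."""
    total, spin = simulate_single_queue(num_cpus, requests_per_cpu, hold_ticks)
    return spin / (num_cpus * total)


print(simulate_single_queue(1, 100))  # (100, 0)
print(simulate_single_queue(4, 100))  # (400, 1194)
```

Under these assumptions, four CPUs take four times as long as one CPU to queue the same per-CPU load, and roughly three quarters of the available CPU time goes to spinning. Real systems do useful work between enqueues, so the actual fraction depends on how long the lock is held relative to that work.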

We tested this in practice and found that once IO pressure rises, the CPUs spend most of their time spinning, all competing for the request queue spinlock. The single request queue of the SCSI layer is therefore a serious performance bottleneck.

To solve this problem, Linux improved the single SCSI queue by introducing multi-queue. Multiple queues reduce or avoid contention between threads and CPUs, so each CPU can process IO at full efficiency, which improves IO processing performance as a whole. The figure below shows SCSI with multiple queues:
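The structural difference between one shared queue and per-CPU queues can be shown with ordinary threads. This is a minimal sketch, not the Linux implementation: threads stand in for CPUs, `threading.Lock` stands in for the spinlock, and `submit_all` is a hypothetical helper. Because of CPython's interpreter lock it says nothing about real throughput; it only counts how often a submitter found its queue lock already taken.

```python
import threading
from collections import deque


def submit_all(num_threads, requests_per_thread, num_queues):
    """Each thread enqueues its requests; return (queue_lengths, contended).

    num_queues=1 models the single request queue; num_queues=num_threads
    models one queue per submitter.
    """
    queues = [deque() for _ in range(num_queues)]
    locks = [threading.Lock() for _ in range(num_queues)]
    contended = [0] * num_threads  # one slot per thread, so no sharing

    def worker(tid):
        q = tid % num_queues  # the queue this submitter always uses
        for seq in range(requests_per_thread):
            # Try the lock first; a failure means someone else holds it.
            if not locks[q].acquire(blocking=False):
                contended[tid] += 1
                locks[q].acquire()
            try:
                queues[q].append((tid, seq))
            finally:
                locks[q].release()

    threads = [
        threading.Thread(target=worker, args=(t,)) for t in range(num_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return [len(q) for q in queues], sum(contended)


single_lengths, single_contended = submit_all(4, 1000, num_queues=1)
multi_lengths, multi_contended = submit_all(4, 1000, num_queues=4)
print(single_lengths, multi_lengths, multi_contended)
```

With one queue per thread, no lock is ever touched by a second thread, so the contended count is always zero. With a single queue the count varies from run to run, which is exactly the nondeterministic contention the multi-queue design removes.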

[Figure: 2.jpg, http://s3.51cto.com/wyfs02/M02/72/42/wKiom1XfO7uAe-D_AAC52PPeMGQ316.jpg]


Note that once software introduces multiple queues for each SCSI device, the HBA must also provide multi-queue access to software; otherwise performance will be limited by the HBA card. Compared with traditional storage, flash storage software changes more than data layout and data organization: implementation efficiency, exploiting CPU concurrency, and attention to computer architecture all become especially important. That is why I have always thought that flash storage is not just a storage technology but also a matter of high-performance computing.
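Why the HBA matters can be seen by mapping CPUs onto hardware queues. The sketch below assumes a simple modulo mapping purely for illustration; the function names are hypothetical, and real kernels may use a topology-aware mapping instead.

```python
from collections import defaultdict


def map_cpus_to_hw_queues(num_cpus, num_hw_queues):
    """Assign each CPU to a hardware queue with a simple modulo rule."""
    mapping = defaultdict(list)
    for cpu in range(num_cpus):
        mapping[cpu % num_hw_queues].append(cpu)
    return dict(mapping)


def max_sharers(num_cpus, num_hw_queues):
    """Largest number of CPUs that end up sharing one hardware queue."""
    mapping = map_cpus_to_hw_queues(num_cpus, num_hw_queues)
    return max(len(cpus) for cpus in mapping.values())


print(map_cpus_to_hw_queues(8, 1))  # {0: [0, 1, 2, 3, 4, 5, 6, 7]}
print(map_cpus_to_hw_queues(8, 4))  # {0: [0, 4], 1: [1, 5], 2: [2, 6], 3: [3, 7]}
print(max_sharers(8, 8))            # 1
```

With a single hardware queue, all eight CPUs funnel into the same submission point again, so the contention that the software queues removed reappears at the HBA. Only when the HBA exposes enough queues does each CPU get an uncontended path to the hardware.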


This article is from the "Save the Way" blog; please retain this source attribution: http://alanwu.blog.51cto.com/3652632/1689131
