Zhang Dong: Why Is OpenPOWER CAPI So Fast? (Part 2)




Zhang Dong, data center storage architect, PMC


How does the CAPI FPGA work?

First, get to know the three roles in the system:

AFU (Accelerator Function Unit): the main acceleration logic, which lives on the FPGA accelerator chip. Users can write their own acceleration logic and firmware into it.

PSL (Power Service Layer): provides the AFU with an interface for reading and writing main memory and performs virtual-to-physical (V2P) address translation, using the same page table as the CPU side and including a TLB. It also responds to probes from the CAPP to implement global cache coherence (CC), and it provides a cache. IBM supplies the PSL to FPGA developers as a hard-core IP block.

CAPP (Coherent Attached Processor Proxy): equivalent to the FPGA's cache-coherence agent, but placed on the CPU side. It maintains a filter directory and accepts probes from other CPUs; probes that are not filtered out are forwarded to the PSL.
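The division of labor between the CAPP and the PSL can be illustrated with a toy model. This is only a sketch of the filtering idea described above, not the real hardware protocol: the class names, the `psl_fill` and `probe` methods, and the invalidate-on-probe behavior are all assumptions made for illustration.

```python
class PSL:
    """Toy stand-in for the FPGA-side Power Service Layer cache (hypothetical model)."""

    def __init__(self):
        self.cache = {}        # line address -> data held on the accelerator side
        self.probes_seen = 0   # probes that actually reached the FPGA

    def handle_probe(self, line_addr):
        self.probes_seen += 1
        # Give up the line and return whatever was cached (None if absent).
        return self.cache.pop(line_addr, None)


class CAPP:
    """Toy stand-in for the CPU-side proxy with its filter directory (hypothetical model)."""

    def __init__(self, psl):
        self.psl = psl
        self.directory = set()  # lines the PSL may currently hold

    def psl_fill(self, line_addr, data):
        # The accelerator caches a line, so record it in the filter directory.
        self.directory.add(line_addr)
        self.psl.cache[line_addr] = data

    def probe(self, line_addr):
        # Probe from another CPU: filter it unless the PSL may hold the line.
        if line_addr not in self.directory:
            return None         # filtered out, never reaches the FPGA
        self.directory.discard(line_addr)
        return self.psl.handle_probe(line_addr)


psl = PSL()
capp = CAPP(psl)
capp.psl_fill(0x1000, b"hot line")
capp.probe(0x2000)  # not in the directory: filtered on the CPU side
capp.probe(0x1000)  # in the directory: forwarded to the PSL
```

In this model only one of the two probes reaches the FPGA, which is the point of keeping the filter directory on the CPU side.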

The key points of how it operates can be summarized in six items:

  • It targets dedicated scenarios and is optimized for dedicated PCIe accelerator cards;

  • The FPGA directly accesses the entire virtual address space of the current process, without having to convert it into PCIe addresses;

  • The accelerator card can use a cache, which is automatically kept coherent with main memory through the CAPP's probe operations;

  • The accelerator card and the CPU see the same address space, and that space is cache coherent;

  • An API is provided, covering operations such as opening the device and delivering task description information; it plays the role of a driver;

  • The PSL is provided by IBM as a hard-core IP block, and the AFU sends and receives data through the PSL using opcodes.

In this process, CAPI aims to put the FPGA on the same footing as the CPU: what the FPGA sees is no longer PCIe space, so the address-mapping step is omitted. In addition, the FPGA side has its own cache.

The FPGA now has direct access to main memory, but it does not access the entire physical address space: under CAPI 1.0 [...]; CAPI 2.0 [...].
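To make the "same page table, with a TLB" idea concrete, here is a minimal sketch of virtual-to-physical translation with a small translation cache. It is a simplified model rather than the PSL's actual mechanism: the `Translator` class, the dictionary-based page table, and the 4 KiB page size are assumptions chosen for illustration.

```python
PAGE_SIZE = 4096  # assumed page size in bytes, for illustration only


class Translator:
    """Toy V2P translator that shares a page table and caches lookups in a TLB."""

    def __init__(self, page_table):
        self.page_table = page_table  # virtual page number -> physical frame number
        self.tlb = {}                 # cached translations
        self.walks = 0                # page-table lookups caused by TLB misses

    def translate(self, vaddr):
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        if vpn not in self.tlb:
            # TLB miss: consult the shared page table (KeyError models a fault).
            self.walks += 1
            self.tlb[vpn] = self.page_table[vpn]
        return self.tlb[vpn] * PAGE_SIZE + offset


# The same table object would be used by the "CPU" and by the accelerator side.
process_page_table = {0x10: 0x7A, 0x11: 0x03}
psl = Translator(process_page_table)
paddr = psl.translate(0x10008)  # virtual page 0x10, offset 8 -> 0x7A008
```

Because the accelerator works on the process's virtual addresses, the host can hand it ordinary pointers instead of first mapping buffers into PCIe address space.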

How much can performance improve?

The hardware configuration was as follows:

IBM POWER8 server, S822L

Ubuntu, kernel 3.18.0-14-generic

Nallatech 385 CAPI card

Samsung SM1715 1.6TB NVM Express SSD

For the test, PMC engineers used the FPGA to build a text search engine.

During the test, the main program on the host reads data from the NVMe SSD and generates a linked list of task descriptions. The AFU polls main memory to fetch the task description list and performs the search tasks. A snooper is used for debugging and performance monitoring.
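The host/AFU handshake described above can be sketched in software. This is a simplified model, not PMC's implementation: the `Task` fields, `build_task_list`, and `afu_poll_once` are hypothetical names, and a plain substring count stands in for the FPGA search logic.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Task:
    text: str                      # data the host read from the SSD
    pattern: str                   # string to search for
    done: bool = False             # completion flag set by the "AFU"
    hits: int = 0                  # result written back by the "AFU"
    next: Optional["Task"] = None  # link to the next task description


def build_task_list(buffers, pattern):
    """Host side: chain one task description per buffer, preserving order."""
    head = None
    for text in reversed(buffers):
        head = Task(text=text, pattern=pattern, next=head)
    return head


def afu_poll_once(head):
    """AFU side: one polling pass over the list, handling tasks not yet done."""
    processed = 0
    task = head
    while task is not None:
        if not task.done:
            task.hits = task.text.count(task.pattern)
            task.done = True
            processed += 1
        task = task.next
    return processed


head = build_task_list(["capi capi", "no match", "a capi b"], "capi")
afu_poll_once(head)  # first pass handles all three tasks
afu_poll_once(head)  # second pass finds nothing new
```

Since both sides share one coherent address space, the list nodes can be ordinary in-memory structures that the AFU follows by pointer.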

Performance: P8 <-> AFU

At a sufficiently large queue depth, throughput reaches its limit of close to 6 GB/s, which is very high bandwidth.

Latency is also very low: about 1.5 microseconds, with 90% of reads and writes completing within 1.5 microseconds.
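The link between queue depth, latency, and throughput can be estimated with Little's law (requests in flight = request rate × latency). The calculation below is a back-of-the-envelope sketch under stated assumptions, not a measurement from the test: it treats 6 GB/s as 6e9 bytes per second, treats 1.5 microseconds as the per-request latency, and assumes each request moves one 128-byte cache line.

```python
def outstanding_requests(throughput_bytes_per_s, latency_s, request_bytes):
    """Little's law: requests in flight = request rate x time per request."""
    requests_per_s = throughput_bytes_per_s / request_bytes
    return requests_per_s * latency_s


# Assumed inputs: 6e9 B/s, 1.5 us per request, 128-byte requests.
needed = outstanding_requests(6e9, 1.5e-6, 128)
```

Under these assumptions, roughly 70 requests must be in flight at once to sustain the quoted bandwidth, which is why the peak only appears at high queue depth.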

Things CAPI 1.0 cannot do yet

CPU threads cannot currently see the address space on the AFU (except for the MMIO control register addresses). Moreover, an AFU can be used by only one process. Would it be even faster if, in the future, the FPGA could be plugged directly into the CPU's front-side bus (FSB)?


