Baidu uses FPGAs to accelerate SQL queries at scale

Source: Internet
Author: User

Although much of our coverage of Baidu this year has focused on the Chinese search giant's deep learning initiatives, many other critical, if less cutting-edge, applications present real big data challenges.

As Baidu's Ouyang Jian described at this week's Hot Chips conference, Baidu holds over 1 EB of data, processes about 100 PB of data every day, updates 10 billion webpages daily, and handles more than 1 PB of log updates every 24 hours. Figures like these rival Google's, and Baidu takes a Google-like approach to solving potential bottlenecks at scale.

As we have described before, Google is looking for every possible way to beat Moore's law, and Baidu is on the same quest. The exciting machine learning work is fascinating, but accelerating core business workloads is just as necessary. Ouyang noted that there is a widening gap between the company's need to deliver high-end services based on its data and what CPUs are capable of delivering.

For problems at Baidu's scale, the receiving end of all this data is a range of data analysis frameworks and platforms, from the company's massive knowledge graph, multimedia tools, and natural language processing frameworks to recommendation engines and click-stream analysis. In short, the central problem of big data is neatly represented here: a diverse array of applications matched with overwhelming data volumes.

When discussing the challenges of accelerating Baidu's big data analysis, Ouyang described the difficulty of abstracting the computing kernels to find a general approach. "The diversity of big data applications and the changing computing types make this a challenge. It is difficult to integrate all of these into a distributed system because of the changing platforms and programming models (MapReduce, Spark, streaming, user defined, and so on). There will be more data types and storage formats in the future."

Despite these obstacles, Ouyang said that his team looked for a common thread. As he pointed out, plain old SQL ties together many of their data-intensive tasks. "About 40% of our data analysis tasks are written in SQL, and the others can be rewritten using SQL." Furthermore, he says they can take advantage of existing SQL systems and match existing frameworks such as Hive, Spark SQL, and Impala. The next step was to accelerate SQL queries, and for that Baidu found FPGAs to be the best-suited hardware.

These boards, which contain what Baidu calls processing units (PEs), automatically handle key SQL functions as SQL statements are executed. A caveat: everything here is drawn from the presentation. Exactly what the FPGA is talking to remains something of a mystery, perhaps intentionally. If Baidu is getting the speedups its benchmarks suggest, this is competitive information. Still, we will relay what was described. Simply put, the FPGAs run in the database, and when an SQL query arrives, the software designed by the team takes over.

Ouyang noted one thing about performance: the accelerator is limited by FPGA bandwidth, and could otherwise have performed better. For the evaluation, Baidu used a machine with two 12-core Intel E26230 CPUs at 2.0 GHz and 128 GB of memory. The SDA has five processing units (on the 300 MHz FPGA board), each handling a different core function: filter, sort, aggregate, join, and group by.

To accelerate SQL queries, Baidu studied the TPC-DS benchmark and created special engines, called processing units (PEs), that accelerate the five key functions in that benchmark: filter, sort, aggregate, join, and group by. (No, we are not going to write these in all caps the way SQL does.) The SDA device uses an offloading model. Each accelerator card has multiple processing units of different kinds built into the FPGA logic, and the types of SQL function and the number of units per card are determined by the specific workload. As queries run on Baidu's systems, the data for each query is pushed to the accelerator card in columnar format (which is what makes the queries so fast), and through a unified SDA API and driver the SQL work is distributed to the right processing units and the SQL operations are accelerated.
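To illustrate why a columnar layout suits an offloaded filter, here is a minimal Python sketch. It is not Baidu's SDA API, which was not disclosed; the table contents and the names `to_columns` and `filter_column` are hypothetical and chosen only for illustration.

```python
# Hypothetical table in row format: (id, clicks, region) per record.
rows = [(1, 250, "cn"), (2, 90, "us"), (3, 410, "cn"), (4, 120, "jp")]


def to_columns(rows, names):
    """Transpose row tuples into a dict holding one list per column."""
    return {name: [row[i] for row in rows] for i, name in enumerate(names)}


def filter_column(values, threshold):
    """Return the row indexes whose value exceeds threshold.

    Only a single column is scanned, which stands in for the data that
    would be shipped to a filter unit on the accelerator card.
    """
    return [i for i, v in enumerate(values) if v > threshold]


columns = to_columns(rows, ["id", "clicks", "region"])

# A query such as "select id where clicks > 100" touches two columns only;
# the "region" column never has to leave host memory.
selected = filter_column(columns["clicks"], 100)
result_ids = [columns["id"][i] for i in selected]
```

With a row layout, every record would have to be moved in full to evaluate the same predicate. With columns, only the data the operation actually reads is transferred, which matters when the card is bandwidth limited.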

The SDA architecture uses a data flow model, in which operations not supported by the processing units are handed back to the database system and run natively there. The performance of the SQL accelerator card developed by Baidu is limited by the memory bandwidth of the FPGA card. The accelerator cards work across clusters of machines, although Baidu did not disclose how data and SQL operations are distributed across multiple machines.
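The offload-with-fallback behavior described above can be sketched roughly as follows. This is an assumption-laden illustration, not Baidu's implementation: the `ACCELERATED` table, `run_on_host`, and `dispatch` are hypothetical names, and plain Python functions stand in for the FPGA processing units.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-ins for operations that have a processing unit on the card.
ACCELERATED: Dict[str, Callable[[List[int]], List[int]]] = {
    "filter": lambda col: [v for v in col if v > 0],
    "sort": lambda col: sorted(col),
}


def run_on_host(op: str, col: List[int]) -> List[int]:
    """Fallback path: the database engine executes the operation itself."""
    if op == "distinct":
        return sorted(set(col))
    raise ValueError(f"unsupported operation: {op}")


def dispatch(op: str, col: List[int]) -> Tuple[str, List[int]]:
    """Route an operation to a processing unit if one exists, else to the host.

    Returns a pair of (where it ran, result).
    """
    handler = ACCELERATED.get(op)
    if handler is not None:
        return "accelerator", handler(col)
    return "host", run_on_host(op, col)
```

For example, `dispatch("sort", [3, 1, 2])` runs on the stand-in accelerator, while `dispatch("distinct", [2, 2, 1])` falls back to the host path because no unit is registered for it.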

We are limited to the details that Baidu was willing to disclose, but the benchmark results are very encouraging, especially for Terasort. We will follow up with Baidu after the Hot Chips conference to see if we can get more detail about how this all fits together and how to get around the memory bandwidth bottleneck.

From: https://linux.cn/article-7775-1.html

Address: http://www.linuxprobe.com/baidu-fpga-sql.html

