Complex SQL queries not moving around? DRDS read-only instance to solve!

Source: Internet
Author: User

Absrtact: In the actual business production environment, after using the OLTP database to store the data, there will be complex SQL query demands such as the background operation system for statistical report analysis and so on.

Background
In the actual business production environment, after the business application system uses the OLTP database to store the data, there will be complex SQL query demands such as the background operation system for statistical report analysis and so on.

In order to meet the needs of this kind of complex SQL query Fast response, DRDS team based on the third generation distributed SQL engine, further introduced the self-research MPP multi-machine parallel Computing engine (Fireworks) and corresponding optimization strategy, greatly reinforcing the DRDS complex query processing ability.

The results of distributed multi-table join, aggregation, ordering and sub-query operation in TENS data can greatly improve the response speed. Use the same data (RDS read-only) for processing, eliminating the need for data synchronization to other data sources, reducing the overall link complexity of the business architecture, and saving business operations and budget costs.

Key Features
Self-developed MPP multi-machine parallel computing engine Fireworks

The DRDS read-only instance is equipped with a SQL execution engine (Fireworks) with full multi-machine parallel processing capability. It differs significantly from the SQL execution engine that is installed on the DRDS master instance.

The execution engine of the DRDS master instance adopts the single-machine architecture, and adopts the strategy of pushing the computation down to the underlying physical sub-libraries, and realizes the distributed computation of logical SQL by the computing ability of the physical library.

The Fireworks engine on the DRDS read-only instance is a cluster of multiple compute nodes that transforms a SQL query into a distributed computing task, breaking down
The limitations of the computational power of the physical library significantly increase the computational speed for complex logical SQL, with significant acceleration for Join, Aggregate, and Sort calculations.

Fireworks will break down the compute tasks such as Join, Aggregate, and Sort by Shuffle and distribute them to multiple compute nodes in the compute cluster, and compute acceleration is achieved by parallel computation of multi-compute nodes.

Custom-built optimizer for multi-machine parallel execution mode

The original DRDS Master Instance optimizer focuses on the OLTP scenario, with the core idea of pushing everything down to the next physical library execution. The aim is to make full use of the computational resources of the physical library, and to avoid generating a large amount of data flow, so as to get faster response speed.

When faced with the large data volume level of the review query scenario, the overall performance will be under the limitations of the physical library, but also the physical library to create a greater pressure to affect the stability of the overall view of its OLAP capabilities have many limitations.

After the introduction of the MPP multi-machine parallel computing engine Fireworks, the DRDS itself has been greatly improved in computing power, the optimizer's overall optimization strategy has also been adjusted:

As far as possible, the complex computation (such as Join, Aggregation, Sort) is carried out to its own execution engine calculation, and the calculation acceleration and expansibility are realized by the Fireworks computing cluster.
Reduce the cost of data fetching by continuing to push the lightweight calculations (such as Project, Filter) down to the physical library.
The DRDS distributed SQL Optimizer can generate a friendly execution plan for multi-machine parallel execution engines by optimizing the granularity of the execution plan for better execution efficiency.

At the same time, a refinement operator push-down strategy is provided to push the operator of the small RDS pressure to the physical library to obtain a higher cost-performance ratio, and to protect the RDS from the costly operators, thus guaranteeing the stability of the online traffic.

Direct analysis based on online data

Emerging Internet services, represented by the new retail business, have emerged, in addition to real-time OLTP requirements, along with the need for some complex quasi-real-time OLAP to support real-time decision-making.

While most of the data analysis scenario solutions need to export the OLTP database production data to other data sources for offline analysis, this traditional scheme is difficult to meet the quasi-real-time requirements, and there is also the risk of data loss when the data is exported to the offline system.

DRDS-read-only instances do not require lengthy and tedious data synchronization tasks, which are based on RDS read-only instances or RDS master instances to directly perform complex data processing, reduce the overall link complexity of the business architecture, and save business operations and budget costs.

DRDS read-only instance avoids data synchronization, can guarantee the timeliness of processing, the maximum can be read commited real-time (based on RDS Master instance).

Clear borders for SQL compatibility

The DRDS read-only instance is fully compatible with the SQL query syntax of the DRDS master instance, consistent with the SQL compatibility and SQL support boundary height of the DRDS 5.3 version.

High compatibility with similar products and support for clear boundary features. can provide an almost consistent experience with the DRDS master instance.

DRDS cannot execute or perform slower complex SQL on the primary instance can be migrated directly to a read-only instance to execute, eliminating the additional overhead of SQL rewriting.

Flexible and independent product experience

DRDS read-only instance automatic synchronization DRDS Master instance of the account permissions information, native VPC support, internal and external network can be opened at the same time, according to the business situation of flexible distribution, data processing ability to improve linearly.

Technical Architecture Overview

DRDS read-only instance the overall schema is basically consistent with the DRDS master instance, and only changes in the query layer, adding the MPP execution engine and the corresponding optimizer as shown below:

The DRDs protocol layer handles the parsing of the network interaction and the MySQL protocol, and the SQL is forwarded to the query layer for processing when the query request is received. The query layer parses the SQL and produces an optimized execution plan by the executor, which is then referred to the execution engine to the storage layer for querying and calculation.
If you need to use fireworks engine calculations, the query layer will further convert the execution plan into a distributed execution plan and submit it to the fireworks Cluster as a distributed task after the execution plan is obtained. The remote Fireworks cluster is completed to the storage layer for data query and subsequent calculation work.

Simply put, the DRDS read-only instance can be considered as an additional execution link with multi-machine parallel processing capability on the original DRDS basis.

Applicable scenarios

In general, DRDS read-only instances are suitable for complex queries that deal with large data levels with low concurrency and high latency. such as data analysis and reporting scenarios, the typical characteristics of such scenarios are those containing large
The amount of correlation, aggregation and sequencing operations and the data involved in the calculation of large scale.

At present, DRDS read-only example in the Alibaba Group has landed a number of businesses, including the most representative of the box horse, business brain and other new retail scene. Around people, goods, fields, warehouses and multiple dimensions of the association analysis, scattered in different logical libraries of several or even more than 10 logical tables and then aggregation, sequencing to meet inventory reconciliation, decision support and other business needs.

DRDS read-only instance of the emergence of business development students no longer need to configure, maintain a large number of data synchronization links, do not worry about the data due to the results of different steps caused by poor timeliness or inaccurate problems, to a certain extent, to alleviate the development of students work burden.

For users already using DRDS, DRDS read-only instances can address the following two known issues:

Some complex SQL involving join, aggregation, and ordering may be found in the process of using DRDS because it is not possible to completely push down and needs to be evaluated in the DRDS execution engine two times, which cannot be performed because of the memory limitations of the single-machine execution engine.
The complex computational part of SQL can be pushed down but the large scale of data involved causes the physical library to increase the pressure to affect the OLTP service or the response time is too slow to reach the requirements.

Summary

Long-term DRDS by the single-machine architecture execution engine has been unable to provide good support for complex queries based on large data sizes, or to extend the physical resources to achieve linear scaling of their own local computing power.

The launch of the DRDS read-only instance completely compensates for DRDS's short board in OLAP scenarios, enabling DRDS to provide scalable OLAP capabilities while providing powerful OLTP capabilities, providing a one-stop solution for users with both OLTP requirements and mid-size data analysis needs, bringing Philip

In the next half-year time, the DRDS read-only instance will publish the cross-Logical Library association query function, and through more technical means, continuously enhance the core capabilities of the read-only instance, in the concurrency, Response time, data volume, interactive query and other aspects will have better performance, to meet the stringent requirements of the enterprise application database.

Original link

This article is the original content of the cloud-Habitat community and cannot be reproduced without permission.

Complex SQL queries not moving around? DRDS read-only instance to solve!

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.