Let's briefly talk about the use of Join in SQL, and talk about sqljoin.

Source: Internet
Author: User
Tags database join

Let's briefly talk about the use of Join in SQL, and talk about sqljoin.

Recently, I encountered a database CPU> 90% due to slow SQL, and the database hang was occupied .. Eventually, all other business queries fail.

A detailed analysis is caused by Join queries on several large tables. faults are common, but it reminds me of the need to explain how to avoid Join.

The following discussions are based on the premise that the database capabilities are limited. Otherwise, the subsequent discussions can be ignored directly.


First, let's talk about the use cases of SQL Join.


1) if the system exists


High concurrency, distributed

Simple business logic

Low data consistency requirements

Allow delayed reading


We recommend that you use Join less in SQL. In this type of business scenario, the purpose of reducing join operations is not only to reduce resource consumption in high concurrency, but also to reduce coupling between services and increase scalability. Services can be split into multiple microservices and multiple databases to facilitate configuration addition when part of the workload is too heavy, or directly change to the cache.


2) If the system exists


Writes complex data with low concurrency and frequency

CPU intensive, not IO intensive

Business logic is processed through a database and even contains a large number of Stored Procedures

Systems with high consistency and integrity requirements

Large volumes of reports and statistics are required


Therefore, database Join is required. Join cannot be avoided. For example, for some financial businesses, financial systems, and enterprise applications, complex join operations are inevitable. In addition to writing and writing well, you must make full use of the database.



For Case 1, we have the following common solutions to replace Join:


1. Multiple select statements are used to retrieve data from different tables and perform Join operations in the application code;

2. Create a wide redundant table while saving data separately, and retrieve and query data from the wide table;

3. The data to be joined is stored in the cache (such as redis). The cache can be active (the cache is updated when the data is modified) or passive (the cache is loaded only when the cache is deleted );

4. read data from an independent user API. Similar to method 1, aggregation is performed in the code.



In Case 2, our goal is to optimize Join and improve the corresponding performance. The common solution is as follows:


1.Use a small result set to drive large results. The purpose is to minimize the total number of NestedLoop loops in Join statements,For example, when two tables (table A and Table B) Join, if Table A is filtered by the WHERE condition, there are 10 records, and table B has 20 records. If we select Table A as the driving table, that is, the result set of the driven table is 20, we will filter the results of the driven table (table B) 10 times through the Join condition. If we select table B as the driving table, we need to filter Table A 20 times.


2. Make sure that the Join condition field in the drive table has been indexed,The purpose of ensuring that the Join condition field in the drive table has been indexed is to make sure that the Join condition field in the drive table is indexed only for the above 1st points, in order to ensure that each query in a loop can consume less resources, this is the actual Optimization Method for optimizing the inner loop.


3. Increase the size of Join_Buffer_Size,When MySQL meets certain join requirements (all row join/all index/scan join), in order to reduce the number of reads from the "driven table" involved in the join operation to improve performance, the join buffer must be used to complete the join operation. When the join buffer is too small, MySQL will not store the buffer into a disk file, but will first perform operations on the results of the join buffer and the table to be joined, and then clear the data in the join buffer, continue to write the remaining result set into the buffer. This will inevitably cause the driver table to be read multiple times, increasing IO access exponentially, reduce efficiency (if the using join buffer is used in the execution plan ). If the join statement is not very small, I suggest you increase join_buffer_size to about 1 MB. If the memory is sufficient, set it to 2 MB.


The final conclusion is that when the data is not big, it is best to design it according to the database specifications. When the data volume is large, only some specifications can be sacrificed for performance. Any Specification is a compromise under certain circumstances. It is not necessarily true if it is detached from this environment.



Scan QR code or manually search

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.