The best way to obtain data from one table that is not in another table (Study on the Efficiency of JOIN and EXISTS) is as follows:

Source: Internet
Author: User

The best way to obtain data from one table that is not in another table (Study on the Efficiency of JOIN and EXISTS) is as follows:
Business Requirements:

The company has a CRM broker management system. The most difficult problem encountered the day before yesterday is to query all users in one table that are not in another table and display them on pages, however, the problem is that the external data volume is large. If not in (select...) is used ..), not exists (select ...) for subqueries like this, you need to filter all the information in the subquery table. The problem is that the data size of a single subquery is too large, so I checked some information in private, based on the examples provided in the documents, the corresponding test table was created and some tests were conducted. Then, the solution was handed over to the colleague min Hong of the technical department to solve the problem, because it was the last day before yesterday. It was originally because the incident was busy and had no time to write logs. It was just two days before I took a note of this incident. Now, I am still taking notes, share with you good, if you have a better way or said there are bugs in question, please feel free to contact pick up the day star happy.yin@qq.com

The following is a comparison of JOIN and EXISTS efficiency in a group of tests.

Open profiling to view the SQL Execution time.

Set profiling = 1;

Comparison of the join efficiency between exists subqueries and joins

EXPLAIN SELECT film_id, language_id FROM sakila.filmWHERE NOT EXISTS(    SELECT * FROM sakila.film_actor    WHERE film_actor.film_id = film.film_id);

 

EXPLAIN SELECT film_id, language_id FROM sakila.film    LEFT JOIN sakila.film_actor USING(film_id)WHERE film_actor.actor_id IS NULL; 

They only have a slight difference, one select_type, and the other is that the latter in Extra uses Not exists, that is, early termination algorithm. When the first actor_id is Not null, discard this filter.

Finally, let's look at the efficiency of both.

The efficiency of using the subquery Exists is lower.

SELECT DISTINCT film_id FROM sakila.film    JOIN sakila.film_actor USING(film_id);SELECT film_id FROM sakila.film    WHERE EXISTS (    SELECT * FROM sakila.film_actor    WHERE film.film_id = film_actor.film_id);SHOW PROFILES;

It can be seen that the EXISTS efficiency is still low.
PS: In the episode of "Picking stars from the sky", some people say that using the JOIN method to operate 10 million data tables will become slow, but it is not a problem of the SQL statement itself, the problem of 10 million slow data records in a single table is hard damage caused by disk I/O, which cannot be avoided by any optimization. Using any software layer optimization cannot avoid the hard damage caused by disk I/O issues to the database ..

Conclusion: The following is an optional method for judging that the data of one table is not in the other table (Prerequisites: the aid of table a and the aid of Table B must have an index, B. aid cannot have NULL values ):
SELECT aid FROM a left join B ON a. aid = B. aid WHERE B. aid IS NULL LIMIT 0,100;

SELECT aid FROM a left join B USING (aid) WHERE B. aid is null limit 0,100; (the actual effect of this syntax IS the same as above, which IS a short form of the preceding SQL statement. For USING syntax, see LIMIT at the end of this article)

The implementation principle is actually very simple. When left join is used for a linked list, if the external table data does not exist, the returned data is NULL,
SELECT aid FROM a left join on. aid = B. aid WHERE B. aid is null only returns data when aid in Table a IS equal to aid in Table B and aid in Table B IS NULL. There IS only one condition for data that meets this condition, B is satisfied only when the data in Table B does not exist. aid is NULL and. aid is equal, that is, the data we are looking for in Table a but not in table B.
 
The usage and effect of USING are as follows:

Used to specify the connection conditions for table connection (abbreviated)

SELECT * FROM a JOIN B ON a. id = B. id;

USING can be written

SELECT * FROM a JOIN B USING (id );


Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.