Count of SQL optimizations, table connection order, condition order, in and exist

Last Update:2014-07-12 Source: Internet

Author: User

Tags comparable

Developer on Alibaba Coud: Build your first app with APIs, SDKs, and tutorials on the Alibaba Cloud. Read more ＞

First, about Count

Having read some articles about count (*) and Count (column), is the efficiency of Count (column) certainly higher than count (*)?

In fact, the individual think that count (*) and Count (column) is not comparable at all, COUNT (*) is the total number of bars in the table, and Count (column) is the number of non-empty records when the column.

But we can compare them by experiment:

First create the test table:

drop table test Purge;
CREATE TABLE Test as SELECT * from Dba_objects;

Update test set object_id =rownum;
Set timing on
Set Linesize 1000
Set Autotrace on

Perform

Select COUNT (*) from test;
Select COUNT (object_id) from test;

Discover that time-consuming is the same, is their efficiency actually the same?

Let's try creating an index on column object_id

CREATE INDEX idx_object_id on test (object_id);

And then execute

Select COUNT (*) from test;
Select COUNT (object_id) from test;

Is it obvious that count (object_id) is significantly higher than count (*), because the index is used by count (object_id), so efficiency is improved a lot?

Let's modify the column properties of the OBJECT_ID

ALTER TABLE test modify OBJECT_ID NOT null;

And then execute

Select COUNT (*) from test;
Select COUNT (object_id) from test;

found that their speed is actually as fast as the count (*) is also available to the index.

In fact, the premise of efficiency comparison is that two statements should be equivalent, the two formulations are not equivalent, and therefore not comparable.

For the Oracle Optimizer, we can experiment to find that the count of different columns, the time of the statistics is not the same, the approximate trend is that the column is more expensive, the higher the cost of access, the column offsets the performance of the Jedi Access. The cost of Count (*) is independent of the offset. Therefore, on some occasions count (*) is the fastest.

Ii. about in and exist

About in and exist is mostly said in the efficiency than exist high, so there must be in the place to be replaced exist and so on. But is that really the case?

We can do an experiment:

In the 10g;

SELECT * FROM dept where deptno not in (select Deptno from EMP);
SELECT * FROM dept where not exists (select Deptno from emp where Emp.deptno=dept.deptno);

We find that exist is indeed more efficient than in. This statement seems to be set up AH.

But we'll execute the following statement

SELECT * FROM dept where Deptno not in (the Select Deptno from emp where deptno are NOT null) and DEPTNO are NOT null;

You will find that the efficiency of in and exist is the same after adding a non-empty constraint.

View the execution plan for three statements you will find that the in statement and the exist statement without the non-null constraint are all anti, so the efficiency is the same, and the in statement without the non-null constraint is using the filter instead of the anti algorithm, so the efficiency is worse.

So we can conclude that in Oracle 10g, if you can ensure non-null, the in constraint can use the Anti semi-join algorithm, at which time the efficiency and exist are the same.

In 11g:

SELECT * FROM dept where deptno not in (select Deptno from EMP);
SELECT * FROM dept where not exists (select Deptno from emp where Emp.deptno=dept.deptno);

We found that the efficiency of the two statements was the same, as was the view of the execution plan. Oracle has been optimized in 11g, so the efficiency of in and exist is the same.

From this we can conclude that in 11g, the efficiency of using in and exist is the same, because they are taking more efficient anti algorithms.

Third, about the connection order of the Size list

On the internet we can see a lot of such articles, in the multi-table query, with a small table or cross-table to do the base table, put in the back, the big table is placed in the back of the position, because the table is the order of access from right to left.

But is that really the case?

We can do the experiment to verify (11g):

CREATE TABLE Tab_big as SELECT * from Dba_objects where rownum<=30000;
CREATE TABLE Tab_small as SELECT * from Dba_objects where rownum<=10;
Set Autotrace traceonly
Set Linesize 1000
Set timing on
Select COUNT (*) from Tab_big,tab_small;
Select COUNT (*) from Tab_small,tab_big;

As we look at the execution plan, we can see that the efficiency of the two statements is the same, is it not related to the order and efficiency of multiple table queries, tables?

We are executing the following statement:

Select/*+rule*/Count (*) from Tab_big,tab_small;
Select/*+rule*/Count (*) from Tab_small,tab_big;

We can clearly find that the small table in the right, big table in the left statement, query efficiency is much higher.

In fact, in the rule-based era, the query efficiency is related to the connection order of the table, small table or cross-table in the left, large table on the right of the execution efficiency will be higher. But now it's basically a cost-based era, so the order and efficiency of the size tables are irrelevant, and the Oracle optimizer automatically optimizes for efficiency.

Iv. Join condition Order in the WHERE clause

In the rule-based era, Oracle uses a bottom-up order to parse the WHERE clause, and according to this principle, we typically place the table with the lowest possible return rows in the last face, and the clause in the WHERE clause with the filter condition placed on the last side.

But in today's cost-based era, this optimization has been optimized with the help of the Oracle optimizer, so the order and condition of the tables have no effect on our query efficiency.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More