SQL Optimization: count, table join order, condition order, in and exists
1. About count
I have read some articles comparing count (*) and count (column). Is count (column) more efficient than count (*)?
In fact, I think count (*) and count (column) are not comparable at all: count (*) counts the total number of rows in the table, while count (column) counts only the rows in which that column is not null.
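The difference in semantics is easy to see on a small table. The following is a minimal sketch that uses Python's built-in sqlite3 module instead of Oracle (an assumption made only so the example runs anywhere; the table t and its rows are hypothetical). The null handling of count is standard SQL, so the same difference should show up in Oracle.

```python
import sqlite3


def count_star_vs_column():
    """Return (count(*), count(name)) for a table with some null names."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical table: four rows, two of which have a null name.
    conn.execute("create table t (id integer, name text)")
    conn.executemany(
        "insert into t values (?, ?)",
        [(1, "a"), (2, None), (3, "c"), (4, None)],
    )
    # count(*) counts every row; count(name) skips rows where name is null.
    star = conn.execute("select count(*) from t").fetchone()[0]
    col = conn.execute("select count(name) from t").fetchone()[0]
    conn.close()
    return star, col


if __name__ == "__main__":
    star, col = count_star_vs_column()
    print("count(*):", star)
    print("count(name):", col)
```

The two numbers differ whenever the column contains nulls, so the two statements answer different questions.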
However, we can compare them through experiments:
First, create a test table:
Drop table test purge;
Create table test as select * from dba_objects;
Update test set object_id = rownum;
Set timing on
Set linesize 1000
Set autotrace on
Then run:
Select count (*) from test;
Select count (object_id) from test;
The time consumed is the same. Is their efficiency the same?
Let's try to create an index on the column object_id.
Create index idx_object_id on test (object_id );
Then execute
Select count (*) from test;
Select count (object_id) from test;
Now count (object_id) is significantly faster than count (*). Is that because count (object_id) can use the index, so its efficiency has improved a lot?
Next, change object_id to not null:
Alter table test modify object_id not null;
Then execute
Select count (*) from test;
Select count (object_id) from test;
Both statements are now equally fast, and count (*) can use the index as well. A B-tree index does not store rows whose indexed columns are all null, so Oracle can only answer count (*) from the index once object_id is declared not null.
In fact, an efficiency comparison only makes sense when the two statements are equivalent. These two statements are not equivalent, so they are not comparable.
For the Oracle optimizer, experiments also show that the time of count (column) varies depending on which column is counted. The general trend is that the further back a column sits in the table, the higher the access overhead, because the column offset determines the access cost. The overhead of count (*) does not depend on the offset, so in some cases count (*) is the fastest.
2. About in and exists
Most articles about in and exists claim that exists is more efficient than in, and that exists should therefore always replace in. But is that true?
We can make an experiment:
In 10g:
Select * from dept where deptno not in (select deptno from emp);
Select * from dept where not exists (select deptno from emp where emp.deptno = dept.deptno);
We find that exists is indeed more efficient than in here. The claim seems to be true.
But we will execute the following statement again.
Select * from dept where deptno not in (select deptno from emp where deptno is not null) and deptno is not null;
You will find that, with the non-null conditions added, in and exists are equally efficient.
View the execution plans of the three statements. The in statement with the non-null conditions and the exists statement both use the ANTI join algorithm, so their efficiency is the same. The in statement without the non-null conditions uses FILTER instead of the ANTI algorithm, so it is slower.
Therefore, we can draw a conclusion: in Oracle 10g, if the columns are guaranteed to be not null, the in statement can use the ANTI join algorithm, and its efficiency is then the same as that of exists.
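The non-null guarantee matters because not in and not exists are not equivalent when the subquery can return null. The following is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for Oracle (an assumption for portability; the rows are hypothetical, modeled on the dept and emp tables used here). It shows the three query forms from this section returning different results once emp contains a null deptno.

```python
import sqlite3


def not_in_vs_not_exists():
    """Return the deptno lists produced by the three query forms."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical data: department 40 has no employees,
    # and one employee has a null deptno.
    conn.executescript(
        """
        create table dept (deptno integer);
        create table emp (empno integer, deptno integer);
        insert into dept values (10), (20), (30), (40);
        insert into emp values (1, 10), (2, 20), (3, 30), (4, null);
        """
    )
    queries = (
        # Plain not in: a null in the subquery makes the test unknown.
        "select deptno from dept where deptno not in "
        "(select deptno from emp)",
        # not exists: the null row simply never matches.
        "select deptno from dept where not exists "
        "(select 1 from emp where emp.deptno = dept.deptno)",
        # not in with nulls excluded on both sides.
        "select deptno from dept where deptno not in "
        "(select deptno from emp where deptno is not null) "
        "and deptno is not null",
    )
    results = tuple(
        [row[0] for row in conn.execute(sql)] for sql in queries
    )
    conn.close()
    return results


if __name__ == "__main__":
    plain_in, exists, guarded_in = not_in_vs_not_exists()
    print("not in:            ", plain_in)
    print("not exists:        ", exists)
    print("not in, non-null:  ", guarded_in)
```

Only the form with nulls excluded matches not exists, which is consistent with the optimizer treating the two as interchangeable only when nulls are ruled out.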
In 11g:
Select * from dept where deptno not in (select deptno from emp );
Select * from dept where not exists (select deptno from emp where emp.deptno = dept.deptno);
We find that the two statements are equally efficient and that their execution plans are the same. Oracle improved this optimization in 11g.
From this we can conclude that in 11g, in and exists are equally efficient, because both use the relatively efficient ANTI algorithm.
3. Join order of large and small tables
On the Internet we can see many articles saying that, in a multi-table query, the small table or intersection table should be the driving table and be placed last in the from clause, with the large tables placed first, because tables are accessed from right to left.
But is that true?
We can perform an experiment to verify (11g ):
Create table tab_big as select * from dba_objects where rownum <= 30000;
Create table tab_small as select * from dba_objects where rownum <= 10;
Set autotrace traceonly
Set linesize 1000
Set timing on
Select count (*) from tab_big, tab_small;
Select count (*) from tab_small, tab_big;
Checking the execution plans, we find that the two statements are equally efficient. Does table order really have no effect on the efficiency of multi-table queries?
Now execute the following statements:
Select /*+ rule */ count (*) from tab_big, tab_small;
Select /*+ rule */ count (*) from tab_small, tab_big;
We can clearly see that the query is much more efficient with the small table on the right and the large table on the left.
In fact, in the rule-based era, query efficiency did depend on the order of the tables: placing the small table or intersection table on the right and the large table on the left was more efficient. But now we are basically in the cost-based era, so the order of large and small tables has no effect on efficiency. The Oracle optimizer chooses the join order automatically.
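Whatever plan the optimizer picks, the order of tables in the from clause never changes the result, only (under a rule-based optimizer) the cost. The following is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for Oracle; the row counts are hypothetical and smaller than in the experiment here, and SQLite's planner is not Oracle's, so the printed plans only illustrate how two orderings can be compared side by side.

```python
import sqlite3


def compare_from_orders():
    """Run the same cross join with both table orders.

    Returns a dict mapping each statement to (row count, plan lines).
    """
    conn = sqlite3.connect(":memory:")
    # Hypothetical stand-ins for tab_big and tab_small.
    conn.execute("create table tab_big (id integer)")
    conn.execute("create table tab_small (id integer)")
    conn.executemany(
        "insert into tab_big values (?)", [(i,) for i in range(3000)]
    )
    conn.executemany(
        "insert into tab_small values (?)", [(i,) for i in range(10)]
    )
    results = {}
    for sql in (
        "select count(*) from tab_big, tab_small",
        "select count(*) from tab_small, tab_big",
    ):
        count = conn.execute(sql).fetchone()[0]
        # The last field of each explain row is the readable plan step.
        plan = [row[-1] for row in conn.execute("explain query plan " + sql)]
        results[sql] = (count, plan)
    conn.close()
    return results


if __name__ == "__main__":
    for sql, (count, plan) in compare_from_orders().items():
        print(sql)
        print("  rows:", count)
        for step in plan:
            print("  plan:", step)
```

In Oracle the equivalent comparison is the one made here with autotrace: run both orderings and compare the plans and timings.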
4. Order of join conditions in the where clause
In the rule-based era, Oracle parsed the where clause from the bottom up. Based on this principle, we usually put the table returning the fewest rows last, and put the filter conditions at the end of the where clause.
However, in the current cost-based era, the Oracle optimizer handles this itself. Therefore, neither the order of the tables nor the order of the conditions affects query efficiency.
Multi-table join optimization in SQL Server: is it better to put large tables first or last? Is a query faster when the where condition refers to the first table?
Unlike Oracle in its rule-based days, SQL Server analyzes the query itself and determines the order, so you do not need to worry about the order in the where clause.
However, some where condition optimizations still need to be considered. For example, = or exists is said to be more efficient than in. You can look these up online.
SQL Server has a feature called "Display Estimated Execution Plan" (look carefully and you will find it to the right of the Parse button).
It helps you analyze the execution plan and provides related optimization suggestions.
You can then optimize the query based on the analysis results, for example by creating indexes.
What is the order of tables in the from clause of a SQL join query?
In the from clause, tables are written from left to right, for example:
Select W.* from wc W left join cw C on W.number = C.number;