People tend to get into a misunderstanding when they use SQL, it is too focused on whether the resulting results are correct, ignoring the possible performance differences between different implementations, which are particularly evident in large or complex database environments, such as online transaction processing OLTP or DSS for decision support systems. In the work practice, the author finds that bad SQL often comes from improper index design, inadequate connection condition and the WHERE clause which is not optimized. After the proper optimization of them, the speed of their operation has been significantly improved! I will summarize these three aspects separately:
To give a more intuitive explanation of the problem, the SQL run time in all instances is tested, and no more than 1 seconds is represented (< 1 seconds).
Test environment
Host: HP LH II
Frequency: 330MHZ
Memory: 128 MB
Operating system: Operserver5.0.4
Database: Sybase11.0.3
First, unreasonable index design
Example: The table record has 620000 rows, and the following SQL runs under different indexes:
1. A non-clustered index was built on date
select count(*) from record where date >
'19991201' and date < '19991214'and amount >
2000 (25秒)
select date,sum(amount) from record group by date
(55秒)
select count(*) from record where date >
'19990901' and place in ('BJ','SH') (27秒)
Analysis:
Date has a large number of duplicate values, under a non-clustered index, the data is physically stored on the data page, and a table scan must be performed to find all the rows in the range when the range is searched.
2. A clustered index on date
select count(*) from record where date >
'19991201' and date < '19991214' and amount >
2000 (14秒)
select date,sum(amount) from record group by date
(28秒)
select count(*) from record where date >
'19990901' and place in ('BJ','SH')(14秒)
Analysis:
Under the cluster index, the data is physically sequentially on the data page, and the duplicate values are also grouped together, so in the range lookup, you can find the starting point of the range, and only in this range scan the data page, avoid a wide range of scans, improve the query speed.