SQL developers on every database platform struggle with the same issues, and we keep making the same mistakes because the database field is still relatively immature. Yes, every database vendor is making progress, but as developers we still have to fight the same battles: whether you write SQL on SQL Server, Oracle, DB2, Sybase, MySQL, or any other relational database platform, concurrency, resource management, space management, and raw query speed are constant headaches.
Unfortunately, there is no panacea or single best practice for these problems. Developers generally have their favorite SQL writing habits and are reluctant to study other feasible approaches, sometimes simply for lack of training. What I see most often is a SQL query that runs well in the test environment but has never been tried against the production system, where it falls apart. Problems discovered later get patched reactively, and the end users suffer for it.
I don't want developers to become DBAs, but we do have to consider production issues when writing code. If we don't do this at the early stage of development, the DBAs will just force us to rework it once they discover the problem.
We usually say that database tuning is both a science and an art, because few ready-made rules apply to every situation: the fix that solves a problem on one system may not matter on another, and vice versa. There is no single correct answer in query tuning, but that does not mean you should give up.
Some principles can be followed to make your work easier. This article lists seven of them; they can be applied flexibly to speed up your SQL queries, and you can always consult a DBA for more details on these techniques.
1. Replace UPDATE with CASE
To update a record, we immediately reach for UPDATE. This is very common, and many developers overlook this principle precisely because using UPDATE looks so natural and logical.
Suppose you extract records from the customer table and want to mark customers with more than $100,000 in orders as "Preferred". The obvious approach is an UPDATE statement that sets the CustomerRank column to "Preferred". The problem is that UPDATE is logged, which means each record is written twice. The solution is to embed a CASE statement in the SQL query itself: each row is checked against the order-amount condition before the "Preferred" flag is written to the table, and the flag is set only where the condition is met. The performance improvement can be striking.
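Here is a minimal T-SQL sketch of the idea; the table and column names (Customers, OrderTotal, CustomerRank) are hypothetical, not from the original article:

    -- Two writes per row: the initial INSERT, then the logged UPDATE
    SELECT CustomerID, OrderTotal, 'Regular' AS CustomerRank
    INTO #CustomerReport
    FROM Customers;

    UPDATE #CustomerReport
    SET CustomerRank = 'Preferred'
    WHERE OrderTotal > 100000;

    -- One write per row: the rank is decided inline with CASE
    SELECT CustomerID, OrderTotal,
           CASE WHEN OrderTotal > 100000
                THEN 'Preferred' ELSE 'Regular'
           END AS CustomerRank
    INTO #CustomerReport2
    FROM Customers;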
2. Do not blindly reuse code
This problem is also very common. It is tempting to reuse code someone else has written, because you know it retrieves the data you need. The problem is that it often retrieves data you do not need as well, and we rarely bother to trim it down, so the returned data set is a superset of what's required. An extra outer join or an additional WHERE clause is often all it takes to fix this. So check reused code and make adaptive modifications where necessary.
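For instance (with a hypothetical Customers/Orders schema), a reused query might return a superset that a small adaptation trims down:

    -- Reused as-is: returns every customer's orders
    SELECT c.CustomerID, c.State, o.OrderID, o.OrderTotal
    FROM Customers c
    JOIN Orders o ON o.CustomerID = c.CustomerID;

    -- Adapted: a WHERE clause restricts it to the rows this report needs
    SELECT c.CustomerID, o.OrderTotal
    FROM Customers c
    JOIN Orders o ON o.CustomerID = c.CustomerID
    WHERE c.State = 'CA';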
3. Extract only the columns you need
This problem is similar to tip 2, but here it is specific columns rather than whole queries. It feels comfortable to use SELECT * because typing out every column name is tedious. Many people think this way, but the thinking is wrong: it retrieves columns you do not need, and I have seen this mistake countless times. One developer used SELECT * against a table with 120 columns and millions of rows when the code actually used only three or five of those columns, which is an enormous waste of resources. Make it a habit to refuse SELECT * and name only the columns you want. The extra columns in the result are useless to you; they do not change the functionality, but they have a real impact on database performance.
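The fix is as simple as it sounds (column names here are illustrative):

    -- Pulls all 120 columns across the wire
    SELECT * FROM Customers;

    -- Pulls only the three columns the code actually reads
    SELECT CustomerID, FirstName, State FROM Customers;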
4. Try to query a large table only once
This is another mistake I have seen many people make. For example, a stored procedure retrieves data from a large table with millions of records, and the developer wants the customers who live in California and earn more than $40,000. So the code first puts the California customers into a temporary table, then queries the large table again for the customers earning more than $40,000 and puts those results into a second temporary table, and finally joins the two temporary tables to produce the result.
You may think I'm kidding, but the fact is that some people really do this. Work that should be a single query ends up scanning the large table twice.
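A sketch of both versions, with hypothetical names:

    -- Anti-pattern: two scans of the large table plus a join
    SELECT CustomerID INTO #CA      FROM Customers WHERE State = 'CA';
    SELECT CustomerID INTO #HighInc FROM Customers WHERE Income > 40000;
    SELECT a.CustomerID
    FROM #CA a
    JOIN #HighInc b ON b.CustomerID = a.CustomerID;

    -- One query, one scan
    SELECT CustomerID
    FROM Customers
    WHERE State = 'CA' AND Income > 40000;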
A slightly different situation arises when several steps in a process each need a subset of the same large table, so each step ends up querying the large table again. The way to avoid these repeated queries is to persist the subset from the first query, then point the subsequent steps at that persisted subset.
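A sketch of that pattern, with illustrative temp-table names: hit the large table once, then let every later step read the small persisted copy.

    -- One query against the large table, persisted
    SELECT CustomerID, Income
    INTO #CACustomers
    FROM Customers
    WHERE State = 'CA';

    -- Later steps reuse the subset instead of re-querying Customers
    SELECT COUNT(*)    FROM #CACustomers WHERE Income > 40000;
    SELECT AVG(Income) FROM #CACustomers;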
5. Use a temporary table
This problem may be a little harder to pin down, but the payoff is obvious. Temporary tables can be used in many situations, and they effectively reduce the work done against large tables. If you must join a table to a large table and also filter on the large table, you can pull just the data you need from the large table into a temporary table and join against that instead; the query speed improves noticeably. A temporary table also helps when several queries in your stored procedure all need to join to the same table.
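A sketch with a hypothetical Customers/Orders schema: pull only the qualifying rows and needed columns of the large table into a temp table, then join against that.

    -- Reduce the large table first
    SELECT CustomerID, Region
    INTO #ActiveCustomers
    FROM Customers              -- the large table
    WHERE IsActive = 1;

    -- Subsequent joins touch the small temp table instead
    SELECT o.OrderID, a.Region
    FROM Orders o
    JOIN #ActiveCustomers a ON a.CustomerID = o.CustomerID;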
6. Pre-store data
This is my favorite, because it is a very old technique that is often overlooked. If you have a report or stored procedure that joins to a large table, consider extracting the data from the big table in advance: if that data is persisted in a separate table, the report can run against the pre-stored data set, improving overall execution efficiency.
You won't always have the opportunity to use this technique, but when you can, you will find it a very effective way to save server resources.
Unfortunately, many developers avoid this technique and instead just create a view to solve the problem. The trouble with that approach is that every report that needs the data runs the underlying query again: if the same report ran ten minutes ago and someone runs it again now, the join against the large table could have been avoided entirely. For frequently queried data, we recommend pre-storing the result set; it can save a great deal of server resources.
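One way to implement this, sketched with hypothetical names: a scheduled job (for example, via SQL Server Agent) refreshes a summary table, and reports read the summary instead of re-joining the large table.

    -- Scheduled refresh of the pre-stored data
    TRUNCATE TABLE dbo.CustomerOrderSummary;

    INSERT INTO dbo.CustomerOrderSummary (CustomerID, OrderCount, TotalSpent)
    SELECT CustomerID, COUNT(*), SUM(OrderTotal)
    FROM Orders                 -- the large table, scanned once per refresh
    GROUP BY CustomerID;

    -- Reports hit the small summary table, not Orders
    SELECT CustomerID, TotalSpent
    FROM dbo.CustomerOrderSummary
    WHERE TotalSpent > 100000;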
7. Delete and update in batches
This is another easy-to-overlook technique. Deleting or updating data in a large table can be a nightmare. The problem is that either operation runs as a single transaction, so if you need to kill it, or if the system hits a problem partway through, the entire transaction must be rolled back, which can take a very long time. That is why trying to kill the process midway through deleting several hundred thousand records accomplishes almost nothing: you just end up waiting through a rollback nearly as long as the operation itself, while the operation blocks other transactions the whole time. Use these operations with caution.
The solution is to delete or update in small batches. First, whatever the reason for aborting, only a few rows need to be rolled back. Second, committing the data to disk in small batches keeps the I/O demand lower. Third, concurrency improves greatly, because other transactions can slip in between batches.
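A T-SQL sketch of batched deletion; the table name and batch size are illustrative, and other platforms have their own equivalents (for example, ROWNUM on Oracle or LIMIT on MySQL):

    -- Each batch is its own short transaction, so an abort only
    -- rolls back the current batch, not the whole operation
    DECLARE @batch INT = 10000;

    WHILE 1 = 1
    BEGIN
        DELETE TOP (@batch)
        FROM dbo.AuditLog
        WHERE LogDate < '20200101';

        IF @@ROWCOUNT = 0 BREAK;    -- nothing left to delete
    END;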
In addition, try to schedule deletes and updates for off-peak hours.
Summary
Follow these methods and you will usually see results, but in practice you should evaluate your situation and choose the one or two approaches that fit it best. Remember that no single technique is a cure-all. These tips do apply across database platforms, though, so master them all!