Objective
A few days ago, while importing some initialization data for the car business, a serious data error occurred. The investigation showed that a SQL statement had been written incorrectly, causing the entire table to be updated. Fortunately the volume of data was not very large; the restore took about two hours and the problem was solved. But the incident made me think for a long time: after all these years, I had never made such a mistake. Below are some approaches I would like to introduce to prevent this from happening again.
Body
First, I think the persistence-layer framework lacks certain checks on DML operations, which is a problem for data safety. For example:
update tb_01 set a = xx, b = xxx where 1 = 1;
update tb_01 set a = xx, b = xxx;
update tb_01 set a = xx, b = xxx where a and b;
SQL like this goes through without any validation. https://my.oschina.net/mingdongcheng/blog/52440 is a post I came across on OSChina that explains how to write a MyBatis interceptor to prevent batch updates. We currently use the Rose framework, and after reading Rose's source code, a corresponding check interceptor can also be added in Rose's SystemInterpreter class. The validation interceptor includes the following checks:
(1) Check whether the WHERE keyword exists in the UPDATE statement; if it does not, do not execute the update and throw an exception.
(2) If the WHERE keyword does exist in the UPDATE statement, check whether the clause contains comparison operators such as =, <> or !=; if so, continue. If not, check whether it contains keywords such as LIKE, IN or EXISTS (NOT IN and NOT EXISTS, of course, do not count); if one is present, continue, otherwise do not execute the update and throw an exception.
This prevents the problem of accidental bulk updates.
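To make this concrete, here is a minimal sketch of such a validation in Java. It is not Rose's actual interceptor API; the class and method names are illustrative, and the patterns implement only rules (1) and (2) above. Note that a constant condition such as where 1 = 1 still contains an = and would pass rule (2), so catching the first bad example at the top of this section needs an extra rule.

import java.sql.SQLException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Minimal sketch of the two rules above. This is not Rose's real
// interceptor API; class and method names are illustrative only.
public class UnsafeUpdateChecker {

    // Rule (1): an UPDATE must contain a WHERE keyword.
    private static final Pattern WHERE =
            Pattern.compile("\\bwhere\\b", Pattern.CASE_INSENSITIVE);

    // Rule (2): the WHERE clause must contain a restrictive operator
    // or keyword. Note: a constant condition such as "1 = 1" still
    // contains "=", so rule (2) alone does not catch it.
    private static final Pattern RESTRICTIVE = Pattern.compile(
            "=|<>|!=|\\blike\\b|\\bin\\b|\\bexists\\b",
            Pattern.CASE_INSENSITIVE);

    public static void check(String sql) throws SQLException {
        String s = sql.trim();
        if (!s.regionMatches(true, 0, "update", 0, 6)) {
            return; // only UPDATE statements are checked here
        }
        Matcher where = WHERE.matcher(s);
        if (!where.find()) {
            throw new SQLException("Rejected UPDATE without WHERE: " + sql);
        }
        if (!RESTRICTIVE.matcher(s.substring(where.end())).find()) {
            throw new SQLException("Rejected UPDATE with no restrictive condition: " + sql);
        }
    }
}

An interceptor would simply call UnsafeUpdateChecker.check(sql) before executing any statement.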
Second, I used to think that an UPDATE statement did not really need to check its return value, but after actually experiencing this incident, I now think checking it is good coding practice: if the number of rows updated is inconsistent with the number you expected, it proves that either your logic or your SQL is wrong. This can effectively prevent this kind of batch-update problem:
// executeUpdate returns the number of rows actually affected
int count = statement.executeUpdate("update tb_02 set a = 'xxx' where b = 'xx'");
if (count != 1) {
    throw new SQLException("Expected to update exactly 1 row, but " + count + " rows were affected, please check");
}
The above is pseudo-code, but the meaning is obvious: we expect the SQL to update exactly one record, and any other row count means something is wrong. Of course, there is one more option: following MyBatis's example, mark DML operations with a primary-key annotation, which also prevents the data problems produced by batch updates.
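As a sketch of that idea, the hypothetical annotation below (not a real MyBatis or Rose API) marks DAO methods whose WHERE clause must reference the primary key; a check interceptor like the one above could then reject any statement generated from such a method that does not filter on that column.

import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Hypothetical annotation, not a real MyBatis or Rose API: it marks
// DAO methods whose generated WHERE clause must reference the primary
// key, so a check interceptor can reject any statement that does not.
@Target(ElementType.METHOD)
@Retention(RetentionPolicy.RUNTIME)
public @interface RequirePrimaryKey {
    // Name of the primary-key column expected in the WHERE clause.
    String value() default "id";
}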
Of course, sometimes the data we update is not just one row; there may be many rows, and the number may be unpredictable. In that case we can handle it with one of the following two approaches:
1) Use a trigger:
Set up a trigger for the corresponding DML operation in the database, and each time a DML operation runs, compare the number of rows that should be affected with the number of rows actually affected.
Cons: batch processing with SQL scripts may be affected, and DML performance may be impacted, but the overhead is not particularly large and is essentially negligible.
2) Before the DML operation, query the total number of rows that should be affected; then perform the DML operation and compare its return value with that total (see the sketch after this list).
Cons: performance may be affected, but if the data is particularly important, I think the performance cost here is basically negligible.
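Here is a minimal JDBC sketch of the second approach, assuming the tb_02 table with columns a and b from the earlier examples; the connection URL and credentials are placeholders:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class GuardedUpdate {
    public static void main(String[] args) throws SQLException {
        // URL and credentials are placeholders.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:mysql://localhost:3306/test", "user", "pass")) {
            conn.setAutoCommit(false);

            // Step 1: count the rows the update should touch.
            long expected;
            try (PreparedStatement ps =
                    conn.prepareStatement("select count(*) from tb_02 where b = ?")) {
                ps.setString(1, "xx");
                try (ResultSet rs = ps.executeQuery()) {
                    rs.next();
                    expected = rs.getLong(1);
                }
            }

            // Step 2: run the update with the same WHERE condition.
            int affected;
            try (PreparedStatement ps =
                    conn.prepareStatement("update tb_02 set a = ? where b = ?")) {
                ps.setString(1, "xxx");
                ps.setString(2, "xx");
                affected = ps.executeUpdate();
            }

            // Step 3: compare and roll back on a mismatch. Concurrent
            // writes between steps 1 and 2 can still change the count;
            // for strict correctness, lock the rows (select ... for
            // update) or use a suitable isolation level.
            if (affected != expected) {
                conn.rollback();
                throw new SQLException(
                        "Expected " + expected + " rows, affected " + affected);
            }
            conn.commit();
        }
    }
}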
Third, this is really a matter of personal habit: every time you finish writing a SQL statement, you need to review it and test it in the development environment. This is a necessary step, but sometimes, when time is tight, you may skip it because you feel sure there is no problem, and once the code is live that is deadly.
Fourth, for the more important data we can keep audit logs, which is also necessary: it leaves a trace of the data flow, which is a great advantage for data recovery. Of course, the audit logging can be handled with an asynchronous concurrency framework or a message queue.
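As a minimal sketch of asynchronous audit logging (the class name and log format are illustrative; a real system would write to an audit table, a file, or a message queue rather than standard output):

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Illustrative only: audit entries are written off the caller's thread.
public class AuditLogger {
    private final ExecutorService worker = Executors.newSingleThreadExecutor();

    public void record(String user, String sql, int rowsAffected) {
        long timestamp = System.currentTimeMillis();
        worker.submit(() -> {
            // A real system would write to an audit table, a file,
            // or publish to a message queue instead of stdout.
            System.out.printf("%d user=%s rows=%d sql=%s%n",
                    timestamp, user, rowsAffected, sql);
        });
    }

    public void shutdown() {
        worker.shutdown();
    }
}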
Fifth, of course, backing the data up before going live is absolutely necessary: take a database backup before the release, and if a serious problem is found during or after the rollout, you can fall back to the backup and reduce the loss.
Disadvantage: the backup may lag behind the live data, so if you fall back to it, some data may be lost.
Summary
The solutions mentioned above guard against data problems caused by batch updates in DML operations, from improvements to the persistence-layer framework, to programmers forming some basic good habits, to audit logs and data backups. DML here mainly means DELETE and UPDATE operations.
In fact, for those of us in the Internet business, everything revolves around data: data is the root and the source of everything, so the security and integrity of data are particularly important.