SQL is a convenient yet deceptive language, mainly because of its set-oriented semantics. An UPDATE statement looks exactly the same whether it touches one row or 100 million rows, yet the actual execution behavior, and whether a result ever comes back at all, differs enormously.

My philosophy as a DBA is this: developers should treat the database and its data with respect. Before issuing a statement, consider at least two questions: how large is the target table in total (once in production)? And how much data does this particular operation touch? Different answers call for different solutions.

Updating data in a big table is a common scenario in development and O&M work, especially in data migration. Here the answer to the two questions above is: the target table is large, and so is the update scope. In theory a single SQL statement can do the job; in practice that approach runs into many problems.

This article introduces several common strategies for processing big tables and analyzes their advantages and disadvantages. For developers and DBAs the selection criteria are flexible, depending on the type of operation (one-off O&M task or routine system job), the runtime environment (whether the hardware supports parallel execution), and the programming environment (whether you can have exclusive use of the resources).
First, we need to prepare a large table.
1. Prepare the environment
We chose Oracle 11.2 for testing.
SQL> select * from v$version;
BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - Production
PL/SQL Release 11.2.0.1.0 - Production
CORE    11.2.0.1.0      Production
TNS for Linux: Version 11.2.0.1.0 - Production
NLSRTL Version 11.2.0.1.0 - Production
Prepare a large table.
SQL> create table t as select * from dba_objects;
Table created
SQL> insert into t select * from t;
72797 rows inserted
SQL> insert into t select * from t;
145594 rows inserted
(Intermediate inserts omitted to save space ...)
SQL> commit;
Commit complete
SQL> select bytes/1024/1024/1024 from dba_segments where owner = 'SYS' and segment_name = 'T';
BYTES/1024/1024/1024
--------------------
1.0673828125
SQL> select count(*) from t;
COUNT(*)
----------
9318016
Executed in 14.711 seconds
Table T serves as the data source: over nine million records, occupying roughly 1 GB. My test environment is a single virtual CPU on a virtual machine, so the results for the parallel-execution schemes later on are indicative rather than representative.
Next, let's look at the simplest method: Direct update.
2. Method 1: Update directly
The simplest, and most tempting, method is to update the table directly, and it is not only novices who reach for it: many experienced programmers and DBAs choose this strategy too. In truth, even when the result does come out, a good deal of luck is involved.

Let's look at the experiment first and discuss the reasons afterwards. Start by creating the test table t_target.
SQL> create table t_target tablespace users as select * from t;
Table created
SQL> update t_target set owner = to_char(length(owner));
(A long wait ...)
While waiting, I observed the following:

- The database server ran very slowly; many connection operations stalled, and for a while it was not even possible to log on to the database;
- Background session waits were concentrated on data reads, log buffer space, space allocation, and similar events;
- After a long wait, anomalies appeared at the operating-system level: the undo tablespace kept expanding;
- Logs switched frequently.
In addition, anyone who chooses this strategy may run into further symptoms: foreground errors thrown, client connections dropped, and so on.
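To see where the time is going while such an update runs, the wait events can be inspected from a separate session. A minimal sketch using the standard v$session_wait view (the column selection is illustrative):

```sql
-- Run from another session while the big update is executing.
-- Shows which non-idle wait events the sessions are currently stuck on.
select sid, event, wait_time, seconds_in_wait
  from v$session_wait
 where wait_class <> 'Idle'
 order by seconds_in_wait desc;
```

With the direct-update strategy you would expect to see the read, log-space, and space-allocation waits described above dominating this output.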
I have been caught in exactly this scenario, and it is an agonizing one. First, the long wait (possibly a whole night) may in the end produce nothing. Worse, you dare not cancel the operation lightly, because Oracle must then roll back everything the update has done. After an hour, I gave up.
update t_target set owner = to_char(length(owner))
ORA-01013: user request to cancel current action
(Cancelled after nearly an hour, still unfinished)
What followed was a rollback wait of roughly the same length. As a rule, the longer a transaction has run, the longer it takes to roll back. During this period you can query the internal table x$ktuxe to observe the rollback and estimate its speed; other than that, all we can do is wait.
SQL> select ktuxesiz from x$ktuxe where ktuxesta <> 'INACTIVE';
KTUXESIZ
----------
62877
(... some time later ...)
SQL> select ktuxesiz from x$ktuxe where ktuxesta <> 'INACTIVE';
KTUXESIZ
----------
511
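Since KTUXESIZ is the number of undo blocks the transaction still holds, two samples taken a known interval apart give a rough rollback rate and a rough time-to-completion. A sketch of the arithmetic (the second sample and the derived figures are illustrative, not measured values from this test):

```sql
-- Sample the remaining undo blocks twice, e.g. 60 seconds apart:
select ktuxesiz from x$ktuxe where ktuxesta <> 'INACTIVE';
-- first sample:        62877 blocks
-- 60 seconds later:    60000 blocks (illustrative)
--
-- rate = (62877 - 60000) / 60  ~ 48 blocks per second
-- ETA  = 60000 / 48            ~ 1250 seconds, i.e. about 20 minutes
```

The estimate is crude (the rollback rate is rarely constant), but it at least turns the helpless wait into a bounded one.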
The outcome of this strategy is usually: complaints from colleagues (whose jobs are affected), anxiety (you have no idea how far the statement has progressed), and resource exhaustion (CPU, memory, or I/O saturated). In a formal production environment, you also bear the responsibility for the impact on the business.

Let's analyze the problems with this strategy in detail.
First, to be fair, the advantages of this method are simplicity and, in one narrow sense, efficiency. Compared with the other methods described in this article it requires the least code, and it hands the whole task to the database SQL engine at once, driving one dimension of the system (CPU, I/O, or memory) to its maximum.

If the data table is small and the operation's impact is limited, this method is suitable and we can consider using it.
On the other hand, we need to look at the cost side of an Oracle update: Undo, Redo, and the process workload. Oracle users know that Undo and Redo are essential to DML. When we update or delete data, the "before image" of each modified block is saved into the undo tablespace. Note that the undo tablespace is a special tablespace that must be stored on disk. Undo exists mainly to support "consistent read" in other sessions: as long as the transaction has neither committed nor rolled back, its undo data is retained in the database and cannot be overwritten.
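How much undo the running transaction is holding can be watched from another session via the standard v$transaction and v$session views. A minimal sketch (the 8 KB block size is an assumption; substitute your actual db_block_size):

```sql
-- From another session: undo held by each active transaction.
-- used_ublk is in undo blocks; multiply by the block size for bytes.
select s.sid, s.username, t.used_ublk,
       t.used_ublk * 8192 / 1024 / 1024 as undo_mb   -- assumes 8 KB blocks
  from v$transaction t
  join v$session s on s.taddr = t.addr
 where s.username is not null;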
Redo records the "after image" of DML operations. The amount of redo generated is proportional to the volume of data we modify, and unless we resort to the nologging option, the total volume of records to be modified is fixed. If the individual redo log members are small, Oracle generates redo faster than the logs can absorb it, the redo groups switch frequently, and the system faces large numbers of log-switch and log-buffer-space wait events.
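The redo volume a session has generated can be read from the standard v$mystat and v$statname views, e.g. before and after an update, to gauge the cost. A minimal sketch:

```sql
-- Redo generated by the current session so far.
select sn.name, ms.value as redo_bytes
  from v$mystat ms
  join v$statname sn on sn.statistic# = ms.statistic#
 where sn.name = 'redo size';
```

Running this immediately before and after a test update on a sample of rows gives a per-row redo figure you can extrapolate to the full table.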
If we choose this first method, the undo tablespace becomes a major bottleneck. A huge volume of before-image data piles up in it and cannot be released, continually forcing the undo files to grow. If the undo file cannot grow (autoextend=no), the DML operation will at some point fail with an error. Even if growth is allowed, DBWR has to write a large number of blocks to the data files. This means that during a big update you will see many DBWR writes among the wait events, and a large share of those writes go to the undo tablespace rather than to your own data table.
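Before attempting a large update it is therefore worth checking whether the undo datafiles can grow, and by how much. A sketch against the standard dba_data_files and v$parameter views:

```sql
-- Current size and growth ceiling of the active undo tablespace's files.
select file_name, autoextensible,
       bytes    / 1024 / 1024 as size_mb,
       maxbytes / 1024 / 1024 as max_mb
  from dba_data_files
 where tablespace_name = (select value from v$parameter
                           where name = 'undo_tablespace');
```

If autoextensible is NO, or max_mb leaves little headroom, the direct update is likely to fail partway through with an undo allocation error.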
At the same time, a long-running operation can push Oracle and the OS toward their load limits, and then all sorts of strange things can happen: processes die, connections drop.

The biggest problem with this method, however, is rollback. If something goes wrong partway through a long transaction, usually a data exception, the entire change has to be rolled back. Rollback is Oracle's mechanism for protecting itself and preserving transaction integrity. Once a long-running DML update is interrupted, Oracle enters its rollback phase and keeps going until it finishes; the system runs slowly the whole time, and even restarting the server does not skip it, since the rollback simply resumes after recovery.

Therefore, this method must be used with great caution on big tables. At the very least, evaluate the risk first.