The opening remarks will be brief. Another year has passed, and I have not caught up with my cnblogs friends for a long time. I also saw a few replies under my MSSQL posts saying readers were looking forward to my next article. Alas, sorry for the long silence; I have simply been too busy, and what spare time I had went into study rather than into organizing material for you. Some articles will appear soon, though, because I am preparing training for my company. As they say, what is distilled is the essence, so please look forward to it.
In fact, writing a summary article is itself a way of revisiting your own knowledge: you discover gaps while summarizing, and along the way you can pass some comments and ideas on to readers. I remember an architect's blog post making exactly this point about the importance of writing articles.
That is enough opening remarks; thank you for your patience. I have long wanted to write a separate piece about how to learn things, but that will have to wait for next time. Well, let's get to the topic (I will walk through it alongside MSDN).
A bird's-eye view:
A) Application-level transactions vs. database storage engine transactions
B) Application-level concurrency handling vs. database storage engine concurrency handling
There is a reason for splitting things this way: the mechanisms, environments, and circumstances differ between the application and the database, so it helps to discuss transactions and concurrency at each level separately. This article focuses on MSSQL transactions; in a follow-up article I will discuss how to handle transactions in Java or C#, and how to combine that with the concurrency patterns described by Martin Fowler.
What is a transaction?
Let's look at the definition given by MSDN: a transaction is a single logical unit of work composed of a series of operations. To qualify as a transaction, a logical unit of work must exhibit four properties: atomicity, consistency, isolation, and durability.
Don't look down on these definitions; the deepest understanding often grows out of the most basic ones. I often see friends on CSDN or cnblogs giving advice about what to use and what not to use that is simply incorrect. If you are not familiar with the ACID properties of transactions, it is worth reviewing them:
ms-help://MS.MSDNQTR.v90.chs/udb9/html/c193ad34-be19-408a-a0fa-9723a7936a3c.htm (requires a locally installed MSDN).
What is concurrency?
When multiple users access the same data at the same time, the situation is called concurrency.
Well, the above definition is intuitive and simple. I believe most of my friends have known these things for a long time.
Next, let's talk about concurrency first. MS SQL Server 2005 gives us two ways to handle it. The first, which most of us are familiar with, uses transaction isolation levels and lock hints (don't worry if "lock hint" is not clearly defined yet; it will come up again later). Those who know the underlying principle will recognize that both of these are built on locking resources. There is also a second mechanism for handling concurrency, which you may know less well, built on row versioning: roughly speaking, the engine keeps a copy of each modified row plus some machinery for managing those copies.
Let's look at the definitions from MSDN:
· Locking
Each transaction requests locks of different types on the resources it depends on (such as rows, pages, or tables). The locks prevent other transactions from modifying those resources in ways that would cause problems for the transaction that requested the lock. The transaction releases its locks when it no longer depends on the locked resources.
· Row Version Control
When an isolation level based on row versioning is enabled, the database engine maintains a version of each row that is modified. Applications can specify that a transaction use these row versions to view data as it existed at the start of the transaction or query, instead of protecting all reads with locks. With row versioning, the chance that a read operation blocks other transactions is greatly reduced.
When we do not use these mechanisms correctly in MSSQL, we may get some unexpected results, so let's walk through the classic anomalies. (I will quote many original Microsoft documents; sorry about that. My main job is to explain the parts that are hard to understand and to provide some code.)
Test environment: MS SQL Server 2005
Test tool: SSMS
Run the following code to create a test environment:
SET NOCOUNT ON
GO
USE [master]
GO
DECLARE @Path VARCHAR(MAX);
SET @Path = (SELECT SUBSTRING(physical_name, 1, CHARINDEX('master.mdf', physical_name) - 1)
             FROM sys.master_files
             WHERE file_id = 1 AND database_id = 1);
SELECT @Path AS 'Your database folder'
-- Detect whether the database already exists
IF DB_ID('goodgoodstudydayup') IS NOT NULL
    DROP DATABASE goodgoodstudydayup;
GO
-- Create the test-driven database
CREATE DATABASE goodgoodstudydayup;
GO
-- Change the context
USE goodgoodstudydayup;
GO
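One gap in the setup script: the sessions below all query dbo.Test1, but the script never creates it. Here is a plausible definition, reconstructed from how the later examples use the table; the exact types and seed values are my assumptions (the examples only require an identity primary key named TestID and a character column named String):

```sql
-- Assumed definition of the test table used by every session below.
CREATE TABLE dbo.Test1
(
    TestID INT IDENTITY(1,1) PRIMARY KEY,
    String VARCHAR(50) NULL
);
GO
-- Seed seven rows so the phantom-read example (TestID <= 7) has data to scan.
INSERT INTO dbo.Test1 (String)
SELECT 'AA' UNION ALL SELECT 'BB' UNION ALL SELECT 'CC'
UNION ALL SELECT 'DD' UNION ALL SELECT 'EE'
UNION ALL SELECT 'FF' UNION ALL SELECT 'GG';
GO
```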
Lost update
A lost update occurs when two or more transactions select the same row and then update it based on the value originally selected. Because each transaction is unaware of the others, the last update overwrites the updates made by the other transactions, and data is lost. For example, two editors each make an electronic copy of the same document. Each changes his copy independently and then saves it, overwriting the original document. The editor who saves last overwrites the changes made by the other editor. The problem could be avoided if the second editor could not access the file until the first editor had finished and committed his transaction.
Obviously, if you and a colleague modify the same piece of code at the same time and you both submit, something terrible happens: the later submission wins and overwrites the earlier version, which means one of you worked for nothing. Problems of this kind hide in many situations.
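As a sketch of how this is usually avoided inside the database, a transaction can read the row with an UPDLOCK hint so a lock is held from the read through the write; a second session then waits instead of silently overwriting. The column values here are only illustrative:

```sql
-- Read-then-write without losing the update: UPDLOCK keeps an update lock
-- on the row between the SELECT and the UPDATE, so a concurrent writer waits.
BEGIN TRAN;
DECLARE @old VARCHAR(50);
SELECT @old = String FROM dbo.Test1 WITH (UPDLOCK) WHERE TestID = 1;
-- ... decide on the new value based on @old ...
UPDATE dbo.Test1 SET String = ISNULL(@old, '') + '-v2' WHERE TestID = 1;
COMMIT;
```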
Uncommitted read (dirty read) (Example 1)
An uncommitted-dependency problem occurs when a second transaction selects a row that is being updated by another transaction. The data the second transaction reads has not yet been committed and may still be changed by the transaction updating the row.
For example, an editor is changing an electronic document. While the changes are in progress, another editor copies the document (the copy includes all the changes made so far) and distributes it to the intended audience. Afterwards, the first editor decides the changes made so far are wrong, deletes the edits, and saves the document. The distributed document now contains edits that no longer exist and that should be treated as if they had never existed. The problem could be avoided if nobody could read the changed document until the first editor saved the final version and committed the transaction.
T-SQL Demo:
Session 1:
USE [goodgoodstudydayup]
GO
BEGIN TRAN
-- The first editor reads the first record.
SELECT * FROM dbo.Test1 WHERE TestID = 1
-- He then makes his change
UPDATE dbo.Test1 SET String = 'aaa' WHERE TestID = 1
WAITFOR DELAY '00:00:04';
-- The change turns out to be wrong, so cancel it
ROLLBACK
Session 2:
USE [goodgoodstudydayup]
GO
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED
-- The second editor reads the first record and sends it to others.
SELECT * FROM dbo.Test1 WHERE TestID = 1
-- Or, equivalently, with a lock hint:
SELECT * FROM dbo.Test1 WITH (NOLOCK) WHERE TestID = 1
Note: open two new query windows, paste Session 1's code into query 1 and Session 2's into query 2, then run 1 followed immediately by 2.
We find that Session 1 updates the record and then, after a four-second delay, rolls the update back, so the final stored value is unchanged. Session 2, run during the delay, immediately reads the dirty data, that is, the uncommitted value, which is why it must set the isolation level to READ UNCOMMITTED: Session 1 has not committed yet.
So what is going on here? What does the NOLOCK "lock hint" I used (again, don't be in a hurry about the term) have to do with setting the isolation level to allow dirty reads? To become an expert you must pay attention to details like this. The phrase "lock hint" itself gives it away: it implies that locking is the mechanism being used to control concurrency.
Microsoft provides us with a series of views and stored procedures to observe locking.
T-SQL:
USE goodgoodstudydayup;
GO
-- SQL Server 2000
EXEC sp_lock
-- SQL Server 2005
SELECT * FROM sys.dm_tran_locks
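As a small convenience, and only a sketch: sys.dm_tran_locks also reports which session requested each lock and whether the request was granted or is waiting, which makes the lock outputs in the rest of this article easier to read:

```sql
-- Show who holds (or waits for) which lock in the test database.
SELECT request_session_id,
       resource_type,
       request_mode,
       request_status
FROM sys.dm_tran_locks
WHERE resource_database_id = DB_ID('goodgoodstudydayup');
```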
Back to the problem we just encountered: let's take a look at which locks Session 1 obtained.
First execute Session 1, then immediately execute Session 3 (that is, the session that queries the lock information).
Result:
resource_type   request_mode
PAGE            IX
OBJECT          IX
KEY             X
The results show that the updating transaction requested an intent exclusive lock on the PAGE resource, an intent exclusive lock on the OBJECT resource, and an exclusive lock on the KEY resource. I will explain the lock concepts one at a time. Start with KEY: the row being updated is located through its primary key, so the locked resource is a KEY (a key value in an index). request_mode then tells us the type of lock requested, and X means exclusive. Remember what "exclusive" means: the MSDN lock compatibility table shows that an exclusive lock is incompatible with every other lock type. In other words, while one transaction holds an exclusive lock on a resource, no other transaction can obtain any lock on it.
The following is a simple test:
Let's modify the code slightly: remove
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED
and WITH (NOLOCK).
Now execute Session 1 first, so that it requests the exclusive lock on the KEY; then immediately execute the modified Session 2, and query all requested lock resources in Session 3.
Result:
PAGE IS
PAGE IX
OBJECT IS
OBJECT IX
KEY X
KEY S
We find one more lock at each of the page, object, and key levels: intent shared locks and a shared lock. Shared locks first: when we execute a query statement, it requests shared locks on the resources it reads.
This explains why Session 2 sits in a lock wait: Session 1 holds the exclusive lock on the KEY resource (TestID = 1), Session 2 wants a shared lock on the same KEY, and as noted above the X exclusive lock is compatible with no other lock, so Session 2 is blocked.
Then, let's take a look at the official definition of S and X locks:
Shared (S)
Used for read operations that do not change or update data, such as SELECT statements.
Exclusive (X)
Used for data-modification operations such as INSERT, UPDATE, or DELETE. Ensures that multiple updates cannot be made to the same resource at the same time.
So we have our explanation: Session 1 holds an X lock for its UPDATE statement. And if you are sharp, you may already have guessed why setting the isolation level or using the lock hint keeps Session 2 from blocking: when the isolation level is set to allow dirty reads (READ UNCOMMITTED), no shared locks are requested on the resources being read. To confirm it, modify the sessions from the previous test accordingly and observe the locks again:
PAGE IX
OBJECT IX
KEY X
So when we use
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED or WITH (NOLOCK),
no shared locks are requested on the resources being read. There is one other situation in which shared locks are not requested: row versioning, which I will come back to later.
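To preview that row versioning option (details in a later article), here is a minimal sketch of how the versioning-based levels are switched on in SQL Server 2005; it assumes no other connections hold the database open while the ALTER DATABASE statements run:

```sql
-- Enable the two row-versioning-based options on the test database.
ALTER DATABASE goodgoodstudydayup SET ALLOW_SNAPSHOT_ISOLATION ON;
ALTER DATABASE goodgoodstudydayup SET READ_COMMITTED_SNAPSHOT ON;
GO
-- A session can then read a transaction-consistent view without shared locks:
SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
BEGIN TRAN;
SELECT * FROM dbo.Test1 WHERE TestID = 1;
COMMIT;
```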
Through the example above we should now understand when a shared lock is requested and when an exclusive lock is requested. Let me finally define "lock hint" properly: a lock hint can be specified for an individual table reference in SELECT, INSERT, UPDATE, and DELETE statements. The hint tells the Microsoft SQL Server database engine instance what type of locking or row versioning to use for that table's data. Table-level lock hints give you finer-grained control over the lock types acquired on objects, and they override the session's current transaction isolation level. Let's look at Microsoft's advice:
Note:
The database engine query optimizer almost always chooses the correct locking level. We recommend using table-level lock hints to change the default locking behavior only when absolutely necessary. Disallowing a locking level can adversely affect concurrency.
If you are not a DBA expert, do not use lock hints.
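For completeness, a sketch of what such an override looks like: HOLDLOCK, for example, makes a single table reference behave as if the session were at the SERIALIZABLE level, regardless of the isolation level actually in effect:

```sql
-- The hint applies only to this one table reference, overriding the session's
-- isolation level: range locks are held until the transaction ends.
SELECT * FROM dbo.Test1 WITH (HOLDLOCK) WHERE TestID <= 7;
```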
Okay, let's pause here and sort things out in our heads. Have another look at the earlier tests; I have provided the scripts for download in the Example 1 folder. Once that has sunk in, read the following definitions, which come straight from MSDN. My own summaries always differ a little from the official text, and the original is easy enough to understand, so read it carefully (quoted from MSDN):
Locking is a mechanism used by the Microsoft SQL Server database engine to synchronize access by multiple users to the same piece of data at the same time.
Before a transaction acquires a dependency on the current state of a piece of data (for example, by reading or modifying it), it must protect itself from the effects of other transactions on the same data. It does so by requesting a lock on the data. Locks have different modes, such as shared or exclusive, and the lock mode defines the level of dependency the transaction has on the data. No transaction can be granted a lock that conflicts with the mode of a lock already granted to another transaction on the same data. If a transaction requests a lock mode that conflicts with one already granted, the database engine instance pauses the requesting transaction until the first lock is released.
When a transaction modifies a piece of data, it holds the lock protecting the modification until the end of the transaction. How long a transaction holds the locks acquired to protect read operations depends on the transaction isolation level setting. All locks held by a transaction are released when the transaction completes (whether it commits or rolls back).
Applications typically do not request locks directly. Locks are managed internally by a part of the database engine called the lock manager. When a database engine instance processes a Transact-SQL statement, the query processor determines which resources will be accessed. Based on the access type and the transaction isolation level setting, it determines which types of locks are required to protect each resource, and then requests the appropriate locks from the lock manager. The lock manager grants the locks if there is no conflict with locks held by other transactions.
Is your head a lot clearer after reading that? Congratulations, let's proceed. In the tests just now we saw some other locks, such as IX and IS. What are they?
The Microsoft SQL Server database engine locks at multiple granularities, allowing a transaction to lock different types of resources. To minimize the cost of locking, the database engine automatically locks resources at a level appropriate to the task. Locking at a smaller granularity (such as rows) increases concurrency but has higher overhead, because more locks must be held if many rows are locked. Locking at a larger granularity (such as tables) reduces concurrency, because locking an entire table restricts other transactions' access to every part of the table; the overhead, however, is lower, because fewer locks must be maintained.
Often, the database engine must acquire locks at multiple granularities to fully protect a resource; such a set of locks is called a lock hierarchy. For example, to fully protect a read of an index, the database engine instance may have to acquire shared locks on the rows and intent shared locks on the pages and the table. That is exactly the pattern of lock granularities and hierarchies we observed above.
For the purpose of each lock mode, see the lock compatibility matrix in MSDN. I strongly recommend becoming familiar with these locks; once you understand them, you will handle concurrent access to resources far better, and the isolation levels in the database engine will make much more sense. If you have not yet read up on the lock modes, the next part will take extra effort, so come back after reading about them.
Inconsistent analysis (non-repeatable read) (Example 2)
An inconsistent-analysis problem occurs when a second transaction accesses the same row several times and reads different data each time. It resembles the uncommitted-dependency problem in that another transaction is changing the data the second transaction is reading; the difference is that here the data read by the second transaction was committed by the transaction that changed it. Because it involves reading the same row multiple times (twice or more) with the information changed by another transaction between reads, this is called a "non-repeatable read".
For example, an editor reads the same document twice, but between the two reads the author rewrites it. When the editor reads the document the second time, it has changed: the original read cannot be repeated. The problem could be avoided if the author could not change the document until the editor had finished the last read.
Similarly, we use T-SQL to simulate this environment:
Session 1:
/*
Author:
DateTime: 2009.06.15 20:13
Location:
Description: non-repeatable read test, session 1
*/
USE goodgoodstudydayup;
GO
BEGIN TRAN
-- The editor reads a record.
SELECT * FROM dbo.Test1 WHERE TestID = 1
-- He steps away to deal with something; wait five seconds.
WAITFOR DELAY '00:00:05'
-- Read the same record again afterwards
SELECT * FROM dbo.Test1 WHERE TestID = 1
COMMIT
Session 2:
/*
Author: longeur
DateTime: 2009.06.15 20:13
Location: Chengdu
Description: non-repeatable read test, session 2
*/
BEGIN TRAN
-- The author modifies the record while Session 1 is waiting.
UPDATE dbo.Test1 SET String = 'AA' WHERE TestID = 1
COMMIT TRAN
-- Restore the record if you want to rerun the test
-- UPDATE dbo.Test1 SET String = 'A' WHERE TestID = 1
Session 3:
/*
Author: longeur
DateTime: 2009.06.15 20:13
Location: Chengdu
Description: non-repeatable read test, session 3
*/
USE goodgoodstudydayup;
GO
SELECT * FROM sys.dm_tran_locks
We run Sessions 1, 2, and 3 in order.
Session 1 takes five seconds to run, so execute Session 2 immediately after starting it. The shared lock obtained by the query in Session 1 is held only for an instant and is released as soon as the query completes, so Session 2 does not wait on a lock: its update succeeds. When Session 3 then runs, Session 1's shared lock has long been released and Session 2's update was also over in an instant, so no interesting locks are visible; all we see are three database-level shared locks, one per open session. (You can of course place a WAITFOR DELAY inside Session 2 so that Session 3 can observe its X lock.)
The results of Session 1's two reads are:
TestID String
-------------------------------
1
(1 row(s) affected)
TestID String
-------------------------------
1      AA
(1 row(s) affected)
Clearly the non-repeatable read described by MSDN has occurred: the editor read the same document twice, the author overwrote it between the reads, and the document had changed by the second read.
If you read the earlier material about locks and the isolation levels in the database engine, you know that "committed read" (READ COMMITTED) cannot prevent this problem. Why not? Eyes on the big screen, please.
READ COMMITTED works by acquiring an S shared lock on the resource being read, but releasing it as soon as the read completes; Session 2 updates the resource during that gap, and when Session 1 continues it reads the updated data. In other words, the isolation level that prevents dirty reads only guarantees that the data is committed at the instant it is read; otherwise a lock wait occurs, the reader waits for other transactions or sessions to release their X locks, and only then acquires its S shared lock. Across the whole transaction, it cannot guarantee that reads are unaffected by other transactions. Committed read guarantees only that whatever you read was committed by someone.
So let's declare REPEATABLE READ in Session 1 to prevent the problem:
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ
After adding the setting to Session 1, run Session 1, immediately run Session 3 and observe the locks obtained, then run Session 2, and finally run Session 3 once more.
Session 3 displays the following results:
First time:
PAGE IS
OBJECT IS
KEY S
Second:
PAGE IX
PAGE IS
OBJECT IX
OBJECT IS
KEY S
KEY X
The first result shows that under this isolation level, the shared lock acquired on the resource is held for the duration of the transaction. That means our Session 2, which wants an exclusive lock on the same resource, will be blocked, as the second result confirms. So while editor 1 is processing the document, no other session can interfere, and repeated reads return consistent results.
Here is MSDN's detailed definition of this isolation level: statements cannot read rows that have been modified but not committed by other transactions, and no other transaction may modify data that the current transaction has read until the current transaction completes. Shared locks are placed on all data read by each statement in the transaction and are held until the transaction completes, which prevents other transactions from modifying any rows the current transaction has read. Other transactions can, however, insert new rows that match the search conditions of statements issued by the current transaction; if the current transaction then retries the statement, it retrieves the new rows, producing phantom reads. Because shared locks are held to the end of the transaction instead of being released at the end of each statement, concurrency is lower than under the default READ COMMITTED level, so use this option only when necessary.
The highlighted sentence confirms what we said: the shared locks taken for reads are held until the end of the transaction. It also mentions that this level cannot prevent phantom reads, so let's look at those next.
Phantom reads occur when insert or delete operations are performed against rows that fall within the range of rows a transaction is reading. Because of another transaction's delete, the transaction's first read of the range includes a row that no longer appears in the second or subsequent reads. Likewise, because of another transaction's insert, the second or subsequent reads show a row that was not present in the original read.
Phantom read (Example 3)
For example, an editor changes a document submitted by an author, but when the production department merges the changes into the master copy, they find that the author has added new, unedited material to the document. As with non-repeatable reads, the problem could be avoided if no one could add new material until the editors and production department had finished with the original document.
We still use three sessions to simulate this environment:
Session 1:
/*
Author: longeur
DateTime: 2009.06.15 20:13
Location: Chengdu
Description: phantom read test, session 1
*/
-- Delete the record with TestID = 5 first, if it exists
-- DELETE FROM dbo.Test1 WHERE TestID = 5
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE
BEGIN TRAN
SELECT * FROM dbo.Test1 WHERE TestID <= 7
WAITFOR DELAY '00:00:05'
COMMIT
Session 2:
/*
Author: longeur
DateTime: 2009.06.15 20:13
Location: Chengdu
Description: phantom read test, session 2
*/
SET IDENTITY_INSERT dbo.Test1 ON
INSERT INTO dbo.Test1 (TestID, String) VALUES (5, 'E');
SET IDENTITY_INSERT dbo.Test1 OFF
Session 3:
/*
Author: longeur
DateTime: 2009.06.15 20:13
Location: Chengdu
Description: phantom read test, session 3
*/
USE goodgoodstudydayup;
GO
SELECT * FROM sys.dm_tran_locks
We run Session 1, Session 3, Session 2, and Session 3 again, in that order. The final result from Session 3 is:
PAGE IX
PAGE IS
KEY RangeS-S
KEY RangeI-N
OBJECT IX
OBJECT IS
KEY RangeS-S
KEY RangeS-S
KEY RangeS-S
KEY RangeS-S
KEY RangeS-S
KEY RangeS-S
KEY RangeS-S
We can see a RangeI-N lock in the result. Here is MSDN's definition:
RangeI-N
Insert range, null resource lock; used to test a range before inserting a new key into an index.
Looking back at our test: Session 2 tries to insert a key into a range that does not yet contain it, which is exactly why this lock appears. For more information, see the MSDN topic on key-range locking.
Well, I have now walked through each of the anomaly scenarios. For the remaining details, readers should experiment on their own; after all, you only truly understand what you work out for yourself, right?
That's it for now; I plan to write further articles to summarize the rest. Everything discussed here is covered clearly enough in MSDN, just a little scattered :).
This article is a draft prepared for company training, so it may still be revised. Sorry.
/Files/bhtfg538/T-SQL.rar