Transaction: Brief Introduction Concepts and features Transactions Considerations for using transactions Transaction type |
Lock: Functions of locks and locks Space Management Resources that can be locked Lock type and its compatibility Deadlock Session-level locks and table-level locks |
Brief Introduction Relational databases have four notable features: security, integrity, concurrency, and monitoring. Database security is to ensure the security of data in the database, prevent unauthorized users from modifying data in the database at will, and ensure data security. In most database management systems, the security of databases is guaranteed through licensing. Integrity is an important feature of the database and an important mechanism to ensure the effectiveness of data in the database, prevent errors, and implement business rules. In a database, the difference between the stored data is useless garbage or valuable information, mainly based on whether the database integrity is sound. In SQL Server 7.0, data integrity is guaranteed through a series of logics, which are divided into three aspects: entity integrity, domain integrity and reference integrity. This can be said for any system, without monitoring, there will be no optimization. This statement is also practical in the database management system. Only by conducting comprehensive performance monitoring on the database can we find the factors and bottlenecks that affect the system performance. In this way, we can take appropriate policies to address the bottlenecks and improve the system performance. Concurrency is also a very important concept. It is used to solve the problem when multiple users operate on the same data. Especially for network databases, this feature is even more prominent. It is not enough to increase the processing speed of the database simply by improving the physical speed of the computer. We must also fully consider the database concurrency problem and improve the efficiency of database concurrency. So how can we ensure concurrency? SQL Server 7.0, a database product oriented to the next century, solves the database concurrency problem by using transactions and locks. Concepts and features Transactions and locks are two closely related concepts. A transaction is a unit of work, including a series of operations. These operations are either all successful or all failed. The transaction ensures that the modification of multiple data is processed as a unit. For example, in banking, there is a accounting principle, that is, lending and lending are equal. In order to ensure this principle, you must ensure that the registration of loans and loans either succeeds at the same time or fails at the same time. If you record only the borrow or only the loan, you will be in violation of the accounting principle, and the wrong account will be recorded. SQL Server manages multiple transactions through the transaction mechanism to ensure transaction consistency. Transactions use locks to prevent other users from modifying data in another unfinished transaction. For multi-user systems, the lock mechanism is required. In SQL Server 7.0, transaction logs are used to ensure the integrity and recoverability of modifications. SQL Server has multiple locks that allow transactions to lock different resources. A lock is used to protect specified resources and is not operated by other transactions. To minimize the lock cost, SQL Server automatically locks the resource object based on the lock level corresponding to the task. Locking a small object, such as locking a row, can improve concurrency, but it has a high cost, because if many rows are locked, more locks are needed. Locking a large object, such as locking a table, significantly reduces the concurrency, because locking the entire table limits other transactions to access other parts of the table, but the cost is relatively low, because only a small number of locks need to be maintained. Transactions and locks have the following features: A transaction is a unit of work, either completely or completely. Transactions ensure operation consistency and recoverability Each Transact-SQL statement can be a transaction. The actual transaction is a user-defined transaction, which includes a series of operations or statements. In a multi-server environment, user-defined distributed transactions are used to ensure operation consistency. Lock is a means to ensure Concurrency Control Resources that can be locked include rows, pages, clusters, tables, and databases. The lock types mainly include shared locks and exclusive locks. Special types of locks include intent locks, modification locks, and mode locks. The shared lock allows other transactions to continue to use the locked resources. The exclusive lock allows only one transaction to access data. The system can handle deadlocks. You can customize the features of the lock based on the actual situation. Transactions Transaction Definition A transaction refers to the work of a unit, which is either done in full or not done in all. As a logical unit, it must have four attributes: atomicity, consistency, independence (isolation), and durability. Automatic means that a transaction must be an automatic unit operation, either to modify all data or to modify all data. Consistency means that all data must be consistent when the transaction is completed. In relational databases, all rules must be applied to transaction modifications to maintain the integrity of all data. All internal data structures, such as links between tree indexes and data, must be correct after the transaction ends. Independence means that the modification of a parallel transaction must be independent from the modification of other parallel transactions. The data seen by a transaction is either the status before the transaction is modified by another transaction or the data that has been modified by the second transaction. However, the transaction cannot see the data being modified. This feature is also known as seriality. Durability means that after a transaction is completed, its impact is permanently generated in the system, that is, the modification is written to the database. The transaction mechanism ensures that the modification of a group of data is either completely executed or not executed. SQL Server uses transactions to ensure data consistency and recoverability when the system fails. A transaction is a recoverable unit. It consists of one or more statements, which can affect one or more rows of data in a table. After the transaction is opened, it is committed after the transaction is successfully completed, or when the transaction fails to be fully canceled or rolled back. Working principle of transactions Transactions ensure data consistency and recoverability. The principle 1 of the transaction is shown in. Figure 1 transaction Working Principle After the transaction starts, some operations of the transaction are successively written into the transaction log. Generally, there are two types of operations to write data to logs: one is data operations and the other is task operations. Operations on data, such as insertion, deletion, and modification, are typical transaction operations. The objects of these operations are large amounts of data. Some operations are for tasks, such as creating indexes. These task operations record a flag in the transaction log to indicate that such operations are performed. When a transaction is canceled, the system automatically performs the inverse operation to ensure system consistency. The system automatically generates a checkpoint mechanism, which occurs periodically. The checkpoint period is the time interval automatically calculated by the system based on the user-defined time interval and the frequency of system activity. Check the transaction log periodically. If all the transactions are completed in the transaction log, the check point will commit the transactions in the transaction log to the database, and make a checkpoint commit mark in the transaction log. If the transaction is not completed in the transaction log, the check point will not commit the transactions in the transaction log to the database, and a checkpoint is not submitted in the transaction log. The transaction recovery and checkpoint Protection System Integrity and Recoverability can be described in the example 2. Figure 2 example of transaction recovery and Check Point In this example, there are five transactions: Transaction 1, transaction 2, transaction 3, transaction 4, and transaction 5. The box indicates the start and completion of the transaction. The horizontal direction indicates the time. A Checkpoint indicates that a checkpoint mechanism occurs at a certain time point. A system failure indicates that a system failure occurs at a certain time point due to power failure or system software failure. The completion of transaction 1 occurs between checkpoints, so transaction 1 is committed to the database. Transaction 2 and transaction 4 are completed before the system fails, so these two transactions can be rolled forward to the database by the system. TRANSACTIONS 3 and 5 were canceled because the system failed and were not completed. Considerations for using transactions When using transactions, in principle, the transactions should be kept as short as possible and transaction nesting should be avoided. The transaction should be as short as possible, because a relatively long transaction increases the time that the transaction occupies data, so that other transactions that must wait for access to the transaction to lock the data, and prolong the waiting time for data access. When using transactions, we should adopt some corresponding methods to make the transactions as short as possible. To minimize time, be careful when using some Transact-SQL statements. For example, when using a loop statement while, you must confirm the length and time of the loop in advance, so that the loop must be as short as possible before completing the corresponding functions. Before starting a transaction, you must understand the information that requires interactive operations. In this way, some time-consuming interactive operations can be avoided during the transaction process, shortening the time of the transaction process. In a user-defined transaction, we should try to use some data manipulation languages, such as insert, update, and delete statements, because these statements mainly manipulate data in the database. Some data definition languages should be used less or used as much as possible, because the operations of these data definition languages take a long time and a large amount of resources, in addition, the operations in these data definition languages usually do not involve data, so they should be used as little as possible or not in transactions. In addition, when using data manipulation languages, you must use conditional judgment statements in these statements so that these data manipulation languages involve as few records as possible, this shortens the transaction processing time. Pay attention to some issues when nesting transactions. Although it is possible to nest a transaction in the middle of the transaction, it does not affect the performance of SQL Server to process the transaction. However, in fact, using nested transactions not only makes transactions more complex, but also has no obvious advantages. Therefore, nested transactions are not recommended. Transaction type Based on System settings, transactions can be divided into two types. One is the transaction provided by the system, and the other is the user-defined transaction. A transaction provided by the system means that a statement is a transaction when executing certain statements. It should be clear that the object of a statement may be either a row of data in the table, multiple rows of data in the table, or even all data in the table. Therefore, a transaction composed of only one statement may contain processing of multiple rows of data. For example, execute the following data manipulation statement: Update authors Set state = 'CA' This is a statement, which itself constitutes a transaction. This statement is used to modify all data in the table because it does not have any restrictions. Therefore, the transaction object is to modify all the data in the table. If there are 1000 rows of data in the authors table, the modification of the 1000 rows of data is either successful or all fail. Another transaction is defined by the user. In practical applications, most transactions are processed using user-defined transactions. When developing an application, you can use the begin transaction statement to define clear User-Defined transactions. When using user-defined transactions, pay attention to two points: first, the transaction must end with a clear concluding sentence. If you do not end with a clear conclusion, the system may treat all the operations from the beginning of the transaction to the end of the user's connection as a transaction. You can use one of the following two statements to stop a transaction: commit statement and rollback statement. A commit statement is a commit statement that explicitly submits all completed statements to the database. A rollback statement is a cancel statement that cancels all transaction operations, indicating that the transaction operation failed. There is also a special user-defined transaction, which is a distributed transaction. The transactions mentioned above are all operations on a server. The data integrity and consistency guaranteed by the transactions refer to the integrity and consistency on a server. However, if there may be multiple servers in a complex environment, a distributed transaction must be defined to ensure transaction integrity and consistency in a multi-server environment. In this distributed transaction, all operations can involve operations on multiple servers. When these operations are successful, all these operations are committed to the database of the corresponding server, if one of these operations fails, all operations in the distributed transaction will be canceled. Functions of locks and locks A lock is used to prevent other transactions from accessing specified resources. Locks are the main method for implementing concurrency control. They are an important guarantee that multiple users can simultaneously manipulate data in the same database without data inconsistency. In general, locks can prevent dirty reads, non-repeated reads, and Phantom reads. Dirty read means that when a transaction is accessing data and modifying the data has not been committed to the database, another transaction also accesses the data, then the data is used. Because the data has not been committed, the data read by another transaction is dirty data, and the operations performed based on the dirty data may be incorrect. A non-repeated read refers to reading the same data multiple times in a transaction. When the transaction is not completed, another transaction also accesses the same data. Therefore, the data read twice in the first transaction may be different because of the modification of the second transaction. In this way, the data read twice in a transaction is different, so it is called non-repeated read. Phantom read refers to a phenomenon that occurs when a transaction is not executed independently. For example, the first transaction modifies the data in a table, which involves all the data rows in the table. At the same time, the second transaction also modifies the data in this table. This modification inserts a new row of data into the table. In the future, the user who operates the first transaction will find that there are still data rows in the table that have not been modified, just like an illusion. Space Management Lock is a major means to prevent other transactions from accessing specified resource control and implement concurrency control. To improve system performance, speed up transaction processing, and shorten the transaction wait time, the locked resources should be minimized. To control locked resources, you should first understand the system space management. In the SQL Server System, the smallest unit of space management is a page, with a page of 8 K. All data, logs, and indexes are stored on the page. In addition, there is a restriction on the use of pages. This means that a row of data in the table must be on the same page and cannot be cross-page. The space management unit on the page is a cluster, and a cluster is eight consecutive pages. The smallest unit of table and index occupation is cluster. A database consists of one or more tables or indexes, that is, multiple clusters. The space management structure of the SQL server system is shown in 3. Figure 3 SQL server space management Resources that can be locked To optimize the system concurrency, you should lock different resources based on the transaction size and the degree of system activity. That is to say, you can lock a relatively large resource or a relatively small resource. These requirements have been fully met in the SQL server system. In SQL Server 7.0, there are multiple types of resources that can be locked: rows, pages, clusters, tables, and databases, their corresponding locks are row-level locks, page-level locks, cluster-level locks, table-level locks, and database-level locks. In the structure shown in 4, data rows are stored on pages and pages are stored on clusters. A table consists of several clusters, and several tables form a database. Among these lockable resources, the most basic resources are rows, pages, and tables, while clusters and databases are special resources that can be locked. Figure 4 Structure of tables, pages, and rows Rows are the minimum space that can be locked. In SQL Server 7.0, row-level locks are implemented. A row-Level Lock is a transaction that locks one or several rows of data while manipulating data. Other transactions cannot process the data of these rows at the same time. Row-level locks consume the least amount of data resources. Therefore, during transaction processing, other transactions are allowed to continue to manipulate other data on the same table or page, it greatly reduces the waiting time of other transactions and improves the concurrency of the system. Page-level locks are the optimal locks, because row-level locks cannot be wasted because data is occupied and not used. In Figure 5, the elliptical shape indicates the data occupied by row-level locks, while other data outside the elliptical shape can still be used by other transactions. Row-level locks are an important feature of SQL Server 7.0. Their introduction has led to a change in the data storage engine. Figure 5 row-Level Lock Page-level locks are used to lock a page at a time regardless of the amount of data processed by a transaction. Data on this page cannot be manipulated by other transactions. Before SQL Server 7.0, a page lock was used. There are more page-Level Lock resources than Row-Level Lock data resources. In a page-Level Lock, even if a transaction operates only one row of data on the page, other data rows on the page cannot be used by other transactions. Therefore, when page-level locks are used, data is wasted, that is, data is occupied but not used on the same page. In this case, the data waste cannot exceed the data rows on a single page. In figure 6, the circular area represents a page lock. In this circular area, only one transaction can use data in the circular area, and other transactions can only use data outside the circular area. Figure 6 page-Level Lock A cluster-level lock is a special type of lock and can only be used in some special cases. A cluster-level lock means that a transaction occupies a cluster, which cannot be occupied by other transactions at the same time. For example, this type of lock is used when the system allocates physical space during database creation and table creation. The system allocates space according to the cluster. When the system allocates space, use a cluster-level lock to prevent other transactions from using the same cluster at the same time. After the system allocates space, this type of cluster-level lock is no longer used. In particular, the cluster-level lock is not used when transactions involving data operations are involved. The structure of the cluster-level lock is 7. The elliptical area indicates the data occupied by the cluster-level lock. Other transactions can only use clusters other than the cluster. Figure 7 cluster-level lock Table-level locks are also very important. Table-Level Lock refers to the entire table in which the data of a table is locked when the transaction operates. Other transactions cannot access other data in the table. Table-level locks are generally used when the volume of data processed by a transaction is large. Table-level locks use a small amount of system resources, but occupy a large amount of data resources. Compared with row-level locks and page-level locks, table-level locks consume less system resources, such as memory, but consume the largest data resources. During table-level locks, a large amount of data may be wasted. Because table-level locks lock on the entire table, other transactions cannot manipulate other data in the table. In this way, the waiting time of other transactions will be extended and the system's concurrent performance will be reduced. The table-Level Lock structure is shown in figure 8. The elliptical shape indicates the table-Level Lock. Figure 8 Table-Level Lock Database-level locks are used to lock the entire database and prevent any user or transaction from accessing the locked database. A database-Level Lock is a very special lock, which is only used during database recovery operations. A lock of this level is the highest level because it controls the operation of the entire database. As long as the database is restored, you need to set the database to the single-user mode, so that the system can prevent other users from performing various operations on the database. The database-Level Lock Structure 9 is shown. Strictly speaking, database-level locks are not a lock, but a single-user mode mechanism similar to a lock. However, this single-user mode mechanism is very similar to the lock mechanism. Therefore, this single-user mode can also be called a database-Level Lock. Figure 9 database-level locks Lock type and its compatibility There are two basic methods to lock resources. One is the shared lock required by read operations, and the other is the exclusive lock required by write operations. In addition to these two basic types, there are also some special cases of locks, such as intention locks, modification locks and mode locks. Among these various types of locks, some types of locks are compatible, and some types of locks are incompatible. The shared lock allows parallel transactions to read the same resource. In this case, the transaction cannot modify the accessed data. When a shared lock is used to lock a resource, the transaction that modifies the data cannot access the data. When a transaction reads data, it immediately releases the occupied resources. Generally, when a SELECT statement is used to access data, the system automatically uses a shared lock to lock the accessed data. For transactions that modify data, such as insert, update, and delete statements, the system automatically places the exclusive lock on the modified transaction. Exclusive locks allow only one transaction to access one type of resource within the same time period, and other transactions cannot access resources with exclusive locks. Shared locks cannot be placed on resources with exclusive locks, that is, transactions that can generate shared locks are not allowed to access these resources. Only when the transaction that generates an exclusive lock ends, the resources that are locked by the exclusive lock can be used by other transactions. In addition to the above basic locks, SQL Server can also use some other types of locks based on different situations. These special types of locks include intention locks, modification locks, and mode locks. The system uses intent locks to minimize conflicts between locks. The intent lock establishes a layered structure of the lock mechanism, which is based on the locks of the resource range from low to high in turn is the row-Level Lock layer, page-Level Lock layer and table-Level Lock layer. Intention locks indicate that the system wants to obtain the shared or exclusive locks on low-level resources. For example, an intent lock placed at the table level indicates that a transaction can place a shared lock on a page or row in the table. Set a shared lock at the table level to prevent another transaction that modifies the page of the table from placing an exclusive lock on the table that contains the page. Intention locks can improve performance because the system only needs to check the intention locks at the table level to determine whether a transaction can safely obtain a lock on which table, you do not need to check every row lock or page lock on the table to determine whether a transaction can lock the entire table. There are three types of intent locks: intent sharing locks, intent exclusive locks, and intention-exclusive locks. The intent shared lock indicates the intention of the transaction to read low-level resources, and the shared lock is placed on these individual resources. Intention exclusive locks indicate the intention to modify low-level transactions and place the exclusive locks on these individual resources. Intention exclusive lock includes intention sharing lock, which is the superset of intention sharing lock. Use the shared lock of intention scheduling to indicate the intention of allowing the transaction to read the top-level resources in parallel, modify some low-level resources, and place the intent exclusive lock on these individual resources. For example, a shared lock with the intention to be applied to a table places the shared lock on the table, allows concurrent reading, and places the intent exclusive lock on the page to be modified, and places the exclusive lock on the modified row. Each table can have only one shared lock with the intention to arrange it at a time because the table-level shared lock prevents any modification to the table. The share lock that uses intent exclusive locks is a combination of shared locks and intent exclusive locks. Use the modify lock when the system wants to modify a page. Before the system modifies the page, the system automatically locks the modification page to the page lock to prevent conflicts between locks. When the page is read for the first time, the modification lock is obtained at the beginning of the modification operation. Modification locks are compatible with shared locks. If this page is modified, the modification lock is upgraded to the exclusive lock. Mode Lock ensures that a table or index cannot be deleted or modified when it is referenced by another session. The SQL Server System provides two types of mode locks: Mode stability locks and Mode Modification locks. The mode lock ensures that the locked resources cannot be deleted, and the Mode Modification lock ensures that other sessions cannot refer to the resources being modified. Some locks are compatible, for example, between a shared lock and a modified lock. Some locks are incompatible, such as exclusive locks and shared locks. Table 1 below lists the compatibility between various locks provided by the SQL server system. Table 1 compatibility between various locks provided by the SQL Server System In addition to the compatibility between the locks listed in Table 1, the pattern modification locks are incompatible with all locks, the Mode stability lock is compatible with all the locks except the Mode Modification lock. Deadlock Deadlock is an important topic. Deadlock is an inevitable phenomenon in the use of transactions and locks. In either case, a deadlock may occur. The first case is that when two transactions lock two separate objects, each transaction requires a lock on the object locked by another transaction, therefore, each transaction must wait for another transaction to release the occupied lock. In this case, a deadlock occurs. This deadlock is the most typical form of deadlock. There are two transactions A and B at the same time, and transaction A has two operations: locking the table part and requesting access to the table supplier; transaction B also has two operations: lock table supplier and request access table part. As a result, a deadlock occurs between transaction a and transaction B, as shown in process 10. Figure 10 deadlock The second case of deadlock is that when in a database, several long-running transactions execute parallel operations, when the query analyzer processes a very complex query, such as a connection query, the processing sequence cannot be controlled, and deadlock may occur. When a deadlock occurs, the system can automatically detect the deadlock and end the deadlock by automatically canceling one of the transactions. In the two transactions with deadlocks, their priority is determined based on the length of the transaction processing time as a rule. A transaction with a long processing time has a higher priority, and a transaction with a short processing time has a lower priority. In the event of a conflict, the transaction with a higher priority is retained to cancel the transaction with a lower priority. Session-level locks and table-level locks Generally, the lock is automatically provided by the system. However, in some cases, you can also customize the lock. That is to say, users can customize session-level locks and table-level locks. Session-Level Lock customization includes two aspects: transaction isolation level and lock timeout limit. The transaction isolation level protects specified transactions. This isolation level allows you to set an isolation level for all transactions in a session. When the isolation level is set, the default locking behavior is specified for all statements in the session. Use the following options when specifying the transaction isolation level. The read committed option instructs the system to use a shared lock when reading data. Under this option, you cannot see dirty reads, that is, you cannot see the data being modified. However, data can be changed before the transaction ends, so data that cannot be read repeatedly or phantom data can be generated. The read uncommitted option instructs the system neither to use a shared lock nor an exclusive lock. Under this option, you can see the data being modified and see dirty reads. This option is the minimum limit. The Repeatable read option instructs the system to place a lock on all data used for query to prevent other users from modifying data, but phantom reading can occur. The serializable option instructs the system to place a lock on the data. The user can modify or insert data only after the transaction is completed. This option is the most restrictive setting. In addition, you can customize the time when the transaction waits for access, which requires setting the lock timeout limit. After the lock timeout time is set, if the waiting time of the transaction exceeds the lock timeout time, the transaction is automatically canceled. A custom table-Level Lock is a table-Level Lock action by specifying one or more options for the table. In fact, custom table-level locks use an optimized and hidden method. Optimization and hiding refer to attaching relevant content options after the from clause to improve the system's ability to identify operations. There are many options for customizing table-level locks. The rowlock option instructs the system to use row-level locks instead of page-level locks or table-level locks. This is the default option of the system. The paglock option instructs the system to use page-level locks. The tablock option instructs the system to use table-level locks instead of Row-level locks or page-level locks with finer object locking, and to use shared locks. Therefore, other transactions are allowed to read table data, however, the data in the table cannot be modified. The tablockx option instructs the system to use the exclusive lock to prevent other transactions from reading or modifying any data in the table. The nolock option instructs the system not to use locks, neither share locks nor exclusive locks. In this case, you can experience dirty reading. The holdlock option instructs the system to occupy the shared lock until the transaction ends, rather than releasing it immediately when other lock requests exist. The updlock option instructs a transaction to replace the shared lock with a modified page lock when reading table data. The lock is used until the statement or transaction ends. Conclusion Network technology is the development trend of information technology. Multi-user, multi-transaction, scalability, replication, and data warehouse all aim to adapt to the development direction of network technology databases. As an important basic concept of database technology, transactions play a fundamental role in protecting database recoverability, multiple users, and multi-transactions. A transaction is a unit of work. A transaction may include one statement or one hundred statements, and all operations of these statements are completed or canceled. Transactions also play an important role in database backup and recovery. transactions can be used for transaction log backup and Incremental backup, instead, you do not have to perform full backup of time, effort, and backup media every time. A lock is a means of implementing concurrent processing methods such as multiple users and multiple transactions. There are multiple lock types and resources. The lock is automatically provided by the system and can be customized by users. In SQL Server 7.0, a major feature is the use of Row-level locks. The use of Row-level locks has caused changes in the data storage engine and transaction management methods. |