Why SQL Server has three backup methods for full, differential, and transaction log backups, and why the database has three recovery modes, simple mode, full mode, and bulk-logged mode. This article applies to all versions of SQL Server databases over 2005.
In the case of operation alone, the database backup and restore process in SQL Server is fairly simple, can be manipulated through the Managementstudio graphical interface, or can be done using a few T-SQL statements. However, to understand the whole process of backup recovery, custom conforms to the system requirements database backup scheme, but need to know the implementation principle of the database. Backup and recovery is a core feature of the database.
Many students may have some knowledge of SQL Server backup and restore, and often use backup and restore capabilities, and I believe that most of our team of developers, with the exception of DBAs, only use the most basic functionality for backup and restore, and have only a general understanding of it.
First, the principle of the database
This paper briefly discusses the physical and logical structure of SQL Server database storage, which is the theoretical basis of backup recovery.
1. Why use a database
Why use a database instead of a file, and then write some methods to save the data yourself, because using a database we don't have to implement the very complex features of the following algorithms on our own:
A. Efficient and consistent structured data access method for complex arbitrary conditions combined with nested data queries (relational database management system even designed a language for this function: SQL)
B. Safeguarding the atomicity and integrity of data operations
C. Ensure that the data remains intact in the event of server outages, network outages, server crashes, and client crashes
D. Quickly use backups on a remote device on a new server to restore data to the state before the disaster occurs when the server is completely physically destroyed
E. Restore data to any time or before any action at any time
A is primarily functional and performance
BCDE Four can all be attributed to completeness.
The core goal of the database system is to guarantee the data integrity and data access performance on the basis of realizing the functional requirements.
Physical and logical structure of 2.SqlServer storage
(1) How to achieve completeness
After a lot of theoretical research and practice, the most effective way to achieve integrity is to divide the data storage into data and transaction logs, and use locks to control concurrent access during the access process.
The data is in real-time state.
The transaction log records the process of changing the data, and any change to the database data is written as a record in the transaction log file.
A transaction is a set of operations that either all succeed or are not executed at all.
Any data change operation must be written to the transaction log file before writing to the data file, so that the transaction can be rolled back or rolled forward to the correct state correctly due to user cancellation, data logic error, or hardware/software failure.
Control concurrent access of multiple users to the same data through locks.
This is the most basic principle of the transaction system, which is used by most relational database systems and object-oriented NoSQL database systems.
SQL Server is no exception, and each SQL Server database is comprised of a data file (*.mdf) and a transaction log file (*.ldf). SQL Server includes a variety of granularity locks from rows to tables, from pages to files.
(2) How to achieve high performance
In the process of computer data access, the algorithm of guaranteeing performance can be summed up as follows: Index and cache. From the implementation level to the application level, SQL Server is building management of various indexes and various caches.
Paging is the most natural indexing method for data that requires random access, and it is easy to cache for efficient random access. We are familiar with the operating system memory management system, file management system has adopted a paging method.
In SQL Server, the data file MDF is primarily random access, so MDF is organized in pagination form, 8KB per page. Further, 8 contiguous pages are made into an extension that is easy to manage, similar to clusters in Windows.
There are several types of data stored in MDF, including table data, index data, large chunks of data, and so on. Only one data is saved per page.
Each record in a table is saved in a table data page, and records cannot be spread, so the maximum length of records in any table is a value that is slightly less than 8KB (a large column such as ntext only computes the size of a reference). Slightly smaller because each page requires some space to hold the page's own information (header) and where it is stored in the page.
In SQL Server, for the transaction log file LDF, the primary is sequential access. During normal operation, the database management system writes the log order to the LDF file on a regular basis, and the LDF file is read sequentially on recovery. Therefore, the log files do not need paging, as long as the transaction log records that are stored sequentially are composed. Log records are also cached and are written to the LDF file periodically.
Although the database system is a system that allows concurrent access, but writes to the transaction log are serialized and cannot be concurrent, the transaction log records for any one atomic operation have their unique location in the transaction log file, that is, the unique transaction log record number, which is always increased and is not duplicated. The newer the record number, the higher the storage location.
The above content is summarized from SQL Server product documentation Sqlserverbooksonline, if interested can be in the inside to learn more detailed content. These are fundamental principles of backup and recovery.
2 Implementation of database backup
As mentioned earlier, the data file MDF preserves the real-time state of the data, and the transaction log file LDF records the process of data changes in the database. This time, we have to backup the database, there is a natural two choices:
Because the MDF file holds the real-time state of the data in the database, we only have to copy the MDF file to achieve the current time of the database backup.
Since the LDF file holds the entire process of data changes in the database, we can back up the LDF file, then rebuild the entire database from scratch with these backed up transaction log records, and restore the database to any point in time, compared to the first method, through transaction logging.
In fact, SQL Server uses both of these methods.
Recovery mode
Right-click Database Properties--options--you can see the "Recovery Model", the SQLSERVER2008 Database recovery model is divided into three types: The full recovery model, the bulk-logged recovery model, the simple recovery model, such as.
Full Recovery model
To the default recovery model. It records every step of the operations database in its entirety. Using the full recovery model, you can restore an entire database to a specific point in time, which can be the most recent backup available, a specific date and time, or a marked transaction.
Bulk-Logged Recovery model
In a nutshell, minimal logging of bulk operations saves space on log files (such as importing data, batch updates, Selectinto, and so on). For example, when inserting hundreds of thousands of records in the database, each inserted record in the full recovery model is recorded in the log, making the log file very large, in the bulk-logged recovery model, only the necessary operations are logged, all logs are not recorded, which can greatly improve the performance of the database. However, because the log is incomplete, the data may not be recoverable in the event of a problem. Therefore, the recovery mode is generally changed to the bulk-logged recovery model only when a large number of data operations are required, and the recovery mode is changed back to the full recovery model immediately after the data processing is complete.
Simple Recovery Model
In this mode, the database automatically puts inactive logs * * *, thus simplifying the restore of the backup, but because there is no transaction log backup, it cannot be recovered to a point in time of failure. Typically, this pattern is only used for databases that require less security for database data, and in this mode, the database can only make full and differential backups.
You can see that the difference between the three recovery models is that the "log" is handled differently, in terms of "log" size: Full recovery mode > Bulk-logged Recovery model > Simple Recovery model.
Backup method
SQLSERVER2008 provides four ways to back up: full, differential, transaction log, file, and filegroup backups.
Full backup
Backs up all the contents of the entire database, including the transaction log. The backup type requires a large storage space to store the backup files, and the backups take longer, and when the data is restored, only one backup file is restored.
Differential backup
A differential backup is a complement to a full backup that backs up only the data that was changed since the last full backup. Compared to full backup, the amount of data for a differential backup is smaller than full data backup, and backups are faster than full backups. Therefore, differential backups are often used as a common backup method. When restoring data, restore the previous full backup and then restore the last differential backup so that the data in the database is restored to the same content as the last differential backup.
Transaction log Backups
Transaction log backups only back up the contents of the transaction log. The transaction log records all the changes to the database after the last full or transaction log backup. The transaction log records the changes to the database over a period of time, so a full backup must be done before the transaction log backup. Similar to differential backups, transaction log backups generate smaller files and take less time, but when you restore data, you restore each transaction log backup in turn, instead of restoring only the last transaction log backup (which is the difference from a differential backup), in addition to restoring the full backup first.
File and filegroup Backups
If you created multiple database files or filegroups for the database when you created the database, you can use that backup method. Using file and filegroup backups, you can back up only some of the files in the database, which is very effective when the database file is very large, because you can back up the database in multiple times by backing up only one or several files or filegroups at a time, and avoiding large database backups for too long. Also, because file and filegroup backups back up only one or more of the data files, it is possible to restore only damaged files or filegroup backups when one or more of the files in the database are corrupted.
Examples Show
Full backup
For example, a full backup was made at 8 o ' January 1, 2012 in the morning, so that in the future, you can revert to the database state at 8 on the morning of 1st in January 2012.
Differential backup
A differential backup is a change of data after a full backup is backed up. For example, after a full backup at 8 o ' Day on January 1, 2012 and a differential backup on January 2 and January 3, the data changes from January 1 to January 2 were recorded in a differential backup of January 2, In the differential backup of January 3, data changes were recorded in the period from January 1 to January 3. Therefore, if you want to revert to the January 3 state, simply restore the full backup of January 1, and then restore the differential backup made on January 3.
Transaction log Backups
A transaction log backup is a transaction log file as a Backup object, which is equivalent to documenting every operation in the database. Assuming that a full backup was made at 8 O ' 8 o'clock in the morning on the morning of January 1, 2012, the data in the database was changed 100 times by January 2, and if a differential backup is made, the differential backup records the database state after the 100th data change, and if a transaction log backup is made at this time, The backup will be a 100-time data change situation.
For example, after a full backup at 8 o ' Day on January 1, 2012 and a transaction log backup on January 2 and January 3, the data changes from January 1 to January 2 were recorded in the transaction log backup of January 2. In the transaction log backup on January 3, data changes were recorded over a period of time from January 2 to January 3. Therefore, if you want to restore data to January 3, you need to restore the full backup of January 1, restore the transaction log backups made on January 2, and finally restore the transaction log backups made by January 3.
Choice of Backup methods
After understanding the above database backup methods, you can use the above method to back up the database for your own database. Reasonable backup database needs to consider several aspects, first, data security, followed by backup file size, and finally to do backup and restore can withstand the time range.
Small amount of data changes
For example, if the amount of data changed daily in the database is small, a full backup can be done weekly (Sunday), and a transaction log backup is made every day (before work), then once the database has a problem, the data can be restored to the previous (off-duty) state.
Of course, you can also make a full backup once a week (Sunday), and then make a differential backup every day (before work), so that once the database has a problem, you can also restore the data back to the end of the workday. Just a few days of the week. When you make a differential backup, the backup time and the backup file are incremented. However, this also has the advantage that, in the case of data corruption, as long as the full backup of the data and the day before the differential backup of the data can be, do not need to recover the transaction log backups every days, recovery time is relatively short.
Large amount of data changes
If the data in the database changes more frequently, the loss of one hours of data is a very serious loss, using the above method to back up the data is not possible, you can alternately use three backup methods to back up the database.
For example, make a full backup every day at work, make a differential backup every eight hours between two full backups, and make a transaction log backup every hour between two differential backups. As a result, data corruption can restore data to a state that is within the last one hours, while reducing the time of database backup data and the size of the backup data file.
Large database file
As mentioned earlier, when the database file is too large to be backed up, you can back up the database files or filegroups separately, and separate a database into multiple backups. In a real-world operation, there is also a situation where a backup of the database file can be used. For example, in a database where the data in some tables changes very little, and the data in some tables changes frequently, consider storing the tables in separate files or filegroups, and then backing up the files and filegroups with different backup frequencies. However, by using files and filegroups for backup, the data is restored several times to restore the entire database, so do not use the backup method unless the database file is too large to be backup-ready.
Tail-Log backup
Can you see that the data is still incomplete for the above backup scenarios? For example, yesterday night 12 o'clock made a full backup, every hour to do a transaction log backup, the last transaction log backup is today 12 o'clock noon, today is noon 12:10, found that the database data was lost or destroyed, the last transaction log backup is today noon 12 o'clock, If I restore the database to 12 points at this point, then the data that is not compromised until after 12 is lost (for example, there are three tables in the database, data for one table is corrupted, and data for the other two tables is changed by other users). At this point, "tail-log Backup" is used, the tail-log backup principle is to start from the last transaction log backup point in time, all subsequent operations are backed up, restore can find the correct data after 12 points.
Note: When you make a tail-log backup, the database forces the database to stop, and if you do not stop the database and the user continues, the tail-log backup loses its meaning. SQLServer2012 If you make a change to the data after the last backup of the transaction log, the transaction log (that is, the LSN of the current log file record (log sequence number) is greater than the maximum LSN recorded in the last transaction log backup, SQL Server differentiates log records by LSN, and it does not have a tail-log backup, it prompts and asks you to do a tail backup first.
What is the principle of SQL Server 2008 data backup and restore?