The SQL Server database startup process You do not know (a problem with the user database loading process)

Source: Internet
Author: User
Tags file copy filegroup system log

Preface

This article is mainly a supplement to the previous article, we introduced the SQL Server service startup process encountered some problems and solutions, you can click to view, we mainly describe the SQL Server startup process about the user database loading process, and provide a solution based on a range of issues encountered during the loading process.

In fact, SQL Server as a good RDBMS of Microsoft, it started the process, the system itself with the problems of those systems are relatively small, in our usual use, the problem is mostly focused on our own established user database.

And, compared to the emphasis on the surface, in fact, we are more concerned about our own set up the user database, if the system database problems, and even instances of problems, the worst case we re-environment, but if our application of the user database is broken, it is not a re-environment can be solved. This involves the interests of the company, the seriousness of the problem is self-evident!

Gossip less, we speed into the topic of this article.

In the previous article we introduced the SQL Server instance startup process, and analyzed its detailed process, in this process, there is a step is very critical, is to load the recovery of the user database process, we can intercept this log information:

Above is a normal start of the individual user library process, SQL Server will be multi-threaded database startup, and in this process for consistency check to ensure that the database started to work properly.

There are a lot of problems in this process, and before I analyze the problem, I'll first describe several common states of SQL Server databases:

RECOVERING (in recovery):

This status indicates that after the data is started, recovery is taking place, that is, the recovery process in the above log, and the same as other relational databases, SQL Server writes the transaction log first for all database behavior, then modifies the in-memory data. Then through the background of a process at the appropriate time to write to the hard disk (Lazy write), so while the database is running, the data on the disk is not up-to-date, if this time off, in the next boot process SQL Server will be based on records in the transaction log, the disk The old data is rewritten, and the rewrite process is:

1. Redo Redo

2. Rollback and Revocation Undo/rollback

The purpose of the above is to ensure database consistency.

If there is a problem with the process above, it will go into the following state:

RECOVERY PENDING (pending restore):

This process is to suspend the process of recovering data, which is basically the reason why the database files used are not opened properly. This is the first thing to remember here, and I will reproduce the problem and give the solution.

If you can find the file or can open the file, but there is a problem with the file, the opportunity appears in the following state:

SUSPECT (question):

This state, I believe a lot of users if playing the database for a long time, will occasionally encounter, relative to other states, this state is the highest.

The reason is simple: The database file is broken.

When experiencing the above several states do not have problems, the above several states, the database is not used, will enter the following state:

Online:

This state should be the most expected, the database online, normal use, the default is the normal online status.

Of course, in addition to the database state that the above several databases themselves form, when our administrator processes the database also changes the state, here we incidentally mention:

OFFLINE (offline): Online status is offline, very simple, let the database offline, users can not use

RESTORING (in Restore): This state is simple and the administrator is restoring the database without explaining

EMERGENCY (Emergency): This state is also used by the administrator, that is, the database has a problem, it is trying to solve

In the above several states, occurs during the startup process, and the problem that occurs is the above RECOVERY PENDING (hang Restore),SUSPECT (questioned), RECOVERING (in recovery):

In turn, we look at:

RECOVERY PENDING (pending restore):

This state usually occurs because the database file is not found, or the file can not be found to access the permissions, we look at the problem error message:

In the database storage mode, divided into the main filegroup and the auxiliary file group and log files, in order to display the convenience we have deliberately built a test library to reproduce the part of the problem:

<1> Primary File group issues

When the primary filegroup file cannot be accessed, that is, the Cnblogstestdb.mdf file above, the following error is reported:

Let's look at the database first:

In the process of instance initiation, there happens to be a library showing the above we mentioned a State: RECOVERING (in the recovery), I gave the picture, of course, the situation is normal, sometimes refresh is normal, other user libraries are not shown because the library is too small, recovery time is too short, we can not capture.

In our view, the test library Cnblogstestdb we built above is inaccessible, so let's take a look at the error message in error:

The error message is obviously that this file cannot be accessed, and the exact name of this is an operating system error, and we look at the operating system error Record:

You can see that part of the error message is also visible in the Windows system log.

Solution :

The solution to this problem is still very simple, generally because of the permissions problem, only need to put the database administrator account group, the right to read and write permissions can be, and then restart the service:

The above situation is to find the database file, but can not open the database file, of course, it is possible to find the database file directly, the system will report the following error:

will give a 17204 error, the report cannot find the file error

Solution :

A, if you can find the best data file, copy to the wrong path, and then restart the instance

B, can not find the file, then you have to delete the library, re-create a new library with the same name, restore from the backup file

In general, the above problems occurred in the physical storage failure, of course, do not rule out some software operations, such as anti-virus software, and others for the mistake of deletion and other reasons. If there is no backup, this can be a big disaster, the basic can be determined that the possibility of a full restore is not high! So remember: the importance of backing up the database!

<2> Auxiliary File Group issues

Above the problem of the file is the primary filegroup of the database, when our database is hosted to a certain amount of data, we take a number of auxiliary filegroups to accommodate the data, let's look at the secondary filegroup problem:

The same prompt for the secondary filegroup does not open properly, or cannot find the associated secondary filegroups, how can we solve this problem?

In fact, SQL Server database auxiliary file storage mainly for the data content information of the database, some of the schema information about this library is placed in the primary (primary) filegroup, so we can first

Solution:

A, we will not open or inaccessible database files (auxiliary files) set to offline, and then the normal data file on-line, to ensure that in addition to the damaged part of the file's other library information can be accessed normally, we change by the following code:

ALTER DATABASE cnblogstestdb MODIFY FILE (name=cnblogstestdb2,offline)
GO
ALTER DATABASE Cnblogstestdb Set ONLINE
GO

In this way, we refresh the database so that we can access the correct data information properly:

When we are in the production environment, the production library does not start normally, at the moment of burn, take the above method to ensure that some of the data can be accessed normally is also a kind of slow discussion.

The following steps are to find the auxiliary file, and to ensure that there is normal access to the permissions, more importantly, the found secondary file can not be corrupted, and then copied to the error file given in the path, and then restart the instance, live the library.

b, of course, most of the time, we can not find the file, or the file is damaged, then we have to take a second solution, restore through backup, based on previous experience, the recommended measures are:

Make a backup of the database that you can access first, and then restore the problematic filegroups by using the filegroup recovery method.

<3> log file groups

In fact, from all the databases on the market, its own mechanism is to write the log first, and then write (lazy Write) to the disk through a process, in order to avoid the IO blocking, because we all know that the problem of disk IO has been the biggest bottleneck of all file read and write.

Therefore, the log file is an integral part of the database. When the database in the startup process, the record in the log to do a consistency check of the data, the beginning of the article is introduced.

So, what happens to our SQL Server database if the log file is inaccessible, or if something goes wrong?

Let's take a look at the database schema in simple mode, and I'll set our test library to Simple mode:

Use Cnblogstestdbgoalter DATABASE [Cnblogstestdb] SET RECOVERY simple with No_waitgo

Then we stop the instance and then delete the library's log file and restart

You can see that in the simple mode, if the log file errors occur during the startup process is not any problem, here is the reason we can find the answer in the error log file launch:

After the log analysis above, we can see that when the database is in simple mode, when the database is started, if any information related to the log is found, a log file is recreated to ensure the normal access of the database.

If this is how the integrity of our database is guaranteed, so, if the database is in simple mode, when our database is closed, the system will first write all of the committed transactions to disk, all the rollback is revoked.

The above can create a normal database log file prerequisites are two:1, the database is a simple mode, 2, the database is shut down properly, to ensure that the transaction is written to disk

Now we're looking at how the SQL Server database is handled if the recovery model is in "full" mode, the last time the database was not normal.

We first change the database to the full recovery model, stop the instance, then delete the log, and then start

Then we start, can see this time, the database does not normally access the error of the log information for:

The log information for this error is also logged for us under the Windows platform:

In fact, the above error is normal, because some of the database's transactional operations have been recorded in the transaction log, has not been written to the disk data page, the time of the outage, or abnormal shutdown, this is a SQL Server database can be met, but, And in the startup process cannot find the relevant transaction log with the dedicated rollback and write operation, so the data of the library is non-consistent, so SQL Server is not let us use the library, this kind of error occurs, we have the following solutions:

Solution:

A, if there is a backup, the best way is to restore the database backup or find the log file copy to the wrong path ( recommended )

b, if there is no backup, we can only repair the database by using the CHECKDB command ( not recommended )

The CHECKDB command in the above solution is a last resort, and I can clearly tell you that this command can cause data loss when used, and in large databases, the running cycle is long!

Of course, in the case of last resort, we also take, the process is as follows:

We first set the database to EMERGENCY (emergency) mode , and the single-user (Single_user) mode

Use Cnblogstestdbgoalter database cnblogstestdb set emergencygoalter database Cnblogstestdb set Single_usergo

With our above settings, the library is set to "emergency" mode, and only for single-user access, easy for us to repair data

Then we execute the CHECKDB command to fix the database.

DBCC CHECKDB (Cnblogstestdb,repair_allow_data_loss) GO

After the repair of this command, the database will create a new log for the system, but not guarantee the consistency of the transaction, that is, it will lose data, so it is not recommended a way!

And, in the process, if it is a large database, the repair process will be very long, of course, I can not give a long reference value, because the process there will be other errors need to repair.

So consider it as appropriate.

Of course, after the recovery is complete, don't forget to change the database back to multi-user mode

Use [Master]goalter DATABASE [Cnblogstestdb] SET  multi_user with ROLLBACK Immediatego

At this point, the problematic library will be able to access it normally.

----------------------------------------------------------domineering split-line--------------------------------------------------------- --------------

After experiencing the above file-level error, the process of database startup is often a data page-level error, relative to the above file error level, the error granularity in the data page is smaller, and basically does not reflect the database level, that is, in the case of data page level error, This data can be normally accessed, but only when accessing the wrong data page will be error, when we encounter such a mistake, how to solve it?

Below we analyze in turn, first we make a classic 824 error , the following part of the content of the database part of the basis, confined to space, we do not do a detailed description:

<1> First we create a new table in our test library, and we make the table a way to act on a data page, which means that a row of databases can host a data page in a database

Use cnblogstestdbgocreate TABLE [dbo]. [Testpage] (    [A] [int] null,    [b] [nvarchar] (3900) null) on [PRIMARY]

The script is simple, a table, two columns, a column int type, a column of nvarchar (3900), a row of data storage space is: 3900*2 (nvarchar (3900)) byte +4 (int) + 96 bytes (header) + 36 bytes (line offset) = 7932 bytes, we know that a data page stores information as 8k=8192 bytes, including other consumption so if the table row of data is populated, a row of data will occupy almost one data page.

Let's add three rows of data and view the page information:

--Insert three data insert [Testpage]values (1, REPLICATE ('A',3900) Insert [Testpage]values (2, REPLICATE ('B',3900) Insert [Testpage]values (3, REPLICATE ('C',3900)) Go--View page Information dbcc TRACEON (3604)--To view a collection of pages in a library DBCC EXTENTINFO (Cnblogstestdb,[testpage])

As you can see, there are now three data pages in the table, so let's see if the data pages are nearly stained.

shows that by scanning the table information, a total of 3 data pages, the amount of data in each data page storage accounted for 96.55%, that is, basically filled.

Of course, we can also view the specific contents of each page through the DBCC PAGE command, and we'll simply look at a page numbered 90:

You can see from the above command that the data on the first row in the table is stored in the data page, and stored in the database store file in 16-binary encoding.

Of course, if you feel this way is not intuitive, you can use a small tool for data page view, here I recommend the use of internal views (this tool in the Birch Blog in detail), can more visually display the data storage page information:

Here we can click on the data content page of the first row above to view it

After analyzing the above steps, my goal is to reproduce the Classic error 824 errors that are frequently encountered during SQL Server startup or on-line databases

The above process is a principle, because we must know the underlying principles of data storage in order to understand the causes of this error and to find the correct processing method.

Next, let's reproduce the cause of the error, and we know that in my new test table there are two fields: A and B, and A is of type int, B is the nvarchar type

Then we introduced the underlying storage mechanism, and I now change the shape data content Store of the first column a field to a string type, which in turn destroys the contents of the data page .

I first stop the service, then use the file Editing tool, modify the contents of this data page, the content of the data page is the hexadecimal content, of course, before I break this part of the data page I make a full backup

Then modify the data page information, here I use the UltraEdit text Editing tool, open the file, find the data page content

We'll take the above source data a bit more to get this data page out of the way.

We save and then restart the database to see

This is our usual common 824 error process, and this process may be caused by bad disk, or mistakenly modify the file, and many other reasons, but this problem is still more common

Of course, the impact of this data page damage may not be the library level, it will not make the database inaccessible, the other table is able to access the normal, but only in the operation of this corrupted data page when the error, but sometimes these data page damage to the business impact may be fatal, So we're going to get it out.

Solemn hint : The above process can also be correct to change data page data, but if there is no precise grasp, basically can make the database paralyzed, I was to reproduce the problem before modifying the underlying metadata, so in their own production library do not mess!

A consistency check occurs during database startup, so the error should be logged in the error log file of the errors, and we'll look at:

Error logs under the Windows platform:

Of course, the problem may occur during startup, such as a disk bad path, and a series of data pages may not be accessible. So SQL Server logs these corrupted pages into the msdb system library, where we find the corrupted page collection in this library:

Now that we have reproduced the classic 824 error, how do we solve this problem?

Workaround:

A, if this problem occurs when the page is a data hosting page, also said that the page is stored as content data or a clustered index of leaf node data, and there is a mirror, version in SQL Server2005 above, then this error can be ignored, SQL Server can automatically help you fix this error.

b, if the problem occurs in an environment without mirroring, it is necessary to distinguish whether the page is corrupted by the clustered index leaf node data, if so, it is simple, directly rebuild the index is good, if not, then this kind of solution can not be solved, the method is as follows:

Using the DBCC PAGE command to view the contents of the current data page, depending on which object the page is located on Objectid, the value of METDATA:INDEXID determines whether it is a node value in the index tree, and if greater than 0 is the index value, the index can be rebuilt at this time. Like what:

We find the object that the page belongs to from the database, based on the Objectid of the page.

C, if the above solutions can not meet, that only to adopt such a scheme, we can use the database backup to restore, of course, in order to minimize the database offline, we'd better take the data page restore, the simplest way, the fastest restore speed, to minimize the database offline time, and ensure data integrity.

Here is a hint: in the SQL Server2012 version, SSMS does not provide an image of the data page restore method, in the later version of SQL Sever, there is an image of the interface operation.

Therefore, we can only restore with the following script:

RESTORE DATABASE cnblogstestdbpage='1:90'= N'F:\SQLTest \cnlogstestdb.bak'withNORECOVERY

Of course there is a transaction log, update backup, you need to restore all of the process of backup, do not forget to back up the tail log.

But this approach also has limitations:

If the corrupted data page is

1. Allocation page: GAM, Sgam, and PFS pages

2. Start page for all data files

If the above two types of corruption occur, you cannot restore the page by this backup. If this is the case, it is recommended that you consider finding a suitable time period for a full-library recovery operation. (recommended)

D, the above situation is in the case of a backup, if there is no database backup, then we can only choose the last move, that is the DBCC CHECKDB command, the same as above, this method may cause data loss, it is not recommended, if you can tolerate data loss, The process used refers to the upper part of the article. (not recommended)

At this point, we've completed a SQL Server startup process or a classic error 824 error that we usually encounter most often, let's summarize:

824 Error Reason: Most of the data page corruption caused by disk storage caused SQL Server to read the error.

Causes the error scenario: This error often occurs in the case of a bad disk, a sudden power outage, and so on.

----------------------------------------------------------domineering split-line--------------------------------------------------------- --------------

There is also a 823 error related to the 824 error, let's introduce the error message
Because of the limitations of the scene, I will not reproduce the error, here I detail the reasons and principles of these two errors, you can, if encountered, the way to solve the basic is consistent, you can refer to the above 824 error resolution method.

Each time SQL Server writes to the page, it calculates a checksum based on the data in the page, which is stored in the page together. The next time you read the page, based on the page data read, a new check value is calculated. If the data written and read are identical, then the two checksum values are equal. If the two checksum values are not equal, it means that the last data written by SQL Server is different from the one read, and the data being read is problematic.

A 823 error means that SQL Server encountered a failure in Windows read or write requests when it requested a page to read or write to the operating system. So most of the reason for this problem stems from the operating system level, or, more specifically, the physical file corruption caused by this error, such as device driver causes.

824 error is when reading the data page, found that the data page has a problem, such as reading out the checksum value is not correct.

When the 823 and 824 errors described above occur in large areas, or if the direct partial data file is completely broken, the database suspect "questioned" state occurs during SQL Server startup.

after my repeated data page destruction and devastation, I have successfully put our test library into a questioning state , we look at SUSPECT (questioned) the State Library:

Here I direct DBCC CHECKDB command try to recover under look

So here, all we have to do is avoid the above mistakes. If there is a situation in my production library, and then there is no database backup, then the rest of what you have to do: I guess is to prepare a resume ....

Conclusion

The end of this article ... The article is mainly to analyze the SQL Server startup process, load user database, encountered a series of problems, the text part of the content needs to have a certain database basic knowledge to read, space is limited, we did not do in-depth analysis, such as the above several important commands dbcc PAGE .... DBCC CHECKDB. And so on, any one can write a series of content, we focus on the solution of the problem, and the cause of the problem analysis, the following article will describe this series of command function, as well as the correct use of skills.

.... This article takes four days to complete .... Part of the database errors are my efforts to spend energy step-by-step adjustment, the purpose is to show the actual error details, in fact, the problem is easy to solve, the process of the recurrence of complex.

If you use SQL Server frequently, these problems are often encountered, so we have to remember the corresponding solution, to be prepared!

Of course, personal ability is limited, part of the improper, but also hope to point out the spirit of enlighten.

At the end of this article, we give the related article:

The SQL Server database startup process you don't know, and the analysis and resolution techniques for various issues that don't start up

If you read this blog, feel that you have something to gain, please do not skimp on your " recommendation ".

The SQL Server database startup process You do not know (a problem with the user database loading process)

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.