A customer recently reported that after a power outage in their server room, the server rebooted but the database service failed to start automatically. This was the first time we had run into this situation, so to check whether the outage had corrupted the database files, we obtained the database logs from the customer's server for analysis.
How the database works
Before analyzing why the database failed to start, here is how the database service works.
The database is divided into six major services:
The six services depend on one another, and these dependencies determine the startup order:
Why the service failed to start automatically
The customer provided two logs. One was recorded at boot, when the database failed to start. The other was recorded after boot, when the database service was started manually and came up successfully.
Look at the first log:
As the log shows, the snapshot data service started successfully, but there is no log output after that, which suggests that the problem lies either in the snapshot service or in the equation service that starts next. In the other log, all services start and run normally, which rules out file corruption caused by the unexpected power loss.
The database can be started manually, so the database files are intact. On the first startup, however, only the logger, historian, and snapshot services started successfully, and because there is no log output after that, it is hard to determine the cause from this log alone.
Since the database itself logged nothing further, the next place to look is the operating system's event log.
In Computer Management, open Event Viewer and select the System log, as shown:
By default, the operating system records a system event when a service fails, for example when it fails to start.
The event contains the following information:
7022
The entry has an event ID and an event description. Searching Google for them did not turn up a solution, so the only option was to go back to the database log.
Analyzing the log again reveals the following:
From start to finish, the snapshot service took two and a half minutes to start, which is very long for a Windows service. Given this and the description in the system event, the service startup probably timed out.
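The comparison can be sketched in a few lines of Python. The timestamps and their format below are hypothetical placeholders, not values from the customer's log; the 30-second figure is the Windows default mentioned later in this article.

```python
from datetime import datetime

# Default Windows service startup timeout, in seconds.
DEFAULT_SERVICE_TIMEOUT_S = 30


def startup_seconds(start_stamp, end_stamp, fmt="%Y-%m-%d %H:%M:%S"):
    """Return the seconds elapsed between two log timestamps."""
    start = datetime.strptime(start_stamp, fmt)
    end = datetime.strptime(end_stamp, fmt)
    return (end - start).total_seconds()


def exceeds_timeout(elapsed_s, timeout_s=DEFAULT_SERVICE_TIMEOUT_S):
    """True if a startup that took elapsed_s seconds is longer than the timeout."""
    return elapsed_s > timeout_s


# Hypothetical timestamps, two and a half minutes apart.
elapsed = startup_seconds("2000-01-01 08:00:00", "2000-01-01 08:02:30")
print(elapsed, exceeds_timeout(elapsed))
```

A startup of 150 seconds is five times the default limit, which is consistent with the timeout guess.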
To verify this guess, add a sleep(60000) to the database service's startup code, which is long enough to trigger the timeout. Start the service from Service Manager: after a while the service fails to start, and Event Viewer shows the "service hung on starting" message. This largely confirms that the database service failed to start because of a timeout.
Why the service startup timed out
We now know that the service failed to start because of a timeout, but we still need to find out why startup times out at boot.
Again we start from the log. Fortunately, it contains verbose output, which shows two time-consuming operations:
Locking the historical data cache and locking the history write cache each take about a minute. Together these two operations account for most of the startup time, so they are what causes the timeout.
So what are these two operations actually doing? They lock the caches in physical memory so that the operating system avoids swapping them out to disk. Right after boot, none of the cached data has been loaded into memory yet, so locking it forces the operating system to load the cache data from disk into memory. It looks complicated, but in practice the operating system is simply reading the disk. A mechanical disk's read speed is limited and quite slow, so when the cache files are very large, the time spent purely on disk reads becomes very long. In this case the two cache files are around 15G, so it takes even longer.
This analysis explains why the service startup timed out.
Why the second start succeeded
That is not the whole story. We know why the startup timed out, but why did the second start go smoothly? Let us keep analyzing.
Look at the log from the second start:
In this log, the memory-locking operations that each took up to 60 seconds at boot now take only 3 seconds, which is very fast.
As mentioned above, locking memory really means reading the disk, and the disk is slow: reading 15G of cache files in 3 seconds is simply impossible. The only possibility is that the disk was not read at all.
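A quick back-of-the-envelope check makes the point concrete. The sketch below computes the throughput that reading 15G in 3 seconds would require. The mechanical-disk figure of 150 MB/s is an assumption chosen for illustration, not a measurement from the customer's server.

```python
def required_throughput_mb_s(size_gb, seconds):
    """Throughput in MB/s needed to read size_gb gigabytes in the given time."""
    return size_gb * 1024 / seconds


# ASSUMPTION: rough sequential read rate of a mechanical disk, in MB/s.
ASSUMED_HDD_MB_S = 150

needed = required_throughput_mb_s(15, 3)
print(f"needed: {needed:.0f} MB/s, assumed disk: {ASSUMED_HDD_MB_S} MB/s")
print("plausible from disk:", needed <= ASSUMED_HDD_MB_S)
```

Under that assumption the required rate is more than thirty times what the disk can deliver, so the data must have come from somewhere other than the disk.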
This is a result of how Windows manages memory. When a memory-mapped file is unmapped, its memory is not freed immediately; the operating system decides when to free it. When the snapshot service maps the same file again shortly afterwards, the operating system finds that the file's contents are still in memory and does not load them again, precisely because reading the disk is so slow. At boot, the memory-mapped file is being loaded for the first time and is not yet in memory, so the disk has to be read once. That is why locking the memory takes 60 seconds at first boot but only 3 seconds on the second start.
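The effect can be explored with a small experiment. The sketch below is not the snapshot service's code: it maps a file with Python's mmap module, touches one byte per page so that every page has to be resident, and times two consecutive passes. The 4096-byte page size and the function names are assumptions made for illustration.

```python
import mmap
import os
import tempfile
import time


def timed_map_and_touch(path, page_size=4096):
    """Map the file read-only, touch one byte per page, return (seconds, pages)."""
    start = time.perf_counter()
    pages = 0
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as view:
            for offset in range(0, len(view), page_size):
                _ = view[offset]  # reading the byte makes the page resident
                pages += 1
    return time.perf_counter() - start, pages


def demo(size_bytes=8 * 1024 * 1024):
    """Create a scratch file, map it twice, and return both measurements."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(os.urandom(size_bytes))
        first = timed_map_and_touch(path)
        second = timed_map_and_touch(path)
        return first, second
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(demo())
```

Note that a freshly written scratch file is usually already cached, so both passes here are likely to be fast. To observe the slow first pass described above, point timed_map_and_touch at a large existing file right after a reboot, then run it a second time.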
Setting the Windows service startup timeout
The service startup timeout on Windows defaults to 30 s. When a service takes longer than this to start, Service Manager considers the service abnormal, treats it as a startup failure, and logs a system event. For some services, such as our snapshot service, this time is too short, and the registry has to be modified to solve the problem. The registry value is HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\ServicesPipeTimeout. The value may not exist; if it does not, add it. Its type is DWORD and its unit is milliseconds.
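As an illustration, the following sketch generates the text of a .reg file that sets this value. The 180000 ms (three minutes) used in the example is a hypothetical choice, not a figure from the original fix, and the helper name is made up for this example.

```python
# Registry key that holds the ServicesPipeTimeout value.
CONTROL_KEY = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control"


def services_pipe_timeout_reg(timeout_ms):
    """Return .reg file text that sets ServicesPipeTimeout to timeout_ms."""
    if not 0 < timeout_ms <= 0xFFFFFFFF:
        raise ValueError("timeout_ms must be a positive value that fits in a DWORD")
    return (
        "Windows Registry Editor Version 5.00\n"
        "\n"
        f"[{CONTROL_KEY}]\n"
        # .reg files store DWORD values as eight hexadecimal digits.
        f'"ServicesPipeTimeout"=dword:{timeout_ms:08x}\n'
    )


# Hypothetical value: 180000 ms = 3 minutes.
print(services_pipe_timeout_reg(180000))
```

Importing the resulting file requires administrator rights, and the new timeout is generally picked up only after the machine is restarted.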
Of course, this timeout applies to all services, so changing it is not the best solution. It is better to optimize how the snapshot program loads its caches at boot in order to shorten startup, or to move long-running operations out of the startup path.
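One way to read that last suggestion is to let the service report that it has started right away and run the slow cache load in the background. The sketch below shows the pattern with a plain Python thread. SnapshotService and its methods are hypothetical stand-ins, not the real program, and the sleep only simulates the slow operation.

```python
import threading
import time


class SnapshotService:
    """Hypothetical stand-in for the snapshot service, for illustration only."""

    def __init__(self, load_cache):
        self._load_cache = load_cache          # the slow operation
        self.cache_ready = threading.Event()   # set once the cache is loaded
        self._worker = None

    def start(self):
        """Return quickly; the slow cache load continues in the background."""
        self._worker = threading.Thread(target=self._warm_up, daemon=True)
        self._worker.start()

    def _warm_up(self):
        self._load_cache()
        self.cache_ready.set()

    def wait_until_ready(self, timeout=None):
        """Block until the cache is loaded; return False if the timeout expires."""
        return self.cache_ready.wait(timeout)


# Simulate a slow cache load with a half-second sleep.
service = SnapshotService(lambda: time.sleep(0.5))
began = time.perf_counter()
service.start()
start_cost = time.perf_counter() - began
ready = service.wait_until_ready(timeout=5)
print(f"start() took {start_cost:.3f}s, cache ready: {ready}")
```

The trade-off is that requests arriving before the cache is ready must either wait on cache_ready or be served without the cache.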
Summary
- You can change the Windows service startup timeout by modifying the registry.
- Windows memory management: when a memory-mapped file is unmapped, its memory is not released immediately, and the operating system decides when to free it. If the program maps the same file again shortly afterwards, the operating system finds that it is still in memory and does not load it again.
And the most important point of all: logging matters. Young developers, do yourselves a favor and add logging to your programs!
Series Links
Windows Service Series--Creating a Windows service
Windows Service Series--Registering and uninstalling Debug and Release builds, and how it works
Windows Service Series--Causes of and solutions for startup failure of Windows services without a COM interface
Windows Service Series--Analysis of the service run and stop process
Windows Service Series--Windows service tips
Windows Service Series--Command-line management of Windows services
Windows Service Series--Windows service startup timeout