The monitoring of database servers can be broadly divided into two categories:
(1) Status monitoring: Is the database server running in a healthy way?
(2) Performance monitoring: Healthy operation at the same time, there is no performance problems? Could it be quicker?
a . Server
1. Status Monitoring
(1) is the server accessible?
(2) is the database service enabled?
(3) Errors or alarms in the operating system event log
(4) Disk free space
2. Performance Monitoring
(1) IO pressure
(2) Memory usage
(3) CPU usage
(4) Network bandwidth consumption
This 1,2,3,4 is in the order of the bottleneck, due to the disk read and write speed limit, usually Io is the most prone to bottlenecks, we do many optimizations are also for the IO, such as: Index optimization, read and write separation, and so on.
two . Database
1. Status Monitoring
(1) database can be opened (database status)
(2) Errors or alarms in the SQL Server/sql Server Agent error log
(3) database/Filegroup free space
(4) SQL Agent job Run status
(5) Database backup has no success
(6) Results of database restore tests
(7) Results of database consistency check (DBCC CHECKDB)
The following status monitoring, usually need to compare with the system average/baseline value is meaningful, otherwise there is no alarm standard.
(8) Number of connections, number of requests, number of transactions, number of threads
(9) Size of database/File/table
(10) Table usage, number of rows
2. Performance Monitoring
(1) There is no long-running query (generally referred to as not being blocked by any request, inefficient query)
(2) There are no blocked queries (may run very quickly, but together with other requests, it takes a long time to have a lock waiting)
(3) There is no deadlock (developer/account "deadlock" is usually blocking/waiting, database deadlock usually rarely makes the user feel waiting, generally the request is interrupted, because the kill dropped)
(4) There is no waiting (generally refers to a variety of resources waiting, waiting and blocking the intersection is the lock wait)
The following performance monitoring, usually in performance optimization as a reference, or such as: Index defragmentation/Statistics update, directly set as a background maintenance job, not directly alarm.
(5) There are no missing/unused/inefficient indexes, and index fragmentation
(6) There are no outdated statistical information
(7) There is no contention for database files (for example: Log files, tempdb contention)
(8) There is no CPU consuming large, IO read and write more queries (usually IO consumes large, that is, memory consumption of large queries)
three . other
(1). If there is a deployment of highly available policies, there will be mirroring, replication, log shipping, cluster status monitoring;
(2). Some business data have strict consistency requirements, the calibration of business data, it is best to do in the monitoring alarm inside;
(3). The availability of objects such as database/instance options, parameter settings, linked servers, and so on, is usually checked in the health check of each year/quarter, and if not, it can also be placed in a monitored alarm.
Four . How do I deploy monitoring?
1. do not select Scripts / commands for dependencies
To monitor whether the service is started as an example, the script is as follows:
(1) SQL Extended Stored procedure
123456789 |
--参数1: QueryState 检查服务状态/ Start启动服务/ Stop停掉服务
--参数2: 服务名
exec
master.dbo.xp_servicecontrol
‘QueryState‘
,
‘MSSQLServer‘
exec
master.dbo.xp_servicecontrol
‘QueryState‘
,
‘SQLServerAgent‘
exec master.dbo.xp_servicecontrol
‘QueryState‘
,
‘SQLBrowser‘
exec
master.dbo.xp_servicecontrol
‘QueryState‘
,
‘NetLogon‘
EXEC
xp_servicecontrolN
‘Stop‘
, N
‘SQLServerAGENT‘
EXEC xp_servicecontrolN
‘Start‘
,N
‘SQLServerAGENT‘
|
(2) SQL invoke operating system commands
12345678910 |
if OBJECT_ID(
‘tempdb..#tmp_started_services‘
)
is
not
null
drop
table #tmp_started_services
create
table
#tmp_started_services(started_servicesvarchar(255))
insert
into
#tmp_started_services(started_services)
exec
master..xp_cmdshell
‘net start‘
select
*
from
#tmp_started_services
where
LTRIM(RTRIM(started_services))
like
‘SQL%‘
|
If SQL Server does not start, these scripts will not run at all, and how to monitor it?
Perhaps, there will be such a train of thought, when the server is normal, send an email notification, if not received the message that the server is not normal, but if there are many servers, how to know who did not send e-mail it?
2. deployed on a dedicated / Multi-monitor machine
Server status monitoring, whether using a third-party tool or a custom script, is recommended to be deployed on a dedicated monitoring machine to monitor the target machine remotely.
Because: if the server down or failure, may be the local program/script can not run, and how to monitor it?
At last
Based on the above monitoring list, you also need to automate monitoring and alert you when problems are found.
Transferred from: http://blog.51cto.com/qianzhang/1256127
SQL Server Monitoring Checklist