I. Server 1. Status Monitoring (1) is the server accessible? (2) is the corresponding database service enabled? (3) errors or alarms in Operating System Event Logs (4) Monitoring of server status of available disk space, whether using third-party tools or custom scripts, we recommend that you deploy them on a dedicated monitoring machine, because if the server DOW
I. Server 1. Status Monitoring (1) is the server accessible? (2) is the corresponding database service enabled? (3) errors or alarms in Operating System Event Logs (4) Monitoring of server status of available disk space, whether using third-party tools or custom scripts, we recommend that you deploy them on one or more dedicated monitoring machines, because if the server DOW
I. Servers
1. Status Monitoring
(1) is the server accessible?
(2) is the corresponding database service enabled?
(3) errors or alarms in Operating System Event Logs
(4) available disk space
Server Status Monitoring, whether using third-party tools or custom scripts, is recommended to be deployed on one or more dedicated monitoring machines, because if the server is DOWN or fails, any local program/script may be unable to run, thus losing the significance of monitoring.
Some people even think about writing SQL statements in the local SQL Server to monitor the Server status. Although it can be implemented, it is a bit self-contradictory. Maybe there is another way of thinking. When the Server is normal, SQL Server sends an email notification. If there is no email, it means the Server is abnormal. If there are many servers, how can I know who hasn't sent an email?
2. Performance monitoring
(1) IO pressure
(2) memory usage
(3) CPU usage
(4) network bandwidth usage
This 1, Hong Kong server, 2, Hong Kong server, and 3, 4 are arranged in the order prone to bottlenecks. Due to disk read/write speed restrictions, I/O is usually the most prone to bottlenecks, many of our optimizations are for IO, such as index optimization and read/write splitting.
From the DBA's point of view, some server performance monitoring, if possible, can be done from the database layer.
Ii. Database
1. Status Monitoring
(1) Whether the database can be opened (Database status)
(2) is the database backup successful?
(3) errors or alarms in SQL Server/SQL Server Agent error logs
(4) SQL Agent job running status
(5) database consistency check results (dbcc checkdb)
(6) database restoration test results
The following status monitoring items usually need to be compared with the average value/baseline value of the system. Otherwise, there is no alarm standard.
(7) Number of connections, requests, and transactions
(8) database/File Usage, size, and available space
(9) Table usage, number of rows, and occupied space
2. Performance monitoring
(1) Whether a query has been running for a long time (generally, a query that is not blocked by any requests and is inefficient)
(2) Whether the query is blocked (it may run fast independently, but it takes a long time to wait with other requests due to lock wait)
(3) Is there a deadlock? (the "deadlock" mentioned by developers/users is usually blocking/waiting. Database deadlocks usually seldom make users feel waiting. Hong Kong servers generally interrupt requests, because it is killed)
(4) Whether there is a wait (generally refers to the waiting of various resources. The intersection of waiting and blocking is the lock wait)
(5) Are there missing/unused/inefficient indexes and index fragmentation?
(6) Are there any expired statistics?
(7) Is there any competition for database files (such as log files and tempdb contention)
(8) Are there queries that consume a large amount of CPU and I/O reads and writes (I/O consumes a lot, that is, queries that consume a large amount of memory)
Iii. Others
(1) If a high availability policy is deployed, images, replication, log transmission, and cluster status will be monitored;
(2). Some business data has strict consistency requirements, and business data validation is also recommended in monitoring alarms;
(3 ). the availability of database/instance options, parameter settings, login, user, linked server, and other objects is usually checked in the health check every year/quarter, if you are not at ease, you can also place them in monitoring alarms.