Identify and Alert for Long-Running Agent Jobs
Being a DBA is like being a train conductor. One of the biggest responsibilities is making sure all jobs are running as expected, or making sure "all trains are running on time," so to speak. As my partner-in-crime Devin Knight (Blog | Twitter) posted earlier, we have come up with a solution to identify and alert when SQL Agent jobs are running longer than expected.
The need for this solution came from the fact that, although we have alerts for failed Agent jobs, we had a process pull a Palin and go rogue on us. The job is supposed to process a cube, but since it never failed, we (the admins) weren't notified. The only way we found out was when a user finally alerted us and said, "The cube hasn't been updated in a couple of days, what's up?" Sad trombone.
As Devin mentioned in his post, the code below is very much a version 1 product, so if you have any modifications or suggestions, have at it. We've documented it in-line so you can figure out what the code is doing. Some caveats:
This solution has been tested and validated on SQL Server SP4 and 2008 (R2).
The code requires a table to be created in a database. I've set up a dbadmin database on all servers for custom DBA scripts such as this one, Brent Ozar's Blitz script, Ola Hallengren's maintenance solution, Adam Machanic's sp_whoisactive, etc. You can use any database you like to keep your scripts in, but be aware of the USE statement at the top of this code.
This solution requires Database Mail to be set up and configured.
To set up this solution, create an Agent job that runs every few minutes (we're using 5) to call this stored procedure.
FYI, I set the mail profile name to be the same as the server name. One: it makes it easy for me to standardize naming conventions across servers. Two: it lets me be lazy and write code like the line that sets the mail profile name. If your mail profile is named differently, make sure you change it there.
Thresholds: this is documented in the code, but I'm calling it out anyway. For any job whose average runtime is 5 minutes or less, the threshold is the average runtime + 10 minutes (e.g., a job that runs an average of 2 minutes would have an alert threshold of 12 minutes). For jobs that average more than 5 minutes, the threshold is controlled by the @JobLimitPercentage variable, with a default value of 150% of the average. For example, a job that averages N minutes would have an alert threshold of 1.5 × N minutes.
If a job triggers an alert, its information is inserted into a table. Subsequent runs of the stored procedure then check that table to see whether the alert has already been sent. We did this to avoid having admins emailed on every subsequent run of the stored procedure.
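The threshold rule is easiest to see with numbers. Below is a small, self-contained T-SQL sketch that applies the same CASE logic as the procedure to a few average runtimes. The sample averages are hypothetical values chosen for illustration. @JobLimitPercentage is set to the 150 default. The sketch assumes SQL Server 2008 or later, for the VALUES row constructor and the inline DECLARE initializer.

```sql
-- Sketch: apply the alert-threshold rule to hypothetical average runtimes (minutes).
-- The averages below are illustrative sample values, not real job history.
DECLARE @JobLimitPercentage FLOAT = 150;  -- same default as usp_LongRunningJobs

IF OBJECT_ID('tempdb..#Thresholds') IS NOT NULL
    DROP TABLE #Thresholds;

SELECT v.AvgDurationMin,
       CASE
           -- Average of 5 minutes or less: limit is average + 10 minutes
           WHEN v.AvgDurationMin <= 5 THEN v.AvgDurationMin + 10
           -- Average above 5 minutes: limit is average * percentage
           ELSE v.AvgDurationMin * (@JobLimitPercentage / 100)
       END AS DurationLimit
INTO #Thresholds
FROM (VALUES (2), (5), (6), (20), (60)) AS v (AvgDurationMin);

SELECT AvgDurationMin, DurationLimit
FROM #Thresholds
ORDER BY AvgDurationMin;
```

With the 150 default, this returns limits of 12, 15, 9, 30 and 90 minutes for averages of 2, 5, 6, 20 and 60 minutes. Note the step at the 5-minute boundary. A job averaging 5 minutes gets an alert limit of 15 minutes, while one averaging 6 minutes gets a limit of only 9.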
Code (WARNING: This code is currently beta and subject to change as we improve it)
Last script update: 7/12/2012
Change log: (7/12/2012) Updated the code to deal with "phantom" jobs that weren't really running, with improved logic to handle this. Beware: it uses the undocumented extended stored procedure xp_sqlagent_enum_jobs.
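Because xp_sqlagent_enum_jobs is undocumented, you may want to see what it reports on your server before relying on it. The sketch below lists only the jobs that SQL Agent reports as executing (job_state = 1).

- The result-set layout is an assumption copied from the table variable in the script, and it may differ between versions.
- The meaning of the arguments (1, '') is inferred from how msdb's own procedures call it.
- SQL Agent must be running for the call to succeed.

```sql
-- Sketch: list jobs SQL Agent currently reports as executing.
-- xp_sqlagent_enum_jobs is undocumented; this column layout mirrors the
-- table variable in usp_LongRunningJobs and may differ across versions.
DECLARE @enum_jobs TABLE (
    job_id UNIQUEIDENTIFIER NOT NULL,
    last_run_date INT NOT NULL,
    last_run_time INT NOT NULL,
    next_run_date INT NOT NULL,
    next_run_time INT NOT NULL,
    next_run_schedule_id INT NOT NULL,
    requested_to_run INT NOT NULL,  -- BOOL
    request_source INT NOT NULL,
    request_source_id SYSNAME COLLATE database_default NULL,
    running INT NOT NULL,           -- BOOL
    current_step INT NOT NULL,
    current_retry_attempt INT NOT NULL,
    job_state INT NOT NULL);

-- Same arguments as the procedure; assumed to be an "is sysadmin" flag
-- (return all jobs) and a job-owner name
INSERT INTO @enum_jobs
EXECUTE master.dbo.xp_sqlagent_enum_jobs 1, '';

SELECT j.name AS JobName, e.current_step, e.job_state
FROM @enum_jobs AS e
INNER JOIN msdb.dbo.sysjobs AS j ON j.job_id = e.job_id
WHERE e.job_state = 1;  -- 1 = Executing
```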
--Create LongRunningJobs table
USE [dbadmin]
GO

IF OBJECT_ID('dbo.LongRunningJobs') IS NOT NULL
    DROP TABLE dbo.LongRunningJobs

CREATE TABLE [dbo].[LongRunningJobs] (
    [ID] [int] IDENTITY(1,1) NOT NULL,
    [JobName] [sysname] NOT NULL,
    [JobID] [uniqueidentifier] NOT NULL,
    [StartExecutionDate] [datetime] NULL,
    [AvgDurationMin] [int] NULL,
    [DurationLimit] [int] NULL,
    [CurrentDuration] [int] NULL,
    [RowInsertDate] [datetime] NOT NULL
) ON [PRIMARY]
GO

ALTER TABLE [dbo].[LongRunningJobs]
    ADD CONSTRAINT [DF_LongRunningJobs_Date] DEFAULT (GETDATE()) FOR [RowInsertDate]
GO

--Create stored procedure usp_LongRunningJobs
USE [dbadmin]
GO

/****** Object: StoredProcedure [dbo].[usp_LongRunningJobs] Script Date: 07/12/2012 08:16:01 ******/
IF EXISTS (SELECT * FROM sys.objects
           WHERE object_id = OBJECT_ID(N'[dbo].[usp_LongRunningJobs]') AND type IN (N'P', N'PC'))
    DROP PROCEDURE [dbo].[usp_LongRunningJobs]
GO

SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO

-- =============================================
-- Author:      Devin Knight and Jorge Segarra
-- Create date: 7/6/2012
-- Description: Monitors currently running SQL Agent jobs and
--              alerts admins if runtime passes set threshold
-- Updates:     7/11/2012 Changed method for capturing currently running jobs
--              to use master.dbo.xp_sqlagent_enum_jobs 1, ''
-- =============================================
CREATE PROCEDURE [dbo].[usp_LongRunningJobs]
AS
--Set mail profile (sp_send_dbmail expects sysname)
DECLARE @MailProfile sysname
SET @MailProfile = @@SERVERNAME --Replace with your mail profile name
--Set email recipients (semicolon-separated list)
DECLARE @MailRecipients VARCHAR(MAX)
SET @MailRecipients = 'DBAGroup@adventureworks.com'
--Set limit as a percentage of average runtime (applies to all jobs)
--NOTE: The percentage limit is applied to all jobs whose average runtime is greater than 5 minutes;
--otherwise the limit is simply average + 10 minutes
DECLARE @JobLimitPercentage FLOAT
SET @JobLimitPercentage = 150 --Use whole percentages greater than 100

--Create intermediate work table for currently running jobs
DECLARE @currently_running_jobs TABLE (
    job_id UNIQUEIDENTIFIER NOT NULL,
    last_run_date INT NOT NULL,
    last_run_time INT NOT NULL,
    next_run_date INT NOT NULL,
    next_run_time INT NOT NULL,
    next_run_schedule_id INT NOT NULL,
    requested_to_run INT NOT NULL, --BOOL
    request_source INT NOT NULL,
    request_source_id SYSNAME COLLATE database_default NULL,
    running INT NOT NULL, --BOOL
    current_step INT NOT NULL,
    current_retry_attempt INT NOT NULL,
    job_state INT NOT NULL)
--job_state: 0 = Not idle or suspended, 1 = Executing, 2 = Waiting for thread, 3 = Between retries,
--4 = Idle, 5 = Suspended, [6 = WaitingForStepToFinish], 7 = PerformingCompletionActions

--Capture jobs currently running (undocumented extended stored procedure)
INSERT INTO @currently_running_jobs
EXECUTE master.dbo.xp_sqlagent_enum_jobs 1, ''

--Temp table exists check; a global temp table is used because
--sp_send_dbmail runs @query in a separate session
IF OBJECT_ID('tempdb..##RunningJobs') IS NOT NULL
    DROP TABLE ##RunningJobs

CREATE TABLE ##RunningJobs (
    [JobID] [UNIQUEIDENTIFIER] NOT NULL,
    [JobName] [sysname] NOT NULL,
    [StartExecutionDate] [DATETIME] NOT NULL,
    [AvgDurationMin] [INT] NULL,
    [DurationLimit] [INT] NULL,
    [CurrentDuration] [INT] NULL)

INSERT INTO ##RunningJobs (JobID, JobName, StartExecutionDate, AvgDurationMin, DurationLimit, CurrentDuration)
SELECT jobs.job_id AS JobID,
       jobs.name AS JobName,
       act.start_execution_date AS StartExecutionDate,
       AVG(dur.RunMinutes) AS AvgDurationMin,
       CASE
           --If job average is 5 minutes or less, then limit is avg + 10 minutes
           WHEN AVG(dur.RunMinutes) <= 5 THEN AVG(dur.RunMinutes) + 10
           --If job average is greater than 5 minutes, then limit is avg * limit percentage
           ELSE AVG(dur.RunMinutes) * (@JobLimitPercentage / 100)
       END AS DurationLimit,
       DATEDIFF(MI, act.start_execution_date, GETDATE()) AS CurrentDuration
FROM @currently_running_jobs crj
INNER JOIN msdb.dbo.sysjobs AS jobs ON crj.job_id = jobs.job_id
INNER JOIN msdb.dbo.sysjobactivity AS act ON act.job_id = crj.job_id
    AND act.stop_execution_date IS NULL
    AND act.start_execution_date IS NOT NULL
    --Only the current Agent session; older sessions can leave stale rows
    AND act.session_id = (SELECT MAX(session_id) FROM msdb.dbo.syssessions)
INNER JOIN msdb.dbo.sysjobhistory AS hist ON hist.job_id = crj.job_id
    AND hist.step_id = 0
--run_duration is stored as an HHMMSS integer; convert it to whole minutes
CROSS APPLY (SELECT (hist.run_duration / 10000) * 60 + (hist.run_duration / 100) % 100 AS RunMinutes) AS dur
WHERE crj.job_state = 1
GROUP BY jobs.job_id, jobs.name, act.start_execution_date
HAVING CASE
           WHEN AVG(dur.RunMinutes) <= 5 THEN AVG(dur.RunMinutes) + 10
           ELSE AVG(dur.RunMinutes) * (@JobLimitPercentage / 100)
       END < DATEDIFF(MI, act.start_execution_date, GETDATE())

--Check whether a long-running job has already been identified so you are not alerted multiple times
IF EXISTS (SELECT RJ.*
           FROM ##RunningJobs RJ
           WHERE CHECKSUM(RJ.JobID, RJ.StartExecutionDate) NOT IN
                 (SELECT CHECKSUM(JobID, StartExecutionDate) FROM dbo.LongRunningJobs))
BEGIN
    --Send email with results of long-running jobs
    EXEC msdb.dbo.sp_send_dbmail
        @profile_name = @MailProfile,
        @recipients = @MailRecipients,
        @query = 'USE dbadmin; SELECT RJ.* FROM ##RunningJobs RJ
                  WHERE CHECKSUM(RJ.JobID, RJ.StartExecutionDate) NOT IN
                  (SELECT CHECKSUM(JobID, StartExecutionDate) FROM dbo.LongRunningJobs)',
        @body = 'View attachment to view long running jobs',
        @subject = 'Long Running SQL Agent Job Alert',
        @attach_query_result_as_file = 1;
END

--Populate LongRunningJobs table with jobs exceeding established limits
INSERT INTO [dbadmin].[dbo].[LongRunningJobs]
    ([JobID], [JobName], [StartExecutionDate], [AvgDurationMin], [DurationLimit], [CurrentDuration])
SELECT RJ.*
FROM ##RunningJobs RJ
WHERE CHECKSUM(RJ.JobID, RJ.StartExecutionDate) NOT IN
      (SELECT CHECKSUM(JobID, StartExecutionDate) FROM dbo.LongRunningJobs)
GO
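The caveats above call for an Agent job that runs the procedure every few minutes (we use 5). One way to script that is with the msdb job procedures, sketched below. The job name, step name and schedule name are hypothetical placeholders, so adjust them to your own standards. The step assumes the procedure lives in the dbadmin database.

```sql
-- Sketch: schedule usp_LongRunningJobs to run every 5 minutes.
-- Job, step and schedule names are hypothetical placeholders.
USE msdb;
GO

EXEC dbo.sp_add_job
    @job_name = N'DBA - Long Running Jobs Alert';

EXEC dbo.sp_add_jobstep
    @job_name = N'DBA - Long Running Jobs Alert',
    @step_name = N'Run usp_LongRunningJobs',
    @subsystem = N'TSQL',
    @database_name = N'dbadmin',
    @command = N'EXEC dbo.usp_LongRunningJobs;';

EXEC dbo.sp_add_schedule
    @schedule_name = N'Every 5 minutes',
    @freq_type = 4,             -- daily
    @freq_interval = 1,         -- every day
    @freq_subday_type = 4,      -- repeat in units of minutes
    @freq_subday_interval = 5;  -- every 5 minutes

EXEC dbo.sp_attach_schedule
    @job_name = N'DBA - Long Running Jobs Alert',
    @schedule_name = N'Every 5 minutes';

-- Target the local server so SQL Agent actually runs the job
EXEC dbo.sp_add_jobserver
    @job_name = N'DBA - Long Running Jobs Alert';
GO
```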
Got any feedback/comments/criticisms? Let me hear them in the comments!
Reprinted from Here