Background
SQL Server Agent provides the user with regular administrative tasks as Windows services, which are called jobs, and how the scenario for applying mirroring solves the job synchronization problem, and AWS RDS does this by ignoring the user's maintenance job, If a switch user is required to create a job on the new principal side, the other is to keep the synchronization job on the mirror side, and to try to make the user unaware of any unnecessary maintenance actions after switching, but this practice can encounter very serious problems in some cases-memory exhaustion.
Troubleshooting analysis
First-time analysis
When the problem occurs, the errorlog of the instance occurs:
Error: 701, Severity: 17, State: 123.
and recorded Memorystatus, according to Memorystatus's information itself can be inferred, but since there is a field we may wish to use a DAC (dedicated Administrator Connection) to see:
SELECTTOP 10 [type],sum ( single_pages_kb) as [SPA MEM/KB],SUM(multi_pages_kb) AS [MPA MEM/KB]FROM sys.dm_os_memory_clerksGROUP BY typeORDER BY SUM ( single_pages_kb+multi_pages_kb) DESC ;
Memory consumption
The most occupied objectstore_sni_packet, Memoryclerk_sqlconnectionpool generally with the number of connections, network PACKET size;
selectc.session_id,c.net_packet_size,s.host_name as client_host_name,s.program_name,s.client_interface_namefrom sys.dm_exec_connections cjoin sys.dm_exec_sessions son c.session_id = s.session_idjoin sys.endpoints eon c.endpoint_id = e.endpoint_idorder by c.net_packet_size desc
Net_packet_size
select value_in_use from sys.configurations where configuration_id=1544select program_name,count(*) as conn_num from sys.sysprocesses where spid>50 group by program_name order by conn_num desc
Memory specifications
1w+ link hold, according sys.dm_exec_sessions.program_name
to can be confirmed from SQL Server Agent, each session packet size is 4K, the instance of max server memory is 1G, so memory exhaustion;
So what's the 1W link doing? According to the sys.dm_exec_sessions.program_name
exposure of the job_id we find the corresponding job, first look at what the job to do;
select * from msdb.dbo.sysjobs where job_id=0x825F84340AFD5B4BA1D5AD82A8E76C1A
Job Information
First inference
This part of the business logic I no longer post, the point is that the database used by Job_step is in the Mirror (RESTORING), because the mirrored db is not available, schedule is set too frequently, so SQL Server The agent failed to release this part of the link in time to cause memory exhaustion.
First time verification
Re-built a test environment to create a similar scenario but the problem did not replicate. Interested in the small partner can be tested under: Mirror + mirror-side job + job link Image Library and frequency 10s + job content unlimited (why not open to follow the profiler can be seen).
With profiler and session-related views, we can see that when Job-step's link db is set to mirrored db, it appears:
Error: 18456, Severity: 14, State: 38.
Indicates that the account verification was successful, but the database is not accessible or has insufficient logon rights, the SQL Server Agent retries multiple times, but eventually releases the link, which is inconsistent with the previous inference.
Profiler information
select program_name,count(*) as conn_num from sys.sysprocesses where spid>50 group by program_name order by conn_num desc
Connection
Second analysis
Back to the original site, we do the analysis; When we try to reproduce, we find that the complete link is not established, but how this 1W link is done:
select spid,program_name,loginame,login_time as time,db_name(dbid) as dbname from sys.sysprocesses where spid>50
dbname
According to the SPID, let's look at the last SQL executed by the non-freed link:
InputBuffer
(@P1 nvarchar(max),@P2 uniqueidentifier,@P3 int,@P4 int)DECLARE @logTextWithPreAndPost nvarchar(max) set @logTextWithPreAndPost = N‘‘ + @P1 + N‘‘; EXECUTE msdb.dbo.sp_write_sysjobstep_log @job_id = @P2, @step_id = @P3, @[email protected], @[email protected]
Look at msdb.dbo.sp_write_sysjobstep_log
this stored procedure:
exec sp_helptext ' Sp_write_sysjobstep_log ' Text-------------------------------------------------------------------------------CREATE PROCEDURE sp_write_ Sysjobstep_log @job_id uniqueidentifier, @step_id int, @log_text NVARCHAR (MAX), @append_to_last int = 0ASBEGIN DECLARE @step_uid uniqueidentifier DECLARE @log_already_exists int Set @log_already_exists = 0 Set @step_uid = (SELECT Step_uid from Msdb.dbo.sysjobsteps WHERE (job_id = @job_id) and (step_id = @step_id)) IF (EXISTS (SELECT * F ROM msdb.dbo.sysjobstepslogs WHERE step_uid = @step_uid)) BEGIN SET @log_already_exists = 1 EN D--need Create log If "Overwrite is selected or log does not exists. IF (@append_to_last = 0) OR (@log_already_exists = 0) BEGIN--Flag is overwrite--if overwrite and log exists, D Elete it IF (@append_to_last = 0 and @log_already_exists = 1) BEGIN-Remove previous logs entries EXEC Sp_delete_jobsteplog @job_id, NULL, @step_id, NULL END INSERT into msdb.dbo.sysjobstepslogs (log, log_size, Step_uid) VALUES (@log_text, datalength (@log_text), @step_uid) END ELSE BEGIN Declar E @log_id INT--selecting TOP is just a safety net-there are only one log entry row per step. SET @log_id = (SELECT TOP 1 log_id from Msdb.dbo.sysjobstepslogs WHERE (step_uid = @step_uid) ORDER by log_id DESC)--Append @log_text to the existing log record. Note If this-action would make the value of the the log column longer than-nvarchar (max) and then the engine would raise error 599. UPDATE msdb.dbo.sysjobstepslogs SET Log. WRITE (@log_text, null,0), log_size = datalength (log) + datalength (@log_text), date_modified = getd Ate () WHERE log_id = @log_id END RETURN (@ @error)--0 means Successend
MSDN did not find an exhaustive Document, but reading the definition can also confirm that it is a stored procedure used by Job-step for advanced Configuration, which is to write job-step logs into table msdb.dbo.sysjobstepslogs. Depending on the parameters may be overwrite or append;
Log_to_table
Second deduction
Based on the previous information we can extrapolate that this 1w+ idle link is due to the internal update of the msdb log table after the completion of Job-step, the link is not released after the update is complete.
Second-time verification
The construction of the scene is basically the same as the first time, only one additional point is that Job-step has log to table turned on.
Validation successful
The results are stable and reproducible, a new link is added around 40s (Schedule 10s does not turn on step Retry attempts and Retry interval), according to Msdb.dbo.sysjobstepslogs.log_ Size and Msdb.dbo.sysjobstepslogs.log can confirm the log update frequency at 40s.
Open Profiler, constantly adjust schedule time, monitor msdb.dbo.sysjobstepslogs.log_size size, you can find that when the dispatch frequency is greater than 40s, the new link in accordance with the frequency increase, when the frequency is less than 40s, Added links by 40s one increase.
Conclusion
If the link defined in Job-step is a mirrored library (RESTORING) and a log to Table is configured, Then each log session (Msdb.dbo.sysjobstepslogs) does not automatically close, that is, the agent in this scenario there is a link leak.
SQLServer Bug Analysis · Agent Link Leak analysis (reprint)