Don’t Just Rely on Query Execution Stats for T-SQL Execution

I recently had an incident where I was looking into the cause of a long running process for a client. It was a batch process that ran overnight, and execution time had been growing until it was now taking over 4 hours.

The database involved is a kind of staging area for MI, so I wanted to look at any heavy queries being executed against the database – I knew that these would all be part of the batch process. I asked them to run the following query for me:

SELECT TOP 50
   DB_NAME(qt.dbid) AS DatabaseName, 
   qt.TEXT, 
   qs.execution_count,
   qs.total_worker_time,
   qs.total_physical_reads,
   qs.total_logical_writes,
   qs.total_logical_reads,
   qs.total_elapsed_time
FROM sys.dm_exec_query_stats qs
CROSS apply sys.dm_exec_sql_text(qs.sql_handle) qt
WHERE ISNULL(qt.dbid,0) = DB_ID();

I got less than 50 rows back so figured I had everything covered, but the total elapsed time across all the queries was less than ten minutes, I knew the server hadn’t been rebooted for about a month so potentially that could be as little as 20 seconds of query execution a night. Even if the full 10 minutes was from the last 24 hours that still didn’t account for the long run times.

So, I convinced myself it wasn’t the queries and started looking at other things they could check. It was a long list. Lots of theories revolved around the idea that something else was running on the host at the same time.

As part of that I decided to get them to run an extended events trace so I could look at exact timings and waits during the batch process (and anything else that might be going on in SQL). I got a lot of data back which took a fair while to index and load. Figuring I’d already ruled out query execution times for the batch process itself, I expected the analysis was going to be tricky to drill down on the problem, however as soon as I looked for heavy queries running during the execution window I found one specific stored procedure with the following metrics:

QueryExStats1

Ouch.

Thing is, these figures were for a stored procedure that was part of the batch run. I date for this in the original output but the figures for query execution duration and CPU were tiny in comparison to these. So, were the query execution stats wrong?

No, they weren’t, dm_exec_query_stats only captures figures for cached query plans, so you might have a query that’s not caching a plan for whatever reason – or might have something that’s doing work that’s not actually a query, imagine a while loop that’s trying to find all the prime numbers under a trillion without using sets or queries. That would consume a lot of CPU and duration but wouldn’t show in in the query execution stats – because it’s not a query.

In this case it turned out the stored procedure was:

  • Trying to process over 2 million rows using a cursor
  • For each row it would attempt to execute a stored procedure that didn’t exist using dynamic SQL
  • It would catch the failure and update an error message against the row – using the cursor
  • Every day it would attempt to process the failed rows again
  • Every day more rows would be inserted to the table that would fail

In particular it was the use of the cursor that was most expensive, and not being captured in the execution stats.

To be kind to the original developer, it was a clever generic solution to a specific problem. It wasn’t optimal from a performance point of view, but they only expected it to have to deal with a few rows daily.

Then the volume of data increased.

Then a subsequent developer made a change but didn’t know they needed to add a new supporting stored procedure. The resulting issue didn’t get picked up in testing.

The number of failed rows had been growing every night for a couple of years, and the execution time creeping up until it went exponential.

This post isn’t about bad code though, it’s about remembering that not all resource consumption on our SQL boxes comes from cached queries, so when trying to troubleshoot remember that query execution stats are one place to look, but don’t cover everything that might be being executed in T-SQL.

T-SQL Tuesday #106 – The Trouble with Triggers

tsql2sday150x150

For this month’s T-SQL Tuesday Steve Jones asks us to talk about our experiences with Triggers:

https://voiceofthedba.com/2018/09/03/t-sql-tuesday-106-trigger-headaches-or-happiness/

In my post I’m going to focus on DML (Data Modification Language) triggers. Their counterparts, DDL (Data Definition Language) triggers have any number of interesting applications. It’s usually DML triggers however that attract the most controversy.

DML triggers seem like a great functionality when you first discover them in your SQL career, especially if you come from an application coding background. Event driven programming for data – sounds brilliant. If someone does something to table X, have a trigger than in turn does something to tables Y and Z, you can even have triggers on Y and Z and cascade actions to other tables and beyond.

Once you’ve been around for a while though you realise what a headache this can end up being. You think you’re just inserting a record into table X but there’s this whole bunch of stuff going on behind the scenes that you may not know about – or want.

This is the biggest objection to triggers. They hide logic. If I have table called FeeTransaction and there is a bunch of logic that MUST happen when I insert a record here, then that’s clearer if it’s all encapsulated in a stored procedure that inserts the record. That makes it easier to maintain and troubleshoot rather than hunting around to find independent events that might be kicking off all over the place.

The only place I’m really a fan of DML triggers is in capturing Audit information. Most commonly for maintaining basic things like the last updated date and time for a record. I like having a mechanism that means I can see when data was last changed – whatever method was used to change it. i.e. if it was through the application, or if someone with write-access just ran an update query. Then I don’t need to rely on that person remembering to alter the LastUpdated column when they made their change.

The need to maintain this sort of basic audit info is so ubiquitous that it seems silly it is not a core feature of SQL Server, i.e. the ability to add a column to a table that will always automatically capture the time a record was modified, presumably this could also be done in a tamper-proof manner. In the same way we could have the ability to capture the Login context used to make the change or potentially anything else we need to capture.

Adam Machanic has raised a request with Microsoft to include functionality, if you agree it would be handy then please upvote here:

https://feedback.azure.com/forums/908035-sql-server/suggestions/32901964-provide-a-mechanism-for-columns-to-automatically-u