Extended Events Made Easy: Using XEvent Profiler

In this post we’ll look at how quick and easy it is to use the XEvent Profiler to create an Extended Events (XE) session to replace the most common usage of SQL Server Profiler.

Before you start, update your SSMS to a recent version (17.3) or later:

https://docs.microsoft.com/en-us/sql/ssms/download-sql-server-management-studio-ssms?view=sql-server-2017

Now, let me ask you the question – how often do you just go into Profiler and create a trace taking all the defaults? If you’re like me then that’s pretty often. It’s nice and easy, select the menu item from SSMS and you’re going in a couple of clicks.

All XEvent Profiler does is implement a couple of standard XE sessions very similar to Profiler, that you can launch with a click or two.

You can find XE Profiler in the object explorer in SSMS under each of your SQL instances – just below SQL Server Agent:

XE_1

Because it’s there for each instance, you don’t have to then connect to the instance separately (as you do in Profiler).

You’ve got two options, Standard which pretty much equates to the default trace in Profiler, or TSQL which will just capture SQL statements being executed. Let’s right-click on Standard and select “Launch Session”. Here’s the Live Data view that comes up:

XE_2

You can see that looks pretty similar to what you see for your traditional traces.

Now, one thing I almost always do for this sort of trace in Profiler is capture the database name. This is a bit of a pain, because I have to select “View all columns”, scroll along to find the database name column, then select it for each relevant event I’m capturing.

In XE it’s much easier. I just right click on one of the column headers in the above view and select “Choose Columns”:

XE_3

Which brings up the menu of columns available to me:

XE_4

I can then just select the database_name column in the left and use the arrow button to push it into the list of selected columns. Want row_count? That’s there too, along with a whole bunch of other stuff. I’ll add those and remove a few I don’t want and exit the dialog.

The other thing I’m commonly going to want to do is add a filter to one or more of the columns. When you’re viewing live data you get the Extended Events toolbar shown in SSMS below the menu bar, just click on Filters…

XE_5

And you get the filters dialog:

XE_6

Click below “Field” to select a column and then add a value to filter on. In this case I’m going to filter to look at a specific database.

It’s worth noting that there are a bunch of operators you can use (not just equals), you can do greater than, less than, contains, does not contain, is null etc.

Now we can see the effect of those changes in the Live Data view:

XE_7

The above is what I’d usually do in Profiler, except that here I’ve also added an extra column and removed a few – because it was so easy to do. The whole thing felt like a simpler and cleaner experience than the old way.

But I get it, a marginally simpler experience probably isn’t enough to make you change habits. What might, is what you can do with your trace results once you’ve got them. In the next post we’ll look at sorting, grouping and aggregation.

Previous posts in this series:

There’s Still a Place for SQL Server Profiler

Using the built-in System Health session

Exporting Extended Events Session Data to a Table

 

Table Variable Performance “Fixed” in SQL 2019

Reading the new features for SQL 2019 I spotted this:TableVariables2019_1

One of the most popular posts on my blog last year was where I pretty much suggested that people not use table variables:

Think twice before using table variables

This wasn’t new information when I wrote it, but bad performance due to the use of table variables remained such a common anti-pattern that I thought it was worth stressing again.

So, when I saw the above 2019 feature I thought I’d better investigate and update what I’m telling people.

TL;DR It looks like table variables are no longer a problem.

I’m going to re-use my examples from the previous post (as well as some of the images captured). One thing to note though it that I’m running these tests on a different, much less performant box than I did last time, so execution times will be longer.

My first test last time was just to show how cardinality estimation was better for a temp table vs. a table variable. This time I’m just going to compare a table variable running under compatibility mode 140 (SQL 2017) with one running under compatibility mode 150 (SQL 2019).

I set the compatibility mode as follows:

ALTER DATABASE AdventureWorks2012 SET COMPATIBILITY_LEVEL = 140;

Then I run the first query:

DECLARE @a TABLE(i INT);
 
--Create a million rows
WITH Nums(i) AS
(
       SELECT 1
       FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) n(i)
)
INSERT INTO @a(i)
SELECT n1.i
FROM Nums n1 
CROSS JOIN Nums n2 
CROSS JOIN Nums n3 
CROSS JOIN Nums n4 
CROSS JOIN Nums n5 
CROSS JOIN Nums n6;
 
SELECT i FROM @a;

The execution plan for that final select looks like this:
Variables1

And when I hover over the Table Scan operator I can see the properties:

Variables2

You can see that the estimated number of rows is calculated as 1 – but the actual number of rows was 1 million. This is (was) the problem with table variables.

Now I’ll change the compatibility level:

ALTER DATABASE AdventureWorks2012 SET COMPATIBILITY_LEVEL = 150;

And run the query again.

I get the same execution plan, so let’s just look at the properties of the Table Scan operator this time:
TableVariables2019_2
Here you can see that the estimation is correct, 1 million. This is looking good so far for the future of table variables.

In the last post I then looked at a case where the bad estimation caused a poor selection of execution plan so let’s repeat that test. First, I set my compatibility level back to 140, then I run the following (if you want to repeat, you need the AdventureWorks2012 database):

DECLARE @BusinessEntityId TABLE (BusinessEntityID INT NOT NULL PRIMARY KEY CLUSTERED);
 
INSERT INTO @BusinessEntityId
SELECT BusinessEntityID FROM Person.Person;
 
SET STATISTICS IO ON;
SET STATISTICS TIME ON;
 
SELECT COUNT(*)
FROM @BusinessEntityId b
INNER JOIN Person.Person p
       ON b.BusinessEntityID = p.BusinessEntityID;
 
SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;

Let’s look at the execution plan for the final query:
Variables5
Now let’s look at the properties for the clustered index scan:
Variables6
We can see that estimated number of rows is just 1, but that the actual number of rows is 19,972.

If we then look at the properties for the Index seek operator:
Variables7
We can see that the estimated number of executions was 1, but the actual number of executions was 19,972. That’s 19,972 seeks into a table with 19,972 rows. As previously discussed – there’s got to be a better way. A Nested Loops join is generally best when there’s a small number of rows from the top table, and a larger amount of rows from the bottom. In this place they both have 19,972 so it’s not optimal.

Here’s the output of the statistics commands for that query:

Table ‘Person’. Scan count 0, logical reads 59916, physical reads 3810, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table ‘#B5D4D4C8’. Scan count 1, logical reads 35, physical reads 29, read-ahead reads 28, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 47 ms, elapsed time = 30951 ms.

Now we’ll change the compatibility level to 150 and try again. Here’s the execution plan we get this time:
TableVariables2019_3
You can see the join operator is now a Hash Match which is generally a much better choice for evenly sized tables. Let’s look at the properties of the Clustered Index Scan in this new plan:
TableVariables2019_4
This time the estimated and actual values match, which has enabled SQL to make the better choice of plan. We can see that if we look at the statistics output for this one:

Table ‘Workfile’. Scan count 4, logical reads 64, physical reads 0, read-ahead reads 64, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table ‘Worktable’. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table ‘Person’. Scan count 1, logical reads 67, physical reads 1, read-ahead reads 65, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table ‘#B9A565AC’. Scan count 1, logical reads 35, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 25220 ms.

You can see the CPU and reads are massively reduced.

So, it seems table variables aren’t so bad anymore. At least they won’t be once you are running SQL Server 2019.

I should probably add some caveats at this point. This has been a limited test, so you yourselves should make sure you test your own performance if you start using table variables instead of temp tables. There are also going to be deferred compilations when you do this in stored procedures. I haven’t dug down into exactly how that will work with table variables, but I guess it’s going to be similar to the way temp tables work in existing SQL versions.

Don’t Just Rely on Query Execution Stats for T-SQL Execution

I recently had an incident where I was looking into the cause of a long running process for a client. It was a batch process that ran overnight, and execution time had been growing until it was now taking over 4 hours.

The database involved is a kind of staging area for MI, so I wanted to look at any heavy queries being executed against the database – I knew that these would all be part of the batch process. I asked them to run the following query for me:

SELECT TOP 50
   DB_NAME(qt.dbid) AS DatabaseName, 
   qt.TEXT, 
   qs.execution_count,
   qs.total_worker_time,
   qs.total_physical_reads,
   qs.total_logical_writes,
   qs.total_logical_reads,
   qs.total_elapsed_time
FROM sys.dm_exec_query_stats qs
CROSS apply sys.dm_exec_sql_text(qs.sql_handle) qt
WHERE ISNULL(qt.dbid,0) = DB_ID();

I got less than 50 rows back so figured I had everything covered, but the total elapsed time across all the queries was less than ten minutes, I knew the server hadn’t been rebooted for about a month so potentially that could be as little as 20 seconds of query execution a night. Even if the full 10 minutes was from the last 24 hours that still didn’t account for the long run times.

So, I convinced myself it wasn’t the queries and started looking at other things they could check. It was a long list. Lots of theories revolved around the idea that something else was running on the host at the same time.

As part of that I decided to get them to run an extended events trace so I could look at exact timings and waits during the batch process (and anything else that might be going on in SQL). I got a lot of data back which took a fair while to index and load. Figuring I’d already ruled out query execution times for the batch process itself, I expected the analysis was going to be tricky to drill down on the problem, however as soon as I looked for heavy queries running during the execution window I found one specific stored procedure with the following metrics:

QueryExStats1

Ouch.

Thing is, these figures were for a stored procedure that was part of the batch run. I date for this in the original output but the figures for query execution duration and CPU were tiny in comparison to these. So, were the query execution stats wrong?

No, they weren’t, dm_exec_query_stats only captures figures for cached query plans, so you might have a query that’s not caching a plan for whatever reason – or might have something that’s doing work that’s not actually a query, imagine a while loop that’s trying to find all the prime numbers under a trillion without using sets or queries. That would consume a lot of CPU and duration but wouldn’t show in in the query execution stats – because it’s not a query.

In this case it turned out the stored procedure was:

  • Trying to process over 2 million rows using a cursor
  • For each row it would attempt to execute a stored procedure that didn’t exist using dynamic SQL
  • It would catch the failure and update an error message against the row – using the cursor
  • Every day it would attempt to process the failed rows again
  • Every day more rows would be inserted to the table that would fail

In particular it was the use of the cursor that was most expensive, and not being captured in the execution stats.

To be kind to the original developer, it was a clever generic solution to a specific problem. It wasn’t optimal from a performance point of view, but they only expected it to have to deal with a few rows daily.

Then the volume of data increased.

Then a subsequent developer made a change but didn’t know they needed to add a new supporting stored procedure. The resulting issue didn’t get picked up in testing.

The number of failed rows had been growing every night for a couple of years, and the execution time creeping up until it went exponential.

This post isn’t about bad code though, it’s about remembering that not all resource consumption on our SQL boxes comes from cached queries, so when trying to troubleshoot remember that query execution stats are one place to look, but don’t cover everything that might be being executed in T-SQL.

T-SQL Tuesday #106 – The Trouble with Triggers

tsql2sday150x150

For this month’s T-SQL Tuesday Steve Jones asks us to talk about our experiences with Triggers:

https://voiceofthedba.com/2018/09/03/t-sql-tuesday-106-trigger-headaches-or-happiness/

In my post I’m going to focus on DML (Data Modification Language) triggers. Their counterparts, DDL (Data Definition Language) triggers have any number of interesting applications. It’s usually DML triggers however that attract the most controversy.

DML triggers seem like a great functionality when you first discover them in your SQL career, especially if you come from an application coding background. Event driven programming for data – sounds brilliant. If someone does something to table X, have a trigger than in turn does something to tables Y and Z, you can even have triggers on Y and Z and cascade actions to other tables and beyond.

Once you’ve been around for a while though you realise what a headache this can end up being. You think you’re just inserting a record into table X but there’s this whole bunch of stuff going on behind the scenes that you may not know about – or want.

This is the biggest objection to triggers. They hide logic. If I have table called FeeTransaction and there is a bunch of logic that MUST happen when I insert a record here, then that’s clearer if it’s all encapsulated in a stored procedure that inserts the record. That makes it easier to maintain and troubleshoot rather than hunting around to find independent events that might be kicking off all over the place.

The only place I’m really a fan of DML triggers is in capturing Audit information. Most commonly for maintaining basic things like the last updated date and time for a record. I like having a mechanism that means I can see when data was last changed – whatever method was used to change it. i.e. if it was through the application, or if someone with write-access just ran an update query. Then I don’t need to rely on that person remembering to alter the LastUpdated column when they made their change.

The need to maintain this sort of basic audit info is so ubiquitous that it seems silly it is not a core feature of SQL Server, i.e. the ability to add a column to a table that will always automatically capture the time a record was modified, presumably this could also be done in a tamper-proof manner. In the same way we could have the ability to capture the Login context used to make the change or potentially anything else we need to capture.

Adam Machanic has raised a request with Microsoft to include functionality, if you agree it would be handy then please upvote here:

https://feedback.azure.com/forums/908035-sql-server/suggestions/32901964-provide-a-mechanism-for-columns-to-automatically-u