Check Query Progress with Live Query Stats

This is something I touched on back in 2017 a little after the Live Query Statistics feature was introduced with SQL 2016, but I was using the functionality this morning and felt like it was worth a reminder.

https://matthewmcgiffen.com/2017/02/23/livequerystats/

You can use Live Query Stats to check on the progress of an executing query – and you can do it through the GUI in SSMS.

I created a long running query for a demo, and after 15 minutes I was still waiting for it to finish. That was a bit longer than I intended. Should I kill it – or did I just need to wait a few more minutes for it to complete.

You can check this quickly via the Activity Monitor:

Find the query you are interested in in the processes list:

Right-click and select “Show Live Execution Plan”. That will show you something like this:

I can see from this that my query is about 83% complete, so maybe I’ll just wait a little longer. Note that this is a live view, so the numbers keep updating. If I want I can watch the progress.

This is against a SQL 2019 instance and is the out of the box behaviour. Before SQL 2019 you had to enable trace flag 7412 if you wanted this to work:

DBCC TRACEON(7412,-1);

Find queries with high memory grants – using Query Store

I had a server that looked like it had been suffering from memory contention. I wanted to see what queries were being run that had high memory requirements. The problem was that it wasn’t happening right now – I needed to be able to see what had happened over the last 24 hours.

Enter Query Store. In the run-time stats captured by Query Store are included details relating to memory.

I decided to use the column max_query_max_used_memory from sys.query_store_runtime_stats. In books online this is defined as:

Maximum memory grant (reported as the number of 8 KB pages) for the query plan within the aggregation interval.

I took my script from the following post https://matthewmcgiffen.com/2017/11/01/capture-the-most-expensive-queries-across-your-sql-server-using-query-store/ and modified it to look just at this metric.

Here’s the script, it collates figures across all databases that have Query Store enabled and returns the top 50 queries with the highest memory grants. This is looking over the last 24 hours, but you can easily modify that to look at details for any interval you are interested in:

--Gather and report on most memory hungry queries
DECLARE @Reportinginterval int;
DECLARE @Database sysname;
DECLARE @StartDateText varchar(30);
DECLARE @TotalExecutions decimal(20,3);
DECLARE @TotalDuration decimal(20,3);
DECLARE @TotalCPU decimal(20,3);
DECLARE @TotalLogicalReads decimal(20,3);
DECLARE @SQL varchar(MAX);

--Set Reporting interval in days
SET @Reportinginterval = 1;

SET @StartDateText = CAST(DATEADD(DAY, -@Reportinginterval, GETUTCDATE()) AS varchar(30));

--Cursor to step through the databases
DECLARE curDatabases CURSOR FAST_FORWARD FOR
SELECT [name]
FROM sys.databases 
WHERE is_query_store_on = 1
  AND state_desc = 'ONLINE';

--Temp table to store the results
DROP TABLE IF EXISTS #Stats;
CREATE TABLE #Stats (
   DatabaseName sysname,
   SchemaName sysname NULL,
   ObjectName sysname NULL,
   QueryText varchar(1000),
   MaxMemoryGrantMB decimal(20,3)
);

OPEN curDatabases;
FETCH NEXT FROM curDatabases INTO @Database;

--Loop through the datbases and gather the stats
WHILE @@FETCH_STATUS = 0
BEGIN
    
    SET @SQL = '
	   USE [' + @Database + ']
	   INSERT INTO #Stats
		SELECT
			DB_NAME(),
			s.name AS SchemaName,
			o.name AS ObjectName,
			SUBSTRING(t.query_sql_text,1,1000) AS QueryText,
			(MAX(rs.max_query_max_used_memory)/128) AS MaxMemoryMB
		FROM sys.query_store_query q
		INNER JOIN sys.query_store_query_text t
			ON q.query_text_id = t.query_text_id
		INNER JOIN sys.query_store_plan p
			ON q.query_id = p.query_id
		INNER JOIN sys.query_store_runtime_stats rs
			ON p.plan_id = rs.plan_id
		INNER JOIN sys.query_store_runtime_stats_interval rsi
			ON rs.runtime_stats_interval_id = rsi.runtime_stats_interval_id
		LEFT JOIN sys.objects o
			ON q.OBJECT_ID = o.OBJECT_ID
		LEFT JOIN sys.schemas s
			ON o.schema_id = s.schema_id     
		WHERE rsi.start_time > ''' + @StartDateText + '''
		GROUP BY s.name, o.name, SUBSTRING(t.query_sql_text,1,1000)
		OPTION(RECOMPILE);'

    EXEC (@SQL);

    FETCH NEXT FROM curDatabases INTO @Database;
END;

CLOSE curDatabases;
DEALLOCATE curDatabases;

--Report Results
SELECT TOP 50
	DatabaseName,
	SchemaName,
	ObjectName,
	QueryText,
	MaxMemoryGrantMB
FROM #Stats
WHERE QueryText not like 'INSERT INTO #Stats%' --Exclude current query
ORDER BY MaxMemoryGrantMB DESC;

DROP TABLE #Stats;

Adding a Column with a Default Constraint

Last week a question came up about adding a column to a table, and giving that column a default constraint. Would that default value be assigned to all existing rows, and how much processing would be involved.

Unsurprisingly, the answer is that – “it depends”.

I’ve got a table with about a million rows that just has an identity column and a text column I’ve populated from sys.objects:

CREATE TABLE dbo.TestAddColumn (Id int IDENTITY(1,1), TextValue sysname);
INSERT INTO dbo.TestAddColumn(TextValue)
SELECT a.name FROM sys.objects a, sys.objects b, sys.objects c;

Let’s add a nullable bit column and give it a default value of Zero:

ALTER TABLE dbo.TestAddColumn
ADD NewFlag bit NULL CONSTRAINT DF_TestAddColumn_NewFlag DEFAULT 0;

If I look at the table I can see that the new column contains Null values:

i.e. the default value that I’ve defined hasn’t been assigned for existing rows.

I remove the column and the default constraint:

ALTER TABLE dbo.TestAddColumn
DROP CONSTRAINT DF_TestAddColumn_NewFlag;
ALTER TABLE dbo.TestAddColumn DROP COLUMN NewFlag;

Now let’s add the same column but we’ll disallow Null values:

ALTER TABLE dbo.TestAddColumn
ADD NewFlag bit NOT NULL CONSTRAINT DF_TestAddColumn_NewFlag DEFAULT 0;

If we check the table again:

This time we can see that the default value has been assigned.

So whether our default value gets assigned to existing rows depends on whether your column is nullable or not, a nullable column will retain Null as the value. A non-nullable column will get assigned the new default value.

If you want to override that behaviour, and have your default assigned even where the column is nullable, you can use the WITH VALUES statement. First I’ll remove the constraint and column then add it again with values:

ALTER TABLE dbo.TestAddColumn
ADD NewFlag bit NULL CONSTRAINT DF_TestAddColumn_NewFlag DEFAULT 0 WITH VALUES;

We look at the data again:

You can see that the value has been assigned even though the column is Nullable.

One neat thing to note, is the performance impact when carrying out these actions.

Each time I added the column I captured the execution overhead using:

SET STATISTICS IO, TIME ON;

In all cases the resource usage measured was Zero. The Add Column operation was a meta-data only operation – no data in the table needed to be updated – even where the new column was assigned a value.

This was some clever jiggery-pokery added in SQL Server 2012 .

Thanks for reading!

MAXDOP, Parallelism and the Cloud

Parallelism and MAXDOP

The pros and cons of parallelism have always been with us in SQL Server and I blogged about this a couple of years ago. This is an updated version of that post to include details of the new wait stat related to parallelism that was added in 2017 (CXCONSUMER), as well as to discuss the options available for cloud based SQL Server solutions.

There’s no doubt that parallelism in SQL is a great thing. It enables large queries to share the load across multiple processors and get the job done quicker.

However it’s important to understand that it has an overhead. There is extra effort involved in managing the separate streams of work and synchronising them back together to – for instance – present the results.

That can mean in some cases that adding more threads to a process doesn’t actually benefit us and in some cases it can slow down the overall execution.

We refer to the number of threads used in a query as the DOP (Degree of Parallelism) and in SQL Server we have the setting MAXDOP (Maximum Degree of Parallelism) which is the maximum DOP that will be used in executing a single query.

Microsoft generally recommend caution setting MAXDOP above 8:

https://support.microsoft.com/en-gb/help/2806535/recommendations-and-guidelines-for-the-max-degree-of-parallelism-configuration-option-in-sql-server

Here’s a nice post from Kendra Little talking about how having higher settings can actually slow down your query execution time:

https://www.brentozar.com/archive/2013/12/q-can-high-maxdop-make-a-query-slower/

Out of the box, MAXDOP is set to 0, which means there is no limit to the DOP for an individual query. It is almost always worth changing this to a more optimal setting for your workload.

Cost Threshold for Parallelism

This is another setting available to us in SQL Server and defines the cost level at which SQL will consider a parallel execution for a query. Out of the box this is set to 5 which is actually a pretty low number. Query costing is based on Algorithm’s from “Nick’s machine” the box used by the original developer who benchmarked queries for Microsoft.

Nicks_Machine

(Nick’s Machine)

Compared to modern servers Nick’s machine was pretty slow and as the Cost Threshold hasn’t changed for many years, it’s now generally considered too low for modern workloads/hardware. In reality we don’t want all our tiny queries to go parallel as the benefit is negligible and can even be negative, so it’s worth upping this number. Advice varies but generally recommendations say to set this somewhere in the range from 30 to 50 (and then tuning up and down based on your production workload).

There are many articles in the SQL Server community about how the out of the box setting is too low, and asking Microsoft to change it. Here’s a recent one:

http://www.scarydba.com/2017/03/13/change-the-cost-threshold-for-parallelism/

CXPACKET and CXCONSUMER waits

Often in tuning a SQL Server instance we will look at wait stats – which tell us what queries have been waiting for when they run. CXPACKET waits are usually associated with parallelism and particularly the case where multi-threaded queries have been stuck waiting for one or more of the threads to complete – i.e. the threads are taking different lengths of time because the load hasn’t been split evenly. Brent Ozar talks about that here:

https://www.brentozar.com/archive/2013/08/what-is-the-cxpacket-wait-type-and-how-do-you-reduce-it/

High CXPACKET waits can be – but aren’t necessarily – a problem. You can cure CXPACKET waits by simply setting MAXDOP to 1 at a server level (thus preventing parallelism) – but this isn’t necessarily the right solution. Though in some cases in can be, SharePoint for instance is best run with MAXDOP set to 1.

What you can definitely deduce from high CXPACKET waits however is that there is a lot of parallelism going on and that it is worth looking at your settings.

To make it easier to identify issues with parallelism, with SQL Server 2017 CU3 Microsoft added a second wait type related to parallelism – CXCONSUMER. This wait type was also added to SQL Server 2016 in SP2.

Waits related to parallelism are now split between CXPACKET and CXCONSUMER.

Here’s the original announcement from Microsoft regarding the change and giving more details:

https://blogs.msdn.microsoft.com/sql_server_team/making-parallelism-waits-actionable/

In brief, moving forward CXPACKET waits are the ones you might want to worry about, and CXCONSUMER waits are generally benign, encountered as a normal part of parallel execution.

Tuning Parallelism

In tuning parallelism we need to think about how we want different sized queries to act on our server.

Small Queries

In general we don’t want these to go parallel so we up the Cost Threshold to an appropriate number to avoid this. As discussed above 30 is a good number to start with. You can also query your plan cache and look at the actual costs of queries that have been executed on your SQL Instance to get a more accurate idea of where you want to set this. Grant Fritchey has an example of how to do that here:

http://www.scarydba.com/2017/02/20/estimated-costs-queries/

As he mentions in the post, this assessment can be quite expensive to run – so do it when things are quiet.

Medium to Large Queries

This is where we want to take advantage of parallelism, and do so by setting MAXDOP to an appropriate level. Follow the guidelines from the Microsoft article referenced above. Here it is again:

https://support.microsoft.com/en-gb/help/2806535/recommendations-and-guidelines-for-the-max-degree-of-parallelism-configuration-option-in-sql-server

Often the answer is going to be simply to set it to 8 – but then experiment with tuning it up and down slightly to see whether that makes things better or worse.

Very Large Queries

If we have a mixed workload on our server which includes some very expensive queries – possibly for reporting purposes – then we may want to look at upping the MAXDOP for these queries to allow them to take advantage of more processors. One thing to consider though is – do we really want these queries running during the day when things are busy? Ideally they should run in quieter times. If they must run during the day, then do we want to avoid them taking over all the server power and blocking our production workload? In which case we might just let them run at the MAXDOP defined above.

If we decide we want to let them have the extra power then we can override the server MAXDOP setting with a query hint OPTION(MAXDOP n):

https://docs.microsoft.com/en-us/sql/t-sql/queries/hints-transact-sql-query

You will want to experiment to find the “best” value for the given query. As discussed above and as shown in Kendra Little’s article, just setting it to the maximum number of cores available isn’t necessarily going to be the fastest option.

Exceptions to the Rule

Regardless of the size, there are some queries that just don’t benefit from parallelism so you may need to assess them on an individual basis to find the right degree of parallelism to use.

With SQL server you can specify the MAXDOP at the server level, but also override it at the database level using a SCOPED CONFIGURATION or for individual queries using a query hint. There are even other ways you can control this:

https://www.brentozar.com/archive/2016/12/ten-ways-set-maxdop/

Options in the Cloud

If your SQL Server is hosted in the cloud, then most of this still applies. You still need to think about tuning parallelism – it isn’t done for you, and the defaults are the same – so probably not optimal for most workloads.

There are in general two flavours of cloud implementation. The first is Infrastructure as a Service (IaaS) where you simply have a VM provided by your cloud provider and run an OS with SQL server on top of it in that VM. Regardless of your cloud provider (e.g. Azure, AWS etc.), if you’re using IaaS for SQL Server then the same rules apply, and you go about tuning parallelism in exactly the same way.

The other type of cloud approach is Platform as a Service (PaaS). This is where you use a managed service for SQL Server. This would include Azure SQL Database, Azure SQL Database Managed Instance, and Amazon RDS for SQL Server. In these cases, the rules still apply, but how you manage these settings may differ. Let’s look at that for the three PaaS options mentioned above.

Azure SQL Database

This is a single SQL Server database hosted in Azure. You don’t have access to server level settings, so you can’t change MAXDOP or the cost threshold. You can however specify MAXDOP at the database level e.g.

ALTER DATABASE SCOPED CONFIGURATION SET MAXDOP = 4;

Cost threshold for Parallelism however is unavailable to change in Azure SQL Database.

Azure SQL Database Managed Instance

This presents you with something that looks very much like the SQL Server you are used to, you just can’t access the box behind it. And similar to your regular SQL instance, you can set MAXDOP and the Cost threshold as normal.

Amazon RDS for SQL Server

This is similar to managed instance. It looks and acts like SQL Server but you can’t access the machine or OS. You access your RDS instance through an account that has permissions that are more limited than your usual sa account or sysadmin role allows. And one of the things you can’t do with your limited permissions is to change the parallelism settings.

Amazon have provided a way around this though and you can change both settings using something called a parameter group:

https://www.mssqltips.com/sqlservertip/5329/setting-sql-server-configuration-options-with-aws-rds-parameter-groups/

Closing Thoughts

Parallelism is a powerful tool at our disposal, but like all tools it should be used wisely and not thrown at every query to its maximum – and this is often what happens with the out of the box settings on SQL Server. Tuning parallelism is not a knee-jerk reaction to high CXPACKET waits, but something we should be considering carefully in all our SQL Server implementations.

Acknowledgements

I wanted to update my original article to include the cloud options noted above, but didn’t have access to an Azure SQL Database Managed Instance to check the state of play. Thanks to TravisGarland via Twitter (@RockyTopDBA) and Chrissy LeMaire via the SQL community slack (@cl) for checking this and letting me know!

Extended Events Made Easy: Using XEvent Profiler

In this post we’ll look at how quick and easy it is to use the XEvent Profiler to create an Extended Events (XE) session to replace the most common usage of SQL Server Profiler.

Before you start, update your SSMS to a recent version (17.3) or later:

https://docs.microsoft.com/en-us/sql/ssms/download-sql-server-management-studio-ssms?view=sql-server-2017

Now, let me ask you the question – how often do you just go into Profiler and create a trace taking all the defaults? If you’re like me then that’s pretty often. It’s nice and easy, select the menu item from SSMS and you’re going in a couple of clicks.

All XEvent Profiler does is implement a couple of standard XE sessions very similar to Profiler, that you can launch with a click or two.

You can find XE Profiler in the object explorer in SSMS under each of your SQL instances – just below SQL Server Agent:

XE_1

Because it’s there for each instance, you don’t have to then connect to the instance separately (as you do in Profiler).

You’ve got two options, Standard which pretty much equates to the default trace in Profiler, or TSQL which will just capture SQL statements being executed. Let’s right-click on Standard and select “Launch Session”. Here’s the Live Data view that comes up:

XE_2

You can see that looks pretty similar to what you see for your traditional traces.

Now, one thing I almost always do for this sort of trace in Profiler is capture the database name. This is a bit of a pain, because I have to select “View all columns”, scroll along to find the database name column, then select it for each relevant event I’m capturing.

In XE it’s much easier. I just right click on one of the column headers in the above view and select “Choose Columns”:

XE_3

Which brings up the menu of columns available to me:

XE_4

I can then just select the database_name column in the left and use the arrow button to push it into the list of selected columns. Want row_count? That’s there too, along with a whole bunch of other stuff. I’ll add those and remove a few I don’t want and exit the dialog.

The other thing I’m commonly going to want to do is add a filter to one or more of the columns. When you’re viewing live data you get the Extended Events toolbar shown in SSMS below the menu bar, just click on Filters…

XE_5

And you get the filters dialog:

XE_6

Click below “Field” to select a column and then add a value to filter on. In this case I’m going to filter to look at a specific database.

It’s worth noting that there are a bunch of operators you can use (not just equals), you can do greater than, less than, contains, does not contain, is null etc.

Now we can see the effect of those changes in the Live Data view:

XE_7

The above is what I’d usually do in Profiler, except that here I’ve also added an extra column and removed a few – because it was so easy to do. The whole thing felt like a simpler and cleaner experience than the old way.

But I get it, a marginally simpler experience probably isn’t enough to make you change habits. What might, is what you can do with your trace results once you’ve got them. In the next post we’ll look at sorting, grouping and aggregation.

Previous posts in this series:

There’s Still a Place for SQL Server Profiler

Using the built-in System Health session

Exporting Extended Events Session Data to a Table

 

Table Variable Performance “Fixed” in SQL 2019

Reading the new features for SQL 2019 I spotted this:TableVariables2019_1

One of the most popular posts on my blog last year was where I pretty much suggested that people not use table variables:

Think twice before using table variables

This wasn’t new information when I wrote it, but bad performance due to the use of table variables remained such a common anti-pattern that I thought it was worth stressing again.

So, when I saw the above 2019 feature I thought I’d better investigate and update what I’m telling people.

TL;DR It looks like table variables are no longer a problem.

I’m going to re-use my examples from the previous post (as well as some of the images captured). One thing to note though it that I’m running these tests on a different, much less performant box than I did last time, so execution times will be longer.

My first test last time was just to show how cardinality estimation was better for a temp table vs. a table variable. This time I’m just going to compare a table variable running under compatibility mode 140 (SQL 2017) with one running under compatibility mode 150 (SQL 2019).

I set the compatibility mode as follows:

ALTER DATABASE AdventureWorks2012 SET COMPATIBILITY_LEVEL = 140;

Then I run the first query:

DECLARE @a TABLE(i INT);
 
--Create a million rows
WITH Nums(i) AS
(
       SELECT 1
       FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) n(i)
)
INSERT INTO @a(i)
SELECT n1.i
FROM Nums n1 
CROSS JOIN Nums n2 
CROSS JOIN Nums n3 
CROSS JOIN Nums n4 
CROSS JOIN Nums n5 
CROSS JOIN Nums n6;
 
SELECT i FROM @a;

The execution plan for that final select looks like this:
Variables1

And when I hover over the Table Scan operator I can see the properties:

Variables2

You can see that the estimated number of rows is calculated as 1 – but the actual number of rows was 1 million. This is (was) the problem with table variables.

Now I’ll change the compatibility level:

ALTER DATABASE AdventureWorks2012 SET COMPATIBILITY_LEVEL = 150;

And run the query again.

I get the same execution plan, so let’s just look at the properties of the Table Scan operator this time:
TableVariables2019_2
Here you can see that the estimation is correct, 1 million. This is looking good so far for the future of table variables.

In the last post I then looked at a case where the bad estimation caused a poor selection of execution plan so let’s repeat that test. First, I set my compatibility level back to 140, then I run the following (if you want to repeat, you need the AdventureWorks2012 database):

DECLARE @BusinessEntityId TABLE (BusinessEntityID INT NOT NULL PRIMARY KEY CLUSTERED);
 
INSERT INTO @BusinessEntityId
SELECT BusinessEntityID FROM Person.Person;
 
SET STATISTICS IO ON;
SET STATISTICS TIME ON;
 
SELECT COUNT(*)
FROM @BusinessEntityId b
INNER JOIN Person.Person p
       ON b.BusinessEntityID = p.BusinessEntityID;
 
SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;

Let’s look at the execution plan for the final query:
Variables5
Now let’s look at the properties for the clustered index scan:
Variables6
We can see that estimated number of rows is just 1, but that the actual number of rows is 19,972.

If we then look at the properties for the Index seek operator:
Variables7
We can see that the estimated number of executions was 1, but the actual number of executions was 19,972. That’s 19,972 seeks into a table with 19,972 rows. As previously discussed – there’s got to be a better way. A Nested Loops join is generally best when there’s a small number of rows from the top table, and a larger amount of rows from the bottom. In this place they both have 19,972 so it’s not optimal.

Here’s the output of the statistics commands for that query:

Table ‘Person’. Scan count 0, logical reads 59916, physical reads 3810, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table ‘#B5D4D4C8’. Scan count 1, logical reads 35, physical reads 29, read-ahead reads 28, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 47 ms, elapsed time = 30951 ms.

Now we’ll change the compatibility level to 150 and try again. Here’s the execution plan we get this time:
TableVariables2019_3
You can see the join operator is now a Hash Match which is generally a much better choice for evenly sized tables. Let’s look at the properties of the Clustered Index Scan in this new plan:
TableVariables2019_4
This time the estimated and actual values match, which has enabled SQL to make the better choice of plan. We can see that if we look at the statistics output for this one:

Table ‘Workfile’. Scan count 4, logical reads 64, physical reads 0, read-ahead reads 64, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table ‘Worktable’. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table ‘Person’. Scan count 1, logical reads 67, physical reads 1, read-ahead reads 65, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table ‘#B9A565AC’. Scan count 1, logical reads 35, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 25220 ms.

You can see the CPU and reads are massively reduced.

So, it seems table variables aren’t so bad anymore. At least they won’t be once you are running SQL Server 2019.

I should probably add some caveats at this point. This has been a limited test, so you yourselves should make sure you test your own performance if you start using table variables instead of temp tables. There are also going to be deferred compilations when you do this in stored procedures. I haven’t dug down into exactly how that will work with table variables, but I guess it’s going to be similar to the way temp tables work in existing SQL versions.

Don’t Just Rely on Query Execution Stats for T-SQL Execution

I recently had an incident where I was looking into the cause of a long running process for a client. It was a batch process that ran overnight, and execution time had been growing until it was now taking over 4 hours.

The database involved is a kind of staging area for MI, so I wanted to look at any heavy queries being executed against the database – I knew that these would all be part of the batch process. I asked them to run the following query for me:

SELECT TOP 50
   DB_NAME(qt.dbid) AS DatabaseName, 
   qt.TEXT, 
   qs.execution_count,
   qs.total_worker_time,
   qs.total_physical_reads,
   qs.total_logical_writes,
   qs.total_logical_reads,
   qs.total_elapsed_time
FROM sys.dm_exec_query_stats qs
CROSS apply sys.dm_exec_sql_text(qs.sql_handle) qt
WHERE ISNULL(qt.dbid,0) = DB_ID();

I got less than 50 rows back so figured I had everything covered, but the total elapsed time across all the queries was less than ten minutes, I knew the server hadn’t been rebooted for about a month so potentially that could be as little as 20 seconds of query execution a night. Even if the full 10 minutes was from the last 24 hours that still didn’t account for the long run times.

So, I convinced myself it wasn’t the queries and started looking at other things they could check. It was a long list. Lots of theories revolved around the idea that something else was running on the host at the same time.

As part of that I decided to get them to run an extended events trace so I could look at exact timings and waits during the batch process (and anything else that might be going on in SQL). I got a lot of data back which took a fair while to index and load. Figuring I’d already ruled out query execution times for the batch process itself, I expected the analysis was going to be tricky to drill down on the problem, however as soon as I looked for heavy queries running during the execution window I found one specific stored procedure with the following metrics:

QueryExStats1

Ouch.

Thing is, these figures were for a stored procedure that was part of the batch run. I date for this in the original output but the figures for query execution duration and CPU were tiny in comparison to these. So, were the query execution stats wrong?

No, they weren’t, dm_exec_query_stats only captures figures for cached query plans, so you might have a query that’s not caching a plan for whatever reason – or might have something that’s doing work that’s not actually a query, imagine a while loop that’s trying to find all the prime numbers under a trillion without using sets or queries. That would consume a lot of CPU and duration but wouldn’t show in in the query execution stats – because it’s not a query.

In this case it turned out the stored procedure was:

  • Trying to process over 2 million rows using a cursor
  • For each row it would attempt to execute a stored procedure that didn’t exist using dynamic SQL
  • It would catch the failure and update an error message against the row – using the cursor
  • Every day it would attempt to process the failed rows again
  • Every day more rows would be inserted to the table that would fail

In particular it was the use of the cursor that was most expensive, and not being captured in the execution stats.

To be kind to the original developer, it was a clever generic solution to a specific problem. It wasn’t optimal from a performance point of view, but they only expected it to have to deal with a few rows daily.

Then the volume of data increased.

Then a subsequent developer made a change but didn’t know they needed to add a new supporting stored procedure. The resulting issue didn’t get picked up in testing.

The number of failed rows had been growing every night for a couple of years, and the execution time creeping up until it went exponential.

This post isn’t about bad code though, it’s about remembering that not all resource consumption on our SQL boxes comes from cached queries, so when trying to troubleshoot remember that query execution stats are one place to look, but don’t cover everything that might be being executed in T-SQL.

T-SQL Tuesday 104 – Capture expensive queries with query store

tsql2sday150x150

For this month’s T-SQL Tuesday Bert Wagner invites us to share SQL scripts we’ve written that we’d hate to live without.

Code You Would Hate To Live Without (T-SQL Tuesday #104 Invitation)

I don’t know that I’d go quite as far as “hate”, but I’ve been getting a lot of re-use out of a script I wrote and shared on this blog last year.

I’m lucky enough to have a number of SQL 2106 instances with Query Store enabled for the user databases, in particular that includes a performance test environment for one of the applications I work on.

In the past I’ve always been a sucker for those queries that you can use to find the most expensive queries on your SQL instance. Be that in terms of CPU, IO, or just longest running.

I wanted to use Query Store for an improved version of those scripts I’d been using, as Query Store retains statistics when a server restarts – or a plan gets cleared out of the cache.

With Query Store it’s easy enough to get figures per database, but I wanted to be able to view across all databases – and that was why I wrote the script.

Here’s the original post (and script):

Capture the most expensive queries across your SQL Server using Query Store

I find this useful as it is, but also I often cut it about to ask slightly different questions of the query store data.

Thoughts on Query Performance with TDE enabled

Microsoft state that enabling TDE (Transparent Data Encryption) usually has a performance  overhead of 2-4%. That doesn’t sound like very much, and personally I wouldn’t let it bother me if I want to make sure my data is encrypted at rest.

However, you may have heard other sources saying that it’s actually a lot more than that – and the performance impact is a high price to pay for the level of protection offered.

So, what’s the truth?

The critical thing to remember is that with TDE your data is encrypted on disk, but data held in memory (i.e. the buffer pool) is unencrypted. Therefore, you would only expect an overhead when reading from and writing to disk.

SQL Server tries to keep data that is referenced repeatedly in the buffer pool. So, if your SQL instance is provisioned with enough memory, a lot of your read queries can access the buffer pool and don’t have to go out to disk. Such queries should not be affected performance-wise by TDE.

There may be other read queries however that access older data that hasn’t been read for a while, these queries would need to retrieve that data from disk and so there would be an overhead from TDE.

Any queries that modify data will need the outcome to be written to disk so in these cases we will see an overhead. This overhead is likely to come in two parts, first when the transaction is written to the logfile before committing, and then later as the updated data gets written to the data file as part of a checkpoint operation.

We also have other operations that write or update encrypted data on disk, so we would also expect these to have some overhead. This would include operations such backups, or index rebuild operations.

You can see from this that the overhead will very much depend on how your application interacts with your data. At one extreme, if you have a set of static data that is small enough to be held in memory and is queried regularly then there should be no overhead. At the other end of the spectrum, if you have an application that writes to the database a lot, and reads less often, then the overhead will be higher.

Let’s look at some examples and try and quantify what amount of overhead we might be talking about. In this post we’re just going to focus on the effect TDE has when you are reading data.

First, I’ll create two databases, one with TDE enabled and one without. Then I’ll load the same set of data into each (Total size about 1GB).

You can find the script I used for this in my previous blog post:

Encrypting an existing database with TDE

In the first test we’ll perform a like query of the worst kind, one that tries to match for a value within a column. We have no indexes on the table, but none would be that helpful with this query anyway.

SELECT *
FROM dbo.SomeData 
WHERE SomeText LIKE '%Wibble%';

I’ll run across 4 test cases (capturing the total CPU consumed in each case). The test cases are:

  • TDE Protected database where the buffer cache is empty (i.e. all data has to be read from disk)
  • TDE protected database where all the data for the table is in the buffer cache (i.e. no data has to be read from disk)
  • Database without TDE where the buffer cache is empty (i.e. all data has to be read from disk)
  • Database without TDE where all the data for the table is in the buffer cache (i.e. no data has to be read from disk)

In each test I’ll run the query 5 times and total the CPU to even out variance between executions. For the tests involving disk reads I’ll run the command DBCC DROPCLEANBUFFERS in between executions of the query to empty the buffer cache.

The results looked like this, with time shown in seconds. Note that MAXDOP was set to 4 and each query went parallel over 4 threads:

TDE_ReadPerf1

There’s quite a variance between each run so I’m not going to take anything significant from small differences we see. However, we can see that the timings are pretty much the same when the data is in memory, but there seems to be about a 10% overhead with TDE when reading from disk.

In case you’re wondering why reading from disk didn’t add much elapsed time for the No-TDE database – the reads were “read-ahead” so were able to complete while the CPU was burning through the data.

Let’s try a different query, this one will still have to scan the whole table as we are dealing with a heap, but it uses an equality predicate so there is less work to do in matching the data:

SELECT *
FROM dbo.SomeData 
WHERE Id = 100000000;

I’ll run the same set of tests as above and we can look at the results:

TDE_ReadPerf2

The first thing we notice is that this query runs a lot quicker in general. And again, there is little difference with and without TDE when the data is in memory.

But look at the comparison when the data has to be read from disk. With TDE the CPU consumption is more than 10 times as large, or in percentages, over 1000% worse.

At this point you could be forgiven for panicking – are you willing to risk that TDE makes your queries that much worse.

In both the above two tests, the same amount of data is being read from disk. And if you re-examine the numbers, you’ll see that (very roughly) the same amount of CPU has been added in each case where we have TDE enabled – about 50 seconds. That 50 seconds was split over 4 cores so it would have been about 12.5 seconds per core.

In terms of elapsed time, we had approximately that increase with the first query because CPU was the resource under most contention – i.e. the reads were able to occur while waiting for the CPU to complete. In the second query we can see the reading from disk occupied most of the elapsed time for those queries, so the extra CPU consumption didn’t make the query run particularly longer.

By the time it had been executed 5 times (with the memory flushed between each execution) each query read about 600,000 pages sized at 8kb each – just under 5GB. If it took 50 seconds on the decryption of those pages, then each page took about 1 twelfth of a milli-second to decrypt – or alternatively, TDE decrypted about 12 pages per millisecond. Or in terms of disk size, 100MB per second. These were tests on a server with magnetic spinning disks (not SSDs) and you can see from the above figures, the straight disk access took about 40 seconds on its own.

When TDE doesn’t read from disk it doesn’t add any overhead, but how do we quantify what the overhead to queries is when it does have to access the disk?

From the above tests we could suggest it adds from 10% to over 1000% CPU.

Or alternatively between 10% to 0% elapsed time. Note that those figures are the same way round, i.e. when it added 10% CPU it added 10% elapsed time, but when it added 1000% CPU time – the elapsed time was about the same.

I could go on with this type of confusing analysis, but instead I’ll suggest this is the wrong way to think about performance in terms of TDE.

Don’t think about query performance.

Think about read performance.

TDE overhead depends on the level of your physical disk access. In the case of read query performance, it depends wholly on the level of physical reads, and seems to be a reasonable fixed overhead for each physical read.

That makes perfect sense, the overhead for querying is in decrypting data, and surely it will take pretty much the same amount of CPU to decrypt each 8KB page.

In theory this makes it simple for us to calculate what this overhead would look like on our production SQL Servers. There’s all sorts of ways of capturing physical reads (and writes).

Let’s say I take a quick look at Resource Monitor to get a ballpark figure for one of my databases on this server that I know to be quite heavy on physical reads. And let’s say I see that it averages 25MB/s during the peak hour.

From that, and from the figures above I can estimate what impact enabling TDE for this database would have on the CPU.

Well I know that 25MB equates to about 0.25 seconds of CPU to decrypt the data, and I know I have 4 cores, so I can expect that in the average second this adds 0.0625 seconds of CPU per core. I multiple that be 100 and I find that I’ve added 6.25% CPU.

The calculation I’ve just done is:

(Reads/Second) * 100

Divided by

(MBs TDE decrypts every second/CPU) * (Number of CPU cores)

This doesn’t include writes, and it doesn’t include backups – I hope to look at that in a later post.

Now, let’s say that produces a scary number, and I’m worried about the strain that’s going to put on my CPU…

My first question would be why am I experiencing so many reads and can I alleviate that? Does data have a short shelf-life in memory? Do I have enough memory in my server – and is enough allocated to SQL?

This isn’t just in terms of TDE. SQL Server is going to perform much better if your current dataset – i.e. the data you are currently accessing most, can be held in memory. So, if TDE is causing a problem, then it’s possible your queries are slow anyway.

Again, I’m not talking about writes just yet.

Or maybe your database is heavy on physical reads because it’s a data warehouse, regularly querying historical data. In that case, is it a suitable target for encryption? Hopefully the data is well anonymised if you’re using it for reporting and therefore doesn’t contain anything personal or sensitive.

I hope.

In summary…

Just to repeat myself, if you’re wondering about TDE and its impact of query performance, which we all have done, try to reframe the question and think about its impact on read performance.

It makes more sense and it may help you to more easily quantify the impact on your servers. And if it does look like the performance may be an issue – perhaps there is tuning you can perform on your database instance to reduce the physical disk access.

T-SQL Tuesday #101 – Some Great SQL Server Tools

tsql2sday150x150

This month for T-SQL Tuesday #101 Jens Vestergaard asks us to blog about the essential tools in our SQL Toolbelt.

http://t-sql.dk/?p=1947

I’d just completed by post on CMS when I realised I’ve blogged about a few of my favourite tools in the past and that this would be a good opportunity to share them again in a quick post. So, here’s my second of two T-SQL Tuesday posts.

First we have these two great tools:

Statistics  Parser – If you do performance tuning and you’re not using this then you’re missing out. Formats and summarises the results of your STATISTICS IO and TIME commands. Big shout to Richie Rump for creating this – I use it so much I just need to type “st” in my browser search and it comes straight up!

Live Query Stats – Watch an animated execution plan while your query executes. I’ll admit I’ve not used this a lot but I still think it’s really cool tool.

And then a few different ones around the same tool:

Query Store – track query performance and plan changes over time as well as forcing SQL to use the plan you want. I’m lucky to have quite a bit of stuff on SQL 2016 and Query Store is great! I was just using it minutes ago to look at how a performance fix put in last week has improved things.

Clone Database with Query Store – If you want to see how your queries would behave in your production environment then Clone Database copies out the stats used to generate the plan. With SQL 2016 you can also a copy of your query store data so you can query away at it happily in your dev environment.

Capture the most expensive queries across your instance with Query Store – blowing my own trumpet a bit here, but I’ve used this script a lot since I wrote it.

That’s it for now – at least for tools I’ve written about – I look forward to seeing what the rest of the T-SQLT community comes up with.