When do Statistics get updated?

Statistics objects matter because they allow SQL to make good estimates of the row-counts involved in different parts of a given query, which in turn allows the SQL Optimiser to form efficient execution plans to deliver those query results.

Statistics get updated automatically when you rebuild an index they are based on (re-organising an index does not update them) – but we only tend to rebuild indexes that are fragmented, and we don’t need fragmentation for statistics to be stale. We also may have many auto-created statistics objects that are not related to an index at all.

It’s generally recommended to have the database level setting AUTO_UPDATE_STATISTICS turned on, so that SQL can manage the process of keeping statistics up to date for us. The only excuse to turn it off is that you are managing the updates to stats yourself in a different manner. And you can always turn the auto update off at an individual table or statistics level if you need to, rather than for the whole database.
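As a rough sketch of those options in T-SQL: the database name StatsDemo and the table name dbo.Test are placeholders I’ve assumed for illustration, so swap in your own and only run the parts you need. Note that UPDATE STATISTICS ... WITH NORECOMPUTE also refreshes the statistics object at the time you run it.

```sql
-- Turn automatic statistics updates on for a database
-- (StatsDemo is a placeholder database name)
ALTER DATABASE StatsDemo SET AUTO_UPDATE_STATISTICS ON;

-- Turn auto update off for every statistics object on one table
-- (dbo.Test is a placeholder table name)
EXEC sp_autostats 'dbo.Test', 'OFF';

-- Or turn it off for a single statistics object by updating it WITH NORECOMPUTE
UPDATE STATISTICS dbo.Test IX_Test_TextValue WITH NORECOMPUTE;

-- Show the current auto update setting for each statistics object on the table
EXEC sp_autostats 'dbo.Test';
```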

SQL Server has had the ability to automatically update statistics since version 7.0. Nonetheless for a long part of my career working with SQL Server, whenever a performance issue raised its head everyone’s knee-jerk response would be “Update Statistics!” In most cases though the people shouting that didn’t really understand what the “Statistics” were, or what mechanisms might already be in place for keeping them up to date.

Of course SQL Server isn’t perfect and sometimes it is helpful for human intelligence to intervene. But to provide intelligent intervention one has to understand how things work.

So how does the automatic updating of statistics work?

In the background SQL maintains a count of changes to tables that might affect statistics. This can be updates, inserts or deletes. So if I inserted 100 records, updated 100 records and then deleted 100 records, I would have made 300 changes.

When SQL forms an execution plan for a query it references various distribution statistics objects to estimate row-counts, and uses those estimates to try to find the best plan. The statistics objects it looks at are referred to as being “interesting” in the context of the query.

Before using values from the statistics, the Optimizer will check to see if the statistics are “stale”, i.e. the modification counter exceeds a given threshold. If it does, SQL will trigger a resampling of the statistics before going on to form an execution plan. This means that the plan will be formed against up to date statistics for the table.

For subsequent executions of the query, the existing plan will be loaded from the plan cache. Within the plan, the Optimiser can see a list of the statistics objects that were deemed “interesting” in the first place. Once again it will check each of them to see if they are “stale”. If they are, an auto-update of the statistics object(s) will be triggered and once that is complete the plan will be recompiled, in case the updated statistics might suggest a better way of executing the query. Equally, if any of the statistics objects have been updated since the last execution then the plan will also be recompiled.

One important caveat to this is the database level setting AUTO_UPDATE_STATISTICS_ASYNC (asynchronous statistics updates). Generally it is best to have this turned off, in which case the above behaviour is observed. If you turn it on however, in the case of stale stats the query execution will not wait for the stats to be updated, but will start them updating in the background while the query executes. The plan will only recompile to be based on the new stats at the next execution.
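To see how both settings are currently configured, something like the following should do. In ALTER DATABASE the asynchronous option is spelled AUTO_UPDATE_STATISTICS_ASYNC; StatsDemo is again just a placeholder database name.

```sql
-- Check both settings for every database on the instance
SELECT name,
       is_auto_update_stats_on,
       is_auto_update_stats_async_on
FROM sys.databases;

-- Keep statistics updates synchronous for one database
-- (StatsDemo is a placeholder database name)
ALTER DATABASE StatsDemo SET AUTO_UPDATE_STATISTICS_ASYNC OFF;
```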

From SQL Server 2008 R2 SP2 and SQL Server 2012 SP1 we have a new DMF (Dynamic Management Function), sys.dm_db_stats_properties, that allows us to see how many row modifications have been captured against a given statistics object, as well as when it was last refreshed, how many rows were sampled etc. Modifications are captured on a per column basis (though when statistics were originally introduced in SQL Server it was per table), so the counter will only be affected if the leading column for the statistics object has been affected by a given operation.

SELECT
s.name AS StatsName, sp.*
FROM sys.stats s
CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) sp
WHERE s.name = 'IX_Test_TextValue';

Results:

Statistics_Properties

So what are the thresholds?

For a long time the thresholds were as follows. Statistics were considered stale if one of the following was true:

  • The table size has gone from 0 rows to more than 0 rows
  • The table had 500 rows or less when the statistics were last sampled and has since had more than 500 modifications
  • The table had more than 500 rows when the statistics were last sampled and the number of modifications is more than 500 + 20% of the row-count when the statistics were last sampled (when talking about tables with larger row-counts a lot of the documentation just describes this as 20% as the additional 500 becomes less and less relevant the larger the number you are dealing with).

Those thresholds did mean that when a table had a large number of rows, Statistics might not get updated that often. A table with a million rows would only have stats updated if about 200,000 rows changed. Depending on the distribution of the data and how it is being queried this could be a problem.

So, in SQL 2008 R2 SP1 Microsoft introduced Traceflag 2371 which when set would reduce the stale statistics threshold for larger tables. From SQL 2016 this is the default functionality, provided the database compatibility level is 130 or higher.

That adds the following test for statistics being stale:

  • If the number of rows (R) when the statistics were last sampled is 25,000 or more and the number of modifications is more than the square root of (1,000 × R):

Modification threshold = √(1,000 × R)

Now, I’m just going to correct myself here: the documentation I’ve found SAYS the threshold is 25,000, but when I started to have a play that didn’t seem to be the case at all.

What actually seems to happen is that whichever of the two estimates is smaller gets used, i.e.

Either:

500 + (20% of R)

Or:

√(1,000 × R)

Whichever is smaller.

I don’t know if this means that both get evaluated and the smaller is used, or if the threshold between the two rules is simply defined at the point where the second formula gives the smaller result – which is after 19,682 rows. I discovered that threshold by solving where the two equations above would give the same result – then by experimenting to prove it in practice.
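For what it’s worth, here is a sketch of that calculation in T-SQL. Setting 500 + 0.2R equal to the square root of 1,000R and substituting x for the square root of R gives the quadratic 0.2x^2 - SQRT(1000)x + 500 = 0. The larger root, squared, is the crossover row count. The variable names are my own.

```sql
-- Solve 0.2*x^2 - SQRT(1000)*x + 500 = 0 for x = SQRT(R), taking the larger root
DECLARE @a float = 0.2,
        @b float = -SQRT(1000.0),
        @c float = 500;

DECLARE @x float = (-@b + SQRT(@b * @b - 4 * @a * @c)) / (2 * @a);

-- R is x squared: the row count at which both formulas give the same threshold
SELECT @x * @x AS CrossoverRows;
```

The result comes out at a little over 19,682, which matches the figure above.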

I think this incorrect stating of 25,000 as the threshold probably comes from confusion, taking an approximation (20%) as the actual figure. Remember I mentioned that people often generalise to say that statistics are stale after 20% of the rows change, and forget about the extra 500 rows. If that was true and it was exactly 20%, then the threshold would be 25,000 as that would be the point that both equations are equal.

Anyway it’s not even vaguely important to know that. I just found it interesting! Note that the tests above were carried out on SQL Server 2012 SP3 so could well be different on later versions.

To more visually understand the above rules, here’s a table showing the thresholds for some example table sizes under both the Old algorithm (without the traceflag) and the New algorithm (with the traceflag or on SQL 2016 or later).

R is the number of rows when the statistics were last sampled and T is the number of modifications for statistics to be considered stale:

Statistics_Thresholds
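If you want to produce a table like this yourself, the rules can be sketched in T-SQL. The row counts in the VALUES list are arbitrary examples I’ve picked, and the NewThreshold column assumes the “whichever is smaller” behaviour I observed above rather than anything documented.

```sql
-- R = rows when the statistics were last sampled (example sizes are arbitrary)
SELECT v.R,
       -- Old algorithm: 500 modifications for small tables, else 500 + 20% of R
       CASE WHEN v.R <= 500 THEN 500
            ELSE 500 + 0.2 * v.R END AS OldThreshold,
       -- New algorithm as observed: the smaller of the old rule and SQRT(1000 * R)
       CASE WHEN v.R <= 500 THEN 500
            WHEN SQRT(1000.0 * v.R) < 500 + 0.2 * v.R THEN SQRT(1000.0 * v.R)
            ELSE 500 + 0.2 * v.R END AS NewThreshold
FROM (VALUES (500), (1000), (10000), (100000),
             (1000000), (10000000), (100000000)) AS v(R)
ORDER BY v.R;
```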

You can see for the larger table sizes there is a massive difference. If you’ve got large tables you’re querying against and are having to update the statistics manually to keep them fresh then you may find implementing the traceflag is a help.
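To get a feel for which of your own statistics are close to either threshold, you could put the modification counter from sys.dm_db_stats_properties next to both formulas. This is only a sketch: it runs against the current database, skips tables of 500 rows or fewer (where the flat 500 rule applies), and the column aliases are my own.

```sql
SELECT OBJECT_SCHEMA_NAME(s.object_id) AS SchemaName,
       OBJECT_NAME(s.object_id)        AS TableName,
       s.name                          AS StatsName,
       sp.last_updated,
       sp.rows,
       sp.modification_counter,
       500 + 0.2 * sp.rows             AS OldThreshold,        -- 500 + 20% of R
       SQRT(1000.0 * sp.rows)          AS SquareRootThreshold  -- SQRT(1000 * R)
FROM sys.stats s
CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) sp
WHERE OBJECTPROPERTY(s.object_id, 'IsUserTable') = 1
  AND sp.rows > 500
ORDER BY sp.modification_counter DESC;
```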

For large tables, statistics are sampled when being updated rather than the whole table necessarily being read. I have details on that in this post:

Automatic Sample Sizes for Statistics Updates

How does Query Store capture cross database queries?

When I was writing the script shared in my last post Identify the (Top 20) most expensive queries across your SQL Server using Query Store a question crossed my mind:

Query Store is a configuration that is enabled per database, and the plans and stats for queries executed in that database are stored in the database itself. So what does query store do when a query spans more than one database?

Does it record the execution stats in all databases involved or does it store them in one based on some criteria (e.g. the one where the most work occurs)? Or does it somehow proportion them out between the databases?

This was relevant as it crossed my mind that if it records them in multiple databases then my query in the above post could be double counting.

Time to test and find out.

I created three databases, Fred, Bert and Ernie. Then a table called Fred in database Fred, and a table called Bert in database Bert. In table Fred I created a bunch of records, then in table Bert I created a much bigger bunch of records:

DROP DATABASE IF EXISTS Fred;
DROP DATABASE IF EXISTS Bert;
DROP DATABASE IF EXISTS Ernie;

CREATE DATABASE Fred;
CREATE DATABASE Bert;
CREATE DATABASE Ernie;

USE Fred;
CREATE TABLE dbo.Fred(Id INT IDENTITY(1,1) PRIMARY KEY CLUSTERED, FredText NVARCHAR(500));

INSERT INTO dbo.Fred(FredText)
SELECT a.name + b.name
FROM sys.objects a, sys.objects b;

USE Bert;
CREATE TABLE dbo.Bert(Id INT IDENTITY(1,1) PRIMARY KEY CLUSTERED, BertText NVARCHAR(500));

INSERT INTO dbo.Bert(BertText)
SELECT a.name + b.name + c.name 
FROM sys.objects a, sys.objects b, sys.objects c;

Then I turned on Query Store for all three databases:

USE MASTER;
ALTER DATABASE Fred SET query_store = ON;
ALTER DATABASE Bert SET query_store = ON;
ALTER DATABASE Ernie SET query_store = ON;

Once that was done I concocted a horrible query that was bound to be horrendously slow – so I knew it would be easy to find when I queried the Query Store runtime stats:

SET STATISTICS IO ON

SELECT TOP 100000 *
FROM Fred.dbo.Fred f
INNER JOIN Bert.dbo.Bert b
   ON b.BertText LIKE  '%' + f.FredText + '%';

I turned STATISTICS IO on so I could see how much work was happening in each database.

I ran the query first in a query window pointing at the Fred database, then I ran my query store query from the previous post, Identify the (Top 20) most expensive queries across your SQL Server using Query Store, to see what had been captured. I made it slightly easier for myself by adding an additional where clause to the cursor so that it only looked at these databases:

--Cursor to step through the databases
DECLARE curDatabases CURSOR FAST_FORWARD FOR
SELECT [name]
FROM sys.databases 
WHERE is_query_store_on = 1
AND name IN ('Fred','Bert','Ernie');

I cleared down Query Store for all the databases:

USE MASTER;
ALTER DATABASE Fred SET QUERY_STORE CLEAR;
ALTER DATABASE Bert SET QUERY_STORE CLEAR;
ALTER DATABASE Ernie SET QUERY_STORE CLEAR;

Then I repeated these steps for Bert and Ernie.

The Statistics IO for the query (regardless of which database context I had set) was as follows:
Table 'Bert'. Scan count 24, logical reads 5095742, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Fred'. Scan count 25, logical reads 50, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

So you can see most of the work occurs in the Bert database, a little in Fred, and none in Ernie.

Now let’s see what query store captured when I ran the query pointing at database Fred:
QS_Fred

And pointing at database Bert:
QS_Bert

And pointing at database Ernie:
QS_Ernie

You can see that the figures get recorded against whichever database you are pointing at – regardless of where the data being accessed resides. I left the “TotalLogicalReads %” in the above screen shots so you can see I’m not hiding anything.

This has a few implications. First, I’m happy because it means my “Expensive queries” script isn’t double counting.

Second though, as you can’t turn Query Store on in any of the system databases, you won’t be able to capture details for any queries executed with those as the context. That includes ad-hoc queries where the user may connect to master, but execute queries against your other databases.

Fortunately (because Query Store would be pretty pointless if it did) this doesn’t apply to stored procedures.

I’m going to wrap my horrible query into a stored procedure, and deploy it into database Ernie:

USE Ernie;
GO
--CREATE PROCEDURE must be the first statement in its batch, hence the GO above
CREATE PROCEDURE dbo.Horrible
AS
BEGIN
   SELECT TOP 100000 *
   FROM Fred.dbo.Fred f
   INNER JOIN Bert.dbo.Bert b
      ON b.BertText LIKE  '%' + f.FredText + '%';
END;

Now I clear my Query Stores for the three databases one last time. Then I’ll call the stored procedure from database Fred:

USE Fred;
EXEC Ernie.dbo.Horrible;

Here’s what I get from query store now:
QS_Sproc

So, Query Store logs the execution against database Ernie – where the stored procedure resides, rather than Fred – where it was called from, or Bert – where most of the work was done.
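If you would rather check this directly than rely on my screenshots, a query along these lines against the Query Store views in Ernie should list what was recorded for the procedure. It is a cut-down sketch of the script from the previous post, filtered on the object_id of dbo.Horrible.

```sql
USE Ernie;

-- Statements Query Store has recorded against the stored procedure in this database
SELECT OBJECT_NAME(q.object_id) AS ObjectName,
       t.query_sql_text,
       SUM(rs.count_executions) AS TotalExecutions
FROM sys.query_store_query q
INNER JOIN sys.query_store_query_text t
   ON q.query_text_id = t.query_text_id
INNER JOIN sys.query_store_plan p
   ON q.query_id = p.query_id
INNER JOIN sys.query_store_runtime_stats rs
   ON p.plan_id = rs.plan_id
WHERE q.object_id = OBJECT_ID('dbo.Horrible')
GROUP BY q.object_id, t.query_sql_text;
```

Running the same query in Fred and Bert should return nothing for this procedure, if the behaviour described above holds.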

I hope you’ll trust me enough on that that I don’t have to demonstrate all the other combinations!

Related posts:

Introduction to SQL Server Query Store

Identify the (Top 20) most expensive queries across your SQL Server using Query Store

Identify the (Top 20) most expensive queries across your SQL Server using Query Store

I’m a big fan of using queries based on the dynamic management view sys.dm_exec_query_stats to identify the most resource hungry queries across a SQL instance. There are lots of versions of this sort of query around the place if you google for “Top 20 queries”.

That approach has some limitations though. First, it is cleared out every time an instance restarts, and second it only keeps figures for currently cached plans, so when a query recompiles, data is lost.

The DMVs provided by SQL Server Query Store solve these issues, as data is persisted in the database over time, so nothing is lost on restarts etc. And one extra thing you gain by using the Query Store DMVs is that you can slice by time interval, for instance if you want to look at the before and after states relating to a change that has been made – or you want to look at an interval where performance degradation has been reported.

Some time ago I wrote a query store version of the “Top 20 queries” query that will produce a ranked list of your most expensive queries – and I’ve ended up using this a lot.

The only downside of using the DMVs for Query Store is that they are per database whereas dm_exec_query_stats is a view across the whole instance. So I had to use a cursor and a temp table, populating the temp table for each database in turn.

Here’s the query:

--Gather and report on most resource hungry queries
DECLARE @Reportinginterval int;
DECLARE @Database sysname;
DECLARE @StartDateText varchar(30);
DECLARE @TotalExecutions decimal(20,3);
DECLARE @TotalDuration decimal(20,3);
DECLARE @TotalCPU decimal(20,3);
DECLARE @TotalLogicalReads decimal(20,3);
DECLARE @SQL varchar(MAX);

--Set Reporting interval in days
SET @Reportinginterval = 1;

--Style 126 (ISO 8601) keeps the seconds and converts back unambiguously
SET @StartDateText = CONVERT(varchar(30), DATEADD(DAY, -@Reportinginterval, GETUTCDATE()), 126);

--Cursor to step through the databases
DECLARE curDatabases CURSOR FAST_FORWARD FOR
SELECT [name]
FROM sys.databases 
WHERE is_query_store_on = 1;

--Temp table to store the results
DROP TABLE IF EXISTS #Stats;
CREATE TABLE #Stats (
   DatabaseName sysname,
   SchemaName sysname NULL,
   ObjectName sysname NULL,
   QueryText varchar(1000),
   TotalExecutions bigint,
   TotalDuration decimal(20,3),
   TotalCPU decimal(20,3),
   TotalLogicalReads bigint
);

OPEN curDatabases;
FETCH NEXT FROM curDatabases INTO @Database;

--Loop through the databases and gather the stats
WHILE @@FETCH_STATUS = 0
BEGIN
    
    SET @SQL = '
	   USE ' + QUOTENAME(@Database) + ';
	   INSERT INTO #Stats
	   SELECT 
		  DB_NAME(),
		  s.name AS SchemaName,
		  o.name AS ObjectName,
		  SUBSTRING(t.query_sql_text,1,1000) AS QueryText,
		  SUM(rs.count_executions) AS TotalExecutions,
		  SUM(rs.avg_duration * rs.count_executions) AS TotalDuration,
		  SUM(rs.avg_cpu_time * rs.count_executions) AS TotalCPU,
		  SUM(rs.avg_logical_io_reads * rs.count_executions) AS TotalLogicalReads
	   FROM sys.query_store_query q
	   INNER JOIN sys.query_store_query_text t
		  ON q.query_text_id = t.query_text_id
	   INNER JOIN sys.query_store_plan p
		  ON q.query_id = p.query_id
	   INNER JOIN sys.query_store_runtime_stats rs
		  ON p.plan_id = rs.plan_id
	   INNER JOIN sys.query_store_runtime_stats_interval rsi
		  ON rs.runtime_stats_interval_id = rsi.runtime_stats_interval_id
	   LEFT JOIN sys.objects o
		  ON q.object_id = o.object_id
	   LEFT JOIN sys.schemas s
		  ON o.schema_id = s.schema_id     
	   WHERE rsi.start_time > ''' + @StartDateText + '''
	   GROUP BY s.name, o.name, SUBSTRING(t.query_sql_text,1,1000)
	   OPTION(RECOMPILE);';

    EXEC (@SQL);

    FETCH NEXT FROM curDatabases INTO @Database;
END;

CLOSE curDatabases;
DEALLOCATE curDatabases;

--Aggregate some totals
SELECT 
    @TotalExecutions = SUM(TotalExecutions),
    @TotalDuration = SUM (TotalDuration),
    @TotalCPU  = SUM(TotalCPU),
    @TotalLogicalReads = SUM(TotalLogicalReads)
FROM #Stats

--Produce output
SELECT TOP 20
    DatabaseName,
    SchemaName,
    ObjectName,
    QueryText,
    TotalExecutions,
    CAST((TotalExecutions/@TotalExecutions)*100 AS decimal(5,2)) AS [TotalExecutions %],
    CAST(TotalDuration/1000000 AS decimal(19,2)) AS [TotalDuration(s)],
    CAST((TotalDuration/@TotalDuration)*100 AS decimal(5,2)) AS [TotalDuration %],
    CAST((TotalDuration/TotalExecutions)/1000 AS decimal(19,2)) AS [AverageDuration(ms)],
    CAST(TotalCPU/1000000  AS decimal(19,2)) [TotalCPU(s)],
    CAST((TotalCPU/@TotalCPU)*100 AS decimal(5,2)) AS [TotalCPU %],
    CAST((TotalCPU/TotalExecutions)/1000 AS decimal(19,2)) AS [AverageCPU(ms)],   
    TotalLogicalReads,
    CAST((TotalLogicalReads/@TotalLogicalReads)*100 AS decimal(5,2)) AS [TotalLogicalReads %],
    CAST((TotalLogicalReads * 1.0 / TotalExecutions) AS decimal(19,2)) AS [AverageLogicalReads]   
FROM #Stats
--Order by the resource you're most interested in

--ORDER BY TotalExecutions DESC
--ORDER BY TotalDuration DESC
ORDER BY TotalCPU DESC
--ORDER BY TotalLogicalReads DESC

DROP TABLE #Stats;

The script limits itself to looking at databases where query store is enabled.

If you want to bring back more results you can just change the TOP statement, and if you want to look at the results ordered by a different resource (e.g. Reads) then just make sure the relevant ORDER BY clause is uncommented. With other small modifications I find this script useful in a myriad of scenarios. I hope you find it useful too.

Related:

How does Query Store capture cross database queries?

Introduction to SQL Server Query Store

Introduction to SQL Server Query Store

Introduced with SQL 2016, Query Store was probably the most anticipated and talked about new feature. In this post we’ll take a brief look at what it is, how you set it running, and what you can use it for. This will be a fairly brief overview – you’d need a book to cover it in detail – but hopefully this will give you a flavour of how useful this will be and how to get started.

What it does, at a base level, is actually quite simple. It just stores information relating to query execution over time.

That information consists of two things:
• Execution Plans – the execution plans generated for each query are stored in the query store, and if the plan changes the new plan is also stored.
• Performance metrics – information such as CPU consumption, reads and writes, are captured and stored for each query.

This information is aggregated over intervals (default is one hour) so you can see how query performance changes over time.

This isn’t earth shatteringly new: you can already query to find out the execution plan for a query, and you can also query to find aggregated performance metrics related to a given query.

The difference is that now a history can be maintained without implementing additional monitoring. Previously the performance metrics would be aggregated as a single total since the last restart of the SQL instance – and would be cleared at the next restart. Now they are persisted and time-sliced so you can actually measure change over time.

The simple activity of storing old execution plans is also profound for performance troubleshooting. Anyone who’s worked with large scale production data will have experienced the issue when a function that was working fine, fairly suddenly starts to develop performance problems.

A common cause of this is what’s known as “plan regression”. Basically this is where the execution plan has changed – and the new one’s just not as good as the old one for most executions. Previously you might suspect this was the cause of an issue you were seeing, but there was no way to easily prove it, now you can use query store to view and compare the old and new plans to verify this. You can even with a click or two force the query to go back to using the old (better) plan – though we hope people won’t overuse this and will try to at least delve into the cause and think about resolving it. There is usually a reason SQL thought the new plan would be better – and a particular plan may work for now but may not always be the best plan in the future as your data changes.

Let’s have a look at these features in a little more detail.

Enabling Query Store
Query store is a database level configuration. It’s important to understand that, and that the information stored is actually stored within system tables in the database. That means that if you backup and restore the database, the information is retained. Also very importantly, the information is stored asynchronously – so there shouldn’t be any performance impact on the executed queries themselves. There will of course be some overall server overhead at the point the data does get saved, but that shouldn’t be too significant in most cases.

You can enable Query Store for a database through T-SQL (or in your source code) or through the GUI in SSMS. I just want to demonstrate doing this through the GUI so you can see some of the options. Right-click on the database, select properties, and then select the Query Store page all the way at the bottom:

QueryStore1

Above you can see Query Store enabled for the WideWorldImporters database, with all default settings.

The first setting is “Operation Mode”. By default this is set to “Off”. To enable Query Store and get it running for a particular database you change it to “Read Write”. Job Done.

The Data Flush interval is how often the query store data gets written to disk – remember I said this was asynchronous.

The Statistics Collection interval determines the size of the time slices that your query performance metrics get aggregated into.

Then we have some stuff about data retention. It’s worth noting that if your query store fills up and nothing is happening to clear it out then it flips to Read-Only mode and won’t store any more data until space is freed up. The default amount of space set for it is 100MB – that’s not a lot of space so I really can’t see any justification from that point of view for not enabling this feature.

Leaving the “Size Based Cleanup Mode” set to Auto should make sure that old data gets purged if the query store starts to fill up. Above that is the “Query Store Capture Mode” – if you leave that to AUTO it will ignore infrequent queries or those with negligible overhead.

The last setting “Stale Query Threshold” is how long it keeps data for in days. So 30 days for default. I can see it being useful to up this significantly if we want to use Query Store to be able to monitor performance over a long period, but it may depend on how much space Query Store wants to consume for your database – remember the default is 100MB but you can up that to whatever you like.

At the bottom of the properties page you can also see some nice pie charts that show how much of a proportion of your database Query Store has allocated, and how much of that space it is using.
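For reference, the same settings can be applied through T-SQL. This is a sketch rather than a recommendation: the 100MB size and 30 day threshold mirror the values mentioned above, while the flush interval of 900 seconds is just an illustrative value I’ve chosen.

```sql
-- Enable Query Store and set the options discussed above
ALTER DATABASE WideWorldImporters SET QUERY_STORE = ON;

ALTER DATABASE WideWorldImporters SET QUERY_STORE (
    OPERATION_MODE = READ_WRITE,
    DATA_FLUSH_INTERVAL_SECONDS = 900,      -- illustrative value
    INTERVAL_LENGTH_MINUTES = 60,           -- Statistics Collection Interval
    MAX_STORAGE_SIZE_MB = 100,
    QUERY_CAPTURE_MODE = AUTO,
    SIZE_BASED_CLEANUP_MODE = AUTO,
    CLEANUP_POLICY = (STALE_QUERY_THRESHOLD_DAYS = 30)
);

-- Run in the database itself to check whether Query Store has flipped to
-- read-only and how much of its allocated space is in use
SELECT actual_state_desc,
       readonly_reason,
       current_storage_size_mb,
       max_storage_size_mb
FROM sys.database_query_store_options;
```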

So that’s Query store set up and configured, let’s have a look at a few of the things it gives us.

Query Store in Action and Forcing a Plan
I’ve set up Query Store as above in a copy of the WideWorldImporters database on a SQL 2016 instance. I’ve created a stored procedure that I’m running every two seconds and I’ve set the Statistics Collection Interval in Query Store to 1 minute (rather than an hour) so that I can get some figures and graphs out fairly quickly.

Under the database in SSMS, there is now a Query Store folder where some built in reports reside:

QueryStore2

For the sake of this blog post I’m just going to look at a couple of these. Let’s open the “Top Resource Consuming Queries” Report:

QueryStore3

You can see a few things here. On the top left is a bar chart of the most expensive queries (you’ll notice one large one and the rest are negligible in comparison – the large one’s my query). You can configure whether you want to look by CPU or Logical Reads amongst other options and whether you want to look at averages, or maximums or minimums. Basically there are a whole bunch of ways you can configure your view.

I’ll be honest that I struggled with some of these built-in Query Store reports to get them to show me what I wanted, so expect a bit of playing around to figure things out if you are using this feature.

In the bar chart, the bar highlighted in green is the currently selected query, on the right we can then see a scatter graph of the execution figures for this query across our statistics intervals (remember I’ve set it to have intervals of 1 minute). You can see I’m looking at average logical reads. You will also see that this query was ticking along nicely until about 14:05 when something happened (that was me!) and the logical reads for each execution of the query suddenly shot up. The blobs on the scatter graph have also changed colour at this point and that represents that the query is now using a new execution plan.

Next to this graph is a key telling us which plan each colour of blob represents and if you click on the plan in the key that you want, the plan itself is displayed in the bottom pane. At the moment I’m looking at the original plan (Plan 1). You will notice that the title specifies that it is “not forced”, you’ll also notice a button to the right of the title that gives us the option to “Force Plan”. Let’s just hold off a minute before we do that.

Before we change anything to try and fix the issue with this query, let’s look at the “Regressed Queries” report. This is pretty similar, but you may use it from a different direction, i.e. it may not be one of your most expensive queries that has started going bad, so if you look in the Regressed Queries report it will focus on ones for which the execution plan has changed in the interval you are looking at. Again I found it a little challenging to get this report to show me the query I was interested in; some playing around can be required:

QueryStore4

You can see here that I have just one big fat bar on my bar chart – as I only have one regressed query in the interval (last 30 minutes) I chose to look at. This can make it easier to identify queries suffering this issue.

I’m going to go back to the previous Resource Consumers report and try and fix the problem. Now, in reality I know what I did and why the query went bad. It was the result of something called parameter sniffing, which is where, if a stored procedure recompiles, the execution plan that is formed may be different depending on the parameters it is executed with. Basically it forms the best plan for the parameters supplied – but that might not be the best plan for all sets of parameters. In this case I forced the stored procedure to form a plan that was going to be expensive in most cases. More on that in my next set of performance tuning workshops.

That issue would be best fixed in the code of the stored procedure, but in production, turning around a fix may take days and we have the problem right now. So let’s use the Force Plan functionality to fix the symptom – just for the moment.

I select the plan I want, and I click the “Force Plan” button. The effect is immediate and I notice it within minutes because my statistics collection interval is so small. I’ll let it run for a bit and then show you the new graph:

QueryStore5

You can see the query has now returned back to healthy (quick) execution. Note the Orange blobs all now have a tick over them to denote that this plan is now forced.
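The same thing can be done without the GUI using the Query Store stored procedures. In this sketch dbo.MyProcedure and the id values are placeholders of my own; you would take the real query_id and plan_id from the first query (or from the report).

```sql
-- Find the plans stored for the statements in a stored procedure
-- (dbo.MyProcedure is a placeholder name)
SELECT q.query_id,
       p.plan_id,
       p.is_forced_plan,
       p.last_execution_time
FROM sys.query_store_query q
INNER JOIN sys.query_store_plan p
   ON q.query_id = p.query_id
WHERE q.object_id = OBJECT_ID('dbo.MyProcedure');

-- Force the chosen plan (the ids are placeholders; use values returned above)
EXEC sp_query_store_force_plan @query_id = 1, @plan_id = 1;

-- Later, once the underlying cause is fixed, remove the forcing again
EXEC sp_query_store_unforce_plan @query_id = 1, @plan_id = 1;
```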

Comparing Plans

A related feature in SQL 2016 is the ability to compare two execution plans to see what’s changed. I haven’t found this that amazing myself when I’ve looked at it, but that’s mainly due to the natural limitations – if two plans are significantly different then something that highlights the differences is just going to highlight the whole thing. However it can be useful at least to have both plans on screen at the same time so you can go back and forth easily.

You don’t need to do this through Query Store – if you right-click on any Execution Plan in SSMS there is now the “Compare ShowPlan” option, and as long as you have the plan you want to compare against saved as a file then you can go ahead. Note that one good thing is that this is an SSMS feature, so as long as you have SSMS 2016 or higher you can use it to compare plans from earlier versions of SQL Server.

With Query Store you can compare plans directly from the Store. If we go back to one of the reports above, the plans are listed in the key for the scatter graph. You can select more than one by using Shift+Click. Then you can click the button in the toolbar above the scatter graph which has the ToolTip “Compare the Plans for the selected query in separate window.”

Let’s do that for the two plans formed for our query above. The resulting view shows us two views side by side. It may be useful to look at these separately so they fit better on this page. On the left we have:

QueryStore6

The area highlighted in Red is where the tool has identified that the two plans are the same. The rest it is not so sure about. All the same it’s a nice visual view just to be able to see what both plans are doing. On the right hand side of the screen you then get this view:

QueryStore7

This shows us a comparison of the properties of whichever operator is selected in each plan – note this need not be the equivalent operator in each plan. You can select the Nested Loop operator in the top and the Index Scan operator in the bottom and it will show you those – though the comparison may not be that meaningful!

So, this is kind of a useful tool, but don’t expect it to magically solve the process of comparing plans for you. In general too much changes from one plan to the next for it to be that simple – but this may help – a bit…

Query Store Catalog Views

Like everything else in SQL Server – all the information you can access through the GUI in SSMS, is available directly through the use of system catalogs and views. So if you want to examine the information in a way that isn’t supported by the built in reports then you can just go ahead and write your own queries.

The new views available are:

sys.database_query_store_options
sys.query_context_settings
sys.query_store_plan
sys.query_store_query
sys.query_store_query_text
sys.query_store_runtime_stats
sys.query_store_runtime_stats_interval

Rather than me going into detail here, I’ll just refer you to the MSDN reference:

https://msdn.microsoft.com/en-gb/library/dn818149.aspx
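As a small taste of what you can do with these views, here is a sketch that lists the per-interval figures for one query, split by plan. It is roughly the data behind the scatter graph shown earlier. The @QueryId value is a placeholder; pick a real query_id from sys.query_store_query.

```sql
DECLARE @QueryId bigint = 1;  -- placeholder: a query_id from sys.query_store_query

-- One row per plan per statistics collection interval
SELECT rsi.start_time,
       rsi.end_time,
       p.plan_id,
       rs.count_executions,
       rs.avg_logical_io_reads,
       rs.avg_duration           -- reported in microseconds
FROM sys.query_store_plan p
INNER JOIN sys.query_store_runtime_stats rs
   ON p.plan_id = rs.plan_id
INNER JOIN sys.query_store_runtime_stats_interval rsi
   ON rs.runtime_stats_interval_id = rsi.runtime_stats_interval_id
WHERE p.query_id = @QueryId
ORDER BY rsi.start_time, p.plan_id;
```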

Conclusions

Query Store is a great feature. It’s not rocket-science but it is very useful. In particular it massively aids the investigation of production issues, saving time for those troubleshooting them at exactly the point they need time saving – right when everything’s hitting the fan.

It is also very useful for monitoring performance over time and being able to keep ahead of scalability issues with specific queries.

The methods for forcing a plan are also excellent for quick fixes – but try not to overuse them. It is possible to force plans in earlier versions of SQL – but tricky, so people usually just fixed the code. Forcing plans can end up being a case of treating the symptoms rather than the cause – and can lead to other problems later on.
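
One way to keep forced plans from piling up is to review them periodically. The sketch below lists the plans currently forced in a database and shows how one could be unforced once the underlying cause has been fixed. The query_id and plan_id values are hypothetical placeholders, not values from the examples above.

```sql
-- List the plans that are currently forced in this database
SELECT query_id,
       plan_id,
       force_failure_count,
       last_force_failure_reason_desc
FROM sys.query_store_plan
WHERE is_forced_plan = 1;

-- Remove the forcing for one of them.
-- 48 and 49 are hypothetical IDs: substitute values returned by the query above.
EXEC sys.sp_query_store_unforce_plan @query_id = 48, @plan_id = 49;
```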

Other Posts about Query Store

Identify the (Top 20) most expensive queries across your SQL Server using Query Store

How does Query Store capture cross database queries?

Instantaneous Transaction Rollback with SQL 2019

If you’ve read about the Accelerated Database Recovery feature in SQL Server 2019 you could be forgiven for thinking it’s just about speeding up database recovery time in case of a server failure.

In fact, enabling it also means that where you have a long running transaction that fails or is cancelled the rollback is almost instantaneous. This is great news for DBAs who have to sometimes kill a long-running blocking transaction but worry that it may take a long time to roll back – continuing to block all that time.

This is achieved by the fact that Accelerated Database Recovery maintains a version store in the database, and where a row is updated, the old version of the row is kept until after the transaction is complete. That makes it quick and easy to revert to the old version in case of failure.

Let’s look at a quick example.

I have a table with about 10 million rows – all containing the same text value:

CREATE DATABASE TestADR;
GO
USE TestADR;
GO

CREATE TABLE dbo.TestADR(Id int IDENTITY, SomeText varchar(50));

INSERT INTO dbo.TestADR (SomeText)
SELECT TOP 10000000 'FrangipanDeluxe' 
FROM sys.objects a, sys.objects b, sys.objects c, sys.objects d;

I update all the rows in the table to a new value:

UPDATE dbo.TestADR SET SomeText = 'FrangipanDeluxA';

This took about a minute.

I then execute a query to change them back and cancel the query in SSMS after about 30 seconds.

UPDATE dbo.TestADR SET SomeText = 'FrangipanDeluxe';

It took about 30 seconds more to cancel – which is SQL rolling back the changes.

Then I enabled Accelerated Database Recovery. You do this at the database level:

ALTER DATABASE TestADR
SET ACCELERATED_DATABASE_RECOVERY = ON;

Now I re-run that last update, again cancelling after 30 seconds.

This time the cancel was instantaneous. It took SQL no noticeable amount of time to roll back the changes.

This is great but we’ll probably want to be careful before we enable it on all our databases – when we get them onto SQL 2019 anyway. There will be an additional overhead in managing the version store and that could have an impact in terms of time taken to complete write queries, as well as storage requirements.
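
To get a feel for the storage side of that overhead, you could check which databases have the feature enabled and how large the version store currently is. This is a minimal sketch against the TestADR database from the example above. It only reports the size at the moment you run it, so treat it as a spot check rather than a measure of the overall cost.

```sql
-- Which databases have Accelerated Database Recovery enabled?
SELECT name, is_accelerated_database_recovery_on
FROM sys.databases;

-- Current size of the persistent version store for the test database
SELECT DB_NAME(database_id) AS DatabaseName,
       persistent_version_store_size_kb / 1024.0 AS VersionStoreSizeMB
FROM sys.dm_tran_persistent_version_store_stats
WHERE database_id = DB_ID(N'TestADR');
```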

Still, it seems like a good feature – something to look forward to playing with more.

Check Query Progress with Live Query Stats

This is something I touched on back in 2017 a little after the Live Query Statistics feature was introduced with SQL 2016, but I was using the functionality this morning and felt like it was worth a reminder.

https://matthewmcgiffen.com/2017/02/23/livequerystats/

You can use Live Query Stats to check on the progress of an executing query – and you can do it through the GUI in SSMS.

I created a long running query for a demo, and after 15 minutes I was still waiting for it to finish. That was a bit longer than I intended. Should I kill it – or did I just need to wait a few more minutes for it to complete?

You can check this quickly via the Activity Monitor:

Find the query you are interested in in the processes list:

Right-click and select “Show Live Execution Plan”. That will show you something like this:

I can see from this that my query is about 83% complete, so maybe I’ll just wait a little longer. Note that this is a live view, so the numbers keep updating. If I want I can watch the progress.

This is against a SQL 2019 instance and is the out of the box behaviour. Before SQL 2019 you had to enable trace flag 7412 if you wanted this to work:

DBCC TRACEON(7412,-1);
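
If you would rather check progress with a query than through the GUI, the per-operator row counts are also exposed in the DMV sys.dm_exec_query_profiles. Here is a minimal sketch. The session id is a hypothetical placeholder, and comparing rows so far against the optimiser's estimate is only a rough guide to progress, since the estimates themselves may be off.

```sql
-- Hypothetical session id: find the real one in Activity Monitor
-- or in sys.dm_exec_requests.
DECLARE @SessionId int = 53;

-- The DMV returns one row per operator per thread, so sum across threads
SELECT node_id,
       physical_operator_name,
       SUM(row_count) AS RowsSoFar,
       SUM(estimate_row_count) AS EstimatedRows
FROM sys.dm_exec_query_profiles
WHERE session_id = @SessionId
GROUP BY node_id, physical_operator_name
ORDER BY node_id;
```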

Find queries with high memory grants – using Query Store

I had a server that looked like it had been suffering from memory contention. I wanted to see what queries were being run that had high memory requirements. The problem was that it wasn’t happening right now – I needed to be able to see what had happened over the last 24 hours.

Enter Query Store. The run-time stats captured by Query Store include details relating to memory.

I decided to use the column max_query_max_used_memory from sys.query_store_runtime_stats. In books online this is defined as:

Maximum memory grant (reported as the number of 8 KB pages) for the query plan within the aggregation interval.

I took my script from the following post https://matthewmcgiffen.com/2017/11/01/capture-the-most-expensive-queries-across-your-sql-server-using-query-store/ and modified it to look just at this metric.

Here’s the script. It collates figures across all databases that have Query Store enabled and returns the top 50 queries with the highest memory grants. This is looking over the last 24 hours, but you can easily modify that to look at details for any interval you are interested in:

--Gather and report on most memory hungry queries
DECLARE @Reportinginterval int;
DECLARE @Database sysname;
DECLARE @StartDateText varchar(30);
DECLARE @SQL varchar(MAX);

--Set Reporting interval in days
SET @Reportinginterval = 1;

--Style 126 (ISO 8601) keeps the seconds and does not depend on language settings
SET @StartDateText = CONVERT(varchar(30), DATEADD(DAY, -@Reportinginterval, GETUTCDATE()), 126);

--Cursor to step through the databases
DECLARE curDatabases CURSOR FAST_FORWARD FOR
SELECT [name]
FROM sys.databases 
WHERE is_query_store_on = 1
  AND state_desc = 'ONLINE';

--Temp table to store the results
DROP TABLE IF EXISTS #Stats;
CREATE TABLE #Stats (
   DatabaseName sysname,
   SchemaName sysname NULL,
   ObjectName sysname NULL,
   QueryText varchar(1000),
   MaxMemoryGrantMB decimal(20,3)
);

OPEN curDatabases;
FETCH NEXT FROM curDatabases INTO @Database;

--Loop through the databases and gather the stats
WHILE @@FETCH_STATUS = 0
BEGIN
    
    SET @SQL = '
	   USE ' + QUOTENAME(@Database) + ';
	   INSERT INTO #Stats
		SELECT
			DB_NAME(),
			s.name AS SchemaName,
			o.name AS ObjectName,
			SUBSTRING(t.query_sql_text,1,1000) AS QueryText,
			(MAX(rs.max_query_max_used_memory)/128.0) AS MaxMemoryGrantMB
		FROM sys.query_store_query q
		INNER JOIN sys.query_store_query_text t
			ON q.query_text_id = t.query_text_id
		INNER JOIN sys.query_store_plan p
			ON q.query_id = p.query_id
		INNER JOIN sys.query_store_runtime_stats rs
			ON p.plan_id = rs.plan_id
		INNER JOIN sys.query_store_runtime_stats_interval rsi
			ON rs.runtime_stats_interval_id = rsi.runtime_stats_interval_id
		LEFT JOIN sys.objects o
			ON q.object_id = o.object_id
		LEFT JOIN sys.schemas s
			ON o.schema_id = s.schema_id     
		WHERE rsi.start_time > ''' + @StartDateText + '''
		GROUP BY s.name, o.name, SUBSTRING(t.query_sql_text,1,1000)
		OPTION(RECOMPILE);'

    EXEC (@SQL);

    FETCH NEXT FROM curDatabases INTO @Database;
END;

CLOSE curDatabases;
DEALLOCATE curDatabases;

--Report Results
SELECT TOP 50
	DatabaseName,
	SchemaName,
	ObjectName,
	QueryText,
	MaxMemoryGrantMB
FROM #Stats
WHERE QueryText not like 'INSERT INTO #Stats%' --Exclude current query
ORDER BY MaxMemoryGrantMB DESC;

DROP TABLE #Stats;

Adding a Column with a Default Constraint

Last week a question came up about adding a column to a table, and giving that column a default constraint. Would that default value be assigned to all existing rows, and how much processing would be involved?

Unsurprisingly, the answer is that – “it depends”.

I’ve got a table with about a million rows that just has an identity column and a text column I’ve populated from sys.objects:

CREATE TABLE dbo.TestAddColumn (Id int IDENTITY(1,1), TextValue sysname);
INSERT INTO dbo.TestAddColumn(TextValue)
SELECT a.name FROM sys.objects a, sys.objects b, sys.objects c;

Let’s add a nullable bit column and give it a default value of Zero:

ALTER TABLE dbo.TestAddColumn
ADD NewFlag bit NULL CONSTRAINT DF_TestAddColumn_NewFlag DEFAULT 0;

If I look at the table I can see that the new column contains Null values:

i.e. the default value that I’ve defined hasn’t been assigned for existing rows.
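
A quick way to confirm this across the whole table, rather than eyeballing a few rows, is to count the rows for each value of the new column. This is a small sketch using the table and column names from the example above:

```sql
-- Count the rows for each value of the new column
SELECT NewFlag, COUNT(*) AS NumberOfRows
FROM dbo.TestAddColumn
GROUP BY NewFlag;
```

The same query can be re-run after each of the variations below to see what the existing rows were given.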

I remove the column and the default constraint:

ALTER TABLE dbo.TestAddColumn
DROP CONSTRAINT DF_TestAddColumn_NewFlag;
ALTER TABLE dbo.TestAddColumn DROP COLUMN NewFlag;

Now let’s add the same column but we’ll disallow Null values:

ALTER TABLE dbo.TestAddColumn
ADD NewFlag bit NOT NULL CONSTRAINT DF_TestAddColumn_NewFlag DEFAULT 0;

If we check the table again:

This time we can see that the default value has been assigned.

So whether the default value gets assigned to existing rows depends on whether the column is nullable or not. A nullable column will retain Null as the value; a non-nullable column will be assigned the new default value.

If you want to override that behaviour, and have your default assigned even where the column is nullable, you can use the WITH VALUES clause. First I’ll remove the constraint and column then add it again with values:

ALTER TABLE dbo.TestAddColumn
ADD NewFlag bit NULL CONSTRAINT DF_TestAddColumn_NewFlag DEFAULT 0 WITH VALUES;

We look at the data again:

You can see that the value has been assigned even though the column is Nullable.

One neat thing to note is the performance impact when carrying out these actions.

Each time I added the column I captured the execution overhead using:

SET STATISTICS IO, TIME ON;

In all cases the resource usage measured was Zero. The Add Column operation was a meta-data only operation – no data in the table needed to be updated – even where the new column was assigned a value.

This was some clever jiggery-pokery added in SQL Server 2012.

Thanks for reading!

MAXDOP, Parallelism and the Cloud

Parallelism and MAXDOP

The pros and cons of parallelism have always been with us in SQL Server and I blogged about this a couple of years ago. This is an updated version of that post to include details of the new wait stat related to parallelism that was added in 2017 (CXCONSUMER), as well as to discuss the options available for cloud based SQL Server solutions.

There’s no doubt that parallelism in SQL is a great thing. It enables large queries to share the load across multiple processors and get the job done quicker.

However it’s important to understand that it has an overhead. There is extra effort involved in managing the separate streams of work and synchronising them back together to – for instance – present the results.

That can mean that adding more threads to a process doesn’t actually benefit us, and in some cases it can slow down the overall execution.

We refer to the number of threads used in a query as the DOP (Degree of Parallelism) and in SQL Server we have the setting MAXDOP (Maximum Degree of Parallelism) which is the maximum DOP that will be used in executing a single query.

Microsoft generally recommend caution setting MAXDOP above 8:

https://support.microsoft.com/en-gb/help/2806535/recommendations-and-guidelines-for-the-max-degree-of-parallelism-configuration-option-in-sql-server

Here’s a nice post from Kendra Little talking about how having higher settings can actually slow down your query execution time:

https://www.brentozar.com/archive/2013/12/q-can-high-maxdop-make-a-query-slower/

Out of the box, MAXDOP is set to 0, which means there is no limit to the DOP for an individual query. It is almost always worth changing this to a more optimal setting for your workload.
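
For reference, this is how the instance level setting can be checked and changed with sp_configure. The value of 8 is only an example taken from the guidance above, not a recommendation for your server.

```sql
-- MAXDOP is an advanced option, so these need to be visible first
EXEC sys.sp_configure N'show advanced options', 1;
RECONFIGURE;

-- Check the current value
EXEC sys.sp_configure N'max degree of parallelism';

-- Change it (8 is an example value only)
EXEC sys.sp_configure N'max degree of parallelism', 8;
RECONFIGURE;
```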

Cost Threshold for Parallelism

This is another setting available to us in SQL Server and defines the cost level at which SQL will consider a parallel execution for a query. Out of the box this is set to 5 which is actually a pretty low number. Query costing is based on algorithms from “Nick’s machine”, the box used by the original developer who benchmarked queries for Microsoft.

Nicks_Machine

(Nick’s Machine)

Compared to modern servers Nick’s machine was pretty slow and as the Cost Threshold hasn’t changed for many years, it’s now generally considered too low for modern workloads/hardware. In reality we don’t want all our tiny queries to go parallel as the benefit is negligible and can even be negative, so it’s worth upping this number. Advice varies but generally recommendations say to set this somewhere in the range from 30 to 50 (and then tune up and down based on your production workload).
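
The setting is changed in the same way as other instance level options, through sp_configure. In this sketch the value of 30 is just the low end of the range mentioned above, used as a starting point.

```sql
-- Cost threshold for parallelism is an advanced option
EXEC sys.sp_configure N'show advanced options', 1;
RECONFIGURE;

-- 30 is an example starting value, to be tuned for your workload
EXEC sys.sp_configure N'cost threshold for parallelism', 30;
RECONFIGURE;
```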

There are many articles in the SQL Server community about how the out of the box setting is too low, and asking Microsoft to change it. Here’s a recent one:

http://www.scarydba.com/2017/03/13/change-the-cost-threshold-for-parallelism/

CXPACKET and CXCONSUMER waits

Often in tuning a SQL Server instance we will look at wait stats – which tell us what queries have been waiting for when they run. CXPACKET waits are usually associated with parallelism and particularly the case where multi-threaded queries have been stuck waiting for one or more of the threads to complete – i.e. the threads are taking different lengths of time because the load hasn’t been split evenly. Brent Ozar talks about that here:

https://www.brentozar.com/archive/2013/08/what-is-the-cxpacket-wait-type-and-how-do-you-reduce-it/

High CXPACKET waits can be – but aren’t necessarily – a problem. You can cure CXPACKET waits by simply setting MAXDOP to 1 at a server level (thus preventing parallelism) – but this isn’t necessarily the right solution. In some cases it can be, though: SharePoint for instance is best run with MAXDOP set to 1.

What you can definitely deduce from high CXPACKET waits however is that there is a lot of parallelism going on and that it is worth looking at your settings.

To make it easier to identify issues with parallelism, with SQL Server 2017 CU3 Microsoft added a second wait type related to parallelism – CXCONSUMER. This wait type was also added to SQL Server 2016 in SP2.

Waits related to parallelism are now split between CXPACKET and CXCONSUMER.

Here’s the original announcement from Microsoft regarding the change and giving more details:

https://blogs.msdn.microsoft.com/sql_server_team/making-parallelism-waits-actionable/

In brief, moving forward CXPACKET waits are the ones you might want to worry about, and CXCONSUMER waits are generally benign, encountered as a normal part of parallel execution.
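
To see how the two wait types compare on your own instance, you can query the wait stats DMV directly. This is a minimal sketch. Bear in mind the figures are cumulative since the instance last started (or since the wait stats were last cleared), so they describe the whole period rather than what is happening right now.

```sql
-- Parallelism related waits and how they split between the two types
SELECT wait_type,
       waiting_tasks_count,
       wait_time_ms,
       wait_time_ms * 100.0
           / NULLIF(SUM(wait_time_ms) OVER (), 0) AS PctOfParallelismWaits
FROM sys.dm_os_wait_stats
WHERE wait_type IN (N'CXPACKET', N'CXCONSUMER');
```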

Tuning Parallelism

In tuning parallelism we need to think about how we want different sized queries to act on our server.

Small Queries

In general we don’t want these to go parallel so we up the Cost Threshold to an appropriate number to avoid this. As discussed above 30 is a good number to start with. You can also query your plan cache and look at the actual costs of queries that have been executed on your SQL Instance to get a more accurate idea of where you want to set this. Grant Fritchey has an example of how to do that here:

http://www.scarydba.com/2017/02/20/estimated-costs-queries/

As he mentions in the post, this assessment can be quite expensive to run – so do it when things are quiet.

Medium to Large Queries

This is where we want to take advantage of parallelism, and do so by setting MAXDOP to an appropriate level. Follow the guidelines from the Microsoft article referenced above. Here it is again:

https://support.microsoft.com/en-gb/help/2806535/recommendations-and-guidelines-for-the-max-degree-of-parallelism-configuration-option-in-sql-server

Often the answer is going to be simply to set it to 8 – but then experiment with tuning it up and down slightly to see whether that makes things better or worse.

Very Large Queries

If we have a mixed workload on our server which includes some very expensive queries – possibly for reporting purposes – then we may want to look at upping the MAXDOP for these queries to allow them to take advantage of more processors. One thing to consider though is – do we really want these queries running during the day when things are busy? Ideally they should run in quieter times. If they must run during the day, then do we want to avoid them taking over all the server power and blocking our production workload? In which case we might just let them run at the MAXDOP defined above.

If we decide we want to let them have the extra power then we can override the server MAXDOP setting with a query hint OPTION(MAXDOP n):

https://docs.microsoft.com/en-us/sql/t-sql/queries/hints-transact-sql-query

You will want to experiment to find the “best” value for the given query. As discussed above and as shown in Kendra Little’s article, just setting it to the maximum number of cores available isn’t necessarily going to be the fastest option.
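
As an illustration of the syntax, the hint goes at the end of the statement. The query below is only a stand-in built on sys.objects so that it runs anywhere, and the value of 4 is arbitrary. In practice you would attach the hint to your own expensive query and test a few different values.

```sql
-- Stand-in query: the hint caps this one statement at 4 threads,
-- regardless of the server or database level MAXDOP setting
SELECT COUNT_BIG(*) AS RowsCounted
FROM sys.objects a
CROSS JOIN sys.objects b
CROSS JOIN sys.objects c
OPTION (MAXDOP 4);
```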

Exceptions to the Rule

Regardless of the size, there are some queries that just don’t benefit from parallelism so you may need to assess them on an individual basis to find the right degree of parallelism to use.

With SQL Server you can specify the MAXDOP at the server level, but also override it at the database level using a SCOPED CONFIGURATION or for individual queries using a query hint. There are even other ways you can control this:

https://www.brentozar.com/archive/2016/12/ten-ways-set-maxdop/

Options in the Cloud

If your SQL Server is hosted in the cloud, then most of this still applies. You still need to think about tuning parallelism – it isn’t done for you, and the defaults are the same – so probably not optimal for most workloads.

There are in general two flavours of cloud implementation. The first is Infrastructure as a Service (IaaS) where you simply have a VM provided by your cloud provider and run an OS with SQL Server on top of it in that VM. Regardless of your cloud provider (e.g. Azure, AWS etc.), if you’re using IaaS for SQL Server then the same rules apply, and you go about tuning parallelism in exactly the same way.

The other type of cloud approach is Platform as a Service (PaaS). This is where you use a managed service for SQL Server. This would include Azure SQL Database, Azure SQL Database Managed Instance, and Amazon RDS for SQL Server. In these cases, the rules still apply, but how you manage these settings may differ. Let’s look at that for the three PaaS options mentioned above.

Azure SQL Database

This is a single SQL Server database hosted in Azure. You don’t have access to server level settings, so you can’t change the server level MAXDOP or the cost threshold. You can however specify MAXDOP at the database level e.g.

ALTER DATABASE SCOPED CONFIGURATION SET MAXDOP = 4;

Cost threshold for Parallelism however is unavailable to change in Azure SQL Database.
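
To check what a database is currently set to, before or after making a change, you can look at the database scoped configurations view. A small sketch, run in the context of the database in question:

```sql
-- Current database level MAXDOP (0 means no database level limit is set)
SELECT name, value, value_for_secondary
FROM sys.database_scoped_configurations
WHERE name = N'MAXDOP';
```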

Azure SQL Database Managed Instance

This presents you with something that looks very much like the SQL Server you are used to; you just can’t access the box behind it. As with your regular SQL instance, you can set MAXDOP and the Cost threshold as normal.

Amazon RDS for SQL Server

This is similar to managed instance. It looks and acts like SQL Server but you can’t access the machine or OS. You access your RDS instance through an account that has permissions that are more limited than your usual sa account or sysadmin role allows. And one of the things you can’t do with your limited permissions is to change the parallelism settings.

Amazon have provided a way around this though and you can change both settings using something called a parameter group:

https://www.mssqltips.com/sqlservertip/5329/setting-sql-server-configuration-options-with-aws-rds-parameter-groups/

Closing Thoughts

Parallelism is a powerful tool at our disposal, but like all tools it should be used wisely and not thrown at every query to its maximum – and this is often what happens with the out of the box settings on SQL Server. Tuning parallelism is not a knee-jerk reaction to high CXPACKET waits, but something we should be considering carefully in all our SQL Server implementations.

Acknowledgements

I wanted to update my original article to include the cloud options noted above, but didn’t have access to an Azure SQL Database Managed Instance to check the state of play. Thanks to Travis Garland via Twitter (@RockyTopDBA) and Chrissy LeMaire via the SQL community slack (@cl) for checking this and letting me know!

Extended Events Made Easy: Using XEvent Profiler

In this post we’ll look at how quick and easy it is to use the XEvent Profiler to create an Extended Events (XE) session to replace the most common usage of SQL Server Profiler.

Before you start, update your SSMS to a recent version (17.3 or later):

https://docs.microsoft.com/en-us/sql/ssms/download-sql-server-management-studio-ssms?view=sql-server-2017

Now, let me ask you the question – how often do you just go into Profiler and create a trace taking all the defaults? If you’re like me then that’s pretty often. It’s nice and easy: select the menu item from SSMS and you’re going in a couple of clicks.

All XEvent Profiler does is implement a couple of standard XE sessions very similar to Profiler, that you can launch with a click or two.

You can find XE Profiler in the object explorer in SSMS under each of your SQL instances – just below SQL Server Agent:

XE_1

Because it’s there for each instance, you don’t have to then connect to the instance separately (as you do in Profiler).

You’ve got two options, Standard which pretty much equates to the default trace in Profiler, or TSQL which will just capture SQL statements being executed. Let’s right-click on Standard and select “Launch Session”. Here’s the Live Data view that comes up:

XE_2

You can see that looks pretty similar to what you see for your traditional traces.

Now, one thing I almost always do for this sort of trace in Profiler is capture the database name. This is a bit of a pain, because I have to select “View all columns”, scroll along to find the database name column, then select it for each relevant event I’m capturing.

In XE it’s much easier. I just right click on one of the column headers in the above view and select “Choose Columns”:

XE_3

Which brings up the menu of columns available to me:

XE_4

I can then just select the database_name column in the left and use the arrow button to push it into the list of selected columns. Want row_count? That’s there too, along with a whole bunch of other stuff. I’ll add those and remove a few I don’t want and exit the dialog.

The other thing I’m commonly going to want to do is add a filter to one or more of the columns. When you’re viewing live data you get the Extended Events toolbar shown in SSMS below the menu bar, just click on Filters…

XE_5

And you get the filters dialog:

XE_6

Click below “Field” to select a column and then add a value to filter on. In this case I’m going to filter to look at a specific database.

It’s worth noting that there are a bunch of operators you can use (not just equals), you can do greater than, less than, contains, does not contain, is null etc.
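
For comparison, here is a rough sketch of what a similar session might look like if you scripted it yourself in T-SQL, with the database name captured and a filter on one database. This is not the definition XEvent Profiler uses behind the scenes. The session name DemoStandardTrace and the database name YourDatabase are hypothetical placeholders, and the choice of events and target is an assumption for illustration.

```sql
-- Hypothetical session: completed batches and RPC calls for one database
CREATE EVENT SESSION [DemoStandardTrace] ON SERVER
ADD EVENT sqlserver.sql_batch_completed (
    ACTION (sqlserver.database_name, sqlserver.client_app_name, sqlserver.session_id)
    WHERE (sqlserver.database_name = N'YourDatabase')
),
ADD EVENT sqlserver.rpc_completed (
    ACTION (sqlserver.database_name, sqlserver.client_app_name, sqlserver.session_id)
    WHERE (sqlserver.database_name = N'YourDatabase')
)
ADD TARGET package0.ring_buffer
WITH (STARTUP_STATE = OFF);

-- Start the session
ALTER EVENT SESSION [DemoStandardTrace] ON SERVER STATE = START;

-- When finished, stop and remove it:
-- ALTER EVENT SESSION [DemoStandardTrace] ON SERVER STATE = STOP;
-- DROP EVENT SESSION [DemoStandardTrace] ON SERVER;
```

Once started, the session appears under Management > Extended Events > Sessions in the object explorer, where you can right-click it and watch live data in the same way.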

Now we can see the effect of those changes in the Live Data view:

XE_7

The above is what I’d usually do in Profiler, except that here I’ve also added an extra column and removed a few – because it was so easy to do. The whole thing felt like a simpler and cleaner experience than the old way.

But I get it, a marginally simpler experience probably isn’t enough to make you change habits. What might, is what you can do with your trace results once you’ve got them. In the next post we’ll look at sorting, grouping and aggregation.

Previous posts in this series:

There’s Still a Place for SQL Server Profiler

Using the built-in System Health session

Exporting Extended Events Session Data to a Table