SQL Server functionality can move on pretty quickly sometimes, and it’s not always all about the big features but the many little enhancements the SQL team implements to make the product better and easier to use.
I’d barely published my post about Automatic Sample Sizes for Statistics Updates when I saw that Microsoft had enhanced the functionality in the latest cumulative update for SQL Server.
From SQL 2016 CU4 you can manually update a statistics object specifying a given sample rate, and you can specify that the sample rate you specified is what will be used for any subsequent automatic updates on the same object.
You can’t change the default sample rate globally, but this functionality is actually more useful. It means that if you find that you have a particular statistics object where the automatic sampling rate is too low, and that has a negative effect on query performance, then you can choose a sampling rate specific for that object.
Let’s take a quick look at doing this in practice.
Here’s some reused SQL from the previous post to create a table and throw in a million rows:
--Create table for Testing CREATE TABLE dbo.Test( Id INT IDENTITY(1,1) CONSTRAINT PK_Test PRIMARY KEY CLUSTERED, TextValue VARCHAR(20) NULL ); --Create index on TextValue CREATE INDEX IX_Test_TextValue ON dbo.Test(TextValue); --Insert a bunch of rows INSERT INTO dbo.Test(TextValue) SELECT TOP 1000000 'blah' FROM sys.objects a, sys.objects b, sys.objects c, sys.objects d;
I haven’t updated the stats yet. As we can see if I view them:
--View the statistics DBCC SHOW_STATISTICS('dbo.Test', IX_Test_TextValue) WITH STAT_HEADER;
The stats haven’t any information yet but there is still one thing to notice. See that final column Persisted Sample Percent? That’s new.
The stats will only get updated by the auto stats mechanism when I run a query against the table and the optimiser is interested in them but sees they’re stale. From zero to a million rows should be stale enough. Let’s run a quick query:
--Random Query SELECT * FROM dbo.Test WHERE TextValue = 'not blah’;'
And view the stats again:
We can see the stats got updated and the table was big enough that SQL decided to sample rather than scan the whole index. About 42% of the rows got sampled. Persisted Sample Percent is set to zero, i.e. it is not set.
Let’s say that I want to make sure that this statistics object always updates with a full scan.
What I can now do is manually update the statistics and specific that the sample percentage should be persisted:
--Update the stats and persist the sample rate UPDATE STATISTICS dbo.Test IX_Test_TextValue WITH FULLSCAN, PERSIST_SAMPLE_PERCENT = ON;
Let’s check that did what we wanted by viewing the stats:
Great, we can see all the rows were sampled and the Persisted Sample Percent is now set to 100.
Now let’s see what happens when another auto update kicks in. I insert another million rows with the same SQL from above. Then I run the select query again. The optimiser notices the statistics are now stale and so it updates them and recompiles the plan for the query. We can see this when we check the stats yet again:
SQL has done exactly what we wanted, it performed the auto stats update in the background and it used the 100% persisted sample rate.