Statistics are vital in allowing SQL Server to execute your queries in the most performant manner. Having a deeper understanding of how Statistics work really helps when you are performance tuning
I talked a few posts ago about Automatic Sample Sizes for Statistics Updates. From SQL 2016 CU4 we’ve been able to override that. You can manually update a statistics object specifying a given sample rate, and specify that that sample rate is what will be used for any subsequent automatic updates on the same object.
You can’t change the default sample rate globally, but this functionality is actually more useful. It means that if you find that you have a particular statistics object where the automatic sampling rate is too low, and that has a negative effect on query performance, then you can choose a sampling rate specific for that object.
Let’s take a quick look at doing this in practice.
Here’s some reused SQL from the previous post to create a table and throw in a million rows:
--Create table for Testing CREATE TABLE dbo.Test( Id INT IDENTITY(1,1) CONSTRAINT PK_Test PRIMARY KEY CLUSTERED, TextValue VARCHAR(20) NULL ); --Create index on TextValue CREATE INDEX IX_Test_TextValue ON dbo.Test(TextValue); --Insert a bunch of rows INSERT INTO dbo.Test(TextValue) SELECT TOP 1000000 'blah' FROM sys.objects a, sys.objects b, sys.objects c, sys.objects d;
I haven’t updated the stats yet. As we can see if I view them:
--View the statistics DBCC SHOW_STATISTICS('dbo.Test', IX_Test_TextValue) WITH STAT_HEADER;
The stats haven’t any information yet but there is still one thing to notice. See that final column Persisted Sample Percent? That was added to support this functionality.
The stats will only get updated by the auto stats mechanism when I run a query against the table and the optimiser is interested in them but sees they’re stale. From zero to a million rows should be stale enough. Let’s run a quick query:
--Random Query SELECT * FROM dbo.Test WHERE TextValue = 'not blah’;'
And view the stats again:
We can see the stats got updated and the table was big enough that SQL decided to sample rather than scan the whole index. About 42% of the rows got sampled. Persisted Sample Percent is set to zero, i.e. it is not set.
Let’s say that I want to make sure that this statistics object always updates with a full scan.
What I can now do is manually update the statistics and specific that the sample percentage should be persisted:
--Update the stats and persist the sample rate UPDATE STATISTICS dbo.Test IX_Test_TextValue WITH FULLSCAN, PERSIST_SAMPLE_PERCENT = ON;
Let’s check that did what we wanted by viewing the stats:
Great, we can see all the rows were sampled and the Persisted Sample Percent is now set to 100.
Now let’s see what happens when another auto update kicks in. I insert another million rows with the same SQL from above. Then I run the select query again. The optimiser notices the statistics are now stale and so it updates them and recompiles the plan for the query. We can see this when we check the stats yet again:
SQL has done exactly what we wanted, it performed the auto stats update in the background and it used the 100% persisted sample rate.