Indexes and Statistics with Always Encrypted

In a previous post we looked at executing queries against columns encrypted using Always Encrypted. In this short post we look at whether you can have indexes on those columns. This information is relevant where you are using Always Encrypted without enclaves – we'll look at working with enclaves later on.

Where we want to query with a predicate, it is often useful to have an index on the column being searched. You can only have indexes on columns using deterministic encryption – not randomized. This makes sense: a column with randomized encryption will store different ciphertext values even when the underlying plaintext value is the same, so an index would be meaningless – and you can't perform comparisons on such columns anyway, so an index wouldn't be useful. An index on a column deterministically encrypted with Always Encrypted is simply an index of the encrypted values, and as such is only useful for equality comparisons, not for sorting or range queries – though neither of those operations is possible against a deterministically encrypted column anyway.

Creating an index on a column using deterministic encryption works the same as it does when the column is not encrypted. Here is an example using the database and table created in the previous posts Setting Up Always Encrypted and Executing Queries Using Always Encrypted. In our table dbo.EncryptedTable we have the columns LastName (encrypted using deterministic encryption) and FirstName (encrypted using randomized encryption):

CREATE NONCLUSTERED INDEX IX_LastName
ON dbo.EncryptedTable(LastName);

While we can’t index on our columns using randomized encryption, we can include them in the leaf pages of an index so that their values do not have to be retrieved using a Key Lookup operation. So the following SQL would also work:

CREATE NONCLUSTERED INDEX IX_LastName_Include_FirstName
ON dbo.EncryptedTable(LastName) INCLUDE(FirstName);
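As a reminder from the post on executing queries, a query using that index has to supply the LastName value as a parameter from a connection with column encryption enabled, so the driver can encrypt it before it is sent. Just as a sketch (the parameter's type and size need to match the actual column definition), an equality query that could seek on IX_LastName and avoid a Key Lookup for FirstName would look something like:

DECLARE @LastName NVARCHAR(50) = 'Smith'; --encrypted client-side by the driver
SELECT LastName, FirstName
FROM dbo.EncryptedTable
WHERE LastName = @LastName;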

In common with any other index, a statistics object is created that maintains information about the distribution of data in the index. We can view the data held in the statistics object with the following command:

DBCC SHOW_STATISTICS('dbo.EncryptedTable','IX_LastName');

Here we see the output:

We only have a couple of rows in our table, but if you’re familiar with statistics objects, you’ll see this is pretty much the same as we would see where there is no encryption. What is interesting is to look at the last result set. Normally this would show actual data values in the RANGE_HI_KEY column; here we can see that we have an encrypted value. From this, we can understand that the statistics are based on the encrypted values. This underlines the point already made, that our index is an index of the encrypted values of our data. That makes sense as SQL Server is never aware of the unencrypted values, so it wouldn’t be possible for it to create or maintain indexes (or statistics) of the plaintext values.

This is different when using always encrypted with enclaves where indexes on columns with randomized encryption are allowed. In that case, those indexes, and associated statistics objects, are actually based on the plaintext values. As mentioned, we’ll talk about how that works in a later post.

This post is part of a comprehensive series on SQL Server Encryption. Subscribe to my blog for updates as new posts are published or you can buy my book through the link in the sidebar to get it all in one go.

And if you’re embarking on an encryption project (or anything else to do with SQL Server) and want some help, I’m available for consulting – please get in touch or check out my services page to find out what I can do for you.

Under Promise and Overdeliver

There’s a monthly “blog party” called T-SQL Tuesday that invites blog posts on a given topic related to SQL Server and data. This month Gethyn Ellis has asked people to share the best piece of advice they’ve received in their careers.

For me that's an easy answer. "Under promise and overdeliver" is a practice that has stood me in good stead all through my IT career. I first came across the phrase in my very first development job, working at BT back in the 90s. Our team had a manager, Mike Taylor, who would say this to us on a regular basis.

If you under promise and overdeliver then you will always have happy clients.

Let me give an example. If I'm asked to look at a poorly performing SQL Server instance for a client, and I know that it's not had much love from an optimization point of view, then I can be pretty confident I can cut the CPU consumption in half – and to be honest I'm going to be disappointed if I can't reduce it by between 75 and 90%. The client, however, just wants it back running within acceptable parameters. If it's running close to 100% CPU and causing timeouts in the client application, they just want that under control. They will be happy if I get that back down to 70% and the timeouts go away – so that's the expectation I will set as a minimum target. They will be very satisfied when I achieve that, very happy when I exceed that and get things down to 50%, and ecstatic if it goes down to 25% or less – especially once I mention that they have the potential to reduce the size of kit the instance is running on and save a lot of money.

If I start out by saying I’ll reduce it by 75% but only reduce it by half then they will be satisfied, but I’ve given them a disappointing outcome compared to what I promised.

Another example is in timescales when promising delivery of work. It's usually much better to say that a fix will take a day and deliver it early or on time, and to a high quality, than to say you can knock it out in a couple of hours and then have to keep pushing that back due to unforeseen problems. Worse, when you over-promise, you end up having to spend time on update calls and on resetting expectations, which means the task ends up taking you longer. You are also rushed and it's much more likely that poor quality will creep in.

Related to timescales, another piece of advice I've come across is a rule of thumb for estimation. When trying to estimate how long a piece of work might take, give it your best estimate, then double it. You might be surprised how often that's much more accurate than the original figure – partly because we generally want to please and tend to be over-optimistic when trying to figure out how long something will take us.

We've all been on the other side of this. How frustrating is it when you have a problem with a service you use, contact support, and are told it will be fixed in a given timeframe – and then that doesn't happen? Or you're told you will be contacted by a certain time, but you don't get a call or a message. More of your time gets wasted chasing for an update – which also wastes time for the organisation the promise came from. Much better to be given a realistic expectation at the outset – even if that's disappointing. You can at least then look at how you're going to work around the issue in the short term.

It's sometimes difficult, because you have to be firm about the expectation you set at the outset, but once you have delivered a solution, a fix, or other deliverable that exceeds that expectation, people are always happy and that is what they remember after the event. If you have over-promised, that is also likely to be what they remember.

So that’s the advice I was given and I pass it on to you – “Under promise and overdeliver”.

Got a problem and could do with some assistance? Get in touch today to see how I can help.

Query Store Hints in SQL Server 2022

Another neat little feature in SQL Server 2022 is Query Store Hints. This is the ability to apply a query hint through Query Store rather than having to modify existing code or fiddle around with plan guides.


Query hints are a way to influence optimizer behaviour towards generating desired execution plans for a given query. The word “hint” is a bit of a misnomer as usually they mandate what you wish to happen.


I’m not a big fan of leaping straight away to use query hints as they force the behaviour rather than letting the optimizer work out the best plan for your query. This means you may implement hints that you see as desirable for the current issue you are trying to fix, but they create problems for different executions of the query, or if the data profile changes. In general it’s better to see if you can rewrite the SQL code to achieve what you want, but sometimes that’s not possible and a hint ends up being the best way to go – or at least a good short term solution.


The great thing about applying hints through Query Store is that it can be done without altering application code. That means the DBA can apply a temporary hint while waiting for code to be fixed – or, in extremis, use them against third-party applications where there is no ability to get the code changed.


I first read about them in Bob Ward’s book SQL Server 2022 Revealed. You can also read about them in Grant Fritchey’s excellent (and long!) book SQL Server 2022 Query Performance Tuning.


Applying a hint is simple as long as the query has a plan stored in query store. First you need to obtain the query_id for the query. One method for doing that is to use the following query I got from this stackexchange post:


SELECT
    qsq.query_id,
    qsq.last_execution_time,
    qsqt.query_sql_text
FROM sys.query_store_query qsq
INNER JOIN sys.query_store_query_text qsqt
    ON qsq.query_text_id = qsqt.query_text_id
WHERE qsqt.query_sql_text LIKE '%your query text%';

Armed with the query_id, you can then apply the hint using the sys.sp_query_store_set_hints stored procedure. For example, if my query_id was 100 and I want to apply the RECOMPILE hint then I could do that with this SQL:


EXEC sys.sp_query_store_set_hints 100, N'OPTION(RECOMPILE)';

You can also view which queries have hints applied using the sys.query_store_hints view.
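A minimal sketch of checking what's there – the query_id and the hint text are the columns I'm usually interested in:

SELECT query_id, query_hint_text
FROM sys.query_store_hints;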


Finally if you want to clear a hint you have previously applied (such as the one above), you can do that as follows:


EXEC sp_query_store_clear_hints @query_id = 100;

Query Store does not support all query hints. You can see the full list of those supported (and not) at:

https://learn.microsoft.com/en-us/sql/relational-databases/system-stored-procedures/sys-sp-query-store-set-hints-transact-sql?view=sql-server-ver16

And that’s it – it’s that simple. Thanks for reading!

Got a problem or embarking on a SQL Server project and want some help and advice? I’m available for consulting – please get in touch or check out my services page to find out what I can do for you.

Statistics and the Ascending Key Problem

I’ve mentioned previously how not having up to date statistics can cause problems in query performance. This post looks at something called the Ascending Key Problem which can badly affect your cardinality estimation in some cases and therefore your execution plans.

The Ascending Key Problem relates to the most recently inserted data in your table which is therefore also the data that may not have been sampled and included in the statistics histograms. This sort of issue is one of the reasons it can be critical to update your statistics more regularly than the built-in automatic thresholds.

We’ll look at the problem itself, but also some of the mitigations that you can take to deal with it within SQL Server.

The Problem

Imagine you have a table that stores a set of events. As new records are inserted they get stamped with the current date and time. You regularly query that table based on that EventDate looking to find recent events, let’s say just for the current day.

Even if you haven’t indexed the EventDate column (though why haven’t you?!), as long as you have AUTO CREATE STATISTICS and AUTO UPDATE STATISTICS on for your database you’ll have “reasonably” up to date statistics for that column.

But “reasonably” may not be good enough. When you’re querying against the most recent data it may not yet have been sampled by a statistics update and the range you are looking for may fall beyond the top of the histogram captured in the statistics object for EventDate. Imagine that statistics were last updated yesterday. When the Optimizer checks the statistics to estimate a rowcount for today’s date it finds that is above the top bound. So what should it guess?

Historically it would guess that there were zero rows, though the cardinality estimate always gets set to a minimum of 1. If the real answer is a lot larger you might end up with a bad plan.

Let’s look at that in practice.

Staying true to the example above, I create a table called Events and I index the EventDate column:

CREATE TABLE dbo.Events
(
Id INT IDENTITY(1,1) CONSTRAINT PK_Events PRIMARY KEY CLUSTERED,
EventName VARCHAR(255) NOT NULL,
EventDate DATETIME CONSTRAINT DF_Events_EventDate DEFAULT (GETDATE())
);

CREATE INDEX IX_Events_EventDate ON dbo.Events(EventDate) include (EventName);

Then I insert records to represent events at one minute intervals for 100 days:

--Insert data for 100 days at minute intervals from the start of this year
DECLARE @StartDate DATETIME = '20170101 00:00:00';

INSERT INTO dbo.Events(EventName, EventDate)
SELECT
    'Event' + CAST(num.n AS VARCHAR(10)),
    DATEADD(minute,num.n,@StartDate)
FROM 
(
    SELECT TOP 144000 row_number() OVER(ORDER BY (SELECT NULL)) AS n 
    FROM sys.objects a, sys.objects b, sys.objects c
) num;

I’m going to query to check what date range was inserted. That should have the additional advantage of triggering a statistics update:

SELECT MIN(EventDate), MAX(EventDate)
FROM dbo.Events;

AscendingKey0

As a slight digression, it’s interesting to look at the execution plan here:

AscendingKey1

You can see two index scans. That sounds horrendous, scan the index twice to find the MIN and MAX? If you look at the properties though you can see it only read one row in each case:

AscendingKey2

An index scan doesn’t have to read all the records in the index, it can bail out once it is satisfied. For a MIN or MAX type query it makes perfect sense just to jump to one end of the index and start scanning.

The side lesson is that Scans aren’t always bad for performance.

Anyway, back to the topic in hand. Now let’s look at the statistics:

DBCC SHOW_STATISTICS('dbo.Events',IX_Events_EventDate);

AscendingKey25

You can see they're up to date. They show 144,000 rows in total which is correct. Interestingly the histogram (bottom result set) only has a couple of steps. SQL has determined that the data is uniformly distributed so has bunched it all together. Clever stuff!

Let’s insert data for another day:

--Insert one more day's data
DECLARE @StartDate DATETIME;

SELECT @StartDate = MAX(EventDate) FROM dbo.Events;

INSERT INTO dbo.Events(EventName, EventDate)
SELECT
    'Event' + CAST(num.n AS VARCHAR(10)),
    DATEADD(minute,num.n,@StartDate)
FROM 
(
    SELECT TOP 1440 row_number() OVER(ORDER BY (SELECT NULL)) AS n 
    FROM sys.objects a, sys.objects b, sys.objects c
) num;

Now I query to see the new events. I captured the MAX(EventDate) earlier so let’s use that to find the new records:

SELECT *
FROM dbo.Events
WHERE EventDate > '20170411'
OPTION (RECOMPILE);

(Notice I've added the option to recompile so I get a fresh plan each time I run this – that will be important for testing.)

Let's not bother with the results – we all know there will be 1,440 records that are basically the same. Here's the execution plan:

AscendingKey3

The interesting bit comes when I look at the properties for the Index Seek:

AscendingKey4

Estimated number of rows = 1, Actual = 1,440. That’s quite a long way out. Of course here we have a trivial query so the massive underestimate isn’t affecting our plan. If we started joining to other tables though it would likely result in a massively inefficient plan – perhaps choosing a Nested Loops join over a Hash or Merge.

Note I'm using SQL Server 2012 for this test and I'm not using traceflag 2371, which reduces the threshold for statistics updates (see When do Statistics Get Updated?).

So I've got nearly another 30,000 rows to insert (500 + 20% of 144,000 = 29,300) before statistics get automatically updated and my estimates come into line. If I'm always querying for the current day then it's going to be very rare that statistics are able to help me with a good estimate.

So what’s the fix?

Before we get on to the methods that have been introduced to try and ameliorate this problem: if you face this sort of scenario you might want to consider whether you need to update your statistics objects more often than the auto-stats thresholds allow. If you have a regular job to rebuild fragmented indexes then those indexes that get rebuilt will have their statistics refreshed – however that won't cover the auto-created statistics, and it won't cover statistics for tables that don't get rebuilt.

So, if you don't have a specific scheduled job to regularly update statistics, that is definitely worth considering.

In terms of how SQL has changed to help us, from SQL Server 2005 SP1, the nature of columns began to be tracked, monitoring updates of statistics to understand how the data changes. This additional information can be seen if you enable traceflag 2388, then view the statistics. Let’s have a look at what’s gathered. First I’m going to add a couple more days of data, updating the statistics between each insert, then I run the following:

DBCC TRACEON(2388);
DBCC SHOW_STATISTICS('dbo.Events',IX_Events_EventDate);
DBCC TRACEOFF(2388);

AscendingKey5

What you see here is historical information about the updates to the statistics. This is undocumented stuff, but some of what we see we can work out the meaning for. In particular we can see how many rows were inserted since the last statistics update, and how many of those values were above the top of the old histogram. We also see a column “Leading Column Type” which has a value of “Unknown”.

Now I'm going to insert another day's data and update the statistics once more, then we'll look at this again:

AscendingKey6

You can see that now we have a Leading Column Type of “Ascending”. After three updates to the statistics where the Leading Value was only increasing each time, SQL Server will identify that it is an ascending column. It must be at least three updates before SQL will recognise this, and if that stops being the case (i.e. some lower values are inserted) then the next statistics update will reset this until we again get three consecutive updates with only increasing values.

This happens in the background and you don't need traceflag 2388 to make it happen – that is just so you can see what is going on.

The obvious question is, now SQL knows my column is ascending, has that affected the estimation for my query? Before we look I’ll insert another day of data so there is some data beyond the histogram, and then I’ll query again:

SELECT *
FROM dbo.Events
WHERE EventDate > '20170415'
OPTION (RECOMPILE);

And the properties from the execution plan:

AscendingKey7

So nope. Nothing has changed.

Yet…

To tell the query optimizer to take advantage of this extra information for ascending keys we have traceflag 2389. Let’s enable that and run the query again:

DBCC TRACEON(2389);

SELECT *
FROM dbo.Events
WHERE EventDate > '20170415'
OPTION (RECOMPILE);

The properties:

AscendingKey8

Voila! SQL Server has now estimated my rowcount perfectly.

Now, be warned. This is a rather contrived example with a perfectly and artificially smooth distribution of data. The estimate is made by checking the current maximum value in the table, and using that combined with the information existing in the statistics and the value of your predicate to extrapolate a guess. If your data is evenly distributed, as it is here, then the guess will be pretty good; if it is fairly skewed then it may be bad.

In any case though it will probably be better than the fixed value of 1 that would have been used historically.

One thing to note is that traceflag 2389 is only going to have any effect if the leading column of the relevant statistics object has been marked as ascending. There is also traceflag 2390, which will adopt a similar behaviour even if your column hasn't been identified as ascending, i.e. it will check the maximum value in the table and if it is higher than the max value in the histogram, it will extrapolate to work out the cardinality estimate.
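Enabling 2390 looks just the same as we saw for 2389, for example at the session level:

DBCC TRACEON(2390);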

So should you turn on the traceflag(s) globally?

The Microsoft recommendation is not to enable traceflags such as these unless you are suffering from the specific problem they are aiming to resolve, and even then to make sure you test carefully to ensure they are achieving what you desire.

One issue can be that in more complex queries there are a number of cardinality estimates being made. It can be that two bad estimates within the same plan might cancel each other out and the query overall performs fine. If you then implement something that fixes one of them, you risk such queries going bad – a scenario known as plan regression.

This sort of scenario is one of the reasons why Microsoft have made very few core changes to the cardinality estimator since it came out.

So, use 2389 if you are specifically encountering this sort of ascending key problem. Also, if you are in a position to change the code, you might want to consider adding it as a query hint so it only affects the specific query you are targeting. For our example query above, that would simply look like:

SELECT *
FROM dbo.Events
WHERE EventDate > '20170415'
OPTION (RECOMPILE, QUERYTRACEON 2389);

Welcome to SQL Server 2014 (and later)

In 2014 we received a substantial revamp of the Cardinality Estimator, the first since SQL Server 7.0. A bunch of assumptions and algorithms have been re-jigged based on the real-world data that Microsoft have seen in supporting their wide customer base.

Key to having a new version was the concept that, henceforth, optimizer fixes would be tied to the database compatibility version. This means that customers can upgrade their SQL Server version but if they find performance problems related to the upgrade they can downgrade their database’s compatibility level while the issues are resolved within their codebase.

One of the items specifically looked at in the new version was this Ascending Key problem. To see how things work in the latest versions I’m going to repeat many of the steps above using a database deployed on SQL Server 2016.

So I:

  • Create my table again
  • Populate with the 100 days data
  • Run a query to check the dates, which has the added benefit of updating statistics
  • Add one more day’s data

Then I’m ready to run my test query again:

SELECT *
FROM dbo.Events
WHERE EventDate > '20170411'
OPTION (RECOMPILE);

I get the same execution plan as ever so again I jump to the properties of the Index Seek operator to look at the estimates:

AscendingKey9

Now, this is interesting. I might have expected I would get either 1 row estimated (the old model) or 1,440 (the model with traceflag 2389). Instead I get 432 rows. It seems the new CE (Cardinality Estimator) uses a different algorithm.

Sometimes numbers stick in your head. I happen to know that where no statistics are available and you are querying with an inequality predicate (<, >, <=, >=), the CE will estimate the number of rows to be 30% of the total number of rows in the table. This assumption seems to have originated in a 1979 research paper from IBM suggesting 1/3 was a good guess.

With 30% in my head I noticed that 432 is 30% of 1440. So it seems that the optimizer is recognising that we are querying for values above the histogram (where no statistics exist) with an inequality, it knows from somewhere that there have been 1440 rows inserted since the last statistics update, so it takes 30% of 1440 to produce the estimate (432).
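That "somewhere" is presumably the modification counter held against the statistics object, which we can check for ourselves with sys.dm_db_stats_properties against the index created earlier:

SELECT s.name AS StatsName, sp.last_updated, sp.rows, sp.rows_sampled, sp.modification_counter
FROM sys.stats s
CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) sp
WHERE s.object_id = OBJECT_ID('dbo.Events')
AND s.name = 'IX_Events_EventDate';

If that shows 1,440 modifications since the last update then the 432 estimate ties in nicely with the 30% theory.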

To try to validate that theory I thought I'd query with a later datetime in the predicate. Sure enough, if I add 12 hours I still get 432 rows estimated. If I add 23 hours, 432 rows. In fact if I query for any date in the future, even beyond the maximum value in the table, guess what – I get an estimate of 432 rows.

I have a fascination for the algorithms involved in distribution statistics. It satisfies the maths geek in me. As such it’s difficult to end a post like this, there’s always more things to test, to try and work out. For instance what happens if you query across an interval that starts within the current histogram, but then extends above it? I’ll admit I’ve had a play, but will leave that for another post.

As a very final point in this post, I thought I’d just check whether the 2389 traceflag makes any difference to this estimation with the 2014 CE. I’ll change my query to look way into the future, enable the traceflag and look at the estimate:

SELECT *
FROM dbo.Events
WHERE EventDate > '99991231 23:59:59'
OPTION (RECOMPILE, QUERYTRACEON 2389);

Guess what? Still 432 rows… so no, the traceflag doesn't appear to give us any extra benefit. Though when we get to that date it'll be someone else's problem to sort out!

Got a problem or embarking on a SQL Server project and want some help and advice? I'm available for consulting – please get in touch or check out my services page to find out what I can do for you.

When do Statistics get updated?

Statistics objects are important to us because they allow SQL Server to make good estimates of the row-counts involved in different parts of a given query, which in turn allows the SQL Optimiser to form efficient execution plans to deliver those query results.

Statistics get updated automatically when you rebuild an index they are based on (re-organising an index doesn't refresh them) – but we only tend to rebuild indexes that are fragmented, and we don't need fragmentation for statistics to be stale. We may also have many auto-created statistics objects that are not related to an index at all.

It’s generally recommended to have the database level setting AUTO_UPDATE_STATISTICS turned on, so that SQL can manage the process of keeping statistics up to date for us. The only excuse to turn it off is that you are managing the updates to stats yourself in a different manner. And you can always turn the auto update off at an individual table or statistics level if you need to, rather than for the whole database.
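For example – the database, table and statistics names here are just placeholders – the database level setting and a per-statistics opt-out look like this:

ALTER DATABASE MyDatabase SET AUTO_UPDATE_STATISTICS ON;

--Opt a single statistics object out of automatic updates
UPDATE STATISTICS dbo.SomeTable SomeStatsName WITH NORECOMPUTE;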

SQL Server has had the ability to automatically update statistics since version 7.0. Nonetheless for a long part of my career working with SQL Server, whenever a performance issue raised its head everyone’s knee-jerk response would be “Update Statistics!” In most cases though the people shouting that didn’t really understand what the “Statistics” were, or what mechanisms might already be in place for keeping them up to date.

Of course SQL Server isn’t perfect and sometimes it is helpful for human intelligence to intervene. But to provide intelligent intervention one has to understand how things work.

So how does the automatic updating of statistics work?

In the background SQL maintains a count of changes to tables that might affect statistics. This can be updates, inserts or deletes. So if I inserted 100 records, updated 100 records and then deleted 100 records, I would have made 300 changes.

When SQL forms an execution plan for a query it references various distribution statistics objects to estimate row-counts and to use that to try find the best plan. The statistics objects it looks at are referred to as being “interesting” in the context of the query.

Before using values from the statistics, the Optimizer will check to see if the statistics are “stale”, i.e. the modification counter exceeds a given threshold. If it does, SQL will trigger a resampling of the statistics before going on to form an execution plan. This means that the plan will be formed against up to date statistics for the table.

For subsequent executions of the query, the existing plan will be loaded from the plan cache. Within the plan, the Optimiser can see a list of the statistics objects that were deemed “interesting” in the first place. Once again it will check each of them to see if they are “stale”. If they are, an auto-update of the statistics object(s) will be triggered and once that is complete the plan will be recompiled, in case the updated statistics might suggest a better way of executing the query. Equally, if any of the statistics objects have been updated since the last execution then the plan will also be recompiled.

One important caveat to this is the database level setting AUTO_UPDATE_STATISTICS_ASYNC (asynchronous statistics updates). Generally it is best to have this turned off, in which case the above behaviour is observed. If you turn it on, however, then when stats are stale the query execution will not wait for them to be updated – the update is kicked off in the background while the query executes against the existing plan, and the plan will only be recompiled against the new stats at the next execution.
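If you want to make sure you're getting the synchronous behaviour described above, the setting is controlled like so (placeholder database name again):

ALTER DATABASE MyDatabase SET AUTO_UPDATE_STATISTICS_ASYNC OFF;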

From SQL Server 2008 R2 SP2 and SQL Server 2012 SP1 we have a new DMF (Dynamic Management Function), sys.dm_db_stats_properties, that allows us to see how many row modifications have been captured against a given statistics object, as well as when it was last refreshed, how many rows were sampled and so on. Modifications are captured on a per-column basis (though when statistics were originally introduced in SQL Server it was per table), so the counter will only be affected if the leading column of the statistics object has been modified by a given operation.

SELECT
    s.name AS StatsName,
    sp.*
FROM sys.stats s
CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) sp
WHERE s.name = 'IX_Test_TextValue';

Results:

Statistics_Properties

So what are the thresholds?

For a long time the thresholds were as follows. Statistics were considered stale if one of the following was true:

  • The table size has gone from 0 rows to more than 0 rows
  • The table had 500 rows or less when the statistics were last sampled and has since had more than 500 modifications
  • The table had more than 500 rows when the statistics were last sampled and the number of modifications is more than 500 + 20% of the row-count when the statistics were last sampled (when talking about tables with larger row-counts a lot of the documentation just describes this as 20% as the additional 500 becomes less and less relevant the larger the number you are dealing with).

Those thresholds did mean that when a table had a large number of rows, Statistics might not get updated that often. A table with a million rows would only have stats updated if about 200,000 rows changed. Depending on the distribution of the data and how it is being queried this could be a problem.

So, in SQL 2008 R2 SP2 Microsoft introduced Traceflag 2371 which when set would reduce the stale statistics threshold for larger tables. From SQL 2016 this is the default functionality.
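On versions prior to SQL Server 2016 you can enable that behaviour globally with the traceflag, for example:

DBCC TRACEON(2371, -1); --the -1 applies it to all sessions
--or add -T2371 as a startup parameter so it persists across restarts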

That adds the following test for statistics being stale:

  • If the number of rows (R) when the statistics were last sampled is 25,000 or more and the number of modifications is more than the square root of R x 1000:

SQRT(1000 * R)

Now, I’m just going to correct myself here, the documentation I’ve found SAYS the threshold is 25,000 but when I started to have a play that didn’t seem to be the case at all.

What actually seems to happen is that whichever of the two thresholds gives the smaller value is the one that gets used, i.e. either:

500 + (0.20 * R)

or:

SQRT(1000 * R)

(where R is the number of rows when the statistics were last sampled).

I don’t know if this means that both get evaluated and the smaller is used, or if the threshold between the two rules is simply defined at the point where the second formula gives the smaller result – which is after 19,682 rows. I discovered that threshold by solving where the two equations above would give the same result – then by experimenting to prove it in practice.

I think this incorrect stating of 25,000 as the threshold probably comes from confusion, taking an approximation (20%) as the actual figure. Remember I mentioned that people often generalise to say that statistics are stale after 20% of the rows change, and forget about the extra 500 rows. If that was true and it was exactly 20%, then the threshold would be 25,000 as that would be the point that both equations are equal.

Anyway it’s not even vaguely important to know that. I just found it interesting! Note that the tests above were carried out on SQL Server 2012 SP3 so could well be different on later versions.

To more visually understand the above rules, here’s a table showing the thresholds for some example table sizes under both the Old algorithm (without the traceflag) and the New algorithm (with the traceflag or on SQL 2016 or later).

R is the number of rows when the statistics were last sampled and T is the number of modifications for statistics to be considered stale:

Statistics_Thresholds

You can see for the larger table sizes there is a massive difference. If you’ve got large tables you’re querying against and are having to update the statistics manually to keep them fresh then you may find implementing the traceflag is a help.
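If you want to compare the two thresholds for your own table sizes, a quick calculation along these lines (the row counts are just examples) shows how they diverge:

SELECT
    R AS RowsWhenLastSampled,
    500 + (0.20 * R) AS OldThreshold,
    SQRT(1000 * R) AS NewThreshold
FROM (VALUES (10000), (100000), (1000000), (10000000)) AS t(R);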

For large tables, statistics are sampled when being updated rather than the whole table necessarily being read. I have details on that in this post:

Automatic Sample Sizes for Statistics Updates

Got a problem or embarking on a SQL Server project and want some help and advice? I’m available for consulting – please get in touch or check out my services page to find out what I can do for you.

T-SQL Tuesday #114 Puzzle Party – Roundup

Here’s my round-up for this month’s T-SQL Tuesday.

Thanks to everyone who contributed last week. It was great reading your posts and seeing the different ways you interpreted the puzzle theme.

We had real-life problems, we had SQL coding questions, we had puzzles, we had solutions, we had games, and we had the imaginarium.

Hopefully I've included all the posts; if I've missed any, please yell and it will be corrected as soon as possible!

Without further ado, here are the contributions…

Real-life problems

Michal Poreba presents us with a real-life performance tuning issue he faced. Can you solve it? https://dbain.wales/2019/05/14/the-sql-puzzle-party/

Kevin Chant shares an issue he faced with a SQL Agent job using PowerShell. Can you find another way of fixing it? https://www.kevinrchant.com/2019/05/14/t-sql-tuesday-114-powershell-puzzle/

Eugene Meidinger shows a real-life SQL puzzle from his own demos with heaps that he’d love an answer to… https://www.sqlgene.com/2019/05/14/t-sql-tuesday-114-an-unsolved-sql-puzzle/

Coding Questions

Kathi Kellenberger tells us about a deceptively tricky SQL question posed at a family party… https://auntkathisql.com/2019/05/14/may-2019-t-sql-tuesday-puzzle/

Jon Shaulis gives us a problem he was posed in a recruitment scenario, along with a detailed solution that demonstrates how he approaches finding the best solution… https://jonshaulis.com/index.php/2019/05/14/t-sql-tuesday-114-puzzle-party/

Kenneth Fisher asks us to solve some query problems using older versions of SQL Server – so we can appreciate just how much easier our lives are these days… https://sqlstudies.com/2019/05/14/how-was-it-done-back-in-the-day-t-sql-tuesday-114/

Games and Puzzles

Nate Johnson challenges us to write a SQL solution to help evaluating your hand in the card game cribbage… https://natethedba.wordpress.com/2019/05/14/t-sql-tuesday-114-a-puzzle/

Jason Brimhall supplies his solution for solving Sudoku puzzles using T-SQL as well as details of a database he uses to provide daily trivia questions to those he mentors in learning SQL… http://jasonbrimhall.info/2019/05/14/puzzles-and-daily-trivia/

Shane O’Neill takes us on a Loller-Coaster ride… https://nocolumnname.blog/2019/05/14/t-sql-tuesday-114-puzzle-party/

Steve Jones shares some puzzle resources from SQL Server Central and other sources as well as presenting his T-SQL solution to a Christmas coding question… https://voiceofthedba.com/2019/05/14/puzzles-t-sql-tuesday-114/

Is Bert Wagner a video star? The magic 8-ball says “Without a doubt!” https://bertwagner.com/2019/05/14/sql-server-magic-8-ball/

Rob Farley presents puzzles he’s encountered and played with before and reminds us to never stop puzzling if we want to keep our minds sharp! http://blogs.lobsterpot.com.au/2019/05/14/puzzling-times/

The Imaginarium

Finally, in a category on his own, Todd Kleinhans talks about his SQL Imaginarium project. A VR experience for people new to SQL Server: https://toddkleinhans.wordpress.com/2019/05/14/project-sqlimaginarium-1-year-update/

Thanks again everyone and I look forward to next month’s episode of T-SQL Tuesday!

Got a problem or embarking on a SQL Server project and want some help and advice? I’m available for consulting – please get in touch or check out my services page to find out what I can do for you.

Attending a climate protest for the first time

This is a blog devoted to databases, but for once I’m going to go off topic and talk about something I did at the weekend that I’ve never done before. On Easter Sunday I went to London to join in a protest demanding action on climate change.

I’d heard about what Extinction Rebellion were doing for a while. Initially I was put off by the name which I felt made them sound a bit like an anti-capitalist movement and I felt that for us to effect change we need to embrace people from all sides of the political spectrum. I also wasn’t sure how I felt about the actions they were taking – blocking roads and traffic. Was that going to antagonise more people than it gathered support?

In the end though I figured they were at least doing something – and something needed to be done. I’d lost faith that governments were going to act in time to avert disaster. I decided I wanted to show my support, add one more person to the weight of people demanding action. I didn’t want to get arrested but felt that at least being there was something. From their website I discovered there were all sorts of ways to get involved.

I went with Lisa my fiancee, and Millie our dog.

Millie’s first protest too

We got the train up to London from Bristol on Sunday morning and headed to the legal protest site near Marble Arch. At first it was confusing. Who could we talk to about being involved? What could we do? We found an induction session that had just started and sat down to join in.

As the leaders of the session talked about their background, their ethos, their aims and methodology I was quickly impressed. They stressed that this was not just a socialist movement, that they recognised they needed to engage people from all walks of life and ideologies.

I was most impressed by the strong focus on non-violent, non-aggressive action. That in all interactions with the police or public, those involved should make sure they were peaceful and reasonable at all times, whatever was going on. That if anyone witnessed someone going against that guiding principle they should either intervene or find someone else to do so, and suggest to the perpetrator that this was the wrong movement for them. And they managed to make this work – in all the arrests and actions there wasn't a single report of violence or wilful destruction.

I get that some people were frustrated by their actions and the disruption it caused, and I empathise with anyone affected, but they explained that they hadn't undertaken their actions lightly. Hundreds of thousands of people had attended climate marches to little effect. They knew they needed to do something different and things were becoming more urgent. Research was undertaken looking at movements in the past that were successful in achieving their aims, and it was discovered that the common theme was creating disruption in a peaceful manner. Only through the threat of continued disruption were authorities forced to take notice and engage with the movements in question.

Marble Arch is a busy place, with lots of people milling around or wandering to the nearby Hyde Park. Lisa and I decided we could best involve ourselves by chatting to people passing by, handing out flyers and explaining to people what was going on and why. Talking to people about the urgency of action and some of the dangers to us all if nothing happens.

We had a lot of nice interactions, and maybe even changed a couple of minds. Millie was a big hit as a protester (it was her first protest too) and a great draw to start a conversation. In particular the police loved her! That was another great thing, to see how good natured the police were through it all, smiling, laughing and posing for pictures with protesters.

Millie being “arrested”

All in all it was a surprisingly nice day out and very inspiring. There were times when I felt myself getting quite emotional. It’s definitely something I’d do again.

From the media coverage since, it seems like there has been some impact, but there's a long way to go. If you're the slightest bit concerned about climate change – and I hope you are – I'd encourage you to get involved. Even if you just sign up on their website or add your name to petitions that circulate. Or you could make a donation, or attend one of their actions.

https://rebellion.earth/

You can participate in whatever way you feel comfortable with.

I strongly believe that this is the biggest issue of our time, and without enough being done it could be the biggest issue of all time. We all need to do what we can.

The march back from Parliament Square to Marble Arch

Thanks for reading!

“SSPI handshake failed” \ “The login is from an untrusted domain” errors

I’ve recently encountered an issue that was difficult to resolve and I didn’t find the particular cause that was troubling us documented elsewhere on the web so thought I’d record it here.

The issue was with a service account connecting to SQL Server and intermittently failing to log on.

Errors reported in the Windows Application Event log were:
SSPI handshake failed with error code 0x8009030c
Login failed. The login is from an untrusted domain and cannot be used with Windows authentication.

The login attempt didn’t appear to get as far as the SQL instance, so no further information could be captured in a failed Logins trace.

This was affecting a large number of application servers using the same service account. Fortunately this was in development and test environments so no production impact.

The problem was that the account was getting locked out. A service was running every half hour using the account to connect to SQL, but with the wrong password. We also had a process running to unlock locked service accounts – so the account would start working again after a few minutes.

The resolution was to kill that service as it was no longer required. We were able to identify where the failed logins were coming from via the Active Directory audit logs for the account in question.

This was particularly difficult to troubleshoot as the error was a bit misleading.

There were a couple of other sources I came across while trying to google the cause of the problem. I’ve updated those with the cause we found as well:
https://blogs.msdn.microsoft.com/docast/2016/02/11/common-sspi-handshake-failed-errors-and-troubleshooting/#comment-7075
https://stackoverflow.com/questions/1538027/sspi-handshake-failed-with-error-code-0x8009030c-while-establishing-a-connection

Hopefully this post will save someone else the many hours of investigation I’ve just been through!

Got a problem or embarking on a SQL Server project and want some help and advice? I’m available for consulting – please get in touch or check out my services page to find out what I can do for you.

Extended Events Made Easy – Sorting, Grouping and Aggregation

When I'm using Profiler to analyse performance issues I often save the results to a table, or upload a trace file into a table, so that I can analyse the data. Often this involves aggregating the values for particular queries so that I can see the most resource-hungry ones.
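For comparison, against a trace that's been saved to (or loaded into) a table, that aggregation would be something like this sketch – the file path and table name are just examples:

--Load a trace file into a table
SELECT * INTO dbo.TraceResults
FROM sys.fn_trace_gettable('C:\Traces\MyTrace.trc', DEFAULT);

--Aggregate to find the most resource-hungry queries
SELECT
    CAST(TextData AS NVARCHAR(MAX)) AS QueryText,
    COUNT(*) AS Executions,
    SUM(CPU) AS TotalCPU,
    SUM(Duration) AS TotalDuration,
    SUM(Reads) AS TotalReads
FROM dbo.TraceResults
GROUP BY CAST(TextData AS NVARCHAR(MAX))
ORDER BY TotalCPU DESC;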

This is by no means a difficult process, but with Extended Events (XE) it’s arguably even easier.

Let’s start with a session from XEvent Profiler like we saw in my last post on this subject:

Extended Events Made Easy: Using XEvent Profiler

I’ve set a workload running and captured the Live Data. The workload consists of some arbitrary queries I’ve written, each running in a loop in its own query window.

After a few minutes I stop the session so I can do my analysis:

XESortingetc1

I’ve got a couple of hundred events. Let’s say I want to quickly find the longest running individual query. In the Live Data viewer you can just right-click on the column name and select sort ascending or descending. This is pretty obvious stuff, but I thought it was worth highlighting as you can’t do the same in Profiler – there you have to save to a table first then query the table.

So, I’ve sorted descending on Duration:

XESortingetc2

You can see the biggest values shown are the aggregated values for the existing_connection and logout events. In retrospect it might have been better to create an XE session that didn’t include these events. However, I look down the list and quickly find my longest running query (highlighted in blue).

The duration specified – 33,161,516 – is in microseconds. So that is just over 30 seconds.

Of course, now I’m thinking – that’s my longest running, but what I really want is the one that has the most CPU – and not just from one execution but the total from all the executions that happened in my sample time frame.

This is where I can use grouping and aggregation. I can right-click on any of the columns and select “Group by this column”. I’m going to do that for the TextData column:

XESortingetc3

We’re now seeing all the events from the sample time period, grouped together based on TextData. The number on the right of each one is the count of events in that group.

Now I can add some aggregations. After grouping I still can see the original column headings at the top, so I just right-click on cpu_time, select “Calculate Aggregation” and select “SUM”:

XESortingetc4

You’ll see you can pick the other standard aggregations also e.g. MIN, MAX, AVG etc.

You can have multiple aggregations going on, so I’ll do the same for duration and logical_reads.

Once I have my aggregations in place, I’ll want to sort the results. It’s the total cpu_time I’m most interested in so I right-click on that column and now I have the option to “Sort Aggregation Descending”:

XESortingetc5

After those steps this is what I end up with (I’ll just show the top few rows):

XESortingetc6

Admittedly this isn’t the most beautiful of views, but it was quick and easy to get to and it gives me the information I need. I can see my most CPU hungry queries and I can make a plan of action.

Incidentally, the top query was:

UPDATE dbo.Test SET SomeText = 's' + SomeText WHERE SomeText LIKE 's%'

If this wasn't a totally arbitrary set of queries for demo purposes I might be considering putting an index on the SomeText column.
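Something like this perhaps – bearing in mind the update also modifies SomeText, so the index itself would have to be maintained on every update:

CREATE INDEX IX_Test_SomeText ON dbo.Test(SomeText);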

That’s all I wanted to cover in this post. Hopefully you can see that XE data is really easy to work with, you don’t have to get involved in XML querying and you can perform quick analysis without even having to write SQL – much as we all love writing SQL.

In the last couple of posts, I’ve looked at how easy it is to get started using XE with the XEvent Profiler. For the next post in this series I aim to demonstrate that it’s also really easy to create your own customizable Extended Events sessions.

Previous posts in this series:

There’s Still a Place for SQL Server Profiler

Using the built-in System Health session

Exporting Extended Events Session Data to a Table

Extended Events Made Easy: Using XEvent Profiler

Got a problem or embarking on a SQL Server project and want some help and advice? I’m available for consulting – please get in touch or check out my services page to find out what I can do for you.

T-SQL Tuesday 107 – Viral Demo

tsql2sday150x150

For this month's T-SQL Tuesday, Jeff Mlakar invites us to talk about “a project you worked on or were impacted by that went horribly wrong”:

https://www.mlakartechtalk.com/t-sql-tuesday-107-invitation-death-march/

I’m not sure the story I’m about to share is exactly what Jeff was looking for, but when you say failed project, this one usually comes to mind.

This is from one of my first roles as a developer, back when I was writing VB code and before I got into SQL Server. This was even before .NET. At the time I was working for an engineering company that dabbled in a few pieces of very specific software for the industry. I’d taken the job so I could develop my coding skills by working with a proper software development team. Only to discover when I started, that I was the development team – the others had left.

So… I worked on this piece of software. It was one of those productivity tools that had a very specific use in a very specific type of project. There was nothing else that did what it did in Windows. Without it you were required to do a lot of hand-cranked analysis using Fortran, or if you were really unfortunate, paper and a pencil. The licences sold for big bucks per seat – but there were only about 20 customers worldwide for whom it might be useful.

I’d been working on a shiny new version. As part of that, I was tasked with creating a demo that we could send to all those clients. The main thing was that it had to have full functionality but be limited to working with a sample dataset. The company were currently in litigation with one client who they’d given a copy for eval, but who’d then used it for completing a big piece of work within a six figure contract. They didn’t want a repeat of that.

My boss wanted to create an auto-run menu for the CD – we’d only just moved over to CDs from having to create 10 floppy disks for each install. An auto-run was the latest and greatest pinnacle of slickness. I gave him a few lessons in creating forms and buttons in VB and off he went.

My boss burnt the CDs one Friday, including his fancy menu executable. He packaged them up with the relevant sales brochures and sent them out. He then went off on holiday for a fortnight.

I came in on the Monday morning to an urgent email from one of our senior managers. Apparently one of the recipients of the CD was reporting that they had put it in a desktop and that it had installed a virus on their system. Testing verified that they were correct. The auto-run menu program did indeed have a fairly nasty virus hiding in the executable, and because it was in the auto-run, all they needed to do to get infected was put the CD in their drive.

Basically, my boss used to use his work PC for downloading hacked “warez” from happyhippo.com. His work desktop had become badly infected and that had spread to the executable he worked on and compiled. As he wasn't around to own the issue, a colleague and I had the fun of calling all the people we'd sent the demo to, asking them if they'd put the CD in any of their computers. If they hadn't, we told them to destroy it. If they had, then we had to talk them through the lengthy process to get rid of the virus.

As a marketing exercise this wasn’t a stunning success.

I often tell friends in the software business this story as a consolation when they feel responsible for some mess up. “Well, at least you didn’t send a virus infected demo to your entire prospective customer base like my old boss did”

I left that company not long after.