The Importance of ORDER BY

May 30, 2018May 23, 2018 ~ Matthew McGiffen ~ 1 Comment

Everyone, at the beginning of their SQL career, get’s told that it is important to include an ORDER BY if they want the results ordered. Otherwise the order in which they are returned is not guaranteed.

But then you run queries a lot of times that don’t need a specific order – and you see that they (at least seem to) come out in the same order every time. You could (almost) be forgiven for thinking you can rely on that.

There was even a question on a Microsoft SQL certification exam a few years ago that asked what the default order was for records returned by a simple SELECT – the answer it was looking for was that it would be according to the order of the clustered index. So you can rely on that – right?

Wrong. The question was a bad question, and the answer was incorrect. Let’s look at this in action. First I’ll create a table and add some rows:

CREATE TABLE dbo.NoSuchThingAsDefaultSortOrder (
   Id INT IDENTITY(1,1) 
      CONSTRAINT PK_NoSuchThingAsDefaultSortOrder PRIMARY KEY CLUSTERED,
   FirstName VARCHAR(30)
);

INSERT INTO dbo. NoSuchThingAsDefaultSortOrder (FirstName)
VALUES
('John'),
('Fred'),
('Bob'),
('Sue'),
('Jenny'),
('Matt'),
('Alexis'),
('Zebedee');

Now let’s SELECT from the table, without specifying an order:

SELECT Id, FirstName
FROM dbo.NoSuchThingAsDefaultSortOrder;

Here’s our results:

OrderBy1

Okay, so they’ve come out ordered by Id according to our clustered index.

But what if someone comes along and decides they want to be able to look up records in the table by name, so they create a non-clustered index for that:

CREATE NONCLUSTERED INDEX IX_NoSuchThingAsDefaultSortOrder_FirstName
   ON dbo.NoSuchThingAsDefaultSortOrder(FirstName);

We run our original query again and this time these are the results we get:

OrderBy2

They’ve come out in a different order – this time ordered by FirstName – like the index we just added.

The reason is that this time SQL decided that it would be better to use our new index to execute the query. In general, it will try and use the smallest index that will satisfy the query – and in this case it decided that was the non-clustered index. I’d show you the execution plans to prove that, but I think it’s pretty obvious from the order the results are returned in.

So, don’t go relying on there being a “default” sort order. It’s just not true, changes to indexing – or even conceivably to statistics about the data distribution, could change the way SQL chooses to execute your query, and that could change your order.

Unless you make sure to specify ORDER BY.

Thoughts on Query Performance with TDE enabled

May 23, 2018February 18, 2021 ~ Matthew McGiffen ~ 2 Comments

Microsoft state that enabling TDE (Transparent Data Encryption) usually has a performance overhead of 2-4%. That doesn’t sound like very much, and personally I wouldn’t let it bother me if I want to make sure my data is encrypted at rest.

However, you may have heard other sources saying that it’s actually a lot more than that – and the performance impact is a high price to pay for the level of protection offered.

So, what’s the truth?

The critical thing to remember is that with TDE your data is encrypted on disk, but data held in memory (i.e. the buffer pool) is unencrypted. Therefore, you would only expect an overhead when reading from and writing to disk.

SQL Server tries to keep data that is referenced repeatedly in the buffer pool. So, if your SQL instance is provisioned with enough memory, a lot of your read queries can access the buffer pool and don’t have to go out to disk. Such queries should not be affected performance-wise by TDE.

There may be other read queries however that access older data that hasn’t been read for a while, these queries would need to retrieve that data from disk and so there would be an overhead from TDE.

Any queries that modify data will need the outcome to be written to disk so in these cases we will see an overhead. This overhead is likely to come in two parts, first when the transaction is written to the logfile before committing, and then later as the updated data gets written to the data file as part of a checkpoint operation.

We also have other operations that write or update encrypted data on disk, so we would also expect these to have some overhead. This would include operations such backups, or index rebuild operations.

You can see from this that the overhead will very much depend on how your application interacts with your data. At one extreme, if you have a set of static data that is small enough to be held in memory and is queried regularly then there should be no overhead. At the other end of the spectrum, if you have an application that writes to the database a lot, and reads less often, then the overhead will be higher.

Let’s look at some examples and try and quantify what amount of overhead we might be talking about. In this post we’re just going to focus on the effect TDE has when you are reading data.

First, I’ll create two databases, one with TDE enabled and one without. Then I’ll load the same set of data into each (Total size about 1GB).

You can find the script I used for this in my previous blog post:

Encrypting an existing database with TDE

In the first test we’ll perform a like query of the worst kind, one that tries to match for a value within a column. We have no indexes on the table, but none would be that helpful with this query anyway.

SELECT *
FROM dbo.SomeData 
WHERE SomeText LIKE '%Wibble%';

I’ll run across 4 test cases (capturing the total CPU consumed in each case). The test cases are:

TDE Protected database where the buffer cache is empty (i.e. all data has to be read from disk)
TDE protected database where all the data for the table is in the buffer cache (i.e. no data has to be read from disk)
Database without TDE where the buffer cache is empty (i.e. all data has to be read from disk)
Database without TDE where all the data for the table is in the buffer cache (i.e. no data has to be read from disk)

In each test I’ll run the query 5 times and total the CPU to even out variance between executions. For the tests involving disk reads I’ll run the command DBCC DROPCLEANBUFFERS in between executions of the query to empty the buffer cache.

The results looked like this, with time shown in seconds. Note that MAXDOP was set to 4 and each query went parallel over 4 threads:

TDE_ReadPerf1

There’s quite a variance between each run so I’m not going to take anything significant from small differences we see. However, we can see that the timings are pretty much the same when the data is in memory, but there seems to be about a 10% overhead with TDE when reading from disk.

In case you’re wondering why reading from disk didn’t add much elapsed time for the No-TDE database – the reads were “read-ahead” so were able to complete while the CPU was burning through the data.

Let’s try a different query, this one will still have to scan the whole table as we are dealing with a heap, but it uses an equality predicate so there is less work to do in matching the data:

SELECT *
FROM dbo.SomeData 
WHERE Id = 100000000;

I’ll run the same set of tests as above and we can look at the results:

TDE_ReadPerf2

The first thing we notice is that this query runs a lot quicker in general. And again, there is little difference with and without TDE when the data is in memory.

But look at the comparison when the data has to be read from disk. With TDE the CPU consumption is more than 10 times as large, or in percentages, over 1000% worse.

At this point you could be forgiven for panicking – are you willing to risk that TDE makes your queries that much worse.

In both the above two tests, the same amount of data is being read from disk. And if you re-examine the numbers, you’ll see that (very roughly) the same amount of CPU has been added in each case where we have TDE enabled – about 50 seconds. That 50 seconds was split over 4 cores so it would have been about 12.5 seconds per core.

In terms of elapsed time, we had approximately that increase with the first query because CPU was the resource under most contention – i.e. the reads were able to occur while waiting for the CPU to complete. In the second query we can see the reading from disk occupied most of the elapsed time for those queries, so the extra CPU consumption didn’t make the query run particularly longer.

By the time it had been executed 5 times (with the memory flushed between each execution) each query read about 600,000 pages sized at 8kb each – just under 5GB. If it took 50 seconds on the decryption of those pages, then each page took about 1 twelfth of a milli-second to decrypt – or alternatively, TDE decrypted about 12 pages per millisecond. Or in terms of disk size, 100MB per second. These were tests on a server with magnetic spinning disks (not SSDs) and you can see from the above figures, the straight disk access took about 40 seconds on its own.

When TDE doesn’t read from disk it doesn’t add any overhead, but how do we quantify what the overhead to queries is when it does have to access the disk?

From the above tests we could suggest it adds from 10% to over 1000% CPU.

Or alternatively between 10% to 0% elapsed time. Note that those figures are the same way round, i.e. when it added 10% CPU it added 10% elapsed time, but when it added 1000% CPU time – the elapsed time was about the same.

I could go on with this type of confusing analysis, but instead I’ll suggest this is the wrong way to think about performance in terms of TDE.

Don’t think about query performance.

Think about read performance.

TDE overhead depends on the level of your physical disk access. In the case of read query performance, it depends wholly on the level of physical reads, and seems to be a reasonable fixed overhead for each physical read.

That makes perfect sense, the overhead for querying is in decrypting data, and surely it will take pretty much the same amount of CPU to decrypt each 8KB page.

In theory this makes it simple for us to calculate what this overhead would look like on our production SQL Servers. There’s all sorts of ways of capturing physical reads (and writes).

Let’s say I take a quick look at Resource Monitor to get a ballpark figure for one of my databases on this server that I know to be quite heavy on physical reads. And let’s say I see that it averages 25MB/s during the peak hour.

From that, and from the figures above I can estimate what impact enabling TDE for this database would have on the CPU.

Well I know that 25MB equates to about 0.25 seconds of CPU to decrypt the data, and I know I have 4 cores, so I can expect that in the average second this adds 0.0625 seconds of CPU per core. I multiple that be 100 and I find that I’ve added 6.25% CPU.

The calculation I’ve just done is:

(Reads/Second) * 100

Divided by

(MBs TDE decrypts every second/CPU) * (Number of CPU cores)

This doesn’t include writes, and it doesn’t include backups – I hope to look at that in a later post.

Now, let’s say that produces a scary number, and I’m worried about the strain that’s going to put on my CPU…

My first question would be why am I experiencing so many reads and can I alleviate that? Does data have a short shelf-life in memory? Do I have enough memory in my server – and is enough allocated to SQL?

This isn’t just in terms of TDE. SQL Server is going to perform much better if your current dataset – i.e. the data you are currently accessing most, can be held in memory. So, if TDE is causing a problem, then it’s possible your queries are slow anyway.

Again, I’m not talking about writes just yet.

Or maybe your database is heavy on physical reads because it’s a data warehouse, regularly querying historical data. In that case, is it a suitable target for encryption? Hopefully the data is well anonymised if you’re using it for reporting and therefore doesn’t contain anything personal or sensitive.

I hope.

In summary…

Just to repeat myself, if you’re wondering about TDE and its impact of query performance, which we all have done, try to reframe the question and think about its impact on read performance.

It makes more sense and it may help you to more easily quantify the impact on your servers. And if it does look like the performance may be an issue – perhaps there is tuning you can perform on your database instance to reduce the physical disk access.

Hitting my first big blogging milestone – 100k Reads

May 16, 2018May 16, 2018 ~ Matthew McGiffen ~ 7 Comments

After blogging about SQL Server for just over a year, at some in the last month I went over 100,000 reads in total for my blog posts.

That breaks down today as about 17,500 reads on my WordPress site, and 89,000 on SQL Server Central where my blog is syndicated, across 52 posts. I’m sure this is nothing for the better-known bloggers out there, but for me it has far exceeded my expectations.

Thanks to everyone who has read my posts, and a special thanks to everyone who has commented, liked, or corrected me when I’ve got something wrong. Especially at the beginning, it has meant so much to have your feedback.

When I started, I didn’t really know who – if anyone – would read what I was writing. I think if I’d eventually averaged 100 reads on each post I’d have been more than happy, to average two thousand per post has been amazing.

So, if there are any of you out there who are thinking about starting a SQL blog – but aren’t sure – I say go for it, you won’t regret it!

And, if the same as I was, you wonder where your readers will actually come from, then seriously consider syndicating on SQL Server Central. You just need an RSS feed, which most blogging platforms will provide you by default. I’d say to get a few posts under your belt, then drop them an email – they do everything for you and are super helpful. Here’s a link:

http://www.sqlservercentral.com/blogs/steve_jones/2010/07/23/syndication-on-sqlservercentral/

Also, consider joining the SQL Server Community on Slack. There’s an #blogging channel where other bloggers are happy to help you out – or share your woes, and there’s the #blogs channel where you can set up a feed so your posts automatically appear there.

Trying to explain and demonstrate something in a clear manner is one of the best ways to harden and deepen your own knowledge of a subject. I’ve learnt so much since I started this blog. The blog also starts to become a great repository of knowledge for your own use – more and more often now I try to remember some aspect of SQL Server’s behaviour and then remember the answer is in one of my posts.

And if you’re thinking that everything you have to say has been said before – some of my most popular posts are rehashing old ideas, or going quite basic concepts. In fact, the one’s where I’ve done deep research and feel I’m presenting information that hasn’t been seen before are often far less popular.

Some of the highlights of the journey so far

Being shared in the Brent Ozar Weekly links – the first time it happened the traffic blew up so much on my WordPress site I thought I must be under attack!

BlogMilestone1

WHO’S HACKING MY SITE??!!

Being featured on the SQL Server Central homepage quite a few times.

Having a colleague tell me he had found a great resource on Always Encrypted – only to then realise I wrote it.

Having a great deal of fun writing my post “How to be a bad interviewer” for T-SQL Tuesday #93, a stream of consciousness rant that had to be toned down a few times before it was suitable for publication.

It’s been a fun and rewarding undertaking, am now looking forward to the next big milestone – 250k!

Matthew McGiffen Data

Month: May 2018

The Importance of ORDER BY

Thoughts on Query Performance with TDE enabled

Hitting my first big blogging milestone – 100k Reads

Some of the highlights of the journey so far