Introduction to SQL Server Query Store

Introduced with SQL Server 2016, Query Store was probably the most anticipated and talked about new feature. In this post we’ll take a brief look at it: what it is, how you set it running, and what you can use it for. This is a fairly brief overview – you’d need a book to cover it in detail – but hopefully it will give you a flavour of how useful this can be and how to get started.

What it does, at a base level, is actually quite simple. It just stores information relating to query execution over time.

That information consists of two things:
• Execution Plans – the execution plans generated for each query are stored in the query store, and if the plan changes the new plan is also stored.
• Performance metrics – information such as CPU consumption, reads, and writes is captured and stored for each query.

This information is aggregated over intervals (default is one hour) so you can see how query performance changes over time.

This isn’t earth-shatteringly new. You can already query to find out the execution plan for a query, and you can also query for aggregated performance metrics related to a given query.

The difference is that now a history can be maintained without implementing additional monitoring. Previously the performance metrics would be aggregated as a single total since the last restart of the SQL instance – and would be cleared at the next restart. Now they are persisted and time-sliced so you can actually measure change over time.

The simple act of storing old execution plans is also profound for performance troubleshooting. Anyone who’s worked with large-scale production data will have experienced the issue where a function that was working fine fairly suddenly starts to develop performance problems.

A common cause of this is what’s known as “plan regression”. This is where the execution plan has changed and the new one is just not as good as the old one for most executions. Previously you might have suspected this was the cause of an issue you were seeing, but there was no easy way to prove it; now you can use Query Store to view and compare the old and new plans to verify it. You can even, with a click or two, force the query to go back to using the old (better) plan – though we hope people won’t overuse this and will try to at least delve into the cause and think about resolving it. There is usually a reason SQL Server thought the new plan would be better – and a particular plan may work for now but may not always be the best plan in the future as your data changes.

Let’s have a look at these features in a little more detail.

Enabling Query Store
Query Store is a database-level configuration. It’s important to understand that, and that the information captured is actually stored within system tables in the database itself. That means that if you backup and restore the database, the information is retained. Also very importantly, the information is stored asynchronously – so there shouldn’t be any performance impact on the executed queries themselves. There will of course be some overall server overhead at the point the data does get saved, but that shouldn’t be too significant in most cases.

You can enable Query Store for a database through T-SQL (or in your source code) or through the GUI in SSMS. I just want to demonstrate doing this through the GUI so you can see some of the options. Right-click on the database, select properties, and then select the Query Store page all the way at the bottom:

QueryStore1

Above you can see Query Store enabled for the WideWorldImporters database, with all default settings.

The first setting is “Operation Mode”. By default this is set to “Off”. To enable Query Store and get it running for a particular database you change it to “Read Write”. Job Done.

The Data Flush interval is how often the query store data gets written to disk – remember I said this was asynchronous.

The Statistics Collection interval determines the size of the time slices that your query performance metrics get aggregated into.

Then we have some stuff about data retention. It’s worth noting that if your query store fills up and nothing is happening to clear it out then it flips to Read-Only mode and won’t store any more data until space is freed up. The default amount of space set for it is 100MB – that’s not a lot of space so I really can’t see any justification from that point of view for not enabling this feature.

Leaving the “Size Based Cleanup Mode” set to Auto should make sure that old data gets purged if the query store starts to fill up. Above that is the “Query Store Capture Mode” – if you leave that to AUTO it will ignore infrequent queries or those with negligible overhead.

The last setting, “Stale Query Threshold”, is how long data is kept for in days – 30 days by default. I can see it being useful to up this significantly if we want to use Query Store to monitor performance over a long period, but it may depend on how much space Query Store wants to consume for your database – remember the default is 100MB but you can up that to whatever you like.

At the bottom of the properties page you can also see some nice pie charts that show how much of a proportion of your database Query Store has allocated, and how much of that space it is using.
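
Everything on this properties page can also be set through T-SQL. As a rough sketch – using the default values discussed above and assuming the WideWorldImporters database, so adjust names and values to suit – the commands look something like this:

ALTER DATABASE WideWorldImporters SET QUERY_STORE = ON;

ALTER DATABASE WideWorldImporters SET QUERY_STORE
(
    OPERATION_MODE = READ_WRITE,                         -- Operation Mode
    DATA_FLUSH_INTERVAL_SECONDS = 900,                   -- Data Flush Interval
    INTERVAL_LENGTH_MINUTES = 60,                        -- Statistics Collection Interval
    MAX_STORAGE_SIZE_MB = 100,                           -- Max Size
    QUERY_CAPTURE_MODE = AUTO,                           -- Query Store Capture Mode
    SIZE_BASED_CLEANUP_MODE = AUTO,                      -- Size Based Cleanup Mode
    CLEANUP_POLICY = (STALE_QUERY_THRESHOLD_DAYS = 30)   -- Stale Query Threshold
);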

So that’s Query store set up and configured, let’s have a look at a few of the things it gives us.

Query Store in Action and Forcing a Plan
I’ve set up Query Store as above in a copy of the WideWorldImporters databases on a SQL 2016 instance. I’ve created a stored procedure that I’m running every two seconds and I’ve set the Statistics Collection Interval in Query Store to 1 minute (rather than an hour) so that I can get some figures and graphs out fairly quickly.

Under the database in SSMS, there is now a Query Store folder where some built in reports reside:

QueryStore2

For the sake of this blog post I’m just going to look at a couple of these. Let’s open the “Top Resource Consuming Queries” Report:

QueryStore3

You can see a few things here. On the top left is a bar chart of the most expensive queries (you’ll notice one large one and the rest are negligible in comparison – the large one’s my query). You can configure whether you want to look at CPU or logical reads amongst other options, and whether you want to look at averages, maximums or minimums. Basically there are a whole bunch of ways you can configure your view.

I’ll be honest that I struggled with some of these built-in Query Store reports to get them to show me what I wanted, so expect a bit of playing around to figure things out if you are using this feature.

In the bar chart, the bar highlighted in green is the currently selected query. On the right we can then see a scatter graph of the execution figures for this query across our statistics intervals (remember I’ve set it to have intervals of 1 minute). You can see I’m looking at average logical reads. You will also see that this query was ticking along nicely until about 14:05 when something happened (that was me!) and the logical reads for each execution of the query suddenly shot up. The blobs on the scatter graph have also changed colour at this point, which indicates that the query is now using a new execution plan.

Next to this graph is a key telling us which plan each colour of blob represents and if you click on the plan in the key that you want, the plan itself is displayed in the bottom pane. At the moment I’m looking at the original plan (Plan 1). You will notice that the title specifies that it is “not forced”, you’ll also notice a button to the right of the title that gives us the option to “Force Plan”. Let’s just hold off a minute before we do that.

Before we change anything to try and fix the issue with this query, let’s look at the “Regressed Queries” report. This is pretty similar, but you may use it from a different direction, i.e. it may not be one of your most expensive queries that has started going bad, so if you look in the Regressed Queries report it will focus on ones for which the execution plan has changed in the interval you are looking at. Again I found it a little challenging to get this report to show me the query I was interested in; some playing around may be required:

QueryStore4

You can see here that I have just one big fat bar on my bar chart – as I only have one regressed query in the interval (last 30 minutes) I chose to look at. This can make it easier to identify queries suffering this issue.

I’m going to go back to the previous Top Resource Consuming Queries report and try to fix the problem. Now, in reality I know what I did and why the query went bad. It was the result of something called parameter sniffing, which is where, if a stored procedure recompiles, the execution plan that is formed may be different depending on the parameters it is executed with. Basically it forms the best plan for the parameters supplied – but that might not be the best plan for all sets of parameters. In this case I forced the stored procedure to form a plan that was going to be expensive in most cases. More on that in my next set of performance tuning workshops.

That issue would be best fixed in the code of the stored procedure, but in production, turning around a fix may take days and we have the problem right now. So let’s use the Force Plan functionality to fix the symptom – just for the moment.

I select the plan I want, and I click the “Force Plan” button. The effect is immediate and I notice it within minutes because my statistics collection interval is so small. I’ll let it run for a bit and then show you the new graph:

QueryStore5

You can see the query has now returned to healthy (quick) execution. Note the orange blobs all now have a tick over them to denote that this plan is now forced.
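
As an aside, you can also force and un-force plans with T-SQL rather than the button, using a pair of system stored procedures. A quick sketch – the query_id and plan_id values here are just placeholders, you’d get the real ones from the report or from sys.query_store_plan:

-- Force the selected query to use a specific plan (IDs shown are hypothetical)
EXEC sys.sp_query_store_force_plan @query_id = 42, @plan_id = 1;

-- Remove the forcing again later
EXEC sys.sp_query_store_unforce_plan @query_id = 42, @plan_id = 1;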

Comparing Plans

A related feature in SQL 2016 is the ability to compare two execution plans to see what’s changed. I haven’t found this that amazing myself when I’ve looked at it, but that’s mainly due to the natural limitations – if two plans are significantly different then something that highlights the differences is just going to highlight the whole thing. However it can be useful at least to have both plans on screen at the same time so you can go back and forth easily.

You don’t need to do this through Query Store – if you right-click on any execution plan in SSMS there is now a “Compare Showplan” option, and as long as you have the plan you want to compare against saved as a file then you can go ahead. One good thing is that this is an SSMS feature, so as long as you have SSMS 2016 or higher you can use it to compare plans from earlier versions of SQL Server.

With Query Store you can compare plans directly from the Store. If we go back to one of the reports above, the plans are listed in the key for the scatter graph. You can select more than one by using Shift+Click. Then you can click the button in the toolbar above the scatter graph which has the ToolTip “Compare the Plans for the selected query in separate window.”

Let’s do that for the two plans formed for our query above. The resulting window shows the two plans side by side. It may be useful to look at these separately so they fit better on this page. On the left we have:

QueryStore6

The area highlighted in red is where the tool has identified that the two plans are the same. The rest it is not so sure about. All the same it’s a nice visual view just to be able to see what both plans are doing. On the right hand side of the screen you then get this view:

QueryStore7

This shows us a comparison of the properties of whichever operator is selected in each plan – note this need not be the equivalent operator in each plan. You can select the Nested Loop operator in the top and the Index Scan operator in the bottom and it will show you those – though the comparison may not be that meaningful!

So, this is kind of a useful tool, but don’t expect it to magically solve the process of comparing plans for you. In general too much changes from one plan to the next for it to be that simple – but this may help – a bit…

Query Store Catalog Views

Like everything else in SQL Server, all the information you can access through the GUI in SSMS is available directly through system catalog views. So if you want to examine the information in a way that isn’t supported by the built-in reports, you can just go ahead and write your own queries.

The new views available are:

sys.database_query_store_options
sys.query_context_settings
sys.query_store_plan
sys.query_store_query
sys.query_store_query_text
sys.query_store_runtime_stats
sys.query_store_runtime_stats_interval

Rather than me going into detail here, I’ll just refer you to the MSDN reference:

https://msdn.microsoft.com/en-gb/library/dn818149.aspx
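
To give a flavour though, here’s roughly the sort of thing you could write yourself to pull out the top queries by average logical reads across all the intervals captured. This is just a sketch – the column choices are mine and you may well want to slice by interval instead:

SELECT TOP 10
    qt.query_sql_text,
    p.plan_id,
    SUM(rs.count_executions) AS total_executions,
    AVG(rs.avg_logical_io_reads) AS avg_logical_reads,
    AVG(rs.avg_cpu_time) AS avg_cpu_time_microseconds
FROM sys.query_store_query q
INNER JOIN sys.query_store_query_text qt
    ON q.query_text_id = qt.query_text_id
INNER JOIN sys.query_store_plan p
    ON q.query_id = p.query_id
INNER JOIN sys.query_store_runtime_stats rs
    ON p.plan_id = rs.plan_id
GROUP BY qt.query_sql_text, p.plan_id
ORDER BY AVG(rs.avg_logical_io_reads) DESC;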

Conclusions

Query store is a great feature. It’s not rocket-science but it is very useful. In particular it massively aids the investigation of production issues, saving time for those troubleshooting them at exactly the point they need time saving – right when everything’s hitting the fan.

It is also very useful for monitoring performance over time and being able to keep ahead of scalability issues with specific queries.

The methods for forcing a plan are also excellent for quick fixes – but try not to overuse them. It is possible to force plans in earlier versions of SQL – but tricky, so people usually just fixed the code. Forcing plans can end up being a case of treating the symptoms rather than the cause – and can lead to other problems later on.

Other Posts about Query Store

Identify the (Top 20) most expensive queries across your SQL Server using Query Store

How does Query Store capture cross database queries?

Got a problem or embarking on a SQL Server project and want some help and advice? I’m available for consulting – please get in touch or check out my services page to find out what I can do for you.

SQL Server Backup Encryption and Compression

In SQL Server you can also compress your encrypted backups. Unlike TDE this has been possible with Backup Encryption since the feature was first made available, and there have been no issues that have required fixing – though as always you should still test that restores work correctly. As mentioned in my post about compression with TDE, compressing backups has benefits not just in terms of file size but potentially also in reduced backup times as the time taken to write to disk is smaller.

Compressing an encrypted backup is the same as compressing a regular backup; you just need to specify WITH COMPRESSION as shown in the following SQL:

BACKUP DATABASE TestBackupEncryption
TO DISK = 'C:\Test\TestBackupEncryption_EncryptedAndCompressed.bak'
WITH ENCRYPTION(ALGORITHM = AES_256, SERVER CERTIFICATE = BackupEncryptionCert),
COMPRESSION;

We can run a quick test to see how this compares with compressing an unencrypted backup, taken by executing this backup command:

BACKUP DATABASE TestBackupEncryption
TO DISK = 'C:\Test\TestBackupEncryption_UnencryptedAndCompressed.bak'
WITH COMPRESSION;

In this image we can see both backups:

Both backups are a fairly similar size. The encrypted one seems to be very slightly bigger – that’s the pattern I usually see – but not by enough that we’re likely to be bothered by it. Compression is usually just as effective with encrypted backups as with unencrypted ones.
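
If you’d rather not eyeball the files on disk, the sizes are also recorded in msdb, so something along these lines (a sketch) will show the raw and compressed size of recent backups:

SELECT database_name,
       backup_finish_date,
       backup_size / 1048576.0 AS backup_size_mb,
       compressed_backup_size / 1048576.0 AS compressed_size_mb
FROM msdb.dbo.backupset
WHERE database_name = 'TestBackupEncryption'
ORDER BY backup_finish_date DESC;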

This post is part of a comprehensive series on SQL Server Encryption. Subscribe to my blog for updates as new posts are published or you can buy my book through the link in the sidebar to get it all in one go.

And if you’re embarking on an encryption project (or anything else to do with SQL Server) and want some help, I’m available for consulting – please get in touch or check out my services page to find out what I can do for you.

Statistics Parser

In a recent post we looked at using the STATISTICS IO and STATISTICS TIME commands to measure query performance.

If you’ve started using these, you may notice that once you start to troubleshoot longer scripts or stored procedures that reference lots of tables or have multiple queries, you start getting lots of output. Sometimes it can sprawl across several screens’ worth. That can make it a bit of a pain to total up the numbers, or to quickly see if a change you have made has had a positive effect. That’s where you can use the online Statistics Parser tool to help.

Here’s the STATISTICS output from a troublesome query I was looking at recently:
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 0 ms.
SQL Server parse and compile time:
CPU time = 140 ms, elapsed time = 140 ms.

SQL Server Execution Times:
CPU time = 0 ms,  elapsed time = 0 ms.
(188 row(s) affected)

Table 'ParameterValue'. Scan count 14, logical reads 2350, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'ParameterDefinition'. Scan count 8, logical reads 456, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 8, logical reads 105600, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'ParameterDataType'. Scan count 0, logical reads 376, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
CPU time = 2371 ms,  elapsed time = 655 ms.

SQL Server Execution Times:
CPU time = 2511 ms,  elapsed time = 797 ms.

Now that’s not too bad, and it’s fairly simple for me to add up the numbers of reads. But you also can find yourself copying and pasting the output from several iterations of changes into notepad documents or query windows so you can compare them later, and it can become a bit of a pain to keep track of.

That’s what I used to do, until I discovered the Statistics Parser tool.

This is a free web-based tool to help you keep track of your Statistics output, which will also format the results nicely for you and produce totals. It’s very simple and easy to use:

http://statisticsparser.com/


StatisticsParser1
You simply paste the STATISTICS output into the big box and hit the “Parse” button. You can also enter a Page Title that will become the name of the Tab in your browser to make it easier to keep track of different sets of output.

The parser will turn the set of output for each query into an easy to read table and then will also provide a Totals table that shows you (not surprisingly) the totals across all your queries. Here’s the output above formatted via the tool (I am just showing the final totals):

StatisticsParser2

Personally I find that significantly easier to read!

Watch out though, the total CPU time can be double-counted. This is because the STATISTICS TIME output can sometimes include a total itself. If I look at just the TIME sections from the output above I have:

SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 0 ms.
SQL Server parse and compile time:
CPU time = 140 ms, elapsed time = 140 ms.

SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.

SQL Server Execution Times:
CPU time = 2371 ms, elapsed time = 655 ms.

SQL Server Execution Times:
CPU time = 2511 ms, elapsed time = 797 ms.

If you look carefully at that you’ll see that the last set of entries (CPU time = 2511ms, elapsed time = 797ms) are totals of the other numbers. If you then look at the Statistics Parser output above you’ll see that it has added those to the previous outputs to give a total CPU time of 5.022 seconds (5,022ms).

Still we’re mainly interested in getting the IO down so I ignore that for now. The main thing I’m interested in is the total number of reads:
108,782

I make my changes to the stored procedure, execute it and get the STATISTICS output again:

SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 0 ms.
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 0 ms.

SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.

SQL Server Execution Times:

CPU time = 16 ms, elapsed time = 0 ms.

SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 1 ms.

(189 row(s) affected)

SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.

(188 row(s) affected)
Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'ParameterDefinition'. Scan count 2, logical reads 396, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'ParameterValue'. Scan count 3, logical reads 708, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#ParameterNames_____________________________________________________________________________________________________0000000000CB'. Scan count 2, logical reads 718, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'ParameterDataType'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
CPU time = 78 ms, elapsed time = 80 ms.

SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.

SQL Server Execution Times:
CPU time = 94 ms, elapsed time = 83 ms.

You can see my output is getting longer and starting to become more challenging to read – I broke the original query down into a few simpler steps, but each produces its own output. No worries, I open up a new tab in my browser and go to a second instance of the Statistics Parser. I give this one the Page Title “After”. I paste in my output and I hit parse:

StatisticsParser3

You can now see I have a “Before” and an “After” tab so it’s easy to flick back and forth if I’ve forgotten my totals.

I can now easily see that my total number of reads – previously 108,782 – is down to 1,824. That’s about 60 times better. The CPU time is down by a factor of about 25, not as good as the reads, but I call that good enough and proclaim that tuning task complete.

Need Help?

Got a problem and could do with some assistance? Get in touch today to see how I can help.

Backup Encryption Performance

Unlike TDE, there is some extra CPU overhead when you take an encrypted backup as the data has to be encrypted before being written to disk – whereas with TDE the data is already encrypted. Backup times however are unlikely to be affected significantly as the bottleneck is usually going to be the time it takes to physically write the data to disk. The CPU processing should take a fraction of that time.

We can run a quick test with our database to show how backup performs with and without encryption. If you’re running this test yourself having followed the examples in Backup Encryption in SQL Server, then make sure the previous backup files are removed before executing the below script.

BACKUP DATABASE TestBackupEncryption
TO DISK = 'C:\Test\TestBackupEncryption_Unencrypted.bak';

BACKUP DATABASE TestBackupEncryption
TO DISK = 'C:\Test\TestBackupEncryption_Encrypted.bak'
WITH ENCRYPTION(ALGORITHM = AES_256, SERVER CERTIFICATE = BackupEncryptionCert);

Here is the output for the unencrypted backup:

Processed 1205416 pages for database 'TestBackupEncryption', file 'TestBackupEncryption' on file 1.
Processed 1 pages for database 'TestBackupEncryption', file 'TestBackupEncryption_log' on file 1.
BACKUP DATABASE successfully processed 1205417 pages in 17.428 seconds (540.355 MB/sec).

And here is the output for the encrypted backup:

Processed 1205416 pages for database 'TestBackupEncryption', file 'TestBackupEncryption' on file 1.
Processed 1 pages for database 'TestBackupEncryption', file 'TestBackupEncryption_log' on file 1.
BACKUP DATABASE successfully processed 1205417 pages in 19.631 seconds (479.716 MB/sec).

You can see the backup with encryption did take a bit longer – about 2 seconds, a little over a 10% increase – which seems not too bad.

With backup performance, it’s also worth considering how long it takes to restore a database from a backup. Let’s take a quick look at that. We’ll drop the database, restore the unencrypted backup, then drop it again, and restore from the encrypted backup. Then we can compare the performance. We do all that with the following code:

DROP DATABASE TestBackupEncryption;
GO

RESTORE DATABASE TestBackupEncryption
FROM DISK = 'C:\Test\TestBackupEncryption_Unencrypted.bak';

DROP DATABASE TestBackupEncryption;
GO

RESTORE DATABASE TestBackupEncryption
FROM DISK = 'C:\Test\TestBackupEncryption_Encrypted.bak';

Here is the output for restoring from the unencrypted backup:

Processed 1205416 pages for database 'TestBackupEncryption', file 'TestBackupEncryption' on file 1.
Processed 1 pages for database 'TestBackupEncryption', file 'TestBackupEncryption_log' on file 1.
RESTORE DATABASE successfully processed 1205417 pages in 17.979 seconds (523.795 MB/sec).

And here is the output with the encrypted backup:

Processed 1205416 pages for database 'TestBackupEncryption', file 'TestBackupEncryption' on file 1.
Processed 1 pages for database 'TestBackupEncryption', file 'TestBackupEncryption_log' on file 1.
RESTORE DATABASE successfully processed 1205417 pages in 20.794 seconds (452.886 MB/sec).

You can see that like the backup itself, the restore from the encrypted backup took slightly longer – in this case about 3 seconds or 15%. It’s not too bad, but it’s good to be aware that you may see some impact.

This post is part of a comprehensive series on SQL Server Encryption. Subscribe to my blog for updates as new posts are published or you can buy my book through the link in the sidebar to get it all in one go.

And if you’re embarking on an encryption project (or anything else to do with SQL Server) and want some help, I’m available for consulting – please get in touch or check out my services page to find out what I can do for you.

Why you still need DBA resource in the cloud

I was chatting with a cloud consultant who was advising on a large scale migration to AWS. He told me that one of the advantages of going for a PaaS offering (Platform as a Service) was that DBAs were no longer required as backups and restores were handled for you. PaaS services for SQL Server include AWS RDS and Azure SQL Database or Azure SQL Managed Instance.

I found it quite a funny conversation, partly as I don’t think he realised being a DBA was part of my job role, but also because I don’t know a single DBA who spends a significant amount of their time doing backups and restores.

It’s true that the services mentioned handle a lot of the routine tasks for you and that is great, it’s also reasonable to hope that they manage databases with greater reliability. Rather than replace the DBA though, they instead free them up to achieve more.

One common pitfall I’ve seen with cloud implementations is to assume that everything is done for you and that can lead to poorly configured services or ones with missing maintenance. It remains important to have access to someone who has a deep understanding of SQL Server when provisioning a cloud service to host your databases. It’s also important they understand the cloud options available and can make sure you have the right settings and instance type etc.

And once things are running you still need someone to turn to when there are problems. These problems are less likely to be hardware related, but you are still going to see occasional performance (or other) issues, just as you would when you were running on-premises hardware.

When those things happen you need access to someone with the right skills to diagnose the problem and perform analysis to define the right solution. Enter the cloud DBA.

As mentioned in my previous post The Importance of SQL Server Performance Tuning in the Cloud, ongoing performance tuning becomes even more important in the cloud: database services are expensive, a performance-optimized database is also a cost-optimized one, and you can realise savings immediately.

Brent Ozar talks about the importance of DBAs in the cloud in this post https://www.brentozar.com/archive/2019/06/what-happens-to-dbas-when-we-move-to-the-cloud/. He tells a story that I’m going to quote here:

One of my favorite recent examples was a company who came to me saying, “We’re spending about $2M per year in the cloud just on our databases alone. Can you help us reduce those costs?” Absolutely: with just a couple of days spent query & index tuning, we chopped their biggest database expenses in half while increasing performance.

At the end of that engagement, the CTO told me, “I thought I’d save money in the cloud by not having a DBA, but what I’m learning is that in the cloud, I actually get a return on my DBA investments.”

And it’s not all about performance. Someone recently reached out to me to say that they had mysteriously “lost” about 400GB of free space overnight on an AWS RDS instance running SQL Server, and wanted help identifying the cause. That led to identifying an issue relating to a recent code change that needed fixing.

Of course it’s possible you might not need a full-time DBA in the cloud – or otherwise. That’s where it’s useful to have access to someone who can jump in to perform that role as required. Ideally as they work with you over time they will understand your systems and be able to talk in the context of your applications when discussing problems and solutions.

I can help you with all that. You can find more details of my services here or simply get in touch to discuss what I can do for you.

Backup Encryption in SQL Server

We’ve seen in the previous posts that TDE is a simple way of protecting your at-rest data. There may however be times where you can’t or don’t want to use TDE. The main scenario for this is where you are on a version of SQL Server before 2019 (when TDE was made available in standard edition) and you don’t want to pay for the enterprise version which has a high price tag associated with it.

When we talk about protecting our at-rest data, the item that we are likely to be most concerned about is the security of our backups. Backups are generally – and should be – stored off the server itself, and often we will ship copies offsite to a third party where we don’t have control over who can access the data, even if we trust that it will be well managed.

From SQL Server 2014 the product has included the ability to encrypt data while creating a backup. This feature is available in both the standard and enterprise editions of SQL Server, so it is something you can use even when TDE may not be a feature that is available to you.

Backup Encryption has a lot in common with TDE in terms of the objects required. The encryption hierarchy is the same; you require a DMK and a certificate with a public/private key pair. In theory you can use an asymmetric key instead of a certificate, but this has the disadvantage that you can’t export the asymmetric key – which means you will struggle to restore your database backup to a different server. As such, for the sake of the examples that follow, we’ll just look at the certificate option.

You may however choose to use an asymmetric key if you wish to use Extensible Key Management (EKM) and store the key externally to your SQL Server. We’ll look at EKM in a later post.

Setting Up Backup Encryption

As mentioned, the prerequisites for Backup Encryption are the same as for TDE. We’ll go over creating them again here, but a little more briefly this time. Refer to Setting up TDE for more information.

Creating a Test Database

We’ll start with creating a sample database that we want to backup. You can skip this step if you just want to work with an existing database. We’ll use basically the same database we used for the TDE examples, just with a different name. The following SQL creates the database and populates it with test data:

CREATE DATABASE TestBackupEncryption;
GO
USE TestBackupEncryption;
CREATE TABLE dbo.SomeData(Id INT IDENTITY(1,1), SomeText VARCHAR(255));
GO
INSERT INTO dbo.SomeData (SomeText)
SELECT TOP 1000000
('XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX')
FROM sys.objects a
CROSS JOIN sys.objects b
CROSS JOIN sys.objects c
CROSS JOIN sys.objects d;
GO 100

Create the Database Master Key (DMK)

You must have a DMK, which resides in the master database, and you can create it with the following code:

USE master;
CREATE MASTER KEY
ENCRYPTION BY PASSWORD = 'UseAStrongPasswordHere!£$7';

You should also backup the DMK using the following command:

BACKUP MASTER KEY TO FILE = 'C:\Test\MyDMK'
ENCRYPTION BY PASSWORD = 'UseAnotherStrongPasswordHere!£$7';

Creating the Certificate

You also require a certificate in the master database which has an associated public/private key pair. Unlike TDE, in the case of Backup Encryption, this key pair will be used to directly encrypt the backup using asymmetric encryption. There is no separate Database Encryption Key required. You create the certificate with this SQL:

USE master;
CREATE CERTIFICATE BackupEncryptionCert
WITH SUBJECT = 'Certificate used for backup encryption';

You should take backups of the certificate and private key and keep them safe if you ever want to be able to restore your backups to another server. Here is the SQL to backup these objects:

BACKUP CERTIFICATE BackupEncryptionCert
TO FILE = 'C:\Test\BackupEncryptionCert.cer'
WITH PRIVATE KEY
(
FILE = 'C:\Test\BackupEncryptionCert_PrivateKeyFile.pvk',
ENCRYPTION BY PASSWORD = 'UseAStrongPasswordHereToo!£$7'
);

Permissions

It’s possible that the account you generally use for taking backups doesn’t have sysadmin permissions on the server. If that is the case, then there are some additional permissions required. The account needs the db_backupoperator role in each database being backed up, but that should already be in place. The only additional permission required is that the account must have the VIEW DEFINITION permission on the certificate. You can assign that permission with this SQL:

USE master;
GRANT VIEW DEFINITION ON CERTIFICATE::BackupEncryptionCert
TO [MyBackupAccount];

That’s all we need to do before we are ready to start encrypting our backups.

Working with Encrypted Backups

Now that we have all the objects in place to encrypt our backups, we can look at how you take a backup with encryption enabled and how you restore an encrypted backup.

Taking an Encrypted Backup

It is possible to encrypt any of the backup types – FULL, DIFFERENTIAL, or LOG. In practice if you are using Backup Encryption, you are likely to want to make sure all are encrypted. The syntax is the same in each case though, so we’ll just look at FULL backups. This is the backup command with encryption specified:

BACKUP DATABASE TestBackupEncryption
TO DISK = 'C:\Test\TestBackupEncryption_Encrypted.bak'
WITH ENCRYPTION(ALGORITHM = AES_256, SERVER CERTIFICATE = BackupEncryptionCert);

You can see we specify the algorithm. As with TDE, AES_256 is recommended (Advanced Encryption Standard with a 256-bit key). We also specify which certificate to use.

We can view data about the backup using the RESTORE HEADERONLY command, which will include information about encryption. Here is the code for that:

RESTORE HEADERONLY
FROM DISK = 'C:\Test\TestBackupEncryption_Encrypted.bak';

This returns us a lot of information, so I won’t include the full set of columns. Relevant to encryption though, we will see the following:

KeyAlgorithm – aes_256

EncryptorThumbprint – 0xA2E4A2A29182054B2F97FCD9954FA9349B4351EC

EncryptorType – CERTIFICATE

You can use this if you need to be able to check whether a particular backup is encrypted or not.
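
The same details are also recorded in msdb, so if you want to check encryption across your backup history rather than interrogating individual files, a query along these lines (a sketch) will do it:

SELECT database_name,
       backup_finish_date,
       key_algorithm,
       encryptor_type,
       encryptor_thumbprint
FROM msdb.dbo.backupset
WHERE database_name = 'TestBackupEncryption'
ORDER BY backup_finish_date DESC;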

Restoring an Encrypted Backup

Restoring an encrypted backup is the same as restoring any other backup – as long as the certificate used to encrypt the backup exists on the server.

If you are restoring to a different server, you will need to restore a copy of the certificate and private key from the backup taken before you can restore the encrypted database (the server must also have a DMK before you can do this). This is the same command we covered in Migrating or Recovering a TDE Protected Database:

USE master;
CREATE CERTIFICATE BackupEncryptionCert
FROM FILE = 'C:\Test\BackupEncryptionCert.cer'
WITH PRIVATE KEY
(
FILE = 'C:\Test\BackupEncryptionCert_PrivateKeyFile.pvk',
DECRYPTION BY PASSWORD = 'UseAStrongPasswordHereToo!£$7'
);

Then you can simply restore the database as normal with the following command:

RESTORE DATABASE TestBackupEncryption
FROM DISK = 'C:\Test\TestBackupEncryption_Encrypted.bak';

If you get an error, it is likely to be because the certificate doesn’t exist – for instance, if you have restored the wrong one:

Msg 33111, Level 16, State 3, Line 25
Cannot find server certificate with thumbprint '0xA2E4A2A29182054B2F97FCD9954FA9349B4351EC'.
Msg 3013, Level 16, State 1, Line 25
RESTORE DATABASE is terminating abnormally.

This post is part of a comprehensive series on SQL Server Encryption. Subscribe to my blog for updates as new posts are published or you can buy my book through the link in the sidebar to get it all in one go.

And if you’re embarking on an encryption project (or anything else to do with SQL Server) and want some help, I’m available for consulting – please get in touch or check out my services page to find out what I can do for you.

Measuring SQL Query Performance

When running workshops on the subject of SQL query performance, the first question I always ask is how do we measure the performance of a query? Or to look at it another way – how do we measure the resources it consumes?

Attendees have come up with a range of answers from “With a stopwatch” (which I like a lot) to the slightly more technical “Using Profiler”. I like the first answer just because that’s where we all start. We run something from an application and we literally time how long it takes, or we run something in SSMS and we use the counter near the bottom right of our query window to tell us how long it took. We may even run something 100 times in a loop and capture the overall time so we can take an average.

SQL Server Profiler is also good when trying to capture what is actually happening in a given environment.

But there is a better way…

Using STATISTICS commands

When we’re working on a query in SSMS there is an easier, more accurate, and more detailed approach than any of the others available, which is to use the following two commands:

SET STATISTICS TIME ON;

This command tells SSMS to capture the consumed CPU time and elapsed time related to any queries you subsequently run in that query window. The output is sent to the Messages tab.

SET STATISTICS IO ON;

This command tells SSMS to capture the amount of data read from disk or memory by any subsequent queries in that window. Again the output is sent to the Messages tab.

Let’s look at some quick examples. For these I’ll use the StackOverflow database (here’s a post from Brent Ozar on how you can download this database for your own use https://www.brentozar.com/archive/2015/10/how-to-download-the-stack-overflow-database-via-bittorrent/):

SET STATISTICS TIME ON;
SELECT COUNT(*) FROM dbo.Users;

Output:
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 1 ms.
(1 row(s) affected)

SQL Server Execution Times:
CPU time = 422 ms,  elapsed time = 2296 ms.

You can see we now have a fairly accurate measure of how long that query took to run (and how long to parse and compile). The CPU time is how much CPU was used, the elapsed time is how long the query took overall. You’ll see the elapsed time is much longer than the CPU time; this generally means SQL Server is waiting for something the rest of the time – perhaps for data to be read from disk or memory, or perhaps something else (like locking and blocking).

Now let’s look at STATISTICS IO for the same query.

SET STATISTICS IO ON;
SELECT COUNT(*) FROM dbo.Users;

Output:
Table 'Users'. Scan count 5, logical reads 24914, physical reads 1, read-ahead reads 24770, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

You’ll see that there’s a bit more info here; let’s run through each piece in order:

  • Table – First we have the name of the table the reads are being generated from – when we have a bunch of tables referenced this can help us to work out where to focus our tuning efforts.
  • Scan Count – In simple terms this is the number of times the table was accessed. This isn’t always a massively useful metric. It can also be zero in some specific scenarios, such as searching for one specific value in a clustered index.
  • Logical Reads – this is the number of data “pages” read from memory. A data page is 8KB in size, so this query has read about 200MB from the Users table (which is basically the whole table).
  • Physical Reads – This is the number of data pages read from disk.
  • Read-Ahead Reads – These are also physical reads. In this case pages read from disk, but in theory read ahead of time e.g. while the processor is doing something else and doesn’t need them yet. In practice SQL has started to read these before it needs them, but is still reading them as the query is consuming them so we do end up waiting for the disk to supply most of the data. If we look back at the CPU time and elapsed time above, it is a pretty reasonable assumption that the time taken up waiting for these physical reads accounts for much of the difference between the CPU time and elapsed time.
  • LOB – The columns are repeated for LOB data, which is “Large Object” data such as varbinary(max) columns and other such “blobs”. These get stored outside of the main data page. We’re not going to worry about these too much for the moment.

Physical vs Logical Reads

It is worth knowing that SQL Server (in general) will try to retain data it has read in memory for as long as possible. This is because Logical Reads (from memory) are much much faster than Physical Reads (from disk). You may have heard people saying that SQL Server is very aggressive about using memory, or heard complaints like “SQL Server is consuming all the memory on the server”. This is why. SQL Server wants to do as much querying as possible from memory and save on those expensive physical reads. We allocate a maximum amount of memory to SQL Server and over time it will use as much of that allowance as it can. This is also one of the reasons why querying slows down after you restart SQL Server, all your queries are going to go back to reading from disk until the server has that data in memory once more.
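
If you’re curious how much of a given table is currently sitting in the buffer cache, you can peek at it through a DMV. This is a rough sketch that covers in-row and row-overflow data only (LOB data needs a slightly different join):

SELECT COUNT(*) AS cached_pages,
       COUNT(*) * 8 / 1024 AS cached_mb
FROM sys.dm_os_buffer_descriptors bd
INNER JOIN sys.allocation_units au
    ON bd.allocation_unit_id = au.allocation_unit_id
INNER JOIN sys.partitions p
    ON au.container_id = p.hobt_id
    AND au.type IN (1, 3)   -- in-row and row-overflow allocation units
WHERE bd.database_id = DB_ID()
    AND p.object_id = OBJECT_ID('dbo.Users');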

It’s also good to understand that in the numbers above SQL Server is not reading one set of data from disk and another set from memory. Rather it is a two stage process, required data is read from disk into memory (the physical/read-ahead reads) and then the whole set is read from memory (the logical reads). So the number of logical reads includes the number of pages read first from disk.

We can see that effect if we run the query again. This time I’ll include both commands:

SET STATISTICS TIME ON;
SET STATISTICS IO ON;
SELECT COUNT(*) FROM dbo.Users;

Output:
Table 'Users'. Scan count 5, logical reads 24982, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
CPU time = 546 ms,  elapsed time = 164 ms.

You’ll see our physical reads (including read-ahead reads) have now dropped to zero. This is because we’ve already read this data from disk the first time we ran the query and SQL has retained the data in memory. Logical reads have stayed about the same. You’ll hopefully also see that our CPU time is reasonably similar at around 500ms, but our elapsed time has dropped dramatically as we no longer had to wait for slow reads from disk.

You might also notice that the elapsed time is now actually shorter than the CPU time, which can be a bit confusing. The query has consumed 546ms of CPU time, but only took 164ms from beginning to end. How can this be? The answer is that our query has gone parallel over several CPU cores. Counting all the records in a large table is actually a pretty resource-hungry operation as you can see, so SQL has decided to save time by breaking the task into pieces and running those subtasks concurrently.
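
If you want to confirm that parallelism is what’s going on, one quick check (my suggestion rather than anything shown above) is to re-run the query with a MAXDOP 1 hint so it’s restricted to a single core – the CPU and elapsed times should then come back much closer together:

SET STATISTICS TIME ON;
-- Restrict the query to a single core to rule parallelism in or out
SELECT COUNT(*) FROM dbo.Users OPTION (MAXDOP 1);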

I should probably add a caveat here in that I have actually seen STATISTICS TIME return results where the elapsed time is marginally shorter than the CPU time even when the query is single-threaded. I queried this with a performance tuning specialist from Microsoft who I was working with at one of our clients, and he admitted that these figures weren’t always 100% accurate – they are however the most accurate you’ll get.

Flushing Data From Memory

You might sometimes want to see how your query would perform if it’s having to read all its data from disk. This might be more appropriate to how it is likely to be working in the wild. In that case you can run the following command which will flush all the data from memory (so don’t run it on a production environment!).

DBCC DROPCLEANBUFFERS;

If you execute this and then were to re-run the above query and capture the IO you’d see that the physical/read-ahead reads are back.

In SQL Server the area of memory where data is stored is referred to as the Buffer Cache. Within that Cache we have clean pages where the data in memory is the same as the data on disk, and dirty pages which are pages that contain modifications that have yet to be written to disk. The above command only flushes (or drops) the clean pages (or buffers). Just in case you were wondering why it was called DROPCLEANBUFFERS…
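
A related pattern worth knowing (again, only on a non-production environment) is to issue a CHECKPOINT first so that dirty pages get written to disk and become clean pages that can then be dropped:

-- Write dirty pages to disk, then drop the clean buffers
CHECKPOINT;
GO
DBCC DROPCLEANBUFFERS;
GO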

A Slightly More Complex Example

The above example with just the one table should make it clear what these commands do, but let’s look at a query that goes across a couple of tables and does a bit more work so you can see the sort of output you’ll get:

SET STATISTICS TIME ON;
SET STATISTICS IO ON;
SELECT TOP 10
   u.Id,
   u.DisplayName,
   COUNT(p.Id) AS NumberOfPosts
FROM dbo.Users u
INNER JOIN dbo.Posts p
   ON u.Id = p.OwnerUserId
GROUP BY u.Id, u.DisplayName
ORDER BY NumberOfPosts DESC;

Output:
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 54 ms.
(10 row(s) affected)

Table 'Posts'. Scan count 5, logical reads 51352, physical reads 1, read-ahead reads 51192, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Users'. Scan count 0, logical reads 94, physical reads 45, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 336, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
CPU time = 8002 ms,  elapsed time = 5597 ms.

You can see here that the IO for both tables referenced in the query has been captured. This is very useful in telling us where the bulk of the work is being done.

Notable here also are the references to “Worktable” and “Workfile”. These often appear in the output to STATISTICS IO for more complex queries. In general it is worth thinking of these as internal objects SQL uses to store temporary resultsets while processing your query. The difference between the two is the type of resultsets they store and for which particular operations. This can make it a bit more challenging to work out which part of your query is generating the most work when the highest number of reads appear against these. You end up needing to look at the execution plan and trying to work out what operations these could be related to. We may get to more of that in a later post.

Using this information to Tune your Queries

To close, I just want to mention how you should use this information to focus your efforts when tuning queries. Intuitively we all think in terms of time when we think in terms of query performance, but in practice it is much better to focus on the output of STATISTICS IO when you are trying to make improvements.

The IO gives you much more information, and while figures for both CPU and IO will vary depending on what is going on on the server, the IO figures are more static and therefore you can have more confidence that changes you see are the results of your tuning efforts. It also allows you to focus on tables one at a time as you tune multi-table queries.

Both IO and TIME are measures of work being done on a server. If you can reduce the number of reads generated by your query then that is generally the best measure of improvement and will likely have similar benefits to the overall time taken by the query.

So in your tuning efforts, focus on the logical and physical reads and try to get the totals for these down as low as possible. Generally you’ll start with operations on the table that generates the highest number of reads and go on from there.

Need Help?

Got a problem and could do with some assistance? Get in touch today to see how I can help.

Top Tools for SQL Server Performance Troubleshooting

When you’ve got the symptoms of a database issue you can run a series of diagnostic queries to try and drill down on the problem and then start figuring out solutions. There are a number of free tools out there though that can speed up that process immensely. In this post we’re going to look at my favourites.

In no particular order…

sp_WhoIsActive

sp_WhoIsActive by Adam Machanic is great for the situation where you have a problem right now. It’s versatile and supports a number of parameters for execution but most of the time I run it with just the defaults and it tells me what I need to know. The resultset returned shows you what is executing on your SQL Server without any noise from idle connections etc.

From the output you can quickly identify long running, or CPU consuming queries. You can also identify queries whose execution is blocked and analyse the blocking chain to find the root blocker.

sp_WhoIsActive is a stored procedure that you install on your database instance. Either in the master database, or another database of your choosing.
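
Usage is as simple as executing the procedure. A couple of example calls – the parameters shown are ones I believe are in the documented parameter list, but do check the documentation for the full set:

EXEC dbo.sp_WhoIsActive;

-- Include estimated execution plans and flag the lead blocker in any blocking chains
EXEC dbo.sp_WhoIsActive @get_plans = 1, @find_block_leaders = 1;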

You can download it and find the documentation here.

Query Store

Query Store is built in functionality that has existed in SQL Server since 2016. Once enabled it captures query execution information over time so you can see what was running in a particular time window, look at what are your most resource intensive queries, as well as identify regressed queries (queries that were good, but have gone bad). You can also use it to “fix” some issues by forcing a known good execution plan – and in SQL Server 2022 you can use it to force query hints too.

I’ve blogged a fair bit about Query Store and my Introduction to SQL Server Query Store post is one of my most popular posts. Give it a read if Query Store is unfamiliar to you.

Query store isn’t going to help you if you haven’t turned it on, and it’s a setting you should consider turning on by default for each database. From SQL Server 2022 it is enabled by default for new databases.

The only downside with Query Store is that it is a per-database view and if you have a lot of databases it won’t easily help you determine which one a problem is coming from. That’s why I wrote the script Identify the (Top 20) most expensive queries across your SQL Server using Query Store.

Paul Randal’s Wait Stats script

This is a great script for identifying what exactly is troubling your SQL Server. Is it CPU, is it Parallelism, is it blocking or memory etc. It analyses the aggregated waits (what SQL Server has been waiting on) since the last server restart. This sort of analysis is often the first thing you should be looking at when you have a problem. Paul used to work for Microsoft in the SQL Server team and is one of the foremost experts on all things SQL Server. You can get the script here as well as find comprehensive documentation on most of the wait types you are likely to find.
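
Paul’s script does a lot of sensible filtering of benign wait types, so use that rather than rolling your own, but just to show where the data comes from, here’s a stripped-down sketch against the underlying DMV (the excluded waits are only a tiny sample of those the real script filters out):

SELECT TOP 10
    wait_type,
    wait_time_ms / 1000.0 AS wait_time_seconds,
    signal_wait_time_ms / 1000.0 AS signal_wait_seconds,
    waiting_tasks_count
FROM sys.dm_os_wait_stats
WHERE wait_type NOT IN ('SLEEP_TASK', 'LAZYWRITER_SLEEP', 'XE_TIMER_EVENT', 'BROKER_TASK_STOP')
ORDER BY wait_time_ms DESC;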

Brent Ozar’s First Responder Kit

Originally written by Brent Ozar (he has since open-sourced it) the First Responder Kit covers a lot of ground. It’s a set of stored procedures you can use in different scenarios. Like sp_WhoIsActive, you can install them in the master database or another database of your choosing. The most useful for problems that you are in the middle of is sp_BlitzFirst which encapsulates some of the other items above. It includes:

  • Identifying currently long running, high CPU or blocked queries.
  • Analysing wait stats.
  • Checking for currently executing SQL Server agent jobs.
  • Looking at perfmon counters.

As it covers a number of areas it is (as the name suggests) what you should consider running first when you have a problem.

Another very useful stored procedure in the toolkit is sp_Blitz which gives an overall healthcheck of your server and is a useful starting point for identifying all sorts of things that are going wrong or aren’t correctly configured.
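
To give a feel for usage, the simplest calls look like this – both procedures take plenty of optional parameters, so see the documentation for the details:

EXEC dbo.sp_BlitzFirst;   -- what is hurting the server right now
EXEC dbo.sp_Blitz;        -- overall server health check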

You can get the First Responder kit here.

Need Help?

Got a SQL Server performance problem and need some assistance? Get in touch today to discuss how I can help you.

TDE: Database Backups and High Availability

Database backups continue to work without change when you have TDE enabled. The only difference is that the backups contain encrypted data that cannot be read without the certificate and private key. There are a couple of points that are worth discussing though.

Backup Performance

Following on from general questions of TDE performance, it’s sensible that you might also be concerned whether TDE has an impact on backup times. You may also have read of people complaining about long backup times with TDE.

It’s not necessarily true that TDE has much of an impact on backup performance. The reason is that when a backup is performed, SQL Server does not have to encrypt the data. The data already sits encrypted on disk in the data and log files, and those copies of the data are what are used for performing the backup. In practice there may be some data in memory that has yet to be encrypted and written to disk, but in general, that is not going to be large enough to cause significant overhead.

When people talk about issues with backup performance and TDE, they are likely to be talking about the case involving backup compression.

Backup Compression

Many people use backup compression with database backups in SQL Server. It is simple functionality to use as shown in this code example:

BACKUP DATABASE [TestTDE] TO DISK = 'C:\Test\TestTDE_Compressed.bak' WITH COMPRESSION;

The benefit of backup compression isn’t just about having smaller backup files but also the time taken to perform a backup. The biggest bottleneck involved in taking a backup is usually the time it takes to write it to disk. By taking compressed backups you can significantly reduce backup times. This comes at the cost of some extra CPU overhead to perform the compression, but unless your CPU is under pressure, it’s often worthwhile.

Up until the 2016 version, SQL Server did not support backup compression on TDE enabled databases. One reason for this may be that most compression algorithms work best where there is some repetition in the data to be compressed, but encrypted data looks pretty much random. What this meant in practice was that you might specify the WITH COMPRESSION option when backing up your TDE-protected databases but you wouldn’t see much difference in the file size or backup times. This changed from SQL 2016 and was a welcome improvement.

To use backup compression with TDE, however, you needed to specify an additional parameter MAXTRANSFERSIZE. This parameter specifies the largest unit of transfer in bytes used between SQL Server and the backup media. If you’re interested in fine-tuning your backup performance, this is one value you can play with. Backup compression with TDE doesn’t kick in unless your MAXTRANSFERSIZE is greater than 64kb (65536). As long as the value you specify is at least one greater than 64k, then an optimized algorithm for compression of TDE encrypted databases is enabled. Commonly people use the value of 128kb. The command looks like this:

BACKUP DATABASE TestTDE TO DISK = 'C:\Test\TestTDE_Compressed.bak'
WITH COMPRESSION, MAXTRANSFERSIZE = 131072;

This extra parameter becomes unnecessary if you are on SQL Server 2019 Cumulative Update 5 or higher. With that release, if you specify WITH COMPRESSION for a backup taken for a TDE-protected database and you don’t specify MAXTRANSFERSIZE, then MAXTRANSFERSIZE will automatically be increased to 128kb, and your backup will be compressed.

Backup Compression Issues

The introduction of backup compression for TDE-protected databases has however not been without problems, and this is something you really need to be aware of. There have been a number of bugs discovered where a compressed backup of a TDE database was found to be unrecoverable. Some people have also reported that restore times were massively increased in some cases.

If you’re on a version of SQL Server higher than 2016 CU 7 or 2016 SP1 CU4, then you should be fine, but I would stress the importance of regularly testing your backups by restoring them. A few days before writing this, I came across the term Schrodinger’s Backup – the condition of any backup is unknown until a restore is attempted. When it comes to TDE and backup compression, you should consider that as a very true statement.

TDE and High Availability

In general, TDE plays nicely with any of the built-in features that SQL Server has for high availability (HA). That includes:

  • Availability Groups
  • Log Shipping
  • Database Mirroring

In theory, in all cases, the actions you need to take to support TDE are the same. You just need to ensure that the secondary server has a Database Master Key (DMK). Then you need to ensure that copies of your certificate and private key have been restored to the secondary before you turn encryption on. This is the same step you would take if you were attempting to restore a TDE-protected database to a different server. We covered that a previous post Migrating or Recovering a TDE Protected Database.

As long as that is done, then whichever HA tool you use should take care of the rest.

In practice, we DBAs are cautious folk, and you don’t want to risk anything going wrong when you are dealing with a live system. As such you may want to take the following steps:

  1. Remove the database from HA.
  2. Set up TDE for the database and turn it on.
  3. Set up the keys and certificate on the secondary.
  4. Add the database back into HA.

This post is part of a comprehensive series on SQL Server Encryption. Subscribe to my blog for updates as new posts are published or you can buy my book through the link in the sidebar to get it all in one go.

And if you’re embarking on an encryption project (or anything else to do with SQL Server) and want some help, I’m available for consulting – please get in touch or check out my services page to find out what I can do for you.

The Importance of SQL Server Performance Tuning in the Cloud

SQL Server Performance tuning is always important and I can list off reasons why:

  1. Improved application response.
  2. Identifying issues before they become outages.
  3. Potential cost savings.

In terms of the cost savings however, that is always a little speculative when running on premises. I can take your 8 core box running at 70% CPU peak and get that down to 35%. Then you could potentially migrate to a smaller box with 4 cores and save up to $7,000 a year for standard edition or about $26,000 for enterprise. If you’re running on bigger kit then the potential savings are much larger. Often though SQL Server runs on physical machines so downsizing isn’t that straightforward and there isn’t the appetite for migrating to different kit.

In the cloud it’s different, you can change the size of the instance running your SQL Server quickly and easily with no real risk, so cost savings can be realised immediately. That applies whether you’re running SQL on cloud Virtual Machines or using a Platform as a Service (PaaS) offering such as Azure SQL Database or Amazon RDS. If you’re running the serverless option on Azure SQL Database then you can start seeing savings without even having to change configuration of the instance.

Let’s look at some figures showing the potential cost savings. The table below shows the cost of running 4, 8 or 16 vCPU instances in Azure and AWS. For Azure I’ve chosen provisioned compute on the General Purpose Service Tier running on Standard hardware. For AWS I’ve chosen M5 as the instance type with SQL Server Standard edition and Multi-AZ for high availability. The figures shown are on-demand costs per year:

Number of vCPUs    Azure SQLDB    AWS RDS
16                 $35,346        $88,423
8                  $17,673        $44,641
4                  $8,836         $21,444

Cost per year of running SQL Server in Azure and AWS

You can see the pricing calculators I’ve based these figures on here:

https://azure.microsoft.com/en-gb/pricing/details/azure-sql-database/single/

https://aws.amazon.com/rds/sqlserver/pricing/?nc=sn&loc=4

As a side note, we can see that AWS is a lot more expensive. There are a number of reasons why I see Azure as a better fit for running SQL Server workloads and price is one. That said, the database isn’t the only factor in choosing your cloud provider so it’s perfectly legitimate that you might be running on AWS.

When I’m asked to look at the performance of a SQL instance, it’s rare that I can’t at least halve the resource consumption, and often we can achieve much more. You can see the potential cost savings are significant. On one large project I worked on recently, I was called in because the application was struggling to perform running on 32 vCPUs in AWS RDS. I identified changes that meant it was performing perfectly running on 16 vCPUs, and soon after we were able to move it down to 8. A few more rounds of optimisation and now it can run on 4. That represented a saving of over $150,000 annually.

Get in touch today to discuss how I can help you with your cloud SQL Server performance and save you a lot of money year on year.