Manually updating Statistics in SQL Server

Having up-to-date statistics is vital for getting the best performance out of your queries. Even though SQL Server automatically updates statistics in the background for you (When do statistics get updated?), you may find there are times when you want to manage updating them yourself.

  • You may have large tables and find that the interval between the automatic updates is too big and is resulting in sub-optimal query plans.
  • You might need timely updates to a specific object – maybe as part of an ETL process to make sure that statistics are up to date after a particular part of the process, perhaps after a daily load into a table.
  • You may find that the automatic updates look at too small a sample size and you need to scan more of the table to enable accurate estimates for your queries.

My previous post on the Ascending Key problem demonstrated a common issue where the first of these scenarios could be affecting you. We’ll look at examples of the other scenarios in subsequent posts.

For now though, let’s just look at how you go about updating statistics.

UPDATE STATISTICS

At the heart of all the methods we’ll look at is the UPDATE STATISTICS command. There are a lot of options for using this command, but we’ll just focus on the ones you’re most likely to use. For full documentation here is the official reference:
https://docs.microsoft.com/en-us/sql/t-sql/statements/update-statistics-transact-sql

You can use this at the table level in which case all statistics objects belonging to that table will be updated:

UPDATE STATISTICS dbo.Test;

Or you can specify the specific statistics object you want to update:

UPDATE STATISTICS dbo.Test _WA_Sys_00000002_3AD6B8E2;

FULL SCAN or SAMPLE

The other thing you’re likely to want to specify is whether the statistics should be updated using a full scan of the table, or just by looking at a sample of the rows. In the above examples we didn’t specify this, so SQL Server decides for us. In general sampling (as opposed to a full scan) kicks in when we have about 8MB of data in the table (or about 1,000 pages).

If you want to specify a full scan the syntax is as follows:

UPDATE STATISTICS dbo.Test _WA_Sys_00000002_3AD6B8E2 WITH FULLSCAN;

If you want the statistics update to use sampling (more on how this works in subsequent posts) then you can choose to specify a percentage of the total table to be sampled:

UPDATE STATISTICS dbo.Test _WA_Sys_00000002_3AD6B8E2 WITH SAMPLE 10 PERCENT;

Or you can specify a fixed number of rows:

UPDATE STATISTICS dbo.Test _WA_Sys_00000002_3AD6B8E2 WITH SAMPLE 10000 ROWS;

You might want to use a sample because, once your tables get large, full scans can take a little time. Equally though, if you’re updating statistics in a quieter period (e.g. overnight) you may feel you can afford the extra time for the full scans. Here are some comparison figures I produced on my local machine showing how long full scans take. Obviously this will change depending on your hardware and other factors, including how much of the table is already in memory:

[Figure: full scan timings for increasing table sizes]

You can see however that the time taken pretty much scales linearly as the table size increases.

One thing to be aware of is parallelism. A full scan can run as a parallel operation on your server which can speed it up considerably. When you update statistics using a sample however this can only run single-threaded unless you’re on SQL Server 2016 (or higher). Sampling data to build statistics in parallel was one of the many excellent little improvements in 2016.
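
If you’re on a build that supports it (SQL Server 2016 SP2 onward, and equivalent servicing levels of 2014 and 2017), you can also cap the parallelism of an individual statistics update with the MAXDOP option. A quick sketch against our example statistics object:

-- Limit the full scan to 4 cores (pick a value appropriate to your server)
UPDATE STATISTICS dbo.Test _WA_Sys_00000002_3AD6B8E2 WITH FULLSCAN, MAXDOP = 4;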

sp_UpdateStats

This is another method you might use for manually updating statistics (perhaps as part of a scheduled maintenance job). This system stored procedure can be used for updating all of the statistics objects in a database:

USE Test;
EXEC sp_UpdateStats;

This stored procedure iterates through the tables in your database using a WHILE loop and executes the UPDATE STATISTICS command as it goes. One nifty thing about using this procedure is that it only updates statistics objects where rows have changed, so you don’t have any overhead for refreshing statistics where the underlying data hasn’t been modified. You can see this in this extract of the output of the stored procedure:

Updating [dbo].[TestMemory]
[PK__TestMemo__3214EC070D799003], update is not necessary…
0 index(es)/statistic(s) have been updated, 1 did not require update.

Updating [dbo].[TestMemory2]
[PK__TestMemo__3214EC07D3DC52DE], update is not necessary…
0 index(es)/statistic(s) have been updated, 1 did not require update.

Updating [dbo].[Test]
[PK_Test] has been updated…
[IX_Test_TextValue] has been updated…
2 index(es)/statistic(s) have been updated, 0 did not require update.
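
One parameter worth knowing about is @resample. By default the procedure lets SQL Server choose the sample, but you can ask it to re-use each statistic’s most recent sample rate instead – a small sketch:

USE Test;
-- Refresh each statistic using its most recent sample rate (RESAMPLE)
EXEC sp_UpdateStats @resample = 'resample';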

Ola Hallengren

Of course, if you’re looking to implement statistics update as part of regular maintenance, then you should definitely be considering using Ola Hallengren’s maintenance solution. Ola maintains a great solution for managing database backups and integrity checks as well index and statistics maintenance, and he shares it with the SQL community for free.

You can find full details of Ola’s solution and download it here:
https://ola.hallengren.com/

Got a problem or embarking on a SQL Server project and want some help and advice? I’m available for consulting – please get in touch or check out my services page to find out what I can do for you.

Success vs. Scalability: Why your biggest success might also be your biggest disaster

There is a scenario I have seen again and again. A company has developed a nice piece of software which many clients are running with happily. Then along comes a new client, the biggest client to date by far. Celebrations abound, plans are made to expand the business and the execs are wringing their hands with glee at the anticipated new revenue.

But then soon after go-live things start to go wrong. The application just doesn’t cope well with what are unprecedented loads. The database is under strain and even throwing more power at it – where that is an option – isn’t helping as much as would be hoped.

This is often where I get called in. Scalability of database applications is a real issue. Without proactive performance tuning and optimization, most systems will reach a tipping point once a certain load, or size of data is reached.

On the application side it’s often easy: you can scale out and simply add more application servers. On the database end however, you often have no choice but to rely on a single database instance. This is a point often used to criticize traditional OLTP databases, and people will say you should use NoSQL or rely on things such as sharding and eventual consistency. There is definitely a place for those things, but at the same time, speaking mainly from my experience working with SQL Server, there is no realistic load it can’t take as long as the application is designed well, and the database and server are properly tuned.

So how do you identify if you’re going to have a problem before you end up in the middle of a client emergency?

Load testing is one answer. I’ve worked with product teams in this exact circumstance, testing to see if the application and database will scale to the requirements of a new customer. It is hard though. It’s hard to accurately simulate a profile of usage that anticipates how a given client – or all clients – may use your system, and it’s a lot of effort.

The good news though is that you can identify most potential issues within database applications before they become critical ones. Often it’s clearcut; a query is not going to scale well unless it has better indexing to support it, a particular piece of code is going to create a lot of locking as data sizes grow, parallelism settings are not optimal and won’t support optimum performance at scale, the list goes on.

Even when it’s not clear, there’s usually enough information to raise suspicion. Then you can at least target load testing to those specific functions and see how they cope under strain.

This scenario isn’t just limited to new customers. Existing customers can grow over time and usually their databases certainly do. Unless you’re pro-active in identifying those potential issues early then you will reach performance tipping points where clients end up raising Sev 1 issues.

It’s often said that the best DBAs go un-noticed because this is their approach. They’ve identified and solved issues ahead of time before anyone else was even aware of them. The best performance tuning activities are invisible in terms of negative impact, and it’s always great to be able to tell clients you’ve made things better without being asked.

So, have you got a big new client just around the corner, or are your existing clients (or their databases) growing, and you haven’t done any tuning recently (or at all)?

Get in touch and I can perform a Comprehensive SQL Server Healthcheck for you. We’ll identify these sorts of issues, as well as any other hidden dangers that might be lurking around the corner ready to bite. We’ll discuss solutions and I’ll work with your team so they understand the issues in detail. Or if you have a need for any other SQL Server consultancy then check out my services page or give me a yell.

Introduction to SQL Server Query Store

Introduced with SQL 2016, Query Store was probably the most anticipated and talked-about new feature. In this post we’ll just take a brief look at it: what it is, how you set it running, and what you can use it for. This will be a fairly brief overview – you’d need a book to cover it in detail – but hopefully this will give you a flavour of how useful this will be and how to get started.

What it does, at a base level, is actually quite simple. It just stores information relating to query execution over time.

That information consists of two things:
• Execution Plans – the execution plans generated for each query are stored in the query store, and if the plan changes the new plan is also stored.
• Performance metrics – information such as CPU consumption, reads, and writes is captured and stored for each query.

This information is aggregated over intervals (default is one hour) so you can see how query performance changes over time.

This isn’t earth shatteringly new, you can already query to find out the execution plan for a query and you can also query to find aggregated performance metrics related to a given query.

The difference is that now a history can be maintained without implementing additional monitoring. Previously the performance metrics would be aggregated as a single total since the last restart of the SQL instance – and would be cleared at the next restart. Now they are persisted and time-sliced so you can actually measure change over time.

The simple activity of storing old execution plans is also profound for performance troubleshooting. Anyone who’s worked with large-scale production data will have experienced the issue where a function that was working fine fairly suddenly starts to develop performance problems.

A common cause of this is what’s known as “plan regression”. Basically this is where the execution plan has changed – and the new one’s just not as good as the old one for most executions. Previously you might suspect this was the cause of an issue you were seeing, but there was no way to easily prove it. Now you can use Query Store to view and compare the old and new plans to verify this. You can even, with a click or two, force the query to go back to using the old (better) plan – though we hope people won’t overuse this and will try to at least delve into the cause and think about resolving it. There is usually a reason SQL thought the new plan would be better – and a particular plan may work for now but may not always be the best plan in the future as your data changes.

Let’s have a look at these features in little more detail.

Enabling Query Store
Query Store is a database-level configuration. It’s important to understand that, and that the information is actually stored within system tables in the database. That means that if you backup and restore the database, the information is retained. Also very importantly, the information is stored asynchronously – so there shouldn’t be any performance impact on the executed queries themselves. There will of course be some overall server overhead at the point the data does get saved, but that shouldn’t be too significant in most cases.

You can enable Query Store for a database through T-SQL (or in your source code) or through the GUI in SSMS. I just want to demonstrate doing this through the GUI so you can see some of the options. Right-click on the database, select properties, and then select the Query Store page all the way at the bottom:

[Screenshot: the Query Store page of the Database Properties dialog in SSMS]

Above you can see Query Store enabled for the WideWorldImporters database, with all default settings.

The first setting is “Operation Mode”. By default this is set to “Off”. To enable Query Store and get it running for a particular database you change it to “Read Write”. Job Done.
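
If you’d rather script the change than click through SSMS, a minimal T-SQL sketch (using the WideWorldImporters database from the screenshot) looks like this:

-- Enable Query Store and set it collecting data
ALTER DATABASE WideWorldImporters SET QUERY_STORE = ON;
ALTER DATABASE WideWorldImporters SET QUERY_STORE (OPERATION_MODE = READ_WRITE);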

The Data Flush interval is how often the query store data gets written to disk – remember I said this was asynchronous.

The Statistics Collection interval determines the size of the time slices that your query performance metrics get aggregated into.

Then we have some stuff about data retention. It’s worth noting that if your query store fills up and nothing is happening to clear it out then it flips to Read-Only mode and won’t store any more data until space is freed up. The default amount of space set for it is 100MB – that’s not a lot of space so I really can’t see any justification from that point of view for not enabling this feature.

Leaving the “Size Based Cleanup Mode” set to Auto should make sure that old data gets purged if the query store starts to fill up. Above that is the “Query Store Capture Mode” – if you leave that to AUTO it will ignore infrequent queries or those with negligible overhead.

The last setting, “Stale Query Threshold”, is how long data is kept for, in days – 30 days by default. I can see it being useful to up this significantly if we want to use Query Store to monitor performance over a long period, but it may depend on how much space Query Store wants to consume for your database – remember the default is 100MB, but you can up that to whatever you like.
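
All of the settings on this page map to Query Store options on ALTER DATABASE. Here’s a hedged sketch covering the ones we’ve just discussed – the values mirror the defaults except for the storage allowance, which I’ve upped:

ALTER DATABASE WideWorldImporters SET QUERY_STORE (
   OPERATION_MODE = READ_WRITE,
   DATA_FLUSH_INTERVAL_SECONDS = 900,  -- how often data is persisted to disk
   INTERVAL_LENGTH_MINUTES = 60,       -- the statistics collection interval
   MAX_STORAGE_SIZE_MB = 1024,         -- up from the rather small 100MB default
   SIZE_BASED_CLEANUP_MODE = AUTO,     -- purge old data as the store fills up
   QUERY_CAPTURE_MODE = AUTO,          -- skip infrequent/negligible queries
   CLEANUP_POLICY = (STALE_QUERY_THRESHOLD_DAYS = 30)
);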

At the bottom of the properties page you can also see some nice pie charts that show how much of a proportion of your database Query Store has allocated, and how much of that space it is using.

So that’s Query store set up and configured, let’s have a look at a few of the things it gives us.

Query Store in Action and Forcing a Plan
I’ve set up Query Store as above in a copy of the WideWorldImporters databases on a SQL 2016 instance. I’ve created a stored procedure that I’m running every two seconds and I’ve set the Statistics Collection Interval in Query Store to 1 minute (rather than an hour) so that I can get some figures and graphs out fairly quickly.

Under the database in SSMS, there is now a Query Store folder where some built in reports reside:

[Screenshot: the Query Store folder and its built-in reports in SSMS]

For the sake of this blog post I’m just going to look at a couple of these. Let’s open the “Top Resource Consuming Queries” Report:

[Screenshot: the Top Resource Consuming Queries report]

You can see a few things here. On the top left is a bar chart of the most expensive queries (you’ll notice one large one and the rest are negligible in comparison – the large one’s my query). You can configure whether you want to look by CPU or logical reads amongst other options, and whether you want to look at averages, maximums or minimums. Basically there are a whole bunch of ways you can configure your view.

I’ll be honest that I struggled with some of these built-in Query Store reports to get them to show me what I wanted, so expect a bit of playing around to figure things out if you are using this feature.

In the bar chart, the bar highlighted in green is the currently selected query, on the right we can then see a scatter graph of the execution figures for this query across our statistics intervals (remember I’ve set it to have intervals of 1 minute). You can see I’m looking at average logical reads. You will also see that this query was ticking along nicely until about 14:05 when something happened (that was me!) and the logical reads for each execution of the query suddenly shot up. The blobs on the scatter graph have also changed colour at this point and that represents that the query is now using a new execution plan.

Next to this graph is a key telling us which plan each colour of blob represents and if you click on the plan in the key that you want, the plan itself is displayed in the bottom pane. At the moment I’m looking at the original plan (Plan 1). You will notice that the title specifies that it is “not forced”, you’ll also notice a button to the right of the title that gives us the option to “Force Plan”. Let’s just hold off a minute before we do that.

Before we change anything to try and fix the issue with this query, let’s look at the “Regressed Queries” report. This is pretty similar, but you may use it from a different direction, i.e. it may not be one of your most expensive queries that has started going bad. The Regressed Queries report focuses on queries for which the execution plan has changed in the interval you are looking at. Again I found it a little challenging to get this report to show me the query I was interested in – some playing around can be required:

[Screenshot: the Regressed Queries report]

You can see here that I have just one big fat bar on my bar chart – as I only have one regressed query in the interval (last 30 minutes) I chose to look at. This can make it easier to identify queries suffering this issue.

I’m going to go back to the previous Top Resource Consuming Queries report and try and fix the problem. Now, in reality I know what I did and why the query went bad. It was the result of something called parameter sniffing, which is where, if a stored procedure recompiles, the execution plan that is formed may be different depending on the parameters it is executed with. Basically it forms the best plan for the parameters supplied – but that might not be the best plan for all sets of parameters. In this case I forced the stored procedure to form a plan that was going to be expensive in most cases. More on that in my next set of performance tuning workshops.

That issue would be best fixed in the code of the stored procedure, but in production, turning around a fix may take days and we have the problem right now. So let’s use the Force Plan functionality to fix the symptom – just for the moment.

I select the plan I want, and I click the “Force Plan” button. The effect is immediate and I notice it within minutes because my statistics collection interval is so small. I’ll let it run for a bit and then show you the new graph:

[Screenshot: the query’s execution metrics after forcing the plan]

You can see the query has now returned to healthy (quick) execution. Note the orange blobs all now have a tick over them to denote that this plan is now forced.
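
Forcing (and later unforcing) a plan can also be scripted with the sp_query_store_force_plan and sp_query_store_unforce_plan procedures. The IDs below are purely illustrative – you can look the real ones up in sys.query_store_plan:

-- Force the selected plan for the query (IDs here are made up for illustration)
EXEC sys.sp_query_store_force_plan @query_id = 42, @plan_id = 1;

-- And to remove the forcing once the underlying code is properly fixed
EXEC sys.sp_query_store_unforce_plan @query_id = 42, @plan_id = 1;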

Comparing Plans

A related feature in SQL 2016 is the ability to compare two execution plans to see what’s changed. I haven’t found this that amazing myself when I’ve looked at it, but that’s mainly due to the natural limitations – if two plans are significantly different then something that highlights the differences is just going to highlight the whole thing. However it can be useful at least to have both plans on screen at the same time so you can go back and forth easily.

You don’t need to do this through Query Store – if you right-click on any execution plan in SSMS there is now a “Compare Showplan” option, and as long as you have the plan you want to compare against saved as a file then you can go ahead. One good thing is that this is an SSMS feature, so as long as you have SSMS 2016 or higher you can use it to compare plans from earlier versions of SQL Server.

With Query Store you can compare plans directly from the Store. If we go back to one of the reports above, the plans are listed in the key for the scatter graph. You can select more than one by using Shift+Click. Then you can click the button in the toolbar above the scatter graph which has the ToolTip “Compare the Plans for the selected query in separate window.”

Let’s do that for the two plans formed for our query above. The resulting view shows us the two plans side by side. It may be useful to look at these separately so they fit better on this page. On the left we have:

[Screenshot: the left-hand side of the plan comparison view]

The area highlighted in red is where the tool has identified that the two plans are the same. The rest it is not so sure about. All the same, it’s a nice visual way to be able to see what both plans are doing. On the right-hand side of the screen you then get this view:

[Screenshot: the right-hand side of the plan comparison, showing the properties pane]

This shows us a comparison of the properties of whichever operator is selected in each plan – note this need not be the equivalent operator in each plan. You can select the Nested Loop operator in the top and the Index Scan operator in the bottom and it will show you those – though the comparison may not be that meaningful!

So, this is kind of a useful tool, but don’t expect it to magically solve the process of comparing plans for you. In general too much changes from one plan to the next for it to be that simple – but this may help – a bit…

Query Store Catalog Views

Like everything else in SQL Server, all the information you can access through the GUI in SSMS is available directly through system catalogs and views. So if you want to examine the information in a way that isn’t supported by the built-in reports then you can just go ahead and write your own queries.

The new views available are:

sys.database_query_store_options
sys.query_context_settings
sys.query_store_plan
sys.query_store_query
sys.query_store_query_text
sys.query_store_runtime_stats
sys.query_store_runtime_stats_interval

Rather than me going into detail here, I’ll just refer you to the MSDN reference:

https://msdn.microsoft.com/en-gb/library/dn818149.aspx
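
Just to give a flavour, here’s a rough sketch (using documented columns from the views above) that totals logical reads by query text – treat it as a starting point rather than anything definitive:

-- Top 10 queries by total logical reads across all collected intervals
SELECT TOP 10
   qt.query_sql_text,
   SUM(rs.count_executions) AS TotalExecutions,
   SUM(rs.count_executions * rs.avg_logical_io_reads) AS TotalLogicalReads
FROM sys.query_store_query_text qt
INNER JOIN sys.query_store_query q
   ON qt.query_text_id = q.query_text_id
INNER JOIN sys.query_store_plan p
   ON q.query_id = p.query_id
INNER JOIN sys.query_store_runtime_stats rs
   ON p.plan_id = rs.plan_id
GROUP BY qt.query_sql_text
ORDER BY TotalLogicalReads DESC;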

Conclusions

Query store is a great feature. It’s not rocket-science but it is very useful. In particular it massively aids the investigation of production issues, saving time for those troubleshooting them at exactly the point they need time saving – right when everything’s hitting the fan.

It is also very useful for monitoring performance over time and being able to keep ahead of scalability issues with specific queries.

The methods for forcing a plan are also excellent for quick fixes – but try not to overuse them. It is possible to force plans in earlier versions of SQL – but tricky, so people usually just fixed the code. Forcing plans can end up being a case of treating the symptoms rather than the cause – and can lead to other problems later on.

Other Posts about Query Store

Identify the (Top 20) most expensive queries across your SQL Server using Query Store

How does Query Store capture cross database queries?

Got a problem or embarking on a SQL Server project and want some help and advice? I’m available for consulting – please get in touch or check out my services page to find out what I can do for you.

Why you still need DBA resource in the cloud

I was chatting with a cloud consultant who was advising on a large scale migration to AWS. He told me that one of the advantages of going for a PaaS offering (Platform as a Service) was that DBAs were no longer required as backups and restores were handled for you. PaaS services for SQL Server include AWS RDS, Azure SQL Database and Azure SQL Managed Instance.

I found it quite a funny conversation, partly as I don’t think he realised being a DBA was part of my job role, but also because I don’t know a single DBA who spends a significant amount of their time doing backups and restores.

It’s true that the services mentioned handle a lot of the routine tasks for you and that is great, it’s also reasonable to hope that they manage databases with greater reliability. Rather than replace the DBA though, they instead free them up to achieve more.

One common pitfall I’ve seen with cloud implementations is to assume that everything is done for you and that can lead to poorly configured services or ones with missing maintenance. It remains important to have access to someone who has a deep understanding of SQL Server when provisioning a cloud service to host your databases. It’s also important they understand the cloud options available and can make sure you have the right settings and instance type etc.

And once things are running you still need someone to turn to when there are problems. These problems are less likely to be hardware related, but you are still going to see occasional performance (or other) issues, just as you would when you were running on-premises hardware.

When those things happen you need access to someone with the right skills to diagnose the problem and perform analysis to define the right solution. Enter the cloud DBA.

As mentioned in my previous post, The Importance of SQL Server Performance Tuning in the Cloud, ongoing performance tuning becomes even more important in the cloud: database services are expensive, a performance-optimized database is also a cost-optimized one, and you can realise savings immediately.

Brent Ozar talks about the importance of DBAs in the cloud in this post https://www.brentozar.com/archive/2019/06/what-happens-to-dbas-when-we-move-to-the-cloud/. He tells a story that I’m going to quote here:

One of my favorite recent examples was a company who came to me saying, “We’re spending about $2M per year in the cloud just on our databases alone. Can you help us reduce those costs?” Absolutely: with just a couple of days spent query & index tuning, we chopped their biggest database expenses in half while increasing performance.

At the end of that engagement, the CTO told me, “I thought I’d save money in the cloud by not having a DBA, but what I’m learning is that in the cloud, I actually get a return on my DBA investments.”

And it’s not all about performance. Someone recently reached out to me to say that they had mysteriously “lost” about 400GB of free space overnight on an AWS RDS instance running SQL Server, and wanted help identifying the cause. That led to identifying an issue relating to a recent code change that needed fixing.

Of course it’s possible you might not need a full-time DBA in the cloud – or otherwise. That’s where it’s useful to have access to someone who can jump in to perform that role as required. Ideally as they work with you over time they will understand your systems and be able to talk in the context of your applications when discussing problems and solutions.

I can help you with all that. You can find more details of my services here or simply get in touch to discuss what I can do for you.

Implementing Temporal Tables Where You Have Existing Data

In my post about auditing tools in SQL Server I mentioned a few tools we have at our disposal for capturing the change history of data.

One was Change Data Capture, which I went into in a bit more detail in Setting up Change Data Capture (CDC).

I also referred to Temporal Tables which are available to us from SQL Server 2016 onward.

Temporal tables aren’t just about monitoring change, they also provide really nice methods for being able to query historical data – to see what the values were at a particular point in time e.g.

SELECT * FROM dbo.SomeData FOR SYSTEM_TIME AS OF '1900-01-01';

This makes them easy to work with, but also opens up other applications for them over and above auditing purposes – for instance being able to maintain configuration data that changes over time, but where we need to reference what the value was historically.
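
FOR SYSTEM_TIME supports a few other sub-clauses beyond AS OF. A couple of hedged examples – the dates are purely illustrative:

-- All versions of rows that were active at any point in the range
SELECT * FROM dbo.SomeData
FOR SYSTEM_TIME BETWEEN '2020-01-01' AND '2020-06-30';

-- The complete history: current rows plus every prior version
SELECT * FROM dbo.SomeData FOR SYSTEM_TIME ALL;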

I wanted to look in more detail about the process for changing an existing table so that it becomes a temporal one.

Let’s start by quickly looking at how we would define a temporal table for a fresh deployment:

CREATE TABLE dbo.SomeData(
   Id INT IDENTITY(1,1) NOT NULL
      CONSTRAINT PK_SomeData PRIMARY KEY CLUSTERED,
   SomeText VARCHAR(1000) NULL,
   ModifiedBy VARCHAR(128) NULL,
   StartDate datetime2(7) GENERATED ALWAYS AS ROW START NOT NULL,
   EndDate datetime2(7) GENERATED ALWAYS AS ROW END NOT NULL,
   PERIOD FOR SYSTEM_TIME (StartDate, EndDate)
)
WITH (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.SomeData_History));

There are a few things here additional to a normal table definition. The first thing to note is that we need two columns to support the temporal versioning. I’ve called my columns StartDate and EndDate; these both have to be of the DATETIME2 datatype to the highest precision (100 nanoseconds). They are specified as GENERATED ALWAYS AS ROW START and GENERATED ALWAYS AS ROW END respectively, so SQL will take care of assigning values when new rows are inserted or updated.

I’ve specified the PERIOD FOR SYSTEM_TIME, so SQL knows that these are the fields to use for the temporal versioning.

Finally I’ve specified SYSTEM_VERSIONING for the table and what the history table should be called.

When I execute this you can find the table and corresponding history table displayed in SSMS as below:
[Screenshot: dbo.SomeData with its system-versioned history table dbo.SomeData_History in SSMS]
We have the table itself, and also the history table has been created for us. This is where the previous versions of rows of data will be stored.

So that’s pretty straightforward – but what if you want to change an existing table?

First of all let’s drop this one:

DROP TABLE dbo.SomeData;

Unfortunately that gives me an error message:
Msg 13552, Level 16, State 1, Line 90
Drop table operation failed on table ‘TestAudit.dbo.SomeData’ because it is not supported operation on system-versioned temporal tables.

As the message says, you can’t just drop Temporal tables. First you must turn system versioning off, then you can go ahead:

ALTER TABLE dbo.SomeData SET (SYSTEM_VERSIONING = OFF); 
DROP TABLE dbo.SomeData;

Dropping the table though leaves the history table behind – now seen as a normal table:
[Screenshot: the orphaned history table shown as a normal table in SSMS]

That’s handy as it means you don’t lose the history when the main table is deleted. For now though let’s just delete that as well:

DROP TABLE dbo.SomeData_History;

Now let’s create a new table that will be our “existing table” we want to change. For this example I’ll create a test table with a trigger that maintains a modified date, and throw in 1000 rows:

CREATE TABLE dbo.SomeData(
   Id INT IDENTITY(1,1) 
      CONSTRAINT PK_SomeData PRIMARY KEY CLUSTERED NOT NULL,
   SomeText VARCHAR(1000) NULL,
   ModifiedBy VARCHAR(128) NULL,
   ModifiedDate DATETIME NOT NULL 
      CONSTRAINT DF_SomeData_ModifiedDate DEFAULT (GETUTCDATE())
);
GO

CREATE TRIGGER dbo.TR_SomeData_ModifiedDate
   ON dbo.SomeData 
   AFTER UPDATE
AS
BEGIN
   UPDATE dbo.SomeData 
   SET ModifiedDate = GETUTCDATE()
   FROM dbo.SomeData 
   INNER JOIN Inserted
      ON dbo.SomeData.Id = Inserted.Id;
END;
GO

INSERT INTO dbo.SomeData(SomeText, ModifiedBy)
SELECT TOP 1000 'Blah', 'Me'
FROM sys.objects a CROSS JOIN sys.objects b;

I’m going to complicate things and say that when I turn this into a temporal table I want to keep the existing ModifiedDate column and use that as the start date for my system versioning.

The first thing you might try is to alter the column:

ALTER TABLE dbo.SomeData
ALTER COLUMN ModifiedDate DATETIME GENERATED ALWAYS AS ROW START NOT NULL;

Error:
Msg 13589, Level 16, State 1, Line 44
Column ‘ModifiedDate’ in table ‘TestAudit.dbo.SomeData’ cannot be specified as ‘GENERATED ALWAYS’ in ALTER COLUMN statement.

So I can’t do this in an alter statement. What I’ll actually have to do is create a new column specified to work with the system versioning, copy the data over, drop the old column and rename the new one.

First let’s create the additional columns and define the PERIOD FOR SYSTEM_TIME:

ALTER TABLE dbo.SomeData
ADD 
StartDate datetime2 GENERATED ALWAYS AS ROW START NOT NULL
   CONSTRAINT DF_SomeData_StartDate DEFAULT (GETUTCDATE()),
EndDate datetime2 GENERATED ALWAYS AS ROW END NOT NULL
    CONSTRAINT DF_SomeData_EndDate DEFAULT ('9999-12-31 23:59:59.9999999'),
PERIOD FOR SYSTEM_TIME (StartDate, EndDate);

Note that I’ve had to specify defaults for the new columns so that the existing data gets values populated (I didn’t have to do that when I simply defined the Temporal Table from scratch).

For the StartDate I’ve specified the current UTC date – this could be any date and time that’s not in the future, though note it should be a UTC one. If I’d tried to use GETDATE, as I’m currently on British Summer Time, I would get the following error:
Msg 13542, Level 16, State 0, Line 51
ADD PERIOD FOR SYSTEM_TIME on table ‘TestAudit.dbo.SomeData’ failed because there are open records with start of period set to a value in the future.

For the end date I have put the maximum value a DATETIME2 column can hold. This is required and is a way of saying that this record is (at the moment) valid indefinitely. If I try to put in an earlier date I receive this error:
Msg 13575, Level 16, State 0, Line 51
ADD PERIOD FOR SYSTEM_TIME failed because table ‘TestAudit.dbo.SomeData’ contains records where end of period is not equal to MAX datetime.

Now I’ve got the columns I could try copying my data over:

UPDATE dbo.SomeData SET StartDate = ModifiedDate;

I get another error:
Msg 13537, Level 16, State 1, Line 70
Cannot update GENERATED ALWAYS columns in table ‘TestAudit.dbo.SomeData’.

It turns out you can get around this by temporarily dropping the period for system time:

ALTER TABLE dbo.SomeData DROP PERIOD FOR SYSTEM_TIME;
UPDATE dbo.SomeData SET StartDate = ModifiedDate;
ALTER TABLE dbo.SomeData ADD PERIOD FOR SYSTEM_TIME (StartDate, EndDate);

You might logically think that I didn’t need to create the period for system time when I altered the table to add the new columns – that I could have just added the columns, migrated the data, then added the period afterwards. Trust me, I tried that too:
Msg 13509, Level 16, State 1, Line 51
Cannot create generated always column when SYSTEM_TIME period is not defined.

From here it turns out to be reasonably clear sailing.

Let’s run the final command to make the table Temporal:

ALTER TABLE dbo.SomeData 
SET (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.SomeData_History));

Here’s what it looks like in SSMS:
[Screenshot: the converted temporal table and its history table in SSMS]
You can see this is the same as the original fresh creation of a temporal table. The history table has been created for us. Just with an additional column in both tables.

Let’s remove the extra column and do the final rename:

ALTER TABLE dbo.SomeData DROP CONSTRAINT DF_SomeData_ModifiedDate;
ALTER TABLE dbo.SomeData DROP COLUMN ModifiedDate;
EXEC sp_rename 'dbo.SomeData.StartDate','ModifiedDate','COLUMN';

I was amazed the sp_rename didn’t give me an error. In fact if we look at the table again:
[Screenshot: the renamed ModifiedDate column in both the main and history tables]
You can see the column has been renamed in both the main table and the history table. Pretty neat!

I have at this stage forgotten to do one thing. Let’s just try to update a record and see what happens:

UPDATE dbo.SomeData SET SomeText = 'Wibble' WHERE Id = 2;

Msg 13537, Level 16, State 1, Procedure TR_SomeData_ModifiedDate, Line 7 [Batch Start Line 185]
Cannot update GENERATED ALWAYS columns in table ‘TestAudit.dbo.SomeData’.
The statement has been terminated.

Yep, don’t forget to remove your trigger!

DROP TRIGGER dbo.TR_SomeData_ModifiedDate;

And now we’re all done.
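
As a quick sanity check, we can re-run the update and then query the full history. Assuming all went well, we should see both the current row and the version it replaced:

UPDATE dbo.SomeData SET SomeText = 'Wibble' WHERE Id = 2;

-- The current row plus its prior version(s)
SELECT Id, SomeText, ModifiedDate, EndDate
FROM dbo.SomeData FOR SYSTEM_TIME ALL
WHERE Id = 2;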

What if I use SQL Server Data Tools (SSDT) to make my change?

If you’re working with Database Projects in Visual Studio then you will be used to publishing changes straight from there – or deploying them using a DACPAC file.

In that case SSDT handles all this for you – though it does use a slightly different method. Rather than creating a new column, copying that data over, dropping the old column and renaming the new one, it does it at the table level.
That’s the same way some other changes are made with SSDT (or if you do them through the GUI in SSMS). The steps are broadly the same, just with tables instead of columns:

1) Create new temporal table
2) Turn SYSTEM_VERSIONING off
3) Copy across the data from the old table to the new
4) Turn SYSTEM_VERSIONING back on
5) Drop the old table
6) Rename the new table to the original name

If you’re doing this by “hand”, you could do it this way if you prefer, but it is probably a heavier operation that will require more downtime. For instance you’ll need the whole table to be locked while the operation progresses, as you don’t want to risk data changes once you’ve started copying the data over. The column method will also lock the table, but potentially there is less to copy so it should hopefully be quicker.

As a final point I should note that you may run into problems dropping the trigger with SSDT, it can be a little quirky around doing that. For the moment I’ll leave off from going into that in more detail – but if you get that issue feel free to give me a yell!

Got a problem or embarking on a SQL Server project and want some help and advice? I’m available for consulting – please get in touch or check out my services page to find out what I can do for you.

Errata for my book: Pro Encryption in SQL Server 2022

My biggest fear when my book went into production was that any factual errors had slipped through my checks and the various reviews. I had a lot of reviewer support from Apress, but at the end of the day any issues are my responsibility.

So far I’m not aware of any factual errors but one kind reader (Ekrem Önsoy) has shared with me a few typos they have found. I’m going to document them here and will keep this post up to date as I’m made aware of any others:

Page 158

“In practice however some of the steps detailed earlier can’t be carried out directly from T-SQL, and these include generating the certificate and generating the new values of your CEKS encrypted by the new CEK.”

The final term is incorrect; the sentence should read:

“In practice however some of the steps detailed earlier can’t be carried out directly from T-SQL, and these include generating the certificate and generating the new values of your CEKS encrypted by the new CMK.”

Page 204

“Now is a good time to check that you can connect to SQL Server on your VM from SSMS on your host machine. Connect with the IP address 192.168.03”

The IP address referenced here should read 192.168.0.3

Page 221

“We’re going to use the lab environment we created in Chapter 14 and connect to our SQL Server via the IP address 192.168.03 again;”

Again, the IP address referenced here should read 192.168.0.3

Page 232

“To be specific, how you can you ensure that the enclave is hydrated with the right CEKs”

Should read:

“To be specific, how can you ensure that the enclave is hydrated with the right CEKs”

That’s it so far. Thanks again to Ekrem, and if anyone else spots any errors please let me know.

Got a problem or embarking on a SQL Server project and want some help and advice? I’m available for consulting – please get in touch or check out my services page to find out what I can do for you.

Identify Unused Indexes across all Databases

I had a request regarding how to identify unused indexes across all databases on a SQL Server instance (rather than just the current one). I’ve written this script before so thought I’d post it up here on my blog for the future use of myself and others.

The script uses the undocumented sp_MSforeachdb internal stored procedure to iterate through all the databases (excluding the system ones) and store the results of a standard unused indexes script into a temp table. The table can then be queried to analyse the results.

This script comes with a caveat: the sp_MSforeachdb procedure can be unreliable and on occasion skip databases, so you should check that the results include all the databases you are interested in. A simple count of distinct database names, compared against the number of databases on your instance, should be sufficient – there’s an example of this after the script below.

If you have multiple instances of the same application database, e.g. one for each client, you can query the temp table to identify indexes that are unused across all clients.

Anyway here’s the script:

--Find unused indexes across all user databases
IF OBJECT_ID('tempdb..#UnusedIndexes') IS NOT NULL
   DROP TABLE #UnusedIndexes;

--Create an empty results table with the right schema
SELECT TOP 0
   DB_NAME() AS DatabaseName,
   s.name AS SchemaName,
   o.name AS TableName,
   i.name AS IndexName,
   iu.user_updates AS IndexUpdates,
   iu.user_lookups AS UserLookups,
   iu.user_seeks AS UserSeeks,
   iu.user_scans AS UserScans
INTO #UnusedIndexes
FROM sys.dm_db_index_usage_stats iu
INNER JOIN sys.objects o
   ON iu.object_id = o.object_id
INNER JOIN sys.schemas s
   ON o.schema_id = s.schema_id
INNER JOIN sys.indexes i
   ON o.object_id = i.object_id
   AND i.index_id = iu.index_id
WHERE i.is_primary_key = 0
   AND i.is_unique = 0
   AND iu.user_lookups = 0
   AND iu.user_scans = 0
   AND iu.user_seeks = 0;

--Gather the unused indexes from each user database in turn
EXEC sp_MSforeachdb '
USE [?]
IF DB_ID() > 4
BEGIN
   INSERT INTO #UnusedIndexes
   SELECT
      DB_NAME() AS DatabaseName,
      s.name AS SchemaName,
      o.name AS TableName,
      i.name AS IndexName,
      iu.user_updates AS IndexUpdates,
      iu.user_lookups AS UserLookups,
      iu.user_seeks AS UserSeeks,
      iu.user_scans AS UserScans
   FROM sys.dm_db_index_usage_stats iu
   INNER JOIN sys.objects o
      ON iu.object_id = o.object_id
   INNER JOIN sys.schemas s
      ON o.schema_id = s.schema_id
   INNER JOIN sys.indexes i
      ON o.object_id = i.object_id
      AND i.index_id = iu.index_id
   WHERE iu.database_id = DB_ID() --only stats rows for the current database
      AND i.is_primary_key = 0
      AND i.is_unique = 0
      AND iu.user_lookups = 0
      AND iu.user_scans = 0
      AND iu.user_seeks = 0;
END'

SELECT * FROM #UnusedIndexes;

--DROP TABLE #UnusedIndexes;
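
To apply the sanity check and the per-client analysis mentioned above, here are a couple of example queries against the temp table – a rough sketch rather than anything definitive:

--Sanity check: did sp_MSforeachdb visit every user database?
SELECT COUNT(DISTINCT DatabaseName) AS DatabasesCaptured FROM #UnusedIndexes;
SELECT COUNT(*) AS UserDatabases FROM sys.databases WHERE database_id > 4;

--Indexes that show up as unused in every captured database (e.g. one DB per client)
SELECT SchemaName, TableName, IndexName
FROM #UnusedIndexes
GROUP BY SchemaName, TableName, IndexName
HAVING COUNT(DISTINCT DatabaseName) =
   (SELECT COUNT(DISTINCT DatabaseName) FROM #UnusedIndexes);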

Note: Partially in response to Jeff’s comment below. Proceed with caution before dropping any indexes the script identifies. Stats are only kept since the last restart, so if that was recent then you may have required indexes that get listed. Also think about whether you might have any occasional processes, such as month, quarter, or year-end processing, that will not have executed since that last restart and may require indexes.

Got a problem or embarking on a SQL Server project and want some help and advice? I’m available for consulting – please get in touch or check out my services page to find out what I can do for you.

Encrypting Your SQL Data Should be Your Default Option

Cyber-attacks and data breaches seem to be perpetually on the rise. Every week we hear news of large companies that have had data stolen with both financial and reputational repercussions. It should be clear that access controls aren’t sufficient to fully protect us from attacks and we need a second line of defence – which is encryption.


Brent Ozar ran a Twitter poll a few months ago asking if people are encrypting their databases and backups. We see the results here:
[Image: results of the Twitter poll on database and backup encryption]

You can see that a small majority of DBAs who follow Brent on Twitter and respond to polls use some form of at-rest encryption. I guess that means that in the larger population the majority aren’t encrypting their data at all.


There are a number of reasons why that might be the case; the only good one is that your database doesn’t hold any personally identifiable information (PII) or sensitive data – which will certainly be true in some cases, but not in most.


Other reasons are likely to be around the effort and expense involved. With the advances Microsoft has made in the encryption feature set over the last several versions of SQL Server, I’d argue that those reasons are not as valid as they might once have been.


There are four areas you should be thinking about in terms of properly protecting your data.


Passwords
Passwords are the most sensitive piece of information we hold in our systems. If these are accessed then an attacker can access all a user’s information in our system, and likely in other services they use if they re-use passwords. We shouldn’t store passwords at all, but should use hashing and salting instead. Most people are probably pretty good about following good practices around passwords, but if you’re not then stop reading now and go start a conversation with your team about how you can address that.


At-rest Encryption
Protecting our data where it is stored on disk is an important part of an overall encryption strategy. Some people will use disk encryption for this, but it is better if you can use TDE (Transparent Data Encryption), which will protect you against those who get read rights against the file systems on your servers. I get that TDE can be expensive as it used to require Enterprise Edition, but that changed with SQL Server 2019 when it became available in Standard Edition. If you are on an older version then that alone can be a compelling reason to upgrade. Where that is not possible, if you are on SQL 2014 or higher then you can at least use the backup encryption feature to give protection to your backup files.
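
For the backup encryption option the syntax is straightforward. A sketch, assuming a database master key and certificate already exist (the names here are illustrative):

-- Encrypt the backup with AES-256 using an existing certificate
BACKUP DATABASE MyDatabase
TO DISK = 'C:\Backups\MyDatabase.bak'
WITH ENCRYPTION (ALGORITHM = AES_256, SERVER CERTIFICATE = BackupCert);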


In-transit Encryption
In my experience most people do use TLS to protect their data as it is transmitted across the network, so that is good news. Again, if you’re not doing this then start making plans to change that. TLS is easy to implement and has little downside in return for the protection it offers.


Column Encryption
At-rest and in-transit encryption are great, but they don’t protect you if someone gains direct read access against your database. That’s where column encryption comes in. Column encryption used to require significant effort to implement, but with the introduction of Always Encrypted in SQL Server 2016 it became a lot easier. In many cases data can be encrypted with no code change required in your applications and giving you the highest level of protection possible. With Always Encrypted, data is protected at-rest, in memory and in-transit. It only ever exists in unencrypted form when it reaches your application.
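
To give an idea of what that looks like, here’s a hedged sketch of a column defined with Always Encrypted. It assumes a column encryption key (CEK_Auto1 here) has already been created, and note that deterministic encryption on character columns requires a BIN2 collation:

CREATE TABLE dbo.Customers (
   Id INT IDENTITY(1,1) CONSTRAINT PK_Customers PRIMARY KEY,
   -- Deterministic encryption still allows equality comparisons and joins;
   -- randomized is stronger but supports no operations on the column
   NationalIdNumber CHAR(11) COLLATE Latin1_General_BIN2
      ENCRYPTED WITH (
         COLUMN_ENCRYPTION_KEY = CEK_Auto1,
         ENCRYPTION_TYPE = DETERMINISTIC,
         ALGORITHM = 'AEAD_AES_256_CBC_HMAC_SHA_256'
      ) NOT NULL
);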


We should use column encryption against our PII and sensitive data. I know Always Encrypted does give you limitations around how you work with your data, but in most cases you’ll have some data where those limitations are not an issue. I’d recommend you start encrypting those items now, and then over time work out how to encrypt the rest bit by bit.

This post is part of a comprehensive series on SQL Server Encryption. Subscribe to my blog for updates as new posts are published or you can buy my book through the link in the sidebar to get it all in one go.

And if you’re embarking on an encryption project (or anything else to do with SQL Server) and want some help, I’m available for consulting – please get in touch or check out my services page to find out what I can do for you.

SQL Server 2022 is Coming

We still don’t have an exact release date for SQL Server 2022 but at this stage we can have strong confidence that it will be with us before the end of the year – rather than slipping and becoming SQL Server 2023.


My first intimation that this was so came earlier this year. I’ve been writing a book for Apress – “Pro Encryption in SQL Server 2022” – that’s due to be released in a couple of weeks. At some point I questioned my editor about whether we should hold off, as I didn’t want to be the author of a book referencing a version of a product that never existed. His answer was that Bob Ward says it’s happening, we’re releasing his book too, and it’s also got SQL Server 2022 in the title.


Given that Bob’s a Principal Architect for SQL Server with Microsoft, that gave me the confidence I needed. Bob’s book “SQL Server 2022 Revealed” was released a couple of weeks ago and goes over all the new features and enhancements in depth. I’ve started working my way through it and can highly recommend you get yourselves a copy. You can currently get it at nearly a 70% discount through the following link:



As well as being a great technologist, Bob is an excellent and engaging writer. In addition to talking about what the new features are, he gives insights into the design thinking that went into creating them, which makes for interesting reading.
Here I reproduce Bob’s diagram that he calls the “Wheel of Power”, showing the major new capabilities of SQL Server 2022:
[Image: the “Wheel of Power” diagram of SQL Server 2022 capabilities]

I plan to blog about some of the new features I find most interesting and exciting over the next couple of weeks. I can’t wait to get dug in with the enhancements to Built-in Query Intelligence and Query Store. There are also enhancements to Always Encrypted which I’ve written about in my own book.


The cloud-connected features are also very interesting and create new options for us in terms of how we manage High Availability and Disaster Recovery, as well as being able to offload parts of our workloads to the cloud.


Stay tuned, or if you can’t wait then grab a copy of Bob’s book and start working through the demos – there’s some great stuff. Like most releases of SQL Server, it’s not a quantum leap in computing, but there’s certainly enough for us DBAs to get our teeth stuck into and start thinking about what it offers us.

Got a problem or embarking on a SQL Server project and want some help and advice? I’m available for consulting – please get in touch or check out my services page to find out what I can do for you.
