Cycle Through Your Clipboard History In SSMS

This is my favourite SSMS trick I’ve discovered recently, probably some time towards the end of last year.

Basically, when you paste in Management Studio, you don’t just have the option to paste the last thing you copied – you can also cycle back through previous items that were on the clipboard. In a quick test I was able to choose from the last 20 clipboard entries.

You can find a menu item for this in the “Edit” menu:

ClipboardRing

Or, as you can see in the menu (and much more easily), you can use Ctrl+Shift+V.

Here’s an animation of me using it to cycle through the SQL statements from my last blog post:

ClipboardRing

This is a great little productivity tool. I was forever copying and pasting one thing, then a second, then having to go back to copy and paste the first thing again. Never again!


The Importance of ORDER BY

Everyone, at the beginning of their SQL career, gets told that it is important to include an ORDER BY if they want the results ordered. Otherwise the order in which they are returned is not guaranteed.

But then you run a lot of queries that don’t need a specific order – and you see that they (at least seem to) come out in the same order every time. You could (almost) be forgiven for thinking you can rely on that.

There was even a question on a Microsoft SQL certification exam a few years ago that asked what the default order was for records returned by a simple SELECT – the answer it was looking for was that it would be according to the order of the clustered index. So you can rely on that – right?

Wrong. The question was a bad question, and the answer was incorrect. Let’s look at this in action. First I’ll create a table and add some rows:

CREATE TABLE dbo.NoSuchThingAsDefaultSortOrder (
   Id INT IDENTITY(1,1) 
      CONSTRAINT PK_NoSuchThingAsDefaultSortOrder PRIMARY KEY CLUSTERED,
   FirstName VARCHAR(30)
);

INSERT INTO dbo.NoSuchThingAsDefaultSortOrder (FirstName)
VALUES
('John'),
('Fred'),
('Bob'),
('Sue'),
('Jenny'),
('Matt'),
('Alexis'),
('Zebedee');

Now let’s SELECT from the table, without specifying an order:

SELECT Id, FirstName
FROM dbo.NoSuchThingAsDefaultSortOrder;

Here’s our results:

OrderBy1

Okay, so they’ve come out ordered by Id according to our clustered index.

But what if someone comes along and decides they want to be able to look up records in the table by name, so they create a non-clustered index for that:

CREATE NONCLUSTERED INDEX IX_NoSuchThingAsDefaultSortOrder_FirstName
   ON dbo.NoSuchThingAsDefaultSortOrder(FirstName);

We run our original query again and this time these are the results we get:

OrderBy2

They’ve come out in a different order – this time ordered by FirstName – like the index we just added.

The reason is that this time SQL decided that it would be better to use our new index to execute the query. In general, it will try and use the smallest index that will satisfy the query – and in this case it decided that was the non-clustered index. I’d show you the execution plans to prove that, but I think it’s pretty obvious from the order the results are returned in.
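If you want to check for yourself which index was used, one option is to look at the index usage stats after running the query. This is just a rough sketch – it assumes you have VIEW SERVER STATE permission, and the DMV only shows usage since the instance last started:

--Which indexes on our table have been scanned, and when
SELECT i.name AS IndexName, ius.user_scans, ius.last_user_scan
FROM sys.dm_db_index_usage_stats ius
INNER JOIN sys.indexes i
   ON i.object_id = ius.object_id AND i.index_id = ius.index_id
WHERE ius.database_id = DB_ID()
AND ius.object_id = OBJECT_ID('dbo.NoSuchThingAsDefaultSortOrder');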

So, don’t go relying on there being a “default” sort order – it’s just not true. Changes to indexing – or even conceivably to statistics about the data distribution – could change the way SQL chooses to execute your query, and that could change your order.

Unless you make sure to specify ORDER BY.
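For example, if we always want the rows back in Id order, regardless of what indexes exist or how SQL chooses to execute the query, we just say so:

SELECT Id, FirstName
FROM dbo.NoSuchThingAsDefaultSortOrder
ORDER BY Id;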


Thoughts on Query Performance with TDE enabled

Microsoft state that enabling TDE (Transparent Data Encryption) usually has a performance overhead of 2-4%. That doesn’t sound like very much, and personally I wouldn’t let it bother me if I want to make sure my data is encrypted at rest.

However, you may have heard other sources saying that it’s actually a lot more than that – and the performance impact is a high price to pay for the level of protection offered.

So, what’s the truth?

The critical thing to remember is that with TDE your data is encrypted on disk, but data held in memory (i.e. the buffer pool) is unencrypted. Therefore, you would only expect an overhead when reading from and writing to disk.

SQL Server tries to keep data that is referenced repeatedly in the buffer pool. So, if your SQL instance is provisioned with enough memory, a lot of your read queries can access the buffer pool and don’t have to go out to disk. Such queries should not be affected performance-wise by TDE.
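As an aside, if you want a rough idea of how much of each database is currently held in the buffer pool, something like this will tell you – just a sketch, it needs VIEW SERVER STATE permission and can take a little while to run on servers with a lot of memory:

--Approximate buffer pool usage per database (8KB pages converted to MB)
SELECT DB_NAME(database_id) AS DatabaseName,
   COUNT(*) * 8 / 1024 AS BufferPoolMB
FROM sys.dm_os_buffer_descriptors
GROUP BY database_id
ORDER BY BufferPoolMB DESC;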

There may be other read queries, however, that access older data that hasn’t been read for a while. These queries would need to retrieve that data from disk, so there would be an overhead from TDE.

Any queries that modify data will need the outcome to be written to disk so in these cases we will see an overhead. This overhead is likely to come in two parts, first when the transaction is written to the logfile before committing, and then later as the updated data gets written to the data file as part of a checkpoint operation.

We also have other operations that write or update encrypted data on disk, so we would also expect these to have some overhead. This would include operations such as backups or index rebuilds.

You can see from this that the overhead will very much depend on how your application interacts with your data. At one extreme, if you have a set of static data that is small enough to be held in memory and is queried regularly then there should be no overhead. At the other end of the spectrum, if you have an application that writes to the database a lot, and reads less often, then the overhead will be higher.

Let’s look at some examples and try and quantify what amount of overhead we might be talking about. In this post we’re just going to focus on the effect TDE has when you are reading data.

First, I’ll create two databases, one with TDE enabled and one without. Then I’ll load the same set of data into each (Total size about 1GB).

You can find the script I used for this in my previous blog post:

Encrypting an existing database with TDE

In the first test we’ll perform a LIKE query of the worst kind – one that tries to match a value somewhere within a column. We have no indexes on the table, but none would be much help with this query anyway.

SELECT *
FROM dbo.SomeData 
WHERE SomeText LIKE '%Wibble%';

I’ll run this across 4 test cases (capturing the total CPU consumed in each case). The test cases are:

  • TDE Protected database where the buffer cache is empty (i.e. all data has to be read from disk)
  • TDE protected database where all the data for the table is in the buffer cache (i.e. no data has to be read from disk)
  • Database without TDE where the buffer cache is empty (i.e. all data has to be read from disk)
  • Database without TDE where all the data for the table is in the buffer cache (i.e. no data has to be read from disk)

In each test I’ll run the query 5 times and total the CPU to even out variance between executions. For the tests involving disk reads I’ll run the command DBCC DROPCLEANBUFFERS in between executions of the query to empty the buffer cache.
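For reference, each iteration of the “cold cache” tests looked roughly like this. This is a sketch only – DBCC DROPCLEANBUFFERS needs sysadmin rights and flushes the cache for the whole instance, so keep it to a test box:

DBCC DROPCLEANBUFFERS; --empty the buffer cache so the data must come from disk
SET STATISTICS TIME ON; --report CPU and elapsed time in the Messages tab

SELECT *
FROM dbo.SomeData 
WHERE SomeText LIKE '%Wibble%';

SET STATISTICS TIME OFF;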

The results looked like this, with time shown in seconds. Note that MAXDOP was set to 4 and each query went parallel over 4 threads:

TDE_ReadPerf1

There’s quite a variance between each run so I’m not going to take anything significant from small differences we see. However, we can see that the timings are pretty much the same when the data is in memory, but there seems to be about a 10% overhead with TDE when reading from disk.

In case you’re wondering why reading from disk didn’t add much elapsed time for the No-TDE database – the reads were “read-ahead” so were able to complete while the CPU was burning through the data.

Let’s try a different query, this one will still have to scan the whole table as we are dealing with a heap, but it uses an equality predicate so there is less work to do in matching the data:

SELECT *
FROM dbo.SomeData 
WHERE Id = 100000000;

I’ll run the same set of tests as above and we can look at the results:

TDE_ReadPerf2

The first thing we notice is that this query runs a lot quicker in general. And again, there is little difference with and without TDE when the data is in memory.

But look at the comparison when the data has to be read from disk. With TDE the CPU consumption is more than 10 times as large, or in percentages, over 1000% worse.

At this point you could be forgiven for panicking – are you willing to risk TDE making your queries that much worse?

In both the above two tests, the same amount of data is being read from disk. And if you re-examine the numbers, you’ll see that (very roughly) the same amount of CPU has been added in each case where we have TDE enabled – about 50 seconds. That 50 seconds was split over 4 cores so it would have been about 12.5 seconds per core.

In terms of elapsed time, we saw approximately that increase with the first query because CPU was the resource under most contention – the reads were able to occur while the CPU worked through the data. In the second query, reading from disk occupied most of the elapsed time, so the extra CPU consumption didn’t make the query run much longer overall.

By the time it had been executed 5 times (with the memory flushed between each execution) each query had read about 600,000 pages of 8KB each – just under 5GB. If the extra 50 seconds went on decrypting those pages, then each page took about one twelfth of a millisecond to decrypt – or, to put it another way, TDE decrypted about 12 pages per millisecond. In terms of disk size, that’s about 100MB per second. These were tests on a server with magnetic spinning disks (not SSDs), and you can see from the above figures that the straight disk access took about 40 seconds on its own.

When TDE doesn’t read from disk it doesn’t add any overhead, but how do we quantify what the overhead to queries is when it does have to access the disk?

From the above tests we could suggest it adds from 10% to over 1000% CPU.

Or alternatively between 10% and 0% elapsed time. Note that those figures are the same way round, i.e. when it added 10% CPU it added 10% elapsed time, but when it added 1000% CPU the elapsed time was about the same.

I could go on with this type of confusing analysis, but instead I’ll suggest this is the wrong way to think about performance in terms of TDE.

Don’t think about query performance.

Think about read performance.

TDE overhead depends on the level of your physical disk access. In the case of read query performance, it depends wholly on the level of physical reads, and seems to be a reasonably fixed overhead for each physical read.

That makes perfect sense: the overhead for querying is in decrypting data, and it will take pretty much the same amount of CPU to decrypt each 8KB page.

In theory this makes it simple for us to calculate what this overhead would look like on our production SQL Servers. There are all sorts of ways of capturing physical reads (and writes).
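One option is sketched below using sys.dm_io_virtual_file_stats. Note the numbers are cumulative since the instance last started, so you’d want to sample it twice and take the difference over a known interval to get a rate:

--Total MB read from disk per database since the instance started
SELECT DB_NAME(vfs.database_id) AS DatabaseName,
   SUM(vfs.num_of_bytes_read) / 1024 / 1024 AS TotalMBRead
FROM sys.dm_io_virtual_file_stats(NULL, NULL) vfs
GROUP BY vfs.database_id
ORDER BY TotalMBRead DESC;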

Let’s say I take a quick look at Resource Monitor to get a ballpark figure for one of my databases on this server that I know to be quite heavy on physical reads. And let’s say I see that it averages 25MB/s during the peak hour.

From that, and from the figures above I can estimate what impact enabling TDE for this database would have on the CPU.

Well I know that 25MB equates to about 0.25 seconds of CPU to decrypt the data, and I know I have 4 cores, so I can expect that in an average second this adds 0.0625 seconds of CPU per core. I multiply that by 100 and find that I’ve added 6.25% CPU.

The calculation I’ve just done is:

Added CPU % = (MB read from disk per second * 100) / (MB TDE can decrypt per second of CPU * Number of CPU cores)
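If it helps, here’s the same calculation as a quick bit of T-SQL, plugging in the example numbers from above. The values are assumptions taken from this particular set of tests and my Resource Monitor estimate – substitute your own figures:

DECLARE @MBReadPerSecond DECIMAL(10,2) = 25;          --ballpark from Resource Monitor
DECLARE @MBDecryptedPerCPUSecond DECIMAL(10,2) = 100; --rough figure from the tests above
DECLARE @NumberOfCores INT = 4;

--Estimated additional CPU (as a percentage of total capacity) from TDE decryption
SELECT (@MBReadPerSecond * 100) / (@MBDecryptedPerCPUSecond * @NumberOfCores) AS EstimatedAddedCPUPercent; --gives 6.25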

This doesn’t include writes, and it doesn’t include backups – I hope to look at that in a later post.

Now, let’s say that produces a scary number, and I’m worried about the strain that’s going to put on my CPU…

My first question would be why am I experiencing so many reads and can I alleviate that? Does data have a short shelf-life in memory? Do I have enough memory in my server – and is enough allocated to SQL?

This isn’t just in terms of TDE. SQL Server is going to perform much better if your current dataset – i.e. the data you are currently accessing most – can be held in memory. So, if TDE is causing a problem, it’s possible your queries are slow anyway.

Again, I’m not talking about writes just yet.

Or maybe your database is heavy on physical reads because it’s a data warehouse, regularly querying historical data. In that case, is it a suitable target for encryption? Hopefully the data is well anonymised if you’re using it for reporting and therefore doesn’t contain anything personal or sensitive.

I hope.

In summary…

Just to repeat myself, if you’re wondering about TDE and its impact on query performance, which we all have done, try to reframe the question and think about its impact on read performance.

It makes more sense and it may help you to more easily quantify the impact on your servers. And if it does look like the performance may be an issue – perhaps there is tuning you can perform on your database instance to reduce the physical disk access.

Database Files Down The Wrong Path

I manage a few servers used to host SQL Instances for development and test purposes. Each of those instances hosts databases covering multiple environments. So I’ve got multiple servers, with multiple instances, with multiple environments.

It’s important that issues in those environments don’t block development tasks, or invalidate or block testing cycles, so I like to keep them a little bit locked down. Code changes into those environments are generally made through builds and releases.

Equally though, sometimes developers need to be able to make changes that require sysadmin rights, without waiting for me to have time to help – or I might be on holiday.

Consequently, I’ve had to give elevated permissions to a few “trusted” devs. Result – bad things happen occasionally.

One common issue that bugs me is where databases have been moved from one instance to another, usually through backup and restore, and the files haven’t been moved as part of the restore – so they get recreated in the default data and log folders for the old instance.

This has caused me various problems, e.g. working out which instance is hammering the disk or using up all the space.

You can avoid this using the tick box in the Files page in the Restore dialog (if you’re restoring through the GUI):

FilesDownWrongPath
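Or, if you’re restoring with a script rather than through the GUI, the MOVE option does the same job. A quick sketch with made-up paths and logical file names:

RESTORE DATABASE SomeDatabase
FROM DISK = 'C:\Backups\SomeDatabase.bak'
WITH 
   MOVE 'SomeDatabase' TO 'D:\SQLData\SomeDatabase.mdf',
   MOVE 'SomeDatabase_log' TO 'E:\SQLLogs\SomeDatabase_log.ldf';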

I’ve written a quick script to identify any database files that are suffering this and identifying the user responsible for the restore (where this is available). I thought I’d share in case anyone else suffers from the same problem:

--Check for database files down the wrong path
DECLARE @DefaultDataPath VARCHAR(512);
DECLARE @DefaultLogPath VARCHAR(512);

SET @DefaultDataPath = CAST(SERVERPROPERTY('InstanceDefaultDataPath') AS VARCHAR(512)) + '%';
SET @DefaultLogPath = CAST(SERVERPROPERTY('InstanceDefaultLogPath') AS VARCHAR(512)) + '%';

SELECT @DefaultDataPath AS CorrectDataPath, @DefaultLogPath AS CorrectLogPath;

SELECT 
   SUSER_SNAME(d.owner_sid) AS Culprit, 
   d.name, 
   f.type_desc, 
   f.physical_name
FROM sys.master_files f
INNER JOIN sys.databases d
   ON f.database_id = d.database_id
WHERE d.database_id > 4
AND d.name != 'SSISDB'
AND (
      (f.type_desc = 'ROWS' AND f.physical_name NOT LIKE @DefaultDataPath)
   OR (f.type_desc = 'LOG' AND f.physical_name NOT LIKE @DefaultLogPath)
);

System databases and the SSIS catalog are excluded.

For once I won’t show the results of the script in action – I am however off to have a quiet word with a few culprits!

T-SQL Tuesday #101 – Some Great SQL Server Tools

tsql2sday150x150

This month for T-SQL Tuesday #101 Jens Vestergaard asks us to blog about the essential tools in our SQL Toolbelt.

http://t-sql.dk/?p=1947

I’d just completed my post on CMS when I realised I’d blogged about a few of my favourite tools in the past and that this would be a good opportunity to share them again in a quick post. So, here’s my second of two T-SQL Tuesday posts.

First we have these two great tools:

Statistics Parser – If you do performance tuning and you’re not using this then you’re missing out. It formats and summarises the results of your STATISTICS IO and TIME output. Big shout out to Richie Rump for creating this – I use it so much I just need to type “st” in my browser search and it comes straight up!

Live Query Stats – Watch an animated execution plan while your query executes. I’ll admit I’ve not used this a lot but I still think it’s a really cool tool.

And then a few different ones around the same tool:

Query Store – track query performance and plan changes over time as well as forcing SQL to use the plan you want. I’m lucky to have quite a bit of stuff on SQL 2016 and Query Store is great! I was just using it minutes ago to look at how a performance fix put in last week has improved things.

Clone Database with Query Store – If you want to see how your queries would behave in your production environment then Clone Database copies out the stats used to generate the plan. With SQL 2016 you can also take a copy of your Query Store data so you can query away at it happily in your dev environment.

Capture the most expensive queries across your instance with Query Store – blowing my own trumpet a bit here, but I’ve used this script a lot since I wrote it.

That’s it for now – at least for tools I’ve written about – I look forward to seeing what the rest of the T-SQLT community comes up with.

T-SQL Tuesday #101. CMS – Effortlessly run queries against multiple SQL Servers at once

tsql2sday150x150

This month for T-SQL Tuesday #101 Jens Vestergaard asks us to blog about the essential tools in our SQL Toolbelt.

http://t-sql.dk/?p=1947

The concept of a Central Management Server (CMS) is one I’ve been meaning to blog about for a while – just because I get the impression not a lot of people know about it or use it even though it’s been around since at least SQL 2008.

A central management server is simply one of your SQL instances on which you can register a group or groups of other instances. You can then run queries against those groups as a set. It’s a very easy way of running diagnostic queries – or deploying common objects – across a group.

There are other ways of performing these sorts of tasks – such as using PowerShell – but with a CMS it’s really quick and easy, and all you have to know is SQL.

I’ll run through how you set one up, then give a couple of examples of using it.

Setting up your CMS

Go to the View menu in SSMS and select Registered Servers.

Then in the Registered Servers window, expand Database Engine, right-click Central Management Servers and click Register Central Management Server.

CMS1

In the New Server Registration dialog select the instance you want to use as a Central Management Server.

CMS2

You just need to put in the connection details – though you can also give it a friendly name in the Registered server name field.

It’s worth noting that this server isn’t going to be doing any particularly heavy lifting, it’s just going to store registrations for other servers, so there’s no need to use a dedicated instance for this.

You’ll now see this instance listed under Central Management Servers in the Registered Servers window.

You can right-click on it and either register individual servers underneath it or create one or more groups beneath which you can register servers. You can even nest groups if you want.

Here’s what it looks like after I’ve registered a few servers / groups:

CMS3

And that’s it set up – told you it was easy.

Running Queries against multiple instances

This is the clever bit. I can run a query against an individual SQL instance or against a group and any sub-groups.

I want to run something against all my instances in both groups, so I right-click on My CMS Server and select New Query.

CMS4

Let’s have a look at the query window as there’s some differences:

CMS5

You can see the status bar at the bottom is pink, that tells me I’m connected to multiple instances via a CMS. On the bottom left you can see I’m connected to 6 out of 6, i.e. I tried to connect to 6 instances and all are successfully connected.

Now let’s run a query. Suppose I want to check MAXDOP across my instances:

SELECT name, value_in_use
FROM sys.configurations
WHERE name = 'max degree of parallelism';

Here’s the results:

CMS6

You can see the queries against each instance have been combined into one result set and a column added for the ServerName (the friendly name I specified).

In the messages tab, we also have the output from each query:

CMS7

There’s no limit to what you can do with this, as long as you can express what you want to do in T-SQL, you can execute it easily across as many instances as you want.

I use this for deploying updated versions of Ola Hallengren’s procedures across multiple instances, or the First Responder Toolkit from Brent Ozar Inc.

I use this for running checks across multiple instances like the MAXDOP one above. Earlier, I used it to check for any databases that didn’t have PAGE_VERIFY set to CHECKSUM, and then I used it to enable that across the databases where it wasn’t set.
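As a rough sketch of that kind of check (not necessarily the exact query I used):

--Databases on each instance not using CHECKSUM page verification
SELECT name, page_verify_option_desc
FROM sys.databases
WHERE page_verify_option_desc <> 'CHECKSUM';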

I’ll finish with another query to change my server configurations. Having run the first one to look at MAXDOP I’ve decided I want to set it to 4 for all the servers where it’s not set:

IF (SELECT value_in_use FROM sys.configurations 
    WHERE name = 'max degree of parallelism') = 0
BEGIN
   EXEC sp_configure 'show advanced options', 1;  
   RECONFIGURE;  
   EXEC sp_configure 'max degree of parallelism', 4;  
   RECONFIGURE;  
END

Here’s the output from the messages tab:

CMS8

(The really sharp eyed amongst you will notice SQL2012_Local is missing, that’s because I tested the query on that one first).

And if I run the original query again to check the settings:

SELECT name, value_in_use
FROM sys.configurations
WHERE name = 'max degree of parallelism';

CMS9

Sorted!

Summary

Central Management Servers are an exceedingly simple way of performing admin tasks or checks against a whole bunch of SQL Instances at once.

Once you’ve set it up, anyone else with access can connect to it and run queries – or add other servers and groups.

There’s still a place for PowerShell, particularly in the automation of tasks. But when you’re doing tasks that are a bit more ad-hoc then using a CMS is perfect.

Crystal Balls

tsql2sday150x150

In the centenary edition of the monthly blogging party that is T-SQL Tuesday, Adam Machanic asks us to look forward 100 months and speculate what the world of SQL Server will look like:

http://dataeducation.com/invitation-t-sql-tuesday-100-looking-forward-100-months/

For Adam’s challenge, we’re looking forward just over 8 years. If we look back that amount then we’re looking at SQL Server 2008, with R2 coming out soon.

So how much has changed in that time? Well, quite a lot, but it would be fair to say that more has stayed the same than has changed. For those of us who sit mainly in the data platform (rather than BI or Data Science) space, most of what we do is the same: same T-SQL, same types of performance problems, same arguments with developers over database design, scalability and good practice.

What we have got though is enhanced tooling and improved performance. I love Query Store, I love the stuff starting to come out with adaptive query plans, and even the automatic tuning features are looking good. I love that improved tooling means we can get away from the drudge tasks and concentrate on the clever stuff. Technologies like Always Encrypted give us great new ways to protect our data. The growth of the PaaS cloud offerings with Azure SQL DB and Amazon RDS offers great possibilities.

Of course most people are still in the past with SQL Server and don’t get to use any of this. According to Spotlight, a quarter of SQL Servers are still on 2008 versions ( https://www.spotlightessentials.com/public/collectiveiq ), and I bet the true figure is much bigger than that if you include all the servers that no-one bothers to monitor.

So my biggest prediction for 2026 is that DBAs will finally have the go-ahead from their companies to upgrade to the versions of SQL that are current now and start using this stuff!

Some other random predictions:

Just as it didn’t when it was called Data Mining, Machine Learning won’t have solved all the world’s problems. It’ll be re-branded something like “Deep Intelligence” and be hailed as the new saviour going forward.

Similarly, Big Data will have been due for a re-brand and re-hype for a while – err.. “Bigger Data” anyone?

It turned out no-one ended up getting the really big fines under GDPR, so companies have relaxed about abusing your data and storing backups on open cloud drives.

Implementations based on the successor to No-SQL, “No-Data” have run into problems. It becomes reluctantly accepted that data is useful in some contexts and the movement is relabelled “Not-only-data”.

The DBA role is dead – various people announce.

Many data lakes rotted and festered over time, becoming swamps. Cost of calling in the clean-up teams was too much so they just got concreted over.

But best of all, Microsoft will have changed the default settings for parallelism, and we’ll know what part of our string or binary data would have been truncated. Hallelujah! All hail our robot overlords! Where’s my Soylent Green?!

TDE and backup compression – still not working?

Until SQL 2016 if you used TDE (Transparent Data Encryption) you couldn’t use backup compression.

In 2016 Microsoft changed this, but it has been a rocky road. Backups work okay but in some circumstances people have found that they are corrupt when they come to restore them.

We thought that was all fixed. As highlighted by Brent Ozar, Microsoft have been recommending that you are on at least SQL Server 2016 RTM CU6 or 2016 SP1 CU4, where these issues are resolved.

https://www.brentozar.com/archive/2017/09/breaking-news-using-tde-2016-backups-time-patch/

https://social.msdn.microsoft.com/Forums/sqlserver/en-US/dad3dfe3-c914-44f8-96b9-ab56cc825fe9/tde-corrupt-backups-when-using-backup-database-with-compression-maxtransfersize?forum=sqldatabaseengine

It seems like people are still having problems though. Ken Johnson is on 2016 SP1 CU6 and is still seeing the issue:

https://social.msdn.microsoft.com/Forums/sqlserver/en-US/dad3dfe3-c914-44f8-96b9-ab56cc825fe9/tde-corrupt-backups-when-using-backup-database-with-compression-maxtransfersize?forum=sqldatabaseengine

Our production DBA has just run a test, backing up 20 databases from one server and restoring them to another – about 10 of the restores failed. In other environments though this is working fine.

You can see the error if you try to verify the backup:

Date and time: 2018-02-16 09:52:10
Command: BACKUP DATABASE [XXXXX] TO DISK = ‘XXXXX’ WITH CHECKSUM, COMPRESSION, MAXTRANSFERSIZE = 131072
Processed 95488 pages for database ‘XXXXX’, file ‘XXXXX’ on file 1.
Processed 3 pages for database ‘XXXXX’, file ‘XXXXX’ on file 1.
BACKUP DATABASE successfully processed 95491 pages in 2.375 seconds (314.113 MB/sec).
Outcome: Succeeded
Duration: 00:00:02
Date and time: 2018-02-16 09:52:12

Date and time: 2018-02-16 09:52:12
Command: RESTORE VERIFYONLY FROM DISK = ‘XXXXX’ WITH CHECKSUM
Msg 3189, Level 16, State 1, Line 1
Damage to the backup set was detected.
Msg 3013, Level 16, State 1, Line 1
VERIFY DATABASE is terminating abnormally.
Outcome: Failed
Duration: 00:00:03
Date and time: 2018-02-16 09:52:15

And it certainly doesn’t then work if you try to restore it:

5 percent processed.
Msg 3183, Level 16, State 1, Line 2
RESTORE detected an error on page (1:7763) in database “XXXXXXX” as read from the backup set.
Msg 3013, Level 16, State 1, Line 2
RESTORE DATABASE is terminating abnormally.

We’re raising a call with Microsoft, will update when we know more.

In the meantime, can we trust backups of TDE databases with backup compression enabled? Given that we want to be absolutely sure our backups can save us in the event of a disaster, the answer may be no. At a minimum we need to test each time we update our applications or patch SQL Server.

The side lesson of course is that we should always be regularly checking that our backups are restorable.
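At the simplest level that can be a verify against the backup file, as in the logs above – though the only real proof is a full test restore. A quick sketch with a made-up path:

--Verify the backup file can be read and its checksums are valid
RESTORE VERIFYONLY
FROM DISK = 'C:\Backups\SomeDatabase.bak'
WITH CHECKSUM;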

More articles on TDE

What is Transparent Data Encryption?

Setting up Transparent Data Encryption (TDE)

Encrypting an existing database with TDE

Understanding Keys and Certificates with Transparent Data Encryption (TDE)

How Secure is Transparent Data Encryption (TDE) – and How to Prevent Hacking

Migrating or Recovering a TDE protected Database


Avoiding confusion with SQLCMD variables

SQLCMD variables can be a useful way of having changeable parameters for your SQL scripts, allowing you to specify the value from a command line, but also to control things you couldn’t manage through a SQL variable.

They can also be a little confusing the first time you see them.

Here’s an example:

:SETVAR MySQLCMDVar "Wibble"
SELECT * FROM dbo.Test WHERE TextValue='$(MySQLCMDVar)';

If you just run this then you might get the error:
Msg 102, Level 15, State 1, Line 1
Incorrect syntax near ‘:’.

The important thing is to understand that when you see a colon at the beginning of a statement in your T-SQL script, then this is a SQLCMD statement, and to parse these in SSMS you need to have SQLCMD mode enabled. You can do this from the “Query” menu:
SQLCMD1
Now I execute my script again, and it runs fine – though it returns no results.
Let’s look at another quick example:

:SETVAR TableName "Test"
SELECT * FROM dbo.$(TableName) WHERE TextValue='Wibble';

Here I’ve used the SQLCMD variable to define the name of the table in my query. This is the power of SQLCMD variables – you couldn’t do this with a normal SQL variable; the only way to achieve the same thing would be to use dynamic SQL.

Let’s understand a little deeper what they are and how they work.

First of all, they don’t have a datatype – they are actually always text values. You can enclose the value in double quotes or not – I usually prefer to – though if it contains spaces or other special characters then the quotes are required.

You define them as follows:
:SETVAR SQLCMDVariableName "Whatever value you want"

And where you want to refer to them in your script you use a dollar sign and the variable name in brackets:
$(SQLCMDVariableName)

Rather than being a conventional form of variable, SQLCMD variables are actually tags for text replacement. It’s handy to understand this as it leads to some strange behaviours. What happens when you run a query with SQLCMD mode enabled is that first of all the script is parsed and any SQLCMD statements are processed.

In the case of SQLCMD variables, first all the :SETVAR statements in the script are processed and each variable is assigned the correct value. Then all the references to each variable in the script are replaced with the literal value, it is then this modified version of your script (which you never get to see) which gets executed.

That’s why something like this doesn’t work:

:SETVAR TextVal "Hello There"
DECLARE @TextVal varchar(30);

SET @TextVal = $(TextVal);
SELECT @TextVal;

When I run this I get an error:
Msg 102, Level 15, State 1, Line 11
Incorrect syntax near ‘There’.

What’s going on? Both my SQL and SQLCMD variables are text aren’t they? Why doesn’t that work?

The answer lies in what I said before, the reference to a SQLCMD variable is just a tag to be replaced with the value defined elsewhere in the script. So in the above example what actually gets executed is:

DECLARE @TextVal varchar(30);

SET @TextVal = Hello There;
SELECT @TextVal;

Which isn’t valid SQL. What I should have done in my original SQL is to wrap the reference to the SQLCMD variable in single quotes:

:SETVAR TextVal "Hello There"
DECLARE @TextVal varchar(30);

SET @TextVal = '$(TextVal)';
SELECT @TextVal;

Now it works:

SQLCMD2

I mentioned you could pass SQLCMD variables from the command line – this can be handy if you’re executing scripts and you want to (for instance) specify the database name from outside. Watch out though, if you also assign a value in your script then it is the last value assigned that gets used.
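For reference, passing a value in from the command line looks something like this – the server and file names here are made up for illustration:

sqlcmd -S MyServer -i MyScript.sql -v MySQLCMDVar="Wibble"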

I had a developer come to me complaining that SQL wasn’t picking up the SQLCMD variable he was passing through the command line, the answer was that he had another value assigned in the script. He thought that was dumb, so I asked the question “What would you expect to happen if you were writing C# code and assigned a value to a variable, and then assigned a new one to it – which would you expect it to hold – the first or the second?”

That doesn’t mean however that assignment of values to SQLCMD variables doesn’t display some counterintuitive behaviour. Look at the following query:

:SETVAR TextVal "Hello There"
SELECT '$(TextVal)';

:SETVAR TextVal "GoodBye"
SELECT '$(TextVal)';

So I set a value in my SQLCMD variable, output it with a select statement, then I change the value and output it again. Let’s look at the output:
SQLCMD3
What the…?! I’ve encountered issues before where I’ve tried to change the value of a variable and – having done something wrong – the value hasn’t updated. But here it looks like the first query is looking into the future!

This goes back to what I said earlier, first the :SETVAR statements are processed and the variable evaluated, only then are the references replaced in the script. This means you can’t have changing values for your SQLCMD variable throughout the execution of your script.

You can even see the same behaviour if you do this:

:SETVAR TextVal "Hello There"
SELECT '$(TextVal)';

:SETVAR TextVal "GoodBye"
SELECT '$(TextVal)';

:SETVAR TextVal "See you Later!"

Outputs:
SQLCMD35
I’ve said you can’t change the value of your SQLCMD variable through your script – technically it’s more accurate to say you can’t have different values within the same batch. So if you separate your script into separate batches using the GO statement, then you get a different result:

:SETVAR TextVal "Hello There"
SELECT '$(TextVal)';
GO

:SETVAR TextVal "GoodBye"
SELECT '$(TextVal)';

Outputs:
SQLCMD4
You might therefore think that the SQLCMD variable is only valid in the context of the batch in which it is defined. So if I remove the :SETVAR in the second batch my script will fail:

:SETVAR TextVal "Hello There"
SELECT '$(TextVal)';
GO

SELECT '$(TextVal)';

Not so:
SQLCMD5
We see from this that a SQLCMD variable is not limited to the scope of a single batch – even though it gets re-evaluated on a batch by batch basis.

I’ll finish with something you might have attempted to do at some point. How about if I conditionally try to change a SQLCMD variable:

:SETVAR TextVal "Hello There"
IF 1=0
BEGIN
   PRINT 'Whoah!'
   :SETVAR TextVal "Maths is Broken"
END;

SELECT '$(TextVal)';

If I’ve not confused you too much with the above examples you can probably figure out what the output will be. That’s right:

SQLCMD6
This has just reminded me of a quote from “The Hitchhiker’s Guide to the Galaxy” about the Babel fish, particularly the last line:
“Now it is such a bizarrely improbable coincidence that anything so mind-bogglingly useful could have evolved purely by chance that some thinkers have chosen to see it as the final and clinching proof of the non-existence of God.
The argument goes something like this: “I refuse to prove that I exist,'” says God, “for proof denies faith, and without faith I am nothing.”

“But,” says Man, “The Babel fish is a dead giveaway, isn’t it? It could not have evolved by chance. It proves you exist, and so therefore, by your own arguments, you don’t. QED.”

“Oh dear,” says God, “I hadn’t thought of that,” and promptly vanishes in a puff of logic.

“Oh, that was easy,” says Man, and for an encore goes on to prove that black is white and gets himself killed on the next zebra crossing.”


The main take home from all this should be to avoid trying to use a SQLCMD variable like a normal one. Assign it once, at the top of your script or in a command line – then leave it alone!

Implementing Temporal Tables Where You Have Existing Data

In my post about auditing tools in SQL Server I mentioned a few tools we have at our disposal for capturing the change history of data.

One was Change Data Capture, which I went into a bit more detail about in Setting up Change Data Capture (CDC).

I also referred to Temporal Tables which are available to us in the brave new world of SQL 2016.

Temporal tables aren’t just about monitoring change, they also provide really nice methods for being able to query historical data – to see what the values were at a particular point in time e.g.

SELECT * FROM dbo.SomeData FOR SYSTEM_TIME AS OF '1900-01-01';

This makes them easy to work with, but also opens up other applications for them over and above auditing purposes – for instance being able to maintain configuration data that changes over time, but where we need to reference what the value was historically.

I wanted to look in more detail about the process for changing an existing table so that it becomes a temporal one.

Let’s start by quickly looking at how we would define a temporal table for a fresh deployment:

CREATE TABLE dbo.SomeData(
   Id INT IDENTITY(1,1) NOT NULL
      CONSTRAINT PK_SomeData PRIMARY KEY CLUSTERED,
   SomeText VARCHAR(1000) NULL,
   ModifiedBy VARCHAR(128) NULL,
   StartDate datetime2(7) GENERATED ALWAYS AS ROW START NOT NULL,
   EndDate datetime2(7) GENERATED ALWAYS AS ROW END NOT NULL,
   PERIOD FOR SYSTEM_TIME (StartDate, EndDate)
)
WITH (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.SomeData_History));

There are a few things here additional to a normal table definition. The first thing to note is that we need two columns to support the temporal versioning. I’ve called my columns StartDate and EndDate; these both have to be of the DATETIME2 datatype at the highest precision (100 nanoseconds). They are specified as GENERATED ALWAYS AS ROW START/END so SQL will take care of assigning values when rows are inserted or updated.

I’ve specified the PERIOD FOR SYSTEM_TIME, so SQL knows that these are the fields to use for the temporal versioning.

Finally I’ve specified SYSTEM_VERSIONING for the table and what the history table should be called.

When I execute this you can find the table and corresponding history table displayed in SSMS as below:
Tempopral1
We have the table itself, and also the history table has been created for us. This is where the previous versions of rows of data will be stored.
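As a quick demonstration (assuming the table we’ve just created, and that this is the first row inserted so it gets Id 1): insert a row, update it, and the old version appears in the history table:

INSERT INTO dbo.SomeData(SomeText, ModifiedBy)
VALUES('Original text', 'Me');

UPDATE dbo.SomeData SET SomeText = 'Updated text' WHERE Id = 1;

--The previous version of the row, with the period it was valid for
SELECT Id, SomeText, StartDate, EndDate
FROM dbo.SomeData_History;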

So that’s pretty straightforward – but what if you want to change an existing table?

First of all let’s drop this one:

DROP TABLE dbo.SomeData;

Unfortunately that gives me an error message:
Msg 13552, Level 16, State 1, Line 90
Drop table operation failed on table ‘TestAudit.dbo.SomeData’ because it is not supported operation on system-versioned temporal tables.

As the message says, you can’t just drop Temporal tables. First you must turn system versioning off, then you can go ahead:

ALTER TABLE dbo.SomeData SET (SYSTEM_VERSIONING = OFF); 
DROP TABLE dbo.SomeData;

Dropping the table though leaves the History table behind – now seen as a normal table: Tempopral2

That’s handy as it means you don’t lose the history when the main table is deleted. For now though let’s just delete that as well:

DROP TABLE dbo.SomeData_History;

Now let’s create a new table that will be our “existing table” we want to change. For this example I’ll create a test table with a trigger that maintains a modified date, and throw in 1000 rows:

CREATE TABLE dbo.SomeData(
   Id INT IDENTITY(1,1) 
      CONSTRAINT PK_SomeData PRIMARY KEY CLUSTERED NOT NULL,
   SomeText VARCHAR(1000) NULL,
   ModifiedBy VARCHAR(128) NULL,
   ModifiedDate DATETIME NOT NULL 
      CONSTRAINT DF_SomeData_ModifiedDate DEFAULT (GETUTCDATE())
);
GO

CREATE TRIGGER dbo.TR_SomeData_ModifiedDate
   ON dbo.SomeData 
   AFTER UPDATE
AS
BEGIN
   UPDATE dbo.SomeData 
   SET ModifiedDate = GETUTCDATE()
   FROM dbo.SomeData 
   INNER JOIN Inserted
      ON dbo.SomeData.Id = Inserted.Id;
END;
GO

INSERT INTO dbo.SomeData(SomeText, ModifiedBy)
SELECT TOP 1000 'Blah', 'Me'
FROM sys.objects a CROSS JOIN sys.objects b;

I’m going to complicate things and say that when I turn this into a temporal table I want to keep the existing ModifiedDate column and use that as the start date for my system versioning.

The first thing you might try is to alter the column:

ALTER TABLE dbo.SomeData
ALTER COLUMN ModifiedDate DATETIME GENERATED ALWAYS AS ROW START NOT NULL;

Error:
Msg 13589, Level 16, State 1, Line 44
Column ‘ModifiedDate’ in table ‘TestAudit.dbo.SomeData’ cannot be specified as ‘GENERATED ALWAYS’ in ALTER COLUMN statement.

So I can’t do this in an alter statement. What I’ll actually have to do is create a new column specified to work with the system versioning, copy the data over, drop the old column and rename the new one.

First let’s create the additional columns and define the PERIOD FOR SYSTEM_TIME:

ALTER TABLE dbo.SomeData
ADD 
StartDate datetime2 GENERATED ALWAYS AS ROW START NOT NULL
   CONSTRAINT DF_SomeData_StartDate DEFAULT (GETUTCDATE()),
EndDate datetime2 GENERATED ALWAYS AS ROW END NOT NULL
    CONSTRAINT DF_SomeData_EndDate DEFAULT ('9999-12-31 23:59:59.9999999'),
PERIOD FOR SYSTEM_TIME (StartDate, EndDate);

Note that I’ve had to specify defaults for the new columns so that the existing data gets values populated (I didn’t have to do that when I simply defined the Temporal Table from scratch).

For the StartDate I’ve specified the current UTC date – this could be any date and time that’s not in the future, though note it should be a UTC one. If I’d tried to use GETDATE, as I’m currently on British Summer Time, I would get the following error:
Msg 13542, Level 16, State 0, Line 51
ADD PERIOD FOR SYSTEM_TIME on table ‘TestAudit.dbo.SomeData’ failed because there are open records with start of period set to a value in the future.

For the end date I have put the maximum value a DATETIME2 column can hold, this is required and is a way of saying that this record is (at the moment) valid indefinitely. If I try to put in an earlier date I receive this error:
Msg 13575, Level 16, State 0, Line 51
ADD PERIOD FOR SYSTEM_TIME failed because table ‘TestAudit.dbo.SomeData’ contains records where end of period is not equal to MAX datetime.

Now I’ve got the columns I could try copying my data over:

UPDATE dbo.SomeData SET StartDate = ModifiedDate;

I get another error:
Msg 13537, Level 16, State 1, Line 70
Cannot update GENERATED ALWAYS columns in table ‘TestAudit.dbo.SomeData’.

It turns out you can get around this by temporarily dropping the period for system time:

ALTER TABLE dbo.SomeData DROP PERIOD FOR SYSTEM_TIME;
UPDATE dbo.SomeData SET StartDate = ModifiedDate;
ALTER TABLE dbo.SomeData ADD PERIOD FOR SYSTEM_TIME (StartDate, EndDate);

Your logical thought might be here that I didn’t need to create the period for system time when I altered the table to add the new columns. I could have just added the columns, migrated the data, then only added the period afterwards. Trust me, I tried that too:
Msg 13509, Level 16, State 1, Line 51
Cannot create generated always column when SYSTEM_TIME period is not defined.

From here it turns out to be reasonably clear sailing.

Let’s run the final command to make the table Temporal:

ALTER TABLE dbo.SomeData 
SET (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.SomeData_History));

Here’s what it looks like in SSMS:
Tempopral3
You can see this is the same as the original fresh creation of a temporal table. The history table has been created for us. Just with an additional column in both tables.

Let’s remove the extra column and do the final rename:

ALTER TABLE dbo.SomeData DROP CONSTRAINT DF_SomeData_ModifiedDate;
ALTER TABLE dbo.SomeData DROP COLUMN ModifiedDate;
EXEC sp_rename 'dbo.SomeData.StartDate','ModifiedDate','COLUMN';

I was amazed the sp_rename didn’t give me an error. In fact if we look at the table again:
Tempopral4
You can see the column has been renamed in both the main table and the history table. Pretty neat!

I have at this stage forgotten to do one thing. Let’s just try to update a record and see what happens:

UPDATE dbo.SomeData SET SomeText = 'Wibble' WHERE Id = 2;

Msg 13537, Level 16, State 1, Procedure TR_SomeData_ModifiedDate, Line 7 [Batch Start Line 185]
Cannot update GENERATED ALWAYS columns in table ‘TestAudit.dbo.SomeData’.
The statement has been terminated.

Yep, don’t forget to remove your trigger!

DROP TRIGGER dbo.TR_SomeData_ModifiedDate;

And now we’re all done.


What if I use SQL Server Data Tools (SSDT) to make my change?

If you’re working with Database Projects in Visual Studio then you will be used to publishing changes straight from there – or deploying them using a DACPAC file.

In that case SSDT handles all this for you – though it does use a slightly different method. Rather than creating a new column, copying that data over, dropping the old column and renaming the new one, it does it at the table level.
That’s the same way some other changes are made with SSDT (or if you do them through the GUI in SSMS). The steps are broadly the same, just with tables instead of columns:

1) Create new temporal table
2) Turn SYSTEM_VERSIONING off
3) Copy across the data from the old table to the new
4) Turn SYSTEM_VERSIONING back on
5) Drop the old table
6) Rename the new table to the original name

If you’re doing this by “hand”, you could do it this way if you prefer, but it is probably a heavier operation that will require more downtime. For instance you’ll need the whole table to be locked while the operation progresses, as you don’t want to risk data changes once you’ve started copying the data over. The column method will also lock the table, but potentially there is less to update/copy so it should hopefully be quicker.

As a final point I should note that you may run into problems dropping the trigger with SSDT, it can be a little quirky around doing that. For the moment I’ll leave off from going into that in more detail – but if you get that issue feel free to give me a yell!