Implementing Temporal Tables Where You Have Existing Data

In my post about auditing tools in SQL Server I mentioned a few tools we have at our disposal for capturing the change history of data.

One was Change Data Capture which I want into a bit more detail about in Setting up Change Data Capture (CDC)

I also referred to Temporal Tables which are available to us from SQL Server 2016 onward.

Temporal tables aren’t just about monitoring change, they also provide really nice methods for being able to query historical data – to see what the values were at a particular point in time e.g.

SELECT * FROM dbo.SomeData FOR SYSTEM_TIME AS OF '1900-01-01';

This makes them easy to work with, but also opens up other applications for them over and above auditing purposes – for instance being able to maintain configuration data that changes over time, but where we need to reference what the value was historically.

I wanted to look in more detail about the process for changing an existing table so that it becomes a temporal one.

Let’s start by quickly looking at how we would define a temporal table for a fresh deployment:

CREATE TABLE dbo.SomeData(
   SomeText VARCHAR(1000) NULL,
   ModifiedBy VARCHAR(128) NULL,
   PERIOD FOR SYSTEM_TIME (StartDate, EndDate)

There are a few things here additional to a normal table definition. First thing to note is we need two columns to support the Temporal versioning. I’ve called my columns StartDate and EndDate, these both have to be of DATETIME2 datatype to the highest precision (100 nanoseconds). These are specified as GENERATED ALWAYS AS ROW STARTEND so SQL will take care of assigning values when new rows are inserted or updated.

I’ve specified the PERIOD FOR SYSTEM TIME, so SQL knows that these are the fields to use for the temporal versioning.

Finally I’ve specified SYSTEM_VERSIONING for the table and what the history table should be called.

When I execute this you can find the table and corresponding history table displayed in SSMS as below:
We have the table itself, and also the history table has been created for us. This is where the previous versions of rows of data will be stored.

So that’s pretty straightforward – but what if you want to change an existing table?

First of all let’s drop this one:

DROP TABLE dbo.SomeData;

Unfortunately that gives me an error message:
Msg 13552, Level 16, State 1, Line 90
Drop table operation failed on table ‘TestAudit.dbo.SomeData’ because it is not supported operation on system-versioned temporal tables.

As the message says, you can’t just drop Temporal tables. First you must turn system versioning off, then you can go ahead:

DROP TABLE dbo.SomeData;

Dropping the table though leaves the History table behind – now seen as a normal table: Tempopral2

That’s handy as it means you don’t lose the history when the main table is deleted. For now though let’s just delete that as well:

DROP TABLE dbo.SomeData_History;

Now let’s create a new table that will be our “existing table” we want to change. For this example I’ll create a test table with a trigger that maintains a modified date, and throw in 1000 rows:

CREATE TABLE dbo.SomeData(
   Id INT IDENTITY(1,1) 
   SomeText VARCHAR(1000) NULL,
   ModifiedBy VARCHAR(128) NULL,
   ModifiedDate DATETIME NOT NULL 

CREATE TRIGGER dbo.TR_SomeData_ModifiedDate
   ON dbo.SomeData 
   UPDATE dbo.SomeData 
   SET ModifiedDate = GETUTCDATE()
   FROM dbo.SomeData 
   INNER JOIN Inserted
      ON dbo.SomeData.Id = Inserted.Id;

INSERT INTO dbo.SomeData(SomeText, ModifiedBy)
SELECT TOP 1000 'Blah', 'Me'
FROM sys.objects a CROSS JOIN sys.objects b;

I’m going to complicate things and say that when I turn this into a temporal table I want to keep the existing ModifiedDate column and use that as the start date for my system versioning.

The first thing you might try is to alter the column:

ALTER TABLE dbo.SomeData

Msg 13589, Level 16, State 1, Line 44
Column ‘ModifiedDate’ in table ‘TestAudit.dbo.SomeData’ cannot be specified as ‘GENERATED ALWAYS’ in ALTER COLUMN statement.

So I can’t do this in an alter statement. What I’ll actually have to do is create a new column specified to work with the system versioning, copy the data over, drop the old column and rename the new one.

First let’s create the additional columns and define the PERIOD FOR SYSTEM_TIME:

ALTER TABLE dbo.SomeData
StartDate datetime2 GENERATED ALWAYS AS ROW start NOT NULL
    CONSTRAINT DF_SomeData_EndDate DEFAULT ('9999-12-31 23:59:59.9999999'),

Note that I’ve had to specify defaults for the new columns so that the existing data gets values populated (I didn’t have to do that when I simply defined the Temporal Table from scratch).

For the StartDate I’ve specified the current UTC date – this could be any date and time that’s not in the future, though note it should be a UTC one. If I’d tried to use GETDATE, as I’m currently on British Summer Time, I would get the following error:
Msg 13542, Level 16, State 0, Line 51
ADD PERIOD FOR SYSTEM_TIME on table ‘TestAudit.dbo.SomeData’ failed because there are open records with start of period set to a value in the future.

For the end date I have put the maximum value a DATETIME2 column can hold, this is required and is a way of saying that this record is (at the moment) valid indefinitely. If I try to put in an earlier date I receive this error:
Msg 13575, Level 16, State 0, Line 51
ADD PERIOD FOR SYSTEM_TIME failed because table ‘TestAudit.dbo.SomeData’ contains records where end of period is not equal to MAX datetime.

Now I’ve got the columns I could try copying my data over:

UPDATE dbo.SomeData SET StartDate = ModifiedDate;

I get another error:
Msg 13537, Level 16, State 1, Line 70
Cannot update GENERATED ALWAYS columns in table ‘TestAudit.dbo.SomeData’.

It turns out you can get around this by temporarily dropping the period for system time:

UPDATE dbo.SomeData SET StartDate = ModifiedDate;

Your logical thought might be here that I didn’t need to create the period for system time when I altered the table to add the new columns. I could have just added the columns, migrated the data, then only added the period afterwards. Trust me, I tried that too:
Msg 13509, Level 16, State 1, Line 51
Cannot create generated always column when SYSTEM_TIME period is not defined.

From here it turns out to be reasonable clear sailing.

Let’s run the final command to make the table Temporal:

ALTER TABLE dbo.SomeData 

Here’s what it looks like in SSMS:
You can see this is the same as the original fresh creation of a temporal table. The history table has been created for us. Just with an additional column in both tables.

Let’s remove the extra column and do the final rename:

ALTER TABLE dbo.SomeData DROP CONSTRAINT DF_SomeData_ModifiedDate;
ALTER TABLE dbo.SomeData DROP COLUMN ModifiedDate;
EXEC sp_rename 'dbo.SomeData.StartDate','ModifiedDate','COLUMN';

I was amazed the sp_rename didn’t give me an error. In fact if we look at the table again:
You can see the column has been renamed in both the main table and the history table. Pretty neat!

I have at this stage forgotten to do one thing. Let’s just try to update a record and see what happens:

UPDATE dbo.SomeData SET SomeText = 'Wibble' WHERE Id = 2;

Msg 13537, Level 16, State 1, Procedure TR_SomeData_ModifiedDate, Line 7 [Batch Start Line 185]
Cannot update GENERATED ALWAYS columns in table ‘TestAudit.dbo.SomeData’.
The statement has been terminated.

Yep, don’t forget to remove your trigger!

DROP TRIGGER dbo.TR_SomeData_ModifiedDate;

And now we’re all done.

What if I use SQL Server Data Tools (SSDT) to make my change?

If you’re working with Database Projects in Visual Studio then you will be used to publishing changes straight from there – or deploying them using a DACPAC file.

In that case SSDT handles all this for you – though it does use a slightly different method. Rather than creating a new column, copying that data over, dropping the old column and renaming the new one, it does it at the table level.
That’s the same way some other changes are made with SSDT (or if you do them through the GUI in SSMS). The steps are broadly the same, just with tables instead of columns:

1) Create new temporal table
3) Copy across the data from the old table to the new
5) Drop the old table
6) Rename the new table to the original name

If you’re doing this by “hand”, you could do it this way if you prefer, but it is probably a heavier operation that will require more down time. For instance you’ll need the whole table to be locked while the operation progresses as you don’t want to risk data changes once you’ve started copying the data over. The column method will also lock the table, but potentially there is less to copy so it should hopefully be quicker.

As a final point I should note that you may run into problems dropping the trigger with SSDT, it can be a little quirky around doing that. For the moment I’ll leave off from going into that in more detail – but if you get that issue feel free to give me a yell!

Got a problem or embarking on a SQL Server project and want some help and advice? I’m available for consulting – please get in touch or check out my services page to find out what I can do for you.

Impact of TDE on Performance

Microsoft states that enabling TDE usually has a performance overhead of 2–4%. That doesn’t sound like very much, and personally I wouldn’t let it bother me if I want to make sure my data is encrypted at rest. However, you may have heard other sources saying that it’s actually a lot more than that – and the performance impact is a high price to pay for the level of protection offered. So, what’s the truth?

Where Do We See an Overhead?

When we talk about performance, we are likely to be concerned about two things. One is the impact on query performance. Are my queries going to execute slower with TDE enabled? The other is what overall pressure is going to be added to the server.

The important point to start with is in understanding where and how TDE adds overhead. Encryption occurs as data is written to disk, and decryption occurs as data is read from disk. Each of those activities uses CPU. So, the CPU overhead added by TDE is going to be in proportion to your disk activity. If you have a system that is heavy on IO, then there is going to be more CPU overhead.

SQL Server tries to keep data that is referenced repeatedly in memory (the buffer pool). So, if your SQL instance is provisioned with enough memory, a lot of your read queries can access the buffer pool and don’t have to go out to disk. Such queries should not be affected performance-wise by TDE. There may be other read queries however that access older data that hasn’t been read for a while, and these queries would need to retrieve that data from disk and so there would be an overhead from TDE.

Any queries that modify data will need the outcome to be written to disk, so in these cases we will see an overhead. This overhead is likely to come in two parts: first when the transaction is written to the log file before committing and then later as the updated data gets written to the data file as part of a checkpoint operation.

We also have other operations that write or update encrypted data on disk, so we would also expect these to have some overhead. This would include operations such as index rebuild operations.

You can see from this that the overhead will very much depend on how your application interacts with your data. At one extreme, if you have a set of static data that is small enough to be held in memory and is queried regularly, then there should be no overhead. At the other end of the spectrum, if you have an application that writes to the database a lot and reads less often, then the overhead will be higher.

How to Estimate the Performance Impact for Your Server?

Performance impact for your queries is going to be very much on a case-by-case basis, but in reality, it’s generally likely to be quite small. The reason for that is that, as discussed, we’re only going to see extra CPU requirements when our query needs to access the disk. Reading from and writing to disk is itself an activity that takes time, and even with the fastest disks, encryption/decryption is likely to take no longer than the disk access time. The encryption activities can usually be carried out in parallel to the disk activity, so you don’t see much increased time to read or write data. We’ll see an example of that shortly when we look at how you can get an idea of likely overhead on your server.

In terms of estimating overall overhead on your server, you need to understand the level of IO on the server as well as how well encryption will perform on the box.

Let’s work through an exercise to get an idea of the sort of numbers we might be talking about. For this, we’re going to need the database we created in How Long Will it Take to Encrypt Your Existing Data with TDE? that has about 10GB of data in a single table. We’ll also need a database that has the same set of data but without encryption turned on so we can get comparison figures. You can create that using the same scripts – just don’t run the final step of turning encryption on. We’ll call that database TestTDEOff.

We’re first going to run a query that will force SQL Server to read all the data in a table. We’ll repeat that across four scenarios:

  • TDE-protected database where the buffer cache is empty, so all data has to be read from disk
  • TDE-protected database where all the data for the table is in the buffer cache, so no data has to be read from disk
  • Database without TDE where the buffer cache is empty, so all data has to be read from disk
  • Database without TDE where all the data for the table is in the buffer cache, so no data has to be read from disk

Here is our query:

FROM dbo.SomeData
WHERE Id = 100000000;
FROM dbo.SomeData
WHERE Id = 100000000;

The DBCC DROPCLEANBUFFERS command flushes all data held in the buffer cache. You won’t want to do this on a live system as it will affect performance, but if you have a server with similar hardware, you can run this to get an idea of how encryption performs.

The test runs the same select statement twice, once with no data loaded into memory and a second time once data has been loaded by the first run. We use the SET STATISTICS command to output information about performance to the messages tab in SSMS. The table we are querying from is a heap and has no indexes defined, so SQL Server has no option but to scan the whole table in order to generate the result set.

Let’s look at an extract of the information outputted by STATISICS IO, TIME to see what we’re getting. This is for the database with TDE enabled:

Table 'SomeData'. Scan count 13, logical reads 1204820, physical reads 0, page server reads 0, read-ahead reads 1203777, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.

SQL Server Execution Times:

CPU time = 10046 ms, elapsed time = 5580 ms.

Table 'SomeData'. Scan count 13, logical reads 1204820, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.

SQL Server Execution Times:

CPU time = 12407 ms, elapsed time = 1050 ms.

We have two sets of output here: one for the first run where there was no data loaded into memory and one for the second once the data was loaded. The key difference is that in the first we have a large number of “read-ahead” reads, which are where data is read from disk. Read-ahead refers to the fact that they are read in parallel with the processing, rather than all needing to be read before the CPU can get to work. In the second output we only have “logical” reads where data is read from memory.

You get a reasonable amount of variance in the CPU and elapsed times when running such tests, so I executed the query five times against each database, averaged the results, and rounded off to the nearest 100ms. The figures are shown below.

 CPU time (ms)Elapsed time (ms)
TDE with all data read from disk10,6005,600
No TDE with all data read from disk7,6005,700
TDE with all data read from memory12,4001,100
No TDE with all data read from memory12,2001,100

Due to the variance between test runs, we’ll ignore small differences. There are a few key takeaways:

  • The elapsed time was about the same with and without TDE.
  • The CPU consumption was about the same where data was read from memory.
  • When reading from disk, there was a higher CPU consumption when TDE was enabled.

That is about what we would expect; TDE only adds overhead when reading or writing to disk. When we were reading from disk, my disk was being accessed at full speed and the disk access time was the bottleneck, so the decryption required by TDE was easily able to complete while that was occurring.

In terms of what this shows regarding the performance impact on a production server, there are a few ways you can think about the data.

The scariest way of looking at it – and not necessarily the correct one – is to focus on the fact that when reading from disk TDE added about 3 seconds of CPU. That was about a 40% increase. The reason that’s not going to be the impact you see in live though is that (hopefully) most of the data accessed by SQL Server is already going to be sitting in memory and so will not be affected. Still, I might envision that I’m going to see between a zero and 40% impact.

Another way to look at it is to realize that my disk was maxed out during this test, loading about 2GB of data per second (actually slightly less but we’ll stick with round numbers). I can calculate that during every second of execution about an extra half a second of CPU, power was consumed by the TDE decryption. That equates to half a CPU core being busy. My machine has 12 cores so that’s about 4% of physical CPU overhead added. Running the Windows Performance Monitor (perfmon) during the tests, I can see that is about right. If I only had four cores in this box, I’d be using the same half a core, so that would be about 12.5%. It couldn’t go any higher though because my disks are already maxed out. I’d have to be able to physically read data from disk quicker in order to create more TDE overhead. On this box, I can see that decrypting data with TDE costs me about 0.3 seconds of CPU per GB of data.

Where this leaves us is that the best way to get an idea on TDE impact on a particular server is to look at the level of disk access and the number of CPU cores. You can look at how much disk access you have (to the relevant database data and log files only) at peak times and get an idea from there. Hopefully you can see that to add a significant percentage of CPU overhead with TDE, you’re going to need to be reading a huge amount of data from disk, have superfast disks, and not have a lot of CPU power.

If you are able to run through something similar to this exercise to benchmark a production system – and you come up with numbers that worry you – I’m going to suggest that you might have an excessive amount of disk access, and that might be indicative of a problem. In particular, you may want to consider if the server would benefit from having more memory so that data can be held in the buffer cache for longer – and not read from disk so often. Also, are queries being forced to scan whole tables due to a lack of adequate indexes – or because they are poorly written.

This post is part of a comprehensive series on SQL Server Encryption. Subscribe to my blog for updates as new posts are published or you can buy my book through the link in the sidebar to get it all in one go.

And if you’re embarking on an encryption project (or anything else to do with SQL Server) and want some help, I’m available for consulting. Please get in touch to discuss what I can do for you, or check out my services page to see what kind of things I offer.