TDE and backup compression – still not working?

Until SQL 2016 if you used TDE (Transparent Data Encryption) you couldn’t use backup compression.

In 2016 Microsoft changed this, but it has been a rocky road. Backups work okay but in some circumstances people have found that they are corrupt when they come to restore them.

We thought that was all fixed. As highlighted by Brent Ozar, Microsoft has been recommending that you are on at least SQL Server 2016 RTM CU6 or 2016 SP1 CU4, where these issues are resolved:

https://www.brentozar.com/archive/2017/09/breaking-news-using-tde-2016-backups-time-patch/

https://social.msdn.microsoft.com/Forums/sqlserver/en-US/dad3dfe3-c914-44f8-96b9-ab56cc825fe9/tde-corrupt-backups-when-using-backup-database-with-compression-maxtransfersize?forum=sqldatabaseengine

It seems like people are still having problems though. Ken Johnson is on 2016 SP1 CU6 and is still seeing the issue:

https://social.msdn.microsoft.com/Forums/sqlserver/en-US/dad3dfe3-c914-44f8-96b9-ab56cc825fe9/tde-corrupt-backups-when-using-backup-database-with-compression-maxtransfersize?forum=sqldatabaseengine

Our production DBA has just run a test backing up 20 databases from one server and restoring them to another, and about 10 of the restores failed – although in other environments this is working fine.

You can see the error if you try to verify the backup:

Date and time: 2018-02-16 09:52:10
Command: BACKUP DATABASE [XXXXX] TO DISK = ‘XXXXX’ WITH CHECKSUM, COMPRESSION, MAXTRANSFERSIZE = 131072
Processed 95488 pages for database ‘XXXXX’, file ‘XXXXX’ on file 1.
Processed 3 pages for database ‘XXXXX’, file ‘XXXXX’ on file 1.
BACKUP DATABASE successfully processed 95491 pages in 2.375 seconds (314.113 MB/sec).
Outcome: Succeeded
Duration: 00:00:02
Date and time: 2018-02-16 09:52:12

Date and time: 2018-02-16 09:52:12
Command: RESTORE VERIFYONLY FROM DISK = ‘XXXXX’ WITH CHECKSUM
Msg 3189, Level 16, State 1, Line 1
Damage to the backup set was detected.
Msg 3013, Level 16, State 1, Line 1
VERIFY DATABASE is terminating abnormally.
Outcome: Failed
Duration: 00:00:03
Date and time: 2018-02-16 09:52:15

And it certainly doesn’t then work if you try to restore it:

5 percent processed.
Msg 3183, Level 16, State 1, Line 2
RESTORE detected an error on page (1:7763) in database “XXXXXXX” as read from the backup set.
Msg 3013, Level 16, State 1, Line 2
RESTORE DATABASE is terminating abnormally.

We’re raising a call with Microsoft, will update when we know more.

In the meantime, can we trust backups of TDE-protected databases taken with backup compression enabled? Given that we want to be absolutely sure our backups can save us in the event of a disaster, the answer may be no. At a minimum we need to test each time we update our applications or patch SQL Server.

The side lesson of course is that we should always be regularly checking that our backups are restorable.
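
As a minimal sketch of that kind of check – the database name, logical file names and paths here are all hypothetical, and for a TDE database the certificate needs to have been restored to the test instance first:

RESTORE DATABASE TestRestore_MyDatabase
FROM DISK = 'D:\Backups\MyDatabase.bak'
WITH CHECKSUM,
   MOVE 'MyDatabase' TO 'D:\Data\TestRestore_MyDatabase.mdf',
   MOVE 'MyDatabase_log' TO 'D:\Logs\TestRestore_MyDatabase_log.ldf';

-- Then check the restored copy for corruption
DBCC CHECKDB ('TestRestore_MyDatabase') WITH NO_INFOMSGS;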

More articles on TDE

What is Transparent Data Encryption?

Setting up Transparent Data Encryption (TDE)

Encrypting an existing database with TDE

Understanding Keys and Certificates with Transparent Data Encryption (TDE)

How Secure is Transparent Data Encryption (TDE) – and How to Prevent Hacking

Migrating or Recovering a TDE protected Database


Migrating or Recovering a TDE protected Database

When encrypting a database with Transparent Data Encryption(TDE), a vital consideration is to make sure we are prepared for the scenario when something goes wrong. For instance, if the server hosting our SQL instance goes belly-up, can we recover the data that we have encrypted with TDE?

In the ordinary recovery scenario, we would make sure that we have appropriate backups of our database, and that they (or copies of them) are stored off the server itself so that we can access them in case of a failure.

If you have followed the instructions in Setting up Transparent Data Encryption (TDE) then you will also have a backup of the certificate and private key used to protect the database encryption key used by TDE, e.g.:

BACKUP CERTIFICATE MyTDECert   
TO FILE = 'C:\Test\MyTDECert'  
WITH PRIVATE KEY   
(  
    FILE = 'C:\Test\MyTDECert_PrivateKeyFile',  
    ENCRYPTION BY PASSWORD = 'UseAStrongPasswordHereToo!£$7'  
);  
GO  

You need to make sure that these are also stored securely off the server, and that you have kept the password you used somewhere you can access it – but not somewhere so accessible that unauthorised users can get it, otherwise you are defeating the object of TDE somewhat.

In summary you need:

  • The database backup file
  • The backup of the certificate
  • The backup of the private key
  • The password used to encrypt the private key

Armed with those objects, you are equipped to restore your database to another SQL Instance.

Working on the new SQL instance, the steps are straightforward.

Create a Database Master Key if one doesn’t exist

USE MASTER;
CREATE MASTER KEY
ENCRYPTION BY PASSWORD = 'UseAStrongPasswordHere!£$7';

Note that this will be a new and unique database master key – it will not be the same as the one you had on your old instance, and you don’t need to use the same password to protect it.
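
If you’re not sure whether the instance already has a database master key, a sketch like this (checking for the standard DMK name in sys.symmetric_keys) avoids trying to create a second one:

USE MASTER;
IF NOT EXISTS
(
   SELECT 1 FROM sys.symmetric_keys
   WHERE name = '##MS_DatabaseMasterKey##'
)
BEGIN
   CREATE MASTER KEY
   ENCRYPTION BY PASSWORD = 'UseAStrongPasswordHere!£$7';
END;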

Restore the Certificate and Private Key

On the new SQL instance you need to restore the certificate and private key into the master database:

USE MASTER;

CREATE CERTIFICATE MyTDECert
FROM FILE = 'C:\Test\MyTDECert'
WITH PRIVATE KEY 
( 
   FILE = 'C:\Test\MyTDECert_PrivateKeyFile',
   DECRYPTION BY PASSWORD = 'UseAStrongPasswordHereToo!£$7' 
);

This will decrypt your key using the password supplied, and then re-encrypt it using the database master key you created. Then the certificate and its key will be stored in the master database on your new SQL instance.

If you’ve done something wrong, it’s entirely possible you may get an error at this stage, commonly:
Msg 15208, Level 16, State 6, Line 56
The certificate, asymmetric key, or private key file is not valid or does not exist; or you do not have permissions for it.

If you’re confident that all details specified are correct, and that the certificate and private key were backed up properly, then the most likely issue is that the current SQL instance doesn’t have access to the file path you’ve placed the files in.

Restore the Database

Once you’ve completed the previous steps you are ready to restore the database(s) from the backup(s). You do that as you would restore any other database. Potentially as simple as:

RESTORE DATABASE TestTDE FROM DISK = 'C:\Test\TestTDE.bak';

Then you’ll find you can access your database and view data without any issues. At this point you can celebrate – you are done.
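
If you want a quick check that the restored database really is encrypted and that SQL Server has been able to decrypt its database encryption key, a sketch like this against the sys.dm_database_encryption_keys DMV will show it (an encryption_state of 3 means encrypted):

SELECT DB_NAME(database_id) AS DatabaseName,
   encryption_state,
   encryptor_thumbprint
FROM sys.dm_database_encryption_keys;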

You only get a problem if you haven’t set up the certificate and key correctly, or you have the wrong one:
Msg 33111, Level 16, State 3, Line 2
Cannot find server certificate with thumbprint ‘0x682C8797633B9AD8875967502861CCAE33ECAD66’.
Msg 3013, Level 16, State 1, Line 2
RESTORE DATABASE is terminating abnormally.

So what do I do if I can’t restore the certificate?

Of course you’re never going to run into this problem because you’ve followed all the instructions carefully, and you’ve made sure you have your certificate and key backups – and the password used to protect them.

Let’s say for the sake of argument though that you’ve taken ownership of an environment that hasn’t been so carefully managed. You’ve had a server failure and there are no certificate and key backups – or they exist but no-one knows the password.

Is your data lost forever? Or rather is it now so safe that no-one can access it – even those who are supposed to? Don’t panic just yet, we’ll look at a technique you may be able to use to recover your data in my next blog post.

Other articles on TDE:

What is Transparent Data Encryption?

Setting up Transparent Data Encryption (TDE)

Encrypting an existing database with TDE

Understanding Keys and Certificates with Transparent Data Encryption (TDE)

How Secure is Transparent Data Encryption (TDE) – and How to Prevent Hacking


How Secure is Transparent Data Encryption (TDE) – and How to Prevent Hacking

TDE is commonly described as “at-rest” encryption, i.e. it protects your data wherever it is stored on disk. This includes the database files, any backups taken (including Log and Differential), and any data that may get temporarily persisted to TempDB (when you use TDE to encrypt any database on an instance TempDB will get automatically encrypted also).

TDE does not however give any additional protection against those accessing data by querying the database. If you have access to the database then TDE transparently allows that to continue without any change in the functionality.

Understanding that, if we are implementing TDE we can only expect it to fully protect us from people who have access to the file system but do not have access to the data stored through any other avenue – for instance by querying SQL Server.

In most scenarios we probably don’t plan on there being such a person, i.e. we don’t grant the world read access to our database servers and say “go on, feel free, have a play”. In most production systems access is tightly tied down, and if you work in an environment where this is not the case then you probably have things you should address before you start looking at implementing encryption.

In reality though, mistakes or oversights happen, or sometimes malicious parties are able to gain access they should not have. What TDE offers us is, like many other forms of encryption, an additional layer of protection. The first line of defence gets breached, but there is still the next one to get past.

There is also the scenario of database backups, where these might be stored off the database server and even offsite. TDE gives us some assurance that in these scenarios the data is protected irrespective of the access restrictions in force in those other places. We’ve all read of breaches which were enabled by a database backup being stored on a location that was easier to access than it should have been.

Of course this protection of the backups could simply be achieved by using backup encryption – which is a separate feature of SQL Server and does not have the performance overhead associated with TDE.

So, if we are specifically implementing TDE rather than just backup encryption, it might be that we are specifically trying to protect ourselves from someone gaining direct access to the database files. That could be seen as quite an edge case – but that doesn’t mean it isn’t worth protecting against.

One thing to remember with TDE is that it is not (on its own) going to protect you against someone who has Admin rights either over the box where SQL Server is installed, or over SQL itself. A local Admin can add themselves as an administrator on SQL – though they will need to restart the instance to do so. And an administrator on SQL can do pretty much whatever they like, including exporting a copy of the certificate and key used by TDE so that they can use that to read stolen data.

Still, there is some level of protection gained from TDE and if you combine it with tight access controls, and auditing, then it can help you reduce the scenarios where your data is at risk.

Hacking TDE

There is a great post and accompanying video out there by Simon McAuliffe that shows a method that can be used to break TDE:
https://simonmcauliffe.com/technology/tde/

Simon’s not a big fan of TDE and he makes a good argument, the reality is that a lot of the time people are using it to tick a box “yes, all our data in encrypted at rest” and no-one questions the particular scenarios where it is – and is not – offering protection.

While I agree with Simon’s points, I don’t feel as strongly as he does. You just need to be making an informed decision to implement TDE based on a full understanding of what it can and cannot protect you from. It’s not a magic bullet, but it’s not totally useless either.

Nonetheless the method Simon outlines for breaking TDE is concerning as it only requires basic level read access over the server to be able to grab all the objects you need to crack a TDE protected database. Sure, you need to have a certain knowledge base to be able to do that, but protection that is based on an attacker not having the right skillset is not of the greatest value.

I won’t go into Simon’s method in full detail – you can check out his post for that. But I’ll give you enough of a summary so you can hopefully understand the principles involved.

What he details is an inevitable vulnerability that he specifies would not be unique to TDE – but common to any similar technology.

It’s back to the “T” of TDE again, “Transparent”. The concept of TDE is that it is fully self-contained and managed in the background for you by SQL Server. It doesn’t require you to supply additional passwords or keys when querying data, and it doesn’t need to access any objects stored outside of the SQL instance.
Let’s look again at the diagram with the encryption key hierarchy:

[Image: TDE encryption key hierarchy diagram]

Starting at the bottom, each key is protected by the one above it. But what happens when we get to the top? There is nothing left to protect the uppermost DPAPI keys, so they must be stored unencrypted somewhere on the system.

You simply cannot get around that, SQL needs to be able to – all on its own – encrypt and decrypt data. Even if we say we’ll encrypt the DPAPI keys, what will we encrypt them with and where will we store that object? You can’t get away from the fact that at some point we need to store an unencrypted key, or password, or whatever, somewhere.

As a consequence, it’s remarkably easy to pick these root DPAPI keys up, though you need a few basic hacker skills/tools to be able to do anything with them.

Once you have the DPAPI keys, you can use those tools – in combination with a copy of the database files (or a backup) of the master database – to obtain the SMK and thus break encryption all the way back down the hierarchy.

So what has just happened? A skilled and motivated hacker with mere read permissions to your server was able to steal your files and decrypt your data.

Just the scenario you implemented TDE to prevent.

The keys are held in the following directory on most Windows systems:
C:\Windows\System32\Microsoft\Protect\S-1-5-18

And by default, as long as you have read access to the file system you can read them.

Preventing this Hacking of TDE

Fortunately there is a basic action you can take to mitigate this. Strangely, this step isn’t mentioned (as far as I know) in any of the other documentation about setting up TDE.

The solution is that when you are setting up TDE you should take an additional step. Very simply, you need to secure this directory so that only Local Administrators on the server, the LOCAL SYSTEM account, and the SQL Server service account can read anything within it.

Once that’s done TDE should be offering you what you expect, i.e. casual read access to the server will not allow people to read your data. Only administrators would be able to gain access to the data, and as we’ve discussed already – Admins can access your TDE protected data anyway.

The only downside of doing this is that you are restricting access to the DPAPI (Data Protection API) from any other accounts. If you are running a pure SQL server then this should not be a problem – i.e. the server is for an installation of SQL and nothing else. If you also have application services running on the same box though, and these services wish to use encryption, then you may need to extend the permissions assigned to that directory. Though a better choice would be to move these services off the box.

Understanding Keys and Certificates with Transparent Data Encryption (TDE)

I’ve been taking a bit of a deep dive into understanding Transparent Data Encryption (TDE). As part of that I’ve been reading a lot of blog posts, Stack Overflow answers and technical documentation to try and deepen my knowledge.

Within that I’ve found a lot of contradiction I’ve needed to overcome. In particular this has been around what objects you need to recover an encrypted database to another SQL Server – be that when you’re doing a straight restore, working with log shipping, or using Availability Groups.

Most of the solutions offered work, but many describe additional steps that are not necessary. I feel that part of the problem is that people are misunderstanding the basics about encryption keys in SQL Server, so I thought it would be worthwhile going over that in a bit of detail before digging deeper into TDE in general.

I think it’s important to understand this stuff clearly, because then you have a clear view of when you’re protected and when you are vulnerable. If we’re engaging in encryption then we clearly have a desire for security – to be sure of that we have to be clear in our understanding.

Keys in SQL usually have three components (and this is the same for the Column Encryption Keys in Always Encrypted that I spoke about previously):
Understanding Keys and Certificates with Always Encrypted

  • The Key itself – usually can be thought of as a number expressed in binary format, long and random enough to make it difficult to guess even by brute force attempts.

  • Another object that’s used to protect the key – this object might be another key, it might be a certificate, or it might just be a password. This object is used to encrypt the key.

  • The encrypted value of the key – formed from the original value of the key, encrypted by the protecting object.

In SQL we rarely (maybe even never) see the actual value of the key. We have the encrypted version and we usually know what object was used to encrypt it. That second object may even be another key that is itself encrypted by a third object.

When it comes down to it though, the actual thing that is used to encrypt or decrypt data is the Key itself, not the encrypted value, and not the hierarchy of objects that may have been used to protect it.

So, all I really need to read your data, is your key.

Let’s look at that in the context of TDE. Here’s a standard diagram from books online that shows the hierarchy of encryption for TDE. There are other ways of working with TDE but this is the standard:

[Image: TDE encryption key hierarchy diagram (from Books Online)]

So, right down at the bottom of the diagram is the Database Encryption Key (DEK). That is what is used to encrypt/decrypt the data in the database. It sits in the database itself. So when you backup the database the DEK is held in the backup.

BUT – the DEK is itself encrypted by the certificate that sits in the master database, so even if someone has your backup they can’t access the key – and nor can any system they try and restore your backup to. So they also can’t access your data.

To be able to decrypt the DEK, the certificate is required. In fact, what is really required is the private key associated with the certificate as that’s what’s used for encrypting stuff. The private key however is itself encrypted by the Database Master Key (DMK) that sits in the master database.

So you’ll be forgiven at this stage for thinking that in order to read our backup of the database, we need the backup (containing the DEK), the certificate (including the private key) and the DMK.

When you realise that the DMK is itself encrypted by the Service Master Key (SMK), and that the SMK is also encrypted, then you might think you need to include those too – and whatever encrypted the SMK.

Where will it end!?

In reality we just go back to our certificate and its associated private key.
Let’s say I have a certificate called MyTDECert. I can (and must) back this up outside of the database. If this is lost, then so is my data. The command for backing it up looks something like this:

BACKUP CERTIFICATE MyTDECert   
TO FILE = 'D:\Temp\MattTest\MyTDECert'  
WITH PRIVATE KEY   
(  
    FILE = 'D:\Temp\MattTest\SQLPrivateKeyFile',  
    ENCRYPTION BY PASSWORD = 'C0rrecth0rserbatterystab1e'  
);  
GO

When I back this up I specify a password to encrypt the Private Key. Remember that the Private Key was already encrypted by the DMK? Well, this backup certificate command doesn’t just encrypt it a second time – what would be the point of that? No, the reason I need to supply a password is that the command retrieves the unencrypted version of the private key, re-encrypts it with the password INSTEAD, and then that is what gets saved to disk.

Remember that at the point I run the command SQL has access to all the objects, all the way up the chain, that are used for the encryption. So it has no problem getting the actual value for the Private Key.

Now, when I restore that certificate to – let’s say – another instance of SQL Server, the command looks like this:

CREATE CERTIFICATE TestSQLServerCert   
FROM FILE = 'D:\Temp\MattTest\MyTDECert'  
WITH PRIVATE KEY   
(  
    FILE = 'D:\Temp\MattTest\SQLPrivateKeyFile',  
    DECRYPTION BY PASSWORD = 'C0rrecth0rserbatterystab1e' 
);  
GO  

Considering what the BACKUP command did, you can intuit that the above command will do the opposite. It first of all decrypts the Private Key using the password supplied. Then it encrypts it again using the DMK for the local instance before saving it in the master database locally.

Remember it is the Actual Values of Keys that get used for encryption – not their encrypted value. So the private key for this certificate is exactly the same as it was on the instance we backed it up from – though the encrypted value will be different as it has been encrypted with a different DMK.

So if I now go to restore a copy of a database whose DEK has been encrypted with this certificate onto the new instance I can do so without any problem.
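
If you want to double-check which certificate protects which database – for instance to confirm the certificate you’ve just restored is the right one – a sketch like this matches the thumbprint recorded against each database encryption key to the certificates in master:

SELECT DB_NAME(dek.database_id) AS DatabaseName,
   c.name AS CertificateName,
   dek.encryptor_thumbprint
FROM sys.dm_database_encryption_keys dek
LEFT JOIN master.sys.certificates c
   ON c.thumbprint = dek.encryptor_thumbprint;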

Many articles will tell you that you also need to migrate the DMK from your old instance, and some will say that you also need the SMK. This is simply not correct – I mean, it will work if you do that (in the right order) but it isn’t necessary.

And in some scenarios it may simply not be possible. Imagine that the instance you are migrating to is already using encryption based on its current SMK and DMK, if you replace those with the ones from the instance you are migrating from then you are going to break existing stuff.

So all you need to restore a database protected by TDE is the database backup, the certificate/private key, and the password specified when the certificate and key were backed up.

Equally that’s all someone else needs too, so make sure those things are protected.

In my next post we’ll look at a possible way of hacking TDE and the additional steps you need to protect yourself to make sure TDE is giving you the level of protection you expect. When that post is live you should be able to see a link to it in the comments.

Implementing Temporal Tables Where You Have Existing Data

In my post about auditing tools in SQL Server I mentioned a few tools we have at our disposal for capturing the change history of data.

One was Change Data Capture, which I went into a bit more detail about in Setting up Change Data Capture (CDC).

I also referred to Temporal Tables which are available to us in the brave new world of SQL 2016.

Temporal tables aren’t just about monitoring change, they also provide really nice methods for being able to query historical data – to see what the values were at a particular point in time e.g.

SELECT * FROM dbo.SomeData FOR SYSTEM_TIME AS OF '1900-01-01';

This makes them easy to work with, but also opens up other applications for them over and above auditing purposes – for instance being able to maintain configuration data that changes over time, but where we need to reference what the value was historically.
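
As well as the AS OF query above, you can pull back the full history of a row over a period – a quick sketch, assuming the dbo.SomeData table we create below (the dates are just examples):

SELECT Id, SomeText, ModifiedBy, StartDate, EndDate
FROM dbo.SomeData
FOR SYSTEM_TIME BETWEEN '2017-01-01' AND '2018-01-01'
WHERE Id = 2
ORDER BY StartDate;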

I wanted to look in more detail about the process for changing an existing table so that it becomes a temporal one.

Let’s start by quickly looking at how we would define a temporal table for a fresh deployment:

CREATE TABLE dbo.SomeData(
   Id INT IDENTITY(1,1) NOT NULL
      CONSTRAINT PK_SomeData PRIMARY KEY CLUSTERED,
   SomeText VARCHAR(1000) NULL,
   ModifiedBy VARCHAR(128) NULL,
   StartDate datetime2(7) GENERATED ALWAYS AS ROW START NOT NULL,
   EndDate datetime2(7) GENERATED ALWAYS AS ROW END NOT NULL,
   PERIOD FOR SYSTEM_TIME (StartDate, EndDate)
)
WITH (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.SomeData_History));

There are a few things here additional to a normal table definition. First thing to note is we need two columns to support the temporal versioning. I’ve called my columns StartDate and EndDate; these both have to be of DATETIME2 datatype to the highest precision (100 nanoseconds). These are specified as GENERATED ALWAYS AS ROW START/END so SQL will take care of assigning values when new rows are inserted or updated.

I’ve specified the PERIOD FOR SYSTEM_TIME, so SQL knows that these are the fields to use for the temporal versioning.

Finally I’ve specified SYSTEM_VERSIONING for the table and what the history table should be called.

When I execute this you can find the table and corresponding history table displayed in SSMS as below:
[Image: SSMS Object Explorer showing dbo.SomeData with its dbo.SomeData_History table]
We have the table itself, and also the history table has been created for us. This is where the previous versions of rows of data will be stored.

So that’s pretty straightforward – but what if you want to change an existing table?

First of all let’s drop this one:

DROP TABLE dbo.SomeData;

Unfortunately that gives me an error message:
Msg 13552, Level 16, State 1, Line 90
Drop table operation failed on table ‘TestAudit.dbo.SomeData’ because it is not supported operation on system-versioned temporal tables.

As the message says, you can’t just drop Temporal tables. First you must turn system versioning off, then you can go ahead:

ALTER TABLE dbo.SomeData SET (SYSTEM_VERSIONING = OFF); 
DROP TABLE dbo.SomeData;

Dropping the table though leaves the History table behind – now seen as a normal table:

[Image: SSMS showing dbo.SomeData_History left behind as a standalone table]

That’s handy as it means you don’t lose the history when the main table is deleted. For now though let’s just delete that as well:

DROP TABLE dbo.SomeData_History;

Now let’s create a new table that will be our “existing table” we want to change. For this example I’ll create a test table with a trigger that maintains a modified date, and throw in 1000 rows:

CREATE TABLE dbo.SomeData(
   Id INT IDENTITY(1,1) 
      CONSTRAINT PK_SomeData PRIMARY KEY CLUSTERED NOT NULL,
   SomeText VARCHAR(1000) NULL,
   ModifiedBy VARCHAR(128) NULL,
   ModifiedDate DATETIME NOT NULL 
      CONSTRAINT DF_SomeData_ModifiedDate DEFAULT (GETUTCDATE())
);
GO

CREATE TRIGGER dbo.TR_SomeData_ModifiedDate
   ON dbo.SomeData 
   AFTER UPDATE
AS
BEGIN
   UPDATE dbo.SomeData 
   SET ModifiedDate = GETUTCDATE()
   FROM dbo.SomeData 
   INNER JOIN Inserted
      ON dbo.SomeData.Id = Inserted.Id;
END;
GO

INSERT INTO dbo.SomeData(SomeText, ModifiedBy)
SELECT TOP 1000 'Blah', 'Me'
FROM sys.objects a CROSS JOIN sys.objects b;

I’m going to complicate things and say that when I turn this into a temporal table I want to keep the existing ModifiedDate column and use that as the start date for my system versioning.

The first thing you might try is to alter the column:

ALTER TABLE dbo.SomeData
ALTER COLUMN ModifiedDate DATETIME GENERATED ALWAYS AS ROW START NOT NULL;

Error:
Msg 13589, Level 16, State 1, Line 44
Column ‘ModifiedDate’ in table ‘TestAudit.dbo.SomeData’ cannot be specified as ‘GENERATED ALWAYS’ in ALTER COLUMN statement.

So I can’t do this in an alter statement. What I’ll actually have to do is create a new column specified to work with the system versioning, copy the data over, drop the old column and rename the new one.

First let’s create the additional columns and define the PERIOD FOR SYSTEM_TIME:

ALTER TABLE dbo.SomeData
ADD 
StartDate datetime2 GENERATED ALWAYS AS ROW START NOT NULL
   CONSTRAINT DF_SomeData_StartDate DEFAULT (GETUTCDATE()),
EndDate datetime2 GENERATED ALWAYS AS ROW END NOT NULL
    CONSTRAINT DF_SomeData_EndDate DEFAULT ('9999-12-31 23:59:59.9999999'),
PERIOD FOR SYSTEM_TIME (StartDate, EndDate);

Note that I’ve had to specify defaults for the new columns so that the existing data gets values populated (I didn’t have to do that when I simply defined the Temporal Table from scratch).

For the StartDate I’ve specified the current UTC date – this could be any date and time that’s not in the future, though note it should be a UTC one. If I’d tried to use GETDATE, as I’m currently on British Summer Time, I would get the following error:
Msg 13542, Level 16, State 0, Line 51
ADD PERIOD FOR SYSTEM_TIME on table ‘TestAudit.dbo.SomeData’ failed because there are open records with start of period set to a value in the future.

For the end date I have put the maximum value a DATETIME2 column can hold, this is required and is a way of saying that this record is (at the moment) valid indefinitely. If I try to put in an earlier date I receive this error:
Msg 13575, Level 16, State 0, Line 51
ADD PERIOD FOR SYSTEM_TIME failed because table ‘TestAudit.dbo.SomeData’ contains records where end of period is not equal to MAX datetime.

Now I’ve got the columns I could try copying my data over:

UPDATE dbo.SomeData SET StartDate = ModifiedDate;

I get another error:
Msg 13537, Level 16, State 1, Line 70
Cannot update GENERATED ALWAYS columns in table ‘TestAudit.dbo.SomeData’.

It turns out you can get around this by temporarily dropping the period for system time:

ALTER TABLE dbo.SomeData DROP PERIOD FOR SYSTEM_TIME;
UPDATE dbo.SomeData SET StartDate = ModifiedDate;
ALTER TABLE dbo.SomeData ADD PERIOD FOR SYSTEM_TIME (StartDate, EndDate);

Your logical thought here might be that I didn’t need to create the period for system time when I altered the table to add the new columns. I could have just added the columns, migrated the data, then only added the period afterwards. Trust me, I tried that too:
Msg 13509, Level 16, State 1, Line 51
Cannot create generated always column when SYSTEM_TIME period is not defined.

From here it turns out to be reasonably clear sailing.

Let’s run the final command to make the table Temporal:

ALTER TABLE dbo.SomeData 
SET (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.SomeData_History));

Here’s what it looks like in SSMS:
[Image: SSMS showing the converted dbo.SomeData temporal table and its history table]
You can see this is the same as the original fresh creation of a temporal table. The history table has been created for us. Just with an additional column in both tables.

Let’s remove the extra column and do the final rename:

ALTER TABLE dbo.SomeData DROP CONSTRAINT DF_SomeData_ModifiedDate;
ALTER TABLE dbo.SomeData DROP COLUMN ModifiedDate;
EXEC sp_rename 'dbo.SomeData.StartDate','ModifiedDate','COLUMN';

I was amazed the sp_rename didn’t give me an error. In fact if we look at the table again:
[Image: SSMS showing the renamed ModifiedDate column in both the main and history tables]
You can see the column has been renamed in both the main table and the history table. Pretty neat!

I have at this stage forgotten to do one thing. Let’s just try to update a record and see what happens:

UPDATE dbo.SomeData SET SomeText = 'Wibble' WHERE Id = 2;

Msg 13537, Level 16, State 1, Procedure TR_SomeData_ModifiedDate, Line 7 [Batch Start Line 185]
Cannot update GENERATED ALWAYS columns in table ‘TestAudit.dbo.SomeData’.
The statement has been terminated.

Yep, don’t forget to remove your trigger!

DROP TRIGGER dbo.TR_SomeData_ModifiedDate;

And now we’re all done.
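
As a final sanity check (just a quick sketch), you can update a row and confirm that the previous version now appears in the history table with the renamed column:

UPDATE dbo.SomeData SET SomeText = 'Wibble' WHERE Id = 2;

SELECT Id, SomeText, ModifiedBy, ModifiedDate, EndDate
FROM dbo.SomeData_History
WHERE Id = 2;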


What if I use SQL Server Data Tools (SSDT) to make my change?

If you’re working with Database Projects in Visual Studio then you will be used to publishing changes straight from there – or deploying them using a DACPAC file.

In that case SSDT handles all this for you – though it does use a slightly different method. Rather than creating a new column, copying that data over, dropping the old column and renaming the new one, it does it at the table level.
That’s the same way some other changes are made with SSDT (or if you do them through the GUI in SSMS). The steps are broadly the same, just with tables instead of columns:

1) Create new temporal table
2) Turn SYSTEM_VERSIONING off
3) Copy across the data from the old table to the new
4) Turn SYSTEM_VERSIONING back on
5) Drop the old table
6) Rename the new table to the original name

If you’re doing this by “hand”, you could do it this way if you prefer, but it is probably a heavier operation that will require more down time. For instance you’ll need the whole table to be locked while the operation progresses as you don’t want to risk data changes once you’ve started copying the data over. The column method will also lock the table, but potentially there is less to update/copy so it should hopefully be quicker.

As a final point I should note that you may run into problems dropping the trigger with SSDT, it can be a little quirky around doing that. For the moment I’ll leave off from going into that in more detail – but if you get that issue feel free to give me a yell!

Setting up Change Data Capture (CDC)

As mentioned in my post Auditing Data Access in SQL Server for GDPR Compliance CDC can be a useful tool for capturing a change history for specific data – which is something we might be thinking about a bit more now the GDPR is heading our way.

It’s pretty straightforward to set up, and can be done easily against existing tables/data – which is handy.

As mentioned in the previous post though, you do need to be on Enterprise Edition – or on SQL 2016 SP1 or later, where Standard Edition is also supported – to use it.

Let’s take a quick look at how you go about setting it up.

First let’s create a table that has some data we want to track:

IF OBJECT_ID('dbo.CDC') IS NOT NULL DROP TABLE dbo.CDC;
CREATE TABLE dbo.CDC 
(
    Id INT IDENTITY(1,1) CONSTRAINT PK_CDC PRIMARY KEY CLUSTERED, 
    SomeText VARCHAR(1000), 
    ModifiedBy VARCHAR(128), 
    ModifiedDate DATETIME
);

Next you have to enable CDC for the database:

EXEC sys.sp_cdc_enable_db;

Then you have to tell SQL to track changes for the specified table:

EXEC sys.sp_cdc_enable_table 
   @source_schema = N'dbo', 
   @source_name = N'CDC', 
   @role_name = NULL;

Note the final parameter @role_name in the above stored procedure. This allows you to specify a database role that will be used to limit access to the change history. I’ve just set this to NULL, which means that no additional role is required. There are also other parameters you can specify which you will find defined here:
https://docs.microsoft.com/en-us/sql/relational-databases/system-stored-procedures/sys-sp-cdc-enable-table-transact-sql

Once you’ve followed those few steps CDC is up and running. The change table can be found under system tables in your database, in the CDC schema, and has the name of the original table with “_CT” appended:

[Image: the cdc.dbo_CDC_CT change table shown under System Tables in SSMS]

You can see some additional columns. Rather than capturing the time of the change we have the Start LSN which is the Log Sequence Number associated with the change. This isn’t always that useful to us, which is why it is important for you to capture the time yourself – i.e. the ModifiedDate column.
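
That said, if you ever do need an approximate change time and only have the LSN, it can be mapped back to the commit time of the transaction – a sketch against the change table we look at below:

SELECT ct.*,
   sys.fn_cdc_map_lsn_to_time(ct.__$start_lsn) AS ApproxChangeTime
FROM cdc.dbo_CDC_CT ct;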

Also of interest to us is the _$operation column which tells us what operation was logged:
1 = delete
2 = insert
3 = update (old values)
4 = update (new values)

As you can infer from the above, when we do an update we will have a record for both the before and after value.

_$update_mask could be of use to us in some scenarios as it contains a bit mask based on the column ordinals of the table to show us which columns have changed.
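
As a sketch of how you might use the update mask – checking whether the SomeText column was changed in each update, and assuming the default capture instance name of dbo_CDC for our table:

DECLARE @SomeTextOrdinal INT = sys.fn_cdc_get_column_ordinal('dbo_CDC', 'SomeText');

SELECT __$operation,
   sys.fn_cdc_is_bit_set(@SomeTextOrdinal, __$update_mask) AS SomeTextChanged
FROM cdc.dbo_CDC_CT
WHERE __$operation = 4; -- after-image of updates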

Let’s quickly insert and update some data so we can see an example of what we get in the change table:

INSERT INTO dbo.CDC(SomeText, ModifiedBy, ModifiedDate)
SELECT 'SomeText', 'Matt', GETDATE();

UPDATE dbo.CDC 
SET 
SomeText = 'ANewValue', 
ModifiedBy = 'StillMatt', 
ModifiedDate = GETDATE();

SELECT * FROM cdc.dbo_CDC_CT;

Here’s the output from the select:

CDC_ChangeTable_data

You can see we have one row showing the initial insert (_$operation = 2), one row showing the values before the update (_$operation = 3) and one row showing the values after the update (_$operation = 4).

If you run this test yourself you might find it takes a few seconds for the changes to populate as there can be a lag.
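
Rather than querying the change table directly, you can also use the table-valued function that CDC generates for each capture instance – a sketch using the LSN range helper functions:

DECLARE @from_lsn BINARY(10) = sys.fn_cdc_get_min_lsn('dbo_CDC');
DECLARE @to_lsn BINARY(10) = sys.fn_cdc_get_max_lsn();

SELECT *
FROM cdc.fn_cdc_get_all_changes_dbo_CDC(@from_lsn, @to_lsn, N'all');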

So how does it work?

CDC works by parsing your transaction log. It looks for changes relating to the tables it is enabled for and then writes those changes off to the change table. Doing things this way means that it doesn’t have to be a synchronous operation, i.e. it can process the transaction log well after your transaction has committed. So your DML (update/insert/delete) operation is not delayed.

The downside of that is that it does have to go through the whole log (though just the part since it last ran) and work out what has occurred, when we look at performance we’ll see that this seems to create additional work over other synchronous change capture operations. However that work happens in the background and can be limited by the settings you use to configure the CDC jobs, so it may be acceptable or even preferable to a direct increase in the transaction time for your data modifications. Like anything in life it’s a trade-off and you’ll need to decide what works best for you.

Other issues can be around log activity and size. Implementing CDC means that you will get increased read activity on your I/O subsystem where the log is stored for the given database(s). So if implementing CDC it will be good to monitor this to make sure your disks can cope. Also the logfiles themselves are likely to get bigger. This latter point is for two reasons, one is that the CDC capture is itself a logged process, the other is that operations that would normally truncate the log or allow space to be re-used cannot do so until CDC has processed that section of the log. So if latency builds up in the CDC capture operations log entries are likely to be retained for longer.

CDC is actually managed by a couple of SQL Agent jobs. Two for each database that has CDC enabled. It’s useful to understand these for a few reasons. First of all, obviously if SQL Agent is not enabled then CDC is not going to work. Also you don’t want someone wondering what these jobs are and disabling them. Finally you may find you want to perform additional configuration – either for performance or data retention.

The jobs are:

cdc.{DatabaseName}_capture

This does the actual work by calling a series of system stored procedures that do the heavy lifting. The job doesn’t have a schedule, rather it is set running when you enable CDC and then generally continues based on the configuration.

There are 4 parameters that can control the capture:

@continuous – This is a bit value. 0 specifies that the capture will run once and then the job will terminate – this is only really useful in testing scenarios. 1 means that the job will run continuously which is the normal operation.

@maxtrans – This determines how many transactions will be read from the log and written to the change tables. This process is referred to as a scan cycle. Note this is done in a single transaction.

@maxscans – This sets how many scan cycles will be carried out before the job pauses.

@pollinginterval – This is the amount of seconds the job will pause before running through the specified number of scan cycles once more.

When you enable CDC these parameters get set to defaults – @continuous = 1, @maxtrans = 500, @maxscans = 10 and @pollinginterval = 5.

You can change any of these setting using the sys.sp_cdc_change_job stored procedure. Changes will be made for the job that relates to the current database context. You need to specify that you want to change the capture job, and new values for the parameters you want to change.

So, if I wanted to change @maxtrans to 1000 and increase the polling interval to 10 seconds I would run the following:

EXECUTE sys.sp_cdc_change_job   
    @job_type = N'capture',  
    @maxtrans = 1000,  
    @pollinginterval = 10;

The job must be restarted before the new settings take effect. You can do that using two more system stored procedures:

EXEC sys.sp_cdc_stop_job @job_type = 'capture';
EXEC sys.sp_cdc_start_job @job_type = 'capture';

Here is a Microsoft Whitepaper that goes into detail on tuning CDC using these parameters:
https://technet.microsoft.com/en-us/library/dd266396(v=sql.100).aspx

cdc.{DatabaseName}_cleanup

This is the other job involved in CDC that manages retention of the change data for you. If you are monitoring data that changes a lot then the change table can get large quickly so you need to consider this. This job runs on a schedule – the default is to execute every morning at 2AM. You can change this in the SQL Agent job itself.

There are two additional parameters that control this job:

@retention – the number of minutes that change rows are retained. The default is 4320 which is only 72 hours so if you are using this for audit then you need to up this considerably. The maximum is 100 years.

@threshold – the maximum number of rows that will be deleted in a single statement on cleanup

You configure these using the same stored procedure as before, just specifying the cleanup job. So if I want to change the retention to 1 year I’d call:

EXECUTE sys.sp_cdc_change_job   
    @job_type = N'cleanup',  
    @retention = 525600;

Again you need to stop and start the job before changes take effect.

If you want to check the current parameters for either job then the easiest way is to query the system table msdb.dbo.cdc_jobs:

SELECT * FROM msdb.dbo.cdc_jobs;


And that’s about it. There is of course more depth you can go into to understand CDC, but this should give you a good grounding to get started with confidence.

Auditing Data Access in SQL Server for GDPR Compliance

In relation to the GDPR, I’ve recently been looking at the tools available within SQL Server to support auditing our databases. Many of you out there will already have full-scale auditing solutions in place. For others though, the prompting of GDPR could be the first time you’re really thinking about it.

This post is a brief overview and comparison of the SQL features that might help you out, with some suggestions of what you might want to think about in terms of auditing.

The GDPR has clear requirements that we must be able to provide detail of changes to personal data. That means being able to say who changed an item of data, when they changed it, and what the before and after values were.

The regulation also states that we must ensure that data is only accessed by appropriate parties, and only for the explicit purposes that the person the data relates to has consented to. We may feel that this latter requirement is satisfied by our security processes, but in practice, if we wish to prove this then we may also want to audit all data access – including when data is simply read but not changed.

For auditing we need to consider all personal data held, whether that is data provided by an individual or data about an individual that supplements this. Auditing should therefore not be limited to the definition of “personal” within GDPR, namely Personally Identifiable Information (PII). We may also, for our own purposes outside of the GDPR requirement, wish to audit other information related to our business process – for instance where approvals are required.

In terms of individuals accessing or changing data there are several contexts of which we need to be aware:

  • Application users
  • Support access via specific tooling
  • Support access direct to the database
  • Illegitimate access via an application or tooling
  • Illegitimate access direct to the database

Summary of Tools available within SQL Server

The table below lists the technologies we will look at as well as in what versions/editions of SQL Server they are available. Some versions are omitted where the availability of features is the same as in the previous version:

[Image: table summarising the auditing technologies and the SQL Server versions/editions in which they are available]

Profiler Trace

SQL Server traces allow you to capture information about all access to the database/server. As such this technology is sometimes used as part of an auditing solution – in fact both the C2 Auditing and Common Criteria Compliance features utilise it in one way or another.

It is more useful, however, for capturing instance level security events (such as Logons and failed Logins) than for monitoring changes to individual data items.

There can also be quite an overhead to running traces, though this can be minimised if they are created as server-side traces.

Extended Events

Introduced in SQL 2008, Extended Events were intended as a lightweight replacement for Profiler Traces. Still, however, they would be more appropriate for capturing security events than for monitoring data access and change – although SQL Server Audit which we’ll look at later is implemented via Extended Events.

C2 Audit

This feature is supported in all versions and editions of SQL Server up to 2017, though it has been marked as deprecated for some time.

C2 was a rating established by the US Department of Defense applicable to security levels of computer systems, mainly focusing on access to resources. The C2 Audit within SQL Server is implemented via server-side traces and captures events such as:

  • Server shutdowns and restarts
  • Successful and failed logins
  • Successful and failed use of permissions when accessing database objects
  • Successful and failed use of permissions when running any command against the database server

It can be seen from the events captured that while this sort of data capture may be useful as part of an overall audit strategy, it won’t cover the audit of individual data required by the GDPR.

If implementing C2 Auditing on an instance, it is important to be aware that it will have some performance impact on a server as each event must be written to the audit file before the operation can complete. In fact, if the audit file cannot be accessed (or disk space runs out) then SQL Server will stop completely until that is resolved. However, it is a limited set of items being audited so the performance impact in general should not be too considerable.

Common Criteria Compliance

Available from SQL 2005 SP1 Enterprise, and in Standard edition from SQL 2008, Common Criteria Compliance is described by Microsoft as the replacement for C2 Audit.

Enabling this option configures a couple of other behaviours within SQL server to comply with the “Common Criteria for Information Technology Evaluation” International Standard.

From an auditing point of view, it enables a level of login auditing which can be viewed by querying sys.dm_exec_sessions for currently connected users.

This information includes:

  • Last successful login time
  • Last unsuccessful login time
  • Number of unsuccessful login attempts since last successful login
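
As a quick sketch, once the option is enabled those values can be seen for current connections like this:

SELECT login_name,
   last_successful_logon,
   last_unsuccessful_logon,
   unsuccessful_logons
FROM sys.dm_exec_sessions
WHERE is_user_process = 1;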

Triggers

Available in all editions/versions from SQL 2005. Triggers were also available before that, but additional events were added in 2005.

It is a common usage of Triggers to perform some level of database auditing, for instance to capture a modified datetime for a record, or even the identity of the logged-on user. They can also be used to capture more information about changes – for instance to maintain Audit tables as data is inserted, updated or deleted.

The main potential disadvantage of triggers for Auditing is performance. Technologies designed specifically for the purpose may be more lightweight.

Change Tracking (CT)

Available in SQL Server 2008 standard edition onward, CT is not designed to be used for Audit. Rather it is intended to support ETL processes by identifying changes to tables.

When enabled Change Tracking simply identifies that a given row has been inserted, updated or deleted. It doesn’t capture the content of the change, for instance that a given column was changed from value ‘A’ to value ‘B’. It doesn’t maintain a change history, so if I insert a record, update it five times, then delete it, when I check change tracking it will only tell me that it was deleted.
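
For reference, enabling CT is straightforward – a minimal sketch with hypothetical database and table names (the table must have a primary key):

ALTER DATABASE MyDatabase
SET CHANGE_TRACKING = ON (CHANGE_RETENTION = 7 DAYS, AUTO_CLEANUP = ON);

ALTER TABLE dbo.MyTable
ENABLE CHANGE_TRACKING WITH (TRACK_COLUMNS_UPDATED = ON);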

Change Data Capture (CDC)

Available in SQL Server 2008 Enterprise Edition onward, from 2016 SP1 CDC is also available in Standard Edition. Like CT, CDC is not intended as an audit technology – however due to the level of information captured it can be useful in that context.

As well as identifying that data has changed, CDC captures the actual changes and maintains the history in a separate table. Unlike a similar hand-cranked solution using triggers, it manages this asynchronously so has only light performance overhead.

One of the key things missing from CDC from an Audit perspective is that while it captures the data change, it doesn’t also record the Login identity making the change or the datetime for the change.

A Connect item was raised to change this; however, the Microsoft response was to reject it as by design, since CDC “is not designed to be an auditing feature”. Microsoft’s recommendation is to use SQL Server Audit instead:

https://connect.microsoft.com/SQLServer/feedback/details/283707/cdc-options-to-capture-more-data-username-date-time-etc

Despite this, if the table itself has columns to capture the user identity and the modified datetime, and the application ensures these are set, then these columns will be included in the change data capture – in which case a meaningful audit of changes to data is provided.

CDC can be enabled on a per table basis, and you can configure which security role(s) have access to the change log. You can also define the retention period which avoids you having to implement a separate solution for cleaning up old audit data.

One thing to bear in mind is that due to the asynchronous process involved to log changes (which uses SQL Server Agent) there is no transactional guarantee that changes will reach the capture log, though the risk of data loss is likely to be minimal.

SQL Server Audit

Available in SQL Server 2008 Enterprise edition; server-level auditing came to Standard edition in 2012, and all features are available in all editions from 2016 SP1.

SQL Server Audit is the out of the box feature that is intended for the purposes of Audit, so logically we may think it is the best one for us to be using. Certainly, it is what Microsoft would recommend for auditing within the database.

Auditing is implemented using Extended Events and there are many Audit events you can enable it to capture. Events are either logged to a file, or to the Windows Security log where it is more difficult to tamper with them. You can then either view the events directly in that destination or through a GUI within SSMS.

Depending on the scope of the events monitored, you can choose to Audit either at an object level (for instance table by table) or at schema or database level.

In the context of GDPR we may choose to Audit data access – insert, select, update and delete. What this sort of Audit will give us is not, as we might expect, the ability to view who accessed or changed specific data. Rather it captures the queries being executed against the objects monitored.

As such it’s probably not the best method for being able to track a change history for a piece of data, though that could be done in theory by analysing the individual queries. Equally, if we want to know who read a specific record, this would require analysis of the queries executed.

Because of that, while SQL Server Audit may form an important part of an overall Audit strategy, it may not, on its own, give us what we need.

It does however allow us to capture all queries that have, for instance, touched certain tables. As such it is the only technology available to use within SQL Server that may allow us to monitor for abuse of privileges. For instance, I may be the DBA and require admin rights over the SQL Server instance, with SQL Server Audit the queries I execute against tables containing personal information would be captured and any dubious querying patterns could be identified.

An important part of the use of SQL Server Audit is going to be proactively monitoring the logs to make sure data access is being carried out appropriately.
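
As a rough sketch of what that kind of setup might look like – the audit name, file path, database and schema here are all hypothetical:

USE master;
CREATE SERVER AUDIT Audit_PersonalData
TO FILE (FILEPATH = 'D:\Audit\');
ALTER SERVER AUDIT Audit_PersonalData WITH (STATE = ON);
GO

USE MyDatabase;
CREATE DATABASE AUDIT SPECIFICATION AuditSpec_PersonalData
FOR SERVER AUDIT Audit_PersonalData
ADD (SELECT, INSERT, UPDATE, DELETE ON SCHEMA::dbo BY public)
WITH (STATE = ON);
GO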

Temporal Tables

Temporal tables were introduced in SQL Server 2016 and are a way of easily tracking changes to data – as well as being able to easily query what the value was at a given point in time.

Like CDC they do not capture the user context making the data change, so this must be handled by additional application logic if we require it.

The logging of change data for temporal tables is a synchronous operation performed as part of the transaction that changes data in the main table. Microsoft documentation however states that this creates negligible overhead for inserts, and while there is greater overhead for updates and deletes this is still significantly less than a custom solution (for instance using triggers).

Temporal tables work by having two user tables for each table being tracked. One is the original table – known as the current table, and the other is the history table which includes start and end times for which a given version of the data was current. New commands for querying the state of data at a given point in time make this a seamless operation.

Application Audit

One issue with any auditing that we perform at the database level is that any well designed application, from a security context, is likely to be accessing the database via a service account, rather than application users being granted direct database access. As such it is not possible for pure database auditing to capture a meaningful user context where operations are conducted through the application.

For instance, it may be quite legitimate from the application for a user to update details relating to a customer. We need to maintain a record of those changes so we require the application to log the user context – usually into an “UpdatedBy” column.

There may also be the case where semi-structured, or unstructured data is stored in blob forms, such as XML, JSON or other document formats. In these cases, a database level audit could capture the before and after states of the data, but it would require additional analysis to identify what actual elements of the data within the document had changed. If we want to provide an easily digestible change history then we need to consider implementing this at an application or tooling level.

Finally, we may want to record user access and activities from the application. For instance, the date and time of connections as well as what was accessed.

What might a best practice SQL Data Access Audit solution look like?

Which (if any) of these technologies we use is going to depend very much on the versions and editions of SQL Server we have installed. In an ideal world, to make our lives easy we would all have Enterprise edition – or be on SQL 2016 SP1 or later – and have an extensive choice. In some cases, it may be that the access to additional security and audit components becomes the compelling reason for choosing Enterprise over Standard edition, or for upgrading to a later version of SQL Server.

C2 or Common Compliance

It makes sense to implement one of these, ideally Common Criteria Compliance where it is available, as this gives us increased security within SQL Server at little cost and with minimal effort to implement. Common Criteria Compliance is preferable both because C2 is deprecated and because a poorly managed C2 audit (for instance losing access to the audit file or disk) can create additional risks to application availability. Both offer additional logging of connections to our SQL Server boxes that may be useful in identifying those logging onto the server for nefarious purposes.

Change Capture

In particular for GDPR we will want to make sure we have some form of capture and logging of changes to personal data. We have a choice of technologies here.

Triggers

Triggers can provide a serviceable solution but the performance overhead is high so it may be better to use a built-in technology where one is available.

CDC

CDC provides a low overhead tool for managing change capture. However, it is only available in Enterprise Edition until SQL 2016 SP1.

Temporal Tables

These are the latest and greatest, so it may make sense to use where we are using SQL Server 2016, although as it is a synchronous operation it may have more of an immediate performance impact than CDC.

I hope to look at a performance comparison between these different methods of data capture in a later post.

Application Auditing

Whatever tool is used for change capture, application auditing is required to make sure the relevant user context is captured. It’s also worth considering logging user activity.

SQL Server Audit

For a best practice solution, where this is available to us it makes sense to use it. We can use it to capture data access even though the format is not going to be the easiest to analyse. We should also use it to capture events that could relate to anyone making changes to our Change Capture – if anyone makes changes to the Audit itself then this gets captured by the Audit, so we can detect tampering. If we make sure the Audit is saved to the Security log then that is also tamperproof. We can’t guarantee that we will capture all events, but we can reasonably guarantee that we will detect it if someone prevents us from doing so (e.g. by turning off Auditing or Change Capture).

Conclusions

There are a number of components you may want to implement in order to deploy an effective data access auditing solution. In some cases your choices may be limited by the versions and editions of SQL Server your platforms currently sit on.
Where the version and edition you are on does support these technologies, though, there is not a great deal of effort required to implement them.

In subsequent posts I’ll hope to show some simple examples in practice, as well as look briefly at the impacts on server performance.

Understanding Keys and Certificates with Always Encrypted

Always Encrypted on SQL 2016 is pretty easy to set up. There’s even a single wizard to guide you through the whole process. However, the best way to get into trouble with this technology is by not clearly understanding what is going on.

I personally find I understand things best when I am solid on the basics. It also means I can then usually logically extrapolate to answer any questions that come up.

So here we’ll go into a little detail about the keys and certificates involved…

To get up and running with Always Encrypted you need three things:

  • An Always Encrypted Certificate to store on your application server / client machine
  • A Column Master Key in your database
  • A Column Encryption Key in your database

You can actually create all of these through SQL Server Management Studio – in fact the first two items get created in the same step. So let’s run through that process, and we can look at what these items are, and what they get used for, in more detail as we go along.

Creating the Certificate and the Column Master Key

You can find Always Encrypted Keys under the Security folder for your database. Right-click over column master keys and select to create a new one:

SSMS_AE

And up comes the GUI:

AE_ColumnMasterKey

The Name field is the name for the Column Master Key within your database – you can see I’ve decided to call mine “Wibble”.

Beneath that you can select – or create – the certificate to be associated with the Key. You can see I already have a couple in the selected key store as I’ve been through this process before, and the selected certificate is highlighted. If you don’t have one, you can select from the “Key Store” drop down where you want the certificate to be stored, and then click the “Generate Certificate” button to generate a new certificate in that store.

It’s worth noting at this point that the certificate is created on your local machine – not the server hosting SQL Server. We’re used to SSMS being the tool through which we interact with and do things at the server end, so this can be a bit counter-intuitive, but in terms of what we are trying to achieve it makes sense. We want to create a certificate that exists on the client – so in this case that’s your local machine.

Click okay and the Column Master Key and certificate get created. You can see your new key in SSMS:

AE_Wibble

To understand what this key actually is, I’m going to right-click and generate the SQL definition:

CREATE COLUMN MASTER KEY [Wibble]
WITH
(
KEY_STORE_PROVIDER_NAME = N'MSSQL_CERTIFICATE_STORE',
KEY_PATH = N'CurrentUser/My/D492BA86737FECDABB14D13476122C1E4BD217D2'
)
GO

We only have two values, KEY_STORE_PROVIDER_NAME and KEY_PATH. All this is telling us is where to find the certificate we just created. So you can see that the Column Master Key is just a pointer to a certificate stored on client machines.

Creating the Column Encryption Key

This is the last thing we need before we can set up encryption on our columns. You can find “Column Encryption Keys” underneath “Column Master Keys” in SSMS. Again, right-click and bring up the GUI:

AE_ColumnKey

You can see I’ve called mine MyColumnKey, and I’ve selected from the drop down that it should be created using the Column Master Key I created earlier – “Wibble”. Click okay and the key gets created.

Again I’m going to generate the SQL definition for my new Column Encryption Key so we can see what’s inside it:

CREATE COLUMN ENCRYPTION KEY [MyColumnKey]
WITH VALUES
(
COLUMN_MASTER_KEY = [Wibble],
ALGORITHM = 'RSA_OAEP',
ENCRYPTED_VALUE = 0x016E000001630075007200720065006E00740075007300650072002F006D0079002F006400340039003200620061003800360037003300370066006500630064006100620062003100340064003100330034003700360031003200320063003100650034006200640032003100370064003200A6296213760796AA271152F6D1F1D8C5B4B84ADDB660EC700936B010918CF62FA63C9F716BCB4337D4D48E7F5ABB879F09EA0ABF2ABFDB54E4740DCD88631C533BE530A1CF491A63B0A334FBBA1971C4B22C18E7B8DFCB9975754F9232E47C5DED5857B6032D4BB95B3C1232D3168366B6A1DA4FFC637849FBC3226FFB78981BD1596DE0CD3ED894B127855FD93F3F40E6DB9602482A9AD69CE96C3D879F544534135FFCCA789CB69E3C52C3A7CCB3AFC55850DA7DD82BE963A36A3AC77ECB26CE77A01F3CC43CEA26757548785A4C5A538250C325CEB106495211A625CDAFF5E9EED96EDE3AFFC8E6518E0E29C4F796209EE5ED35B04257844E6ACF74B0EEF907F667D699FE01BCF29BDA275BC9260E03130DA7008F30CD6CBB12F60AD6B47E1B5338D18FAF50DD6EBE3712FC4E814E71CA1E4F605F97F6DB51B1BBB6819E5BF5E8DBB6EB453B268173DB0B69B6CE78A57BC7C98FBBC34BA205B7DB59FEECCEB235A38B089D31C4892FB870A305D5FB1FC69A91C009B9F9D2B572C0DA2DCAC76C6AF48DF54B3C84D7110BD903F06817FD9BC100773028F357966F029B89DF49B8564DD901349F1304124B3DC691A626568C30224766C18B631CD42097DCE0B927A9A401A51EFE05BD5DC6CB9F7F2F3166A5D2A1DECC7D060C26B41D42BF2EF9ACA5919568979F713EFA655EEEF2A4280060D0D73553633B0BEA4925E595805C
)
GO

We can see it tells us what Column Master Key this is based on, what Algorithm has been used, and a long binary Encrypted Value. So what is this?

What has happened is that in the background a value has been created that will act as the Column Encryption Key when MyColumnKey is used to encrypt (or decrypt) data in a column. That value has then itself been encrypted using the certificate that the Column Master Key is based on. The encrypted value of the Column Encryption Key is what then actually gets stored in the database.

That might seem overcomplicated. If we can encrypt data using the certificate, then why do we need a separate key in the database?

In actuality it’s very important that we have both. Imagine the scenario where we just use a certificate as the key (or just a single key in another form) and that expires, or worse there is a breach and it gets copied. In this case we are going to have to change our certificate or key. But if we change the key, how are we going to read the existing data in the table?

There is only one way, which is to decrypt all the existing data and then re-encrypt it with the new key. If you have tens of millions of records this might take a while, and it will certainly require the application to be taken offline. Also, while this process is being undertaken, the data is unencrypted and at risk.

With the approach taken by Always Encrypted, we have a way around this. Let’s say we have to replace our certificate. All we need to do is create a new COLUMN MASTER KEY based on the new certificate, then we can decrypt our COLUMN ENCRYPTION KEY’s ENCRYPTED_VALUE using the old certificate and re-encrypt that value with our new certificate. Because the actual key value itself hasn’t changed, we can continue to read the existing data using the new certificate.

In practice this is done for you in the background through a process called “Key Rotation”. A given COLUMN ENCRYPTION KEY can actually support having two COLUMN MASTER KEYs at the same time – each of which has its own encrypted version of the same underlying key. This allows the process to add a new master key and associated encrypted value before removing the old one – which allows the key rotation process to occur fully online.

There is (understandably) a performance overhead to having two keys at once, so avoid doing this other than for the purposes of key rotation.

I’m not going to go any further into the process of Key Rotation here, but it is generally seen as good practice to retire and refresh your certificates regularly – when you create them through SSMS they get a default expiry date one year in the future. So if you are implementing Always Encrypted, it is important to be confident you know how you will approach things when your certificate expires.
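
For illustration, the T-SQL that rotation runs under the covers looks something like the sketch below. The new Column Master Key name and the binary value are placeholders – in practice SSMS (or the SqlServer PowerShell module) generates the new encrypted value for you using the new certificate:

-- Add a second value to the existing key, encrypted under a new Column Master Key
ALTER COLUMN ENCRYPTION KEY [MyColumnKey]
ADD VALUE
(
    COLUMN_MASTER_KEY = [MyNewCMK],       -- placeholder name
    ALGORITHM = 'RSA_OAEP',
    ENCRYPTED_VALUE = 0x016E000001        -- placeholder: generated client-side from the new certificate
);
GO

-- Once all clients have the new certificate, drop the value tied to the old one
ALTER COLUMN ENCRYPTION KEY [MyColumnKey]
DROP VALUE
(
    COLUMN_MASTER_KEY = [Wibble]
);
GO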

So how do the keys get used in practice?

When you run a query over a connection with column encryption enabled, the following steps occur:

  1. The client sends a preliminary call to the SQL Server Instance containing the query text. In this call it is asking for details of potential encryption on any columns involved in the query and what keys may be involved.
  2. If there are encrypted columns involved, SQL supplies that information, as well as the Column Master Key (which tells the client where to find the relevant certificate locally), and the encrypted value of the Column Encryption Key for each encrypted column.
  3. Using the information and keys, the client can then encrypt any values it needs to before sending them to the database – for instance, if it is inserting data into, or searching for a particular value in, an encrypted column.
  4. The modified query is now sent to SQL for execution.
  5. SQL returns any results – with values in encrypted columns remaining in their encrypted form.
  6. If a resultset is returned, the client uses the information and keys supplied in step 2 in order to decrypt any encrypted values.

So you can see there are a few extra steps over and above your normal query execution, and that does create some performance overhead.
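
To put the column side of that in context, this is roughly how a column encrypted with MyColumnKey would be declared – the table is hypothetical, and note that deterministic encryption requires a BIN2 collation on character columns:

-- Hypothetical table with one column encrypted under MyColumnKey.
-- DETERMINISTIC allows equality lookups; RANDOMIZED is stronger but can't be searched.
CREATE TABLE dbo.Customer
(
    CustomerId INT IDENTITY(1,1) PRIMARY KEY,
    NINumber   NVARCHAR(9) COLLATE Latin1_General_BIN2
        ENCRYPTED WITH
        (
            COLUMN_ENCRYPTION_KEY = [MyColumnKey],
            ENCRYPTION_TYPE = DETERMINISTIC,
            ALGORITHM = 'AEAD_AES_256_CBC_HMAC_SHA_256'
        ) NULL
);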

Final thoughts

Once you get into the business of having encryption keys then the process by which you manage them is going to be critical. If they are too easy for anyone in your organisation to access, then you may sabotage your efforts in encrypting the data in the first place.

As a minimum, you are going to want different keys in your production environments from those you have in test and development.

Think about how you are going to manage that through your deployment processes. Do you want to generate new keys and certificates with fresh deployments? In which case, where are they backed up, and who has access to them? You want to be absolutely sure you can recover the certificate your Column Master Key is based on if it is lost from your production environment. Otherwise your encrypted data is gone for good.

If you are shipping a product to a client with Always Encrypted enabled – do they have the knowledge and processes to manage the keys moving forward?

And what is your plan for key rotation? How will the right people be reminded before certificates expire that they need to do something about it?

This is a great technology, and Microsoft has made it as simple as possible to use in practice. Nonetheless, it does need to be used with care.

SQL Server 2016 – Always Encrypted and the GDPR

The European General Data Protection Regulation (GDPR) is coming, bringing new rules about the protection of Personally Identifiable Information (PII).

For those of us in Europe – or wishing to sell software products and solutions within Europe – this is a big deal. It’s going to mean an increased focus on data protection, and the potential for large penalties in the case of data breaches. Those penalties are also increased if we can’t show that we have taken sufficient measures to protect that data.

This affects us where we host applications and data for our clients, but also when we sell software to clients to use on-premise. They are going to want to be sure that the data in those applications is as secure as possible.

Security infrastructure and data governance are going to be of critical importance to everyone, but for software vendors it seems likely that Encryption is going to be a keyword.

In terms of the UK and Brexit, the British government has confirmed that the Brexit process will not affect the commencement of the GDPR in the UK.

The new regulations come into effect in the UK on the 25th May 2018. In the world of development lifecycles, roadmaps and deployments, that’s not a massive amount of time. So wouldn’t it be nice if there was something reasonably easy to implement that we could use to protect our clients’ data?

Transparent Data Encryption (TDE)
In SQL Server we already had Transparent Data Encryption.

TDE works by encrypting all the data for a database on disk – so if someone steals your data files or backup files, they can’t read your data.
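
As a reminder of roughly what that setup looks like (a minimal sketch – the certificate name, database name and password are placeholders, and the certificate must be backed up somewhere safe):

USE master;
GO
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<StrongPasswordHere>';  -- placeholder
CREATE CERTIFICATE TDE_Cert WITH SUBJECT = 'TDE certificate';
GO

USE MyDatabase;  -- hypothetical database
GO
CREATE DATABASE ENCRYPTION KEY
WITH ALGORITHM = AES_256
ENCRYPTION BY SERVER CERTIFICATE TDE_Cert;
GO
ALTER DATABASE MyDatabase SET ENCRYPTION ON;
GO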

What TDE doesn’t do is keep data encrypted while it’s in memory or being passed across the network to a requesting application. Also, if you have been granted access to the database then you will be able to read the data with no issues. That may not seem like much of a problem if you have your security locked down correctly, but usually you will at least have a DBA with system admin rights over the instance – and that DBA could abuse their vast privilege and power!

TDE is also an Enterprise edition only feature. To put that in context: if you currently have an application running on Standard edition, fully provisioned with the 24 cores Standard edition allows, and you want to implement TDE, it would cost you $5,000 per core to upgrade to Enterprise – that’s $120,000 – before you can go ahead.

Enter Always Encrypted
Always Encrypted offers a solution to these issues. You can choose to encrypt data on a column by column basis. The data is encrypted by the application before being sent across the network and remains encrypted as it is stored in SQL Server (both in memory and on disk). When it’s retrieved, it remains encrypted until it gets to the client application where it can be decrypted and used.

This is achieved by use of an Always Encrypted Certificate which is only stored on the application server, i.e. the database server has no access to it. Even under the covers SQL Server can’t read your encrypted data without the certificate.

So pesky DBA – your evil plans are foiled!

Seriously though, it’s unfortunately rather common for people to have levels of access to databases that they don’t need and shouldn’t have. Or worse, for access to be granted to people through the use of the built-in sa account – whose password is rarely changed.

To use Always Encrypted you need to be on SQL Server 2016 (or higher), and your application must be using at least version 4.6 of the .NET Framework.

From SQL 2016 SP1, Always Encrypted is a feature available in the Standard and Express editions – so that makes it free for users of those editions, not just those on Enterprise.

As I’ve said about a few of the other new features in 2016 – this isn’t rocket science, and it isn’t an amazing new technology that’s going to take us to the stars. What it is, though, is easy to use, effective, and minimal effort to implement. Once set up, it also doesn’t require developers to remember to do anything – like sending data through a third-party library – to ensure the data stays encrypted. In terms of hitting an imminent deadline for data protection without vast expense, this could be rather handy.

There are of course limitations, impacts and a few things in general that are worth understanding.

Initially I planned just to write a single post on this subject. It was quite straightforward to go through the process of setting up an Always Encrypted test – but once that was done I found I started to ask myself a lot of “what ifs”. I then found that by delving deeper, and understanding what was actually going on with Always Encrypted in a bit more detail at a lower level, that these questions started answering themselves, and that some of the limitations of Always Encrypted made perfect logical sense.

So we’ll have a look at some of that in the next few posts.