Recovering a TDE protected database without the Certificate

If you’ve been careful and done everything right when you’re setting up TDE then you shouldn’t run into this problem.

We all make mistakes though, and we’ve all been asked to deal with environments that haven’t been so carefully managed.

But what if you do run into it? You have access to the backups for one or more TDE protected databases, but you don’t have the certificate and private key backups – or you don’t have the password to decrypt them.

I’m assuming here that you also can’t simply recover your old server from a complete file system backup. Obviously if you can do that then you are going to be fine.

If the two following things are true, then you can still recover your database:

  • You have a backup of the master database from the previous instance.
  • The previous instance used a domain account as its service account.

The reason you are going to be okay is that all the objects in the SQL Server encryption hierarchy that sit above the Database Encryption Key (which lives in your TDE database) are stored in the master database. That holds right up to the top: the Service Master Key (SMK), which exists in the master database, is itself stored encrypted. There are two copies of it:

  • One encrypted by the machine account
  • One encrypted by the SQL Server service account

The first copy is only going to be of any use to us if we can recover the old machine (and its account) directly from backups, but we’ve already ruled that out.

If the service account is a domain account though then we should be able to use it.
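As an aside, the SMK is visible (as metadata only – not its value) in the master database, if you want to confirm what exists on an instance:

-- The SMK shows up as a symmetric key with a fixed name
SELECT name, create_date, modify_date
FROM master.sys.symmetric_keys
WHERE name = '##MS_ServiceMasterKey##';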

The method is going to involve:

  • Setting up a new SQL instance using the same service account as the old instance
  • Restoring your backup of master from the old instance onto the new instance
  • Rebooting your new server – that’s the whole server, not just SQL
  • Backing up your certificate and private key – and don’t lose them this time!

My personal opinion is that it’s not the greatest of ideas to restore the master database from one instance onto another and expect everything to simply work. So I’m only suggesting you use this so you can recover the certificate. Once you’ve got that, I would go back to the steps in the previous post [LINK] for recovering your TDE protected database(s).

That’s fairly straightforward, but let’s just go into a little more detail.

Setting up a new SQL instance using the same service account as the old instance

What this obviously means is that your new server must be on the same domain as the old server (or at least another domain that is trusted). You must also have the credentials for the service account.

You can’t fake this – for example, by setting up a new account on another domain with the same name as the old one. The new account won’t have the same keys associated with it as the ones used to encrypt your SMK, so you will achieve nothing.

Restore your backup of master from the old instance onto the new instance

There are a lot of resources out there that tell you how to do this in detail, such as Thomas LaRock’s post here:

https://thomaslarock.com/2014/01/restore-the-master-database-in-sql-server-2012/

In short, you need to first stop your new SQL Server instance, and then from a command prompt start it in single-user mode, e.g.

sqlservr.exe -c -m -s {InstanceName}

Then you need to (again from a command line) issue the command to restore/overwrite the master database. First start SQLCMD:

C:\> sqlcmd -S {InstanceName}

Then at the prompt that opens up within your command window:

1> RESTORE DATABASE master FROM DISK = 'C:\Test\master.bak' WITH REPLACE;

2> GO

Reboot your new server – the whole server, not just SQL

If you restart SQL before doing this, you can still go in and everything looks okay. You can even restore a TDE database from your old instance and you’ll find you can access the data.

Everything is not okay though, and if you tried to back up your certificate and private key you would get an error:

Msg 15151, Level 16, State 1, Line 7
Cannot find the certificate 'MyTDECert', because it does not exist or you do not have permission.

The reason for this error is that the SMK isn’t in the correct state. The copy that is encrypted by the service account is fine, but the copy that is encrypted by the machine account is currently using the wrong machine account. You need to reboot the whole server to fix this, just restarting SQL doesn’t do it. On a full restart the SMK is retrieved from the copy encrypted by the service account, and then encrypted with the current machine account. That version then replaces the one using the wrong machine account.

Once that’s done the encryption hierarchy is fully fixed, and the certificate becomes accessible for a backup command.

Backup your certificate and private key – and don’t lose them this time

I’ve given the command to back these up a few times, but here it is again:

BACKUP CERTIFICATE MyTDECert   
TO FILE = 'C:\Test\MyTDECert'  
WITH PRIVATE KEY   
(  
    FILE = 'C:\Test\MyTDECert_PrivateKeyFile',  
    ENCRYPTION BY PASSWORD = 'UseAStrongPasswordHereToo!£$7'  
);  
GO  

You can now take those backup files and use them to restore the certificate and key to the SQL Server instance of your choice, and then restore the backups of your TDE protected database(s).

Not losing these backups – or the password – is a serious issue. If you’re responsible for setting up any form of encryption you need to think about the process that’s going to manage the objects used to protect your data. People move from one role to another, from one company to another, and often things tick along happily for many years before a failure happens.

You need to be confident that come next year, or in five or ten years, whoever is responsible for the data will be able to recover it if the worst happens.

Other articles about TDE:

What is Transparent Data Encryption?

Setting up Transparent Data Encryption (TDE)

Encrypting an existing database with TDE

Understanding Keys and Certificates with Transparent Data Encryption (TDE)

How Secure is Transparent Data Encryption (TDE) – and How to Prevent Hacking

Migrating or Recovering a TDE protected Database

TDE and backup compression – still not working?

Migrating or Recovering a TDE protected Database

When encrypting a database with Transparent Data Encryption (TDE), a vital consideration is to make sure we are prepared for the scenario where something goes wrong. For instance, if the server hosting our SQL instance goes belly-up, can we recover the data that we have encrypted with TDE?

In the ordinary recovery scenario, we would make sure that we have appropriate backups of our database, and that they (or copies of them) are stored off the server itself so that we can access them in case of a failure.

If you have followed the instructions in Setting up Transparent Data Encryption (TDE) then you will also have a backup of the certificate and private key used to protect the database encryption key used by TDE, e.g.:

BACKUP CERTIFICATE MyTDECert   
TO FILE = 'C:\Test\MyTDECert'  
WITH PRIVATE KEY   
(  
    FILE = 'C:\Test\MyTDECert_PrivateKeyFile',  
    ENCRYPTION BY PASSWORD = 'UseAStrongPasswordHereToo!£$7'  
);  
GO  

You need to make sure that these are also stored securely off the server, and that you have kept the password you used somewhere you can access it – but not somewhere so accessible that unauthorised users can get it, otherwise you are defeating the object of TDE somewhat.

In summary you need:

  • The database backup file
  • The backup of the certificate
  • The backup of the private key
  • The password used to encrypt the private key

Armed with those objects, you are equipped to restore your database to another SQL Instance.

Working on the new SQL instance, the steps are straightforward.

Create a Database Master Key if one doesn’t exist

USE MASTER;
CREATE MASTER KEY
ENCRYPTION BY PASSWORD = 'UseAStrongPasswordHere!£$7';

Note that this will be a new and unique database master key; it will not be the same as the one you had on your old instance – and you don’t need to use the same password to protect it.

Restore the Certificate and Private Key

On the new SQL instance you need to restore the certificate and private key into the master database:

USE MASTER;

CREATE CERTIFICATE MyTDECert
FROM FILE = 'C:\Test\MyTDECert'
WITH PRIVATE KEY 
( 
   FILE = 'C:\Test\MyTDECert_PrivateKeyFile',
   DECRYPTION BY PASSWORD = 'UseAStrongPasswordHereToo!£$7' 
);

This will decrypt your key using the password supplied, and then re-encrypt it using the database master key you created. Then the certificate and its key will be stored in the master database on your new SQL instance.

If you’ve done something wrong, it’s entirely possible you may get an error at this stage, commonly:
Msg 15208, Level 16, State 6, Line 56
The certificate, asymmetric key, or private key file is not valid or does not exist; or you do not have permissions for it.

If you’re confident that all details specified are correct, and that the certificate and private key were backed up properly, then the most likely issue is that the current SQL instance doesn’t have access to the file path you’ve placed the files in.

Restore the Database

Once you’ve completed the previous steps you are ready to restore the database(s) from the backup(s). You do that as you would restore any other database. Potentially as simple as:

RESTORE DATABASE TestTDE FROM DISK = 'C:\Test\TestTDE.bak';
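If the file paths on the new instance don’t match the old one, you may need to tell the restore where to put the files. A minimal sketch – the logical file names here are assumptions, check yours with RESTORE FILELISTONLY:

-- See the logical file names contained in the backup
RESTORE FILELISTONLY FROM DISK = 'C:\Test\TestTDE.bak';

-- Restore, moving the files to new locations
RESTORE DATABASE TestTDE FROM DISK = 'C:\Test\TestTDE.bak'
WITH MOVE 'TestTDE' TO 'D:\SQLData\TestTDE.mdf',
     MOVE 'TestTDE_log' TO 'D:\SQLLogs\TestTDE_log.ldf';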

Then you’ll find you can access your database and view data without any issues. At this point you can celebrate – you are done.

You only get a problem if you haven’t set up the certificate and key correctly, or you have the wrong one:
Msg 33111, Level 16, State 3, Line 2
Cannot find server certificate with thumbprint '0x682C8797633B9AD8875967502861CCAE33ECAD66'.
Msg 3013, Level 16, State 1, Line 2
RESTORE DATABASE is terminating abnormally.
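If you hit that error, you can compare the thumbprint in the message against the certificates actually installed on the instance:

SELECT name, thumbprint
FROM master.sys.certificates;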

So what do I do if I can’t restore the certificate?

Of course you’re never going to run into this problem because you’ve followed all the instructions carefully, and you’ve made sure you have your certificate and key backups – and the password used to protect them.

Let’s say for the sake of argument though that you’ve taken ownership of an environment that hasn’t been so carefully managed. You’ve had a server failure and there are no certificate and key backups – or they exist but no-one knows the password.

Is your data lost forever? Or rather is it now so safe that no-one can access it – even those who are supposed to? Don’t panic just yet, we’ll look at a technique you may be able to use to recover your data in my next blog post.

Other articles on TDE:

What is Transparent Data Encryption?

Setting up Transparent Data Encryption (TDE)

Encrypting an existing database with TDE

Understanding Keys and Certificates with Transparent Data Encryption (TDE)

How Secure is Transparent Data Encryption (TDE) – and How to Prevent Hacking


Encrypting an existing database with TDE

As mentioned previously, the process of setting up TDE (Setting up Transparent Data Encryption (TDE)) is the same whether you’ve just set up a new database or whether you’re working with a live database. Once you turn encryption on, SQL Server will begin the process of encrypting any data in your database – be that one row or terabytes of data.

I’m going to load up my database with about 1 GB of data so we can get an idea of how long this process takes.

CREATE DATABASE TestTDE;
USE TestTDE;
CREATE TABLE dbo.SomeData(Id INT IDENTITY(1,1), SomeText VARCHAR(255));

INSERT INTO dbo.SomeData (SomeText) 
SELECT TOP 1000000 
('XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX')
FROM sys.objects a
CROSS JOIN sys.objects b
CROSS JOIN sys.objects c
CROSS JOIN sys.objects d;
GO 10

I then go through all the steps mentioned previously to set up encryption:

Setting up Transparent Data Encryption (TDE)

After the last step:

ALTER DATABASE TestTDE SET ENCRYPTION ON;

I run a query to report the progress:

DECLARE @state tinyint;
DECLARE @encryption_progress 
    TABLE(sample_time DATETIME, percent_complete DECIMAL(5,2));

SELECT @state = k.encryption_state
FROM sys.dm_database_encryption_keys k
INNER JOIN sys.databases d
   ON k.database_id = d.database_id
WHERE d.name = 'TestTDE';

WHILE @state != 3
BEGIN
   INSERT INTO @encryption_progress(sample_time, percent_complete)
   SELECT GETDATE(), percent_complete
   FROM sys.dm_database_encryption_keys k
   INNER JOIN sys.databases d
      ON k.database_id = d.database_id
   WHERE d.name = 'TestTDE';


   WAITFOR delay '00:00:01';

   SELECT @state = k.encryption_state
   FROM sys.dm_database_encryption_keys k
   INNER JOIN sys.databases d
      ON k.database_id = d.database_id
   WHERE d.name = 'TestTDE'; 
END

SELECT * FROM @encryption_progress;

I’m sampling every second, let’s look at the results:

[Image: TDE_Progress]

You can see that the encryption took about 30 seconds to complete for (just under) 1GB of data. This isn’t intended to be a formal benchmark, but rather just to give you an idea of the order of magnitude of time this might take to encrypt your own databases.

Let’s say – just for the sake of argument – that this scaled up linearly on a given system and you wanted to encrypt a 1TB database. Then it might take as long as 500 minutes (8 hours and 20 minutes).

The encryption occurs as a background process, but it will take some resources while it runs. If you are implementing this against a system with large databases where performance is critical, you will want to either run it in a period of quiet (or downtime), or monitor to check that encryption isn’t impacting your system too much. Experience suggests that it shouldn’t be a problem unless your server is already under strain.

There are a few things to look out for if you are monitoring during the background encryption process:

  • CPU and IO – both of these could take a hit
  • Blocking caused by encryption – you can check for this in the sys.dm_tran_locks view where the resource_subtype is "ENCRYPTION_SCAN" (see the example queries after this list)
  • Transaction log usage – monitor this with DBCC LOGINFO. While the encryption scanner is running the transaction log can’t be truncated and VLFs marked for re-use. This means the transaction log might grow larger than normal, so you need to watch out if you are constrained for disk space.
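Here’s a sketch of the sort of checks I mean – the first query looks for lock requests relating to the encryption scan, the second shows the VLFs in the log:

-- Lock requests relating to the encryption scan (blocking shows as WAIT)
SELECT request_session_id, resource_type, request_mode, request_status
FROM sys.dm_tran_locks
WHERE resource_subtype = 'ENCRYPTION_SCAN';

-- VLF information for the current database
DBCC LOGINFO;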

What if you run into any performance problems during the scan?

First things first – DON’T TURN ENCRYPTION OFF!

If you turn encryption off, i.e.

ALTER DATABASE TestTDE SET ENCRYPTION OFF;

This isn’t going to stop the encryption scan, rather it’s just going to change direction. So now it will be decrypting everything it’s just encrypted. That’s likely to have just as much impact, and then sooner or later you’re going to have to start again.

There is no “ALTER DATABASE TestTDE SET ENCRYPTION PAUSE;” command. There is however a trace flag (5004) that achieves the same thing.

If you enable the trace flag it will disable the encryption scanner:

DBCC TRACEON(5004, -1);

When you do so, if you then query sys.dm_database_encryption_keys you will see the database has an encryption_state of 2 (Encryption in progress), and the percent_complete column will show zero – as nothing is actually currently in progress.
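You can see that for yourself with the same sort of query we used before:

SELECT d.name, k.encryption_state, k.percent_complete
FROM sys.dm_database_encryption_keys k
INNER JOIN sys.databases d
   ON k.database_id = d.database_id
WHERE d.name = 'TestTDE';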

When you wish to begin the scan again, you need to disable the trace flag and then set encryption on again (even though it’s not actually off):

DBCC TRACEOFF(5004,-1);
ALTER DATABASE TestTDE SET ENCRYPTION ON;

When you do this the scanner will pick up where it left off, i.e. if it had got to 50%, it will then continue from there.

It’s important to note that the trace flag doesn’t actually turn TDE off. In fact, if you add new data, or update data and it gets written to disk while the trace flag is enabled, the data will still become encrypted. It just stops the background process of converting all the data already on disk.

If performance impact is an issue for you with the scanning process, you can use the trace flag to incrementally run the scan out of hours, pausing it when you want to ensure maximum performance of your systems.

The trace flag seems to exist for internal use by SQL Server: if the encryption scanner runs into a problem, for instance corruption in your database files, it will halt and the trace flag will be enabled.

This could be useful to know if you find you tried to turn TDE on for a database but it’s stuck at “Encryption in Progress” with percent_complete remaining at zero. Check to see if the trace flag is enabled, and if you didn’t enable it then you may want to check for corruption in your database.
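Checking is simple enough:

-- Shows whether trace flag 5004 is enabled for the session and/or globally
DBCC TRACESTATUS(5004);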

More Articles on TDE

What is Transparent Data Encryption?

Setting up Transparent Data Encryption (TDE)

Understanding Keys and Certificates with Transparent Data Encryption (TDE)

How Secure is Transparent Data Encryption (TDE) – and How to Prevent Hacking

Setting up Transparent Data Encryption (TDE)

You can set up Transparent Data Encryption (TDE) when you first create a database, or you can apply it to an existing database. In the latter case, once TDE has been enabled it will set to work encrypting your existing data in the background.

In either case the steps are the same. We’ll run through those quickly before going into more detail.

First you must have a Database Master Key (DMK) in the Master database, and a certificate that will be used by TDE:

USE MASTER;

CREATE MASTER KEY
ENCRYPTION BY PASSWORD = 'UseAStrongPasswordHere!£$7';
CREATE CERTIFICATE MyTDECert 
WITH SUBJECT = 'Certificate used for TDE in the TestTDE database';

This certificate is critical to you being able to access data encrypted by TDE, so you should make sure you back it up:

BACKUP CERTIFICATE MyTDECert   
TO FILE = 'C:\MyTDECert'  
WITH PRIVATE KEY   
(  
    FILE = 'C:\MyTDECert_PrivateKeyFile',  
    ENCRYPTION BY PASSWORD = 'UseAStrongPasswordHereToo!£$7'  
);

Then, in the database being encrypted with TDE you must create a Database Encryption Key (DEK) and specify the certificate:

USE TestTDE;

CREATE DATABASE ENCRYPTION KEY WITH ALGORITHM = AES_256
ENCRYPTION BY SERVER CERTIFICATE MyTDECert;

Finally, you turn encryption on:

ALTER DATABASE TestTDE SET ENCRYPTION ON;

And that’s all there is to it in practice.

A potential problem is that it is easy to set this up without really understanding it. Maybe that’s fine in many cases, but can you be sure that there’s nothing that can go wrong, and have confidence that whatever the scenario, you will be able to get your data back? And can you be sure that your data is properly protected?

To gain that level of surety and to have confidence, I think it’s best to understand this in a bit more detail. In particular, I think it’s good to understand why each step is required and how the objects created are used.

So let’s go through those steps again in more detail.

Creating the Database Master Key (DMK)

CREATE MASTER KEY
ENCRYPTION BY PASSWORD = 'UseAStrongPasswordHere!£$7';

A DMK is used to protect other keys that are created in the database. It does this by encrypting them and only the encrypted value is stored. You can only have one DMK in a database.

The DMK itself is also stored encrypted – you can see that when we created it we specified a password to encrypt it by. SQL Server also makes a separate copy of the key encrypted by the Service Master Key (SMK). The SMK is the root-level key in SQL Server. This additional copy of the DMK means that SQL can access the actual value of your DMK without you having to specify the password again.
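If you want to see evidence of this for yourself, the DMK shows up in the catalog views, and sys.databases will tell you whether the copy encrypted by the SMK exists:

-- The DMK appears as a symmetric key with a fixed name (metadata only)
SELECT name, algorithm_desc, create_date
FROM master.sys.symmetric_keys
WHERE name = '##MS_DatabaseMasterKey##';

-- 1 here means SQL holds a copy of the DMK encrypted by the SMK
SELECT name, is_master_key_encrypted_by_server
FROM sys.databases
WHERE name = 'master';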

Creating the Certificate

CREATE CERTIFICATE MyTDECert 
WITH SUBJECT = 'Certificate used for TDE in the TestTDE database';

The certificate is going to be used in the next step down – to protect the Database Encryption Key (DEK) in your TDE enabled database. When you create a certificate, it contains an asymmetric key that can be used for encryption. An asymmetric key includes a public key that can be used to encrypt data and a private key that must be used if you want to decrypt data – that private key gets automatically protected (encrypted) by the DMK.

A logical question to ask is why we need the certificate? Why couldn’t we just protect the DEK in our TDE enabled database with the DMK from the master database directly?

Imagine the scenario that you need to migrate your database to another SQL Server instance. We can do this but we will need also to migrate the object that was used to protect/encrypt the DEK – which is itself stored in the database.

If TDE used the DMK to protect that, then we would need to migrate the DMK to the new instance. But what if the new instance already had a DMK in the master database, and objects that it was used to protect – such as other databases using TDE? At this point we would be stuck: we can’t migrate our DMK without overwriting the one that’s there, so we would have a choice – enable encryption for the migrated database but break it for the existing ones, or vice versa.

Neither is a good option, but by having a certificate we can migrate that happily as we can have as many certificates as we want.

This also gives us the option that where we have multiple databases encrypted by TDE we could use a separate certificate for each. That means if one certificate is breached the others could remain protected.

This does raise a good point though: you may want one day to migrate your certificate, so call it something more meaningful and unique than “MyTDECert”.
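If you’re ever unsure which certificate is protecting which database, you can join the DEK metadata to the certificates in master on the thumbprint:

SELECT d.name AS database_name, c.name AS certificate_name
FROM sys.dm_database_encryption_keys k
INNER JOIN sys.databases d
   ON k.database_id = d.database_id
INNER JOIN master.sys.certificates c
   ON k.encryptor_thumbprint = c.thumbprint;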

Backing up the certificate

BACKUP CERTIFICATE MyTDECert   
TO FILE = 'C:\MyTDECert'  
WITH PRIVATE KEY   
(  
    FILE = 'C:\MyTDECert_PrivateKeyFile',  
    ENCRYPTION BY PASSWORD = 'UseAStrongPasswordHereToo!£$7'  
);

We backup the certificate in case we ever need to move or restore our database to a different server or SQL instance. It’s critical if we want to be able to access our encrypted data.

When you back it up, you specify a password to encrypt the key. What happens is SQL grabs the stored version of the private key (which is encrypted by the DMK), decrypts it, then re-encrypts it with the password. This means that you would be able to restore it to a different SQL instance where the DMK didn’t exist.

This covers us against the scenarios explained above regarding why we use a certificate rather than just relying on the DMK. It should also make it clear that if we need to migrate or recover the database all we need is:

  • The database backup
  • The certificate and key backups and the password used when creating them

Creating the Database Encryption Key (DEK)

CREATE DATABASE ENCRYPTION KEY WITH ALGORITHM = AES_256
ENCRYPTION BY SERVER CERTIFICATE MyTDECert;

In this, almost the final step, we create the actual key that will be used to encrypt the data. It’s stored in the database, but encrypted by the key associated with the certificate created in the previous step. The DEK is a symmetric key, i.e. the same value is used to encrypt or decrypt data.

It’s logical to ask why we don’t just use the private key from the certificate to encrypt the data directly. This is a bit more difficult to justify than the previous scenario with the DMK versus use of a certificate. It is certainly feasible that TDE could have been implemented this way.

One consideration is that the certificate is created with an asymmetric key – these are easier to work with in some ways as we only need the public key to encrypt data, so we can keep the private key concealed most of the time. Asymmetric encryption however is slower than symmetric encryption, so to reduce the performance impact of TDE we want to use a symmetric key.

The concept of a DEK was new in SQL 2008 and created specifically for TDE. It makes sense that if we are to have a separate DEK then it should be stored in the database itself. That way migration/recovery is eased as it minimises the number of objects required.

It’s worth noting that you can only have one DEK in each database.

Enabling Encryption

ALTER DATABASE TestTDE SET ENCRYPTION ON;

In sys.databases you can see which databases have TDE turned on by looking at the is_encrypted column:

SELECT name
FROM sys.databases
WHERE is_encrypted = 1;

[Image: TDE sys_databases]

We can find more details in the sys.dm_database_encryption_keys server view. Let’s query it, looking at some particular columns of interest:

SELECT
   d.name,
   k.encryption_state,
   k.encryptor_type,
   k.key_algorithm,
   k.key_length,
   k.percent_complete
FROM sys.dm_database_encryption_keys k
INNER JOIN sys.databases d
   ON k.database_id = d.database_id;

Here’s what I see after I created my DEK but before I enable encryption:

[Image: TDE_State1]

We can see information about the DEK. We also see the encryption_state column, which describes the current state of the database. The main values you’ll see are:
1 = Unencrypted
2 = Encryption in Progress
3 = Encrypted

If I now enable encryption on this database and run the query again:

[Image: TDE_State2]

We see that both my database and the TempDB database are now encrypted.

We also see the percent_complete column, which confusingly says zero. This column only has meaning when a state change is occurring. So, if the encryption state was 2 – then we would see a value here whilst the database was in the process of being encrypted. Here my database only had one row, so it was fairly instantaneous to flip encryption on.

This column becomes relevant when we are encrypting an existing database that has a reasonable amount of data; we’ll look at that next:

Encrypting an existing database with TDE


More articles about TDE

What is Transparent Data Encryption?

Understanding Keys and Certificates with Transparent Data Encryption (TDE)

How Secure is Transparent Data Encryption (TDE) – and How to Prevent Hacking

What is Transparent Data Encryption?

Transparent Data Encryption (TDE) was introduced in SQL 2008 as a way of protecting “at rest” data. It continues to be available in all versions of SQL right up until the present, though only in the Enterprise editions of SQL Server (though as with all other Enterprise only features, you can also work with it using Developer edition).

When we talk about “at rest” data we are referring to data that has been written to disk. In terms of our SQL databases that includes:

  • Any data files for our database
  • Any log files for our database
  • All backup files for the database, be they Full, Log or Differential backups
  • Database snapshot files
  • Any data written to disk in the TempDB database

The last item in that list, TempDB, needs to be included for completeness. Imagine that you query your database and as part of the query execution TempDB is used. If that data was written to disk then that creates a hole in our protection – someone could potentially read or copy the TempDB files and might see some of the data we are trying to protect. As a result, when you enable TDE against any database on your SQL Server instance, the TempDB database is automatically encrypted as well to prevent this from happening.

Data “at rest” of course doesn’t include the following things:

  • Data loaded/stored in memory (buffer pool)
  • Data returned from a query and being passed across the network
  • Data received by a client as a result of a query

If you want to cover those scenarios as well then you need to look at other forms of encryption e.g. TLS and Always Encrypted.

There are also some less obvious exceptions which occur where SQL doesn’t use the buffer pool – and therefore there isn’t an in-memory version of the data:

  • Filestream data
  • Data persisted to disk using Buffer Pool Extensions

And there are a couple of other exceptions that can occur in certain circumstances:

  • Where the buffer pool gets paged to disk due to memory pressure
  • SQL dump files when there is a crash

That’s summarised in the below diagram:

[Image: TDE Encrypted vs Unencrypted]

TDE mainly uses standard encryption protocols based on AES (Advanced Encryption Standard). When you set up TDE you can specify which AES algorithm you wish to use, AES_128, AES_192 or AES_256. In each case the number specifies the length of the key to be used for encryption in bits.

Obviously the longer your key, the harder the encryption should be to crack. However, even for AES_128, estimates of how long it would take to break the key by brute force vary from a thousand years to numbers many times greater than the age of the universe – trillions of years. The difference is based on how we anticipate processing power to grow in the future. Even with the lowest estimates AES_128 should be sufficient in most scenarios, but most people seem to go for AES_256, which should take the same time squared to be beaten.

Up to 2016, SQL also supported the TRIPLE_DES_3KEY encryption algorithm. This is now generally not considered to be as secure as AES, and from SQL 2016 its use is deprecated. So it is best if you stick to AES even if you are on a SQL version where DES is an option.

Let’s have a look at the contents of some SQL data files so you can see the difference with and without TDE. I’ve created a database with a single table and inserted a row of data:

CREATE DATABASE TestTDE;
USE TestTDE;
CREATE TABLE dbo.SomeData(Id INT IDENTITY(1,1), SomeText VARCHAR(255));
INSERT INTO dbo.SomeData (SomeText) VALUES('This is my data');

I’ll close my connection from the database, and detach it so I can open the files in a Hex Editor. Then I search for my text in the data file:

[Image: TDE_MDF_File_Unencrypted]

As you can see the data is stored clear as day in the data file.

Now let’s look at the same data file once TDE has been enabled. This time if I search for my string it’s not found and my data looks like this:

[Image: TDE_MDF_File_Encrypted]

Even at the end of the file, where the unencrypted version was all zeros representing free space, the encrypted version has those bytes encrypted too:

[Image: TDE_MDF_File_Encrypted_End]

TDE works by using an encryption key that is stored in the database being encrypted – but that key is itself stored encrypted by an object outside of the database. We’ll see all the various objects involved when we look at setting up TDE next:

Setting up Transparent Data Encryption (TDE)


More articles on TDE:

Encrypting an existing database with TDE
Understanding Keys and Certificates with Transparent Data Encryption (TDE)
How Secure is Transparent Data Encryption (TDE) – and How to Prevent Hacking

Overview of Encryption Tools in SQL Server

More and more people are considering some level of encryption against their data stored in SQL Server. In many cases it might be considered that other measures such as firewalls, well defined access permissions and application code free of security flaws, already offer sufficient protection.

Anyone who follows the tech news though will know that attacks and data breaches are common. Encryption often provides the second line of defence, i.e. even if someone malicious gains access to your data, they may not be able to read what they find there.

Over the last 10 years, the number of features available in SQL Server for working with encryption has grown. It can be confusing to understand what the differences are, which you should implement, and what exactly each feature protects you from. In most cases these are complementary technologies and if you are getting serious about encryption you may choose to implement more than one. The latest versions of the SQL Upgrade Advisor will by default recommend that you consider Transparent Data Encryption and Always Encrypted, and it is (or should be) standard practice to be using TLS.

Often it seems to be the case that people implement some form of encryption to “tick a box”. If you are serious about protecting your data then you need to understand what each feature does – and does not – protect you against.

One thing that may influence your decision on what to use is what is available in the versions and editions of SQL Server that you have in production. Here’s a quick comparison. I’ve missed out versions where no new features were added, and for 2016 I’ve specified SP1 because a lot of features changed from Enterprise-only to being available in Standard – so there’s no good reason for not being on SP1 if you’re using 2016:

[Image: EncryptionOverview1]

We’ll quickly run through the various technologies listed above with brief detail on each.

Column Encryption (2005)

Uses the Encryption Hierarchy and built-in Cryptographic Functions to allow you to encrypt values and store them in the database. Ability to read the encrypted data is based on permission or access to the objects used to perform the encryption e.g. a pass phrase, a certificate or an encryption key.

It can be easy to misunderstand what this feature is: it is not that you configure a column as being encrypted, but rather that the cryptographic functions, combined with the encryption hierarchy in SQL Server, allow you to encrypt specific values, which can then be stored in the database.
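As a quick illustration, here’s a minimal sketch using a pass phrase – just one of the options, you can equally encrypt by a key or certificate from the hierarchy:

-- Encrypt a value with a pass phrase, then decrypt it on the way back out
DECLARE @Encrypted VARBINARY(8000);
SET @Encrypted = ENCRYPTBYPASSPHRASE('MyPassPhrase', 'Sensitive value');

SELECT @Encrypted AS EncryptedValue,
       CAST(DECRYPTBYPASSPHRASE('MyPassPhrase', @Encrypted) AS VARCHAR(255)) AS DecryptedValue;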

TDE (2008) Enterprise Only

TDE (Transparent Data Encryption) is configured at the database level and is defined as “at-rest” encryption. This means that the data and log files for your database (as well as backups) are encrypted on disk. One of the key advantages to TDE is that it doesn’t require any code change to implement.

TDE doesn’t protect you against anyone who is able to read data directly in the database, and an administrator on the SQL Server instance or the server hosting it can fully circumvent it. It simply protects you against parties who may gain access to the files.

Without the certificate and key used for TDE they would not be able to restore a backup to another server, and they would not be able to directly read data in the database files – which otherwise are in a relatively readable format for someone sufficiently motivated.

One thing to note with TDE is that it is only available in the Enterprise Editions of SQL Server.

Those of you who follow the major SQL blogs may have come across this article, which points out a seemingly big flaw in TDE that allows someone with minimal privileges against your server (read is sufficient) and the right skillset to extract your keys and read the data:
https://simonmcauliffe.com/technology/tde/

It seems that you can mitigate this by making sure access is restricted to the directories holding the relevant keys:
C:\Windows\System32\Microsoft\Protect\S-1-5-18

If only administrators (and the SQL Server service account) can access this directory then you should be safe.

TDE does add extra processing overhead to SQL. 3-10% is often quoted but that will vary with your workload so if this is a concern you should test – some people consider that the overhead is not worth the benefits.

Backup Encryption (2014)

Backup Encryption is what it sounds like – just your database backups are encrypted. All types of backup are supported: Full, Differential and Log.

This isn’t a setting you turn on for your database – you have to specify when you make a backup that you want it protected. As such this doesn’t protect you against someone making an ad-hoc backup and storing it on a USB device which they then leave on a train – or any other similar human error.
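For illustration, here’s a minimal sketch of taking an encrypted backup – it assumes a certificate called MyBackupCert already exists in master:

BACKUP DATABASE TestTDE
TO DISK = 'C:\Test\TestTDE_Encrypted.bak'
WITH ENCRYPTION (ALGORITHM = AES_256, SERVER CERTIFICATE = MyBackupCert);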

One nice thing is that you don’t have to do anything special when restoring the database back to its original home. Backup encryption (generally) uses a certificate or key stored in the master database so as long as you haven’t lost that, the encryption part of a restore process is seamless. If you need to restore elsewhere then you’ll need a copy of the certificate or key.

Backup encryption will suffer from the same limitations as TDE (except that it will only add CPU overhead when taking backups), however if your intent is to make sure backups stored off the database server are protected then this should be sufficient.

Always Encrypted (2016)

Always Encrypted (AE) is implemented at the column level. Data is stored encrypted on the disk, in memory and when being passed to a client application. Encryption is based on the combination of a key and certificate, where the certificate is usually stored on a client machine and not stored on the database server. This means that access (even full sysadmin access) to the database server on its own is not sufficient to be able to read the data.

Implemented correctly AE can be very secure, however it also has a number of logical limitations that mean it can be challenging to interact with the encrypted data. For instance you can’t use LIKE comparisons against encrypted columns.

How big an impact this is for you depends on how your application is coded and the sort of columns you want to encrypt. It could be straightforward and require no code change – or it could entail major rework.

Here are my previous posts on AE:
Understanding Keys and Certificates with Always Encrypted
Working with Data in Always Encrypted

Dynamic Data Masking (2016)

Dynamic Data Masking is not actually encryption but is another way that people may choose to protect data from prying eyes.

This provides a method of hiding data from non-privileged users without having to change any code.

Imagine storing a credit card number. We might want to remember a customer’s card details so that it is easy for them to make payments in the future. Equally though, we don’t want people to be able to easily view the full number. Often we’ll see that just the last 4 digits are displayed, so that we have enough to verify it is the correct card. Dynamic Data Masking provides a way of doing this. We can define a mask so that, for most users who have permission to view the data, the number:
9999-9999-9999-1234

Would only appear to them as:
XXXX-XXXX-XXXX-1234

We can then also define specific roles that can access the full number, for instance one assigned to a service account used for processing payments.
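Here’s a minimal sketch of how that might look – the table, column and role names are purely illustrative:

-- Mask all but the last 4 characters for users without the UNMASK permission
CREATE TABLE dbo.Customer
(
   Id INT IDENTITY(1,1) PRIMARY KEY,
   CardNumber VARCHAR(19) MASKED WITH (FUNCTION = 'partial(0, "XXXX-XXXX-XXXX-", 4)') NULL
);

-- A role for those who genuinely need the full number
GRANT UNMASK TO PaymentsRole;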

That sounds great, but in reality the protection is not that great. If I have read access to the database but am in a non-privileged role, it would still be possible for me to get around the masking with a bit of querying. This is due to the requirement that existing queries should continue working, so the SQL engine can see the underlying data even if I can’t, and will respond to my attempts at comparisons against the data.

Transport Level Security (TLS)

TLS is a protocol used to protect data as it travels across a network. It is fairly equivalent to SSL (Secure Sockets Layer) which is used to protect data between a web server and a browser. Without TLS, data from SQL is sent unprotected across a network and can be intercepted by what is known as a Man-in-the-Middle attack.

Such attacks function by impersonating the parties involved. Imagine we have a SQL Server instance and a client application. A communication starts from the client to the server. The attacker impersonates the server, intercepts the data or request sent from the client, and can alter the message before passing it on to the server – which believes it is receiving it directly from the client. It will then intercept any data being sent back, either just to capture it or to alter it, before transmitting that back to the client, which believes it is in direct and private communication with the server.

TLS creates an encrypted connection and encrypts the data sent across the network, so the information passed back and forth cannot be read by any party other than the sender or receiver.

TLS also allows both the sender and receiver to verify each other so the Man-in-the-Middle attack is doubly foiled.

Hashing and Salting

Hashing is a method of taking a value (usually a string such as a password) and transforming it via a Hashing function to a new value that in theory cannot be reverse engineered to find the original value.

When someone enters a password into the system that value can be put through the same hashing function and the resulting hashed value be checked against the value stored in the database to verify that they match.

In practice it would be possible for someone to maintain a list of what the hashed values are for a particular hashing function, and thus be able to recognise where common values have been used.

This is where salting comes in: a second value known as the salt is generated and combined with the value to be hashed. Then the salt is stored unencrypted alongside the resulting hashed value. When we then want to verify (for instance) a password, we simply combine the entered value with the salt before hashing and comparing. If the salt is unique for each value we hash, there’s no practical way of maintaining a dictionary of hashed values.

SQL has a series of hashing functions but HASHBYTES is the one most commonly used for this process.
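Here’s a minimal sketch of the idea – the choice of SHA2_256 and a NEWID() based salt are just illustrative:

-- Generate a unique salt, then hash the salt combined with the password
DECLARE @Password NVARCHAR(128) = N'MySecretPassword';
DECLARE @Salt NVARCHAR(36) = CAST(NEWID() AS NVARCHAR(36));

-- Store both the salt and the hash; repeat the same combination to verify later
SELECT @Salt AS Salt,
       HASHBYTES('SHA2_256', @Salt + @Password) AS PasswordHash;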

That’s the set of tools available. The below table shows a quick comparison of what each protects you from and any disadvantages:

[Image: EncryptionOverview2]

No solution you pick is going to be secure and safe just by implementing one tool or another. Of key importance is process. Encryption generally involves keys of one form or another and these must be protected. Your house may be secure if you have good locks, but not if you leave the keys outside on the doorstep.

Equally you need to think about what you are attempting to achieve by implementing a particular tool and make sure that the way your organisational roles and processes are structured supports this. For instance Always Encrypted is often touted as even being able to protect against a rogue DBA. The DBA may have full access and be able to do whatever they like to the SQL instance, but for AE to read protected data you also need a certificate that is stored on application servers. If you want this level of protection then you need to make sure that you have sufficient role separation such that the DBA can’t access the certificate, and those who are admins on the application server can’t access the database. Or at least not without triggering auditing.

Finally, remember that any system is only as strong as its weakest link. There’s no point encrypting your data if the application accessing it is so full of holes it will leak it all out anyway, and it’s often said that the weakest part of any system is the human. The point of having data is that we want to be able to access it, so there must always be a point where the data is viewed or interacted with in an unencrypted form.

At the end of the day the only way of absolutely ensuring there is no way for someone to access data they shouldn’t – is to have no data in the first place.

But in practice, you can at least try to make it hard for them.

SQL Puzzle 4: The Beale Papers

I’ve not done a SQL puzzle for a while so thought it was getting overdue…

I set this one for my workmates a while ago and people found it quite fun.

The Beale Papers are a set of three ciphertexts which allegedly reveal the details of a buried treasure. The tale is that the treasure was buried by Thomas J. Beale in Bedford County, Virginia, USA back in the 1820s. He entrusted a box containing the ciphered messages to the care of a local innkeeper and then disappeared never to be seen again – his intent was that the papers could be passed on to his descendants and they could decode the messages and claim their rightful inheritance. The innkeeper passed the ciphertexts on to a friend before he died, and said friend spent the rest of his life trying to decipher them. He succeeded on the second message but failed on the others.

[Image: Treasure.png]

Since the 19th century many attempts have been made to decipher the other messages and find the treasure. Even without cracking the code, treasure hunters have got out their shovels over the years to dig up parts of Bedford County without finding anything (as far as we know). Recently there have been claims to have solved the puzzle, but that the treasure was already gone. Of course there’s a good chance the whole thing is a hoax!

Your task is to repeat the deciphering of the second message, but to make it easier I’ll tell you the key.

Beale used an existing text as a key for the cipher, in which he assigned a number to each word: 1 for the first word, 2 for the second and so on.

Then for each letter in the text he wanted to encode he randomly picked a word in the key that started with that letter, and then he enciphered the letter as the number for that word. Make sense?

For instance, if the key text was “now is the time” and the plain text is “tin”, then either (3 2 1) or (4 2 1) are valid encipherments.

The key text Beale used for encoding the second message was the United States Declaration of Independence. I’m going to leave you to source your own copy of this. Be aware that some of the real-world issues you may come up against include the fact that Beale may have made some inaccuracies in his ciphering, and that the copy of the key text he used is unlikely to be exactly the same as the one you use. As such you may need to do some tweaking, spelling correction and perhaps logic to work out what some of the words are. Good luck if you choose to attempt this!

So, starting with the two variables below, write some SQL to decode the message (@CipherText):

DECLARE @KeyText VARCHAR(MAX); 
DECLARE @CipherText VARCHAR(MAX);

SET @KeyText = '<Insert your copy of the declaration of independence here>';

SET @CipherText = '115, 73, 24, 807, 37, 52, 49, 17, 31, 62, 647, 22, 7, 15, 140, 47, 29, 107, 79, 84, 56, 239, 10, 26, 811, 5, 196, 308, 85, 52, 160, 136, 59, 211, 36, 9, 46, 316, 554, 122, 106, 95, 53, 58, 2, 42, 7, 35, 122, 53, 31, 82, 77, 250, 196, 56, 96, 118, 71, 140, 287, 28, 353, 37, 1005, 65, 147, 807, 24, 3, 8, 12, 47, 43, 59, 807, 45, 316, 101, 41, 78, 154, 1005, 122, 138, 191, 16, 77, 49, 102, 57, 72, 34, 73, 85, 35, 371, 59, 196, 81, 92, 191, 106, 273, 60, 394, 620, 270, 220, 106, 388, 287, 63, 3, 6, 191, 122, 43, 234, 400, 106, 290, 314, 47, 48, 81, 96, 26, 115, 92, 158, 191, 110, 77, 85, 197, 46, 10, 113, 140, 353, 48, 120, 106, 2, 607, 61, 420, 811, 29, 125, 14, 20, 37, 105, 28, 248, 16, 159, 7, 35, 19, 301, 125, 110, 486, 287, 98, 117, 511, 62, 51, 220, 37, 113, 140, 807, 138, 540, 8, 44, 287, 388, 117, 18, 79, 344, 34, 20, 59, 511, 548, 107, 603, 220, 7, 66, 154, 41, 20, 50, 6, 575, 122, 154, 248, 110, 61, 52, 33, 30, 5, 38, 8, 14, 84, 57, 540, 217, 115, 71, 29, 84, 63, 43, 131, 29, 138, 47, 73, 239, 540, 52, 53, 79, 118, 51, 44, 63, 196, 12, 239, 112, 3, 49, 79, 353, 105, 56, 371, 557, 211, 505, 125, 360, 133, 143, 101, 15, 284, 540, 252, 14, 205, 140, 344, 26, 811, 138, 115, 48, 73, 34, 205, 316, 607, 63, 220, 7, 52, 150, 44, 52, 16, 40, 37, 158, 807, 37, 121, 12, 95, 10, 15, 35, 12, 131, 62, 115, 102, 807, 49, 53, 135, 138, 30, 31, 62, 67, 41, 85, 63, 10, 106, 807, 138, 8, 113, 20, 32, 33, 37, 353, 287, 140, 47, 85, 50, 37, 49, 47, 64, 6, 7, 71, 33, 4, 43, 47, 63, 1, 27, 600, 208, 230, 15, 191, 246, 85, 94, 511, 2, 270, 20, 39, 7, 33, 44, 22, 40, 7, 10, 3, 811, 106, 44, 486, 230, 353, 211, 200, 31, 10, 38, 140, 297, 61, 603, 320, 302, 666, 287, 2, 44, 33, 32, 511, 548, 10, 6, 250, 557, 246, 53, 37, 52, 83, 47, 320, 38, 33, 807, 7, 44, 30, 31, 250, 10, 15, 35, 106, 160, 113, 31, 102, 406, 230, 540, 320, 29, 66, 33, 101, 807, 138, 301, 316, 353, 320, 220, 37, 52, 28, 540, 320, 33, 8, 48, 107, 50, 811, 7, 2, 113, 73, 16, 125, 11, 110, 67, 102, 807, 33, 59, 81, 158, 38, 43, 581, 138, 19, 85, 400, 38, 43, 77, 14, 27, 8, 47, 138, 63, 140, 44, 35, 22, 177, 106, 250, 314, 217, 2, 10, 7, 1005, 4, 20, 25, 44, 48, 7, 26, 46, 110, 230, 807, 191, 34, 112, 147, 44, 110, 121, 125, 96, 41, 51, 50, 140, 56, 47, 152, 540, 63, 807, 28, 42, 250, 138, 582, 98, 643, 32, 107, 140, 112, 26, 85, 138, 540, 53, 20, 125, 371, 38, 36, 10, 52, 118, 136, 102, 420, 150, 112, 71, 14, 20, 7, 24, 18, 12, 807, 37, 67, 110, 62, 33, 21, 95, 220, 511, 102, 811, 30, 83, 84, 305, 620, 15, 2, 10, 8, 220, 106, 353, 105, 106, 60, 275, 72, 8, 50, 205, 185, 112, 125, 540, 65, 106, 807, 138, 96, 110, 16, 73, 33, 807, 150, 409, 400, 50, 154, 285, 96, 106, 316, 270, 205, 101, 811, 400, 8, 44, 37, 52, 40, 241, 34, 205, 38, 16, 46, 47, 85, 24, 44, 15, 64, 73, 138, 807, 85, 78, 110, 33, 420, 505, 53, 37, 38, 22, 31, 10, 110, 106, 101, 140, 15, 38, 3, 5, 44, 7, 98, 287, 135, 150, 96, 33, 84, 125, 807, 191, 96, 511, 118, 40, 370, 643, 466, 106, 41, 107, 603, 220, 275, 30, 150, 105, 49, 53, 287, 250, 208, 134, 7, 53, 12, 47, 85, 63, 138, 110, 21, 112, 140, 485, 486, 505, 14, 73, 84, 575, 1005, 150, 200, 16, 42, 5, 4, 25, 42, 8, 16, 811, 125, 160, 32, 205, 603, 807, 81, 96, 405, 41, 600, 136, 14, 20, 28, 26, 353, 302, 246, 8, 131, 160, 140, 84, 440, 42, 16, 811, 40, 67, 101, 102, 194, 138, 205, 51, 63, 241, 540, 122, 8, 10, 63, 140, 47, 48, 140, 288';
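If you’d like a nudge without the full answer, here’s a rough sketch of the core join. Note that STRING_SPLIT (2016+) doesn’t guarantee output order, so treat the ROW_NUMBER trick as a starting point rather than a reliable solution:

-- Number the words of the key text and the values in the cipher text,
-- then map each cipher number to the first letter of the matching word
WITH KeyWords AS
(
   SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS WordNo, value AS Word
   FROM STRING_SPLIT(@KeyText, ' ')
),
CipherNumbers AS
(
   SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS Position,
          CAST(LTRIM(RTRIM(value)) AS INT) AS WordNo
   FROM STRING_SPLIT(@CipherText, ',')
)
SELECT c.Position, LEFT(k.Word, 1) AS Letter
FROM CipherNumbers c
INNER JOIN KeyWords k
   ON k.WordNo = c.WordNo
ORDER BY c.Position;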


I’ll finish this post with an excerpt from the comments on a blog I found, where someone claims to have decoded the other two texts as well and to know where the treasure is:

[Image: TreasureFound]

Have fun!

Previous puzzles on this blog:
SQL Puzzle 1: Magic Squares
SQL Puzzle 2: Eight Queens
SQL Puzzle 3: Knights and Queens

Credit to Programming Praxis where I first saw a version of this puzzle – and where you can find many more!

Implementing Temporal Tables Where You Have Existing Data

In my post about auditing tools in SQL Server I mentioned a few tools we have at our disposal for capturing the change history of data.

One was Change Data Capture, which I went into a bit more detail about in Setting up Change Data Capture (CDC).

I also referred to Temporal Tables which are available to us in the brave new world of SQL 2016.

Temporal tables aren’t just about monitoring change; they also provide really nice methods for being able to query historical data – to see what the values were at a particular point in time, e.g.:

SELECT * FROM dbo.SomeData FOR SYSTEM_TIME AS OF '1900-01-01';

This makes them easy to work with, but also opens up other applications for them over and above auditing purposes – for instance being able to maintain configuration data that changes over time, but where we need to reference what the value was historically.

I wanted to look in more detail about the process for changing an existing table so that it becomes a temporal one.

Let’s start by quickly looking at how we would define a temporal table for a fresh deployment:

CREATE TABLE dbo.SomeData(
   Id INT IDENTITY(1,1) NOT NULL
      CONSTRAINT PK_SomeData PRIMARY KEY CLUSTERED,
   SomeText VARCHAR(1000) NULL,
   ModifiedBy VARCHAR(128) NULL,
   StartDate datetime2(7) GENERATED ALWAYS AS ROW START NOT NULL,
   EndDate datetime2(7) GENERATED ALWAYS AS ROW END NOT NULL,
   PERIOD FOR SYSTEM_TIME (StartDate, EndDate)
)
WITH (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.SomeData_History));

There are a few things here additional to a normal table definition. The first thing to note is that we need two columns to support the temporal versioning. I’ve called my columns StartDate and EndDate; these both have to be of the DATETIME2 datatype at the highest precision (100 nanoseconds). They are specified as GENERATED ALWAYS AS ROW START/END so SQL will take care of assigning values when new rows are inserted or updated.

I’ve specified the PERIOD FOR SYSTEM_TIME, so SQL knows that these are the fields to use for the temporal versioning.

Finally I’ve specified SYSTEM_VERSIONING for the table and what the history table should be called.

When I execute this you can find the table and corresponding history table displayed in SSMS as below:
[Image: Tempopral1]
We have the table itself, and also the history table has been created for us. This is where the previous versions of rows of data will be stored.

So that’s pretty straightforward – but what if you want to change an existing table?

First of all let’s drop this one:

DROP TABLE dbo.SomeData;

Unfortunately that gives me an error message:
Msg 13552, Level 16, State 1, Line 90
Drop table operation failed on table 'TestAudit.dbo.SomeData' because it is not supported operation on system-versioned temporal tables.

As the message says, you can’t just drop Temporal tables. First you must turn system versioning off, then you can go ahead:

ALTER TABLE dbo.SomeData SET (SYSTEM_VERSIONING = OFF); 
DROP TABLE dbo.SomeData;

Dropping the table though leaves the history table behind – now seen as a normal table:

[Image: Tempopral2]

That’s handy as it means you don’t lose the history when the main table is deleted. For now though let’s just delete that as well:

DROP TABLE dbo.SomeData_History;

Now let’s create a new table that will be our “existing table” we want to change. For this example I’ll create a test table with a trigger that maintains a modified date, and throw in 1000 rows:

CREATE TABLE dbo.SomeData(
   Id INT IDENTITY(1,1) 
      CONSTRAINT PK_SomeData PRIMARY KEY CLUSTERED NOT NULL,
   SomeText VARCHAR(1000) NULL,
   ModifiedBy VARCHAR(128) NULL,
   ModifiedDate DATETIME NOT NULL 
      CONSTRAINT DF_SomeData_ModifiedDate DEFAULT (GETUTCDATE())
);
GO

CREATE TRIGGER dbo.TR_SomeData_ModifiedDate
   ON dbo.SomeData 
   AFTER UPDATE
AS
BEGIN
   UPDATE dbo.SomeData 
   SET ModifiedDate = GETUTCDATE()
   FROM dbo.SomeData 
   INNER JOIN Inserted
      ON dbo.SomeData.Id = Inserted.Id;
END;
GO

INSERT INTO dbo.SomeData(SomeText, ModifiedBy)
SELECT TOP 1000 'Blah', 'Me'
FROM sys.objects a CROSS JOIN sys.objects b;

I’m going to complicate things and say that when I turn this into a temporal table I want to keep the existing ModifiedDate column and use that as the start date for my system versioning.

The first thing you might try is to alter the column:

ALTER TABLE dbo.SomeData
ALTER COLUMN ModifiedDate DATETIME GENERATED ALWAYS AS ROW START NOT NULL;

Error:
Msg 13589, Level 16, State 1, Line 44
Column 'ModifiedDate' in table 'TestAudit.dbo.SomeData' cannot be specified as 'GENERATED ALWAYS' in ALTER COLUMN statement.

So I can’t do this in an alter statement. What I’ll actually have to do is create a new column specified to work with the system versioning, copy the data over, drop the old column and rename the new one.

First let’s create the additional columns and define the PERIOD FOR SYSTEM_TIME:

ALTER TABLE dbo.SomeData
ADD 
StartDate datetime2 GENERATED ALWAYS AS ROW START NOT NULL
   CONSTRAINT DF_SomeData_StartDate DEFAULT (GETUTCDATE()),
EndDate datetime2 GENERATED ALWAYS AS ROW END NOT NULL
    CONSTRAINT DF_SomeData_EndDate DEFAULT ('9999-12-31 23:59:59.9999999'),
PERIOD FOR SYSTEM_TIME (StartDate, EndDate);

Note that I’ve had to specify defaults for the new columns so that the existing data gets values populated (I didn’t have to do that when I simply defined the Temporal Table from scratch).

For the StartDate I’ve specified the current UTC date – this could be any date and time that’s not in the future, though note it should be a UTC one. If I’d tried to use GETDATE(), as I’m currently on British Summer Time, I would get the following error:
Msg 13542, Level 16, State 0, Line 51
ADD PERIOD FOR SYSTEM_TIME on table 'TestAudit.dbo.SomeData' failed because there are open records with start of period set to a value in the future.

For the end date I have put the maximum value a DATETIME2 column can hold, this is required and is a way of saying that this record is (at the moment) valid indefinitely. If I try to put in an earlier date I receive this error:
Msg 13575, Level 16, State 0, Line 51
ADD PERIOD FOR SYSTEM_TIME failed because table 'TestAudit.dbo.SomeData' contains records where end of period is not equal to MAX datetime.

Now I’ve got the columns I could try copying my data over:

UPDATE dbo.SomeData SET StartDate = ModifiedDate;

I get another error:
Msg 13537, Level 16, State 1, Line 70
Cannot update GENERATED ALWAYS columns in table 'TestAudit.dbo.SomeData'.

It turns out you can get around this by temporarily dropping the period for system time:

ALTER TABLE dbo.SomeData DROP PERIOD FOR SYSTEM_TIME;
UPDATE dbo.SomeData SET StartDate = ModifiedDate;
ALTER TABLE dbo.SomeData ADD PERIOD FOR SYSTEM_TIME (StartDate, EndDate);

Your logical thought might be that I didn’t need to create the period for system time when I altered the table to add the new columns – I could have just added the columns, migrated the data, then only added the period afterwards. Trust me, I tried that too:
Msg 13509, Level 16, State 1, Line 51
Cannot create generated always column when SYSTEM_TIME period is not defined.

From here it turns out to be reasonably clear sailing.

Let’s run the final command to make the table Temporal:

ALTER TABLE dbo.SomeData 
SET (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.SomeData_History));

Here’s what it looks like in SSMS:
[Image: Tempopral3]
You can see this is the same as the original fresh creation of a temporal table – the history table has been created for us, just with an additional column in both tables.

Let’s remove the extra column and do the final rename:

ALTER TABLE dbo.SomeData DROP CONSTRAINT DF_SomeData_ModifiedDate;
ALTER TABLE dbo.SomeData DROP COLUMN ModifiedDate;
EXEC sp_rename 'dbo.SomeData.StartDate','ModifiedDate','COLUMN';

I was amazed that sp_rename didn't give me an error. In fact if we look at the table again:
Tempopral4
You can see the column has been renamed in both the main table and the history table. Pretty neat!

I have at this stage forgotten to do one thing. Let’s just try to update a record and see what happens:

UPDATE dbo.SomeData SET SomeText = 'Wibble' WHERE Id = 2;

Msg 13537, Level 16, State 1, Procedure TR_SomeData_ModifiedDate, Line 7 [Batch Start Line 185]
Cannot update GENERATED ALWAYS columns in table ‘TestAudit.dbo.SomeData’.
The statement has been terminated.

Yep, don’t forget to remove your trigger!

DROP TRIGGER dbo.TR_SomeData_ModifiedDate;

And now we’re all done.

 

What if I use SQL Server Data Tools (SSDT) to make my change?

If you’re working with Database Projects in Visual Studio then you will be used to publishing changes straight from there – or deploying them using a DACPAC file.

In that case SSDT handles all this for you – though it does use a slightly different method. Rather than creating a new column, copying the data over, dropping the old column and renaming the new one, it does it at the table level.
That's the same way some other changes are made by SSDT (or if you make them through the GUI in SSMS). The steps are broadly the same, just with tables instead of columns (there's a rough sketch of what that looks like after the list):

1) Create new temporal table
2) Turn SYSTEM_VERSIONING off
3) Copy across the data from the old table to the new
4) Turn SYSTEM_VERSIONING back on
5) Drop the old table
6) Rename the new table to the original name
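
Here's a rough sketch of what that table-level approach might look like if scripted by hand, borrowing the SomeData names from above. The exact column types are assumptions, and the script SSDT generates will differ in its details – treat this as illustrative only:

CREATE TABLE dbo.SomeData_New
(
    Id int NOT NULL CONSTRAINT PK_SomeData_New PRIMARY KEY CLUSTERED,
    SomeText varchar(1000) NULL,
    ModifiedDate datetime2 GENERATED ALWAYS AS ROW START NOT NULL,
    EndDate datetime2 GENERATED ALWAYS AS ROW END NOT NULL,
    PERIOD FOR SYSTEM_TIME (ModifiedDate, EndDate)
)
WITH (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.SomeData_New_History));

-- Versioning (and the period) must come off before we can write to the period columns
ALTER TABLE dbo.SomeData_New SET (SYSTEM_VERSIONING = OFF);
ALTER TABLE dbo.SomeData_New DROP PERIOD FOR SYSTEM_TIME;

-- Copy the data across, carrying the old ModifiedDate into the start column
INSERT INTO dbo.SomeData_New (Id, SomeText, ModifiedDate, EndDate)
SELECT Id, SomeText, ModifiedDate, '9999-12-31 23:59:59.9999999'
FROM dbo.SomeData;

-- Reinstate the period and versioning
ALTER TABLE dbo.SomeData_New ADD PERIOD FOR SYSTEM_TIME (ModifiedDate, EndDate);
ALTER TABLE dbo.SomeData_New
    SET (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.SomeData_New_History));

-- Swap the tables over
DROP TABLE dbo.SomeData;
EXEC sp_rename 'dbo.SomeData_New', 'SomeData';
-- Note: the history table keeps its SomeData_New_History name unless renamed separately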

If you’re doing this by “hand”, you could use this table-level approach if you prefer, but it is probably a heavier operation that will require more downtime. For instance, you’ll need the whole table to be locked while the operation progresses, as you don’t want to risk data changes once you’ve started copying the data over. The column method will also lock the table, but potentially there is less to update/copy so it should hopefully be quicker.

As a final point I should note that you may run into problems dropping the trigger with SSDT – it can be a little quirky around doing that. For the moment I’ll leave off going into that in more detail – but if you get that issue feel free to give me a yell!

Avoid “Constants” in T-SQL

Unlike some other languages, T-SQL doesn’t have the concept of a constant.

As good coders, we’ve all at some point tried to use a SQL variable to hold a constant value within a stored procedure, in order to make our code both more readable and maintainable.

I’ll give you an example. Here I’m querying a Task table, and I want all the rows where the TaskStatus is 0 (zero) which means the Task is Open:

SELECT *
FROM dbo.Task 
WHERE TaskStatus = 0;

If someone else comes along to look at this code they don’t know what the value of zero means. My code’s not clear. Worse, I might refer to the same value multiple times in my procedure, so if I need to change it later I have to change it in multiple places.

Good practice from other coding languages would say that I replace it with a meaningfully named constant. As mentioned, in T-SQL we don’t have constants so I’ll compromise and use a variable:

DECLARE @OpenTask tinyint = 0;

SELECT *
FROM dbo.Task 
WHERE TaskStatus = @OpenTask;

Now that’s much more readable – right?

Unfortunately it’s also a bad idea in SQL. Let’s see why.

I’ll create the aforementioned Task table and add an index on TaskStatus. Then I’ll add a million rows with status 1 (which we’ll call closed) and 1 row with the value 0 (zero) which is open:

CREATE TABLE dbo.Task
(
    Id INT IDENTITY(1,1) CONSTRAINT PK_Task PRIMARY KEY CLUSTERED,
    UserId INT,
    TaskType INT,
    Payload VARCHAR(255) NOT NULL,
    TaskStatus tinyint NOT NULL
);
GO

CREATE INDEX IX_Task_TaskStatus ON dbo.Task(TaskStatus);

INSERT INTO dbo.Task (UserId,TaskType,Payload,TaskStatus)
SELECT TOP 1000000 1,1,'This Shizzle Is Done',1
FROM sys.objects a, sys.objects b, sys.objects c;

INSERT INTO dbo.Task (UserId,TaskType,Payload,TaskStatus)
SELECT 1,1,'Do This Shizzle',0;

Once that’s completed I’m going to update the statistics just so we know SQL has the most up to date information to produce an optimal execution plan for our queries:

UPDATE STATISTICS dbo.Task WITH fullscan;

Now let’s go back to our original queries. Before I run them let’s think what we want them to do. We have an index on TaskStatus and we only have one row we are looking for, so we’d hope the query will use the index and go straight to the record. The index doesn’t contain all the columns, but that’s okay. We’re only going to have to output one record, so if it has to look the extra columns up in the clustered index that’ll be pretty damn quick.

Let’s run the first query, we’ll capture the execution plan and the STATISTICS output:

SET STATISTICS io ON;
SET STATISTICS time ON;

SELECT *
FROM dbo.Task 
WHERE TaskStatus = 0;

Here’s the execution plan:
Constants1
That’s doing exactly what we hoped for: it’s looked up the record in our index using a seek, then grabbed the rest of the columns from the clustered index using a key lookup.

Here’s the statistics output:
Table ‘Task’. Scan count 1, logical reads 7, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 1 ms.

So that’s a nice small number of reads and less than a millisecond of CPU.

Now let’s run the “improved” version:

SET STATISTICS io ON;
SET STATISTICS time ON;

DECLARE @OpenTask tinyint = 0;

SELECT *
FROM dbo.Task 
WHERE TaskStatus = @OpenTask;
Here’s the execution plan this time:

Constants2

That doesn’t look so good. Let’s check the statistics:
Table ‘Task’. Scan count 1, logical reads 5341, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

CPU time = 109 ms, elapsed time = 96 ms.

Those figures tell us the query has got between 100 and 1,000 times worse. So much for the improved version.

So why is this happening?

The answer is simply that the optimizer doesn’t/can’t look at the values inside variables when a piece of SQL is compiled. Therefore it can’t use the statistics against the indexes on the table to get an accurate idea of how many rows to expect back in the results.

We can see that if we compare the properties of the Index Seek Operator from the first query:

Constants3

Against the properties for the Index Scan Operator from the second query:

Constants4

In the first one we can see that the Actual Number of Rows (at the top) exactly matches the Estimated Number of rows (at the bottom). SQL has been able to use the statistics on the index to get an accurate estimate.

In the second this is not the case. We have an estimate of 500,000 rows, but only 1 actual. This has led SQL down the route of choosing a plan that would have been more effective for 500,000 rows – but is much less effective for 1. In this case it didn’t know what value to optimize for. Lacking that information it used the density value in the statistics and multiplied that by the total number of rows to get the estimate. In other words, the statistics tell it that there are two distinct values (0 and 1) in the table. Not knowing which one has been supplied, the optimizer figures that on average half the rows will be returned.
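
If you want to see where that 500,000 comes from, you can inspect the statistics yourself. The density vector is visible via DBCC SHOW_STATISTICS – the exact figure depends on your data, but with two distinct values in TaskStatus the “All density” value for the leading column should be 0.5:

DBCC SHOW_STATISTICS ('dbo.Task', 'IX_Task_TaskStatus') WITH DENSITY_VECTOR;

0.5 multiplied by our 1,000,001 rows gives the estimate of roughly 500,000.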

So what should you do to make your code clearer?

The simple answer is to use comments. The following is totally clear in its meaning, and will perform optimally:

SELECT *
FROM dbo.Task 
WHERE TaskStatus = 0; -- Open Task

But what about the maintainability issue, where you may have to refer to the same value multiple times in a given procedure?

Unfortunately you’re just going to have to put up with maintaining the value in multiple places, but in general within a well-designed application these should be static values (and hopefully integers) so it shouldn’t be too big a deal.

Note this is not the same for parameters passed to a stored procedure. In that case the queries inside the stored procedure will compile using the values passed the first time the procedure is executed – known as parameter sniffing. That can be really useful, but it can also cause its own set of issues to be aware of! Just remember parameters and variables – similar but not the same!
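
As a quick illustration of the difference (a hypothetical procedure, not part of the example above):

CREATE PROCEDURE dbo.GetTasksByStatus @TaskStatus tinyint
AS
BEGIN
    SELECT *
    FROM dbo.Task
    WHERE TaskStatus = @TaskStatus;
END;
GO

-- The value 0 is "sniffed" when the procedure is first compiled, so the
-- optimizer can use the index statistics and choose the seek plan:
EXEC dbo.GetTasksByStatus @TaskStatus = 0;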

Auditing Data Access in SQL Server for GDPR Compliance

In relation to the GDPR, I’ve recently been looking at the tools available within SQL Server to support auditing our databases. Many of you out there will already have full-scale auditing solutions in place. For others though, the prompting of GDPR could be the first time you’re really thinking about it.

This post is a brief overview and comparison of the SQL features that might help you out, with some suggestions of what you might want to think about in terms of auditing.

Our interpretation of the GDPR is that it will be important to be able to provide detail of changes to personal data to demonstrate that data protection and security procedures are effective and are being followed. That means being able to say who changed an item of data, when they changed it, and what the before and after values were.

The regulation states that we must ensure that data is only accessed by appropriate parties, and only for the explicit purposes to which the person the data relates has consented. We may feel that this latter requirement is satisfied by our security processes, but in practice, if we wish to prove this then we may also want to audit all data access – including when data is simply read but not changed.

For auditing we need to consider all personal data held, whether that is data provided by an individual or data about an individual that supplements this. Auditing should therefore not be limited to the definition of “personal” within GDPR, namely Personally Identifiable Information (PII). We may also, for our own purposes outside of the GDPR requirement, wish to audit other information related to our business process – for instance where approvals are required.

In terms of individuals accessing or changing data there are several contexts of which we need to be aware:
  • Application users

  • Support access via specific tooling
  • Support access direct to the database
  • Illegitimate access via an application or tooling
  • Illegitimate access direct to the database

Summary of Tools available within SQL Server

The table below lists the technologies we will look at as well as in what versions/editions of SQL Server they are available. Some versions are omitted where the availability of features is the same as in the previous version:

Audit

Profiler Trace

SQL Server traces allow you to capture information about all access to the database/server. As such this technology is sometimes used as part of an auditing solution – in fact both the C2 Audit and Common Criteria Compliance features utilise it in one way or another.

It is more useful, however, for capturing instance level security events (such as Logons and failed Logins) than for monitoring changes to individual data items.

There can also be quite an overhead to running traces, though this can be minimised if they are created as server-side traces.

Extended Events

Introduced in SQL 2008, Extended Events were intended as a lightweight replacement for Profiler Traces. Like traces, however, they are more appropriate for capturing security events than for monitoring data access and change – although SQL Server Audit, which we’ll look at later, is implemented via Extended Events.
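
If you did want a lightweight capture of activity via Extended Events, a minimal session might look something like the following sketch. The session name, file path and database name are all illustrative:

CREATE EVENT SESSION CaptureStatements ON SERVER
ADD EVENT sqlserver.sql_statement_completed
(
    ACTION (sqlserver.username, sqlserver.client_hostname, sqlserver.sql_text)
    WHERE sqlserver.database_name = N'TestAudit'
)
ADD TARGET package0.event_file (SET filename = N'C:\Temp\CaptureStatements.xel');
GO

ALTER EVENT SESSION CaptureStatements ON SERVER STATE = START;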

C2 Audit

This feature is supported in all versions and editions of SQL Server up to 2017, though it has been marked as deprecated for some time.

C2 was a rating established by the US Department of Defense applicable to security levels of computer systems, mainly focusing on access to resources. The C2 Audit within SQL Server is implemented via server-side traces and captures events such as:

  • Server shutdowns and restarts
  • Successful and failed logins
  • Successful and failed use of permissions when accessing database objects
  • Successful and failed use of permissions when running any command against the database server

It can be seen from the events captured that while this sort of data capture may be useful as part of an overall audit strategy, it won’t cover the audit of individual data required by the GDPR.

If implementing C2 Auditing on an instance, it is important to be aware that it will have some performance impact on a server as each event must be written to the audit file before the operation can complete. In fact, if the audit file cannot be accessed (or disk space runs out) then SQL Server will stop completely until that is resolved. However, it is a limited set of items being audited so the performance impact in general should not be too considerable.
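
If you do decide to enable it, it’s a single configuration option (an advanced option, and it requires a restart of the SQL Server service to take effect):

EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'c2 audit mode', 1;
RECONFIGURE;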

Common Criteria Compliance

Available from SQL 2005 SP1 Enterprise, and in Standard Edition from SQL 2008, Common Criteria Compliance is described by Microsoft as the replacement for C2 Audit.

Enabling this option configures a couple of other behaviours within SQL Server to comply with the “Common Criteria for Information Technology Evaluation” international standard.

From an auditing point of view, it enables a level of login auditing which can be viewed by querying sys.dm_exec_sessions for currently connected users.

This information includes:

  • Last successful login time
  • Last unsuccessful login time
  • Number of unsuccessful login attempts since the last successful login
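
For example, something like this – the configuration option is an advanced one and requires a restart to take effect, and the query simply pulls out the extra columns for current user sessions:

EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'common criteria compliance enabled', 1;
RECONFIGURE;

SELECT session_id, login_name,
    last_successful_logon, last_unsuccessful_logon, unsuccessful_logons
FROM sys.dm_exec_sessions
WHERE is_user_process = 1;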

Triggers

Available in all editions/versions from SQL 2005. Triggers were also available before that, but additional events were added in 2005.

It is a common usage of Triggers to perform some level of database auditing, for instance to capture a modified datetime for a record, or even the identity of the logged-on user. They can also be used to capture more information about changes – for instance to maintain Audit tables as data is inserted, updated or deleted.
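
A minimal sketch of that pattern might look like this – the tables and columns here are hypothetical:

CREATE TRIGGER dbo.TR_Customer_Audit ON dbo.Customer
AFTER UPDATE
AS
BEGIN
    SET NOCOUNT ON;
    -- Record who changed the email address, when, and the before/after values
    INSERT INTO dbo.Customer_Audit (CustomerId, OldEmail, NewEmail, ModifiedBy, ModifiedDate)
    SELECT d.CustomerId, d.Email, i.Email, ORIGINAL_LOGIN(), GETUTCDATE()
    FROM deleted d
    INNER JOIN inserted i ON i.CustomerId = d.CustomerId
    WHERE i.Email <> d.Email;
END;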

The main potential disadvantage of triggers for Auditing is performance. Technologies designed specifically for the purpose may be more lightweight.

Change Tracking (CT)

Available in SQL Server 2008 Standard Edition onward, CT is not designed to be used for Audit. Rather it is intended to support ETL processes by identifying changes to tables.

When enabled Change Tracking simply identifies that a given row has been inserted, updated or deleted. It doesn’t capture the content of the change, for instance that a given column was changed from value ‘A’ to value ‘B’. It doesn’t maintain a change history, so if I insert a record, update it five times, then delete it, when I check change tracking it will only tell me that it was deleted.
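
Enabling it and reading the changes looks something like this (database and table names are illustrative; the table needs a primary key). Note that you get the operation and the key, but not the changed values:

ALTER DATABASE TestAudit
SET CHANGE_TRACKING = ON (CHANGE_RETENTION = 2 DAYS, AUTO_CLEANUP = ON);

ALTER TABLE dbo.Customer ENABLE CHANGE_TRACKING;

-- Net changes since version 0: one row per changed key
SELECT ct.CustomerId, ct.SYS_CHANGE_OPERATION
FROM CHANGETABLE(CHANGES dbo.Customer, 0) AS ct;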

Change Data Capture (CDC)

Available in SQL Server 2008 Enterprise Edition onward; from 2016 SP1, CDC is also available in Standard Edition. Like CT, CDC is not intended as an audit technology – however due to the level of information captured it can be useful in that context.

As well as identifying that data has changed, CDC captures the actual changes and maintains the history in a separate table. Unlike a similar hand-cranked solution using triggers, it manages this asynchronously so has only light performance overhead.
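
Enabling CDC is a couple of system procedure calls (the table name here is illustrative, and the capture job requires SQL Server Agent to be running):

EXEC sys.sp_cdc_enable_db;

EXEC sys.sp_cdc_enable_table
    @source_schema = N'dbo',
    @source_name = N'Customer',
    @role_name = NULL; -- NULL means access to the change data isn't gated by a role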

One of the key things missing from CDC from an Audit perspective is that while it captures the data change, it doesn’t also record the Login identity making the change or the datetime for the change.

A Connect item was raised to change this, however the Microsoft response was to reject it as by design, since CDC “is not designed to be an auditing feature”. Microsoft’s recommendation is to use SQL Server Audit instead:

https://connect.microsoft.com/SQLServer/feedback/details/283707/cdc-options-to-capture-more-data-username-date-time-etc

Despite this, if the table itself has columns to capture the user identity and the modified datetime, and the application ensures these are set, then those columns will be included in the change data capture – in which case a meaningful Audit of changes to data is provided.

CDC can be enabled on a per table basis, and you can configure which security role(s) have access to the change log. You can also define the retention period which avoids you having to implement a separate solution for cleaning up old audit data.

One thing to bear in mind is that due to the asynchronous process involved to log changes (which uses SQL Server Agent) there is no transactional guarantee that changes will reach the capture log, though the risk of data loss is likely to be minimal.

SQL Server Audit

Available in SQL Server 2008 Enterprise Edition; server-level auditing came to Standard Edition in 2012, and all features are available in all editions from 2016 SP1.

SQL Server Audit is the out of the box feature that is intended for the purposes of Audit, so logically we may think it is the best one for us to be using. Certainly, it is what Microsoft would recommend for auditing within the database.

Auditing is implemented using Extended Events and there are many Audit events you can enable it to capture. Events are either logged to a file, or to the Windows Security log where it is more difficult to tamper with them. You can then either view the events directly in that destination or through a GUI within SSMS.

Depending on the scope of the events monitored, you can choose to Audit either at an object level (for instance table by table) or at schema or database level.

In the context of GDPR we may choose to Audit data access – insert, select, update and delete. What this sort of Audit will give us is not, as we might expect, the ability to view who accessed or changed specific data. Rather it captures the queries being executed against the objects monitored.
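
For example, a server audit with a database audit specification covering DML against a single table might look like this sketch (the names and file path are illustrative):

-- In master:
CREATE SERVER AUDIT DataAccessAudit
TO FILE (FILEPATH = N'C:\Temp\');
ALTER SERVER AUDIT DataAccessAudit WITH (STATE = ON);

-- Then, in the database being audited:
CREATE DATABASE AUDIT SPECIFICATION CustomerTableAccess
FOR SERVER AUDIT DataAccessAudit
ADD (SELECT, INSERT, UPDATE, DELETE ON dbo.Customer BY public)
WITH (STATE = ON);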

As such it’s probably not the best method for being able to track a change history for a piece of data, though that could be done in theory by analysing the individual queries. Equally, if we want to know who read a specific record, this would require analysis of the queries executed.

Because of that, while SQL Server Audit may form an important part of an overall Audit strategy, it may not, on its own, give us what we need.

It does however allow us to capture all queries that have, for instance, touched certain tables. As such it is the only technology available to use within SQL Server that may allow us to monitor for abuse of privileges. For instance, I may be the DBA and require admin rights over the SQL Server instance, with SQL Server Audit the queries I execute against tables containing personal information would be captured and any dubious querying patterns could be identified.

An important part of the use of SQL Server Audit is going to be proactively monitoring the logs to make sure data access is being carried out appropriately.

Temporal Tables

Temporal tables were introduced in SQL Server 2016 and are a way of easily tracking changes to data – as well as being able to easily query what the value was at a given point in time.

Like CDC they do not capture the user context making the data change, so this must be handled by additional application logic if we require it.

The logging of change data for temporal tables is a synchronous operation performed as part of the transaction that changes data in the main table. Microsoft documentation however states that this creates negligible overhead for inserts, and while there is greater overhead for updates and deletes this is still significantly less than a custom solution (for instance using triggers).

Temporal tables work by having two user tables for each table being tracked. One is the original table – known as the current table, and the other is the history table which includes start and end times for which a given version of the data was current. New commands for querying the state of data at a given point in time make this a seamless operation.
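
For example, to see the data as it was at a particular moment (using the table from earlier in this post; the timestamp is arbitrary):

SELECT *
FROM dbo.SomeData
FOR SYSTEM_TIME AS OF '2018-01-01 00:00:00';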

Application Audit

One issue with any auditing that we perform at the database level is that any well designed application, from a security context, is likely to be accessing the database via a service account, rather than application users being granted direct database access. As such it is not possible for pure database auditing to capture a meaningful user context where operations are conducted through the application.

For instance, it may be quite legitimate from the application for a user to update details relating to a customer. We need to maintain a record of those changes so we require the application to log the user context – usually into an “UpdatedBy” column.

There may also be the case where semi-structured, or unstructured data is stored in blob forms, such as XML, JSON or other document formats. In these cases, a database level audit could capture the before and after states of the data, but it would require additional analysis to identify what actual elements of the data within the document had changed. If we want to provide an easily digestible change history then we need to consider implementing this at an application or tooling level.

Finally, we may want to record user access and activities from the application. For instance, the date and time of connections as well as what was accessed.

What might a best practice SQL Data Access Audit solution look like?

Which (if any) of these technologies we use is going to depend very much on the versions and editions of SQL Server we have installed. In an ideal world, to make our lives easy we would all have Enterprise edition – or be on SQL 2016 SP1 or later – and have an extensive choice. In some cases, it may be that the access to additional security and audit components becomes the compelling reason for choosing Enterprise over Standard edition, or for upgrading to a later version of SQL Server.

C2 or Common Criteria Compliance

It makes sense to implement one of these – ideally Common Criteria Compliance where it is available – as this gives us increased security within SQL Server at little cost and minimal effort to implement. Common Criteria Compliance is preferable both because C2 is deprecated and because a poorly managed C2 audit (losing access to the audit file) can create additional risks to application availability. Both offer additional levels of logging of connections to our SQL Server boxes that may be useful in identifying those logging onto the server for nefarious purposes.

Change Capture

In particular for GDPR we will want to make sure we have some form of capture and logging of changes to personal data. We have a choice of technologies here.

Triggers

Triggers can provide a serviceable solution, but the performance overhead is high, so it may be better to use a built-in technology where one is available.

CDC

CDC provides a low overhead tool for managing change capture. However, it is only available in Enterprise Edition until SQL 2016 SP1.

Temporal Tables

These are the latest and greatest, so it may make sense to use them where we are on SQL Server 2016 or later, although as the logging is a synchronous operation it may have more of an immediate performance impact than CDC.

I hope to look at a performance comparison between these different methods of data capture in a later post.

Application Auditing

Whatever tool is used for change capture, application auditing is required to make sure the relevant user context is captured. It’s also worth considering logging user activity.

SQL Server Audit

For a best practice solution, where this is available to us it makes sense to use it. We can use it to capture data access, even though the format is not going to be the easiest to analyse. We should also use it to capture events that could relate to anyone making changes to our Change Capture. If anyone makes changes to the Audit itself then this gets captured by the audit, so we can detect tampering. If we make sure the Audit is saved to the Security log then that is also tamperproof. We can’t guarantee that we will capture all events, but we can reasonably guarantee that we will detect it if someone prevents us from doing so (e.g. by turning off Auditing or Change Capture).

Conclusions

There are a number of components you may want to implement in order to deploy an effective data access auditing solution. In some cases your choices may be limited by the versions and editions of SQL Server your platforms currently sit on.

Where the version and edition of SQL Server we use does support these technologies though, there is not a great deal of effort required to implement them.

In subsequent posts I’ll hope to show some simple examples in practice, as well as look briefly at the impacts on server performance.