Understanding Keys and Certificates with TDE

December 21, 2022 (updated May 10, 2023) ~ Matthew McGiffen

When you first look at the encryption hierarchy for TDE in SQL Server it can be a bit daunting. There seem to be a lot of objects involved and it might not be clear why each is required. It can be tempting to skip a full understanding of all the objects and just get on with setting things up – which is relatively straightforward.

I’d encourage you not to do that and I’ll explain why. There are a lot of scenarios that might crop up in the lifecycle of a TDE protected database instance. Recovering a protected database from backup. Migrating a database from one server to another. Managing high availability. The list goes on.

There are a lot of resources out there that will advise you on how to do these things – but many of them are inaccurate. Only by understanding how TDE works, and in particular how it uses the various encryption objects, can you be sure what approach is correct. Better than that, if you have a good understanding of the architecture of TDE you won’t need these resources; the correct approach will follow logically from your understanding of how things work.

TDE uses multiple keys and certificates in the process of protecting your database. This is something we’ll see in common with other forms of encryption. The reason for that is both to provide the best level of protection and to support ease of management for your protected databases.

Keys in SQL Server often have at least three components:

The Encryption Key itself – Usually can be thought of as a number expressed in binary format, long and random enough to make it difficult to guess even by brute force attempts. This is the key that is actually used to encrypt and decrypt your data.
Another object that’s used to protect the key – This object might be another key, it might be a certificate, or it might just be a password. This object is used to encrypt the encryption key.
The encrypted value of the encryption key – Formed from the original value of the key, encrypted by the protecting object.

In SQL we rarely (maybe even never) see the actual value of the key. We have the encrypted version and we usually know what object was used to encrypt it. That second object may even be another key that is itself encrypted by a third object.

When it comes down to it though, the actual thing that is used to encrypt or decrypt data is the key itself, not the encrypted value, and not the hierarchy of objects that may have been used to protect it.

So, all I really need to read your data, is your key.

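Let’s look at that in the context of TDE. The hierarchy of encryption objects supporting TDE runs, from bottom to top: the Database Encryption Key (DEK), which is protected by a certificate and its private key, which is protected by the Database Master Key (DMK), which is protected by the Service Master Key (SMK), which is protected by the operating system’s Data Protection API (DPAPI). There are other ways of working with TDE but this approach is the standard.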

Let’s look at the objects one at a time.

Database Encryption Key (DEK)

Right at the bottom of the hierarchy is the DEK. This is stored in the database itself and is what is actually used to encrypt and decrypt data in the database. The DEK is a symmetric key, which means the same key is used to both encrypt and decrypt data. Symmetric key encryption is much quicker than asymmetric key encryption which is why such a key is used in this case. We want the “transparent” encryption activities to have as little overhead as possible while keeping things secure.

The DEK is actually stored encrypted in the database. Encryption wouldn’t be very effective if the key was stored in plain sight for anyone to access and use. As it is stored encrypted, even if someone has your files, there is no way for them to access the unencrypted version of the DEK and use it to read your data.

The encrypted DEK is also stored in any database backups.
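As a rough illustration, you can see metadata about the DEK for each encrypted database, and which certificate protects it, by querying a dynamic management view. This is a minimal sketch that assumes the DEK is protected by a certificate in master (the standard approach described here) and that you have VIEW SERVER STATE permission. It never shows the value of the key itself.

```sql
-- List each database that has a DEK, with the certificate that protects it.
SELECT DB_NAME(dek.database_id) AS database_name,
       dek.encryption_state,      -- 3 means the database is encrypted
       dek.key_algorithm,
       dek.key_length,
       dek.encryptor_type,        -- e.g. CERTIFICATE or ASYMMETRIC KEY
       c.name AS certificate_name
FROM sys.dm_database_encryption_keys AS dek
LEFT JOIN master.sys.certificates AS c
    ON c.thumbprint = dek.encryptor_thumbprint;
```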

Certificate and Associated Asymmetric Key Pair

At the next level up we have the Certificate, which contains a Public Key which can be used to encrypt data and has a reference to a Private Key which must be used to decrypt data. The two keys together are known as a Public/Private Key pair and are used to encrypt the DEK using asymmetric encryption.

The Public key can be held in plain sight as it can only be used to encrypt data. The Private Key is required to decrypt data. The Private Key is in turn encrypted to make sure it is protected.
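To see how the private key of each certificate is protected on your own instance, a query along these lines can help. It is only a sketch against the catalog view in master; the filter on names starting with ## is my assumption for hiding the system-generated certificates.

```sql
USE master;
GO
-- pvt_key_encryption_type_desc shows what protects the private key,
-- for example ENCRYPTED_BY_MASTER_KEY when the DMK protects it.
SELECT name,
       subject,
       thumbprint,
       pvt_key_encryption_type_desc,
       pvt_key_last_backup_date,   -- NULL if the private key has never been backed up
       expiry_date
FROM sys.certificates
WHERE name NOT LIKE '##%';
```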

There are other options for protecting your DEK which include using an asymmetric key that is stored externally to your SQL Server. This is managed through something called Extensible Key Management (EKM) and we’ll look at that in later posts.

Database Master Key (DMK)

The DMK is stored in the master database and is what’s used to encrypt the private key. However, the DMK isn’t just used for your database, and indeed isn’t just used for TDE. It can support a number of activities. A database can have only one DMK, so there is only one DMK in master for your SQL instance. The DMK is in turn encrypted.
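If you want to check what protects the DMK in master, the following sketch joins two catalog views. You would typically expect one row per protection mechanism, for example one for the SMK and one for a password, though what you see depends on how the key was created.

```sql
USE master;
GO
-- The DMK always appears under this fixed system name.
SELECT sk.name,
       sk.algorithm_desc,
       ke.crypt_type_desc   -- e.g. ENCRYPTION BY MASTER KEY (the SMK) or ENCRYPTION BY PASSWORD
FROM sys.symmetric_keys AS sk
JOIN sys.key_encryptions AS ke
    ON ke.key_id = sk.symmetric_key_id
WHERE sk.name = '##MS_DatabaseMasterKey##';
```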

Service Master Key (SMK)

The SMK is created when you first install your SQL Server instance and is unique to the instance. The SMK is used to protect your DMK. It is in turn protected by the operating system Data Protection API (DPAPI).
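To make the layers concrete, here is a minimal sketch of how the objects above could be created for a single database. The database name TestTDE, the certificate subject and the password placeholder are illustrative assumptions, and the sketch assumes master does not already have a DMK. The SMK needs no statement because it already exists on the instance.

```sql
USE master;
GO
-- DMK in master: protected by the SMK and by this password (placeholder value).
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<UseAStrongPasswordHere>';
GO
-- Certificate in master: its private key is protected by the DMK.
CREATE CERTIFICATE MyTDECert WITH SUBJECT = 'Certificate used for TDE';
GO
USE TestTDE;
GO
-- DEK in the user database: stored encrypted by the certificate.
CREATE DATABASE ENCRYPTION KEY
WITH ALGORITHM = AES_256
ENCRYPTION BY SERVER CERTIFICATE MyTDECert;
GO
-- Turn encryption on; existing data is encrypted by a background process.
ALTER DATABASE TestTDE SET ENCRYPTION ON;
GO
```

Back up the certificate and its private key as soon as the certificate exists, before relying on it to protect data.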

Understanding the Need for the Hierarchy

That’s a lot of layers of encryption to deal with. It can be difficult to understand why we need so many.

Let’s look at one scenario to aid our understanding. We’ll look at how it works in practice and what the problems might be if we didn’t have all these different levels of encryption.

Let’s say you want to restore a TDE protected database to another server – a common requirement. The encrypted DEK is already stored in your backup file – so we have that. You would be forgiven however for thinking that we need to migrate copies of all the layers above it in order to be able to read our protected database.

We run into problems however as soon as we think about migrating the DMK. Remember there can only be one DMK in the master database of an instance. What if the instance we are restoring to already has a DMK, and that DMK is already used to protect other objects? We can’t just replace it. That is why it is critical that there is an object between the DMK and the DEK; the DMK cannot be used to encrypt the DEK directly.

However, we still need the DMK to read the private key associated with the certificate. Don’t we?

In practice you are able – and pretty much required – to back up the certificate to disk and also the private key associated with it. If you don’t do this and it is lost (for instance if the server crashes and is unrecoverable) then your data is pretty much lost.

The command for backing up the certificate looks something like this:

USE master;
GO
BACKUP CERTIFICATE MyTDECert
TO FILE = 'C:\Test\MyTDECert.cer'
WITH PRIVATE KEY
(
    FILE = 'C:\Test\MyTDECert_PrivateKeyFile.pvk',
    ENCRYPTION BY PASSWORD = 'C0rrecth0rserbatterystab1e'
);

You can see that when we back up the certificate we specify a password to encrypt the private key. Remember that the private key was already encrypted by the DMK? This BACKUP CERTIFICATE command doesn’t just encrypt it a second time – what would be the point of that? No, the reason we need to supply a password is that the command retrieves the unencrypted version of the private key, re-encrypts it with the password INSTEAD and then that is what gets saved to disk.

Remember that at the point I run the command SQL has access to all the objects, all the way up the chain, that are used for the encryption. So, it has no problem getting the actual value for the private key.

Now, when I restore that certificate to – let’s say another instance of SQL Server – the command looks like this:

USE master;
GO
CREATE CERTIFICATE MyTDECert
FROM FILE = 'C:\Test\MyTDECert.cer'
WITH PRIVATE KEY
(
    FILE = 'C:\Test\MyTDECert_PrivateKeyFile.pvk',
    DECRYPTION BY PASSWORD = 'C0rrecth0rserbatterystab1e'
);

Considering what the BACKUP command did, you can figure out that the above command will do the opposite. It first of all decrypts the private key using the password supplied. Then it encrypts it again using the DMK for the local instance before saving it in the master database locally.

Remember it is the actual values of keys that get used for encryption – not their encrypted value. So, the private key for this certificate is exactly the same as it was where we backed it up from – though the encrypted value will be different as it is protected by a different DMK.

If I now go to restore a copy of a database whose DEK has been encrypted with this certificate onto the new instance I can do so without any problem.
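The restore itself is then an ordinary one. The sketch below assumes the destination instance already has a DMK in master (it is needed to store the certificate’s private key) and that the certificate has been created from the backup files. The database name and backup path are hypothetical, and you may need a WITH MOVE clause if the file locations differ on the new server.

```sql
USE master;
GO
-- Optional check: the certificate that protects the DEK must exist in master.
SELECT name, thumbprint
FROM sys.certificates
WHERE name = 'MyTDECert';
GO
-- Hypothetical backup file; SQL Server uses the certificate to decrypt the DEK.
RESTORE DATABASE TestTDE
FROM DISK = 'C:\Test\TestTDE.bak'
WITH RECOVERY;
GO
```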

Many articles will tell you that you also need to migrate the DMK from your old instance, and some will say that you also need the SMK. This is simply not correct – I mean, it will work if you do that, in the right order, but it isn’t necessary.

So all you need to restore a database protected by TDE is the database backup, the certificate and private key backups, and the password specified when the certificate and key were backed up.

Equally that’s all someone else needs too, so make sure those things are protected.

There are other reasons it would not be desirable to have the DEK directly protected by the DMK. We can have a separate certificate for each database protected by TDE if we wish – and I would suggest that is good practice. If someone steals the backup of one of those certificates, it’s not a big deal unless they also have the password used to protect it. If they do, then they can access our data. That’s not good, but at least we have minimized the attack surface and they have only accessed one database. If the DMK was used to protect the DEK directly then we would need to back that up in a similar manner – and if that was stolen in the same way then protection for all objects and databases protected by the DMK would be breached.
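A separate certificate per database might be set up along the following lines. The database and certificate names (Sales, HR, TDECert_Sales, TDECert_HR) are hypothetical, and the sketch assumes a DMK already exists in master and that neither database has a DEK yet.

```sql
USE master;
GO
-- One certificate per protected database, each backed up separately.
CREATE CERTIFICATE TDECert_Sales WITH SUBJECT = 'TDE certificate for the Sales database';
CREATE CERTIFICATE TDECert_HR WITH SUBJECT = 'TDE certificate for the HR database';
GO
USE Sales;
GO
CREATE DATABASE ENCRYPTION KEY
WITH ALGORITHM = AES_256
ENCRYPTION BY SERVER CERTIFICATE TDECert_Sales;
GO
USE HR;
GO
CREATE DATABASE ENCRYPTION KEY
WITH ALGORITHM = AES_256
ENCRYPTION BY SERVER CERTIFICATE TDECert_HR;
GO
```

With this layout, a stolen certificate backup and password expose one database rather than every database on the instance.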

We’ll see other scenarios where the multiple levels in the encryption hierarchy for TDE help us in the later posts when we talk about managing TDE and its lifecycle with your databases.

This post is part of a comprehensive series on SQL Server Encryption.

What is TDE (Transparent Data Encryption)?

December 19, 2022 (updated May 10, 2023) ~ Matthew McGiffen

TDE is referred to as a “transparent” form of encryption. What that means is that the process of encrypting and decrypting data is fully performed in the background. The queries we write to access data are unchanged whether TDE is enabled or not. So, enabling TDE has no impact on application functionality, does not require refactoring of code, and is therefore relatively easy to implement. TDE encrypts all the data in a database, so you don’t need to choose which data items to encrypt.

TDE allows us to encrypt “at rest” data. When we talk about “at rest” data we are referring to data that has been written to disk. TDE works at the IO level, encrypting data automatically as it is written to disk, and decrypting it as it is read from disk.

In terms of our SQL databases the assets that are protected include:

Any data files for our database.
Any log files for our database.
All backup files for the database, Full, Log or Differential backups.
Database snapshot files.
Also the TempDB database data and log files are encrypted.

The last item in that list, TempDB, needs to be encrypted for completeness of protection. Imagine that you query your database and as part of the query execution TempDB is used. If that data were written to disk unencrypted, it would create a hole in our protection: someone could potentially read or copy the TempDB files and might see some of the data we are trying to protect. As a result, when you enable TDE against any database on your SQL Server instance, the TempDB database is automatically encrypted as well to prevent this from happening.
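One way to see this for yourself is to list the encryption state of every database that has a DEK. This is a sketch only; on an instance with at least one TDE protected database you would expect tempdb to appear in the results alongside it.

```sql
-- Requires VIEW SERVER STATE permission.
SELECT DB_NAME(database_id) AS database_name,
       encryption_state,
       CASE encryption_state
           WHEN 0 THEN 'No database encryption key present'
           WHEN 1 THEN 'Unencrypted'
           WHEN 2 THEN 'Encryption in progress'
           WHEN 3 THEN 'Encrypted'
           WHEN 4 THEN 'Key change in progress'
           WHEN 5 THEN 'Decryption in progress'
           WHEN 6 THEN 'Protection change in progress'
       END AS encryption_state_desc
FROM sys.dm_database_encryption_keys;
```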

It’s reasonably obvious to state that data “at rest” doesn’t include the following things:

Data loaded/stored in memory (buffer pool).
Data returned from a query and being passed across the network.
Data received by a client as a result of a query.

If you want to cover those scenarios as well then you need to look at other forms of encryption such as TLS and Always Encrypted.

There are also some less obvious exceptions:

Filestream data.
Data persisted to disk using Buffer Pool Extensions.

And there are a couple of other exceptions that can occur in particular circumstances:

Where the buffer pool gets paged to disk due to memory pressure.
SQL dump files when there is a crash.

What does and doesn’t get encrypted by TDE is summarized in the below diagram:

Let’s have a look at the contents of some SQL data files so you can see the difference with and without TDE. I’ve created a database with a single table and inserted a row of data with the following code:

CREATE DATABASE TestTDE;
GO
USE TestTDE;
GO
CREATE TABLE dbo.SomeData (Id INT IDENTITY(1,1), SomeText VARCHAR(255));
GO
INSERT INTO dbo.SomeData (SomeText) VALUES('This is my data');
GO

I’ll close my connection from the database, and detach it so I can open the files in a Hex Editor. You can detach the database with the following SQL:

USE master;
GO
EXEC master.dbo.sp_detach_db @dbname = N'TestTDE';

Then I open the file in my Hex Editor and search for the text “This is my data” in the data file:

As you can see the data is stored as clear as day in the data file.
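Once you have finished inspecting the file, the database can be reattached. The sketch below assumes hypothetical file locations; substitute the actual paths of the data and log files on your server.

```sql
USE master;
GO
-- Reattach the detached database from its existing files (hypothetical paths).
CREATE DATABASE TestTDE
ON (FILENAME = 'C:\Test\TestTDE.mdf'),
   (FILENAME = 'C:\Test\TestTDE_log.ldf')
FOR ATTACH;
GO
```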

Now let’s look at the same data file once TDE has been enabled (we will look at enabling TDE in later posts). This time if I search for the same text it’s not found, and my data looks like that shown below.

It’s interesting to also look at the end of the database file where there is free space. In the unencrypted version that free space would have simply been represented by zeros. In the encrypted version that free space too has been encrypted, so an attacker cannot even see where your data ends.

TDE works by using an encryption key that is stored in the database being encrypted – but that key is itself stored encrypted by an object outside of the database. We’ll look at the various objects involved in the next post.

Recommended Approach to Encryption in SQL Server

December 14, 2022 (updated May 10, 2023) ~ Matthew McGiffen

We looked earlier at the various places our data lives – on disk, in memory, in-transit across the network. A good strategy will protect all of these locations – sometimes with multiple layers. Exactly how you go about doing that for your applications may vary – but as long as you are on SQL Server 2016 or higher there is a default strategy that you should consider. This combines a number of the available SQL Server features to provide the best protection.

TDE. For at-rest protection of all your data.
Always Encrypted. To encrypt all (or most) columns that contain personally identifiable, or sensitive, information.
TLS. To make sure network communications between the application and server are encrypted.
Hashing and Salting of passwords. To make sure passwords are secure – and we never need to store the actual password in the database.

On top of that you should definitely consider using EKM. Previously, when we had to have dedicated hardware to support it, there was a fair barrier to entry. Now with easily usable and cost-effective cloud solutions it’s easy to get started with EKM, and it certainly has many advantages.

Encryption is easiest to build in “by-design” when developing new applications, but more often than not we are implementing or enhancing encryption against our existing applications. It’s great if you are in that situation and have the scope to implement a full encryption strategy as outlined above. Such projects often have time and budget constraints though, or need to be delivered incrementally. If that’s where you are at, then you may want to focus first on where you can achieve the most with the least effort. You need to assess whether implementing any of the features above is going to have a performance impact that worries you. Once you’ve dealt with that, you might want to look at things in this order:

Hashing and Salting of passwords. I put this item first because you should never be storing passwords in plaintext in a database. If you are encrypting passwords before storing them then that’s better than plaintext but still you should be looking at a hashing method.
TLS. You really should have this on for all your connections between your applications and SQL Server that could contain data or other information you care about protecting. TLS is very easy to set up and can be done very quickly.
TDE. TDE is again very easy to set up, and it comes for free with the standard edition from SQL Server 2019 onward. It used to require you to be on enterprise edition so that was a good reason why many people didn’t use it. TDE is only going to protect you against a limited number of scenarios, but you still get that for not much more than the flick of a switch.
Always Encrypted. Column encryption with Always Encrypted is a little harder to understand and there are limitations on the ways you can work with encrypted data. It is however the best tool for protecting your personal and sensitive data. If your project has limited resource (what project doesn’t) focus on encrypting first the most sensitive items, and those where you won’t need to make code changes to work around the fact that data is encrypted. Code changes are likely to be required where you need to search against, or perform calculations on, data that you wish to encrypt.

Overview of SQL Server Encryption Features

December 12, 2022 (updated May 10, 2023) ~ Matthew McGiffen

SQL Server has had the native ability to encrypt data since SQL Server 2005, which introduced functionality that could be used to encrypt individual items and columns of data. SQL Server 2008 added the Transparent Data Encryption (TDE) feature, which was available with the enterprise edition of SQL Server and could be used to encrypt all data where it is stored on disk. We then didn’t see significant additions to the features available for encryption until 2016 when Microsoft added Always Encrypted for column encryption. In SQL Server 2019 Microsoft made TDE available in standard edition, and in the same release added the ability to use enclaves with Always Encrypted to improve the available functionality for interacting with encrypted data. Finally in SQL Server 2022 further enhancements were made to the set of functionality available when working with Always Encrypted with enclaves.

We can see Always Encrypted (which was introduced in 2016) as the successor to previous methods of column encryption, and in later posts we will go into a high level of detail on Always Encrypted, although we’ll also cover the older methods in brief.

The table below shows the list of encryption features in SQL Server and what each tool is intended to protect.

Encryption features available in SQL Server

Feature	What is Protected
Transparent Data Encryption (TDE)	Data saved to disk. This includes data files, transaction log files, backup files and database snapshots.
Backup Encryption	Backup files.
Always Encrypted	Data stored in columns. With Always Encrypted the data is protected on disk, in memory and in-transit across the network.
Transport Layer Security (TLS)	Network traffic. TLS protects data in-transit across the network as well as commands executed against the database server.
Hashing and Salting	This is not strictly encryption, but we generally use it to protect passwords.
Encryption Functions	Data stored in columns. Here we are referring to the encryption functions introduced in SQL 2005 that pre-date Always Encrypted.
Extensible Key Management (EKM)	This provides extra protection and ease of management for encryption keys by enabling them to be stored with an external provider.

Let’s take a brief look at each of these features in turn.

TDE

TDE protects our data stored on disk, what we often refer to as “at-rest” data. It offers good protection against the scenario where the file system is accessed, and an attacker might attempt to retrieve data directly from the database files themselves – or copy the backup files so they can be restored to another SQL Server to access the data. It doesn’t protect us at all though where an attacker may have access to query the database directly. The “transparent” part of the name refers to the fact that TDE works transparently in the background with no impact on our queries or other application functionality. TDE protects all of the data in a database, unlike methods of column encryption which usually target specific types of information to encrypt.

Backup Encryption

Backup encryption just encrypts our backup files. This includes full backups, differential backups and log backups. This is particularly useful where we might store backups, possibly on tape, off site and want to make sure they are inaccessible if stolen. TDE also does this for us, so we only consider using backup encryption where we can’t use TDE for some reason.
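For reference, an encrypted backup can be taken with a statement like the one below. This is a sketch: MyDatabase, MyBackupCert and the file path are hypothetical names, and it assumes the certificate already exists in master and has itself been backed up along with its private key.

```sql
-- Encrypt the backup file using a certificate held in master.
BACKUP DATABASE MyDatabase
TO DISK = 'C:\Test\MyDatabase_Encrypted.bak'
WITH ENCRYPTION
(
    ALGORITHM = AES_256,
    SERVER CERTIFICATE = MyBackupCert
);
```

As with TDE, restoring this backup on another server requires the certificate and its private key to be present there.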

Always Encrypted

Always Encrypted is a form of column encryption. It works hand in hand with the client driver that your application uses to connect to and query the database to ensure that data remains encrypted all the way to the point it reaches your application. That’s what the “always” part of the name refers to. Data is protected at-rest, in-memory and in-transit across the network. Encryption and decryption actually take place within the client driver rather than within SQL Server.

On this blog we will look in depth at two flavors of Always Encrypted. We have the basic version that was introduced in SQL Server 2016, and Always Encrypted with Secure Enclaves that was added in SQL Server 2019. What’s nice about Always Encrypted is that encryption and decryption are carried out automatically for you by the client driver so in many cases you may not even have to make code changes. There are limitations on how you can interact with encrypted data though. The version with enclaves removes some of those restrictions by allowing certain activities to take place in a secure portion of memory (called an enclave) on the database server. The use of enclaves does however come with an extra overhead in setup and management.
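To give a flavor of what column encryption looks like in a table definition, here is a hedged sketch. The table, column and key names are hypothetical, and it assumes a column master key and a column encryption key called MyCEK have already been created in the database. The client also needs to connect with column encryption enabled in its driver for values to be encrypted and decrypted automatically.

```sql
CREATE TABLE dbo.Customers
(
    CustomerId INT IDENTITY(1,1) PRIMARY KEY,
    -- Deterministic encryption on a character column requires a BIN2 collation.
    SocialSecurityNumber CHAR(11) COLLATE Latin1_General_BIN2
        ENCRYPTED WITH
        (
            COLUMN_ENCRYPTION_KEY = MyCEK,
            ENCRYPTION_TYPE = DETERMINISTIC,
            ALGORITHM = 'AEAD_AES_256_CBC_HMAC_SHA_256'
        ) NOT NULL
);
```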

TLS

TLS is used to encrypt network traffic. That means that data and queries sent between the application and database server are all encrypted. TLS is the successor to SSL, which most people are familiar with for encrypting internet traffic (what is referred to as SSL today is in most cases actually the TLS protocol).
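A quick way to check whether current connections to your instance are encrypted is to query the connections DMV. This is a minimal sketch and needs VIEW SERVER STATE permission.

```sql
-- encrypt_option is TRUE for connections whose traffic is encrypted.
SELECT session_id,
       net_transport,
       encrypt_option,
       auth_scheme,
       client_net_address
FROM sys.dm_exec_connections
ORDER BY session_id;
```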

Hashing and Salting

Hashing and salting isn’t actually encryption because it is a one-way process. Hashing is where we run a value through a function that produces a seemingly random output. That output will always be the same for the same input value, but cannot be reverse-engineered to find the original value. Salting is a method to provide extra security for hashed values. Hashing and salting is considered the best practice for storing passwords as it means we don’t even need to store actual passwords – so there should be no way for an attacker to access them.
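The idea can be sketched in T-SQL as follows. The variable names and password value are illustrative only. Note that a single round of SHA-2 is fast to compute, so in practice a purpose-built password hashing scheme (such as PBKDF2, bcrypt or Argon2, usually applied in the application) is generally preferred. Treat this as a demonstration of the concept rather than a recommended implementation.

```sql
-- Store: generate a random salt and hash the salt plus the password.
DECLARE @password NVARCHAR(128) = N'correct horse battery staple';
DECLARE @salt VARBINARY(32) = CRYPT_GEN_RANDOM(32);
DECLARE @hash VARBINARY(64) =
    HASHBYTES('SHA2_512', @salt + CAST(@password AS VARBINARY(256)));

-- Only @salt and @hash would be saved; the password itself is never stored.

-- Verify: hash the login attempt with the stored salt and compare.
DECLARE @attempt NVARCHAR(128) = N'correct horse battery staple';
SELECT CASE
           WHEN HASHBYTES('SHA2_512', @salt + CAST(@attempt AS VARBINARY(256))) = @hash
               THEN 'Password matches'
           ELSE 'Password does not match'
       END AS result;
```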

Encryption Functions

Here we refer to the set of encryption functions that SQL Server implements to allow you to encrypt your own data. I see Always Encrypted as the successor to these functions and would recommend you use that where possible. Encryption using the functions is a bit more limiting, a bit less secure and a bit harder to implement than with Always Encrypted. There may be some scenarios where you want to use them though, so we’ll cover them in brief later on – though hopefully in enough detail that it tells you everything you need to know.
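As a small taste of these functions, the sketch below encrypts and decrypts a value using a passphrase. The passphrase and text are made-up values. The other functions in the family work with keys and certificates rather than a passphrase.

```sql
DECLARE @passphrase NVARCHAR(128) = N'A hypothetical passphrase';

-- Encrypt: the result is a binary value that can be stored in a VARBINARY column.
DECLARE @encrypted VARBINARY(8000) =
    ENCRYPTBYPASSPHRASE(@passphrase, N'This is my data');

-- Decrypt: returns binary, so cast back to the original data type.
SELECT @encrypted AS encrypted_value,
       CAST(DECRYPTBYPASSPHRASE(@passphrase, @encrypted) AS NVARCHAR(4000)) AS decrypted_value;
```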

EKM

Most encryption is based on keys, and we need to think about how and where we manage them over time. EKM is functionality that allows you to store them outside of your server, either on a piece of kit that sits in a rack in your server room called a Hardware Security Module (HSM), or more commonly these days, using a cloud service like Azure Key Vault. You don’t need to use EKM in order to implement a secure encryption strategy but it’s certainly worth considering due to the ease of management that comes from having all of your keys in one place. It’s also easier to manage policies such as access control when you take a centralized approach to storing your keys.
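Getting started with EKM on the SQL Server side broadly involves enabling the feature and registering the provider supplied by your vendor. The sketch below shows only those first steps. The provider name and DLL path are hypothetical, and the remaining steps (credentials, keys) depend on the provider you use.

```sql
-- EKM is an advanced option and is disabled by default.
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
GO
EXEC sp_configure 'EKM provider enabled', 1;
RECONFIGURE;
GO
-- Register the vendor-supplied provider library (hypothetical path).
CREATE CRYPTOGRAPHIC PROVIDER MyEKMProvider
FROM FILE = 'C:\Path\To\VendorProvider.dll';
GO
```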

Encryption and Data Protection Regulation

December 7, 2022 (updated May 10, 2023) ~ Matthew McGiffen

One of the reasons you may be considering encryption is due to the relevant data protection regulation: either because the regulation specifies that data should be encrypted or because of the large potential penalties where there is a data breach. Some US companies have been hit by fines in the hundreds of millions of dollars following data breaches, so we are talking large sums of money. In Europe the largest fines so far (under the GDPR) have been related to misuse of personal data or consent (750 million euros is the highest I am aware of), but there have been fines of up to 30 million euros for data breaches. In the case of a breach, you could also be sued by individuals whose data has been accessed or by class action.

I’m not aware of any fines (large ones at least), or successful lawsuits, where a hacker gained access to a company’s systems but was not able to access data as it was securely encrypted. Some regulation also explicitly specifies exemptions where only encrypted data has been accessed. If you implement encryption, and do so well, you are certainly reducing your company’s financial exposure in the case of an attack successfully getting through your first lines of defense. If you implement encryption poorly though, you may not be making it too hard for attackers to get around the protection you have put in place.

It’s not really a point about regulation, but where you have a breach, you are also open to reputational damage, especially if you haven’t followed good practice. I regularly see threads on Twitter mocking, almost in disbelief, companies who haven’t protected items such as passwords in the right manner.

Most of the general regulation I’m aware of doesn’t specifically require you to encrypt data, but it may recommend that you consider it. The European GDPR (General Data Protection Regulation), for instance, recommends encryption, but does not require it. Some legislation specific to particular business sectors however does specifically require encryption; in the United States, for instance, HIPAA (Health Insurance Portability and Accountability Act) does explicitly require encryption of certain healthcare data. You need to be aware of data protection regulation that applies to the industries you are working with and understand what that regulation requires you to do.

Most countries now have some sort of general data protection regulation in place. Here are a few key ones (this is by no means a comprehensive list):

Europe has the GDPR. Note that although the UK has left the European Union since Brexit, legislation has been passed to put pretty much the same rules (known as the UK GDPR) in place.
The United States doesn’t have the same sort of regulation at the federal level, but there are many states that do have their own data-related laws and many others that are in the process of enacting them. The CCPA (California Consumer Privacy Act) is seen as a key player.
Canada has PIPEDA (Personal Information Protection and Electronic Documents Act) which is quite similar to Europe’s GDPR.
India has the Personal Data Protection bill that also has many of the same rules as the GDPR.
South Africa has POPIA (Protection of Personal Information Act). This doesn’t align directly with the GDPR but is seen as being just as rigorous.

It’s estimated that more than 120 countries have some form of data protection regulation.

In addition to general data protection regulation there may be other requirements depending on the industry you work in. We’ve mentioned HIPAA. Here are a few others:

In the United States, there are federal laws that apply to the processing of financial data. These include Sarbanes-Oxley and FACTA (Fair and Accurate Credit Transactions Act).
Also in the United States, there is FISMA (Federal Information Security Management Act) specific to government agencies and those who work with them.
In the UK, the FCA (Financial Conduct Authority) imposes rules regarding the processing of financial data.

What do we want to achieve by encrypting our data?

December 5, 2022 (updated May 10, 2023) ~ Matthew McGiffen

On the face of it, this is a very obvious question with a very obvious answer. We want to prevent data from falling into the wrong hands. In practice, it gets a little more complicated.

Exactly what types of attacks do you wish to be protected against? It’s good if we make sure our data is encrypted where it is stored on the disk, but that doesn’t help us if an attacker gains direct access to write queries against the database. We might encrypt data held in columns, but does that still protect us if the unencrypted data is being passed back across the network to our application and an attacker is intercepting our network traffic?

Another question is why are you considering encryption in the first place? Often projects consider encryption because relevant regulation, or client requirements, demand it. All too often in these cases, encryption is considered as a binary option: is data encrypted or not? Often what happens is that the bare minimum is done to tick the checkbox and move on. Data might be encrypted, but the protection offered is of value in only limited scenarios.

When we think about what scenarios we wish to be protected against it makes sense to consider where data exists and might therefore be vulnerable. By that I’m not talking about where specific data is held, but rather the types of locations:

On disk on the database server
In memory on the database server
In transit across the network
In your application
Files stored outside of the database, perhaps on a file share

In subsequent posts we’ll look at how the tools available in SQL Server can protect the first three. I won’t however look at data once it reaches your application; that’s for your application developers to consider. I also don’t talk about files stored outside of the database – but you should think about how you want to protect such items if you have them.

Encryption is only one line of defence and should go hand in hand with a well-defined and implemented approach to security. Your first line of defence is always going to be access controls, making sure that only the right users and applications can access your data and servers in the first place. Some might say that if you have access controls in place, then why do you need encryption at all? The answer is that there is always the potential for access controls to be breached. The best approaches to security are always multi-layered, and on top of access control and encryption, it is good to have auditing in place so you can see who is accessing your systems and what they are doing, as well as having alerting in place for suspicious activity.

Even though we have a good toolset for encryption available to us in SQL Server, it doesn’t come totally for free. Encrypting and decrypting data requires CPU, and so it does have some performance overhead; we’ll discuss that when talking about each tool, and in many cases, we’ll try to look at how you can quantify what level of overhead you might be looking at.

We also will generally have an increase in management overhead, for instance, where we need to manage encryption keys. When implementing encryption it is important to consider how you will manage it on an ongoing basis. One of the worst scenarios you can encounter with encryption is where an individual sets up encryption without telling anyone else where the keys are backed up and then leaves the organization. If we then have a server failure, we may never be able to recover our encrypted data.

The last impact is on functionality; we’ll talk about this a lot when we look at column encryption. Where we only store encrypted values in the database, that limits how you can interact with them, for instance when searching against such columns or performing calculations. Due to this, an important part of your encryption strategy will be deciding what data to encrypt and how you’re going to work with it once it is encrypted.

When choosing what to encrypt we are usually most focused on personally identifiable information (PII) as well as items deemed particularly sensitive. In considering your approach I’d recommend that something is always better than nothing. If you have a security breach and your list of users is accessed, then that is bad, but it is much better if you are able to say that passwords, credit card information, social security numbers, and other information were not accessed due to the extra encryption on these items.
