More and more people are considering some level of encryption for the data they store in SQL Server. In many cases it might be felt that other measures, such as firewalls, well-defined access permissions and application code free of security flaws, already offer sufficient protection.
Anyone who follows the tech news though will know that attacks and data breaches are common. Encryption often provides the second line of defence, i.e. even if someone malicious gains access to your data, they may not be able to read what they find there.
Over the last 10 years, the number of features available in SQL Server for working with encryption has grown. It can be hard to understand how they differ, which ones you should implement and what exactly each feature protects you from. In most cases these are complementary technologies, and if you are getting serious about encryption you may choose to implement more than one. The latest versions of the SQL Upgrade Advisor recommend by default that you consider Transparent Data Encryption and Always Encrypted, and it is (or should be) standard practice to use TLS.
It often seems that people implement some form of encryption just to “tick a box”. If you are serious about protecting your data then you need to understand what each feature does, and does not, protect you against.
One thing that may influence your decision on what to use is what is available in the versions and editions of SQL Server that you have in production. Here’s a quick comparison. I’ve left out versions where no new features were added, and for 2016 I’ve specified SP1 because a lot of features changed from Enterprise only to being available in Standard, so there’s no good reason not to be on SP1 if you’re using 2016:
We’ll quickly run through the various technologies listed above with brief detail on each.
Column Encryption (2005)
Column encryption uses the Encryption Hierarchy and built-in Cryptographic Functions to let you encrypt values and store them in the database. The ability to read the encrypted data depends on permission on, or access to, the objects used to perform the encryption, e.g. a pass phrase, a certificate or an encryption key.
It is easy to misunderstand what this feature is. You do not configure a column as being encrypted; rather, the Cryptographic Functions combined with the Encryption Hierarchy in SQL Server allow you to encrypt specific values, which can then be stored in the database.
TDE (2008) Enterprise Only
TDE (Transparent Data Encryption) is configured at the database level and is defined as “at-rest” encryption. This means that the data and log files for your database (as well as backups) are encrypted on disk. One of the key advantages to TDE is that it doesn’t require any code change to implement.
TDE doesn’t protect you against anyone who is able to read data directly in the database, and an administrator on the SQL Server instance or the server hosting it can fully circumvent it. It simply protects you against parties who may gain access to the files.
Without the certificate and key used for TDE they would not be able to restore a backup to another server, and they would not be able to directly read data in the database files – which otherwise are in a relatively readable format for someone sufficiently motivated.
One thing to note with TDE is that, in the versions covered here, it is only available in the Enterprise Edition of SQL Server (Standard Edition gained it in SQL Server 2019).
Those of you who follow the major SQL Blogs may have come across this article which points out a seemingly big flaw in TDE that allows someone with minimal privileges against your server (read is sufficient) and the right skillset, to be able to extract your keys and read the data:
It seems that you can mitigate this by making sure access is restricted to the directories holding the relevant keys:
If only administrators (and the SQL Server service account) can access this directory then you should be safe.
TDE does add extra processing overhead to SQL Server. A figure of 3-10% is often quoted, but that will vary with your workload, so if this is a concern you should test. Some people consider that the overhead is not worth the benefits.
Backup Encryption (2014)
Backup Encryption is what it sounds like: just your database backups are encrypted. All types of backup are supported: Full, Differential and Log.
This isn’t a setting you turn on for your database – you have to specify when you make a backup that you want it protected. As such this doesn’t protect you against someone making an ad-hoc backup and storing it on a USB device which they then leave on a train – or any other similar human error.
One nice thing is that you don’t have to do anything special when restoring the database back to its original home. Backup encryption (generally) uses a certificate or key stored in the master database so as long as you haven’t lost that, the encryption part of a restore process is seamless. If you need to restore elsewhere then you’ll need a copy of the certificate or key.
Backup encryption will suffer from the same limitations as TDE (except that it will only add CPU overhead when taking backups), however if your intent is to make sure backups stored off the database server are protected then this should be sufficient.
Always Encrypted (2016)
Always Encrypted (AE) is implemented at the column level. Data is encrypted on disk, in memory and while being passed to a client application. Encryption is based on the combination of a key and certificate, where the certificate is usually stored on a client machine and not stored on the database server. This means that access (even full sysadmin access) to the database server on its own is not sufficient to be able to read the data.
Implemented correctly, AE can be very secure. However, it also has a number of logical limitations that can make it challenging to interact with the encrypted data. For instance, you can’t use LIKE comparisons against encrypted columns.
How big an impact this is for you depends on how your application is coded and the sort of columns you want to encrypt. It could be straightforward and require no code change – or it could entail major rework.
Here are my previous posts on AE:
Understanding Keys and Certificates with Always Encrypted
Working with Data in Always Encrypted
Dynamic Data Masking (2016)
Dynamic Data Masking is not actually encryption but is another way that people may choose to protect data from prying eyes.
This provides a method of hiding data from non-privileged users without having to change any code.
Imagine storing a credit card number. We might want to remember a customer’s card details so that it is easy for them to make payments in the future. Equally though, we don’t want people to be able to easily view it. Often we’ll see that just the last 4 digits are displayed so that we have enough to verify it is the correct card. Dynamic Data Masking provides a way of doing this. We can define a mask, so that for most users who have permission to view the data the number:
Would only appear to them as:
We can then also define specific roles that can access the full number, for instance one assigned to a service account used for processing payments.
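To make the idea concrete, here is a minimal Python sketch of a "last 4 digits" mask. It only imitates the behaviour described above and is not how SQL Server implements masking. The card number, the `X` padding character, the function names and the `payment_service` role are all made up for illustration.

```python
def mask_card_number(card_number: str, visible: int = 4, pad: str = "X") -> str:
    """Replace every digit except the last `visible` ones with `pad`, keeping separators."""
    total_digits = sum(ch.isdigit() for ch in card_number)
    to_hide = max(total_digits - visible, 0)
    out = []
    for ch in card_number:
        if ch.isdigit() and to_hide > 0:
            out.append(pad)      # hide this digit
            to_hide -= 1
        else:
            out.append(ch)       # keep separators and the trailing digits
    return "".join(out)


def view_card_number(card_number: str, role: str,
                     unmasked_roles=frozenset({"payment_service"})) -> str:
    """Return the full number only to roles that are allowed to see it (hypothetical roles)."""
    if role in unmasked_roles:
        return card_number
    return mask_card_number(card_number)


if __name__ == "__main__":
    example = "1234-5678-9012-3456"  # made-up number, for illustration only
    print(view_card_number(example, "report_reader"))
    print(view_card_number(example, "payment_service"))
```

The key point is that the stored value never changes. Only what a given caller gets back differs, which is why this is masking rather than encryption.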
That sounds great, but in reality the protection is limited. If I have read access to the database but am in a non-privileged role, it is still possible for me to get around the masking with a bit of querying. This is due to the requirement that existing queries should continue working: the SQL engine can see the underlying data even if I can’t, and will respond to comparisons I attempt against that data.
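The following Python sketch simulates that weakness. It is not SQL Server code: `masked_select` is a hypothetical stand-in for a query run by a non-privileged user, where the filter is evaluated against the real values but the returned column is masked (here as 0). The names and salary figures are invented for the example.

```python
from typing import Callable, List, Tuple

# Hypothetical data: the real values a non-privileged user should not see.
_REAL_SALARIES = {"alice": 52_000, "bob": 87_500}


def masked_select(predicate: Callable[[int], bool]) -> List[Tuple[str, int]]:
    """Mimic a masked query: the filter runs on real data, the output shows 0."""
    return [(name, 0) for name, salary in _REAL_SALARIES.items() if predicate(salary)]


def infer_salary(name: str, low: int = 0, high: int = 1_000_000) -> int:
    """Recover a hidden value by binary search, using only masked query results."""
    while low < high:
        mid = (low + high) // 2
        rows = masked_select(lambda salary: salary <= mid)
        if any(row[0] == name for row in rows):
            high = mid       # the real value is <= mid
        else:
            low = mid + 1    # the real value is > mid
    return low


if __name__ == "__main__":
    print(masked_select(lambda salary: True))  # every salary appears as 0
    print(infer_salary("alice"))               # yet the real value can be recovered
```

Each query only ever returns masked values, but about twenty yes/no comparisons are enough to pin down the real number. This is the kind of querying that masking alone cannot prevent.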
Transport Layer Security (TLS)
TLS is a protocol used to protect data as it travels across a network. It is the successor to SSL (Secure Sockets Layer), and is the same protocol used to protect data between a web server and a browser. Without TLS, data from SQL Server is sent unprotected across the network and can be intercepted by what is known as a Man-in-the-Middle attack.
Such attacks function by impersonating the parties involved. Imagine we have a SQL Server instance and a client application. A communication starts from the client to the server. The attacker impersonates the server and intercepts the data or request sent from the client. It can then alter the message before passing it on to the server, which believes it is receiving it directly from the client. The attacker then intercepts any data being sent back, either just to capture it or to alter it, before transmitting it to the client, which believes it is in direct and private communication with the server.
TLS creates an encrypted connection and encrypts the data sent across the network and so the information passed back and forth cannot be read by any party other than the sender or receiver.
TLS also allows both the sender and receiver to verify each other so the Man-in-the-Middle attack is doubly foiled.
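As a generic illustration of the two properties above (encryption plus verification of the other party), here is a small sketch using Python's standard `ssl` module. It is not specific to SQL Server; database drivers expose their own connection settings for this. The function name `build_client_context` and the choice of TLS 1.2 as a minimum are assumptions made for the example.

```python
import ssl
from typing import Optional


def build_client_context(ca_file: Optional[str] = None) -> ssl.SSLContext:
    """Build a client-side TLS context that encrypts traffic and verifies the server."""
    # create_default_context() enables certificate validation and host name checking,
    # which is what stops an impersonating server from being accepted.
    ctx = ssl.create_default_context(purpose=ssl.Purpose.SERVER_AUTH, cafile=ca_file)
    # Assumption for this sketch: refuse protocol versions older than TLS 1.2.
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx


if __name__ == "__main__":
    context = build_client_context()
    print(context.verify_mode)     # certificate must be presented and valid
    print(context.check_hostname)  # certificate must match the host we asked for
```

If verification is switched off, the connection is still encrypted, but the client can no longer tell whether it is talking to the real server or to an attacker in the middle.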
Hashing and Salting
Hashing is a method of taking a value (usually a string such as a password) and transforming it via a Hashing function to a new value that in theory cannot be reverse engineered to find the original value.
When someone enters a password into the system that value can be put through the same hashing function and the resulting hashed value be checked against the value stored in the database to verify that they match.
In practice it would be possible for someone to maintain a list of the hashed values that a particular hashing function produces for common inputs, and thus be able to recognise where common values have been used.
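A short Python sketch of that kind of lookup, assuming unsalted SHA-256 hashes. The list of common passwords and the function names are invented for the example.

```python
import hashlib
from typing import Optional


def unsalted_hash(value: str) -> str:
    """Hash a value with no salt: the same input always gives the same output."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


# Hypothetical list of common passwords an attacker might precompute hashes for.
COMMON_PASSWORDS = ["123456", "password", "qwerty", "letmein"]
LOOKUP = {unsalted_hash(p): p for p in COMMON_PASSWORDS}


def recover(stored_hash: str) -> Optional[str]:
    """Return the original value if its hash is in the precomputed table."""
    return LOOKUP.get(stored_hash)


if __name__ == "__main__":
    stolen = unsalted_hash("password")  # what a breached table might contain
    print(recover(stolen))              # found without reversing the hash
```

The hash function itself is never reversed. The attacker simply recognises a value they have already computed.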
This is where salting comes in. A second value, known as the salt, is generated and combined with the value to be hashed. The salt is then stored unencrypted alongside the resulting hashed value. When we want to verify (for instance) a password, we simply combine the entered value with the salt before hashing and comparing. If the salt is unique for each value we hash, there’s no practical way of maintaining a dictionary of hashed values.
SQL Server has a series of hashing functions, but HASHBYTES is the one most commonly used for this process.
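The salt-then-hash flow described above can be sketched in Python as follows. This illustrates the steps only; it is not T-SQL and not a recommendation of a particular algorithm. SHA-256 from `hashlib` stands in for the hashing function, and the 16-byte salt length and function names are assumptions.

```python
import hashlib
import hmac
import secrets
from typing import Optional, Tuple


def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Return (salt, digest). A fresh random salt is generated when none is supplied."""
    if salt is None:
        salt = secrets.token_bytes(16)  # unique per stored value
    digest = hashlib.sha256(salt + password.encode("utf-8")).digest()
    return salt, digest


def verify_password(password: str, salt: bytes, stored_digest: bytes) -> bool:
    """Recombine the entered value with the stored salt, hash it, and compare."""
    _, candidate = hash_password(password, salt)
    return hmac.compare_digest(candidate, stored_digest)


if __name__ == "__main__":
    salt_a, digest_a = hash_password("hunter2")
    salt_b, digest_b = hash_password("hunter2")
    print(digest_a == digest_b)                          # same password, different stored hashes
    print(verify_password("hunter2", salt_a, digest_a))  # correct password still verifies
```

Because each row gets its own salt, two users with the same password end up with different stored hashes, so a single precomputed dictionary no longer matches anything.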
That’s the set of tools available. The below table shows a quick comparison of what each protects you from and any disadvantages:
No solution you pick is going to be secure and safe just by implementing one tool or another. Of key importance is process. Encryption generally involves keys of one form or another and these must be protected. Your house may be secure if you have good locks, but not if you leave the keys outside on the doorstep.
Equally you need to think about what you are attempting to achieve by implementing a particular tool and make sure that the way your organisational roles and processes are structured supports this. For instance, Always Encrypted is often touted as even being able to protect against a rogue DBA. The DBA may have full access and be able to do whatever they like to the SQL instance, but to read AE-protected data you also need a certificate that is stored on the application servers. If you want this level of protection then you need to make sure that you have sufficient role separation such that the DBA can’t access the certificate, and those who are admins on the application server can’t access the database. Or at least not without triggering auditing.
Finally, remember that any system is only as strong as its weakest link. There’s no point encrypting your data if the application accessing it is so full of holes it will leak it all out anyway, and it’s often said that the weakest part of any system is the human. The point of having data is that we want to be able to access it, so there must always be a point where the data is viewed or interacted with in an unencrypted form.
At the end of the day the only way of absolutely ensuring there is no way for someone to access data they shouldn’t – is to have no data in the first place.
But in practice, you can at least try to make it hard for them.