Auditing Data Access in SQL Server for GDPR Compliance

In relation to the GDPR, I’ve recently been looking at the tools available within SQL Server to support auditing our databases. Many of you out there will already have full-scale auditing solutions in place. For others though, the prompting of GDPR could be the first time you’re really thinking about it.

This post is a brief overview and comparison of the SQL features that might help you out, with some suggestions of what you might want to think about in terms of auditing.

Our interpretation of the GDPR is that it will be important to be able to provide detail of changes to personal data to demonstrate that data protection and security procedures are effective and are being followed. That means being able to say who changed an item of data, when they changed it, and what was the before and after value.

The regulation states that we must ensure that data is only accessed by appropriate parties, and for the explicit purposes that the person the data relates to has consented. We may feel that this latter requirement is satisfied by our security processes, but in practice, if we wish to prove this then we may also want to audit all data access – including when data is simply read but not changed.

For auditing we need to consider all personal data held, whether that is data provided by an individual or data about an individual that supplements this. Auditing should therefore not be limited to the definition of “personal” within GDPR, namely Personally Identifiable Information (PII). We may also, for our own purposes outside of the GDPR requirement, wish to audit other information related to our business process – for instance where approvals are required.

In terms of individuals accessing or changing data there are several contexts of which we need to be aware:
Application users

  • Support access via specific tooling
  • Support access direct to the database
  • Illegitimate access via an application or tooling
  • Illegitimate access direct to the database

Summary of Tools available within SQL Server

The table below lists the technologies we will look at as well as in what versions/editions of SQL Server they are available. Some versions are omitted where the availability of features is the same as in the previous version:

Audit

Profiler Trace

SQL Server traces allow you to capture information about all access to the database/server. As such this technology is sometimes used as part of an auditing solution – in fact both the C2 Auditing and Common Compliance Criteria features utilise it in one way or another.

It is more useful, however, for capturing instance level security events (such as Logons and failed Logins) than for monitoring changes to individual data items.

There can also be quite an overhead to running traces, though this can be minimised if they are created as server-side traces

Extended Events

Introduced in SQL 2008, Extended Events were intended as a lightweight replacement for Profiler Traces. Still, however, they would be more appropriate for capturing security events than for monitoring data access and change – although SQL Server Audit which we’ll look at later is implemented via Extended Events.

C2 Audit

This feature is supported in all versions and editions of SQL Server up to 2017, though it has been marked as deprecated for some time.
C2 was a rating established by the US Department for Defence applicable to security levels of computer systems, mainly focusing on access to resources. The C2 Audit within SQL server is implemented via server-side traces and captures events such as:

  • Server shutdowns and restarts
  • Successful and failed logins
  • Successful and failed use of permissions when accessing database objects
  • Successful and failed use of permissions when running any command against the database server

It can be seen from the events captured that while this sort of data capture may be useful as part of an overall audit strategy, it won’t cover the audit of individual data required by the GDPR.

If implementing C2 Auditing on an instance, it is important to be aware that it will have some performance impact on a server as each event must be written to the audit file before the operation can complete. In fact, if the audit file cannot be accessed (or disk space runs out) then SQL Server will stop completely until that is resolved. However, it is a limited set of items being audited so the performance impact in general should not be too considerable.

Common Criteria Compliance

Available from SQL 2005 SP1 Enterprise, and in standard edition from SQL 2008, Common Compliance Criteria is described by Microsoft as the replacement for C2 Audit.

Enabling this option configures a couple of other behaviours within SQL server to comply with the “Common Criteria for Information Technology Evaluation” International Standard.

From an auditing point of view, it enables a level of login auditing which can be viewed by querying sys.dm_exec_sessions for currently connected users.

This information includes:

  • Last Successful login time
  • Last unsuccessful login time
  •  Number of Unsuccessful login attempts since last successful login

Triggers

Available in all editions/versions from SQL 2005. Triggers were also available before that, but additional events were added in 2005.

It is a common usage of Triggers to perform some level of database auditing, for instance to capture a modified datetime for a record, or even the identity of the logged-on user. They can also be used to capture more information about changes – for instance to maintain Audit tables as data is inserted, updated or deleted.

The main potential disadvantage of triggers for Auditing is performance. Technologies designed specifically for the purpose may be more lightweight.

Change Tracking (CT)

Available in SQL Server 2008 standard edition onward, CT is not designed to be used for Audit. Rather it is intended to support ETL processes by identifying changes to tables.

When enabled Change Tracking simply identifies that a given row has been inserted, updated or deleted. It doesn’t capture the content of the change, for instance that a given column was changed from value ‘A’ to value ‘B’. It doesn’t maintain a change history, so if I insert a record, update it five times, then delete it, when I check change tracking it will only tell me that it was deleted.

Change Data Capture (CDC)

Available in SQL Server 2008 Enterprise Edition onward, from 2016 SP1 CDC is also available in Standard Edition. Like CT, CDC is not intended as an audit technology – however due to the level of information captured it can be useful in that context.

As well as identifying that data has changed, CDC captures the actual changes and maintains the history in a separate table. Unlike a similar hand-cranked solution using triggers, it manages this asynchronously so has only light performance overhead.

One of the key things missing from CDC from an Audit perspective is that while it captures the data change, it doesn’t also record the Login identity making the change or the datetime for the change.

A Connect item was raised to change this, however the Microsoft response was to reject by design as CDC “is not designed to be an auditing feature”. Microsoft’s recommendation is to use SQL Server Audit instead:

https://connect.microsoft.com/SQLServer/feedback/details/283707/cdc-options-to-capture-more-data-username-date-time-etc
Despite this, if the table itself has columns to capture the user identity and the modified datetime, and the application ensures these are set, then these columns will be included in the change data capture in which case a meaningful Audit of changes to data is provided.

CDC can be enabled on a per table basis, and you can configure which security role(s) have access to the change log. You can also define the retention period which avoids you having to implement a separate solution for cleaning up old audit data.

One thing to bear in mind is that due to the asynchronous process involved to log changes (which uses SQL Server Agent) there is no transactional guarantee that changes will reach the capture log, though the risk of data loss is likely to be minimal.

SQL Server Audit

Available in SQL Server 2008 Enterprise edition, Server level auditing came to Standard edition in 2012 and all features are available in all editions from 2016 SP1.

SQL Server Audit is the out of the box feature that is intended for the purposes of Audit, so logically we may think it is the best one for us to be using. Certainly, it is what Microsoft would recommend for auditing within the database.

Auditing is implemented using Extended Events and there are many Audit events you can enable it to capture. Events are either logged to a file, or to the Windows Security log where it is more difficult to tamper with them. You can then either view the events directly in that destination or through a GUI within SSMS.

Depending on the scope of the events monitored, you can choose to Audit either at an object level (for instance table by table) or at schema or database level.
In the context of GDPR we may choose to Audit data access – insert, select, update and delete. What this sort of Audit will give us is not, as we might expect, the ability to view who accessed or changed specific data. Rather it captures the queries being executed against the objects monitored.

As such it’s probably not the best method for being able to track a change history for a piece of data, though that could be done in theory by analysing the individual queries. Equally, if we want to know who read a specific record, this would require analysis of the queries executed.

Because of that, while SQL Server Audit may form an important part of an overall Audit strategy, it may not, on its own, give us what we need.

It does however allow us to capture all queries that have, for instance, touched certain tables. As such it is the only technology available to use within SQL Server that may allow us to monitor for abuse of privileges. For instance, I may be the DBA and require admin rights over the SQL Server instance, with SQL Server Audit the queries I execute against tables containing personal information would be captured and any dubious querying patterns could be identified.

An important part of the use of SQL Server Audit is going to be proactively monitoring the logs to make sure data access is being carried out appropriately.

Temporal Tables

Temporal tables were introduced in SQL Server 2016 and are a way of easily tracking changes to data – as well as being able to easily query what the value was at a given point in time.

Like CDC they do not capture the user context making the data change, so this must be handled by additional application logic if we require it.

The logging of change data for temporal tables is a synchronous operation performed as part of the transaction that changes data in the main table. Microsoft documentation however states that this creates negligible overhead for inserts, and while there is greater overhead for updates and deletes this is still significantly less than a custom solution (for instance using triggers).

Temporal tables work by having two user tables for each table being tracked. One is the original table – known as the current table, and the other is the history table which includes start and end times for which a given version of the data was current. New commands for querying the state of data at a given point in time make this a seamless operation.

Application Audit

One issue with any auditing that we perform at the database level is that any well designed application, from a security context, is likely to be accessing the database via a service account, rather than application users being granted direct database access. As such it is not possible for pure database auditing to capture a meaningful user context where operations are conducted through the application.

For instance, it may be quite legitimate from the application for a user to update details relating to a customer. We need to maintain a record of those changes so we require the application to log the user context – usually into an “UpdatedBy” column.

There may also be the case where semi-structured, or unstructured data is stored in blob forms, such as XML, JSON or other document formats. In these cases, a database level audit could capture the before and after states of the data, but it would require additional analysis to identify what actual elements of the data within the document had changed. If we want to provide an easily digestible change history then we need to consider implementing this at an application or tooling level.

Finally, we may want to record user access and activities from the application. For instance, the date and time of connections as well as what was accessed.

What might a best practice SQL Data Access Audit solution look like?

Which (if any) of these technologies we use is going to depend very much on the versions and editions of SQL Server we have installed. In an ideal world, to make our lives easy we would all have Enterprise edition – or be on SQL 2016 SP1 or later – and have an extensive choice. In some cases, it may be that the access to additional security and audit components becomes the compelling reason for choosing Enterprise over Standard edition, or for upgrading to a later version of SQL Server.

C2 or Common Compliance

It makes sense to implement one of these, ideally Common Compliance where this is available, as this gives us increased security within SQL Server at little cost and minimal effort to implement. Common Compliance is preferable both as C2 is deprecated and that C2 poorly managed (losing disk access) can create additional risks to application availability. Both offer additional levels of logging connections to our SQL Server boxes that may be useful in identifying those logging onto the server for nefarious purposes.

Change Capture

In particular for GDPR we will want to make sure we have some form of capture and logging of changes to personal data. We have a choice of technologies here.

Triggers

Triggers can provide a serviceable solution but the performance overhead is high so it may be better to use a built-in technology where one is available.

CDC

CDC provides a low overhead tool for managing change capture. However, it is only available in Enterprise Edition until SQL 2016 SP1.

Temporal Tables

These are the latest and greatest, so it may make sense to use where we are using SQL Server 2016, although as it is a synchronous operation it may have more of an immediate performance impact than CDC.

I hope to look at a performance comparison between these different methods of data capture in a later post.

Application Auditing

Whatever tool is used for change capture, application auditing is required to make sure the relevant user context is captured. It’s also worth considering logging user activity.

SQL Server Audit

For a best practice solution, where this is available to us it makes sense to use it. We can use it to capture data access even though the format is not going to be the most easy to analyse . We should also use it to capture events that could relate to anyone making changes to our Change Capture. If anyone makes changes to the Audit itself then this gets captured by the audit so we can detect tampering. We should make sure the Audit is saved to the Security log then that is also tamperproof. We can’t guarantee that we will capture all events, but we can reasonably guarantee that we will detect it if someone prevents us from doing so (e.g. turning off Auditing or Change Capture).

Conclusions

There are a number of components you may want to implement in order to deploy an effective data access auditing solution. In some cases your choices may be limited by the versions and editions of SQL your platforms currently sit on.
In some cases though, where the version and edition of SQL we use supports these technologies, then there is not a great deal of effort required to implement them.

In subsequent posts I’ll hope to show some simple examples in practice, as well as look briefly at the impacts on server performance.

Working with Data in Always Encrypted

In this post we’re going to create some encrypted columns in a table in a test database and look at some of the practicalities and limitations of working with Always Encrypted (AE).

There are actually a fair few limitations, but not because there anything wrong with the technology, rather they naturally fall out of logic of working with encrypted data.

Let’s just bash on with it and you’ll see what I mean.

Setting up a table with encrypted columns

I’ve got a database called AlwaysEncryptedTest, and it’s sitting on a remote instance – i.e. not my local machine. I’ve also got the Keys I created in the last post (Understanding Keys and Certificates with Always Encrypted). I’m now going to create a table with a couple of encrypted columns:

CREATE TABLE [dbo].[EncryptedTable](

Id INT IDENTITY(1,1) CONSTRAINT PK_EncryptedTable PRIMARY KEY CLUSTERED,

LastName [nvarchar](32) COLLATE Latin1_General_BIN2 ENCRYPTED WITH (

COLUMN_ENCRYPTION_KEY = MyColumnKey,

ENCRYPTION_TYPE = Deterministic,

ALGORITHM = 'AEAD_AES_256_CBC_HMAC_SHA_256'

) NULL,

FirstName [nvarchar](32) COLLATE Latin1_General_BIN2 ENCRYPTED WITH (

COLUMN_ENCRYPTION_KEY = MyColumnKey,

ENCRYPTION_TYPE = Randomized,

ALGORITHM = 'AEAD_AES_256_CBC_HMAC_SHA_256') NULL

);

There’s plenty to discuss there already. You might first of all ask why am I specifying a particular collation for those columns (Latin1_General_BIN2) and overriding my database collation settings. I hate people who do that! The answer is I don’t have a choice, text columns encrypted with AE must have a BIN2 collation. BIN2 collations are (apart from anything else) case-sensitive.

If you think about it, it makes sense that you can’t have a case-insensitive comparisons on an encrypted column – and therefore those collations cannot be supported. When you perform a comparison based on encrypted data, what the engine is doing is comparing one encrypted value with another. To enable a case-insensitive comparison there would have to be some deterministic pattern so that you can tell that two different encrypted values differ only by case, that would be more complicated to implement, would weaken the encryption, and isn’t supported by the algorithm used by AE. The requirement to use a BIN2 collation is driven by what will happen in practice when you compare two encrypted values looking for an exact match.

So there’s something to think about. Any searches you do against encrypted text data are going to be case sensitive. So say your application allows you to search for people by name, and you’re looking for “McGiffen” (my name) then you’re not going to find it if you type “Mcgiffen” or “mcgiffen”. For someone like myself who often see mis-captitalization of their name, as little as I care , it may be a struggle to find me if my name (as it’s Personal Identifiable Information) has been encrypted.

So how do you deal with this? I don’t know – you’ll have to think about that – maybe maintain an upper case copy for searching against and transform search terms into upper case BEFORE sending them to the database.

There are also a whole bunch of other data types that you can’t encrypt which I would imagine are to do with similar practicalities of the encryption implementation. Rather than list those here you can find that set and other limitations in this MSDN article:

https://msdn.microsoft.com/en-GB/library/mt163865.aspx?f=255&MSPPError=-2147217396

You can see in the SQL above I’ve specified that the COLUMN ENCRYPTION KEY “MyColumnKey” which I created in the previous blog post. But then for one column I’ve specified ENCRYPTION TYPE of “Deterministic” and for one it is “Random”. What’s the difference?

Well, Deterministic means that every time you encrypt a given value with a given key, the encrypted result will be the same. Randomized means it will be different. With deterministic it might be possible for someone to perform statistical analysis based on the distribution of your data to infer what some common values are – with Randomized that’s not going to happen.

However, if your encryption type is Randomized then you can’t search on that column and you certainly can’t join on it (or group or index on it either).

This makes perfect sense if you realise that SQL only knows the encrypted value of your data, so if it is to return results by comparing one encrypted value with another then the values must match for anything to be returned, for that to happen they must both have been encrypted using deterministic encryption.

Finally, in the column definition I’ve specified the encryption algorithm – the one you see is the only algorithm currently supported by AE, so just use the same one.

Inserting Encrypted Data

Now we’ve got the table let’s try inserting some data.

It should be no big surprise that if I want to run the following query from SSMS it doesn’t work:

INSERT INTO dbo.EncryptedTable (LastName, FirstName)


VALUES ('McGiffen','Matthew ');

You get a longish error message that basically tells you you’re trying to stick unencrypted data in an encrypted column:

Msg 206, Level 16, State 2, Line 24

Operand type clash: varchar is incompatible with varchar(8000) encrypted with (encryption_type = ‘DETERMINISTIC’, encryption_algorithm_name = ‘AEAD_AES_256_CBC_HMAC_SHA_256’, column_encryption_key_name = ‘MyColumnKey’, column_encryption_key_database_name = ‘AlwaysEncryptedTest’) collation_name = ‘Latin1_General_CI_AS’

To be able to insert data, there’s a few things you need to do.

First of all, this isn’t actually supported in SSMS 2016, so you need to download and install the vNext version which has the new feature “Parameterization for Always Encrypted”. Prior to this version if you want to test/play (beyond a few basics) with always encrypted data you had to write code in C# or another language.

Once you’ve got SSMS vNext, you can connect to your database. You have to enable Column Encryption for your connection – you’ll need to do this in your connection string whatever client you are querying your database from if your query might touch encrypted data – be that your application, SQLCMD or SSMS.

In SSMS you do that when you’re connecting to the instance in the “Connect to Server” dialog. Select “Options”, and go to “Additional Connection Parameters” and enter into the box “Column Encryption Setting = Enabled” – as per the image below:

AE_ConnectToServer

(One thing to note, is that SSMS will then try to use this setting for all subsequent connections you make until you go in and remove it. As the setting is only supported on SQL 2016 onward you get an error connecting to older instances, which is little bit of a pain…)

Now we’ll open a new query window against our database, and then we have to actually enable “Parameterization for Always Encrypted”. Right-click over the query and select “Query Options”, then “Advanced” and enable the option as shown below:

AE_Parametization.png

There’s an MSDN article about this feature here:

https://blogs.msdn.microsoft.com/sqlsecurity/2016/12/13/parameterization-for-always-encrypted-using-ssms-to-insert-into-update-and-filter-by-encrypted-columns/

In simple terms, it allows SSMS to perform additional parameterization on your query and encrypt any literal values that need encrypting before sending them to the database. In the previous versions of SSMS this would have just been sent plain text – and so the engine barks at you as in my error above.

The final step to get this to work is that you have to rewrite your original query to use variables rather than literal values in the insert. This helps SSMS to be able to manage the rest of the parameterization process. I’m going to show you a screen-shot of the updated code as it gets displayed in SSMS as there’s something else to notice:

AE_InsertSQL

See that my variable declarations and assignments are underlined in squiggly blue. If I hover over one of them, then I get this message:

AE_ToolTip

This shows me that the Parameterization is in action and working correctly.

If I run the insert now it works fine. Woo-hoo.

There are some implications of this if we think it through. You are only going to be able to do inserts based on values held in a variable or parameter. So you can only do one row at a time – and no, you can’t use a table variable for this. That means you can’t use value lists in an insert or merge statement, but logically, you probably don’t need to. Remember this is likely to be only sensitive personal information we’re encrypting – not reference data for which we may have scripts with lots of values. As such we would generally be inserting this one row at a time, and ideally through a stored procedure using parameters. So everything should be peachy.

It’s also logical to realise you can’t insert from one table to another unless the values in both tables are encrypted (hopefully using the same keys!). This goes back to the fact that the engine can’t encrypt or decrypt values – that has to happen in the client. The above insert only works because SSMS (our client) is doing all the work.

Let’s just run a select and check that we can see the data:

SELECT * FROM dbo.EncryptedTable

AE_Select1

Cool, this works. Note this will also work in SSMS 2016 as the query is simple enough to not require parameterization.

Let’s just have another look at the same data, but this time I’m going to run it direct from the server where the certificate for the Column Master Key doesn’t exist:

AE_Select2

So here you can only see the encrypted version of the data.

Out of interest let’s look at the full value for the LastName field:

0x0180B908F995B86C64511991CDE834DD4888A058D1B482E5E14297FFDA9D3C8E0828E3B80F3C58F98197AC73F5867E2837DD2E6C266FFAA95E38A7B08C111AF7EBEEC476EE6BBDB9704F1AC007CE1F613B

That’s just a little bit longer than “McGiffen” – so you can see there is going to be some storage overhead on your encrypted columns.

I’m just going to chuck a few extra rows into the table (including a duplicate record for me) so I can demonstrate a couple of points mentioned earlier. Here’s what we have when I’m done:

AE_Select3

Now let’s look again at the encrypted version of that:

AE_Select4

Remember in my table definition, I set LastName to be Deterministic Encryption, but FirstName to be randomized. If you look at the encrypted values above we can see some repeated values in the LastName column – even if we didn’t have the Ids to compare with the first recordset it’s pretty straightforward to work out which is “McGiffen” and which is “Rubble”.

Then look at FirstName. We know both records 1 and 2 though have a first name of “Matthew”. If we check the encrypted values in the FirstName column we can see they are both different – this is Randomized Encryption in practice.

Querying Encrypted Data With a Predicate

I’m going to go back to my encrypted (and parameterized) connection and try running some queries against those tables:

DECLARE @LastName NVARCHAR(32) = 'McGiffen';

SELECT * FROM dbo.EncryptedTable

WHERE LastName = @LastName

Results:

AE_Select5

So that worked fine, let’s just try and isolate those first two duplicates at the beginning though:

DECLARE @LastName NVARCHAR(32) = 'McGiffen';

DECLARE @FirstName NVARCHAR(32) = 'Matthew';

SELECT * FROM dbo.EncryptedTable

WHERE LastName = @LastName

AND FirstName = @FirstName;

This time I have no luck. I get the following error message that is telling me in a long-winded manner that I can’t compare against a column with Randomized encryption:

Msg 33299, Level 16, State 2, Line 35

Encryption scheme mismatch for columns/variables ‘@FirstName’, ‘FirstName’. The encryption scheme for the columns/variables is (encryption_type = ‘RANDOMIZED’, encryption_algorithm_name = ‘AEAD_AES_256_CBC_HMAC_SHA_256’, column_encryption_key_name = ‘MyColumnKey’, column_encryption_key_database_name = ‘AlwaysEncryptedTest’) and the expression near line ‘8’ expects it to be (encryption_type = ‘DETERMINISTIC’) (or weaker).

Msg 8180, Level 16, State 1, Procedure sp_describe_parameter_encryption, Line 1 [Batch Start Line 27]

Statement(s) could not be prepared.

An error occurred while executing batch. Error message is: Internal error. The format of the resultset returned by sp_describe_parameter_encryption is invalid. One of the resultsets is missing.

Of course this is exactly what we expected to happen.

What I want to get across is that this all makes perfect sense. If you understand what encryption is doing, then the limitations you face naturally arise from that. Key is understanding that the encryption occurs at the client side, so if you want to do anything in a query that requires the SQL engine to interact with an encrypted column then you are going to be limited. Mostly all it can do is compare two encrypted values and verify they are the same – and only if they are using the Deterministic setting. So, if you’re implementing AE, just think carefully about what columns you need to encrypt and particularly about how you need to interact with those columns.

What’s occurring in the background?

In the last post about certificates I mentioned the extra steps that AE has to go through when querying your data. Now we’ve got a table to play with, let’s look at a Profiler trace so we can see exactly what that involves in terms of interactions with SQL. The actions that occur within the client side libraries are a black box to us, but there is still some interesting stuff to be observed at the SQL end.

I’ll just set Profiler going using a default trace and insert another record into my table.

What we see when we do this is that there is an extra call to the database before the query is executed:

EXEC sp_describe_parameter_encryption

N'

DECLARE @LastName AS NVARCHAR (32) = @pce03080ba9844c4182a38c82216e4ad1;

DECLARE @FirstName AS NVARCHAR (32) = @pba60ad612001418aac8496e355e03892;

INSERT INTO dbo.EncryptedTable (LastName, FirstName)

VALUES (@LastName, @FirstName);

,N’@pce03080ba9844c4182a38c82216e4ad1 nvarchar(32),@pba60ad612001418aac8496e355e03892 nvarchar(32)’

This call is the client (SSMS) asking the server for details about any encryption that might be involved in the query. You’ll notice that SSMS has parameterized the query, replacing the literal values with dynamically named parameters.

The stored procedure returns two result-sets. The first contains a row for each Encryption Key involved in the query. In this case that’s just one, thus one record, so I’ve transposed it to make it easier to read:

AE_Desc_Param1

You can see this is sending back to client the encrypted value of the Column Encryption Key – and also telling it where to find the certificate of the local machine to decrypt that key.

AE_Desc_Param2

The second set describes each parameter, which of the identified Keys (from the first set) it is using, and also tells us whether it is using Deterministic or Randomized encryption (this is the “column_encryption_type” value).

Armed with that information, the client can then perform any encryption required and execute the actual query:

EXEC sp_executesql N'DECLARE @LastName AS NVARCHAR (32) = @pce03080ba9844c4182a38c82216e4ad1;

DECLARE @FirstName AS NVARCHAR (32) = @pba60ad612001418aac8496e355e03892;

INSERT INTO dbo.EncryptedTable (LastName, FirstName)

VALUES (@LastName, @FirstName);

',N'@pce03080ba9844c4182a38c82216e4ad1 nvarchar(32),@pba60ad612001418aac8496e355e03892 nvarchar(32)'

,@pce03080ba9844c4182a38c82216e4ad1=0x01F2CD73FDD15216E2D4DC89EB6DC046EBE3FFC70A967BB6A3F822B57EA840F9D60410825455391AAFE7DF161CE78F4C3D4DE65ED8FC2435115C4E0F81E49AD820

,@pba60ad612001418aac8496e355e03892=0x01C4FBE035F600CBF61B2589028863DF732973166D9752B29CBBF1C7814DF6E8BDAD0D95897D79E28884E103F350506B5465A0914AD22EE8C4BB6630FF02F39DB4

You can see that the parameterized query has now been executed via a call to sp_executesql. You can also see that those parameters have been set to the long binary values which represent the encrypted versions of the data I specified – which in this case was “Smith” and “John”.

I covered this briefly in the last post, but it’s good to understand that the extra step before executing the actual query has to happen even when there is no encryption involved as there is no way for the client to know which columns are encrypted and which are not. As such, once you are working with AE and have the Column Encryption Setting enabled for your connections, there will be some overhead on pretty much all your queries. Important to note though – if you are doing this via .NET code then all this extra process just happens automatically for you in background. From your point of view, you just code as normal.

Encrypting Existing Data

This post wouldn’t be complete unless I touched on this subject at least briefly. So that’s what I’ll do. Touch on it briefly.

So, how do you change an existing column containing data, so that it is encrypted?

If you’ve got this far then you can probably figure out for yourself that it’s not going to be as simple as an ALTER TABLE statement. Encryption is a client side function, so if you want to encrypt some data then that data’s got to go via a client.

There are various ways you can do that. You could hand crank something and do it that way. Generally your best bet is going to be a process that involves creating a new table with the encryption set up as you want it, copying over data from the old table into the new, than dropping the old table and renaming the new one. So no, this is not going to be a particularly online activity.

It can be a fairly straightforward one though as Microsoft has provided the tool to do that for you via SSIS using the SQL Server Import Export Wizard.

Here’s a nice MSDN post that takes you through that process

https://blogs.msdn.microsoft.com/sqlsecurity/2015/07/28/encrypting-existing-data-with-always-encrypted/

Understanding Keys and Certificates with Always Encrypted

Always Encrypted on SQL 2016 is pretty easy to set up. There’s even a single wizard to guide you through the whole process. However, the best way to get into trouble with this technology is by not clearly understanding what is going on.

I personally find I understand things best when I am solid on the basics. It also means I can then usually logically extrapolate to answer any questions that come up.

So here we’ll go into a little detail about the keys and certificates involved…

To get up and running with Always Encrypted you need three things:

  • An Always Encrypted Certificate to store on your application server / client machine
  • A Column Master Key in your database
  • A Column Encryption Key in your database

You can actually create all of these through SQL Server Management Studio, in fact the first two items get created in the same step, so let’s run through that process, and we can look at what these items are, and what they get used for in more detail as we go along.

Creating the Certificate and the Column Master Key

You can find Always Encrypted Keys under the Security folder for your database. Right-click over column master keys and select to create a new one:

SSMS_AE

And up comes the GUI:

AE_ColumnMasterKey

The Name field is the name for the Column Master Key within your database – you can see I’ve decided to call mine “Wibble”.

Beneath that you can select – or create – the certificate to be associated with the Key. You can see I already have a couple in the selected key store as I’ve been through this process before, and the selected certificate is highlighted. If you don’t have one, you can select from the “Key Store” drop down where you want the certificate to be stored, and then click the “Generate Certificate” button to generate a new certificate in that store.

It’s worth noting at this point that the certificate is created on your local machine – not the server hosting SQL Server. We’re used to SSMS being the tool through which we interact with and do things at the server end, so this can be a bit counter-intuitive, but in terms of what we are trying to achieve it makes sense. We want to create a certificate that exists on the client – so in this case that’s your local machine.

Click okay and the Column Master Key and certificate get created. You can see your new key in SSMS:

AE_Wibble

To understand what this key actually is, I’m going to right-click and generate the SQL definition:

CREATE COLUMN MASTER KEY [Wibble]

WITH

(

KEY_STORE_PROVIDER_NAME = N'MSSQL_CERTIFICATE_STORE',

KEY_PATH = N'CurrentUser/My/D492BA86737FECDABB14D13476122C1E4BD217D2'

)

GO

We only have two values, KEY_STORE_PROVIDER_NAME and KEY_PATH. All this is telling us is where to find the certificate we just created. So you can see that the Column Master Key is just a pointer to a certificate stored on client machines.

Creating the Column Encryption Key

This is the last thing we need before we can set up encryption on our columns. You can find “Column Encryption Keys” underneath “Column Master Keys” in SSMS. Again, right-click and bring up the GUI:

AE_ColumnKey

You can see I’ve called mine MyColumnKey, and I’ve selected from the drop down that it should be created using the Column Master Key I created earlier – “Wibble”. Click okay and the key gets created.

Again I’m going to generate the SQL definition for my new Column Encryption Key so we can see what’s inside it:

CREATE COLUMN ENCRYPTION KEY [MyColumnKey]

WITH VALUES

(

COLUMN_MASTER_KEY = [Wibble],

ALGORITHM = 'RSA_OAEP',

ENCRYPTED_VALUE = 0x016E000001630075007200720065006E00740075007300650072002F006D0079002F006400340039003200620061003800360037003300370066006500630064006100620062003100340064003100330034003700360031003200320063003100650034006200640032003100370064003200A6296213760796AA271152F6D1F1D8C5B4B84ADDB660EC700936B010918CF62FA63C9F716BCB4337D4D48E7F5ABB879F09EA0ABF2ABFDB54E4740DCD88631C533BE530A1CF491A63B0A334FBBA1971C4B22C18E7B8DFCB9975754F9232E47C5DED5857B6032D4BB95B3C1232D3168366B6A1DA4FFC637849FBC3226FFB78981BD1596DE0CD3ED894B127855FD93F3F40E6DB9602482A9AD69CE96C3D879F544534135FFCCA789CB69E3C52C3A7CCB3AFC55850DA7DD82BE963A36A3AC77ECB26CE77A01F3CC43CEA26757548785A4C5A538250C325CEB106495211A625CDAFF5E9EED96EDE3AFFC8E6518E0E29C4F796209EE5ED35B04257844E6ACF74B0EEF907F667D699FE01BCF29BDA275BC9260E03130DA7008F30CD6CBB12F60AD6B47E1B5338D18FAF50DD6EBE3712FC4E814E71CA1E4F605F97F6DB51B1BBB6819E5BF5E8DBB6EB453B268173DB0B69B6CE78A57BC7C98FBBC34BA205B7DB59FEECCEB235A38B089D31C4892FB870A305D5FB1FC69A91C009B9F9D2B572C0DA2DCAC76C6AF48DF54B3C84D7110BD903F06817FD9BC100773028F357966F029B89DF49B8564DD901349F1304124B3DC691A626568C30224766C18B631CD42097DCE0B927A9A401A51EFE05BD5DC6CB9F7F2F3166A5D2A1DECC7D060C26B41D42BF2EF9ACA5919568979F713EFA655EEEF2A4280060D0D73553633B0BEA4925E595805C

)

GO

We can see it tells us what Column Master Key this is based on, what Algorithm has been used, and a long binary Encrypted Value. So what is this?

What has happened is that in the background a value has been created that will act as the Column Encryption Key when MyColumnKey is used to encrypt (or decrypt) data in a column. That value has then itself been encrypted using the certificate that the Master Column Key is based on. The encrypted value of the Column Encryption key is what then gets actually stored in the database.

That might seem overcomplicated. If we can encrypt data using the certificate, then why do we need a separate key in the database?

It actuality it’s very important that we have both. Imagine the scenario where we just use a certificate as the key (or just a single key in another form) and that expires, or worse there is a breach and it gets copied. In this case we are going to have to change our certificate or key. But if we change the key, how are we going to read the existing data in the table?

There is only one way, which is to decrypt all the existing data, and then re-encrypt it with the new key. Let’s say you have tens of millions of records, this might take a while, and will certainly require the application to be taken offline. Also, while this process is being undertaken, the data is unencrypted and at risk.

With the approach taken by Always Encrypted, we have a way around this. Let’s say we have to replace our certificate. All we need to do is create a new COLUMN MASTER KEY based on the new certificate, then we can decrypt our COLUMN ENCRYPTION KEY, ENCRYPTED_VALUE using the old certificate and re-encrypt that value with our new certificate. Because the actual Key value itself hasn’t changed then we can continue to read the existing data using the new certificate.

In practice this is done for you in the background through a process called “Key Rotation”. A given COLUMN ENCRYPTION KEY can actually support having two COLUMN MASTER KEYS at the same time – each of which would have it’s own encrypted version of the same underlying key. This allows the process to add a new master key and associated encrypted value, before removing the old one – which allows the key rotation process to occur fully online.

There is (understandably) a performance overhead to having two Keys at once so avoid doing this for any other reason.

I’m not going to go any further into the process of Key Rotation here, but it is generally seen as good practice to retire and refresh your certificates regularly – when you create them through SSMS they get a default expiry data one year in the future. So if you are implementing Always Encrypted, it is important to be confident you know how you will approach things when your certificate expires.

So how do the keys get used in practice?

When you run a query over a connection with column encryption enabled, the following steps occur:

  1. The client sends a preliminary call to the SQL Server Instance containing the query text. In this call it is asking for details of potential encryption on any columns involved in the query and what keys may be involved.
  2. If there are encrypted columns involved, SQL supplies that information, as well as the Column Master Key (which tells the client where to find the relevant certificate locally), and the encrypted value of the Column Encryption Key for each encrypted column.
  3. Using the information and keys, the client can then encrypt any values it needs to before sending to the database. For instance, if it is inserting data into, or looking for a particular value, in an encrypted column.
  4. The modified query is now sent to SQL for execution.
  5. SQL returns any results – with values in encrypted columns remaining in their encrypted form.
  6. If a resultset is returned, the client uses the information and keys supplied in step 2 in order to decrypt any encrypted values.

So you can see there are a few extra steps over and above your normal query execution, and that does create some performance overhead.

Final thoughts

Once you get into the business of having encryption keys then the process by which you manage them is going to be critical. If they are too easy for anyone in your organisation to access, then you may sabotage your efforts in encrypting the data in the first place.

You are going to want to have different keys in your production environments to those you have in test and development as a minimum.

Think about how you are going to manage that through your deployment processes. Do you want to generate new keys and certificates with fresh deployments? In which case where are they backed up, and who has access to them? You want to be absolutely sure you can recover the certificate your Master Column Key is based on if it is lost from your production environment. Otherwise your encrypted data is gone for good.

If you are shipping a product to a client with Always Encryption enabled – do they have the knowledge and processes to manage the keys moving forward?

And what is your plan for key rotation? How will the right people be reminded before certificates expire that they need to do something about it?

This is a great technology, and Microsoft has made it as simple as possible to use in practice. Nonetheless, it does need to be used with care.

Working with Data in Always Encrypted

In this post we’re going to create some encrypted columns in a table in a test database and look at some of the practicalities and limitations of working with Always Encrypted (AE).

There are actually a fair few limitations, but not because there anything wrong with the technology, rather they naturally fall out of logic of working with encrypted data.

Let’s just bash on with it and you’ll see what I mean.

Setting up a table with encrypted columns

I’ve got a database called AlwaysEncryptedTest, and it’s sitting on a remote instance – i.e. not my local machine. I’ve also got the Keys I created in the last post (Understanding Keys and Certificates with Always Encrypted). I’m now going to create a table with a couple of encrypted columns:

CREATE TABLE [dbo].[EncryptedTable](
Id INT IDENTITY(1,1) CONSTRAINT PK_EncryptedTable PRIMARY KEY CLUSTERED,
LastName [nvarchar](32) COLLATE Latin1_General_BIN2 ENCRYPTED WITH (
COLUMN_ENCRYPTION_KEY = MyColumnKey,
ENCRYPTION_TYPE = Deterministic,
ALGORITHM = 'AEAD_AES_256_CBC_HMAC_SHA_256'
) NULL,
FirstName [nvarchar](32) COLLATE Latin1_General_BIN2 ENCRYPTED WITH (
COLUMN_ENCRYPTION_KEY = MyColumnKey,
ENCRYPTION_TYPE = Randomized,
ALGORITHM = 'AEAD_AES_256_CBC_HMAC_SHA_256') NULL
);

There’s plenty to discuss there already. You might first of all ask why am I specifying a particular collation for those columns (Latin1_General_BIN2) and overriding my database collation settings. I hate people who do that! The answer is I don’t have a choice, text columns encrypted with AE must have a BIN2 collation. BIN2 collations are (apart from anything else) case-sensitive.

If you think about it, it makes sense that you can’t have a case-insensitive comparisons on an encrypted column – and therefore those collations cannot be supported. When you perform a comparison based on encrypted data, what the engine is doing is comparing one encrypted value with another. To enable a case-insensitive comparison there would have to be some deterministic pattern so that you can tell that two different encrypted values differ only by case, that would be more complicated to implement, would weaken the encryption, and isn’t supported by the algorithm used by AE. The requirement to use a BIN2 collation is driven by what will happen in practice when you compare two encrypted values looking for an exact match.

So there’s something to think about. Any searches you do against encrypted text data are going to be case sensitive. So say your application allows you to search for people by name, and you’re looking for “McGiffen” (my name) then you’re not going to find it if you type “Mcgiffen” or “mcgiffen”. For someone like myself who often see mis-captitalization of their name, as little as I care , it may be a struggle to find me if my name (as it’s Personal Identifiable Information) has been encrypted.

So how do you deal with this? I don’t know – you’ll have to think about that – maybe maintain an upper case copy for searching against and transform search terms into upper case BEFORE sending them to the database.

There are also a whole bunch of other data types that you can’t encrypt which I would imagine are to do with similar practicalities of the encryption implementation. Rather than list those here you can find that set and other limitations in this MSDN article:
https://msdn.microsoft.com/en-GB/library/mt163865.aspx?f=255&MSPPError=-2147217396

You can see in the SQL above I’ve specified that the COLUMN ENCRYPTION KEY “MyColumnKey” which I created in the previous blog post. But then for one column I’ve specified ENCRYPTION TYPE of “Deterministic” and for one it is “Random”. What’s the difference?

Well, Deterministic means that every time you encrypt a given value with a given key, the encrypted result will be the same. Randomized means it will be different. With deterministic it might be possible for someone to perform statistical analysis based on the distribution of your data to infer what some common values are – with Randomized that’s not going to happen.

However, if your encryption type is Randomized then you can’t search on that column and you certainly can’t join on it (or group or index on it either).

This makes perfect sense if you realise that SQL only knows the encrypted value of your data, so if it is to return results by comparing one encrypted value with another then the values must match for anything to be returned, for that to happen they must both have been encrypted using deterministic encryption.

Finally, in the column definition I’ve specified the encryption algorithm – the one you see is the only algorithm currently supported by AE, so just use the same one.

Inserting Encrypted Data

Now we’ve got the table let’s try inserting some data.

It should be no big surprise that if I want to run the following query from SSMS it doesn’t work:
INSERT INTO dbo.EncryptedTable (LastName, FirstName)
VALUES ('McGiffen','Matthew ');

You get a longish error message that basically tells you you’re trying to stick unencrypted data in an encrypted column:
Msg 206, Level 16, State 2, Line 24
Operand type clash: varchar is incompatible with varchar(8000) encrypted with (encryption_type = ‘DETERMINISTIC’, encryption_algorithm_name = ‘AEAD_AES_256_CBC_HMAC_SHA_256’, column_encryption_key_name = ‘MyColumnKey’, column_encryption_key_database_name = ‘AlwaysEncryptedTest’) collation_name = ‘Latin1_General_CI_AS’

To be able to insert data, there’s a few things you need to do.

First of all, this isn’t actually supported in SSMS 2016, so you need to download and install the vNext version which has the new feature “Parameterization for Always Encrypted”. Prior to this version if you want to test/play (beyond a few basics) with always encrypted data you had to write code in C# or another language.

Once you’ve got SSMS vNext, you can connect to your database. You have to enable Column Encryption for your connection – you’ll need to do this in your connection string whatever client you are querying your database from if your query might touch encrypted data – be that your application, SQLCMD or SSMS.

In SSMS you do that when you’re connecting to the instance in the “Connect to Server” dialog. Select “Options”, and go to “Additional Connection Parameters” and enter into the box “Column Encryption Setting = Enabled” – as per the image below:

AE_ConnectToServer

(One thing to note, is that SSMS will then try to use this setting for all subsequent connections you make until you go in and remove it. As the setting is only supported on SQL 2016 onward you get an error connecting to older instances, which is little bit of a pain…)

Now we’ll open a new query window against our database, and then we have to actually enable “Parameterization for Always Encrypted”. Right-click over the query and select “Query Options”, then “Advanced” and enable the option as shown below:

AE_Parametization.png

There’s an MSDN article about this feature here:
https://blogs.msdn.microsoft.com/sqlsecurity/2016/12/13/parameterization-for-always-encrypted-using-ssms-to-insert-into-update-and-filter-by-encrypted-columns/

In simple terms, it allows SSMS to perform additional parameterization on your query and encrypt any literal values that need encrypting before sending them to the database. In the previous versions of SSMS this would have just been sent plain text – and so the engine barks at you as in my error above.

The final step to get this to work is that you have to rewrite your original query to use variables rather than literal values in the insert. This helps SSMS to be able to manage the rest of the parameterization process. I’m going to show you a screen-shot of the updated code as it gets displayed in SSMS as there’s something else to notice:

AE_InsertSQL

See that my variable declarations and assignments are underlined in squiggly blue. If I hover over one of them, then I get this message:

AE_ToolTip

This shows me that the Parameterization is in action and working correctly.

If I run the insert now it works fine. Woo-hoo.

There are some implications of this if we think it through. You are only going to be able to do inserts based on values held in a variable or parameter. So you can only do one row at a time – and no, you can’t use a table variable for this. That means you can’t use value lists in an insert or merge statement, but logically, you probably don’t need to. Remember this is likely to be only sensitive personal information we’re encrypting – not reference data for which we may have scripts with lots of values. As such we would generally be inserting this one row at a time, and ideally through a stored procedure using parameters. So everything should be peachy.

It’s also logical to realise you can’t insert from one table to another unless the values in both tables are encrypted (hopefully using the same keys!). This goes back to the fact that the engine can’t encrypt or decrypt values – that has to happen in the client. The above insert only works because SSMS (our client) is doing all the work.

Let’s just run a select and check that we can see the data:

SELECT * FROM dbo.EncryptedTable

AE_Select1

Cool, this works. Note this will also work in SSMS 2016 as the query is simple enough to not require parameterization.

Let’s just have another look at the same data, but this time I’m going to run it direct from the server where the certificate for the Column Master Key doesn’t exist:
AE_Select2
So here you can only see the encrypted version of the data.

Out of interest let’s look at the full value for the LastName field:
0x0180B908F995B86C64511991CDE834DD4888A058D1B482E5E14297FFDA9D3C8E0828E3B80F3C58F98197AC73F5867E2837DD2E6C266FFAA95E38A7B08C111AF7EBEEC476EE6BBDB9704F1AC007CE1F613B

That’s just a little bit longer than “McGiffen” – so you can see there is going to be some storage overhead on your encrypted columns.

I’m just going to chuck a few extra rows into the table (including a duplicate record for me) so I can demonstrate a couple of points mentioned earlier. Here’s what we have when I’m done:

AE_Select3

Now let’s look again at the encrypted version of that:

AE_Select4

Remember in my table definition, I set LastName to be Deterministic Encryption, but FirstName to be randomized. If you look at the encrypted values above we can see some repeated values in the LastName column – even if we didn’t have the Ids to compare with the first recordset it’s pretty straightforward to work out which is “McGiffen” and which is “Rubble”.

Then look at FirstName. We know both records 1 and 2 though have a first name of “Matthew”. If we check the encrypted values in the FirstName column we can see they are both different – this is Randomized Encryption in practice.

Querying Encrypted Data With a Predicate

I’m going to go back to my encrypted (and parameterized) connection and try running some queries against those tables:

DECLARE @LastName NVARCHAR(32) = 'McGiffen';
SELECT * FROM dbo.EncryptedTable
WHERE LastName = @LastName

Results:

AE_Select5

So that worked fine, let’s just try and isolate those first two duplicates at the beginning though:

DECLARE @LastName NVARCHAR(32) = 'McGiffen';
DECLARE @FirstName NVARCHAR(32) = 'Matthew';
SELECT * FROM dbo.EncryptedTable
WHERE LastName = @LastName
AND FirstName = @FirstName;

This time I have no luck. I get the following error message that is telling me in a long-winded manner that I can’t compare against a column with Randomized encryption:

Msg 33299, Level 16, State 2, Line 35
Encryption scheme mismatch for columns/variables ‘@FirstName’, ‘FirstName’. The encryption scheme for the columns/variables is (encryption_type = ‘RANDOMIZED’, encryption_algorithm_name = ‘AEAD_AES_256_CBC_HMAC_SHA_256’, column_encryption_key_name = ‘MyColumnKey’, column_encryption_key_database_name = ‘AlwaysEncryptedTest’) and the expression near line ‘8’ expects it to be (encryption_type = ‘DETERMINISTIC’) (or weaker).
Msg 8180, Level 16, State 1, Procedure sp_describe_parameter_encryption, Line 1 [Batch Start Line 27]
Statement(s) could not be prepared.
An error occurred while executing batch. Error message is: Internal error. The format of the resultset returned by sp_describe_parameter_encryption is invalid. One of the resultsets is missing.

Of course this is exactly what we expected to happen.

What I want to get across is that this all makes perfect sense. If you understand what encryption is doing, then the limitations you face naturally arise from that. Key is understanding that the encryption occurs at the client side, so if you want to do anything in a query that requires the SQL engine to interact with an encrypted column then you are going to be limited. Mostly all it can do is compare two encrypted values and verify they are the same – and only if they are using the Deterministic setting. So, if you’re implementing AE, just think carefully about what columns you need to encrypt and particularly about how you need to interact with those columns.

What’s occurring in the background?

In the last post about certificates I mentioned the extra steps that AE has to go through when querying your data. Now we’ve got a table to play with, let’s look at a Profiler trace so we can see exactly what that involves in terms of interactions with SQL. The actions that occur within the client side libraries are a black box to us, but there is still some interesting stuff to be observed at the SQL end.

I’ll just set Profiler going using a default trace and insert another record into my table.

What we see when we do this is that there is an extra call to the database before the query is executed:

EXEC sp_describe_parameter_encryption
N'
DECLARE @LastName AS NVARCHAR (32) = @pce03080ba9844c4182a38c82216e4ad1;

DECLARE @FirstName AS NVARCHAR (32) = @pba60ad612001418aac8496e355e03892;

INSERT INTO dbo.EncryptedTable (LastName, FirstName)
VALUES (@LastName, @FirstName);

,N’@pce03080ba9844c4182a38c82216e4ad1 nvarchar(32),@pba60ad612001418aac8496e355e03892 nvarchar(32)’

This call is the client (SSMS) asking the server for details about any encryption that might be involved in the query. You’ll notice that SSMS has parameterized the query, replacing the literal values with dynamically named parameters.

The stored procedure returns two result-sets. The first contains a row for each Encryption Key involved in the query. In this case that’s just one, thus one record, so I’ve transposed it to make it easier to read:

AE_Desc_Param1

You can see this is sending back to client the encrypted value of the Column Encryption Key – and also telling it where to find the certificate of the local machine to decrypt that key.

AE_Desc_Param2

The second set describes each parameter, which of the identified Keys (from the first set) it is using, and also tells us whether it is using Deterministic or Randomized encryption (this is the “column_encryption_type” value).
Armed with that information, the client can then perform any encryption required and execute the actual query:

EXEC sp_executesql N'DECLARE @LastName AS NVARCHAR (32) = @pce03080ba9844c4182a38c82216e4ad1;
DECLARE @FirstName AS NVARCHAR (32) = @pba60ad612001418aac8496e355e03892;
INSERT INTO dbo.EncryptedTable (LastName, FirstName)
VALUES (@LastName, @FirstName);
'
,N'@pce03080ba9844c4182a38c82216e4ad1 nvarchar(32),@pba60ad612001418aac8496e355e03892 nvarchar(32)'
,@pce03080ba9844c4182a38c82216e4ad1=0x01F2CD73FDD15216E2D4DC89EB6DC046EBE3FFC70A967BB6A3F822B57EA840F9D60410825455391AAFE7DF161CE78F4C3D4DE65ED8FC2435115C4E0F81E49AD820
,@pba60ad612001418aac8496e355e03892=0x01C4FBE035F600CBF61B2589028863DF732973166D9752B29CBBF1C7814DF6E8BDAD0D95897D79E28884E103F350506B5465A0914AD22EE8C4BB6630FF02F39DB4

You can see that the parameterized query has now been executed via a call to sp_executesql. You can also see that those parameters have been set to the long binary values which represent the encrypted versions of the data I specified – which in this case was “Smith” and “John”.

I covered this briefly in the last post, but it’s good to understand that the extra step before executing the actual query has to happen even when there is no encryption involved as there is no way for the client to know which columns are encrypted and which are not. As such, once you are working with AE and have the Column Encryption Setting enabled for your connections, there will be some overhead on pretty much all your queries. Important to note though – if you are doing this via .NET code then all this extra process just happens automatically for you in background. From your point of view, you just code as normal.

Encrypting Existing Data

This post wouldn’t be complete unless I touched on this subject at least briefly. So that’s what I’ll do. Touch on it briefly.

So, how do you change an existing column containing data, so that it is encrypted?
If you’ve got this far then you can probably figure out for yourself that it’s not going to be as simple as an ALTER TABLE statement. Encryption is a client side function, so if you want to encrypt some data then that data’s got to go via a client.

There are various ways you can do that. You could hand crank something and do it that way. Generally your best bet is going to be a process that involves creating a new table with the encryption set up as you want it, copying over data from the old table into the new, than dropping the old table and renaming the new one. So no, this is not going to be a particularly online activity.

It can be a fairly straightforward one though as Microsoft has provided the tool to do that for you via SSIS using the SQL Server Import Export Wizard.

Here’s a nice MSDN post that takes you through that process
https://blogs.msdn.microsoft.com/sqlsecurity/2015/07/28/encrypting-existing-data-with-always-encrypted/

Understanding Keys and Certificates with Always Encrypted

Always Encrypted on SQL 2016 is pretty easy to set up. There’s even a single wizard to guide you through the whole process. However, the best way to get into trouble with this technology is by not clearly understanding what is going on.

I personally find I understand things best when I am solid on the basics. It also means I can then usually logically extrapolate to answer any questions that come up.

So here we’ll go into a little detail about the keys and certificates involved…

To get up and running with Always Encrypted you need three things:

  • An Always Encrypted Certificate to store on your application server / client machine
  • A Column Master Key in your database
  • A Column Encryption Key in your database

You can actually create all of these through SQL Server Management Studio, in fact the first two items get created in the same step, so let’s run through that process, and we can look at what these items are, and what they get used for in more detail as we go along.
Creating the Certificate and the Column Master Key
You can find Always Encrypted Keys under the Security folder for your database. Right-click over column master keys and select to create a new one:

SSMS_AE

And up comes the GUI:

AE_ColumnMasterKey

The Name field is the name for the Column Master Key within your database – you can see I’ve decided to call mine “Wibble”.

Beneath that you can select – or create – the certificate to be associated with the Key. You can see I already have a couple in the selected key store as I’ve been through this process before, and the selected certificate is highlighted. If you don’t have one, you can select from the “Key Store” drop down where you want the certificate to be stored, and then click the “Generate Certificate” button to generate a new certificate in that store.

It’s worth noting at this point that the certificate is created on your local machine – not the server hosting SQL Server. We’re used to SSMS being the tool through which we interact with and do things at the server end, so this can be a bit counter-intuitive, but in terms of what we are trying to achieve it makes sense. We want to create a certificate that exists on the client – so in this case that’s your local machine.

Click okay and the Column Master Key and certificate get created. You can see your new key in SSMS:

AE_Wibble

To understand what this key actually is, I’m going to right-click and generate the SQL definition:

CREATE COLUMN MASTER KEY [Wibble]
WITH
(
KEY_STORE_PROVIDER_NAME = N'MSSQL_CERTIFICATE_STORE',
KEY_PATH = N'CurrentUser/My/D492BA86737FECDABB14D13476122C1E4BD217D2'
)
GO

We only have two values, KEY_STORE_PROVIDER_NAME and KEY_PATH. All this is telling us is where to find the certificate we just created. So you can see that the Column Master Key is just a pointer to a certificate stored on client machines.

Creating the Column Encryption Key
This is the last thing we need before we can set up encryption on our columns. You can find “Column Encryption Keys” underneath “Column Master Keys” in SSMS. Again, right-click and bring up the GUI:

AE_ColumnKey

You can see I’ve called mine MyColumnKey, and I’ve selected from the drop down that it should be created using the Column Master Key I created earlier – “Wibble”. Click okay and the key gets created.
Again I’m going to generate the SQL definition for my new Column Encryption Key so we can see what’s inside it:
CREATE COLUMN ENCRYPTION KEY [MyColumnKey]
WITH VALUES
(
COLUMN_MASTER_KEY = [Wibble],
ALGORITHM = 'RSA_OAEP',
ENCRYPTED_VALUE = 0x016E000001630075007200720065006E00740075007300650072002F006D0079002F006400340039003200620061003800360037003300370066006500630064006100620062003100340064003100330034003700360031003200320063003100650034006200640032003100370064003200A6296213760796AA271152F6D1F1D8C5B4B84ADDB660EC700936B010918CF62FA63C9F716BCB4337D4D48E7F5ABB879F09EA0ABF2ABFDB54E4740DCD88631C533BE530A1CF491A63B0A334FBBA1971C4B22C18E7B8DFCB9975754F9232E47C5DED5857B6032D4BB95B3C1232D3168366B6A1DA4FFC637849FBC3226FFB78981BD1596DE0CD3ED894B127855FD93F3F40E6DB9602482A9AD69CE96C3D879F544534135FFCCA789CB69E3C52C3A7CCB3AFC55850DA7DD82BE963A36A3AC77ECB26CE77A01F3CC43CEA26757548785A4C5A538250C325CEB106495211A625CDAFF5E9EED96EDE3AFFC8E6518E0E29C4F796209EE5ED35B04257844E6ACF74B0EEF907F667D699FE01BCF29BDA275BC9260E03130DA7008F30CD6CBB12F60AD6B47E1B5338D18FAF50DD6EBE3712FC4E814E71CA1E4F605F97F6DB51B1BBB6819E5BF5E8DBB6EB453B268173DB0B69B6CE78A57BC7C98FBBC34BA205B7DB59FEECCEB235A38B089D31C4892FB870A305D5FB1FC69A91C009B9F9D2B572C0DA2DCAC76C6AF48DF54B3C84D7110BD903F06817FD9BC100773028F357966F029B89DF49B8564DD901349F1304124B3DC691A626568C30224766C18B631CD42097DCE0B927A9A401A51EFE05BD5DC6CB9F7F2F3166A5D2A1DECC7D060C26B41D42BF2EF9ACA5919568979F713EFA655EEEF2A4280060D0D73553633B0BEA4925E595805C
)
GO

We can see it tells us what Column Master Key this is based on, what Algorithm has been used, and a long binary Encrypted Value. So what is this?

What has happened is that in the background a value has been created that will act as the Column Encryption Key when MyColumnKey is used to encrypt (or decrypt) data in a column. That value has then itself been encrypted using the certificate that the Master Column Key is based on. The encrypted value of the Column Encryption key is what then gets actually stored in the database.

That might seem overcomplicated. If we can encrypt data using the certificate, then why do we need a separate key in the database?

It actuality it’s very important that we have both. Imagine the scenario where we just use a certificate as the key (or just a single key in another form) and that expires, or worse there is a breach and it gets copied. In this case we are going to have to change our certificate or key. But if we change the key, how are we going to read the existing data in the table?

There is only one way, which is to decrypt all the existing data, and then re-encrypt it with the new key. Let’s say you have tens of millions of records, this might take a while, and will certainly require the application to be taken offline. Also, while this process is being undertaken, the data is unencrypted and at risk.

With the approach taken by Always Encrypted, we have a way around this. Let’s say we have to replace our certificate. All we need to do is create a new COLUMN MASTER KEY based on the new certificate, then we can decrypt our COLUMN ENCRYPTION KEY, ENCRYPTED_VALUE using the old certificate and re-encrypt that value with our new certificate. Because the actual Key value itself hasn’t changed then we can continue to read the existing data using the new certificate.

In practice this is done for you in the background through a process called “Key Rotation”. A given COLUMN ENCRYPTION KEY can actually support having two COLUMN MASTER KEYS at the same time – each of which would have it’s own encrypted version of the same underlying key. This allows the process to add a new master key and associated encrypted value, before removing the old one – which allows the key rotation process to occur fully online.

There is (understandably) a performance overhead to having two Keys at once so avoid doing this for any other reason.

I’m not going to go any further into the process of Key Rotation here, but it is generally seen as good practice to retire and refresh your certificates regularly – when you create them through SSMS they get a default expiry data one year in the future. So if you are implementing Always Encrypted, it is important to be confident you know how you will approach things when your certificate expires.

So how do the keys get used in practice?
When you run a query over a connection with column encryption enabled, the following steps occur:

  1. The client sends a preliminary call to the SQL Server Instance containing the query text. In this call it is asking for details of potential encryption on any columns involved in the query and what keys may be involved.
  2. If there are encrypted columns involved, SQL supplies that information, as well as the Column Master Key (which tells the client where to find the relevant certificate locally), and the encrypted value of the Column Encryption Key for each encrypted column.
  3. Using the information and keys, the client can then encrypt any values it needs to before sending to the database. For instance, if it is inserting data into, or looking for a particular value, in an encrypted column.
  4. The modified query is now sent to SQL for execution.
  5. SQL returns any results – with values in encrypted columns remaining in their encrypted form.
  6. If a resultset is returned, the client uses the information and keys supplied in step 2 in order to decrypt any encrypted values.

So you can see there are a few extra steps over and above your normal query execution, and that does create some performance overhead.

Final thoughts
Once you get into the business of having encryption keys then the process by which you manage them is going to be critical. If they are too easy for anyone in your organisation to access, then you may sabotage your efforts in encrypting the data in the first place.

You are going to want to have different keys in your production environments to those you have in test and development as a minimum.

Think about how you are going to manage that through your deployment processes. Do you want to generate new keys and certificates with fresh deployments? In which case where are they backed up, and who has access to them? You want to be absolutely sure you can recover the certificate your Master Column Key is based on if it is lost from your production environment. Otherwise your encrypted data is gone for good.

If you are shipping a product to a client with Always Encryption enabled – do they have the knowledge and processes to manage the keys moving forward?

And what is your plan for key rotation? How will the right people be reminded before certificates expire that they need to do something about it?

This is a great technology, and Microsoft has made it as simple as possible to use in practice. Nonetheless, it does need to be used with care.

SQL Server 2016 – Always Encrypted and the GDPR

The European General Data Protection Regulation (GDPR) is coming, bringing new rules about the protection of Personally Identifiable Information (PII).

For those of us in Europe – or wishing to sell software products and solutions within Europe – this is a big deal. It’s going to mean an increased focus on data protection, and the potential for large penalties in the case of data breaches. Those penalties are also increased if we can’t show that we have taken sufficient measures to protect that data.

This affects us where we host applications and data for our clients, but also when we sell software to clients to use on-premise. They are going to want to be sure that the data in those applications is as secure as possible.

Security infrastructure and data governance are going to be of critical importance to everyone, but for software vendors it seems likely that Encryption is going to be a keyword.

In terms of the UK and Brexit, the British government has confirmed that the Brexit process will not affect the commencement of the GDPR in the UK.

The new regulations come into effect in the UK on the 25th May 2018. In the world of development lifecycles, roadmaps and deployments, that’s not a massive amount of time. So wouldn’t it be nice if there was something reasonably easy to implement that we could use to protect our clients’ data.

Transparent Data Encryption (TDE)
In SQL Server we already had Transparent Data Encryption.

TDE works by encrypting all the data for a database on disk – so if someone steals your data files, or backup files then they can’t read your data.

What TDE doesn’t do is keep data encrypted as it’s in memory or passed across the network to a requesting application. Also, if you have been granted access to the database then you will be able to read the data with no issues. That may not seem like much of a problem if you have our security locked down correctly, but usually you at least need to have a DBA who has system admin rights over the database – that DBA could abuse their vast privilege and power!

TDE is also an Enterprise edition only feature. To put that in context, if you currently have an application running on standard edition and which is fully provisioned with the 24 cores you are allowed to run on standard edition, and if you want to implement TDE, then it would cost you $5,000 per core to upgrade to Enterprise – that’s $120,000 – before you can go ahead.

Enter Always Encrypted
Always Encrypted offers a solution to these issues. You can choose to encrypt data on a column by column basis. The data is encrypted by the application before being sent across the network and remains encrypted as it is stored in SQL Server (both in memory and on disk). When it’s retrieved, it remains encrypted until it gets to the client application where it can be decrypted and used.

This is achieved by use of an Always Encrypted Certificate which is only stored on the application server, i.e. the database server has no access to it. Even under the covers SQL Server can’t read your encrypted data without the certificate.

So pesky DBA – your evil plans are foiled!

Seriously though, it’s unfortunately rather common for people to have levels of access to databases that they don’t need and shouldn’t have. Or worse, for access to granted to people through the use of the built-in Sa account – whose password is rarely changed.

To use Always Encrypted you need to be on SQL Server 2016 (or higher), and your application must be using at least version 4.6 of the .NET Framework.

From SQL 2016 SP1, Always Encrypted is a feature available in the Standard and Express editions – so that makes it free for users of those editions, not just those on Enterprise.

As I’ve said about a few of the other new features in 2016 – this isn’t rocket science – this isn’t an amazing new technology that’s going to take us to the stars. What it is though is, easy to use, effective, and minimal effort to implement. It also, once set up, doesn’t require developers to remember to do anything like sending data through a third party library to ensure encryption remains. In terms of hitting an imminent deadline for Data Protection without vast expense – this could be rather handy.

There are of course limitations, impacts and a few things in general that are worth understanding.

Initially I planned just to write a single post on this subject. It was quite straightforward to go through the process of setting up an Always Encrypted test – but once that was done I found I started to ask myself a lot of “what ifs”. I then found that by delving deeper, and understanding what was actually going on with Always Encrypted in a bit more detail at a lower level, that these questions started answering themselves, and that some of the limitations of Always Encrypted made perfect logical sense.

So we’ll have a look at some of that in the next few posts.