What is Transparent Data Encryption?

Transparent Data Encryption (TDE) was introduced in SQL Server 2008 as a way of protecting “at rest” data, and it remains available in all versions of SQL Server since. Until recently it was only available in the Enterprise edition, but from SQL Server 2019 it is also available in Standard edition.

When we talk about “at rest” data we are referring to data that has been written to disk. In terms of our SQL databases that includes:

  • Any data files for our database
  • Any log files for our database
  • All backup files for the database, be they Full, Log or Differential backups
  • Database snapshot files
  • Any data written to disk in the TempDB database

The last item in that list, TempDB, needs to be included for completeness. Imagine that you query your database and, as part of the query execution, TempDB is used. If that data is written to disk then it creates a hole in our protection: someone could potentially read or copy the TempDB files and see some of the data we are trying to protect. As a result, when you enable TDE against any database on your SQL Server instance, the TempDB database is automatically encrypted as well to prevent this from happening.
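If you want to verify this on your own instance, one quick way (a sketch, not the only option) is to query the sys.dm_database_encryption_keys DMV once TDE has been enabled on a user database – you should see TempDB listed alongside it with an encryption_state of 3 (encrypted):

--Databases that currently have a database encryption key,
--i.e. where TDE is enabled or in the process of being enabled
SELECT DB_NAME(k.database_id) AS database_name,
       k.encryption_state, --3 = encrypted, 2 = encryption in progress
       k.key_algorithm,
       k.key_length
FROM sys.dm_database_encryption_keys k;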

Data “at rest” of course doesn’t include the following things:

  • Data loaded/stored in memory (buffer pool)
  • Data returned from a query and being passed across the network
  • Data received by a client as a result of a query

If you want to cover those scenarios as well then you need to look at other forms of encryption e.g. TLS and Always Encrypted.

There are also some less obvious exceptions which occur where SQL doesn’t use the buffer pool – and therefore there isn’t an in-memory version of the data:

  • Filestream data
  • Data persisted to disk using Buffer Pool Extensions

And there are a couple of other exceptions that can occur in certain circumstances:

  • Where the buffer pool gets paged to disk due to memory pressure
  • SQL dump files when there is a crash

That’s summarised in the below diagram:

[Diagram: TDE Encrypted vs Unencrypted]

TDE mainly uses standard encryption algorithms from the AES (Advanced Encryption Standard) family. When you set up TDE you can specify which AES algorithm you wish to use: AES_128, AES_192 or AES_256. In each case the number specifies the length, in bits, of the key used for encryption.

Obviously the longer your key, the harder the encryption should be to crack. However, even for AES_128, estimates of how long it would take to break the key by brute force vary from a thousand years to numbers many times greater than the age of the universe – trillions of years – with the difference depending on how we anticipate processing power growing in the future. Even on the lowest estimates AES_128 should be sufficient in most scenarios, but most people seem to go for AES_256, whose key space is the square of AES_128’s and should therefore take vastly longer again to defeat.

SQL Server has also supported the TRIPLE_DES_3KEY encryption algorithm. This is now generally not considered to be as secure as AES, and from SQL Server 2016 its use is deprecated, so it is best to stick to AES even if you are on a SQL version where Triple DES is still an option.

Let’s have a look at the contents of some SQL data files so you can see the difference with and without TDE. I’ve created a database with a single table and inserted a row of data:

CREATE DATABASE TestTDE;
GO
USE TestTDE;
CREATE TABLE dbo.SomeData(Id INT IDENTITY(1,1), SomeText VARCHAR(255));
INSERT INTO dbo.SomeData (SomeText) VALUES('This is my data');

I’ll close my connection to the database and detach it so I can open the files in a hex editor. Then I search for my text in the data file:

[Image: TDE_MDF_File_Unencrypted]

As you can see, the data is stored clear as day in the data file.

Now let’s look at the same data file once TDE has been enabled. This time if I search for my string it’s not found and my data looks like this:

[Image: TDE_MDF_File_Encrypted]

Even the free space at the end of the file, which was all zeros in the unencrypted version, is encrypted in the TDE-protected copy:

[Image: TDE_MDF_File_Encrypted_End]

TDE works by using an encryption key that is stored in the database being encrypted – but that key is itself encrypted by an object held outside of the database. We’ll see all the various objects involved when we look at setting up TDE next:

Setting up Transparent Data Encryption (TDE)
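If you just want a flavour of what’s involved before reading that post, the chain of commands looks roughly like this. This is only a minimal sketch – the certificate name and password below are placeholders, and each step (including backing up the key and certificate) is covered properly in the setup post:

--1) In master: a database master key, protected by a password
USE master;
CREATE MASTER KEY ENCRYPTION BY PASSWORD = 'UseAStr0ngP@sswordHere!';
GO

--2) In master: a certificate, protected by the database master key
CREATE CERTIFICATE MyTDECert WITH SUBJECT = 'Certificate used for TDE';
GO

--3) In the user database: a database encryption key (DEK), protected by the certificate
USE TestTDE;
CREATE DATABASE ENCRYPTION KEY
WITH ALGORITHM = AES_256
ENCRYPTION BY SERVER CERTIFICATE MyTDECert;
GO

--4) Turn encryption on for the database
ALTER DATABASE TestTDE SET ENCRYPTION ON;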

Got a problem or embarking on a SQL Server project and want some help and advice? I’m available for consulting – please get in touch or check out my services page to find out what I can do for you.

More articles on TDE:

Encrypting an existing database with TDE
Understanding Keys and Certificates with Transparent Data Encryption (TDE)
How Secure is Transparent Data Encryption (TDE) – and How to Prevent Hacking

Setting the Sample Rate for Automatic Statistics updates

Statistics are vital in allowing SQL Server to execute your queries in the most performant manner. Having a deeper understanding of how statistics work really helps when you are performance tuning.

I talked a few posts ago about Automatic Sample Sizes for Statistics Updates. From SQL Server 2016 SP1 CU4 we’ve been able to override that. You can manually update a statistics object specifying a given sample rate, and specify that the same rate will be used for any subsequent automatic updates of that object.

You can’t change the default sample rate globally, but this functionality is arguably more useful anyway. It means that if you find a particular statistics object where the automatic sampling rate is too low, and that has a negative effect on query performance, you can choose a sampling rate specific to that object.

Let’s take a quick look at doing this in practice.

Here’s some reused SQL from the previous post to create a table and throw in a million rows:

--Create table for Testing
CREATE TABLE dbo.Test(
   Id INT IDENTITY(1,1) CONSTRAINT PK_Test PRIMARY KEY CLUSTERED, 
   TextValue VARCHAR(20) NULL
);

--Create index on TextValue
CREATE INDEX IX_Test_TextValue ON dbo.Test(TextValue);

--Insert a bunch of rows
INSERT INTO dbo.Test(TextValue) 
SELECT TOP 1000000 'blah'
FROM sys.objects a, sys.objects b, sys.objects c, sys.objects d; 

I haven’t updated the stats yet, as we can see if I view them:

--View the statistics
DBCC SHOW_STATISTICS('dbo.Test', IX_Test_TextValue) WITH STAT_HEADER;

[Image: SetStatsSampleRate1]

The stats don’t contain any information yet, but there is still one thing to notice. See that final column, Persisted Sample Percent? That was added to support this functionality.

The stats will only get updated by the auto stats mechanism when I run a query against the table and the optimiser is interested in them but sees they’re stale. From zero to a million rows should be stale enough. Let’s run a quick query:

--Random Query
SELECT * 
FROM dbo.Test
WHERE TextValue = 'not blah';

And view the stats again:

[Image: SetStatsSampleRate2]

We can see the stats got updated and the table was big enough that SQL decided to sample rather than scan the whole index. About 42% of the rows got sampled. Persisted Sample Percent is set to zero, i.e. it is not set.

Let’s say that I want to make sure that this statistics object always updates with a full scan.

What I can now do is manually update the statistics and specify that the sample percentage should be persisted:

--Update the stats and persist the sample rate
UPDATE STATISTICS dbo.Test IX_Test_TextValue 
WITH FULLSCAN, PERSIST_SAMPLE_PERCENT = ON;

Let’s check that did what we wanted by viewing the stats:

[Image: SetStatsSampleRate3]

Great, we can see all the rows were sampled and the Persisted Sample Percent is now set to 100.
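You’re not limited to a full scan, by the way. If a full scan is too expensive for your table, a specific sample percentage can be persisted in exactly the same way – the 75 here is just an example value (I’ll stick with the full scan for the rest of this example):

--Persist a specific sample rate rather than a full scan
UPDATE STATISTICS dbo.Test IX_Test_TextValue 
WITH SAMPLE 75 PERCENT, PERSIST_SAMPLE_PERCENT = ON;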

Now let’s see what happens when another auto update kicks in. I insert another million rows with the same SQL from above. Then I run the select query again. The optimiser notices the statistics are now stale and so it updates them and recompiles the plan for the query. We can see this when we check the stats yet again:

[Image: SetStatsSampleRate4]

SQL has done exactly what we wanted: it performed the auto stats update in the background and it used the 100% persisted sample rate.
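If you want to check which statistics have a persisted sample rate, or to switch the behaviour off again, something along these lines should do the job. This is a sketch: the persisted_sample_percent column of sys.dm_db_stats_properties was added in the same builds as the feature itself, and the 10 percent in the final statement is just a one-off sample rate for that manual update:

--View the persisted sample rate for statistics on the table
SELECT s.name AS stat_name,
       sp.last_updated,
       sp.rows,
       sp.rows_sampled,
       sp.persisted_sample_percent
FROM sys.stats s
CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) sp
WHERE s.object_id = OBJECT_ID('dbo.Test');

--Stop persisting the sample rate so automatic updates revert to the default sampling algorithm
UPDATE STATISTICS dbo.Test IX_Test_TextValue 
WITH SAMPLE 10 PERCENT, PERSIST_SAMPLE_PERCENT = OFF;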

Viewing the Statistics Objects Used to Create an Execution Plan

Statistics are vital in allowing SQL Server to execute your queries in the most performant manner. Having a deep understanding of how the SQL Server Optimizer interacts with statistics really helps when you are performance tuning.

One thing that can be useful when looking at an execution plan is to understand what statistics objects the optimizer used to come up with the plan. In this post we look at how that can be achieved using the undocumented trace flag 8666, which can be used to save internal debugging information into the plan XML – including details of the statistics objects used.

There are actually a few other trace flags that do similar things but this seems the simplest and the one that works across the most versions of SQL Server. It looks like it was introduced in SQL 2008 and works on all versions up to and including SQL 2016 – but it doesn’t work on SQL 2005.

Note that this technique only works for plans generated once the trace flag is on, so you can’t view the additional information for plans already in your cache.

Here’s a quick example using it against the AdventureWorks2012 database. You need to make sure you’ve selected the option to show the actual plan, then run the query as below:

--Turn the trace flag on
DBCC TRACEON(8666);

--Run the query
SELECT *
FROM Person.Person
WHERE LastName LIKE 'Smith%';

--Turn the trace flag off again
DBCC TRACEOFF(8666);

This query produces the following plan:

[Image: optimizer_stats_plan]

If you right-click on the SELECT operator and select Properties, you see the following:

[Image: optimizer_stats_plan_props]

You’ll notice an extra field “Internal Debugging Information” which contains a bunch of unformatted XML. Rather than grabbing that string and formatting it to make it readable, you can right-click back on the plan itself and select “Show Execution Plan XML…” to view the same information in a more friendly format.

If you scroll down through the XML generated you will get to the following section:

<ModTrackingInfo>
   <Field FieldName="wszStatName" FieldValue="_WA_Sys_00000007_693CA210" />
   <Field FieldName="wszColName" FieldValue="LastName" />
   <Field FieldName="m_cCols" FieldValue="1" />
   <Field FieldName="m_idIS" FieldValue="2" />
   <Field FieldName="m_ullSnapShotModCtr" FieldValue="19972" />
   <Field FieldName="m_ullRowCount" FieldValue="19972" />
   <Field FieldName="ullThreshold" FieldValue="4494" />
   <Field FieldName="wszReason" FieldValue="heuristic" />
</ModTrackingInfo>

<ModTrackingInfo>
   <Field FieldName="wszStatName" FieldValue="IX_Person_LastName_FirstName_MiddleName" />
   <Field FieldName="wszColName" FieldValue="LastName" />
   <Field FieldName="m_cCols" FieldValue="1" />
   <Field FieldName="m_idIS" FieldValue="7" />
   <Field FieldName="m_ullSnapShotModCtr" FieldValue="19972" />
   <Field FieldName="m_ullRowCount" FieldValue="19972" />
   <Field FieldName="ullThreshold" FieldValue="4494" />
   <Field FieldName="wszReason" FieldValue="heuristic" />
</ModTrackingInfo>

Each ModTrackingInfo node displays the information about one statistics object that has been referenced. The wszStatName field shows us the name of the statistics object. We have two in this case: _WA_Sys_00000007_693CA210 and IX_Person_LastName_FirstName_MiddleName. The first is an auto-generated statistic on the LastName column (I happen to know it was generated before I created the index referenced by the second). Interestingly, SQL has looked at both – even though they contain pretty much the same information and one isn’t really required.

Why is this useful?

Well, apart from simply the interesting aspect of seeing some of what is going on in the background when SQL decides what plan to use for your query, it is often the case with poorly performing queries that there is bad cardinality estimation going on. Sometimes this might be because the statistics aren’t accurate and would benefit from being updated. You can use this technique to see what statistics objects are being used for the estimation (where it is not just obvious) and then you can look at the objects themselves and see whether they might want refreshing, or whether there may be other issues with your data – such as a skewed distribution – that make estimates difficult.
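For example, once you know which statistics objects the plan relied on, you can check how stale they are with a query along these lines. This is just a sketch using sys.dm_db_stats_properties – the statistic names below come from the example above, and the auto-generated one will have a different name on your system:

--How up to date are the statistics that the plan used?
SELECT s.name AS stat_name,
       sp.last_updated,
       sp.rows,
       sp.rows_sampled,
       sp.modification_counter --changes since the stats were last updated
FROM sys.stats s
CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) sp
WHERE s.object_id = OBJECT_ID('Person.Person')
AND s.name IN ('_WA_Sys_00000007_693CA210', 'IX_Person_LastName_FirstName_MiddleName');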

I see growing opinion in the SQL world that refreshing statistics is often more important than rebuilding indexes to reduce fragmentation – the latter operation will also refresh the statistics but is a lot more resource intensive. You will also commonly have statistics on columns that are not indexed, and the threshold for them to be refreshed can be pretty large when you have a lot of data. You can see more information about manually refreshing statistics in my previous posts on the subject:

When do Statistics Get Updated?

Manually updating Statistics

Automatic Sample Sizes for Statistics Updates
