Checking Progress of the Creation of your Azure Resources

When I’ve created resources in Azure it’s usually taken anywhere from a few minutes up to a quarter of an hour – though sometimes longer.

When you’re new to this stuff you can be left wondering, “Is it really creating it? Did I hit the right buttons?” So it can be handy to know where to check to see what’s going on.

Sometimes after creating the resource you are taken to a screen that will show you what’s going on:

And usually you can see something is occurring from the bar at the top:

If you click on the notification (bell) icon you can see more details:

You can then click to see “More events in the activity log” to dig deeper:

This is all fairly intuitive, but earlier I was trying to create a SQL Database Managed Instance for the first time. It showed some activity in the items above for a few minutes, but after that nothing happened. Had it failed? Had I done something wrong? Should I start again and try to create a new one?

The answer was to select resource groups from the blades on the left, and select the resource group that I had created the item in:

On the right-hand side I can see an item saying “Deployments”, and I can see that one is in the process of deploying. I can click the hyperlink for more details:

The third item in the list was the one I was looking for:

Okay, so it is in the process of being created. There’s no way to tell how long it will take but at least I now know it’s happening.

While searching for it I did notice a warning on the create screen for the resource that I hadn’t seen when I first whizzed through the creation:

Looks like I might be waiting a while…

Using the Sequence Object to Generate Reference Numbers

The SEQUENCE object was added to T-SQL in SQL Server 2012. It’s reasonably well known to DBAs, but less so to developers or those new to SQL, so I thought I’d produce a quick post to demonstrate its use.

In basic terms, a SEQUENCE is a way of generating a sequence of numerical values. The following are examples of sequences you could generate:

1, 2, 3, 4, 5, 6, 7, 8, 9…

1, 6, 11, 16, 21, 26, 31…

1000, 1001, 1002, 1003, 1004…

You can pick a starting number (and an ending number if you want), a data type (which might produce a natural maximum to the size of the sequence) and an increment (i.e. how much you want to be added to produce the next number in the sequence). There are other options, but I’m going to focus on the simplest use case. You can find the full documentation here:

https://docs.microsoft.com/en-us/sql/t-sql/statements/create-sequence-transact-sql?view=sql-server-2017
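For reference, here’s a quick sketch (the sequence name is just illustrative) showing a few of those options together – a smallint sequence that counts up in fives from 10, stops at 500 and raises an error rather than cycling:

-- Illustrative sequence showing data type, start, increment, maximum and cycling options
CREATE SEQUENCE dbo.ExampleSequence
   AS smallint
   START WITH 10
   INCREMENT BY 5
   MAXVALUE 500
   NO CYCLE;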

So, let’s define my use case. I have a table to hold customer orders. For each record I want to define an Order Reference Number of the format ORN0000000001.

Now, you could implement something like this using an IDENTITY column – but there may be times when that is not ideal. For instance, your table may not already have a suitable identity to use (you might have a unique identifier as the primary key), and if you want to store the actual reference then you’d need to add an IDENTITY column in addition to the reference column. Or you might need a reference that is unique across multiple tables.

The SEQUENCE object is also designed to be faster than IDENTITY, creating less blocking when you have a lot of concurrent inserts.

First of all, creating the sequence to generate the numeric part of my reference is easy. Let’s say that a bunch of reference numbers have already been used, so I want to start with ORN0000100001.

Let’s look at the SQL…

CREATE SEQUENCE dbo.OrderRefSequence
   AS bigint
   START WITH 100001
   INCREMENT BY 1;

Then I can request numbers from the sequence using NEXT VALUE FOR e.g.

SELECT NEXT VALUE FOR dbo.OrderRefSequence; 

The first time I run that I get the starting number 100,001.

Another nice addition to SQL Server 2012 was the FORMAT function which we can use to format the number into a string whilst padding it with leading zeroes and adding the text prefix:

SELECT FORMAT(NEXT VALUE FOR dbo.OrderRefSequence, 'ORN0000000000#');


That returns me ORN00000100002.

If I keep executing it then the reference increases:

ORN00000100003

ORN00000100004

ORN00000100005

ORN00000100006…

So, now I can just use that when inserting values to my table to get a new reference number each time.
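For example, an insert might look something like this (using a hypothetical dbo.CustomerOrders table purely for illustration – the table I actually create is below):

-- Generate the next reference explicitly as part of the insert (illustrative table name)
INSERT INTO dbo.CustomerOrders (CustomerId, OrderReference)
VALUES (NEWID(), FORMAT(NEXT VALUE FOR dbo.OrderRefSequence, 'ORN0000000000#'));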

But, what’s even nicer is that you can do it all by defining a default for your column and referencing the sequence in the default.

I’ll create the following table:

CREATE TABLE dbo.Orders (
   Id UNIQUEIDENTIFIER PRIMARY KEY DEFAULT NEWSEQUENTIALID(),
   CustomerId UNIQUEIDENTIFIER NOT NULL,
   OrderReference VARCHAR(20) DEFAULT(FORMAT(NEXT VALUE FOR dbo.OrderRefSequence, 'ORN0000000000#')),
   OrderDate DATETIME DEFAULT(GETUTCDATE()));

You can see that the OrderReference is defined with a default using our sequence object.

I insert a bunch of rows into the table. For the sake of this rather contrived example, I only need to specify the CustomerId. I do that by generating a bunch of random unique identifiers – one for each row in the sys.objects table.

INSERT INTO dbo.Orders (CustomerId)
SELECT NEWID() FROM sys.objects;

Let’s have a look at an extract from the table:
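If you’re following along, a query something like this will pull out a sample to check:

-- Look at a sample of the generated references
SELECT TOP (10) Id, CustomerId, OrderReference, OrderDate
FROM dbo.Orders
ORDER BY OrderReference;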

You can see I’ve got a nice series of ascending, non-duplicating reference numbers.

One thing to note is that, while the sequence will generally produce unique numbers, it is still worth enforcing that in your table definition with a unique constraint i.e.

ALTER TABLE dbo.Orders ADD CONSTRAINT UQ_Orders_OrderReference UNIQUE(OrderReference);

This prevents someone from issuing an UPDATE command that might create a duplicate reference. Also, if you create the sequence with the CYCLE option it will start again from the beginning once it runs out of numbers (the default is NO CYCLE, which raises an error instead) – though in most applications this is unlikely to be an issue if you’re using a bigint for the sequence.
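As an aside, if you want to check how your sequence is configured and what value it has reached, you can query sys.sequences:

-- Check the configuration and current value of the sequence
SELECT name, start_value, increment, maximum_value, is_cycling, current_value
FROM sys.sequences
WHERE name = 'OrderRefSequence';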

There was also a bug in some versions of SQL 2012 and 2014 that meant a duplicate could get created when your server was under memory pressure:

https://support.microsoft.com/en-gb/help/3011465/fix-sequence-object-generates-duplicate-sequence-values-when-sql-serve

This was fixed with SQL Server 2012 SP2 CU4 and SQL Server 2014 CU6 – but it’s better to be safe than sorry.

As a final note, it’s worth remembering that under the GDPR these sorts of references are defined as personal data. That’s one good reason never to consider using them as the primary key of your table (there are many others) – but it’s also a reason why, where you already have an identity-based primary key that you could use to generate the references, it may be worth decoupling them from the primary key and basing them on a separate sequence instead.

There’s got to be a better way

tsql2sday150x150

For T-SQL Tuesday #111, Andy Leonard asks “What is your why? Why do you do what you do?”

Like Andy, I didn’t take a straight route into being a DBA. I actually trained as a teacher, and did a variety of temp jobs – largely in call centres – before I got my first “proper” job as a developer.

My first steps into development were when I was a team leader in a call centre. There was a lot of paper shuffling going on. I felt there had to be a better way of doing things, so I developed processes using Access and Excel and convinced the Admin team of the benefits of using them.

Not exactly development, more macros with a bit of VBA, but that got me spotted by another team that was trying to automate reporting in the call centre – again using MS Access. That was where I first encountered relational databases and got deeper into coding. I joined as they were replacing an Excel spreadsheet that processed flat files using macros – chundering away all day, every day – with a coded solution. Finally we had it importing all the call centre stats in seconds, on a schedule in the early hours.

Nowadays that seems trivial, but back then it felt like a great achievement. It’s still one of my best work memories – when I showed my boss the automated import for the first time.

Moving forward from there and trying to answer Andy’s question, I realised that what started me off is what’s kept moving me forward. That there’s always got to be a better way.

Another of my favourite early developer memories was being asked to look at whether a file import to an application could be sped up as it was taking 10 minutes. The process was to upload a file with corporate credit card statements from a flat file into a SQL database for the application that companies used to manage them. When I opened it up I was quite impressed with the code, it seemed cleverer than anything I could have written – but maybe a bit too clever. I asked myself if there was a better and simpler way of achieving the same thing. There was, and it turned out to be a lot lot quicker.

Making things run faster remained something I loved doing, but it often seemed that optimisation wasn’t a big focus in application coding – that was more about delivering features. I think that’s part of what led me to specialise in data, where performance, particularly at scale, is so key. Looking at SQL code and going – “well, sure it works as it is, but is there not a better way of doing this?”

It strikes me that this is how a lot of us ended up in software: we saw things being done a certain way and saw scope for improvement – felt that there must be a better way.

And it’s good to remember this isn’t an attitude where we can rest on our laurels. Even when we know what we’re doing, have done something a hundred times before, it’s still good to say “There’s got to be a better way”, and to think about what that might be. Not only does that mean we’re continually improving, but it’s also what keeps working in technology fresh and interesting.

“SSPI handshake failed” \ “The login is from an untrusted domain” errors

I’ve recently encountered an issue that was difficult to resolve, and I didn’t find the particular cause that was troubling us documented elsewhere on the web, so I thought I’d record it here.

The issue was with a service account connecting to SQL Server and intermittently failing to log on.

Errors reported in the Windows Application Event log were:
SSPI handshake failed with error code 0x8009030c
Login failed. The login is from an untrusted domain and cannot be used with Windows authentication.

The login attempt didn’t appear to get as far as the SQL instance, so no further information could be captured in a failed Logins trace.
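If you want to check whether the same messages are also being written to the SQL Server error log, a quick search like this can help (sys.xp_readerrorlog is undocumented, so treat this as a sketch rather than a supported approach):

-- Search the current SQL Server error log (log 0, type 1 = SQL) for SSPI handshake messages
EXEC sys.xp_readerrorlog 0, 1, N'SSPI handshake';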

This was affecting a large number of application servers using the same service account. Fortunately it was in development and test environments, so there was no production impact.

The problem was that the account was getting locked out. A service was running every half hour using the account to connect to SQL, but with the wrong password. We also had a process running to unlock locked service accounts – so the account would start working again after a few minutes.

The resolution was to kill that service as it was no longer required. We were able to identify where the failed logins were coming from via the Active Directory audit logs for the account in question.

This was particularly difficult to troubleshoot as the error was a bit misleading.

There were a couple of other sources I came across while trying to google the cause of the problem. I’ve updated those with the cause we found as well:
https://blogs.msdn.microsoft.com/docast/2016/02/11/common-sspi-handshake-failed-errors-and-troubleshooting/#comment-7075
https://stackoverflow.com/questions/1538027/sspi-handshake-failed-with-error-code-0x8009030c-while-establishing-a-connection

Hopefully this post will save someone else the many hours of investigation I’ve just been through!

Some differences with SQL Server when running on AWS RDS

If you plan on using Amazon Web Services (AWS) to host your SQL Server based applications in the cloud, then you have a couple of options.

One is just to have an EC2 instance (a VM) and install the versions of the OS and SQL Server you want. There are also images you can use that will have these pre-installed. This is what’s known as the IaaS option (Infrastructure as a Service). If you take this option, then SQL Server is exactly the same as it would be if you had it on-prem.

Or you can go with Amazon RDS (Relational Database Service).  This is more of a managed service where Amazon looks after some aspects of your database server for you. In return you give up some of the control you would have with your own server or VM. You can still pick the version of SQL Server you want installed, usually down to which cumulative update you want – though note that RDS normally lags behind the latest box version of SQL by 3 months or so. RDS is what’s known as a PaaS offering (Platform as a Service).

So, what do you give up and what do you gain? Here’s a quick summary of a few things I’ve noticed. This is not intended to be comprehensive and please bear in mind that AWS is a fast-moving beast – changes happen regularly.

What you can access

You can still connect to and manage your instance using SSMS, but you have no direct access to the server hosting it – so no configuration of the OS and no access to the disks. Through SSMS you have an access level slightly below sa, which limits you to the actions RDS allows.

 

High Availability

In RDS you don’t use Availability Groups, Log Shipping or any of that stuff. Instead, HA is achieved using multiple Availability Zones (AZs). With this enabled, writes are performed synchronously to a replica in a second availability zone (usually a separate physical data centre in the same region). If the primary AZ goes down, then failover will happen automatically. You can also choose to replicate to other regions to be protected against failure of an entire AWS region – though data protection laws often mean that might not be an option for you if you hold personal data and there is only one AWS region in your country.

AWS RDS Multi-AZ promises a monthly up-time of 99.95% (which allows for about 22 minutes of downtime per month). Full details of the SLA are here:

https://aws.amazon.com/rds/sla/

 

Disaster Recovery

RDS performs automated backups of your whole instance, including the equivalent of log backups every 5 minutes. That means that in the case of a disaster – where Multi-AZ failover fails, or where someone deletes data they shouldn’t – the maximum data loss (RPO) is 5 minutes. The built-in functionality allows you to restore your instance to a point in time, which is implemented by restoring to a new instance. It is not possible to recover individual databases. Backup retention can be set to a maximum of 35 days.

If you need more than that then you can still take native SQL backups, but this has to be enabled specifically and because you can’t access the underlying disks you have to use a stored procedure specific to RDS:

https://aws.amazon.com/premiumsupport/knowledge-center/native-backup-rds-sql-server/
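As a sketch of what that looks like (based on the general shape shown in the AWS documentation linked above – the database name and S3 ARN here are placeholders, so check the current parameter names before relying on it):

-- Back up a single database to an S3 bucket using the RDS-provided procedure
EXEC msdb.dbo.rds_backup_database
     @source_db_name = 'MyDatabase',
     @s3_arn_to_backup_to = 'arn:aws:s3:::my-backup-bucket/MyDatabase.bak',
     @overwrite_S3_backup_file = 1;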

There are also other options – such as taking extra snapshots and storing them in a separate region.

 

Patching

AWS handles patching of minor versions to your instance for you. You can choose for this to be done automatically or triggered by manual intervention through the AWS console.

 

Encryption

RDS has its own at-rest encryption similar to TDE. This is available for all instances hosted on RDS so unlike TDE you don’t need to be on an Enterprise Edition of SQL Server.

 

Collation

All SQL Server instances on RDS are set up with a server collation of SQL_Latin1_General_CP1_CI_AS and you can’t change this. You can specify a different collation for your databases, but this might mean you run into collation issues if you use temporary tables and compare string-based columns with those in your databases – as temp tables are created with the collation of the instance.
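As a rough sketch of the sort of thing that can catch you out (the table and column names here are hypothetical):

-- #Results is created in tempdb, so its column gets the instance collation (SQL_Latin1_General_CP1_CI_AS)
CREATE TABLE #Results (CustomerName VARCHAR(100));

-- If dbo.Customers lives in a database with a different collation, this join can fail
-- with a "Cannot resolve the collation conflict" error
SELECT c.CustomerName
FROM dbo.Customers c
INNER JOIN #Results r
   ON c.CustomerName = r.CustomerName;

-- One workaround is to specify the collation explicitly on the comparison:
--    ON c.CustomerName = r.CustomerName COLLATE DATABASE_DEFAULT
-- or to declare the temp table's columns with COLLATE DATABASE_DEFAULT in the first place.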

 

Server Memory

In an on-premises version of SQL Server we would carefully provision a proportion of the overall server memory for SQL Server. On RDS this is not an available setting – the amount of memory is fixed by the instance sizing. However, Amazon do make recommendations for tuning the amount of memory given to an RDS instance, which would mean resizing the instance if necessary.

https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/CHAP_BestPractices.html#CHAP_BestPractices.Performance.RAM

The recommendation is that there should be enough memory that the “working set” of data is retained in memory. Memory should be tuned so that the ReadIOPS metric is “small and stable”.

 

Recovery Model

This is worth mentioning even though it is not entirely configurable in RDS. If you have backup retention set to longer than 0 days (i.e. take backups) then the recovery model will be set to FULL. If you set backup retention to zero (which disables backups) then recovery model will be set to SIMPLE. If you manually change the recovery model, RDS will automatically change it back within 5 minutes. This applies to all databases.
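You can confirm what RDS has applied easily enough:

-- Check the recovery model for each database
SELECT name, recovery_model_desc
FROM sys.databases;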

MAXDOP and parallelism

In RDS you cannot change the instance-level MAXDOP or the “Cost Threshold for Parallelism” through SQL. Instead these must be configured through a parameter group:

https://www.mssqltips.com/sqlservertip/5329/setting-sql-server-configuration-options-with-aws-rds-parameter-groups/

 

Optimize for Ad-hoc Workloads

Like the parallelism settings this cannot be modified through SQL. So, if you want this setting enabled you must use a parameter group:

https://www.mssqltips.com/sqlservertip/5329/setting-sql-server-configuration-options-with-aws-rds-parameter-groups/

 

Instant File Initialization

This setting means that when files grow the new space can be allocated immediately without taking time to fill the space with zeros.

This cannot be enabled in RDS, so it is of extra importance to size databases appropriately to avoid auto-growth where possible.

 

Extended Events vs Profiler

Unfortunately, Extended Events is not available in RDS, so if you wish to trace events you must use traditional traces/profiler.

Note that when setting up a trace against a busy instance you should create a server-side trace to minimize the impact on performance.
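As a very minimal sketch of a server-side trace (the event, columns and file path here are illustrative – in particular, check which location your instance will let you write trace files to):

-- Create a server-side trace capturing SQL:BatchCompleted (event 12) with TextData (1) and Duration (13)
DECLARE @TraceId INT;
DECLARE @MaxFileSize BIGINT = 50;  -- MB
DECLARE @On BIT = 1;

EXEC sp_trace_create @TraceId OUTPUT, 0, N'D:\rdsdbdata\log\MyTrace', @MaxFileSize;

EXEC sp_trace_setevent @TraceId, 12, 1, @On;
EXEC sp_trace_setevent @TraceId, 12, 13, @On;

-- Status 1 starts the trace; 0 stops it and 2 closes and deletes the definition
EXEC sp_trace_setstatus @TraceId, 1;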

 

That’s the list of key differences I’ve noticed so far in working with RDS to provision new SQL instances.

 

What stays the same

There are a few other things you may be wondering if you still have control over (I know I was). So just to confirm, you can still do the following:

  • Configure multiple files for TempDB
  • Use either SQL or Windows authentication (or mixed)
  • Schedule jobs with SQL Server Agent
  • Run DBCC CHECKDB
  • Rebuild indexes and statistics

 

Please comment on this post if you notice any other significant differences you think people should be aware of – or if you notice updates to AWS that make any of these points invalid.

 

Thanks!

Extended Events Made Easy – Sorting, Grouping and Aggregation

When I’m using Profiler to analyse performance issues I often save the results to a table, or upload a trace file into a table, so that I can analyse the data. Often this involves aggregating the values for particular queries so that I can see the most resource hungry.

This is by no means a difficult process, but with Extended Events (XE) it’s arguably even easier.

Let’s start with a session from XEvent Profiler like we saw in my last post on this subject:

Extended Events Made Easy: Using XEvent Profiler

I’ve set a workload running and captured the Live Data. The workload consists of some arbitrary queries I’ve written, each running in a loop in its own query window.

After a few minutes I stop the session so I can do my analysis:

XESortingetc1

I’ve got a couple of hundred events. Let’s say I want to quickly find the longest running individual query. In the Live Data viewer you can just right-click on the column name and select sort ascending or descending. This is pretty obvious stuff, but I thought it was worth highlighting as you can’t do the same in Profiler – there you have to save to a table first then query the table.

So, I’ve sorted descending on Duration:

XESortingetc2

You can see the biggest values shown are the aggregated values for the existing_connection and logout events. In retrospect it might have been better to create an XE session that didn’t include these events. However, I look down the list and quickly find my longest running query (highlighted in blue).

The duration specified – 33,161,516 – is in microseconds. So that is just over 30 seconds.

Of course, now I’m thinking – that’s my longest running, but what I really want is the one that has the most CPU – and not just from one execution but the total from all the executions that happened in my sample time frame.

This is where I can use grouping and aggregation. I can right-click on any of the columns and select “Group by this column”. I’m going to do that for the TextData column:

XESortingetc3

We’re now seeing all the events from the sample time period, grouped together based on TextData. The number on the right of each one is the count of events in that group.

Now I can add some aggregations. After grouping I can still see the original column headings at the top, so I just right-click on cpu_time, select “Calculate Aggregation” and select “SUM”:

XESortingetc4

You’ll see you can also pick the other standard aggregations, such as MIN, MAX and AVG.

You can have multiple aggregations going on, so I’ll do the same for duration and logical_reads.

Once I have my aggregations in place, I’ll want to sort the results. It’s the total cpu_time I’m most interested in so I right-click on that column and now I have the option to “Sort Aggregation Descending”:

XESortingetc5

After those steps this is what I end up with (I’ll just show the top few rows):

XESortingetc6

Admittedly this isn’t the most beautiful of views, but it was quick and easy to get to and it gives me the information I need. I can see my most CPU hungry queries and I can make a plan of action.

Incidentally, the top query was:

UPDATE dbo.Test SET SomeText = 's' + SomeText WHERE SomeText LIKE 's%'

If this wasn’t a totally arbitrary set of queries for demo purpose I might be considering putting an index on the SomeText column.

That’s all I wanted to cover in this post. Hopefully you can see that XE data is really easy to work with, you don’t have to get involved in XML querying and you can perform quick analysis without even having to write SQL – much as we all love writing SQL.

In the last couple of posts, I’ve looked at how easy it is to get started using XE with the XEvent Profiler. For the next post in this series I aim to demonstrate that it’s also really easy to create your own customizable Extended Events sessions.

Previous posts in this series:

There’s Still a Place for SQL Server Profiler

Using the built-in System Health session

Exporting Extended Events Session Data to a Table

Extended Events Made Easy: Using XEvent Profiler

T-SQL Tuesday 107 – Viral Demo

tsql2sday150x150

For this month’s T-SQL Tuesday, Jeff Mlakar invites us to talk about “a project you worked on or were impacted by that went horribly wrong”:

https://www.mlakartechtalk.com/t-sql-tuesday-107-invitation-death-march/

I’m not sure the story I’m about to share is exactly what Jeff was looking for, but when you say failed project, this one usually comes to mind.

This is from one of my first roles as a developer, back when I was writing VB code and before I got into SQL Server. This was even before .NET. At the time I was working for an engineering company that dabbled in a few pieces of very specific software for the industry. I’d taken the job so I could develop my coding skills by working with a proper software development team – only to discover when I started that I was the development team. The others had left.

So… I worked on this piece of software. It was one of those productivity tools that had a very specific use in a very specific type of project. There was nothing else that did what it did in Windows. Without it you were required to do a lot of hand-cranked analysis using Fortran, or if you were really unfortunate, paper and a pencil. The licences sold for big bucks per seat – but there were only about 20 customers worldwide for whom it might be useful.

I’d been working on a shiny new version. As part of that, I was tasked with creating a demo that we could send to all those clients. The main thing was that it had to have full functionality but be limited to working with a sample dataset. The company were currently in litigation with one client they’d given a copy to for evaluation, but who’d then used it to complete a big piece of work within a six-figure contract. They didn’t want a repeat of that.

My boss wanted to create an auto-run menu for the CD – we’d only just moved over to CDs from having to create 10 floppy disks for each install. An auto-run was the latest and greatest pinnacle of slickness. I gave him a few lessons in creating forms and buttons in VB and off he went.

My boss burnt the CDs one Friday, including his fancy menu executable. He packaged them up with the relevant sales brochures and sent them out. He then went off on holiday for a fortnight.

I came in on the Monday morning to an urgent email from one of our senior managers. Apparently one of the recipients of the CD was reporting that they had put it in a desktop and that it had installed a virus on their system. Testing verified that they were correct. The auto-run menu program did indeed have a fairly nasty virus hiding in the executable, and because it was in the auto-run, all they needed to do to get infected was put the CD in their drive.

Basically, my boss used to use his work PC for downloading hacked “warez” from happyhippo.com. His work desktop had become badly infected, and that had spread to the executable he worked on and compiled. As he wasn’t around to own the issue, a colleague and I had the fun of calling all the people we’d sent the demo to, asking them if they’d put the CD in any of their computers. If they hadn’t, we told them to destroy it. If they had, then we had to talk them through the lengthy process of getting rid of the virus.

As a marketing exercise this wasn’t a stunning success.

I often tell friends in the software business this story as a consolation when they feel responsible for some mess up: “Well, at least you didn’t send a virus-infected demo to your entire prospective customer base like my old boss did.”

I left that company not long after.

Checking for Membership of a Specific Active Directory Group

As part of my job I manage a bunch of SQL instances for Development and Test.

Access is managed through Active Directory groups, so I rarely have to do anything in terms of managing permissions. Nonetheless I often get requests from people to give them access. This is usually for a new starter or someone who has moved from one team to another.

Of course, the answer is usually that they just need adding to the right AD group. Rather than assume though, I always get them to check before I pass the request on to the AD team. You never know, there could be something else wrong.

T-SQL has a lovely little function for this, IS_MEMBER. For instance, if I want to know if I’m a member of MyDomain\SQLAdmins I just run:

SELECT IS_MEMBER('MyDomain\SQLAdmins');

If it returns 1 then I am a member, zero means I am not, and NULL means it can’t find the group – probably because I’ve spelt it wrong.
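Or, wrapped up so the answer is a bit friendlier (just a small illustrative variation):

-- Translate the IS_MEMBER return value into a readable answer
SELECT CASE IS_MEMBER('MyDomain\SQLAdmins')
          WHEN 1 THEN 'You are a member'
          WHEN 0 THEN 'You are not a member'
          ELSE 'Group not found - check the spelling'
       END AS MembershipCheck;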

So, I can get the requester to check for themselves if they are a member of the group, and then we can raise the request to get them added.

IS_MEMBER is also useful if you want to check if you are a member of a specific database role – either one of the built-in ones or a user-defined one e.g.

SELECT IS_MEMBER('db_owner');

I’m not sure how well known this is and I just had one of those requests, so I thought I’d create a quick post on the topic.

Extended Events Made Easy: Using XEvent Profiler

In this post we’ll look at how quick and easy it is to use the XEvent Profiler to create an Extended Events (XE) session to replace the most common usage of SQL Server Profiler.

Before you start, update your SSMS to a recent version (17.3 or later):

https://docs.microsoft.com/en-us/sql/ssms/download-sql-server-management-studio-ssms?view=sql-server-2017

Now, let me ask you the question – how often do you just go into Profiler and create a trace taking all the defaults? If you’re like me then that’s pretty often. It’s nice and easy – select the menu item from SSMS and you’re going in a couple of clicks.

All XEvent Profiler does is implement a couple of standard XE sessions very similar to Profiler, that you can launch with a click or two.

You can find XE Profiler in the object explorer in SSMS under each of your SQL instances – just below SQL Server Agent:

XE_1

Because it’s there for each instance, you don’t have to then connect to the instance separately (as you do in Profiler).

You’ve got two options: Standard, which pretty much equates to the default trace in Profiler, or TSQL, which will just capture the SQL statements being executed. Let’s right-click on Standard and select “Launch Session”. Here’s the Live Data view that comes up:

XE_2

You can see that looks pretty similar to what you see for your traditional traces.

Now, one thing I almost always do for this sort of trace in Profiler is capture the database name. This is a bit of a pain, because I have to select “View all columns”, scroll along to find the database name column, then select it for each relevant event I’m capturing.

In XE it’s much easier. I just right click on one of the column headers in the above view and select “Choose Columns”:

XE_3

Which brings up the menu of columns available to me:

XE_4

I can then just select the database_name column in the left and use the arrow button to push it into the list of selected columns. Want row_count? That’s there too, along with a whole bunch of other stuff. I’ll add those and remove a few I don’t want and exit the dialog.

The other thing I commonly want to do is add a filter to one or more of the columns. When you’re viewing live data you get the Extended Events toolbar in SSMS, below the menu bar – just click on Filters…

XE_5

And you get the filters dialog:

XE_6

Click below “Field” to select a column and then add a value to filter on. In this case I’m going to filter to look at a specific database.

It’s worth noting that there are a bunch of operators you can use (not just equals), you can do greater than, less than, contains, does not contain, is null etc.

Now we can see the effect of those changes in the Live Data view:

XE_7

The above is what I’d usually do in Profiler, except that here I’ve also added an extra column and removed a few – because it was so easy to do. The whole thing felt like a simpler and cleaner experience than the old way.

But I get it, a marginally simpler experience probably isn’t enough to make you change habits. What might, is what you can do with your trace results once you’ve got them. In the next post we’ll look at sorting, grouping and aggregation.

Previous posts in this series:

There’s Still a Place for SQL Server Profiler

Using the built-in System Health session

Exporting Extended Events Session Data to a Table

 

Table Variable Performance “Fixed” in SQL 2019

Reading the new features for SQL 2019 I spotted this:

TableVariables2019_1

One of the most popular posts on my blog last year was where I pretty much suggested that people not use table variables:

Think twice before using table variables

This wasn’t new information when I wrote it, but bad performance due to the use of table variables remained such a common anti-pattern that I thought it was worth stressing again.

So, when I saw the above 2019 feature I thought I’d better investigate and update what I’m telling people.

TL;DR It looks like table variables are no longer a problem.

I’m going to re-use my examples from the previous post (as well as some of the images captured). One thing to note though is that I’m running these tests on a different, much less performant box than I did last time, so execution times will be longer.

My first test last time was just to show how cardinality estimation was better for a temp table vs. a table variable. This time I’m just going to compare a table variable running under compatibility mode 140 (SQL 2017) with one running under compatibility mode 150 (SQL 2019).

I set the compatibility mode as follows:

ALTER DATABASE AdventureWorks2012 SET COMPATIBILITY_LEVEL = 140;
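If you want to double-check which level a database is currently running at, sys.databases will tell you:

-- Confirm the current compatibility level of the database
SELECT name, compatibility_level
FROM sys.databases
WHERE name = 'AdventureWorks2012';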

Then I run the first query:

DECLARE @a TABLE(i INT);
 
--Create a million rows
WITH Nums(i) AS
(
       SELECT 1
       FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) n(i)
)
INSERT INTO @a(i)
SELECT n1.i
FROM Nums n1 
CROSS JOIN Nums n2 
CROSS JOIN Nums n3 
CROSS JOIN Nums n4 
CROSS JOIN Nums n5 
CROSS JOIN Nums n6;
 
SELECT i FROM @a;

The execution plan for that final select looks like this:
Variables1

And when I hover over the Table Scan operator I can see the properties:

Variables2

You can see that the estimated number of rows is calculated as 1 – but the actual number of rows was 1 million. This is (was) the problem with table variables.

Now I’ll change the compatibility level:

ALTER DATABASE AdventureWorks2012 SET COMPATIBILITY_LEVEL = 150;

And run the query again.

I get the same execution plan, so let’s just look at the properties of the Table Scan operator this time:
TableVariables2019_2
Here you can see that the estimation is correct, 1 million. This is looking good so far for the future of table variables.

In the last post I then looked at a case where the bad estimation caused a poor selection of execution plan so let’s repeat that test. First, I set my compatibility level back to 140, then I run the following (if you want to repeat, you need the AdventureWorks2012 database):

DECLARE @BusinessEntityId TABLE (BusinessEntityID INT NOT NULL PRIMARY KEY CLUSTERED);
 
INSERT INTO @BusinessEntityId
SELECT BusinessEntityID FROM Person.Person;
 
SET STATISTICS IO ON;
SET STATISTICS TIME ON;
 
SELECT COUNT(*)
FROM @BusinessEntityId b
INNER JOIN Person.Person p
       ON b.BusinessEntityID = p.BusinessEntityID;
 
SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;

Let’s look at the execution plan for the final query:
Variables5
Now let’s look at the properties for the clustered index scan:
Variables6
We can see that estimated number of rows is just 1, but that the actual number of rows is 19,972.

If we then look at the properties for the Index seek operator:
Variables7
We can see that the estimated number of executions was 1, but the actual number of executions was 19,972. That’s 19,972 seeks into a table with 19,972 rows. As previously discussed – there’s got to be a better way. A Nested Loops join is generally best when there’s a small number of rows from the top table and a larger number from the bottom. In this case both sides have 19,972 rows, so it’s not optimal.

Here’s the output of the statistics commands for that query:

Table ‘Person’. Scan count 0, logical reads 59916, physical reads 3810, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table ‘#B5D4D4C8’. Scan count 1, logical reads 35, physical reads 29, read-ahead reads 28, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 47 ms, elapsed time = 30951 ms.

Now we’ll change the compatibility level to 150 and try again. Here’s the execution plan we get this time:
TableVariables2019_3
You can see the join operator is now a Hash Match which is generally a much better choice for evenly sized tables. Let’s look at the properties of the Clustered Index Scan in this new plan:
TableVariables2019_4
This time the estimated and actual values match, which has enabled SQL to make the better choice of plan. We can see that if we look at the statistics output for this one:

Table ‘Workfile’. Scan count 4, logical reads 64, physical reads 0, read-ahead reads 64, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table ‘Worktable’. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table ‘Person’. Scan count 1, logical reads 67, physical reads 1, read-ahead reads 65, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table ‘#B9A565AC’. Scan count 1, logical reads 35, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 25220 ms.

You can see the CPU and reads are massively reduced.

So, it seems table variables aren’t so bad anymore. At least they won’t be once you are running SQL Server 2019.

I should probably add some caveats at this point. This has been a limited test, so make sure you test your own performance if you start using table variables instead of temp tables. There are also going to be deferred compilations when you do this in stored procedures. I haven’t dug down into exactly how that will work with table variables, but I’d guess it’s going to be similar to the way temp tables work in existing SQL versions.