Avoid “Constants” in T-SQL

Unlike some other languages, T-SQL doesn’t have the concept of a constant.

As good coders, we’ve all at some point tried to use a SQL variable to hold a constant value within a stored procedure, in order to make our code both more readable and maintainable.

I’ll give you an example. Here I’m querying a Task table, and I want all the rows where the TaskStatus is 0 (zero) which means the Task is Open:

SELECT *
FROM dbo.Task 
WHERE TaskStatus = 0;

If someone else comes along to look at this code they don’t know what the value of zero means. My code’s not clear. Worse I might refer to the same value multiple times in my procedure, so if I need to change it later I have to change it in multiple places.

Good practice from other coding languages would say that I replace it with a meaningfully named constant. As mentioned, in T-SQL we don’t have constants so I’ll compromise and use a variable:

DECLARE @OpenTask tinyint = 0;

SELECT *
FROM dbo.Task 
WHERE  TaskStatus = @OpenTask;

Now that’s much more readable – right?

Unfortunately it’s also a bad idea in SQL . Let’s see why.

I’ll create the aforementioned Task table, add an index on TaskStatus. Then I’ll add million rows with status 1 (which we’ll call closed) and 1 row with the value 0 (zero) which is open:

CREATE TABLE dbo.Task
(
    Id INT IDENTITY(1,1) CONSTRAINT PK_Task PRIMARY KEY CLUSTERED,
    UserId INT,
    TaskType INT,
    Payload VARCHAR(255) NOT NULL,
    TaskStatus tinyint NOT NULL
);
GO

CREATE INDEX IX_Task_TaskStatus ON dbo.Task(TaskStatus);

INSERT INTO dbo.Task (UserId,TaskType,Payload,TaskStatus)
SELECT TOP 1000000 1,1,'This Shizzle Is Done',1
FROM sys.objects a, sys.objects b, sys.objects c;

INSERT INTO dbo.Task (UserId,TaskType,Payload,TaskStatus)
SELECT 1,1,'Do This Shizzle',0;

Once that’s completed I’m going to update the statistics just so we know SQL has the most up to date information to produce an optimal execution plan for our queries:

UPDATE STATISTICS dbo.Task WITH fullscan;

Now let’s go back to our original queries. Before I run them let’s think what we want them to do. We have an index on TaskStatus and we only have one row we are looking for, so we’d hope the query will use the index and go straight to the record. The index doesn’t contain all the columns, but that’s okay. We’re only going to have to output one record so if it has to look up the extra columns up in the clustered index that’ll be pretty damn quick.

Let’s run the first query, we’ll capture the execution plan and the STATISTICS output:

SET STATISTICS io ON;
SET STATISTICS time ON;

SELECT *
FROM dbo.Task 
WHERE TaskStatus = 0;

Here’s the execution plan:
Constants1

 

 

 

 

That’s doing exactly what we hoped for, it’s looked up the record in our index using a seek. Then it’s grabbed the rest of the columns from the clustered index using a key lookup.

Here’s the statistics output:
Table ‘Task’. Scan count 1, logical reads 7, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 1 ms.

So that’s a nice small number of reads and less than a millisecond of CPU.

Now let’s run the “improved” version:

SET STATISTICS io ON;
SET STATISTICS time ON;

DECLARE @OpenTask tinyint = 0;

SELECT *
FROM dbo.Task 
WHERE TaskStatus = @OpenTask;

 

Here’s the execution plan this time:

Constants2

That doesn’t look so good. Let’s check the statistics:
Table ‘Task’. Scan count 1, logical reads 5341, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

CPU time = 109 ms, elapsed time = 96 ms.

Those figures tell us the query has got between 100 and 1,000 times worse. So much for the improved version.

So why is this happening?

The answer is simply that the optimizer doesn’t/can’t look at the values inside variables when a piece of SQL is compiled. Therefore it can’t use the statistics against the indexes on the table to get an accurate idea of how many rows to expect back in the results.

We can see that if we compare the properties of the Index Seek Operator from the first query:

Constants3

Against the properties for the Index Scan Operator from the second query:

Constants4

In the first one we can see that the Actual Number of Rows (at the top) exactly matches the Estimated Number of rows (at the bottom). SQL has been able to use the statistics on the index to get an accurate estimate.

In the second this is not the case. We have 500,000 rows estimate, but only 1 actual. This has led SQL down the route of choosing a plan that would have been more effective for 500,000 rows – but is much less effective for 1. In this case it didn’t know what value to optimize for. Lacking that information it used the density value in the statistics and multiplied that by the total number of rows to get the estimate. Or in other words, the statistics tell it that there are two distinct values (0 and 1) in the table. Not knowing which one has been supplied the optimizer figures than on average half the rows will be returned.

So what should do you to make your code clearer?

The simple answer is to use comments, the following is totally clear to its meaning, and will perform optimally:

SELECT *
FROM dbo.Task 
WHERE TaskStatus = 0 -- Open Task;

 

But what about the maintainability issue, where you may have to refer to the same value multiple times in a given procedure?

Unfortunately you’re just going to have to put up with maintaining the value in multiple places, but in general within a well designed application these should be static values (and hopefully integers) so it shouldn’t be too big a deal.

 

Note this is not the same for parameters passed to a stored procedure. In that case the queries inside the stored procedure will compile using the values passed the first time the procedure was executed – that can be really useful, but it can also cause its own set of issues to be aware of! Just remember parameters and variables – similar but not the same!

10 thoughts on “Avoid “Constants” in T-SQL

  1. Adding OPTION (Recompile) or using dynamic sql will make the second statement the same execution plan as the first. Maybe not the best choice, but a possible tools in the toolbox. We ran into the same problem where a filtered index wasn’t being picked up unless we made it dynamic sql (so the variable became hardcoded in the actual execution). We settled on OPTION (Recompile).

    Like

  2. One convention I’ve adopted for readability and search-ability is:

    WHERE TaskStatus = 0 — 0 = fnTaskStatus.Open

    and create a table valued function that returns Code/Description pairs. I don’t use the function in practice (except perhaps to drive UI translation), it’s just included as documentation that can found in both the code repository and the database server itself.

    Like

  3. I apologize if this isn’t the right place to ask this question (since its not really related to the topic of constants), but I’m fairly new to SQL and I don’t understand part of the example code.

    Why in the block that adds the 1,000,000 rows is it necessary to have the FROM sys.objects a, … line? The values aren’t coming from the sys.objects table (in fact, it doesn’t appear that it is actually used in any way), but I ran the code with and without that line and it definitely makes a difference (only one row gets added without it).

    I tried searching for an answer but don’t really know how to phrase things and I’m not finding anything pertinent. Thanks for any explanation.

    Like

    1. No worrries Scott, it’s nice to be able to help. This is just a lazy way of creating a bunch of rows really quickly. I shouldn’t really use the commas, it’s an old style way of doing joins that isn’t very nice. Effectively this is the same as sys.objects a CROSS JOIN sys.objects b CROSS JOIN sys.objects c. A cross join creates a row for every combination of rows in the referenced tables so if sys.objects had 100 rows I’d get 100x100x100 i.e. one million rows. I specify TOP 1000000 as I’d probably end up with a lot more than that and I only want a million for the demo. I’m not picking anything up from sys.objects but specify my own values, so I get 1 million rows of the values I specified duplicated. You’d probably never actually do this is real code. Does that make sense?

      Like

  4. It does…now 🙂

    What threw me was not just the “shorthand” for the cross joins, but also using a table to just “supply” rows but not values. I’ve never seen either of those before.

    Thanks for taking the time to explain!

    Liked by 1 person

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s