Avoid “Constants” in T-SQL

Unlike some other languages, T-SQL doesn’t have the concept of a constant.

As good coders, we’ve all at some point tried to use a SQL variable to hold a constant value within a stored procedure, in order to make our code both more readable and maintainable.

I’ll give you an example. Here I’m querying a Task table, and I want all the rows where the TaskStatus is 0 (zero) which means the Task is Open:

SELECT *
FROM dbo.Task 
WHERE TaskStatus = 0;

If someone else comes along to look at this code they don’t know what the value of zero means. My code’s not clear. Worse I might refer to the same value multiple times in my procedure, so if I need to change it later I have to change it in multiple places.

Good practice from other coding languages would say that I replace it with a meaningfully named constant. As mentioned, in T-SQL we don’t have constants so I’ll compromise and use a variable:

DECLARE @OpenTask tinyint = 0;

SELECT *
FROM dbo.Task 
WHERE  TaskStatus = @OpenTask;

Now that’s much more readable – right?

Unfortunately it’s also a bad idea in SQL . Let’s see why.

I’ll create the aforementioned Task table, add an index on TaskStatus. Then I’ll add million rows with status 1 (which we’ll call closed) and 1 row with the value 0 (zero) which is open:

CREATE TABLE dbo.Task
(
    Id INT IDENTITY(1,1) CONSTRAINT PK_Task PRIMARY KEY CLUSTERED,
    UserId INT,
    TaskType INT,
    Payload VARCHAR(255) NOT NULL,
    TaskStatus tinyint NOT NULL
);
GO

CREATE INDEX IX_Task_TaskStatus ON dbo.Task(TaskStatus);

INSERT INTO dbo.Task (UserId,TaskType,Payload,TaskStatus)
SELECT TOP 1000000 1,1,'This Shizzle Is Done',1
FROM sys.objects a, sys.objects b, sys.objects c;

INSERT INTO dbo.Task (UserId,TaskType,Payload,TaskStatus)
SELECT 1,1,'Do This Shizzle',0;

Once that’s completed I’m going to update the statistics just so we know SQL has the most up to date information to produce an optimal execution plan for our queries:

UPDATE STATISTICS dbo.Task WITH fullscan;

Now let’s go back to our original queries. Before I run them let’s think what we want them to do. We have an index on TaskStatus and we only have one row we are looking for, so we’d hope the query will use the index and go straight to the record. The index doesn’t contain all the columns, but that’s okay. We’re only going to have to output one record so if it has to look up the extra columns up in the clustered index that’ll be pretty damn quick.

Let’s run the first query, we’ll capture the execution plan and the STATISTICS output:

SET STATISTICS io ON;
SET STATISTICS time ON;

SELECT *
FROM dbo.Task 
WHERE TaskStatus = 0;

Here’s the execution plan:
Constants1

That’s doing exactly what we hoped for, it’s looked up the record in our index using a seek. Then it’s grabbed the rest of the columns from the clustered index using a key lookup.

Here’s the statistics output:
Table ‘Task’. Scan count 1, logical reads 7, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 1 ms.

So that’s a nice small number of reads and less than a millisecond of CPU.

Now let’s run the “improved” version:

SET STATISTICS io ON;
SET STATISTICS time ON;

DECLARE @OpenTask tinyint = 0;

SELECT *
FROM dbo.Task 
WHERE TaskStatus = @OpenTask;

Here’s the execution plan this time:

Constants2

That doesn’t look so good. Let’s check the statistics:
Table ‘Task’. Scan count 1, logical reads 5341, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

CPU time = 109 ms, elapsed time = 96 ms.

Those figures tell us the query has got between 100 and 1,000 times worse. So much for the improved version.

So why is this happening?

The answer is simply that the optimizer doesn’t/can’t look at the values inside variables when a piece of SQL is compiled. Therefore it can’t use the statistics against the indexes on the table to get an accurate idea of how many rows to expect back in the results.

We can see that if we compare the properties of the Index Seek Operator from the first query:

Constants3

Against the properties for the Index Scan Operator from the second query:

Constants4

In the first one we can see that the Actual Number of Rows (at the top) exactly matches the Estimated Number of rows (at the bottom). SQL has been able to use the statistics on the index to get an accurate estimate.

In the second this is not the case. We have 500,000 rows estimate, but only 1 actual. This has led SQL down the route of choosing a plan that would have been more effective for 500,000 rows – but is much less effective for 1. In this case it didn’t know what value to optimize for. Lacking that information it used the density value in the statistics and multiplied that by the total number of rows to get the estimate. Or in other words, the statistics tell it that there are two distinct values (0 and 1) in the table. Not knowing which one has been supplied the optimizer figures than on average half the rows will be returned.

So what should do you to make your code clearer?

The simple answer is to use comments, the following is totally clear to its meaning, and will perform optimally:

SELECT *
FROM dbo.Task 
WHERE TaskStatus = 0 -- Open Task;

But what about the maintainability issue, where you may have to refer to the same value multiple times in a given procedure?

Unfortunately you’re just going to have to put up with maintaining the value in multiple places, but in general within a well designed application these should be static values (and hopefully integers) so it shouldn’t be too big a deal.

Note this is not the same for parameters passed to a stored procedure. In that case the queries inside the stored procedure will compile using the values passed the first time the procedure was executed – that can be really useful, but it can also cause its own set of issues to be aware of! Just remember parameters and variables – similar but not the same!

13 thoughts on “Avoid “Constants” in T-SQL”

Nice, clear article. Thanks

Matthew McGiffen says:

October 19, 2017 at 12:45 pm

Cheers!

Loading...

Reply

Adding OPTION (Recompile) or using dynamic sql will make the second statement the same execution plan as the first. Maybe not the best choice, but a possible tools in the toolbox. We ran into the same problem where a filtered index wasn’t being picked up unless we made it dynamic sql (so the variable became hardcoded in the actual execution). We settled on OPTION (Recompile).

Matthew McGiffen says:

October 19, 2017 at 2:08 pm

I guess that’s fine if you don’t mind the extra compliations, but best to avoid them if you can.

Loading...

Reply

One convention I’ve adopted for readability and search-ability is:

WHERE TaskStatus = 0 — 0 = fnTaskStatus.Open

and create a table valued function that returns Code/Description pairs. I don’t use the function in practice (except perhaps to drive UI translation), it’s just included as documentation that can found in both the code repository and the database server itself.

Hmm. Personally, I love the idea of establishing constants like a variable and referencing them.
Your article makes sense and it seems to be a good reason for MS to introduce something like:
DECLARE @MyConstant tinyint = 0 AS CONSTANT;

Matthew McGiffen says:

October 19, 2017 at 2:05 pm

Agree constants would be a nice addition to T-SQL, I believe Oracle has them.

Loading...

Reply

I apologize if this isn’t the right place to ask this question (since its not really related to the topic of constants), but I’m fairly new to SQL and I don’t understand part of the example code.

Why in the block that adds the 1,000,000 rows is it necessary to have the FROM sys.objects a, … line? The values aren’t coming from the sys.objects table (in fact, it doesn’t appear that it is actually used in any way), but I ran the code with and without that line and it definitely makes a difference (only one row gets added without it).

I tried searching for an answer but don’t really know how to phrase things and I’m not finding anything pertinent. Thanks for any explanation.

Matthew McGiffen says:

October 19, 2017 at 5:56 pm

No worrries Scott, it’s nice to be able to help. This is just a lazy way of creating a bunch of rows really quickly. I shouldn’t really use the commas, it’s an old style way of doing joins that isn’t very nice. Effectively this is the same as sys.objects a CROSS JOIN sys.objects b CROSS JOIN sys.objects c. A cross join creates a row for every combination of rows in the referenced tables so if sys.objects had 100 rows I’d get 100x100x100 i.e. one million rows. I specify TOP 1000000 as I’d probably end up with a lot more than that and I only want a million for the demo. I’m not picking anything up from sys.objects but specify my own values, so I get 1 million rows of the values I specified duplicated. You’d probably never actually do this is real code. Does that make sense?

Loading...

Reply

It does…now 🙂

What threw me was not just the “shorthand” for the cross joins, but also using a table to just “supply” rows but not values. I’ve never seen either of those before.

Thanks for taking the time to explain!

Thanks for keeping this long-standing issue in the SQL “Community Memory”.

I will note that there is a somewhat hacky options to dealing with this using views for situations where you have really pervasive constants that you want to use over and over. I have not used it, but it is a good tool to know about if you find constants to be really burdensome in your tsql code. Here’s the link:

https://blogs.msdn.microsoft.com/sql_server_appendix_z/2013/09/16/sql-server-variables-parameters-or-literals-or-constants/

This should have a qualification of the specific scenarios that you’re referencing. Using constants in insert statements, for example, does not have the same performance impact.

Nice explanation!

David Plaut says:

October 19, 2017 at 12:23 pm

Nice, clear article. Thanks

Loading...

1. Matthew McGiffen says:
  
  October 19, 2017 at 12:45 pm
  
  Cheers!
  
  Loading...
  
Gabriel Potter says:

October 19, 2017 at 1:40 pm

Adding OPTION (Recompile) or using dynamic sql will make the second statement the same execution plan as the first. Maybe not the best choice, but a possible tools in the toolbox. We ran into the same problem where a filtered index wasn’t being picked up unless we made it dynamic sql (so the variable became hardcoded in the actual execution). We settled on OPTION (Recompile).

Loading...

1. Matthew McGiffen says:
  
  October 19, 2017 at 2:08 pm
  
  I guess that’s fine if you don’t mind the extra compliations, but best to avoid them if you can.
  
  Loading...
  
Tom Crowley says:

October 19, 2017 at 1:41 pm

One convention I’ve adopted for readability and search-ability is:

WHERE TaskStatus = 0 — 0 = fnTaskStatus.Open

and create a table valued function that returns Code/Description pairs. I don’t use the function in practice (except perhaps to drive UI translation), it’s just included as documentation that can found in both the code repository and the database server itself.

Loading...

Aaron N. Cutshall (@sqlrv) says:

October 19, 2017 at 1:42 pm

Hmm. Personally, I love the idea of establishing constants like a variable and referencing them.
Your article makes sense and it seems to be a good reason for MS to introduce something like:
DECLARE @MyConstant tinyint = 0 AS CONSTANT;

Loading...

1. Matthew McGiffen says:
  
  October 19, 2017 at 2:05 pm
  
  Agree constants would be a nice addition to T-SQL, I believe Oracle has them.
  
  Loading...
  
Scott Sirotkin says:

October 19, 2017 at 5:40 pm

I apologize if this isn’t the right place to ask this question (since its not really related to the topic of constants), but I’m fairly new to SQL and I don’t understand part of the example code.

Why in the block that adds the 1,000,000 rows is it necessary to have the FROM sys.objects a, … line? The values aren’t coming from the sys.objects table (in fact, it doesn’t appear that it is actually used in any way), but I ran the code with and without that line and it definitely makes a difference (only one row gets added without it).

I tried searching for an answer but don’t really know how to phrase things and I’m not finding anything pertinent. Thanks for any explanation.

Loading...

1. Matthew McGiffen says:
  
  October 19, 2017 at 5:56 pm
  
  No worrries Scott, it’s nice to be able to help. This is just a lazy way of creating a bunch of rows really quickly. I shouldn’t really use the commas, it’s an old style way of doing joins that isn’t very nice. Effectively this is the same as sys.objects a CROSS JOIN sys.objects b CROSS JOIN sys.objects c. A cross join creates a row for every combination of rows in the referenced tables so if sys.objects had 100 rows I’d get 100x100x100 i.e. one million rows. I specify TOP 1000000 as I’d probably end up with a lot more than that and I only want a million for the demo. I’m not picking anything up from sys.objects but specify my own values, so I get 1 million rows of the values I specified duplicated. You’d probably never actually do this is real code. Does that make sense?
  
  Loading...
  
Scott Sirotkin says:

October 19, 2017 at 9:02 pm

It does…now 🙂

What threw me was not just the “shorthand” for the cross joins, but also using a table to just “supply” rows but not values. I’ve never seen either of those before.

Thanks for taking the time to explain!

Loading...

Eric says:

February 8, 2018 at 7:28 pm

Thanks for keeping this long-standing issue in the SQL “Community Memory”.

I will note that there is a somewhat hacky options to dealing with this using views for situations where you have really pervasive constants that you want to use over and over. I have not used it, but it is a good tool to know about if you find constants to be really burdensome in your tsql code. Here’s the link:

https://blogs.msdn.microsoft.com/sql_server_appendix_z/2013/09/16/sql-server-variables-parameters-or-literals-or-constants/

Loading...

Bragi says:

June 15, 2018 at 7:06 pm

This should have a qualification of the specific scenarios that you’re referencing. Using constants in insert statements, for example, does not have the same performance impact.

Loading...

dibyankadibyanka8275 says:

January 7, 2019 at 10:47 pm

Nice explanation!

Loading...

Matthew McGiffen Data

So why is this happening?

So what should do you to make your code clearer?

But what about the maintainability issue, where you may have to refer to the same value multiple times in a given procedure?

Like this:

Related

13 thoughts on “Avoid “Constants” in T-SQL”

Leave a ReplyCancel reply

So why is this happening?

So what should do you to make your code clearer?

But what about the maintainability issue, where you may have to refer to the same value multiple times in a given procedure?

Share this:

Like this:

Related

13 thoughts on “Avoid “Constants” in T-SQL”

Leave a ReplyCancel reply

Discover more from Matthew McGiffen Data