MAXDOP, Parallelism and the Cloud

Parallelism and MAXDOP

The pros and cons of parallelism have always been with us in SQL Server and I blogged about this a couple of years ago. This is an updated version of that post to include details of the new wait stat related to parallelism that was added in 2017 (CXCONSUMER), as well as to discuss the options available for cloud based SQL Server solutions.

There’s no doubt that parallelism in SQL is a great thing. It enables large queries to share the load across multiple processors and get the job done quicker.

However it’s important to understand that it has an overhead. There is extra effort involved in managing the separate streams of work and synchronising them back together to – for instance – present the results.

That can mean in some cases that adding more threads to a process doesn’t actually benefit us and in some cases it can slow down the overall execution.

We refer to the number of threads used in a query as the DOP (Degree of Parallelism) and in SQL Server we have the setting MAXDOP (Maximum Degree of Parallelism) which is the maximum DOP that will be used in executing a single query.

Microsoft generally recommend caution setting MAXDOP above 8:

https://support.microsoft.com/en-gb/help/2806535/recommendations-and-guidelines-for-the-max-degree-of-parallelism-configuration-option-in-sql-server

Here’s a nice post from Kendra Little talking about how having higher settings can actually slow down your query execution time:

https://www.brentozar.com/archive/2013/12/q-can-high-maxdop-make-a-query-slower/

Out of the box, MAXDOP is set to 0, which means there is no limit to the DOP for an individual query. It is almost always worth changing this to a more optimal setting for your workload.

Cost Threshold for Parallelism

This is another setting available to us in SQL Server and defines the cost level at which SQL will consider a parallel execution for a query. Out of the box this is set to 5 which is actually a pretty low number. Query costing is based on Algorithm’s from “Nick’s machine” the box used by the original developer who benchmarked queries for Microsoft.

Nicks_Machine

(Nick’s Machine)

Compared to modern servers Nick’s machine was pretty slow and as the Cost Threshold hasn’t changed for many years, it’s now generally considered too low for modern workloads/hardware. In reality we don’t want all our tiny queries to go parallel as the benefit is negligible and can even be negative, so it’s worth upping this number. Advice varies but generally recommendations say to set this somewhere in the range from 30 to 50 (and then tuning up and down based on your production workload).

There are many articles in the SQL Server community about how the out of the box setting is too low, and asking Microsoft to change it. Here’s a recent one:

http://www.scarydba.com/2017/03/13/change-the-cost-threshold-for-parallelism/

CXPACKET and CXCONSUMER waits

Often in tuning a SQL Server instance we will look at wait stats – which tell us what queries have been waiting for when they run. CXPACKET waits are usually associated with parallelism and particularly the case where multi-threaded queries have been stuck waiting for one or more of the threads to complete – i.e. the threads are taking different lengths of time because the load hasn’t been split evenly. Brent Ozar talks about that here:

https://www.brentozar.com/archive/2013/08/what-is-the-cxpacket-wait-type-and-how-do-you-reduce-it/

High CXPACKET waits can be – but aren’t necessarily – a problem. You can cure CXPACKET waits by simply setting MAXDOP to 1 at a server level (thus preventing parallelism) – but this isn’t necessarily the right solution. Though in some cases in can be, SharePoint for instance is best run with MAXDOP set to 1.

What you can definitely deduce from high CXPACKET waits however is that there is a lot of parallelism going on and that it is worth looking at your settings.

To make it easier to identify issues with parallelism, with SQL Server 2017 CU3 Microsoft added a second wait type related to parallelism – CXCONSUMER. This wait type was also added to SQL Server 2016 in SP2.

Waits related to parallelism are now split between CXPACKET and CXCONSUMER.

Here’s the original announcement from Microsoft regarding the change and giving more details:

https://blogs.msdn.microsoft.com/sql_server_team/making-parallelism-waits-actionable/

In brief, moving forward CXPACKET waits are the ones you might want to worry about, and CXCONSUMER waits are generally benign, encountered as a normal part of parallel execution.

Tuning Parallelism

In tuning parallelism we need to think about how we want different sized queries to act on our server.

Small Queries

In general we don’t want these to go parallel so we up the Cost Threshold to an appropriate number to avoid this. As discussed above 30 is a good number to start with. You can also query your plan cache and look at the actual costs of queries that have been executed on your SQL Instance to get a more accurate idea of where you want to set this. Grant Fritchey has an example of how to do that here:

http://www.scarydba.com/2017/02/20/estimated-costs-queries/

As he mentions in the post, this assessment can be quite expensive to run – so do it when things are quiet.

Medium to Large Queries

This is where we want to take advantage of parallelism, and do so by setting MAXDOP to an appropriate level. Follow the guidelines from the Microsoft article referenced above. Here it is again:

https://support.microsoft.com/en-gb/help/2806535/recommendations-and-guidelines-for-the-max-degree-of-parallelism-configuration-option-in-sql-server

Often the answer is going to be simply to set it to 8 – but then experiment with tuning it up and down slightly to see whether that makes things better or worse.

Very Large Queries

If we have a mixed workload on our server which includes some very expensive queries – possibly for reporting purposes – then we may want to look at upping the MAXDOP for these queries to allow them to take advantage of more processors. One thing to consider though is – do we really want these queries running during the day when things are busy? Ideally they should run in quieter times. If they must run during the day, then do we want to avoid them taking over all the server power and blocking our production workload? In which case we might just let them run at the MAXDOP defined above.

If we decide we want to let them have the extra power then we can override the server MAXDOP setting with a query hint OPTION(MAXDOP n):

https://docs.microsoft.com/en-us/sql/t-sql/queries/hints-transact-sql-query

You will want to experiment to find the “best” value for the given query. As discussed above and as shown in Kendra Little’s article, just setting it to the maximum number of cores available isn’t necessarily going to be the fastest option.

Exceptions to the Rule

Regardless of the size, there are some queries that just don’t benefit from parallelism so you may need to assess them on an individual basis to find the right degree of parallelism to use.

With SQL server you can specify the MAXDOP at the server level, but also override it at the database level using a SCOPED CONFIGURATION or for individual queries using a query hint. There are even other ways you can control this:

https://www.brentozar.com/archive/2016/12/ten-ways-set-maxdop/

Options in the Cloud

If your SQL Server is hosted in the cloud, then most of this still applies. You still need to think about tuning parallelism – it isn’t done for you, and the defaults are the same – so probably not optimal for most workloads.

There are in general two flavours of cloud implementation. The first is Infrastructure as a Service (IaaS) where you simply have a VM provided by your cloud provider and run an OS with SQL server on top of it in that VM. Regardless of your cloud provider (e.g. Azure, AWS etc.), if you’re using IaaS for SQL Server then the same rules apply, and you go about tuning parallelism in exactly the same way.

The other type of cloud approach is Platform as a Service (PaaS). This is where you use a managed service for SQL Server. This would include Azure SQL Database, Azure SQL Database Managed Instance, and Amazon RDS for SQL Server. In these cases, the rules still apply, but how you manage these settings may differ. Let’s look at that for the three PaaS options mentioned above.

Azure SQL Database

This is a single SQL Server database hosted in Azure. You don’t have access to server level settings, so you can’t change MAXDOP or the cost threshold. You can however specify MAXDOP at the database level e.g.

ALTER DATABASE SCOPED CONFIGURATION SET MAXDOP = 4;

Cost threshold for Parallelism however is unavailable to change in Azure SQL Database.

Azure SQL Database Managed Instance

This presents you with something that looks very much like the SQL Server you are used to, you just can’t access the box behind it. And similar to your regular SQL instance, you can set MAXDOP and the Cost threshold as normal.

Amazon RDS for SQL Server

This is similar to managed instance. It looks and acts like SQL Server but you can’t access the machine or OS. You access your RDS instance through an account that has permissions that are more limited than your usual sa account or sysadmin role allows. And one of the things you can’t do with your limited permissions is to change the parallelism settings.

Amazon have provided a way around this though and you can change both settings using something called a parameter group:

https://www.mssqltips.com/sqlservertip/5329/setting-sql-server-configuration-options-with-aws-rds-parameter-groups/

Closing Thoughts

Parallelism is a powerful tool at our disposal, but like all tools it should be used wisely and not thrown at every query to its maximum – and this is often what happens with the out of the box settings on SQL Server. Tuning parallelism is not a knee-jerk reaction to high CXPACKET waits, but something we should be considering carefully in all our SQL Server implementations.

Acknowledgements

I wanted to update my original article to include the cloud options noted above, but didn’t have access to an Azure SQL Database Managed Instance to check the state of play. Thanks to TravisGarland via Twitter (@RockyTopDBA) and Chrissy LeMaire via the SQL community slack (@cl) for checking this and letting me know!

Finding “Orphaned” database files – script

When you drop a database from a SQL Server instance the underlying files are usually removed. This doesn’t happen however if you set the database to be offline first, or if you detach the database rather than dropping it.

The scenario with offline databases is the one that occurs most often in practice. I might ask if a database is no longer in use and whether I can remove it. A common response is that people don’t think it’s in use, but can I take it offline and we’ll see if anyone screams. I’ll often put a note in my calendar to remove it after a few weeks if no-one has complained. When I do come to remove it, hopefully I’ll remember to put it back online before I drop it so the files get removed, but sometimes I might forget, and in an environment where many people have permissions to create and drop databases you can end up with a lot of files left behind for databases that no longer exist – these are what I’m referring to as orphaned files.

Obviously this shouldn’t happen in production environments where change should be carefully controlled, but if you manage a lot of development and test environments this can certainly occur.

So I created a script I can run on an instance to identify any files in its default data and log directories that are not related to any databases on the instance. Here it is:

--Orphaned database files
DECLARE @DefaultDataPath VARCHAR(512);
DECLARE @DefaultLogPath VARCHAR(512);

SET @DefaultDataPath = CAST(SERVERPROPERTY('InstanceDefaultDataPath') AS VARCHAR(512));
SET @DefaultLogPath = CAST(SERVERPROPERTY('InstanceDefaultLogPath') AS VARCHAR(512));

IF OBJECT_ID('tempdb..#Files') IS NOT NULL
  
DROP TABLE #Files;

CREATE TABLE #Files (
  
Id INT IDENTITY(1,1),
  
[FileName] NVARCHAR(512),
  
Depth smallint,
  
FileFlag bit,
  
Directory VARCHAR(512) NULL,
  
FullFilePath VARCHAR(512) NULL);

INSERT INTO #Files ([FileName], Depth, FileFlag)
EXEC MASTER..xp_dirtree @DefaultDataPath, 1, 1;

UPDATE #Files
SET Directory = @DefaultDataPath, FullFilePath = @DefaultDataPath + [FileName]
WHERE Directory IS NULL;

INSERT INTO #Files ([FileName], Depth, FileFlag)
EXEC MASTER..xp_dirtree @DefaultLogPath, 1, 1;

UPDATE #Files
SET Directory = @DefaultLogPath, FullFilePath = @DefaultLogPath + [FileName]
WHERE Directory IS NULL;

SELECT
  
f.[FileName],
  
f.Directory,
  
f.FullFilePath
FROM #Files f
LEFT JOIN sys.master_files mf
  
ON f.FullFilePath = REPLACE(mf.physical_name,'\\', '\')
WHERE mf.physical_name IS NULL
  AND
f.FileFlag = 1
ORDER BY f.[FileName], f.Directory

DROP TABLE #Files;

I wouldn’t say that you can just go delete these once you’ve identified them, but at least now you have a list and can investigate further.

By the way, you might notice a nasty join statement in the above query. This is to deal with instances where the default directories have been defined with a double backslash at the end. SQL Server setup allows this and it doesn’t cause any day-to-day problems, but can make this sort of automation challenging. I’ve included it in this query as I’ve encountered a few people having this issue. In general I’d avoid such joins like the plague.

Making things more complicated

One complication can be where you have multiple SQL Server instances on the same server. This isn’t greatly recommended in production, but is common in dev\test. Where a database has been migrated from one instance to another, it’s possible that hasn’t been done correctly and the files still exist under the directories for the old instance and you might then see them as orphaned. I previously posted a script to identify such files:

https://matthewmcgiffen.com/2018/04/24/database-files-down-the-wrong-path/

Also in the case of multiple instances, you might want to report across all of them at once. For that you could use SQL CMS. I’ve posted on that too:

https://matthewmcgiffen.com/2018/04/10/cms-effortlessly-run-queries-against-multiple-sql-servers-at-once/

Combining these three techniques makes it relatively easy to identify files that are probably no longer needed. You can get a list of all files that don’t belong to databases on the instances they live under, correlate that to any files that are down the wrong path for any of your instances, then look at what’s left over.

Using the Sequence Object to Generate Reference Numbers

The SEQUENCE object was added to T-SQL in SQL Server 2012. It’s reasonably well known to DBAs, but less so to developers or those new to SQL, so I thought I’d produce a quick post to demonstrate its use.

In basic terms, a SEQUENCE is a way of generating a sequence of numerical values. The following are examples of sequences you could generate:

1, 2, 3, 4, 5 6, 7, 8, 9…

1, 6 , 11, 16, 21, 26, 31…

1000, 1001, 1002, 1003, 1004…

You can pick a starting number (and an ending number if you want), a data type (which might produce a natural maximum to the size of the sequence) and an increment (i.e. how much you want to be added to produce the next number in the sequence). There are other options, but I’m going to focus on the simplest use case. You can find the full documentation here:

https://docs.microsoft.com/en-us/sql/t-sql/statements/create-sequence-transact-sql?view=sql-server-2017

So, let’s define my use case. I have a table to hold customer orders. For each record want to define an Order Reference Number of the format ORN0000000001.

Now you could implement something using an IDENTITY column to manage this – but there may be times when that is not ideal, for instance your table may not already have a suitable identity to use (you might have a unique identifier as the primary key) and if you want to store the actual reference then you’d need to add an IDENTITY column in addition to the reference column. Or you might need a reference that is unique across multiple tables.

The SEQUENCE object is also designed to be faster than IDENTITY, creating less blocking when you have a lot of concurrent inserts.

First of all, creating the sequence to generate the numeric part of my reference is easy. Let’s say that a bunch of reference numbers have already been used so I want to start with ORN0000100001

Let’s look at the SQL…

CREATE SEQUENCE dbo.OrderRefSequence 
  
AS bigint
   START
WITH 100001
   INCREMENT
BY 1;

Then I can request numbers from the sequence using NEXT VALUE FOR e.g.

SELECT NEXT VALUE FOR dbo.OrderRefSequence; 

The first time I run that I get the starting number 100,001.

Another nice addition to SQL Server 2012 was the FORMAT function which we can use to format the number into a string whilst padding it with leading zeroes and adding the text prefix:

SELECT FORMAT(NEXT VALUE FOR dbo.OrderRefSequence, 'ORN0000000000#');


That returns me ORN00000100002.

If I keep executing it then the reference increases:

ORN00000100003

ORN00000100004

ORN00000100005

ORN00000100006…

So, now I can just use that when inserting values to my table to get a new reference number each time.

But, what’s even nicer is that you can do it all by defining a default for your column and referencing the sequence in the default.

I’ll create the following table:

CREATE TABLE dbo.Orders (
  
Id UNIQUEIDENTIFIER PRIMARY KEY DEFAULT NEWSEQUENTIALID(),
  
CustomerId UNIQUEIDENTIFIER NOT NULL,
  
OrderReference VARCHAR(20)
   DEFAULT
(FORMAT(NEXT VALUE FOR dbo.OrderRefSequence, 'ORN0000000000#')),
  
OrderDate DATETIME DEFAULT(GETUTCDATE()));

You can see that the OrderReference is defined with a default using our sequence object.

I insert a bunch of rows into the table. For the sake of this rather contrived example, I only need to specify the CustomerId. I do that by generating a bunch of random unique identifiers – one for each row in the sys.objects table.

INSERT INTO dbo.Orders (CustomerId)
SELECT NEWID() FROM sys.objects;

Let’s have a look at an extract from the table:

You can see I’ve got a nice series of ascending, non-duplicating reference numbers.

One thing to note is that, while the sequence will generally produce unique number, it is still worth enforcing that in your table definition with a unique constraint i.e.

ALTER TABLE dbo.Orders ADD CONSTRAINT UQ_Orders_OrderReference UNIQUE(OrderReference);

This prevents someone from issuing an UPDATE command that might create a duplicate reference. Also, once the sequence runs out of numbers it will repeat back at the beginning unless you specify NO CYCLE in the defintion of the sequence – obviously in most applications this is unlikely to be an issue if you’re using a bigint for the sequence.

There was also a bug in some versions of SQL 2012 and 2014 that meant a duplicate could get created when your server was under memory pressure:

https://support.microsoft.com/en-gb/help/3011465/fix-sequence-object-generates-duplicate-sequence-values-when-sql-serve

This was fixed with SQL Server 2012 SP2 CU4 and SQL Server 2014 CU6 – but it’s better to be safe than sorry.

As a final note, it’s worth remembering that with the GDPR, these sorts of references are defined as personal data.That’s one good reason not to ever consider using these sorts of references as the primary key of your table (there are many others) – but also a reason why – where you already have an identity based primary keys that you could use to generate the references –  it may be worth decoupling them from the primary key and basing them on a separate sequence instead.

There’s got to be a better way

tsql2sday150x150

For T-SQL Tuesday #111, Andy Leonard asks “What is your why? Why do you do what you do?”

Like Andy, I didn’t take a straight route into being a DBA. I actually trained as a teacher, and did a variety of temp jobs – largely in call centres – before I got my first “proper” job as a developer.

My first steps into development were when I was a team leader in a call centre. There was a lot of paper shuffling going on. I felt there had to be a better way of doing things, so developed processes using Access and Excel and convinced the Admin team in the benefits of using them.

Not exactly development, more Macros with a bit of VBA, but that got me spotted by another team that were trying to automate reporting in the call centre – again using MS Access. That was where I first encountered relational databases and got deeper into coding. I joined as they were in the process of taking an Excel spreadsheet that processed flat files using macros – taking the whole day/every day to chunder away, replacing it with a coded solution. Finally we had it importing all the call center stats in seconds on a schedule in the early hours.

Nowadays that seems trivial, but back then it felt like a great achievement. It’s still one of my best work memories – when I showed my boss the automated import for the first time.

Moving forward from there and trying to answer Andy’s question, I realised that what started me off is what’s kept moving me forward. That there’s always got to be a better way.

Another of my favourite early developer memories was being asked to look at whether a file import to an application could be sped up as it was taking 10 minutes. The process was to upload a file with corporate credit card statements from a flat file into a SQL database for the application that companies used to manage them. When I opened it up I was quite impressed with the code, it seemed cleverer than anything I could have written – but maybe a bit too clever. I asked myself if there was a better and simpler way of achieving the same thing. There was, and it turned out to be a lot lot quicker.

Making things run faster remained something I loved doing, but it often seemed often that optimisation wasn’t a big focus in application coding –  that was more about delivering features. I think that’s part of what led me to specialise in data where performance, particularly at scale, is so key. Looking at SQL code and going – “well, sure it works as it is, but is there not a better way of doing this.”

It strikes me that for a lot of us, this is how we ended up in software, we saw things being done a certain way and saw scope for improvement. Felt that there must be a better way.

And it’s good to remember this isn’t an attitude where we can rest on our laurels. Even when we know what we’re doing, have done something a hundred times before, it’s still good to say “There’s got to be a better way”, and to think about what that might be. Not only does that mean we’re continually improving, but it’s also what keeps working in technology fresh and interesting.

Some differences with SQL Server when running on AWS RDS

If you plan on using Amazon Web Services (AWS) to host your SQL Server based applications in the cloud, then you have a couple of options.

One is just to have an EC2 instance (a VM) and install the versions of the OS and SQL Server you want. There are also images you can use that will have these pre-installed. This is what’s known as the IaaS option (Infrastructure as a Service). If you take this option, then SQL Server is exactly the same as it would be if you had it on-prem.

Or you can go with Amazon RDS (Relational Database Service).  This is more of a managed service where Amazon looks after some aspects of your database server for you. In return you give up some of the control you would have with your own server or VM. You can still pick the version of SQL Server you want installed, usually down to which cumulative update you want – though note that RDS normally lags behind the latest box version of SQL by 3 months or so. RDS is what’s known as a PaaS offering (Platform as a Service).

So, what do you give up and what do you gain? Here’s a quick summary of a few things I’ve noticed. This is not intended to be comprehensive and please bear in mind that AWS is a fast-moving beast – changes happen regularly.

What you can access

You can still connect to and manage your instance using SSMS, but you have no direct access to the server hosting it, such as configuration of the OS, or access to the disks. Through SSMS you have an access level slightly below Sa – which limits you to only what is allowed.

 

High Availability

In RDS you don’t use Availability Groups, Log Shipping or any of that stuff. Instead, HA is achieved using multiple Availability Zones (AZ). With this enabled, writes are performed synchronously to a replica in a second availability zone (usually a separate physical data centre in the same region). If the primary AZ goes down, then failover will happen automatically. You can also choose to replicate to other regions to be protected again failure of an entire AWS region – though often data protection laws mean that might not be an option for you if you hold personal data and there is only one AWS region in your country.

AWS RDS Multi-AZ promises a monthly up-time of 99.5% (which allows for 22 minutes of downtime per month). Full details of the SLA are here:

https://aws.amazon.com/rds/sla/

 

Disaster Recovery

RDS performs automated backups of your whole instance, including the equivalent of log backups every 5 minutes. That means that in case of a disaster, where Multi-AZ failover fails, or where someone deletes data they shouldn’t, then the maximum data loss (RPO) is 5 minutes. The built-in functionality allows you to restore your instance to a point in time, which is implemented by restoring to a new instance. It is not possible to recover individual databases. Backup retention can be set up to maximum of 35 days.

If you need more than that then you can still take native SQL backups, but this has to be enabled specifically and because you can’t access the underlying disks you have to use a stored procedure specific to RDS:

https://aws.amazon.com/premiumsupport/knowledge-center/native-backup-rds-sql-server/

There are also other options – such as taking extra snapshots and storing them in a separate region.

 

Patching

AWS handles patching of minor versions to your instance for you. You can choose for this to be done automatically or triggered by manual intervention through the AWS console.

 

Encryption

RDS has its own at-rest encryption similar to TDE. This is available for all instances hosted on RDS so unlike TDE you don’t need to be on an Enterprise Edition of SQL Server.

 

Collation

All SQL Server instances on RDS are set up with a server collation of SQL_Latin1_General_CP1_CI_AS and you can’t change this. You can specify a different collation for your databases but this might mean you run into collation issues if you use temporary tables and compare string based columns with those in your databases – as temp tables are created with the collation of the instance.

 

Server Memory

In an on-premise version of SQL Server we would carefully provision a proportion of the overall server memory for SQL Server. On RDS this is not an available setting, the amount of memory is fixed with our general instance sizing, however Amazon do make recommendations for tuning the amount of memory given to an RDS instance – which would mean resizing the instance if necessary.

https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/CHAP_BestPractices.html#CHAP_BestPractices.Performance.RAM

The recommendation is that there should be enough memory that the “working set” of data is retained in memory. Memory should be tuned so that the ReadIOPS metric is “small and stable”.

 

Recovery Model

This is worth mentioning even though it is not entirely configurable in RDS. If you have backup retention set to longer than 0 days (i.e. take backups) then the recovery model will be set to FULL. If you set backup retention to zero (which disables backups) then recovery model will be set to SIMPLE. If you manually change the recovery model, RDS will automatically change it back within 5 minutes. This applies to all databases.

MAXDOP and parallelism

In RDS you cannot change the instance level MAXDOP, or the “Cost Threshold for Parallelism”  through SQL. Instead these must be configured through a parameter group:

https://www.mssqltips.com/sqlservertip/5329/setting-sql-server-configuration-options-with-aws-rds-parameter-groups/

 

Optimize for Ad-hoc Workloads

Like the parallelism settings this cannot be modified through SQL. So, if you want this setting enabled you must use a parameter group:

https://www.mssqltips.com/sqlservertip/5329/setting-sql-server-configuration-options-with-aws-rds-parameter-groups/

 

Instant File Initialization

This setting means that when files grow the new space can be allocated immediately without taking time to fill the space with zeros.

This cannot be enabled in RDS, so it is of extra importance to size databases appropriately to avoid auto-growth where possible.

 

Extended Events vs Profiler

Unfortunately, Extended Events is not available in RDS, so if you wish to trace events you must use traditional traces/profiler.

Note that when setting up a trace against a busy instance you should create a server-side trace to minimize the impact on performance.

 

That’s the list of key differences I’ve noticed so far in working with RDS to provision new SQL instances.

 

What stays the same

There are a few other things you may be wondering if you still have control over (I know I was). So just to confirm, you can still do the following:

  • Configure multiple files for TempDB
  • Use either SQL or Windows authentication (or mixed)
  • Schedule jobs with SQL Server Agent
  • Run DBCC CHECKDB
  • Rebuild indexes and statistics

 

Please comment on this post if you notice any other significant differences you think people should be aware of – or if you notice updates to AWS that make any of these points invalid.

 

Thanks!

 

 

 

 

 

 

 

 

Checking for Membership of a Specific Active Directory Group

As part of my job I manage a bunch of SQL instances for Development and Test.

Access is managed though Active Directory groups, so I rarely have to do anything regards managing permissions. Nonetheless I often get requests from people to give them access. This is usually for a new starter or someone who has moved from one team to another.

Of course, the answer is usually that they just need adding to the right AD group. Rather than assume though, I always get them to check before I pass the request on to the AD team. You never know, there could be something else wrong.

T-SQL has a lovely little function for this, IS_MEMBER. For instance, If I want to know if I’m a member of MyDomain\SQLAdmins I just run:

SELECT IS_MEMBER('MyDomain\SQLAdmins');

If it returns 1 then I am a member. Zero then I am not. Null means it can’t find the group, probably because I’ve spelt it wrong.

So, I can get the requester to check for themselves if they are a member of the group, and then we can raise the request to get them added.

IS_MEMBER is also useful if you want to check if you are a member of a specific database role – either one of the built-in ones or a user-defined one e.g.

SELECT IS_MEMBER('db_owner');

I’m not sure how well known this is and I just had one of those requests, so I thought I’d create a quick post on the topic.

Extended Events Made Easy: Using XEvent Profiler

In this post we’ll look at how quick and easy it is to use the XEvent Profiler to create an Extended Events (XE) session to replace the most common usage of SQL Server Profiler.

Before you start, update your SSMS to a recent version (17.3) or later:

https://docs.microsoft.com/en-us/sql/ssms/download-sql-server-management-studio-ssms?view=sql-server-2017

Now, let me ask you the question – how often do you just go into Profiler and create a trace taking all the defaults? If you’re like me then that’s pretty often. It’s nice and easy, select the menu item from SSMS and you’re going in a couple of clicks.

All XEvent Profiler does is implement a couple of standard XE sessions very similar to Profiler, that you can launch with a click or two.

You can find XE Profiler in the object explorer in SSMS under each of your SQL instances – just below SQL Server Agent:

XE_1

Because it’s there for each instance, you don’t have to then connect to the instance separately (as you do in Profiler).

You’ve got two options, Standard which pretty much equates to the default trace in Profiler, or TSQL which will just capture SQL statements being executed. Let’s right-click on Standard and select “Launch Session”. Here’s the Live Data view that comes up:

XE_2

You can see that looks pretty similar to what you see for your traditional traces.

Now, one thing I almost always do for this sort of trace in Profiler is capture the database name. This is a bit of a pain, because I have to select “View all columns”, scroll along to find the database name column, then select it for each relevant event I’m capturing.

In XE it’s much easier. I just right click on one of the column headers in the above view and select “Choose Columns”:

XE_3

Which brings up the menu of columns available to me:

XE_4

I can then just select the database_name column in the left and use the arrow button to push it into the list of selected columns. Want row_count? That’s there too, along with a whole bunch of other stuff. I’ll add those and remove a few I don’t want and exit the dialog.

The other thing I’m commonly going to want to do is add a filter to one or more of the columns. When you’re viewing live data you get the Extended Events toolbar shown in SSMS below the menu bar, just click on Filters…

XE_5

And you get the filters dialog:

XE_6

Click below “Field” to select a column and then add a value to filter on. In this case I’m going to filter to look at a specific database.

It’s worth noting that there are a bunch of operators you can use (not just equals), you can do greater than, less than, contains, does not contain, is null etc.

Now we can see the effect of those changes in the Live Data view:

XE_7

The above is what I’d usually do in Profiler, except that here I’ve also added an extra column and removed a few – because it was so easy to do. The whole thing felt like a simpler and cleaner experience than the old way.

But I get it, a marginally simpler experience probably isn’t enough to make you change habits. What might, is what you can do with your trace results once you’ve got them. In the next post we’ll look at sorting, grouping and aggregation.

Previous posts in this series:

There’s Still a Place for SQL Server Profiler

Using the built-in System Health session

Exporting Extended Events Session Data to a Table

 

Don’t Just Rely on Query Execution Stats for T-SQL Execution

I recently had an incident where I was looking into the cause of a long running process for a client. It was a batch process that ran overnight, and execution time had been growing until it was now taking over 4 hours.

The database involved is a kind of staging area for MI, so I wanted to look at any heavy queries being executed against the database – I knew that these would all be part of the batch process. I asked them to run the following query for me:

SELECT TOP 50
   DB_NAME(qt.dbid) AS DatabaseName, 
   qt.TEXT, 
   qs.execution_count,
   qs.total_worker_time,
   qs.total_physical_reads,
   qs.total_logical_writes,
   qs.total_logical_reads,
   qs.total_elapsed_time
FROM sys.dm_exec_query_stats qs
CROSS apply sys.dm_exec_sql_text(qs.sql_handle) qt
WHERE ISNULL(qt.dbid,0) = DB_ID();

I got less than 50 rows back so figured I had everything covered, but the total elapsed time across all the queries was less than ten minutes, I knew the server hadn’t been rebooted for about a month so potentially that could be as little as 20 seconds of query execution a night. Even if the full 10 minutes was from the last 24 hours that still didn’t account for the long run times.

So, I convinced myself it wasn’t the queries and started looking at other things they could check. It was a long list. Lots of theories revolved around the idea that something else was running on the host at the same time.

As part of that I decided to get them to run an extended events trace so I could look at exact timings and waits during the batch process (and anything else that might be going on in SQL). I got a lot of data back which took a fair while to index and load. Figuring I’d already ruled out query execution times for the batch process itself, I expected the analysis was going to be tricky to drill down on the problem, however as soon as I looked for heavy queries running during the execution window I found one specific stored procedure with the following metrics:

QueryExStats1

Ouch.

Thing is, these figures were for a stored procedure that was part of the batch run. I date for this in the original output but the figures for query execution duration and CPU were tiny in comparison to these. So, were the query execution stats wrong?

No, they weren’t, dm_exec_query_stats only captures figures for cached query plans, so you might have a query that’s not caching a plan for whatever reason – or might have something that’s doing work that’s not actually a query, imagine a while loop that’s trying to find all the prime numbers under a trillion without using sets or queries. That would consume a lot of CPU and duration but wouldn’t show in in the query execution stats – because it’s not a query.

In this case it turned out the stored procedure was:

  • Trying to process over 2 million rows using a cursor
  • For each row it would attempt to execute a stored procedure that didn’t exist using dynamic SQL
  • It would catch the failure and update an error message against the row – using the cursor
  • Every day it would attempt to process the failed rows again
  • Every day more rows would be inserted to the table that would fail

In particular it was the use of the cursor that was most expensive, and not being captured in the execution stats.

To be kind to the original developer, it was a clever generic solution to a specific problem. It wasn’t optimal from a performance point of view, but they only expected it to have to deal with a few rows daily.

Then the volume of data increased.

Then a subsequent developer made a change but didn’t know they needed to add a new supporting stored procedure. The resulting issue didn’t get picked up in testing.

The number of failed rows had been growing every night for a couple of years, and the execution time creeping up until it went exponential.

This post isn’t about bad code though, it’s about remembering that not all resource consumption on our SQL boxes comes from cached queries, so when trying to troubleshoot remember that query execution stats are one place to look, but don’t cover everything that might be being executed in T-SQL.

T-SQL Tuesday #106 – The Trouble with Triggers

tsql2sday150x150

For this month’s T-SQL Tuesday Steve Jones asks us to talk about our experiences with Triggers:

https://voiceofthedba.com/2018/09/03/t-sql-tuesday-106-trigger-headaches-or-happiness/

In my post I’m going to focus on DML (Data Modification Language) triggers. Their counterparts, DDL (Data Definition Language) triggers have any number of interesting applications. It’s usually DML triggers however that attract the most controversy.

DML triggers seem like a great functionality when you first discover them in your SQL career, especially if you come from an application coding background. Event driven programming for data – sounds brilliant. If someone does something to table X, have a trigger than in turn does something to tables Y and Z, you can even have triggers on Y and Z and cascade actions to other tables and beyond.

Once you’ve been around for a while though you realise what a headache this can end up being. You think you’re just inserting a record into table X but there’s this whole bunch of stuff going on behind the scenes that you may not know about – or want.

This is the biggest objection to triggers. They hide logic. If I have table called FeeTransaction and there is a bunch of logic that MUST happen when I insert a record here, then that’s clearer if it’s all encapsulated in a stored procedure that inserts the record. That makes it easier to maintain and troubleshoot rather than hunting around to find independent events that might be kicking off all over the place.

The only place I’m really a fan of DML triggers is in capturing Audit information. Most commonly for maintaining basic things like the last updated date and time for a record. I like having a mechanism that means I can see when data was last changed – whatever method was used to change it. i.e. if it was through the application, or if someone with write-access just ran an update query. Then I don’t need to rely on that person remembering to alter the LastUpdated column when they made their change.

The need to maintain this sort of basic audit info is so ubiquitous that it seems silly it is not a core feature of SQL Server, i.e. the ability to add a column to a table that will always automatically capture the time a record was modified, presumably this could also be done in a tamper-proof manner. In the same way we could have the ability to capture the Login context used to make the change or potentially anything else we need to capture.

Adam Machanic has raised a request with Microsoft to include functionality, if you agree it would be handy then please upvote here:

https://feedback.azure.com/forums/908035-sql-server/suggestions/32901964-provide-a-mechanism-for-columns-to-automatically-u

Exporting Extended Events Session Data to a Table

If you’re a long time Profiler user like me then you probably often take the option of saving (or loading) your trace results to a table for easy analysis. Well, with Extended Events (XE) it’s easy to do that too.

Once you’ve opened the session to view the data you get an extra drop-down menu “Extended Events” on the menu bar in SSMS. If you open that and browse down, you can quickly find the option to export the results to a table:

xe_table1

Or to a CSV or XEL file if you wish.

You just need to select a destination database connection and table name and the export starts. Be warned that it doesn’t default to the current database connection. I’ve fallen for that and overwritten the data in a table with the same filename on a different SQL instance – whoops!

If the option is greyed out when you open the menu it may be that your event data is still loading. If you look closely in the above screenshot you can see I have over 8 million events captured by this session, so it took a while to load before I was able to export.

Once the export has finished you end up with a table that looks a bit like this (the exact set of columns will depend on which fields you have collected):

xe_table2

One gripe I have with this functionality, and that you can see above, is that all the text columns come out defined as nvarchar(max). Apart from anything else this means you can’t index the columns without changing the data type.

That’s one thing that Profiler did better, for instance database_name captured from Profiler would have been an nvarchar (128).

I thought I’d knock this post together while waiting for my statement to complete which alters database_name to the preferred nvarchar(128). With 8 million records that’s taken 21 minutes so far. And then I’ll have to index it.

Yes, maybe it’s my fault for capturing so many events, but in my defence, I’m trying to do some analysis in a dev environment to work out what I can filter out safely, before passing the same session to a client to use in production to diagnose the problem with a long running SSIS package.

Still, as you can see it’s an easy to process to export your Extended Events data to a table so you can do your analysis there – rather than having to resort to querying XML.