SQL Puzzle 3: Knights and Queens

August 22, 2017 ~ Matthew McGiffen ~ 2 Comments

I thought I’d do another chess puzzle this month. This one is a variant on the Eight Queens problem:

Given the standard 8×8 chessboard, place an equal number of Knights and Queens such that no piece is attacked by another. What’s the maximum number of Knights and Queen’s you can place? Obviously you should use SQL to find the answer!

I’m sure you all know how a knight moves and attacks in chess, but just in case here’s a diagram:

knight_move

The white circles show each possible move for the knight, i.e. it can move two squares in one direction and then 1 square perpendicular to that (or vice-versa). An “L” shape.

Have fun!

Automatic Sample Sizes for Statistics Updates

August 15, 2017 ~ Matthew McGiffen ~ 8 Comments

I mentioned in my previous post about manually updating statistics that you can specify whether they’re updated using a full scan, or you can specify an amount of data to sample, either a percentage of the table size, or a fixed number of rows. You can also choose not to specify this, and SQL Server will decide for you whether to do a full scan, or to sample a certain amount of data.

I thought it would be interesting to look at what the sample sizes are that SQL will choose to use, depending on the amount of data in your table. Note that this occurs if you update statistics without specifying how they should be sampled as below:

UPDATE STATISTICS dbo.Test _WA_Sys_00000002_3AD6B8E2;

This is also the behaviour you will get when SQL updates statistics through the auto-stats mechanism. The fact that auto-stats may sample at a lower rate than is optimal for a given table and the queries against it is another reason you may choose to perform manual statistics updates.

To test this, I created a table and progressively pumped data in. Then after I inserted each batch of rows:

I Ran a stats update capturing the CPU time taken
Checked the statistics to see what sample size was used in the update
Checked the size of the index

Here’s some of the code I used for the test:

--Drop table if exists
IF (OBJECT_ID('dbo.Test')) IS NOT NULL DROP TABLE dbo.Test;

--Create table for Testing
CREATE TABLE dbo.Test(
   Id INT IDENTITY(1,1) CONSTRAINT PK_Test PRIMARY KEY CLUSTERED, 
   TextValue VARCHAR(20) NULL
);

--Create index on TextValue
CREATE INDEX IX_Test_TextValue ON dbo.Test(TextValue);

--Insert a bunch of rows
INSERT INTO dbo.Test(TEXTValue) 
SELECT TOP 100000 'blah'
FROM sys.objects a, sys.objects b, sys.objects c, sys.objects d;  

--Update statistics without specifying how many rows to sample
SET STATISTICS TIME ON;
UPDATE STATISTICS dbo.Test IX_Test_TextValue;
SET STATISTICS TIME OFF;

--View the statistics
DBCC SHOW_STATISTICS('dbo.Test', IX_Test_TextValue) WITH STAT_HEADER;

--Check the size of the index
SELECT
    i.name                  AS IndexName,
    SUM(s.used_page_count)   AS Pages
FROM sys.dm_db_partition_stats  AS s 
JOIN sys.indexes                AS i
ON s.[object_id] = i.[object_id] AND s.index_id = i.index_id
WHERE i.name = 'IX_Test_TextValue'
GROUP BY i.name

The results of my testing are shown in the below table:

StatisticsSamplingRates

You can see that we have a full sample being taken for the statistics updates up to 4000,000 records (896 pages) but that once the table size hits 500,000 sampling is happening. If you look at the number of pages you will see we now have over 1,000 pages, 1000 pages being about 8MB of data, which is the threshold that sampling kicks in.

I wasn’t able to find a nice neat formula to determine the sampling rate based on table size, but if we look at the above figures there are still some insights to be gained. The main one is that you’ll notice that even as we double the table size, the number of rows sampled doesn’t go up by much. For instance from 500,000 to a million rows, only 10,000 more rows are sampled. This also means that even for pretty large tables, the update isn’t taking long – another reason why it’s worth leaving auto stats updates enabled and running synchronously with queries – they’re generally not going to take that long.

Another insight is that the percentage of rows sampled drops off very quickly. As the sample size doesn’t really increase that much even when the table size doubles – the percentage sampled has almost halved each time.

How to be a bad interviewer

August 8, 2017August 8, 2017 ~ Matthew McGiffen ~ 9 Comments

tsql2sday150x150

In the monthly SQL blogging party that is T-SQL Tuesday (brainchild of Adam Machanic), Kendra Little has invited us this month to talk about interview patterns/anti-patterns.
https://littlekendra.com/2017/08/01/tsql-tuesday-93-interviewing-patterns-anti-patterns/?utm_campaign=twitter&utm_medium=twitter&utm_source=twitter

Before my current role which I’ve been in for just over 5 years, I spent most of my career as a contractor, performing shorter roles that lasted from 2 weeks to 2 years. As such I was frequently interviewing several times a year and I’ve often sat on the other side if the table as well.

In my experience it’s been far more often to find bad, arrogant or disrespectful interviewers than the other way round. My biggest advice to candidates is to know your worth, realise that the interview process is as much for you to find out whether this is going to be a decent place to work as the other way round.

You’re not obligated to accept a job once it’s offered, and if you’ve had to put up with a lot of rubbish during the application process then seriously consider if it’s just going to get worse once you’re in the role.

As an employer, you want the best people, particularly in IT where people are the company’s biggest asset. You need to make sure that your recruitment process is not just about assessing the candidates, but also showing them that your company is going to be a great place to work.

If you do want to put people off then here’s some of the interviewer “anti-patterns” I’ve experienced over my career that made me ask the question “Would I really want to work at this place?”

Maybe hitting one or two is okay, but beyond that you need to consider your self respect.

Expecting a candidate to fill in a lengthy application form

What’s wrong with a CV? Why is your company so special that I need to sit there for hours with some awful form in Word? You know what? I don’t think I’m that bothered.

Not responding to my application

I sent you my CV and you didn’t even acknowledge it. Or you did and said I’d hear whether I’d been selected by next week, a month and a half passes and then suddenly you decide you want to interview me. You know what, I’ve probably already found something else and even if I haven’t you’re not coming across as massively competent.

Turning up late for the interview

I’ve taken time out of my busy schedule to come see you guys. I arrive at the interview on time – actually I make sure I’m early – but you keep me waiting half an hour. Obviously your time is more important than mine. Okay there may have been a crisis you had to deal with, or is it just chaos all the time? Not sure I’m getting inspired to work here. Oh, and how would you have reacted if I was that late?

Prove how much better than the candidate you are by asking overly specific questions that few people know the answer to

You get your technical boffin in to review my technical skills, but rather than trying to find out what I know they ask me the most specific esoteric technical questions possible – “What sampling algorithm does SQL use when updating statistics?” Clearly their main interest is in trying to prove they know more than me. Is the rest of the team like this? Not sure I want to work with them. You know what, we all know different things and no-one knows everything. How about discussing some technical scenarios and seeing what I come up with.

Asking standard/stupid questions

“What’s my biggest weakness?” No-one is going to tell you they’re a lazy alcoholic. Of course we’re all perfectionist workaholics. All you learn from asking this sort of question is whether someone has learnt the standard answers. It’s just wasting everyone’s time.

Requiring visit after visit for subsequent interviews and tests

You got me in for technical test, then you asked me to come back for an interview. Then come back again for an interview with HR, then yet again to meet a member of the senior management team. Why couldn’t you at least do it all on one day? Apart from the fact that my time is precious too, by the time we get to the end of this long drawn out process I’ve already found a better job.

Expecting candidate to provide free work

Before the interview you’d like me to prove my technical skills by creating some functionality for you. I’ll just need to give up my weekend to do this free work for you. I’m not even got the job yet and you want me to do unpaid overtime. Goodbye!

Not giving a response or feedback when you said you would

We had the interview on Monday and you said I’d hear either way by the end of Wednesday. It’s Friday now and I’ve had no response. Once again you’re not inspiring me to believe this is going to be a great place to work when you don’t honour your basic promises.

This last one is a bit specific, but it did happen to me…

Give someone a C# test when they come for a SQL job, and then when challenged on it insist that they attempt it anyway as you think “it will still have some value.”

I’m ashamed to this day that I didn’t just walk out at that point!

Manually updating Statistics

August 1, 2017August 2, 2017 ~ Matthew McGiffen ~ 2 Comments

Even though SQL Server automatically updates statistics in the background for you, you may find there are times when you want to manage updating them yourself.

You may have large tables and find that the interval between the automatic updates is too big and is resulting in sub-optimal query plans.
You might need timely updates to a specific object – maybe as part of an ETL process to make sure that statistics are up to date after a particular part of the process, perhaps after a daily load into a table.
You may find that the automatic updates look at too small a sample size and you need to scan more of the table to enable accurate estimates for your queries.

My previous post on the Ascending Key problem demonstrated a common issue where the first of these scenarios could be affecting you. We’ll look at examples of the other scenarios in subsequent posts.

For now though, let’s just look at how you go about updating statistics.

UPDATE STATISTICS

At the heart of all the methods we’ll look at is the UPDATE STATISTICS command. There are a lot of options for using this command, but we’ll just focus on the ones you’re most likely to use. For full documentation here is the official reference:
https://docs.microsoft.com/en-us/sql/t-sql/statements/update-statistics-transact-sql

You can use this at the table level in which case all statistics objects belonging to that table will be updated:

UPDATE STATISTICS dbo.Test;

Or you can specify the specific statistics object you want to update:

UPDATE STATISTICS dbo.Test _WA_Sys_00000002_3AD6B8E2;

FULL SCAN or SAMPLE

The other thing you may be likely to want to specify is whether the statistics should be updated using a full scan of the table, or just be looking at a sample of the rows. In the above examples we didn’t specify this so SQL Server will decide for us. In general sampling (as opposed to full scans) kicks in when we have about 8MB of data in the table (or about 1000 pages).

If you want to specify a full scan the syntax is as follows:

UPDATE STATISTICS dbo.Test _WA_Sys_00000002_3AD6B8E2 WITH FULLSCAN;

If you want the statistics update to use sampling (more on how this works in subsequent posts) then you can choose to specify a percentage of the total table to be sampled:

UPDATE STATISTICS dbo.Test _WA_Sys_00000002_3AD6B8E2 WITH SAMPLE 10 PERCENT;

Or you can specify a fixed number of rows:

UPDATE STATISTICS dbo.Test _WA_Sys_00000002_3AD6B8E2 WITH SAMPLE 10000 ROWS;

You might want to use a sample as once your tables get large full scans can take a little time. Equally though if you’re updating statistics in a quieter time (e.g. overnight) you may feel you can afford the extra time for the full scans. Here’s some comparison figures I produced on my local machine showing how long full scans take. Obviously this will change depending on your hardware and other factors including how much of the table is already in memory:

FullScanTimes

You can see however that the time taken pretty much scales linearly as the table size increases.

One thing to be aware of is parallelism. A full scan can run as a parallel operation on your server which can speed it up considerably. When you update statistics using a sample however this can only run single-threaded unless you’re on SQL Server 2016 (or higher). Sampling data to build statistics in parallel was one of the many excellent little improvements in 2016.

sp_UpdateStats

This is another method you might use for manually updating statistics (perhaps as part of a scheduled maintenance job). This system stored procedure can be used for updating all of the statistics objects in a database:

USE Test;
EXEC sp_UpdateStats;

This stored procedure iterates through your database using a WHILE loop and executes the UPDATE STATISTICS command as it goes. One nifty thing about using this procedure is that it only updates statistics objects where rows have changed, so you don’t have any overhead for refreshing statistics where the underlying data hasn’t been modified. You can see this from this extract of the output of the stored procedure:

Updating [dbo].[TestMemory]
[PK__TestMemo__3214EC070D799003], update is not necessary…
0 index(es)/statistic(s) have been updated, 1 did not require update.

Updating [dbo].[TestMemory2]
[PK__TestMemo__3214EC07D3DC52DE], update is not necessary…
0 index(es)/statistic(s) have been updated, 1 did not require update.

Updating [dbo].[Test]
[PK_Test] has been updated…
[IX_Test_TextValue] has been updated…
2 index(es)/statistic(s) have been updated, 0 did not require update.

Ola Hallengren

Of course, if you’re looking to implement statistics update as part of regular maintenance, then you should definitely be considering using Ola Hallengren’s maintenance solution. Ola maintains a great solution for managing database backups and integrity checks as well index and statistics maintenance, and he shares it with the SQL community for free.

You can find full details of Ola’s solution and download it here:
https://ola.hallengren.com/

Matthew McGiffen Data

Month: August 2017