Why Does Everyone Have More Friends Than You?

No, the answer isn’t because you’re a DBA.

This isn’t a technical post about databases, but rather a discussion of a statistical paradox that I read about recently. Statistics and data go hand in hand, and many of us who work with data use statistics in our work – particularly if we cross over into BI, Machine Learning or Data Science.

So, let’s state the problem.

I think I have an average number of friends, but it seems like most people have more friends than me.

I can see this on Facebook – even though Facebook friendship isn’t the same thing as the real kind. A quick Google search tells me that the average number of Facebook friends is 338 and the median is 200. My number is 238. That’s less than the average but greater than the median – so according to those figures I do have more friends than most people (on Facebook at least).

I could probably do with culling some of those though…

I’m going to pick ten of those Facebook friends at random. I use a random number generator to pick a number from 1 to 20, take that friend, and then every 20th friend after them, repeating until I have 10 people.

I get the number 15 to start, so let’s start gathering some data. How many friends do my 15th, 35th, 55th, 75th, 95th, 115th (and so on) friends have?

Here are the numbers, sorted lowest to highest:

93
226
239
319
359
422
518
667
1547
4150

If I look at that list, only two people have fewer friends than me – or 20%. 80% have more friends than me. Why do I suddenly feel lonely? How can this be?

If I have more friends than the median, then I should be in the top half.

Friendship is based on random accidents, and I’ve picked people out of my friends list at random. Surely I’ve made a mistake.

I could repeat the sampling but I’m likely to get similar findings.

The answer is that it’s not all random, or at least not evenly so. The person with over 4,000 friends is over 40 times as likely to know me as the person with fewer than 100. Maybe they get out a lot more (actually they’re a musician).

I’m more likely to know people if they have a lot of friends than if they have fewer.

This is difficult to get your head round, but it’s important if you’re ever in the business of making inferences from sampled data. It’s called “The Inspection Paradox”.
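To see the effect in numbers, here’s a tiny illustration using made-up friend counts. Picking a name from someone’s friends list means you’re sampling people in proportion to how many friends they have, and that pushes the average you observe upwards:

DECLARE @People TABLE (PersonId int, FriendCount int);

INSERT INTO @People (PersonId, FriendCount)
VALUES (1, 100), (2, 200), (3, 300), (4, 4000);

SELECT
    --average friend count of a person picked uniformly at random
    AVG(1.0 * FriendCount) AS SimpleAverage,
    --expected friend count when a person is picked in proportion to their friend count
    SUM(1.0 * FriendCount * FriendCount) / SUM(FriendCount) AS FriendWeightedAverage
FROM @People;

With these four made-up people the simple average is 1,150 friends, but the friend-weighted average is over 3,500 – the musicians of this world dominate the sample.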

It’s also one of many reasons why people may get feelings of inferiority from looking at social media.

You can find a lot more examples and explanations in this post:

https://towardsdatascience.com/the-inspection-paradox-is-everywhere-2ef1c2e9d709

Instantaneous Transaction Rollback with SQL 2019

If you’ve read about the Accelerated Database Recovery feature in SQL Server 2019, you could be forgiven for thinking it’s just about speeding up database recovery time in case of a server failure.

In fact, enabling it also means that where a long-running transaction fails or is cancelled, the rollback is almost instantaneous. This is great news for DBAs who sometimes have to kill a long-running blocking transaction but worry that it may take a long time to roll back – continuing to block all that while.

This works because Accelerated Database Recovery maintains a version store in the database: where a row is updated, the old version of the row is kept until after the transaction is complete. That makes it quick and easy to revert to the old version if the transaction doesn’t finish.

Let’s look at a quick example.

First I create a table with about 10 million rows – all containing the same text value:

CREATE DATABASE TestADR;
USE TestADR;

CREATE TABLE dbo.TestADR(Id int IDENTITY, SomeText varchar(50));

INSERT INTO dbo.TestADR (SomeText)
SELECT TOP 10000000 'FrangipanDeluxe' 
FROM sys.objects a, sys.objects b, sys.objects c, sys.objects d;

I update all the rows in the table to a new value:

UPDATE dbo.TestADR SET SomeText = 'FrangipanDeluxA';

This took about a minute.

I then execute a query to change them back, cancelling the query in SSMS after about 30 seconds:

UPDATE dbo.TestADR SET SomeText = 'FrangipanDeluxe';

It took about 30 seconds more to cancel – which is SQL rolling back the changes.

Then I enabled Accelerated Database Recovery. You do this at the database level:

ALTER DATABASE TestADR
SET ACCELERATED_DATABASE_RECOVERY = ON;
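If you want to confirm which databases have it enabled, you can check sys.databases – the column below is the name as it appears in SQL Server 2019:

SELECT name, is_accelerated_database_recovery_on
FROM sys.databases
WHERE name = 'TestADR';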

Now I re-run that last update, again cancelling after 30 seconds.

This time the cancel was instantaneous – it took SQL no noticeable amount of time to roll back the changes.

This is great, but we’ll probably want to be careful before we enable it on all our databases – when we get them onto SQL 2019 anyway. There will be an additional overhead in managing the version store, which could have an impact on the time taken to complete write queries, as well as on storage requirements.
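If you do enable it, the version store sits inside the database itself, so it’s worth keeping an eye on how much space it’s taking. SQL Server 2019 exposes a DMV for this – a minimal sketch, selecting everything as the exact column list may vary between versions:

--Check the size and state of the persistent version store for each database
SELECT DB_NAME(database_id) AS DatabaseName, *
FROM sys.dm_tran_persistent_version_store_stats;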

Still, it seems like a good feature – something to look forward to playing with more.

Check Query Progress with Live Query Stats

This is something I touched on back in 2017, a little after the Live Query Statistics feature was introduced with SQL 2016, but I was using the functionality this morning and felt it was worth a reminder.

https://matthewmcgiffen.com/2017/02/23/livequerystats/

You can use Live Query Stats to check on the progress of an executing query – and you can do it through the GUI in SSMS.

I created a long-running query for a demo, and after 15 minutes I was still waiting for it to finish. That was a bit longer than I intended. Should I kill it – or should I just wait a few more minutes for it to complete?

You can check this quickly via Activity Monitor. Find the query you are interested in within the Processes list, then right-click it and select “Show Live Execution Plan”.

That brings up the live execution plan for the running query. I can see from it that my query is about 83% complete, so maybe I’ll just wait a little longer. Note that this is a live view, so the numbers keep updating – if I want, I can watch the progress.

This is against a SQL 2019 instance and is the out-of-the-box behaviour. Before SQL 2019 you had to enable trace flag 7412 if you wanted this to work:

DBCC TRACEON(7412,-1);
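If you’d rather check progress with a query than through the GUI, the same counters that drive the live plan are exposed in sys.dm_exec_query_profiles. Here’s a rough sketch – 53 is just a placeholder for the session_id of the query you’re watching, and the percentages are based on estimated row counts so treat them as a guide only:

--Per-operator progress for a running query
SELECT
    node_id,
    physical_operator_name,
    row_count,
    estimate_row_count,
    CAST(100.0 * row_count / NULLIF(estimate_row_count, 0) AS decimal(10,1)) AS PercentOfEstimate
FROM sys.dm_exec_query_profiles
WHERE session_id = 53 --replace with the relevant session_id
ORDER BY node_id;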

SQL Server Wishlist

This month for T-SQL Tuesday, Kevin Chant asks us what our fantasy SQL feature would be.

https://www.kevinrchant.com/2019/09/03/t-sql-tuesday-118-your-fantasy-sql-feature/

I think it’s appropriate to give a shout-out to Microsoft at this point, because over the last few releases they’ve given us some of the items that are top of my list.

Recommending and setting MAXDOP during setup (coming with SQL 2019) will hopefully mean I no longer have to have arguments about why the out-of-the-box setting isn’t appropriate.

The same goes for setting max memory during setup (also with SQL 2019).

A more verbose error where string or binary data might be truncated – we got that in SQL 2017.

It’s the little things like these that make me happy – and make my job easier.

A few other little things I’d like – though I accept that something small to describe isn’t always small in its execution…

A change to the default cost threshold for parallelism – be it 20, be it 30, be it 50. I’d prefer any of those to 5.
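In the meantime it’s an easy enough setting to change yourself – a minimal example, where 50 is just an illustrative value rather than a recommendation:

EXEC sys.sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sys.sp_configure 'cost threshold for parallelism', 50;
RECONFIGURE;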

I think it would also be great to have a cardinality optimizer hint, e.g. OPTIMIZE FOR 5 ROWS. Oracle has this, and it’s not good having to feel jealous of those working on the dark side 😉 You can do the equivalent, but it’s convoluted and it’s not clear what’s going on when the uninitiated see the code:

https://dba.stackexchange.com/questions/168726/sql-server-cardinality-hint
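For reference, one version of that workaround combines TOP with a variable and OPTIMIZE FOR. This is a sketch of the pattern only – the table, column and value are hypothetical:

DECLARE @Rows bigint = 9223372036854775807; --at run time TOP returns effectively everything
DECLARE @SomeValue int = 1;                 --hypothetical predicate value

SELECT TOP (@Rows) *
FROM dbo.SomeTable --hypothetical table
WHERE SomeColumn = @SomeValue
OPTION (OPTIMIZE FOR (@Rows = 5)); --the optimizer compiles the plan as if only 5 rows come back

It does the job, but it’s exactly the sort of thing that isn’t obvious to the next person reading the code – which is why a proper hint would be nicer.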

There is one big thing I’d like – but it’s never going to happen: get rid of Enterprise Edition – or rather, make Enterprise Edition the standard. Enterprise is so comparatively expensive that it’s rare I’m going to recommend it. It’s interesting to see that in Azure SQL DB we only have one edition to work with – I’d love to see that in the box product. I understand that change would be a massive revenue loss, so I can’t see it happening.

If not that, then if we could just have at-rest encryption (i.e. TDE) in Standard Edition, that would make me very happy. In these days of security and privacy consciousness, it seems that should be core functionality.

UPDATE: TDE is going to be available on Standard Edition from SQL Server 2019. I get my wish!

Finally, I’d just like to upvote Brent’s idea that it would be great to be able to restore just a single table.

That all said, I’d like to go back to my first point. I love what MS is doing with SQL Server, and the continual improvements that are being made. I particularly love that we sometimes get them for free in service packs and cumulative updates – without having to upgrade to a new version.

Keep it coming!

The 99 Flavours of SQL Server are all Vanilla

There are so many options for running SQL Server now. It can feel a little intimidating. 

You’ve spent years gaining experience with SQL running on Windows Server. On bare metal or in a VM. Locally or in a data centre.

Now people are talking about SQL on Linux. SQL on VMs in the cloud. SQL Server Platform as a Service (PaaS) offerings such as Azure SQL DB, Azure Managed Instance, or Amazon RDS.

Running SQL in a container on Docker or Kubernetes. 

It’s easy to feel like you’re at risk of getting left behind – like you’ve got to start learning all over again from the beginning.

The good news is that whatever the flavour – it’s still SQL Server. For everything that’s different there are a hundred things that are still the same.

Database design, writing and executing queries and stored procedures – it’s still the same.

Query tuning, indexing, looking at execution plans. All the same.

Even in administration, many of the core concepts are the same: understanding how security works, backups, high availability. The main difference is often that some of these might be taken care of for you, so you don’t need to worry about them as much any more.

Caveat – you still need to worry about them a bit!

The point is, most of what you already know, the experience you have gained over the years, is still totally valid. Learning about SQL Server on a new platform may feel like a big learning curve, but in reality, the new stuff you need to get to grips with is small compared to all the stuff you already know.

And in some cases, the skills you already have become even more valuable. People might not care if your query tuning on physical kit takes your CPU down from 50% to 10%. But tell them you’ve just reduced their cloud bill by 80% and they really care!

So don’t be intimidated, and don’t feel you need to learn every flavour. Have a play with SQL Server in the cloud, have a play with containers, set up SQL on Linux. You’ll quickly find it’s not that hard, and once it’s running – it’s pretty much the same as ever.

And remember, if someone comes to you with a question about why SQL is running slow, or why a query isn’t doing what they want – on RDS, Docker, Linux, or whatever – you don’t need to know that platform inside out to be able to help. You already know SQL Server, and that’s the important bit.

To paraphrase a popular lyric:

If you’ve got SQL problems I can help you, son. I’ve got 99 problems but SQL ain’t one.

Query Store stores dates as UTC

One of the powerful aspects of Query Store is the ability to directly query the DMVs for details of historical executions and performance.

A key view for this is sys.query_store_runtime_stats (but also sys.query_store_runtime_stats_interval). 

If you’re querying these views to look at what was happening during a particular time period, it’s important to understand that the dates and times are stored according to UTC (which is equivalent to GMT, with no adjustment for daylight saving time).

You can see this if you look at a few rows from the view:

SELECT runtime_stats_id, first_execution_time, last_execution_time
FROM sys.query_store_runtime_stats;

Though it will be more obvious to you if you’re in a time zone other than the UK.

The datetimes are stored as DATETIMEOFFSET, which you can see from the +00:00 at the end of each entry. DATETIMEOFFSET allows you to store both the datetime and the time zone offset it relates to.

This means that if you’re querying against these columns you need to convert the values you are looking for to UTC first. You can do that by making sure you use GETUTCDATE() instead of GETDATE(), or if you are using specific local times you can use AT TIME ZONE, e.g.

SELECT CAST('2019-08-21 11:50:40.400 +09:00' AS datetimeoffset) AT TIME ZONE 'UTC';
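Putting that together, here’s a sketch of filtering the runtime stats to a local-time window. The dates and the Windows time zone name (‘GMT Standard Time’ covers UK time, including daylight saving) are just examples to adapt:

--Runtime stats for executions between 9am and 5pm UK local time on a given day
DECLARE @WindowStart datetimeoffset = CAST('2019-08-21 09:00' AS datetime2) AT TIME ZONE 'GMT Standard Time' AT TIME ZONE 'UTC';
DECLARE @WindowEnd   datetimeoffset = CAST('2019-08-21 17:00' AS datetime2) AT TIME ZONE 'GMT Standard Time' AT TIME ZONE 'UTC';

SELECT runtime_stats_id, first_execution_time, last_execution_time, avg_duration
FROM sys.query_store_runtime_stats
WHERE first_execution_time >= @WindowStart
  AND first_execution_time < @WindowEnd;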

I’m a big fan of storing all datetimes as the UTC version. It avoids things going out of sequence due to daylight saving changes – and can be helpful in mitigating problems when you have application users in multiple countries accessing the same database.

I’ll admit I might be biased though as – being based in the UK – UTC is the same as my local time half the year.

Find queries with high memory grants – using Query Store

I had a server that looked like it had been suffering from memory contention. I wanted to see what queries were being run that had high memory requirements. The problem was that it wasn’t happening right now – I needed to be able to see what had happened over the last 24 hours.

Enter Query Store. The runtime stats it captures include details relating to memory.

I decided to use the column max_query_max_used_memory from sys.query_store_runtime_stats. In Books Online this is defined as:

Maximum memory grant (reported as the number of 8 KB pages) for the query plan within the aggregation interval.

I took my script from the following post https://matthewmcgiffen.com/2017/11/01/capture-the-most-expensive-queries-across-your-sql-server-using-query-store/ and modified it to look just at this metric.

Here’s the script. It collates figures across all databases that have Query Store enabled and returns the top 50 queries with the highest memory grants. As the metric is reported in 8 KB pages, the script divides by 128 to convert it to MB. It looks over the last 24 hours, but you can easily modify that to cover any interval you are interested in:

--Gather and report on most memory hungry queries
DECLARE @Reportinginterval int;
DECLARE @Database sysname;
DECLARE @StartDateText varchar(30);
DECLARE @SQL varchar(MAX);

--Set Reporting interval in days
SET @Reportinginterval = 1;

SET @StartDateText = CAST(DATEADD(DAY, -@Reportinginterval, GETUTCDATE()) AS varchar(30));

--Cursor to step through the databases
DECLARE curDatabases CURSOR FAST_FORWARD FOR
SELECT [name]
FROM sys.databases 
WHERE is_query_store_on = 1
  AND state_desc = 'ONLINE';

--Temp table to store the results
DROP TABLE IF EXISTS #Stats;
CREATE TABLE #Stats (
   DatabaseName sysname,
   SchemaName sysname NULL,
   ObjectName sysname NULL,
   QueryText varchar(1000),
   MaxMemoryGrantMB decimal(20,3)
);

OPEN curDatabases;
FETCH NEXT FROM curDatabases INTO @Database;

--Loop through the databases and gather the stats
WHILE @@FETCH_STATUS = 0
BEGIN
    
    SET @SQL = '
	   USE [' + @Database + ']
	   INSERT INTO #Stats
		SELECT
			DB_NAME(),
			s.name AS SchemaName,
			o.name AS ObjectName,
			SUBSTRING(t.query_sql_text,1,1000) AS QueryText,
			(MAX(rs.max_query_max_used_memory)/128) AS MaxMemoryMB
		FROM sys.query_store_query q
		INNER JOIN sys.query_store_query_text t
			ON q.query_text_id = t.query_text_id
		INNER JOIN sys.query_store_plan p
			ON q.query_id = p.query_id
		INNER JOIN sys.query_store_runtime_stats rs
			ON p.plan_id = rs.plan_id
		INNER JOIN sys.query_store_runtime_stats_interval rsi
			ON rs.runtime_stats_interval_id = rsi.runtime_stats_interval_id
		LEFT JOIN sys.objects o
			ON q.OBJECT_ID = o.OBJECT_ID
		LEFT JOIN sys.schemas s
			ON o.schema_id = s.schema_id     
		WHERE rsi.start_time > ''' + @StartDateText + '''
		GROUP BY s.name, o.name, SUBSTRING(t.query_sql_text,1,1000)
		OPTION(RECOMPILE);'

    EXEC (@SQL);

    FETCH NEXT FROM curDatabases INTO @Database;
END;

CLOSE curDatabases;
DEALLOCATE curDatabases;

--Report Results
SELECT TOP 50
	DatabaseName,
	SchemaName,
	ObjectName,
	QueryText,
	MaxMemoryGrantMB
FROM #Stats
WHERE QueryText not like 'INSERT INTO #Stats%' --Exclude current query
ORDER BY MaxMemoryGrantMB DESC;

DROP TABLE #Stats;

Adding a Column with a Default Constraint

Last week a question came up about adding a column to a table and giving that column a default constraint. Would the default value be assigned to all existing rows, and how much processing would be involved?

Unsurprisingly, the answer is that – “it depends”.

I’ve got a table with about a million rows that just has an identity column and a text column I’ve populated from sys.objects:

CREATE TABLE dbo.TestAddColumn (Id int IDENTITY(1,1), TextValue sysname);
INSERT INTO dbo.TestAddColumn(TextValue)
SELECT a.name FROM sys.objects a, sys.objects b, sys.objects c;

Let’s add a nullable bit column and give it a default value of zero:

ALTER TABLE dbo.TestAddColumn
ADD NewFlag bit NULL CONSTRAINT DF_TestAddColumn_NewFlag DEFAULT 0;

If I look at the table I can see that the new column contains only NULL values – i.e. the default value I’ve defined hasn’t been assigned to existing rows.
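A quick way to check is to group on the new column and count the rows:

--How many rows picked up the default and how many are NULL?
SELECT NewFlag, COUNT(*) AS NumberOfRows
FROM dbo.TestAddColumn
GROUP BY NewFlag;

The same query works for each of the checks that follow.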

I remove the column and the default constraint:

ALTER TABLE dbo.TestAddColumn
DROP CONSTRAINT DF_TestAddColumn_NewFlag;
ALTER TABLE dbo.TestAddColumn DROP COLUMN NewFlag;

Now let’s add the same column, but this time we’ll disallow NULL values:

ALTER TABLE dbo.TestAddColumn
ADD NewFlag bit NOT NULL CONSTRAINT DF_TestAddColumn_NewFlag DEFAULT 0;

If we check the table again, this time we can see that the default value has been assigned.

So whether the default value gets assigned to existing rows depends on whether the column is nullable: a nullable column will retain NULL as the value, while a non-nullable column will be assigned the new default.

If you want to override that behaviour, and have your default assigned even where the column is nullable, you can use the WITH VALUES clause. First I remove the constraint and column again, then add them back using WITH VALUES:

ALTER TABLE dbo.TestAddColumn
ADD NewFlag bit NULL CONSTRAINT DF_TestAddColumn_NewFlag DEFAULT 0 WITH VALUES;

If we look at the data again, we can see that the value has been assigned even though the column is nullable.

One neat thing to note is the performance impact when carrying out these actions.

Each time I added the column I captured the execution overhead using:

SET STATISTICS IO, TIME ON;

In all cases the resource usage measured was zero. The add column operation was a metadata-only change – no data in the table needed to be updated, even where the new column was assigned a value.

This was some clever jiggery-pokery added in SQL Server 2012.

Thanks for reading!

T-SQL Tuesday #114 Puzzle Party – Roundup

Here’s my round-up for this month’s T-SQL Tuesday.

Thanks to everyone who contributed last week. It was great reading your posts and seeing the different ways you interpreted the puzzle theme.

We had real-life problems, we had SQL coding questions, we had puzzles, we had solutions, we had games, and we had the imaginarium.

Hopefully I’ve included all the posts – if I’ve missed any, please yell and I’ll correct it as soon as possible!

Without further ado, here are the contributions…

Real-life problems

Michal Poreba presents us with a real-life performance tuning issue he faced. Can you solve it? https://dbain.wales/2019/05/14/the-sql-puzzle-party/

Kevin Chant shares an issue he faced with a SQL Agent job using PowerShell. Can you find another way of fixing it? https://www.kevinrchant.com/2019/05/14/t-sql-tuesday-114-powershell-puzzle/

Eugene Meidinger shows a real-life SQL puzzle from his own demos with heaps that he’d love an answer to… https://www.sqlgene.com/2019/05/14/t-sql-tuesday-114-an-unsolved-sql-puzzle/

Coding Questions

Kathi Kellenberger tells us about a deceptively tricky SQL question posed at a family party… https://auntkathisql.com/2019/05/14/may-2019-t-sql-tuesday-puzzle/

Jon Shaulis gives us a problem he was posed in a recruitment scenario, along with a detailed solution that demonstrates how he approaches finding the best solution… https://jonshaulis.com/index.php/2019/05/14/t-sql-tuesday-114-puzzle-party/

Kenneth Fisher asks us to solve some query problems using older versions of SQL Server – so we can appreciate just how much easier our lives are these days… https://sqlstudies.com/2019/05/14/how-was-it-done-back-in-the-day-t-sql-tuesday-114/

Games and Puzzles

Nate Johnson challenges us to write a SQL solution to help evaluating your hand in the card game cribbage… https://natethedba.wordpress.com/2019/05/14/t-sql-tuesday-114-a-puzzle/

Jason Brimhall supplies his solution for solving Sudoku puzzles using T-SQL as well as details of a database he uses to provide daily trivia questions to those he mentors in learning SQL… http://jasonbrimhall.info/2019/05/14/puzzles-and-daily-trivia/

Shane O’Neill takes us on a Loller-Coaster ride… https://nocolumnname.blog/2019/05/14/t-sql-tuesday-114-puzzle-party/

Steve Jones shares some puzzle resources from SQL Server Central and other sources as well as presenting his T-SQL solution to a Christmas coding question… https://voiceofthedba.com/2019/05/14/puzzles-t-sql-tuesday-114/

Is Bert Wagner a video star? The magic 8-ball says “Without a doubt!” https://bertwagner.com/2019/05/14/sql-server-magic-8-ball/

Rob Farley presents puzzles he’s encountered and played with before and reminds us to never stop puzzling if we want to keep our minds sharp! http://blogs.lobsterpot.com.au/2019/05/14/puzzling-times/

The Imaginarium

Finally, in a category on his own, Todd Kleinhans talks about his SQL Imaginarium project. A VR experience for people new to SQL Server: https://toddkleinhans.wordpress.com/2019/05/14/project-sqlimaginarium-1-year-update/

Thanks again everyone and I look forward to next month’s episode of T-SQL Tuesday!

T-SQL Tuesday #114 – Puzzle Party

A few years back I started running regular SQL workshops in my workplace – teaching beginners the basics of querying databases with SQL, as well as covering more advanced topics for those with more experience.

During one session we were discussing the issue of knowledge acquired being quickly lost when people didn’t get the chance to regularly practice what they’d learnt. One of the attendees suggested that I should be assigning them homework.

I could see from the faces of everyone else present that the word “homework” struck an unpleasant chord, perhaps reminding them of school days spent struggling to get boring bookwork done when they’d rather be relaxing or playing.

Okay, so homework maybe wasn’t going to go down well, but I figured everyone likes a good puzzle. So every Friday I started creating and sharing a puzzle to be solved using SQL. This went on for the best part of a year, then other things got in the way and gradually I stopped.

This is my invitation to you this T-SQL Tuesday. Write a blog post combining puzzles and T-SQL. There are quite a few ways you could approach this, so hopefully no one need be left out for lack of ideas:

  • Present a puzzle to be solved in SQL and challenge your readers to solve it.
  • Or give us a puzzle or quiz about SQL or databases.
  • Show the SQL solution to a classic puzzle or game.
  • Provide a method for solving a classic sort of querying puzzle people face.
  • Show how newer features in SQL can be used to solve old puzzles in new ways.
  • Tell us about a time you solved a problem or overcame a technical challenge that was a real puzzle.
  • Or just make your own interpretation of “puzzle” and go for it!

There’s some great stuff out there already. Itzik Ben-Gan’s done a bunch of them. There are Kenneth Fisher’s crosswords, the SQL Server Central questions of the day, and Pinal Dave’s SQL puzzles. And there are a few on my blog too if you take a look back:

https://matthewmcgiffen.com/2017/06/07/sql-puzzle-1-magic-squares/

https://matthewmcgiffen.com/2017/07/19/sql-puzzle-2-eight-queens/

https://matthewmcgiffen.com/2017/08/22/sql-puzzle-3-knights-and-queens/

https://matthewmcgiffen.com/2017/11/08/sql-puzzle-4-the-beale-papers/

https://matthewmcgiffen.com/2017/12/19/sql-puzzle-5-prime-magic/

Let’s puzzle together, trying to solve the challenges each other sets, and make it a real puzzle party!

Have fun all 🙂

The Rules

  1. Your post must be published on Tuesday 14th May. This counts as long as it’s still Tuesday anywhere in the world.
  2. Include the T-SQL Tuesday Logo and make it link to this invitation post.
  3. Pingbacks should work, but to be sure, add a comment to this post with a link to your own so I know where to find it.
  4. Tweet about your post using the #tsql2sday hashtag.