This isn’t a technical post about databases, but rather a discussion of a statistical paradox that I read about recently. Statistics and data often go hand in hand, and many of us who work with data often use statistics in our work – particularly if we cross over into BI, Machine Learning or Data Science.
So, let’s state the problem.
I think I have an average number of friends, but it seems like most people have more friends than me.
I can see this on facebook – even though facebook friendship isn’t the same thing as the real kind. A quick google tells me that the average number of facebook friends is 338 and the median is 200. My number is 238. That’s less than average but greater than the median – so according to those figures I do have more friends than most people (on facebook at least).
I could probably do with culling some of those though…
I’m going to pick ten of those facebook friends at random. I use a random number generator to pick a number from 1 to 20 and I’m going to pick that friend and the 20th friend after that, repeating until I get 10 people.
I get the number 15 to start so let’s start gathering some data. How many friends does my 15th friend have, my 35th, my 55th, 75th, 95th 115th etc.
Here’s the numbers sorted lowest to highest:
If I look at that list only two people have less friends than me – or 20%. 80% have more friends than me. Why do I suddenly feel lonely \ How can this be?
If I have more friends than the median, then I should be in the second half.
Friendship is based on random accidents, and I’ve picked people out of my friends list at random. Surely I’ve made a mistake.
I could repeat the sampling but I’m likely to get similar findings.
The answer is that it’s not all random, or at least not evenly so. The person with over 4,000 friends is over 40 times as likely to know me as the person with less than 100. Maybe they get out a lot more (actually they’re a musician).
I’m more likely to know people if they have a lot of friends than if they have fewer.
This is difficult to get your head round, but it’s important if you’re ever in the business of making inferences from sampled data. It’s called “The Inspection Paradox”.
It’s also one of many reasons why people may get feelings of inferiority from looking at social media.
You can find a lot more examples and explanations in this post:
A few years back I started running regular SQL workshops in my workplace. Teaching beginners the basics of querying databases with SQL, as well as more advanced topics for the more advanced.
During one session we were discussing the issue of knowledge acquired being quickly lost when people didn’t get the chance to regularly practice what they’d learnt. One of the attendees suggested that I should be assigning them homework.
I could see from the faces of everyone else present that the word “homework” struck an unpleasant chord. Perhaps reminding them of school days struggling to get boring bookwork done when they’d rather be at relaxation or play.
Okay, so homework maybe wasn’t going to go down well, but I figured everyone likes a good puzzle. So every Friday I started creating and sharing a puzzle to be solved using SQL. This went on for the best part of a year, then other things got in the way and gradually I stopped.
This is my invitation to you this T-SQL Tuesday. Write a blog post combining puzzles and T-SQL. There’s quite a few ways you could approach this, so hopefully no-one needs be left out for lack of ideas:
Present a puzzle to be solved in SQL and challenge your readers to solve it.
Or give us a puzzle or quiz about SQL or databases.
Show the SQL solution to a classic puzzle or game.
Provide a method for solving a classic sort of querying puzzle people face.
Show how newer features in SQL can be used to solve old puzzles in new ways.
Tell us about a time you solved a problem or overcame a technical challenge that was a real puzzle.
Or just make your own interpretation of “puzzle” and go for it!
There’s some great stuff out there already. Itzik Ben-Gan’s done a bunch of them. There’s Kenneth Fisher’s crosswords. The SQL Server Central questions of the day. Pinal Dave’s SQL Puzzles. And there’s a few on my blog too if you take a look back:
As a quick reminder, when you have a 3 x 3 grid, a magic square is one where each row, column or diagonal add up to the same value. e.g.
All the rows, all the columns and both diagonals each add up to 15.
In the previous puzzle I asked you to write SQL to find all the possible 3 x 3 magic squares. After originally coming across that idea I discovered all sorts of variants on the magic square theme.
This time, I’ll ask you to find write SQL to find just one 3 x 3 magic square, but in this case one composed only of prime numbers? They don’t have to be consecutive primes (though they do all have to be different), and to make things easier you can treat the number 1 as a prime if you want.
I’ve not done a SQL puzzle for a while so thought it was getting overdue…
I set this one for my workmates a while ago and people found it quite fun.
The Beale Papers are a set of three ciphertexts which allegedly reveal the details of a buried treasure. The tale is that the treasure was buried by Thomas J. Beale in Bedford County, Virginia, USA back in the 1820s. He entrusted a box containing the ciphered messages to the care of a local innkeeper and then disappeared never to be seen again – his intent was that the papers could be passed on to his descendants and they could decode the messages and claim their rightful inheritance. The innkeeper passed the ciphertexts on to a friend before he died, and said friend spent the rest of his life trying to decipher them. He succeeded on the second message but failed on the others.
Since the 19th century many attempts have been made to decipher the other messages and find the treasure. Even without cracking the code, treasure hunters have got out their shovels over the years to dig up parts of Bedford County without finding anything (as far as we know). Recently there are some claims to have solved the puzzle, but that the treasure was already gone.Of course there’s a good chance the whole thing is a hoax!
Your task is to repeat the deciphering of the second message, but to make it easier I’ll tell you the key.
Beale used an existing text as a key for the cipher in which he assigned a number to each word, 1 for the first word, two for the second and so on.
Then for each letter in the text he wanted to encode he randomly picked a word in the key that started with that letter, and then he enciphered the letter as the number for that word. Make sense?
For instance, if the key text was “now is the time” and the plain text is “tin”, then either (3 2 1) or (4 2 1) are valid encipherments.
The key text Beale used for encoding the second message was the United States declaration of independence. I’m going to leave you to source your own copy of this. Be aware that some of the real world issues with this task that you may come up include the fact that Beale may have made some inaccuracies in his ciphering, and the copy of the key text he used to code his text is unlikely to be exactly the same as the one you use. As such you may need to do some tweaking, spelling correction and perhaps logic to work out what some of the words are. Good luck if you choose to attempt this!
So, starting with the two variables below, write some SQL to decode the message (@CipherText ):
Given the standard 8×8 chessboard, place an equal number of Knights and Queens such that no piece is attacked by another. What’s the maximum number of Knights and Queen’s you can place? Obviously you should use SQL to find the answer!
I’m sure you all know how a knight moves and attacks in chess, but just in case here’s a diagram:
The white circles show each possible move for the knight, i.e. it can move two squares in one direction and then 1 square perpendicular to that (or vice-versa). An “L” shape.
This puzzle was first proposed in 1848 by a composer of chess puzzles called Max Bezzel and has since spawned much analysis and many variants. Simply phrased, it goes as follows:
“Can you place 8 queen’s on a standard (8×8) chessboard so that no two queen’s threaten each other?”
(Just in case, I’ll remind you a queen can move and attack any number of squares in a straight line – horizontally, vertically, or diagonally)
Here’s one example solution:
By all means have some fun trying to find extra solutions to this with pencil and paper or chessboard, however we do have computers (and SQL!) these days, so this month’s challenge is to try find ALL the possible solutions to this puzzle using T-SQL.
I’m a bit of a closet chess enthusiast, so expect more in this theme!
This is the first in what I hope will be a semi-regular series of recreational puzzles where SQL can be used to find the answer. I set these puzzles on occasional Fridays in my workplace, and as I now have quite an archive I thought I should start sharing them with a wider audience.
I came up with the idea for this puzzle involving Magic Squares by accident while trying to do something different – and getting it wrong! For a moment I thought I’d had an original idea but then it was “Hang on – I’m sure I must have seen this before.”
A bit of googling and that thought was proved right – I’m just not that original. In fact the first documented instance of these appears to be the Chinese legend of Lo Shu – possibly dating back as far as 650BC. I was only beaten by the best part of three millennia…
You may have seen these before at some point – but hopefully long enough ago that you don’t know all the answers! Possibly like me they ring a bell for you but you can’t remember exactly where from.
A Magic Square is a square grid e.g. 1×1, 2×2, 3×4, 4×4…. or n x n, where numbers are placed in each square in the grid such that if you add up the numbers in each column, row, or diagonally they all add up to the same total.
Usually you use consecutive numbers starting from 1 (placing each number just once), so in the 2 x 2 grid you would place the numbers 1 to 4, in the 3 x 3 the numbers 1 to 9 – and so on.
Here’s the 1 x1 grid:
I thought I’d be generous and give you that one for free 😉
So my first question is can you find a solution (with the numbers 1 to 4) to the 2 x 2 grid:
Having explored that possibility (not for too long I hope!) the meat of this puzzle is to find solutions to the 3 x 3 grid using the numbers 1 to 9:
In fact, using SQL, can you not just find one solution, but all the possible solutions? How many are there?