No, the answer isn’t because you’re a DBA.
This isn’t a technical post about databases, but rather a discussion of a statistical paradox that I read about recently. Statistics and data often go hand in hand, and many of us who work with data often use statistics in our work – particularly if we cross over into BI, Machine Learning or Data Science.
So, let’s state the problem.
I think I have an average number of friends, but it seems like most people have more friends than me.
I can see this on facebook – even though facebook friendship isn’t the same thing as the real kind. A quick google tells me that the average number of facebook friends is 338 and the median is 200. My number is 238. That’s less than average but greater than the median – so according to those figures I do have more friends than most people (on facebook at least).
I could probably do with culling some of those though…
I’m going to pick ten of those facebook friends at random. I use a random number generator to pick a number from 1 to 20 and I’m going to pick that friend and the 20th friend after that, repeating until I get 10 people.
I get the number 15 to start so let’s start gathering some data. How many friends does my 15th friend have, my 35th, my 55th, 75th, 95th 115th etc.
Here’s the numbers sorted lowest to highest:
If I look at that list only two people have less friends than me – or 20%. 80% have more friends than me. Why do I suddenly feel lonely \ How can this be?
If I have more friends than the median, then I should be in the second half.
Friendship is based on random accidents, and I’ve picked people out of my friends list at random. Surely I’ve made a mistake.
I could repeat the sampling but I’m likely to get similar findings.
The answer is that it’s not all random, or at least not evenly so. The person with over 4,000 friends is over 40 times as likely to know me as the person with less than 100. Maybe they get out a lot more (actually they’re a musician).
I’m more likely to know people if they have a lot of friends than if they have fewer.
This is difficult to get your head round, but it’s important if you’re ever in the business of making inferences from sampled data. It’s called “The Inspection Paradox”.
It’s also one of many reasons why people may get feelings of inferiority from looking at social media.
You can find a lot more examples and explanations in this post: