If it looks random, it probably isn't
Publication date: 13 November 2012
Originally published 2012 in Atomic: Maximum Power Computing
Last modified 13-Nov-2012.
Are you ready for another episode of Fun With Conditional Probability? Of course you are!
[See also: Probability and hard-drive failures here and here, probability as it applies to game-NPC dialogue and to second-hand smoke statistics, and this piece on transitive and nontransitive relationships. Pay attention, I'll be asking questions later.]
Suppose that there is some event that has the same chance of happening in any given period of time. Say, for instance, that it's a lightning strike near enough to your house to fry most or all of your electronics. Let's make the "given period of time" a day. Let's say the chance of a lightning strike on any given day is one in ten thousand, ten thousand days being 27.4 years. And, for the sake of simplicity, let's say that more than one strike in a day is impossible.
Now wait until lightning actually strikes. (You may be waiting decades.)
Now, what is the most likely next day when it will strike?
The obvious answer to this question is "there is no most likely day; the chance of a strike on a given day is 1/10,000."
That answer is wrong. The most probable day for the next strike is tomorrow.
I have the great advantage, in writing this, of just being able to assure you that this is the case and not having to try to explain it in eight different ways. But here's a basic way in:
How likely is it that the next strike will be ten million years from now?
Not very, obviously. With an average wait of 27.4 years, you'd be mildly surprised if the next strike took as long as fifty years to occur. It would be utterly astonishing if lightning failed to strike for millions of years, when the daily probability of a strike is one in ten thousand.
This is where the "conditional probability" thing kicks in. The "condition" required for the next strike to be ten million years from now is that lightning must not strike on any of the days in between. The chance of lightning not striking on any given day is 1 minus the chance that it will, which in this case means 1 minus 0.0001. That gives a 0.9999 probability, in the standard statistical form where 0 means impossibility and 1 means certainty.
On any given day, a 0.9999 probability of no lightning strike is very close to certainty. But if you look at a very long run of days, it becomes close to certain that the 0.0001-probability event will happen long, long before you've made it to even one million days, let alone ten million years.
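To put numbers on that long run of days, here is a minimal Python sketch. It assumes, as the setup above does, that every day is independent and has the same 0.0001 chance of a strike; the function name chance_of_no_strike is just an illustrative label.

```python
# Assumed setup from the text: each day independently has a 1-in-10,000
# chance of a strike, so each day has a 0.9999 chance of no strike.
P_STRIKE = 0.0001


def chance_of_no_strike(days: int, p_strike: float = P_STRIKE) -> float:
    """Probability that lightning fails to strike on every one of `days` days."""
    return (1.0 - p_strike) ** days


if __name__ == "__main__":
    for label, days in [
        ("one day", 1),
        ("one year", 365),
        ("10,000 days", 10_000),
        ("fifty years", 50 * 365),
        ("one million days", 1_000_000),
    ]:
        print(f"{label:>17}: {chance_of_no_strike(days):.3g}")
```

After 10,000 days the chance of still having seen no strike is already down to about 0.37, and by one million days it is so small that it has more than forty zeroes after the decimal point.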
This same conditional-probability argument, though, also applies to the day after tomorrow.
There's a 0.0001 chance of lightning tomorrow. But for the next strike to be the day after tomorrow, lightning must strike on that day and not strike tomorrow. So the probability becomes 0.9999 for no strike tomorrow, times 0.0001 for a strike the next day. Which is 0.00009999.
This is only very slightly less than 0.0001, but it is less. The probability of the next strike - not any strike, but the next strike - occurring the day after tomorrow is thus very slightly lower than the probability of the next strike being tomorrow.
And the further you go into the future, the smaller the number gets. In a week it's about 0.00009994, in a year it's about 0.00009643, in ten years it's down to about 0.00006943 - until it becomes ridiculously small, millions of years in the future.
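The figures above all come from one rule: for the next strike to land on day n, the n - 1 days before it must be strike-free, and then the strike must happen. A short Python sketch of that rule follows, with tomorrow counted as day 1. The name chance_next_strike_on is made up for illustration, and "a year" and "ten years" are taken as 365 and 3,650 days, which is an assumption about how the figures were rounded.

```python
P_STRIKE = 0.0001  # daily strike chance assumed in the text


def chance_next_strike_on(day: int, p_strike: float = P_STRIKE) -> float:
    """Probability that the NEXT strike lands exactly `day` days from now.

    Day 1 is tomorrow. The first `day - 1` days must all be strike-free,
    and then the strike must happen on the day itself.
    """
    if day < 1:
        raise ValueError("day must be 1 (tomorrow) or later")
    return (1.0 - p_strike) ** (day - 1) * p_strike


if __name__ == "__main__":
    for label, day in [
        ("tomorrow", 1),
        ("day after", 2),
        ("a week", 7),
        ("a year", 365),
        ("ten years", 3650),
    ]:
        print(f"{label:>10}: {chance_next_strike_on(day):.8f}")
```

Statisticians call this shape a geometric distribution. Every later day multiplies the previous day's figure by 0.9999, so the number can only ever shrink.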
So the most likely day for the next lightning strike - whether or not it actually even struck today - is tomorrow. It's only a tiny bit more likely that the next strike will be tomorrow than that it will be the next day, but it is more likely.
At this point you may be wondering why I'm injuring your brain with this stuff. It's because this is a really important thing you need to know about the world. This statistical bias for chance events to happen closer to each other than seems intuitively likely means that all sorts of chance phenomena have "clusters" that people naturally think don't look very random at all.
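One way to see the clustering is to simulate a long series of strikes and look at the gaps between them. The sketch below is only an illustration under the same assumptions as before (independent days, 0.0001 chance each). It uses Python's built-in pseudo-random generator, and it draws each gap directly from the geometric distribution instead of stepping through every day, which is a shortcut and not anything the lightning does. The function names and the seed are arbitrary.

```python
import math
import random
import statistics

P_STRIKE = 0.0001  # daily strike chance assumed in the text


def sample_gap(rng: random.Random, p_strike: float = P_STRIKE) -> int:
    """Draw the number of days until the next strike (1 = tomorrow).

    Inverse-transform sampling of the geometric distribution: same result
    as rolling the dice every day, but much faster.
    """
    u = 1.0 - rng.random()  # uniform on (0, 1], so log(u) is defined
    return int(math.log(u) / math.log(1.0 - p_strike)) + 1


def simulate_gaps(n_strikes: int, seed: int = 2012) -> list[int]:
    """Gaps, in days, between `n_strikes` consecutive simulated strikes."""
    rng = random.Random(seed)
    return [sample_gap(rng) for _ in range(n_strikes)]


if __name__ == "__main__":
    gaps = simulate_gaps(100_000)
    mean_gap = sum(gaps) / len(gaps)
    share_short = sum(1 for g in gaps if g < 10_000) / len(gaps)
    print(f"average gap:            {mean_gap:,.0f} days")
    print(f"median gap:             {statistics.median(gaps):,.0f} days")
    print(f"gaps under 10,000 days: {share_short:.1%}")
```

The average gap comes out near 10,000 days, as it should, but roughly 63% of the gaps are shorter than that average, and the median gap is only about 19 years. Lots of short gaps, balanced out by a few very long ones, is exactly what a "cluster" looks like.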
We are surrounded at all times by things that have a somewhat random distribution in space and/or time. Computer hardware failures. Car crashes. Disease outbreaks. The distribution of stars in the sky. Individual kills, and personal and team victories, in all sorts of games, sports and real-world wars.
None of these things are entirely random - actually achieving true, robust randomness is surprisingly difficult. But all of them have a chance component. And the stronger that chance component is, the more clusters you'll see, and the easier it'll be to incorrectly attribute those clusters to some non-chance phenomenon.
"This "xxx_SupaFly69_xxx" dude's fragged me three times in a row! He must be stalking me!"
"Stars seem to clump together into constellations, rather than appear in a more evenly "random" scattering across the sky! That must mean something!"
"I've told the dealer to hit me on the last three hands, and each time that took me straight to 21! Better raise my bets, I've got hot hands tonight!"
(Note: This is exactly what you should do, but only if you're playing New Vegas and have a really high Luck.)
"My usually-hopeless favourite football team just had a four-match winning streak! Clearly their luck has turned around!"
"The average incidence of autism in the Western world is about six cases per thousand children, but this little town of 1,037 people has seventeen cases! This cannot possibly just be a coincidence!"
"That fellow charting where the V-1s and V-2s land on London says they're just following a "Poisson distribution", whatever that is, but if he knows where they're going to land, why won't he tell us?!"
"In the six hours I've spent peering at this roulette table and making notes, I'm pretty darn sure I've discovered some patterns!"
(And yes, in the past, you actually might have done. Today, though, not so much. Casinos love note-takers, because they usually sooner or later come up with a bold new gambling system, or just reinvent an old one, and then lose all of their money.)
I just went to random.org, which delivers high-quality random numbers, created from atmospheric radio noise, and asked it for eight random bytes, expressed in binary as 64 zeroes and ones.
What'd I get?
1111000010101110011001000100100111101111010011111011100010100010, that's what.
The first byte is 11110000. Doesn't look very damn random, does it? Byte five is 11101111. There are five runs of three, four runs of four and one run of five repeated digits in just these 64 bits.
I did it again. This time, I got 1111110111101111010000001011010110111111101001110010100001010000. Only one zero in the first byte. One run of three, four runs of four, two runs of six and one run of seven repeated bits, for pity's sake.
But these strings of bits really are robustly random. Random.org is not running some perverse scam.
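Counting runs by eye is error-prone, so here is a small Python sketch that tallies them for the two strings above. It also estimates how often 64 random bits contain a long run at all. That second part uses Python's own pseudo-random generator as a stand-in, not random.org, and the function names are illustrative.

```python
import random
from collections import Counter
from itertools import groupby

# The two 64-bit strings quoted in the text.
FIRST = "1111000010101110011001000100100111101111010011111011100010100010"
SECOND = "1111110111101111010000001011010110111111101001110010100001010000"


def run_lengths(bits: str) -> list[int]:
    """Length of each maximal run of identical characters, in order."""
    return [len(list(group)) for _, group in groupby(bits)]


def long_run_counts(bits: str, minimum: int = 3) -> dict[int, int]:
    """How many runs there are of each length `minimum` or longer."""
    counts = Counter(n for n in run_lengths(bits) if n >= minimum)
    return dict(sorted(counts.items()))


def share_with_long_run(min_run: int, trials: int = 10_000, seed: int = 64) -> float:
    """Estimated share of random 64-bit strings with a run of `min_run` or more."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        bits = format(rng.getrandbits(64), "064b")  # zero-padded to 64 digits
        if max(run_lengths(bits)) >= min_run:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    print("first string: ", long_run_counts(FIRST))
    print("second string:", long_run_counts(SECOND))
    for n in (5, 6, 7):
        print(f"share of 64-bit strings with a run of {n}+: {share_with_long_run(n):.0%}")
```

The tallies match the counts given above, and the simulation suggests that a run of five or more identical bits is the normal state of affairs in 64 random bits, not a warning sign.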
Ask random.org to pick numbers from one to ten and, over time, you'll get every number about the same number of times (though not exactly the same number of times, any more than it's reasonable to expect 100 tosses of a coin to give you exactly fifty heads and fifty tails).
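The coin-toss aside is easy to check exactly. This sketch assumes a fair coin and independent tosses, and simply counts the ways of getting a given number of heads; the function names are made up for the example.

```python
import math


def chance_of_exact_heads(heads: int, tosses: int) -> float:
    """Probability of exactly `heads` heads in `tosses` fair coin tosses."""
    return math.comb(tosses, heads) / 2 ** tosses


def chance_of_heads_between(low: int, high: int, tosses: int) -> float:
    """Probability that the number of heads is from `low` to `high` inclusive."""
    return sum(chance_of_exact_heads(k, tosses) for k in range(low, high + 1))


if __name__ == "__main__":
    print(f"exactly 50 heads in 100: {chance_of_exact_heads(50, 100):.4f}")
    print(f"40 to 60 heads in 100:   {chance_of_heads_between(40, 60, 100):.4f}")
```

A perfect 50-50 split turns up less than 8% of the time. "About the same" is the most that a fair coin, or a fair number generator, can promise.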
Ask people to choose a number from one to ten, though, and almost no-one will choose one or ten, especially if you specifically tell them to pick a random number. Of the other numbers, humans (in Western nations at least) have a tendency to pick seven.
(If asked for a "random" number from 1 to 20, a surprising number of people will pick 17.)
If your random-number generator gives you nothing but nines then, yes, there probably is something wrong with it. But clusters that seem far too common to the human mind are actually an indicator that something really is random.
So the next time a boss drops crap loot in four consecutive raids, or three of your friends all have their hard drive fail in the same week, bear in mind that this certainly may indicate something fishy is going on. But it can just as easily be a complete fluke.
And now you can prove that mathematically!