Random... ish... numbers

Publication date: 20 November 2013
Originally published 2013, in PC & Tech Authority
(in which Atomic magazine is now a section)
Last modified 27-Aug-2015.

Computers have problems with randomness.

In the real world, making random numbers is easy. Toss coins, roll dice, point a video camera at a lava lamp and grab low digits from the video stream. The real-world problem is often keeping randomness out of things.

Your standard personal computer's logic circuitry is completely deterministic, though. The same inputs will always give exactly the same output. So the best most computers can do when they need some random numbers is take some arbitrary number, like (to simplify) the number of milliseconds since midnight, and use that as the "seed" for one of several "pseudorandom number generators", or PRNGs.

The output of a PRNG is more than random enough for most of your random-number needs. Resolving fights in games, say, or encrypting everyday voice-over-IP phone calls, or online banking for us non-billionaires.

But perhaps you want your business or political or military secrets, or your online backups, locked up tight. Or you're in the "import-export" business and would like your communications with your "associates" to not be accessible to the "police" who are very interested in your numerous bags of "baby powder". Now you probably need real randomness to generate encryption keys and do other important work.

Cryptology is a very complex and subtle field. "Almost random" numbers from PRNGs can leave security holes big enough for an attacker to drive a metaphorical ocean liner through.

If, for instance, a PRNG is only seeded with the number of milliseconds since midnight, then since there are 86.4 million milliseconds in a day, that's the maximum number of distinct outputs the PRNG will ever produce. If an attacker knows this - or just has a hunch - and can try millions of keys per second, you're in trouble.

One way to solve this problem is to get the key from elsewhere. A password supplied by the user can do the job, here. If your password is "passw0rd" or "s3x" then an attacker will be able to decrypt your data almost instantly, but if your password is "green fish fly on Sundays" or "linglebingleschawingle", he's screwed.

Few personal computers have a proper built-in hardware random number generator, but now and then a consumer CPU or motherboard chipset comes along that has one. They work by combining multiple oscillators that run at different speeds, or looking at electrical noise from a hot component.

(And they may, or may not, have been compromised by the NSA.)

The /dev/random device in the various Unicies can gather randomness in various ways; in Linux it generally uses the timing of unpredictable system events and user activity on the keyboard and/or mouse, which works quite well provided the computer in question has plenty of unpredictable events and input. There are many other such dubious, unproven hacks that use standard computer hardware to create "real" randomness from system-event timing.

All of these things exist because robust random numbers are useful, and there are several tempting pretty-darn-random things in a PC. You could, for instance, use the least significant bits of the audio hardware's microphone input. Even if there isn't a microphone plugged in, there'll still be electrical noise and hum. Or you can explicitly ask the user to waggle the mouse around a bit and/or pound on the keyboard, and similarly sample that data.

Again, though, there are a lot of subtleties here. That audio hum, for instance, may produce predictably periodic output. And mouse input and audio hardware are not expected to be important to security, so those systems aren't hardened against attack. It wouldn't be difficult for a trivially tiny virus to greatly reduce the possible "random" outputs from this sort of number generator, or possibly to remove the randomness altogether and always output exactly the same data. Or just to send the keys to the attacker.

There are also free online random-number sources. Most of them are just Web interfaces to one or another PRNG, but some, like the legendary Lavarand, are proper random-number generators. Chief among those is random.org.

Random.org makes very-high-quality random numbers from atmospheric radio noise, and will package them however you like - d20 rolls for online games, long strings for encryption keys, blocks of raw bytes, you name it. (They stay in business by charging for a few things, like a phone app, a custom contest-draw service, and API for other programs to use.)

Random.org is acceptable for some industrial-strength encryption purposes, since you can access the site via strongly encrypted HTTPS to prevent Internet miscreants from sniffing the bits out of the network traffic going to your computer, and you probably don't need a higher data rate than a paid random.org account can give you. If random.org happens to be down or your Internet connection's out, though, it won't work, and browser spyware could also spoil your day if that's the interface you're using. (Or an attacker could physically steal the data from you, but it's a general axiom of computer security that if the enemy has physical access to your systems, you're probably boned no matter what.)

If you need more reliable randomness or just a lot of random bits per second, there are add-on cards with hardware random-number generators, including fancier versions like photoelectric doodads that give you a window into the guaranteed randomness of quantum physics. These can all be attacked by malicious software too, but at least they'd need attacks tailor-made to the card in question.

The only truly unbreakable encryption is that based on a "one-time pad". This pad was originally, and in some international-espionage situations still is, an actual paper pad, containing pages and pages of numbers that were generated in some genuinely random way. You need at least as much pad data as you have message data to encrypt, and you destroy the pad after you've used it once, hence the name.

To encrypt an alphabetical message with a one-time pad, you could have a pad with a sequence of random numbers from zero to 25 (at last, a use for my d30!), and just add the value of each successive number to the alphabetical location of each successive letter of the message, wrapping around at the ends so that "W" plus "11" will give "H". A receiver with an identical pad can subtract the same numbers from your encrypted text, and recover the message. A receiver without an identical pad cannot.

(You'll need more number range if you want spaces or punctuation encrypted too - using the abovementioned 30-sided die to create your random-number list would give you extra numbers for a space and three punctuation marks. In a simple example of the abovementioned cryptography subtleties, some information about your message can be gleaned by just looking at the lengths of words and sentences, if spaces and punctuation are not encrypted. Note also that if you re-use one-time pads, you give attackers another way in, by comparing multiple ciphertexts created with the same sequence of numbers. There are lots of real-world cryptographic attacks that only work if the attacker has large amounts of data all encrypted the same way, causing some kind of revealing repeating pattern. This is why "one-time" is right there in the name of the one-time pad.)

The only way to break proper one-time-pad encryption is by getting a copy of the pad. But, conversely, the only way to use one-time-pad encryption is to distribute pads, be they paper or digital, to everybody who needs to send or receive encrypted data. If you could safely re-use the pads then this would often be quite workable, but you can't.

So one-time pads were usable for, say, spies in occupied Europe in World War II, as long as they didn't want to send any more encrypted data than they had numbers on their little literal paper pads. This was perfectly workable, since you can encode quite a lot of messages along the lines of "GUD PUSH STH" with a rather tiny book of numbers. You can do a similar thing for digital communications today, by for instance making the "pad" a storage device with gigabytes or terabytes of random numbers on it.

For most real-world encryption tasks, though, one-time pads are useless. You might as well just use whatever secure technique you're forced to devise for distribution of the pads, and just use it to distribute the messages directly.

Over the years, many fools and mountebanks have claimed to have come up with some way to use one-time pads for practical encryption in the modern world. One popular claim is that they've devised a way to generate the one-time pads on the fly, and not have to distribute the pad data. This is mathematically impossible.

But, like many puzzles in the sciences, although it's impossible in theory, it's kind of possible in practice. If you cheat.

Pi, the ratio of a circle's circumference to its diameter, is an irrational number. This means it cannot be expressed as a fraction, so it cannot ever be precisely written down. No matter what base you use, any attempt at a precise quantification of pi will have infinitely many digits.

You don't need many digits of pi for most purposes. Engineers, including engineers making things that fly faster than sound or land on other planets, can often get away with just calling pi "22/7". (Since 22 on 7 is only about 1.0004 times pi's actual value, this is not actually terribly surprising.)

We've known since the eighteenth century that there are infinitely many digits in pi, and much more recently arduous computer analysis of those digits has shown that the sequence and distribution of pi's digits are, so far as anyone can determine, every bit as random as the output of a true random-number generator.

Whether pi actually contains all possible sequences of digits is at present unknown, but numerous sequences are trivially locatable at non-suspicious distances into pi's digits. My home phone number, for instance, occurs about 41 and a half million digits into pi; 8675309 occurs at position 9,202,591. And "12345678" appears 186.6 million digits in. This is all about what you'd expect from a truly random digit-string.

(We also know that irrational numbers by their nature not only never terminate, but also never settle to an infinite repeating sequence. So it's theoretically possible that pi eventually, after unimaginably many digits, ends up containing only ones and twos in random sequence, but it's definite that pi will never settle to, say, "...122122122122...", forever.)

There are plenty of other irrational numbers you could use instead of pi, too. There's "e", the base of the natural logarithm, for instance. More subtly, every natural number that isn't a perfect square has an irrational square root. So just mashing digits on a calculator then pressing the square-root button will usually give you some random-ish digits to play with.

You can use such a string of numbers to encrypt and decrypt things just as if the string of numbers were written down on a one-time pad. As long as your recipient knows how you're coming by the endless sort-of-random digits you used to encrypt your message, they can generate the same numbers themselves and decrypt it.

The problem with this is of course that none of these digit-strings are actually random, and they're accessible to any attacker in very short order. An entirely unremarkable PC running PiFast can now whip up the first million digits of pi in one second. (That vague shrieking sound you hear is coming from Alan Turing's ghost.)

This way of making "random" numbers is also about as well-known to working cryptologists as the Scholar's Mate is to professional chess players. So is every single brilliant super-secret way you have just come up with of conveying information about the number-string you're using to the recipient of your message. So don't expect this to keep your plot to kill the President of the USA secret for very long.

Actually, in that sort of thriller-novel situation, no encryption is particularly reliable. The involved bodies can just spirit you away to a dark hole in an unpleasant country and beat the encryption codes out of you. (Cryptologists refer to this as "rubber-hose cryptanalysis".)

Torture doesn't work to get the truth out of people in the classic "ticking bomb" situation - all they have to do then is lie about where the bomb is until it goes off, and each fresh lie will probably take you an unworkable length of time to expose. But rubber-hose techniques work pretty well to get crypto codes out of people, because you can have a computer to try the codes on sitting right there in the dungeon. You can leave the tortured guy bubbling blood in the corner while you calmly decrypt and read all of his stuff.

(Well, you can unless he used deniable encryption. In which case he's feverishly hoping you don't figure out that he used deniable encryption, because then you'll never know you're finished and may never finish torturing him for the key to the next layer of the cryptology onion, whether there is a next layer or not.)

For all these reasons, an encryption system based on secretly telling the recipient to start the decryption key at digit 382 of the square root of 31 is not really very useful. But I still find it delightful that there are completely deterministic number sequences which pass all the algorithmic tests for randomness.

If you don't, then I can only hope that you didn't read this far.