# Randomness out of nothingness

We all have an intuitive notion of randomness. Most people think they can produce a string of random numbers on demand. However, humans are notoriously bad at generating numbers at random (though see here for an exception). If you ask people to choose a random string of numbers, most will avoid repeating a number twice or thrice in a row. But doing so introduces subtle patterns and makes the sequence less than truly random (by random here I mean that the numbers should be chosen with uniform probability, and independently). If the shuffle function on your iPod resulted in a truly random sequence of songs, you would likely get more repeats than you expect – randomly generated events tend to clump.
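
The clumping claim is easy to check numerically. Here is a quick simulation; the 50-song library and 30-play listening session are made-up numbers, purely for illustration:

```python
import random

def has_back_to_back_repeat(n_songs, n_plays, rng):
    """True if a truly random shuffle (each song picked uniformly,
    with replacement) ever plays the same song twice in a row."""
    prev = None
    for _ in range(n_plays):
        song = rng.randrange(n_songs)
        if song == prev:
            return True
        prev = song
    return False

rng = random.Random(0)
trials = 10_000
hits = sum(has_back_to_back_repeat(50, 30, rng) for _ in range(trials))
rate = hits / trials
# Analytically, P(at least one back-to-back repeat) = 1 - (49/50)**29,
# which is about 0.443 -- far more often than intuition suggests.
```

Even with a 50-song library, nearly half of all 30-song sessions contain a back-to-back repeat.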

Randomness is a fundamental aspect of many processes in nature. For instance, evolution is driven by random mutations. These mutations result in diversity, and occasionally produce a mutant that is better adapted to the environment than its predecessors. Therefore, randomness is why we are here.

Many scientists study such random processes experimentally and develop theoretical models to describe them. Such models are used, for example, to decide which drugs are beneficial and safe. However, these models can be complex, and we frequently need computers to fully understand them. And therein lies a big problem: like humans, computers are not great at generating random numbers. Indeed, computers are designed to provide precise answers to mathematical questions. Variability or randomness in their output is a sign of bad design.

So can a machine designed to provide repeatable output generate random numbers? Yes! Some processes follow precise mathematical rules and yet result in outputs that look random. Think of the number pi. Its digits are well defined. You can use a computer to generate as many digits of pi as you please. The beginning of the sequence will be familiar (3.14159). However, if you grab a string of digits from an arbitrary location, they will look random (here are a few digits starting from the tenth place after the decimal point: 5897932384).

Random number generators based on these principles are good enough for most applications. But it is important to remember that the resulting sequences are not truly random. Scientists have therefore looked at other ways of generating random numbers.
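
A minimal sketch of such a generator is a linear congruential generator; the constants below are the classic Park–Miller "minimal standard" values. The rule is completely deterministic, yet its output looks random:

```python
def lcg(seed, n, a=16807, m=2 ** 31 - 1):
    """Park-Miller 'minimal standard' linear congruential generator.

    Entirely deterministic: the same seed always yields the same
    sequence, yet the output passes many statistical tests of
    randomness."""
    x = seed
    out = []
    for _ in range(n):
        x = (a * x) % m   # the whole 'randomness' is this one formula
        out.append(x)
    return out

# Same seed -> identical, perfectly repeatable "random" sequence.
assert lcg(42, 5) == lcg(42, 5)
```

The repeatability that makes computers good at arithmetic is exactly what makes this pseudo-random rather than random.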

The coolest method I know creates random numbers out of nothing. According to quantum mechanics, the vacuum is not truly empty but is filled with subatomic particles that pop in and out of existence. And this spontaneous creation and annihilation is truly random. Scientists in Australia were able to harness these fluctuations to produce billions of truly random numbers per second. It is amazing to think that perhaps the best way to create randomness is out of nothingness.

My MP3 player (brand: Sansa) actually sometimes does repeat the same song when I put it on shuffle. Each time it happens I’m filled with happiness (the shuffle functions on my previous mp3 players “felt” very non-random) 🙂

“But it is important to remember that the resulting sequences are not truly random.” This statement, more or less universally included in every description of computer RNGs, seems meaningless to me. I’ll make the claim that there is no acceptable mathematical definition of a “true” random number generator that does not lose at least one of the other properties we hold as fundamental to the mathematical treatment of randomness. There is nothing inescapably “pseudo” about pseudo-random generators except for the computational limitations of finite representations.

I’m sure you know the story, recounted in Numerical Recipes and elsewhere, of the company representative assuring customers that the collection of numbers spit out by their RNG may not be random considered together, but each number is guaranteed to be random “individually”. In my opinion, this joke reveals a fundamental limitation to our thinking, basically coming down to the selection of a seed, and the fact that any useful definition of an RNG has to apply to finite sequences, not just the limit. We are good at describing measure spaces, but how does one get a sequence from a measure space? The most straightforward way is to put some dynamics on the space compatible with the measure. But if you want to call this an RNG and have it mean something similar to what it means for computers, you have to pick an initial condition/seed, and here you lose the benefit of having built the measure space in the first place.

Imagine I take your favorite physical pRNG (e.g. quantum), and use it to seed any computer cRNG you want. The first number will be as random as the pRNG, but so will the kth number, since it also depends on the seed. You might then want to complain about the sequence not meeting higher order properties, but this is not well defined for finite sequences; as you know, 111111111 is not any less random than 237697651. We would complain about an RNG that always produces all 1’s no matter which seed we give (or gives such a sequence “too often” over the set of seeds), but the problem isn’t the sequence of 1’s itself. Defining the “randomness” of a cRNG cannot escape a proper description of the seed. What is a good mathematical definition? If it is the output of some other RNG, you’ve just recursed the problem. If it’s left undefined, the RNG inherits the incomplete definition.

Turning to the quantum generators, then, the point is that they do not have some mathematical property that elevates them above what the computers do. It is rather that we do not have a description for them. They simply do not exist within the space of mathematically described objects. You could of course approximate them as following the output of some dynamical system on some measure space, but then you’ve picked a particular representative with the same limitations as above. Arguing that quasi-RNGs are not truly random because we have a formulaic prediction of the nth number (the very generator formula) is psychology, not math. I can’t write down a formula for the measure space DS, but it is still a well-defined deterministic function, again, once the seed is chosen.

I’m not arguing against the practical utility of stress tests for cRNGs; we should remove as much computational dependence as possible, and it makes sense to complain if “I can do better than chance at predicting your nth number if you give me all the numbers up to n-1, over enough choices of seeds”. My point is that this is really establishing a set of properties in a finite list of numbers, and we need this no matter where the list of numbers came from.

In practice we don’t use RNGs for their randomness anyway, but rather their complexity. We want to compute something at a subset of points, and we don’t know in advance where those points should be, so we generate them with a cRNG that we have tested enough to know it will properly sample the space extensively enough for our practical needs. As experimentalists that have to make do with a finite number of measurements, we in fact almost never use “random” sequences; we usually do something like make deterministic sequences counterbalanced up to some order (e.g. every digit, and every possible pair of successive digits, appears an equal number of times, and the top half and bottom half of digits each cover the whole space). From my point of view, the supposed non-randomness of cRNGs is usually just a warped form of a limitation to the computer’s representations of numbers.

I’d be interested in any response you might have. In particular, I’d like to know what you would use as a rigorous mathematical definition for a random number generator. Not a measure over the set of all sequences, which is a cheat, in my opinion, but a definition that is actually a fair comparison to the computer. What is the exact property that a cRNG lacks, that makes it “pseudo”, even in the limit of an infinite number of bits?

PS. I guess if I were going to go full-on crazy rant, I’d say that the even more fundamental limitation to our thinking is that mathematics lacks a truly good way to handle time. Even in dynamical systems, we typically solve problems by converting the nominally temporal problem into a static topological/geometric problem. So when we try to define things in terms of our ability to predict them, we get hung up on the ill defined nature of the past and future as mathematical objects.

There is a lot here – but let me address what I see as the main question: “What is the exact property that a cRNG lacks, that makes it “pseudo”, even in the limit of an infinite number of bits?” I’ll start with a more technical description, and follow up with an interpretation:

I really believe that the word pseudo-random is warranted in describing the RNGs we find on computers. Here is my reasoning: What is an appropriate measure of the randomness of the numbers generated by the algorithms? Let us agree that if you know the algorithm, then a single element in the sequence can serve as a seed, and will allow you to predict all other elements.

I argue, then, that a proper measure of the randomness of a sequence generated by an algorithm is given by the entropy of the probability distribution over all RNG algorithms that would be devised by humans. This probability distribution would be concentrated on relatively low complexity, fast algorithms. We can argue about exactly how to define it. Now compare the entropy of this distribution with the entropy of a sequence of random numbers (with whatever finite precision), drawn uniformly and independently from some discrete distribution. I think the first entropy, over algorithms, will be much smaller – hence I’d say the sequence generated by a typical RNG is not as random as a truly random sequence.

What this means in more concrete terms is the following: we can think of entropy, roughly, as equivalent to the amount of time it would take us to guess the sequence of numbers. Suppose, then, that I generated a sequence of a million 16-bit numbers by a) using a standard algorithm on a computer, and b) picking them from the vacuum. For a) we can guess the algorithm that was used, guess the first number in the sequence, and then use it as a seed to produce the rest. For b) we would have to guess every single number. I maintain that if we know the algorithm for a) was devised by humans, then guessing sequence a) will be much easier. Therefore, in a real sense, sequence a) is less random than sequence b).
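
To make the guessing argument concrete, here is a sketch (assuming, purely for illustration, an LCG whose output is its full internal state): once you have guessed the algorithm, a single intercepted element lets you reproduce everything that follows.

```python
def lcg_next(x, a=16807, m=2 ** 31 - 1):
    # One step of a Park-Miller-style LCG; each output is itself
    # the generator's entire internal state.
    return (a * x) % m

def continue_sequence(observed, n):
    """Given one observed output (and knowledge of the algorithm),
    reproduce the next n values of the stream."""
    x, out = observed, []
    for _ in range(n):
        x = lcg_next(x)
        out.append(x)
    return out

# Generate a "secret" stream, intercept a single element,
# and predict everything that follows it.
secret = 123456789
stream, x = [], secret
for _ in range(10):
    x = lcg_next(x)
    stream.append(x)

predicted = continue_sequence(stream[3], 6)
assert predicted == stream[4:10]   # perfect prediction from one value
```

For a vacuum-sampled sequence there is no analogue of `continue_sequence`: each of the million numbers must be guessed separately.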

For most purposes I agree that the distinction is academic. We can use RNGs to do most simulations. However, we need to be quite careful to realize that the entropy of human output can be quite low. This is why it has become so easy to crack enormous numbers of passwords – actually this article is a must read: http://arstechnica.com/security/2012/08/passwords-under-assault/

This is a bit tangential, but I’d like to point out a practical issue with cRNGs vs pRNGs: you have to remember to check that the seed is set appropriately. In Monte Carlo simulations, neglecting to set the seed appropriately can lead to trouble (see e.g. http://pubs.acs.org/doi/abs/10.1021/ct800573m ). I don’t know much about cryptography, but I imagine this can be a real problem for most cryptosystems too.
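
A minimal illustration of the pitfall, using Python’s standard `random` module (the pi estimator is just a stand-in for any Monte Carlo computation):

```python
import random

def monte_carlo_pi(n, seed):
    """Estimate pi by sampling n points in the unit square and
    counting how many land inside the quarter circle."""
    rng = random.Random(seed)
    hits = sum(rng.random() ** 2 + rng.random() ** 2 <= 1.0
               for _ in range(n))
    return 4.0 * hits / n

# Pitfall: two "independent" runs seeded identically are exact copies,
# so averaging them adds no statistical information whatsoever.
assert monte_carlo_pi(10_000, seed=0) == monte_carlo_pi(10_000, seed=0)

# Distinct seeds produce distinct streams, hence genuinely new samples.
assert random.Random(0).random() != random.Random(1).random()
```

Forgetting to vary the seed across replicate runs silently turns N simulations into one simulation repeated N times.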

A set of random numbers just means they are drawn from independent identical distributions, i.e. the joint distribution completely factorizes. The joint distribution of a set of pseudo-random numbers does not factorize although the non-independence is usually only observable in the very high moments. A random number generator has nothing to do with dynamics, it’s only that the ones we commonly use are drawn from iterated maps.
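
A concrete, if extreme, illustration of a non-factorizing joint distribution is IBM’s infamous RANDU generator, an iterated map whose successive triples satisfy an exact linear relation (x[k+2] = 6·x[k+1] − 9·x[k] mod 2^31), which is why RANDU triples famously fall on just 15 planes in the unit cube:

```python
def randu(seed, n, a=65539, m=2 ** 31):
    """The infamous RANDU generator (IBM, 1960s). Seed must be odd."""
    x, out = seed, []
    for _ in range(n):
        x = (a * x) % m
        out.append(x)
    return out

xs = randu(1, 1000)
# Since a = 2**16 + 3, we have a**2 = 6*a - 9 (mod 2**31), so every
# three successive outputs are exactly linearly dependent:
for k in range(len(xs) - 2):
    assert (xs[k + 2] - 6 * xs[k + 1] + 9 * xs[k]) % 2 ** 31 == 0
```

Most modern generators hide their dependence far better than this, but the dependence is there in principle all the same.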

With respect Carson, I think you may have missed the point. What is your mathematical definition specifically of an RNG, as distinct from definitions of random sequences?

There is no difference. A perfect random number generator draws numbers from a fixed distribution with replacement. That is the only purpose of a random number generator. How it achieves this is a matter of implementation. That is where the pseudo comes in.

@Carson: But the point is, I’m challenging you on what you mean by “draw”. I don’t think you have a definition for it. In particular, until I see it, I’m not convinced you know of an actual “implementation” that is not pseudo (where by the scare quotes I’m allowing for an idealized implementation that relies on (presumably countably) infinite storage), and/or not completely described due to reliance on a seed. That is the essential question in my original post to Kreso. Is there a rigorously defined mathematical object that we imagine describes the quantum or other physical random number generators, that is different at the level of definition (and not due to finite representations) from the computational generators with which we are all familiar?

@JR So are you questioning whether there is true randomness in the real world? If we have a Geiger counter, a cesium-137 atom, and a clock, and we just record the time of each click up to some precision, mod some big number, then that is a scheme that generates a uniform set of “random” numbers. The randomness depends on whether or not subatomic decay is truly random.
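
That scheme is easy to sketch in code. The decay below is *simulated* with seeded exponential interarrival times (so this version is, fittingly, pseudo-random); a real device would replace `rng.expovariate` with an actual radioactive source, but the mod-and-truncate bookkeeping would be identical:

```python
import random

def geiger_rng(n, rate=1.0, modulus=2 ** 16, resolution=1e-6, seed=0):
    """Sketch of the Geiger-counter scheme: record each click time
    to finite resolution, then keep only the low-order clock ticks."""
    rng = random.Random(seed)          # stand-in for real decay
    t, out = 0.0, []
    for _ in range(n):
        t += rng.expovariate(rate)     # wait for the next click
        ticks = int(t / resolution)    # clock reading, in integer ticks
        out.append(ticks % modulus)    # time of click mod a big number
    return out

samples = geiger_rng(10_000)
assert all(0 <= s < 2 ** 16 for s in samples)
```

Because each interarrival time spans about a million clock ticks, the low-order 16 bits come out essentially uniform.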