Sunday, September 4, 2022

"Pseudo" in "pseudo-RNG" does not mean "poor-quality randomness"

Whenever the question of the quality of the randomness generated by a computer program is the subject of a discussion, pretty much invariably and without fail some smartass will jump in and say that what the computer is using is a "PSEUDO random number generator" with the implication that that somehow affects the quality of the randomness and explains why the quality might be poor.

This is a misunderstanding about random number generation and what the "pseudo" in that term means and implies.

The "pseudo" in "pseudo-random number generator" (PRNG) has nothing to do with the quality of the randomness and everything to do with repeatability: Given the same initial conditions (ie. the same initial "seed") you are guaranteed to always get the same stream of numbers.

However, that tells nothing about the "quality" of the randomness of those numbers.

Sure, there are many PRNGs that have very poor, even horrendous quality, and/or horribly short periods (ie. how many numbers they produce before they start repeating the same pattern again). However, they are not poor-quality because they are "pseudo". They are poor-quality because they have been badly designed. Them being "pseudo" has nothing to do with it.

The fact is that it's perfectly possible to implement PRNGs which output (if you don't know the initial seed) is completely indistinguishable from true randomness, by any metric or algorithm you may use, and where predicting the next number (or even that the number is likely to be within a certain range) is pretty much impossible, just like with true randomness.

In fact, so-called cryptographically strong PRNGs (CSPRNGs) rely heavily on this. If a CSPRNG were in any way predictable, even slightly, it would make it useless for cryptography.

To put that in another way: If you are given two very long lists of numbers (billions of numbers), one generated by a CSPRNG and another by some true source of even randomness, it's impossible for you to know which one was produced by the CSPRNG, no matter which metric or algorithm you may use.

Thus, for all intents and purposes, the output of a CSPRNG is pretty much in practice truly random. Indistinguishable from actual randomness. It cannot be higher-quality than that. The only special thing about it being "pseudo" is that if you have the initial seed you can repeat the same stream of numbers with it. (Without the initial seed there's no way for you to even know what this initial seed was. In fact, cryptography heavily relies on this.)

Thus, in the name of everything that's holy and mighty, please stop jumping in with the "pseudo" thing when the question at hand is the quality of the randomness of a computer program, as if "pseudo" meant "poor-quality randomness".