# Generating Almost Normally Distributed Values

Every now and then I want to generate normally distributed values in something like Perl, which does not have built-in functions for it. Unfortunately, there’s no easily remembered way to generate numbers from the normal distribution. I just learned the best way to do it: don’t. Generate logistically distributed values instead.

Seriously! Look at it: the bars are logistically distributed values, and the
curve is the normal distribution. That’s absolutely close enough for any
practical purpose I can think of.^{1} The normal distribution occurs frequently
in proofs of exact mathematical relationships. In the real world, uncertainties
around which specific distribution is appropriate are often large enough that an
approximation is just as good as the real deal.

The reason the logistic distribution is so much better is that it’s easy to remember how to draw from it. Here’s the algorithm:

- Generate a uniformly distributed value between 0 and 1. Almost all standard libraries have functions for this. Call this value \(p\). It will be the quantile we select from the logistic distribution.
- Compute the log-odds corresponding to that quantile:

\[\ell = \log{\frac{p}{1-p}}.\]

- Give it the correct standard deviation by multiplying with a magic number, giving us \(x = 0.55 \ell\). (Well, 0.55 is an approximation. The real magic number is \(\frac{\sqrt{3}}{\pi}\), but I don’t know how I’d remember that, so 0.55 is easier.)
- Done! Now \(x\) is a draw from almost the standardised normal distribution.

If we want a different mean and/or standard deviation, we shift and scale the same way we would a real standard normal draw, i.e. \(\mu + \sigma x\). In other words, given a uniformly distributed \(p\), the complete process is

\[x = \mu + 0.55 \sigma \log{\frac{p}{1-p}}.\]

*That’s it.*
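The steps above fit in a few lines of code. Here is a minimal Python sketch (the function name and defaults are mine, not from any library):

```python
import math
import random


def almost_normal(mu=0.0, sigma=1.0):
    """Draw an approximately normal value via the logistic trick."""
    # Uniform quantile; random.random() is in [0, 1), so re-draw the
    # (astronomically rare) exact zero to keep the log-odds finite.
    p = random.random()
    while p == 0.0:
        p = random.random()
    logodds = math.log(p / (1 - p))  # a draw from the standard logistic
    return mu + 0.55 * sigma * logodds
```

Calling `almost_normal()` gives something very close to a standard normal draw; `almost_normal(170, 8)` would give, say, approximately normal heights with mean 170 and standard deviation 8.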

# The logistic distribution has fatter tails

As we can see from the plot above, the logistic distribution has slightly fatter tails than the normal distribution. Here’s a zoomed-in view of the region beyond two standard deviations. Grey is still the logistic distribution, black is the normal.

This means when drawing from the logistic distribution we will get slightly more central values, and slightly more extreme values. This is usually a good thing, because few real-world processes are as well behaved as the normal distribution.
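To put a number on “slightly fatter”: both two-sided tail probabilities have closed forms, so we can compare them directly. A small Python check (using the exact \(\frac{\sqrt{3}}{\pi}\) scale so the logistic has standard deviation 1):

```python
import math

SCALE = math.sqrt(3) / math.pi  # the exact magic number, ~0.5513


def logistic_two_sided_tail(z):
    # P(|X| > z) for a logistic distribution with standard deviation 1
    return 2 / (1 + math.exp(z / SCALE))


def normal_two_sided_tail(z):
    # P(|Z| > z) for a standard normal, via the complementary error function
    return math.erfc(z / math.sqrt(2))


print(logistic_two_sided_tail(2))  # roughly 0.052
print(normal_two_sided_tail(2))    # roughly 0.046
```

So beyond two standard deviations the logistic puts about 5.2 % of its mass where the normal puts about 4.6 % – a visible but modest difference.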

# Drawing from the actual normal distribution

The problem with drawing from the actual normal distribution is that we don’t
have a closed-form way of doing it. It needs to be done numerically. For
details, see the Wikipedia page, section *Computational methods*. It’s not that
these methods are complicated, it’s just that I have no chance of remembering
how to do it (well, except perhaps the Irwin–Hall approximation) and I hate
having to look up basic algorithms.
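For completeness, the Irwin–Hall approximation is also memorable: each uniform draw has variance 1/12, so the sum of twelve of them has mean 6 and variance exactly 1, and by the central limit theorem it is roughly bell-shaped. A quick Python sketch:

```python
import random


def irwin_hall_normal(mu=0.0, sigma=1.0):
    # Sum of 12 uniforms: mean 6, variance 1, approximately normal by the CLT
    z = sum(random.random() for _ in range(12)) - 6
    return mu + sigma * z
```

Its quirk is the opposite of the logistic’s: the result can never fall outside \(\pm 6\) standard deviations, so its tails are thinner than the true normal’s rather than fatter.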

I can easily remember

\[x = \mu + 0.55 \sigma \log{\frac{p}{1-p}}\]

for the rest of my life – especially now, having written this article!

# Spot the true normal

In case you still don’t believe this is a good idea, here’s your chance to prove me wrong. Half of these plots are drawn from the real normal distribution, whereas half come from a logistic distribution. Guess which one is which! You get a generous 80 samples from each.

I’m happy to receive guesses. Why don’t we set up a scoreboard?

| Guesser  | p-value |
|----------|---------|
| Nick     | 0.17    |
| jocke-l  | 0.17    |
| kqr      | 0.50    |
| modeless | 0.83    |

The p-value is the probability of guessing that many correct (or more) if one truly has no idea which is which. In other words, I got a few right, but this result or better would have happened by chance 50 % of the time even if I had no way of telling them apart. Not a very impressive result.
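A p-value like this can be sketched as an upper binomial tail: the chance of getting at least \(k\) of \(n\) guesses right when each guess is a fair coin flip. This helper is mine, under the simplifying assumption of independent guesses – the actual scoreboard may have used a slightly different null:

```python
from math import comb


def guess_p_value(correct, total):
    # P(at least `correct` right out of `total` fair coin-flip guesses)
    return sum(comb(total, k) for k in range(correct, total + 1)) / 2 ** total
```

For example, getting 3 or more out of 5 right happens half the time by pure chance: `guess_p_value(3, 5)` is 0.5.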