Saturday, May 31, 2014

Perl 5.20.0 brings a "better" PRNG to Windows

One of the changes in Perl 5.20.0 is that rand now uses a consistent random number generator:

Previously perl would use a platform specific random number generator, varying between the libc rand(), random() or drand48().

This meant that the quality of perl's random numbers would vary from platform to platform, from the 15 bits of rand() on Windows to 48-bits on POSIX platforms such as Linux with drand48().

Perl now uses its own internal drand48() implementation on all platforms. This does not make perl's rand cryptographically secure. [perl #115928]

In addition, this doesn't necessarily mean the builtin pseudo-random number generator is good for Monte-Carlo, Bootstrapping, or other statistical pursuits.

$ type rr.pl
#!/usr/bin/env perl

use strict; use warnings;
use feature 'say';

my %hist;
$hist{rand(1)} += 1 for 1 .. 1_000_000;

say scalar keys %hist;

$ perl rr.pl
1000000

Let's test some coin-tossing. With a decent random number generator, if you take a bunch of random samples of coin tosses, you would expect 5% of them to incorrectly reject the null for a significance level of 5%. By default, the code below takes 1,000 samples of 100 coin tosses, does a hypothesis test for probability of one face being equal to 0.5, and repeats the process 30 times. If were tossing a fair coin, we would expect each run to produce 5% false positives. First, the code, then the output:

#!/usr/bin/env perl

use utf8;
use 5.010;
use strict;
use warnings;
use feature 'say';

use Statistics::Distributions;

run(@ARGV);

sub run {
    my $sample_size = shift // 100;
    my $number_of_samples = shift // 1_000;
    my $π = shift // 0.5;
    my $number_of_runs = shift // 30;

    my $ẕ = Statistics::Distributions::udistr(0.025);
    my $σ = sqrt(($π * (1 - $π))/$sample_size);

    my @r;
    for (1 .. $number_of_runs) {
        my @z = (0) x $number_of_samples;
        for my $i (1 .. $number_of_samples) {
            for (1 .. $sample_size) {
                if ($π < rand) {
                    $z[$i] += 1;
                }
            }
            $z[$i] /= $sample_size;
            $z[$i] = ($z[$i] - $π) / $σ;
        }
        push @r, +(scalar grep abs($_) > $ẕ, @z) / @z;
    }
    say sprintf('%.04f', $_) for @r;
}

Output:

0.0519
0.0689
0.0480
0.0480
0.0549
0.0559
0.0619
0.0549
0.0579
0.0649
0.0529
0.0679
0.0440
0.0579
0.0509
0.0519
0.0589
0.0470
0.0490
0.0509
0.0629
0.0539
0.0679
0.0599
0.0430
0.0709
0.0599
0.0519
0.0430
0.0470

Rejection probabilities seem a little higher than 5%.

No comments:

Post a Comment