Creating a matrix of random entries where lengths of the columns are also random

It looks like I misunderstood the Stack Overflow question “Create vectors with differents lenghts in perl [sic]”. The OP is satisfied with the answer he received, but I like the question I thought he was asking more than the one he actually asked.

So, let’s say I want to generate a matrix of random entries whose column vectors have random length.

First things first. Presumably, one would be using this code for some kind of simulation or analysis where the statistical properties of the pseudo-random number generator matter. In such cases, it is better to start with a pRNG with known good properties rather than rely on whatever Perl’s builtin rand got linked to. I give a really extreme example of inadequacy elsewhere on this blog, but even if your builtin rand doesn’t generate so few values, you can’t rule out things like higher order correlations without actually running a battery of tests. Therefore, I am going to use a Perl implementation of the Mersenne Twister.

In a serious simulation exercise, you should seed a Math::Random::MT object with a vector of random integers. On a *nix system, /dev/random and on a Windows system CryptGenRandom (assuming XP SP3 or later) will probably be good enough for this purpose, so I am going to omit that part from the code below.
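For completeness, here is a minimal sketch of the omitted seeding step on a *nix system. The helper name read_seed_vector, the default device /dev/urandom, and the choice of 8 seed words are all assumptions made for illustration; pass '/dev/random' as the second argument to match the text above (on older kernels that device may block until enough entropy is available). The final hand-off to Math::Random::MT is left as a comment so the sketch runs without the CPAN module; as far as I understand the module's documentation, new accepts a list of 32-bit integers.

```perl
use strict;
use warnings;

# Read $count unsigned 32-bit integers from a random device.
# Hypothetical helper; the default path is an assumption.
sub read_seed_vector {
    my ($count, $device) = @_;
    $device //= '/dev/urandom';

    open my $fh, '<:raw', $device
        or die "Cannot open $device: $!";

    # Keep reading until we have 4 bytes per requested integer.
    my $want = 4 * $count;
    my $buf  = '';
    while (length($buf) < $want) {
        my $got = read $fh, $buf, $want - length($buf), length($buf);
        die "Read from $device failed: $!" unless defined $got;
        die "Unexpected end of $device"    unless $got;
    }
    close $fh;

    # 'N' is an unsigned 32-bit big-endian integer, so the result does
    # not depend on the platform's native integer size or byte order.
    return unpack "N$count", $buf;
}

my @seed = read_seed_vector(8);
printf "%d seed words, first: %u\n", scalar @seed, $seed[0];

# With the CPAN module installed, the vector would be handed over as:
#   my $gen = Math::Random::MT->new(@seed);
```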

In a vain attempt to channel my inner MJD, I first wrote a random vector generator generator, creatively named genvgen:

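use strict;
use warnings;
use feature 'say';
use Math::Random::MT;

sub genvgen {
    my ($terminator, $randmax) = @_;

    # The immediately invoked outer sub gives each generator its own
    # private state: a termination flag and a pRNG.
    return sub {
        # Draw once up front, so a column may turn out empty.
        my $is_terminated = $terminator->();
        my $gen = Math::Random::MT->new( ... );
        return sub {
            # Once terminated, keep returning nothing.
            $is_terminated and return;
            # Decide now whether the next call will still produce a value.
            $is_terminated = $terminator->();
            return $gen->rand($randmax);
        };
    }->();
}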

The first argument, $terminator, is a coderef that returns true if sequence generation should be terminated. The second, $randmax, is the upper bound passed to the generator’s rand method. For example, if you want a sequence to terminate at each step with probability 10%, you’d write:

my $t = Math::Random::MT->new( ... );
my $v = genvgen(sub { $t->rand < 0.1 }, 10);

for my $i (1 .. 20) {
    # Once the generator returns undef it stays exhausted, so stop here.
    last unless defined(my $x = $v->());
    printf "%d: %g\n", $i, $x;
}
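If you would rather have a whole column in hand than pull values one at a time, the generator can be drained into an array. The sketch below is an illustration under stated assumptions: genvgen_with and collect_column are hypothetical names, and genvgen_with takes the value source as a second coderef purely so that the example runs without Math::Random::MT. It uses the builtin rand as a stand-in, which is exactly what you should not do in a real simulation.

```perl
use strict;
use warnings;

# Stand-in for genvgen with the same termination logic; the value source
# is passed in as a coderef instead of being a Math::Random::MT object.
sub genvgen_with {
    my ($terminator, $next_value) = @_;
    my $is_terminated = $terminator->();
    return sub {
        $is_terminated and return;
        $is_terminated = $terminator->();
        return $next_value->();
    };
}

# Call the generator until it returns undef and collect what it produced.
sub collect_column {
    my ($v) = @_;
    my @column;
    while (defined(my $x = $v->())) {
        push @column, $x;
    }
    return @column;
}

srand(42);    # builtin rand, for illustration only
my $v = genvgen_with(sub { rand() < 0.1 }, sub { rand(10) });
my @column = collect_column($v);
printf "column of length %d\n", scalar @column;
```

Testing with defined rather than truth matters here: a legitimate value of 0 must not end the column early.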

Given that, generating the matrix is just an application of map:

sub run {
    my %args = (
        cols => 5,
        prob => 0.3,
        randmax => 10,
        @_,
    );

    my $terminator = Math::Random::MT->new;
    my @matrix = map genvgen(
        sub { $terminator->rand < $args{prob} },
        $args{randmax},
    ), 1 .. $args{cols};

    # print 10 rows
    for (1 .. 10) {
        my @row = map {
            my $x = $_->();
            defined($x) ? int($x) : ' ';
        } @matrix;
        say join('|', @row);
    }

    return;
}

Note that if you want to make sure each column contains at least one element, change genvgen to:

sub genvgen {
    my ($terminator, $randmax) = @_;

    return sub {
        # No initial draw: the first call always yields a value.
        my $is_terminated;
        my $gen = Math::Random::MT->new( ... );
        return sub {
            $is_terminated and return;
            $is_terminated = $terminator->();
            return $gen->rand($randmax);
        }
    }->();
}
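The difference between the two versions shows up in the distribution of column lengths. With termination probability p, the original version makes one draw before producing anything, so lengths start at 0 and average (1 - p)/p; the modified version skips that draw, so lengths start at 1 and average 1/p. The sketch below checks this by simulation. The helper column_length is hypothetical, it counts values instead of generating them, and it again uses the builtin rand as a stand-in so that it runs without Math::Random::MT.

```perl
use strict;
use warnings;

# Length of one column for a given terminator coderef. A true $draw_first
# mimics the original genvgen; a false one mimics the modified version.
sub column_length {
    my ($terminator, $draw_first) = @_;
    my $is_terminated = $draw_first ? $terminator->() : 0;
    my $length = 0;
    until ($is_terminated) {
        $is_terminated = $terminator->();
        $length++;
    }
    return $length;
}

srand(1);    # builtin rand, for illustration only
my ($p, $trials) = (0.3, 10_000);
for my $draw_first (1, 0) {
    my ($min, $sum) = (undef, 0);
    for (1 .. $trials) {
        my $len = column_length(sub { rand() < $p }, $draw_first);
        $min = $len if !defined $min || $len < $min;
        $sum += $len;
    }
    printf "%-12s min %d, mean %.2f\n",
        $draw_first ? 'original:' : 'at least 1:', $min, $sum / $trials;
}
```

For p = 0.3 the two means should land near 2.33 and 3.33 respectively, give or take sampling noise.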