Monday, April 22, 2013

Randomly and non-destructively pick two items without replacement from an array

Suppose you want to randomly pick a pair of elements from an array without replacement. That is, given [qw(Cat Dog Fox Horse Goose Unicorn)], you want to pick ($first, $second) so that $first ne $second.

One might first try something like this:

#!/usr/bin/env perl

use 5.014;
use strict;
use warnings;

my $set = [qw(Cat Dog Fox Horse Goose Unicorn)];

say "@{ pick1($set) }" for 1 .. 5;

sub pick1 {
    my $set = shift;
    my $n = @$set;

    my $x = $set->[rand $n];
    my $y;

    do {
        $y = $set->[rand $n];
    } until ($x ne $y);

    return [$x, $y];
}

That is not horrible, but can we do better? For example, there is no need to live with the possibility of generating a duplicate when we know we do not want that. splice can help with that:

sub pick2 {
    my $set = shift;

    my @i = (0 .. $#$set);

    my $j = splice @i, rand @i, 1;
    my $k = splice @i, rand @i, 1;

    return [ @{$set}[$j, $k] ];
}

The array of indices of @$set, @i, is there because I do not want the act of picking these pairs to alter the underlying array.

What if $set refers to an array with a large number of elements? Creating @i may be really undesirable in such a case.

But, of course, if you are just picking two elements, you can replace the splice with simple arithmetic:

sub pick3 {
    my $set = shift;

    my $j = int rand(@$set);
    my $k = int rand(@$set - 1);

    $k += ($k >= $j);

    return [ @{$set}[$j, $k] ];
}

Say, @$set has 6 elements. The first invocation of rand picks an index out of {0, 1, 2, 3, 4, 5, 6} and the second invocation picks one from {0, 1, 2, 3, 4, 5}. Say the first index picked was 2. Then, the correct set out of which to pick the second index would be {0, 1, 3, 4, 5, 6}. The line:

    $k += ($k >= $j);

takes care of this mapping.

Of course, for serious work, you should use something like Math::Random::MT rather than whatever rand your runtime gives you.

4 comments:

  1. For pick3, is it the case that resulting $k is uniformly distributed over all non-$j?

    Would it be better to repeat $k randomization until there's no collision with $j, such as:

    do {
    $k = int rand(@$set - 1)
    } until ($j != $k);

    ReplyDelete
    Replies
    1. @Shaun Griffith: If you are going to do that, then the second pick should also be $k = int rand(@$set) and then you reject $k if it was picked before.

      Your suggestion is fine if the PRNG is IID. However, that does sample the PRNG more and it would generate a different sequence than my pick3.

      What I am doing is a relabeling ex post facto. After I pick $j out of the sequence J = 0$n, I pick $k out of the sequence K = 0$n - 1. If my PRNG is uniform, I have an equal probability of picking one of those numbers.

      But, I really wanted to pick $k out of the sequence M = 0$j - 1, $j + 1, … $n. Relabeling every $k ≥ $j as $k + 1 provides a one-to-one and onto mapping from $K to $M. This relabeling does not affect probabilities. If every element of $K has 1/($n-1) probability of being picked, so does every member of $M.

      Hope that makes sense.

      Delete
  2. cf Perl 6:

    say <Cat Dog Fox Horse Goose Unicorn>.pick(2);

    Doc: http://doc.perl6.org/search#pick

    Spec: http://perlcabal.org/syn/S32/Containers.html#List (scroll to 'pick')

    Implemented by Pugs, Niecza, and Rakudo compilers (and maybe others).

    ReplyDelete