Suppose you want to randomly pick a pair of elements from an array *without* replacement. That is, given `[qw(Cat Dog Fox Horse Goose Unicorn)]`

, you want to pick `($first, $second)`

so that `$first ne $second`

.

One might first try something like this:

#!/usr/bin/env perl use 5.014; use strict; use warnings; my $set = [qw(Cat Dog Fox Horse Goose Unicorn)]; say "@{ pick1($set) }" for 1 .. 5; sub pick1 { my $set = shift; my $n = @$set; my $x = $set->[rand $n]; my $y; do { $y = $set->[rand $n]; } until ($x ne $y); return [$x, $y]; }

That is not horrible, but can we do better? For example, there is no need to live with the possibility of generating a duplicate when we know we do not want that. splice can help with that:

sub pick2 { my $set = shift; my @i = (0 .. $#$set); my $j = splice @i, rand @i, 1; my $k = splice @i, rand @i, 1; return [ @{$set}[$j, $k] ]; }

The array of indices of `@$set`, `@i`, is there because I do not want the act of picking these pairs to alter the underlying array.

What if `$set` refers to an array with a large number of elements? Creating `@i` may be really undesirable in such a case.

But, of course, if you are just picking two elements, you can replace the `splice`

with simple arithmetic:

sub pick3 { my $set = shift; my $j = int rand(@$set); my $k = int rand(@$set - 1); $k += ($k >= $j); return [ @{$set}[$j, $k] ]; }

Say, `@$set` has 6 elements. The first invocation of rand picks an index out of {0, 1, 2, 3, 4, 5, 6} and the second invocation picks one from {0, 1, 2, 3, 4, 5}. Say the first index picked was 2. Then, the correct set out of which to pick the second index would be {0, 1, 3, 4, 5, 6}. The line:

$k += ($k >= $j);

takes care of this mapping.

Of course, for serious work, you should use something like Math::Random::MT rather than whatever `rand`

your runtime gives you.

For pick3, is it the case that resulting $k is uniformly distributed over all non-$j?

ReplyDeleteWould it be better to repeat $k randomization until there's no collision with $j, such as:

do {

$k = int rand(@$set - 1)

} until ($j != $k);

@Shaun Griffith: If you are going to do that, then the second pick should also be

Delete$k = int rand(@$set)and then you reject$kif it was picked before.Your suggestion is fine if the PRNG is IID. However, that does sample the PRNG more and it would generate a different sequence than my

pick3.What I am doing is a relabeling

ex post facto. After I pick$jout of the sequence J =0…$n, I pick$kout of the sequence K =0…$n - 1. If my PRNG is uniform, I have an equal probability of picking one of those numbers.But, I really wanted to pick

$kout of the sequence M =0…$j - 1,$j + 1, …$n. Relabeling every$k ≥ $jas$k + 1provides a one-to-one and onto mapping from$Kto$M. This relabeling does not affect probabilities. If every element of$Khas1/($n-1)probability of being picked, so does every member of$M.Hope that makes sense.

cute

ReplyDeletecf Perl 6:

ReplyDeletesay <Cat Dog Fox Horse Goose Unicorn>.pick(2);

Doc: http://doc.perl6.org/search#pick

Spec: http://perlcabal.org/syn/S32/Containers.html#List (scroll to 'pick')

Implemented by Pugs, Niecza, and Rakudo compilers (and maybe others).