Friday, July 18, 2014

In OCaml, how can I get a list of directories in my PATH?

I have been getting my toes wet with OCaml, using Real World OCaml. The book content is freely available on their web site, but I have bought the ebook from O'Reilly, and I thoroughly recommend it.

I have to admit, it hasn't been a quick task. I find that I am too used to the luxury of documentation at my fingertips using perldoc. Reading the book, doing the exercises does breed familiarity, but I am far away from being able to write an image gallery generator (which was my first ever Perl program).

I like the implicit type checking. In fact, that is an idea that appears in Perl as well (not as strict, but, still). For example, let f x = x + 1 defines a function that takes an integer, and returns the following integer. Yes, OCaml does distinguish between types of numbers. No, I haven't yet gotten used to it.

Now, f 5 will return 6. But, f 0.5 will result in This expression has type float but an expression was expected of type int.

In Perl, if you defined my $f = sub { $_[0] + 1 } and invoked it with a string argument, the interpreter would notice it (and even tell you about it if you ask nicely):

$ perl -w -e 'my $f = sub { $_[0] + 1 }; $f->("test")'
Argument "test" isn't numeric in addition (+) at -e line 1.

Strict type checking is useful. The OCaml kind is not the same as the C or Java sort of type checking. Here is an example that had me scratching my head for a while until I studied it further.

Real World OCaml has the following example:

# let path = "/usr/bin:/usr/local/bin:/bin:/sbin";;
val path : string = "/usr/bin:/usr/local/bin:/bin:/sbin"
# String.split ~on:':' path
|> List.dedup ~compare:String.compare
|> List.iter ~f:print_endline
;;
/bin
/sbin
/usr/bin
/usr/local/bin
- : unit = ()

Now, if you squint enough, this is kind of like:

my $path = "/usr/bin:/usr/local/bin:/bin:/sbin";;
say for List::AllUtils::uniq(split /:/, $path);

although I do like the syntactic sugar of |>.

In Perl, I would have just used $ENV{PATH}. My thoughts immediately went to how to do that in OCaml. Luckily, utop has code-completion, so it didn't take me a long time to figure out I could use Sys.getenv to get the value of my $PATH.

utop # Sys.getenv("PATH");;
- : string option =
Some
 "/Users/xyz/.opam/system/bin:/Users/xyz/.opam/system/bin: \
/Users/xyz/bin:/Users/xyz/perl/5.20.0/bin: \
/opt/local/bin:/opt/local/sbin:/usr/bin: \
/bin:/usr/sbin:/sbin:/usr/local/bin: \
/opt/X11/bin:/usr/local/MacGPG2/bin"

Hmmmm … Why is ~/.opam/system/bin in there twice?

Anyway, first, note that naively replacing path with (Sys.getenv "PATH") does not "work":

utop # String.split ~on:':' (Sys.getenv "PATH");;
Error: This expression has type string option
but an expression was expected of type string

Note the Some there. Sys.getenv takes a string and possibly returns a string. In other words, its type is string -> string option = <fun>

We know why: The environment variable may or may not be defined. In Perl, we would get an undefined value in that case. Perl can then convert that value to 0 or "" as needed. In OCaml, you need to explicitly account for that possibility.

Observe the following:

utop # match Sys.getenv "PATH" with
| None -> ""
| Some x -> x
;;
- : string = 
  "/Users/xyz/.opam/ ...

Here, we decided that if Sys.getenv "PATH" does not return a value, we will consider our path to be empty. The type of the return value changed from string option to simply string, and it is no longer prefixed with Some.

If you are doing something real rather than working on small modifications to textbook exercises, you might not want to proceed if the path is not defined. But, for my immediate purpose of actually using the value of my path rather than manually typing in a string, the following was sufficient:

utop # String.split ~on:':'
(match Sys.getenv "PATH" with | None -> "" | Some x -> x)
|> List.dedup ~compare:String.compare
|> List.iter ~f:print_endline
;;

Phewww!

Pattern matching like this is actually quite valuable.

There is still a gaping hole in this construction. What if you type Sys.getenv "PTHA"? You'll end up propagating an empty path throughout a program. In Perl, I tried to avoid that kind of problem by using Const::Fast. As a simple example, I might have:

use Const::Fast;

const my %VAR => (
    HOME => 'HOME',
    PATH => 'PATH',
    TMP  => 'TMP',
);

say $ENV{ $VAR{PTHA} };

which immediately gives me Attempt to access disallowed key 'PTHA' in a restricted hash &hellip. It also serves as a documentation of which environment variables my script actually depends on.

This idea corresponds to the principle of making illegal states unrepresentable which fellow Cornellian Yaron Minsky explains in his guest lecture at CMU.

PS: Why OCaml? Well, for one, I loved Higher Order Perl, and decided I should add another camel to my herd ;-)

Friday, July 11, 2014

I wonder what SAS is doing to cause this activity pattern

Not doing anything special, just a little extraction, reshaping, and summarizing.

Four more instances smooth things out

Don't you hate being IO bound?

That's a little better

Thursday, July 10, 2014

Fun with image transformations in Perl

I came across American Gothic in the palette of Mona Lisa: Rearrange the pixels thanks to Reddit.

Basically, the task is to:

… create an algorithm that makes the most accurate looking copy of the Source by only using the pixels in the Palette. Each pixel in the Palette must be used exactly once in a unique position in this copy. The copy must have the same dimensions as the Source.

There are some interesting animations on that page. But, honestly, I couldn't bring myself to read through a lot of Java and Python. I immediately wondered how far I could get by using a rather naive method.

What I had in mind was this: First, given a source image with no more pixels than the palette image, sort the pixels in both images into a list of (color, coordinate) tuples by the 24-bit RGB value of the pixel color. Second, extract from the source pixel list just the coordinates, and from the palette pixel list just the colors. Finally, go through source image coordinates, setting the color of each pixel to the color in list of palette image pixel colors.

Yeah, OK, so I am not matching on the basis of color perception in radiation damaged mosquito eyes or some such scientific principle, but, still, the world is full of awful stuff these days, and I wanted to entertain myself for a few minutes.

I reached for an old favorite, GD::Image.

Here is the heart of the script:

sub get_pixels_by_color {
    my $gd = shift;
    my $dim = shift;
    return [
        sort { $a->[$COLOR] <=> $b->[$COLOR] }
        map {
            my $y = $_;
            map {
                [
                  pack_rgb( $gd->rgb( $gd->getPixel($_, $y) ) ),
                  [$_, $y]
                ];
            } 0 .. $dim->{width}
        } 0 .. $dim->{height}
    ];
}

pack_rgb is simple: sub pack_rgb { $_[0] << 16 | $_[1] << 8 | $_[2] }.

I only want the coordinates from the source image:

sub get_source_pixels { [ map $_->[$COORDINATES], @{ $_[0] } ] }

And, from the palette image, I just want the colors:

sub get_palette_colors { [ map sprintf('%08X', $_->[$COLOR]), @{ $_[0] } ] }

That sprintf isn't really necessary at all, but it does help if you have to print stuff to figure out a silly error. 00ef0812 is much more meaningful than 15665170.

The following function does the mapping:

sub recreate_source_image_from_palette {
    my $dim = shift;
    my $source_pixels = shift;
    my $palette_colors = shift;
    my $callback = shift;
    my $frame = 0;

    my %colors;
    $colors{$_} = undef for @$palette_colors;

    my $gd = GD::Image->new($dim->{width}, $dim->{height});
    for my $x (keys %colors) {
          $colors{$x} = $gd->colorAllocate(unpack_rgb($x));
    }

    my $period = sprintf '%.0f', @$source_pixels / $ANIMATION_FRAMES;
    for my $i (0 .. $#$source_pixels) {
        $gd->setPixel(
            @{ $source_pixels->[$i] },
            $colors{ $palette_colors->[$i] }
        );
        if ($i % $period == 0) {
            $callback->($frame, \ $gd->png);
            $frame += 1;
        }
    }
    return ($frame, \ $gd->png);
}

First, we create a hash %colors to store the color indexes we need to create via GD::Image->colorAllocate.

Then, it is just a matter of looping through each source coordinate, and setting the color at that pixel to the corresponding one from the palette image. I wanted to generate short animations of the transformations as well, so I also pass this routine a save callback.

Again, I couldn't be bothered with fancy math(!) trying to ensure the completed bitmap was also saved, so, to make up for my laziness, this function returns the final frame.

The results are not spectacular, but not completely disgusting either. Generating the 101 frames takes about 5 seconds for each image pair on my old MacBook Pro.

Here are some examples thanks to ffmpeg:

American Gothic using colors from Mona Lisa

Mona Lisa using colors from Starry Night

Starry Night using colors from Marbles

Mona Lisa using colors from Marbles

And, for reference, is Marbles:

Now, if only putting together this blog post had been as easy as writing the code using Vim, and generating the images using Perl ;-)

The complete script is on GitHub.

Thursday, June 12, 2014

Did I really need to put an SSD in my MacBook Pro?

Strictly speaking, the answer is no. With 16 GB of RAM on this MacBook Pro, the original 250GB Toshiba hard disk was barely being touched. When upgrading the entire MacPorts collection, or building a new version of Perl, I could just start those, and switch to my Remote Desktop or SSH window, and ignore the fan kicking in etc. So, there was no pressing need, per se.

But, there were some gift certificates languishing in my Amazon account. I did consider going with the roomy, dual-platter 7K1000, but I don't store much on this laptop, and the prospect of just another hard drive was not exciting enough to motivate me to learn how to do the upgrade.

Then came Crucial's MX100 with a 512GB model at a really nice price point on Amazon. Now, as a straight replacement, the 256GB drive would have been even cheaper, and would have offered similar performance in my case, because this clunkers SATA interface taps out at 3Gbps, which comes to about 384MB/s, at most. According to some reports, the 512GB drive is faster than the 256GB one, because, you know, number of chips or something.

In the end, I went with the 512GB drive, because, again, swapping the original drive with something of the same size just didn't seem worth the effort.

The physical aspect of removing the covers, getting the original drive out etc is very straightforward on this machine. Of course, I skipped the excellent instructions first, and completely missed the fact that I had to transfer the screws on the sides of the original drive to the SSD. I was quite confused when everything was really wobbly. At least, this gave me the opportunity to really clean some remaining stains from the 20oz cup of coffee that spilled on the keyboard, and just got everywhere inside, last Christmas.

Sigh!

You might have guessed that I have a bit of cavalier attitude towards hardware … For example, I have never owned a static wrist strap, much less seen one. So, don't do as I do if you are the more serious, reasonable to type of person. Follow all the good practices I did not follow.

The moment I had my hands on the MX100-512GB, I opened the cover, removed the original drive, put in the SSD. I inserted the original hard drive in my trusty Thermaltake docking station, turned the power on, kept pressing the option key until the boot medium selection came up. I booted from the recovery partition on the original drive via USB.

This is where I made a mistake.

I had earlier asked if there is a canonical guide to transferring the contents of your original system disk to a new one. See, I am not interested in making a backup. I have the original disk, I want to transfer its contents, including the EFI and recovery partitions to a new, larger disk.

It turns out, Apple's graphical disk utility does not show you any of those partitions, but creates them automatically. So, when you chose the so-called "1 partition" layout, you actually get three.

Being the literal person I am, in my first attempt, I actually reserved about 2 GB for that other stuff, figuring I would solve that riddle after programs & data had finished transferring.

I was surprised to find everything had been taken care of behind scenes (good thing) without any indication that they would be (typical Apple attitude which also labels burger-flippers "geniuses").

I just could not bring myself to accept that I had basically wasted perfectly good 2 GB of space right out of the gate, and, of course, instead of letting things be, I had to, I mean, just had to, repartition using the "1 partition" layout, and do the whole copy again.

Does anyone know why the disk utility takes so much more time than dd? Going from the 400 GB drive on my beloved Lenovo to the 750 GB drive, using fdisk and dd in ArchLinux, dealing with several partitions etc took much less time, including all the fiddling.

The drive is rather quick. I tested sequential reading a bit with some VirtualBox disk images etc, and got close to 300 MB/sec which is rather impressive considering the limitation on the interface speed imposed by the MCP89. Boot time has been much reduced (didn't measure, just noticed it because I had to reboot a few times). In general, stuff happens much faster. If you are interested, here is a review I found helpful.

I did not manually patch any kernel files to turn OSX Mavericks TRIM support on for this drive. Based on what I have read, the drive's garbage collection should be good enough. I did disable last access time, and the sudden motion sensor.

My rotational hard drives have been extremely good to me over the years. But, with this SSD, I decided to actually enable Time Machine for the first time, just in case this thing goes the way of many a USB flash drive.

Friday, June 6, 2014

Happiness is being able to boot into my ArchLinux install again

Without exaggerating too much, let's just say I was very happy when I was able to boot into good old ArchLinux on my old laptop after upgrading to Windows 8.1. It does, of course, make things much, much easier that this is a BIOS based system without all the convoluted stuff.

The process was actually rather trivial … I booted using the current ArchLinux install image, mounted my Windows C: drive on /media/win, mounted my ArchLinux partition on /mnt, used arch-chroot to chroot to /mnt. Then, following the instructions on ArchWiki, installed GRUB to the partition containing my ArchLinux installation, copied the partition's boot sector as a plain file to C:\, rebooted, and used bcdedit to create a new boot menu entry, again, following the instructions in ArchWiki.

The whole thing took no more than five minutes.

Because I already had a working GRUB installation from when I was using it to dual boot between Windows XP and ArchLinux, running grub-mkconfig -o /boot/grub/grub.cfg again resulted in four entries in my GRUB boot menu, three of which were now unnecessary because I am using Windows to select which OS to boot into (I did not want to take any chances with using GRUB to dual boot between Windows and Arch.)

One inconvenience when you use this method is the fact that the choice of OS happens after Windows 8.1 has essentially finished loading. Choosing ArchLinux then results in a soft-reset whereby Windows starts the ArchLinux boot process. Compared to having GRUB manage the process, this ends up taking an extra 40 seconds for the Arch login prompt to appear.

Well, there is, apparently an App for that (hat tip APC magazine).

I am not going to install it, but FYI.

Monday, June 2, 2014

Sending faxes from Windows 8.1 Pro

One of the reasons I like my ancient laptop is the fact that it actually has a built-in modem. Now, it's been many, many years since I had get an internet connection via dial-up, but having a modem has come in very handy. You know, when you just have to submit those sensitive documents by fax while you are on the road, you don't really want to use a random fancy schmancy all-in-one with a built-in hard drive ;-)

Today was one of those days. Unfortunately, it was also a frustrating one.

See, I never actually had to think about print-to-fax when I was using Windows XP. But, this was the first time doing that after updgrading to Windows 8.1 Pro. I went through the clickety-clack process, and finally clicked "Send". The progress monitor came up, told me it was dialing, I heard the dial tone, the digits, the handshake, and as expected, the line went quiet, so I just walked away, and started attending to other things.

When I came back, I realized my computer was repeatedly dialing, getting to the handshake stage, and then killing the sending process.

Ouch!

Some searching revealed the thread "8.1 broke my Windows Fax."

I was skeptical at first. Especially since the accepted solution seemed to recommend replacing a system DLL with a file to be downloaded from some web site. I have no reason to doubt Hevanet's integrity, but the thought just did not sit well with me.

Luckily, I also had access to a Netbook style computer on which I had installed Windows 7 64 bit. I made a backup copy of the FXST30.dll file on my Windows 8 computer, took ownership of the file, and then copied over the old version from the Windows 7 computer.

A simple test fax confirmed that everything was hunky dory.

So, at the very least, the diagnosis of the problem seems to be accurate. There is something wrong with the version of the FXST30.dll file that comes with Windows 8.1. For reference, here are the properties of each file:

And, here are the checksums of the files:

$ for %f in (fxst30*) do perl -MDigest::SHA1=sha1_hex -MPath::Class -wE "say join('= ', $ARGV[0], sha1_hex(scalar file($ARGV[0])->slurp(iomode => '<:raw')))" %f

FXST30.dll= 95231dea5c0e770883c7b06fb58fa681719afdd5

FXST30.dll.windows8= 2222761425d7e38ff1a558259f6535b94e12288e

The thread I mentioned above is from more than a year ago. I would have guess the problem would have been fixed by now. Disappointing.

Saturday, May 31, 2014

Perl 5.20.0 brings a "better" PRNG to Windows

One of the changes in Perl 5.20.0 is that rand now uses a consistent random number generator:

Previously perl would use a platform specific random number generator, varying between the libc rand(), random() or drand48().

This meant that the quality of perl's random numbers would vary from platform to platform, from the 15 bits of rand() on Windows to 48-bits on POSIX platforms such as Linux with drand48().

Perl now uses its own internal drand48() implementation on all platforms. This does not make perl's rand cryptographically secure. [perl #115928]

In addition, this doesn't necessarily mean the builtin pseudo-random number generator is good for Monte-Carlo, Bootstrapping, or other statistical pursuits.

$ type rr.pl
#!/usr/bin/env perl

use strict; use warnings;
use feature 'say';

my %hist;
$hist{rand(1)} += 1 for 1 .. 1_000_000;

say scalar keys %hist;

$ perl rr.pl
1000000

Let's test some coin-tossing. With a decent random number generator, if you take a bunch of random samples of coin tosses, you would expect 5% of them to incorrectly reject the null for a significance level of 5%. By default, the code below takes 1,000 samples of 100 coin tosses, does a hypothesis test for probability of one face being equal to 0.5, and repeats the process 30 times. If were tossing a fair coin, we would expect each run to produce 5% false positives. First, the code, then the output:

#!/usr/bin/env perl

use utf8;
use 5.010;
use strict;
use warnings;
use feature 'say';

use Statistics::Distributions;

run(@ARGV);

sub run {
    my $sample_size = shift // 100;
    my $number_of_samples = shift // 1_000;
    my $π = shift // 0.5;
    my $number_of_runs = shift // 30;

    my $ẕ = Statistics::Distributions::udistr(0.025);
    my $σ = sqrt(($π * (1 - $π))/$sample_size);

    my @r;
    for (1 .. $number_of_runs) {
        my @z = (0) x $number_of_samples;
        for my $i (1 .. $number_of_samples) {
            for (1 .. $sample_size) {
                if ($π < rand) {
                    $z[$i] += 1;
                }
            }
            $z[$i] /= $sample_size;
            $z[$i] = ($z[$i] - $π) / $σ;
        }
        push @r, +(scalar grep abs($_) > $ẕ, @z) / @z;
    }
    say sprintf('%.04f', $_) for @r;
}

Output:

0.0519
0.0689
0.0480
0.0480
0.0549
0.0559
0.0619
0.0549
0.0579
0.0649
0.0529
0.0679
0.0440
0.0579
0.0509
0.0519
0.0589
0.0470
0.0490
0.0509
0.0629
0.0539
0.0679
0.0599
0.0430
0.0709
0.0599
0.0519
0.0430
0.0470

Rejection probabilities seem a little higher than 5%.