Monday, August 18, 2014

File::Which comes with its own 'multiwhich'

I uploaded App::multiwhich, based on a script I have been using for many years, in observance of #CPANDAY. While I honestly thought it was a cute, useful little utility which I could improve by fixing edge cases, I just realized that there is no reason for you to use it ;-)

File::Which comes with its own command-line utility called pwhich. For example:

$ pwhich -a perl vim doesnotexist
/Users/auser/perl/5.20.0/bin/perl
/opt/local/bin/perl
/usr/bin/perl
/opt/local/bin/vim
/usr/bin/vim
pwhich: no doesnotexist in PATH

The module definitely predates my foray into Perl. I cannot fathom how I missed the pwhich utility.

So, don't use App::multiwhich. Use pwhich. I'll make the requisite changes in the module distribution.

Saturday, August 16, 2014

Just uploaded App::multiwhich in observance of CPAN day

multiwhich is a small utility which looks for an executable in all directories in your $PATH. For example, on my MacBook Pro, it gives me:

$ multiwhich perl vim doesnotexist
---
- perl:
  - /Users/auser/perl/5.20.0/bin/perl
  - /opt/local/bin/perl
  - /usr/bin/perl
- vim:
  - /opt/local/bin/vim
  - /usr/bin/vim
- doesnotexist: []
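For the curious, the core idea can be sketched with nothing but core modules. This is a minimal illustration and not the App::multiwhich source: find_all is a hypothetical helper, and it ignores platform details such as Windows file extensions.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper (not taken from App::multiwhich): return every
# executable file called $name found in @dirs, preserving directory order
# and skipping directories that are listed more than once.
sub find_all {
    my ($name, @dirs) = @_;
    my %seen;
    return grep { -f $_ && -x _ }
           map  { File::Spec->catfile($_, $name) }
           grep { length($_) && !$seen{$_}++ } @dirs;
}

# File::Spec->path returns the directories listed in $PATH.
for my $name (@ARGV) {
    my @found = find_all($name, File::Spec->path);
    print "$name:\n", map("  - $_\n", @found);
}
```

Saved under a name of your choosing and run with perl vim doesnotexist as arguments, this prints one indented list per name, empty for names that are not found.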

It should soon be available on CPAN. The repo is on GitHub.

For more, see CPAN day, or follow @cpan_new, or search #cpanday.

Monday, August 11, 2014

Are you a code monkey?

Despite my appreciation of Stack Overflow, I can never see myself agreeing with Jeff Atwood on anything substantial. A post of his that is still featured in my regular nightmares is the one titled We Are Typists First, Programmers Second.

He says:

When you're a fast, efficient typist, you spend less time between thinking that thought and expressing it in code.

That might matter, but, frankly, what I encounter regularly are people who really ought to think hard and long about what they are about to type, and, then, when it is time to type that, think another half an hour before touching a keyboard.

Because, otherwise, they end up generating a 500,000,000 line CSV file from a database by just inserting commas between text fields, as in, VAR1 || ',' || VAR2 || ',' ….

When the source data contains single-character flag fields whose "specification" dates back decades to some COBOL thing where they ran out of letters and numbers a long time ago, you sometimes get, say, 100 rows with more commas than expected.

But, the code monkey don't care!

He typed that SELECT fast. Put the dump on a server, remembering to use SFTP (of course, self-signed certificate), and got back to his typing.

After all, he is a coder. He understands things no one around him understands.

I wish all you coders, typists, brogrammers would just go on a cruise to the Bermuda Triangle.

Don't be a code monkey!

Typing fast is the LEAST important component of programming.

Think.

He goes on to gratuitously attack Perl programmers:

Don't just type random gibberish as fast as you can on the screen, unless you're a Perl programmer.

Perl has Text::CSV_XS and Text::xSV. Any programmer who is aware of these modules would not waste others' time with nonsense.
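To make the failure concrete, here is a small core-only sketch with made-up data. The csv_line helper is a hypothetical illustration of RFC 4180 style quoting, shown only to make the difference visible; in real code you would let Text::CSV_XS do this work.

```perl
use strict;
use warnings;

# Made-up row: the second field is a single-character flag that
# happens to be a comma.
my @row = ('1001', ',', 'Smith, John', 'He said "hi"');

# The code-monkey way: four fields go in, six come out on the other side.
my $naive = join ',', @row;

# Illustration only: quote fields containing quotes, commas, or line
# breaks, doubling any embedded quotes. Text::CSV_XS handles this and more.
sub csv_line {
    return join ',', map {
        my $field = $_;
        if ($field =~ /[",\r\n]/) {
            $field =~ s/"/""/g;
            $field = qq{"$field"};
        }
        $field;
    } @_;
}

print "$naive\n";
print csv_line(@row), "\n";
```

The first line printed cannot be split back into the original four fields. The second one can, by any reader that understands quoted fields.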

Wednesday, August 6, 2014

Don't declare a dependency on Crypt::SSLeay (or IO::Socket::SSL either)

For background, see "Does your code really depend on Crypt::SSLeay?", "Do you need Crypt::SSLeay?", and RT #95663.

Basically, if you are using LWP and want to communicate with sites over SSL/TLS, you should declare a dependency on LWP::Protocol::https. That will pull in whatever you need to be able to communicate with web sites over SSL/TLS. You shouldn't have an explicit dependency on the underlying plumbing being used, unless there is a specific, well-thought-out reason for that.
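In a Makefile.PL, that might look like the following sketch. The distribution name and version are placeholders, and the version requirements are left at 0 purely for illustration.

```perl
# Makefile.PL for a hypothetical distribution; the names are placeholders.
use strict;
use warnings;
use ExtUtils::MakeMaker;

WriteMakefile(
    NAME      => 'My::Scraper',
    VERSION   => '0.01',
    PREREQ_PM => {
        'LWP::UserAgent'       => 0,
        # Declare the protocol handler, not the SSL/TLS plumbing under it.
        'LWP::Protocol::https' => 0,
    },
);
```

Note that neither Crypt::SSLeay nor IO::Socket::SSL appears in the prerequisites.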

Currently, IO::Socket::SSL is much more complete than Crypt::SSLeay. Upgrading to a recent release of LWP and Crypt::SSLeay ensures that IO::Socket::SSL is used instead of Crypt::SSLeay unless you have specifically overridden the choice of plumbing, without you having to lift a finger.

So, regarding RT #95663, I would recommend not changing anything: users who just want to be able to communicate with web sites over SSL/TLS end up having the better module used anyway. And, if their build systems have declared dependencies on Crypt::SSLeay or some component therein, their builds don't break, especially given the improvements in Crypt::SSLeay's Makefile.PL.

Friday, July 18, 2014

In OCaml, how can I get a list of directories in my PATH?

I have been getting my toes wet with OCaml, using Real World OCaml. The book content is freely available on their web site, but I have bought the ebook from O'Reilly, and I thoroughly recommend it.

I have to admit, it hasn't been a quick task. I find that I am too used to the luxury of documentation at my fingertips using perldoc. Reading the book, doing the exercises does breed familiarity, but I am far away from being able to write an image gallery generator (which was my first ever Perl program).

I like the implicit type checking. In fact, that is an idea that appears in Perl as well (not as strict, but, still). For example, let f x = x + 1 defines a function that takes an integer, and returns the following integer. Yes, OCaml does distinguish between types of numbers. No, I haven't yet gotten used to it.

Now, f 5 will return 6. But, f 0.5 will result in This expression has type float but an expression was expected of type int.

In Perl, if you defined my $f = sub { $_[0] + 1 } and invoked it with a string argument, the interpreter would notice it (and even tell you about it if you ask nicely):

$ perl -w -e 'my $f = sub { $_[0] + 1 }; $f->("test")'
Argument "test" isn't numeric in addition (+) at -e line 1.

Strict type checking is useful. The OCaml kind is not the same as the C or Java sort of type checking. Here is an example that had me scratching my head for a while until I studied it further.

Real World OCaml has the following example:

# let path = "/usr/bin:/usr/local/bin:/bin:/sbin";;
val path : string = "/usr/bin:/usr/local/bin:/bin:/sbin"
# String.split ~on:':' path
|> List.dedup ~compare:String.compare
|> List.iter ~f:print_endline
;;
/bin
/sbin
/usr/bin
/usr/local/bin
- : unit = ()

Now, if you squint enough, this is kind of like:

use feature 'say';
use List::AllUtils;
my $path = "/usr/bin:/usr/local/bin:/bin:/sbin";
say for List::AllUtils::uniq(split /:/, $path);

although I do like the syntactic sugar of |>.

In Perl, I would have just used $ENV{PATH}. My thoughts immediately went to how to do that in OCaml. Luckily, utop has code-completion, so it didn't take me a long time to figure out I could use Sys.getenv to get the value of my $PATH.

utop # Sys.getenv("PATH");;
- : string option =
Some
 "/Users/xyz/.opam/system/bin:/Users/xyz/.opam/system/bin: \
/Users/xyz/bin:/Users/xyz/perl/5.20.0/bin: \
/opt/local/bin:/opt/local/sbin:/usr/bin: \
/bin:/usr/sbin:/sbin:/usr/local/bin: \
/opt/X11/bin:/usr/local/MacGPG2/bin"

Hmmmm … Why is ~/.opam/system/bin in there twice?

Anyway, first, note that naively replacing path with (Sys.getenv "PATH") does not "work":

utop # String.split ~on:':' (Sys.getenv "PATH");;
Error: This expression has type string option
but an expression was expected of type string

Note the Some there. Sys.getenv takes a string and possibly returns a string. In other words, its type is string -> string option = <fun>

We know why: The environment variable may or may not be defined. In Perl, we would get an undefined value in that case. Perl can then convert that value to 0 or "" as needed. In OCaml, you need to explicitly account for that possibility.

Observe the following:

utop # match Sys.getenv "PATH" with
| None -> ""
| Some x -> x
;;
- : string = 
  "/Users/xyz/.opam/ ...

Here, we decided that if Sys.getenv "PATH" does not return a value, we will consider our path to be empty. The type of the return value changed from string option to simply string, and it is no longer prefixed with Some.

If you are doing something real rather than working on small modifications to textbook exercises, you might not want to proceed if the path is not defined. But, for my immediate purpose of actually using the value of my path rather than manually typing in a string, the following was sufficient:

utop # String.split ~on:':'
(match Sys.getenv "PATH" with | None -> "" | Some x -> x)
|> List.dedup ~compare:String.compare
|> List.iter ~f:print_endline
;;

Phewww!

Pattern matching like this is actually quite valuable.
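For comparison, here is a rough Perl counterpart of the two choices discussed above: default to an empty path, or refuse to proceed. Both helper names are hypothetical, and the defined-or operator needs Perl 5.10 or later.

```perl
use strict;
use warnings;

# Hypothetical helper: like the OCaml match above, fall back to the
# empty string when the variable is not set.
sub env_or_empty {
    my ($name) = @_;
    return $ENV{$name} // '';
}

# Stricter variant: do not carry on without the variable.
sub env_or_die {
    my ($name) = @_;
    return $ENV{$name} // die "Environment variable $name is not set\n";
}

print "$_\n" for split /:/, env_or_empty('PATH');
```

The difference from OCaml is that nothing forces you to call either helper: a bare $ENV{PATH} compiles just as happily.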

There is still a gaping hole in this construction. What if you type Sys.getenv "PTHA"? You'll end up propagating an empty path throughout a program. In Perl, I tried to avoid that kind of problem by using Const::Fast. As a simple example, I might have:

use strict;
use warnings;
use feature 'say';
use Const::Fast;

const my %VAR => (
    HOME => 'HOME',
    PATH => 'PATH',
    TMP  => 'TMP',
);

say $ENV{ $VAR{PTHA} };

which immediately gives me Attempt to access disallowed key 'PTHA' in a restricted hash …. It also serves as documentation of which environment variables my script actually depends on.
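If Const::Fast is not installed, a similar effect can be had with the core Hash::Util module. This is a sketch of a swapped-in technique, not of how Const::Fast works internally.

```perl
use strict;
use warnings;
use Hash::Util qw(lock_hash);

my %VAR = (
    HOME => 'HOME',
    PATH => 'PATH',
    TMP  => 'TMP',
);
lock_hash(%VAR);    # keys and values are now fixed

my $path = $ENV{ $VAR{PATH} };    # allowed key: works as usual

# A mistyped key dies instead of quietly yielding undef.
my $oops  = eval { $ENV{ $VAR{PTHA} } };
my $error = $@;
print "caught: $error" if $error;
```

The eval is there only so the sketch can show the message. In a real script you would let the error stop the program.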

This idea corresponds to the principle of making illegal states unrepresentable which fellow Cornellian Yaron Minsky explains in his guest lecture at CMU.

PS: Why OCaml? Well, for one, I loved Higher Order Perl, and decided I should add another camel to my herd ;-)

Friday, July 11, 2014

I wonder what SAS is doing to cause this activity pattern

Not doing anything special, just a little extraction, reshaping, and summarizing.

Four more instances smooth things out

Don't you hate being IO bound?

That's a little better

Thursday, July 10, 2014

Fun with image transformations in Perl

I came across American Gothic in the palette of Mona Lisa: Rearrange the pixels thanks to Reddit.

Basically, the task is to:

… create an algorithm that makes the most accurate looking copy of the Source by only using the pixels in the Palette. Each pixel in the Palette must be used exactly once in a unique position in this copy. The copy must have the same dimensions as the Source.

There are some interesting animations on that page. But, honestly, I couldn't bring myself to read through a lot of Java and Python. I immediately wondered how far I could get by using a rather naive method.

What I had in mind was this: First, given a source image with no more pixels than the palette image, sort the pixels in both images into a list of (color, coordinate) tuples by the 24-bit RGB value of the pixel color. Second, extract from the source pixel list just the coordinates, and from the palette pixel list just the colors. Finally, go through the source image coordinates in order, setting the pixel at the i-th coordinate to the i-th color in the list of palette image pixel colors.
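On toy data, the three steps look like the sketch below. The two 2x2 "images" are made-up lists of [color, [x, y]] pairs, and by_color is a hypothetical helper. No GD is involved here.

```perl
use strict;
use warnings;

# Made-up 2x2 "images": each pixel is [packed_rgb, [x, y]].
my @source = (
    [0x202020, [0, 0]], [0xF0F0F0, [1, 0]],
    [0x808080, [0, 1]], [0x101010, [1, 1]],
);
my @palette = (
    [0x3366CC, [0, 0]], [0x001122, [1, 0]],
    [0xFFEEDD, [0, 1]], [0x996633, [1, 1]],
);

# Step 1: sort pixels by packed 24-bit color value.
sub by_color { [ sort { $a->[0] <=> $b->[0] } @_ ] }

# Step 2: keep coordinates from the source, colors from the palette.
my @coords = map { $_->[1] } @{ by_color(@source) };
my @colors = map { $_->[0] } @{ by_color(@palette) };

# Step 3: the i-th source position receives the i-th palette color.
my %result;
for my $i (0 .. $#coords) {
    $result{ join ',', @{ $coords[$i] } } = $colors[$i];
}

printf "%s => %06X\n", $_, $result{$_} for sort keys %result;
```

The source position with the lowest packed value, (1,1), ends up with the palette color with the lowest packed value, 001122, and so on up the list.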

Yeah, OK, so I am not matching on the basis of color perception in radiation damaged mosquito eyes or some such scientific principle, but, still, the world is full of awful stuff these days, and I wanted to entertain myself for a few minutes.

I reached for an old favorite, GD::Image.

Here is the heart of the script:

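# Return an array ref of [packed_color, [x, y]] entries, one per pixel,
# sorted by packed 24-bit RGB value. $COLOR and $COORDINATES (defined
# elsewhere in the script) index the two slots of each entry.
sub get_pixels_by_color {
    my $gd = shift;
    my $dim = shift;
    return [
        sort { $a->[$COLOR] <=> $b->[$COLOR] }
        map {
            my $y = $_;    # keep the row; the inner map rebinds $_ to x
            map {
                [
                  pack_rgb( $gd->rgb( $gd->getPixel($_, $y) ) ),
                  [$_, $y]
                ];
            } 0 .. $dim->{width}    # Perl ranges include both endpoints
        } 0 .. $dim->{height}
    ];
}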

pack_rgb is simple: sub pack_rgb { $_[0] << 16 | $_[1] << 8 | $_[2] }.

I only want the coordinates from the source image:

sub get_source_pixels { [ map $_->[$COORDINATES], @{ $_[0] } ] }

And, from the palette image, I just want the colors:

sub get_palette_colors { [ map sprintf('%08X', $_->[$COLOR]), @{ $_[0] } ] }

That sprintf isn't really necessary at all, but it does help if you have to print stuff to figure out a silly error. 00EF0812 is much more meaningful than 15665170.
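The round trip is easy to check by hand. In the sketch below, pack_rgb is the one-liner from above, while unpack_rgb is an assumed counterpart: the post does not show it, so this version simply reverses the packing, starting from the hex string that sprintf produces.

```perl
use strict;
use warnings;

sub pack_rgb { $_[0] << 16 | $_[1] << 8 | $_[2] }

# Assumed implementation (not shown in the post): take the hex string
# made by sprintf('%08X', ...) and return the (red, green, blue) list.
sub unpack_rgb {
    my $value = hex shift;
    return ($value >> 16 & 0xFF, $value >> 8 & 0xFF, $value & 0xFF);
}

my $packed = pack_rgb(0xEF, 0x08, 0x12);
my $hex    = sprintf '%08X', $packed;
my @rgb    = unpack_rgb($hex);

print "$packed $hex @rgb\n";    # prints "15665170 00EF0812 239 8 18"
```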

The following function does the mapping:

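sub recreate_source_image_from_palette {
    my $dim = shift;
    my $source_pixels = shift;     # [x, y] pairs, ordered by source color
    my $palette_colors = shift;    # hex color strings, ordered by color
    my $callback = shift;          # invoked as $callback->($frame, \$png)
    my $frame = 0;

    # Collect the distinct palette colors ...
    my %colors;
    $colors{$_} = undef for @$palette_colors;

    # ... and allocate a GD color index for each of them.
    my $gd = GD::Image->new($dim->{width}, $dim->{height});
    for my $x (keys %colors) {
          $colors{$x} = $gd->colorAllocate(unpack_rgb($x));
    }

    # Hand a snapshot to the callback once every $period pixels.
    my $period = sprintf '%.0f', @$source_pixels / $ANIMATION_FRAMES;
    for my $i (0 .. $#$source_pixels) {
        $gd->setPixel(
            @{ $source_pixels->[$i] },
            $colors{ $palette_colors->[$i] }
        );
        if ($i % $period == 0) {
            $callback->($frame, \ $gd->png);
            $frame += 1;
        }
    }
    # The caller gets the finished image as the last frame.
    return ($frame, \ $gd->png);
}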

First, we create a hash %colors to store the color indexes we need to create via GD::Image->colorAllocate.

Then, it is just a matter of looping through each source coordinate, and setting the color at that pixel to the corresponding one from the palette image. I wanted to generate short animations of the transformations as well, so I also pass this routine a save callback.

Again, I couldn't be bothered with fancy math(!) trying to ensure the completed bitmap was also saved, so, to make up for my laziness, this function returns the final frame.
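A save callback compatible with those calls could be as small as the sketch below. The make_frame_saver factory and the file naming scheme are hypothetical, not taken from the actual script.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical factory: returns a callback that accepts a frame number
# and a reference to PNG data, and writes the data to
# "$dir/$prefix-NNNN.png".
sub make_frame_saver {
    my ($dir, $prefix) = @_;
    return sub {
        my ($frame, $png_ref) = @_;
        my $file = File::Spec->catfile(
            $dir, sprintf('%s-%04d.png', $prefix, $frame)
        );
        open my $fh, '>:raw', $file or die "Cannot write '$file': $!";
        print {$fh} $$png_ref;
        close $fh or die "Cannot close '$file': $!";
        return $file;
    };
}
```

Because the function returns the final frame number and image in the same shape as the callback's arguments, the caller can pass that return value straight to the same callback to save the completed bitmap.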

The results are not spectacular, but not completely disgusting either. Generating the 101 frames takes about 5 seconds for each image pair on my old MacBook Pro.

Here are some examples thanks to ffmpeg:

American Gothic using colors from Mona Lisa

Mona Lisa using colors from Starry Night

Starry Night using colors from Marbles

Mona Lisa using colors from Marbles

And, for reference, here is Marbles:

Now, if only putting together this blog post had been as easy as writing the code using Vim, and generating the images using Perl ;-)

The complete script is on GitHub.