Friday, December 16, 2011

How can I print dates in multiple locales in Perl?

Suppose you need to print the same date formatted appropriately for multiple locales. Perl's DateTime module can be combined with DateTime::Locale to make that easy. Not necessarily fast or lightweight, mind you, but easy.

The following Perl script provides a demonstration:

#!/usr/bin/env perl

use strict; use warnings;

use DateTime;
use DateTime::Locale;

binmode STDOUT, ':utf8';

my $dt = DateTime->today;
print_date( $dt );

for my $locale ( qw(ar da de en_GB es fr ru tr) ) {
    $dt->set_locale( $locale );
    print_date( $dt );
}

sub print_date {
    my ($dt) = @_;
    my $locale = $dt->locale;

    printf(
        "In %s: %s\n", $locale->name,
        $dt->format_cldr($locale->date_format_full)
    );
}

And, here's the output:

In English United States: Friday, December 16, 2011
In Arabic: الجمعة، 16 ديسمبر، 2011
In Danish: fredag den 16. december 2011
In German: Freitag, 16. Dezember 2011
In English United Kingdom: Friday, 16 December 2011
In Spanish: viernes 16 de diciembre de 2011
In French: vendredi 16 décembre 2011
In Russian: пятница, 16 декабря 2011 г.
In Turkish: 16 Aralık 2011 Cuma

Of course, if you don't have the required fonts installed, certain characters may be displayed as unknown.

Thursday, December 8, 2011

How can I identify groups of people from interaction data?

This post is based on a question I answered on Stack Overflow. Even though mine is the accepted answer, I am not very happy with it, and I have a feeling that there is a better way. I can envision someone who understands graph theory and SQL well being able to put together a few magical joins to solve this problem ;-).

So, you have data on a bunch of people interacting through various ways. For our purposes, we care only about the person initiating the interaction and the recipient. We want to find all sets of people such that everyone in the group interacted with everyone else in the group, with the additional constraint that there be at least three people in a group.

The interactions naturally form a directed graph. When I first read the question (more information was provided by the OP in iterations along the way), I thought the groups would be just the strongly connected components of the graph. However, being in a strongly connected component is a weaker condition than being in a set of people all of whom interacted with each other: Say there are three individuals, A, B, and C. Suppose we have A → B, B → A, B → C, and C → B, where X → Y means X initiated an interaction with Y. Then, there is a path from A to C and a path from C to A, but A and C never interacted and this set of people does not satisfy our definition of a group.
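The A, B, C example above can be checked mechanically. The following is a minimal sketch that uses plain hashes instead of the Graph module; the helper names (reachable, strongly_connected, all_pairs_interacted) are made up for illustration and are not part of any library.

```perl
#!/usr/bin/env perl

use strict;
use warnings;

# Directed interactions from the example: A <-> B and B <-> C,
# with nothing at all between A and C.
my %edge;
$edge{ $_->[0] }{ $_->[1] } = 1
    for ( [qw(A B)], [qw(B A)], [qw(B C)], [qw(C B)] );

# True if there is a directed path from $from to $to (depth-first search).
sub reachable {
    my ($from, $to) = @_;
    my %seen;
    my @todo = ($from);
    while (defined(my $node = pop @todo)) {
        next if $seen{$node}++;
        return 1 if $node eq $to;
        push @todo, keys %{ $edge{$node} || {} };
    }
    return 0;
}

# True if every member can reach every other member along some path.
sub strongly_connected {
    my @members = @_;
    for my $x (@members) {
        for my $y (@members) {
            return 0 unless reachable($x, $y);
        }
    }
    return 1;
}

# True if every pair of members interacted directly, in both directions.
sub all_pairs_interacted {
    my @members = @_;
    for my $x (@members) {
        for my $y (@members) {
            next if $x eq $y;
            return 0 unless $edge{$x}{$y};
        }
    }
    return 1;
}

printf "Strongly connected: %s\n",
    strongly_connected(qw(A B C)) ? 'yes' : 'no';
printf "Group by our definition: %s\n",
    all_pairs_interacted(qw(A B C)) ? 'yes' : 'no';
```

The set { A, B, C } passes the first test but fails the second, which is why strongly connected components can only serve as a first filter.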

The only thing I was able to come up with was to reduce the set of possible groups by first looking at the strongly connected components of the graph of interactions, and then checking to see if they satisfied our definition of a group. If such a component did not satisfy the requirements, I then considered all subsets of size N - 1 where N is the number of people in the component. If I could not find any groups in any of those sets, then my solution went down to subsets of size N - 2 etc …

This method seems to work in the test case the OP provided and I am pretty sure it is correct, but, as I said at the outset, I strongly suspect there is a better way.

Here is the code, including an embedded test case provided by the OP:

#!/usr/bin/env perl

use strict;
use warnings;

use Graph;
use Algorithm::ChooseSubsets;

use constant MIN_SIZE => 3;

my $interactions = Graph->new(
    directed => 1,
);

while (my $interaction = <DATA>) {
    last unless $interaction =~ /\S/;
    my ($from, $to) = split ' ', $interaction, 3;

    $interactions->add_edge($from, $to);
}

my @groups = map {
    is_group($interactions, $_) ? $_
                                : check_subsets($interactions, $_)
} grep @$_ >= MIN_SIZE, $interactions->strongly_connected_components;


print "Groups: \n";
print "[ @$_ ]\n" for @groups;

sub check_subsets {
    my ($graph, $candidate) = @_;

    my @groups;
    for my $size (reverse MIN_SIZE .. (@$candidate - 1)) {
        my $subsets = Algorithm::ChooseSubsets->new(
            set => $candidate,
            size => $size,
        );

        my $groups_found;
        while (my $subset = $subsets->next) {
            if (is_group($graph, $subset)) {
                ++$groups_found;
                push @groups, $subset;
            }
        }
        last if $groups_found;
    }

    return @groups;
}

sub is_group {
    my ($graph, $candidate) = @_;

    for my $member (@$candidate) {
        for my $other (@$candidate) {
            next if $member eq $other;
            return unless $graph->has_edge($member, $other);
            return unless $graph->has_edge($other, $member);
        }
    }

    return 1;
}

__DATA__
a   c   Dec  2 06:40:23 IST 2000    comment
f   g   Dec  2 06:40:23 IST 2009    like
c   a   Dec  2 06:40:23 IST 2009    like
g   h   Dec  2 06:40:23 IST 2008    like
a   d   Dec  2 06:40:23 IST 2008    like
r   t   Dec  2 06:40:23 IST 2007    share
d   a   Dec  2 06:40:23 IST 2007    share
t   u   Dec  2 06:40:23 IST 2006    follow
a   e   Dec  2 06:40:23 IST 2006    follow
k   l   Dec  2 06:40:23 IST 2009    like
e   a   Dec  2 06:40:23 IST 2009    like
j   k   Dec  2 06:40:23 IST 2003    like
c   d   Dec  2 06:40:23 IST 2003    like
l   j   Dec  2 06:40:23 IST 2002    like
d   c   Dec  2 06:40:23 IST 2002    like
m   n   Dec  2 06:40:23 IST 2005    like
c   e   Dec  2 06:40:23 IST 2005    like
m   l   Dec  2 06:40:23 IST 2011    like
e   c   Dec  2 06:40:23 IST 2011    like
h   j   Dec  2 06:40:23 IST 2010    like
d   e   Dec  2 06:40:23 IST 2010    like
o   p   Dec  2 06:40:23 IST 2009    like
e   d   Dec  2 06:40:23 IST 2009    like
p   q   Dec  2 06:40:23 IST 2000    comment
q   p   Dec  2 06:40:23 IST 2009    like
a   p   Dec  2 06:40:23 IST 2008    like
p   a   Dec  2 06:40:23 IST 2007    share
l   p   Dec  2 06:40:23 IST 2003    like
j   l   Dec  2 06:40:23 IST 2002    like
t   r   Dec  2 06:40:23 IST 2000    comment
r   h   Dec  2 06:40:23 IST 2009    like
j   f   Dec  2 06:40:23 IST 2008    like
g   d   Dec  2 06:40:23 IST 2007    share
w   q   Dec  2 06:40:23 IST 2003    like
o   y   Dec  2 06:40:23 IST 2002    like
x   y   Dec  2 06:40:23 IST 2000    comment
y   x   Dec  2 06:40:23 IST 2009    like
x   z   Dec  2 06:40:23 IST 2008    like
z   x   Dec  2 06:40:23 IST 2007    share
y   z   Dec  2 06:40:23 IST 2003    like
z   y   Dec  2 06:40:23 IST 2002    like

The output should be:

Groups:
[ x z y ]
[ e c a d ]

Friday, December 2, 2011

What Canon giveth, Canon taketh away

All my cameras since my first digital camera, Fuji's Finepix 4900z, have been Canons.

I have been a big fan of their Sn IS series. These were cameras you could take anywhere. Mind you, they were not small or featherweight by any means, but to get anywhere near the zoom range they offered using a DSLR would probably have meant lugging around about 20 lbs in an ungainly bag. They were not dirt cheap, but if you waited 7-8 months after their introduction, you could get them for about $350 online which, I think, is much cheaper than a good, fast zoom lens for your favorite DSLR. That, of course, meant that I would indeed take the camera everywhere.

The best accessories you could buy for your Canon Sn IS were offered by Lensmate. My favorite was the metal lens barrel which allowed you to attach a bunch of filters but also protected the lens. It basically turned that camera into a hammer ;-) Mine took some beating:

[ Canon S5IS with lens barrel from Lensmate ]

I was even able to attach the MR-14EX ring flash to it using some creativity and Velcro:

[ Canon S5IS with MR-14EX ring flash ]

Why all the past tense? Well, I was parted from my S5IS earlier this year due to total stupidity on my part, and I was not going to see it again for at least three months. I could not be without a camera for that long, so I broke down and bought the SX30 IS, which was the successor to the S5IS at the time.

I looked at other cameras with long zooms as well, but two things kept me in Canon's camp: the availability of CHDK, and the fact that the others did not have the vari-angle LCD screen I knew and loved, which lets you actually compose shots when shooting over crowds or around the edge of a rock on a cliff.

The picture quality and zoom on the SX30 are great. I cannot fathom why they would have taken away the night portrait mode (it is a very useful shortcut even in full daylight). It is great to be able to zoom in and out and take as many photos as you want while you record video (a feature all my Sn IS cameras had), but in this camera the noise from the lens moving is very, and I mean very, audible and annoying.

But, the most important and negative change to me was the fact that you could no longer attach a nice metal barrel around that lens. Especially since it extends so far when you go into full zoom. So, if you are climbing some rocks, you had better make sure the camera is off. Otherwise, you might break the lens at the worst possible moment, instead of just denting or scratching a protective housing. So, climb a little, turn camera on, wait, take photo, turn camera off, climb a little more … is annoying. Now, of course, a lens barrel that is able to accommodate the lens on this camera when it is fully zoomed in would be much longer than the one I had on the S5IS, which would make it a much less convenient add-on.

Also, gone is the ability to use AA batteries. You get one of those proprietary batteries. I bought a second one just in case, but so far, one battery has been able to keep up with two days of video and photo shooting (enough to fill two 8GB SD cards at a time). I think it helps that I use the viewfinder to compose and take my shots 99% of the time.

I thought, this way, I'll have a rugged, beat up camera which I can beat up some more, take to the beach etc. I was reunited with my S5IS in Turkey in August, but the reunion was short and ended abruptly when my girlfriend accidentally dropped it on the rocks. After a heartbreaking sound, it rolled across a ragged surface and then fell into two feet of salty Mediterranean water, where it had to sit for about three minutes before I could take it out. All the tricks that had worked in rescuing coffee-soaked computers in the past failed. I don't know what killed it, but I strongly suspect it was the salt water rather than the fall.

I just saw the new SX40 in the store.

The negatives I mentioned above remain. I don't know what Canon can do about the lens barrel, but I do wish they would work on that noise from the lens itself. Oh, and I would appreciate it if you have any suggestions for how to use the MR-14EX with the SX30 IS.

See Steve's review of the SX40.

Thursday, December 1, 2011

A 44MB hard drive and a phone on a wooden table

Here is one of my prized possessions, a Miniscribe Model 6053 MFM hard drive I saved from a dumpster at Cornell.

It is sitting next to my Nokia E5 phone with an 8GB Micro SD card inside.

Understanding that the distance we have traveled in the last 31 years was a function of the economic structure, and not just engineering prowess, will help you understand why, while Russians made great engineers and scientists, the market was not flooded with cheap and useful Soviet computers once the wall came down.

It will also give you a hint as to why there could never have been a second coming of Steve Jobs without some early decisions by IBM, and Microsoft's subsequent success.

Wednesday, November 30, 2011

Laptop attachment syndrome

Since 1996 or 1997, my main work machine has been a laptop.

The one and only desktop machine I ever bought new was from Midwest Micro (I was one of the few who actually bought a machine with a 486DX4-75). Since then, I have rescued some from being discarded and built a collection of a few Frankenmachines, but I have done most of my work on laptops for the last 15 years.

Now, I have a confession to make.

My main laptop is more than five years old.

It was definitely not top of the line when I bought it.

True, in the intervening years, I put more memory in (2GB instead of 512MB), swapped the hard drive for a larger one (400GB instead of 100GB), and got an even chunkier battery (4.5 hours instead of 2 hours).

It's definitely not a speed demon, and it would not win any beauty contests.

But, it is set up the way I want it. My text editors, various programming language distributions, video editing tools, graphics tools, and LibreOffice work just the way I like them.

I can dual boot between ArchLinux and Windows XP. Wireless and graphics work well under Linux. I can roam between access points, and USB devices automount correctly. There does not seem to be any more pain left to discover (oh, I forgot, I dare not hibernate the machine under ArchLinux, but suspend works fine).

I have it set up so that Cygwin Perl, ActiveState Perl, and Strawberry Perl all stay out of each other's way. Ditto for the umpteen different versions of gcc I have on that machine.

The keyboard has good travel. I don't have to look at it to type (never been much of a touch typist I am afraid, just 2-4 finger pecking. That's still faster than I can think). More importantly, it has all the keys I need (take that MacBook) and even a microphone input jack (take another one, MacBook!). The touchpad has the right size and right amount of sensitivity. I would prefer a 1400x900 resolution, but 1280x800 works.

It has a built-in optical drive. More importantly, it has a built-in modem, which means I can fax from anywhere. That came in extremely handy over the last few months.

While I also use a recent vintage MacBook Pro and a couple of Windows 7 64-bit laptops for various things, I always go back to my trusty Lenovo 3000 N100.

Did I mention that I don't play any games other than Angry Birds for Chrome, Dogfight 2, and Doom?

I am beginning to think there is something wrong with me. After all, I must be able to find a laptop with a decent screen (no shiny, reflective panels for me), decent keyboard, decent touchpad that is also fast, has a huge hard drive, is quiet, does not generate much heat etc.

But, every time I get excited, I find some flaw with it.

I think I am suffering from laptop attachment syndrome.

PS: I will not touch Dell, HP, and Gateway laptops. And Macs will remain too annoying until Apple figure out that Page Up and Page Down keys are indeed necessary.

Saturday, November 12, 2011

I failed to reverse a linked list

A couple of weeks back, I noticed the ad for the Silicon Alley Talent Fair on Stackoverflow.com. Given that I was going to be just across the river from Manhattan this weekend, I figured I would stop by, check out the event and some of the companies I had not heard of before, and drop off some resumes just in case.

First off, let me point out that the event was well organized. I arrived about an hour after the doors had been opened, and noticed that there might have been a line out the door at some point, but my processing went smoothly. My only major complaint is the fact that the organizers used i.e. pretty much everywhere they meant e.g. and listed personal hygiene as the first thing to bring. I am a firm believer in the fact that that sort of thing should not need to be explicitly mentioned, but then, what do I know?

Inside, I got a chance to find out more about some of the companies whose web sites I had checked out before (and some companies whose Career links 404'ed on my phone ;-).

At one point, I got lured to the ZocDoc stand for what I was told was a "fun computer science quiz". I am an economist by training, and a self-taught programmer, so I find these kinds of things to be fun tests of whether I can think on my feet. So, I signed up.

Half an hour later, the thingamajiggie they gave me starts flashing, I run over to their booth, I am met by an interviewer, and he wants me to write C code to reverse a linked list in place on a 3x5 piece of paper. Panic sets in. Not panic that I don't know this, but panic that in the middle of all the commotion and noise and distractions, I won't be thinking clearly. A little sketch clarifies that you need three pointers to keep track of what you are doing.

[ el1 ] → [ el2 ] → [ el3 ] → …

needs to be turned into:

[ el1 ] ← [ el2 ] ← [ el3 ] ← …

We agree that's the right way to do it, but I just cannot write the loop condition. My handwriting is horrible and large and I can't fit anything on this little sheet. My *s look like blobs. I run out of my five minutes, there are other people waiting etc, so I decide not to waste more of their time. The interviewer takes my resume that was in a pile of other papers, scribbles some stuff on it and I am left with the sinking feeling of arrggghh! Why did I volunteer for this?!

In my defense, I had thought it was going to be a fun computer science quiz.

In the scheme of things, this was not as bad as not being able to solve FizzBuzz, but now I understand how those people feel.

And, while ZocDoc provides a cool, new, and exciting service, I was not even interested in working with them.
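For reference, the three-pointer loop from the sketch above can be written roughly as follows. This is a minimal sketch in Perl rather than C, with hash references standing in for C structs; the node layout (value and next keys) and the function names make_list, reverse_list, and list_values are assumptions made purely for illustration. The loop condition is simply "while there is a current node".

```perl
#!/usr/bin/env perl

use strict;
use warnings;

# Build a singly linked list from a list of values. Each node is a hash
# reference with "value" and "next" keys (a stand-in for a C struct).
sub make_list {
    my @values = @_;
    my $head;
    for my $value (reverse @values) {
        $head = { value => $value, next => $head };
    }
    return $head;
}

# Reverse the list in place using three references:
# the previous node, the current node, and the next node.
sub reverse_list {
    my ($head) = @_;
    my $prev;
    my $curr = $head;
    while (defined $curr) {
        my $next = $curr->{next};   # remember the rest of the list
        $curr->{next} = $prev;      # point the current node backwards
        $prev = $curr;              # step the previous pointer forward
        $curr = $next;              # step the current pointer forward
    }
    return $prev;                   # the old tail is the new head
}

# Collect the values of a list in order, for printing or checking.
sub list_values {
    my ($node) = @_;
    my @values;
    while (defined $node) {
        push @values, $node->{value};
        $node = $node->{next};
    }
    return @values;
}

my $list = reverse_list( make_list(qw(el1 el2 el3)) );
print join(' -> ', list_values($list)), "\n";
```

In C, the same loop would use three struct pointers and compare the current pointer against NULL.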

Monday, June 27, 2011

Is this why Perl sucks?

I am trying to shift gears to re-focus on a project where I have been pretty sluggish and seem to have hit some kind of block. Checking Perl blogs, I noticed there seem to be a lot of people worried about name changes and stuff. Apparently, there are some people who think Perl sucks. Based on what they know about Perl 4, they have dismissed Perl 5, and are holding out for Perl 6. If only Perl 6 were not named Perl, those people would realize how wonderful Perl 5 is. Or something like that.

I am sure there is some truth to that. However, I am reasonably certain that people who loudly proclaim "Perl sucks!" aren't interested in Perl 6 either. Personally, Perl made sense to me for the first time with 5.8. I had used Z80 assembly, Pascal, Fortran, APL (not that I remember any of it), C, forays into C++, and Java (in that order) along with SAS, S-Plus etc prior to that. It has been pointed out to me that Perl became my favorite language at a time after Perl's prominence had peaked. I was even asked: "Python was available when you switched to Perl. How can you justify that?"

I don't know. Perl just made sense to me, especially since I already had at my disposal CPAN which removed so much drudgery out of writing programs to do what I wanted.

That does not mean I decided to remain ignorant of Python and later Ruby. I am conversant in both, but have never felt at home the way I do when immersed in Perl.

It does not take a genius to realize that Perl 6 is different than Perl 5. Maybe I am a simpleton, but I just don't see why Perl 6 should have any bearing on what I do using Perl. Clearly, Perl 6 is related to Perl 5. This is probably a bad analogy, but did anyone feel compelled to change C to something else because of C++? Did the existence of C++ stop anyone from writing C where it was appropriate?

On a lighter note, I decided to search Google for Perl sucks. The second hit is a 2008 blog post by someone called Python Guy.

He (I am going to assume Python Guy is a he) cites the following Perl code:

my @subgroup = (scalar(@{$group}) > $maxNumInSubgroup) ?
    @{$group}[0..($maxNumInSubgroup-1)] :
    @{$group};

to make the case that Perl sucks compared to Python.

First, in case it is not obvious to you, I'll point out why the conditional is there: you should not use @{$group}[0 .. ($maxNumInSubgroup-1)] on its own to get the subset because, if $maxNumInSubgroup is larger than the size of the array referenced by $group, the array slice would put extra undef elements at the end of @subgroup.
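The padding is easy to see in a small experiment. The following is a minimal sketch; the variable names ($max, @padded, @guarded) are chosen here for illustration and do not come from the original post.

```perl
#!/usr/bin/env perl

use strict;
use warnings;

my $group = [ qw(a b c) ];
my $max   = 5;                      # larger than the number of elements

# Unguarded slice: indices 3 and 4 do not exist, so they come back as undef.
my @padded = @{$group}[ 0 .. $max - 1 ];

# Guarded version: cap the last index at the end of the array.
my $last    = $max > @$group ? $#{$group} : $max - 1;
my @guarded = @{$group}[ 0 .. $last ];

printf "unguarded: %d elements, guarded: %d elements\n",
    scalar @padded, scalar @guarded;
```

With a three-element array and a maximum of five, the unguarded slice yields five elements (the last two undefined), while the guarded one yields three.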

Now, the Python solution is appropriately elegant:

subgroup = group[:maxNumInSubgroup]

Is this really why Perl sucks?

Anyone can see the original Perl code is ugly (not least because of the camelCase variable names ;-)

However, if you do this kind of thing often in your code, it is straightforward to put it in a function:

use warnings; use strict;
use Data::Dumper;

my $group = [ qw(a b c d e) ];

my $subset1 = make_subset($group, 3);
my $subset2 = make_subset($subset1, 1);

print Dumper($subset1, $subset2);

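sub make_subset {
    my ($ary, $max_elements) = @_;

    # Asking for more than there is: return a copy of the whole array.
    return [ @$ary ] if $max_elements > @$ary;

    # Otherwise, return the first $max_elements elements.
    return [ @$ary[0 .. $max_elements - 1] ];
}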

Now, is that really so bad that someone would be motivated to shout "Perl sucks!" from the rooftops?