Wednesday, May 8, 2013

Merging two arrays of integers in C

Inspired by TDWTF's Cheaters Never Prosper story, I decided to put together an example in C.

Given that we'll be dealing with arrays of int and we'll have to pass size information around, first, I declare a struct that holds a pointer to the array and the current size:

struct int_array_with_size {
    size_t size; 
    int *ary; 
};

This is mostly to avoid longish argument lists. Besides, it's good to keep related information together.

Next, we need a way of creating such arrays:

static struct int_array_with_size *
new_int_array_with_size(
        size_t n_elem
        )
{
    struct int_array_with_size *p = malloc(sizeof(*p));
    if (!p) {
        return NULL;
    }

    p->ary = malloc(n_elem * (sizeof(*(p->ary))));
    if (!p->ary) {
        return NULL;
    }

    p->size = n_elem;
    return p;
}

We'd end up writing a lot of extra lines without a simple function to free such a struct:

static void free_int_array_with_size(
        struct int_array_with_size **p
        )
{
    free((*p)->ary);
    (*p)->ary = NULL;
    free(*p);
    *p = NULL;
    return;
}

A function to fill such an array with a bunch of integers is useful as well:

static void
fill_int_array(
        struct int_array_with_size *p,
        int start,
        int delta
        )
{
    size_t i;
    size_t k = p->size;

    for (i = 0; i < k; i += 1) {
        (p->ary)[i] = start;
        start += delta;
    }
    return;
}

And, of course, we might want to print the contents, too:

static void
print_int_array(
        const struct int_array_with_size *p
        )
{
    size_t i;
    size_t k = p->size;

    printf("[ ");
    for (i = 0; i < (k - 1); i += 1) {
        printf("%d, ", (p->ary)[i]);
    }
    printf("%d ]\n", (p->ary)[i]);

    return;
}

And, of course, we'll need min and max:

static size_t min_size_t(const size_t x, const size_t y)
{
    return (x < y) ? x : y;
}

static size_t max_size_t(const size_t x, const size_t y)
{
    return (x > y) ? x : y;
}

Sure, you can define a generic macro, but I am allergic to macros that evaluate expressions more than once.

The algorithm for merging is quite simple: Allocate a similar array that can hold all the elements of the arrays to be merged. Then, fill in the even indexed elements from the first array, and the odd-indexed elements from the second array, until you reach the end of the shorter one. Then, fill in the rest of the elements from the longer one:

struct int_array_with_size *
merge_int_arrays(
        const struct int_array_with_size *x,
        const struct int_array_with_size *y
        )
{
    int *end = NULL;
    int *dest = NULL;
    int *src = NULL;
    size_t limit = 0;
    size_t remainder_size = 0;

    struct int_array_with_size *r = malloc(sizeof(*r));
    if (!r) {
        return NULL;
    }

    r->size = x->size + y->size;
    r->ary = malloc(r->size * sizeof(*(r->ary)));

    if (!r->ary) {
        free(r);
        return NULL;
    }

    limit = min_size_t(x->size, y->size);
    src = x->ary;
    end = r->ary + 2 * limit;

    for (dest = r->ary; dest < end; dest += 2) {
        *dest = *src;
        src += 1;
    }

    src = y->ary;
    for (dest = (1 + r->ary); dest < end; dest += 2) {
        *dest = *src;
        src += 1;
    }

    dest = end;
    remainder_size = sizeof(*(r->ary)) *
        (max_size_t(x->size, y->size) - limit);

    src = (x->size > y->size) ? x->ary : y->ary;
    src += limit;
    memcpy(dest, src, remainder_size);

    return r;
}

Note the one bit of premature optimization where instead of having a single for loop that copies elements from both arrays in one go, I first fill in the even-indexed elements and then the odd-indexed elements based on the assumption that not jumping between memory locations is going to be faster. No, I didn't measure anything … just wrote it that way at first and later thought about why I did that ;-)

Finally, here's a short main as proof that code works for some values of "works":

int main(void) {
    struct int_array_with_size *r;
    struct int_array_with_size *x = new_int_array_with_size(3);
    struct int_array_with_size *y = new_int_array_with_size(5);

    if (!x || !y) {
        return EXIT_FAILURE;
    }

    fill_int_array(x, 1, 3);
    print_int_array(x);

    fill_int_array(y, 1000, 10);
    print_int_array(y);

    r = merge_int_arrays(x, y);
    if (!r) {
        return EXIT_FAILURE;
    }

    print_int_array(r);

    free_int_array_with_size(&r);
    free_int_array_with_size(&y);
    free_int_array_with_size(&x);

    return EXIT_SUCCESS;
}

Of course, what would we do without a Perl version?

#!/usr/bin/env perl

use 5.014;
use strict;
use warnings;

sub make_iterator {
    my $array = shift;
    return sub {
        state $i = 0;
        return $i <= $#$array ? $array->[$i++] : ();
    };
}

sub make_merger {
    my $iterators = shift;
    return sub { map $_->(), @$iterators };
}

my @iterators = map make_iterator($_), [1 .. 3], [5 .. 10];
my $merger = make_merger(\@iterators);

my @merged;
while (my @stripe = $merger->()) {
    push @merged, @stripe;
}

say "[ @merged ]";

OK, now I am being silly. That handles an arbitrary number of arrays as well as data types.

Do newlines really present a problem in AngularJS?

So, I decided to learn a little about AngularJS and I am watching Dan Wahlin's video tutorial. About one-third of the way through, I started getting annoyed by all the horizontal scrolling back and forth, so I formatted my HTML like so:

<li data-ng-repeat="cust in customers |
filter:name | 
orderBy:'city'">
{{cust.name}} - {{cust.city}}
</li>

Using Firefox 20.0.1 on Windows XP, I got the following in Firebug console:

Error: Expected ngRepeat in form of '_item_ in _collection_'
            but got 'cust in customers |
                filter:name | 
                orderBy:'city''.

Everything works if everything is on a single line:

<li data-ng-repeat="cust in customers | filter:name | orderBy:'city'">

What am I missing?

Monday, April 22, 2013

Randomly and non-destructively pick two items without replacement from an array

Suppose you want to randomly pick a pair of elements from an array without replacement. That is, given [qw(Cat Dog Fox Horse Goose Unicorn)], you want to pick ($first, $second) so that $first ne $second.

One might first try something like this:

#!/usr/bin/env perl

use 5.014;
use strict;
use warnings;

my $set = [qw(Cat Dog Fox Horse Goose Unicorn)];

say "@{ pick1($set) }" for 1 .. 5;

sub pick1 {
    my $set = shift;
    my $n = @$set;

    my $x = $set->[rand $n];
    my $y;

    do {
        $y = $set->[rand $n];
    } until ($x ne $y);

    return [$x, $y];
}

That is not horrible, but can we do better? For example, there is no need to live with the possibility of generating a duplicate when we know we do not want that. splice can help with that:

sub pick2 {
    my $set = shift;

    my @i = (0 .. $#$set);

    my $j = splice @i, rand @i, 1;
    my $k = splice @i, rand @i, 1;

    return [ @{$set}[$j, $k] ];
}

The array of indices of @$set, @i, is there because I do not want the act of picking these pairs to alter the underlying array.

What if $set refers to an array with a large number of elements? Creating @i may be really undesirable in such a case.

But, of course, if you are just picking two elements, you can replace the splice with simple arithmetic:

sub pick3 {
    my $set = shift;

    my $j = int rand(@$set);
    my $k = int rand(@$set - 1);

    $k += ($k >= $j);

    return [ @{$set}[$j, $k] ];
}

Say, @$set has 6 elements. The first invocation of rand picks an index out of {0, 1, 2, 3, 4, 5, 6} and the second invocation picks one from {0, 1, 2, 3, 4, 5}. Say the first index picked was 2. Then, the correct set out of which to pick the second index would be {0, 1, 3, 4, 5, 6}. The line:

    $k += ($k >= $j);

takes care of this mapping.

Of course, for serious work, you should use something like Math::Random::MT rather than whatever rand your runtime gives you.

Tuesday, April 16, 2013

Linode disappoints ...

It all started on April 12th with an email from Linode:

Linode administrators have discovered and blocked suspicious activity on the Linode network. This activity appears to have been a coordinated attempt to access the account of one of our customers. This customer is aware of this activity and we have determined its extent and impact. We have found no evidence that any Linode data of any other customer was accessed. In addition, we have found no evidence that payment information of any customer was accessed.

etc … etc

I did not think much of it. After all, if something is on the Internet, it is going to be attacked sooner or later. Heck, I get daily notifications of people trying to access sinan at yahoo dot com and my security logs are full of attempts to log on to my VPS using well known user names. Keep in mind: Don't allow password logins for SSH, and don't copy your private key to the server.

So, yesterday brings some rumblings, including this email from a person I trust:

Subject: Linode's been hacked

http://slashdot.org/firehose.pl?op=view&type=submission&id=2603667

Then, this. Then, this.

At the heart of the issue is a chat log and a directory listing of a web server. As I go to bed, there is no real information from Linode.

While a vulnerability that allows someone access to the public directory of a web server would hardly be news, the chat log also includes more serious claims.

This morning brought an update from Linode.

this group gained access to a web server, parts of our source code, and ultimately, our database. We have been working around the clock since discovering this vulnerability. Our investigation reveals that this group did not have access to any other component of the Linode infrastructure, including access to the host machines or any other server or service that runs our infrastructure.

Credit card numbers in our database are stored in encrypted format, using public and private key encryption. The private key is itself encrypted with passphrase encryption and the complex passphrase is not stored electronically. Along with the encrypted credit card, the last four digits are stored in clear text to assist in lookups and for display on things like your Account tab and payment receipt emails. We have no evidence decrypted credit card numbers were obtained.

At this point, I do not fully understand all the ramifications of this breach, but it is making me very uncomfortable.

I have been singing Linode's praises for such a long time that this was a real blow to my image of them. I am not sure how to proceed right now, but the praise is on hold for the moment.

Sunday, April 14, 2013

Linode doesn't disappoint ... again

Yesterday morning, I noticed Linode was now doubling the RAM for all their offerings.

I don't remember being this excited about a little bit more RAM in a long time. Not even being able to upgrade my MacBook Pro to 16 GB RAM made me as happy as the extra memory my Linode VPS was now going to have.

Couple that with the eight cores, the generous disk space and bandwidth allotment, even for the cheapest plans; couple that with the excellent self-service tools as well as the most responsive support personnel; couple that with the fact that they support Perl and YAPC — … Well, you have a clear winner.

Of course, I had to make a fundamental error in my rush to reboot with fresh memory. Apparently, during the earlier transition to systemd, I never enabled the dhcpcd service. I am not sure why the earlier reboot did not exhibit the symptom, but I have a feeling it has to do with the fact that the migration ended up giving my VPS' eth0 a new MAC address or something. Long story short: After the migration was over, I could not reach my server: Neither SSH nor HTTP worked.

After a brief panic, I logged on via Lish, did an ip addr, noticed that eth0 was not assigned, started dhcpcd, and all was good again.

Simplicity for the excitable idiot in me. I love it!

If you're looking for a no fuss virtual server where you are in control, I recommend Linode with no reservations whatsoever (and, if you want to use a link without my referral code, you can visit Linode.com using this one).

Update 2013/04/16: My praise is on hold for now, see Linode disappoints for more information.

Thursday, April 4, 2013

Linode, please don't stop messing with my uptime!

As I have mentioned before, I am a happy Linode customer. I mean, Linode just keeps on rocking! I have never in my life encountered a business that is such a pleasure to deal with and provides a product so well-functioning that you forget how life used to be before you could get a mighty little Linode and host a dozen or so web sites on it for about $240/year.

I have ArchLinux installed on my Linode. The combination has been so effective, so easy to maintain, so fire-and-forget, that I have made dumb mistakes maintaining it. Recently, I followed ArchLinux and moved my installation to systemd. Thanks to Linode's Lish, I made the changes with full peace of mind, knowing that if anything went wrong, I could just connect on the console, and not even have to contact tech support. As wonderful as Linode's tech support is, we all know that the best tech support is the one you don't have to contact. Linode has not disappointed in that regard. Their product works well. In any case, the only downtime from moving to systemd? The few seconds it took to reboot the instance.

Now, I am no sysadmin with an uptime obsession. But, I remember feeling a little sad that I had to reboot my instance, after more than a year of trouble-free operation, to take advantage of the quad-core environment Linode was automatically upgrading us to. Of course, I was fine with the dual-core system, but, I mean, four cores?! I gotta get me that. Which I did.

One problem with having such a smooth operation is the fact that I forget to check what's going on with the system. Today, I belatedly found out that Linode had just increased network capacity (my outbound cap went from 200G to 2000G—free!)

But, they had to mess with my uptime again, right?

Seriously, eight cores? I mean, 010 CORES! (do your best Jeremy Piven imitation from Gross Pointe Blank).

I told myself: I am not a sysadmin with an unhealthy obsession with uptime. A few seconds after clicking Reboot, I was rewarded with:

Oh, Linode, does your beauty know no bounds?

Apparently not. Because, in addition to just providing an excellent product, Linode is a Platinum level sponsor of YAPC::NA 2013, continuing a tradition of supporting Perl and YAPC.

So, Linode, you can mess with my uptime any time you want. I am looking forward to screaming 0x10 CORES!!! next time.

If you're looking for a no fuss virtual server where you are in control, I recommend Linode with no reservations whatsoever (and, if you want to use a link without my referral code, you can visit Linode.com using this one).

Update 2013/04/16: My praise is on hold for now, see Linode disappoints for more information.

Wednesday, April 3, 2013

Translation of programming terms and software menus considered harmful

I started learning programming around the age of 10 or 11 by following a silly column in a newspaper where once a week they showed a small BASIC program. I did not have a computer, so I basically wrote program state at each point in a box on a piece of paper. One day, one of my mom's co-workers sat me in front of a VAX terminal and let me type things. I kept getting error messages, but I was hooked. My mother and my great aunt organized the smuggling of a ZX Spectrum 48K from Germany to Turkey for me and my brother.

Given that the computer had been purchased in Germany, its instruction manual was also in German, and trying to learn BASIC using my rudimentary understanding of German was proving too frustrating. One day, I found Mastering Machine Code on Your ZX Spectrum. Until then, I had been typing up listings from various computer magazines, trying to figure out how things were working (remember, no internet then. In Turkey, even inter-city calling used to take hours sometimes). I bought that book. Didn't have anything other than simit for lunch for a few days, but it was worth it.

What I learned in that book changed my life. The CPU, RAM, ROM, interrupt tables … Later, my Spectrum 128 II helped me learn about paging memory in and out. But, it all started with Toni Parker's fabulous book.

Over time, some books in Turkish started appearing on bookshelves. Invariably, those books were disappointing because of an imperfect correspondence between the words they used for various English language terms. Take, for example, the term interrupt handler. METU's dictionary gives the translation as iş kesme kotarıcı. If one is not already familiar with the concept of an interrupt handler and also with the various twisty positions people get themselves into while trying to translate computing terms, one would have no idea what that term represents. That is, the phrase iş kesme kotarıcı is just as meaningless to a Turk as is the phrase interrupt handler if she is not already familiar with the concept.

However, if she is never exposed to the English phrase, but only learns in various Franken-translations that vaguely sound Turkish, she won't be able to communicate with people who did not learn the concept using the same Franken-translations. Sure, you can say, first learn using Franken-Turkish, then learn the English terms. But, why double the work?

The same goes for translation of software menus. I know it is very fashionable right now, but it is also very counterproductive on international teams. Even simple terms like record format are problematic. The same METU dictionary gives the translation as tutanak formatı where the word tutanak literally means meeting minutes. That makes absolutely no sense, and makes it impossible to transfer skills learned on the Turkish version of a piece of software to the English version.

The upshot of this is to make programmers and other workers who learn using Franken-translations less easily substitutable in place of programmers and other workers who learn using the English version. After all, English is still the easiest to learn lowest common denominator language. Plus, most of the related knowledge is available in English. Most of the terms in common use started in English dominated software. Whether you think this is good or bad, it is the way it is. And, if people working with software need their knowledge to be portable, they need to know the English terms for everything they use.

There is nothing that annoys me more than trying to help family and friends with computer stuff, only to find them running Turkish versions of software. Most shortcuts are different (but never all), most menus are ordered differently, mnemonics are not the same etc. It makes me realize that had I been introduced to computers and related concepts only in what people thought were correct translations, my progress later would have been much slower.

This is different than software working equally well regardless of the language in which the content is written. I am all for that. However, learning programming concepts and software tools in another language than English doesn't help. In most cases, it serves to prevent programmers and other workers in poorer countries from competing with rich programmers in rich countries.

Now, if you are in the business of selling programming books, and your potential consumers don't think that they are putting themselves at a disadvantage by reading the translation, rather than the English version, then, by all means, give them what they want if the revenue from doing so exceeds the cost. That's a business decision. I am only remarking on the effects I observe.

Jeff Atwood put it best:

Advocating the adoption of English as the de-facto standard language of software development is simple pragmatism, the most virtuous of all hacker traits. If that makes me an ugly American programmer, so be it.