Friday, August 29, 2014

Replacing hash keys with values does not a translation make

<rant>

Some time ago, Gabor and I had a disagreement regarding the value of translating programming articles to languages other than English.

In a nutshell, having actually worked as a translator (Danish↔Turkish and English↔Turkish, including a stint translating Sesame Street episodes, written by Turkish writers for the Turkish version, for the CTW producers), I am quite familiar with what happens when people attempt to translate meaning by looking up terms in a dictionary.

I am afraid, Perlde scalar ve list bağlam, bir dizinin boyutu (English original) forms a good example of why such translations are not only not useful, but also harmful.

I am sure Kadir Beyazlı put a lot of good work into the translation, but the result is an abomination.

The grammar error in the first word of the title is repeated throughout the body of the post.

More importantly, look at that title again:

Perlde scalar ve list bağlam, bir dizinin boyutu

Is that Turkish or English?

Having failed to find good terms to replace scalar and list, the translator decided to keep using the English words. How does that help a person who supposedly doesn't understand English and, therefore, would be reading this translation? For all she knows, we could have used pony and rabbit instead of scalar and list and, so long as the substitution was consistent, she would get the same benefit out of reading this.

If you look at the English-Turkish Math Dictionary, scalar is translated as sayısal which actually means numeric, which kinda works when we are talking about an array in scalar context, but then fails when we say a reference is a scalar.

Let me state this unequivocally: The Turkish language has been impoverished over the past century by the blind culling of words of Arabic, Farsi, and in some instances Hebrew origin (although, at least Eylül is still Eylül) in some blind drive towards purification of the language following Atatürk's reforms. Blind importation of words from English and French did not help either. (Turks cannot distinguish among the meanings of the word economy in the Turkish Economy, the Economics Department, and economy class, which saddens me. Hint: Türkiye Ekonomisi, İktisat Bölümü, and ucuz bölüm.)

I am happy I did not have to learn to write Turkish in the Arabic alphabet, but every time I think of my grandfather Dr. Şinasi Kıpçak's vocabulary and mastery of the language, I am filled with both nostalgia and envy.

Coming back to how to translate scalar and list context to Turkish …

Here, as in many cases, the translator must think about what words express the meanings of those phrases most consistently and usefully.

To me, the answer is clear: Scalar context in Perl refers to situations where something is interpreted as just one thing.

So, a translation of that title that actually conveys the meaning instead of doing a simple hash lookup might be:

Perl programlarında tekli ve çoklu bağlam: Bir dizinin elemanlarının sayısı

whose literal translation back to English would be Scalar and list context in Perl programs: The number of elements in an array. I believe such a translation conveys a whole lot more meaning to a person who actually does not speak English.

Moving on, we have:

Mesela "left" kelimesi birçok anlam içerir:

I left the building.

I turned left at the building.

Why use English examples to explain how we can deduce the meaning of homonyms from context? How does someone who does not speak English get anything out of that?

Why not use a simple Turkish example?

Karı küredim.

Karı-koca.

I am willing to bet the sentence Çözümü SCALAR bağlamda veri döndüren scalar() fonksiyonunu kullanmaktır does not make any more sense to a Turkish speaker who speaks no English than The solution is to use the scalar() function that will create SCALAR context for its parameter.

Translation and hash-lookup are different things. If you want to convey meaning, you have to have a command of both languages, and the subject matter. Without that, you are only going to add to the word soup. Translating big event as büyük okazyon helps no one.

I am sure both Kadir and Gabor had the best of intentions with these translations. I just noticed that their collaboration happened to produce a translation that highlights everything that a non-English-speaking aspiring programmer has to fight with.

In my experience, trying to learn programming from translated technical writings is a fool's errand. One would be far better off picking up a little English, watching movies with subtitles, and reading a great book such as Learning Perl. When doing so, consult mostly an English-English dictionary. Stick with it for about six months, through thick and thin, and you'll be amazed how much better your results will be through that process rather than fighting through:

Şu an bir dizinin SCALAR bağlamdaki değerinin eleman sayısı olduğunu biliyoruz. Ayrıca eğer ki dizi boş ise bu değerin 0 (that is FALSE) olduğunu, 1 veya daha fazla eleman içeriyor ise pozitif bir sayı (that is TRUE) olduğunu da biliyoruz.

For God's sake, what on earth are you doing, my dear??? In that context, the Turkish equivalent of "that is" is "yani". Also, why "şu an"? And the word in the title should be Perl'de.

</rant>

Friday, August 22, 2014

Convert multi-page PDF to individual PNG images using GraphicsMagick

I had to look this up … A lot of hits by Google show ancient syntax. Here's what works:

$ gm convert 'document.pdf[12-45]' +adjoin output-%03d.png

HTH

Monday, August 18, 2014

File::Which comes with its own 'multiwhich'

I uploaded App::multiwhich, based on a script I have been using for many years, in observance of #CPANDAY. While I honestly thought it was a cute, useful little utility which I could improve by fixing edge-cases, I just realized that there is no reason for you to use it ;-)

File::Which comes with its own command line utility, called pwhich. For example:

$ pwhich -a perl vim doesnotexist
/Users/auser/perl/5.20.0/bin/perl
/opt/local/bin/perl
/usr/bin/perl
/opt/local/bin/vim
/usr/bin/vim
pwhich: no doesnotexist in PATH

The module definitely predates my foray into Perl. I cannot fathom how I missed the pwhich utility.

So, don't use App::multiwhich. Use pwhich. I'll make the requisite changes in the module distribution.

Saturday, August 16, 2014

Just uploaded App::multiwhich in observance of CPAN day

multiwhich is a small utility which looks for an executable in all directories in your $PATH. For example, on my MacBook Pro, it gives me:

$ multiwhich perl vim doesnotexist
---
- perl:
  - /Users/auser/perl/5.20.0/bin/perl
  - /opt/local/bin/perl
  - /usr/bin/perl
- vim:
  - /opt/local/bin/vim
  - /usr/bin/vim
- doesnotexist: []

It should soon be available on CPAN. The repo is on GitHub.
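For the curious, here is a minimal sketch of the idea, using only core modules. It is not the App::multiwhich source: the helper name find_all_in is made up for illustration, and it ignores Windows details such as PATHEXT.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper (not taken from App::multiwhich): return every
# executable file called $name found in @dirs, in the order given.
sub find_all_in {
    my ($name, @dirs) = @_;
    my @found;
    for my $dir (@dirs) {
        my $candidate = File::Spec->catfile($dir, $name);
        # -x _ reuses the stat information gathered by the -f test
        push @found, $candidate if -f $candidate && -x _;
    }
    return @found;
}

# File::Spec->path returns the directories listed in the PATH variable.
my @dirs = File::Spec->path;

for my $name (@ARGV) {
    print "$name:\n";
    print "  $_\n" for find_all_in($name, @dirs);
}
```

Saved as a script and invoked with perl vim doesnotexist as arguments, this prints each name followed by the matches found in PATH order, and nothing under a name that is not found.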

For more, see CPAN day, or follow @cpan_new, or search #cpanday.

Monday, August 11, 2014

Are you a code monkey?

Despite my appreciation of Stack Overflow, I can never see myself agreeing with Jeff Atwood on anything substantial. A post of his that is still featured in my regular nightmares is the one titled We Are Typists First, Programmers Second.

He says:

When you're a fast, efficient typist, you spend less time between thinking that thought and expressing it in code.

That might matter, but, frankly, what I encounter regularly are people who really ought to think hard and long about what they are about to type, and, then, when it is time to type that, think another half an hour before touching a keyboard.

Because, otherwise, they end up generating a 500,000,000 line CSV file from a database by just inserting commas between text fields, as in, VAR1 || ',' || VAR2 || ',' ….

When the source data contains single character flag fields whose "specification" dates back decades to some COBOL thing where they ran out of letters and numbers a long time ago, you sometimes get, say, 100 rows with more commas than expected.

But, the code monkey don't care!

He typed that SELECT fast. Put the dump on a server, remembering to use SFTP (of course, self-signed certificate), and got back to his typing.

After all, he is a coder. He understands things no one around him understands.

I wish all you coders, typists, brogrammers would just go on a cruise to the Bermuda Triangle.

Don't be a code monkey!

Typing fast is the LEAST important component of programming.

Think.

He goes on to gratuitously attack Perl programmers:

Don't just type random gibberish as fast as you can on the screen, unless you're a Perl programmer.

Perl has Text::CSV_XS and Text::xSV. Any programmer who is aware of these modules would not waste others' time with nonsense.
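To make the difference concrete, here is a small sketch that assumes Text::CSV_XS is installed. The row is made up: its middle field is a one-character flag that happens to be a comma. The three helper subs are hypothetical names for illustration; new, combine, string, parse, and fields are the module's own methods.

```perl
use strict;
use warnings;
use Text::CSV_XS;

# Naive approach: glue the fields together with commas, no quoting.
sub naive_line {
    my @fields = @_;
    return join ',', @fields;
}

# Let Text::CSV_XS build the line; it quotes fields containing the separator.
sub csv_line {
    my @fields = @_;
    my $csv = Text::CSV_XS->new({ binary => 1, auto_diag => 1 });
    $csv->combine(@fields) or die "Cannot combine fields";
    return $csv->string;
}

# Count the fields a CSV parser sees in one line.
sub field_count {
    my ($line) = @_;
    my $csv = Text::CSV_XS->new({ binary => 1, auto_diag => 1 });
    $csv->parse($line) or die "Cannot parse line";
    my @fields = $csv->fields;
    return scalar @fields;
}

# Made-up row: id, flag (a literal comma), name
my @row = ('1001', ',', 'Smith');

for my $line (naive_line(@row), csv_line(@row)) {
    printf "%-16s parses as %d fields\n", $line, field_count($line);
}
```

The naive line comes back with one field too many, while the properly quoted line round-trips to the original three fields.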

Wednesday, August 6, 2014

Don't declare a dependency on Crypt::SSLeay (or IO::Socket::SSL either)

For background, see "Does your code really depend on Crypt::SSLeay?", "Do you need Crypt::SSLeay?", and RT #95663.

Basically, if you are using LWP, and want to communicate with sites over SSL/TLS, you should declare a dependency on LWP::Protocol::https. That will pull in whatever you need to be able to communicate with web sites over SSL/TLS. You shouldn't have an explicit dependency on the underlying plumbing being used, unless there is a specific, well-thought-out reason for that.
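As a rough illustration, a Makefile.PL for a hypothetical distribution that talks HTTPS through LWP might declare only the protocol module. The distribution name My::App and its version are placeholders; PREREQ_PM is the standard ExtUtils::MakeMaker key for prerequisites.

```perl
# Makefile.PL for a hypothetical distribution (name and version are placeholders)
use strict;
use warnings;
use ExtUtils::MakeMaker;

WriteMakefile(
    NAME      => 'My::App',
    VERSION   => '0.01',
    PREREQ_PM => {
        'LWP::UserAgent'       => 0,
        # Pulls in the SSL/TLS plumbing; no direct mention of
        # Crypt::SSLeay or IO::Socket::SSL is needed here.
        'LWP::Protocol::https' => 0,
    },
);
```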

Currently, IO::Socket::SSL is much more complete than Crypt::SSLeay. Unless you have specifically overridden the choice of plumbing, upgrading to a recent release of LWP and Crypt::SSLeay ensures that IO::Socket::SSL is used instead of Crypt::SSLeay, without you having to lift a finger.

So, regarding RT #95663, I would recommend not changing anything, because users who just want to be able to communicate with web sites over SSL/TLS end up having the better module used anyway. And, if they have build systems with declared dependencies on Crypt::SSLeay or some component therein, their builds don't break, especially given the improvements in Crypt::SSLeay's Makefile.PL.

Friday, July 18, 2014

In OCaml, how can I get a list of directories in my PATH?

I have been getting my toes wet with OCaml, using Real World OCaml. The book content is freely available on their web site, but I have bought the ebook from O'Reilly, and I thoroughly recommend it.

I have to admit, it hasn't been a quick task. I find that I am too used to the luxury of documentation at my fingertips using perldoc. Reading the book and doing the exercises do breed familiarity, but I am far away from being able to write an image gallery generator (which was my first ever Perl program).

I like the implicit type checking. In fact, that is an idea that appears in Perl as well (not as strict, but, still). For example, let f x = x + 1 defines a function that takes an integer, and returns the following integer. Yes, OCaml does distinguish between types of numbers. No, I haven't yet gotten used to it.

Now, f 5 will return 6. But, f 0.5 will result in This expression has type float but an expression was expected of type int.

In Perl, if you defined my $f = sub { $_[0] + 1 } and invoked it with a string argument, the interpreter would notice it (and even tell you about it if you ask nicely):

$ perl -w -e 'my $f = sub { $_[0] + 1 }; $f->("test")'
Argument "test" isn't numeric in addition (+) at -e line 1.

Strict type checking is useful. The OCaml kind is not the same as the C or Java sort of type checking. Here is an example that had me scratching my head for a while until I studied it further.

Real World OCaml has the following example:

# let path = "/usr/bin:/usr/local/bin:/bin:/sbin";;
val path : string = "/usr/bin:/usr/local/bin:/bin:/sbin"
# String.split ~on:':' path
|> List.dedup ~compare:String.compare
|> List.iter ~f:print_endline
;;
/bin
/sbin
/usr/bin
/usr/local/bin
- : unit = ()

Now, if you squint enough, this is kind of like:

use feature 'say';
use List::AllUtils qw(uniq);
my $path = "/usr/bin:/usr/local/bin:/bin:/sbin";
say for uniq(split /:/, $path);

although I do like the syntactic sugar of |>.

In Perl, I would have just used $ENV{PATH}. My thoughts immediately went to how to do that in OCaml. Luckily, utop has code-completion, so it didn't take me a long time to figure out I could use Sys.getenv to get the value of my $PATH.

utop # Sys.getenv("PATH");;
- : string option =
Some
 "/Users/xyz/.opam/system/bin:/Users/xyz/.opam/system/bin: \
/Users/xyz/bin:/Users/xyz/perl/5.20.0/bin: \
/opt/local/bin:/opt/local/sbin:/usr/bin: \
/bin:/usr/sbin:/sbin:/usr/local/bin: \
/opt/X11/bin:/usr/local/MacGPG2/bin"

Hmmmm … Why is ~/.opam/system/bin in there twice?

Anyway, first, note that naively replacing path with (Sys.getenv "PATH") does not "work":

utop # String.split ~on:':' (Sys.getenv "PATH");;
Error: This expression has type string option
but an expression was expected of type string

Note the Some there. Sys.getenv takes a string and possibly returns a string. In other words, its type is string -> string option.

We know why: The environment variable may or may not be defined. In Perl, we would get an undefined value in that case. Perl can then convert that value to 0 or "" as needed. In OCaml, you need to explicitly account for that possibility.
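To make the Perl side of that comparison concrete, here is a minimal sketch that handles the undefined case explicitly with the defined-or operator instead of relying on the implicit conversion. The helper name unique_dirs is made up, and the ':' separator assumes a Unix-style PATH. Unlike the List.dedup example, it keeps the first occurrence of each directory in its original order rather than sorting.

```perl
use strict;
use warnings;

# Split a PATH-style string into unique directories, keeping the first
# occurrence of each in order. An undefined value is treated as an
# empty path, which yields an empty list.
sub unique_dirs {
    my ($path) = @_;
    $path //= '';
    my %seen;
    return grep { length($_) && !$seen{$_}++ } split /:/, $path;
}

print "$_\n" for unique_dirs($ENV{PATH});
```

Without the //= line, perl -w would warn about the use of an uninitialized value when PATH is unset, but the program would carry on regardless.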

Observe the following:

utop # match Sys.getenv "PATH" with
| None -> ""
| Some x -> x
;;
- : string = 
  "/Users/xyz/.opam/ ...

Here, we decided that if Sys.getenv "PATH" does not return a value, we will consider our path to be empty. The type of the return value changed from string option to simply string, and it is no longer prefixed with Some.

If you are doing something real rather than working on small modifications to textbook exercises, you might not want to proceed if the path is not defined. But, for my immediate purpose of actually using the value of my path rather than manually typing in a string, the following was sufficient:

utop # String.split ~on:':'
(match Sys.getenv "PATH" with | None -> "" | Some x -> x)
|> List.dedup ~compare:String.compare
|> List.iter ~f:print_endline
;;

Phewww!

Pattern matching like this is actually quite valuable.

There is still a gaping hole in this construction. What if you type Sys.getenv "PTHA"? You'll end up propagating an empty path throughout a program. In Perl, I tried to avoid that kind of problem by using Const::Fast. As a simple example, I might have:

use strict;
use warnings;
use feature 'say';
use Const::Fast;

const my %VAR => (
    HOME => 'HOME',
    PATH => 'PATH',
    TMP  => 'TMP',
);

say $ENV{ $VAR{PTHA} };

which immediately gives me Attempt to access disallowed key 'PTHA' in a restricted hash …. It also serves as documentation of which environment variables my script actually depends on.
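If Const::Fast is not available, a similar guard can be sketched with the core module Hash::Util. This is a swapped-in technique, not what the snippet above does internally as far as I can vouch for, and the accessor name env_name is made up. lock_hash restricts the hash so that looking up a key outside the allowed set is a fatal error.

```perl
use strict;
use warnings;
use Hash::Util qw(lock_hash);

# The only environment variable names this script is allowed to use.
my %VAR = (
    HOME => 'HOME',
    PATH => 'PATH',
    TMP  => 'TMP',
);
lock_hash(%VAR);    # keys and values are now fixed

# Hypothetical accessor: dies on any key that is not in %VAR.
sub env_name {
    my ($key) = @_;
    return $VAR{$key};
}

print "PATH maps to: ", env_name('PATH'), "\n";

# A typo is caught at the point of use instead of propagating undef.
my $ok = eval { env_name('PTHA'); 1 };
print "PTHA rejected: $@" unless $ok;
```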

This idea corresponds to the principle of making illegal states unrepresentable which fellow Cornellian Yaron Minsky explains in his guest lecture at CMU.

PS: Why OCaml? Well, for one, I loved Higher Order Perl, and decided I should add another camel to my herd ;-)