Wednesday, October 15, 2014

Why do I like Frankenputers?

I saw an acquaintance the other day: An old Acer Extensa 4420. Originally, the system came with 512 MB of memory, and a 120 GB hard drive which Vista immediately filled to the brim. The underpowered CPU overheated pretty quickly, and it was generally a hassle to use, but it did have a 14" 1280x800 screen, which is much more bearable than the stupid 1366x768 you get on low-end machines these days.

Now, with a 2.2 GHz CPU and 4 GB RAM, it runs Vista nicely, but really shines when you boot into Linux. It will never be whisper quiet during software updates, but during normal use (mostly sitting on a desk, occasionally being lugged to another location) it does the job.

Except, the stupid Broadcom Wi-Fi card that came with the thing couldn't maintain a decent connection to the router, going down to 5.5 Mbps just feet from a router to which every other computer I have tried can maintain 72 Mbps when needed.

When I saw an Intel 4965 AGN full size Mini PCIe card going for $9, I wondered whether it would work OK on an AMD motherboard. After all, if standards mean anything, it should.

For nine dollars with free shipping, it was worth giving it a shot.

Well, it does ;-)

The entire process, from pre-installing Intel drivers to shutting down, removing the cover, removing the antenna cables (remember which one went where), removing the screws, putting the new card in, screwing it down, re-attaching the antenna cables, putting the cover back on, and booting took about 15 minutes, but that's because I dropped one of the screws inside, and it took a while to get it back from where it had securely lodged itself ;-)

Once in ArchLinux, the connection was not up. Oh, of course, the wireless interface is not wlan0 now, it is wlp5s0 or some such. Fix that in the netctl profile, restart the service via systemctl, and it connected to the router at 72 Mbps (the router is old as well).

Update: Got a chance to try it with a better router.

The card works beautifully both in ArchLinux (using netctl automatic profile switching), and in Vista. It no longer drops or degrades connections while browsing, or streaming videos, or in the middle of large software updates.

$ lsmod | grep -i iwl
iwl4965                97195  0 
iwlegacy               55009  1 iwl4965
mac80211              514630  2 iwl4965,iwlegacy
cfg80211              454161  3 iwl4965,iwlegacy,mac80211
led_class              12859  3 sdhci,iwlegacy,acer_wmi


$ dmesg | grep -i intel
iwl4965: Intel(R) Wireless WiFi 4965 driver for Linux, in-tree:
iwl4965: Copyright(c) 2003-2011 Intel Corporation
iwl4965 0000:05:00.0: Detected Intel(R) Wireless WiFi Link 4965AGN, REV=0x4

I found out later that this was not guaranteed to work. See Intel's "Why doesn't my laptop recognize my new Intel wireless adapter?":

Some original equipment manufacturers (OEM)s limit the computer to only specific wireless adapters causing an installation error when booting with a different wireless adapter than previously installed in the computer.

It looks like an Apple mentality is taking over the computing industry.

Thursday, October 9, 2014

A bug in Perl's autodie

I made a typo in "How does open 0; print <0>; turn every Perl program into a quine?" which was noticed by Peter Roberts. Instead of typing:

open 0;

I wrote:

open $0;

Fixing that results in:

$ cat
#!/usr/bin/env perl

use autodie;
use strict;
use warnings;

$0 = 'does not exist';

open 0;
print <0>;

$ ./
Can't open('0'): No such file or directory at ./ line 10

We can verify that perl tries to open a file called "does not exist" using strace or dtruss:

$ sudo dtruss -f -t open perl

which, among other things, shows:

xxxx/yyyyyy:  open("does not exist\0", 0x0, 0x1B6) = -1 Err#2

However, autodie fails to deduce the correct file name for the message.

Why was documentation for open FILEHANDLE removed from perlfunc?

As Curtis and Ed both point out in the comments, perldoc -f open contains the following passage which does document the behavior. I must admit to only being able to notice that part if I search for the string $ARTICLE ;-)

As a shortcut a one-argument call takes the filename from the global scalar variable of the same name as the filehandle:

$ARTICLE = 100;
open(ARTICLE) or die "Can't find article $ARTICLE: $!\n";

Here $ARTICLE must be a global (package) scalar variable - not one declared with my or state.

While writing "How does open 0; print <0>; turn every Perl program into a quine?", I realized that the documentation of how open behaves when no filename expression is provided was removed from perlfunc.

After a few rounds of git bisect, it seems the information was removed as part of commit 1578dcc… with the commit message [perl #117223] Remove IO::File example from perlfunc.

Methinks the commit does a little more than that. Inter alia, I do not think the following should have been removed:

diff --git a/pod/perlfunc.pod b/pod/perlfunc.pod
index 18ecd40..129012c 100644 (file)
--- a/pod/perlfunc.pod
+++ b/pod/perlfunc.pod
@@ -3909,12 +3909,6 @@ FILEHANDLE is an expression, its value is the real filehandle.  (This is
 considered a symbolic reference, so C<use strict "refs"> should I<not> be
 in effect.)
-If EXPR is omitted, the global (package) scalar variable of the same
-name as the FILEHANDLE contains the filename.  (Note that lexical 
-variables--those declared with C<my> or C<state>--will not work for this
-purpose; so if you're using C<my> or C<state>, specify EXPR in your
-call to open.)

The associated bug report does not seem to justify the removal of documentation of existing behavior either:

Perfunc contains some example code for 'read_myfile_munged'. There are various problems with that code but I don't need to list them all here - the simplest thing is just to remove it since IO::File is no longer needed to create filehandles that get automatically closed.

Right now, I am assuming the explanation of open FILEHANDLE fell victim to an editing error that went unnoticed (after all, I hope no one relies on this unexpected behavior), because after reading perldoc -f open many times, I simply cannot see this behavior documented anywhere in versions after this commit.

It is behavior that exists, and should be documented.

Before I put time into a patch, I would like to find out if there is a reason the behavior of open FILEHANDLE should NOT be documented.

How does open 0; print <0>; turn every Perl program into a quine?

The other day, on the heels of a brief exchange of tweets, I chirped in to point out that neither the while nor the for is necessary, as the arguments to print are evaluated in list context, therefore automatically slurping the contents of the bareword filehandle 0, and printing them.

But, why is open 0; equivalent to opening the source of the script that is running?

You probably do know that the special variable $0 holds the name of the program that is being executed. It is not universally guaranteed (see perldoc -v '$0'), but this will usually be a string you can pass to open to open the source of your script.

But, how does open 0 end up opening this file?

To my dismay, I found that perldoc -f open in the most recent versions of Perl doesn't have this, but versions as recent as 5.18.2 explain what happens when open is invoked with a single argument:

If EXPR is omitted, the global (package) scalar variable of the same name as the FILEHANDLE contains the filename.

Clearly, I don't recommend relying on this: If nothing else, you should avoid using global variables, and you should use the three-argument form of open anyway. I am just explaining how this trick works.

So, when perl sees open 0;, it does the equivalent of open 0, '<', $0.

You can easily verify this:

$ cat
#!/usr/bin/env perl

# We are not golfing any more
use autodie;
use strict;
use warnings;

$0 = 'does not exist'; # Thanks Peter 
open 0;

$ ./
Can't open('0'): No such file or directory at ./ line 9

Another example:

$ cat
#!/usr/bin/env perl

'does not exist.txt' =~ /\A(.+)\z/;

open 1;
print <1>;

$ cat 'does not exist.txt'
   -- Dr. Evil

$ ./
   -- Dr. Evil


Friday, September 12, 2014

Help me switch completely to console Vim on OSX

I have decided to stop using MacVim. There is no one specific reason. A whole bunch of little pinpricks have made me uncomfortable enough that I deleted it from my Applications folder, cleaned my Open-With menu, and I am using a custom-compiled Vim from iTerm2 now.

The only thing I am missing is the ability to open a file in a Finder window by right-clicking and selecting something like "Edit with Vim", and having the file opened either as a buffer in a currently running Vim instance in an iTerm window, or in a newly started instance. This is obviously not essential, as I can navigate within Vim, especially using the wonderful CtrlP plugin.

However, it is bothering me that I don't know how to do this, and if anyone has already found a way, I would appreciate hearing about it.

Thursday, September 11, 2014

Scraping PDF documents without losing your sanity

The epiphany came when I was trying to extract usable information from a bunch of documents.

Some people insist on distributing essential information in PDF format, making it very hard to make use of said information.

Now, I have never really made it past the table of contents of Adobe's PDF Reference, and I can't really figure out many of the available Perl modules dealing with PDFs. I know what goes into a PDF document (basically boxes with coordinates), but, just as I have never written a web server in Postscript either, I haven't been able to go into this in depth.

One of the problems with utilities that naively convert PDF to text is that usually they do a straightforward translation of the layout, which does a number on the order the text comes out. The location of an object on the page and its position in the object stream don't really correspond very reliably to each other.

Thanks to Thomas Levine, I found out about pdftohtml.

At first, I was very frustrated … Then, I realized the value of the -xml option.

With this option, the PDF document is output as <page> and <text> elements. For example:

<page number="6" position="absolute"
  top="0" left="0" height="918" width="1188">
<text top="176" left="109" width="125"
  height="15" font="2">DATA RECORD </text>

This is extremely useful when trying to extract information. First, if the entity producing the document used consistent styling, the font attribute of the text elements can be used to select items of interest. However, multi-column documents are still a pain.
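A minimal sketch of selecting by font, in Python, assuming output shaped like the snippet above. The wrapping pdf2xml root element, the extra text rows, and the function name texts_with_font are made up for illustration; only the page and text attributes come from the example.

```python
import xml.etree.ElementTree as ET

# Made-up sample in the shape of the snippet above; the root element and
# the second and third <text> rows are assumptions for illustration.
SAMPLE = """<pdf2xml>
<page number="6" position="absolute" top="0" left="0" height="918" width="1188">
<text top="176" left="109" width="125" height="15" font="2">DATA RECORD </text>
<text top="200" left="109" width="300" height="12" font="0">Some body text</text>
<text top="420" left="480" width="140" height="15" font="2"><b>NEXT RECORD</b></text>
</page>
</pdf2xml>"""


def texts_with_font(xml_source, font_id):
    """Return (page number, text) pairs for text elements using font_id."""
    root = ET.fromstring(xml_source)
    hits = []
    for page in root.iter("page"):
        page_no = int(page.get("number"))
        for text in page.iter("text"):
            if text.get("font") == font_id:
                # itertext() also picks up text nested in <b> or <i>
                content = "".join(text.itertext()).strip()
                hits.append((page_no, content))
    return hits


print(texts_with_font(SAMPLE, "2"))
```

Which font number marks the items of interest depends entirely on the document, so the "2" here is only a placeholder.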

The key to my epiphany lies in sorting the text elements using a lexicographic ordering: Text on page 5 should come before text on page 7. Text in column one comes before text in column three. Text on line five in column two comes before text on line two in column three … See what I did there?

At first you might think it is OK to define columns using the left attributes of text elements. The problem is when some attributes for the data you want to extract are defined in section headers that can appear in the middle of a column. People will usually center the text in those headers (for visual aesthetic reasons), and therefore they will appear to be in a later column than the data items that follow.

This may seem obvious right now, but the solution came to me only after looking at the following plot:

That is, I need a mapping of ranges of left margins to columns.

Once that mapping is defined, text elements can be sorted into a natural reading order, and information can be extracted using usual methods.
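Here is a rough sketch of that sort in Python, not the code I actually used. The column boundaries in COLUMN_STARTS, the sample data, and the names column_of and reading_order are all hypothetical; in practice the boundaries have to be read off the distribution of left values for the document at hand.

```python
import xml.etree.ElementTree as ET
from bisect import bisect_right

# Hypothetical mapping of left-margin ranges to columns: column 0 covers
# left values in [0, 400), column 1 covers [400, 800), column 2 the rest.
COLUMN_STARTS = [0, 400, 800]

# Made-up sample; the text elements are deliberately out of reading order,
# and the centered header has a larger left value than the items under it.
SAMPLE = """<pdf2xml>
<page number="1" position="absolute" top="0" left="0" height="918" width="1188">
<text top="100" left="420" width="80" height="12" font="0">item 3</text>
<text top="160" left="40" width="80" height="12" font="0">item 2</text>
<text top="100" left="150" width="90" height="15" font="2">SECTION A</text>
<text top="130" left="40" width="80" height="12" font="0">item 1</text>
</page>
<page number="2" position="absolute" top="0" left="0" height="918" width="1188">
<text top="50" left="40" width="80" height="12" font="0">item 4</text>
</page>
</pdf2xml>"""


def column_of(left):
    """Map a left margin to the index of the column range containing it."""
    return bisect_right(COLUMN_STARTS, left) - 1


def reading_order(xml_source):
    """Return text contents sorted by page, then column, then top, then left."""
    root = ET.fromstring(xml_source)
    rows = []
    for page in root.iter("page"):
        page_no = int(page.get("number"))
        for text in page.iter("text"):
            left = int(text.get("left"))
            top = int(text.get("top"))
            content = "".join(text.itertext()).strip()
            rows.append((page_no, column_of(left), top, left, content))
    rows.sort()  # tuples compare lexicographically
    return [row[-1] for row in rows]


print(reading_order(SAMPLE))
```

Because the centered header is assigned to a column by range rather than by its exact left value, it stays ahead of the items that follow it instead of drifting into a later column.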

Monday, September 1, 2014

Stop your Mac from keeping a perpetual connection to Apple

I had done this some time ago on my laptop, but had to try to remember once again while helping someone else. I am just noting it here so it is not as difficult to remember the next time :-)

Basically, the problem is:

$ netstat -an
tcp4       0      0    ESTABLISHED
tcp4       0      0      ESTABLISHED

These connections are established as soon as the user logs in, and maintained perpetually.

$ lsof -i 4tcp
apsd    334 root    8u  IPv4 0x…      0t0  TCP 192.…:52622-> (ESTABLISHED)
apsd    334 root   11u  IPv4 0x…      0t0  TCP 192.…:52622-> (ESTABLISHED)
apsd    334 root   12u  IPv4 0x…      0t0  TCP 192.…:52623-> (ESTABLISHED)
apsd    334 root   14u  IPv4 0x…      0t0  TCP 192.…:52623-> (ESTABLISHED)

Seriously annoying.

apsd is not a rogue process or anything, but here's what the man page says:

ApplePushService daemon for Apple Push Notification service.
This is part of the ApplePushService framework.

There are no configuration options to apsd.
Users should not run apsd manually.

Well, alrighty then.

apple.stackexchange to the rescue:

$ sudo launchctl unload -w \

turns it off, and,

$ sudo launchctl load -w \

turns it back on.