Monday, May 5, 2014

Why is PERLIO_F_CRLF set on the bottom-most 'unix' layer on Windows?

Previously, I discovered something that seemed odd to me: running the following script in a cmd.exe window on Windows showed that PERLIO_F_CRLF was set on the bottom-most 'unix' layer of STDOUT:

#!/usr/bin/env perl

use strict;
use warnings;

use YAML::XS;

print Dump [
    map {
        my $x = defined($_) ? $_ : '';
        $x =~ s/\A([0-9]+)\z/sprintf '0x%08x', $1/eg;
        $x;
    } PerlIO::get_layers(STDOUT, details => 1)
];

Output:

---
- unix
- ''
- 0x01205200
- crlf
- ''
- 0x00c85200

You can find the flag values in my previous post.

At the time, I wasn't sure what to make of this. However, while trying to figure out where this flag gets set, I noticed something. The perliol documentation seems to be very clear about the 'unix' layer:

"unix"

A basic non-buffered layer which calls Unix/POSIX read(), write(), lseek(), close(). No buffering. Even on platforms that distinguish between O_TEXT and O_BINARY this layer is always O_BINARY. (emphasis mine)

This statement seems to imply that PERLIO_F_CRLF should never be set on a 'unix' layer. Not even on Windows.

Note that if I open a plain file and check that filehandle, the CRLF flag is not set on its bottom-most 'unix' layer:

open my $fh, '>', 'test' or die "Cannot open 'test': $!";

print Dump [
    map {
        my $x = defined($_) ? $_ : '';
        $x =~ s/\A([0-9]+)\z/sprintf '0x%08x', $1/eg;
        $x;
    } PerlIO::get_layers($fh, details => 1)
];

Output:

---
- unix
- ''
- 0x00201200
- crlf
- ''
- 0x00405200

Now, I do not know whether changing this would fix the UTF-8 display problem in cmd.exe, but it seems to me that it should, given that the problem only shows up in cmd.exe set to code page 65001.

I just need to figure out where this flag gets set. Any pointers?

Update

The standard streams are initialized in PerlIO_stdstreams:

void
PerlIO_stdstreams(pTHX)
{
    dVAR;
    if (!PL_perlio) {
        PerlIO_init_table(aTHX);
        PerlIO_fdopen(0, "Ir" PERLIO_STDTEXT);
        PerlIO_fdopen(1, "Iw" PERLIO_STDTEXT);
        PerlIO_fdopen(2, "Iw" PERLIO_STDTEXT);
    }
}

Note:

#ifdef PERLIO_USING_CRLF
#define PERLIO_STDTEXT "t"
#else
#define PERLIO_STDTEXT ""
#endif

PerlIO_fdopen calls PerlIO_openn, which somehow ends up applying the layers. Eventually, we find ourselves in PerlIOUnix_pushed, which leads to PerlIOBase_pushed, where we have:

    while (*mode) {
        switch (*mode++) {
        case '+':
            l->flags |= PERLIO_F_CANREAD | PERLIO_F_CANWRITE;
            break;
        case 'b':
            l->flags &= ~PERLIO_F_CRLF;
            break;
        case 't':
            l->flags |= PERLIO_F_CRLF;
            break;

So, PERLIO_STDTEXT is "t" on Windows, which makes STDOUT's mode string "Iwt", and the 't' case then executes l->flags |= PERLIO_F_CRLF. But that doesn't explain why the bottom-most 'unix' layer on STDOUT in cmd.exe has this flag set whereas the same type of layer on a plain filehandle does not.

PerlIOUnix_pushed ends with:

PerlIOBase(f)->flags |= PERLIO_F_OPEN;

Given the guarantee expressed in the documentation, maybe there should also be a line like:

PerlIOBase(f)->flags &= ~PERLIO_F_CRLF;

Hmmmmm …

Another update

I guess that was a red herring. I built a perl with that line added. The flags value is "fixed", but the output problem remains.
