And, it will help you save time and make money ;-)
I was motivated to post this by another one of those Stack Overflow questions. I decided at the outset not to answer it because the poster basically wants a job done for him for free:
I need the script to get the HTML, parse the table then to save the content (User + Online time), I would also want it to run every 15 mins and to make a report in the end of the day.
However, a so-called answer stated:
in my opinion perl can get a little ugly.
does it need to be perl....if it does ot i would recommend python.
Of course, I am kinda used to people proclaiming Perl sucks, but the supreme irony of an ugly post asserting Perl's ugliness motivated me.
HTML::TableExtract is beautiful. Over the years, it has saved me a lot of time, and even helped me make some money.
So, consider the Personal Income table available from the Bureau of Economic Analysis.
Let's say I want to get the Unemployment Insurance row out of that table. Here's how you do it using HTML::TableExtract:
#!/usr/bin/env perl
use strict; use warnings;
use HTML::TableExtract;

# Only look at the table whose id attribute is 'tbl'
my $te = HTML::TableExtract->new(
    attribs => { id => 'tbl' },
);

# local copy of
# http://bea.gov/iTable/iTableHtml.cfm?reqid=9&step=3&isuri=1&903=58
$te->parse_file('personal-income.html');

my ($table) = $te->tables;

for my $row ($table->rows) {
    # Skip the first cell; the second cell holds the row label
    my (undef, $label, @row) = @$row;
    next unless defined $label;
    if ($label eq 'Unemployment insurance') {
        print "$label\t@row\n";
    }
}
And, here is the output:
C:\temp> uu
Unemployment insurance 101.1 127.9 144.8 148.7 152.8 137.4 135.8 128.7 117.5 108.8 103.0 100.1
Of course, things can be refined, but this is pretty beautiful.
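One possible refinement, as a rough sketch rather than anything definitive: wrap the lookup in a function and trim whitespace around each cell, so a label with stray spaces still matches. The helper name row_for_label and the tiny inline table are made up for illustration; the real BEA page may be laid out differently.

```perl
#!/usr/bin/env perl
use strict; use warnings;
use HTML::TableExtract;

# Hypothetical helper (not part of HTML::TableExtract): return the data
# cells of the first row whose label (second column) equals $wanted,
# or an empty list if there is no such row or no matching table.
sub row_for_label {
    my ($html, $wanted) = @_;

    my $te = HTML::TableExtract->new(attribs => { id => 'tbl' });
    $te->parse($html);
    $te->eof;

    my ($table) = $te->tables;
    return unless $table;

    for my $row ($table->rows) {
        # Turn undefined cells into '' and trim surrounding whitespace
        my (undef, $label, @cells) = map {
            my $cell = defined $_ ? $_ : '';
            $cell =~ s/\A\s+|\s+\z//g;
            $cell;
        } @$row;
        next unless defined $label and $label eq $wanted;
        return @cells;
    }
    return;
}

# Toy stand-in for the downloaded page (assumed layout, not the real one)
my $html = <<'HTML';
<table id="tbl">
  <tr><td>1</td><td> Some other row </td><td>1.0</td><td>2.0</td><td>3.0</td></tr>
  <tr><td>2</td><td> Unemployment insurance </td><td>101.1</td><td>127.9</td><td>144.8</td></tr>
</table>
HTML

my @values = row_for_label($html, 'Unemployment insurance');
print "Unemployment insurance\t@values\n" if @values;
```

Returning an empty list when nothing matches lets the caller decide what to do with a missing row instead of the script dying on an undefined table.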
Yes, very useful. I use it to parse Jira pages; my variant:
use strict;
use warnings;
use Modern::Perl;
use HTML::TableExtract;

sub parse_chunk_table {
    my $html_string = shift;
    my $te =
        HTML::TableExtract->new( attribs => { class => 'confluenceTable' } );
    $te->parse($html_string);

    # Examine all matching tables
    my %chunk = ();
    foreach my $ts ( $te->tables ) {
        my $check_tables = 0;
        #print Dumper ($ts->rows);
        foreach my $row ( $ts->rows ) {
            my $name = $row->[0];
            next unless defined $name;
            if ( $name eq 'chunk_revision_sk' ) {
                # Header row found; count the rows that follow it
                $check_tables = 1;
            }
            elsif ($check_tables) {
                $chunk{$name}++;
            }
        }
    }
    return \%chunk;
}
It does not print the output you stated.
it gives a mistake .. :X
revise it.