Categories
Chart::Plotly Covid-19 Moo MooX::Options New York City Perl

NYC Covid-19 Infections by Zip Code, with Perl

The NYC Department of Health started publishing its Covid-19 testing results on GitHub. One of the datasets, tests-by-zcta.csv, is described in their own words:

This file includes the cumulative count of New York City residents by ZIP code of residence who:
Were ever tested for COVID-19 (SARS-CoV-2)
Tested positive
The cumulative counts are as of the date of extraction from the NYC Health Department’s disease surveillance database.

tests-by-zcta.csv
GitHub View of “tests-by-zcta.csv”

This file is updated almost every day. For each New York City Zip code it shows the number of people tested and the number found to have Covid-19. It also shows the cumulative percentage of those tested who have the virus.

What I would like to add is more detailed information for each Zip Code, so that the data makes more sense to me. For each zip code, I would like to add the borough and the town or district in that borough. To make things a little more complicated, the NYC boroughs are treated differently for mailing addresses. Manhattan addresses are “New York”, while Brooklyn, the Bronx and Staten Island are their own cities for mailing address purposes. Queens, however, is broken up into towns such as Flushing, Long Island City, Woodside and Jamaica.

In a previous post, Creating A Simple JSON NYC Zip Code Database File With Perl and MooX::Options, I created a little database file to match the zip codes with their neighbourhoods.

Now I have created a new script to download the raw CSV data from the NYC Department of Health GitHub page and merge it with my little Zip Code database.

See the code on GitHub

sub get_raw_covid_data_by_zip {
    my $self = shift;
    my @data =
      map { _conv_zcta_rec_to_hash($_) }
      split( /\r?\n/, get( $self->zcta_github_link ) );
    shift @data
      if ( $data[0]->{cumulative_percent_of_those_tested} =~ /zcta_cum/ )
      ;    # Dont need that header
    say "Got @{[ scalar @data ]} lines of covid data. Thanks Mr. Mayor";
    return \@data;
}

The function above uses the CPAN module LWP::Simple, which exports the ‘get’ function, to download the data from GitHub. The ‘split’ function breaks the data into individual lines, and ‘map’ passes each line to ‘_conv_zcta_rec_to_hash’, which turns it into a hash and stamps it with the date. The extra Zip Code location information is merged in later, when the CSV file is written.

 

sub _conv_zcta_rec_to_hash {
    my $str = shift;
    state $date_h = _get_date_h();
    my %h;
    (
        $h{zip}, $h{positive}, $h{total_tested},
        $h{cumulative_percent_of_those_tested}
    ) = split /\s*,\s*/, $str;

    ( $h{zip} ) = $h{zip} =~ /(\d+)/;
    $h{zip} ||= $NA_ZIP;    # There is one undef zip in test data
    $h{yyyymmdd} = $date_h->{yyyymmdd};
    return \%h;
}

Here’s a sample of one line of data as a hash element.

{
     cumulative_percent_of_those_tested => "42.44",
     positive     => "337",
     total_tested => "794",
     yyyymmdd     => "20200418",
     zip          => "10003",
},
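The split-and-map pipeline described above can be tried without a network connection. The following is only a minimal sketch: the sample text, its header names and its second data row are made up for illustration, and ‘line_to_hash’ is a hypothetical stand-in for ‘_conv_zcta_rec_to_hash’ that leaves out the date stamp.

```perl
use strict;
use warnings;
use feature 'say';

# Hypothetical sample standing in for the downloaded file; the header names
# and the second data row are assumptions made for this sketch.
my $raw = join "\n",
  'MODZCTA,Positive,Total,zcta_cum.perc_pos',
  '10003,337,794,42.44',
  '10002,500,1000,50';

# Turn one comma-separated line into a hash reference using a hash slice.
sub line_to_hash {
    my ($line) = @_;
    my %h;
    @h{qw/zip positive total_tested cumulative_percent_of_those_tested/} =
      split /\s*,\s*/, $line;
    return \%h;
}

my @data = map { line_to_hash($_) } split /\r?\n/, $raw;

# The header line is parsed like any other row, so drop it afterwards.
shift @data if $data[0]{cumulative_percent_of_those_tested} =~ /zcta_cum/;

say scalar(@data), " records; first zip is $data[0]{zip}";
```

Parsing the header like a normal row and discarding it afterwards keeps the pipeline a single expression, at the cost of one throwaway hash.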

The newly created array of hashes is then serialized to JSON format and printed to a file using File::Serialize . This will be my file database that I can use to provide other useful information.

sub create_latest_tests_by_ztca_file {
    my $self       = shift;
    my $covid_data = $self->get_raw_covid_data_by_zip();

    serialize_file $self->tests_by_zcta_db_json_file => $covid_data;

    say "Created a new " . $self->tests_by_zcta_db_json_file;
    1;
}

Printing the test results to a CSV file.

Printing this to a C.S.V file is easy enough with Perl and Text::CSV_XS.

sub write_latest_zcta_to_csv {
    my ($self) = @_;
    my @col_headers = (
        qw/Zip Date City District Borough/,
        'Total Tested', 'Positive', '% of Tested'
    );
    my @col_names = (
        qw/zip yyyymmdd city district borough total_tested positive cumulative_percent_of_those_tested /
    );
    my $csv       = Text::CSV_XS->new( { binary => 1, eol => $/ } );
    my $zcta_file = $self->get_todays_csv_file($ALL_ZCTA_DATA_CSV);
    my $z_fh      = $zcta_file->openw;
    $csv->print( $z_fh, \@col_headers ) or $csv->error_diag;

    for my $one_day_zip_rec (
        sort { $b->{positive} <=> $a->{positive} || $a->{zip} <=> $b->{zip} }
        @{ $self->tests_by_zcta_today } )
    {
        my $location_rec =
          $self->zip_db->zip_db_hash->{ $one_day_zip_rec->{zip} }
          || _get_filler_location_rec( $one_day_zip_rec->{zip} );
        $self->zip_db->zip_db_hash->{ $one_day_zip_rec->{zip} } ||=
          $location_rec;
        my %csv_rec = ( %$one_day_zip_rec, %$location_rec );
        $csv->print( $z_fh, [ @csv_rec{@col_names} ] );
    }
    close($z_fh) or warn "Failed to close $zcta_file";
    say "Created a new $zcta_file";
}

my $zcta_file = $self->get_todays_csv_file($ALL_ZCTA_DATA_CSV);

This calls a method on the Moo object that returns a CSV file path stamped with the current day's date.
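The body of ‘get_todays_csv_file’ is not shown in this post, so the following is only a guess at the idea, written with core modules. The function name ‘todays_csv_file’, its arguments and the "yyyymmdd_name" layout are all assumptions made for this sketch.

```perl
use strict;
use warnings;
use feature 'say';
use POSIX qw(strftime);
use File::Spec;

# Hypothetical helper: build "<dir>/<yyyymmdd>_<name>" for a given time.
# The third argument is optional and defaults to the current time.
sub todays_csv_file {
    my ( $dir, $name, $time ) = @_;
    $time //= time;
    my $stamp = strftime( '%Y%m%d', localtime($time) );
    return File::Spec->catfile( $dir, "${stamp}_$name" );
}

say todays_csv_file( 'output', 'all_zcta_data.csv' );
```

Passing the time in as an optional argument makes the helper easy to test with a fixed date.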

for my $one_day_zip_rec (
    sort { $b->{positive} <=> $a->{positive} || $a->{zip} <=> $b->{zip} }
    @{ $self->tests_by_zcta_today } )
{...

The current day's test results are sorted by the number of positive results, highest first, with ties broken by zip code. Each record is then combined with the location data for its zip code and printed.

my %csv_rec = ( %$one_day_zip_rec, %$location_rec );
$csv->print( $z_fh, [ @csv_rec{@col_names} ] );
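The sort, the hash merge and the hash slice can be seen working together in a small stand-alone script. This is a sketch with made-up counts and a cut-down location table; the column names and the plain ‘join’ (instead of Text::CSV_XS) are simplifications for illustration.

```perl
use strict;
use warnings;
use feature 'say';

# Made-up test counts for illustration only.
my @tests = (
    { zip => '10002', positive => 500 },
    { zip => '10003', positive => 337 },
    { zip => '10001', positive => 500 },
);

# Cut-down stand-in for the zip code location database.
my %location = (
    10001 => { borough => 'Manhattan' },
    10002 => { borough => 'Manhattan' },
    10003 => { borough => 'Manhattan' },
);
my @col_names = qw/zip borough positive/;

my @rows;
for my $rec (
    sort { $b->{positive} <=> $a->{positive} || $a->{zip} <=> $b->{zip} }
    @tests )
{
    # Merge both hashes; on a key clash the right-hand hash wins.
    my %csv_rec = ( %$rec, %{ $location{ $rec->{zip} } } );

    # The hash slice pulls the values out in column order.
    push @rows, join ',', @csv_rec{@col_names};
}
say for @rows;
```

The two zip codes with 500 positives come out in ascending zip order, followed by the one with 337.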

Below is a sample CSV file for April 17 2020.

Next we can create nice Plotly charts to display the test results.

Categories
File::Serialize JSON Moo MooX::Options MooX::Options NewYorkCity Perl Zip Codes

Creating A Simple JSON NYC Zip Code Database File With Perl and MooX::Options

I found myself needing some New York City detailed Zip Code information for another script I was creating. The zip codes themselves are easy enough to find online. I needed to include more details about each zip code location.  I created a Perl script to merge two hard coded Perl data structures, which are printed out as a very basic JSON database file.

When creating Perl scripts with command line options, my go-to CPAN module is Getopt::Long. However for this script I will use MooX::Options, as I may extract some of the methods to be used in a future Moo module.

The script has three options: ‘create_zip_db’, ‘read_zip_db’ and ‘verbose’. The ‘doc’ attribute gives a brief description of each option. The ‘short’ attribute specifies any aliases that can be used for each option. Setting ‘is’ to ‘ro’ makes the option value read-only.

option create_zip_db => (
    is    => 'ro',
    short => 'new_zipdb|new_zip',
    doc   => q/Create a new NYC Zip, Borough, District, Town JSON file./,
);

option read_zip_db => (
    is    => 'ro',
    short => 'read_db',
    doc   => q/Read the NYC Zip file database./,
);

option verbose => ( is => 'ro', doc => 'Print details' );

There are three Moo attributes.  Some time in the future I can put these into a separate Moo module.

has db_dir => (
    is      => 'rw',
    isa     => Path,
    coerce  => 1,
    default => sub { "$Bin/../db" }
);

has zip_db_json_file => (
    is      => 'lazy',
    isa     => Path,
    builder => sub {
        $_[0]->db_dir->child("zip_db.json");
    }
);

has zip_hash => (
    is  => 'lazy',
    isa => sub {
        die "'zip_hash' must be a HASH" unless ref( $_[0] ) eq 'HASH';
    },
    builder => sub {
        deserialize_file $_[0]->zip_db_json_file;
    }
);

The first attribute, ‘db_dir’, specifies the future location of the JSON file. It uses Types::Path::Tiny to enforce this directory path as a Path::Tiny object. The ‘zip_db_json_file’ attribute is also a Types::Path::Tiny Path.

The ‘zip_hash’ attribute is the data structure that will store the NYC Zip code, borough, district and town information. The ‘isa’ check for this attribute ensures that it is a Perl hash reference. The ‘deserialize_file’ function comes from the CPAN module File::Serialize, which is very useful for dumping Perl data structures out to a JSON file or, in this case, slurping a JSON file into a Perl data structure. It also handles formats other than JSON.

Note that the ‘zip_hash’ attribute is ‘lazy’. I’m not saying that zip codes are particularly averse to work. This is just Moo’s way of saying, “please don’t make me do anything until I really have to”. That way, resources are not used unnecessarily to create a structure that isn’t being called for.
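The effect of ‘lazy’ can be observed by counting how often the builder runs. This is a minimal sketch that assumes Moo is installed; the ‘Demo’ package, the counter and the one-entry hash are made up for illustration.

```perl
package Demo;
use Moo;

our $builds = 0;    # counts how often the builder runs

has zip_hash => (
    is      => 'lazy',
    builder => sub { $builds++; return { 10003 => 'Manhattan' } },
);

package main;
use strict;
use warnings;
use feature 'say';

my $demo = Demo->new;
say "after new: $Demo::builds build(s)";

# The first read triggers the builder; later reads reuse the stored value.
$demo->zip_hash for 1 .. 3;
say "after three reads: $Demo::builds build(s)";
```

The counter stays at zero until the attribute is first read, and the builder runs only once however many times it is read afterwards.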

# Main
sub run {
    my ($self) = @_;
    $self->create_new_zipdb_file if $self->create_zip_db;
    $self->read_and_dump_the_db  if $self->read_zip_db;
    say "All Done!"              if $self->verbose;
}
main->new_with_options()->run;

MooX::Options has its own particular style for creating a “Main” function that you won’t usually see in standard Perl scripts. It may be borrowed from brian d foy’s “Modulino” concept. Anyway, the script is invoked by:

main->new_with_options()->run;

The main ‘run’ function will call the methods as specified by the command line options.
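For comparison, a modulino in the usual sense guards the call with ‘caller’, so the same file can be run as a script or loaded as a module. The script above calls ‘run’ unconditionally, so the following is a sketch of the general idiom rather than of this script; the ‘My::Script’ package and its methods are hypothetical.

```perl
package My::Script;
use strict;
use warnings;
use feature 'say';

# Minimal hand-rolled constructor so the sketch has no dependencies.
sub new {
    my ( $class, %args ) = @_;
    return bless {%args}, $class;
}

sub run {
    my ($self) = @_;
    say "running";
    return 1;
}

# Run only when the file is executed directly, not when it is
# loaded with require or use (in which case caller returns a value).
__PACKAGE__->new->run unless caller;

1;
```

With this guard in place, a test file can load the package and call its methods without triggering ‘run’.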

To run this script from the command line:

# To get help
λ perl bin\create_zipdb.pl -h
USAGE: create_zipdb.pl [-h] [long options ...]

    --create_zip_db  Create a new NYC Zip, Borough, District, Town JSON
                     file.
    --read_zip_db    Read the NYC Zip file database.
    --verbose        Print details

    --usage          show a short help message
    -h               show a compact help message
    --help           show a long help message
    --man            show the manual

# Create a JSON file database
λ perl bin\create_zipdb.pl --create_zip_db --v

# Read the database and dump to the terminal
λ perl bin\create_zipdb.pl --read_zip_db

Most of the actual work of reading in the hard coded data structure and creating/reading the JSON database file is done here:

sub create_new_zipdb_file {
    my $self          = shift;
    my $zip_boro_dist = $self->get_raw_zip_data();
    serialize_file $self->zip_db_json_file => $zip_boro_dist;
    say "Created a new " . $self->zip_db_json_file if $self->verbose;
}

sub get_raw_zip_data {
    my $self         = shift;
    my %zips_to_city = %{ _get_zips_to_city() };
    my %bdz          = %{ _get_borough_district_zips() };
    my %zip_boro_dist;
    for my $borough ( sort keys %bdz ) {
        my %district = %{ $bdz{$borough} };
        for my $district_name ( sort keys %district ) {
            my @district_zips = @{ $district{$district_name} };
            for my $zip ( sort @district_zips ) {
                my ( $city, $county ) = split /,/, $zips_to_city{$zip};
                $county =
                    $borough eq 'Brooklyn' ? 'Kings'
                  : $borough eq 'Bronx'    ? 'Bronx'
                  : 'New York'
                  unless $county;

                $zip_boro_dist{$zip} = {
                    borough  => $borough,
                    district => $district_name,
                    city     => $city,
                    county   => $county,
                };
            }
        }
    }
    return \%zip_boro_dist;
}

sub read_and_dump_the_db {
    my $self         = shift;
    my $location_rec = $self->zip_hash;
    dump $location_rec;
}

The ‘get_raw_zip_data’ method grabs the two hard-coded data structures and merges them, making a few small adjustments along the way. It is called by ‘create_new_zipdb_file’, which uses the ‘serialize_file’ function from File::Serialize to dump the Perl data structure in JSON format to the output JSON file.

Method ‘read_and_dump_the_db’ just reads this JSON file into the ‘zip_hash’ and dumps the contents to the console.

   "10022" : {
      "borough" : "Manhattan",
      "city" : "New York",
      "county" : "New York",
      "district" : "Gramercy Park and Murray Hill"
   },
   "10023" : {
      "borough" : "Manhattan",
      "city" : "New York",
      "county" : "New York",
      "district" : "Upper West Side"
   },
   ...
   "10314" : {
      "borough" : "Staten Island",
      "city" : "Staten Island",
      "county" : "Richmond",
      "district" : "Mid-Island"
   },
   "10451" : {
      "borough" : "Bronx",
      "city" : "Bronx",
      "county" : "Bronx",
      "district" : "High Bridge and Morrisania"
   },
   ...
   "11426" : {
      "borough" : "Queens",
      "city" : "Bellerose",
      "county" : "Queens",
      "district" : "Southeast Queens"
   },
   "11427" : {
      "borough" : "Queens",
      "city" : "Queens Village",
      "county" : "Queens",
      "district" : "Southeast Queens"
   },
   "11428" : {
      "borough" : "Queens",
      "city" : "Queens Village",
      "county" : "Queens",
      "district" : "Southeast Queens"
   },

The complete script can be found here: create_zipdb.pl

Categories
Moo Perl

Using MooX::Options to run a Perl script.

In my last post about using Moo I added some more functionality to my File::Info package. I added some more attributes and some Moo roles. I also updated and ran my test script to ensure that it worked well.

Now I want to put my File::Info class to use.  Normally when I write scripts I like to use configuration files with Config::General and command line options (CLI) with Getopt::Long. These have served me very well in the past. Having seen a post recently on the excellent Perl Maven site about Command Line Scripts With Moo I decided to give MooX::Options a try.

This script accepts one input option, ‘in_file’. The MooX::Options ‘option’ keyword takes care of most of the validation and error handling of the input. It also takes care of displaying the ‘--help’ and ‘--man’ documentation based on what is entered into the ‘doc’ and ‘long_doc’ option parameters respectively.

You can even do coercion on the option input just as you would with any Moo attribute. Here I could have coerced the input file into a ‘Path::Tiny’ object if I wished. However, this is already taken care of in the File::Info module.

Even though this is a script, because it uses Moo, you could also create Moo attributes. It really is a neat way to write Perl scripts with input options.
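The coercion mentioned above can be sketched with a plain Moo attribute. This assumes Moo is installed and that ‘option’ accepts ‘coerce’ in the same way as ‘has’, as the text describes. It uses ‘has’ and the core File::Spec module rather than ‘option’ and Path::Tiny to keep the dependencies small, and the ‘Demo::Opts’ package is hypothetical.

```perl
package Demo::Opts;
use Moo;
use File::Spec;

has in_file => (
    is       => 'ro',
    required => 1,

    # Turn whatever string was passed in into an absolute path.
    coerce => sub { File::Spec->rel2abs( $_[0] ) },
);

package main;
use strict;
use warnings;
use feature 'say';

my $opts = Demo::Opts->new( in_file => 'IMAG0029.jpg' );
say $opts->in_file;
```

The caller passes a relative file name, and every later read of ‘in_file’ sees the absolute path.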

 

use Moo;
use MooX::Options;
use v5.16;
use FindBin qw/$Bin/;
use lib qq{$Bin/../lib};

use File::Info;

#-------------------------------------------------------------------------------
#  Options
#-------------------------------------------------------------------------------

option 'in_file' => (
    is       => 'ro',
    format   => 's',
    required => 1,
    short    => q{file|i_f},
    doc      => q{The file you wish to examine.},
    long_doc => q{
    The full file path of the file that you would like to get size and age information about.},
);

#-------------------------------------------------------------------------------
#  Functions
#-------------------------------------------------------------------------------
sub run_like_the_wind {
    my ($self) = @_;
    say qq{\n};
    say q{Your File: } . $self->in_file;
    say q{=} x (length($self->in_file) + 11);

    my $file_info_obj = File::Info->new( file => $self->in_file );

    say qq{Not so pretty...\n}
      . $file_info_obj->file->stringify . q{ size is }
      . $file_info_obj->size_bytes
      . q{ and it's been }
      . $file_info_obj->seconds_since_mod
      . qq{ seconds since its last modification.};

    say qq{\nA little prettier...\n}
      . $file_info_obj->file->stringify . q{ size is }
      . $file_info_obj->make_file_size_pretty
      . q{ and was last modified on }
      . $file_info_obj->mod_time_moment->strftime(qq{%a %b %e at %I:%M:%S %p})
      . q{ local time.};

    say q{That's } . $file_info_obj->time_since_mod_pretty . qq{ ago!\n};
}

#--- Run the script
main->new_with_options->run_like_the_wind;

On the first run I forgot to enter the ‘in_file’.

Moo > perl bin/file_info.pl 
in_file is missing
USAGE: file_info.pl [-h] [long options...]

    --in_file: String
        The file you wish to examine.

    
    --usage:
        show a short help message

    
    -h --help:
        show a help message

    
    --man:
        show the manual

    
[14:44 - 0.62]
Moo > 

It prints a nice error message with some instructions.  Next time I will get it right.

Moo > perl bin/file_info.pl --in_file IMAG0029.jpg 


Your File: IMAG0029.jpg
=======================
Not so pretty...
IMAG0029.jpg size is 592023 and it's been 78620118 seconds since its last modification.

A little prettier...
IMAG0029.jpg size is 578.15 KB and was last modified on Sat Apr  6 at 04:02:28 PM local time.
That's 129.99 Weeks ago!

[14:57 - 0.04]
[austin@the-general-II 83] Moo > 

There is a lot more that could be added to our File::Info module. It could be subclassed and/or given some more roles to provide extra functionality.

Here are some more useful links on this topic.

MooX::Options on CPAN

Now I Have Better Options, by Mark Fowler in the Perl Advent Calendar

App::Math::Tutor, by Jens Rehsack, has lots of Moo and MooX::Options examples.