Categories
Covid-19 JSON New York City Python

NYC Covid Infections By Zip Code With Python

In my last post I created a CLI tool to display NYC Covid-19 test results by Zip code using Perl, my favorite language for the moment. I would also like to do the same using Python, purely as an excuse to learn Python. This script will download the same data from the NYC health department’s GitHub page and create a JSON file which I can use as a very basic database for later analysis.

Here is a sample of the downloaded raw data.

"MODZCTA","Positive","Total","zcta_cum.perc_pos"
NA,1558,1862,83.67
"10001",309,861,35.89
"10002",870,2033,42.79
"10003",396,1228,32.25
"10004",27,85,31.76
"10005",54,206,26.21
"10006",21,91,23.08
"10007",49,204,24.02
"10009",607,1745,34.79

This is the first iteration of my script.

from __future__ import print_function
import datetime, json, requests, os, re, sys


RAW_ZCTA_DATA_LINK = 'https://raw.githubusercontent.com/nychealth/coronavirus-data/master/tests-by-zcta.csv'

ALL_ZCTA_DATA_CSV = 'all_zcta_data.csv'

# -------------------------------------------------------------------------------------------------
#         Functions
# -------------------------------------------------------------------------------------------------
def get_today_str():
    today = datetime.date.today().strftime("%Y%m%d")
    return today

def find_bin():
    this_bin = os.path.abspath(os.path.dirname(__file__))
    return this_bin

def create_dir_if_not_exists(base_dir, dir_name):
    the_dir =  base_dir + '/' + dir_name
    if not os.path.isdir(the_dir):
        os.mkdir(the_dir)
    return the_dir

def create_db_dirs():
    this_bin = find_bin()
    db_dir = create_dir_if_not_exists(this_bin, 'db')
    today_str = get_today_str()
    year_month = today_str[0:4] + '_' + today_str[4:6]
    year_month_dir = create_dir_if_not_exists(db_dir, year_month)
    return year_month_dir

def get_covid_test_data_text():
    r = requests.get(RAW_ZCTA_DATA_LINK)
    print("Resp: " + str(r.status_code))
    return r.text

def create_list_of_test_data():
    test_vals = []
    covid_text = get_covid_test_data_text()
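    for l in covid_text.splitlines():
        # Raw string so that \s is read as a regex escape, not a string escape
        lvals = re.split(r'\s*,\s*', l)
        if lvals[0] == '"MODZCTA"':
            continue    # skip the header line
        # Strip the double quotes that surround the zip code in the raw CSV
        zip_dic = { 'zip' : lvals[0].strip('"'), 'positive': lvals[1],  'total_tested': lvals[2], 'cumulative_percent_of_those_tested': lvals[3]}
        test_vals.append(zip_dic)
    return test_vals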

def write_todays_test_data_to_file():
    year_month_dir = create_db_dirs()
    test_data = create_list_of_test_data()
    print(test_data[0])
    today_str = get_today_str()
    todays_file = year_month_dir + '/' + today_str + '_tests_by_ztca.json'
    # The with block closes the file even if json.dump raises an error
    with open(todays_file, 'w') as out_file:
        json.dump(test_data, out_file, indent=2)
    print("Created today's ZTCA tests file, {todays_file}".format(**locals()))



# -------------------------------------------------------------------------------------------------


write_todays_test_data_to_file()

Just a few snippets of interesting code here.

To get today’s date as a string in the format ‘yyyymmdd’, for example 20200401, I used the datetime module.

today = datetime.date.today().strftime("%Y%m%d")

Python has an interesting syntax for slicing strings or lists up into pieces. I used it here to create a directory name using the current year and month.

year_month = today_str[0:4] + '_' + today_str[4:6]

The ‘[0:4]’ gets the first four characters of the string. The ‘[4:6]’ grabs the subsequent 2 characters of the string.  These are combined to create a sub-directory name like ‘2020_05’.
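
As a small sketch of the slicing described above (assuming Python 3; the date value is just an example), the same directory name can also be produced directly by strftime:

```python
import datetime

today_str = "20200503"  # example value in the yyyymmdd format

year = today_str[0:4]    # characters at index 0 to 3
month = today_str[4:6]   # characters at index 4 and 5
year_month = year + "_" + month
print(year_month)

# Alternative: let strftime build the name, skipping the slicing entirely
same_name = datetime.date(2020, 5, 3).strftime("%Y_%m")
print(same_name)
```

Both approaches give the same sub-directory name; the slicing version is handy when the date string already exists.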

To get the directory location of this script, somewhat like FindBin in Perl, I used the abspath and dirname functions of the os.path module.

this_bin = os.path.abspath(os.path.dirname(__file__))

The raw test data for the current date is downloaded from the NYC Department of Health GitHub page using the requests library.

    r = requests.get(RAW_ZCTA_DATA_LINK)
    print("Resp: " + str(r.status_code))
    return r.text

It is then split up using the ‘re’ module, which seems to be Python’s rather awkward way of doing regular expression matching.

lvals = re.split(r'\s*,\s*', l)

This will split each line of input data similar to this,

"10003",396,1228,32.25

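Each line can then be inserted into a Python dictionary structure like this (the ‘yyyymmdd’ date key shown here is not yet set by the first iteration of the script above),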

{
  "zip": "10003",
  "yyyymmdd": "20200503",
  "positive": "396",
  "total_tested": "1228",
  "cumulative_percent_of_those_tested": "32.25"
}

This is appended to the end of a list of similar Dictionaries.
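
As an alternative to splitting with a regular expression, here is a minimal sketch that uses the standard csv module, which removes the double quotes around the zip code on its own. It assumes Python 3; the function name parse_test_data is made up for this example, and the sample lines are copied from the raw data shown earlier.

```python
import csv
import io

# Sample lines copied from the raw data shown earlier in this post
sample = '''"MODZCTA","Positive","Total","zcta_cum.perc_pos"
NA,1558,1862,83.67
"10003",396,1228,32.25
'''

def parse_test_data(text):
    """Return one dictionary per data row, skipping the header row."""
    rows = []
    reader = csv.reader(io.StringIO(text))
    next(reader)  # skip the header line
    for zip_code, positive, total, percent in reader:
        rows.append({
            'zip': zip_code,
            'positive': positive,
            'total_tested': total,
            'cumulative_percent_of_those_tested': percent,
        })
    return rows

records = parse_test_data(sample)
print(records[1])
```

The values stay as strings here, just as in the script above; converting them to numbers could be a later step.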

You may notice that the way I create the file path string is a little kludgy.

 todays_file = year_month_dir + '/' + today_str + '_tests_by_ztca.json'

I have since learned that there’s a better way to do this using the os.path module, which I’ll do the next time.
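
A minimal sketch of that approach, using os.path.join so the path separator is chosen for the current platform. The directory and date values below are example stand-ins, not output from the script.

```python
import os

year_month_dir = os.path.join('db', '2020_05')  # example directory, for illustration only
today_str = '20200503'                          # example date string

# os.path.join inserts the right separator between the pieces
todays_file = os.path.join(year_month_dir, today_str + '_tests_by_ztca.json')
print(todays_file)
```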

To print the data in JSON format to a file, Python provides the aptly named ‘json’ library.  To dump the data to a file, simply,

json.dump(test_data, out_file, indent=2)

The “indent=2” isn’t necessary, but it makes the output more readable.

To read JSON data from the file, 

test_data = json.load(in_file)

Read more about it here, Python JSON docs.
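
Putting the dump and the load together, here is a small round-trip sketch. It writes to a temporary directory (an assumption made so the example runs anywhere) rather than to the db directory used by the script.

```python
import json
import os
import tempfile

test_data = [{'zip': '10003', 'positive': '396', 'total_tested': '1228',
              'cumulative_percent_of_those_tested': '32.25'}]

# A temporary directory stands in for the real db directory
tmp_dir = tempfile.mkdtemp()
json_file = os.path.join(tmp_dir, 'tests.json')

# Write the list of dictionaries as indented JSON
with open(json_file, 'w') as out_file:
    json.dump(test_data, out_file, indent=2)

# Read it back into a Python list of dictionaries
with open(json_file) as in_file:
    loaded = json.load(in_file)

print(loaded == test_data)
```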

In the next post I will extend the script to add location details for each zip code where the tests were conducted, using a NYC Zip Code database file.

Categories
Chart::Plotly Covid-19 Moo MooX::Options New York City Perl

NYC Covid-19 Infections by Zip Code, with Perl

The NYC Department of Health started publishing their Covid-19 testing results on GitHub. One of their datasets, tests-by-zcta.csv, is described in their own words:

This file includes the cumulative count of New York City residents by ZIP code of residence who:
Were ever tested for COVID-19 (SARS-CoV-2)
Tested positive
The cumulative counts are as of the date of extraction from the NYC Health Department’s disease surveillance database.

GitHub View of “tests-by-zcta.csv”

This file is updated almost every day and shows the number of people tested and the number found to have Covid-19 in each New York City Zip code. It also shows the cumulative percentage of those tested who have the virus.

What I would like to add is more detailed information for each Zip code, so that the data makes more sense to me. For each Zip code, I would like to add the borough and the town or district in that borough. To make things a little more complicated, NYC boroughs are divided up differently. Manhattan addresses are “New York City”, while Brooklyn, Bronx and Staten Island are their own cities for mailing address purposes. Queens, however, is different: it is broken up into towns like Flushing, Long Island City, Woodside, Jamaica etc.

In a previous post, Creating A Simple JSON NYC Zip Code Database File With Perl and MooX::Options, I created a little database file to match the zip codes with the neighbourhood.

Now I have created a new script to download the raw CSV data from the NYC Department of Health GitHub page and merge it with my little Zip Code database.

See the code on GitHub

sub get_raw_covid_data_by_zip {
    my $self = shift;
    my @data =
      map { _conv_zcta_rec_to_hash($_) }
      split( /\r?\n/, get( $self->zcta_github_link ) );
    shift @data
      if ( $data[0]->{cumulative_percent_of_those_tested} =~ /zcta_cum/ )
      ;    # Dont need that header
    say "Got @{[ scalar @data ]} lines of covid data. Thanks Mr. Mayor";
    return \@data;
}

The above function uses the CPAN module LWP::Simple, which exports the ‘get’ function to download the data from GitHub. The ‘split’ function breaks the data up into individual lines, which are fed into the ‘map’ function. Each line is passed to ‘_conv_zcta_rec_to_hash’, which breaks the line into a hash, cleans up the Zip code, and adds the current date.

 

sub _conv_zcta_rec_to_hash {
    my $str = shift;
    state $date_h = _get_date_h();
    my %h;
    (
        $h{zip}, $h{positive}, $h{total_tested},
        $h{cumulative_percent_of_those_tested}
    ) = split /\s*,\s*/, $str;

    ( $h{zip} ) = $h{zip} =~ /(\d+)/;
    $h{zip} ||= $NA_ZIP;    # There is one undef zip in test data
    $h{yyyymmdd} = $date_h->{yyyymmdd};
    return \%h;
}

Here’s a sample of one line of data as a hash element.

{
     cumulative_percent_of_those_tested => "42.44",
     positive     => "337",
     total_tested => "794",
     yyyymmdd     => "20200418",
     zip          => "10003",
},

The newly created array of hashes is then serialized to JSON format and printed to a file using File::Serialize. This will be my file database that I can use to provide other useful information.

sub create_latest_tests_by_ztca_file {
    my $self       = shift;
    my $covid_data = $self->get_raw_covid_data_by_zip();

    serialize_file $self->tests_by_zcta_db_json_file => $covid_data;

    say "Created a new " . $self->tests_by_zcta_db_json_file;
    1;
}

Printing the test results to a CSV file.

Printing this to a CSV file is easy enough with Perl and Text::CSV_XS.

sub write_latest_zcta_to_csv {
    my ($self) = @_;
    my @col_headers = (
        qw/Zip Date City District Borough/,
        'Total Tested', 'Positive', '% of Tested'
    );
    my @col_names = (
        qw/zip yyyymmdd city district borough total_tested positive cumulative_percent_of_those_tested /
    );
    my $csv       = Text::CSV_XS->new( { binary => 1, eol => $/ } );
    my $zcta_file = $self->get_todays_csv_file($ALL_ZCTA_DATA_CSV);
    my $z_fh      = $zcta_file->openw;
    $csv->print( $z_fh, \@col_headers ) or $csv->error_diag;

    for my $one_day_zip_rec (
        sort { $b->{positive} <=> $a->{positive} || $a->{zip} <=> $b->{zip} }
        @{ $self->tests_by_zcta_today } )
    {
        my $location_rec =
          $self->zip_db->zip_db_hash->{ $one_day_zip_rec->{zip} }
          || _get_filler_location_rec( $one_day_zip_rec->{zip} );
        $self->zip_db->zip_db_hash->{ $one_day_zip_rec->{zip} } ||=
          $location_rec;
        my %csv_rec = ( %$one_day_zip_rec, %$location_rec );
        $csv->print( $z_fh, [ @csv_rec{@col_names} ] );
    }
    close($z_fh) or warn "Failed to close $zcta_file";
    say "Created a new $zcta_file";
}

my $zcta_file = $self->get_todays_csv_file($ALL_ZCTA_DATA_CSV);

This uses a Moo attribute to return a CSV file path with the current day’s timestamp.

for my $one_day_zip_rec (
    sort { $b->{positive} <=> $a->{positive} || $a->{zip} <=> $b->{zip} }
    @{ $self->tests_by_zcta_today } )
{...

When reading the current day’s test results, the records are sorted by the number of positive results, highest first, with ties broken by Zip code in ascending order. Each record is then combined with the location data for that Zip code, and printed.

my %csv_rec = ( %$one_day_zip_rec, %$location_rec );
$csv->print( $z_fh, [ @csv_rec{@col_names} ] );
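
For comparison, here is a rough Python sketch of the same sort, merge and print steps. The test counts are made-up sample values, the ‘filler’ dictionary is a hypothetical stand-in for ‘_get_filler_location_rec’, and every zip is assumed to be numeric (the Perl code replaces the one ‘NA’ zip before this point).

```python
import csv
import io

# Made-up test records; the field names follow the Perl hashes above
tests = [
    {'zip': '10023', 'yyyymmdd': '20200418', 'positive': '50',
     'total_tested': '250', 'cumulative_percent_of_those_tested': '20.00'},
    {'zip': '10022', 'yyyymmdd': '20200418', 'positive': '50',
     'total_tested': '200', 'cumulative_percent_of_those_tested': '25.00'},
    {'zip': '10451', 'yyyymmdd': '20200418', 'positive': '90',
     'total_tested': '300', 'cumulative_percent_of_those_tested': '30.00'},
]
locations = {
    '10022': {'borough': 'Manhattan', 'city': 'New York',
              'district': 'Gramercy Park and Murray Hill'},
    '10023': {'borough': 'Manhattan', 'city': 'New York',
              'district': 'Upper West Side'},
}
filler = {'borough': 'NA', 'city': 'NA', 'district': 'NA'}  # hypothetical filler record
col_names = ['zip', 'yyyymmdd', 'city', 'district', 'borough',
             'total_tested', 'positive', 'cumulative_percent_of_those_tested']

# Highest positive count first; ties broken by zip, ascending
ordered = sorted(tests, key=lambda rec: (-int(rec['positive']), int(rec['zip'])))

out = io.StringIO()
writer = csv.writer(out)
for rec in ordered:
    location = locations.get(rec['zip'], filler)
    merged = {**rec, **location}  # like ( %$one_day_zip_rec, %$location_rec )
    writer.writerow([merged[name] for name in col_names])  # like a hash slice
print(out.getvalue())
```

The list comprehension over col_names plays the role of the Perl hash slice: it pulls the values out in a fixed column order.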

Below is a sample CSV file for April 17 2020.

Next we can create nice Plotly charts to display the test results.

Categories
File::Serialize JSON Moo MooX::Options MooX::Options NewYorkCity Perl Zip Codes

Creating A Simple JSON NYC Zip Code Database File With Perl and MooX::Options

I found myself needing some detailed New York City Zip Code information for another script I was creating. The zip codes themselves are easy enough to find online, but I needed to include more details about each zip code location. I created a Perl script to merge two hard coded Perl data structures, which are printed out as a very basic JSON database file.

When creating Perl scripts with command line options, my go-to CPAN module is Getopt::Long. However for this script I will use MooX::Options, as I may extract some of the methods to be used in a future Moo module.

This will have three options, ‘create_zip_db’, ‘read_zip_db’ and ‘verbose’. The ‘doc’ attribute gives a brief description of each option. The ‘short’ attribute specifies any aliases that can be used for each option. Setting ‘is’ to ‘ro’ means that the option value is read-only.

option create_zip_db => (
    is    => 'ro',
    short => 'new_zipdb|new_zip',
    doc   => q/Create a new NYC Zip, Borough, District, Town JSON file./,
);

option read_zip_db => (
    is    => 'ro',
    short => 'read_db',
    doc   => q/Read the NYC Zip file database./,
);

option verbose => ( is => 'ro', doc => 'Print details' );

There are three Moo attributes.  Some time in the future I can put these into a separate Moo module.

has db_dir => (
    is      => 'rw',
    isa     => Path,
    coerce  => 1,
    default => sub { "$Bin/../db" }
);

has zip_db_json_file => (
    is      => 'lazy',
    isa     => Path,
    builder => sub {
        $_[0]->db_dir->child("zip_db.json");
    }
);

has zip_hash => (
    is => 'lazy',
    isa =>
      sub { die "'zip_hash' must be a HASH" unless ( ref( $_[0] ) eq 'HASH' ) }
    ,
    builder => sub {
        deserialize_file $_[0]->zip_db_json_file;
    }
);

The first attribute ‘db_dir’ specifies the future location of the JSON file. It uses Types::Path::Tiny to enforce this directory path as a Path::Tiny object. The ‘zip_db_json_file’ is also a Types::Path::Tiny Path.

The ‘zip_hash’ is the data structure that will store the NYC Zip code, borough, district and town information. The ‘isa’ for this attribute will ensure that it is a Perl hash. The ‘deserialize_file’ function comes from the CPAN module File::Serialize, which is very useful for dumping out Perl data structures to a JSON file, or in this case slurping in a JSON file to a Perl data structure. It also handles formats other than JSON.

Note that the ‘zip_hash’ attribute is ‘lazy’. I’m not saying that zip codes are particularly averse to work. This is just Moo’s way of saying, “please don’t make me do anything until I really have to”. That way, resources are not unnecessarily used creating a structure that isn’t being called for.
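
The same idea can be illustrated outside of Perl. Below is a loose Python analogy, not a description of how Moo works internally: functools.cached_property (Python 3.8 or later) builds a value on first access and reuses it afterwards. The class name ZipDb and its contents are hypothetical.

```python
from functools import cached_property

class ZipDb:
    """Hypothetical class showing an attribute that is built lazily."""

    def __init__(self):
        self.build_count = 0  # counts how often the builder actually runs

    @cached_property
    def zip_hash(self):
        # Runs only on first access; the result is then cached on the instance
        self.build_count += 1
        return {'10022': {'borough': 'Manhattan'}}

db = ZipDb()
print(db.build_count)  # the builder has not run yet
db.zip_hash            # first access triggers the build
db.zip_hash            # second access reuses the cached value
print(db.build_count)
```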

# Main
sub run {
    my ($self) = @_;
    $self->create_new_zipdb_file if $self->create_zip_db;
    $self->read_and_dump_the_db  if $self->read_zip_db;
    say "All Done!"              if $self->verbose;
}
main->new_with_options()->run;

MooX::Options has its own particular style for creating a “Main” function that you won’t usually see in standard Perl scripts. It may be borrowed from brian d foy’s “Modulino” concept. Anyway, the script is invoked by:

main->new_with_options()->run;

The main ‘run’ function will call the methods as specified by the command line options.

To run this script from the command line:

# To get help
λ perl bin\create_zipdb.pl -h
USAGE: create_zipdb.pl [-h] [long options ...]

    --create_zip_db  Create a new NYC Zip, Borough, District, Town JSON
                     file.
    --read_zip_db    Read the NYC Zip file database.
    --verbose        Print details

    --usage          show a short help message
    -h               show a compact help message
    --help           show a long help message
    --man            show the manual

# Create a JSON file database
λ perl bin\create_zipdb.pl --create_zip_db --v

# Read the database and dump to the terminal
λ perl bin\create_zipdb.pl --read_zip_db

Most of the actual work of reading in the hard coded data structure and creating/reading the JSON database file is done here:

sub create_new_zipdb_file {
    my $self          = shift;
    my $zip_boro_dist = $self->get_raw_zip_data();
    serialize_file $self->zip_db_json_file => $zip_boro_dist;
    say "Created a new " . $self->zip_db_json_file if $self->verbose;
}

sub get_raw_zip_data {
    my $self         = shift;
    my %zips_to_city = %{ _get_zips_to_city() };
    my %bdz          = %{ _get_borough_district_zips() };
    my %zip_boro_dist;
    for my $borough ( sort keys %bdz ) {
        my %district = %{ $bdz{$borough} };
        for my $district_name ( sort keys %district ) {
            my @district_zips = @{ $district{$district_name} };
            for my $zip ( sort @district_zips ) {
                my ( $city, $county ) = split /,/, $zips_to_city{$zip};
                $county =
                    $borough eq 'Brooklyn' ? 'Kings'
                  : $borough eq 'Bronx'    ? 'Bronx'
                  : 'New York'
                  unless $county;

                $zip_boro_dist{$zip} = {
                    borough  => $borough,
                    district => $district_name,
                    city     => $city,
                    county   => $county,
                };
            }
        }
    }
    return \%zip_boro_dist;
}

sub read_and_dump_the_db {
    my $self         = shift;
    my $location_rec = $self->zip_hash;
    dump $location_rec;
}

Method ‘get_raw_zip_data’ grabs the two hard coded data structures and merges them. It makes a few little adjustments. It is called by ‘create_new_zipdb_file’, which uses the ‘serialize_file’ function from File::Serialize to dump the Perl data structure in JSON format to the output JSON file.
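
To make the merge easier to follow, here is a rough Python sketch of the same logic. The shape of the two input structures is an assumption inferred from the Perl code (borough to district to zips, and zip to a ‘city,county’ string); the entries are taken from the sample output shown further down, and the function name merge_zip_data is made up for this example.

```python
# Assumed shape of the two hard coded structures, with a few sample entries
borough_district_zips = {
    'Manhattan': {
        'Gramercy Park and Murray Hill': ['10022'],
        'Upper West Side': ['10023'],
    },
    'Queens': {'Southeast Queens': ['11426', '11427']},
}
zips_to_city = {
    '10022': 'New York',
    '10023': 'New York',
    '11426': 'Bellerose,Queens',
    '11427': 'Queens Village,Queens',
}

# County defaults applied when the city string carries no county
DEFAULT_COUNTY = {'Brooklyn': 'Kings', 'Bronx': 'Bronx'}

def merge_zip_data(bdz, zip_city):
    """Build one record per zip with borough, district, city and county."""
    merged = {}
    for borough, districts in sorted(bdz.items()):
        for district_name, zips in sorted(districts.items()):
            for zip_code in sorted(zips):
                # Split 'city,county'; county is '' when there is no comma
                city, _, county = zip_city[zip_code].partition(',')
                if not county:
                    county = DEFAULT_COUNTY.get(borough, 'New York')
                merged[zip_code] = {
                    'borough': borough,
                    'district': district_name,
                    'city': city,
                    'county': county,
                }
    return merged

zip_db = merge_zip_data(borough_district_zips, zips_to_city)
print(zip_db['11426'])
```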

Method ‘read_and_dump_the_db’ just reads this JSON file into the ‘zip_hash’ and dumps the contents to the console.

   "10022" : {
      "borough" : "Manhattan",
      "city" : "New York",
      "county" : "New York",
      "district" : "Gramercy Park and Murray Hill"
   },
   "10023" : {
      "borough" : "Manhattan",
      "city" : "New York",
      "county" : "New York",
      "district" : "Upper West Side"
   },
   ...
   "10314" : {
      "borough" : "Staten Island",
      "city" : "Staten Island",
      "county" : "Richmond",
      "district" : "Mid-Island"
   },
   "10451" : {
      "borough" : "Bronx",
      "city" : "Bronx",
      "county" : "Bronx",
      "district" : "High Bridge and Morrisania"
   },
   ...
   "11426" : {
      "borough" : "Queens",
      "city" : "Bellerose",
      "county" : "Queens",
      "district" : "Southeast Queens"
   },
   "11427" : {
      "borough" : "Queens",
      "city" : "Queens Village",
      "county" : "Queens",
      "district" : "Southeast Queens"
   },
   "11428" : {
      "borough" : "Queens",
      "city" : "Queens Village",
      "county" : "Queens",
      "district" : "Southeast Queens"
   },

The complete script can be found here: create_zipdb.pl