NYC Covid Infections By Zip Code With Python

In my last post I created a CLI tool to display NYC Covid-19 test results by zip code using Perl, my favorite language for the moment. I would like to do the same using Python, purely as an excuse to learn Python. This script will download the same data from the NYC health department’s GitHub page and create a JSON file which I can use as a very basic database for later analysis.

Here is a sample of the downloaded raw data.

"MODZCTA","Positive","Total","zcta_cum.perc_pos"
NA,1558,1862,83.67
"10001",309,861,35.89
"10002",870,2033,42.79
"10003",396,1228,32.25
"10004",27,85,31.76
"10005",54,206,26.21
"10006",21,91,23.08
"10007",49,204,24.02
"10009",607,1745,34.79

This is the first iteration of my script.

from __future__ import print_function
import datetime, json, requests, os, re, sys


RAW_ZCTA_DATA_LINK = 'https://raw.githubusercontent.com/nychealth/coronavirus-data/master/tests-by-zcta.csv'

ALL_ZCTA_DATA_CSV = 'all_zcta_data.csv'

# -------------------------------------------------------------------------------------------------
#         Functions
# -------------------------------------------------------------------------------------------------
def get_today_str():
    today = datetime.date.today().strftime("%Y%m%d")
    return today

def find_bin():
    this_bin = os.path.abspath(os.path.dirname(__file__))
    return this_bin

def create_dir_if_not_exists(base_dir, dir_name):
    the_dir =  base_dir + '/' + dir_name
    if not os.path.isdir(the_dir):
        os.mkdir(the_dir)
    return the_dir

def create_db_dirs():
    this_bin = find_bin()
    db_dir = create_dir_if_not_exists(this_bin, 'db')
    today_str = get_today_str()
    year_month = today_str[0:4] + '_' + today_str[4:6]
    year_month_dir = create_dir_if_not_exists(db_dir, year_month)
    return year_month_dir

def get_covid_test_data_text():
    r = requests.get(RAW_ZCTA_DATA_LINK)
    print("Resp: " + str(r.status_code))
    return r.text

def create_list_of_test_data():
    test_vals = []
    covid_text = get_covid_test_data_text()
    for l in covid_text.splitlines():
        # Raw string so the backslashes reach the regex engine unchanged.
        lvals = re.split(r'\s*,\s*', l)
        if lvals[0] == '"MODZCTA"':
            continue
        # Strip the double quotes that surround the zip code in the raw CSV.
        zip_dic = { 'zip' : lvals[0].strip('"'), 'positive': lvals[1],  'total_tested': lvals[2], 'cumulative_percent_of_those_tested': lvals[3]}
        test_vals.append(zip_dic) 
    return test_vals

def write_todays_test_data_to_file():
    year_month_dir = create_db_dirs()
    test_data = create_list_of_test_data()
    print(test_data[0])
    today_str = get_today_str()
    todays_file = year_month_dir + '/' + today_str + '_tests_by_ztca.json'
    # The with block closes the file automatically, even if json.dump fails.
    with open(todays_file, 'w') as out_file:
        json.dump(test_data, out_file, indent=2)
    print("Created today's ZCTA tests file, {todays_file}".format(**locals()))



# -------------------------------------------------------------------------------------------------


write_todays_test_data_to_file()

Just a few snippets of interesting code here.

To get today’s date as a string in the format ‘yyyymmdd’, for example 20200401, I used the datetime module.

today = datetime.date.today().strftime("%Y%m%d")
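As a quick illustration, the same format string can be applied to a fixed date so the result is predictable. The date below is only an example value, not something taken from the script.

```python
import datetime

# A fixed example date (hypothetical, chosen so the output is predictable).
example_date = datetime.date(2020, 4, 1)

# %Y = four-digit year, %m = zero-padded month, %d = zero-padded day.
example_str = example_date.strftime("%Y%m%d")
print(example_str)  # 20200401
```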

Python has an interesting syntax for slicing strings or lists up into pieces. I used it here to create a directory name using the current year and month.

year_month = today_str[0:4] + '_' + today_str[4:6]

The ‘[0:4]’ gets the first four characters of the string. The ‘[4:6]’ grabs the next two characters. These are combined to create a sub-directory name like ‘2020_05’.
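Here is a small sketch of the slicing with a fixed example string. The start index is included and the end index is excluded; the `day` variable is only there for illustration and is not used in the script.

```python
today_str = "20200503"  # example value in yyyymmdd format

year = today_str[0:4]    # characters at index 0 to 3 -> "2020"
month = today_str[4:6]   # characters at index 4 to 5 -> "05"
day = today_str[6:]      # leaving out the end index runs to the end -> "03"

year_month = year + "_" + month
print(year_month)  # 2020_05
```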

To get the directory location of this script, similar to what the FindBin module does in Perl, I used the abspath and dirname functions of the os.path module.

this_bin = os.path.abspath(os.path.dirname(__file__))

The raw test data for the current date is then downloaded from the NYC Department of Health GitHub page using the requests library.

def get_covid_test_data_text():
    r = requests.get(RAW_ZCTA_DATA_LINK)
    print("Resp: " + str(r.status_code))
    return r.text

It is then split up using the ‘re’ module, which seems to be Python’s rather awkward way of doing regular expression matching.

lvals = re.split(r'\s*,\s*', l)

This splits each line of input data, which looks similar to this,

"10003",396,1228,32.25

The resulting values can then be inserted into a Python dictionary like this,

{
  "zip": "10003",
  "yyyymmdd": "20200503",
  "positive": "396",
  "total_tested": "1228",
  "cumulative_percent_of_those_tested": "32.25"
}

This is appended to the end of a list of similar dictionaries.
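As an aside, the csv module in the standard library could do the splitting instead of a regular expression. This is a different technique from the one the script uses, shown only as a sketch; the `sample_text` variable is a made-up stand-in for the downloaded text. One advantage is that csv.reader removes the double quotes around the zip code for you.

```python
import csv

# A few lines in the same shape as the sample data shown earlier.
sample_text = '''"MODZCTA","Positive","Total","zcta_cum.perc_pos"
"10003",396,1228,32.25
"10004",27,85,31.76'''

test_vals = []
# csv.reader understands quoted fields, so "10003" comes back as 10003
# without the surrounding double quotes. All values are still strings.
for row in csv.reader(sample_text.splitlines()):
    if row[0] == "MODZCTA":
        continue  # skip the header line
    test_vals.append({
        "zip": row[0],
        "positive": row[1],
        "total_tested": row[2],
        "cumulative_percent_of_those_tested": row[3],
    })

print(test_vals[0]["zip"])  # 10003
print(len(test_vals))       # 2
```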

You may notice that the way I create the file path string is a little kludgy.

todays_file = year_month_dir + '/' + today_str + '_tests_by_ztca.json'

I have since learned that there’s a better way to do this using the os.path module, which I’ll use next time.
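A minimal sketch of what that might look like with os.path.join, which inserts the correct separator for the operating system. The directory and date values here are examples standing in for the ones the script computes.

```python
import os

# Example values standing in for the ones the script computes.
year_month_dir = os.path.join("db", "2020_05")
today_str = "20200503"

# os.path.join adds the path separator between the parts.
todays_file = os.path.join(year_month_dir, today_str + "_tests_by_ztca.json")

# On Linux or macOS this prints db/2020_05/20200503_tests_by_ztca.json
print(todays_file)
```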

To print the data in JSON format to a file, Python provides the aptly named ‘json’ library. To dump the data to a file, simply call,

json.dump(test_data, out_file, indent=2)

The “indent=2” isn’t necessary, but it makes the output more readable.

To read JSON data back from a file,

test_data = json.load(in_file)
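To see both calls working together, here is a small round trip that writes a list to a temporary file and reads it back. The file name and the single record are made up for the example.

```python
import json
import os
import tempfile

test_data = [{"zip": "10003", "positive": "396", "total_tested": "1228"}]

# Write to a throwaway file in a temporary directory (hypothetical file name).
tmp_dir = tempfile.mkdtemp()
json_path = os.path.join(tmp_dir, "example_tests.json")

with open(json_path, "w") as out_file:
    json.dump(test_data, out_file, indent=2)

# Read the same file back into a Python list of dictionaries.
with open(json_path) as in_file:
    loaded = json.load(in_file)

print(loaded == test_data)  # True
```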

Read more about it here, Python JSON docs.

In the next post I will add location details for each zip code where the tests were conducted, using an NYC zip code database file.