{"id":772,"date":"2020-04-30T22:03:02","date_gmt":"2020-05-01T02:03:02","guid":{"rendered":"http:\/\/www.aibistin.com\/?p=772"},"modified":"2023-03-12T16:04:22","modified_gmt":"2023-03-12T20:04:22","slug":"nyc-covid-infections-by-zip-code-with-python","status":"publish","type":"post","link":"https:\/\/www.aibistin.com\/?p=772","title":{"rendered":"NYC Covid Infections By Zip Code With Python"},"content":{"rendered":"\n<p>In my <a href=\"http:\/\/www.aibistin.com\/?p=727\">last post<\/a> I created a CLI tool to display NYC Covid-19 test results by Zip code using Perl, my favorite language for the moment. I would also like to do the same using Python, purely as an excuse to learn it. This script downloads the same data from <a href=\"https:\/\/github.com\/nychealth\/coronavirus-data\/blob\/master\/tests-by-zcta.csv\" target=\"_blank\" rel=\"noopener noreferrer\">the NYC health department&#8217;s GitHub page<\/a> and creates a JSON file, which I can use as a very basic database for later analysis. 
<\/p>\n<p>Here is a sample of the downloaded raw data.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>\"MODZCTA\",\"Positive\",\"Total\",\"zcta_cum.perc_pos\"\nNA,1558,1862,83.67\n\"10001\",309,861,35.89\n\"10002\",870,2033,42.79\n\"10003\",396,1228,32.25\n\"10004\",27,85,31.76\n\"10005\",54,206,26.21\n\"10006\",21,91,23.08\n\"10007\",49,204,24.02\n\"10009\",607,1745,34.79<\/code><\/pre>\n\n\n\n<p>This is the first iteration of my script.<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; gutter: false; title: ; notranslate\" title=\"\">\nfrom __future__ import print_function\nimport datetime, json, requests, os, re, sys\n\n\nRAW_ZCTA_DATA_LINK = 'https:\/\/raw.githubusercontent.com\/nychealth\/coronavirus-data\/master\/tests-by-zcta.csv'\n\nALL_ZCTA_DATA_CSV = 'all_zcta_data.csv'\n\n# -------------------------------------------------------------------------------------------------\n#         Functions\n# -------------------------------------------------------------------------------------------------\n# Return today's date as a 'yyyymmdd' string, e.g. 20200401\ndef get_today_str():\n    today = datetime.date.today().strftime(&quot;%Y%m%d&quot;)\n    return today\n\n# Return the directory containing this script\ndef find_bin():\n    this_bin = os.path.abspath(os.path.dirname(__file__))\n    return this_bin\n\ndef create_dir_if_not_exists(base_dir, dir_name):\n    the_dir = base_dir + '\/' + dir_name\n    if not os.path.isdir(the_dir):\n        os.mkdir(the_dir)\n    return the_dir\n\n# Create db\/yyyy_mm under the script directory and return its path\ndef create_db_dirs():\n    this_bin = find_bin()\n    db_dir = create_dir_if_not_exists(this_bin, 'db')\n    today_str = get_today_str()\n    year_month = today_str&#x5B;0:4] + '_' + today_str&#x5B;4:6]\n    year_month_dir = create_dir_if_not_exists(db_dir, year_month)\n    return year_month_dir\n\n# Download the raw CSV text from GitHub\ndef get_covid_test_data_text():\n    r = requests.get(RAW_ZCTA_DATA_LINK)\n    print(&quot;Resp: &quot; + str(r.status_code))\n    return r.text\n\n# Parse the CSV text into a list of dictionaries, one per zip code\ndef create_list_of_test_data():\n    test_vals = &#x5B;]\n    covid_text = get_covid_test_data_text()\n    for l 
in covid_text.splitlines():\n        lvals = re.split(r'\\s*,\\s*', l)\n        # Skip the header row\n        if lvals&#x5B;0] == '&quot;MODZCTA&quot;':\n            continue\n        zip_dic = {'zip': lvals&#x5B;0], 'positive': lvals&#x5B;1], 'total_tested': lvals&#x5B;2], 'cumulative_percent_of_those_tested': lvals&#x5B;3]}\n        test_vals.append(zip_dic)\n    return test_vals\n\ndef write_todays_test_data_to_file():\n    year_month_dir = create_db_dirs()\n    test_data = create_list_of_test_data()\n    print(test_data&#x5B;0])\n    today_str = get_today_str()\n    todays_file = year_month_dir + '\/' + today_str + '_tests_by_ztca.json'\n    out_file = open(todays_file, 'w')\n    json.dump(test_data, out_file, indent=2)\n    print(&quot;Created today's ZCTA tests file, {todays_file}&quot;.format(**locals()))\n    out_file.close()\n\n\n\n# -------------------------------------------------------------------------------------------------\n\n\nwrite_todays_test_data_to_file()\n\n<\/pre><\/div>\n\n\n<p>Here are a few interesting snippets from this code.<\/p>\n<p>To get today&#8217;s date as a string in the format &#8216;yyyymmdd&#8217;, for example 20200401, I used the datetime module.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>today = datetime.date.today().strftime(\"%Y%m%d\")<\/code><\/pre>\n\n\n\n<p>Python has an interesting syntax for slicing strings or lists up into pieces. I used it here to create a directory name using the current year and month.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>year_month = today_str&#91;0:4] + '_' + today_str&#91;4:6]<\/code><\/pre>\n\n\n\n<p>The &#8216;[0:4]&#8217; gets the first four characters of the string. 
The &#8216;[4:6]&#8217; grabs the next two characters of the string. These are combined to create a sub-directory name like &#8216;2020_05&#8217;.<\/p>\n<p>To get the directory location of this script, somewhat like FindBin in Perl, I used the abspath and dirname functions of the <a href=\"https:\/\/docs.python.org\/3\/library\/os.path.html\">os.path library<\/a>.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>this_bin = os.path.abspath(os.path.dirname(__file__))<\/code><\/pre>\n\n\n\n<p>The raw test data for the current date is downloaded from the NYC department of health <a href=\"https:\/\/github.com\/nychealth\/coronavirus-data\/blob\/master\/tests-by-zcta.csv\" target=\"_blank\" rel=\"noopener noreferrer\">GitHub page<\/a> using the <a href=\"https:\/\/pypi.org\/project\/requests\/\">requests library<\/a>.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>r = requests.get(RAW_ZCTA_DATA_LINK)\n    print(\"Resp: \" + str(r.status_code))\n    return r.text<\/code><\/pre>\n\n\n\n<p>Each line is then split up using the <a href=\"https:\/\/docs.python.org\/3\/library\/re.html\">&#8216;re&#8217; module<\/a>, which seems to be Python&#8217;s rather awkward way of doing regular expression matching.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>lvals = re.split(r'\\s*,\\s*', l)<\/code><\/pre>\n\n\n\n<p>This will split each line of input data similar to this,<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>\"10003\",396,1228,32.25<\/code><\/pre>\n\n\n\n<p>The values can then be inserted into a Python Dictionary structure like this,<\/p>\n\n\n<pre class=\"wp-block-code\"><code>{\n  
\"zip\": \"10003\",\n  \"yyyymmdd\": \"20200503\",\n  \"positive\": \"396\",\n  \"total_tested\": \"1228\",\n  \"cumulative_percent_of_those_tested\": \"32.25\"\n}<\/code><\/pre>\n\n\n\n<p>This is appended to the end of a list of similar Dictionaries.<\/p>\n\n\n\n<p>You may notice that the way I create the file path string is a little kludgy.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>todays_file = year_month_dir + '\/' + today_str + '_tests_by_ztca.json'<\/code><\/pre>\n\n\n\n<p>I have since learned that there&#8217;s a better way to do this using <a href=\"https:\/\/docs.python.org\/3\/library\/os.path.html#os.path.join\">os.path.join<\/a>, which I&#8217;ll use next time.<\/p>\n<p>To print the data in JSON format to a file, Python provides the aptly named &#8216;json&#8217; library. To dump the data to a file, simply call,<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>json.dump(test_data, out_file, indent=2)<\/code><\/pre>\n\n\n\n<p>The &#8220;indent=2&#8221; isn&#8217;t necessary, but it makes the output more readable.<\/p>\n<p>To read JSON data from the file,<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>test_data = json.load(in_file)<\/code><\/pre>\n\n\n\n<p>Read more about it here, <a href=\"https:\/\/docs.python.org\/3\/library\/json.html\">Python JSON docs<\/a>.<\/p>\n<p>In the next post I will add more location details for each zip code where the tests were conducted, using a <a href=\"http:\/\/www.aibistin.com\/?p=673\">NYC Zip Code database<\/a> file.<\/p>\n\n\n\n<!--nextpage-->\n\n\n\n\n\n<p>This version adds functionality to merge the test data with the New York City Zip data, providing more details about the location each tested person is from.<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\nimport csv\nimport datetime\nimport json\nimport requests\nimport os\nimport re\nimport sys\n\nRAW_ZCTA_DATA_LINK = 
'https:\/\/raw.githubusercontent.com\/nychealth\/coronavirus-data\/master\/tests-by-zcta.csv'\nALL_ZCTA_DATA_CSV = 'all_zcta_data.csv'\nBIN_DIR = os.path.abspath(os.path.dirname(__file__))\nDB_DIR = os.path.join(BIN_DIR, '..', 'db')\nNA_ZIP = &quot;88888&quot;\nTHIS_SCRIPT = sys.argv&#x5B;0]\nZIP_DB = os.path.join(DB_DIR, 'zip_db.json')\n...\n...\ndef get_zip_data():\n    z_db = open(ZIP_DB, 'r')\n    zip_data = json.load(z_db)\n    z_db.close()\n    return zip_data\n\n\ndef get_filler_location_rec(bad_zip=NA_ZIP):\n    label = 'Unknown-' + bad_zip\n    return {\n        'zip': bad_zip,\n        'borough': label,\n        'city': label,\n        'district': label,\n        'county': label\n    }\n\n\n# Zip: 11697\n# Data: {&quot;borough&quot;: &quot;Queens&quot;, &quot;city&quot;: &quot;Breezy Point&quot;, &quot;county&quot;: &quot;Queens&quot;, &quot;district&quot;: &quot;Rockaways&quot;}\n# {\n#  &quot;zip&quot;: &quot;11697&quot;,\n#  &quot;positive&quot;: &quot;82&quot;,\n#  &quot;total_tested&quot;: &quot;193&quot;,\n#  &quot;cumulative_percent_of_those_tested&quot;: &quot;42.49&quot;\n# }\n\ndef merge_zip_data():\n    all_zip_data = get_zip_data()\n    todays_test_data = get_todays_test_data()\n    merged_data = &#x5B;]\n\n    for td in todays_test_data:\n        if td&#x5B;'zip'] == 'MODZCTA':\n            continue\n\n        zip_data = all_zip_data.get(td&#x5B;'zip'])\n        if not zip_data:\n            print(&quot;NO Zip data for &quot; + td&#x5B;'zip'])\n            zip_data = get_filler_location_rec(td&#x5B;'zip'])\n\n        zip_data.update(td.copy())\n        merged_data.append(zip_data)\n\n    return merged_data\n\n\ndef sort_test_data_func(zip_data):\n    return int(zip_data&#x5B;'positive'])\n\n\ndef write_todays_data_to_csv():\n    merged_test_data = merge_zip_data()\n    csv_file = get_todays_csv_file()\n    col_headers = &#x5B;\n        'Zip',\n        'Date',\n        'City',\n        'District',\n        'Borough',\n        'Total Tested',\n 
       'Positive',\n        '% of Tested']\n    cols = &#x5B;\n        'zip',\n        'yyyymmdd',\n        'city',\n        'district',\n        'borough',\n        'total_tested',\n        'positive',\n        'cumulative_percent_of_those_tested']\n\n    # Sort by the number of positive cases, highest first\n    merged_test_data_sorted = sorted(merged_test_data, key=sort_test_data_func, reverse=True)\n    # merged_test_data_sorted = merged_test_data\n\n    csv_fh = open(csv_file, 'w')\n    csvwriter = csv.DictWriter(csv_fh, fieldnames=cols, restval='')\n    csvwriter.writeheader()\n    for zip_test_data in merged_test_data_sorted:\n        # Keep only the columns that go into the CSV file\n        ztd = {x: zip_test_data&#x5B;x] for x in cols}\n        csvwriter.writerow(ztd)\n\n    csv_fh.close()\n    print(&quot;Finished writing to the &quot; + csv_file)\n\n\n<\/pre><\/div>\n\n<p>The first function, &#8216;get_zip_data&#8217;, just reads in the ZipCode data using Python&#8217;s &#8216;json&#8217; library. The data comes in this format:<\/p>\n\n\n<pre class=\"wp-block-code\"><code>&#91;\n  {\n    \"zip\": \"88888\",\n    \"yyyymmdd\": \"20200504\",\n    \"positive\": \"1607\",\n    \"total_tested\": \"1917\",\n    \"cumulative_percent_of_those_tested\": \"83.83\"\n  },\n  {\n    \"zip\": \"10001\",\n    \"yyyymmdd\": \"20200504\",\n    \"positive\": \"311\",\n    \"total_tested\": \"878\",\n    \"cumulative_percent_of_those_tested\": \"35.42\"\n  },\n...<\/code><\/pre>\n\n\n\n<p>The &#8216;get_filler_location_rec&#8217; function supplies default location data for cases where the NYC department of health doesn&#8217;t provide the ZipCode for a test set.<\/p>\n<p>The test data and the ZipCode information are then merged to add more location details to the test results. The code loops through the test data results, and for each record it does this,<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>zip_data = 
all_zip_data.get(td&#91;'zip'])\n...\nzip_data.update(td.copy())<\/code><\/pre>\n\n\n\n<p>It gets a dictionary of the location information for the zip code and updates it with a copy of the test data for that location. The combined Dictionary is then appended to the list of merged test data.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>merged_data.append(zip_data)<\/code><\/pre>\n\n\n\n<p>Writing out the merged data to a CSV file can be done using the <a href=\"https:\/\/docs.python.org\/3\/library\/csv.html\">csv<\/a> library. I&#8217;d like to sort it by the number of positive cases in descending order, using the &#8216;sorted&#8217; function and applying my sort criteria with<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>\ndef sort_test_data_func(zip_data):\n    return int(zip_data&#91;'positive'])\n...\n...\nmerged_test_data_sorted = sorted(merged_test_data, key=sort_test_data_func, reverse=True)<\/code><\/pre>\n\n\n\n<p>The resulting CSV file was double spaced, with a blank line after every row. So, after looking at the &#8216;csv&#8217; library docs, I changed the file open statement from:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>csv_fh = open(csv_file, 'w')<\/code><\/pre>\n\n\n\n<p>to<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>csv_fh = open(csv_file, 'w', newline='')<\/code><\/pre>\n\n\n\n<p>You&#8217;ll also notice (if you haven&#8217;t fallen asleep already) that I used the &#8216;csv.DictWriter&#8217; (I&#8217;d love to know who comes up with the naming in Python), as I am writing a list of Dictionaries to the CSV file. 
The DictWriter knows which dictionary fields to write to the CSV file from the &#8220;fieldnames=cols&#8221; argument.<\/p>\n<p>The CSV file looks something like this.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"327\" src=\"http:\/\/www.aibistin.com\/wp-content\/uploads\/2020\/05\/covid_csv_py-1024x327.png\" alt=\"\" class=\"wp-image-807\" srcset=\"https:\/\/www.aibistin.com\/wp-content\/uploads\/2020\/05\/covid_csv_py-1024x327.png 1024w, https:\/\/www.aibistin.com\/wp-content\/uploads\/2020\/05\/covid_csv_py-300x96.png 300w, https:\/\/www.aibistin.com\/wp-content\/uploads\/2020\/05\/covid_csv_py-768x245.png 768w, https:\/\/www.aibistin.com\/wp-content\/uploads\/2020\/05\/covid_csv_py.png 1077w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><figcaption class=\"wp-element-caption\">NYC Covid-19 testing data CSV<\/figcaption><\/figure>\n\n\n\n\n\n<div class=\"wp-block-file\"><a id=\"wp-block-file--media-a6e49fab-da03-4100-82b9-93948a9144b7\" href=\"http:\/\/www.aibistin.com\/wp-content\/uploads\/2020\/05\/20200509_tests_by_ztca.csv\">20200509_tests_by_ztca<\/a><a href=\"http:\/\/www.aibistin.com\/wp-content\/uploads\/2020\/05\/20200509_tests_by_ztca.csv\" class=\"wp-block-file__button wp-element-button\" download aria-describedby=\"wp-block-file--media-a6e49fab-da03-4100-82b9-93948a9144b7\">Download<\/a><\/div>\n\n\n\n<p>And that&#8217;s all I have to say about that.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>In my last post I created a CLI tool to display NYC Covid-19 test results by Zip code using Perl, my favorite language for the moment. I would also like to do the same using Python, purely as an excuse to learn it. 
This will download the same data, from the NYC health department&#8217;s [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[105,104,103,99,100],"tags":[59,71,34,48,95,58,73,68,69,72,70,74,50],"class_list":["post-772","post","type-post","status-publish","format-standard","hentry","category-csv","category-data","category-new-york-city","category-programming","category-python","tag-covid-19","tag-dictionary","tag-file","tag-json","tag-new-york-city","tag-nyc","tag-os-path","tag-python","tag-python3","tag-re","tag-requests","tag-tests","tag-zipcodes"],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/www.aibistin.com\/index.php?rest_route=\/wp\/v2\/posts\/772","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.aibistin.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.aibistin.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.aibistin.com\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.aibistin.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=772"}],"version-history":[{"count":27,"href":"https:\/\/www.aibistin.com\/index.php?rest_route=\/wp\/v2\/posts\/772\/revisions"}],"predecessor-version":[{"id":901,"href":"https:\/\/www.aibistin.com\/index.php?rest_route=\/wp\/v2\/posts\/772\/revisions\/901"}],"wp:attachment":[{"href":"https:\/\/www.aibistin.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=772"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.aibistin.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=772"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.aibistin.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=772"}],"curie
s":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}