{"id":1020,"date":"2023-03-25T18:27:06","date_gmt":"2023-03-25T22:27:06","guid":{"rendered":"https:\/\/www.aibistin.com\/?p=1020"},"modified":"2023-03-25T18:39:06","modified_gmt":"2023-03-25T22:39:06","slug":"crash-city","status":"publish","type":"post","link":"https:\/\/www.aibistin.com\/?p=1020","title":{"rendered":"Crash City"},"content":{"rendered":"\n<div class=\"wp-block-jetpack-markdown\"><\/div>\n\n\n\n<div class=\"wp-block-jetpack-markdown\"><h2>Analyzing NYC Traffic Collision Data on the Linux Command Line with SoQL and Curl<\/h2>\n<p>New York City and the NYPD publish a dataset of traffic collisions, related fatalities, injuries and other details <a href=\"https:\/\/dev.socrata.com\/foundry\/data.cityofnewyork.us\/h9gi-nx95\">here<\/a>.\nThis was first published in July 2012, and is updated regularly to this date.<\/p>\n<p>NYC Commuters, especially pedestrians and cyclists, have to endure many hazards just to get to and from work. Crime is one issue, but it&#8217;s not as treacherous as crossing Queens Boulevard during rush hour, or cycling in downtown Manhattan when <a href=\"https:\/\/www.nova.ie\/wp-content\/uploads\/2020\/05\/Clown-Car.jpg\">some clown<\/a> driving an SUV the size of an <a href=\"https:\/\/www.motorbiscuit.com\/american-trucks-suvs-almost-bigger-world-war-ii-tanks\/\">Sherman tank<\/a> is taking up half the road.<\/p>\n<p>Previously I did some <a href=\"https:\/\/www.aibistin.com\/?p=907\">analysis<\/a> using their downloadable CSV dataset.<\/p>\n<p>Here I\u2019m going to use the <a href=\"https:\/\/www.man7.org\/linux\/man-pages\/man1\/curl.1.html\">curl<\/a> utility along with the <a href=\"https:\/\/docs.oracle.com\/en\/database\/oracle\/simple-oracle-document-access\/rest\/index.html\">SODA<\/a>, Socrata Query Language <a href=\"https:\/\/dev.socrata.com\/docs\/endpoints.html\">SoQL<\/a>, to try and make some sense out of this published <a href=\"https:\/\/dev.socrata.com\/foundry\/data.cityofnewyork.us\/h9gi-nx95\">data<\/a>.<\/p>\n<h3>How many collisions in New York CIty since July 2012<\/h3>\n<pre><code class=\"language-bash\">curl --get --data-urlencode &quot;\\$\\$app_token=uvwxyz&quot; --data-urlencode &quot;\\$select=count(*)&quot; https:\/\/data.cityofnewyork.us\/resource\/h9gi-nx95.json \n[{&quot;count&quot;:&quot;1977803&quot;}]\n<\/code><\/pre>\n<h4>Explanation<\/h4>\n<ul>\n<li><code>curl --get<\/code> or <code>-G<\/code>\n<ul>\n<li>Use the GET verb as we are \u2018getting\u2019 data<\/li>\n<li><code>v<\/code>\n<ul>\n<li>Lots of verbose output as you can see from the above output.<\/li>\n<\/ul>\n<\/li>\n<li><code>d<\/code>\n<ul>\n<li>Request data to pass to the API in ASCII format<\/li>\n<\/ul>\n<\/li>\n<li><code>--data-urlencode<\/code>\n<ul>\n<li>URL-Encode the data. Safer than just using <code>-d<\/code><\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<li><code>$$app_token<\/code>\n<ul>\n<li>Users personal authorization. Not really necessary for ad-hoc requests<\/li>\n<li>Socrata open data API [App-Token]](https:\/\/dev.socrata.com\/docs\/app-tokens.html)<\/li>\n<li>I replaced my actual token with \u2018uvwxyz\u2019 for fairly obvious reasons<\/li>\n<\/ul>\n<\/li>\n<li>`&quot;$select=count(*)&quot;\n<ul>\n<li>Similar to the SQL <code>SELECT<\/code> and SQL <code>count<\/code> aggregate function<\/li>\n<li>SoQL <a href=\"https:\/\/dev.socrata.com\/docs\/queries\/select.html\">$select<\/a><\/li>\n<li>SoQL  <a href=\"https:\/\/dev.socrata.com\/docs\/functions\/count.html\">count<\/a><\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<h4>Observation<\/h4>\n<p>1,977,803 Collisions from July 2012 to March 2023 seems like a lot to me. You\u2019d wonder what\u2019s the point of driving tests if we still end up with this many collisions.<\/p>\n<h3>Getting all the collision records between two arbitrary dates, June 30th 2022 to December 31 2022.<\/h3>\n<p>This time I\u2019ll use the <code>-v<\/code> switch for curl to get a much more verbose output.<\/p>\n<pre><code class=\"language-bash\">&gt; curl --get -v  --data-urlencode &quot;\\$\\$app_token=xyz&quot; --data-urlencode &quot;\\$select=*&quot; --data-urlencode &quot;\\$where=crash_date between '2022-06-30T00:00:00.000' and '2022-12-31T00:00:00.000'&quot;  https:\/\/data.cityofnewyork.us\/resource\/h9gi-nx95.json\n\n % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current\n                                 Dload  Upload   Total   Spent    Left  Speed\n  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0*   Trying 52.206.68.26:443...\n* Connected to data.cityofnewyork.us (52.206.68.26) port 443 (#0)\n...\n&gt; GET \/resource\/h9gi-nx95.json?$$app_token=xyz&amp;$select=%2A&amp;$where=crash_date+between+%272022-06-30T00%3A00%3A00.000%27+and+%272022-12-31T00%3A00%3A00.000%27 HTTP\/1.1\n&gt; Host: data.cityofnewyork.us\n&gt; User-Agent: curl\/7.81.0\n...\n&lt; HTTP\/1.1 200 OK\n....\n&lt; X-SODA2-Fields: [&quot;crash_date&quot;,&quot;crash_time&quot;,&quot;borough&quot;,&quot;zip_code&quot;,&quot;latitude&quot;,&quot;longitude&quot;,&quot;location&quot;,&quot;on_street_name&quot;,&quot;off_street_name&quot;,&quot;cross_street_name&quot;,&quot;number_of_persons_injured&quot;,&quot;number_of_persons_killed&quot;,&quot;number_of_pedestrians_injured&quot;,&quot;number_of_pedestrians_killed&quot;,&quot;number_of_cyclist_injured&quot;,&quot;number_of_cyclist_killed&quot;,&quot;number_of_motorist_injured&quot;,&quot;number_of_motorist_killed&quot;,&quot;contributing_factor_vehicle_1&quot;,&quot;contributing_factor_vehicle_2&quot;,&quot;contributing_factor_vehicle_3&quot;,&quot;contributing_factor_vehicle_4&quot;,&quot;contributing_factor_vehicle_5&quot;,&quot;collision_id&quot;,&quot;vehicle_type_code1&quot;,&quot;vehicle_type_code2&quot;,&quot;vehicle_type_code_3&quot;,&quot;vehicle_type_code_4&quot;,&quot;vehicle_type_code_5&quot;]\n&lt; X-SODA2-Types: [&quot;floating_timestamp&quot;,&quot;text&quot;,&quot;text&quot;,&quot;text&quot;,&quot;number&quot;,&quot;number&quot;,&quot;location&quot;,&quot;text&quot;,&quot;text&quot;,&quot;text&quot;,&quot;number&quot;,&quot;number&quot;,&quot;number&quot;,&quot;number&quot;,&quot;number&quot;,&quot;number&quot;,&quot;number&quot;,&quot;number&quot;,&quot;text&quot;,&quot;text&quot;,&quot;text&quot;,&quot;text&quot;,&quot;text&quot;,&quot;number&quot;,&quot;text&quot;,&quot;text&quot;,&quot;text&quot;,&quot;text&quot;,&quot;text&quot;]\n&lt; X-SODA2-Data-Out-Of-Date: false\n...\n{ [14733 bytes data]\n...\n[{&quot;crash_date&quot;:&quot;2022-06-30T00:00:00.000&quot;,&quot;crash_time&quot;:&quot;14:01&quot;,&quot;cross_street_name&quot;:&quot;101       EAST DRIVE&quot;,&quot;number_of_persons_injured&quot;:&quot;1&quot;,&quot;number_of_p\nersons_killed&quot;:&quot;0&quot;,&quot;number_of_pedestrians_injured&quot;:&quot;0&quot;,&quot;number_of_pedestrians_killed&quot;:&quot;0&quot;,&quot;number_of_cyclist_injured&quot;:&quot;1&quot;,&quot;number_of_cyclist_killed&quot;:\n&quot;0&quot;,&quot;number_of_motorist_injured&quot;:&quot;0&quot;,&quot;number_of_motorist_killed&quot;:&quot;0&quot;,&quot;contributing_factor_vehicle_1&quot;:&quot;Pedestrian\/Bicyclist\/Other Pedestrian Error\/Con\nfusion&quot;,&quot;collision_id&quot;:&quot;4542318&quot;,&quot;vehicle_type_code1&quot;:&quot;Bike&quot;}\n...\n,{&quot;crash_date&quot;:&quot;2022-07-03T00:00:00.000&quot;,&quot;crash_time&quot;:&quot;22:30&quot;,&quot;borough&quot;:&quot;BRONX&quot;,&quot;zip_code&quot;:&quot;10458&quot;,&quot;latitude&quot;:&quot;40.866802&quot;,&quot;longitude&quot;:&quot;-73.88444&quot;,&quot;location&quot;:{&quot;latitude&quot;:&quot;40.866802&quot;,&quot;longitude&quot;:&quot;-73.88444&quot;,&quot;human_address&quot;:&quot;{\\&quot;address\\&quot;: \\&quot;\\&quot;, \\&quot;city\\&quot;: \\&quot;\\&quot;, \\&quot;state\\&quot;: \\&quot;\\&quot;, \\&quot;zip\\&quot;: \\&quot;\\&quot;}&quot;},&quot;on_street_name&quot;:&quot;WEBSTER AVENUE&quot;,&quot;off_street_name&quot;:&quot;EAST 199 STREET&quot;,&quot;number_of_persons_injured&quot;:&quot;0&quot;,&quot;number_of_persons_killed&quot;:&quot;0&quot;,&quot;number_of_pedestrians_injured&quot;:&quot;0&quot;,&quot;number_of_pedestrians_killed&quot;:&quot;0&quot;,&quot;number_of_cyclist_injured&quot;:&quot;0&quot;,&quot;number_of_cyclist_killed&quot;:&quot;0&quot;,&quot;number_of_motorist_injured&quot;:&quot;0&quot;,&quot;number_of_motorist_killed&quot;:&quot;0&quot;,&quot;contributing_factor_vehicle_1&quot;:&quot;Driver Inattention\/Distraction&quot;,&quot;contributing_factor_vehicle_2&quot;:&quot;Unspecified&quot;,&quot;collision_id&quot;:&quot;4543075&quot;,&quot;vehicle_type_code1&quot;:&quot;Station Wagon\/Sport Utility Vehicle&quot;,&quot;vehicle_type_code2&quot;:&quot;Station Wagon\/Sport Utility Vehicle&quot;}]\n<\/code><\/pre>\n<h4>Explanation<\/h4>\n<ul>\n<li>\n<p>1000 records<\/p>\n<ul>\n<li>When no <code>$limit<\/code> is set, this is the default maximum rows returned<\/li>\n<\/ul>\n<\/li>\n<li>\n<p><code>curl --get<\/code> or <code>-G<\/code><\/p>\n<ul>\n<li>Use the GET verb as we are \u2018getting\u2019 data<\/li>\n<li><code>-v<\/code>\n<ul>\n<li>Lots of verbose output as you can see<\/li>\n<\/ul>\n<\/li>\n<li><code>-d<\/code>\n<ul>\n<li>Request data to pass to the API in ASCII format<\/li>\n<\/ul>\n<\/li>\n<li><code>--data-urlencode<\/code>\n<ul>\n<li>URL-Encode the data. Safer than just using <code>-d<\/code><\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<li>\n<p><code>$$app_token<\/code><\/p>\n<ul>\n<li>Users personal authorization. Not really necessary for ad-hoc requests<\/li>\n<\/ul>\n<\/li>\n<li>\n<p>`&quot;$select=*&quot;<\/p>\n<ul>\n<li>Similar to an SQL <code>SELECT<\/code><\/li>\n<li>Selecting all columns. This is the default and can be omitted<\/li>\n<li>SoQL <a href=\"https:\/\/dev.socrata.com\/docs\/queries\/select.html\">$select<\/a><\/li>\n<\/ul>\n<\/li>\n<li>\n<p><code>$where<\/code><\/p>\n<ul>\n<li>Similar to SQL <code>WHERE<\/code> to filter down data.<\/li>\n<li>SoQL <a href=\"https:\/\/dev.socrata.com\/docs\/queries\/where.html\">$where<\/a><\/li>\n<\/ul>\n<\/li>\n<li>\n<p><code>between \u2026 and \u2026<\/code><\/p>\n<ul>\n<li>SoQL <a href=\"https:\/\/dev.socrata.com\/docs\/functions\/between.html\">between<\/a><\/li>\n<li>Narrow our results down to collisions between the two *<em>inclusive<\/em> \u2018crash_date\u2019 values<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<h4>Observation<\/h4>\n<p>It turns out after piping this request to a <code>wc<\/code> command, that the API only returns 1000 rows, which is the default maximum amount if the <code>$limit<\/code> clause isn\u2019t specified. With the <code>$limit<\/code> clause, the maximum amount that can be returned with one call is 50,000 rows. To get more, you will need to order and <a href=\"https:\/\/dev.socrata.com\/docs\/paging.html\">page<\/a> through the data.\nOne other thing to note here is that when using the <code>-v<\/code>, verbose switch, you get to see the column names and their data types.<\/p>\n<h5>The NYC dataset column names<\/h5>\n<pre><code class=\"language-bash\">&quot;crash_date&quot;,&quot;crash_time&quot;,&quot;borough&quot;,&quot;zip_code&quot;,&quot;latitude&quot;,&quot;longitude&quot;,&quot;location&quot;,&quot;on_street_name&quot;,&quot;off_street_name&quot;,&quot;cross_street_name&quot;,&quot;number_of_persons_injured&quot;,&quot;number_of_persons_killed&quot;,&quot;number_of_pedestrians_injured&quot;,&quot;number_of_pedestrians_killed&quot;,&quot;number_of_cyclist_injured&quot;,&quot;number_of_cyclist_killed&quot;,&quot;number_of_motorist_injured&quot;,&quot;number_of_motorist_killed&quot;,&quot;contributing_factor_vehicle_1&quot;,&quot;contributing_factor_vehicle_2&quot;,&quot;contributing_factor_vehicle_3&quot;,&quot;contributing_factor_vehicle_4&quot;,&quot;contributing_factor_vehicle_5&quot;,&quot;collision_id&quot;,&quot;vehicle_type_code1&quot;,&quot;vehicle_type_code2&quot;,&quot;vehicle_type_code_3&quot;,&quot;vehicle_type_code_4&quot;,&quot;vehicle_type_code_5&quot;\n<\/code><\/pre>\n<h4>Corresponding Field Data Types<\/h4>\n<pre><code class=\"language-bash\">&quot;floating_timestamp&quot;,&quot;text&quot;,&quot;text&quot;,&quot;text&quot;,&quot;number&quot;,&quot;number&quot;,&quot;location&quot;,&quot;text&quot;,&quot;text&quot;,&quot;text&quot;,&quot;number&quot;,&quot;number&quot;,&quot;number&quot;,&quot;number&quot;,&quot;number&quot;,&quot;number&quot;,&quot;number&quot;,&quot;number&quot;,&quot;text&quot;,&quot;text&quot;,&quot;text&quot;,&quot;text&quot;,&quot;text&quot;,&quot;number&quot;,&quot;text&quot;,&quot;text&quot;,&quot;text&quot;,&quot;text&quot;,&quot;text&quot;\n<\/code><\/pre>\n<h3>SoQL <a href=\"https:\/\/dev.socrata.com\/docs\/queries\/\">Query Clauses<\/a> from the Docs<\/h3>\n<table>\n<thead>\n<tr>\n<th>Parameter<\/th>\n<th>Description<\/th>\n<th>Default<\/th>\n<th style=\"text-align:center\">In $query<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>$select<\/td>\n<td>The set of columns to be returned, similar to a SELECT in SQL<\/td>\n<td>All columns, equivalent to $select=*<\/td>\n<td style=\"text-align:center\">SELECT<\/td>\n<\/tr>\n<tr>\n<td>$where<\/td>\n<td>Filters the rows to be returned, similar to WHERE<\/td>\n<td>No filter<\/td>\n<td style=\"text-align:center\">WHERE<\/td>\n<\/tr>\n<tr>\n<td>$order<\/td>\n<td>Column to order results on, similar to ORDER BY in SQL<\/td>\n<td>Unspecified order<\/td>\n<td style=\"text-align:center\">ORDER BY<\/td>\n<\/tr>\n<tr>\n<td>$group<\/td>\n<td>Column to group results on, similar to GROUP BY in SQL<\/td>\n<td>No grouping<\/td>\n<td style=\"text-align:center\">GROUP BY<\/td>\n<\/tr>\n<tr>\n<td>$having<\/td>\n<td>Filters the rows that result from an aggregation, similar to HAVING<\/td>\n<td>No filter<\/td>\n<td style=\"text-align:center\">HAVING<\/td>\n<\/tr>\n<tr>\n<td>$limit<\/td>\n<td>Maximum number of results to return<\/td>\n<td>1000 (2.0 endpoints: maximum of 50,000; 2.1: unlimited \u00bb)<\/td>\n<td style=\"text-align:center\">LIMIT<\/td>\n<\/tr>\n<tr>\n<td>$offset<\/td>\n<td>Offset count into the results to start at, used for paging<\/td>\n<td>0<\/td>\n<td style=\"text-align:center\">OFFSET<\/td>\n<\/tr>\n<tr>\n<td>$q<\/td>\n<td>Performs a full text search for a value.<\/td>\n<td>No search<\/td>\n<td style=\"text-align:center\">N\/A<\/td>\n<\/tr>\n<tr>\n<td>$query<\/td>\n<td>A full SoQL query string, all as one parameter<\/td>\n<td>N\/A<\/td>\n<td style=\"text-align:center\">N\/A<\/td>\n<\/tr>\n<tr>\n<td>$$bom<\/td>\n<td>Prepends a UTF-8 Byte Order Mark to the beginning of CSV output<\/td>\n<td>false<\/td>\n<td style=\"text-align:center\">N\/A<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<h3>Get all the collisions for zip code 10036, Times Square NYC, for Feb 2023<\/h3>\n<p>Save it into file <code>times_square_july_2022.json<\/code><\/p>\n<pre><code class=\"language-bash\">curl --get --data-urlencode &quot;\\$\\$app_token=uvwxyz&quot;  --data-urlencode &quot;\\$select=*&quot; \/ --data-urlencode &quot;\\$where=crash_date between '2023-02-01T00:00:00.000' and '2023-02-28T00:00:00.000'&quot; --data-urlencode &quot;zip_code=10036&quot;  https:\/\/data.cityofnewyork.us\/resource\/h9gi-nx95.json &gt;    collisions_z10036_feb_2023.json\n<\/code><\/pre>\n<h4>Explanation<\/h4>\n<ul>\n<li>&quot;$where=crash_date between &#8216;2023-02-01T00:00:00.000&#8217; and &#8216;2023-02-28T00:00:00.000&#8217;&quot; &#8211;data-urlencode &quot;zip_code=10036&quot;\n<ul>\n<li>Specify dates between and including February 1st to the 28th.<\/li>\n<li><code>zip_code=10036<\/code>  to narrow down our results.<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<p>Count how many collisions using the Linux <code>wc<\/code> command with our newly created file, <code>times_square_july_2022.json<\/code>.<\/p>\n<pre><code class=\"language-bash\">wc -l collisions_z10036_feb_2023.json \n25 collisions_z10036_feb_2023.json\n<\/code><\/pre>\n<p>Double check that count of 25 collisions, using the SoQl <code>count(*)<\/code> function.<\/p>\n<pre><code class=\"language-bash\">&gt; curl --get --data-urlencode &quot;\\$\\$app_token=uvwxyz&quot;  --data-urlencode &quot;\\$select=count(*)&quot;  --data-urlencode &quot;\\$where=crash_date between '2023-02-01T00:00:00.000' and '2023-02-28T00:00:00.000'&quot; --data-urlencode &quot;zip_code=10036&quot;  https:\/\/data.cityofnewyork.us\/resource\/h9gi-nx95.json \n[{&quot;count&quot;:&quot;25&quot;}]\n<\/code><\/pre>\n<h4>Explanation<\/h4>\n<ul>\n<li><code>\\$select=count(*)<\/code>\n<ul>\n<li>Similar to the SQL <code>count<\/code> function, this uses the SoQL <a href=\"https:\/\/dev.socrata.com\/docs\/functions\/count.html\">count<\/a> function to count the number of rows that match our search criteria.<\/li>\n<\/ul>\n<\/li>\n<li><code>[{&quot;count&quot;:&quot;25&quot;}]<\/code>,  which matches the number of records in the collisions_z10036_feb_2023.json file<\/li>\n<\/ul>\n<h3>Observation<\/h3>\n<p>25 collisions in one midtown zip code for February is almost 1 collision a day. I\u2019m sure that&#8217;s lower than many other zip codes.<\/p>\n<h3>Get the 10 worst zip codes for collisions in February 2023<\/h3>\n<pre><code class=\"language-bash\">&gt; curl --get --silent  \u2018$$app_token=uvwxyz\u2019  --data-urlencode &quot;\\$select=count(*), zip_code&quot;   --data-urlencode &quot;\\$where=crash_date between '2023-02-01T00:00:00.000' and '2023-02-28T00:00:00.000'&quot;  --data-urlencode '$group=zip_code'   https:\/\/data.cityofnewyork.us\/resource\/h9gi-nx95.json | jq -r '.[] | .zip_code + &quot; &quot; + .count' | sort  -k 2,2nr -k 1n | head -n10\n11207 105\n11212 85\n11208 79\n11226 75\n11234 72\n11236 72\n11101 71\n11203 67\n11368 67\n11211 62\n<\/code><\/pre>\n<h4>Explanation<\/h4>\n<p>OK, I threw in a lot of commands here.<\/p>\n<ul>\n<li>&quot;$select=count(*), zip_code&quot;\n<ul>\n<li>Selecting the count and zip_code<\/li>\n<li>SoQL <a href=\"https:\/\/dev.socrata.com\/docs\/functions\/count.html\">count<\/a> function to count the number of rows that match our search criteria.<\/li>\n<\/ul>\n<\/li>\n<li><code>$group=zip_code<\/code>\n<ul>\n<li>Similar to the SQL <code>GROUP BY<\/code><\/li>\n<li>Returns aggregate rows grouped by the <code>zip_code<\/code><\/li>\n<\/ul>\n<\/li>\n<li><code>jq -r '.[] | .zip_code + &quot; &quot; + .count'<\/code>\n<ul>\n<li>Using the very useful  <a href=\"https:\/\/stedolan.github.io\/jq\/\">jq<\/a> to do additional filtering<\/li>\n<li><code>jq<\/code> bills itself as, \u201ca lightweight and flexible command-line JSON processor\u201d<\/li>\n<li>I  extract the zip_code and collision count for each zip code and concatenate them using the <code>bash<\/code> +, concatenation operator<\/li>\n<\/ul>\n<\/li>\n<li><code>sort  -k 2,2nr -k 1n<\/code>\n<ul>\n<li>Using the bash <a href=\"https:\/\/ss64.com\/bash\/sort.html\">sort<\/a> command, we do a reverse numerical sort by the second field, which is the count. We also do a numerical sort on the zip_code for zip_codes with identical collision counts<\/li>\n<\/ul>\n<\/li>\n<li><code>head -n10<\/code>\n<ul>\n<li>This gets the first 10, which are the 10 zip codes with the most collisions, starting with the very worst.<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<h4>Observation<\/h4>\n<p>I could have used SoQL <code>$sort<\/code> and <code>$limit<\/code> to do some of this work, but I chose the <code>bash<\/code> sort, just because &#8230;\nZip code <a href=\"https:\/\/www.unitedstateszipcodes.org\/11207\">11207<\/a>, East New York, Brooklyn, emerges as the zip with the most collisions in February.\nThis zip has a lot of issues with traffic safety, as you could also check <a href=\"https:\/\/www.aibistin.com\/?p=907\">here<\/a> .\n105 collisions in one month.  3.75 a day? There\u2019s something seriously wrong there. You\u2019d probably need some kind of armor suit just to cross the street there.<\/p>\n<h3>As the queries get more complex, these one line commands start to get long and hard to manage.<\/h3>\n<p>Curl has an option to create a config file. On a Linux system the default config is usually <code>~\/.curlrc<\/code>. You can specify a config file with the <code>-K<\/code> or <code>--config<\/code> switch.<\/p>\n<p>I created the below config file for these requests\nThe config file sets the NYC API URL, the $$app_token parameter, a GET request, as well as asking for verbose output<\/p>\n<pre><code>##### The .\/.nyc_curlrc file contents\n# --- NYC Collision Data ---\nget\nurl = &quot;https:\/\/data.cityofnewyork.us\/resource\/h9gi-nx95.json&quot;\ndata-urlencode  =  &quot;\\$\\$app_token=uvwxyz&quot;\nverbose\n<\/code><\/pre>\n<p>The previous example can now be rewritten to use the <code>.nyc_curlrc<\/code>  config file. I also broke up the commands into separate lines using the bash continuation \u2018\\\u2019 . Enclosing some of the commands in single quotes also means that the \u2018$\u2019 doesn\u2019t need to be escaped.<\/p>\n<pre><code class=\"language-bash\">&gt; curl -K .\/.nyc_curlrc \\\n --data-urlencode '$select=count(*), zip_code' \\\n --data-urlencode '$where=crash_date between &quot;2023-02-01T00:00:00.000&quot; and &quot;2023-02-28T00:00:00.000&quot;' \\\n --data-urlencode '$group=zip_code'  \\\n  | jq -r '.[] | .zip_code + &quot; &quot; + .count' | sort  -k 2,2nr -k 1n | head -n10\n<\/code><\/pre>\n<p>This is a little more concise than the previous version, and yields the same result.<\/p>\n<h3>Now to find how many cyclists and pedestrians were killed over the duration of this dataset<\/h3>\n<pre><code class=\"language-bash\"> curl -K .\/.nyc_curlrc \\\n  --data-urlencode &quot;\\$select=date_extract_y(crash_date) AS year, SUM(number_of_pedestrians_killed) AS tot_pedestrians_killed, SUM(number_of_cyclist_killed) AS tot_cyclist_killed&quot;  \\\n  --data-urlencode &quot;\\$group=year&quot; \\\n  --data-urlencode &quot;\\$order=tot_pedestrians_killed DESC&quot;  | jq .\n\n  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current\n                                 Dload  Upload   Total   Spent    Left  Speed\n100   885    0   885    0     0   1616      0 --:--:-- --:--:-- --:--:--  1614\n[\n  {\n    &quot;year&quot;: &quot;2013&quot;,\n    &quot;tot_pedestrians_killed&quot;: &quot;176&quot;,\n    &quot;tot_cyclist_killed&quot;: &quot;11&quot;\n  },\n  {\n    &quot;year&quot;: &quot;2016&quot;,\n    &quot;tot_pedestrians_killed&quot;: &quot;149&quot;,\n    &quot;tot_cyclist_killed&quot;: &quot;18&quot;\n  },\n  {\n    &quot;year&quot;: &quot;2014&quot;,\n    &quot;tot_pedestrians_killed&quot;: &quot;133&quot;,\n    &quot;tot_cyclist_killed&quot;: &quot;20&quot;\n  },\n  {\n    &quot;year&quot;: &quot;2015&quot;,\n    &quot;tot_pedestrians_killed&quot;: &quot;133&quot;,\n    &quot;tot_cyclist_killed&quot;: &quot;15&quot;\n  },\n  {\n    &quot;year&quot;: &quot;2022&quot;,\n    &quot;tot_pedestrians_killed&quot;: &quot;132&quot;,\n    &quot;tot_cyclist_killed&quot;: &quot;18&quot;\n  },\n  {\n    &quot;year&quot;: &quot;2019&quot;,\n    &quot;tot_pedestrians_killed&quot;: &quot;131&quot;,\n    &quot;tot_cyclist_killed&quot;: &quot;31&quot;\n  },\n  {\n    &quot;year&quot;: &quot;2021&quot;,\n    &quot;tot_pedestrians_killed&quot;: &quot;129&quot;,\n    &quot;tot_cyclist_killed&quot;: &quot;19&quot;\n  },\n  {\n    &quot;year&quot;: &quot;2017&quot;,\n    &quot;tot_pedestrians_killed&quot;: &quot;127&quot;,\n    &quot;tot_cyclist_killed&quot;: &quot;27&quot;\n  },\n  {\n    &quot;year&quot;: &quot;2018&quot;,\n    &quot;tot_pedestrians_killed&quot;: &quot;123&quot;,\n    &quot;tot_cyclist_killed&quot;: &quot;10&quot;\n  },\n  {\n    &quot;year&quot;: &quot;2020&quot;,\n    &quot;tot_pedestrians_killed&quot;: &quot;101&quot;,\n    &quot;tot_cyclist_killed&quot;: &quot;29&quot;\n  },\n  {\n    &quot;year&quot;: &quot;2012&quot;,\n    &quot;tot_pedestrians_killed&quot;: &quot;72&quot;,\n    &quot;tot_cyclist_killed&quot;: &quot;6&quot;\n  },\n  {\n    &quot;year&quot;: &quot;2023&quot;,\n    &quot;tot_pedestrians_killed&quot;: &quot;18&quot;,\n    &quot;tot_cyclist_killed&quot;: &quot;8&quot;\n  }\n]\n<\/code><\/pre>\n<h4>Explanation<\/h4>\n<ul>\n<li>date_extract_y(crash_date) AS year\n<ul>\n<li>Will extract \u20182023\u2019 from \u20182023-02-03T00:00:00.000\u2019<\/li>\n<li>SoQL <a href=\"https:\/\/dev.socrata.com\/docs\/functions\/date_extract_y.html\">date_extract_y<\/a><\/li>\n<li>SUM(number_of_pedestrians_killed) AS tot_pedestrians_killed\n<ul>\n<li><code>SUM<\/code>\n<ul>\n<li>Similar to SQL <code>SUM<\/code> aggregate function.<\/li>\n<\/ul>\n<\/li>\n<li><code>AS<\/code>\n<ul>\n<li>Give these aggregate results a meaningful label<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<li><code>$group=year<\/code>  and <code>$order=tot_pedestrians_killed<\/code>\n<ul>\n<li>Similar to the SQL <code>GROUP BY<\/code> and <code>ORDER BY<\/code><\/li>\n<li>Returns aggregate rows grouped by the year they occurred.<\/li>\n<li>Sorted having the year with most pedestrian fatalities first<\/li>\n<\/ul>\n<\/li>\n<li><code>jq .<\/code>\n<ul>\n<li>This is the most basic <a href=\"https:\/\/stedolan.github.io\/jq\/\">jq<\/a> command<\/li>\n<li>It just prints the JSON output in it\u2019s default \u201cpretty\u201d format<\/li>\n<\/ul>\n<\/li>\n<li>We could have added <code>--silent<\/code> to the <code>curl<\/code> command or config file, to not print the curl download statistics.<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<h4>Observation<\/h4>\n<pre><code>    2012 and the current year, 2023,  can be omitted as both years have incomplete data. \n<\/code><\/pre>\n<h3>Run the previous query minus years 2012 and 2023<\/h3>\n<pre><code class=\"language-bash\">curl -K .\/.nyc_curlrc   --data-urlencode '$select=date_extract_y(crash_date) AS year, SUM(number_of_pedestrians_killed) AS tot_pedestrians_killed, SUM(number_of_cyclist_killed) AS tot_cyclists_killed' \\\n  --data-urlencode '$where=year not in (&quot;2012&quot;, &quot;2023&quot;)' \\\n  --data-urlencode '$group=year' \\\n  --data-urlencode '$order=tot_pedestrians_killed DESC, tot_cyclists_killed'\n\n\n[{&quot;year&quot;:&quot;2013&quot;,&quot;tot_pedestrians_killed&quot;:&quot;176&quot;,&quot;tot_cyclists_killed&quot;:&quot;11&quot;}\n,{&quot;year&quot;:&quot;2016&quot;,&quot;tot_pedestrians_killed&quot;:&quot;149&quot;,&quot;tot_cyclists_killed&quot;:&quot;18&quot;}\n,{&quot;year&quot;:&quot;2015&quot;,&quot;tot_pedestrians_killed&quot;:&quot;133&quot;,&quot;tot_cyclists_killed&quot;:&quot;15&quot;}\n,{&quot;year&quot;:&quot;2014&quot;,&quot;tot_pedestrians_killed&quot;:&quot;133&quot;,&quot;tot_cyclists_killed&quot;:&quot;20&quot;}\n,{&quot;year&quot;:&quot;2022&quot;,&quot;tot_pedestrians_killed&quot;:&quot;132&quot;,&quot;tot_cyclists_killed&quot;:&quot;18&quot;}\n,{&quot;year&quot;:&quot;2019&quot;,&quot;tot_pedestrians_killed&quot;:&quot;131&quot;,&quot;tot_cyclists_killed&quot;:&quot;31&quot;}\n,{&quot;year&quot;:&quot;2021&quot;,&quot;tot_pedestrians_killed&quot;:&quot;129&quot;,&quot;tot_cyclists_killed&quot;:&quot;19&quot;}\n,{&quot;year&quot;:&quot;2017&quot;,&quot;tot_pedestrians_killed&quot;:&quot;127&quot;,&quot;tot_cyclists_killed&quot;:&quot;27&quot;}\n,{&quot;year&quot;:&quot;2018&quot;,&quot;tot_pedestrians_killed&quot;:&quot;123&quot;,&quot;tot_cyclists_killed&quot;:&quot;10&quot;}\n,{&quot;year&quot;:&quot;2020&quot;,&quot;tot_pedestrians_killed&quot;:&quot;101&quot;,&quot;tot_cyclists_killed&quot;:&quot;29&quot;}]\n<\/code><\/pre>\n<h4>Explanation<\/h4>\n<ul>\n<li>$where=year not in (&quot;2012&quot;, &quot;2023&quot;)\n<ul>\n<li>Added a <code>WHERE<\/code> clause to omit years 2012 and 2023 from the query<\/li>\n<li>SoQL <a href=\"https:\/\/dev.socrata.com\/docs\/functions\/not_in.html\">not in (&#8230;)<\/a><\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<h4>Observation<\/h4>\n<p>Well, it\u2019s not that safe being a pedestrian or cyclist in New York City. Checking the injury count would yield much higher numbers.<\/p>\n<h3>Run a query to get a yearly total of injured pedestrians and cyclists<\/h3>\n<p>Our query string was getting a little bit out of hand and difficult to manage.\nI created a dedicated config file, <code>.nyc_ped_cyc_injured_yearly_curlrc<\/code>  for our next request.<\/p>\n<h4>The Config<\/h4>\n<pre><code class=\"language-bash\">&gt; cat .nyc_ped_cyc_injured_yearly_curlrc \n# --- NYC Collision Data - Injured List  ---\nget\nurl = &quot;https:\/\/data.cityofnewyork.us\/resource\/h9gi-nx95.json&quot;\ndata-urlencode  = &quot;\\$\\$app_token=uvwxyz&quot;\ndata-urlencode  = &quot;\\$select=date_extract_y(crash_date) AS year, SUM(number_of_pedestrians_injured) AS tot_pedestrians_injured, SUM(number_of_cyclist_injured) AS tot_cyclists_injured&quot;\ndata-urlencode  = &quot;\\$where=year not in ('2012','2023')&quot;\ndata-urlencode  = &quot;\\$group=year&quot; \ndata-urlencode  = &quot;\\$order=tot_pedestrians_injured DESC, tot_cyclists_injured DESC&quot;\nsilent\n<\/code><\/pre>\n<h4>Query using the config file<\/h4>\n<pre><code class=\"language-bash\">&gt;  curl --config .\/.nyc_ped_cyc_injured_yearly_curlrc \n[{&quot;year&quot;:&quot;2013&quot;,&quot;tot_pedestrians_injured&quot;:&quot;11988&quot;,&quot;tot_cyclists_injured&quot;:&quot;4075&quot;}\n,{&quot;year&quot;:&quot;2017&quot;,&quot;tot_pedestrians_injured&quot;:&quot;11151&quot;,&quot;tot_cyclists_injured&quot;:&quot;4889&quot;}\n,{&quot;year&quot;:&quot;2018&quot;,&quot;tot_pedestrians_injured&quot;:&quot;11123&quot;,&quot;tot_cyclists_injured&quot;:&quot;4725&quot;}\n,{&quot;year&quot;:&quot;2016&quot;,&quot;tot_pedestrians_injured&quot;:&quot;11090&quot;,&quot;tot_cyclists_injured&quot;:&quot;4975&quot;}\n,{&quot;year&quot;:&quot;2014&quot;,&quot;tot_pedestrians_injured&quot;:&quot;11036&quot;,&quot;tot_cyclists_injured&quot;:&quot;4000&quot;}\n,{&quot;year&quot;:&quot;2019&quot;,&quot;tot_pedestrians_injured&quot;:&quot;10568&quot;,&quot;tot_cyclists_injured&quot;:&quot;4986&quot;}\n,{&quot;year&quot;:&quot;2015&quot;,&quot;tot_pedestrians_injured&quot;:&quot;10084&quot;,&quot;tot_cyclists_injured&quot;:&quot;4281&quot;}\n,{&quot;year&quot;:&quot;2022&quot;,&quot;tot_pedestrians_injured&quot;:&quot;8963&quot;,&quot;tot_cyclists_injured&quot;:&quot;5025&quot;}\n,{&quot;year&quot;:&quot;2021&quot;,&quot;tot_pedestrians_injured&quot;:&quot;7503&quot;,&quot;tot_cyclists_injured&quot;:&quot;4961&quot;}\n,{&quot;year&quot;:&quot;2020&quot;,&quot;tot_pedestrians_injured&quot;:&quot;6691&quot;,&quot;tot_cyclists_injured&quot;:&quot;5576&quot;}]\n<\/code><\/pre>\n<h4>Observation<\/h4>\n<p>Looks like the config file worked as expected. While the number of pedestrians injured is declining a little, the number of cyclists injured is going in the opposite direction.<\/p>\n<h3>Using <a href=\"https:\/\/stedolan.github.io\/jq\/\">jq<\/a> to do additional filtering<\/h3>\n<p>Similar to the previous query, extract the yearly totals of injured cyclists. This time we\u2019ll use <a href=\"https:\/\/stedolan.github.io\/jq\/\">jq<\/a> to filter the output.<\/p>\n<pre><code class=\"language-bash\">&gt; curl --config .\/.nyc_ped_cyc_injured_yearly_curlrc \\\n  | jq -r '.[] | .year + &quot;,&quot; + .tot_cyclists_injured' | sort -k 1n \\\n  | column -t -s, --table-columns=Year,CyclistsInjured \nYear  CyclistsInjured\n2013  4075\n2014  4000\n2015  4281\n2016  4975\n2017  4889\n2018  4725\n2019  4986\n2020  5576\n2021  4961\n2022  5025\n<\/code><\/pre>\n<h4>Explanation<\/h4>\n<p>This is similar to the previous query except I used  <a href=\"https:\/\/stedolan.github.io\/jq\/\">jq<\/a> to extract the injured cyclist data only from the returned results.<\/p>\n<ul>\n<li><code>sort -k 1n<\/code>\n<ul>\n<li>Sort the Year, numerically<\/li>\n<\/ul>\n<\/li>\n<li><code>column -t -s, --table-columns=Year,CyclistsInjured<\/code>\n<ul>\n<li>Add column headers for readability<\/li>\n<li>The <code>jq<\/code> command already created comma separated results<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<h4>Observation<\/h4>\n<p>2020 and 2022 were the worst years for bicyclist injuries. 2020 was a year where cycling became more popular. The injuries dropped a little in 2021, maybe because cyclists got a little scared after the slaughter in 2020. The upward trend may be returning, based on the 2022 results.<\/p>\n<h3>Get the 10 worst zip codes for collisions in January 2023<\/h3>\n<p>Previously I got the [10 worst Zip codes for collisions in February][Get the 10 worst zip codes for collisions in February 2023]. I used some bash commands to fine tune results. Here I will use SoQL to do most of the heavy lifting.<\/p>\n<h4>Config file <code>.nyc_jan_coll_curlrc<\/code><\/h4>\n<pre><code class=\"language-bash\">&gt; cat .nyc_jan_coll_curlrc \n# --- NYC Collision Data - January Collisions  ---\nget\nurl = &quot;https:\/\/data.cityofnewyork.us\/resource\/h9gi-nx95.json&quot;\ndata-urlencode  = &quot;\\$\\$app_token=uvwxyz&quot;\ndata-urlencode = &quot;\\$select=zip_code,count(zip_code) AS collision_count&quot;\ndata-urlencode = &quot;\\$where=crash_date between '2023-01-01' AND '2023-01-31' &quot;\ndata-urlencode = &quot;\\$group=zip_code&quot;\ndata-urlencode = &quot;\\$order=collision_count DESC, zip_code&quot;\ndata-urlencode = &quot;\\$limit=10&quot;\nsilent\n<\/code><\/pre>\n<pre><code class=\"language-bash\">&gt; curl --config .\/.nyc_jan_coll_curlrc \\\n  | jq -r '.[] | .zip_code + &quot;, &quot; + .collision_count' \\\n  | column -t -s, --table-columns=ZipCode,CollisionCount\nZipCode  CollisionCount\n11207     124\n11236     83\n11208     82\n11212     77\n11203     69\n11385     67\n11234     66\n11206     64\n10002     63\n11101     61\n<\/code><\/pre>\n<h4>Explanation<\/h4>\n<p>Most of this is similar to our earlier request for February stats. This time we are using a new config file <code>.nyc_jan_coll_curlrc<\/code>.<br>\nInstead of sorting the results using the bash sort, we sort using the SoQL <code>[$order](https:\/\/dev.socrata.com\/docs\/queries\/order.html)<\/code>. We get the 10 worst using the <code>$limit<\/code> clause.<\/p>\n<ul>\n<li><code>\\$order=collision_count DESC, zip_code<\/code>\n<ul>\n<li><a href=\"https:\/\/dev.socrata.com\/docs\/queries\/order.html\">Sort<\/a> the collision count from worst to \u201cleast worst\u201d.<\/li>\n<li>zip_code ascending sort<\/li>\n<\/ul>\n<\/li>\n<li><code>$limit=10<\/code>\n<ul>\n<li>Get the first 10 after the sort using <a href=\"https:\/\/dev.socrata.com\/docs\/queries\/limit.html\">$limit<\/a><\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<h4>Observation<\/h4>\n<p>Zip Code <a href=\"https:\/\/www.unitedstateszipcodes.org\/11207\">11207<\/a>, again emerges as a collision prone area with 124 collisions in January.  That\u2019s 4 collisions a day. Every day is a regular demolition derby day in that part of Brooklyn.<\/p>\n<h2>Some Perl CLI Resources<\/h2>\n<p><a href=\"https:\/\/perldoc.perl.org\/perlrun\">Perldocs &#8211; perlrun<\/a><\/p>\n<p><a href=\"https:\/\/catonmat.net\/books\">Peteris Krumins has some great e-books<\/a><\/p>\n<p><a href=\"https:\/\/www.perl.com\/pub\/2004\/08\/09\/commandline.html\/\">Dave Cross &#8211; From one of his older posts on perl.com<\/a><\/p>\n<h3>Some NYC Street Resources<\/h3>\n<p><a href=\"https:\/\/dev.socrata.com\/docs\/endpoints.html\">SODA Developers Guide<\/a><\/p>\n<h3>Some NYC Street Resources<\/h3>\n<p><a href=\"https:\/\/nyc.streetsblog.org\/\">StreetsBlog NYC<\/a><\/p>\n<p><a href=\"https:\/\/hellgatenyc.com\/\">Hellgate NYC &#8211; Local NYC News<\/a><\/p>\n<p><a href=\"https:\/\/elkue.com\/\">Liam Quigley &#8211; Local Reporter<\/a><\/p>\n<p><a href=\"https:\/\/twitter.com\/_elkue\">More Liam Quigley &#8211; Twitter<\/a><\/p>\n<p><a href=\"https:\/\/www.youtube.com\/watch?v=jN7mSXMruEo\">These Stupid Trucks are Literally Killing Us \u2013 YouTube<\/a><\/p>\n<h3>Me<\/h3>\n<p><a href=\"https:\/\/www.linkedin.com\/in\/austin-kenny-87515311\/\">LinkedIn<\/a>\n<a href=\"https:\/\/www.aibistin.com\/\">blog<\/a><\/p>\n<\/div>\n\n\n\n<div class=\"wp-block-jetpack-markdown\"><\/div>\n\n\n\n<div class=\"wp-block-jetpack-markdown\"><\/div>\n\n\n\n<div class=\"wp-block-jetpack-markdown\"><\/div>\n\n\n\n<div class=\"wp-block-jetpack-markdown\"><\/div>\n","protected":false},"excerpt":{"rendered":"","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[111,113,104,114,103,112,99,98],"tags":[137,122,116,134,136,115,49,58,135,88,132],"class_list":["post-1020","post","type-post","status-publish","format-standard","hentry","category-bash","category-cli","category-data","category-json","category-new-york-city","category-oneliner","category-programming","category-sql","tag-api","tag-cli","tag-collisions","tag-crash","tag-curl","tag-linux","tag-newyorkcity","tag-nyc","tag-soql","tag-sql","tag-trafficsafety"],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/www.aibistin.com\/index.php?rest_route=\/wp\/v2\/posts\/1020","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.aibistin.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.aibistin.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.aibistin.com\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.aibistin.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=1020"}],"version-history":[{"count":12,"href":"https:\/\/www.aibistin.com\/index.php?rest_route=\/wp\/v2\/posts\/1020\/revisions"}],"predecessor-version":[{"id":1041,"href":"https:\/\/www.aibistin.com\/index.php?rest_route=\/wp\/v2\/posts\/1020\/revisions\/1041"}],"wp:attachment":[{"href":"https:\/\/www.aibistin.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=1020"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.aibistin.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=1020"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.aibistin.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=1020"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}