minimaxir on Nov 17, 2015 | on: Analyzing 1.1B NYC Taxi and Uber Trips

Yesterday, I made a post about how to reconstruct the NYC map visualization from the 1.1B taxi data using ggplot2: http://minimaxir.com/2015/11/nyc-ggplot2-howto/

Looking at the code for the visualization, the author independently took a similar approach (with the same tools) that turned out slightly different, which is what makes things interesting.

It's worth noting that back in August, only the 2014 and 2015 datasets had been released by the NYC TLC. I'm not entirely sure why they decided to release 2009-2012 now.

If you're just looking to play with the data, I recommend using the BigQuery approach as noted in my article, since downloading and processing ~300GB might take a while. However, the Shapefile approach used in the original article is the next logical step after that, and one that is put to very good use in the article.

aw3c2 on Nov 17, 2015

If you are already using PostgreSQL, it makes no sense to involve Shapefiles in the analysis. Use PostGIS instead.

minimaxir on Nov 17, 2015

You need the Shapefiles to tell where the districts in NYC actually are, though. (The GitHub repository in the OP's post contains the Shapefiles.)
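
To illustrate what the district polygons are for, here is a minimal sketch in pure Python of assigning a pickup coordinate to a district with a ray-casting point-in-polygon test. The two rectangular "districts", their names, and the helper functions are made up for illustration; real boundaries would come from the Shapefiles, and inside PostgreSQL the same containment test would typically be left to PostGIS (for example `ST_Contains`) once the polygons are loaded.

```python
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]  # (longitude, latitude)


def point_in_polygon(point: Point, polygon: List[Point]) -> bool:
    """Ray casting: toggle 'inside' each time a rightward ray crosses an edge."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the point's latitude can be crossed.
        if (y1 > y) != (y2 > y):
            # Longitude where the edge meets the horizontal line through the point.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def find_district(point: Point, districts: Dict[str, List[Point]]) -> Optional[str]:
    """Return the name of the first district containing the point, or None."""
    for name, polygon in districts.items():
        if point_in_polygon(point, polygon):
            return name
    return None


# Hypothetical rectangles, NOT real NYC boundaries.
DISTRICTS = {
    "district_a": [(-74.02, 40.70), (-73.97, 40.70), (-73.97, 40.75), (-74.02, 40.75)],
    "district_b": [(-73.97, 40.70), (-73.92, 40.70), (-73.92, 40.75), (-73.97, 40.75)],
}
```

Calling `find_district((-74.00, 40.72), DISTRICTS)` picks out the first rectangle. Without polygons like these, a raw latitude/longitude pair has no district attached to it, whichever tool ends up doing the lookup.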

hadley on Nov 17, 2015

Nice work!
