
I've been Git scraping the San Francisco version of this for a few years now.

My https://github.com/simonw/sf-tree-history repo now has 444 commits (most recent one was just 4 days ago) tracking every change that's been made to https://data.sfgov.org/City-Infrastructure/Street-Tree-List/... since March 2019.

I haven't yet done anything with this data, but there is so much potential for visualizations and other fun stuff with it. If anyone wants to have a go please be my guest!

Wrote more about this project here: https://simonwillison.net/2019/Mar/13/tree-history/




I love the idea of syncing git commit history with data change history, i.e. using git as a repo for data. It's actually quite feasible if you use pretty-printed JSON as the record format (or another simple line-oriented text format).
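To illustrate why pretty-printed JSON suits this, here is a minimal sketch (not taken from any of the repos linked in this thread). The `to_diffable_json` and `changed_lines` helpers, the `id` key, and the sample tree records are all hypothetical. The idea is to sort records and keys and put one field per line, so that a single changed value shows up as a one-line change in a diff:

```python
import difflib
import json


def to_diffable_json(records, key="id"):
    """Serialize records with a stable order and one field per line."""
    ordered = sorted(records, key=lambda r: r[key])
    return json.dumps(ordered, indent=2, sort_keys=True) + "\n"


def changed_lines(old_text, new_text):
    """Return only the added/removed lines of a unified diff (no headers)."""
    diff = difflib.unified_diff(
        old_text.splitlines(), new_text.splitlines(), lineterm="", n=0
    )
    return [
        line
        for line in diff
        if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
    ]


# Hypothetical snapshots of the same dataset at two points in time.
# The input order differs, but sorting makes the output order stable.
before = [{"id": 2, "species": "Oak"}, {"id": 1, "species": "Elm"}]
after = [{"id": 1, "species": "Elm"}, {"id": 2, "species": "Maple"}]

for line in changed_lines(to_diffable_json(before), to_diffable_json(after)):
    print(line)
```

Committing the output of something like `to_diffable_json` on each scrape means the commit diff only contains the fields that actually changed, rather than one giant rewritten line.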

I explored more of this "git as DB backend" in some places including: https://github.com/dosyago/sirdb

Also, just as a heads-up to anyone interested, the SF version of this "tree map" (heh) is at: https://web.archive.org/web/20230328192805/https://bsm.sfdpw... (the original site and the archive both seem to be down)


Yeah I have a bunch of these using pretty-printed JSON - here's one that scrapes Hacker News for mentions of my site, for example: https://github.com/simonw/scrape-hacker-news-by-domain/blob/...


Wow, you're the creator of Datasette! Cool, man! I thought that was a really revolutionary idea, and also related to this notion of a git-backed DB... I'm so glad to sort of see it confirmed with this! Haha :)


I've been thinking about this recently also. I can't think of any downsides off the top of my head but I might be blinded by the idea? Can you?


I've had dozens of repos doing this kind of thing for a few years now and I've not found any downsides yet - it just works.

There's a size limit to consider: GitHub won't let you push a file larger than 100MB and prefers you to keep the repo itself below 2GB.

Most of my scraping projects are a tiny fraction of that size though.
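If you want to guard against the 100MB per-file limit mentioned above, a rough pre-commit check might look like this. It's only a sketch: `oversized_files` is a hypothetical helper, and treating 100MB as 100 * 1024 * 1024 bytes is an assumption about how the limit is measured.

```python
from pathlib import Path

# Assumption: the 100MB per-file limit interpreted as 100 * 1024 * 1024 bytes.
MAX_FILE_BYTES = 100 * 1024 * 1024


def oversized_files(root, limit=MAX_FILE_BYTES):
    """List files under root (skipping .git) whose size exceeds limit."""
    root = Path(root)
    found = []
    for path in root.rglob("*"):
        relative = path.relative_to(root)
        if ".git" in relative.parts:
            continue  # ignore git's own object store
        if path.is_file() and path.stat().st_size > limit:
            found.append(relative.as_posix())
    return sorted(found)


if __name__ == "__main__":
    for name in oversized_files("."):
        print(f"too large to push: {name}")
```

Running it in the scraper's working directory before committing would flag any output file that has grown past the limit.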



