
Online geocoding services are always the worst way to go. You should only use an online service if you don't have an alternative dataset.

The OSM datasets actually aren't that big; the entire PBF-formatted dataset is under 30 GB. You can fit that on any system (not in RAM of course, but local disk access is still orders of magnitude faster than a round trip to an online service). There are simple scripts to keep the dataset in sync on your server.

Also, it's just completely ridiculous to send a request over the internet for a calculation you can do locally.

I encountered this issue myself ages ago, when I had to reverse geocode for a railway's GPS system, for every train, every minute. Google and other services would have cost millions of dollars every year, used massive amounts of bandwidth, and been way too slow. I wrote a library, https://github.com/AReallyGoodName/OfflineReverseGeocode, in a day that could easily do millions of lookups each minute.
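
For what it's worth, the core of an offline reverse geocoder is just nearest-neighbour search over a list of named points. Here's a minimal sketch of the idea in Java (a linear scan with haversine distance for clarity; the class and the sample coordinates are made up, and a real implementation would want a spatial index such as a k-d tree to get to millions of lookups a minute):

  import java.util.List;

  // Sketch of offline reverse geocoding: find the named point nearest to a
  // query coordinate. A linear scan is shown for clarity; a spatial index
  // (e.g. a k-d tree) is what makes this fast on a large dataset.
  public class ReverseGeocodeSketch {

      record Place(String name, double lat, double lon) {}

      // Haversine great-circle distance in kilometres.
      static double distanceKm(double lat1, double lon1, double lat2, double lon2) {
          double dLat = Math.toRadians(lat2 - lat1);
          double dLon = Math.toRadians(lon2 - lon1);
          double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                   + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                     * Math.sin(dLon / 2) * Math.sin(dLon / 2);
          return 2 * 6371.0 * Math.asin(Math.sqrt(a));
      }

      // Return the place closest to the query coordinate.
      static Place nearest(List<Place> places, double lat, double lon) {
          Place best = null;
          double bestDist = Double.MAX_VALUE;
          for (Place p : places) {
              double d = distanceKm(lat, lon, p.lat(), p.lon());
              if (d < bestDist) {
                  bestDist = d;
                  best = p;
              }
          }
          return best;
      }

      public static void main(String[] args) {
          // Hypothetical placenames and coordinates, purely for illustration.
          List<Place> places = List.of(
                  new Place("Suburb A", -37.80, 144.95),
                  new Place("Suburb B", -37.82, 144.99));
          System.out.println(nearest(places, -37.81, 144.98).name());
      }
  }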




What was your dataset for your reverse geocodes? The GeoNames dataset can't be that accurate, can it?


It's quite reasonable, despite being a non-polygonal dataset, if all you want is the nearest suburb (which was my use case). They have multiple points for the same location, so nearest-point search works quite well even with weirdly shaped suburbs. I've mucked around with other datasets (there's an OSM branch in my public repo that I'm currently working on to get down to street and street-number level), but for the use case I mentioned above GeoNames works well.
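
For reference, the GeoNames dumps (cities1000.txt, allCountries.txt, etc.) are plain tab-separated text; going by the GeoNames readme, the placename is the second column and latitude/longitude are the fifth and sixth, so feeding them into a nearest-point search is only a few lines. A rough loading sketch (the file name and record type are just placeholders, not code from my repo):

  import java.io.BufferedReader;
  import java.io.IOException;
  import java.nio.file.Files;
  import java.nio.file.Path;
  import java.util.ArrayList;
  import java.util.List;

  // Sketch of loading placenames out of a GeoNames dump.
  // Columns are tab-separated: name is column 1, latitude column 4,
  // longitude column 5 (0-based), per the GeoNames readme.
  public class GeoNamesLoaderSketch {

      record Place(String name, double lat, double lon) {}

      static List<Place> load(Path file) throws IOException {
          List<Place> places = new ArrayList<>();
          try (BufferedReader reader = Files.newBufferedReader(file)) {
              String line;
              while ((line = reader.readLine()) != null) {
                  String[] cols = line.split("\t");
                  places.add(new Place(cols[1],
                          Double.parseDouble(cols[4]),
                          Double.parseDouble(cols[5])));
              }
          }
          return places;
      }

      public static void main(String[] args) throws IOException {
          List<Place> places = load(Path.of("cities1000.txt"));
          System.out.println(places.size() + " places loaded");
      }
  }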



