
What kind of work has been done publicly on a data model for resolving this kind of query? I was wondering because I've been noticing how Google Maps handles names for places and features in different languages.

Could this model be built automatically from existing OSM data, or would someone else have to manually add some kind of additional hierarchy information? Has this been studied from the computational linguistics side as well as the geospatial information side?




The short answer is yes: OsmAnd (note, not OSM itself, but a fairly popular Android mapping/navigation app that uses its data) has users who build "full address maps" for entire countries, but each one has to be put together as a downloadable bundle, and for the most part that hasn't happened.

The other problem is that OSM tends to have good street maps but very limited door-to-door coverage.


I'm just curious about how this works with some things that seem to add complexity (I haven't worked in this area at all):

- Different names for things in different parts of the address because of different languages? (What if two different parts are in different languages, e.g. giving a Belgian city name in Flemish but the street name in French? Giving a Swiss address with a street name in French, city name in German, and canton name in French+? Giving a Russian address with the street name in Russian but the city and oblast name in English, possibly using different scripts?)

- Landmarks and points of interest, maybe with separate colloquial and official names? (Tower of St. Vincent? Torre de São Vicente? Torre de Belém? Tower of Belém? Tower of Belem?)

- How do you search if an address component is omitted (like leaving out a city in a U.S. address but including a state or a postal code)?

- How do you handle free-form text searches that might omit address delimiters, or include postal codes in a locally nonstandard order, or omit or include official postal designators for the components of the address (for example, Brazilian addresses might include the word "CEP" before the postal code but could be given without explicitly mentioning that the postal code is a postal code)?

Is there a good book or web site or FAQ about geographic name matching that I could look at to get a sense of what standard answers to these sorts of questions are?

+ In Switzerland, the same city could theoretically be referred to as "Fribourg, [Staat ]Freiburg" or "Freiburg, [État de ]Fribourg". Well, of course a human user might always search using any combination of languages with which they're familiar, or even ones they're not familiar with that they just copied and pasted from somewhere.


OSM nominally assigns the "name" tag to the name locally used, and there are also "name:en", "name:fr" tags, etc. Presumably you could search through all of them, using an inverted index and a Bayesian prior for how likely words like "CEP" are to occur in addresses. Bag-of-words should do just about everything except distinguish ZIP codes from street numbers, and possibly distinguish between street and city.
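A minimal sketch of that idea, in case it helps make it concrete. Everything here is an assumption for illustration: the sample places, the function names, and especially the fixed down-weighting table for designators like "CEP", which stands in for a real Bayesian prior learned from a corpus. It only shows the bag-of-words lookup across "name" and "name:xx" tags, not ranking by place type or telling postal codes from street numbers.

```python
import re
import unicodedata
from collections import defaultdict

# Hypothetical weights: designator words say little about which place is meant.
# A real system would estimate these from a corpus of address searches.
DESIGNATOR_WEIGHT = {"cep": 0.1, "zip": 0.1}

# Made-up sample records using OSM-style name tags.
PLACES = {
    1: {"name": "Torre de Belém", "name:en": "Belém Tower"},
    2: {"name": "Fribourg", "name:de": "Freiburg"},
    3: {"name": "Freiburg im Breisgau"},
}


def tokenize(text):
    """Lowercase, strip accents, and split on non-word characters."""
    decomposed = unicodedata.normalize("NFKD", text.lower())
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return re.findall(r"\w+", stripped)


def build_index(places):
    """Map each token found in any name or name:xx tag to a set of place ids."""
    index = defaultdict(set)
    for place_id, tags in places.items():
        for key, value in tags.items():
            if key == "name" or key.startswith("name:"):
                for token in tokenize(value):
                    index[token].add(place_id)
    return index


def search(index, query):
    """Score places by the weighted number of distinct query tokens they match."""
    scores = defaultdict(float)
    for token in set(tokenize(query)):
        weight = DESIGNATOR_WEIGHT.get(token, 1.0)
        for place_id in index.get(token, ()):
            scores[place_id] += weight
    # Highest score first; ties broken by place id for stable output.
    return sorted(scores, key=lambda p: (-scores[p], p))


INDEX = build_index(PLACES)
print(search(INDEX, "Tower of Belem"))
print(search(INDEX, "Freiburg"))
```

Because the query and the tags go through the same accent-stripping tokenizer, "Belem" and "Belém" land on the same index entry, and a search for "Freiburg" finds the Swiss city through its "name:de" tag as well as the German one through "name".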

OsmAnd~'s interface for this is absolutely terrible.

A corpus of address searches to test against would be really helpful for this kind of development.

I suspect there is a good book or web site or FAQ about geographic name matching, but you have to be a Googler to see it.


Photon builds a nice find-as-you-type, free-text search on top of OSM data:

https://photon.komoot.de/

I wonder if GP is talking specifically about on-device apps; Photon needs a pretty big index (31 gigabytes compressed):

https://github.com/komoot/photon


Yes, I was talking about phone apps.

Just played with Photon on their website: it's better, but still far from Google's capability. Like it or not, Google is very, very good at search :-)


Besides Photon, there is Mapzen Search, which also provides search-as-you-type.

Naturally, comparing with Google is a bit apples and oranges: on the one hand there is a company that purchases essentially all its data and spends billions on its mapping department, on the other the OSMF with an annual budget of roughly $150k. So it is not a surprise that some things take a bit longer with OSM, but it is tortoise vs. hare....



