Hacker News new | past | comments | ask | show | jobs | submit login
Extracting city blocks from OpenStreetMap data (peteris.rocks)
151 points by p8donald on Aug 11, 2016 | hide | past | favorite | 17 comments



Always good to see geospatial analysis, especially with the toolchain available in Python, these days!

One of the downsides of forming polygon from line segments in this way is that it implicitly assumes all lines have a vertex at the intersection of another line.

For example, something like this would fail ("o" represents a vertex):

    o      o  o
     \      \/
      \     /\
       \   /  \   o
        \ /    \ /
         /      /
        / \    / \
       o   \  /   \
            \/     \
            /\      o
           /  \
          o    \
                o
You can handle this in a couple of ways:

1) Intersect the lines before-hand, ensuring that you wind up with a vertex at every intersection. (In shapely's case, you'd call `shapely.ops.cascaded_union(lines)` before applying `polygonize`.)

2) Cut a bounding polygon with the lines, rather than forming polygons from individual line segments.

The second method has the advantage of ensuring that everything inside of the bounding polygon is split up into smaller, block-level polygons. Usually this is what you want, though for something like city blocks, you'd need to filter out non-urban areas (e.g. water).


Explicit intersections are typical for OSM. Crossings without an intersection (or layer information) are flagged by various QA tools (which is a euphemism for an error when you don't have a prescriptive data model).


What about bridges and tunnels? It seems like an ideal (or idealistic) solution is to clean up the source data to include intersections when there are intersections so they don't have to be inferred. Is that impractical or would it explode the size of the dataset?


Bridges and tunnels are still represented as linear features in most road datasets. In good ones, there's an attribute to mark it as a bridge or tunnel, and you can choose to use them or not.

Including intersections wouldn't significantly change the size of the dataset, but that type of clean-up is best done as a pre-analysis step. There's no good reason to do it in the "raw" data.

Furthermore, it's dependent on the projection/etc that you do it in. The intersection of two lines in lat/long space is not at the same point as the intersection of two lines in another lat/long space with a different datum, and it's not the same as the intersection in a projected coordinate system.

Ideally, you'd only include the intersection if you've actually measured it at that location. It's best to leave the observations as observations and try not to alter them too much.

At any rate, by the time you're analyzing the data, you've already made the type of assumptions about cartesian vs. spherical vs. real space that I'm referring to above. Therefore, it's usually a good idea to defer cleanup that's specific to a particular operation (this one is) until you're doing that operation.


The US Census Bureau has a TIGER database of street information. They use the "block face" as a primitive. A block face is one side of a street bounded by an intersection or the end of a street. This works even for streets that dead end. Not all block faces define the outlines of blocks. Addresses are associated with block faces.

The original post was trying to do it for Riga, Latvia, so they can't use US TIGER files.


There's also the tag place=city_block [1] which may be useful depending on its coverage but which has the advantage of being explicitly assigned by a human.

I've always thought of city blocks as an anti-pattern in urban layout e.g. exploring New York on foot for the first time I tried 'using my nose' to find interesting places but had the problem that I got zero incremental information until the next intersection at which point the large block size meant it was too far to turn back.

I prefer to live in cities for which the concept of a 'block' has less relevance, and I'd caution against trying to 'blockify' cities worldwide.

[1] http://wiki.openstreetmap.org/wiki/Key:place


I have to keep trying to open those mapbox vector tiles. Zipped protocol buffers in a sqlite file. If you want to handle a lot of geo data, you can't keep using text. Although I wonder if protocol buffer+sqlite is really a good idea.


The sqlite is only storage server-side. Otherwise you would have bazillion of small files in your filesystem and they mostly do not handle that well. (Speaking of experience of having 800k tiles in a single NTFS directory - that was not pretty).

When serving the tiles over http, the only tiles themselves are served, not the entire sqlite database.

This was actually pioneered in the era of raster tiles - pngs/jpgs in the sqlite db. Vector tiles just "inherited" that.


Do you know a vector format that is more adequate?


No; however there are not many vector tile formats publicly defined. MVT is good at what it does.

The generic vector files used in GIS (shapefiles, DGN, KML, GML etc) are not a good fit for tiles - they describe a specific classes of features and do not optimize for tiling (i.e. you must always parse the entire files without regards what extents you are really interested in) - which is fine for generic GIS usage, but not for end-user vector maps.


Looking at the visualizations provide I don't see a visual proof that the code is able to extract city blocks; by city block, I mean as defined by Wikipedia here: https://en.m.wikipedia.org/wiki/City_block

Proof might be an interactive map that shows the city block of a selected area in NYC.

Is there a visual proof in the walkthrough I'm not seeing?


Data from https://mapzen.com/data/metro-extracts/metro/brooklyn_new-yo...

I followed my instructions in the post (except I had to use encoding='iso-8859-1' instead of utf-8)

I cropped the map a little.

Generated file https://gist.github.com/pdonald/223e42b491eb42ba99b8e10ef1ae...

Interactive and clickable http://bl.ocks.org/anonymous/raw/d62bfd592234ed71d24bfd6f264...

You can also go to http://geojson.io and load the generated file for more interactivity (select, delete, add your own polygons, etc.) or open it in QGIS.

In the Results section I had a screenshot of the generated polygons in both QGIS and geojson.io (polygons are separated by thick black lines and their area is colored gray).


Reluctant to respond, since it's clear that bit of more effort went into your comment than mine.

Basically for me when I click the interactive map I get a response, which is a pinned blank text-box with a x to close it. Thing is that I'm unable see that a city block is highlighted via a click; the text box simple appears where I click, unless it's outside the targeted area; for example, in the river.

Looking at the raw JSON it does appear the block are extracted.

So, putting aside "seeing" it, appears you're confirming it extracts single blocks as objects and isn't doing something else like if two blocks are attached, they're not sharing the path that's between them.


Nice, could you try it with an irregular city? Like Rome?

thx



here you have the problem. You see? They are not correctly detected http://imgur.com/iGLT5Ea


Yes, to detect them I would also have to include pedestrian paths, service roads and other smaller roads. The solution is not perfect but good enough for the most part.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: