
I'm the developer of Protomaps. To summarize:

The latency you see on https://pmtiles.io/?url=https%3A%2F%2Fdata.source.coop%2Fpro... is representative of how PMTiles works on AWS S3, coming from the us-west-2 region. It will load reasonably fast for those in the western US and likely quite slowly from Europe or Oceania.

If you want to make a direct comparison of Protomaps to OpenFreeMap, you need to compare OpenFreeMap served by nginx from Btrfs on disk against `pmtiles serve` running on a `.pmtiles` file on disk, as described here: https://docs.protomaps.com/deploy/server

The OpenFreeMap page for me (in Taiwan) takes 1-2 seconds per tile, which is more than double the tile load time of the PMTiles-on-us-west-2 example linked above.

The best solution for getting latency competitive with commercial providers, for a global audience, is to cache tiles at CDN edge locations. I describe automated ways to do that with Cloudflare Workers and AWS Lambda + CloudFront here (a rough sketch of the edge-caching pattern follows the links):

https://docs.protomaps.com/deploy/cloudflare https://docs.protomaps.com/deploy/aws
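To give a rough idea of what the Cloudflare approach does, here's a minimal sketch of edge caching in a Worker. This is not the actual worker from the docs above (that one reads tiles out of a single .pmtiles archive and handles more details); the origin hostname here is a placeholder.

```typescript
export default {
  async fetch(request: Request, env: unknown, ctx: ExecutionContext): Promise<Response> {
    const cache = caches.default;

    // Serve from this edge location's cache if we've seen the tile before.
    const cached = await cache.match(request);
    if (cached) return cached;

    // Cache miss: fetch the tile from the origin bucket (placeholder hostname).
    const url = new URL(request.url);
    let response = await fetch("https://example-tiles.example.com" + url.pathname);

    // Store a copy at the edge so the next nearby request skips the trip to the bucket.
    if (response.ok) {
      response = new Response(response.body, response);
      response.headers.set("Cache-Control", "public, max-age=86400");
      ctx.waitUntil(cache.put(request, response.clone()));
    }
    return response;
  },
};
```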

I'm also experimenting with http://tigrisdata.com, which lets you geo-distribute a static storage bucket, as in this example: https://pmtiles.io/?url=https%3A%2F%2Fdemo-bucket.protomaps....



No, the ping is 150 ms to us-west-2, and the tiles load in something like 5 seconds on a cold start. Of course we cannot test a cold start from HN comments, because HN is the definition of hot :-)

I can imagine the Workers being fast; it's the range requests that are super slow. It's also outside of your control: it depends on how Cloudflare and S3 handle range requests against 90 GB files.
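For context, here's roughly what those range requests look like over HTTP. This is just an illustrative sketch (the URL is a placeholder), not the actual pmtiles client code:

```typescript
// Read `length` bytes starting at `offset` from a remote archive via an HTTP Range request.
async function fetchRange(url: string, offset: number, length: number): Promise<ArrayBuffer> {
  const res = await fetch(url, {
    headers: { Range: `bytes=${offset}-${offset + length - 1}` },
  });
  if (res.status !== 206) {
    throw new Error(`expected 206 Partial Content, got ${res.status}`);
  }
  return res.arrayBuffer();
}

// A client typically starts by reading a small fixed prefix (header + root directory),
// then requests only the byte range of the tile it needs, so the 90 GB total size
// mostly matters for how the storage backend handles ranged reads, not for transfer size.
// await fetchRange("https://example.com/planet.pmtiles", 0, 16384);
```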

I think if PMTiles could be split into files under 10 MB, it'd work perfectly with range requests.


I agree, there are tradeoffs to using static storage; the intended audience for PMTiles is people who prefer static sites to administering a Linux server.

I would be interested to see a comparison of Btrfs + nginx serving latency versus `pmtiles serve` from https://github.com/protomaps/go-pmtiles reading a PMTiles archive on disk. That would be a more direct comparison.
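A quick-and-dirty sketch of how that comparison could be run; the URLs and tile paths below are placeholders:

```typescript
// Time the same tile requests against two local servers and compare averages.
const servers = [
  "http://localhost:8080", // e.g. `pmtiles serve` reading a .pmtiles archive
  "http://localhost:8081", // e.g. nginx serving pre-extracted tiles from Btrfs
];
const tiles = ["/tiles/10/163/395.mvt", "/tiles/14/13689/6888.mvt"]; // placeholder paths

for (const base of servers) {
  const times: number[] = [];
  for (const path of tiles) {
    const start = performance.now();
    const res = await fetch(base + path);
    await res.arrayBuffer(); // make sure the whole body is transferred
    times.push(performance.now() - start);
  }
  const avg = times.reduce((a, b) => a + b, 0) / times.length;
  console.log(`${base}: avg ${avg.toFixed(1)} ms over ${times.length} tiles`);
}
```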

I think there's potentially some interesting use case for tiles in Btrfs volumes and incremental updates, which I haven't tackled in PMTiles yet!


I think both solutions could easily saturate a 1 Gbps line. I benchmarked Btrfs + nginx and it could do 30 Gbps, which doesn't really make a difference if your server is 1 Gbps only.

The fact that there is no service running was more important for me, mostly for security and bug reasons. I had so many problems with various tile servers in production: they needed daily restarting, they had memory leaks, etc.

Basically I wanted to go nginx-only for security and to avoid tile server bugs.


I see; I think that's a good approach for enabling serving with stock nginx, as well as for companies that are already built on nginx or a plain HTTP serving stack.

For PMTiles, the module is loadable directly as a Caddy plugin (https://docs.protomaps.com/deploy/server#caddyfile), which I prefer over nginx on the security and bugs front (plus automatic SSL). It also enables serving PMTiles from disk or a remote storage bucket without a separate service running.


Yes, PMTiles with the Caddy plugin is very similar to nginx + Btrfs.

At that point, the difference between the two projects is mostly which schema is being used.


You both specify Btrfs as the filesystem. Is there any advantage in this case over ZFS, ext4, XFS..., or is it just a practical choice?


Yes, small files can fit in the metadata, which makes a super big difference when you have 300 million files of 405 bytes each. Also, inode handling is way better compared to ext4.
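For a rough sense of scale (assuming a typical 4 KiB allocation unit for out-of-line data): 300 million × 405 bytes is only about 120 GB of payload, but if each file had to occupy its own 4 KiB block, that would be roughly 1.2 TB on disk, so inlining those files into the metadata avoids most of that overhead.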


Oh neat! Thanks! I'm checking the docs and I guess you're referring to "Inline files". I don't have much knowledge about btrfs, so I didn't know...

The nearest thing (it's not really similar, just related) that ZFS has is special vdevs.

When you have an array of disks (usually "slow", like regular HDDs), you can attach another array of disks to it (usually very fast, like NVMe) where you can store metadata (file information) and, optionally, small files up to a size you define.

So, for example, you have a 50 TB (or 500 TB, who knows) array of SATA disks, plus a small but super-fast array of NVMe drives: let's say 128 GB, or 512 GB, or 1 TB, depending on what you want to do.

File metadata is saved there, so a find, ls, or tree operation is now very fast. And if you also store, for example, all files smaller than 32 KB there (it depends on your needs), all the small-file operations will be way faster.


Yes, the key thing about Btrfs is how it handles inodes and how it can store data with the metadata. In the OpenFreeMap image, 60% of the files are stored with the metadata, essentially taking up no space.


I would also like to see this comparison. For good measure, it'd be great to include Versatiles as well.


If you would like to run this comparison for tileservers reading from disk, I wrote a small tool to simulate traffic based on access logs of OpenStreetMap tiles:

https://github.com/bdon/TileSiege



