Writing a Mini-CDN to Learn Nginx/Prometheus/Grafana/Lua (github.com/leandromoreira)
312 points by dreampeppers99 on Dec 26, 2022 | 45 comments



The hard part of building a CDN is knowing when you need one. 99.9% of all websites with a CDN do not need it. Serving static files consumes so few resources that a single server can serve billions of users, as long as you don't run a script to serve each file. The most cost-effective solution, and also the lowest-latency one, is to never use a CDN. If your hosting provider charges you a lot for traffic, you are better off using another provider.


Static files can be relatively large: pictures, sounds, or even videos. At busier moments, the p99 latency can get pretty bad if you only have one virtual NIC serving them.

Also, geography: ping time from Singapore to the US can never be negligibly short, which is pretty noticeable during TLS handshakes.

But yes, likely these considerations do not matter for 90% of websites (not 99.9% though).


It's definitely 99.9% if the websites aren't filtered at all. It's a lot lower if you're only talking about the top 100 most-visited websites according to Alexa or Apple, however...


In OP’s premise, they were filtered by “websites who chose to be served by a CDN”.


> The most cost-effective with also the lowest latency solution is to never use CDN

The lowest-latency solution is to put the content near the user, and a CDN is probably the easiest way of doing that if someone needs to serve a geographically dispersed audience.


> The most cost-effective with also the lowest latency solution is to never use CDN.

CloudFlare is free at my tier and gives me the ability to have the lowest latency.


> The hard part of building a CDN is to know when you need it. 99.9% of all websites with CDN do not need it.

What exactly leads you to believe you can tell what 99.9% of all websites need?

Unless you believe 99.9% of websites are only accessed by your upstairs neighbors, CDNs provide a couple of important business and operational advantages.

Just to illustrate how wrong and misguided your personal assumption is: CDNs are primarily used to cut down latency, which for accesses from other regions can easily exceed 300ms. Unless you somehow think it's OK for your users to be subjected to a bad experience, a basic CDN service is all you need to lower those latencies by an order of magnitude.


That's interesting, because when I added a CDN to one of my projects to serve the assets (mostly images), my latency went up. I would have preferred not to serve them through the CDN, but I didn't have a choice because of the large number of files.

I accept that my images are being served a bit slower, but with the compromise that I can serve a lot more images.

I also think the person you are replying to is correct. The CDN makes the user experience worse but saves me money; that's an unfortunate sacrifice.


It would be nice to discuss the common approaches to global name resolution: anycast vs. geo-routing.


IIRC the industry standard is to serve your authoritative DNS with anycast, and have those servers do geo-based dns resolution to shift HTTP traffic to a nearby edge POP.


This is nicely written, and a lot of it mirrors my experience using nginx as a pseudo-CDN. Other areas worth exploring might be HTTP/3, TLS session caching, and general latency/TTFB optimizations.
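A minimal sketch of what those tweaks could look like in nginx, assuming a recent build with QUIC support; the certificate paths and document root are placeholders:

    # Hypothetical server block: HTTP/3 (QUIC) plus TLS session resumption.
    server {
        listen 443 ssl;            # TCP, for HTTP/1.1 and HTTP/2 clients
        listen 443 quic reuseport; # UDP, for HTTP/3

        ssl_certificate     /etc/nginx/certs/example.crt;  # placeholder
        ssl_certificate_key /etc/nginx/certs/example.key;  # placeholder

        # Cache TLS sessions in shared memory so returning clients can
        # resume instead of doing a full handshake (a big win on high-RTT links).
        ssl_session_cache   shared:SSL:10m;
        ssl_session_timeout 1h;

        # Advertise HTTP/3 to clients that first connected over TCP.
        add_header Alt-Svc 'h3=":443"; ma=86400' always;

        location / {
            root /var/www/static;  # placeholder document root
        }
    }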


This is very, very cool! One thing I would definitely like to see is domain name resolution. Shopify, Dukaan, and Vercel all make a big deal out of it, going all the way to BGP.

https://twitter.com/subhashchy/status/1536769406801309696


Is it possible for CDNs to cache per URL per user? I'm thinking of something like /favorites, where one URL would list something different for everyone. When I've set up caching on the backend, it was keyed off the user.

This was a very informative read!


I don't know why you want to hurt yourself.

If these are public, put them on /favorites/$USERNAME or something similar. If they are private, don't cache them.

You can cache with specific headers as cache keys, but I would advise against doing this too much or abusing it. It really makes caching complicated. And from a data-privacy standpoint, it's better to opt in to caching. I've witnessed incidents where visitors saw another user's private profile page because it was cached in the CDN.


In a lot of CDNs you can configure whether the cache key includes a particular header or query parameter. So as long as your user identity is transmitted in one of those, it would work.
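In nginx terms (since that is what the linked project builds on), this roughly means folding the identifying value into proxy_cache_key. A rough sketch, where the X-User-Id header and the origin address are made up for illustration:

    proxy_cache_path /var/cache/nginx keys_zone=per_user:10m max_size=1g;

    server {
        listen 8080;

        location /favorites {
            proxy_cache per_user;
            # Fold the (hypothetical) user-identifying header and the query
            # string into the cache key, so each user gets their own cached
            # copy of the same URL.
            proxy_cache_key "$scheme$host$uri$is_args$args$http_x_user_id";
            proxy_cache_valid 200 1m;
            proxy_pass http://127.0.0.1:9000;  # placeholder origin
        }
    }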


A user-aware CDN would require scripting of some kind to handle sessions. However, if the data is not sensitive, you could use random-string URIs for publicly available files; that way it is difficult to guess or brute-force the URL to the files. (Sensitive meaning personally identifiable data.)
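A small OpenResty-flavored sketch of generating such an unguessable URL; the secret, path layout, and helper name are all made up:

    -- Hypothetical helper, run inside an OpenResty handler: derive a
    -- hard-to-guess but stable public path for a user's favorites file.
    local function favorites_url(user_id)
        local secret = "replace-with-a-real-secret"   -- placeholder secret
        local digest = ngx.sha1_bin(secret .. ":" .. user_id)
        local token = ngx.encode_base64(digest):gsub("[+/=]", "")
        return "/public/favorites/" .. token .. ".json"
    end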


Many CDNs support caching based on a particular cookie value, incorporating it into the cache key. I'd just be extra careful: with most server settings the worst case is an inoperable service, but choosing the wrong cache key can easily result in a data leak (serving one user's response to another user).
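With plain nginx as the cache, that looks roughly like the snippet below; the cookie name is hypothetical, and as noted above, getting this key wrong is how one user's page ends up served to another:

    location /favorites {
        proxy_cache per_user;  # zone declared via proxy_cache_path elsewhere
        # $cookie_sessionid exposes a single cookie's value; "sessionid" is a
        # made-up name. If the cookie is absent, all anonymous users share
        # one cache entry, so decide whether that is acceptable.
        proxy_cache_key "$scheme$host$request_uri$cookie_sessionid";
        proxy_cache_valid 200 30s;
        proxy_pass http://127.0.0.1:9000;  # placeholder origin
    }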


You can use the `Vary` header.
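For completeness, a hedged sketch of that approach: the origin declares which request header differentiates responses, and a Vary-respecting cache keys on it (the header name is again hypothetical):

    # Origin-side nginx: responses for this URL vary per (made-up) X-User-Id header.
    location /favorites {
        add_header Vary "X-User-Id" always;
        proxy_pass http://127.0.0.1:9000;  # placeholder application server
    }

One caveat: some CDNs only honor Vary for a small set of headers (often just Accept-Encoding), and a high-cardinality Vary value can effectively disable caching, so check the provider's documentation first.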


The hard part of building a CDN is scaling it. The best approach, IMO, is to use Fly.io to host an anycast IP (with horizontal scaling) and store cache files on disk.

Fly.io also has a built-in Grafana dashboard for your machines.


Agreed, Fly.io is great for such use cases. Is there any CDN/proxy solution or guide available for Fly?



Beautifully written! Thanks for sharing, Leandro.


<3


I'm curious whether any HNers have opinions on Prometheus vs. other time-series databases like InfluxDB.

I periodically consider a Grafana-plus-backend setup for when Datadog becomes cost-prohibitive for metrics with several tags.


Go with Mimir. It is Prometheus-compatible and horizontally scalable, with the read and write paths scaling separately.

Mimir: https://github.com/grafana/mimir


You did not answer the OP's question, though: Prometheus vs. InfluxDB.


We have been using Prometheus at a client for a little over a year now. We need to keep metrics for years, and Prometheus does not seem to be able to deal with that well. One behavior we observed is that it crashes consistently in k8s. We couldn't pin down the root cause, but we suspect it's the amount of metrics we collect continuously and keep (archive).

Now we are considering switching to Thanos or Mimir.


At $dayjob we're considering replacing Datadog with Grafana and friends; we're already using them elsewhere to great effect.

I haven't used InfluxDB yet, so I can't offer a comparison, but from my usage I'm sold on Grafana, Loki, Prometheus, and friends over Datadog. Combined with OTel, they have been a real pleasure to use.


We've done that migration at $dayjob. It has taken a lot of getting used to. The data model is different (perhaps due to poor setting choices), which causes some wacky query requirements for our charts; devs often don't understand them, leading to misleading charts. We're slowly switching to OTel, which should solve that problem. There's also much to be desired in how our Grafana handles saving preferences (ours doesn't; stateless head? I forget why), which is annoying. Tracing/APM is also missing out of the box, though some teams are working on getting that going too.

So a classic build-vs-buy story, I guess. We'll probably pay less in the long run, but for a worse and rockier experience so far.


Great content, helpful and inspiring.

Thanks!


Good read. Is there something similar for building a DDoS-protection feature, like Cloudflare's?


Very good project. Thanks for sharing.


Thanks for this


my pleasure


Why didn't you use varnish for that?


I guess it's "...to Learn Nginx/Prometheus/Grafana/Lua".

Per the first line of the link: "The objective of this repo is to build a body of knowledge on how CDNs work by coding one from "scratch". "


Another example of a project duped into thinking Lua is “powerful”. It is small. That is it. Lua has near zero useful functionality and makes the developer repeatedly reinvent functionality over and over and over again.

https://media1.giphy.com/media/TFO2mwVPIFoOJcuTSC/giphy.gif


Would you like to expand on why you think Lua is a bad choice for this particular project and what you would have used instead? That would be much more helpful than a generic attack on the language itself.


I think they were as specific as they need to be: "Lua has near zero useful functionality and makes the developer repeatedly reinvent functionality over and over and over again."

That may feel uselessly broad if you're not familiar with the language or ecosystem but I've used lua professionally for years on projects of all different sizes and it's a truth I recognize.


It's small, fast, and doesn't have a GIL, so concurrent executions are trivial.
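In the OpenResty stack the linked project uses, each nginx worker runs its own Lua VM with no shared interpreter lock, and I/O concurrency within a worker comes from cheap cooperative "light threads". A hedged sketch, with placeholder upstream locations:

    -- Inside a content_by_lua_block: fetch two upstream resources
    -- concurrently with light threads, then emit both bodies.
    local function fetch(path)
        local res = ngx.location.capture(path)  -- non-blocking subrequest
        return res and res.body or ""
    end

    local t1 = ngx.thread.spawn(fetch, "/upstream/a")  -- placeholder paths
    local t2 = ngx.thread.spawn(fetch, "/upstream/b")

    local ok1, body_a = ngx.thread.wait(t1)
    local ok2, body_b = ngx.thread.wait(t2)

    ngx.say(body_a, body_b)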


Yes it's good if you're writing a C app and need to embed a concurrent scripting language with long-running tasks, and for some reason you can't write that part in C and just expose it to the script env, and also you can't use one of the three or four alternatives that are still a better choice for this specific scenario.


Which alternatives would you recommend?


So, it depends, and a lot of my hostility to lua is because creators of C/C++ apps will just drop lua in and consider that part done, leaving the end user/programmer to handle all of the exposed complexity.

In this situation, usually what you're embedding lua for is defining and exposing a DSL to handle complex configuration or mediate some automation.

So these days I use janet for this because I like lisp, its data structures and core functions are predictable in a way lua's are not, and it has a good standard library that is easy to select subsets of for embedding. Lisps are particularly well suited for making DSLs and it has macros if you need them though I never have.

TCL is another excellent choice. It is small and embeds very simply like lua, but is more suited for making DSLs and even GUI config if you need to go that far. And this is subjective but I think it's less hostile to the probably non-professional programmers that are likely the end users. It has a similar C-centric embedded history so is similarly optimized for that case, but with more focus on users not needing to learn the whole language to effectively use a part of it.

For some cases where what the end user will define is not configuration but processes or procedures over unknowable-at-build-time data, forth is an unusual but strong choice. There are some incredibly lean implementations built for embedding. The paradigm is most likely to be unfamiliar to the users but for some use cases it's such a good fit it's worth it.

The cases where I would use lua are where you expect broad and long-lived community development with high complexity and dedicated system builders involved. Lua's metatables and module system are a good foundation to build a powerful ruby-like OO/FP hybrid environment if it's worth adding all that weight and maintaining it over time. You see something like this in mud client scripting where lua is conventional and I think appropriate, and the clients themselves function more like platforms for lua development than apps that run embedded scripts.

The main thing imo is just to think about what the reason for embedding actually is, who is going to be using it, and for what. Lua isn't the worst default, but its flexibility makes developers think they don't have to make this choice at all if they include it, and that's an error.


The fact that your proposed alternatives to Lua are a Lisp and Forth says a lot about why people will, in fact, use Lua.


lua haters club rise up. I truly don't get why this language gets nothing but admiration on HN. I can only assume people have only used it for tiny personal scripting projects, or else just fucking loooove hand writing the for loops to do exciting niche things like parse csv.



