Plane: Per-user backends for web apps (driftingin.space)
321 points by paulgb on Oct 13, 2022 | 72 comments



Hey HN!

Plane came from our desire to build tools that have the low friction of running in the browser, but can use more memory and compute than the browser will allocate. The basic idea is to run a remote background process, connect to it over WebSocket, and stream data.
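From the browser's perspective it's a small amount of code. Here's a rough sketch (the endpoint and message shapes are hypothetical, not our exact API):

  // Ask the platform for a dedicated backend, then stream over WebSocket.
  const res = await fetch("/api/spawn", { method: "POST" });
  const { url } = await res.json(); // e.g. wss://abc123.example-backends.dev
  const ws = new WebSocket(url);
  ws.onopen = () => ws.send(JSON.stringify({ type: "subscribe" }));
  ws.onmessage = (e) => console.log("update:", JSON.parse(e.data));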

This ends up being a surprisingly useful primitive to have, and it's been used to:

- Pixel-stream X11 applications to the browser over WebRTC[1]

- Run IDEs and notebooks

- Power Figma-style realtime collaboration backends (including https://rayon.design).

Here's a direct link to our repo: https://github.com/drifting-in-space/plane and docs: https://plane.dev/

[1] https://twitter.com/drifting_corp/status/1552773567649091584


Big grats Paul, love innovation in this space and fully agree w/ your design decisions re: K8s.


Hey Paul, congrats on your HN launch!


This stuff is great.

The sooner you can figure out Windows, the better.


A related and also very useful usage pattern: "backend instance per customer".

Because…

There are many ways to implement multi-tenant SaaS, but a highly underrated approach is to write a single-tenant app, then use infrastructure to run an instance of it per (currently logged-in) customer. Plus a persistent database per customer, of course.
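The infrastructure piece can be as thin as a proxy that maps each tenant's subdomain to its instance. A sketch in Node (using the http-proxy package; spawnInstanceFor is a hypothetical stand-in for whatever your provisioning layer does):

  import http from "node:http";
  import httpProxy from "http-proxy";

  declare function spawnInstanceFor(tenant: string): Promise<string>; // hypothetical

  const proxy = httpProxy.createProxyServer({});
  const instances = new Map<string, string>(); // tenant -> upstream URL

  http.createServer(async (req, res) => {
    const tenant = (req.headers.host ?? "").split(".")[0];
    let target = instances.get(tenant);
    if (!target) {
      target = await spawnInstanceFor(tenant); // boots app container + DB
      instances.set(tenant, target);
    }
    proxy.web(req, res, { target });
  }).listen(8080);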

This has the tremendous advantage that you can add another column to your pricing page easily: the "bring a truckload of money and we will set you up to run this behind your firewall" tier. There are still a lot of orgs out there, large ones with considerable financial capacity, who really want this.


I think you are putting together two concepts that are actually different: "backend instance per customer" and "deploy the service in the customer's infrastructure".

I don't have much to say about the 2nd one, but my team does the first one for our (internal) customers, so we deploy and manage a whole service stack in our own accounts for each of our customers. It's a nightmare.

We are going to do some rearchitecture work next year to move away from it, because it drains so much of our time in operational load.

One example of a major pain point that we have is encountering random failures in 3rd party services, such as AWS. For a more classic service that you deploy 2 or 3 times a week, if CloudFormation deployments fail once every thousand deployments for random errors, you'll have one failure per year. Well, we deploy thousands of instances of our service 2-3 times a week. So we have failures every single time we deploy.

Oncall is a nightmare (I don't love oncall anymore; I created my login in my previous team) because we just fix a tsunami of similar tickets, each from a different instance of our service for a single customer, but often with a different root cause.

We probably have half of our headcount dedicated to initiatives that wouldn't need to exist if we had a more classic 1 service = 1 instance approach.

Just don't do it.


The connection between the concepts is: if your multi-tenant strategy is to deploy (hopefully only while in active use) an instance per customer, then the additional effort to provide on-site installations is relatively low, compared to other multi-tenant strategies.

Whether on-site installs are a good thing to offer of course depends on how valuable that is to your customers versus the cost/effort to support it.

As you point out there are major trade-offs! If your product architecture requires substantial multi-service complexity per tenant, that points away from the instance per customer strategy.


Out of interest are you deploying exactly the same software to each instance? Or are there customisations?

My work deploys an instance per customer, and I haven’t encountered the same problem, but they number in the tens at this stage so not at the same scale yet.

Would love to hear any dos or don’ts you’ve picked up, being further ahead in terms of scale!


They're all exactly the same software, the only customization is configuration (more or less servers, getting data from this or that S3 bucket, etc.).

To be honest, it comes down a lot to the tools you are using; you have to make sure they scale. For example we deploy one CloudFormation stack per customer (3-4 actually, but it's not important), and those are all deployed as part of a single deployment pipeline (think AWS CodePipeline type of thing).

Well, it turns out this pipeline thing does not like deploying to thousands of targets at once and only supports a few hundred at most. We had to refactor the way we do that part because just loading the page with all the targets was taking 5 minutes. We also had to split our targets across 2 pipelines, and soon we'll have to go to 3, and it's a pain in the ass and a big engineering headache every time.

The main issue comes from operations: invest in automation ASAP, and make sure you hold your customers accountable for their own mistakes. Half of our tickets are raised because a customer changed something on their side without thinking of changing the configuration in our system, which then breaks. If we alarmed them instead of ourselves, it would be a lot easier.


I can see how this is horrible for internal tools. The incentives are all messed up.


This model works, but requires that the minimum cost of the stack supporting a single tenant is low enough relative to even the smallest tenant and the revenue stream from them.

Oftentimes an overlooked aspect of this is that, even without a freemium option, that revenue can be 0. Consider all the demos for potential customers, all the examples set up for testing, etc. These will all cost money, and if they can't be shared, then those costs may make it unworkable on a per-instance basis.


Yes certainly, this approach to multi-tenancy is not well-suited for apps that will support a large number of active free users.

The idea is to only allocate a running instance while there is at least one active user of that instance. So for occasional-use apps the ongoing cost per idle customer would be just the storage cost of the database, very close to 0. Obviously for something like a chat app, email app, anything else that people tend to leave open all day, a different approach is better.
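The "deallocate when idle" half can live in the single-tenant app itself. A minimal sketch using the ws package (the grace period and exit-on-idle policy here are illustrative):

  import { WebSocketServer } from "ws";

  const wss = new WebSocketServer({ port: 8080 });
  const IDLE_GRACE_MS = 5 * 60 * 1000;
  let idleTimer: NodeJS.Timeout | undefined;

  function updateIdleTimer() {
    clearTimeout(idleTimer);
    if (wss.clients.size === 0) {
      // No active users: exit and let the platform reclaim the instance.
      idleTimer = setTimeout(() => process.exit(0), IDLE_GRACE_MS);
    }
  }

  wss.on("connection", (ws) => {
    updateIdleTimer();
    // Defer so the closed socket has been removed from wss.clients.
    ws.on("close", () => setImmediate(updateIdleTimer));
  });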


> The idea is to only allocate a running instance while there is at least one active user of that instance.

That first user is going to have a bit of a wait while you turn on their server. Maybe you keep some empties warmed up that just need a reconfig and restart so it’s fast.
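Something like this, as a sketch (bootBlankInstance and configure are hypothetical stand-ins for whatever your provisioning layer does):

  // Illustrative warm pool: keep a few booted-but-blank instances
  // so the first user only waits for a reconfig, not a full boot.
  type Instance = { configure(customer: string): Promise<void> };
  declare function bootBlankInstance(): Promise<Instance>; // hypothetical

  const POOL_SIZE = 3;
  const pool: Instance[] = [];

  async function replenish(): Promise<void> {
    while (pool.length < POOL_SIZE) pool.push(await bootBlankInstance());
  }

  async function claimFor(customer: string): Promise<Instance> {
    const vm = pool.pop() ?? (await bootBlankInstance()); // cold-path fallback
    await vm.configure(customer); // e.g. mount DB, set env, restart app
    void replenish(); // refill in the background
    return vm;
  }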

Personally, I'd only do any type of multi-tenant if the cost of some micro instance behind an auto scaler was negligible relative to the value of the contract anyway.


I wrote a system at work that does this for VS Code instances - on a commodity server, with not much optimisation effort, it goes from a click to the UI starting to appear in about 10s (mostly thanks to Firecracker and Alpine). There's a loading screen that's instant though, and it probes for when the VM is ready.

I think that would work fine for a lot of other apps, at least where you're looking to start a lengthier session.


Have done similar, although 10s seems a bit long - were you using K8s? With our custom VMM we could get similar instances up and routed within 2s w/o K8s.


I'm copying a 4GB root filesystem (on ext4!) per connection, so that's a couple of seconds, more like 10s on my laptop. The boot time for the kernel is about 2s to userspace initialisation. Then the userspace is running node.js to bring OpenVSCode up (2-3s?), starting a couple of other processes, and then the front-end is polling only every 1s.

It's all on 1 server, just a Go https proxy launching firecracker on demand, no k8s.

If I were optimising, I'd first make sure the root filesystem copy ops happened on a CoW filesystem, then change the readiness polling to a "long" poll that's under the server's control. And next, buy a faster server :) But I'm using a relatively heavyweight userspace app, so I'm curious whether you can see other gains to be had.
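Client-side, the long poll I have in mind looks roughly like this (hypothetical endpoint; the server holds the request open until the VM is ready, or returns 204 on timeout so the client re-issues it):

  async function waitUntilReady(sessionId: string): Promise<void> {
    for (;;) {
      const res = await fetch(`/sessions/${sessionId}/ready?timeout=25`);
      if (res.status === 200) return; // ready: swap out the loading screen
      if (res.status !== 204) throw new Error(`unexpected status ${res.status}`);
      // 204 = long poll timed out server-side; loop and ask again
    }
  }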

Our application is using VS Code as an environment for programming exercises, so even 30s would have been fine, since people then use it for 1-2 hours.

If I could get it down to 2s I'd certainly enjoy testing it more!


Ahh fair. I was thinking along the lines of storing the 'base' VM as a memory snapshot and CoWing it per connection to save some initialization overhead perhaps...


If you've ever used enterprise software (SAP, Sage...) you'd know that showing a loading screen is on the table.


They could still be shared. Each customer is an org, not a single user. This model would only work at business pricing levels anyway.

However many environments you have (Dev, Staging, QA, Integration, whatever) is the same as normal. Sales gets an instance, but I've seen that anyway and it's just one more. You can have a public demo. As long as you don't scale O(n) with prospects, it's not a meaningful cost. Even if you did that, I'd bet it's a tiny fraction of your cost of sales.


Right, but that’s the point. Even orgs can be costly when you consider how you plan on sharing infrastructure. My point is these can add up as you have databases, k8s clusters, load balancers, CDN endpoints, etc. so if your strategy doesn’t include driving these costs down on a per instance basis, with idle usage and whatnot, it will become a cost problem quickly.


The implementation makes some weird choices, like rebuilding a bunch of services (DNS, certs) and a weird dependency on SQLite. Wish people would stop reimplementing Kubernetes and just build on top of it.

I think "per-user" is probably the wrong killer feature for something like this. Much more potential in shared distributed processes that support multiple users (chat, CRDT/coauthoring). Appears that the underlying layer can probably do that.

In any case, super cool idea, and I hope something like this lands in the serverless platforms from all the major cloud providers. It's always been mind blowing to me that Google Cloud Functions supports websockets without allowing you to route multiple incoming connections from different users to the same process. That simple change would unlock so many useful scenarios.


Thanks for taking the time to look through the architecture. There are definitely some choices that would have seemed weird to me when we set out to build this, but that we did not make lightly.

We actually initially built this on Kubernetes, twice. The MVP was Kubernetes + nginx where we created pods through the API and used the built-in DNS resolver. The post-MVP attempt fully embraced k8s, with our own CRD and operator pattern. It still exists in another branch of the repo[1].

Our decision to move off came because we realized we cared about a different set of things than Kubernetes did. For example, cold start time generally doesn’t matter that much to a stateless server architecture (k8s’ typical use), but is vital for us because a user is actively waiting on each cold start. Moving away from k8s let us own the scheduling process, which helped us reduce cold start times significantly. There are other things we gain from it, some of which I’ve talked about in this comment tree[2]. I will say, it seemed like a crazy decision when I proposed it, but I have no regrets about it.

The point of sqlite was to allow the “drone” version to be updated in place without killing running backends. It also allows (but does not require) the components of the drone to run as separate containers. I originally wanted to use LMDB, but landed on sqlite. It’s a pretty lightweight dependency, it provides another point of introspection for a running system (the sqlite cli), and it’s not something people otherwise have to interact with. I wrote up my thought process for it at the time in this design doc[3].

You’re right about shared backends among multiple users being supported by Plane. I use per-user to convey that we treat container creation as so cheap and ephemeral you could give one to every user, but users can certainly share one and we’ve done that for exactly the data sync use case you describe.

[1] https://github.com/drifting-in-space/plane/tree/original-kub...

[2] https://news.ycombinator.com/item?id=32305234

[3] https://docs.google.com/document/d/1CSoF5Fgge_t1vY0rKQX--dWu...


Hi Paul, thanks for your explanation - you should add that to the documentation, e.g. in a chapter "Why not K8S?".

Also, you should give some advice about how to deploy when the default for deploying apps in an organization is K8s, which might not be too exotic nowadays. Will Plane need its own cluster? Does it run on top of K8s? What is the relation to K8s in general for a deployment scenario?

THANKS!


Good idea on both counts. Documentation will be one of my priorities over the coming months and it’s great to have feedback on what’s missing.

Re. the “Why not k8s” question, you might enjoy this post from a couple months back; although it only touches on Plane briefly, it shows the framework we used to make the decision. https://driftingin.space/posts/complexity-kubernetes


https://developer.ibm.com/articles/reducing-cold-start-times...

Knative has solved most of those pod start time problems since it’s dealing with a similar scenario, unless 0.008s startup time isn’t good enough for you.


It's funny how SQLite gets so much flak, but every time I've used it in production, it just _worked_.


I don't think I have ever read something negative about SQLite.

I also don't read the GP comment as being negative toward SQLite. It sounds more like the author was surprised about the architecture, since a naive view would think Kubernetes would be good enough.


You can plug in your own scheduler - https://kubernetes.io/blog/2020/12/21/writing-crl-scheduler/

I think you would get a much higher long-term payoff with a custom scheduler. Dask does something like this, both on scheduling and when it has to "drain".


We considered that approach, but even plugging in a scheduler would not allow us to own scheduling end-to-end; there’s still latency introduced by having events go through etcd. In the end the complexity of kubernetes got in our way, and we realized we were using it as a glorified OCI runtime API, so we decided to cut it out.


Super excited to see an open source implementation of this!

I built a similar service for AWS back around 2015 to do web-based pixel streaming applications. That service's legacy still lives on today in many descendants, but I was always bummed that no team was willing to invest in making it generic. Everyone who needed it either forked it or re-implemented around the original design.

Warms my heart to see something like it on the outside. It's a super powerful concept. Great work!


This seems similar to an Elixir/Phoenix use case where you have a GenServer per user. At first glance, it seems like that approach would be functionally equivalent.


Yes, the BEAM/OTP/{Erlang/Elixir} stack is unique in that it provides similar primitives as part of the runtime.

My impression of that approach is that it's good for IO-bound work and stateful business logic, but less so for the CPU/memory-bound applications that we're targeting. I'd love to know if there are counterexamples to that, though. It's admittedly been over a decade since I touched Erlang and I'm not up to date; I'm only peripherally familiar with Elixir and Phoenix.


Yes, for CPU-bound processing on the BEAM you'll want to use a NIF (native implemented function), but that leaves you open to taking down the entire VM with bad NIF code (segfaults, infinite loops, etc). A purportedly safer means to create NIFs is Rustler (https://github.com/rusterlium/rustler), which lets you easily write NIFs in Rust instead of C. I haven't used it but I've heard good things.


This looks super cool! Typically, products/projects claiming "browser is the new OS" either:

* run a full blown browser in the cloud and stream back pixels

* emulate a native application in the browser (like, github.dev)

Both are okay, but per-user backends feel like a nicer primitive to build applications with: apps that run locally in the browser but can access cloud compute/storage on demand.


v1.0: per-user backends for web apps

v2.0: per-user backends for web apps hosted on edge compute.

v3.0: per-user backends “hosted” as desktop software (the ultimate edge cloud)


Interesting, so you're saying there's some way we can run applications on a user's machine without using the cloud/edge/K8s/AWS?


I think they're joking that if you follow the idea through to the extreme, you end up back at familiar old desktop programs (/'apps' now).


I think the comment you replied to understood the joke and riffed on the humour.


I think I understood the riff, and ... oh who am I kidding - yes, thanks, I read it that way too now, after a coffee or two.


:)


Sweet. Where I work we do a process-per-user stateful model and it takes away a heap of issues compared to the more traditional "share everything in the web server and let the RAM blow up" approach.

If the user does nothing the process still ticks along. It may be doing background work or doing nothing. It can keep state and periodically save. It is like a desktop app experience on the web, if you like.

If each user or org is doing their own thing and the service is not too “social” requiring cross interactions (so more like say a CRM than a LinkedIn) I think it is an interesting model.

Let alone slow feature roll outs!


This is great. I think CouchDB kind of has a db-per-user model. I was wondering: what's the best way to persist a session? This kind of per-user model is, I think, super useful for local-first, decentralized apps.

An example case might be to write in a notes app locally, and collaborate on it, and then have it saved and backed up to re-use on different machines. I'm guessing that'd probably be a layer on top of this, or outside of it.


Wow this could be incredibly useful for a lot of applications. I’m excited to see the new wave of tools this could spawn.


I've always thought it'd be neat to use something like this in addition to the broadway[1] backend for GTK+ to stream X11 GUI apps directly and independently for each user.

[1] https://docs.gtk.org/gtk4/broadway.html


Thanks for the link! It's actually rendering to the DOM, huh... Interesting.


This is the same concept that Cloudflare has with Durable Objects. I think it's a really good one.


Yes, DOs have a similar approach. Plane is sort of like DOs but using containers instead of V8 isolates, allowing you to run code that couldn’t run in the browser (either for cpu/memory needs, or because it just won’t compile to JS/wasm).


Hm, yes, that makes sense. If I remember correctly, CF liked isolates because they have a superb cold start time. How does Plane do there?


On the order of a few seconds, which is certainly slower than isolates, but fast enough to e.g. spin up a Jupyter backend in a reasonable amount of time. We're playing around with some ideas to make it even faster, like snapshotting.


Makes sense. Well, this feels like the future, so the more innovation and competition the better! Glad you're here.


Hey Paul & Taylor, congrats on the launch!

Can this be used, for example, to create a collaborative whiteboard app? I am trying to understand the use cases. Does it depend on Docker and Guacamole? Guacamole was mentioned in passing in your video but I haven’t heard of it before.


Hey Matt!

Yes, it could be used to create a whiteboard app by spinning up one backend per whiteboard, and connecting each user to the backend of the whiteboard they are viewing. For data synchronization, where it really shines is in slightly more complex cases where data updates need to be validated (e.g. 3d software or a CAD app), because unlike a traditional pub/sub service which just passes messages through, you can run your own code on the backend.
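As a sketch of what that buys you over plain pub/sub: the backend holds authoritative state and can reject bad updates before anyone sees them (isValid and applyUpdate below stand in for your app's rules):

  import { WebSocketServer, WebSocket } from "ws";

  type Shape = { id: string; [key: string]: unknown };
  declare function isValid(update: unknown, doc: Map<string, Shape>): boolean;
  declare function applyUpdate(doc: Map<string, Shape>, update: unknown): void;

  const wss = new WebSocketServer({ port: 8080 });
  const shapes = new Map<string, Shape>(); // authoritative whiteboard state

  wss.on("connection", (ws) => {
    ws.on("message", (raw) => {
      const update = JSON.parse(raw.toString());
      if (!isValid(update, shapes)) return; // reject before broadcasting
      applyUpdate(shapes, update);
      for (const client of wss.clients) {
        if (client.readyState === WebSocket.OPEN) {
          client.send(JSON.stringify(update));
        }
      }
    });
  });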

It depends on Docker, but Guacamole is only used for the demo. Guacamole is what lets us run Firefox inside Docker and then view it inside another browser so that we can test an install without having to configure a bunch of DNS and certificate stuff.


Really excited by projects in this space! I'm working on a similar project myself [1], intended mainly for collaborative web apps and browser games. It's more of a lightweight approach, with processes instead of containers: pop in a binary or script and let multiple users connect to a shared process via websockets. The process-based architecture limits its use to simpler cases, but these kinds of apps are delightful to spin up and develop.

[1] https://www.scalesocket.org/


Feels like we are kinda back to using inetd


Say more!


inetd spawned a process for each incoming request/connection. This seems kinda similar except instead of a process it's a whole container.


It is similar! Years ago I first tried websocketd, which was explicitly inspired by initd, and wished I could build applications on top of something like it. Plane is sort of a natural evolution of that.

Of course, one of the big differences with initd is that it runs on a cluster of machines instead of locally, which turns out to be most of the difficulty.

http://websocketd.com/
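The core trick is small. Here's a toy websocketd-style bridge in Node (illustrative only, not websocketd's actual code): one child process per connection, with stdio piped over the socket.

  import { spawn } from "node:child_process";
  import { WebSocketServer } from "ws";

  const wss = new WebSocketServer({ port: 8080 });
  wss.on("connection", (ws) => {
    const child = spawn("./handler.sh"); // any program speaking stdin/stdout
    child.stdout.on("data", (chunk) => ws.send(chunk.toString()));
    ws.on("message", (msg) => child.stdin.write(msg + "\n"));
    ws.on("close", () => child.kill());
    child.on("exit", () => ws.close());
  });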


Correction (you probably know this but for the rest): it's "inetd" not "initd".


Oops, thanks. Too late to edit now.


Streaming pixels usually means that the applications are not very accessible to any assistive technology imaginable. How does Jamsocket/Plane solve that?


Yes, this is definitely a problem for fully pixel-streamed apps. Although we’re able to pixel-stream an existing X11 app, our vision is that new applications will be rendered by the DOM (so that they support any a11y features the browser does) and pixel-stream only the component that benefits from pixel streaming. For example, in a visual effects tool, only the rendered video frame is compute heavy, so only that component would be rendered remotely.

In addition to the a11y benefits, this fits our general vision of how apps should work, which is that you shouldn’t need a round-trip to the edge to do a UI paint.


Most messaging apps and almost all bi-directional websocket-based apps do something very similar: mobile apps/webapps become thin clients with a websocket.

That said, it's nice to abstract it away, and using a container makes it very generic, which is nice.

Will definitely look into it for a project.

Btw, how decentralized can the servers be? Potentially, can anyone join in as a server? Discovering peers will be another interesting challenge.


From an operational perspective this seems like a nightmare. Is it fair to characterize this as SPOF (single point of failure) as a service?

Does the service expose time series metrics?

How would I detect and remedy a hot shard?

Are resource caps well defined beforehand/are all instances expected to have similar resource consumption?

What would administratively draining an instance look like?


Re. handling failure, we leave that up to an application/framework layer decision. When the backend is used for program state, the common approach is an auto-save loop that periodically persists state externally (asynchronously). If the backend is only used in a read-only way, the approach is to just recreate it on failure with the same parameters.
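In code, that auto-save pattern is roughly this (names are hypothetical, e.g. persisting to object storage):

  type DocState = unknown;
  declare function loadInitialState(): DocState; // hypothetical
  declare function applyUpdate(s: DocState, u: unknown): DocState; // hypothetical
  declare function persistExternally(s: DocState): Promise<void>; // e.g. S3 write

  let state = loadInitialState();
  let dirty = false;

  function mutate(update: unknown) {
    state = applyUpdate(state, update);
    dirty = true;
  }

  setInterval(async () => {
    if (!dirty) return;
    dirty = false; // persist asynchronously; clients never wait on this
    await persistExternally(state);
  }, 10_000);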

In general, Plane backends are meant to be used with thick clients, so there’s also the option to treat clients as nodes in a distributed system for the purpose of failure. If the server goes down and is replaced, when it comes back up, the nodes could buffer and replay any messages that may have been lost during the failure. Over time as we see patterns emerge, we may create frameworks from them (like aper.dev) to abstract the burden away from the application layer.

Time series metrics are exposed through Docker’s API, collectors for it already exist for various sinks. We will soon be sending some time series metrics over NATS to use internally for scheduling, but the Docker API will be better for external consumption because the collector ecosystem is already robust.
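For example, pulling a one-shot stats snapshot from Node with dockerode looks roughly like this (from memory, so treat the field names as approximate):

  import Docker from "dockerode";

  const docker = new Docker(); // defaults to /var/run/docker.sock
  const container = docker.getContainer("some-backend-container"); // hypothetical id
  const stats = await container.stats({ stream: false }); // single snapshot
  console.log(stats.memory_stats.usage, stats.cpu_stats.cpu_usage.total_usage);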

Resource caps can be defined at “spawn time”. Instances are not expected to have similar consumption, but the scheduler is not yet very smart; our current approach is admittedly to overprovision. The scheduler is a big Q4 priority for us.

Draining currently involves terminating the “agent” process on the node, which stops the drone from advertising itself to the controller. Traffic still gets routed to backends running on that drone. We have an open issue[1] to implement a message to do this automatically.

[1] https://github.com/drifting-in-space/plane/issues/129


> We will soon be sending some time series metrics over NATS to use internally for scheduling.

For what purpose?

> Re. handling failure

There are several operations that should be near-seamless and very well thought out/handled reasonably, including for the pieces of Plane itself:

  Push new code
  Roll back code
  Push Canary code
  shutdown -r now
  add machine
  remove machine
  cluster wide restart
And for persistent data:

  replace master
  add replica
  backup/restore backup
Just about any product is going to have to implement some version of those things, so it seems like there should be a very well-thought-out story for each of them under the various conditions that prevent standard architectures.

A single point of failure is an extremely convenient architecture, but it is also a brittle, pain-causing architecture that will resist scaling, and the cleanness of operations is probably the best window to assess that.

As far as the architecture itself goes, why choose to use DNS rather than header-based information/cookies? Why let the client choose the backend rather than hiding that as an infrastructure-side implementation detail?


This looks great; Congrats, Paul!

I'm curious if this can be used to stream an entire X11 Linux desktop to the browser?


Yep! (It's annoyingly nontrivial to do with GPU acceleration, but we'll have a guide out as soon as we figure it out.)

(I work at drifting in space)


That's excellent! I'm looking forward to seeing that. I'm curious how the "pixel-streaming" works in terms of what protocol is used and what technologies are used on the browser-side.


You might enjoy this talk that Abhishek (@pretentious7) gave on it: https://www.youtube.com/watch?v=GZBuYjy5rWE


Thank you, interesting talk! I'm curious if the technologies used for streaming video games (e.g. something like moonlight) can be used for low-latency and high-fidelity streaming of applications or desktops within containers.


Very awesome, this should unlock some pretty cool use cases.


mmmh



