Launch HN: Fly.io (YC W20) – Deploy app servers close to your users

tomcam · on March 18, 2020

Fascinating idea. I’d like gently to suggest that you make this your elevator pitch, since I don’t care about who you are. It’s pretty much what you said, but up at the top:

“fly.io is really a way to run Docker images on servers in different cities and a global router to connect users to the nearest available instance. We convert your Docker image into a root filesystem, boot tiny VMs using an Amazon project called Firecracker, and then proxy connections to it. As your app gets more traffic, we add VMs in the most popular locations.”

Exciting stuff! My best to you!

dmix · on March 18, 2020

Agreed, that one sentence line is perfect and not fully clear on the website. A lot of these cloud services keep it quite vague for some reason, which I somewhat understand for the whole 'CTO marketing'. But the early adopters will appreciate it and they are who matter the most early on.

battery_cowboy · on March 18, 2020

I wish more marketing sites just had a description like the one in the root comment, usually I have to read a bunch of docs before I get a grasp on what a product does when they introduce them on HN, if I could just read a description like this I could either look at more info or ignore it if I don't care about it.

Instead, I'm trying to make sense of APIs to figure out what a product called Floozbobble.io does and it turns out that it's SaaS for making SaaS product factory factories, and I don't care about that, but then some other product called Dizmeple.cloud comes out that makes it easier for me to manage my database deployments, which I do care about, and I can't tell I would want it because it has no fucking description!

When did we start to prefer these crap marketing sites that take 12,000 spins on my scroll wheel to get through and still don't tell you anything?

karljtaylor · on March 19, 2020

when companies decided it was too hard to run data driven marketing programs?

mrkurt · on March 18, 2020

This is actually one of my favorite things about Hacker News. I end up redoing landing pages each time we get comments like this.

dathinab · on March 18, 2020

> 2. Max monthly spend: unexpected traffic spikes happen, and the thought of spending an unbounded amount of money in a month is really uncomfortable. You can configure fly.io apps with a max monthly budget, we'll suspend them when they hit that budget, and then re-enable them at the beginning of the next month.

I like this, not having caps is a major problem with some of your competition for smaller projects/companies where the max caps are more important then availability.

I heard of more then one project which mad some mistake them self wrt. some code generating request in their client. Or had some other reason why they had insane usage spikes, causing them to basically go bankrupt in a mater of a view hours. Not even days. (Through one project got lucked out as if I remember correctly Amazone bailed them out to prevent bad press).

IMHO for the majority of server application availability is important but only up to a certain cost. After which unavailability for some time is better, even if you lose some customers. (Yes, like always there are exceptions).

ar0b · on March 18, 2020

I don't think I like the failure mode of your app getting slower (losing your ApplicationDN) right at the time it's really popular. I don't have a better solution though.

nine_k · on March 19, 2020

Unless your popularity pays your bills, it's the most graceful degradation you can have.

michaeldwan · on March 18, 2020

This is tricky since every app has different performance and budget constraints. We try to minimize footguns by starting with sensible defaults. Over time we'll provide more options so you can tune scaling and placement yourself. We're also happy to help if you need!

capableweb · on March 18, 2020

> I don't have a better solution though

Don't host in such a way that you're paying for traffic... Hetzner, OVH and Packet all have dedicated servers where you don't pay for the traffic, inbound or outbound.

Edit: judging by other comments here, it might seem like US zone of Fly.io is in fact hosted in Packet so they are probably themselves not paying for the traffic. Maybe they are using Hetzner for the EU zone (or OVH for that matter).

mrkurt · on March 19, 2020

Packet charges for outbound bandwidth. The places you can get close to free on outbound bandwidth don’t give you the ability to do anycast and tend to over subscribe their networks.

We’d like to grt network prices down but we can’t run our service on ovh or Hetzner.

namibj · on March 19, 2020

Vultr does BGP if you have a routable subnet and AS. Just open a support ticket.

mrkurt · on March 19, 2020

Yep! We've been experimenting with Vultr, their physical servers are only in 7 cities. I hope they expand.

Sebb767 · on March 19, 2020

But your server going down because of high load instead of being shut down by a bill breaker doesn't make that much difference.

Sebb767 · on March 19, 2020

The solution is to set your limit pretty high (100 times your average or so) or, alternatively, have shorter limit intervals - 100 times your daily bill is still a lot less harmful than 100 times your monthly bill.

The only 'real' solution is proper alerting, but even then it's pretty easy to rack up a bill of several thousand dollar before anyone realizes what's going on.

ericlewis · on March 18, 2020

maybe if you go over, you automatically add ads to the page or something.

tr33house · on March 19, 2020

This would be terrible

mrkurt · on March 19, 2020

Ads would be terrible but I'm really interested in the whole idea of "do something else when an app os over budget". I hadn't considered much except a 402 status code, which seems kinda mean: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/402

nine_k · on March 19, 2020

A "donate" button when the current remaining budget would last for less than 100 minutes.

Thaxll · on March 18, 2020

What problem does it solve? Because latency is currently not an issue with all the regions from current cloud providers from my perspective. And for all static stuff you can use a CDN that has pop all arround the world / cities.

Not sure I understand the use case of a single Docker image in a city outside of your entire backend services, especially the DB. If your Docker image talks to something else on AWS / GCP for example you add a lot of latency using public routes.

It looks more like: https://workers.cloudflare.com/

michaeldwan · on March 18, 2020

There's a whole bunch of use cases people have today that don't require a database. Here's a few that I'm excited about:

  - image, video, audio processing near consumers
  - game or video chat servers running near the centroid of  people in a session
  - server side rendering of single page js apps
  - route users to regional data centers for compliance
  - graphql stitching / caching (we do this!)
  - pass through cache for s3, or just minio as a global s3 (we do this!)
  - buildkite / GitHub Action agents (we do this!)
  - tensorflow prediction / DDoS & bot detection
  - load balance between spot instances on AWS/GCP
  - TLS termination for custom domains
  - authentication proxy / api gateway (we do this!)
  - IoT stream processing near devices

CloudFlare Workers is a fantastic serverless function runtime, but we're a layer lower than that. You can actually build it on top of fly.

edit: formatting

lvh · on March 18, 2020

This basically bumps the amount of behavior you can stuff into the edge by several orders of magnitude. Previously, Lambda@Edge will like, maybe validate a JWT. Now, you can put like half your app in there. It sounds like a tiny incremental change but it's big enough that it simply redefines what's even possible in a POP.

quickthrower2 · on March 18, 2020

I'm interested, what is now possible with this? Is it high bandwidth VR sort of stuff?

strombofulous · on March 18, 2020

This is not an area I'm in a lot, so I may be wrong, the cloudflare workers page GP mentioned says that you only get 10ms CPU time per request (or 50ms with the paid plan). I'm assuming lambda etc have similar limits. So presumably anything that takes longer than 10/50ms.

ericlewis · on March 18, 2020

lambda can run up to 30m I think (idk about lambda@edge). CF Workers fail when trying to use something like node-unfluff, because the CPU time takes too long.

This seems to me, like a more controllable lambda@edge.

mrkurt · on March 18, 2020

The containers on fly.io can go up to 8CPUs. So you can do a lot of computation for images, video etc.

They can also accept any kind of TCP traffic (and we're trialing UDP), so lots of interesting network services. This is especially interesting for people who want to do live video.

AND we have disks. So you can deploy Varnish, or nginx caches, etc. This is something we enable by hand per app.

estebarb · on March 19, 2020

I didn't find documentation about the disks: size, ssd/hdd, price?

mrkurt · on March 19, 2020

Ah, I didn't make this obvious but that's a feature we're currently testing. We can enable ephemeral disks on apps, but it's not generally available. They're local SSDs, size is variable (still figuring that out). They're free right now. ;)

If you want to try them out, you can create an app and then an email to either me or support at fly.io and we'll turn them on for you.

manigandham · on March 18, 2020

They originally had (and seem to still have) the same JS runtime at the edge to give you a smart CDN/reverse proxy.

They just updated it from being JS only to being able to run any Docker image. Cloudflare gives you a persistent key/value store and Fly provides a non-persistent Redis cache.

You don’t have to move your entire app but there are plenty of use-cases where you can move more logic to the edge.

mrkurt · on March 18, 2020

We’re moving the JS apps to just run on Deno containers. Deno is fabulous.

sjroot · on March 18, 2020

This is the most intriguing thing for me, perhaps second only to the fact that you guys are working with Rust. I have followed Deno since it was announced and it has been fascinating to watch it evolve. How much Deno code do you all have working in production right now?

mrkurt · on March 18, 2020

Only a little. It works very well, once we port our edge js library (https://github.com/superfly/edge) to Deno we will move over some things handling hundreds of millions of requests per day.

chrisweekly · on March 19, 2020

Hats off, @mrkurt. This is the kind of thing HN, and YC, are all about at their very best. Absolutely love the way this idea, executed right, helps improve the deployment topography landscape, so to speak. Between service workers, Cloudflare workers, and now fly.io nodes at the edge, the degree of control over where and how your application executes across the network in relation to clients is kind of exhilarating -- at least for this 22-year veteran of web-related architecture. Bravo! And good luck!

mrkurt · on March 19, 2020

Ahhhh thank you! That's an incredibly nice thing to say.

jassmith87 · on March 18, 2020

We've been using fly.io for a month now. They are amazing. The service is great, the team is great, and most importantly our apps have seen a dramatic performance uplift by leveraging fly. A++ would use again.

Full disclosure: I am another YC founder. Fly did not ask me or encourage me to post this in any way.

bmm6o · on March 18, 2020

What kind of apps are you using it for? How much data do you have to move to the edge?

michaeldwan · on March 18, 2020

I'm not jassmith87, but they're making a super slick app builder at glideapps.com and using fly for custom domains

ignoramous · on March 18, 2020

> ...builder at glideapps.com

Nice. Here's their launch-hn: https://news.ycombinator.com/item?id=19163081

a13n · on March 18, 2020

If one API request makes on average 5-10 round trips to the database, and the database is in Virginia, this only makes the problem (much) worse. How do you solve this problem for this use case?

michaeldwan · on March 18, 2020

We're not solving db latency yet. A good place to start is aggressively caching at the edge. We offer an in-memory redis cache for this that can replicate commands globally. Beyond that you'd need read replicas which will be possible once we launch persistent storage. That said, latency between data centers on the same continent is often less than I would have thought!

a13n · on March 18, 2020

Gotcha, honestly that feels pretty niche to me, which might be a good place for a startup to start.

I can't think of many back-end applications between purely static content (just use a CDN) and needs a database connection. Probably video game servers, where you don't need the game state to be (immediately) stored/accessed globally.

mrkurt · on March 18, 2020

Game servers are a great example. What's interesting is how many different kinds of apps need game-server like infrastructure: https://www.figma.com/blog/how-figmas-multiplayer-technology...

We've been talking to a lot of startups doing communications tools, especially for remote work.

Lots of full stack apps benefit from app servers + redis cache in different regions. They need a database connection, but if they're already done the work to minimize DB round trips they might just work with no code changes.

There are also a bunch of folks doing really dynamic video and image delivery. Where an individual user gets an entirely unique blob of binary data.

ShroudedNight · on March 18, 2020

I could see IoT device 'acceleration' to be a significant potential use-case. Something with a tiny bill of materials for the device itself, offloading any non-trivial processing to a virtual device on the closest 'real' infrastructure you can get. Especially for something human-interactive, you would want to be very aggressive about minimizing latency.

Also, depending on how tight the limits are for VM lifetime / bandwidth / outbound connections, I could see using these as a kind of virtual NIC / service mesh type thing for consumer-grade internet connections, to restore the inbound routing capabilities precluded by carrier-grade NAT and avoid their traffic discrimination, as well as potentially on-boarding to higher-quality transit as early as possible for use when accessing latency-sensitive services further 'interior' to the cloud.

mrkurt · on March 18, 2020

These are great. IoT seems like a thing you could do but that's a really specific use case I hadn't even considered.

The second example would be interesting to try. There's no real limit on VM lifetime or outbound connections, bandwidth is more of a budget problem. VMs are ephemeral, so they _can_ go away but we're all happier if they just run forever.

slashdev · on March 18, 2020

You should solve this by making one request to a db api layer (use GraphQL!) And have that layer do the back and forth with the db, from right next to it.

Holding transactions open for long distance round trips is going to get you into trouble in a myriad of ways. It does not scale.

michaeldwan · on March 18, 2020

This is exactly what we do! We have a Rails app exposing a graphql api for our cli and web app. The web app is a React SPA monster that we're replacing with another Rails app that runs at the edge and serves the customer UI, docs, marketing pages, and some other things. It's really pleasant. We love graphql.

jpochtar · on March 18, 2020

Why not just have the rails app exposing GQL also expose the customer UI/etc?

michaeldwan · on March 18, 2020

Mostly because the path to here was not a straight one :)

Our main Rails app does a bunch of things that can't run globally and it takes a long time to build while the customer facing app is a lightweight Rails app that consolidated several static and single page react apps into one less gross place.

mrkurt · on March 18, 2020

That's actually the plan! The Ruby Graphql gem lets you resolve queries directly so the app can have an "edge" or "core" mode and make the right call (edge over HTTP, core direct to DB).

mwcampbell · on March 18, 2020

One good solution might be something like Datomic (the classic version, now called Datomic On-Prem), where each application process has the query engine embedded in it, and it pulls data from a relatively dumb storage service such as DynamoDB as needed, then aggressively caches it.

mrkurt · on March 18, 2020

That would work great. Even something like FaunaDB would work really well.

northstar702 · on March 18, 2020

You'd want to pair up with a global database like FaunaDB or Cosmos DB for the data backend.

kasey_junk · on March 18, 2020

Fly.io is the service I’ve been most excited to see make headway in a long time.

Pairing this with a global sql (cough cockroach cough*) is literally the app platform I’ve been dreaming about.

I would like to see more documentation around push based architectures. That is I want to build a system where a process pushes to the Redis in fly but is not running itself in fly. Basically something that may be unrouteable for pulls.

In any case congrats fly team!

michaeldwan · on March 19, 2020

Thanks for the kind words! Cockroach is awesome and it's something we'd like to offer someday. Until then checkout FaunaDB.

Your push example is interesting. We don't have a way to connect to redis from outside fly, but you could certainly boot up a tiny app on fly that acts as a proxy from external apps into the fly redis.

kasey_junk · on March 19, 2020

Yeah that’s what I ended up messing with, it works for sure.

lvh · on March 18, 2020

I'm so glad you've launched. I loved the entire thing every time you've explained it. Absolutely chuffed to see Wireguard and Firecracker getting deployed "in anger". Tensorflow@Edge sounds like the good kind of bananas to use it with.

Your quickstart being called speedrun is too good for you alone to have it, so I'm stealing it the next chance I get.

Either way congratulations Kurt & team! (and no I still have not bought that Safari 911 or any 911 for that matter and yes professional help is being sought).

michaeldwan · on March 18, 2020

Thanks for the kind words!

ThePhysicist · on March 18, 2020

Who do you peer with in Europe? I see in the US you use Packet (https://www.packet.com/) but I couldn't find any information about the EU, or do you focus on the US market for now?

We're also a LIR and want to build an anycast network (we anonymize streaming data), any helpful resources you can share on this?

Cool product btw! I think this will be a very interesting area in the coming years, the fact that you offer Docker containers is a good USP as compared e.g. to Cloudflare workers, we might even consider using your service ourselves if you provide service in Europe (our customers are mostly in Germany)!

mrkurt · on March 18, 2020

We actually have servers in North America, Europe (including Frankfurt), and Asia Pacific. The complete list is here: https://fly.io/docs/regions/#discovering-your-applications-r...

Building an anycast network is expensive. That's part of what we want to make accessible to devs. There are a couple of companies (like Packet, and possibly Vultr) you can lease servers from that will handle anycast. These tend to get you into the same ~16 regions, expanding past those can be difficult and even more expensive. That's what we're working on now.

dmix · on March 18, 2020

Having a more prominent 'locations' or 'regions' link on the website navigation/homepage might be helpful. It was the first thing I looked for even before pricing. It's currently a bit hidden in the docs.

I know locations are probably not super important to you guys as you see it as a starting point or something super flexible, but I always find myself drawn to the concrete stuff like that. Largely as a measure of how mature the product is.

CraftThatBlock · on March 18, 2020

Also a map would be a great visualization. A list doesn't really tell me much without a frame a reference

mrkurt · on March 18, 2020

Good call. We have a neat visualization we want to build to show what's happening, but we don't need to wait on that to make it more obvious.

ThePhysicist · on March 18, 2020

Cool, thanks! I know Vultr but I'm looking for a second provider to fulfill the multi-homing requirement. And yes it's expensive and I think it's a valuable service to developers you're offering!

udkl · on March 18, 2020

Curious - In what way would building an Anycast network be expensive ?

mrkurt · on March 19, 2020

Running your own bgp in multiple data centers requires some reasonable network engineering skills. Anycast, specifically, is a complex beast. If you use multiple transit providers, you have to continuously tweak things to make sure people aren’t getting weird routes. Network providers like to send people over the cheapest (in dollars) routes sometimes, which makes things slow.

cfontes · on March 18, 2020

No south america :(

michaeldwan · on March 18, 2020

Yeah not yet. :sob: We're going to expand there and India as fast as we can.

cfontes · on March 19, 2020

Good to know, this looks great!

Btw any plans to support Java applications?

jeromegn · on March 19, 2020

If you can build it in a Docker image or if you can deploy it to Heroku, then we support it :)

If it doesn't work, it's a bug.

cfontes · on March 19, 2020

Awesome, I had a misunderstanding it was only GoLang because of your github repos, get ready to get filthy rich with this.

If you ever come to SA, never partner with Localweb ( biggest/major local server provider ) they are garbage.

ignoramous · on March 18, 2020

Congratulations on the launch! I've been following fly.io ever since I stumbled on it 2 years ago.

A few questions, if I may:

> We run a mesh Wireguard network for backhaul, so in flight data is encrypted all the way into a user application. This is the same kind of network infrastructure the good content delivery networks use.

Does it mean the backhaul is private and not tunneling through the public internet?

> fly.io is really a way to run Docker images on servers in different cities and a global router to connect users to the nearest avaible instance.

I use Cloudflare Workers and I find that at times they load-balance the traffic away from the nearest location [0][1] to some location half-way around the world adding up to 8x to the usual latency we'd rather not have. I understand the point of not running an app in all locations esp for low traffic or cold apps, but do you also "load-balance" away the traffic to data-centers with higher capacity? If so, is there a documentation around this? I'm asking because for my use-case, I'd rather have the app running in the next-nearest location and not the least-load location.

> The router terminates TLS when necessary and then hands the connection off to the best available Firecracker VM, which is frequently in a different city.

Frequently? Are these server-routers running in more locations than data centers that run apps?

Out of curiosity, are these server-routers eBPF-based or dpdk or...?

> Networking took us a lot of time to get right.

Interesting, and if you're okay sharing more-- is it that the anycast setup and routing that took time, or figuring out networking wrt the app/containers?

Thanks a lot.

[0] https://community.cloudflare.com/t/caveat-emptor-code-runs-i...

[1] https://cloudflare-test.judge.sh/

kentonv · on March 18, 2020

Hey, I'm the tech lead of Workers. I don't want to intrude too much on this thread, but just wanted to say: we don't do any special load-balancing for Workers requests; they are treated the same as any other Cloudflare request. We use Anycast routing (where all our datacenters advertise the same IP addresses), which has a lot of benefits, but occasionally produces weird routes. Often this relates to specific ISPs having unusual routing logic that, for whatever reason, doesn't choose the shortest route. We put a lot of effort into tracking these down and fixing them (if the ISP is willing to cooperate). We do sometimes re-route a fraction of traffic away from an overloaded datacenter by having it stop advertising some IPs, but if the internet is working as it should, that traffic should end up going to the next-closest datacenter, not around the world. When you see requests going around the world, feel free to file a support request and tell us about your ISP so we can try to track down the problem and fix it.

mrkurt · on March 18, 2020

> Does it mean the backhaul is private and not tunneling through the public internet?

Backhaul runs only through the encrypted tunnel. The Wireguard connection itself _can_ go over the public internet, but the data within the tunnel is encrypted and never exposed.

> I use Cloudflare Workers and I find that at times they load-balance the traffic away from the nearest location [0][1] to some location half-way around the world adding up to 8x to the usual latency we'd rather not have. I understand the point of not running an app in all locations esp for low traffic or cold apps, but do you also "load-balance" away the traffic to data-centers with higher capacity?

This is actually a few different problems. Anycast can be confusing and sometimes you'll see weird internet routes, we've seen people from Michigan get routed to Tokyo for some reason. This is especially bad when you have hundreds of locations announcing an IP block.

Server capacity is a slightly different issue. We put apps where we see the most "users" (based on connection volumes). If we get a spike that fills up a region and can't put your app there, we'll put it in the next nearest region, which I think is what you want!

CDNs are notorious for forcing traffic to their cheapest locations, which they can do because they're pretty opaque. We probably couldn't get away with that even if we wanted to.

> Frequently? Are these server-routers running in more locations than data centers that run apps?

We run routers + apps in all the regions we're in, but it's somewhat common to see apps with VMs in, say, 3 regions. This happens when they don't get enough traffic to run in every region (based on the scaling settings), or occasionally when they have _so much_ traffic in a few regions all their VMs get migrated there.

> Interesting, and if you're okay sharing more-- is it that the anycast setup and routing that took time, or figuring out networking wrt the app/containers?

Anycast was a giant pain to get going right, then Wireguard + backhaul were tricky (we use a tool called autowire to maintain wireguard settings across all the servers). The actual container networking was pretty simple since we started with ipv6. When you have more IP addresses than atoms in the universe you can be a little inefficient with them. :)

(Also I owe you an email, I will absolutely respond to you and I'm sorry it's taken so long)

e12e · on March 18, 2020

> Wireguard + backhaul were tricky (we use a tool called autowire to maintain wireguard settings across all the servers).

I'm guessing that's? https://github.com/geniousphp/autowire

Looks like it uses consul - is there a separate wireguard net for consul, or does consul run over the Internet directly?

mrkurt · on March 18, 2020

Consul runs over a different connections with mutual TLS auth. That's the project we use!

alexeldeib · on March 18, 2020

Any chance you have more details on GP's question about the tech basis of the router (ebpf, dpdk)? I didn't find this component among the OSS in the superfly org.

mrkurt · on March 18, 2020

Doh, missed that. We're not doing eBPF it's just user land TCP proxying right now. This will likely change, right now it's fast enough but as we get bigger I think we'll have more time to really tighten up some of this stuff.

anaganisk · on March 19, 2020

Let's Encrypt was introduced so that money will never be a barrier/excuse for https. In these days it must be a default feature. Pocketing half for something generated for free is not a good signal for me. Yes other half is donated but, isn't that supposed to be optional? Let's Encrypt is supported by huge organizations. And on fly.io, The customer is already paying for compute.

mrkurt · on March 19, 2020

This is a reasonable take. Let’s Encrypt is amazing and we don’t want to diminish their importance at all.

We charge for certificates because the infrastructure to make SSL work (even when the certificates themselves are free) is complicated.

Managing certificate creation can be tricky, we have to deal with all kinds of edge cases (like mismatched A and AAAA records breaking validation). We also generate both RSA and ECDSA certificates, have infrastructure for ALPN validation, and a whole setup for DNS challenges.

And then we have to actually use them. We run a global Vault cluster to store certificates securely, and then cache them in memory in each of our router processes.

The developers who use the certificates the most love paying us to manage certs, and one person who posted in the comments here was able to replace an entire Kubernetes cluster they were using to manage certificates for their customers.

When Let’s Encrypt invalidated millions of certificates a few weeks ago, none of our customers even noticed. That’s what they’re paying us for.

SGran · on March 19, 2020

Sarah from Let's Encrypt here. We certainly understand the infrastructure and engineering costs associated with managing TLS/SSL. Fly.io has given back for years to help make our work possible and we appreciate that!

ignoramous · on March 19, 2020

This is a great answer and imo should go in your FAQ [0] because charging for let's-encrypt certs does come off as disingenuous especially when AWS, Netlify, and Zeit and other services offer to do so, for free, despite them having to maintain a PKI which isn't exactly a walk in the park (like you point out).

[0] You are missing a FAQs page.

mrkurt · on March 19, 2020

Good call. We actually put up a blog post with some answers: https://fly.io/blog/fly-answers-questions/

lpellis · on March 18, 2020

The max monthly spend is awesome, I'l probably try it out just because of that :) Its a bit unclear to me though how exactly the heroku deploy works? Is it basically a replacement for the web dynos that heroku provides, but then still connecting to existing postgres instances for example? What are the limits for the redis store? I'm using it on heroku but constantly running into max connection issues, if you can improve that experience it is also a great win.

michaeldwan · on March 18, 2020

Glad to hear :)

You're exactly right about the Heroku deploy. We convert your app's slug to a Docker image and launch the web process in it. DB & other dynos still run on Heroku.

We don't have any hard connection limits on the redis cache. It's usually not an issue anyway since apps are often distributed across many regions and many redis servers.

wasd · on March 18, 2020

you might be running into max connection issues because your library is leaving stale connections. i know this is the case for ruby & php. check if you have timeout set. see this github issue for ruby: https://github.com/redis/redis-rb/issues/524

lpellis · on March 18, 2020

I have the timeout set to 10 seconds which certainly helps, I think the issue is how gunicorn/gevent handle web requests. I think each request spawns a new redis connection, and as far as I can see there is no global pool I can use :( On heroku you are limited to 20 connections in the free tier, and it quickly gets expensive.

michaeldwan · on March 18, 2020

Oh yes... I remember bitterly upgrading to a larger size just so >20 goroutines could use the same ~1mb of cached data.

lnsp · on March 18, 2020

Have been building a similar project in the past few months called Valar (https://valar.dev, it uses gVisor instead of Firecracker and is still in private beta), but I prioritized university studies so it took too much of my time to actually release it publicly. Great to see a similar product being released, really looking forward to test it. Best of luck to you!

mrkurt · on March 19, 2020

How are you liking gVisor? We started with Firecracker and only spent a little time with gVisor, but it seems really nice.

lnsp · on March 19, 2020

I discovered it when looking into runtimes for my Bachelor's thesis. So far it's serving me quite well, especially after they reworked the Sentry file system abstraction (when I started file access was horribly slow). Networking works very well although they reimplemented it themselves. It also allows me to do base image layering using Overlay since I only keep binaries/source code/assets after a successful build.

mrkurt · on March 19, 2020

Overlay is a nice feature, we had to give that up with Firecracker. We pre-optimize filesystems instead, tgz them, and then cache them in various regions. Boot times are _insanely_ good, which we like, although a lot of apps (especially node apps) are slow to start.

ignoramous · on March 21, 2020

> ...which we like, although a lot of apps (especially node apps) are slow to start.

Surprising since NodeJS routinely comes up as the fastest runtime in Lambda benchmarks, especially for cold-starts: https://levelup.gitconnected.com/aws-lambda-cold-start-langu...

chrisjarvis · on March 18, 2020

Hi, this is a very cool product and I am planning a project that I think could use it.

The question I have tho: How do you take advantage of the gains from this if you still need one master strictly consistent db for writes?

Would a system design pattern to take advantage of fly.io be to have read only replicas on each geographic deploy or to only have region specific persistance? Apologies if this was already answered I read thru everything I saw. Thanks!

mrkurt · on March 18, 2020

The "simplest" gains come from adding an in memory cache, we include Redis for this and some apps work really well just leaving the DB where it is, caching aggressively, and running close to users: https://fly.io/docs/redis/

Read only replicas are a great first step for most applications. I'd probably do caching first, then replicas (which are kind of like caching).

Region specific persistence is one way to improve write latency, and I think the simplest for most apps. We've experimented with CockroachDB for this (it keeps ranges of rows where they're most busy), and you can actually deploy MongoDB this way.

chrisjarvis · on March 18, 2020

Thanks that totally makes sense! I look forward to playing around with this.

awoods187 · on March 18, 2020

PM@cockroach labs here. Which tools did you experiment with? We've been working to increase our tooling capabilities!

mrkurt · on March 19, 2020

We got hung up with the migration tooling for popular frameworks. If we can get those migrations to work with minimal drama, we want to basically show people “global full stack” with app + cache + database.

dan003400 · on March 18, 2020

Just switched from Heroku in just a few clicks and now I am running in 5 regions with auto-scaling for free :)

michaeldwan · on March 19, 2020

Awesome to hear!

yingw787 · on March 18, 2020

Actually, one more question...do you guys scale compute and data layers separately, or are they tightly coupled within the same container?

I was looking at containerized PostgreSQL on AWS because I want to colocate a job scheduling tool (pg_cron) with the database process, but RDS doesn't support that extension. Apparently (or at least I hope), ecs-cli compose supports docker volumes through EBS, which is the same base as EKS persistent volumes. There's next to no information for ECS + EBS though, everybody uses EC2 or full on EKS.

I was just thinking, if you needed to handle excessive read load on small quantities of data, having separate data layers would enable you to autoscale db instances while still having the same volumes, instead of using an entirely separate caching layer which could introduce bugs and increase maintenance overhead. If you guys had native HA with docker exec access and passed savings to consumers that would be huge for me and my use cases.

sudhirj · on March 18, 2020

I’m experimenting with this, they have Redis at every edge, with a way (SELECT 2) to send commands to all edges with eventual consistency. No RDBMS yet, said they’re looking at CockroachDB.

I’m running a single central Postgres server on Heroku and planning to use the Redis edges to cache.

mrkurt · on March 18, 2020

Right now we're best suited for app servers, databases won't (yet) run very well on fly.io. We are trying really hard to focus on what we have because it's so valuable but we love DBs so much we might end up trying to "solve" them soon.

tarun_anand · on March 18, 2020

But your most valuable customers will need to interact with an app server plus database for any real life use case. Can you share some applications where only placing the app server close to user works? Is the database back in Virginia?

mrkurt · on March 18, 2020

You are mostly right, there are a surprising number of problems that don't need much database interaction. Lots of image generation, video workloads, game servers, etc.

One of the things we want to do, though, is make "boring" apps really fast. My heuristic for this is "can you put a Rails app on fly.io without a rewrite?".

Many of these applications add a caching layer. Normally if someone wants to make a Rails app fast, they'll start by minimizing database round trips and cache views or model data. If somone has already done this work, fly.io might just work for this app since we have a global Redis service (https://fly.io/docs/redis/).

We have experimented with using CockroachDB in place of Postgres to get us even farther, but it doesn't work with most frameworks' migration tools.

We're also thinking of running fast-to-boot read replicas for Postgres, so people could leave their DB in Virginia but bring up replicas alongside their app servers.

If you've seen anyone do anything clever to "globalize" their database we're all ears.

wasd · on March 19, 2020

I’m extremely impressed with how slick your Heroku integration is. We thought about moving over to render but the dev ux just isn’t there like Heroku. I would be fine with paying for a read replica on the west coast that was always running if you can make it as easy the rest of your Heroku integration.

ignoramous · on March 21, 2020

I've seen https://macrometa.co take a stab at an edge database, but their guarantees (consistency / correctness) don't really infuse any sort of confidence in me [0]. https://yugabyte.com is another global scale database that competes squarely with cockroach-db, though I haven't used either.

Cloudflare Workers KV has the simplest model, with a central-db that transparently and eventually only replicates read-only, hot-data specific to a DC but writes continue to incur heavy penalty in terms of operations-per-second, cost, and latency.

In our production setup, we back Workers KV with a single-region, source-of-truth DynamoDB [1] and employ DynamoDB Streams to push data to Workers KV [2], that is,

Writes (control-plane): clients -> (graphql) DynamoDB -> Streams -> Workers KV

Reads (data-plane): clients -> Workers KV

Reads (control-plane): clients -> (graphql) DynamoDB

[0] https://news.ycombinator.com/item?id=19307122

[1] We really should switch to QLDB once it supports Triggers.

[2] We do so mainly because we do not to be locked-down to Workers KV, especially at its very nascent stage.

ctesh · on March 25, 2020

Hi Ignoramus - founder and CEO of Macrometa here - regret that our first attempt at explaining our consistency model caused confusion last year. Here's a link to the research paper that describes our architecture and consistency model.

https://bit.ly/HPTS-Macrometa

We got accepted in High Performance Transaction systems last year for the innovations around CRDTs for strong eventual consistency (SEC) with low read and write latencies.

Im trying to figure out how to provide a simple light weight way for fly.io users to use our global DB in their apps. It would allow a full stack to run at the edge with the compute on fly.io and the data on Macrometa either directly on fly.io or a nearby PoP (same city). Will update

yingw787 · on March 18, 2020

Fair enough! I signed up, looking forward to DBs on Fly.io!

(Also I got permission denied when attempting to curl the script when writing to /usr/local/bin, I needed sudo. I'm on Ubuntu 19.10 Eoan Ermine. Not sure whether security implications for `curl | sh` outweigh convenience, but I trust you guys and my connection. :P)

michaeldwan · on March 18, 2020

Heh curl to sudo slippery slope :P

The script is just picking the binary for your OS/arch and putting in PATH. We have instructions for doing it yourself here https://fly.io/docs/getting-started/installing-flyctl/#comma...

Or you can download straight from github: https://github.com/superfly/flyctl/releases

Hopefully we can get on snap soon!

irfansharif · on March 18, 2020

Congrats on the launch! If you're looking at CockroachDB (looking at some other comment on this thread), you should reach out to us if you haven't already. Your design here seems like the exact kind of thing we're hoping to push towards.

simonw · on March 18, 2020

I'm trying this out now.

One question: when I ran "flyctl deploy" it said "Docker daemon available, performing local build..."

If I turn off my local Docker, would it instead just upload the Dockerfile somewhere and perform the build for me?

If so, is there a way to force it to do that? I'd much rather upload a few hundred bytes of Dockerfile than build and push 100s of MBs of compiled image from my laptop.

mrkurt · on March 18, 2020

That's exactly what happens when you disable Docker. We default to local builds because it's a little more secure. It's usually faster (our remote builder isn't great at caching layers yet).

It would make sense to be able to force that. Right now you'd have to stop Docker.

(Also I'm a huge fan of Django)

michaeldwan · on March 18, 2020

good suggestion, I made an issue :) https://github.com/superfly/flyctl/issues/80

simonw · on March 18, 2020

... and they just shipped it as a feature! Impressive turnaround.

developer2 · on March 19, 2020

Meta Comment: Why are such posts (to HN users) written in grey text on a matching background? So hard to read, and I'm not even color blind. Please make it normal black text, ffs; there is no reason for this abysmal color scheme. Every time I open one of these posts, my initial impression is that it has been downvoted into oblivion, because that is what the color represents.

frakkingcylons · on March 18, 2020

Congratulations on launching! Would it make sense to think of this like Cloudflare Workers but for any application running in a Docker container? Are there any restrictions on outbound connections?

mrkurt · on March 18, 2020

There is some overlap with CloudFlare Workers. The big, practical difference is that you can put apps on fly.io that abuse CPUs, write to disk, accept TCP traffic, etc.

Most people we've worked with want to run run apps they've already written (or open source like https://github.com/h2non/imaginary).

michaeldwan · on March 18, 2020

It's closer to Google's Cloud Run, Heroku, or Fargate since we don't constrain you to a framework and the restrictions that come with it.

lpellis · on March 19, 2020

Well I just signed up and tested one of my Heroku apps, got it working with just a few clicks, really impressive. Are there any way to see stats like memory usage? What happens if an app goes over the limit? (heroku has some swap space before it kills an app, and then I can see it in the logs)

michaeldwan · on March 19, 2020

That's great to hear! Apps are allowed to burst over the limit if the host has free resources. We don't offer much visibility into metrics yet but we're working on it. All our metrics are going into prometheus with some awesome dashboards in grafana that we can't wait to expose to customers.

riquito · on March 18, 2020

Hi, cool service!

I have some questions about the pricing.

Say I want to use micro-1x with hard_limit/soft_limit = 20 and I get 40 concurrent request for one hour, would it cost $2.67 (micro-1 price) * 2? ($5.34) (monthly cost) If that is the case, can I set a limit on how many instances I want to run at most?

Another question: is the price calculated per second or is it there just to compare it with other services? If it's per second, since you don't fully scale to zero, should I consider having always at least one vm active full time?

michaeldwan · on March 18, 2020

That's correct, though we bill per second and scale back down when your app is over provisioned. Right now we always run 1 instance so your app responds right away after a period of inactivity. We might offer scaling to zero at some point too. You can also configure min/max count globally and per region.

It seems like no two apps have the same scaling needs, so if you have any questions or can't make something work let us know and we'll help!

CraftThatBlock · on March 18, 2020

It looks like the pricing is actually seconds base, so you would have 2 * 3600s * 0.000001015$/s = 0.007308$

mwcampbell · on March 18, 2020

I'm curious about how you turn the Docker image into a root filesystem for the micro-VM. If you're willing to share more about this, are you using an existing tool such as LinuxKit or Packer, or did you write your own?

michaeldwan · on March 18, 2020

We built something into our registry that squashes layers of an image into a compressed rootfs archive. Edge nodes map an image back to one of these files when launching an app. This cut launch time for large images in remote regions from tens of seconds to a few hundred ms. Much of that infrastructure is actually running on fly itself!

1UserB · on March 18, 2020

What are your thoughts on information-centric networking [1,2]?

You seem to be addressing the same problems.

[1] https://en.wikipedia.org/wiki/Information-centric_networking [2] https://irtf.org/icnrg

mrkurt · on March 18, 2020

I think we fit that category. We definitely have features/philosophies that match the Wikipedia article!

ottobonn · on March 18, 2020

Great introduction to the technology! I rarely see such a clear explanation of how new products work.

mrkurt · on March 19, 2020

This is my favorite Hacker News comment of all time.

thegreatpeter · on March 18, 2020

We've been using fly.io at Draftbit for the last couple of months and have nothing but amazing things to say about the platform, Kurt and the team!

I've gotten several friends to switch and they've all said the same thing. If you haven't given it a shot yet, there's a simple 1 click Heroku to Fly deployment you can use to give them a shot.

Sytten · on March 18, 2020

Do you guys plan to add an offering for a distributed database that we could match with this compute? That way we could build entire application on edge without buying into the FaaS movement.

michaeldwan · on March 18, 2020

That's something we're certainly thinking about, but we need to get compute right first!

jedberg · on March 18, 2020

> That way we could build entire application on edge without buying into the FaaS movement

So you want to be able to upload your code and have someone manage the infrastructure and datastore.

Isn't that the definition of FaaS?

Sytten · on March 18, 2020

My definition of FaaS is each controller living its own life and deployed independently behind an api gateway. This is a PaaS because only the infrastructure is managed and I can deploy my whole application in one docker image.

mrkurt · on March 19, 2020

That’s a pretty good definition of FaaS.

michaeldwan · on March 18, 2020

Partially, and there's plenty of hosts that offer it. We're trying to make things like full stack Rails monoliths at the edge possible.

mrkurt · on March 19, 2020

That sounds like the definition of “utopia” to me.

jimleroyer · on March 18, 2020

Pretty nice to see Rust being used for network performance code. Do you have any learning using it to share? Would you rather use C++ if you'd have to do it again? Do you feel more confident in your code? Do you feel it was slower or quicker to write code compared to C++

jeromegn · on March 18, 2020

We attempted C++ about a year ago, but I was never confident in our ability to clean up memory allocations (we had leaks) or avoid undefined behavior (we had segfaults).

I definitely feel more confident about our Rust code. It's no silver bullet, but it prevents a lot of unsoundness with its compile-time guarantees.

I can't really compare to C++, but it's easy to write new code or refactor old code. It took some time to get there, though.

All in all, I would recommend Rust wholeheartedly. The ecosystem is growing and getting more mature every week. The community is very helpful in general, especially the tokio folks.

steveklabnik · on March 18, 2020

Glad to hear you all are doing great with Rust! :D

jimleroyer · on March 18, 2020

Thanks for sharing! It's nice to hear about this learning. Also, wasn't aware of tokio, it looks real nice.

sairamkunala · on March 18, 2020

This is great @mrkurt

Any plans to launch any Datacenters in India? could not find any here - https://fly.io/docs/regions/#welcome-message

michaeldwan · on March 18, 2020

No timeline yet but we're eager to get into India and South America when we can

sidcool · on March 18, 2020

India is the next cloud hub. AWS and Azure are earning hands over fist here.

mrkurt · on March 18, 2020

We are definitely missing out on customers because we aren't in India and South America. It's expensive to solve that problem but we're getting close!

sidcool · on March 19, 2020

Thanks! This is a very cool tool. Wish you all the best.

zomglings · on March 18, 2020

This is an awesome idea and it looks like you guys are executing on it beautifully. Congratulations!

Out of curiosity, what kind of customers/teams are asking you for gRPC support? Is this coming from your enterprise customers or from smaller teams?

mrkurt · on March 18, 2020

We actually noticed people asking about gRPC on Hacker News, especially on Cloud Run posts:

https://news.ycombinator.com/item?id=19612577

It's usually small teams, individual devs who want gRPC. Even if it's within a large company, it's almost always one technical person.

rsweeney21 · on March 18, 2020

Neat idea, but what about my database? Where does that live?

Do you do anything to speed up latency from the edge to the database?

michaeldwan · on March 18, 2020

from another thread: We're not solving db latency yet. A good place to start is aggressively caching at the edge. We offer an in-memory redis cache for this that can replicate commands globally. Beyond that you'd need read replicas which will be possible once we launch persistent storage. That said, latency between data centers on the same continent is often less than I would have thought!

You should also checkout something like FaunaDB.

rhizome · on March 18, 2020

This is the pendulum swinging back to the "5 colos scattered around the globe" architecture.

time0ut · on March 18, 2020

Very impressive work. Just curious if you are worried about the big guys (AWS, GCP, etc) coming in and offering a similar service? If so, are you hoping DX will help act as moat?

michaeldwan · on March 18, 2020

That's always a concern, but not something we focus on. DX is a big deal. That's why Heroku and CF Workers are so popular despite everything the big guys offer.

rgbrgb · on March 19, 2020

Awesome work, congrats!

Do you consider zeit.co or netlify competitors? I saw in a comment that you're interested in making it dead easy to deploy a simple Rails app to the edge. These companies have gone deep on a different segment that's deploying web apps without DBs. Is your roadmap kind of routing around the JAMstack crowd, straight to supporting traditional full-stack apps? Seems much harder, but a more valuable prize if so.

michaeldwan · on March 19, 2020

Thanks! We don't consider Zeit or Netlify competitors. We're a level lower than them -- you could actually run those things on top of fly!

As you said, they're both going deep for JavaScript apps (and doing an awesome job at it!) and we're focusing on being an awesome place to run full stack apps and exotic (non-http) servers.

davidkhess · on March 18, 2020

How does this compare to StackPath? https://www.stackpath.com

samcrawford · on March 18, 2020

We use StackPath in a similar capacity to the one being pitched here. They support both containers and VMs, and have a very good spread of locations. Their heritage is a CDN - they were formerly MaxCDN and Highwinds - so their connectivity to eyeball networks is excellent.

We had some minor reliability issues with the edge platform early on, but their support and responsiveness was excellent.

We are very happy with them!

byteshock · on March 19, 2020

Fascinating product! I was wondering if there are any other products that are similar to yours? Other than serverless platforms such as Cloudlfare Workers or AWS Lambda.

I know Stackpath has been offering this kind of thing for a while. So how would your product compare to theirs? Since Stackpath has a well established cdn network already.

manigandham · on March 19, 2020

All of the clouds have a service to run containers (GCP Cloud Run, Azure Container Instance, AWS Fargate) but you have to deploy to the different regions separately and use their global load balancers or an external CDN to manage traffic.

Stackpath is an amalgamation of many acquired companies. Their CDN is fine but nothing special. The computing services aren't great. Not very competitive on price and have reliability and latency issues. Their cloud storage is white-labeled Wasabi. I wouldn't recommend them as the first choice for anything.

Zeit Now version 1 was also a run your own container runtime but that has been deprecated: https://zeit.co/docs/v1/getting-started/deployment#docker-de...

byteshock · on March 19, 2020

Adding onto my last comment, is Stackpath container pricing cheaper than yours at the moment?

I understand this might be due to Stackpath being a larger company and owning hardware instead of renting it. But the price for traffic and compute seem to be cheaper. There is also no mention of how much you charge for storage on the pricing page.

I’m looking to deploy my next app onto one of these platforms and would like to know the price differences!

mrkurt · on March 19, 2020

Stackpath is cheaper on bandwidth, mostly because of scale. For video based applications, this is a big deal (and we work with video companies to try and get better bandwidth pricing). For typical web app servers, bandwidth is usually not a very large expense.

CPU based pricing is pretty close. We've heard our CPUs are higher performance, but haven't done any real testing. The people who run high CPU apps on us _tend_ to pay less because we scale up and down so quickly.

byteshock · on March 19, 2020

Thank you for your reply. Your deployment seems to be a lot more streamlined and simpler than Stackpath at the moment. That just might be the deciding factor for me and other developers!

I wish you and your team the best of luck!

ggregoire · on March 18, 2020

Just a feedback: the landing page doesn't explain at all what's the difference with AWS, Google Cloud, Azure, etc.

I had to read these parts in the doc to get how Fly solves the problem differently:

> Think of a world where your application appears in the location where it is needed. When a user connects to a Fly application, the system determines the nearest location for the lowest latency and starts the application there.

> Compare those Fly features with a traditional non-Fly global cloud. There you create instances of your application at every location where you want low latency. Then, as demand grows, you'll end up scaling-up each location because scaling down is tricky and would involve a lot of automation. The likelihood is that you'll end up paying 24/7 for that scaled-up cloud version. Repeat that at every location and it’s a lot to pay for.

michaeldwan · on March 18, 2020

Thanks for the feedback. The landing page is very much a work in progress!

yingw787 · on March 18, 2020

Honestly, while I'd love to try this out, I'm afraid of committing to a solution that might not be around long-term, which for me at least overrides concerns of peace of mind and ease of use, and I'm doing a hobby project at the moment.

I'm using bare AWS at the moment because a) they gave me $5k in credits for YC SUS, b) they own the physical servers, and c) I can trust that they'll be around a long time, so I'd rather get locked into AWS proper rather than a service that might be built on top of AWS (e.g. CloudFormation vs. Terraform).

But I get, better than I did two months ago, just how freaking hard it is to build something, anything. This is amazing work, and I couldn't do it. Kudos to you, and I look forward to hearing about your amazing success!

mrkurt · on March 18, 2020

We're not AWS, but: 1) we own physical servers and 2) we're profitable 3) have big public companies as customers.

vikramkr · on March 18, 2020

Weirdly I feel like the fact that you are profitable and have large customers could be as important a part of your customer pitch as it is your investor pitch - even if the tech is awesome that trust in the organization to stick around is super important (just look at the trouble around stadia haha). It's like those fintech companies that have "but how do we make money?" To make sure customers know "ok they can make money without screwing me over or ending up having to shut down"

yingw787 · on March 18, 2020

Oh woah, profitable and own physical servers? You guys are gonna be just fine :)

OldHand2018 · on March 18, 2020

Congrats! I can see a lot of use cases for picking just a small subset of cities/regions and skipping Google/Amazon/Azure altogether.

mayank · on March 18, 2020

> 1) we own physical servers and 2) we're profitable 3) have big public companies as customers

Ironically, these also make you a prime acquisition target (because the product idea rocks), which renders your long-term future unclear.

mrkurt · on March 18, 2020

I was at a company that got acquired before. It was so awful. I'd rather just work on this forever than get absorbed by a big company.

optimiz3 · on March 18, 2020

> I was at a company that got acquired before. It was so awful. I'd rather just work on this forever than get absorbed by a big company.

How do your investors feel about this / what's your exit plan?

erohead · on March 18, 2020

We feel great about Fly.io :) (I am their group partner at YC)

mayank · on March 18, 2020

Could you elaborate? :) Because otherwise, you're possibly sending the same message as what I alluded to (i.e., we feel great because we're expecting a big exit).

tossAsimov · on March 18, 2020

Well, whoever buys them would like to keep the customers, as it is already a profitable business. So, even if they completely close the business, they should provide a way to migrate, as nobody likes to loose money/customers.

maxmcd · on March 18, 2020

Physical servers in Vultr right? I believe your geo regions are the same.

michaeldwan · on March 18, 2020

So far only Packet, but we'll be expanding soon

edit: those regions are the same because it's the easiest set of cities to roll this out in :)

slashdev · on March 18, 2020

"I won't use anything from smaller or new companies or Google in case it goes away". That's silly, how is this the top comment? Are people that conservative in tech of all places? This is a docker runtime. If it goes away you'll be up and running again without delay on AWS or anywhere else, you just won't have the edge performance characteristics.

yingw787 · on March 18, 2020

Yes, I'm very conservative. I'm in tech because I want to compound my achievements over time, and I can't do that if I fear the ground shifting underneath my feet. Hence Bezo's mantra "focus on the things that don't change". I hate O(N) efforts, I prefer O(N log N) efforts or better.

Enterprise workloads are far more conservative than I am, those guys spend decades running the same servers. It's why they can focus on sales and customer success and rake in money, which is what actually puts food on the table for their kids.

LunaSea · on March 18, 2020

Depending on the region your infrastructure is located in, AWS doesn't own the datacenter.

For example in the Paris (eu-west-3) region, their availability zones are operated on hardware owned and managed by Telehouse, Interxion and Equinix.

yingw787 · on March 18, 2020

Huh, interesting, I didn't know this. Thanks so much for sharing! Do you have a link? I searched and found all three companies, but I only see links to AWS Direct Connect and hybrid cloud solutions.

I just assumed if they're creating their own chips, they're probably creating their own servers, datacenters, networks, etc. but I guess I shouldn't jump to conclusions.

mrkurt · on March 18, 2020

Datacenters are weird. They're basically a real estate market, check out Digital Realty Trust: https://en.wikipedia.org/wiki/Digital_Realty

Networks are crazy too, especially between continents, ownership of undersea cables is fascinating: https://en.wikipedia.org/wiki/Submarine_communications_cable

LunaSea · on March 18, 2020

https://www.interxion.com/sites/default/files/2020-01/paris-...

I'm guessing it's likely that this solution was adopted to quickly expand to a lot of countries due to data location and privacy questions being raised.

zomglings · on March 18, 2020

Is it already possible or do you guys plan to add support for attaching persistent storage to applications deployed with fly?

I am building a search engine and this would let me derive your performance benefits using region-scoped databases and search indices.

michaeldwan · on March 18, 2020

That's awesome, your use case is perfect for region scoped persistent storage.

We're testing persistent storage privately with a few customers now and the results are exciting. My favorite is using minio as a private global s3 for caching.

What are you using for the index storage engine?

dgreensp · on March 18, 2020

I would be really interested in Fly if it abstracted persistent storage. An exciting use case to me would be something that doesn’t with “...for caching.”

michaeldwan · on March 18, 2020

We're on it!

zomglings · on March 18, 2020

elasticsearch

mnutt · on March 18, 2020

I vaguely remember that fly.io used to have a platform based on v8 isolates, I’m guessing the new platform is a bit of a pivot? I’m curious, was it just to support more platforms, or we’re there technical challenges with isolates?

michaeldwan · on March 18, 2020

It's a bit of a pivot but still in the same space. Here's a few reasons why we landed on this:

- Our JS apps require customers to write new code to solve problems. That’s a tough sell for companies with existing code they need to make fast.

- The more people used JS apps the more unexpected things they wanted to do. Like tensorflow at the edge, audio and video encoding, game servers, etc. No way we could support any of that without moving down the stack.

- Reimplementing the service worker api was a slog we didn’t want to continue. Deno is fantastic and we’d rather just run those apps than compete with it

rsmets · on March 19, 2020

Neat idea and the ease of use is certainly attractive! I have a question though...

Couldn't one simply use a traditional CDN where ever their customers are which would then allow the inbound network requests to jump on private interface routing to where ever the app truly lives quicker - essentially making for a more responsive "business logic" app feel? If all infrastructure was on the same cloud provider, say AWS.

I understand this approach is less dynamic in nature but would have been a solution for the Ars Technica problem presented I feel. If not, what am I missing? Thanks!

mrkurt · on March 19, 2020

That's actually a good, generic way to speed up the initial app connections. You can even do it with a CDN in front of a backend on a different network, since most CDNs pool connections.

The problem is that everything useful a server side application does still requires round trips. Even for the most boring content, an 800ms delay is pretty normal if you have a spread out audience.

donjh · on March 18, 2020

We are happy Fly customers! Amazing service and team. Congrats on the launch.

hendrikhalkow · on March 19, 2020

This is awesome work. What I'd like to see is some support for higher-level structures, i.e. autoscaling, stateful sets, service to service communication, persistent volumes, etc.

mrkurt · on March 19, 2020

I would like to see that too. Hopefully we can keep working on fly.io and ultimately do all those things.

alexdumitru · on March 18, 2020

Does anyone know what did they use to build the docs? They look amazing.

https://fly.io/docs/

michaeldwan · on March 18, 2020

Middleman and a fantastic designer + writer!

We've gotten so many comments about the docs, I wish we could open source something but it's tightly coupled to other things that aren't useful to anyone else.

ibatindev · on March 18, 2020

Inspecting the source , it seems it's https://docsearch.algolia.com/

mrkurt · on March 19, 2020

Algolia is great, we use it for search.

simonw · on March 18, 2020

Do you scale-to-zero - shut down my containers if they aren't getting any traffic and then start then up again on-demand when traffic starts flowing again?