Launch HN: Fly.io (YC W20) – Deploy app servers close to your users
626 points by mrkurt on March 18, 2020 | 261 comments
Hello Hacker News! We're Kurt, Jerome, and Michael from fly.io (https://fly.io/). We're building a platform to run Docker applications close to end users. It's kind of like a content delivery network, but for backend servers.

I helped build Ars Technica and spent the majority of my time trying to make the site fast. We used a content delivery network to cache static content close to anonymous readers and it worked very well for them. But the most valuable readers were not these, but the ones who paid for subscriptions. They wanted personalized content and features for interacting with the community – and we couldn't make those fast. Content delivery networks don't work for Ars Technica's best customers.

Running Docker apps close to users helps get past the "slow" speed of light. Most interactions with an app server seem slow because of latency between the hardware it's running on (frequently in Virginia) and the end user (frequently not in Virginia). Moving server apps close to users is a simple way to decrease latency, sometimes by 80% or more.

fly.io is really a way to run Docker images on servers in different cities and a global router to connect users to the nearest available instance. We convert your Docker image into a root filesystem, boot tiny VMs using a project called Firecracker (recently discussed here: https://news.ycombinator.com/item?id=22512196) and then proxy connections to it. As your app gets more traffic, we add VMs in the most popular locations.

We wrote a Rust based router to distribute incoming connections from end users. The router terminates TLS when necessary (some customers handle their own TLS) and then hands the connection off to the best available Firecracker VM, which is frequently in a different city.

Networking took us a lot of time to get right. Applications get dedicated IP addresses from an Anycast block. Anycast is an internet routing feature that lets us "announce" from multiple datacenters, and then core routers pick the destination with the shortest route (mostly). We run a mesh Wireguard network for backhaul, so in flight data is encrypted all the way into a user application. This is the same kind of network infrastructure the good content delivery networks use.

We got a handful of enterprise companies to pay for this, and spent almost a year making it simple to use — it takes 3 commands to deploy a Docker image and have it running in 17 cities: https://fly.io/docs/speedrun/. We also built "Turboku" to speed up Heroku apps. Pick a Heroku app and we deploy the slug on our infrastructure; typical Heroku apps are 800ms faster on fly.io: https://fly.io/heroku/
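
For reference, the three commands are roughly the following (a sketch from memory of the 2020-era flyctl; exact subcommand names may have differed):

    flyctl auth signup   # create an account (or: flyctl auth login)
    flyctl init          # register the app and generate a fly.toml
    flyctl deploy        # build the Docker image and ship it to the edge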

We've also built some features based on Hacker News comments. When people launch container hosting on Hacker News, there's almost always a comment asking for:

1. gRPC support: apps deployed to fly.io can accept any kind of TCP connection. We kept seeing people say "hey I want to run gRPC servers on this shiny container runtime". So you can! You can specify whether you want us to handle TLS or HTTP for an app, or just do everything yourself (see the config sketch after this list).

2. Max monthly spend: unexpected traffic spikes happen, and the thought of spending an unbounded amount of money in a month is really uncomfortable. You can configure fly.io apps with a max monthly budget, we'll suspend them when they hit that budget, and then re-enable them at the beginning of the next month.
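
To make the gRPC/raw-TCP point above concrete, here is roughly what the service section of a fly.toml looks like for a server that handles everything itself (a sketch; field names assume the fly.toml schema of the time):

    # expose port 50051 as raw TCP; fly's proxy does no TLS or HTTP handling
    [[services]]
      internal_port = 50051
      protocol = "tcp"

      [[services.ports]]
        port = "10000"
        handlers = []   # add "tls" and/or "http" to have fly terminate those layers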

One of the best parts of building this has been seeing the problems that developers are trying to solve, often problems we didn't know about beforehand. My favorite is a project to re-encode MP3s at variable speeds for specific users (apparently the Apple Audiobook player has no option for playback speed). Another is "TensorFlow at the edge" — they trained a TensorFlow model to detect bots and run predictions before handling requests.

We're really happy we get to show this to you all, thank you for reading about it! Please let us know your thoughts and questions in the comments.




Fascinating idea. I’d like gently to suggest that you make this your elevator pitch, since I don’t care about who you are. It’s pretty much what you said, but up at the top:

“fly.io is really a way to run Docker images on servers in different cities and a global router to connect users to the nearest available instance. We convert your Docker image into a root filesystem, boot tiny VMs using an Amazon project called Firecracker, and then proxy connections to it. As your app gets more traffic, we add VMs in the most popular locations.”

Exciting stuff! My best to you!


Agreed, that one-sentence description is perfect, and it's not fully clear on the website. A lot of these cloud services keep it quite vague for some reason, which I somewhat understand as part of the whole 'CTO marketing'. But the early adopters will appreciate it, and they're the ones who matter most early on.


I wish more marketing sites just had a description like the one in the root comment. Usually I have to read a bunch of docs before I get a grasp on what a product does when it's introduced on HN; if I could just read a description like this, I could either look at more info or ignore it if I don't care about it.

Instead, I'm trying to make sense of APIs to figure out what a product called Floozbobble.io does and it turns out that it's SaaS for making SaaS product factory factories, and I don't care about that, but then some other product called Dizmeple.cloud comes out that makes it easier for me to manage my database deployments, which I do care about, and I can't tell I would want it because it has no fucking description!

When did we start to prefer these crap marketing sites that take 12,000 spins on my scroll wheel to get through and still don't tell you anything?


When companies decided it was too hard to run data-driven marketing programs?


This is actually one of my favorite things about Hacker News. I end up redoing landing pages each time we get comments like this.


> 2. Max monthly spend: unexpected traffic spikes happen, and the thought of spending an unbounded amount of money in a month is really uncomfortable. You can configure fly.io apps with a max monthly budget, we'll suspend them when they hit that budget, and then re-enable them at the beginning of the next month.

I like this; not having caps is a major problem with some of your competition for smaller projects/companies, where a max cap is more important than availability.

I've heard of more than one project that made some mistake themselves, e.g. client code generating requests in a loop, or had some other reason for insane usage spikes, causing them to basically go bankrupt in a matter of a few hours. Not even days. (Though one project lucked out: if I remember correctly, Amazon bailed them out to prevent bad press.)

IMHO for the majority of server applications availability is important, but only up to a certain cost, after which some downtime is better, even if you lose some customers. (Yes, as always there are exceptions.)


I don't think I like the failure mode of your app going down right at the time it's really popular. I don't have a better solution though.


Unless your popularity pays your bills, it's the most graceful degradation you can have.


This is tricky since every app has different performance and budget constraints. We try to minimize footguns by starting with sensible defaults. Over time we'll provide more options so you can tune scaling and placement yourself. We're also happy to help if you need it!


> I don't have a better solution though

Don't host in such a way that you're paying for traffic... Hetzner, OVH and Packet all have dedicated servers where you don't pay for the traffic, inbound or outbound.

Edit: judging by other comments here, it seems the US zone of Fly.io is in fact hosted on Packet, so they are probably not paying for traffic themselves. Maybe they are using Hetzner for the EU zone (or OVH for that matter).


Packet charges for outbound bandwidth. The places where you can get close-to-free outbound bandwidth don’t give you the ability to do anycast and tend to oversubscribe their networks.

We’d like to get network prices down but we can’t run our service on OVH or Hetzner.


Vultr does BGP if you have a routable subnet and AS. Just open a support ticket.


Yep! We've been experimenting with Vultr, but their physical servers are only in 7 cities. I hope they expand.


But your server going down because of high load instead of being shut down by a bill breaker doesn't make that much difference.


The solution is to set your limit pretty high (100 times your average or so) or, alternatively, have shorter limit intervals - 100 times your daily bill is still a lot less harmful than 100 times your monthly bill.

The only 'real' solution is proper alerting, but even then it's pretty easy to rack up a bill of several thousand dollars before anyone realizes what's going on.


Maybe if you go over, you automatically add ads to the page or something.


This would be terrible


Ads would be terrible but I'm really interested in the whole idea of "do something else when an app is over budget". I hadn't considered much except a 402 status code, which seems kinda mean: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/402
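
Something like a tiny middleware that swaps in a degraded response once the budget is gone (a hypothetical sketch, not anything we've built):

    # Python WSGI sketch; over_budget() is a hypothetical budget check
    def budget_guard(app, over_budget):
        def wrapped(environ, start_response):
            if over_budget():
                start_response("402 Payment Required",
                               [("Content-Type", "text/plain")])
                return [b"This app hit its monthly budget. Check back next month."]
            return app(environ, start_response)
        return wrapped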


A "donate" button when the current remaining budget would last for less than 100 minutes.


What problem does it solve? From my perspective, latency is currently not an issue given all the regions current cloud providers offer. And for all static content you can use a CDN that has PoPs in cities all around the world.

Not sure I understand the use case for a single Docker image in a city separate from the rest of your backend services, especially the DB. If your Docker image talks to something else on AWS/GCP, for example, you add a lot of latency over public routes.

It looks more like: https://workers.cloudflare.com/


There's a whole bunch of use cases people have today that don't require a database. Here's a few that I'm excited about:

  - image, video, audio processing near consumers
  - game or video chat servers running near the centroid of people in a session
  - server side rendering of single page js apps
  - route users to regional data centers for compliance
  - graphql stitching / caching (we do this!)
  - pass through cache for s3, or just minio as a global s3 (we do this!)
  - buildkite / GitHub Action agents (we do this!)
  - tensorflow prediction / DDoS & bot detection
  - load balance between spot instances on AWS/GCP
  - TLS termination for custom domains
  - authentication proxy / api gateway (we do this!)
  - IoT stream processing near devices

CloudFlare Workers is a fantastic serverless function runtime, but we're a layer lower than that. You can actually build it on top of fly.

edit: formatting


This basically bumps the amount of behavior you can stuff into the edge by several orders of magnitude. Previously, Lambda@Edge would, like, maybe validate a JWT. Now you can put like half your app in there. It sounds like a tiny incremental change but it's big enough that it simply redefines what's even possible in a PoP.


I'm interested, what is now possible with this? Is it high bandwidth VR sort of stuff?


This is not an area I'm in a lot, so I may be wrong, but the Cloudflare Workers page GP mentioned says that you only get 10ms of CPU time per request (or 50ms with the paid plan). I'm assuming Lambda etc. have similar limits. So presumably anything that takes longer than 10/50ms.


Lambda can run up to 15m I think (idk about Lambda@Edge). CF Workers fail when trying to use something like node-unfluff, because the CPU time takes too long.

This seems to me like a more controllable Lambda@Edge.


The containers on fly.io can go up to 8 CPUs, so you can do a lot of computation for images, video, etc.

They can also accept any kind of TCP traffic (and we're trialing UDP), so lots of interesting network services. This is especially interesting for people who want to do live video.

AND we have disks. So you can deploy Varnish, or nginx caches, etc. This is something we enable by hand per app.


I didn't find documentation about the disks: size, SSD/HDD, price?


Ah, I didn't make this obvious but that's a feature we're currently testing. We can enable ephemeral disks on apps, but it's not generally available. They're local SSDs, size is variable (still figuring that out). They're free right now. ;)

If you want to try them out, you can create an app and then send an email to either me or support at fly.io and we'll turn them on for you.


They originally had (and seem to still have) the same JS runtime at the edge to give you a smart CDN/reverse proxy.

They just updated it from being JS only to being able to run any Docker image. Cloudflare gives you a persistent key/value store and Fly provides a non-persistent Redis cache.

You don’t have to move your entire app but there are plenty of use-cases where you can move more logic to the edge.


We’re moving the JS apps to just run on Deno containers. Deno is fabulous.


This is the most intriguing thing for me, perhaps second only to the fact that you guys are working with Rust. I have followed Deno since it was announced and it has been fascinating to watch it evolve. How much Deno code do you all have working in production right now?


Only a little. It works very well, once we port our edge js library (https://github.com/superfly/edge) to Deno we will move over some things handling hundreds of millions of requests per day.


Hats off, @mrkurt. This is the kind of thing HN, and YC, are all about at their very best. Absolutely love the way this idea, executed right, helps improve the deployment topography landscape, so to speak. Between service workers, Cloudflare workers, and now fly.io nodes at the edge, the degree of control over where and how your application executes across the network in relation to clients is kind of exhilarating -- at least for this 22-year veteran of web-related architecture. Bravo! And good luck!


Ahhhh thank you! That's an incredibly nice thing to say.


We've been using fly.io for a month now. They are amazing. The service is great, the team is great, and most importantly our apps have seen a dramatic performance uplift by leveraging fly. A++ would use again.

Full disclosure: I am another YC founder. Fly did not ask me or encourage me to post this in any way.


What kind of apps are you using it for? How much data do you have to move to the edge?


I'm not jassmith87, but they're making a super slick app builder at glideapps.com and using fly for custom domains


> ...builder at glideapps.com

Nice. Here's their launch-hn: https://news.ycombinator.com/item?id=19163081


If one API request makes on average 5-10 round trips to the database, and the database is in Virginia, this only makes the problem (much) worse. How do you solve this problem for this use case?


We're not solving db latency yet. A good place to start is aggressively caching at the edge. We offer an in-memory redis cache for this that can replicate commands globally. Beyond that you'd need read replicas which will be possible once we launch persistent storage. That said, latency between data centers on the same continent is often less than I would have thought!
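
For example, a minimal cache-aside sketch in Python with redis-py (the FLY_REDIS_CACHE_URL env var name and the fetch_from_primary_db helper are illustrative assumptions):

    import os, json, redis

    cache = redis.Redis.from_url(os.environ["FLY_REDIS_CACHE_URL"])

    def get_profile(user_id):
        key = f"profile:{user_id}"
        hit = cache.get(key)
        if hit is not None:
            return json.loads(hit)                 # served from the nearby edge
        profile = fetch_from_primary_db(user_id)   # one round trip to Virginia
        cache.setex(key, 60, json.dumps(profile))  # keep it at this edge for 60s
        return profile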


Gotcha, honestly that feels pretty niche to me, which might be a good place for a startup to start.

I can't think of many back-end applications between purely static content (just use a CDN) and needing a database connection. Probably video game servers, where you don't need the game state to be (immediately) stored/accessed globally.


Game servers are a great example. What's interesting is how many different kinds of apps need game-server like infrastructure: https://www.figma.com/blog/how-figmas-multiplayer-technology...

We've been talking to a lot of startups doing communications tools, especially for remote work.

Lots of full stack apps benefit from app servers + redis cache in different regions. They need a database connection, but if they've already done the work to minimize DB round trips they might just work with no code changes.

There are also a bunch of folks doing really dynamic video and image delivery. Where an individual user gets an entirely unique blob of binary data.


I could see IoT device 'acceleration' to be a significant potential use-case. Something with a tiny bill of materials for the device itself, offloading any non-trivial processing to a virtual device on the closest 'real' infrastructure you can get. Especially for something human-interactive, you would want to be very aggressive about minimizing latency.

Also, depending on how tight the limits are for VM lifetime / bandwidth / outbound connections, I could see using these as a kind of virtual NIC / service mesh type thing for consumer-grade internet connections, to restore the inbound routing capabilities precluded by carrier-grade NAT and avoid their traffic discrimination, as well as potentially on-boarding to higher-quality transit as early as possible for use when accessing latency-sensitive services further 'interior' to the cloud.


These are great. IoT seems like a thing you could do but that's a really specific use case I hadn't even considered.

The second example would be interesting to try. There's no real limit on VM lifetime or outbound connections, bandwidth is more of a budget problem. VMs are ephemeral, so they _can_ go away but we're all happier if they just run forever.


You should solve this by making one request to a db api layer (use GraphQL!) and have that layer do the back and forth with the db, from right next to it.

Holding transactions open for long distance round trips is going to get you into trouble in a myriad of ways. It does not scale.
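
For example, one query like this (hypothetical schema) replaces several sequential round trips, because the API layer resolves it all right next to the DB:

    query DashboardPage {
      viewer {                # one request from the edge...
        name
        organizations {       # ...while the joins happen next to the DB
          name
          apps { name status }
        }
      }
    }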


This is exactly what we do! We have a Rails app exposing a graphql api for our cli and web app. The web app is a React SPA monster that we're replacing with another Rails app that runs at the edge and serves the customer UI, docs, marketing pages, and some other things. It's really pleasant. We love graphql.


Why not just have the rails app exposing GQL also expose the customer UI/etc?


Mostly because the path to here was not a straight one :)

Our main Rails app does a bunch of things that can't run globally and it takes a long time to build while the customer facing app is a lightweight Rails app that consolidated several static and single page react apps into one less gross place.


That's actually the plan! The Ruby Graphql gem lets you resolve queries directly so the app can have an "edge" or "core" mode and make the right call (edge over HTTP, core direct to DB).


One good solution might be something like Datomic (the classic version, now called Datomic On-Prem), where each application process has the query engine embedded in it, and it pulls data from a relatively dumb storage service such as DynamoDB as needed, then aggressively caches it.


That would work great. Even something like FaunaDB would work really well.


You'd want to pair up with a global database like FaunaDB or Cosmos DB for the data backend.


Fly.io is the service I’ve been most excited to see make headway in a long time.

Pairing this with a global SQL database (*cough* Cockroach *cough*) is literally the app platform I’ve been dreaming about.

I would like to see more documentation around push-based architectures. That is, I want to build a system where a process pushes to the Redis in fly but does not itself run in fly. Basically something that may be unroutable for pulls.

In any case congrats fly team!


Thanks for the kind words! Cockroach is awesome and it's something we'd like to offer someday. Until then, check out FaunaDB.

Your push example is interesting. We don't have a way to connect to redis from outside fly, but you could certainly boot up a tiny app on fly that acts as a proxy from external apps into the fly redis.


Yeah that’s what I ended up messing with, it works for sure.


I'm so glad you've launched. I loved the entire thing every time you've explained it. Absolutely chuffed to see Wireguard and Firecracker getting deployed "in anger". Tensorflow@Edge sounds like the good kind of bananas to use it with.

Your quickstart being called speedrun is too good for you alone to have it, so I'm stealing it the next chance I get.

Either way congratulations Kurt & team! (and no I still have not bought that Safari 911 or any 911 for that matter and yes professional help is being sought).


Thanks for the kind words!


Who do you peer with in Europe? I see in the US you use Packet (https://www.packet.com/) but I couldn't find any information about the EU, or do you focus on the US market for now?

We're also a LIR and want to build an anycast network (we anonymize streaming data), any helpful resources you can share on this?

Cool product btw! I think this will be a very interesting area in the coming years, the fact that you offer Docker containers is a good USP as compared e.g. to Cloudflare workers, we might even consider using your service ourselves if you provide service in Europe (our customers are mostly in Germany)!


We actually have servers in North America, Europe (including Frankfurt), and Asia Pacific. The complete list is here: https://fly.io/docs/regions/#discovering-your-applications-r...

Building an anycast network is expensive. That's part of what we want to make accessible to devs. There are a couple of companies (like Packet, and possibly Vultr) you can lease servers from that will handle anycast. These tend to get you into the same ~16 regions, expanding past those can be difficult and even more expensive. That's what we're working on now.


Having a more prominent 'locations' or 'regions' link on the website navigation/homepage might be helpful. It was the first thing I looked for even before pricing. It's currently a bit hidden in the docs.

I know locations are probably not super important to you guys as you see it as a starting point or something super flexible, but I always find myself drawn to the concrete stuff like that. Largely as a measure of how mature the product is.


Also a map would be a great visualization. A list doesn't really tell me much without a frame of reference.


Good call. We have a neat visualization we want to build to show what's happening, but we don't need to wait on that to make it more obvious.


Cool, thanks! I know Vultr but I'm looking for a second provider to fulfill the multi-homing requirement. And yes it's expensive and I think it's a valuable service to developers you're offering!


Curious - in what way would building an anycast network be expensive?


Running your own BGP in multiple data centers requires some reasonable network engineering skills. Anycast, specifically, is a complex beast. If you use multiple transit providers, you have to continuously tweak things to make sure people aren’t getting weird routes. Network providers like to send people over the cheapest (in dollars) routes sometimes, which makes things slow.


No South America :(


Yeah not yet. :sob: We're going to expand there and India as fast as we can.


Good to know, this looks great!

Btw any plans to support Java applications?


If you can build it in a Docker image or if you can deploy it to Heroku, then we support it :)

If it doesn't work, it's a bug.


Awesome, I was under the misapprehension that it was Go-only because of your GitHub repos. Get ready to get filthy rich with this.

If you ever come to SA, never partner with Localweb (the biggest local server provider); they are garbage.


Congratulations on the launch! I've been following fly.io ever since I stumbled on it 2 years ago.

A few questions, if I may:

> We run a mesh Wireguard network for backhaul, so in flight data is encrypted all the way into a user application. This is the same kind of network infrastructure the good content delivery networks use.

Does it mean the backhaul is private and not tunneling through the public internet?

> fly.io is really a way to run Docker images on servers in different cities and a global router to connect users to the nearest available instance.

I use Cloudflare Workers and I find that at times they load-balance the traffic away from the nearest location [0][1] to some location half-way around the world adding up to 8x to the usual latency we'd rather not have. I understand the point of not running an app in all locations esp for low traffic or cold apps, but do you also "load-balance" away the traffic to data-centers with higher capacity? If so, is there a documentation around this? I'm asking because for my use-case, I'd rather have the app running in the next-nearest location and not the least-load location.

> The router terminates TLS when necessary and then hands the connection off to the best available Firecracker VM, which is frequently in a different city.

Frequently? Are these server-routers running in more locations than data centers that run apps?

Out of curiosity, are these server-routers eBPF-based or dpdk or...?

> Networking took us a lot of time to get right.

Interesting, and if you're okay sharing more-- is it that the anycast setup and routing that took time, or figuring out networking wrt the app/containers?

Thanks a lot.

[0] https://community.cloudflare.com/t/caveat-emptor-code-runs-i...

[1] https://cloudflare-test.judge.sh/


Hey, I'm the tech lead of Workers. I don't want to intrude too much on this thread, but just wanted to say: we don't do any special load-balancing for Workers requests; they are treated the same as any other Cloudflare request.

We use Anycast routing (where all our datacenters advertise the same IP addresses), which has a lot of benefits, but occasionally produces weird routes. Often this relates to specific ISPs having unusual routing logic that, for whatever reason, doesn't choose the shortest route. We put a lot of effort into tracking these down and fixing them (if the ISP is willing to cooperate).

We do sometimes re-route a fraction of traffic away from an overloaded datacenter by having it stop advertising some IPs, but if the internet is working as it should, that traffic should end up going to the next-closest datacenter, not around the world. When you see requests going around the world, feel free to file a support request and tell us about your ISP so we can try to track down the problem and fix it.


> Does it mean the backhaul is private and not tunneling through the public internet?

Backhaul runs only through the encrypted tunnel. The Wireguard connection itself _can_ go over the public internet, but the data within the tunnel is encrypted and never exposed.

> I use Cloudflare Workers and I find that at times they load-balance the traffic away from the nearest location [0][1] to some location half-way around the world adding up to 8x to the usual latency we'd rather not have. I understand the point of not running an app in all locations esp for low traffic or cold apps, but do you also "load-balance" away the traffic to data-centers with higher capacity?

This is actually a few different problems. Anycast can be confusing and sometimes you'll see weird internet routes, we've seen people from Michigan get routed to Tokyo for some reason. This is especially bad when you have hundreds of locations announcing an IP block.

Server capacity is a slightly different issue. We put apps where we see the most "users" (based on connection volumes). If we get a spike that fills up a region and can't put your app there, we'll put it in the next nearest region, which I think is what you want!

CDNs are notorious for forcing traffic to their cheapest locations, which they can do because they're pretty opaque. We probably couldn't get away with that even if we wanted to.

> Frequently? Are these server-routers running in more locations than data centers that run apps?

We run routers + apps in all the regions we're in, but it's somewhat common to see apps with VMs in, say, 3 regions. This happens when they don't get enough traffic to run in every region (based on the scaling settings), or occasionally when they have _so much_ traffic in a few regions all their VMs get migrated there.

> Interesting, and if you're okay sharing more-- is it that the anycast setup and routing that took time, or figuring out networking wrt the app/containers?

Anycast was a giant pain to get going right, then Wireguard + backhaul were tricky (we use a tool called autowire to maintain wireguard settings across all the servers). The actual container networking was pretty simple since we started with ipv6. When you have more IP addresses than atoms in the universe you can be a little inefficient with them. :)

(Also I owe you an email, I will absolutely respond to you and I'm sorry it's taken so long)


> Wireguard + backhaul were tricky (we use a tool called autowire to maintain wireguard settings across all the servers).

I'm guessing that's this? https://github.com/geniousphp/autowire

Looks like it uses consul - is there a separate wireguard net for consul, or does consul run over the Internet directly?


Consul runs over a different connection with mutual TLS auth. That's the project we use!


Any chance you have more details on GP's question about the tech basis of the router (ebpf, dpdk)? I didn't find this component among the OSS in the superfly org.


Doh, missed that. We're not doing eBPF; it's just userland TCP proxying right now. This will likely change. Right now it's fast enough, but as we get bigger I think we'll have more time to really tighten up some of this stuff.


Let's Encrypt was introduced so that money would never be a barrier/excuse for HTTPS. These days it should be a default feature. Pocketing half the fee for something generated for free is not a good signal to me. Yes, the other half is donated, but isn't that supposed to be optional? Let's Encrypt is supported by huge organizations, and on fly.io the customer is already paying for compute.


This is a reasonable take. Let’s Encrypt is amazing and we don’t want to diminish their importance at all.

We charge for certificates because the infrastructure to make SSL work (even when the certificates themselves are free) is complicated.

Managing certificate creation can be tricky, we have to deal with all kinds of edge cases (like mismatched A and AAAA records breaking validation). We also generate both RSA and ECDSA certificates, have infrastructure for ALPN validation, and a whole setup for DNS challenges.
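
For example, if a hostname has a stray AAAA record, the CA may validate over IPv6 and fail even when the A record is correct. A preflight check looks something like this (illustrative Python with dnspython, not our actual stack; the expected-address sets are placeholders):

    import dns.resolver

    def addrs(host, rtype):
        try:
            return {r.to_text() for r in dns.resolver.resolve(host, rtype)}
        except (dns.resolver.NoAnswer, dns.resolver.NXDOMAIN):
            return set()

    def safe_to_issue(host, edge_v4, edge_v6):
        # both record types must point at the proxy before we attempt validation
        return addrs(host, "A") <= edge_v4 and addrs(host, "AAAA") <= edge_v6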

And then we have to actually use them. We run a global Vault cluster to store certificates securely, and then cache them in memory in each of our router processes.

The developers who use the certificates the most love paying us to manage certs, and one person who posted in the comments here was able to replace an entire Kubernetes cluster they were using to manage certificates for their customers.

When Let’s Encrypt invalidated millions of certificates a few weeks ago, none of our customers even noticed. That’s what they’re paying us for.


Sarah from Let's Encrypt here. We certainly understand the infrastructure and engineering costs associated with managing TLS/SSL. Fly.io has given back for years to help make our work possible and we appreciate that!


This is a great answer and IMO should go in your FAQ [0], because charging for Let's Encrypt certs does come off as disingenuous, especially when AWS, Netlify, Zeit, and other services offer to do so for free despite having to maintain a PKI, which isn't exactly a walk in the park (like you point out).

[0] You are missing a FAQs page.


Good call. We actually put up a blog post with some answers: https://fly.io/blog/fly-answers-questions/


The max monthly spend is awesome, I'll probably try it out just because of that :) It's a bit unclear to me though how exactly the Heroku deploy works. Is it basically a replacement for the web dynos that Heroku provides, but then still connecting to existing Postgres instances for example? What are the limits for the Redis store? I'm using it on Heroku but constantly running into max connection issues; if you can improve that experience it is also a great win.


Glad to hear :)

You're exactly right about the Heroku deploy. We convert your app's slug to a Docker image and launch the web process in it. DB & other dynos still run on Heroku.

We don't have any hard connection limits on the redis cache. It's usually not an issue anyway since apps are often distributed across many regions and many redis servers.


You might be running into max connection issues because your library is leaving stale connections. I know this is the case for Ruby & PHP. Check if you have a timeout set. See this GitHub issue for Ruby: https://github.com/redis/redis-rb/issues/524


I have the timeout set to 10 seconds, which certainly helps. I think the issue is how gunicorn/gevent handle web requests: each request spawns a new Redis connection, and as far as I can see there is no global pool I can use :( On Heroku you are limited to 20 connections on the free tier, and it quickly gets expensive.
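
Edit: for anyone with the same problem, a module-level pool shared across greenlets seems to be the usual fix (a sketch with redis-py; REDIS_URL is Heroku's standard env var):

    import os, redis

    # one pool per process; BlockingConnectionPool makes greenlets wait for a
    # free connection instead of blowing past Heroku's 20-connection cap
    pool = redis.BlockingConnectionPool.from_url(
        os.environ["REDIS_URL"], max_connections=15, timeout=5
    )

    def redis_client():
        return redis.Redis(connection_pool=pool)  # cheap wrapper; sockets come from the pool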


Oh yes... I remember bitterly upgrading to a larger size just so >20 goroutines could use the same ~1mb of cached data.


Have been building a similar project in the past few months called Valar (https://valar.dev; it uses gVisor instead of Firecracker and is still in private beta), but I prioritized university studies and releasing it publicly would have taken too much of my time. Great to see a similar product being released; really looking forward to testing it. Best of luck to you!


How are you liking gVisor? We started with Firecracker and only spent a little time with gVisor, but it seems really nice.


I discovered it when looking into runtimes for my Bachelor's thesis. So far it's serving me quite well, especially after they reworked the Sentry file system abstraction (when I started file access was horribly slow). Networking works very well although they reimplemented it themselves. It also allows me to do base image layering using Overlay since I only keep binaries/source code/assets after a successful build.


Overlay is a nice feature, we had to give that up with Firecracker. We pre-optimize filesystems instead, tgz them, and then cache them in various regions. Boot times are _insanely_ good, which we like, although a lot of apps (especially node apps) are slow to start.


> ...which we like, although a lot of apps (especially node apps) are slow to start.

Surprising since NodeJS routinely comes up as the fastest runtime in Lambda benchmarks, especially for cold-starts: https://levelup.gitconnected.com/aws-lambda-cold-start-langu...


Hi, this is a very cool product and I am planning a project that I think could use it.

The question I have though: how do you take advantage of the gains from this if you still need one master, strictly consistent DB for writes?

Would a system design pattern to take advantage of fly.io be to have read-only replicas in each geographic deploy, or to only have region-specific persistence? Apologies if this was already answered; I read through everything I saw. Thanks!


The "simplest" gains come from adding an in memory cache, we include Redis for this and some apps work really well just leaving the DB where it is, caching aggressively, and running close to users: https://fly.io/docs/redis/

Read only replicas are a great first step for most applications. I'd probably do caching first, then replicas (which are kind of like caching).

Region specific persistence is one way to improve write latency, and I think the simplest for most apps. We've experimented with CockroachDB for this (it keeps ranges of rows where they're most busy), and you can actually deploy MongoDB this way.


Thanks that totally makes sense! I look forward to playing around with this.


PM@cockroach labs here. Which tools did you experiment with? We've been working to increase our tooling capabilities!


We got hung up with the migration tooling for popular frameworks. If we can get those migrations to work with minimal drama, we want to basically show people “global full stack” with app + cache + database.


Just switched from Heroku in just a few clicks and now I am running in 5 regions with auto-scaling for free :)


Awesome to hear!


Actually, one more question...do you guys scale compute and data layers separately, or are they tightly coupled within the same container?

I was looking at containerized PostgreSQL on AWS because I want to colocate a job scheduling tool (pg_cron) with the database process, but RDS doesn't support that extension. Apparently (or at least I hope), ecs-cli compose supports docker volumes through EBS, which is the same base as EKS persistent volumes. There's next to no information for ECS + EBS though, everybody uses EC2 or full on EKS.

I was just thinking, if you needed to handle excessive read load on small quantities of data, having separate data layers would enable you to autoscale db instances while still having the same volumes, instead of using an entirely separate caching layer which could introduce bugs and increase maintenance overhead. If you guys had native HA with docker exec access and passed savings to consumers that would be huge for me and my use cases.


I’m experimenting with this. They have Redis at every edge, with a way (SELECT 2) to send commands to all edges with eventual consistency. No RDBMS yet; they said they’re looking at CockroachDB.

I’m running a single central Postgres server on Heroku and planning to use the Redis edges to cache.
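
The replication trick looks roughly like this (redis-py sketch; the hostname is a placeholder, and the db numbers follow the SELECT 2 behavior described above):

    import redis

    local = redis.Redis(host="my-fly-redis", db=0)       # reads from the nearest edge
    everywhere = redis.Redis(host="my-fly-redis", db=2)  # writes fan out to all edges

    everywhere.set("session:42", "payload")  # eventually consistent globally
    print(local.get("session:42"))           # served locally once replicated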


Right now we're best suited for app servers, databases won't (yet) run very well on fly.io. We are trying really hard to focus on what we have because it's so valuable but we love DBs so much we might end up trying to "solve" them soon.


But your most valuable customers will need to interact with an app server plus database for any real life use case. Can you share some applications where only placing the app server close to user works? Is the database back in Virginia?


You are mostly right, there are a surprising number of problems that don't need much database interaction. Lots of image generation, video workloads, game servers, etc.

One of the things we want to do, though, is make "boring" apps really fast. My heuristic for this is "can you put a Rails app on fly.io without a rewrite?".

Many of these applications add a caching layer. Normally if someone wants to make a Rails app fast, they'll start by minimizing database round trips and cache views or model data. If someone has already done this work, fly.io might just work for this app since we have a global Redis service (https://fly.io/docs/redis/).

We have experimented with using CockroachDB in place of Postgres to get us even farther, but it doesn't work with most frameworks' migration tools.

We're also thinking of running fast-to-boot read replicas for Postgres, so people could leave their DB in Virginia but bring up replicas alongside their app servers.

If you've seen anyone do anything clever to "globalize" their database we're all ears.


I'm extremely impressed with how slick your Heroku integration is. We thought about moving over to Render but the dev UX just isn't there like Heroku's. I would be fine with paying for a read replica on the west coast that was always running, if you can make it as easy as the rest of your Heroku integration.


I've seen https://macrometa.co take a stab at an edge database, but their guarantees (consistency / correctness) don't really inspire any sort of confidence in me [0]. https://yugabyte.com is another global scale database that competes squarely with CockroachDB, though I haven't used either.

Cloudflare Workers KV has the simplest model, with a central DB that transparently (and eventually) replicates read-only hot data to each DC, while writes continue to incur a heavy penalty in terms of operations per second, cost, and latency.

In our production setup, we back Workers KV with a single-region, source-of-truth DynamoDB [1] and employ DynamoDB Streams to push data to Workers KV [2], that is,

Writes (control-plane): clients -> (graphql) DynamoDB -> Streams -> Workers KV

Reads (data-plane): clients -> Workers KV

Reads (control-plane): clients -> (graphql) DynamoDB

[0] https://news.ycombinator.com/item?id=19307122

[1] We really should switch to QLDB once it supports Triggers.

[2] We do so mainly because we do not want to be locked in to Workers KV, especially at its very nascent stage.
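
The Streams -> Workers KV push step is roughly this shape (a sketch; the env var names and key scheme are hypothetical, using Cloudflare's KV REST API):

    import os, json, requests

    CF = "https://api.cloudflare.com/client/v4"
    ACCOUNT = os.environ["CF_ACCOUNT_ID"]
    NS = os.environ["CF_KV_NAMESPACE_ID"]
    HEADERS = {"Authorization": f"Bearer {os.environ['CF_API_TOKEN']}"}

    def handler(event, context):  # Lambda entry point on the DynamoDB stream
        for rec in event["Records"]:
            if rec["eventName"] in ("INSERT", "MODIFY"):
                item = rec["dynamodb"]["NewImage"]
                key = item["pk"]["S"]  # hypothetical partition-key attribute
                url = f"{CF}/accounts/{ACCOUNT}/storage/kv/namespaces/{NS}/values/{key}"
                requests.put(url, headers=HEADERS, data=json.dumps(item)).raise_for_status()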


Hi Ignoramus - founder and CEO of Macrometa here - I regret that our first attempt at explaining our consistency model caused confusion last year. Here's a link to the research paper that describes our architecture and consistency model.

https://bit.ly/HPTS-Macrometa

We were accepted at High Performance Transaction Systems (HPTS) last year for our innovations around CRDTs for strong eventual consistency (SEC) with low read and write latencies.

I'm trying to figure out how to provide a simple, lightweight way for fly.io users to use our global DB in their apps. It would allow a full stack to run at the edge, with the compute on fly.io and the data on Macrometa, either directly on fly.io or in a nearby PoP (same city). Will update.


Fair enough! I signed up, looking forward to DBs on Fly.io!

(Also, I got permission denied when attempting to curl the script because it writes to /usr/local/bin; I needed sudo. I'm on Ubuntu 19.10 Eoan Ermine. Not sure whether the security implications of `curl | sh` outweigh the convenience, but I trust you guys and my connection. :P)


Heh curl to sudo slippery slope :P

The script is just picking the binary for your OS/arch and putting it in PATH. We have instructions for doing it yourself here https://fly.io/docs/getting-started/installing-flyctl/#comma...

Or you can download straight from github: https://github.com/superfly/flyctl/releases

Hopefully we can get on snap soon!


Congrats on the launch! If you're looking at CockroachDB (looking at some other comment on this thread), you should reach out to us if you haven't already. Your design here seems like the exact kind of thing we're hoping to push towards.


I'm trying this out now.

One question: when I ran "flyctl deploy" it said "Docker daemon available, performing local build..."

If I turn off my local Docker, would it instead just upload the Dockerfile somewhere and perform the build for me?

If so, is there a way to force it to do that? I'd much rather upload a few hundred bytes of Dockerfile than build and push 100s of MBs of compiled image from my laptop.


That's exactly what happens when you disable Docker. We default to local builds because it's a little more secure, and it's usually faster (our remote builder isn't great at caching layers yet).

It would make sense to be able to force that. Right now you'd have to stop Docker.

(Also I'm a huge fan of Django)


Good suggestion, I made an issue :) https://github.com/superfly/flyctl/issues/80


... and they just shipped it as a feature! Impressive turnaround.


Meta Comment: Why are such posts (to HN users) written in grey text on a matching background? So hard to read, and I'm not even color blind. Please make it normal black text, ffs; there is no reason for this abysmal color scheme. Every time I open one of these posts, my initial impression is that it has been downvoted into oblivion, because that is what the color represents.


Congratulations on launching! Would it make sense to think of this like Cloudflare Workers but for any application running in a Docker container? Are there any restrictions on outbound connections?


There is some overlap with CloudFlare Workers. The big, practical difference is that you can put apps on fly.io that abuse CPUs, write to disk, accept TCP traffic, etc.

Most people we've worked with want to run apps they've already written (or open source like https://github.com/h2non/imaginary).


It's closer to Google's Cloud Run, Heroku, or Fargate since we don't constrain you to a framework and the restrictions that come with it.


Well I just signed up and tested one of my Heroku apps, got it working with just a few clicks, really impressive. Is there any way to see stats like memory usage? What happens if an app goes over the limit? (Heroku has some swap space before it kills an app, and then I can see it in the logs.)


That's great to hear! Apps are allowed to burst over the limit if the host has free resources. We don't offer much visibility into metrics yet but we're working on it. All our metrics are going into prometheus with some awesome dashboards in grafana that we can't wait to expose to customers.


Hi, cool service!

I have some questions about the pricing.

Say I want to use micro-1x with hard_limit/soft_limit = 20 and I get 40 concurrent requests for one hour. Would it cost $2.67 (the micro-1x price) * 2 = $5.34 per month? If that is the case, can I set a limit on how many instances I want to run at most?

Another question: is the price calculated per second or is it there just to compare it with other services? If it's per second, since you don't fully scale to zero, should I consider having always at least one vm active full time?


That's correct, though we bill per second and scale back down when your app is over provisioned. Right now we always run 1 instance so your app responds right away after a period of inactivity. We might offer scaling to zero at some point too. You can also configure min/max count globally and per region.

It seems like no two apps have the same scaling needs, so if you have any questions or can't make something work let us know and we'll help!


It looks like the pricing is actually per second, so you would pay 2 * 3600 s * $0.000001015/s = $0.007308.


I'm curious about how you turn the Docker image into a root filesystem for the micro-VM. If you're willing to share more about this, are you using an existing tool such as LinuxKit or Packer, or did you write your own?


We built something into our registry that squashes layers of an image into a compressed rootfs archive. Edge nodes map an image back to one of these files when launching an app. This cut launch time for large images in remote regions from tens of seconds to a few hundred ms. Much of that infrastructure is actually running on fly itself!
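
You can reproduce the general idea with stock Docker (not our exact pipeline, just the shape of it; docker export collapses all layers into a single filesystem):

    docker create --name tmp myapp:latest           # instantiate the image without running it
    docker export tmp | gzip > myapp-rootfs.tar.gz  # one flattened root filesystem
    docker rm tmp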


What are your thoughts on information-centric networking [1,2]?

You seem to be addressing the same problems.

[1] https://en.wikipedia.org/wiki/Information-centric_networking [2] https://irtf.org/icnrg


I think we fit that category. We definitely have features/philosophies that match the Wikipedia article!


Great introduction to the technology! I rarely see such a clear explanation of how new products work.


This is my favorite Hacker News comment of all time.


We've been using fly.io at Draftbit for the last couple of months and have nothing but amazing things to say about the platform, Kurt and the team!

I've gotten several friends to switch and they've all said the same thing. If you haven't given it a shot yet, there's a simple 1 click Heroku to Fly deployment you can use to give them a shot.


Do you guys plan to add an offering for a distributed database that we could match with this compute? That way we could build entire applications at the edge without buying into the FaaS movement.


That's something we're certainly thinking about, but we need to get compute right first!


> That way we could build entire application on edge without buying into the FaaS movement

So you want to be able to upload your code and have someone manage the infrastructure and datastore.

Isn't that the definition of FaaS?


My definition of FaaS is each controller living its own life and being deployed independently behind an API gateway. This is a PaaS because only the infrastructure is managed, and I can deploy my whole application as one Docker image.


That’s a pretty good definition of FaaS.


Partially, and there's plenty of hosts that offer it. We're trying to make things like full stack Rails monoliths at the edge possible.


That sounds like the definition of “utopia” to me.


Pretty nice to see Rust being used for network performance code. Do you have any learnings to share? Would you rather have used C++ if you had to do it again? Do you feel more confident in your code? Did you find it slower or quicker to write compared to C++?


We attempted C++ about a year ago, but I was never confident in our ability to clean up memory allocations (we had leaks) or avoid undefined behavior (we had segfaults).

I definitely feel more confident about our Rust code. It's no silver bullet, but it prevents a lot of unsoundness with its compile-time guarantees.

I can't really compare to C++, but it's easy to write new code or refactor old code. It took some time to get there, though.

All in all, I would recommend Rust wholeheartedly. The ecosystem is growing and getting more mature every week. The community is very helpful in general, especially the tokio folks.


Glad to hear you all are doing great with Rust! :D


Thanks for sharing! It's nice to hear about these learnings. Also, I wasn't aware of tokio; it looks really nice.


This is great @mrkurt

Any plans to launch datacenters in India? Could not find any here - https://fly.io/docs/regions/#welcome-message


No timeline yet but we're eager to get into India and South America when we can


India is the next cloud hub. AWS and Azure are earning hand over fist here.


We are definitely missing out on customers because we aren't in India and South America. It's expensive to solve that problem but we're getting close!


Thanks! This is a very cool tool. Wish you all the best.


This is an awesome idea and it looks like you guys are executing on it beautifully. Congratulations!

Out of curiosity, what kind of customers/teams are asking you for gRPC support? Is this coming from your enterprise customers or from smaller teams?


We actually noticed people asking about gRPC on Hacker News, especially on Cloud Run posts:

https://news.ycombinator.com/item?id=19612577

It's usually small teams, individual devs who want gRPC. Even if it's within a large company, it's almost always one technical person.


Neat idea, but what about my database? Where does that live?

Do you do anything to speed up latency from the edge to the database?


From another thread: we're not solving db latency yet. A good place to start is aggressively caching at the edge. We offer an in-memory redis cache for this that can replicate commands globally. Beyond that you'd need read replicas, which will be possible once we launch persistent storage. That said, latency between data centers on the same continent is often less than I would have thought!

You should also check out something like FaunaDB.


This is the pendulum swinging back to the "5 colos scattered around the globe" architecture.


Very impressive work. Just curious: are you worried about the big guys (AWS, GCP, etc.) coming in and offering a similar service? If so, are you hoping DX will help act as a moat?


That's always a concern, but not something we focus on. DX is a big deal. That's why Heroku and CF Workers are so popular despite everything the big guys offer.


Awesome work, congrats!

Do you consider zeit.co or netlify competitors? I saw in a comment that you're interested in making it dead easy to deploy a simple Rails app to the edge. These companies have gone deep on a different segment that's deploying web apps without DBs. Is your roadmap kind of routing around the JAMstack crowd, straight to supporting traditional full-stack apps? Seems much harder, but a more valuable prize if so.


Thanks! We don't consider Zeit or Netlify competitors. We're a level lower than them -- you could actually run those things on top of fly!

As you said, they're both going deep for JavaScript apps (and doing an awesome job at it!) and we're focusing on being an awesome place to run full stack apps and exotic (non-http) servers.


How does this compare to StackPath? https://www.stackpath.com


We use StackPath in a similar capacity to the one being pitched here. They support both containers and VMs, and have a very good spread of locations. Their heritage is a CDN - they were formerly MaxCDN and Highwinds - so their connectivity to eyeball networks is excellent.

We had some minor reliability issues with the edge platform early on, but their support and responsiveness was excellent.

We are very happy with them!


Fascinating product! I was wondering, are there any other products similar to yours, other than serverless platforms such as Cloudflare Workers or AWS Lambda?

I know Stackpath has been offering this kind of thing for a while. So how would your product compare to theirs, since Stackpath has a well-established CDN network already?


All of the clouds have a service to run containers (GCP Cloud Run, Azure Container Instance, AWS Fargate) but you have to deploy to the different regions separately and use their global load balancers or an external CDN to manage traffic.

Stackpath is an amalgamation of many acquired companies. Their CDN is fine but nothing special. The computing services aren't great. Not very competitive on price and have reliability and latency issues. Their cloud storage is white-labeled Wasabi. I wouldn't recommend them as the first choice for anything.

Zeit Now version 1 was also a run your own container runtime but that has been deprecated: https://zeit.co/docs/v1/getting-started/deployment#docker-de...


Adding onto my last comment, is Stackpath container pricing cheaper than yours at the moment?

I understand this might be due to Stackpath being a larger company and owning hardware instead of renting it. But their prices for traffic and compute seem to be cheaper. There is also no mention of how much you charge for storage on the pricing page.

I’m looking to deploy my next app onto one of these platforms and would like to know the price differences!


Stackpath is cheaper on bandwidth, mostly because of scale. For video based applications, this is a big deal (and we work with video companies to try and get better bandwidth pricing). For typical web app servers, bandwidth is usually not a very large expense.

CPU based pricing is pretty close. We've heard our CPUs are higher performance, but haven't done any real testing. The people who run high CPU apps on us _tend_ to pay less because we scale up and down so quickly.


Thank you for your reply. Your deployment seems to be a lot more streamlined and simpler than Stackpath at the moment. That just might be the deciding factor for me and other developers!

I wish you and your team the best of luck!


Just some feedback: the landing page doesn't explain at all how this differs from AWS, Google Cloud, Azure, etc.

I had to read these parts in the doc to get how Fly solves the problem differently:

> Think of a world where your application appears in the location where it is needed. When a user connects to a Fly application, the system determines the nearest location for the lowest latency and starts the application there.

> Compare those Fly features with a traditional non-Fly global cloud. There you create instances of your application at every location where you want low latency. Then, as demand grows, you'll end up scaling-up each location because scaling down is tricky and would involve a lot of automation. The likelihood is that you'll end up paying 24/7 for that scaled-up cloud version. Repeat that at every location and it’s a lot to pay for.


Thanks for the feedback. The landing page is very much a work in progress!


Honestly, while I'd love to try this out, I'm afraid of committing to a solution that might not be around long-term, which for me at least overrides concerns of peace of mind and ease of use, and I'm doing a hobby project at the moment.

I'm using bare AWS at the moment because a) they gave me $5k in credits for YC SUS, b) they own the physical servers, and c) I can trust that they'll be around a long time, so I'd rather get locked into AWS proper than into a service that might be built on top of AWS (e.g. CloudFormation vs. Terraform).

But I get, better than I did two months ago, just how freaking hard it is to build something, anything. This is amazing work, and I couldn't do it. Kudos to you, and I look forward to hearing about your amazing success!


We're not AWS, but: 1) we own physical servers, 2) we're profitable, and 3) we have big public companies as customers.


Weirdly, I feel like the fact that you're profitable and have large customers could be as important a part of your customer pitch as it is of your investor pitch. Even if the tech is awesome, trust that the organization will stick around is super important (just look at the trouble around Stadia, haha). It's like those fintech companies that answer "but how do we make money?" up front, so customers know they can make money without screwing anyone over or having to shut down.


Oh woah, profitable and own physical servers? You guys are gonna be just fine :)


Congrats! I can see a lot of use cases for picking just a small subset of cities/regions and skipping Google/Amazon/Azure altogether.


> 1) we own physical servers and 2) we're profitable 3) have big public companies as customers

Ironically, these also make you a prime acquisition target (because the product idea rocks), which renders your long-term future unclear.


I was at a company that got acquired before. It was so awful. I'd rather just work on this forever than get absorbed by a big company.


> I was at a company that got acquired before. It was so awful. I'd rather just work on this forever than get absorbed by a big company.

How do your investors feel about this / what's your exit plan?


We feel great about Fly.io :) (I am their group partner at YC)


Could you elaborate? :) Because otherwise, you're possibly sending the same message as what I alluded to (i.e., we feel great because we're expecting a big exit).


Well, whoever buys them would like to keep the customers, as it is already a profitable business. So even if they completely close the business, they should provide a way to migrate, as nobody likes to lose money/customers.


Physical servers on Vultr, right? I believe your geo regions are the same.


So far only Packet, but we'll be expanding soon

edit: those regions are the same because it's the easiest set of cities to roll this out in :)


"I won't use anything from smaller or new companies or Google in case it goes away". That's silly, how is this the top comment? Are people that conservative in tech of all places? This is a docker runtime. If it goes away you'll be up and running again without delay on AWS or anywhere else, you just won't have the edge performance characteristics.


Yes, I'm very conservative. I'm in tech because I want to compound my achievements over time, and I can't do that if I fear the ground shifting underneath my feet. Hence Bezos's mantra: "focus on the things that don't change". I hate O(N) efforts; I prefer O(log N) efforts or better.

Enterprise workloads are far more conservative than I am, those guys spend decades running the same servers. It's why they can focus on sales and customer success and rake in money, which is what actually puts food on the table for their kids.


Depending on the region your infrastructure is located in, AWS doesn't own the datacenter.

For example in the Paris (eu-west-3) region, their availability zones are operated on hardware owned and managed by Telehouse, Interxion and Equinix.


Huh, interesting, I didn't know this. Thanks so much for sharing! Do you have a link? I searched and found all three companies, but I only see links to AWS Direct Connect and hybrid cloud solutions.

I just assumed if they're creating their own chips, they're probably creating their own servers, datacenters, networks, etc. but I guess I shouldn't jump to conclusions.


Datacenters are weird. They're basically a real estate market, check out Digital Realty Trust: https://en.wikipedia.org/wiki/Digital_Realty

Networks are crazy too, especially between continents, ownership of undersea cables is fascinating: https://en.wikipedia.org/wiki/Submarine_communications_cable


https://www.interxion.com/sites/default/files/2020-01/paris-...

I'm guessing this solution was adopted to expand quickly into a lot of countries, given the data-location and privacy questions being raised.


Is it already possible, or do you plan to add support for attaching persistent storage to applications deployed with Fly?

I am building a search engine, and this would let me get your performance benefits using region-scoped databases and search indices.


That's awesome, your use case is perfect for region scoped persistent storage.

We're testing persistent storage privately with a few customers now, and the results are exciting. My favorite is using MinIO as a private global S3 for caching.

What are you using for the index storage engine?


I would be really interested in Fly if it abstracted persistent storage. An exciting use case to me would be something that doesn't end with "...for caching."


We're on it!


Elasticsearch


I vaguely remember that fly.io used to have a platform based on V8 isolates; I'm guessing the new platform is a bit of a pivot? I'm curious, was it just to support more platforms, or were there technical challenges with isolates?


It's a bit of a pivot, but still in the same space. Here are a few reasons why we landed on this:

- Our JS apps require customers to write new code to solve problems. That’s a tough sell for companies with existing code they need to make fast.

- The more people used JS apps, the more unexpected things they wanted to do: TensorFlow at the edge, audio and video encoding, game servers, etc. There's no way we could support any of that without moving down the stack.

- Reimplementing the Service Worker API was a slog we didn't want to continue. Deno is fantastic, and we'd rather just run those apps than compete with it.


Neat idea and the ease of use is certainly attractive! I have a question though...

Couldn't one simply use a traditional CDN wherever their customers are, which would then let inbound requests jump onto private routing to wherever the app truly lives, essentially making for a more responsive "business logic" app feel? Say, if all infrastructure were on the same cloud provider, like AWS.

I understand this approach is less dynamic in nature, but I feel it would have been a solution for the Ars Technica problem presented. If not, what am I missing? Thanks!


That's actually a good, generic way to speed up the initial app connections. You can even do it with a CDN in front of a backend on a different network, since most CDNs pool connections.

The problem is that everything useful a server-side application does still requires round trips. Even for the most boring content, an 800ms delay is pretty normal if you have a spread-out audience.
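
To put rough, assumed numbers on it: at ~100ms RTT, a cold HTTPS request spends one round trip on TCP, two on a TLS 1.2 handshake, and one on the request/response itself, so roughly 400ms before the server does any work. A couple of sequential API calls after that and you're at 800ms.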


We are happy Fly customers! Amazing service and team. Congrats on the launch.


This is awesome work. What I'd like to see is some support for higher-level structures, e.g. autoscaling, stateful sets, service-to-service communication, persistent volumes, etc.


I would like to see that too. Hopefully we can keep working on fly.io and ultimately do all those things.


Does anyone know what they used to build the docs? They look amazing.

https://fly.io/docs/


Middleman and a fantastic designer + writer!

We've gotten so many comments about the docs. I wish we could open source something, but it's tightly coupled to other things that wouldn't be useful to anyone else.


Inspecting the source, it seems it's https://docsearch.algolia.com/


Algolia is great, we use it for search.


Do you scale-to-zero - shut down my containers if they aren't getting any traffic and then start then up again on-demand when traffic starts flowing again?


We don't (yet), because we really loathe slow response times and booting people's apps is absurdly slow sometimes. Our #1 feature request for Firecracker is suspend/snapshot functionality so we can do that.

Instead, we give everyone $10/mo of credits and have a really tiny VM that you can run full time for $2.67/mo.
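
(For scale: that $10 covers three of those tiny VMs running full time, 3 × $2.67 = $8.01, with $1.99/mo to spare.)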


Very cool.

I see that you're not advertising storage (yet).

Any suggestions for someone like me who wants to deploy a service which is itself the storage layer (and needs persistent disk)?


Wouldn't this drastically increase database latency?


Yes, but not if you can afford to cache reads, either in-memory, locally on-disk, or in the distributed Redis cluster they offer out of the box (rough sketch below). See: https://news.ycombinator.com/item?id=22619275

They plan to add many more capabilities wrt db: https://news.ycombinator.com/item?id=22619613

Writes... remain expensive.
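
A minimal sketch of the read-through shape this takes, assuming a plain in-memory layer (all names here are made up; a Redis or on-disk layer slots in the same way):

    package main

    import (
        "fmt"
        "sync"
        "time"
    )

    type entry struct {
        val     string
        expires time.Time
    }

    // ReadCache answers reads from a local in-region copy and only
    // pays the cross-region round trip on a miss. Writes still go to
    // the primary, which is why writes stay expensive.
    type ReadCache struct {
        mu    sync.Mutex
        items map[string]entry
        ttl   time.Duration
        fetch func(key string) (string, error) // stand-in for a query to the remote primary
    }

    func (c *ReadCache) Get(key string) (string, error) {
        c.mu.Lock()
        e, ok := c.items[key]
        c.mu.Unlock()
        if ok && time.Now().Before(e.expires) {
            return e.val, nil // local hit: no cross-region trip
        }
        val, err := c.fetch(key) // miss: one expensive trip to the primary
        if err != nil {
            return "", err
        }
        c.mu.Lock()
        c.items[key] = entry{val: val, expires: time.Now().Add(c.ttl)}
        c.mu.Unlock()
        return val, nil
    }

    func main() {
        c := &ReadCache{
            items: map[string]entry{},
            ttl:   30 * time.Second,
            fetch: func(key string) (string, error) { return "value-for-" + key, nil },
        }
        v, _ := c.Get("user:42")
        fmt.Println(v) // "value-for-user:42"
    }

Every Get answers locally until the TTL lapses, so only misses pay the cross-region trip; that's the whole trick, and also why writes don't get any cheaper.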


This is so awesome at many levels. Keep up the good work.


Just curious, how do you handle non-HTTP(S) traffic to applications? You said applications get dedicated IP addresses (I assume v4 and v6). Not having the ability to multiplex an IPv4 address (e.g. if I deploy an application that accepts connections on port 1337/TCP) will get expensive quite quickly, so I wonder how you solve this problem! :)


Do you mean it would get expensive to run multiple applications that listen on a single TCP port?

It _might_; if you need a bunch of IPv4 addresses it'll add up fast. But you could always put your own router app in place: accept that port on one IP, find the right IPv6, and forward connections along.
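
Roughly this shape, as a sketch in Go (the addresses are placeholders; a real router would look up the right backend per service instead of hardcoding one):

    package main

    import (
        "io"
        "log"
        "net"
    )

    // Minimal TCP forwarder: accept on one shared IPv4 address/port
    // and relay each connection to an IPv6 backend. A real version
    // would choose the backend per SNI or per a service lookup.
    func main() {
        backend := "[2001:db8::1]:1337" // placeholder IPv6 backend
        ln, err := net.Listen("tcp", ":1337")
        if err != nil {
            log.Fatal(err)
        }
        for {
            conn, err := ln.Accept()
            if err != nil {
                log.Print(err)
                continue
            }
            go func(c net.Conn) {
                defer c.Close()
                up, err := net.Dial("tcp", backend)
                if err != nil {
                    log.Print(err)
                    return
                }
                defer up.Close()
                go io.Copy(up, c) // client -> backend
                io.Copy(c, up)    // backend -> client
            }(conn)
        }
    }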


Love it. Turboku seems like it could be great if you can do this with DB read replicas somehow too. There are key details of this product in your initial comment that aren't on your website, or at least not that I see, such as "As your app gets more traffic, we add VMs in the most popular locations."


This sounds great.

How suitable do you think this is for CPU-intensive work? I'm interested in having servers for scientific-computational work, which would be rather CPU-heavy. It would be great to offload some of this to a server near the browser for bits and pieces that want low latency.


I would think it would be better to have a bunch of servers in a single datacenter if low-latency between them is important. Fly.io sounds great for the case where you're willing to sacrifice low-latency between your own servers to get them to have low-latency with your users.


We have some customers doing CPU heavy tasks like image and video processing, but we're not specifically optimizing for that right now. If there's demand we might offer better processors or GPUs for those workloads, or maybe even spot pricing on idle nodes, but that's far off.


This is a really cool idea! Congrats


I like the idea of an edge service that is accessible to everyone.

The list of cities looks pretty random to me. In particular, I am not seeing anything in the Northeast, New York, etc. From upstate New York I already have 30ms latency to AWS and Azure in Ohio, without terrible tail latency.


The city list does look random, but it's actually the simplest set of cities to build out with physical servers + anycast.

We _tend_ to do better than AWS on latency to your apps, and from upstate New York you'd probably be connecting to New Jersey. I would bet Virginia is quicker than Ohio for you most of the time too.


I have timed Virginia and Ohio and Ohio is 20 ms faster.

I discovered this earlier when I was playing Titanfall and noticed a much lower ping to their Azure data center in Ohio. I confirmed it by setting up my own host in Azure.

I was thinking of switching to Azure but pretty soon AWS opened us-east-2 and I moved my stuff there.


Huh that's really interesting.

I just checked one of the performance tools we use a lot, and it's <3ms to connect to fly.io New Jersey from NYC. It's not the best test, because datacenter-to-datacenter behaves differently than consumer internet, and NYC isn't upstate New York. If you feel like testing, I'm curious what you see to https://flyio-ui.fly.dev
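
If you'd rather not trust a hosted tool, here's a quick hand-rolled probe (Go; port 443 assumed). A bare TCP connect approximates one round trip to the nearest edge, with no TLS or HTTP on top:

    package main

    import (
        "fmt"
        "log"
        "net"
        "time"
    )

    // Time a bare TCP connect to the nearest edge.
    func main() {
        start := time.Now()
        conn, err := net.DialTimeout("tcp", "flyio-ui.fly.dev:443", 5*time.Second)
        if err != nil {
            log.Fatal(err)
        }
        defer conn.Close()
        fmt.Printf("tcp connect: %v\n", time.Since(start))
    }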


There's one in the Newark area:

`ewr Parsippany, NJ (US)`

from https://fly.io/docs/regions/


Thanks for pointing that out! That list is pretty disorganized... I made an issue to clean it up https://github.com/superfly/flyctl/issues/79


Are these airport codes? Just clicked for me. Is this typical across the industry? I think I remember Rackspace also using ORD and IAD...


Yes! It's really common to use airport codes for regions. There are some really strange ones, too.

Someday I want to have a datacenter in San Carlos so we can have a SQL region.


Hey this sounds a lot like Cloudlets / edge computing! Really happy to see some real world applications out there: http://elijah.cs.cmu.edu/


Sorry for wet-blanket (and possibly dumb and obvious!) question, but is this the kind of thing that AWS/GC/Azure are just going to implement their own version of if it turns out to be popular?


At some level of popular, probably. I think we have a lot of time before we're big enough that they get excited, and I pretty firmly believe that a good developer UX is compelling.


Curious - what are you using to power/style your docs website?


Predominantly Middleman, Markdown, and a splendid web designer.


Had a look at the network pricing. I wouldn't host anything there. Takes "the cloud is expensive" to a whole other level.


I usually balk at pricing tables too, but the real world has turned out differently. One of our big customers saved $130K/mo by switching to us!


How do you currently host your Docker container at edge nodes?

(It's a facetious question: unless you're a $100B company you're not doing anything of the sort.)


Why would I care? I have <30 ms ping to AWS. Maybe there's an argument for this for something like realtime-ish like games, but this feel likes a massive premature optimization for anything solving typical business problems.

That's not meant to be a snarky question; I genuinely don't understand what business problem is going to be solved by saving at most 30 ms. Anything written in Rails/Django, talking to a DB, etc. is going to have request latency dominated by other parts of the stack.


We have some benchmarks comparing us to Heroku (on AWS) and the performance gains from faster networking alone are nothing to sneeze at: https://fly.io/blog/turboku/


If the value prop is that you're Heroku, but faster, I can understand that.

I think it's misleading to say that deficiencies of Heroku have anything to do with AWS, though. It's really, really easy to set up anycast [https://aws.amazon.com/global-accelerator/] with ECS [especially if you're willing to pay for Fargate]. If your product does something meaningfully different from that, I'd love to know more.

NB: I'm in no way affiliated with AWS or Heroku, just have experience with both in the past.


> Takes "the cloud is expensive" to a whole other level.

What do you mean by this? It seems pretty much on par with the expensive cloud pricing of the big players.


I understand maybe half of what you said, but darn does it sound cool and super useful. Can't wait to see how it does!


I can't see where your server locations are; it'd be great to see that. I have some region-specific needs.



It would be helpful to have them more prominently displayed; after landing and learning a bit about the product, the locations are the second thing you want to look at.


Did y'all rack the servers that the containers run on? Or is it VMs running the containers? Just curious :)


We got someone else to rack them. Customer apps are VMs on top of the physical servers we lease. At scale we'll build our own datacenters, but at scale we can get not-me to do that.


>Deploy app servers close to your users

Except if they're in North Asia or Africa


We're going to expand there as soon as we can


So it's Google Cloud Run that scales to many servers?


(Cloud Run PM here)

Cloud Run automatically scales your container image to thousands of container instances ("servers") if needed, maybe you mean "scales to many regions"?


FWIW, I love Cloud Run, and when I went to try deploying Docker images in different places to see how we stacked up, it was the best of the big cloud offerings. I'm not sure any AWS PMs have ever even used Fargate...


Yes, that is what I meant: to regions. I love Cloud Run, I just wish it would scale to many regions, and if it could scale to CDN PoPs it would be a game changer!


Do you have plans to add Terraform support?


Not at the moment, but it's certainly doable. Our CLI is written in Go (github.com/superfly/flyctl) and could be run as a go-plugin for Terraform without rewriting the whole thing. I'd like that :)
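
In the meantime, shelling out to the CLI from Go works as a stopgap. A rough sketch (command and flag names assumed; check flyctl help):

    package main

    import (
        "log"
        "os"
        "os/exec"
    )

    // Thin wrapper around the flyctl binary. A real Terraform
    // provider would import the Go packages directly instead.
    func main() {
        cmd := exec.Command("flyctl", "deploy", "--app", "my-app") // app name is a placeholder
        cmd.Stdout = os.Stdout
        cmd.Stderr = os.Stderr
        if err := cmd.Run(); err != nil {
            log.Fatal(err)
        }
    }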


Nice! This would be really awesome.


How is this better than serverless?


I'd say this is serverless. You're not managing servers, just pushing an app container.


Docker? No thanks.


We're not running Docker or making you run it. We simply use Docker images (aka the OCI image format, https://github.com/opencontainers/image-spec) as the packaging format for your application code and dependencies.


Tried deploying a simple Heroku echo app and put it through KeyCDN performance test... The fly.io instances actually perform worse in terms of TTFB compared to a single Heroku deployment in AMS. Here are some stats. https://i.imgur.com/Qt1p29G.png


That’s unusual, will you email me or support at fly.io so I can have a look?

We use that KeyCDN test pretty frequently with different results.

Those TLS handshake times aren’t great, I think that was probably the first load from Vault on certificates. You should see most handshakes at <30ms on there.


[flagged]


This is about an unrelated company. See downthread: https://news.ycombinator.com/item?id=22626922


(Disclaimer: I am a Fly.io founder)

I don't recall us being at Hack Arizona, certainly not me. I googled it and all it yielded was this HN post.

Your comment couldn't be further from the truth. I can't speak for whoever used those words (if they did), but I think we have a pretty great work/life balance.

We all have families of our own and recognize they are far more important than our business. These things happen, such is life. Your kid gets sick, you want to care for them. Time off is always paid and we encourage people to take some. People often find it hard to take time off, but we've been good at it.

Nobody, generally, works more than 40 hours a week. I say "generally" because these past few weeks have been more intense given the end of our YC adventure, demo day, virtually meeting with investors and this Launch HN post. In normal times, I might work a few hours on a weekend but only if that brings me joy.

... and of course we're very flexible on work schedules because we're a remote-only company. Some weeks this might mean working only a few hours here and there because of life activities or the need to take time off. Other weeks, it might be the opposite. We recognize and embrace that.


Hey! I want to formally apologize -- the company I heard presenting had a name very, very similar to yours. Definitely was not the same company. Unfortunately it's past the 2 hour mark to delete comments on HN, but consider this my retraction of what I said above. Really sorry about the mix up, and what you have going here seems very impressive. Definitely seems like a fantastic attitude towards workers' health and happiness.


I've reopened your comment for editing if you want to edit it.


I don't know if that makes it a company to avoid, but it's certainly a good point of awareness to raise.



As a developer by trade, it was definitely raising a LOT of red flags for me. I don't blame you for taking it with a grain of salt, though.


> "we're not coworkers, we're family", "our developers love to work, they do it out of love", virtue signaling, etc, etc. The whole shebang.

It is a filter. If it keeps you away from them, the filter worked? FWIW, a younger me would have found this proposition very attractive.



