Fly.io has GPUs now

k8svet · 2024-02-14T02:41:39 1707878499

Does it have basic functioning other stuff? I am shocked at how our production usage of Fly has gone. Even basic stuff as support not being able to just... look up internal platform issues. Cryptic/non-existent error messages. I'm not impressed. It feels like it's compelling to those scared of or ignorant of Kubernetes. I thought I was over Kubernetes, but Fly makes me miss it.

parhamn · 2024-02-14T07:43:36 1707896616

I was hoping to migrate to Fly.io and during my testing I found that simple deploys would drop connections for a few seconds during a deploy switch over. Try a `watch -n 2 curl <serviceipv4>` during a deploy to see for yourself (try any one of the the strategies documented including blue-green). I wonder how many people know this?

When I tested it I was hoping for at worst early termination of old connections with no dropped new connections and at best I expected them to gracefully wait for old connections to finish. But nope, just a full downtime switch over every time. But then when you think about the network topology described in their blog posts, you realize theres no way it could've been done correctly to begin with.

It's very rare for me to comment negatively on a service but that fact that this was the case paired with the way support acted like we were crazy when we sent video evidence of it definitely irked me for infrastructure company standards. Wouldn't recommend it outside of toy applications now.

> It feels like it's compelling to those scared of or ignorant of Kubernetes

I've written pretty large deployment systems for kubernetes. This isn't it. Theres a real space for heroku-like deploys done properly and no one is really doing it well (or at least without ridiculously thin or expensive compute resources)

asaddhamani · 2024-02-14T07:51:02 1707897062

Yeah I had a similar experience where I got builds frozen for a couple days, such that I was not able to release any updates. When I emailed their support, I got an auto-response asking me to post in the forum. Pretty much all hosts are expected to offer a ticket system even for their unmanaged services if its a problem on their side. I just moved over all my stuff to Render.com, it's more expensive, but its been reliable so far.

loloquwowndueo · 2024-02-14T11:25:57 1707909957

The first (pinned) post in the fly.io forum explains it:

https://community.fly.io/t/fly-io-support-community-vs-email...

malfist · 2024-02-14T14:56:25 1707922585

That forum post just says what OP said, that they will ignore all tickets from unnmanaged customers. Which is a pretty shitty thing to do to your customers.

yencabulator · 2024-02-16T01:19:44 1708046384

The cheapest plan that gets email support is nothing more than a commitment to spend a minimum of $29/mo on their services. That is, if you spend >=$29/mo, it costs nothing extra. Not what I'd call "managed".

sofixa · 2024-02-14T08:12:33 1707898353

> I've written pretty large deployment systems for kubernetes. This isn't it. Theres a real space for heroku-like deploys done properly and no one is really doing it well (or at least without ridiculously thin or expensive compute resources)

Have you tried Google Cloud Run(based on KNative) I've never used it in production, but on paper seems to fit the bill.

parhamn · 2024-02-14T08:45:06 1707900306

Yeah we're mostly hosted there now. The cpu/virtualization feels slow but I haven't had time to confirm (we had to offload super small ffmepg operations).

It's in a weird place between heroku and lambda. If your container has a bad startup time like one of our python services, autoscaling can't be used as latency becomes a pain. Its also common deploy services on there that need things like health checks (unlike functions which you assume are alive), this assumes at least 1 instance of sustained use as well, assuming you do minute health checks. Their domain mapping service is also really really bad and can take hours to issue a cert for a domain so you have to be very careful about putting a lb in front of it for hostname migrations.

I don't care right now but the fact that we're paying 5x in compute is starting to bother me a bit. A 8core 16gb 'node' is ~$500/month ($100 on DO) assuming you don't scale to zero (which you probably wont). Plus I'm pretty sure the 8 cores reported isn't a meaty 8 cores.

But its been pretty stable and nice to use otherwise!

jetbalsa · 2024-02-14T15:08:49 1707923329

A 6c / 12t Dedicated Server with 32GB of ram is 65$ a month with OVH

I do get that it is a bare server, but if you deploy even just bare containers to it, you would be saving a good bit of money and get better performance from it.

doctorpangloss · 2024-02-14T17:08:10 1707930490

Another interpretation is the so-called dedicated servers are too good to be true.

jrockway · 2024-02-14T17:57:39 1707933459

It depends on what the 6 cores are. Like I have a 8C/8T dedicated server sitting in my closet that costs $65 per the number of times you buy it. (Usually once.) The cores are not as fast as the highest-end Epyc cores, however ;)

ac29 · 2024-02-14T17:58:08 1707933488

At the $65/month level for an OVH dedicated server, you get a 6-core CPU from 2018 and a 500Mbps public network limit. Doesnt even seem like that good a deal.

There is also a $63/month option that is significantly worse.

seabrookmx · 2024-02-15T08:07:31 1707984451

We also run some small ffmpeg workloads and experimented with Cloud Run consuming Pub/sub via EventArc triggers. Since Cloud Run's opaque scaling is tied to http requests, EventArc uses a push subscription. In pub/sub these don't give you any knobs to turn regarding rate limiting/back pressure, so it basically tries to DoS your service and then backs off. This setup was basically impossible to tune or monitor properly.

Our solution was to migrate the service to Kubernetes using an HPA scaling on the number of un-acked messages in the subscription, and then use a pull subscription to ensure reliable delivery (if the service is down they just sit in the queue rather retrying indefinitely).

I'm convinced Cloud Run/Functions are only useful for trivial HTTP workloads at this point and I rarely consider them.

parhamn · 2024-02-15T20:33:43 1708029223

Thats very interesting. Thanks for sharing.

But sweet sweet github triggered deploys. Have you found an easy solution to this?

seabrookmx · 2024-02-17T01:54:05 1708134845

> easy solution to this

Triggered deploys to Kubernetes you mean? There's a million ways to solve this problem for better or worse. We use Gitlab CI so we invoke helm in our pipelines (I'm sure there's a way to do this with github actions), but there's also flux cd, argo, etc. etc.

We use Kubernetes (GKE) elsewhere so we already had this machinery in place luckily. I can see the appeal of CloudRun/Functions as a way to avoid taking that plunge

dig1 · 2024-02-14T08:58:20 1707901100

I have yet to gain positive experience with Cloud Run. I have one project with it, and Cloud Run is very unpredictable with autoscaling. Sometimes, it can start spinning up/down containers without any apparent reason, and after hunting Google support for months, they said it is an "expected behavior". Good luck trying to debug this independently because you don't have access to knative logs.

Starting containers on Cloud Run is weirdly slow, and oh boy, how expensive that thing is. I'm getting the impression that pure VMs + Nomad would be a way better option.

sofixa · 2024-02-14T10:06:09 1707905169

> I'm getting the impression that pure VMs + Nomad would be a way better option

As a long time Nomad fan (disclaimer: now I work at HashiCorp), I would certainly agree. You lose some on the maintenance side because there's stuff for you to deal with that Google could abstract for you, but the added flexibility is probably worth it.

parhamn · 2024-02-14T09:41:00 1707903660

> Starting containers on Cloud Run is weirdly slow

What is this about? I assumed a highly throttled cpu or terrible disk performance. A python process that would start in 4 seconds locally could easily take 30 seconds there.

JoshTriplett · 2024-02-14T10:03:53 1707905033

Last I checked, Cloud Run isn't actually running real Linux, it's emulating Linux syscalls.

seabrookmx · 2024-02-15T07:57:42 1707983862

Cloud Run "gen2" runs a microvm (ala lambda) rather than gvisor, so it depends on your settings.

JoshTriplett · 2024-02-15T19:50:27 1708026627

Ah, good to know, thank you! I hadn't seen the announcement of the second generation environment.

jonatron · 2024-02-14T11:39:54 1707910794

I just use AWS EC2, load balancer, auto scaling groups. The user_data pulls and runs a docker image. To deploy I do an instance refresh which has no downtime. Obvious downside is more configuration than more managed services.

giovannibonetti · 2024-02-14T11:54:13 1707911653

I have been using Google Cloud Run in production for a few years and have had a very good experience. It has the fastest auto scaler I have ever seen, except only for FaaS, which are not a good option for client-facing web services.

davidspiess · 2024-02-14T18:05:26 1707933926

Same experience here, using it for years in production for our critical api services without issues.

seabrookmx · 2024-02-15T07:53:06 1707983586

Cloud Run is compatible with KNative YAML but actually runs on Borg under the hood, not Kubernetes. At least when not using the "Cloud Run on GKE" option via Anthos.

rollcat · 2024-02-14T08:23:03 1707898983

> Try a `watch -n 2 curl <serviceipv4>` during a deploy

You need blackbox HTTP monitoring right now, don't ever wait for your customer to tell you that your service is down.

I use Prometheus (&Grafana), but you can also get a hosted service like Pingdom or whatever.

xena · 2024-02-14T04:48:12 1707886092

Can you email the first two letters of my username at fly.io with more details? I'd love to find out what you've been having trouble with so I can help make the situation better any way I can. Thanks!

bongobingo1 · 2024-02-14T07:36:30 1707896190

Another support.flycombinator.com classic.

azinman2 · 2024-02-14T08:01:10 1707897670

Would you rather them be unresponsive?

lostemptations5 · 2024-02-14T08:40:07 1707900007

It's HN -- if the company proved responsive it might invalidate his OP and everyone who band wagons on it.

zmgsabst · 2024-02-14T08:01:45 1707897705

Why would you care about customer problems if they don’t embarrass you in public?

/s

keeganpoppen · 2024-02-14T08:16:50 1707898610

the only thing easier than them responding in this thread is someone making this comment in this thread…

rob · 2024-02-14T15:19:42 1707923982

[flagged]

joshi4 · 2024-02-14T17:44:16 1707932656

It seems to me that your comment is personally targeting OP and I think that is quite out of line.

throwaway220033 · 2024-02-14T12:11:13 1707912673

...as if it's one person who had issues! I thought it was just incompetency. But it now looks like a theatre, pretending now.

ignoramous · 2024-02-14T13:37:49 1707917869

I've been a paying Fly.io customer for 3 years now, and for the past 18 months, I've had no real issue with any of my apps. In fact, I don't even monitor our Fly.io servers any more than I monitor S3 buckets; the kind of zero devops I expect from it is already a reality.

> it's one person who had issues

Issues specific to an application or one particular account have to be addressed as special cases (like any NewCloud platform, Fly.io has its own idiosyncrasies). The first step anyway is figuring out just what you're dealing with (special v common failure).

> looks like a theatre

I have had the Fly.io CEO do customer service. Some may call it theatre, but this isn't uncommon for smaller upstarts, and indicative of their commitment, if anything.

pech0rin · 2024-02-14T06:42:43 1707892963

Yep they have terrible reliability and support. Couldn’t deploy for 2 days once and they actually told me to use another company. Unmanaged dbs masquerading as managed. Random downtime. I could go on but it’s not a production ready service and I moved off of it months ago.

benzible · 2024-02-14T15:30:14 1707924614

The header at the top of their Getting Started is "This Is Not Managed Postgres " [1]

and they have a managed offering [2] in private beta now...

> Supabase now offers their excellent managed Postgres service on Fly.io infrastructure. Provisioning Supabase via flyctl ensures secure, low-latency database access from applications hosted on Fly.io.

[1] https://fly.io/docs/postgres/getting-started/what-you-should...

[2] https://fly.io/docs/reference/supabase/

biorach · 2024-02-14T07:39:17 1707896357

> Unmanaged dbs masquerading as managed

Are you talking about fly postgres? Because I use it and feel they've been pretty clear that it's unmanaged.

andy_ppp · 2024-02-14T10:11:43 1707905503

Seriously! That's crazy. I need to setup terraform and move to AWS before launching I guess.

biorach · 2024-02-14T11:08:55 1707908935

> Seriously! That's crazy

huh? it does what it says on the tin. nothing crazy about it.

They spell out for you in detail what they offer: https://fly.io/docs/postgres/getting-started/what-you-should...

And suggest external providers if you need managed postgres: https://fly.io/docs/postgres/getting-started/what-you-should...

andy_ppp · 2024-02-14T11:25:59 1707909959

I was shocked because I didn't realise it wasn't managed. Even Digital Ocean offer managed Postgres.

If you are offering a service like Fly I think the database should be managed personally, the whole point of Fly.io is to provide abstractions to make production simpler.

Do you think the type of user who is using fly.io is interested in or capable of managing their own Postgres database? I'd rather just trust RDS or another provider.

corobo · 2024-02-14T12:29:20 1707913760

> Do you think the type of user who is using fly.io is interested in or capable of managing their own Postgres database?

Honestly.. kinda, yeah

At least I'm projecting my weird "I want to love you for some reason, Fly" plus my skillset onto anyone else that wants to love Fly too haha

They feel very developer/nerd/HN/tinkerer targeted

chillfox · 2024-02-15T01:53:19 1707961999

Yep, I never really trusted managed databases. It just feels like one of those things that's so important to your app that not having full control of it is weird.

morgante · 2024-02-14T09:02:20 1707901340

Unfortunately this is a pretty common story. Half the people I know who adopted Fly migrated off it.

I was very excited about Fly originally, and built an entire orchestrator on top of Fly machines—until they had a multi-day outage where it took days to even get a response.

Kubernetes can be complex, but at least that complexity is (a) controllable and (b) fairly well-trodden.

loloquwowndueo · 2024-02-14T11:30:17 1707910217

Fly.io is not comparable to Kubernetes. It’s a bit like comparing AWS to Terraform.

Or to clarify your comment, Kubernetes on which cloud? Amazon? google? Linode?

jrockway · 2024-02-14T18:04:34 1707933874

Kubernetes on AWS, GCP, and Linode are all controllable and well-trodden.

I definitely understand the comparison between Kubernetes and fly. You have couple apps that are totally unrelated, managed by separate teams, and you want to figure out how you can avoid the two teams duplicating effort. One option is to use something like fly.io, where you get a command line you run to build your project and push the binary to a server. Another option is to self-host infrastructure like Kubernetes, and eventually get that down to one command to build and push (or have your CI system do it).

The end result that organizations are aiming for are similar; developers code the code and then the code runs in production. Frankly, a lot of toil and human effort is spent on this task, and everyone is aiming to get it to take less effort. fly.io is an approach. Kubernetes is an approach. Terraform on AWS is an approach.

loloquwowndueo · 2024-02-14T19:38:11 1707939491

Maybe you’re comparing flyctl with Kubernetes?

That’d be a slightly more valid comparison albeit flyctl is much less ambitious by choice and design. That said, using flyctl to orchestrate your deployments is not the only way to Fly. Example:

https://fly.io/blog/fks/

morgante · 2024-02-14T19:40:46 1707939646

> Fly.io is not comparable to Kubernetes.

The Fly team has worked on solving similar problems to Kubernetes. Ex://fly.io/blog/carving-the-scheduler-out-of-our-orchestrator/

Of course, Fly also provides the underlying infrastructure stack too. If you want to be pedantic, you can compare it to GKE/AKS/EKS.

Kubernetes on any major cloud platform is more mature, controllable, and reliable than Fly.

awestroke · 2024-02-14T07:10:43 1707894643

I have run several services on Fly for almost a year now, have not had any issues.

rmbyrro · 2024-02-14T17:59:22 1707933562

I find it amazing how much bad vibes fly.io gets here.

It looks worse than AWS or Azure to me.

Never used the service, but based on what I hear, I'll never try...

M_Joverflow · 2024-02-15T03:24:35 1707967475

I don't see any reason use Fly. There are more mature, feature-richer and cheaper solutions out there. We have the big complex ones like AWS, Azure, GCP, the easier more affordable all rounder like DO, Render, the hosting plattforms like Vercel, Heroku and finally the biggest bang for your money barebones like Hetzner.

Why should I choose Fly? How come they are so prominent on hackernews? Are they backed by VC and get their default 400 upvotes by backers? I get the impression that Fly posts here are kind of sponsored.

throwaway220033 · 2024-02-14T12:03:43 1707912223

I switched to Kamal and Hetzner. It's the sweet spot.

chachra · 2024-02-14T04:11:18 1707883878

Been on it 7 months, 0 issues. Feel like you're alone on this potentially.

weird-eye-issue · 2024-02-14T05:34:10 1707888850

Alone? Every thread about Fly has complaints about reliability and people complain about it on Twitter too

jrockway · 2024-02-14T18:07:49 1707934069

It's hard to tell how meaningful the reviews are. I have used AWS, GCP, DigialOcean, and Linode throughout my career. Every single one of these, through no fault of myself or my team, messed up and caused downtime. Like, you can get most SRE types in a room to laugh if you blurt out "us-east-1", because it's known to be so unreliable. And yet, it's where every Fortune 500 puts every service; we laugh about the reliability and it's literally powering the economy just fine.

So yes, a lot of people on HN complain about fly's reliability. fly posts to HN a lot and gives them the opportunity. Is it actually meaningful compared to the alternatives? It's very hard to tell.

tptacek · 2024-02-14T18:41:34 1707936094

Hoo boy.

First: this is 100% a "live by the sword, die by the sword" situation for us. We're as aware as anybody about our weird HN darling status (this is a post from two months ago, about an announcement from many months ago, that spent like 12 hours plastered to the front page; we have no idea why it hit today, and it actually stepped on another thing we wanted to post today so don't think we secretly orchestrated any of this!). We've allowed ourselves to be ultra-visible here, and threads like this are natural consequence.

Moreover: a lot of this criticism is well warranted! I can cough up a litany of mitigating factors (the guy who stored his database in ephemeral instance storage instead of a volume, for instance), but I mean, come on. The single most highly upvoted and trafficked thing we've ever written was a post a year ago owning up to reliability issues on the platform. People have definitely had issues!

A fun cop-out answer here is to note all the times people compare us to AWS or Cloudflare, as if we were a hyperscaler public cloud. More fun still is to search HN for stories about us-east-1. We certainly do that to self-sooth internally! And: also? If your only consideration for picking a place to host an application is platform reliability? You're hosting on AWS anyways. But it's still a cop-out.

So I guess I'd sum all this up as: we've picked a hard problem to work on. Things are mathematically guaranteed to go wrong even if we're perfect, and we are not that. People should take criticisms of us on these threads seriously. We do. This is a tough crowd (the threads, if not the vote scores on our blog post) and there's value in that. Over the last year, and through this upcoming year, staffing for infra reliability has been the single biggest driver of hiring at Fly.io, I think that's the right call, and I think the fact that we occasionally get mauled on threads is part of what enabled us to make that call.

(Ordinarily I'd shut up about this stuff and let the thread die out itself, but some dearly loved user of ours took a stand and said they'd never had any problems on us, which: you can imagine the "ohhhhh nooooooo" montage that took place in my brain when I read that someone had essentially dared the thread to come up with times when we'd sucked for some user, so I guess all bets are off. Go easy on Xe, though: they really are just an ultra-helpful uncynical person, and kind of walked into a buzzsaw here).

jrockway · 2024-02-14T19:43:59 1707939839

I also don't know why HN is so upset about people willing to help out in the threads. The way I see it is, if you talk about your product on HN, inevitably someone will remember they have a support inquiry while HN is open, and ask it there instead of over email. Since employees are probably reading HN, they are naturally going to want to answer or say they escalated there. I don't think it's some sort of scam, just what any reasonable person would do.

tptacek · 2024-02-14T19:56:57 1707940617

It's become a YC cliche, that the way to get support for any issue is to get a complaint upvoted to the top of a thread. People used to talk about "Collison installs", which are real-use product demos that are so slick your company founder (in this case Stripe's 'pc) can just wander around installing your product for people to evangelize it; there should be another Collison term for decisively resolving customer support issues by having the founder drop into a thread, and I think that's the vibe people are reacting to here.

chachra · 2024-02-14T06:07:19 1707890839

ok possibly not alone, maybe the issues happened before I started using them extensively. I've had ~no downtime that affects me in 7 months.

I do wish they had some features I need, but their support and responses are top notch. And I've lost much less hair and time than I would going full-blown AWS or another cloud provider.

jokethrowaway · 2024-02-14T10:34:19 1707906859

To be fair most hosting providers come with plenty of public complaints about downtime. The big ones do way better, the best one is AWS, then GC and last Azure. They cost stupid money though.

Digital ocean has been terrible for me, some regions just go down every month and I lose thousands of requests, increasing my churn rate.

Fly.io had tons of weird issues but it got better in the last months. It's still very incomplete in terms of functionality and figuring out how to deploy the first time is a massive pain.

My plan is to add Hetzner and load balance with bunnycdn across DO and H

loloquwowndueo · 2024-02-14T11:28:25 1707910105

Every thread on the Internet about any product or service has complaints.

weird-eye-issue · 2024-02-14T15:24:09 1707924249

Actually here is a good example: Cloudflare. Sure people complain a ton about privacy but I haven't seen a single complaint about the reliability of Cloudflare Workers or similar product in the dozens of threads I've seen on HN

weird-eye-issue · 2024-02-14T12:07:48 1707912468

Not to this extent, it has always stood out to me in particular

throwaway220033 · 2024-02-14T12:15:05 1707912905

[flagged]

loloquwowndueo · 2024-02-14T12:30:19 1707913819

Wow that devolved into aggression pretty quickly.

throwaway220033 · 2024-02-15T09:16:10 1707988570

I'd correct that as a corrupt definition of "aggression".

matthewmacleod · 2024-02-14T12:54:05 1707915245

This isn't the place for silly, unjustified personal attacks. Stop it.

nixgeek · 2024-02-14T06:06:37 1707890797

That hasn’t been my experience with Fly but I’m sorry to hear it seems to be others :(

uo21tp5hoyg · 2024-02-14T07:08:12 1707894492

https://community.fly.io/t/reliability-its-not-great/11253

heeton · 2024-02-14T08:40:18 1707900018

Not alone, I’ve been part of two teams who have evaluated fly and hit weird reliability or stability issues, deemed it not ready yet.

yawnxyz · 2024-02-14T20:19:48 1707941988

this is what I thought, until once I spent two days to publish a new, trivial code change to my Fly.io hosted API — it just wouldn't update! And every time I tried to re-publish it'd give me a slightly different error.

When it works, it's brilliant. The problem is that it hasn't worked too well in the last few months.

xena · 2024-02-14T00:19:28 1707869968

Hi, author of the post and Fly.io devrel here in case anyone has any questions. GPUs went GA yesterday, you can experiment with them to your heart's content should the fraud algorithm machine god smile upon you. I'm mostly surprised my signal post about what the "GPUs" are didn't land well here: https://fly.io/blog/what-are-these-gpus-really/

If anyone has any questions, fire away!

benreesman · 2024-02-14T05:48:17 1707889697

I'd be fascinated to hear your thoughts on Apple hardware for inference in particular. I spend a lot of time tuning up inference to run locally for people with Apple Silicon on-prem or even on-desk, and I estimate a lot of headroom left even with all the work that's gone into e.g. GGUF.

Do you think the process node advantage and SoC/HBM-first will hold up long enough for the software to catch up? High-end Metal gear looks expensive until you compare it to NVIDIA with 64Gb+ of reasonably high memory bandwidth attached to dedicated FP vector units :)

One imagines that being able to move inference workloads on and off device with a platform like `fly.io` would represent a lot of degrees of freedom for edge-heavy applications.

xena · 2024-02-14T07:02:51 1707894171

Well, let me put it this way. I have a MacBook with 64 GB of vram so I can experiment with making an old-fashioned x.ai clone (the meeting scheduling one, not the "woke chatgpt" one) amongst other things now. I love how Apple Silicon makes things vroomy on my laptop.

I do know that getting those working in a cloud provider setup is a "pain in the ass" (according to ex-AWS friends) so I don't personally have hope in seeing that happen in production.

However, the premise makes me laugh so much, so who knows? :)

LtdJorge · 2024-02-15T12:03:07 1707998587

If you want something similar to Apple silicon for AI, look for AMD Instinct MI300A. 24 cores, 256MB cache, 128GB of HBM3 and 1 petaflop of FP16.

thangngoc89 · 2024-02-14T05:07:54 1707887274

This is right on time. I'm evaluating "severless" GPU services for my upcoming project. I see on the announcement that pricing is per hours. Is scaling to zero priced based on minutes/seconds? For my workflow, medical image segmentation, one file takes about 5 minutes.

LtdJorge · 2024-02-15T12:04:16 1707998656

How long does it take to process on CPU?

thangngoc89 · 2024-02-15T14:39:34 1708007974

About 1-1.5 hours on a Core i9-13900K. Which is unacceptable for my use case.

qeternity · 2024-02-14T00:42:51 1707871371

I posted further down before seeing your comment. First, congrats on the launch!

But who is the target user of this service? Is this mostly just for existing fly.io customers who want to keep within the fly.io sandbox?

xena · 2024-02-14T01:11:15 1707873075

Part of it is for people that want to do GPU things on their fly.io networks. One of the big things I do personally is I made Arsène (https://arsene.fly.dev) a while back as an exploration of the "dead internet" theory. Every 12 hours it pokes two GPUs on Fly.io to generate article prose and key art with Mixtral (via Ollama) and an anime-tuned Stable Diffusion XL model named Kohaku-XL.

Frankly, I also see the other part of it as a way to ride the AI hype train to victory. Having powerful GPUs available to everyone makes it easy to experiment, which would open Fly.io as an option for more developers. I think "bring your own weights" is going to be a compelling story as things advance.

gooseyman · 2024-02-14T03:23:31 1707881011

https://en.m.wikipedia.org/wiki/Dead_Internet_theory

What have you learned from the exploration?

xena · 2024-02-14T03:41:12 1707882072

Enough that I'd probably need to write a blogpost about it and answer some questions that I have about it. The biggest one I want to do is a sentiment analysis of these horoscopes vs market results to see if they are "correct".

cosmojg · 2024-02-14T04:50:46 1707886246

Interesting setup! What's the monthly cost of running Arsène on fly.io?

xena · 2024-02-14T06:55:59 1707893759

Because I have secret magical powers that you probably don't, it's basically free for me. Here's the breakdown though:

The application server uses Deno and Fresh (https://fresh.deno.dev) and requires a shared-1x CPU at 512 MB of ram. That's $3.19 per month as-is. It also uses 2GB of disk volume, which would cost $0.30 per month.

As far as post generation goes: when I first set it up it used GPT-3.5 Turbo to generate prose. That cost me rounding error per month (maybe like $0.05?). At some point I upgraded it to GPT-4 Turbo for free-because-I-got-OpenAI-credits-on-the-drama-day reasons. The prose level increase wasn't significant.

With the GPU it has now, a cold load of the model and prose generation run takes about 1.5 minutes. If I didn't have reasons to keep that machine pinned to a GPU (involving other ridiculous ventures), it would probably cost about 5 minutes per day (increased the time to make the math easier) of GPU time with a 40 GB volume (I now use Nous Hermes Mixtral at Q5_K_M precision, so about 32 GB of weights), so something like $6 per month for the volume and 2.5 hours of GPU time, or about $6.25 per month on an L40s.

In total it's probably something like $15.75 per month. That's a fair bit on paper, but I have certain arrangements that make it significantly less cheap for me. I could re-architect Arsène to not have to be online 24/7, but it's frankly not worth it when the big cost is the GPU time and weights volume. I don't know of a way to make that better without sacrificing model quality more than I have to.

For a shitpost though, I think it'd totally worth it to pay that much. It's kinda hilarious and I feel like it makes for a decent display of how bad things could get if we go full "AI replaces writers" like some people seem to want for some reason I can't even begin to understand.

I still think it's funny that I have to explicitly tell people to not take financial advice from it, because if I didn't then they will.

tptacek · 2024-02-14T01:26:26 1707873986

This isn't the target user, but the boy's been using it at the soil bacteria lab he works in to do basecalling for a FAST5 data from a nanopore sequencer.

yard2010 · 2024-02-14T16:28:36 1707928116

Can you please elaborate?

tptacek · 2024-02-14T16:29:11 1707928151

I am nowhere within a million miles smart enough to elaborate on this one.

subarctic · 2024-02-14T00:50:10 1707871810

Commenters like this, for one thing: https://news.ycombinator.com/item?id=34242767

Nevin1901 · 2024-02-14T04:49:32 1707886172

How fast are coldstarts, and how do you compare against other gpu providers (runpod modal etc)

xena · 2024-02-14T07:03:26 1707894206

The slowest part is loading weights into vram in my experience. I haven't done benchmarking on that. What kind of benchmark would you like to see?

ipsum2 · 2024-02-14T08:03:54 1707897834

I would like to see time to first inference for typical models (llama-7b first token, SDXL 1 step, etc)

yla92 · 2024-02-14T02:53:07 1707879187

Not a question but the link "Lovelace L40s are coming soon (pricing TBD)" is 404.

thangngoc89 · 2024-02-14T04:06:00 1707883560

If it's a link to nvidia.com then it's expected to be broken. Seriously, I've never seen a valid link to nvidia.com

xena · 2024-02-14T03:09:03 1707880143

Uhhhh that's not ideal. I'll go edit that after dinner. Thanks!

bl4kers · 2024-02-14T01:34:11 1707874451

How difficult world it be to set up Folding@home on these? https://foldingathome.org

xena · 2024-02-14T01:39:16 1707874756

I'm not sure, the more it uses CUDA the easier I bet. I don't know if it would be fiscally worth it though.

niz4ts · 2024-02-14T00:19:47 1707869987

As far as I know, Fly uses Firecracker for their VMs. I've been following Firecracker for a while now (even using it in a project), and they don't support GPUs out of the box (and have no plan to support it [1]).

I'm curious to know how Fly figured their own GPU support with Firecracker. In the past they had some very detailed technical posts on how they achieved certain things, so I'm hoping we'll see one on their GPU support in the future!

[1]: https://github.com/firecracker-microvm/firecracker/issues/11...

mrkurt · 2024-02-14T00:22:37 1707870157

The simple spoiler is that the GPU machines use Cloud Hypervisor, not Firecracker.

yencabulator · 2024-02-16T01:29:03 1708046943

There has been weirdly little discussion on HN about Cloud Hypervisor. I guess because it's such a horribly bland non-descriptive Enterprise Naming name?

It looks pretty sweet. Rust & sharing libraries with Firecracker and ChromeOS's crosvm, with more emphasis on long-running stateful services than in Firecracker.

https://github.com/cloud-hypervisor/cloud-hypervisor

https://github.com/rust-vmm

nolist_policy · 2024-02-16T09:28:59 1708075739

Unfortunately, Cloud Hypervisor does not use strong sandboxing/privilege separation like crosvm does.

yencabulator · 2024-02-16T20:10:27 1708114227

For anyone else wanting to check on the status of this: it seems they're looking at a combination of seccomp, landlock and a systemd service instance per VM, with systemd doing DynamicUser, namespacing, and initial seccomp. Work seems to be happening right now, but of course it's telling and sad that it wasn't part of the original design.

https://github.com/cloud-hypervisor/cloud-hypervisor/issues/...

niz4ts · 2024-02-14T00:45:44 1707871544

Way simpler than what I was expecting! Any notes to share about Cloud Hypervisor vs Firecracker operationally? I'm assuming the bulkier Cloud Hypervisor doesn't matter much compared to the latency of most GPU workloads.

tptacek · 2024-02-14T00:57:43 1707872263

They are operationally pretty much identical. In both cases, we drive them through a wrapper API server that's part of our orchestrator. Building the cloud-hypervisor wrapper took me all of about 2 hours.

iambateman · 2024-02-13T22:46:42 1707864402

It’s cool to see that they can handle scaling down to zero. Especially for working on experimental sites that don’t have the users to justify even modest server costs.

I would love an example on how much time a request charges. Obviously it will vary, but is it 2 seconds or “minimum 60 seconds per spin up”?

mrkurt · 2024-02-13T22:55:11 1707864911

We charge from the time you boot a machine until it stops. There's no enforced minimum, but in general it's difficult to get much out of a machine in less than 5 seconds. For GPU machines, depending on data size for whatever is going into GPU memory, it could need 30s of runtime to be useful.

sodality2 · 2024-02-13T23:01:01 1707865261

How long does model loading take? Loading 19GB into a machine can't be instantaneous (especially if the model is a network share).

carl_dr · 2024-02-13T23:37:58 1707867478

It takes about 7s to load a 9GB model on Beam (they claim, and tested as about right), I imagine it is similar with Fly - I've not had any performance issues with Fly.

loloquwowndueo · 2024-02-13T23:35:35 1707867335

There are no “network shares”. The typical way to store model data would be in a volume, which is basically local nvme storage.

xena · 2024-02-14T07:04:10 1707894250

Wellllllll, technically there is LSVD which would let you store model weights in S3.

God that's a horrible idea. Blog time!

bbkane · 2024-02-13T23:01:38 1707865298

I see the whisper transcription article. Is there an easy way to limit it to, say $100 worth of transcription a month and then stop till next month? I want to transcribe a bunch of speeches but I want to spread the cost over time

IanCal · 2024-02-13T23:36:27 1707867387

Probably available elsewhere but you could setup an account with a monthly spend limit with openai and use their API until you hit errors.

$100/mo is about 10 days of speeches a month, how much data do you have?

edit - if the pricing seems reasonable, you can just limit how many minutes you send. AssemblyAI is another provider at about the same cost.

bbkane · 2024-02-14T01:11:05 1707873065

Thanks! Maybe 50hr of speeches. It's a hobby idea so I'll check these out when I get some time

IanCal · 2024-02-14T12:04:03 1707912243

I can probably just run these through whisper locally for you if you want and are able to share. Email is in my bio (ignore the pricing, I'm obv not charging)

xena · 2024-02-14T07:04:33 1707894273

Email xe@fly.io, I'm intrigued.

andes314 · 2024-02-13T23:00:54 1707865254

Do you offer some sort of keep_warm parameter that removes this latency (for a greater cost)?

mrkurt · 2024-02-13T23:03:28 1707865408

You control machine lifecycles. To scale down, you just set the appropriate restart policy, then exit(0).

You can also opt to let our proxy stop machines for you, but the most granular option is to just do it in code.

So yes, kind of. You just wait before you exit.

Aeolun · 2024-02-14T00:57:05 1707872225

So just to confirm, for these workloads, it’d start a machine when the request comes in, and then shut it down immediately after the request is finished (with some 30-60s in between I suppose)? Is there some way to keep it up if additional requests are in the queue?

Edit: Found my answer elsewhere (yes).

pgt · 2024-02-14T08:58:30 1707901110

I was an early adopter of Fly.io. It is not production-ready. They should fix their basic features before adding new ones.

urduntupu · 2024-02-14T09:17:48 1707902268

Unfortunately true. Also jumped the fly.io ship after initial high excitement for their offering. Moved back to DigitalOcean's app platform. A bit more config effort, significantly pricier, but we need stability on production. Can't have my customers call me b/c of service interruption.

throwaway220033 · 2024-02-14T09:27:37 1707902857

+1 - It's the most unreliable hosting service I've ever used in my life with "nice looking" packaging. There were frequently multiple things broken at same time, status page would always be green while my meetings and weekends were ruined. Software can be broken but Fly handles incidents with unprofessional, immature attitude. Basically you pay 10x more money for an unreliable service that just looks "nice". I'm paying 4x less to much better hardware with Hetzner + Kamal; it works reliably, pricing is predictable, I don't pay 25% more for the same usage next month.

https://news.ycombinator.com/item?id=36808296

ecmascript · 2024-02-14T12:26:56 1707913616

Comments like these are just sad to see on HN. It is not constructive. What is these basic features that need fixing you're speaking about and what is the fixes required?

cschmatzler · 2024-02-14T13:28:51 1707917331

Reliability and support. Having even “the entire node went down” tickets get an auto-response to “please go fuck off into the community forum” is insane. What is the community forum gonna do about your reliability issues? I can get a 4€/mo server at Hetzner and have actual people in the datacenter respond to my technical inquiries within minutes.

ecmascript · 2024-02-15T10:32:20 1707993140

Well I use Hetzner myself, I have never used Fly or their support. But this comment is much more helpful for readers so it's good that you elaborated :)

pgt · 2024-02-17T11:47:20 1708170440

He who sees fraud and does not cry 'fraud,' is fraud.

1. Provisioned machines should not die randomly and not spin back up.

Saving you time and money.

nakovet · 2024-02-13T23:21:17 1707866477

About Fly but not about the GPU announcement, I wish they had a S3 replacement, they suggest a GNU Affero project that is a dealbreaker for any business, needing to leave Fly to store user assets was a dealbreaker for us to use Fly on our next project, sad cause I love the simplicity, the value for money, the built in VPN.

simonw · 2024-02-13T23:30:48 1707867048

Sounds like you might be interested in the Tigris preview:

- https://www.tigrisdata.com/

- https://benhoyt.com/writings/flyio-and-tigris/ (discussed here: https://news.ycombinator.com/item?id=39360870)

- https://fly.io/docs/reference/tigris/

JoshTriplett · 2024-02-13T23:27:57 1707866877

> I wish they had a S3 replacement, they suggest a GNU Affero project that is a dealbreaker for any business

AGPL does not mean you have to share everything you've built atop a service, just everything you've linked to it and any changes you've made to it. If you're accessing an S3-like service using only an HTTPS API, that isn't going to make your code subject to the AGPL.

bradfitz · 2024-02-13T23:49:32 1707868172

Regardless, some companies have a blanket thou-shalt-not-use-AGPL-anything policy.

hiharryhere · 2024-02-14T00:34:10 1707870850

Some companies Including Google.

I’ve sold Enterprise Saas to Google and we had to attest we have no AGPL code servicing them. This is for a CRM-like app.

anonzzzies · 2024-02-14T10:12:36 1707905556

Yep, our lawyers say to not use and we have to check components and libs we use too. People are really shooting themselves in the foot with that license.

aragilar · 2024-02-14T12:43:18 1707914598

You assume that people want you to use their project. For MinIO, the AGPL seems to be a way to get people into their ecosystem so they can sell exceptions. Others might want you to contribute code back.

anonzzzies · 2024-02-14T12:47:51 1707914871

I have no problem with contributing back: we do that all the time on MIT / BSD projects even if we don't have to. AGPL just restricts the use-cases and (apparently) there is limited legal precedence in my region to see if we don't have to give away everything that's not even related but uses it, so the lawyers (I am not a lawyer, so I cannot provide more details) say to avoid it completely. Just to be safe. And I am sure it hurts a lot of projects... There are many modern projects that are the same thing, but they don't share code because the code is agpl.

corobo · 2024-02-14T15:50:38 1707925838

Sounds more like the license is doing its job as intended, and businesses that can afford lawyers but not bespoke licenses are shooting themselves in the foot with that policy

JoshTriplett · 2024-02-15T00:34:46 1707957286

Exactly. If someone doesn't want to use your software because the copyleft license is stricter than they would prefer, that's an opportunity to sell them a license.

anonzzzies · 2024-02-15T16:31:43 1708014703

We just won’t use their product. Works as well.

corobo · 2024-02-15T17:08:20 1708016900

Everyone wins :)

trollian · 2024-02-13T23:55:27 1707868527

Lawyercats are the worst cats.

RcouF1uZ4gsC · 2024-02-13T23:57:41 1707868661

> AGPL does not mean you have to share everything you've built atop a service, just everything you've linked to it and any changes you've made to it. If you're accessing an S3-like service using only an HTTPS API, that isn't going to make your code subject to the AGPL.

I am not so sure about that. Otherwise, you could trivially get around the AGPL by using https services to launder your proprietary changes.

There is not enough caselaw to say how a case that used only http services provided by AGPL to run a proprietary service would turn out, and it is not worth betting your business on it.

c0balt · 2024-02-14T00:11:30 1707869490

> > AGPL does not mean you have to share everything you've built atop a service, just everything you've linked to it and any changes you've made to it. If you're accessing an S3-like service using only an HTTPS API, that isn't going to make your code subject to the AGPL.

Correct, this is a known caveat, that's also covered a bit more in the GNU article about the AGPL when discussing Software as a Service Substitutes, ref: https://www.gnu.org/licenses/why-affero-gpl.html.en

xcdzvyn · 2024-02-14T00:10:34 1707869434

> you could trivially get around the AGPL by using https services to launder your proprietary changes.

This is a very interesting proposition that makes me reconsider my opinion of AGPL.

mbreese · 2024-02-14T02:17:06 1707877026

Anything “clever” in a legal sense is a red flag for me… Computer people tend to think of the law as a black and white set of rules, but it is and it isn’t. It’s interpreted by people and “one clever trick” doesn’t sound like something I’d put a lot of faith in. Intent can matter a lot.

(Regardless of how you see the AGPL)

internetter · 2024-02-14T02:35:29 1707878129

> Computer people tend to think of the law as a black and white set of rules

I've never seen someone put this into words, but it makes a lot of sense. I mean, idealistically computers are deterministic, whereas the law is not (by design), yet there exists many parallels between the two. For instance, the lawbook has strong parallels to the documentation for software. So it makes sense why programmers might assume the law is also mostly deterministic, even if this is false

ozr · 2024-02-14T03:11:55 1707880315

I'm an engineer with a passing interest in the law. I've frequently had to explain to otherwise smart and capable people that their one weird trick will just get them a contempt charge.

xcdzvyn · 2024-02-14T19:27:09 1707938829

Even if that wasn't directly targeted at me, I'll elaborate on my concern:

That it's possible to interpret the AGPL both ways (that the prior hack is legal, and that it is not), and that the project author could very well believe either one, suggests to me that the AGPL's terms aren't rigidly binding, but ultimately a kind of "don't do what the author thinks the license says, whatever that is".

Dylan16807 · 2024-02-14T03:08:34 1707880114

On the other hand the AGPL itself is trying to be one clever trick in the first place, so maybe it's appropriate here.

benbjohnson · 2024-02-13T23:30:20 1707867020

We have an region-aware S3 replacement that's in beta right now: https://community.fly.io/t/global-caching-object-storage-on-...

benhoyt · 2024-02-13T23:29:52 1707866992

They're about to get an S3 replacement, called Tigris (it's a separate company but integrated into flyctl and runs on Fly.io infra): https://benhoyt.com/writings/flyio-and-tigris/

martylamb · 2024-02-13T23:32:09 1707867129

Funny you should mention that: https://news.ycombinator.com/item?id=39360870

itake · 2024-02-14T00:30:56 1707870656

The dealbreaker should be their uptime and support. They deleted my database and have many uptime issues.

tptacek · 2024-02-14T00:12:48 1707869568

Give us a minute.

benatkin · 2024-02-13T23:26:44 1707866804

This looks promising https://github.com/seaweedfs/seaweedfs

candiddevmike · 2024-02-14T00:15:34 1707869734

Seaweed requires a separate coordination setup which may simplify the architecture but complicates the deployment.

qeternity · 2024-02-14T00:38:52 1707871132

Who is the target market for this? Small/unproven apps that need to run some AI model, but won't/can't use hosted offerings by the literally dozens of race-to-zero startups offering OSS models?

We run plenty of our own models and hardware, so I get wanting to have control over the metal. I'm just trying to figure out who this is targeted at.

mrkurt · 2024-02-14T03:32:16 1707881536

We have some ideas but there's no clear answer yet. Probably people building hosting platforms. Maybe not obvious hosting platforms, but hosting platforms.

KTibow · 2024-02-14T04:47:33 1707886053

Fly is an edge network - in theory, if your GPUs are next to your servers and your servers are next to your users, your app will be very fast, as highlighted in the article. In practice this might not matter much since inference takes a long time anyway.

tptacek · 2024-02-14T05:32:48 1707888768

We're really a couple things; the edge stuff was where we got started in 2020, but "fast booting VMs" is just as important to us now, and that's something that's useful whether or not you're doing edge stuff.

joshxyz · 2024-02-14T05:40:32 1707889232

this is crazy, this move alone cements fly as an edge player for the next 3 / 5 / 10 years.

dathinab · 2024-02-14T08:41:54 1707900114

TL;DR: (skip to last paragraph)

- having the GPU compute in the same data center or at least from the same cloud provider can be a huge plus

- it's not that rare for various providers we have tried out to run out of available A100 GPUs, even with large providers we had issues like that multiple times (less an issue if you aren't locked to specific regions)

- not all providers provide a usable scale down to zero "on demand" model, idk. how well it works with fly long term but that could be another point

- race-to-zero startups have the tendency to not last, it's kind by design from a 100 of them just a very few survive

- if you are already on fly and write a non-public tech demo which just gets evaluated a few times their GPU offering can act like a default don't think much about it solution (through you using e.g. Huggingface services would be often more likely)

- A lot of companies can't run their own hardware for various reasons, at best they can rent a rack in another Datacenter but for small use use-cases this isn't always worth it. Similar there are use cases which do might A100s but only run them rarely (e.g. on weekly analytics data). Potentially less then 1h/w in which case race-to-zero pricing might not look interesting at all

To sum up I think there are many small reasons why some companies, not just startups, might have interest in fly GPUs, especially if they are already on fly. But there is no single "that's why" argument, especially if you are already deploying to another cloud.

qeternity · 2024-02-14T22:00:03 1707948003

It's not like Fly has GPUs in every PoP...so there goes all the same datacenter stuff (unless you just want to be in the PoP with GPUs in which case...)

But none of this answers my question.

I'm trying to understand the intersection of things like "people who need GPU compute" and "people who need to scale down to zero".

This can't be a very big market.

DreamGen · 2024-02-14T10:42:58 1707907378

I am not seeing any race-to-zero in the hosted offering space. Most charge multiples of what you would pay on GCP, and the public prices on GCP are already several times what you would pay as an enterprise customer.

qeternity · 2024-02-14T21:57:18 1707947838

I don't know what you think I'm talking about, or who is charging multiples of GCP? But I'm talking about hosted inference, where many startups are offering Mistral models cheaper than Mistral are.

ec109685 · 2024-02-14T05:39:46 1707889186

The recipe example or any any LLM use case seems like a very poor way of highlighting “inference at the edge” given the extra few hundred ms round trip won’t matter.

unraveller · 2024-02-14T11:20:56 1707909656

The better use case is obviously voice assistant at the edge. As in voice 2 text 2 search/GPT 2 voice generated response. That is where ms matter but it is also a high abuse angle no one wants to associate with just yet. My guess is they are going to do this in another post, and if so they should make their own perplexity style online-gpt. For now they just wanted to see what else people can think up by making the introduction of it boring.

ec109685 · 2024-02-14T16:22:01 1707927721

There’s three options for inference: 1) On device inference 2) Inference “on the edge” 3) Inference in a data center

Given fly is deployed in equinox data centers just like everyone else, fundamentally there isn’t much difference between #2 and #3.

manishsharan · 2024-02-14T05:59:18 1707890358

This. I cannot think of a business case for running LLMs on the edge. Is this a Pets.com moment for the AI industry?

holoduke · 2024-02-13T22:57:48 1707865068

Anybody has experience with the performance. First glance is that they are quite expensive. Compared to for example Hetzner (cpu machines)

impulser_ · 2024-02-13T23:07:35 1707865655

I'm not sure about others, but you can get A100s with 90gb of RAM from DigitalOcean for $1.15 an hour. So about 1/3 the price.

You can even get H100s for cheaper than these prices at $2.24 an hour.

So these do seem a bit expensive, but this might be because there is high demand for them from customers and they don't have the supply.

treesciencebot · 2024-02-13T23:25:35 1707866735

Just to correct the record, both $1.15 per A100 and $2.24 per H100 require a 3-year-commitment. On-demand prices are 2.5X that.

Aeolun · 2024-02-14T01:01:23 1707872483

> $2.24/hour pricing is for a 3-year commitment. On-demand pricing for H100 is $5.95/hour under our special promo price.\$1.15/hour pricing is for a 3-year commitment.

Wow, that’s some spectacularly false advertising.

skrtskrt · 2024-02-13T23:17:03 1707866223

getting supply is super hard right now, DigitalOcean just straight up bought Paperspace to get access to those GPUs.

The whole reason Coreweave is on a fat growth trajectory right now is they used their VC money to buy a ton of GPUs at the right time

dathinab · 2024-02-14T09:21:05 1707902465

Company I work for had multiple times problems of not being able to allocate any gpus from some larger cloud providers (with the region restrictions we have, which still include all of EU as regions).

(I'm not sure which of them it was, we are currently evaluating multiple providers and I'm not really involved in that process.)

unixhero · 2024-02-14T12:19:34 1707913174

I use Fly.io free tier to run uptime monitoring with Uptime kuma. It works insanely well, and I'm a really happy camper.

rozenmd · 2024-02-14T12:28:27 1707913707

What do you use to let you know uptime kuma went down?

unixhero · 2024-02-14T14:20:39 1707920439

It doesn't

UncleOxidant · 2024-02-14T02:24:31 1707877471

I don't want to deploy an app, I just want to play around with LLMs and don't want to go out and buy an expensive PC with a highend GPU just now. Is Fly.io a good way to go? What about alternatives?

mrb · 2024-02-14T07:20:51 1707895251

Use https://vast.ai and rent a machine for as long as you need (minutes, hours, days). You pick the OS image, and you get a root shell to play with. An RTX 4090 currently costs $0.50 per hour. It literally took me less than 15 minutes to sign up for the first time a few weeks ago.

For comparison, the first time experience on Amazon EC2 is much worse. I had tried to get a GPU instance on EC2 but couldn't reserve it (cryptic error message). Then I realized as a first-time EC2 user my default quota simply doesn't allow any GPU instances. After contacting support and waiting 4-5 days I eventually got a response my quota was increased, but I still can't launch a GPU instance... apparently my quota is still zero. At this point I gave up and found vast.ai. I don't know if Amazon realizes how FRUSTRATING their useless default quotas are for first-time EC2 users.

janalsncm · 2024-02-14T08:49:54 1707900594

Pretty much had the same experience with EC2 GPUs. No permission, had to contact support. Got permission a day later. I wanted to run on A100 ($30/hour, 8GPU minimum) but they were out of them that night. I tried again next day, same thing. So I gave up and used RunPod.io.

mrkurt · 2024-02-14T02:27:38 1707877658

You might actually be better off building a gaming rig and using that. The datacenter GPUs are silly expensive, because this is how NVIDIA price discriminates. The consumer, game GPUs work really well and you can buy them for almost as cheap as you can lease datacenter ones.

nojs · 2024-02-14T02:56:30 1707879390

I can recommend runpod.io after a few months of usage - very easy to spin up different GPU configurations for testing and the pricing is simple and transparent. Using TheBloke docker images you can get most local models up and running in a few minutes.

leourbina · 2024-02-14T02:25:00 1707877500

Paperspace is a great way to go for this. You can start by just using their notebook product (similar to Collab), and you get to pick which type of machine/GPU it runs on. Once you have the code you want to run, you can rent machines on demand:

https://www.paperspace.com/notebooks

janalsncm · 2024-02-14T08:54:47 1707900887

I used paperspace for a while. Pretty cheap for mid tier gpu access (A6000 for example). There were a few things that annoyed me though. For one, I couldn’t access free GPUs with my team account. So I ended up quitting and buying a 4090 lol.

ignoramous · 2024-02-14T03:34:40 1707881680

> What about alternatives?

Custom models? Apart from the Big 3 (in no particular order):

- https://together.ai/

- https://replicate.com/

- https://anyscale.com/

- https://baseten.co/

- https://modal.com/

- https://banana.dev/

- https://runpod.io/

- https://bentoml.com/

- https://brev.dev/

- https://octo.ai/

- https://cerebrium.ai/

...

ayewo · 2024-02-14T11:39:34 1707910774

> Apart from the Big 3 ...

Who are the big 3 in this context?

gk1 · 2024-02-14T13:06:26 1707915986

OpenAI, Anthropic, Cohere

mrcwinn · 2024-02-14T02:31:37 1707877897

https://ollama.com/ - Easy setup, run locally, free.

UncleOxidant · 2024-02-14T02:34:56 1707878096

Yeah, but I've got an RTX1070 in my circa 2017 PC. How well is that going to work?

thangngoc89 · 2024-02-14T04:11:44 1707883904

It's slow but still decent since it has 8GB of RAM.

jeswin · 2024-02-14T05:23:04 1707888184

You mean GTX 1070. There's no RTX 1070.

dathinab · 2024-02-14T09:14:35 1707902075

main question, do you need a A100?

some use cases do so.

but if not there are much cheaper consumer GPU based choices

but then maybe you anyway just use it for 1-2 hours in total in which case the price difference might just not matter

andes314 · 2024-02-13T22:58:08 1707865088

Has anyone who has used Beam.Cloud compare that service to this one?

DreamGen · 2024-02-14T10:41:30 1707907290

Great, more competition for the price-gouging platforms like Replicate and Modal is needed. As always with these, I would be curious about the cold-start time -- are you doing anything smart about being able to start (load models into VRAM) quickly? Most platforms that I tested are completely naive in their implementation, often downloading the docker image just-in-time instead of having it ready to be deployed on multiple machines.

jimnotgym · 2024-02-14T18:44:59 1707936299

It is a bit of an odd thing that we still call GPUs GPUs when the main use for them seems to have little to do with Graphics!

Havoc · 2024-02-13T23:05:17 1707865517

How fast is the spin up/down on this scale to zero? If it is fast this could be pretty interesting

amanda99 · 2024-02-13T23:27:31 1707866851

I think the bigger question is how long it takes to load any meaningful model onto the GPU.

fideloper · 2024-02-13T23:56:42 1707868602

that’s exactly right.

gpu-friendly base images tend to be larger (1-3g+) so that takes time (30s - 2m range) to create a new Machine (vm).

Then there’s “spin up time” of your software - downloading model files adds as long as it takes to download GB of model files.

Models (and pip dependencies!) can generally be “cached” if you (re)use volumes.

Attaching volumes to gpu machines dynamically created via the API takes a bit of management on your end (in that you’d need to keep track of your volumes, what region they’re in, and what to do if you need more volumes than you have)

dathinab · 2024-02-14T09:28:35 1707902915

I know it's not common in research and makes often little sense there.

But at least in theory for deployments you should generate deployment images.

I.e. no pip included in the image(!), all dependencies preloaded, unnecessary parts stripped, etc.

Models likely might also be bundled, but not always.

Still large images, but also depending on what they are for the same image might be reused often so it can be cached by the provider to some degree.

wslh · 2024-02-14T10:47:57 1707907677

Interesting. We have this discussing this kind of services (offloading training) over the last several days [1] [2] [3]. Thinking on the opportunity to compete with top cloud services such as Google Cloud, AWS, and Azure.

[1] https://news.ycombinator.com/item?id=39353663

[2] https://news.ycombinator.com/item?id=39329764

[3] https://news.ycombinator.com/item?id=39263422

riquito · 2024-02-14T00:18:57 1707869937

Is there any configuration to keep alive the machine for X seconds after a request has been served, instead of scaling down to zero immediately? I couldn't find it skimming the docs

mrkurt · 2024-02-14T00:21:43 1707870103

Machines are both dumber and more powerful than you'd think. Scaling down means just exit(0) if you have the right restart policy set. So you can implement any kind of keep-warm logic you want.