
That's a bad analogy. The weights are much closer to source code, because you can directly modify them (fine tune, merge or otherwise) using open source software that Meta released (torchtune, but there are tons of other libraries and frameworks).

You can also modify a precompiled binary with the right tools.

Except doing continued pre-training or fine tuning of the released model weights is the same process through which the original weights were created in the first place. There's no reverse engineering required. Meta engineers working on various products that need custom versions of the Llama model will use the same processes / tools.
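
As a concrete (if minimal) sketch of what modifying released weights looks like in practice -- this uses HuggingFace transformers + peft rather than torchtune, and the model name, dataset, and hyperparameters are placeholders:

    # Minimal LoRA fine-tuning sketch (transformers + peft, not torchtune).
    from datasets import load_dataset
    from peft import LoraConfig, get_peft_model
    from transformers import (AutoModelForCausalLM, AutoTokenizer,
                              DataCollatorForLanguageModeling, Trainer,
                              TrainingArguments)

    model_name = "meta-llama/Llama-3.1-8B"   # placeholder: any causal LM checkpoint
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    tokenizer.pad_token = tokenizer.eos_token
    model = AutoModelForCausalLM.from_pretrained(model_name)

    # Attach small trainable LoRA adapters on top of the released weights.
    model = get_peft_model(model, LoraConfig(
        r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"]))

    # Placeholder corpus: one text file, tokenized into fixed-length samples.
    data = load_dataset("text", data_files={"train": "my_corpus.txt"})["train"]
    data = data.map(lambda x: tokenizer(x["text"], truncation=True, max_length=512),
                    remove_columns=["text"])

    Trainer(
        model=model,
        args=TrainingArguments(output_dir="out", per_device_train_batch_size=1,
                               num_train_epochs=1),
        train_dataset=data,
        data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
    ).train()

    model.save_pretrained("out/lora-adapter")   # the modified weights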

Given that the focus is performance, do you have any benchmarks to compare against the likes of TensorRT-LLM?

It's a bit early to compare directly to TensorRT-LLM because we don't have a full-blown equivalent.

Note that our focus is being platform agnostic, easy to deploy/integrate, with good all-around performance and easy tweaking. We are using the same compiler as JAX, so our performance is on par. But generally we believe we can gain on overall "tok/s/$" by having shorter startup times, choosing the most efficient hardware available, and easily implementing new tricks like multi-token prediction.
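
As a rough illustration of the "tok/s/$" metric (the numbers below are made up, purely to show the calculation):

    # Tokens per dollar = sustained throughput * seconds per hour / hourly price.
    throughput_tok_per_s = 2500      # sustained decode throughput of a deployment
    price_usd_per_hour = 4.20        # hourly cost of the hardware running it

    tokens_per_dollar = throughput_tok_per_s * 3600 / price_usd_per_hour
    print(f"{tokens_per_dollar:,.0f} tokens per dollar")   # ~2.1M tok/$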


I second this, it would help to justify the time investment into a framework if it's clear how it stacks up!

> For example, training and serving Llama 3.1 on Google TPUs is about 30% cheaper than NVIDIA GPUs

When you say this, you should specify which Nvidia GPU you mean (I assume H100 SXM) and what price you are assuming for that GPU.

One can't simply compare based on the on-demand price on GCP, because the Nvidia GPUs there are extremely overpriced.


Runpod charges $3.49/hr for an H100 SXM, which is fairly cheap as far as on-demand H100s go. A v5p TPU is $4.20/hr, but has 95GB RAM instead of 80GB on the H100 — so you'll need fewer TPUs to get the same amount of RAM.

Runpod is ever-so-slightly cheaper than Google TPUs on-demand on a per-GB basis: about 4.3 cents an hour per GB for Runpod vs 4.4 cents an hour per GB for a TPU. But let's look at how they compare with reserved pricing. Runpod is $2.79/hr with a 3-month commitment (the longest commitment period they offer), whereas Google offers v5p TPUs for $2.94/hr for a 1-year commitment (the shortest period they offer; and to be honest, you probably don't want to make 3-year commitments in this space, since there are large perf gains in successive generations).

If you're willing to do reserved capacity, Google is cheaper than Runpod per GB of RAM you need to run training or inference: Runpod works out to about 3.5 cents per GB per hour vs about 3.09 cents per GB per hour for Google. Additionally, Google presumably has a lot more TPU capacity than Runpod has GPU capacity, and doing multi-node training is a pain with GPUs and less so with TPUs.

Another cheap option to benchmark against is Lambda Labs. Now, Lambda is pretty slow to boot, and considerably more annoying to work with (e.g. they only offer preconfigured VMs, so you'll need to do some kind of management on top of them). They offer H100s for $2.99/hr "on-demand" (although in my experience, prepare to wait 20+ minutes for the machines to boot); if cold boot times don't matter to you, they're even better than Runpod if you need large machines (they only offer 8xH100 nodes, though: nothing smaller). For a 1-year commit, they'll drop prices to $2.49/hr... which is still more expensive on a per-GB basis than TPUs (3.11 cents per GB per hour vs 3.09 cents per GB per hour), and again I'd trust Google's TPU capacity more than Lambda's H100 capacity.

It's not dramatically cheaper than the cheapest GPU options available, but it is cheaper if you're working with reserved capacity — and probably more reliably available in large quantities.
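
Spelled out, the per-GB arithmetic behind the numbers above (prices as quoted; a sanity check, not a benchmark):

    # Per-GB-of-HBM cost, using the hourly prices quoted above.
    options = {
        "Runpod H100 on-demand":   (3.49, 80),
        "Runpod H100 3-mo commit": (2.79, 80),
        "Lambda H100 1-yr commit": (2.49, 80),
        "TPU v5p on-demand":       (4.20, 95),
        "TPU v5p 1-yr commit":     (2.94, 95),
    }
    for name, (usd_per_hr, gb) in options.items():
        print(f"{name}: {usd_per_hr / gb * 100:.2f} cents/GB/hr")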


Thank you for the detailed analysis. We need to spend some time thinking and coming up with a price comparison like this. We’ll use this as inspiration!

VRAM per GPU isn't such an interesting metric. If it were, everyone would be fine-tuning on 80GB A100s :)

What matters is steps per $ and to some degree also speed (I'm happy to pay premium sometimes to get the fine tuning results faster).


True, but a TPU v5p is supposedly much closer to an H100 than an A100 (the A100 and TPU v4 were fairly similar), and you need the RAM as a baseline just to fit the model. I haven't seen super thorough benchmarking done between the two, but Google claims similar numbers. So $/RAM/hr is all I can really look at without benchmarking, sadly.

GCP is one of the cheapest places you can get them at scale.

Wouldn't really say it's the cheapest option... there are other providers like Lambda Labs or Ori.co where you can find them way cheaper.

Tell me more.

At what scale were you able to get a significant discount and how much?

Most people will be doing (full) fine-tuning on 8xH100 or 16xH100 for a few days at a time.


For one time payments, like in e-commerce, it's relatively trivial for even a small business to implement.

For subscriptions, achieving portability is much trickier.


The difference between (A) software engineers reacting to AI models and systems for programming and (B) artists (whether it's painters, musicians or otherwise) reacting to AI models for generating images, music, etc. is very interesting.

I wonder what's the reason.


Because code either works or it doesn't. Nobody is replacing our entire income stream with an LLM.

You also need a knowledge of code to instruct an LLM to generate decent code, and even then it's not always perfect.

Meanwhile plenty of people are using free/cheap image generation and going "good enough". Now they don't need to pay a graphic artist or a stock photo licence

Any layperson can describe what they want a picture to look like so the barrier to entry and successful exit is a lot lower for LLM image generation than for LLM code generation.


> "Meanwhile plenty of people are using free/cheap image generation and going "good enough". Now they don't need to pay a graphic artist or a stock photo licence"

and getting sandwich photos of ham blending into human fingers:

https://www.reddit.com/r/Wellthatsucks/comments/1f8bvb8/my_l...


And yet, even knowing what I was looking for, I went long enough without seeing it that I guessed I had misunderstood and swiped to the second image, where it was pointed out specifically. Even if I had noticed myself--presumably because I was staring at it for way too long in the restaurant--I can't imagine I would have guessed what was going on, BUT EVEN THEN it just wouldn't have mattered... clearly, this is more than merely a "good enough" image.

At best it's a prototype and concept generator. It would have to yield assets with layers that can be exported by an illustration or bitmap tool of choice. AI generated images are almost completely useless as-is.

I agree there are plenty of images with garbled text and hands with 7 fingers, but text to image has freely available generators which create almost perfect images for some prompts. Certainly good enough to replace an actor holding a product, a stock photo, and often a stylised design.

Look at who the tools are marketed towards. Writing software involves a lot of tedium, eye strain, and frustration, even for experts who have put in a lot of hours practicing, so LLMs are marketed to help developers make their jobs easier.

This is not the case for art or music generators: they are marketed towards (and created by) laypeople who want generic content and don't care about human artists. These systems are a significant burden on productivity (and a fatal burden on creativity) if you are an honest illustrator or musician.

Another perspective: a lot of the most useful LLM codegen is not asking the LLM to solve a tricky problem, but rather to translate and refine a somewhat loose English-language solution into a more precise JavaScript solution (or whatever), including a large bag of memorized tricks around sorting, regexes, etc. It is more "science than art," and for a sufficiently precise English prompt there is even a plausible set of optimal solutions. The LLM does not have to "understand" the prompt or rely on plagiarism to give a good answer. (Although GPT-3.5 was a horrific F# plagiarist... I don't like LLM codegen but it is far more defensible than music generation)

This is not the case with art or music generators: it makes no sense to describe them as "English to song" translators, and the only "optimal" solutions are the plagiarized / interpolated stuff the human raters most preferred. They clearly don't understand what they are drawing, nor do they understand what melodies are. Their output is either depressing content slop or suspiciously familiar. And their creators have filled the tech community with insultingly stupid propaganda like "they learn art just like human artists do." No wonder artists are mad!


What you say may be true about the simplest workflow: enter a prompt and get one or more finished images.

But many people use diffusion models in a much more interactive way, doing much more of the editing by hand. The simplest case is to erase part of a generated image, and prompt to infill. But there are people who spend hours to get a single image where they want it.
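
For the curious, that erase-and-infill loop is only a few lines with the open diffusers library; the checkpoint, file names, and prompt below are just examples:

    # Inpainting sketch with HuggingFace diffusers: keep most of an image,
    # mask a region, and regenerate only that region from a prompt.
    import torch
    from diffusers import StableDiffusionInpaintPipeline
    from PIL import Image

    pipe = StableDiffusionInpaintPipeline.from_pretrained(
        "runwayml/stable-diffusion-inpainting",   # example checkpoint
        torch_dtype=torch.float16,
    ).to("cuda")

    image = Image.open("draft.png").convert("RGB").resize((512, 512))
    mask = Image.open("mask.png").convert("L").resize((512, 512))   # white = repaint

    result = pipe(prompt="a weathered leather bracelet, detailed engraving",
                  image=image, mask_image=mask).images[0]
    result.save("draft_v2.png")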


This is true, and there's some really cool stuff there, but that's not who most of this is marketed at. Small wonder there's backlash from artists and people who appreciate artists when the stated value proposition is "render artists unemployed".


": they are marketed towards (and created by) laypeople with who want generic content and don't care about human artists"

Good. The artists I know have zero interest in doing that work. I have sacrificed a small fortune to invest in my wife's development as an artist so she never had to worry about making any money. She uses AI to help with promoting and "marketing" herself.

She and all of her colleagues despise commissioned work, and they get a constant stream of requests. I always tell her to refuse them. Some pay very well.

If you are creating generic "art" for corporations I have little more than a shrug for your anxiety over AI.


It's just gatekeeping.

Artists put a ton of time into education and refining their vision inside the craft. Amateur efforts to produce compelling work always look amateur. With augmentation, suddenly the "real" artists aren't as differentiated.

The whole conversation is obviously extremely skewed toward digital art, and the ones talking about it most visibly are the digital artists. No abstract painter thinks AI is coming for their occupation or cares whether it is easier to create anime dreamscapes this year or the next.


Coding assistants are not good enough (yet). Inline suggestions and chats are incredibly helpful and boost productivity (though only for those who know how to use them well), but that's as far as they go today.

If they could take a Jira ticket, debug the code, create a patch for a large codebase, and understand and respect all the workarounds in a legacy codebase, I would have a problem with it.


Except they can't do the equivalent for art yet either, and I am fairly familiar with the state of image diffusion today.

I've commissioned tens of thousands of dollars in art, and spent many hundreds of hours working with Stable Diffusion, Midjourney, and Flux. What all the generators are missing is intentionality in art.

They can generate something that looks great at surface level, but doesn't make sense when you look at the details. Why is a particular character wearing a certain bracelet? Why do the windows on that cottage look a certain way? What does a certain engraving mean? Which direction is a character looking, and why?

The diffusers do not understand what they are generating, so they just generate what "looks right." Often this results in art that looks pretty but has no deeper logic, world building, meaning, etc.

And of course, image generators cannot handle the client-artist relationship as well (even LLMs cannot), because it requires an understanding of what the customer wants and what emotion they want to convey with the piece they're commissioning.

So - I rely on artists for art I care about (art I will hang on my walls), and image generators for throwaway work (such as weekly D&D campaign images.)


Of course the "art" art -- the part that is all about human creativity -- will always be there.

But lots of people in the art business aren't doing that. If you didn't have Midjourney etc., what would you be doing for the throwaway work? Learn to design the stuff yourself, hire someone to do it on Upwork, or just not do it at all? Some money would likely exchange hands there.


The throwaway work is worth pennies per piece to me at most. So I probably wouldn't do it at all if it wasn't for the generators.

And even when it comes to the generators, I typically just use the free options like open-source diffusion models, as opposed to something paid like Midjourney.


Have you seen https://www.swebench.com/ ?

Once you engage agentic behaviour, it can take you way further than just the chats. We're already in the "resolving JIRA tickets" area - it's just hard to set up, not very well known, and may be expensive.


> We're already in the "resolving JIRA tickets" area

For very simple tasks maybe, but not for the kinds of things I get paid to do.

I don't think it will be able to get to the level of reliably doing difficult programming tasks that require understanding and inferring requirements without having AGI, in which case society has other things to worry about than programmers losing their jobs.


Looks like the definition of "resolving a ticket" here is "come up with a patch that ensures all tests pass", which does not necessarily include "add a new test", "make sure the patch is actually doing something meaningful", "communicate how this is fixed". Based on my experience and what I saw in the reports in the logs, a solution could be just hallucinating completely useless code -- as long as it doesn't fail a test.

Of course, it is still impressive, and definitely would help with the small bugs that require small fixes, especially for open source projects that have thousands of open issues. But is it going to make a big difference? Probably not yet.

Also, good luck doing that on our poorly written, poorly documented, and under-tested codebase. By any standard, Django is a much better codebase than the one I work on every day.


Some are happy with creating tests as well, but you probably want to mostly write them yourself. I mean, only you know the real world context - if the ticket didn't explain it well enough, LLMs can't do magic.

Actually, the poorly documented and poorly written part is not a huge issue in my experience. Being under-tested is way more important if you want to automate that work.


But that's not that far off. Like sure, currently it isn't there. But "read a ticket with a description, find the relevant code, understand the code (often better than a human), test it, return the result" is totally doable with some more iterations. It's already doable for smaller projects; see GitHub workspaces etc.


I love art and code. IMO it's because Cursor is really good and AI art is not that good.

There isn't a good metaphor for the problem with AI art. I would say it is like some kind of chocolate cake where the first few bites seem like the best cake you have ever had, and then successive bites become more and more shit until you stop even considering eating it. Then at some point even the thought of the cake makes you want to puke.

I say this as someone who thought we reached the art singularity in December 2022. I have no philosophical or moral problem with AI art. It just kind of sucks.

Cursor/Sonnet on the other hand just blew my mind earlier today.


There are really good models for AI art, if people care. I think that AI is better at making an image from start to finish than making some software from start to finish.

And I use Claude 3.5 Sonnet myself.


AI art is an oxymoron. It will never give me chills or make me cry.

I mean, it's supply and demand, right?

- There is a big demand for really complex software development, and an LLM can't do that alone. So software devs have to do lots of busywork, and they like the opportunity to be augmented by AI.

- Conversely, there is a huge demand for not-very-high-level art: e.g., lots of people want a custom logo or a little jingle, but not many people want to hire a concert pianist or commission the next Salvador Dali.

So most artists spend a lot of time doing a lot of low level work to pay the bills, while software devs spend a lot of time doing low level code monkey work so they can get to the creative part of their job.


Is it really? I know people who love using LLMs, people who are allergic to the idea of even talking about AI usability, and lots of others in between. Same with artists hating the idea, artists who spend hours crafting very specific things with SD, and many in between.

I'm not sure I can really point out a big difference here. Maybe the artists are more skewed towards not liking AI since they work with a medium that's not digital in the first place, but the range of responses really feels close.


Curious to see if the same will apply to other materials like news, books, images, music, movies, etc.


The headlines on the ruling can be misleading:

> Plaintiffs may have succeeded if they were instead seeking damages for past harms. But in her opinion, Justice Amy Coney Barrett wrote that partly because the Biden administration seemingly stopped influencing platforms' content policies in 2022, none of the plaintiffs could show evidence of a "substantial risk that, in the near future, they will suffer an injury that is traceable" to any government official. Thus, they did not seem to face "a real and immediate threat of repeated injury," Barrett wrote.


Whatever it takes to spin a 6-3 decision man. It was clear from the start that this supposed government “pressure” doesn’t and never did exist.


Are you saying Zuck is lying and the government did not do what he's saying they did? In Twitter's case, there are emails from Adam Schiff - do you think that evidence is fraudulent?


Yes I think Zuck is being a diva to hedge on the outcome of the election. No I don’t think the evidence exists.


The existence of government communications with the social media companies requesting suppression of content is referenced in the court's opinions. The Biden admin also admits to these communications. https://rollcall.com/2024/06/26/supreme-court-rejects-lawsui...


Your feelings don’t matter.


How is this different from the Coolify hosted cloud, apart from the fact that you are the co-founder of Coherence and not Coolify?

I've used neither solution, but just at a glance, right now I'd bet on Coolify -- it has a more permissive license, an active community of third-party contributors, and it has amassed a large number of private and corporate sponsors that likely make it sustainable.

On the other hand, you've raised $3.9m more than a year ago. What happens if the money runs out?

Maybe you can clarify what your solution offers that Coolify doesn't.


Appreciate the POV, and agree that Coolify has a much better community around it! A lot we can learn from. Not sure we agree on the license front since we do allow commercial use.

Coolify and cnc are very different technical solutions. Coolify is a server you deploy to a VM that can then schedule workloads onto that VM, managing features like ingress and updates. cnc is a client-side CLI that schedules workloads into managed cloud services like Lambda, Cloud Run, ECS, or Kubernetes. It orchestrates public-cloud-provided services instead of providing them itself (e.g. RDS vs. MySQL in a Docker container on a VM). The trade-offs here are too big for a comment and both are a great fit for different use cases. We dive in a bit deeper with our POV here: https://www.withcoherence.com/post/the-2024-web-hosting-repo...


The article is interesting, but it's not clear how Lago fixes the issue.

As far as I know, Lago starts in the low thousands per month.

If you go the self-hosted route, you might also just work directly with Stripe's API (backfill using the API or a data export, and use webhooks to keep things up to date). It's actually much easier than onboarding with Lago, I would say.
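
A minimal sketch of that approach, assuming the official stripe Python library and Flask (the endpoint, keys, and in-memory "database" are placeholders):

    # Sketch: mirror Stripe billing data locally instead of adopting a
    # separate billing platform. Backfill via list endpoints, then keep
    # the copy fresh with webhooks.
    import stripe
    from flask import Flask, request

    stripe.api_key = "sk_live_..."       # secret key (placeholder)
    endpoint_secret = "whsec_..."        # webhook signing secret (placeholder)

    subscriptions = {}                   # stand-in for your own database

    def backfill():
        # One-time backfill: page through every subscription via the list API.
        for sub in stripe.Subscription.list(limit=100, status="all").auto_paging_iter():
            subscriptions[sub["id"]] = sub

    app = Flask(__name__)

    @app.post("/stripe/webhook")
    def webhook():
        # Keep the local copy current as Stripe sends subscription events.
        event = stripe.Webhook.construct_event(
            request.data, request.headers["Stripe-Signature"], endpoint_secret)
        if event["type"].startswith("customer.subscription."):
            obj = event["data"]["object"]
            subscriptions[obj["id"]] = obj
        return "", 200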


Exactly, Lago seems like it's riding the open source wave. Nobody reasonable is going to self host this.


I see no reason why one wouldn't self host this. Subscription and billing data is such a crucial part of any business, I'm surprised more don't handle it internally.


Because to self host it, you'd need to ensure you're PCI compliant. That works for established companies I guess, but not so much when starting out.

Payment provider lock-in is a scary thing, when they can cancel your account at a moment's notice.


The pricing calculator was easy to find: https://www.stackit.de/en/pricing/cloud-services/iaas/stacki...

Menu -> Compute Engine -> Pricing

