
I feel like an insane person every time I look at the LLM development space and see what the state of the art is.

If I'm understanding this correctly, the standard way to get structured output seems to be to retry the query until the stochastic language model produces the expected output. RAG also seems like a hilariously thin wrapper over traditional search systems, and it still might hallucinate in that tiny distance between the search result and the user. Like we're talking about writing sentences and coaxing what amounts to an autocomplete system into magically giving us something we want. How is this industry getting hundreds of billions of dollars in investment?

Also the error rate is about 5-10% according to this article. That's pretty bad!




> [...] the standard way to get structured output seems to be to retry the query until the stochastic language model produces expected output.

No, that would be very inefficient. At each token generation step, the LLM produces a likelihood for every token in the vocabulary, based on the past context. The structured output is defined by a grammar, which specifies the legal tokens for the next step. You can then take the intersection of the two (ignore any token not allowed by the grammar), and select among the authorized tokens based on the LLM's likelihoods in the usual way. So it's a direct constraint, and it's efficient.
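
To make that concrete, here's a toy sketch of the idea in Python. The model and grammar helpers here are hypothetical stand-ins, not any particular library's API:

    import math

    def constrained_decode(model, grammar, prompt, max_tokens=256):
        # 'model' and 'grammar' are hypothetical objects standing in
        # for a real LLM and a compiled output grammar.
        tokens = model.tokenize(prompt)
        state = grammar.initial_state()
        for _ in range(max_tokens):
            logits = model.next_token_logits(tokens)  # score for every vocab token
            allowed = grammar.allowed_tokens(state)   # tokens legal at this step
            # Intersect: forbid anything the grammar doesn't allow,
            # then sample from what's left in the usual way.
            for tok_id in range(len(logits)):
                if tok_id not in allowed:
                    logits[tok_id] = -math.inf
            next_tok = model.sample(logits)
            tokens.append(next_tok)
            state = grammar.advance(state, next_tok)
            if grammar.is_complete(state):
                break
        return model.detokenize(tokens)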


Yeah, that sounds way better. I saw one of the Python libraries they recommended mention retries and I thought, this can't be that awful, can it?


> Also the error rate is about 5-10% according to this article. That's pretty bad!

Having 90-95% success rate on something that was previously impossible is acceptable. Without LLMs the success rate would be 0% for the things I'm doing.


I think the problem here is that that is often still not acceptable. Let's imagine a system with, say, 100 million users making 25 queries a day, just to give us some contrived numbers to examine. At a 10% error rate that's 250 million mistakes a day, or 75 million if we're generous and assume a 3% error rate. Then you have to think about your application: how easily you can detect issues, how much money you're willing to pay your ops staff (and how big you want that team to grow), the cost of the mistakes themselves, and the legal and reputational costs of having an unreliable system. Take those costs, add the cost to run the system itself (probably considerable), and you're coming up on a heuristic for figuring out whether "possible" equates to "worth doing." 75 million mistakes times any dollar amount (plus 2.5 billion total queries you need to run the infrastructure for) is still a lot of capital. If each mistake costs you $0.20 (I made this number up), then maybe $5.5b a year is worth the cost? I'm not sure.
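
Spelling out that arithmetic (all inputs are the contrived numbers above):

    users = 100_000_000
    queries_per_day = 25
    error_rate = 0.03        # the "generous" 3% case
    cost_per_mistake = 0.20  # made-up figure

    daily_queries = users * queries_per_day      # 2.5 billion queries/day
    daily_mistakes = daily_queries * error_rate  # 75 million mistakes/day
    annual_cost = daily_mistakes * cost_per_mistake * 365
    print(f"${annual_cost / 1e9:.2f}B per year") # ~$5.48B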

It's probable that Google is in the middle of doing this napkin math, given all the embarrassing stuff we saw last week. So it's cool that we're closer to solving these really hard problems, but whether the solutions are acceptable is a more complicated question than "it used to be impossible." Maybe that math works out in your favor for your application.


> How is this industry getting hundreds of billions of dollars in investment?

FOMO? To me it's the Gold Rush, except that it's not clear if anyone wants that kind of gold at the end :-).


Google is so terrified that someone is threatening their market position, the one in which they have over $100b in cash and get something like $20b in profit quarterly, that they're willing to shove this technology into some of the most important infrastructure on the internet so they can get fucksmith to tell everyone to put glue in their pizza sauce. I'll never understand how a company in maybe one of the most secure financial situations in all of human history has leadership that is this afraid.


Line must go up.


Via APIs, yes. But if you have direct access to the model you can use libraries like https://github.com/guidance-ai/guidance to manipulate the output structure directly.
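
For anyone curious, usage looks roughly like this. This is a sketch based on my reading of the guidance README; the API has changed between versions, so check the repo:

    from guidance import models, select

    # Load a local model; the path is a placeholder.
    lm = models.LlamaCpp("/path/to/model.gguf")

    # Fixed text interleaved with constrained generation:
    # the output is forced to be one of the listed options.
    lm += "Is this review positive or negative? Review: great product!\nAnswer: "
    lm += select(["positive", "negative"])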


This seems like it could do some cool code completion stuff with local models.


I've been building out an AI/LLM-based feature at work for a while now and, yeah, from my POV it's completely useless bullshit that only exists because our CTO is hyped by the technology and our investors need to see "AI" plastered somewhere on our marketing page, regardless of how useful it is in real use. Likewise with any of the other LLM products I've seen out in the wild as well, it's all just a hypewave being pushed by corps and clueless C-suites who hear other C-suites fawning over the tech.


It's so painful. We have funders come to us saying they love what we do, they want us to do more of it, and they have $X million to invest, but only if we use "AI." Investors have their new favorite hammer and, by gosh, you'd better use it, even if you're trying to weld a pipe.


Very much agree.

Also, what's it for? None of these articles point to anything worthwhile that it's useful for.


I've seen a lot of salivation over replacing customer service reps lol.


Humans probably have about the same error rate. It's easy to miss a comma or quote.

These systems compete with humans, not with formatters.


A system of checks and balances overseen by several humans can have orders of magnitude lower error rates, though.


A system of checks and balances also costs orders of magnitude more money.


This is my fear regarding AI: it doesn't have to be as good as humans, it just has to be cheaper, and it will get implemented in business processes. Overall quality of service will degrade while profit margins increase.


You probably need that for the AI as well, though.


The point was that for many tasks, AI has similar failure rates compared to humans while being significantly cheaper. The ability for human error rates to be reduced by spending even more money just isn't all that relevant.

Even if you had to implement checks and balances for AI systems, you'd still come away having spent way less money.



