> The generation is a showy gimmick.

It really isn't? You can tell it to output in a JSON structure (or some other format) of your choice and it will, with high reliability. You control the output.
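E.g. a bare-bones sketch with the openai Python client (model name and schema here are placeholders, not a recommendation):

    # Sketch: just tell the model the shape you want and parse the reply.
    # Model and schema are illustrative.
    import json
    from openai import OpenAI

    client = OpenAI()
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": 'Return only JSON shaped like {"city": string, '
                       '"population": number}. City: Paris.',
        }],
    )
    # json.loads throws if the model wraps the JSON in prose, which is
    # the crux of this thread.
    data = json.loads(resp.choices[0].message.content)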

Honestly, I wonder whether the people who criticize LLMs have made a serious attempt to use them for anything.




I made a serious attempt to do precisely that, and yes, it output a valid JSON structure highly reliably. The problem was stopping it from just inventing values for parameters that weren't actually specified by the user.

Consider the possibility that at least some of the criticisms of LLMs are a result of serious attempts to use them.


llama.cpp has a way to constrain responses to a grammar, which is 100% reliable since it's enforced in the inference loop itself. You still need to tell the model to produce a certain format to get good results, though.
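For the curious, roughly what that looks like through the llama-cpp-python bindings; the model path is a placeholder, and grammars/json.gbnf ships with llama.cpp:

    # Sketch: grammar-constrained decoding with llama-cpp-python.
    # The sampler can only emit tokens the grammar allows, so the output
    # is valid by construction; the prompt still matters for quality.
    from llama_cpp import Llama, LlamaGrammar

    llm = Llama(model_path="model.gguf")  # placeholder path
    grammar = LlamaGrammar.from_file("grammars/json.gbnf")  # ships with llama.cpp

    out = llm(
        "Return the user's name and age as JSON: Alice, 30\n",
        grammar=grammar,
        max_tokens=128,
    )
    print(out["choices"][0]["text"])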


> You can tell it to output in a JSON structure (or some other format) of your choice and it will, with high reliability.

I mean, this is provably false. Have you tried to use LLMs to generate structured JSON output? Not only do all LLMs suck at reliably following a schema, you need to use all kinds of "forcing" to make sure the output is actually JSON anyway. By "forcing" I mean either (1) multi-shot prompting: "no, not like that," if the output isn't valid-ish JSON; or (2) literally stripping out—or rejecting—illegal tokens (which is what llama.cpp does[1][2]). And even with all of that, you still won't really have a production-ready pipeline in the general case.

[1] https://github.com/ggerganov/llama.cpp/issues/1300

[2] this is cutely called "constraining" a decoder; what it actually does is correct a very clear stochastic deficiency in LLMs


Beyond this, an LLM can easily become confused even if outputting JSON with a valid schema. For instance, we've had mixed results trying to get an LLM to report structured discrepancies between two multi-paragraph pieces of text, each of which might be using flowery language that "reminds" the LLM of marketing language in its training set. The LLM often gets as confused as a human would, if the human were quickly skimming the text and forgetting which text they're thinking about - or whether they're inventing details from memory that are in line with the tone of the language they're reading. These are very reasonable mistakes to make, and there are ways to mitigate the difficulties with multiple passes, but I wouldn't describe the outputs as highly reliable!
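To sketch what I mean by multiple passes (everything here is illustrative, including the ask_llm stand-in): extract the claims from each text in isolation first, then compare the extractions rather than the raw prose, so the model never holds both flowery originals in context at once.

    # Illustrative multi-pass shape: never show the model both raw texts at once.
    import json

    def ask_llm(prompt: str) -> str:
        ...  # stand-in for whatever completion call you use

    def extract_claims(text: str) -> list[str]:
        # Pass 1: pull factual claims out of ONE text, no cross-contamination.
        raw = ask_llm("List the factual claims in this text as a JSON array "
                      "of strings:\n" + text)
        return json.loads(raw)

    def find_discrepancies(text_a: str, text_b: str) -> str:
        # Pass 2: compare the distilled claims, not the flowery originals.
        claims_a, claims_b = extract_claims(text_a), extract_claims(text_b)
        return ask_llm("Compare these two claim lists and report conflicts:\n"
                       "A: " + json.dumps(claims_a) + "\n"
                       "B: " + json.dumps(claims_b))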


I would have agreed with you six months ago, but the latest models - Claude 3, GPT-4o, maybe Llama 3 as well - are much more proficient at outputting JSON correctly.


Seems logical that they will always implement specialized pathways for the most critical and demanding user base. At some point they might even do it all by hand and we wouldn’t know /s


This was my experience as well. The only reliable method I found was to use the LLM to generate the field values and then assemble the JSON myself.
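Roughly this shape, with ask_llm standing in for whatever completion call you use (a sketch, not a drop-in):

    # Field-by-field: one narrow question per field, then build the object
    # in ordinary code so the JSON serialization itself can never be invalid.
    import json

    def ask_llm(prompt: str) -> str:
        ...  # stand-in for the actual model call

    def extract_record(document: str) -> str:
        fields = {
            "title": ask_llm("What is the title of this document? "
                             "Answer with the title only.\n" + document).strip(),
            "author": ask_llm("Who is the author of this document? "
                              "Answer with the name only.\n" + document).strip(),
        }
        return json.dumps(fields)  # serialization is ours, so always valid JSON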


Yes, I'm using them quite extensively in my day-to-day work to extract numerical data from unstructured documents. I've been manually verifying the JSON structure and numerical outputs, and it's highly accurate for the corpus I'm processing.

FWIW I'm using GPT-4o, not Llama; I've tried Llama for local tasks and found it pretty lacking in comparison to GPT.


Your comment has an unnecessarily negative tone that doesn't do this tech justice. These approaches are totally valid and can get you great results. An LLM is just one component in a pipeline. I've deployed many of these in production without a hiccup.

Guidance (the industry term for "constraining" the model output) is only there to ensure the output follows a particular grammar. If you need the JSON to fit a particular schema or format, you can always validate it; on validation failure, pass the JSON and the validation result back to the LLM so it can correct it.
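A sketch of that loop using the jsonschema package (retry count, prompts, and the ask_llm stand-in are all placeholders):

    # Sketch: validate the model's JSON against a schema and, on failure,
    # hand the error back to the model for a correction pass.
    import json
    import jsonschema

    def ask_llm(prompt: str) -> str:
        ...  # stand-in for the actual model call

    def generate_validated(prompt: str, schema: dict, retries: int = 3) -> dict:
        attempt = prompt
        for _ in range(retries):
            raw = ask_llm(attempt)
            try:
                data = json.loads(raw)
                jsonschema.validate(data, schema)
                return data
            except (json.JSONDecodeError, jsonschema.ValidationError) as err:
                # Feed the failure back so the model can correct itself.
                attempt = (prompt + "\nYour previous output was invalid: "
                           + str(err) + "\nPrevious output:\n" + raw)
        raise RuntimeError("no valid JSON after retries")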


> Have you tried to use LLMs to generate structured JSON output? Not only do all LLMs suck at reliably following a schema, you need to use all kinds of "forcing" to make sure the output is actually JSON anyway.

Yeah, it's worked about fifty thousand times for me without issues in the past few months across several NLP production pipelines.


I'm generating hundreds of JSONs a day with OpenAI and it has no problem following a schema defined in TypeScript.
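i.e. roughly this pattern: the TypeScript interface goes straight into the prompt, plus JSON mode for the envelope (sketch only, schema made up):

    # Sketch: a TypeScript interface as the schema, JSON mode for the format.
    import json
    from openai import OpenAI

    SCHEMA = """
    interface Invoice {
      vendor: string;
      total: number;  // in cents
      lineItems: { description: string; amount: number }[];
    }
    """

    client = OpenAI()
    resp = client.chat.completions.create(
        model="gpt-4o",  # model name illustrative
        messages=[
            {"role": "system", "content": "Output only JSON conforming to "
             "this TypeScript interface:\n" + SCHEMA},
            {"role": "user", "content": "Extract the invoice from: ..."},
        ],
        response_format={"type": "json_object"},  # JSON mode
    )
    invoice = json.loads(resp.choices[0].message.content)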



