Yeah, let's pretend it works. So far structured output from an LLM is an exercise in programmers' ability to code defensively against responses that may or may not be valid JSON, may not conform to the schema, or may just be null. There's a new cottage industry of modules that automate dealing with this crap.
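The defensive coding the comment describes can be sketched in a few lines; this is a toy helper (`safe_parse` is a hypothetical name, not from any library) that rejects the three failure modes mentioned: non-JSON, schema violations, and null:

```python
import json

def safe_parse(raw, required_keys):
    """Defensively parse an LLM response: reject non-JSON text,
    null, non-objects, and objects missing required keys."""
    try:
        data = json.loads(raw)
    except (json.JSONDecodeError, TypeError):
        return None  # model returned prose, markdown, or garbage
    if not isinstance(data, dict):
        return None  # catches null, bare strings, arrays, numbers
    if not required_keys.issubset(data):
        return None  # "schema" violation: missing expected fields
    return data

# Typical failure modes:
print(safe_parse('{"name": "x", "age": 3}', {"name", "age"}))  # parses
print(safe_parse("Sure! Here's your JSON: {...}", {"name"}))   # None
print(safe_parse("null", {"name"}))                            # None
```

Real-world versions layer full JSON Schema validation on top, but the shape of the problem is the same.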
No? With structured outputs you get valid JSON 100% of the time. This is a non-problem now. (If you understand how it works, it really can't be otherwise.)
The guarantee promised in link 1 is not supported by the documentation in link 2. Structured Output does a _very good_ job, but still sometimes messes up. When you’re trying to parse hundreds of thousands of documents per day, you need a lot of 9s of reliability before you can earnestly say “100% guarantee” of accuracy.
Whether it's a non-problem or not very much depends on how much the LLM API providers actually bother to add enforcement server-side.
Anecdotally, I've seen Azure OpenAI services hallucinate tools just last week, when I provided an empty array of tools rather than not providing the tools key at all (silly me!). Up until that point I would have assumed that there are server-side safeguards against that, but now I have to consider spending time on adding client-side checks for all kinds of bugs in that area.
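The client-side check amounts to never sending the key at all when there are no tools. A minimal sketch (`build_payload` and the model name are illustrative, not any SDK's API):

```python
def build_payload(messages, tools=None):
    """Build a chat-completion request body, omitting the `tools`
    key entirely when there are no tools, instead of sending an
    empty array that the server might mishandle."""
    payload = {"model": "gpt-4o", "messages": messages}
    if tools:  # both None and [] skip the key
        payload["tools"] = tools
    return payload
```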
You are confusing API response payloads with structured JSON that we expect to conform to a given schema. It's carnage that requires defensive coding. Neither OpenAI nor Google is interested in fixing this, because some developers decide to retry until they get valid structured output, which means they spend 3x-5x more on calls to the API.
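The retry-until-valid pattern being criticized looks something like this sketch (`call_llm` is a hypothetical zero-argument function standing in for a billed API call):

```python
import json

def call_with_retries(call_llm, max_attempts=3):
    """Retry an LLM call until the response is parseable JSON,
    up to a cap. Every retry is another billed API call -- the
    3x-5x cost multiplier mentioned above."""
    for _ in range(max_attempts):
        raw = call_llm()
        try:
            return json.loads(raw)
        except json.JSONDecodeError:
            continue  # pay again and hope
    return None

# Simulated flaky model: fails twice, then returns valid JSON.
responses = iter(['oops', '{broken', '{"ok": true}'])
print(call_with_retries(lambda: next(responses)))  # {'ok': True}
```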
Ansible, puppet, chef, salt, cfengine... There are tons of tools that are more precise and succinct for describing sensitive tasks, such as managing a single or a fleet of remote servers. Using MCP/LLMs for this is... :O
There are security / reliability concerns, true, but finally getting technically close to Star Trek computers and then still doing things 'the way they've always been done' doesn't seem efficient.
I don't know if you understand the role the LLM is playing here. The mechanism used to execute the command is not the relevant thing. The LLM autonomously executing commands has intelligence; it's not just a shell script. If I ask it to do a task and it runs into an issue, LLMs like Claude can recognize the problem and find a way to resolve it. Script failed because of a missing dependency? It'll install it. Need a config change? It'll do it. The SSH MCP is just the interface for the LLM to do the work.
You can give an LLM a github repo link, a fresh VPC, and say "deploy my app using nginx" and any other details you need... and it'll get it done.
> just how loud the expectations around AI have become, especially among non-technical folks.
This. It's bordering on mass madness. I am taking 2-4 calls a week from "two guys from ..." with mad ideas and unrealistic expectations of what it takes to build and maintain an AI product. I've seen it with early internet rush, Web 2.0, and crypto before.
I realized long ago that it is not just the big corps: after a while, every employee of a company begins to think of everything from the inside out. Their focus, their work, their playbook always start from themselves. There are only a very few who think from the outside in.
During my consultation, the team I was helping kept talking about "Our App", "Our Process", "Our Use", "How do we get this data into our System?" I had to ask them multiple times: "How do your users or customers outside of your company use them?" "Have you thought about how people usually do these kinds of steps?"
That’s a meme created and spread by pseudo-journalist Declan McCullagh specifically to tar Al Gore in the lead-up to the 2000 election.
Specifically, Gore said in an interview that he “took the initiative in creating the Internet” by introducing the bill to allow commercial traffic on ARPAnet, which McCullagh twisted in an article to “Al Gore claimed he invented the Internet” in order to smear him.
Given how close that election turned out to be, this smear campaign likely changed the presidency, and given George WMD Bush's actions, changed the course of the world for the worse in many ways. (For those who were too young or not yet born at the time: these jokes were MASSIVE, to the extent that they became largely what Al Gore was known for, for years afterward. So it's not much of an exaggeration to say they had a material impact on his perception and hence the votes.)
Al Gore understood technology and the internet, was a champion for the environment, and it's unbelievable today that he came that close to the presidency (and then lost). When people say "we live in the bad timeline", one of the closest good timelines is probably one where this election went differently.
> Al Gore, a strong and knowledgeable proponent of the Internet, promoted legislation that resulted in President George H.W Bush signing the High Performance Computing and Communication Act of 1991. This Act allocated $600 million
> In the early 1990s the Internet was big news ... In the fall of 1990, there were just 313,000 computers on the Internet; by 1996, there were close to 10 million. The networking idea became politicized during the 1992 Clinton–Gore election campaign, where the rhetoric of the information highway captured the public imagination.
Your parent comment is either joining in on the ridicule or at least misquoting:
> Gore became the subject of controversy and ridicule when his statement, "I took the initiative in creating the Internet", was widely quoted out of context. It was often misquoted by comedians and figures in American popular media who framed this statement as a claim that Gore believed he had personally invented the Internet.[54] Gore's actual words were widely reaffirmed by notable Internet pioneers, such as Vint Cerf and Bob Kahn, who stated, "No one in public life has been more intellectually engaged in helping to create the climate for a thriving Internet than the Vice President."
AI is a tool. Just like with a drum machine or a DAW, you need to be a musician to be able to use it to create something worthwhile. And just like sampling, drum machines and DJing didn’t kill acoustic music, AI won’t either. It will merely create a new type of music that will coexist along with all of the other types of music just fine.
AI just raises the level of abstraction and therefore the capabilities of an individual.
Since you brought up sampling in music and I feel compelled to point this out in any AI thread when that gets mentioned:
sampling machines don't give you a free pass to sample music all willy-nilly; if you're going to publish the result for commercial gain, you have to clear the sample, and the original artist gets royalties from it. This is something that was fought for and won by musicians: https://en.wikipedia.org/wiki/Grand_Upright_Music,_Ltd._v._W....
(Not that you implied otherwise, I just want to point that out)
Thanks for pointing that out, I actually didn’t know that! It is very relevant, although it doesn’t affect my core message. Let’s see what the future holds.
Bell Labs had infinite money. Their owners made money every time someone picked up a phone. Not all businesses are that embedded in the society and those that have boards that might like the idea of funding their own labs have to answer to the higher power--the Wall Street crowd, who will force you to optimise for maximum profit in the shortest amount of time. You get there fastest by cutting costs, especially the costs of long-term research that may not bear fruit.
What annoys me is every programmer who wishes their favourite language / feature were as popular as Python and so chooses to implement it in Python to make Python "better". Python was created as a dynamically typed language. If you want a language with type checking, there are plenty of others available.
Rust devs in particular seem hell-bent on replacing all other languages by stealth, which is both obviously visible and annoying, because they ignore what they don't know about the ecosystem they choose to target. As cool as some of the tools written for Python in Rust are (ruff, uv), they are not a replacement for Python. They don't even solve some annoying problems that we have workarounds for; sometimes they create new ones. Case in point is uv, which offers custom Docker images. Hello? A package manager is not supposed to determine the base Docker image or Python version for the project. It's a tool, not even an essential one since we have others, so know your place. As much as I appreciate some of the performance gains, I do not appreciate the false narratives spread by some Rust devs about the end of Python/JavaScript/Golang based on the fact that Rust allowed them to introduce faster build tools into other languages' build chains. The Rust community is quickly evolving into the friends you are embarrassed to have, a bit like a JVM-based language that suddenly has a bunch of Enterprise Java guys showing up to a Kotlin party and telling everyone "we can be like Python too...".
This argument doesn't make a whole lot of sense because nothing about type annotations constrains Python code at all. In fact, because they're designed to be introspectable, they make Python even more dynamic and you can do even crazier stuff than you could before. Type checkers are working very hard to handle the weird code.
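To illustrate the "annotations make Python more dynamic" point: annotations are plain data you can read back at runtime and drive behavior with. A toy sketch (not Pydantic itself, which does this far more thoroughly):

```python
from typing import get_type_hints

class Point:
    x: int
    y: int

def from_strings(cls, raw):
    """Use the class's own annotations, introspected at runtime,
    to coerce string fields into the annotated types."""
    hints = get_type_hints(cls)
    obj = cls()
    for name, typ in hints.items():
        setattr(obj, name, typ(raw[name]))
    return obj

p = from_strings(Point, {"x": "3", "y": "4"})
print(p.x + p.y)  # 7
```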
Pydantic being so fast because it's written in Rust is a good thing, you can do crazy dynamic (de-)serializations everywhere with very little performance penalty.
> nothing about type annotations constrains Python code at all
Sorry, but this is just not true. Don't get me wrong, I write typed Python 99% of the time (pyright in strict mode, to be precise), but you can't type check every possible construct in the language. By choosing to write typed Python, you're limiting how much of the language you can use. I don't think that's a bad thing, but it can be a problem for untyped codebases trying to adopt typing.
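As one concrete instance of the kind of construct meant here: attributes invented at runtime from data run fine, but a strict checker (e.g. pyright in strict mode) flags every access to them, so typed codebases end up avoiding the pattern. An illustrative sketch:

```python
class Config:
    """Idiomatic in dynamic Python, but rejected by strict type
    checkers: attributes are created at runtime from a dict, so
    they appear nowhere in the class definition."""
    def __init__(self, entries):
        for key, value in entries.items():
            setattr(self, key, value)

cfg = Config({"host": "localhost", "port": 8080})
# Works at runtime; a strict checker reports unknown attributes here:
print(cfg.host, cfg.port)
```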