It's true, but do you really trust the AI generated + Nurse Review output more than Organic Nurse generated?
In my experience, management types use the fact that AI generated + Nurse Review is faster to push a higher quota of forms generated per hour.
Eventually, from fatigue or boredom, the human in the loop just ends up being a rubber stamper. Would you trust this with your own or your children's life?
The human in the loop becomes a lot less useful when they're pressured to hit a quota while reviewing an AI that's basically stochastic "most probable next token" prediction, aka a professional bullshitter, literally trained to generate plausible outputs with no accountability for accurate ones.
It works because we are in a health care crisis and the nurse doesn't have anything close to enough time to do a good job.
It is really one of the few great examples of something LLMs are genuinely good for in an economic sense.
In a different industry, that kind of inefficiency would already have been driven out of business.
It is a unique economic condition that makes LLMs valuable. It makes complete sense.
Across the wider economy, though, it is hard to ignore the unreasonable uselessness of LLMs, which points to fundamental problems with the models that are unlikely to be solved by scaling.
We need HAL to solve our problems but instead we have probabilistic language models that somehow have to grow into HAL.
These same questions could be asked about self-driving cars, but they've been shown to be consistently safer drivers than humans. If this guy is getting consistently better results from AI+human than from humans alone, what does it matter if the former produces errors, given that the latter produces more errors and costs more?
If the cars weren't considerably safer drivers than humans they wouldn't be allowed on the road. There isn't as much regulation blocking this healthcare solution from being deployed... until those errors actually start costing hospitals money in malpractice lawsuits (or don't), we won't know whether it will be allowed to remain in use.
You can't compare LLM output with a self-driving car. That's the flaw of using the term AI for everything: it puts two completely different technologies on an artificially level playing field.
TFA's whole point is that there is no easy way to tell whether LLM output is correct. Driving mistakes provide instant feedback on whether the output of whatever AI is driving is correct. Bad comparison.
Many of the things that LLMs will output can be validated in a feedback loop, e.g., programming. It's easy to validate the generated code with a compiler, unit tests, etc. LLMs will excel in processes that can provide a validating feedback loop.
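As a rough sketch of what I mean by a validating feedback loop (generate_code here is a stand-in for whatever LLM call you're using, and the pytest invocation plus the CANDIDATE_PATH convention are just illustrative assumptions, not anyone's actual setup):

    import os
    import subprocess
    import tempfile

    def generate_code(prompt: str, feedback: str = "") -> str:
        """Stand-in for an LLM call; returns candidate source code."""
        raise NotImplementedError

    def validate_llm_code(prompt: str, max_attempts: int = 3) -> str | None:
        feedback = ""
        for _ in range(max_attempts):
            candidate = generate_code(prompt, feedback)
            # Gate 1: does it even parse?
            try:
                compile(candidate, "<candidate>", "exec")
            except SyntaxError as e:
                feedback = f"Syntax error: {e}"
                continue
            # Gate 2: does it pass the unit tests?
            with tempfile.TemporaryDirectory() as d:
                path = os.path.join(d, "candidate.py")
                with open(path, "w") as f:
                    f.write(candidate)
                result = subprocess.run(
                    ["python", "-m", "pytest", "tests/", "--tb=short"],
                    capture_output=True, text=True,
                    env={**os.environ, "CANDIDATE_PATH": path},
                )
            if result.returncode == 0:
                return candidate  # compiles and passes the tests
            feedback = result.stdout[-2000:]  # feed failures into the next attempt
        return None  # never cleared validation

The point is that the loop only accepts output that clears an objective gate; output that can't be gated like this is exactly the kind TFA is worried about.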
I love how everyone thinks software is easy to validate now. Like seriously, do you have any awareness at all about how much is invested in testing software by the likes of Microsoft, the game studios, and any other serious producers of software? It's a lot, and they still release buggy code.
I trust it a lot. In our tests, the times a human nurse picked up on something the AI missed were pretty rare. The times the AI found something the nurse missed were common, close to a majority of cases.
That might not be relevant to OP's use case. A lot of nurses get tied up doing things like reviewing claims denials. There are good use cases on the administrative side of healthcare that currently require nurse involvement.