It's not solved. I had tried it across the latest versions of several different AIs before I posted, because I anticipated a reply like yours.
> Just because they struggle with that makes all the other things wrong?
No, it makes their grandiose claims very tenuous. But for me, yes I'm very underwhelmed by what AI is capable of. I think it's a useful tool, like a search box, but that's it at this point. That they are branding it as a Ph.D. researcher is just blowing smoke.
> I tried it too, and I've known this test for a while.
> Not only did Claude respond correctly, I even wrote the prompt in German.
You're missing the point, though. If it worked for you when you tried it, that's great, but we know these tools are stochastic, so a single success isn't enough to call it "solved". The fact that I tried it and it didn't work tells me it isn't. And intermittent success is actually worse than it not working at all, because it leads to the false confidence you're expressing.
The strawberry example highlights that, like all abstractions, AI is a leaky one. It's not the oracle they're selling it as; it's just another interface whose internal details you have to understand to use it properly beyond the basics. Which means that ultimately the only people who will really be able to wield AI are those who understand what I mean when I say "leaky abstraction". And those people can already program anyway, so what are we even solving by going full natural language?
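To make "leaky abstraction" concrete: the usual explanation for the strawberry failure is that models operate on token chunks rather than individual letters, so a character-level question pokes at internals the natural-language interface hides. In ordinary code the same task is deterministic and exact. A minimal sketch (the helper name here is mine, just for illustration):

```python
# Counting letters is deterministic in ordinary code: no sampling
# involved, so the same input gives the same answer on every run.
def count_letter(word: str, letter: str) -> int:
    return word.count(letter)

print(count_letter("strawberry", "r"))  # prints 3
```

An LLM, by contrast, never "sees" the individual letters at all, which is why a question this trivial can still come out wrong on some runs while working on others.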
> I created single-page HTML/JavaScript prototypes within 10 minutes with very little reprompting.
You can achieve similar results with better language design and tooling. Really, this is more of an indictment of JavaScript as a language. Why couldn't you write that small prototype in 10 minutes in JavaScript directly?
> I can literally generate an image of whatever I want with just text or voice.
Don't get me wrong, I enjoy doing this all the time. But AI images are a different breed, because art never has to be "right". Still, just like generative text, AI images are impressive only up to the point where details start to matter. Like hands. Or character consistency. Multiple characters in a scene. Even the latest models still have this problem of not doing what you tell them to do.
> I can have a discussion with my smartphone with voice.
Yes, talking to your phone is a neat trick, but is it a discussion? Interacting with AI often feels more like an improv session, where it just makes things up and rolls with the vibes. Every time (and yes, I mean every time) I ask it about topics I'm an expert in, it gets things wrong. Often subtly, but still importantly wrong. It feels less like a discussion and more like I'm slowly being gaslit. That's not impressive; it's frustrating.
I think you get my point. Yes, AI can be impressive at first, but as you use it more and more you start to see all the cracks. And that would be fine if those cracks were being patched, or even if the vendors just acknowledged them and the limitations of LLMs. But they won't do that; instead they pretend the cracks don't exist, pretend these LLMs are actually good, and push AI into literally everything and anything. They've been promising exponential improvements for years now, and still the same problems from day one persist.
And we live at such a fast pace now that it feels like AI should already be perfect, which of course it's not.
I also see that an expert can leverage AI a lot better, because you still need to know enough to build good things without it.
But AI progresses very fast and has achieved things for which we previously had no answer at all.
What it can already do is still very much in line with what I assume/expect from the current hype.
Llama made our OCR 20% better just by using it. I've prompted plenty of code snippets that saved me time and were fun to make, including Python scripts for a small ML pipeline, and I normally don't write Python.
It's the first tech demo I've ever shown that the people around me just "got".
It's the first chatbot I've seen that doesn't flake out after my second question.
ChatGPT pushed billions into new compute. Blackwell is the first chip to hit the lower estimation for brain compute performance.
It changed the research field of computational linguistics.
I believe it's essential to keep a very close eye on it and to try things out regularly; otherwise it will suddenly roll right over us.
I really enjoy using Claude.
Edit: and ELI5 summaries of research papers. Those are so, so good.
You're reinforcing the disparity that I'm pointing out in my original reply. My expectations for AI are calibrated by the people selling it. You say "AI progress is very fast" but again, I'm not seeing it. I'm seeing the same things I saw years ago when ChatGPT first came on the scene. If you go back to the hype of those days, they were saying "Things will change so rapidly now that AI is here, we will start seeing exponential gains in what we can accomplish."
They would point to the increasing model sizes and the ability to solve various benchmarks as proof of that exponential rise, but things have really tapered off since then in terms of how rapidly things are changing. I was promised a Ph.D.-level researcher. A 20% better result on your OCR is not that. That's not to say it isn't a good thing and an improvement, but it's not what they are selling.
> Blackwell is the first chip to hit the lower estimation for brain compute performance.
What does that even mean? That's just more hype and marketing.
Nope, not marketing. I researched this topic myself; I'm not aware of Nvidia ever claiming it.
But hey, it seems I can't convince you to share my enthusiasm for AI. That's fine; I'll still play around with it often, and I'm looking forward to its progress.
Regarding your Ph.D. researcher: NotebookLM is great, and you might need to invest a few hundred bucks to really try out more.
Your enthusiasm for AI is just more hype noise as far as I'm concerned. You say you've "done research" but then don't link it for anyone to look into, adding to the hype. Brain compute performance is not a well-understood topic so saying some product approaches it is pure marketing hype. It's akin to saying the brain is a neural net.
Nonetheless, those are different things.
Just because they struggle with that makes all the other things wrong? Underwhelming?
I don't think so.