What is your personal benchmark for AGI? The Turing test was surpassed years ago. ARC-AGI was the next step, proposed as a successor by some quite clever people in the space, and it has now been surpassed as well.
I keep seeing this thrown around, but did anyone actually go out and do this? I feel like I could distinguish between an AI (even the latest models) and a person after a text-only back-and-forth conversation.
The models you've interacted with have guardrails. If you fine-tuned an LLM with the goal of "convince people you are human" (effectively the opposite of what the major players are doing with their fine-tunes), I am very confident even you would be fooled.
Self-learning. Human-level intelligence isn’t a set of facts but rather the ability to self-learn and self-correct, which LLMs are completely unable to do right now.
It is fairly clear to me that self-learning is already possible. Give an agentic LLM the ability to add things to its own context window, or to fine-tune itself, and it will.
So what is your benchmark?