
On a related note: I am deeply disappointed by how OpenAI has continually dumbed down ChatGPT. I'm sure the shortage of NVIDIA H100s and the demand for their APIs shooting through the roof have something to do with it. Perhaps they've switched to a quantized model? Who knows.

And yes, LLaMA 2 is awesome. Progress is being made. But it’s slowed down. The novelty’s worn off. Now we’re lacking some fundamental plumbing to reach a useful AGI.

If anyone cares, Bard is proving to be way more useful these days, especially for research work. And Claude is the best, bar none, for document analysis.



Hey, who knows, it could just be a scaling issue. (The history of NNs would lend credence to this theory.) In which case, if this room-temperature superconductor stuff is legit, then maybe we're about to experience a major leap in compute power that could result in AGI sooner than we think.


What makes you think we're lacking fundamental plumbing? I am personally still quite convinced it will simply take a bit of time and development to plug the current AI into sufficient self-testing structures (basically the next versions of AutoGPT) and let it run long enough to map out an entire domain with accurate measurements. Seems more like a matter of time and a bit of funding - but mostly inevitable and very straightforward. It doesn't even really take GPT4-level quality to do - that's just a bonus.

https://twitter.com/fablesimulation/status/16813529041528504... These guys seem to be on the leading edge there - making self-referential, character-driven narratives where agents talk to each other and build a collective world understanding, all through a South Park-style Westworld, lol.


I completely agree with you. I failed to articulate it well.

Like the current top HN post suggests (https://eugeneyan.com/writing/llm-patterns/), we’re still discovering patterns that work well with LLMs.

That said, anecdotally, they already excel at being logic engines, capable of filling in the gaps between instructions using their worldly knowledge or "common sense".

But every so often, they'll miss an important bit. And I have to be quite involved to catch that. Kinda defeats the purpose. Here, I think we can benefit from supervisor LLMs: a second layer whose sole job is to ensure output quality - essentially a QA bot. Something like the sketch below.
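
Roughly like this - a throwaway sketch, assuming the current (pre-1.0) openai Python library; the model names and prompts are just placeholders:

    import openai  # assumes the pre-1.0 openai library; model/prompt choices are placeholders

    def ask(prompt, system="You are a helpful assistant."):
        resp = openai.ChatCompletion.create(
            model="gpt-4",
            messages=[{"role": "system", "content": system},
                      {"role": "user", "content": prompt}],
            temperature=0,
        )
        return resp["choices"][0]["message"]["content"]

    def answer_with_qa(task):
        draft = ask(task)  # first pass: the worker model drafts an answer
        # second pass: a supervisor checks the draft against the original instructions
        verdict = ask(
            "Task:\n" + task + "\n\nDraft answer:\n" + draft +
            "\n\nDoes the draft satisfy every requirement of the task? "
            "Reply PASS, or FAIL plus what's missing.",
            system="You are a strict QA reviewer.",
        )
        if verdict.strip().startswith("PASS"):
            return draft
        # one revision round, feeding the QA feedback back to the worker
        return ask("Task:\n" + task + "\n\nDraft:\n" + draft +
                   "\n\nReviewer feedback:\n" + verdict +
                   "\n\nRevise the draft to fix the issues.")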


Yeah, with their own QA from a variety of different personas/perspectives/concerns/contexts, I reckon you'll get very decent accuracy - or at least self-assessment of inaccuracy. All these can ever do is propagate the data they know so far into the context/prompt you desire, but I don't see obvious limits there. And GPT4 is already a superb conversation partner, as smart as nearly any person - so it's really like piecing experts together. If we run into any fundamental limitation from piecing these all together, I think it'll be the same limit that any group of humans trying to build a coherent/consistent organization of knowledge runs into.

Coincidentally, that appears to be how GPT4 was made - the rumor is it's actually about 8 GPT3.5-scale expert models with designated roles trained together ("mixture of experts" is the AI name for the technique). Makes you wonder how far that one trick scales.
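
If you want the one-screen version of the idea, here's a toy numpy sketch of sparse expert routing - random weights, pure illustration, obviously nothing like whatever GPT4 actually does internally:

    import numpy as np  # toy illustration of sparse mixture-of-experts routing

    rng = np.random.default_rng(0)
    d, n_experts = 16, 8

    experts = [rng.normal(size=(d, d)) for _ in range(n_experts)]  # each "expert" is a random linear map
    gate = rng.normal(size=(d, n_experts))  # the router scores experts per input

    def moe_forward(x):
        scores = x @ gate
        top2 = np.argsort(scores)[-2:]  # only the 2 best-scoring experts run (sparse gating)
        w = np.exp(scores[top2]); w /= w.sum()  # softmax over the chosen experts
        return sum(wi * (x @ experts[i]) for wi, i in zip(w, top2))

    print(moe_forward(rng.normal(size=d)).shape)  # (16,)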

(P.S. great link. Gah - another long read on the todo list)


It's not dumber. You can verify this on their models with any prompt you ran at temperature 0.0 when it first came out - I did. It's the exact same model run the exact same way. They've repeatedly confirmed this.
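
The check is trivial to rerun yourself: hit the pinned launch snapshot at temperature 0 and diff against whatever you saved at the time (assuming the pre-1.0 openai library; the saved-output filename here is hypothetical):

    import openai  # assumes the pre-1.0 openai library

    resp = openai.ChatCompletion.create(
        model="gpt-4-0314",  # the pinned launch snapshot, not the rolling "gpt-4" alias
        messages=[{"role": "user", "content": "Summarize Hamlet in three sentences."}],
        temperature=0.0,  # as close to deterministic decoding as the API gets
    )
    new_output = resp["choices"][0]["message"]["content"]

    old_output = open("gpt4_march_output.txt").read()  # hypothetical file saved at launch
    print("identical" if new_output.strip() == old_output.strip() else "changed")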


> It's not dumber

So I absolutely agree with this. And yet this meme persists. I wonder what's creating the feeling in so many people that it's "dumber"? Perhaps they're just noticing limits that always existed? I'm not sure, and am interested in others' thoughts on it.


ChatGPT - as in the paid subscription hosted by OpenAI. Its quality has deteriorated. It will miss the simplest details in the prompt and hallucinate a lot. I've only noticed this before on models with fewer parameters.

In comparison, Bard and Claude are getting better with time.


They've added a lot of safety features. Knowing very little about LLMs, I would assume these prepended prompts are using up a chunk of the limited "attention" (context window) the transformer has.
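
You can eyeball how fast a preamble eats the window with tiktoken - the preamble text below is made up, since nobody outside OpenAI knows the real one:

    import tiktoken  # OpenAI's tokenizer library

    enc = tiktoken.encoding_for_model("gpt-4")

    # stand-in for the kind of safety preamble people suspect gets prepended
    safety_preamble = ("You must refuse harmful requests. You must not give medical, "
                       "legal, or financial advice. Always add appropriate caveats. ") * 20

    used = len(enc.encode(safety_preamble))
    print(f"{used} of 8192 context tokens gone before the user types a word")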


Ooh, I have some nice book chapter summaries that ChatGPT generated for me when it first came out. I've got them in a Google doc.

If you ask GPT to give you the same thing now, it won't, no matter how hard you try.

That's like hard evidence for me that they've dumbed it down.


They confirmed the API still ran the same base model. But there was no mention of ChatGPT the service. I was referring to the latter above.


The ChatGPT webapp is not the same workflow as the ChatGPT API.


It is and it's documented:

https://arxiv.org/abs/2307.09009


A) It's wrong, and it caused a lot of hand-wringing about arXiv and undergrads. B) Not my claim - those are two different models.



