I'm on paid (rich, I know) and the performance is all over the place. Sometimes ...

ravenstine · 2024-01-31T13:29:04 1706707744

This reflects my experience. Sometimes I'll provide a single sentence (to GPT-4 with the largest context window) and it will slowly type out 3 or so words every 5 seconds, and in other cases I'll give it a massive prompt and it returns data extremely fast. This is also true of smaller context window models. There seems to be no way to predict the performance.
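
If anyone wants numbers rather than impressions, here is a rough sketch for timing a streamed reply. It works on any Python iterable of text chunks; `fake_stream` is a made-up stand-in for a real streaming response, and `measure_stream` is just a name I picked, not part of any client library.

```python
import time
from typing import Iterable, Iterator, Optional


def measure_stream(chunks: Iterable[str]) -> dict:
    """Consume a stream of text chunks and report simple timing stats."""
    start = time.perf_counter()
    first_chunk_at: Optional[float] = None
    count = 0
    for _ in chunks:
        if first_chunk_at is None:
            first_chunk_at = time.perf_counter()
        count += 1
    total = time.perf_counter() - start
    return {
        "chunks": count,
        # Delay before anything appears (None if the stream was empty).
        "time_to_first_s": None if first_chunk_at is None else first_chunk_at - start,
        "total_s": total,
        "chunks_per_s": count / total if total > 0 else 0.0,
    }


def fake_stream(n: int, delay_s: float) -> Iterator[str]:
    """Hypothetical stand-in for a streaming API response."""
    for i in range(n):
        time.sleep(delay_s)
        yield f"tok{i}"


if __name__ == "__main__":
    print(measure_stream(fake_stream(20, 0.05)))
```

Logging these stats along with prompt length and time of day would at least show whether the slowdowns track prompt size at all.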

Kim_Bruning · 2024-01-31T17:51:26 1706723486

Oh hey... keep an eye on your CPU load. The problem might be on the near end. In my case on a slower machine it slows down if you're dealing with a very long chat.

(DO report this as a bug if so)

gtirloni · 2024-02-01T09:30:38 1706779838

I think that's not the issue here but I do notice the browser going crazy after a while of chatting with ChatGPT. The tab seems to consume a baseline CPU while doing nothing. I just brush it off and close it... bad JavaScript maybe. I should look into this and report as a bug, thanks for the advice.

entontoent · 2024-01-31T15:27:00 1706714820

This is basically how I respond to requests myself. Sometimes a single short sentence will cause me to slowly spit out a few words. Other times I can respond instantly to paragraphs of technical information with high accuracy and detailed explanations. There seems to be no way to predict my performance.

wolpoli · 2024-01-31T16:17:55 1706717875

Early on, I noticed that if I ask ChatGPT a unique question that might not have been asked before, it'll spit out a response slowly, but repeating the same question would result in a much quicker response.

Is it possible that you have a caching system too so that you are able to respond instantly with paragraphs of technical information to some types of requests that you have seen before?

gtirloni · 2024-02-01T01:16:40 1706750200

Yes, search for LLM caching and semantic searches. They must be using something like that.
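
To be clear, I have no evidence of what they actually run. As a toy illustration of the idea only: a semantic cache stores past prompts as vectors and returns the stored answer when a new prompt is similar enough. Everything below is made up for the sketch (`SemanticCache`, the 0.8 threshold, the bag-of-words `embed`); a real system would use a learned embedding model and a vector index.

```python
import math
import re
from collections import Counter
from typing import Callable, List, Optional, Tuple


def embed(text: str) -> Counter:
    """Toy bag-of-words vector; an assumption standing in for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class SemanticCache:
    def __init__(self, threshold: float = 0.8) -> None:
        self.threshold = threshold  # hypothetical similarity cutoff
        self.entries: List[Tuple[Counter, str]] = []

    def get(self, prompt: str) -> Optional[str]:
        """Return the response of the most similar stored prompt, if similar enough."""
        vec = embed(prompt)
        best_score, best_response = 0.0, None
        for stored_vec, response in self.entries:
            score = cosine(vec, stored_vec)
            if score > best_score:
                best_score, best_response = score, response
        return best_response if best_score >= self.threshold else None

    def put(self, prompt: str, response: str) -> None:
        self.entries.append((embed(prompt), response))


def answer(prompt: str, cache: SemanticCache, slow_model: Callable[[str], str]):
    """Return (response, cache_hit). Only calls the model on a cache miss."""
    cached = cache.get(prompt)
    if cached is not None:
        return cached, True
    response = slow_model(prompt)
    cache.put(prompt, response)
    return response, False
```

With this, a repeated or lightly reworded question comes back without touching the model, which would look like the "second time is faster" behavior described above.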

clbrmbr · 2024-01-31T19:26:44 1706729204

I cannot tell if this comment was made in jest or in earnest.

As far as I understand, the earlier GPT generations required a fixed amount of compute per token inferred.
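
For a sense of scale, a common rule of thumb (an approximation, not anything OpenAI has published about its serving stack) is that a dense transformer needs roughly 2 FLOPs per parameter per generated token in the forward pass, ignoring the attention cost over the context. The function names here are mine.

```python
def forward_flops_per_token(n_params: float) -> float:
    """Rule-of-thumb forward-pass cost: ~2 FLOPs per parameter per token.

    Assumes a dense model and ignores attention over the context window.
    """
    return 2.0 * n_params


def generation_flops(n_params: float, n_tokens: int) -> float:
    """Approximate total forward-pass FLOPs to generate n_tokens."""
    return forward_flops_per_token(n_params) * n_tokens


if __name__ == "__main__":
    gpt3_params = 175e9  # published GPT-3 parameter count
    print(forward_flops_per_token(gpt3_params))  # 3.5e11 FLOPs per token
    print(generation_flops(gpt3_params, 100))    # 3.5e13 FLOPs for 100 tokens
```

The point is that under this estimate the cost per token does not depend on what is being asked, so large swings in speed would have to come from somewhere else, such as load or batching.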

But given the tremendous load on their systems, I wouldn’t be surprised if OpenAI is playing games with running a smaller model when they predict they can get away with it. (Is there evidence for this?)

darkerside · 2024-01-31T12:58:05 1706705885

I'm guessing there are so many other impacts of own on the model that size of print probably gets lost. I can see a future where people are forecasting updates to ChatGPT like we do with the weather.

gtirloni · 2024-01-31T13:03:01 1706706181

Yeah. It has so many moving parts that I doubt anyone can make a science out of it, but people will try for sure. Just like with most psychology/social experiments and SEO. I'm flooded with prompt engineering course spam these days.