
Okay, if client side resources expand then larger parameter LLMs will be used. There.

The point is that it will be ubiquitously client-side, and it will happen faster than newer hardware comes out. Current hardware is very limited and slow at getting output from LLMs.




I'm not convinced - at least on current hardware this seems well positioned for the cloud.

My tiny Alexa puck isn't going to run a 180bn-parameter LLM that runs best on 10 graphics cards any time soon, but it can already call a simple API and get a response in only 50ms more. I suspect that for a lot of queries, people will prefer 50ms of cloud overhead in exchange for a better response from a bigger model.
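For anyone curious about the rough arithmetic behind that claim, here is a back-of-envelope sketch. All of the numbers (fp16 weights, 40 GB of VRAM per GPU, the token-generation rates) are illustrative assumptions, not measurements; only the 180bn parameters and ~50ms overhead come from the comment above.

```python
# Rough back-of-envelope numbers behind the local-vs-cloud tradeoff.
# All figures below are illustrative assumptions, not measurements.

PARAMS = 180e9            # the 180bn-parameter model mentioned above
BYTES_PER_PARAM = 2       # assuming fp16 weights
GPU_VRAM_GB = 40          # assuming a 40 GB datacenter GPU

weights_gb = PARAMS * BYTES_PER_PARAM / 1e9
gpus_needed = weights_gb / GPU_VRAM_GB
print(f"Weights alone: ~{weights_gb:.0f} GB -> ~{gpus_needed:.0f} GPUs just to hold them")

# Latency: the cloud adds a round trip but generates tokens much faster.
CLOUD_RTT_S = 0.05        # the ~50 ms network overhead mentioned above
CLOUD_TOK_PER_S = 50      # assumed datacenter generation speed
LOCAL_TOK_PER_S = 5       # assumed speed on a small edge device

for n_tokens in (20, 200):
    cloud = CLOUD_RTT_S + n_tokens / CLOUD_TOK_PER_S
    local = n_tokens / LOCAL_TOK_PER_S
    print(f"{n_tokens} tokens: cloud ~{cloud:.1f}s vs local ~{local:.1f}s")
```

Under these assumptions the fixed 50ms round trip is dwarfed by per-token generation time, which is why the cloud looks attractive for large models on current hardware.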

But who knows at this stage! I guess it could go either way depending on how both hardware and these models advance.

I just assume that in the near future, most people are going to be interacting with LLMs on low-cost devices with limited/varied compute.



