
> If you are claiming that training an LLM literally only one time is enough and there is no need to train it more than once, you are wrong.

No; rather, I'm claiming that what you claimed is wrong in the context of LLM training: "Well maybe not every day, but having a short feedback loop and the ability to run your code multiple times with different variations is generally considered to be a prerequisite for software development".

LLM training is not the same as writing a program and "running your code with different variations". For an LLM you don't need to quickly rerun everything with some new corpus - it would be nice, but it's neither a prerequisite nor crucial for any current use.

Hell, it's not even a "prerequisite" in programming, just good to have. Tons of great programs have been written with very slow build times, without quick edit/compile/build/run cycles.



I wasn't talking about running the same code with a new corpus. For that kind of use case one can simply fine-tune the pretrained model. The example I gave was "if a CS student wants to dabble in this research".

You said "LLM training is not the same as writing a program and running your code with different variations". How do you think these LLMs were made, seriously? Do you think Facebook researchers sat down for 12 months and wrote code non stop without compiling it once, until the program was finished and was used to train the LLM literally only one time?


I would expect them to use small model sizes for almost all of the testing.


Yes. There _is_ a need to train LLMs more than once, and training is prohibitively expensive, so you need workarounds such as training on a small subset of the data or training a smaller version of the model. We're not yet at the point where a CS student on consumer hardware could afford to do this kind of research.
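To make the "smaller version of the model" point concrete, here's a rough back-of-the-envelope sketch in Python. The layer/width numbers are illustrative assumptions, not any real model's config:

    # Rough parameter count for a GPT-style transformer.
    # All sizes here are illustrative assumptions, not real configs.
    def approx_params(n_layers, d_model, vocab=32000):
        # token embeddings + roughly 12 * d_model^2 weights per layer
        # (attention projections + MLP), ignoring small terms
        return vocab * d_model + n_layers * 12 * d_model ** 2

    debug = approx_params(n_layers=4, d_model=256)    # fits on a laptop
    full = approx_params(n_layers=80, d_model=8192)   # needs a GPU cluster

    print(f"debug model: {debug / 1e6:.0f}M parameters")
    print(f"full model:  {full / 1e9:.0f}B parameters")

A ~10M-parameter debug model trains in minutes, so the edit/run loop survives during development; it's only the final full-scale run that you can't afford to repeat.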


> We're not yet at the point where a CS student on consumer hardware could afford to do this kind of research.

Okay. But I was saying someone with millions of dollars to spend could do it. And then another poster was arguing that millions of dollars was not enough to be viable because you need lots of repeated runs.

Nobody was saying a student could train one of these models from scratch. The cool potential is for a student to run one, maybe fine-tune it.
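For what it's worth, the "run one" case is already cheap today. A minimal sketch, assuming the Hugging Face transformers library and using GPT-2 as a stand-in for "some small pretrained model":

    # Load a small pretrained checkpoint and generate text - this is
    # the "student runs one" case, not training from scratch.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    inputs = tok("Training an LLM from scratch costs", return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=20)
    print(tok.decode(out[0], skip_special_tokens=True))

Fine-tuning is the same idea with a training loop on top, and for small models it stays within a consumer-hardware budget.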


Here is the upthread comment I was responding to:

> Why would you want to retrain it from scratch every day?

I was explaining why someone might want to retrain it more than once (although not literally every day).



