> The chinchilla law states that the amount of data required to train a language model grows exponentially with the model size. This means that it is very expensive to train large language models, even with the latest hardware. The RWKV community is working on developing new methods for training large language models more efficiently. There are a number of datasets available to the RWKV community, including:
What? I thought the chinchilla-optimal regime was something like "20 tokens per weight". That's not remotely exponential, even by the word's colloquial use.
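For anyone checking the arithmetic: under the usually quoted ~20 tokens-per-parameter heuristic, the optimal data budget scales linearly with parameter count. A minimal sketch of that back-of-envelope calculation (the coefficient of 20 is the commonly cited approximation, not an exact constant):

```python
# Back-of-envelope Chinchilla-optimal token counts, assuming the
# widely quoted ~20 tokens-per-parameter heuristic (exact fitted
# coefficients vary by paper; this is an approximation).
TOKENS_PER_PARAM = 20

for params in (1e9, 7e9, 70e9, 700e9):
    tokens = TOKENS_PER_PARAM * params
    # The tokens/params ratio stays fixed at ~20: data grows
    # linearly with model size, not exponentially.
    print(f"{params / 1e9:>6.0f}B params -> {tokens / 1e12:>6.2f}T tokens "
          f"(ratio {tokens / params:.0f})")
```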
It seems that the user is posting ChatGPT-generated text and not a real summary. It's complete nonsense, with about half of the sentences containing a factual error.