Are there currently any plans to create an RWKV 30B or 65B? That seems to be the size at which the LLaMA transformer models become genuinely competitive with GPT-3.5 for many tasks.
Most of the focus is in the 1-14B range, due to the constraints of available dataset sizes (the Chinchilla scaling law) and the GPUs available.
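For context on the Chinchilla constraint: the commonly cited rule of thumb from the Chinchilla paper is roughly 20 training tokens per model parameter for compute-optimal training. A rough sketch of what that implies for larger models (the helper function and the exact ratio are illustrative, not from this thread):

```python
# Rough Chinchilla rule of thumb: compute-optimal training uses
# roughly 20 tokens per parameter (the exact ratio is an approximation).
def chinchilla_tokens(params_b: float, tokens_per_param: float = 20.0) -> float:
    """Approximate compute-optimal training tokens, in billions,
    for a model with params_b billion parameters."""
    return params_b * tokens_per_param

for size in (7, 14, 30, 65):
    print(f"{size}B params -> ~{chinchilla_tokens(size):.0f}B tokens")
# 65B params -> ~1300B tokens, i.e. over a trillion tokens of data
```

So a 65B model wants on the order of 1.3T tokens plus the GPU-hours to consume them, which is why the community effort stays in the smaller range.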
Community demand is also mostly in this range, as there is a strong desire to optimise for and run on local GPUs. So more of the focus is there.
Not representing blink directly here - but if anyone wants to see a 30B / 65B model, reach out to contribute the GPUs required to make it happen.
The code is already there; someone just needs to run it.
Ps: I too am personally interested in how it will perform at ~60B, which I believe to be the optimal model size for higher-level thought (this number is based on intuition, not research).
You might find that thread interesting: they're taking submissions for a potential partnership with LambdaLabs, a cloud compute company that has a few hundred H100s lying around. They have an open form, and their cofounder is currently doing the rounds having meetings, so this may be a good candidate.
I'm not associated with them at all, just interested in the space and things going on.
Can you elaborate on the Chinchilla law / dataset problem a bit? (Perhaps by editing your previous comment?)
What datasets are available to the community? How big are they? Do they need to be updated from time to time? Where are they stored? What are the usual cost ranges involved? ...? :o