
Are there currently any plans to create an RWKV 30B or 65B? That seems to be the size at which the LLaMA transformer models become genuinely competitive with GPT-3.5 for many tasks.



TLDR: please donate A100s to make this happen

Most of the focus is in the 1-14B range, due to constraints on dataset size (the Chinchilla law) and the GPUs available.

Community demand is also mostly in this range, as there is a strong desire to optimise for and run on local GPUs.

Not representing Blink directly here, but if anyone wants to see a 30B / 65B model, reach out to contribute the GPUs required to make it happen.

The code is already there; someone just needs to run it.

PS: I too am personally interested in how it will perform at ~60B, which I believe will be the optimal model size for higher-level reasoning (this number is based on intuition, not research).


https://twitter.com/boborado/status/1659608452849897472

You might find that thread interesting: they're taking submissions for potential partnerships with LambdaLabs, a cloud compute company that has a few hundred H100s lying around. They have an open form, their cofounder is currently doing the rounds of meetings, and this may be a good candidate.

I'm not associated with them at all, just interested in the space and things going on.


Weirdly, their form requires a company rep (which RWKV does not have, as it's not a company). Let's see how it goes ...


Are there any estimates anywhere of how many A100s would be needed to e.g. train a 30B model in 6 months?


That's a loaded question without first deciding the dataset size.


Would it be possible to just use the exact same dataset as LLaMA? (There's an open source project currently training a transformer on exactly that).


You mean RedPajama? I believe that has already started for 1-14B (need to double check).


Yep that's the one. Curious roughly how many A100s it'd take to train a 65B RWKV on that.


Really rough napkin math, as no one has attempted 65B (so +/- 50%):

8 x 8 x 8 A100s should be able to do 100k+ tokens/s at that size.

With a dataset of 1.2 trillion tokens, that's 12 million seconds, or roughly 140 days.

(PS: this is why everyone is training <60B; the cost is crazy. Even if my estimate is off by 300%, it's still a crazy number.)
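A quick sanity check of that napkin math, as a rough sketch: the 512-GPU count, 100k tokens/s cluster throughput, and 1.2T-token dataset are the hand-wavy figures from this thread, not measured numbers.

    # Back-of-envelope check of the 65B training estimate above.
    # All inputs are the thread's rough figures, not benchmarks.
    gpus = 8 * 8 * 8            # 512 A100s
    tokens_per_sec = 100_000    # assumed cluster-wide throughput
    dataset_tokens = 1.2e12     # ~1.2T tokens (RedPajama-scale)

    seconds = dataset_tokens / tokens_per_sec
    days = seconds / 86_400
    print(f"{gpus} A100s, {seconds:.1e} s ~= {days:.0f} days")  # ~1.2e7 s, ~139 days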


Thank you! 8 x 8 x 8 is 512 A100s; that is indeed pretty expensive.


Can you elaborate on the Chinchilla law / dataset problem a bit? (Perhaps by editing your previous comment?)

What datasets are available to the community, how big are they, do they need to be updated from time to time, where are they stored, what are the usual cost ranges involved, ...? :o

thank you!


The Chinchilla law is a rule of thumb that you should have 11+ training tokens for every parameter.

If not, you are getting diminishing returns for each parameter you add.

In extreme cases your model can even perform worse with more parameters, due to lack of training data.

It's more complicated than that: the quality of the data matters as well.

So there are two major directions. One: build efficient models with a good dataset and an optimal parameter count for the task.

Two: go big on everything (aka OpenAI), which requires monster GPU time for every reply token.

There are obviously in-between approaches as well, hence why the question is so loaded.

Ballpark: if you're not setting aside $100k for GPUs alone to train a 60B model from scratch, you're probably not ready to train one.
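To make the rule of thumb concrete, here's a minimal sketch using the ~11 tokens-per-parameter figure above (the Chinchilla paper itself suggests closer to ~20 tokens per parameter, so treat these as lower bounds):

    # Minimum dataset size implied by the "11+ tokens per parameter" rule of thumb.
    TOKENS_PER_PARAM = 11

    for params in (1e9, 14e9, 30e9, 65e9):
        min_tokens = params * TOKENS_PER_PARAM
        print(f"{params / 1e9:>4.0f}B params -> at least {min_tokens / 1e12:.2f}T tokens")

    # 30B needs ~0.33T tokens and 65B ~0.72T, so a 1.2T-token dataset like
    # RedPajama comfortably covers both sizes under this rule.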


30B would be interesting because that's the practical ceiling for local GPUs assuming 4-bit quantization.

Is there some kind of dedicated fund for training hardware? Donating an A100 sounds unlikely, but surely they could be crowdfunded?
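For context on the 4-bit ceiling mentioned above, a rough weight-memory sketch (it assumes a 24 GB consumer card such as a 3090/4090, and ignores activation/state overhead, which adds a few more GB):

    # Weight memory for a 30B model at different quantisation levels.
    params = 30e9
    for name, bits in (("fp16", 16), ("int8", 8), ("q5", 5), ("q4", 4)):
        gb = params * bits / 8 / 1e9
        print(f"{name}: ~{gb:.0f} GB of weights")

    # q4 comes to ~15 GB of weights, which (plus runtime overhead) is roughly
    # what fits on a single 24 GB consumer GPU -- hence 30B as the practical
    # local ceiling.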


Weirdly enough, organisations are more willing to donate GPU time than money.

If you want to help fund RWKV, the ko-fi link is - https://ko-fi.com/rwkv_lm

IMO this needs way more funding just to sustain Blink leading this project, let alone the GPUs for training.

(Also: current tests show this model doing really badly when 4-bit quantized, but alright at Q5 and Q8.)



