Linear attention causes the model to forget details, making it less expressive than alternatives using proper quadratic attention. And it only matches the best transformers on benchmarks at roughly the same parameter count anyway, whereas one would expect linear attention to get by with drastically fewer parameters.
That could still be a good trade-off: it's fine for the feedforward blocks to be a bit slower (due to a higher model dimension) if the 'attention' blocks are much faster (due to better complexity).
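To make the complexity argument concrete, here's a minimal sketch of the two scalings. This is generic kernelized linear attention, not RWKV's actual formulation (RWKV uses a time-decay recurrence instead); the feature map `phi` is a placeholder I picked for illustration. The point is that quadratic attention materializes an (n, n) score matrix, costing O(n²·d), while the linear variant accumulates a (d, d) state, costing O(n·d²).

```python
import numpy as np

def quadratic_attention(q, k, v):
    # Standard attention: the (n, n) score matrix costs O(n^2 * d).
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

def linear_attention(q, k, v, phi=lambda x: np.maximum(x, 0.0) + 1e-6):
    # Kernelized linear attention: accumulate a (d, d) summary of keys
    # and values instead of an (n, n) matrix, so cost is O(n * d^2),
    # i.e. linear in sequence length. `phi` is an illustrative positive
    # feature map, not RWKV's actual parameterization.
    qp, kp = phi(q), phi(k)
    state = kp.T @ v            # (d, d) key-value summary
    norm = kp.sum(axis=0)       # (d,) normalizer
    return (qp @ state) / (qp @ norm)[:, None]
```

So at long sequence lengths the 'attention' part gets much cheaper, and the freed-up compute can go into wider feedforward blocks.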
I think the RWKV team should make a statement by training a really large model. Their 14B finetunes are fine but far from impressive. If they believe this is the future, they should go all in. They probably have the funding to do so (I think?).