Inference will presumably run faster on a 5090. If the 5x memory bandwidth figure holds, token generation would run about 5 times faster, since token generation is generally limited by memory bandwidth. That said, people in the Digits discussion predict that its memory bandwidth will be closer to 546 GB/s, which is roughly 1/3 of the 5090's memory bandwidth, so a bunch of 5090 cards would only run about 3 times faster at token generation.
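
For anyone who wants to check the arithmetic, here is a rough sketch of the ratio. It assumes token generation is purely memory-bandwidth bound, takes 1792 GB/s as the 5090's bandwidth (my assumption, not stated above), and uses the predicted, unconfirmed 546 GB/s for Digits. The function name is just made up for illustration.

```python
# Back-of-the-envelope estimate: if token generation is purely
# memory-bandwidth bound, the speedup is just the bandwidth ratio.

# Assumption: 1792 GB/s for the RTX 5090 (not stated in the comment above).
RTX_5090_GBPS = 1792.0
# Predicted (unconfirmed) Digits bandwidth from the discussion.
DIGITS_PREDICTED_GBPS = 546.0


def bandwidth_bound_speedup(fast_gbps: float, slow_gbps: float) -> float:
    """Return how many times faster the higher-bandwidth device generates tokens."""
    if fast_gbps <= 0 or slow_gbps <= 0:
        raise ValueError("bandwidths must be positive")
    return fast_gbps / slow_gbps


speedup = bandwidth_bound_speedup(RTX_5090_GBPS, DIGITS_PREDICTED_GBPS)
print(f"5090 vs. Digits at 546 GB/s: {speedup:.2f}x")

# Bandwidth Digits would need to have for the 5x figure to hold.
implied_gbps = RTX_5090_GBPS / 5
print(f"Digits bandwidth implied by a 5x gap: {implied_gbps:.1f} GB/s")
```

This ignores compute limits, batching, and the overhead of splitting a model across several cards, so treat it as an upper bound on the gap rather than a benchmark.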