
For the 65B fine-tune, did you add another A100 node, or just drop the batch size?

Any chance you'd be up for sharing the training parameters?




Dropping the batch size
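
For anyone curious what "dropping the batch size" typically looks like in practice, here is a minimal sketch using Hugging Face's TrainingArguments: a smaller per-device batch size fits the 65B model in memory, and gradient accumulation can restore the effective global batch size. All of the specific values, the output path, and the choice of this API are assumptions for illustration, not the poster's actual configuration.

    # Hypothetical settings illustrating the trade-off: per-device batch size
    # is dropped to fit in GPU memory, gradient accumulation keeps the
    # effective batch size up. Numbers are placeholders, not the real run.
    from transformers import TrainingArguments

    training_args = TrainingArguments(
        output_dir="./65b-finetune",       # hypothetical output path
        per_device_train_batch_size=1,     # dropped to fit the 65B model
        gradient_accumulation_steps=16,    # 1 * 16 * n_gpus = effective batch
        learning_rate=2e-5,
        num_train_epochs=3,
        bf16=True,                         # mixed precision to cut memory use
        logging_steps=10,
    )

The usual caveat is that changing the effective batch size also shifts the optimal learning rate, so the two are often tuned together.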



