> If I can see that their approach is 20% better than SOTA, but they require 1M LoC plus 3 weeks of total computation time on a 100 machine cluster with 8 V100 per node, I can safely say - sod it!
Eight V100s cost about $20/h, so 100 machines for 2 weeks (allowing for a long training time) will cost about $638K. That is roughly the salary of three to five engineers for a year. If your model saves more than that amount of engineering time, it is worth it. It's just a matter of how much use you can get out of it. Of course, a model can be reused by different teams and companies, so it could easily be worth the price.
I expect the number I calculated to be an overestimate for this task, though; you don't need that much compute for this model. GPT-3 cost $1.2M per run, and it is the largest model in existence.
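
For anyone who wants to check the $638K figure, here is a rough sketch of the arithmetic. The function name `cluster_cost` is made up for illustration, and the $19/h rate is my assumption: it is the value that reproduces $638K, while a flat $20/h gives a slightly higher total.

```python
def cluster_cost(rate_per_node_hour, nodes, days):
    """Rental cost in dollars for `nodes` machines running 24 hours a day."""
    return rate_per_node_hour * nodes * days * 24

# Assumed rate of $19/h per 8x V100 node (reproduces the ~$638K estimate).
print(cluster_cost(19, 100, 14))  # 638400

# Rounded rate of $20/h for the same 2 weeks.
print(cluster_cost(20, 100, 14))  # 672000

# The 3 weeks from the quoted comment, at $20/h.
print(cluster_cost(20, 100, 21))  # 1008000
```

Either way the order of magnitude is the same: somewhere between $0.6M and $1M for the cluster described in the quote.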