Cost per FLOP continues to drop along an exponential trend (and which FLOPs do we even mean: 8-bit, 16-bit, 32-bit?). Leaving aside more effective training methodologies, which muddy everything by allowing better-than-GPT-4 performance with fewer training FLOPs, the cost trend alone means one of the thresholds will soon stop making sense.
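A rough sketch of that erosion, in Python; every number here is an illustrative assumption (the threshold value, the 2024 price per FLOP, the halving time), not taken from any actual regulation or price list:

    # Illustrative only: if the dollar cost per FLOP halves every ~2.5 years,
    # a fixed training-compute threshold gets exponentially cheaper to cross,
    # so it stops singling out only the best-resourced labs.
    THRESHOLD_FLOP = 1e26        # hypothetical compute threshold (assumed)
    COST_PER_FLOP_2024 = 3e-18   # assumed 2024 price (~$300M for 1e26 FLOP)
    HALVING_YEARS = 2.5          # assumed price-performance halving time

    for year in range(2024, 2033, 2):
        cost_per_flop = COST_PER_FLOP_2024 * 0.5 ** ((year - 2024) / HALVING_YEARS)
        print(f"{year}: ~${THRESHOLD_FLOP * cost_per_flop / 1e6:,.0f}M to cross the threshold")

Under those assumptions, the dollar cost of crossing the same fixed threshold falls from roughly $300M to around $30M within about eight years.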
The other threshold, meanwhile, creates a disincentive for releasing models like Llama-405B+ openly, in effect enshrining an even wider gap between open and closed models.
Why? Llama is not trained by some guy in a shed.
And even if it were, if said guy has that much compute, then it's time to spend some of it describing the model's safety profile.
If it makes sense for Meta to release models, it would still make sense with the requirement in place. (After all, the whole point of the proposed regulation is to get a better sense of those otherwise closed models.)
There is a reason we report time (speedup) in SPEC rather than dollars: the price you pay depends on who you are and who is giving it to you.