> You can say ‘the recent jumps are relatively small’ or you can notice that (1) there is an upper bound at 100 rapidly approaching for this set of benchmarks, and (2) the releases are coming quickly one after another and the slope of the line is accelerating despite being close to the maximum.
The graph does not look like it is accelerating. I actually struggle to see what about it convinced the author that progress is accelerating.
I would be very interested in a more detailed graph that shows individual benchmarks. It should then be possible to see which benchmarks have effectively been beaten and to get a good idea of where all the other benchmarks sit on that trend. The 100% upper bound is likely very hard to approach, but I don't know whether the practical limit is 99%, 95% or 90% for most benchmarks.
I heard a theory today that hitting 100% on the MMLU benchmark may be impossible due to errors in the benchmark itself: if there are errors in the benchmark, no model should ever be able to score 100% on it.
The same problem could well be present in other benchmarks as well.
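To make the ceiling argument concrete, here is a rough sketch of the arithmetic. It assumes a model that always gives the genuinely correct answer and is therefore marked wrong on every mislabeled question. The function name and the error rates are hypothetical, picked only to line up with the 99%, 95% and 90% guesses above; they are not measured values for MMLU or any other benchmark.

```python
def max_attainable_score(label_error_rate: float) -> float:
    """Upper bound (in percent) for a model that always answers correctly.

    Assumption: every question with a wrong answer key is graded as a miss
    for that model, so the ceiling is simply the share of correct labels.
    """
    if not 0.0 <= label_error_rate <= 1.0:
        raise ValueError("label_error_rate must be between 0 and 1")
    return 100.0 * (1.0 - label_error_rate)


# Hypothetical error rates, chosen only for illustration.
for rate in (0.01, 0.05, 0.10):
    print(f"{rate:.0%} bad labels -> ceiling of {max_attainable_score(rate):.0f}%")
```

A model could in principle score above this ceiling by reproducing the benchmark's mistakes, so a result very close to 100% on a benchmark with known errors would itself be worth a second look.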