You don’t have to settle on a single degree of sortedness. Instead, you can set up a large matrix with input length on one axis and sortedness on the other.
Each cell can then be colored according to whether TimSort beats the contender (say, some other MergeSort hybrid): green shades mean TimSort is winning, red shades mean the contender is winning.
For each cell, do several runs with that length/sortedness combination and take the average.
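A minimal sketch of that experiment in Python, under several assumptions. TimSort is represented by CPython's built-in `sorted()`, which is TimSort-based. The contender is a hypothetical `hybrid_merge_sort` (top-down merge sort that switches to insertion sort below a cutoff). "Sortedness" is defined here, purely for illustration, as starting from a sorted list and applying `(1 - sortedness) * n / 2` random swaps. Each cell holds `contender_time / timsort_time`, so a value above 1 would be a "green" cell and below 1 a "red" one.

```python
import random
import statistics
import time


def insertion_sort(a, lo, hi):
    """Sort a[lo:hi] in place by insertion."""
    for i in range(lo + 1, hi):
        x = a[i]
        j = i - 1
        while j >= lo and a[j] > x:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = x


def hybrid_merge_sort(data, cutoff=32):
    """Hypothetical contender: stable top-down merge sort with an insertion-sort cutoff."""
    a = list(data)

    def sort(lo, hi):
        if hi - lo <= cutoff:
            insertion_sort(a, lo, hi)
            return
        mid = (lo + hi) // 2
        sort(lo, mid)
        sort(mid, hi)
        if a[mid - 1] <= a[mid]:
            return  # halves already in order, skip the merge
        left = a[lo:mid]
        i, j, k = 0, mid, lo
        while i < len(left) and j < hi:
            if a[j] < left[i]:  # strict < keeps the merge stable
                a[k] = a[j]
                j += 1
            else:
                a[k] = left[i]
                i += 1
            k += 1
        # Copy leftover left-half items; leftover right-half items are already in place.
        a[k:k + len(left) - i] = left[i:]

    sort(0, len(a))
    return a


def make_input(n, sortedness, rng):
    """Assumed definition: sorted 0..n-1, then (1 - sortedness) * n / 2 random swaps."""
    a = list(range(n))
    for _ in range(int((1.0 - sortedness) * n / 2)):
        i, j = rng.randrange(n), rng.randrange(n)
        a[i], a[j] = a[j], a[i]
    return a


def mean_time(sort_fn, data, runs):
    """Average wall-clock time of sort_fn(data) over several runs."""
    times = []
    for _ in range(runs):
        start = time.perf_counter()
        sort_fn(data)  # neither function mutates data
        times.append(time.perf_counter() - start)
    return statistics.mean(times)


def speedup_matrix(lengths, sortedness_levels, runs=3, seed=0):
    """Rows = lengths, columns = sortedness levels; cell = contender_time / timsort_time."""
    rng = random.Random(seed)
    matrix = []
    for n in lengths:
        row = []
        for s in sortedness_levels:
            data = make_input(n, s, rng)
            t_tim = max(mean_time(sorted, data, runs), 1e-9)  # guard against a zero reading
            t_other = mean_time(hybrid_merge_sort, data, runs)
            row.append(t_other / t_tim)
        matrix.append(row)
    return matrix


if __name__ == "__main__":
    lengths = [1_000, 10_000, 30_000]
    levels = [0.0, 0.5, 0.9, 0.99, 1.0]
    m = speedup_matrix(lengths, levels)
    print("n \\ sortedness", *levels, sep="\t")
    for n, row in zip(lengths, m):
        print(n, *("%.2f" % r for r in row), sep="\t")
```

One caveat on the design: `sorted()` runs in C while this contender is pure Python, so the absolute ratios mostly measure interpreter overhead. For a fair matrix, both algorithms should be implemented in the same language and style; the harness (grid, repeated runs, averaging) stays the same.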