The Hyperband (HB) paper you refer to was discussed (briefly) by Frank Hutter in this talk [1]. He observes that the interpretation is not correct and that, had the experiment run longer, you would have seen different behavior. In fact, his group combined Bayesian Optimization (BO) with HB, producing some interesting results. Initial results were reported at a NIPS'17 workshop [2], and a detailed version was presented at the recently concluded ICML'18 [3].
[1] https://youtu.be/OR-IKyP4ZpI?t=3m30s
[2] See accepted papers here - https://bayesopt.github.io/accepted.html
[3] https://icml.cc/Conferences/2018/Schedule?showEvent=2387