This is one of those junior-engineer moments: they did technically spot a problem and solve it, but you wish they had come and asked for some advice first.
The details in the article are sparse, but from what we are told it seems highly likely that, instead of calling their ML model directly, they could use the model to fit a regression or a piecewise polynomial (e.g. a linear interpolation or a spline) over its outputs. The user input then no longer drives the ML model; it is simply the input to a polynomial, which is a trivial calculation for a modern computer.
Then they wouldn’t even need to cache anything, and the result would be effectively instantaneous with no real loss of accuracy.
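
To make that concrete, here is a rough sketch of what I mean, not what the article's authors actually did. I'm assuming the model takes a single scalar input over a known, bounded range and that its output varies smoothly; the article doesn't confirm any of that. `slow_model` is a made-up stand-in for the real ML model, and the range and grid size are arbitrary.

```python
import bisect
import math


def slow_model(x: float) -> float:
    """Hypothetical stand-in for the expensive ML model (one scalar in, one out)."""
    return math.sin(x) + 0.1 * x


def build_table(model, lo: float, hi: float, n: int):
    """Evaluate the model once, offline, on n evenly spaced points in [lo, hi]."""
    xs = [lo + (hi - lo) * i / (n - 1) for i in range(n)]
    ys = [model(x) for x in xs]
    return xs, ys


def interpolate(xs, ys, x: float) -> float:
    """Piecewise-linear lookup; inputs outside the grid are clamped to its ends."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    i = bisect.bisect_right(xs, x)  # now xs[i - 1] <= x < xs[i]
    t = (x - xs[i - 1]) / (xs[i] - xs[i - 1])
    return ys[i - 1] + t * (ys[i] - ys[i - 1])


# Offline step: 1001 model calls, done once (assumed range 0..10).
xs, ys = build_table(slow_model, 0.0, 10.0, 1001)

# Sanity check: compare against the model at the midpoints between grid
# points, where linear interpolation is typically at its worst.
worst = max(
    abs(interpolate(xs, ys, q) - slow_model(q))
    for q in (0.005 + 0.01 * k for k in range(1000))
)
```

At request time only `interpolate` runs, which is a binary search plus a few multiplications. If the real model has several inputs or sharp discontinuities, a plain grid like this gets expensive or inaccurate quickly, so checking the error against held-out model calls (as `worst` does here) is the step I would not skip.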