I can see it being strongly correlated, as that human has more items to draw from when figuring out solutions, but I don't see how it can be used to test computer creativity. Python's random module surely isn't creative.
That's the problem with metrics, though. You can make anything sound plausible. Once a model has been presented to you, you can come up with an endless stream of just-so stories for why it measures what you were told it measures. Nonetheless, some metrics simply aren't that useful and get adopted anyway because they produce a bunch of data. Psychometrics like this have an awful track record when it comes to standing up to more meaningful tests like "Can we drive any outcome we care about with this?"