I've seen similar systems, such as "Which one of these four images is a puppy?". I think the problem is that the set has to be small, so it ends up being a multiple-choice quiz. With one correct answer out of four or five choices, a random guess is right 20 to 25% of the time, so it is very easy to brute force.
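To put rough numbers on the brute-force point, here is a minimal sketch. It assumes each attempt gets a fresh, independent challenge and that nothing limits retries; the function name `brute_force_success` is just made up for illustration.

```python
def brute_force_success(choices: int, attempts: int) -> float:
    """Chance that at least one of `attempts` random guesses is correct.

    Assumes every attempt is an independent challenge with exactly one
    correct answer among `choices` options, and no rate limiting.
    """
    miss_once = 1 - 1 / choices          # probability a single guess is wrong
    return 1 - miss_once ** attempts     # complement of missing every time


if __name__ == "__main__":
    for choices in (4, 5):
        single = brute_force_success(choices, 1)
        ten = brute_force_success(choices, 10)
        print(f"{choices} choices: 1 guess = {single:.3f}, 10 guesses = {ten:.3f}")
```

Under those assumptions, a bot guessing blindly gets through about nine times out of ten within ten tries, which is why a small answer set is hard to defend without some limit on retries.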
Even my pre-school self could solve the Sesame Street "one of these things is not like the other".
There are so many odd-one-out sets that a human could solve easily but a computer could not.