All candidates did both screens (quiz and fizzbuzz), so the correlations were calculated against the same population. Now, I agree that survivor bias could affect the quality of these results (we know nothing about the significant percentage of people who dropped out), but it's not really possible to solve that problem outside of a lab, and I don't think that's an argument for not doing the analysis at all. For now we're simply trying to minimize the drop-off rate and maximize the correlation, and the quiz was better at both.
Well, having the candidates do both screens is better, but it doesn't totally solve your problems.
It doesn't address the survivor bias issue, and when you say a significant percentage dropped out, that's not reassuring. It's also not true that you need a lab to get at the problem: even a basic self-assessment questionnaire of programming ability might tell you whether there are meaningful differences in the population that quits your process. At the very least, you should understand and discuss survivor bias in your article to show you're aware of the issue.
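To make that concrete, a check along these lines would at least tell you whether the quitters look different from the finishers. This is just a sketch; the file and column names (a 1-10 self-rating and a completed/quit flag) are hypothetical:

    import pandas as pd
    from scipy.stats import mannwhitneyu

    # Assumed data: one row per candidate, with a 1-10 self-rating collected
    # up front and a boolean flag for whether they finished the screen.
    df = pd.read_csv("candidates.csv")  # hypothetical file and columns

    finished = df.loc[df["completed_screen"], "self_rating"]
    dropped = df.loc[~df["completed_screen"], "self_rating"]

    # Non-parametric comparison of the two groups' self-ratings.
    stat, p = mannwhitneyu(finished, dropped, alternative="two-sided")
    print(f"median self-rating, finished: {finished.median()}")
    print(f"median self-rating, dropped:  {dropped.median()}")
    print(f"Mann-Whitney U p-value: {p:.3f}")

If the dropouts rate themselves noticeably lower (or higher), you know the surviving population isn't a random sample of your applicants.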
Even if you still want to claim a difference between the quiz and the coding exercise, you're not yet in the clear. For example, did you counterbalance the order in which you gave them to people? If everybody did the quiz first and the fizzbuzz second, then they were mentally fresher for the quiz and slightly more tired for the fizzbuzz, which could again create a spurious result. And this definitely doesn't require a lab to test.
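Counterbalancing is cheap, too. A minimal sketch, assuming you can assign the order when candidates sign up (the names here are made up):

    import random

    def counterbalanced_orders(n_candidates: int) -> list[tuple[str, str]]:
        """Return a shuffled assignment: half quiz-first, half fizzbuzz-first."""
        half = n_candidates // 2
        orders = ([("quiz", "fizzbuzz")] * half
                  + [("fizzbuzz", "quiz")] * (n_candidates - half))
        random.shuffle(orders)
        return orders

Then, in the analysis, compare each screen's scores (and its correlation with later performance) across the two order groups; a gap that tracks the order rather than the screen points to a fatigue or practice effect.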
Don't misunderstand me: I appreciate your attempts to quantify all this, and I actually think you guys have roughly the correct result (given the limited nature of fizzbuzz-style coding). But when you step into the experimental psych arena, you need to learn how to analyze your data properly. Given that your business is predicated on analyzing how your hires perform in the real world, you really need to up your analytical game.
I have to agree with kingmob. It very much sounds like survivor bias. My first reaction is that anyone who drops out because of a test has a high likelihood of dropping out because they can't do the test, which would leave you with a test that shows low correlation among the survivors but a very high anti-correlation across the total population.
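Here's a toy simulation of the survivors-vs-population point (all numbers invented, not anyone's real data): if weaker candidates are more likely to quit, the correlation you measure on the survivors can look very different from the one in the full applicant pool:

    import numpy as np

    rng = np.random.default_rng(0)
    n = 10_000

    ability = rng.normal(0, 1, n)                  # latent skill (unobserved)
    test_score = ability + rng.normal(0, 0.5, n)   # screen result
    job_perf = ability + rng.normal(0, 0.5, n)     # what you actually care about

    # Assume weaker candidates are more likely to quit the screen.
    p_dropout = 1 / (1 + np.exp(2 * ability))      # low ability -> high dropout
    survived = rng.random(n) > p_dropout

    corr_all = np.corrcoef(test_score, job_perf)[0, 1]
    corr_survivors = np.corrcoef(test_score[survived], job_perf[survived])[0, 1]

    print(f"correlation, full population: {corr_all:.2f}")
    print(f"correlation, survivors only:  {corr_survivors:.2f}")
    print(f"drop-off rate: {1 - survived.mean():.0%}")

This is the classic restriction-of-range effect: the test isn't necessarily worse for the survivors, you've just thrown away a lot of the variance it was predicting.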
I read a blog post a couple of years ago by a game programmer/designer who outsources a lot of work through places like odesk/elance. Basically, to weed out the fakers, he'd offer anyone ~5 hrs at their bidding rate to finish a predefined programming task expected to take ~5 hrs. He says this usually drops his pool to fewer than 10 out of the hundreds who may apply, and he can usually use at least one of the people who complete the task. It's hard to say how many of those people walk away because the task looks too big, or because there's a risk of not getting paid, but it's clearly a good filter for him.
As far as measuring this survivor bias goes, you might gain some insight by randomly altering the order of the testing and measuring when people tend to drop off. You might find that people tend to drop off at around the same point in time, or maybe after a certain amount of effort. It might even be worth paying people to see whether that improves completion rates (while introducing its own biases).
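A rough sketch of that drop-off tally, assuming you log the randomized order and the furthest stage each candidate reached (field names are hypothetical):

    from collections import Counter

    # Each log entry: (order_given, furthest_stage_reached) -- hypothetical fields.
    logs = [
        ("quiz_first", "quiz"),
        ("quiz_first", "finished"),
        ("fizzbuzz_first", "fizzbuzz"),
        # ... rest of the candidate logs
    ]

    by_order = {}
    for order, last_stage in logs:
        by_order.setdefault(order, Counter())[last_stage] += 1

    for order, counts in by_order.items():
        total = sum(counts.values())
        dropped = total - counts.get("finished", 0)
        print(f"{order}: {dropped}/{total} dropped; furthest stages: {dict(counts)}")

If the drop-off point moves when the order changes, that tells you people are quitting on effort or fatigue rather than on a particular test.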