I have to agree with kingmob. This very much sounds like survivor bias. My first reaction is that anyone who drops out because of a test is likely dropping out because they can't do the test. That would leave you with a test that shows low correlation when measured against the survivors, but a very high anti-correlation when measured against the total population.
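To make the survivor effect concrete, here is a rough simulation sketch. Everything in it is an assumption for illustration, not data from the original post: a made-up "ability" score drives both a noisy test score and a noisy later outcome, and "dropping out" is modeled as simply scoring below a cutoff. It only shows how a correlation that is visible in the whole pool can shrink once you look at survivors alone.

```python
import random

def pearson(xs, ys):
    """Plain Pearson correlation coefficient for two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def simulate(n=20000, cutoff=1.0, seed=0):
    """Hypothetical model: test and outcome are both ability plus noise.

    People whose test score falls below `cutoff` are treated as dropouts
    (an assumption standing in for "can't do the test").
    """
    rng = random.Random(seed)
    ability = [rng.gauss(0, 1) for _ in range(n)]
    test = [a + rng.gauss(0, 1) for a in ability]
    outcome = [a + rng.gauss(0, 1) for a in ability]

    # Correlation if you could observe everyone who applied.
    full = pearson(test, outcome)

    # Correlation if you only observe the people who made it through.
    kept = [(t, o) for t, o in zip(test, outcome) if t >= cutoff]
    survivors = pearson([t for t, _ in kept], [o for _, o in kept])
    return full, survivors, len(kept)

if __name__ == "__main__":
    full, survivors, kept = simulate()
    print(f"everyone:  r = {full:.2f}")
    print(f"survivors: r = {survivors:.2f} (n = {kept})")
```

In this toy setup the test looks noticeably weaker among survivors than across the full pool, which is the pattern you'd want to rule out before concluding the test doesn't predict anything.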
I read a blog post a couple of years ago by a game programmer/designer who outsources a lot of work through places like oDesk/Elance. To weed out the fakers, he'd offer anyone ~5hrs at their bidding rate to finish a predefined programming task expected to take ~5hrs. He says this usually drops his pool to fewer than 10 out of the hundreds who may apply, and he can usually use at least one of the people who complete the task. It's hard to say how many of these people go away because the task looks too big or because there's a risk of not getting paid, but it's clearly a good filter for him.
As for measuring this survivor bias, you might gain some insight by randomly altering the order of the tests and measuring when people tend to drop off. You might even find that people all tend to drop off after about the same amount of time, or after a certain amount of effort. It might even be worth paying people to see if that improves completion rates (while introducing its own biases).
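A minimal sketch of the randomized-order idea, assuming you log each candidate's task order and how many tasks they finished before quitting. The names `assign_order` and `dropoff_tables` and the record layout are hypothetical, just for illustration. Tallying drop-offs both by position and by task is what lets you tell "people quit after about N tasks" apart from "people quit on task X".

```python
import random
from collections import Counter

def assign_order(tasks, rng):
    """Return a shuffled copy of the task list for one candidate."""
    order = list(tasks)
    rng.shuffle(order)
    return order

def dropoff_tables(records):
    """Tally where candidates quit.

    `records` is an iterable of (order, completed) pairs, where `order` is
    the task sequence that candidate saw and `completed` is how many tasks
    they finished. Candidates who finished everything are not counted.
    """
    by_position = Counter()  # 0-based slot of the task they quit on
    by_task = Counter()      # name of the task they quit on
    for order, completed in records:
        if completed < len(order):
            by_position[completed] += 1
            by_task[order[completed]] += 1
    return by_position, by_task

if __name__ == "__main__":
    rng = random.Random(42)
    print(assign_order(["fizzbuzz", "sql", "design"], rng))

    # Hand-written example records, not real data.
    records = [
        (["a", "b", "c"], 1),  # quit on "b", second slot
        (["b", "a", "c"], 1),  # quit on "a", second slot
        (["c", "b", "a"], 3),  # finished everything
        (["a", "c", "b"], 0),  # quit on "a", first slot
    ]
    by_position, by_task = dropoff_tables(records)
    print(dict(by_position), dict(by_task))
```

If the drop-offs cluster by position regardless of which task sits there, that points to time or effort fatigue; if they follow one task around as it moves, that task is the filter.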