when you train a computational model on half of your dataset and test it against the other half (which the author claims is Not Sufficiently Scientific), how is that qualitatively different from training it on all your data, and then "testing it on Nature" by gathering some more and checking your results? i am reminded of searle's chinese room argument.
> this is frustrating to scientists who think the resources spent on learning facts about models might be better spent on learning facts about Nature - the more radical likely being willing to trade all the modelling (and modelers?) for even a single properly verified fact.
there is no getting rid of models. neither science nor any other human activity i am aware of is capable of verifying facts about the physical world.
>when you train a computational model on half of your dataset and test it against the other half (which the author claims is Not Sufficiently Scientific), how is that qualitatively different from training it on all your data, and then "testing it on Nature"
it would be essentially the same thing if you took half the data, made a model, tested it on the other half, got good results, and stopped.
but that's not going to happen. your first model will stink. you'll refine it and try again, and you'll keep doing that. unfortunately, this means your model ends up being dependent on all the data.
the easiest way to prevent this is to build a model and then test it on real-world data gathered after the model is built. if it doesn't work, tweak the model, then gather new data and test again.
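here's a minimal sketch of that leak (hypothetical toy data, scikit-learn assumed; none of this is from the original comment). the labels are pure noise, so no honest test should beat 50%, yet picking the tweak that scores best on the held-out half still yields an optimistic number:

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    # pure-noise labels: no real model should beat 50% accuracy
    X = np.random.rand(1000, 20)
    y = np.random.randint(0, 2, 1000)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5)

    best_score = 0.0
    for C in [0.01, 0.1, 1.0, 10.0, 100.0]:    # "refine it and try again"
        model = LogisticRegression(C=C).fit(X_tr, y_tr)
        score = model.score(X_te, y_te)        # peeking at the held-out half
        best_score = max(best_score, score)
    print(best_score)  # biased upward: the tweak was chosen using the test half

the moment the tweak was chosen using the held-out half, that half quietly became training data. only data gathered after the model is frozen gives an honest test.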
> your first model will stink. you'll refine it and try again, and you'll keep doing that. unfortunately, this means your model ends up being dependent on all the data.
this is exactly what the scientific community as a whole is always doing.
A learning algorithm creates a specialist filter, usually in classification problems, that can discriminate between the various classes.
Just because you didn't manually tune each and every weight in the classifier doesn't mean its behaviour is not solidly governed by scientific principles.
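As a minimal sketch of that point (toy data, numpy only; everything here is illustrative rather than from the discussion above): not one weight below is set by hand, yet every update follows a single explicit rule, gradient descent on the log-loss.

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 2))
    y = (X[:, 0] + X[:, 1] > 0).astype(float)  # two linearly separable classes

    w, b, lr = np.zeros(2), 0.0, 0.1
    for _ in range(500):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted probability of class 1
        w -= lr * (X.T @ (p - y)) / len(y)      # gradient of the log-loss w.r.t. w
        b -= lr * np.mean(p - y)                # gradient of the log-loss w.r.t. b
    # The learned w discriminates the two classes; its behaviour is fully
    # determined by the data, the loss, and the update rule, not by hand-tuning.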
Science and the application of the scientific method are what got us there in the first place, and have given us the keys to understanding these systems. Genetic Algorithms are another area where you could easily be tempted to think that you are not doing science; the same thing happens there.
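The same holds for a toy Genetic Algorithm (hypothetical fitness function, Python standard library only; purely a sketch, not anyone's actual method): the "evolved" answer emerges from three explicit, analysable operators.

    import random

    def fitness(bits):                # toy objective: maximise the count of 1s
        return sum(bits)

    pop = [[random.randint(0, 1) for _ in range(20)] for _ in range(30)]
    for _ in range(100):
        pop.sort(key=fitness, reverse=True)
        parents = pop[:10]                        # selection: keep the fittest
        children = []
        for _ in range(20):
            a, b = random.sample(parents, 2)
            cut = random.randrange(1, 20)
            child = a[:cut] + b[cut:]             # crossover: splice two parents
            if random.random() < 0.05:
                i = random.randrange(20)
                child[i] = 1 - child[i]           # mutation: flip one bit
            children.append(child)
        pop = parents + children

    best = max(pop, key=fitness)
    print(fitness(best))                          # approaches 20 as it converges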
Science is the key to understanding. The scientific method is the ultimate arbiter between what we know to be true and what we cannot prove to be false, fantasy, or an outright lie.
I feel that you think these computational models somehow supplant the scientific method, but you are forgetting one crucial bit here: those models themselves will need to be understood, and only the scientific method will allow you to do so.
> How is that qualitatively different from training it on all your data
It's different for the simple reason that to really verify a hypothesis, you don't just test it on the other half of the same data. That's just step 1.
What you need to do is work out the further implications of your hypothesis, and test those. When doing so, one of four things will happen:
1) You remain an astronaut. In other words, your work is so far out that none of its implications falsify or corroborate any existing theories, and all require their own experiments. This can be good (i.e. ground-breaking work), but for your hypothesis to become accepted as anything other than an interesting diversion, you will need to wait for the rest of science to catch up, or to keep building out implications until you can connect what you're doing to existing theories.
2) The implications of your hypothesis directly corroborate an existing theory. This is useful, but a bit boring. It means you've basically discovered yet another implication of an existing theory. Experiment some more, and then publish.
3) The implications of your hypothesis contradict existing hypotheses or theories. This is exciting! This is where the real science lies. Now you get to design a series of experiments to figure out who is right, who is wrong, and why. You debug your hypothesis and/or the existing science 'til it works, possibly overturning established wisdom along the way.
4) A combination of (2) & (3) occur. Woah! This is really exciting. You've discovered that the implications of one existing theory falsify those of another existing theory, uncovering a fundamental flaw in our understanding so far. Well done! You've got a lifetime of debugging ahead of you, but your contribution is likely to be truly important.
> there is no getting rid of models.
True; and models are very useful. But there's a big difference between models used as an explanatory / predictive tool (e.g. Newtonian physics), and models used as a substitute for Nature when doing experiments (e.g. what's done in Computational Neuroscience, my field). The latter attempts to duck the issue of doing real experiments that are complicated / expensive by running the experiment on a computer.
Now, this is certainly useful - but to be science, it must be accompanied by a solemn understanding that "What you learn here don't mean jack." It's just a model, and probably a bad one. God doesn't use 64-bit floating-point numbers. The only possible use is to help you design a better experiment to try in the real world. But if you never get around to doing that, then you've wasted everybody's time, because you got stuck on step 0 (Hypothesize).
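As a concrete (and purely illustrative, not from the original comment) example of the floating-point point: step the same chaotic recurrence in float32 and float64 and the two trajectories agree for a few dozen iterations, then diverge completely. The precision of the simulation, not Nature, decides the outcome.

    import numpy as np

    # logistic map x <- r * x * (1 - x), chaotic at r = 3.9
    x32 = np.float32(0.4)
    x64 = np.float64(0.4)
    for step in range(60):
        x32 = np.float32(3.9) * x32 * (np.float32(1.0) - x32)
        x64 = np.float64(3.9) * x64 * (np.float64(1.0) - x64)
        if step % 10 == 9:
            print(step + 1, float(x32), float(x64))
    # The two precisions track each other early on, then produce unrelated
    # values; neither trajectory is what Nature would have done.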