
> How is that qualitatively different from training it on all your data

It's different for the simple reason that to really verify a hypothesis, you don't just test it on the other half of the same data. That's just step 1.
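
As a minimal sketch of what "step 1" looks like in practice (assuming a scikit-learn-style workflow; the toy data, model, and variable names are placeholders of my own, not from the parent):

    # Step 1: hold-out validation -- fit on one half, test on the other.
    import numpy as np
    from sklearn.model_selection import train_test_split
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 5))                                # toy features
    y = (X[:, 0] + 0.1 * rng.normal(size=200) > 0).astype(int)   # toy labels

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.5, random_state=0)
    model = LogisticRegression().fit(X_train, y_train)
    print("held-out accuracy:", model.score(X_test, y_test))

    # This only checks the hypothesis against more of the *same* data.
    # Testing its further implications means collecting new data from a
    # different regime and checking that the predictions hold there too.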

What you need to do is work out the further implications of your hypothesis, and test those. When doing so, one of four things will happen:

1) You remain an astronaut. In other words, your work is so far out that none of its implications falsify or corroborate any existing theories, and all require their own experiments. This can be good (i.e. ground-breaking work), but for your hypothesis to become accepted as anything other than an interesting diversion, you will need to wait for the rest of science to catch up, or to keep building out implications until you can connect what you're doing to existing theories.

2) The implications of your hypothesis directly corroborate an existing theory. This is useful, but a bit boring. It means you've basically discovered yet another implication of an existing theory. Experiment some more, and then publish.

3) The implications of your hypothesis contradict existing hypotheses or theories. This is exciting! This is where the real science lies. Now you get to design a series of experiments to figure out who is right, who is wrong, and why. You debug your hypothesis and/or the existing science 'til it works, possibly overturning established wisdom along the way.

4) A combination of (2) and (3) occurs. Woah! This is really exciting. You've discovered that the implications of one existing theory falsify those of another, uncovering a fundamental flaw in our understanding so far. Well done! You've got a lifetime of debugging ahead of you, but your contribution is likely to be truly important.

> there is no getting rid of models.

True; and models are very useful. But there's a big difference between models used as explanatory / predictive tools (e.g. Newtonian physics) and models used as a substitute for Nature when doing experiments (e.g. what's done in Computational Neuroscience, my field). The latter tries to sidestep real experiments that are complicated or expensive by running the experiment on a computer instead.

Now, this is certainly useful - but to be science, it must be accompanied by a solemn understanding that "What you learn here don't mean jack." It's just a model, and probably a bad one. God doesn't use 64-bit floating-point numbers. The only possible use is to help you design a better experiment to try in the real world. But if you never get to doing that, then you've wasted everybody's time, because you got stuck on step 0 (Hypothesize).
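
To make the floating-point point concrete, here is a toy sketch (the logistic map is my own choice of illustration, not from the comment above): the same chaotic system, iterated in 32-bit versus 64-bit precision, gives completely different trajectories after a few dozen steps, so the simulation reflects the arithmetic as much as the "physics":

    import numpy as np

    def logistic_map(x0, steps, dtype):
        # Iterate x -> r * x * (1 - x) in the chaotic regime (r = 3.9),
        # carrying every intermediate value at the given precision.
        x = dtype(x0)
        r = dtype(3.9)
        for _ in range(steps):
            x = r * x * (dtype(1.0) - x)
        return x

    # Tiny rounding differences are amplified exponentially, so the two
    # precisions disagree completely well before 60 steps.
    print(logistic_map(0.2, 60, np.float32))
    print(logistic_map(0.2, 60, np.float64))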



