How is this different from any other academic research? What he is asking about is neither openness nor reproducibility (which are, indeed, very important). He is asking that researchers produce code that he can put into production. Not only do researchers have negative incentives to do so (for one, providing such code will surely invite a stream of all kinds of support requests); it would actually work against the reproducibility objective.
The purpose of the code written is usually very simple: to produce the results of the paper, not to provide a tool other people can use out of the box. Even when such a tool is nominally provided (for example, when a statistics paper is accompanied by an R package), there are good reasons to be very careful with it: for example, the paper may include assumptions about the valid range of inputs, and using the package without actually reading the paper first would lead to absurd results -- which is something that has happened. The way to use academic research results is to (1) read and understand the paper, (2) reproduce the code -- ideally, from scratch, so that his results are (hopefully) unaffected by authors' bugs, (3) verify on a test problem, and (4) apply to his data. Using an out-of-the-box routine skips steps 1-3, which are the whole point of reproducibility.
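A minimal sketch of what steps (2)-(4) look like in practice, in Python, with a made-up estimator (a trimmed mean) and a placeholder data file standing in for whatever the actual paper describes:

    # Sketch of steps (2)-(4): reimplement the paper's method from scratch,
    # check it on a test problem with a known answer, then apply it to your
    # own data. The "paper's method" here is a stand-in; the real method,
    # tolerances, and test problem come from the paper itself.
    import numpy as np

    def paper_method(x, trim=0.1):
        """From-scratch reimplementation of the (hypothetical) published estimator."""
        x = np.sort(np.asarray(x, dtype=float))
        k = int(trim * len(x))              # number of points dropped at each tail
        return x[k:len(x) - k].mean()

    # (3) Verify on a test problem where the right answer is known.
    rng = np.random.default_rng(0)
    test = rng.normal(loc=5.0, scale=1.0, size=10_000)
    assert abs(paper_method(test) - 5.0) < 0.05, "reimplementation fails sanity check"

    # (4) Only then apply it to your own data ("my_measurements.csv" is hypothetical).
    my_data = np.loadtxt("my_measurements.csv", delimiter=",")
    print("estimate on my data:", paper_method(my_data))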
> He is asking that researchers produce code that he can put into production.
That's not how I read his comments at all.
What he seems to be asking for is the ability to take the code you used to produce the pretty graphs and tables in your paper and re-run it, maybe tweak it himself and use it on a slightly different dataset. He wants to be able to see that your results extend to more than just the toy synthetic dataset you made up, and also be able to verify that some bug in your verification code didn't make the results seem more successful than they really are. Finally, he wants to be able to compare apples-to-apples by knowing all the details of your procedure that you didn't bother putting into the paper.
> What he seems to be asking for is the ability to take the code you used to produce the pretty graphs and tables in your paper and re-run it,
You're assuming such code exists. If the graphs were produced by hand (e.g., typing commands into MATLAB to create the plot and then saving the figure), then there is no code to hand off. Now the code request has risen to "redo all that work".
And, as an academic myself, I think we should force people to save and publish their MATLAB scripts.
That is not too much to ask, but the academic system is full of perverse incentives. Doing good, robust work loses to good-looking, quick-and-dirty work all the time.
We need funding bodies to require publication of all these details, and we need to structurally combat publish or perish. Hire people based on their three best results, not on h-factor or other statistical measures.
Some (many?) use MATLAB interactively to produce figures and then save them by hand. Often this involves a mixture of typing commands and clicking on GUI elements like "add colorbar". So an m-file that produces the figure doesn't exist; at best there would be fragments of code buried in MATLAB's command history.
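For contrast, here's a rough sketch of what a checked-in figure script can look like -- in Python/matplotlib rather than MATLAB, with placeholder data, figure number, and file names -- where the whole figure, colorbar included, comes out of one re-runnable file:

    # Sketch of a scripted, re-runnable figure: everything that would otherwise
    # be done by clicking GUI buttons ("add colorbar", "save as...") is code.
    # The data and output names are placeholders, not from any real paper.
    import numpy as np
    import matplotlib.pyplot as plt

    rng = np.random.default_rng(42)
    field = rng.random((50, 50))             # stand-in for the paper's data

    fig, ax = plt.subplots(figsize=(4, 3))
    im = ax.imshow(field, cmap="viridis")
    fig.colorbar(im, ax=ax, label="value")   # the 'add colorbar' click, as code
    ax.set_title("Figure 3 (regenerated from script)")
    fig.savefig("figure3.png", dpi=300)      # hand-saving replaced by savefig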
> reproduce the code -- ideally, from scratch, so that his results are (hopefully) unaffected by authors' bugs
This rests on a common false assumption that programmers make: they think it's easier to write bug-free code when starting from scratch. The reality is that it's almost always easier to start with something that's nearly working and find and fix the bugs.
What really happens when you do a clean room reproduction is that you end up with two buggy programs that have non-overlapping sets of bugs, and you spend most of the effort trying to figure out why they don't match up. It's a dumb way to write software when it can be otherwise avoided.
I wonder, though: maybe non-overlapping sets of bugs are actually better for science? That is, they could help avoid systematic errors. Of course, one bug-free implementation is clearly better!
True, but this is research, not business. Getting 2 programs that don't agree, even when your post-doc has cleaned up all the bugs, is the point of reproducing the research. Ideally, you want to know if you both ran it the same way so you know it's 'true'.
True, but how would one discover those bugs in the first place? In commercial software a client might run into them and report them. But for science papers? Algorithms that go into production might work the same way, but analysis papers like the ones the OP is talking about don't.
Worse, with the attitude that the OP has, do you really think they will take extra time to verify the entire code base or look for bugs?
Often the best way to find errors in these sorts of analyses is to do a clean room implementation then figure out why the two programs don't produce the same results.
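As a rough sketch of that comparison step (Python, with two toy stand-ins for the authors' routine and the clean-room rewrite -- here they differ only in an undocumented normalization choice):

    # Differential testing: run both implementations over the same inputs and
    # flag where they disagree beyond tolerance. 'theirs' and 'mine' are
    # placeholders, not real published code.
    import numpy as np

    def theirs(x):                            # stand-in for the published code
        return np.mean(x) / np.std(x)

    def mine(x):                              # stand-in for the clean-room rewrite
        return np.mean(x) / np.std(x, ddof=1) # subtle, undocumented difference: ddof

    rng = np.random.default_rng(1)
    for trial in range(100):
        x = rng.normal(size=50)
        a, b = theirs(x), mine(x)
        if not np.isclose(a, b, rtol=1e-6):
            print(f"trial {trial}: disagreement {a:.6f} vs {b:.6f} -- investigate")
            break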
But you're not writing software for production, you're writing software for understanding/advancing research. Everything that doesn't match up is either a mistake on your end, or an undocumented (and possibly dubious) assumption on their end, and it's really valuable to find out either way.
Reimplementation matters hugely (in ML, at least). But that doesn't mean having the original implementation available isn't a huge advantage; obviously it is.
I think there are exceptions to this: most modern attempts to verify software have involved design decisions that are only practical when you're starting from scratch. Similarly, rewriting in a language with more static safety guarantees may lead to fewer bugs (or at least different, less critical ones).
It can also be used to flush out unconscious biases. I did this at work, and it helped find several issues that might have become bugs in the future.
> What he is asking about is neither openness nor reproducibility (which are, indeed, very important). He is asking that researchers produce code that he can put into production.
That is 100% not what he's asking. I don't know how you could even interpret that as what he's asking.
He wants to be able to take your research and run it over an updating dataset to verify that the conclusions of said research actually still apply to that data.
> "How is this different from any other academic research?"
Well it's not, although CS should be particularly amenable to reproducibility.
> "He is asking that researchers produce code that he can put into production"
No, he asked for code [full stop]. "Because CS researchers don't publish code or data. They publish LaTeX-templated Word docs as paywalled PDFs."
> "The way to use academic research results is to (1) read and understand the paper, (2) reproduce the code -- ideally, from scratch, so that his results are (hopefully) unaffected by authors' bugs, (3) verify on a test problem, and (4) apply to his data."
He wants to re-run the author's analysis with new data. He's not looking to recreate the research from scratch or publish a new paper. Saying that this is the only valid usage of the results is awfully short-sighted. It misses the point that the research has value beyond its use by other researchers.
Imagine if, rather than releasing open-source software, we published only the results of our new modules and told potential collaborators to build them from scratch to verify the implementation first. You'd learn a lot about building that piece of software, but you'd have missed an enormous opportunity along the way.
"He wants to re-run the authors analysis with new data. He's not looking to recreate the research from scratch or publish a new paper. Saying that this is the only valid usage of the results is awfully short sighted. It misses the point that the research has use beyond usage by other researchers."
That sort of thing has been done for years in the scientific computing world. The end result is that you are making decisions based on code that may have worked once (definition of 'academic code': it works on the three problems in the author's dissertation. There's production code, beta code, alpha code, proofs of concept, and academic code.) but that you have no reason to trust on other inputs and no reason to believe is correct.
Case in point: I had lunch a while back with someone whose job was to run and report the results of a launch vehicle simulation. Ya know, rocket science. It required at least one dedicated human since getting the parameters close to right was an exercise in intuition. Apparently someone at Georgia Tech wanted to compare results so they got a copy. The results, however, turned out to be completely different because they had inadvertently given the researchers a newer version of the code.