Speaking as both an ML researcher and an applied ML business owner (one who hired stared at one point — hi Piotr :), I respectfully disagree.

The replication crisis in "AI" may not be as bad as psychology (I wouldn't know), but it's not great. Sadly, my brain has somehow learned to equate "SOTA" with "hot-stitched crap, stay away". Too many painful lessons.

On the subject of publishing code: this is useful to the degree that it removes bad faith as a possible explanation for the lack of replicability. Otherwise it helps little in practical terms. You just get the privilege of sifting through the bugs and bad design at closer range.

"I am afraid you are right. I used to reach ~72% via the given random seed on an old version of pytorch, but now with the new version of pytorch, I wasn't able to reproduce the result. My personal opinion is that the model is neither deep or sophisticated, and usually for such kind of model, tuning hyper parameters will change the results a lot (although I don't think it's worthy to invest time tweaking an unstable model structure)."

That's a quote [1] from one of the "new SOTA" papers in NLP (WikiQA question answering), where the replication score came out at 62% instead of the claimed 72%.
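
The quoted issue hints at part of the problem: a single random seed on one specific PyTorch version is doing a lot of load-bearing work. As a minimal sketch (my illustration, not the paper's code) of what pinning the obvious sources of randomness looks like in PyTorch:

    import random

    import numpy as np
    import torch

    SEED = 42  # arbitrary, but must be fixed and reported

    random.seed(SEED)
    np.random.seed(SEED)
    torch.manual_seed(SEED)
    torch.cuda.manual_seed_all(SEED)

    # trade GPU speed for deterministic cuDNN kernels
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False

Even with all of that, numbers can still drift across PyTorch releases, because kernels and default initialisations change underneath you — which is exactly what the issue describes.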

I generally call this the "AI Mummy Effect" — looks great but crumbles to dust on touch.

[1] https://github.com/pcgreat/SeqMatchSeq/issues/1




Hi Radim!

To make it clear, I am not happy with the current state of reproducibility in AI. Yet it is still better than in all the disciplines I have interacted with (quantum physics, mathematical psychology). There the standard practice was to not include any code, even when the paper was based on it.

See my answer to "Why are papers without code but with results accepted?" (https://academia.stackexchange.com/questions/23237/why-are-p...).

So I was happy to see that in Deep Learning a lot of code appears on GitHub (I am happiest when it appears in different frameworks, implemented by different people).

Dirty code provides limited value. It's hard to learn from, hard to re-use, and its performance may depend on the phase of the Moon (and system settings, software versions, etc.). Yet, IMHO, it is much better than no code. It is not only about good faith, but about including all the details. Some of them may seem unimportant (even to the author), yet prove crucial for the results.

The next level is reasonably well written code, with a clear environment specification (e.g. Dockerfile/requirements.txt) and the dataset (see the sketch below). Otherwise it is hard to guard it against "on my environment it works":

> where the replication scores came out 62% instead of claimed 72%


The cited example was a replication attempt that found the result wanting. A plausible reason for the discrepancy was identified, and it is accessible online. That is much better than no replication being performed at all. But yes, we need to continuously up our game. A good next target would be adopting a standard that ensures model stability, and not accepting papers that don't uphold it.
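
To illustrate the environment point above, here is a minimal sketch (the expected versions are hypothetical, not tied to any particular paper) of a training script failing fast when it is run outside the environment the reported numbers came from:

    import sys

    import torch

    # hypothetical pins: whichever versions produced the reported numbers
    EXPECTED_PYTHON = (3, 6)
    EXPECTED_TORCH = "0.3.1"

    if sys.version_info[:2] != EXPECTED_PYTHON:
        raise RuntimeError("expected Python %s, running %s"
                           % (EXPECTED_PYTHON, sys.version_info[:2]))
    if not torch.__version__.startswith(EXPECTED_TORCH):
        raise RuntimeError("expected torch %s.*, running %s"
                           % (EXPECTED_TORCH, torch.__version__))

Pinning the same versions in requirements.txt (or baking them into a Dockerfile) then turns "on my environment it works" into something a reviewer can actually recreate.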

I'd also like to see open-source, runnable code become the default for paper acceptance, with some sort of 'punishment' for not having it. Maybe even make it a prerequisite for empirical papers.



