Science can progress even without software - the only difference software makes is the rate of progress and its cost.
> But the culture and institutions of science have not yet adjusted to this reality.
Did you ask yourself why? Maybe because what exists now just works for them? I didn't see any description on this page of the problems this manifesto wants to solve.
> The code is the only definitive expression of the data-processing methods used: without the code, readers cannot fully consider, criticize, or improve upon the methods.
The code is not as important as descriptions of algorithms and the ideas behind the code. Readers should concentrate not on the code but on everything behind it, and if they want to verify results, they should write the code themselves, in order to increase the plausibility of the results through independent verification - including independence of the code.
> Science can progress even without software - the only difference software makes is the rate of progress and its cost.
Depends on the science. Facial recognition software? That requires software to make progress.
> Did you ask yourself why? Maybe because what exists now just works for them? I didn't see any description on this page of the problems this manifesto wants to solve.
It is usually to the benefit of the individual researcher to keep the code under wraps, but not to the benefit of the field of research. A researcher benefits from generating results. They do not benefit from other people being able to verify their results. In fact, they may even benefit from raising the barrier to entry, as it makes their own results more important in the field.
> The code is not as important as descriptions of algorithms and the ideas behind the code
In theory, this should be 100% true. In practice, results are determined as much by parameters, implementation choices, and flat-out bugs as they are by the algorithms and the big ideas. I absolutely agree that people should re-implement the algorithms rather than share the bugs (Dijkstra wrote against this). However, what happens when you find a discrepancy between your results and what a paper said? Without having access to the original source code, you can't figure out what caused the difference. Anecdotally, my girlfriend has been in this situation before--she couldn't replicate a (computer vision) result from someone else's paper. The original author was reluctant to share their own code, so she didn't know whether the difference was a bug on her end, a bug on their end, or a platform difference. Open source code may have helped answer that question.
> Depends on the science. Facial recognition software? That requires software to make progress.
Like every computation - you can simulate it on paper :)
> It is usually to the benefit of the individual researcher to keep the code under wraps, but not to the benefit of the field of research. A researcher benefits from generating results. They do not benefit from other people being able to verify their results. In fact, they may even benefit from raising the barrier to entry, as it makes their own results more important in the field.
It's to the benefit not only of the individual researcher but also of the institutions - because they don't need to maintain code repositories, publishing is not as costly as it would be if releasing source code were mandatory.
From the point of view of the field of research - as I wrote earlier - source code is not important. It may be useful in some situations, but only for the individual researcher (like your girlfriend).
> Without having access to the original source code, you can't figure out what caused the difference.
I think that's a good reason to share the code, or for discussion between researchers - but I don't think it's enough to make sharing code mandatory, because your girlfriend could, for example, write an article pointing out the differences between her result and the previous result(s) and be done (assuming she was certain about her result), without ever looking at anyone else's source code.
Okay well, I don't know what field of science you work in, but where I am sitting, we can't make progress without Fourier-transforming millions of samples for tens of thousands of detectors three times an hour. Try doing that without software.
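To put rough numbers on that (purely illustrative - the sample counts, detector counts, and cadence below are assumptions for the sake of the arithmetic, not my actual pipeline):

    # Back-of-envelope sketch of the data volume; all numbers are assumed, not real.
    import numpy as np

    n_detectors   = 10_000       # "tens of thousands of detectors" (assumed)
    n_samples     = 1_000_000    # "millions of samples" per detector (assumed)
    runs_per_hour = 3

    print(f"{n_detectors * n_samples * runs_per_hour:.2e} samples transformed per hour")  # ~3e+10

    # One slice of the actual work: FFT a handful of detector time streams at once.
    batch   = np.random.default_rng(0).standard_normal((10, n_samples))
    spectra = np.fft.rfft(batch, axis=1)
    print(spectra.shape)         # (10, 500001)

None of that is conceptually deep, but the scale alone rules out doing it by hand.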
Science rests on the ability of others to reproduce and finesse your work. Releasing your code makes this much easier, partly because a lot of code does "boring" stuff like deal with data formats and basic calibrations before it gets to the "interesting" stuff. So giving somebody the "boring" part helps them enormously.
The reason people don't do it: they don't get personal benefit from it, and the journals and funding agencies don't make them. The latter CAN change.
[Edit - also: the way scientific computing is funded is somewhat broken, but that's another story].
> Okay well, I don't know what field of science you work in, but where I am sitting, we can't make progress without Fourier-transforming millions of samples for tens of thousands of detectors three times an hour. Try doing that without software.
It's possible - you can make a hardware circuit that does exactly this :)
> Science rests on the ability of others to reproduce and finesse your work. Releasing your code makes this much easier, partly because a lot of code does "boring" stuff like deal with data formats and basic calibrations before it gets to the "interesting" stuff. So giving somebody the "boring" part helps them enormously.
What if this "boring" part in others code contains bug ? There's less chance to reproduce it, if you had to write it yourself :)
In one way, I really agree with this initiative.
I'd like to raise a counterargument about replication, though.
In a nutshell, if someone has to reimplement the code from the details in the paper they've read, it's a great check that the original author isn't just reporting the results of subtle bugs, or particularities, in their software.
It's true that opening the source allows other researchers to look for bugs in the code, and that's good. But such checks are inherently less thorough than instead having another group replicate the results, in a separate environment, just working from the published paper details.
One objection to this 'clean room' replication is that perhaps it's not practical to re-engineer the code that was written for a paper - that's just too much work. But that's basically saying "well, we can't replicate the results of this paper, it's too much work" - generally, that is not the way you want to go about doing science. A cornerstone of science is that we are willing to accept slower short-term progress in return for more certainty that what we are doing is correct (and hence, hopefully, faster long-term progress); painstaking replication is a key part of this philosophy, I think everyone agrees.
Consider the neutrinos: they did an experiment; others are trying to suggest reasons for the surprising results. In the unlikely event the result lasts, other scientists are going to want to redo the experiment. Ultimately, it's not enough to analyze the data or the experimental setup; you want to replicate, ideally from the ground up - in a different lab, with different equipment - and with different code to process the results.
Now, maybe the manifesto is aimed at a world where it's not possible to replicate based solely on what's in the paper - perhaps there's just too much detail that can't be included, and hence a lot of the 'science' is inherently tied up in the detail of the code. I'm sceptical about whether that's a good way to do our science - but if we decide to go down that route, then the 'paper' as publication, the de facto unit of scientific output, is something we will also have to really rethink; and reviewers are going to have to be responsible for signing off on the code, which they don't generally currently do.
I think there's an argument, currently, for telling people "I could give you the code - but it'll only take you a couple of days to write your own code to replicate the results, and it'd be much better if you could do that" - but I'm sure this varies drastically across domains.
I'm not sure where I stand overall - but I don't think it's black and white.
> I think there's an argument, currently, for telling people "I could give you the code - but it'll only take you a couple of days to write your own code to replicate the results, and it'd be much better if you could do that" - but I'm sure this varies drastically across domains.
While I can definitely see a PI setting this kind of task to a student, I don't think that by releasing the code the task suddenly becomes impossible. If (and when) the two independently written programs disagree, it will then be possible to immediately step through each and figure out where the point of divergence happens, and which (if either) implementation is more likely correct, rather than starting an email chain with the original authors saying, "we get different results, but we don't know why".
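Concretely, the kind of stage-by-stage comparison I mean looks something like this - a toy sketch with two made-up pipelines, not anyone's real code; the point is that you can diff intermediate results rather than only the final numbers:

    # Toy example: find the first stage at which two independent implementations
    # diverge. Both pipelines are invented for illustration; pipeline_b deliberately
    # uses a different window function to stand in for a "bug".
    import numpy as np

    def pipeline_a(x):
        yield "detrend",  x - x.mean()
        yield "window",   (x - x.mean()) * np.hanning(x.size)
        yield "spectrum", np.abs(np.fft.rfft((x - x.mean()) * np.hanning(x.size)))

    def pipeline_b(x):
        yield "detrend",  x - x.mean()
        yield "window",   (x - x.mean()) * np.hamming(x.size)   # differs from pipeline_a here
        yield "spectrum", np.abs(np.fft.rfft((x - x.mean()) * np.hamming(x.size)))

    x = np.random.default_rng(0).standard_normal(1024)
    for (stage, a), (_, b) in zip(pipeline_a(x), pipeline_b(x)):
        if not np.allclose(a, b):
            print(f"first divergence at stage: {stage}")   # -> window
            break
    else:
        print("implementations agree to within tolerance")

With both sources in hand this is an afternoon's work; without them, you're guessing at which stage the other group's numbers parted ways with yours.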
If people continued to write independent programs, purely to replicate and verify claimed results, then certainly it'd be beneficial for them to have the source, in order to track down where the discrepancy arises. This is often how it works currently, in practice, in my limited experience: you mail the authors and ask them to help you track down the problem, and get either code or support; but I acknowledge this won't always work.
My concerns are that:
1) I don't trust people to do the hard thing and re-implement to replicate, rather than take the easier way and just use the code that's provided. There's very little credit currently given for replicating existing (even recent) results.
2) More importantly, when replicating a paper that's a little vague, it'll be more and more tempting to just peek at the source; and suddenly the paper isn't the document of record anymore.
I feel that if we go down the route where the code becomes the detailed documentation of the scientific process, that's a very fundamental shift from the current model, where the paper is supposed to be repeatable, in and of itself.
If we go down that road, we probably need a whole different review infrastructure; are reviewers really going to spend the time to review large and hastily written scientific codebases?
I doubt it; so how does review work when "The code is the only definitive expression of the data-processing methods used: without the code, readers cannot fully consider, criticize, or improve upon the methods"?
Will it no longer be possible to criticise a paper for lacking sufficient detail to reproduce the results? Will the reply be 'read the source'?
Maybe that's just the way things are going to go. There's a lot to like in that manifesto. But there are going to be positives and negatives to letting the source become the documentation. The discussion around the manifesto on their website does not acknowledge such tradeoffs; it's taking a pretty one-sided view.
Maybe that's just how you are supposed to write manifestos :-) But I'd like to see some discussion of these tradeoffs.
> But such checks are inherently less thorough than instead having another group replicate the results, in a separate environment, just working from the published paper details.
Not so. Research into "N-version programming" shows that independent implementations tend to have bugs clustered in the same subsystems.
Besides, if scientists can invent a way to reliably communicate requirements in a 12-page paper in Nature then I have a requirements document in Brooklyn I'd like to sell them.
Most academic code is poor quality and not easily extensible/wrappable. I suspect many people would still develop their own code.
People might be lazier in some situations, I agree, but at the same time the number of eyes per line of code will increase, which may be a net gain.
Great thought behind this manifesto. I wish more authors would attach their source code to their manuscripts. Even when the code is not clean (academic code usually isn't written with extensibility in mind), it would be very useful to be able to verify precisely what the code is computing.
Many times I've had occasions where the paper doesn't clearly explain parameter values, order of updates in the simulation, etc.
A tip I've discovered: you may have a chance to get access to the source code if you email the corresponding author (if they're still alive!).
"Toward these ends, I've drafted the CRAPL--the Community Research and Academic Programming License. The CRAPL is an open source "license" for academics that encourages code-sharing, regardless of how much how much Red Bull and coffee went into its production. (The text of the CRAPL is in the article body.)"
I wasted almost 6 months last year attempting to replicate some results from a paper I was reading. It turned out the authors hadn't included in the paper a key addition to their equations, which they were nonetheless using in their code. ANGER.
I wonder if there are some unintended intricacies involving licenses that need to be addressed.
For example, the GPL lets you do lots of things privately but restricts your ability to distribute. And there is a good deal of useful GPL glue out there (Octave, R, etc.). Sometimes it might be expedient to just use FastICA rather than limit yourself to what's available and what the GPL might allow you to distribute. This gets really murky because the FSF is not really clear at all about what constitutes a derived work in scripting languages (which are distinct because source and executable are indistinguishable--the FSF (i.e. the advice you get from licensing@) plays pretty loose with this, choosing to draw the line in a somewhat arbitrary, case-by-case manner).
Or maybe academic publication automatically qualifies as a fair use exception?
One thing that is missing is whether the quality of the code should be used in peer review. If you are going to make software essentially part of the publication record, then is it valid to reject a paper because its code is spaghetti and not understandable? A paper whose derivation the reviewer cannot follow will get rejected; should a similar standard be applied to the software?
Quite true, and kudos to them for trying. Just being realistic about the chances for results in the next decade or two.
Lobbying sponsors of research would be another pathway to adoption. It would raise a lot of questions about proper allocation of resources between consolidation and exploration.
I'm not buying this.