Unfortunately, Jacques needs to answer a pretty basic question in order to get his wish: why? Why should CS paper writers show Jacques (or anyone else) their code?
We have discussed this topic on HN a number of times, for example:
Many of the comments in those threads do a better job summing up than I ever could. However, briefly, literally all of the incentives are aligned against publishing code and data.
If a writer's code is wrong, they are embarrassed (and there is no culture of being embarrassed by not publishing code).
If a writer publishes their code and it is actually good, someone else can scoop their follow-on results.
If a writer does not publish their code, and it is actually any good, they can potentially commercialize it thanks to the Bayh-Dole Act.
If a writer publishes their code and people intend to use it, the writer needs to clean it up, check it for correctness, and handle support requests. These activities are probably more time consuming than writing the code in the first place.
If the writer publishes their code, and other people in the writer's field do not, the writer is usually at a disadvantage. Others will appear to have more publications, the basic currency of academia. (Many people have great reasons for not publishing their code or data, especially researchers embedded at large companies making changes to large proprietary systems.)
So overall, yes, it would be great if CS paper writers gave out their code. What they are doing is not reproducible science in the philosophy of science sense.
But what is Jacques (or anyone else) doing to fix this system of incentives, and what could anyone do?
Thanks, socratic, for summarizing the previous discussions so well! I was quite vocal against the obligation to release code if there was no incentive, even though I released the code of all the papers I published so far [1].
But to my surprise, the software engineering research community decided to try something new this year in one of their top conferences, FSE [2]. All authors of accepted papers were invited to submit their artifacts (data, code, video showing how the code was used, etc.) to the artifact evaluation committee. Two members of the committee reviewed each artifact and compared it with the paper. Papers that received a grade equal to or above "meet expectations" received a special mention in the program [3].
Although the artifacts are not released to the public, this is a step in the right direction. If you are motivated to package your code for artifact evaluation and you get some recognition out of it, then the next step of releasing it is a lot easier. As a committee (I was part of it), we were afraid it would take hours and hours to just try to run the code that was submitted, but the authors really made the effort to package their code well.
I think what you are describing is pretty much the only promising direction for solving the problem. However, I have significant doubts about the approach. Perhaps you can address them?
The strategy as I understand it is:
1. Convince a high profile conference in ${FIELD} that reproducibility is important.
2. Create a special group within that community to test submitted code to see if it matches the results presented in papers.
3. Give a special carrot to authors (a special mention in the program, a piece of text in their paper) who meet the expectations of this group.
4. Hopefully, eventually readers come to see papers with the markers indicating reproducibility as the only legitimate ones, and writers are then required to make the significant time commitment (and take the significant risks) of releasing their code.
As it happens, (1), (2), and (3) have happened in a few systems communities. For example, SIGMOD has (more or less) the same setup as you describe.
However, I have deep doubts about whether (4) will ever happen. The three issues are:
1. The group doing the evaluation of the code for the conference has a boring, unappreciated job. They are also reading terrible, likely buggy code. A natural outcome is that the evaluation group will make bold claims about how all of the code they evaluated had significant issues potentially impacting research results, making everyone who submitted look bad, and leading to disincentives for future submitters. In fact, the evaluation group may even write papers about how bad specific code they reviewed was. I believe this has happened in other communities.
2. I briefly alluded to this in my original post, but many actors have extremely good reasons (at least on their face) for not releasing their code and/or data. This is why I mentioned how researchers embedded at companies modifying large proprietary code bases are extremely unlikely to ever be part of this evaluation regime. (And, no one wants to kick such researchers out of the academic community.)
3. In order for a stigma to be attached to non-reproducibility according to the conference, there has to be a strong correlation between the highest quality work and reproducibility. However, it is likely that much of the highest quality work will not be reproducible, either because it comes out of (or in conjunction with) corporate research labs, or because it uses some very difficult to get proprietary data. Likewise, the most easily reproducible results may be the least significant.
Do you think that these issues are solvable in the long term?
About Problem 1, I must stay vague, but the code I read was understandable. We tracked the time it took us to review the artifacts and since this is still an open problem (how to efficiently and fairly review artifacts), the process may change in the future.
To address the potential negative impact on the authors, I believe the papers that did not get a good review from the artifact evaluation committee just did not get a special mention. Since it is not possible to know who submitted an artifact and who did not (unless you got a mention), no harm is done for now.
The potentially negative impact on the authors' reputation was something that concerned me because I've been burned in the past by Ph.D. students not being able to use the tools I published and saying that my tools were buggy when they just did not know how to install Eclipse...
About problem 2, the conference organizers promised to keep the data confidential, but that might not be enough in some cases. For example, I would never show my interview transcripts to anyone, but there are some intermediate data that I could show and describe. We did not need to reproduce everything, we just wanted to see reasonable evidence that the approach described in the paper had been validated as advertised.
I'm not sure I understand the difference between problem 2 and 3. I must say that in my research area, I don't see many approaches and studies that are exclusively about proprietary data. Often, some part of the data/technique is publicly available or the approach has been tried on both proprietary data and open source data.
Overall, I think the artifact evaluation committee is a nice initiative and a step in the right direction. It needs to be carefully monitored and adapted to ensure that nobody gets burned for a bad reason though.
As far as examples for 2/3, Google is a common source. Lots of their papers report on experiments conducted with: 1) massive-scale proprietary data; and 2) using proprietary infrastructure. They do sometimes share data, but often don't, and in cases where they don't, there is not always equivalent publicly available data (especially of anywhere near the same size). And I would suspect that sharing the code is right out for a lot of cases; if they're writing a paper on improving an aspect of Google Translate (which they do fairly regularly), they aren't going to send the entire source code to GT to a committee.
Yes, and Microsoft Research also publishes papers without any numbers on the axes of their graphs (e.g., number of bugs per component in Windows Vista). There is also a debate about whether this can be seen as a significant contribution to science. Some think that it is, some don't.
In software engineering, these papers are still a minority and researchers (again, from Microsoft Research or IBM Research) often try to test their hypotheses or approaches on open data sets (e.g., the Eclipse bug repository).
Google just doesn't publish in SE venues so I never encountered this example. I'm sure they must publish more in distributed systems though. Btw, if Google is writing a paper on improving an aspect of Google Translate, I don't need their code, but I need at least the pseudo-code or the general strategy + their methodology and some intermediate data. Otherwise, it's not really a scientific contribution, just a tech report on something nice they did.
How is there not a stigma attached to non-reproducibility? If it is not reproducible, then the findings are incorrect, and an investigation should be made to determine if it is just incompetence or outright fraud. Otherwise fire up the Nobel Prize committee, for I have just discovered cold fusion and cured cancer.
In science, there are still such things as "good manners" and "common scientific practices", not all of which are directly incentivised. For example, every article contains references to other articles that helped create it. Every article credits all authors, usually in a particular order (which differs culturally per discipline). They're all fairly common sense, and there are many more. Some are listed explicitly as rules for journal or conference submissions. Some, however, are considered too obvious for that.
These are rules everyone follows. People get frowned upon when they are not followed.
Unfortunately, when these rules came to be, there wasn't that much code around. You can't print out your physics experiment setup. Additionally, every additional piece of info, be it data, code, or whatever, would need to be printed and shipped. This cost too much, which encouraged a culture in which only the most important parts, findings and a brief story of how they were obtained, were to be included in publications.
But then the internet came. The whole world adapted, but science culture didn't. The internet allows large sets of information to be shipped along with papers. This does not just hold for computer science. Entire data sets of physics experiments could be included. Survey data, code, Matlab models, and so on.
All this data is collected using taxpayers' money, for the greater good of the world. It makes perfect sense to include it all with the publication, online. In fact, it is criminally wasteful to make different teams collect the same data, or do the same work, over and over again.
I think the only change necessary to make this happen is for a few influential publications and conferences to require that all data, code, etc. be published online along with the paper. It can start in a niche subject area and expand from there.
What Jacques (or anyone else) is doing to fix this system is yelling on the internet about it, so that conference programme committees and journal reviewers may at some point be swayed to add this little rule.
A rule that makes sense, just like "your paper must contain an abstract" makes sense, just like "you must include references to all works you borrowed from or built upon" makes sense.
He's not actually saying they lack incentive so much as that there are significant disincentives. The rest of science contains an incentive, various types of professional censure, for the behavior you describe. CS does not and in fact contains significant incentives for secrecy.
When I was a grad student, I put all of my code online for my papers. I did this for two reasons: basic transparency, and the hope that it would be useful to others. I actually got lucky a few times, and people did think my code was useful. It was used in other research projects, which got me citations - which are academic incentives. I have punted on a few support requests, though.
Regarding the disincentive that publishing code means others may scoop your own work, I find that doubtful. By the time a conference paper is published, almost six months have passed since the paper submission. Most researchers are well on their way to writing another paper with that work and codebase by the time the wider community knows about it. Then there's the difficulty of making non-trivial changes to prototype-quality code implementing something new.
Indeed, that's the main component of the gold-standard results replication. For a chemistry experiment, for example, you're not supposed to replicate it by going to the original lab, using their existing apparatus, and just re-running the experiment. Instead, there's stronger confidence in the results if you replicate it using your own equipment in your own lab, reconstructing any necessary components from descriptions in the paper. That way you know that the results were actually due to what the paper claimed they were, instead of some overlooked happenstance in the original lab or apparatus.
Of course, reimplementation can be quite time consuming, which is the main problem. But then sharing code can actually decrease the likelihood of anyone ever reimplementing the algorithm again, instead just re-using the same (possibly buggy) code forever without looking at it.
OK, but there's no way subtle bugs will be found unless the code is released. If you replicate a non-trivial piece of code, both will have bugs, and both will have subtle design trade-offs (which won't be documented in the paper). Will you publish your results, despite the fact that they don't agree with the existing, accepted ones? If you do, what will it achieve?
Part of the problem is that academic progression is based so heavily on churning out papers that there's a race to the bottom for 'minimum publishable units'.
A single codebase might provide the analysis framework for tens of individual papers, and releasing early just increases the chances of getting scooped on some of them.
I think there are precedents for delayed data distribution in other fields (biology / chemistry?), so maybe a partial solution would be to embargo the code for a year or so after publication. The downside is that you can never expect to see code for the really cutting edge stuff, but perhaps that will be the price to pay?
Having a delay also might serve to blunt the "Yeah, well. Your code sucks." argument, in that it would need to really suck in a massive, conclusion changing way, in order to gather new headlines. Otherwise, it just sort of sinks away into obscurity unless you're looking directly at that result, in which case knowing about possible defects and improvements is a net benefit.
Overall, I'm offering no new benefits here, but at least a couple of possible mitigations of the downsides you mention.
I really don't think that it's easier to understand an algorithm by reading its implementation than by reading its description. It's actually the opposite -- frequently the idea behind an algorithm is really hard to decipher if you only have access to code and not words. If you don't believe me, take an algorithm you don't know, read a sample implementation (there are quite a lot on the internet these days) and try to understand how it works. I suggest trying KMP and Boyer-Moore for text matching, or Miller-Rabin and AKS for primality testing, or even RSA -- RSA is terribly simple in theory, but if you don't have a clue how it's supposed to achieve its goal, and you don't have a background in number theory, then your chances of understanding it just by reading the code are infinitesimal.
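As a rough illustration of the RSA point, here is a toy sketch in Python. The primes 61 and 53, the exponent 17, and the function names `encrypt` and `decrypt` are illustrative assumptions, not taken from any paper, and keys this small offer no security at all. The code runs to a handful of lines, yet nothing in it says why decryption undoes encryption; that explanation (Euler's theorem) has to come from the surrounding prose.

```python
# Toy RSA with tiny primes. Illustrative only: never use for real security.
p, q = 61, 53                # two small primes (assumed example values)
n = p * q                    # public modulus
phi = (p - 1) * (q - 1)      # Euler's totient of n
e = 17                       # public exponent, chosen coprime with phi
d = pow(e, -1, phi)          # private exponent: modular inverse of e (Python 3.8+)


def encrypt(m: int) -> int:
    """Encrypt an integer message m (0 <= m < n) with the public key (n, e)."""
    return pow(m, e, n)


def decrypt(c: int) -> int:
    """Decrypt a ciphertext c with the private key (n, d)."""
    return pow(c, d, n)


message = 65
ciphertext = encrypt(message)
recovered = decrypt(ciphertext)
print(message, ciphertext, recovered)
```

Reading only the code, the choice of `phi` and the modular inverse look arbitrary, which is arguably the commenter's point.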
Why can't we have all three when relevant? Description + formulas + code? I understand why some algorithms (AKS, RSA) would not make sense if you just spit out the C code raw but I think it would make the paper much easier to understand if there was some code to go along with the formulas and descriptions, especially if the description is hard to decipher like "analyze the performance of the differencing algorithm on random instances by mapping it to a nonlinear rate equation."
In most (if not all) papers I read there was already pseudocode, which is actually better than a real programming language, because it lets you express your algorithm in a concise way without all the real world overhead, like initialization, allocation, etc. If you cannot implement the algorithm yourself by reading its description and pseudocode, it means that either you don't fully understand what you just read, or your programming skills need some polishing, and you should be reading textbooks rather than research papers for a while.
Also keep in mind that papers are written for the colleagues in the field to read, and not for the laymen, so it's no surprise that authors use the terminology familiar to people in the field. It makes no sense to write research papers with laymen in mind, because 99.999% of them don't care and the remaining 0.001% will learn terminology soon enough.
Also, you really should not expect that reading research papers will be as easy as reading newspapers, don't you agree?
Oftentimes, it is easy to overlook whether that real-world overhead affects the asymptotic performance of your algorithm in either an ideal sense or a pragmatic one. If memory allocation were free or memory were infinite, true constant time rather than amortized constant time push back on dynamic arrays/vectors would be possible, etc.
Also, initialization often affects correctness, so it's much less frequently skipped even in pseudocode for an algorithm.
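To make the amortized push back remark concrete, here is a minimal sketch of a doubling dynamic array in Python. The class name `DynamicArray`, the `copies` counter, and the helper `copies_for` are hypothetical names chosen for illustration; the doubling policy is one common choice, not the only one. The counter tracks how many element copies the reallocation step causes, which is exactly the overhead that pseudocode tends to leave out.

```python
class DynamicArray:
    """Growable array that doubles its capacity when full (illustrative sketch)."""

    def __init__(self):
        self._capacity = 1
        self._size = 0
        self._slots = [None] * self._capacity
        self.copies = 0  # element copies caused by reallocation so far

    def push_back(self, value):
        # Reallocate only when the backing storage is full.
        if self._size == self._capacity:
            self._grow()
        self._slots[self._size] = value
        self._size += 1

    def _grow(self):
        # Allocate twice the space and copy every existing element across.
        new_capacity = 2 * self._capacity
        new_slots = [None] * new_capacity
        for i in range(self._size):
            new_slots[i] = self._slots[i]
            self.copies += 1
        self._slots = new_slots
        self._capacity = new_capacity

    def __len__(self):
        return self._size


def copies_for(n):
    """Return how many element copies n consecutive push_back calls trigger."""
    arr = DynamicArray()
    for i in range(n):
        arr.push_back(i)
    return arr.copies


print(copies_for(1000))
```

A single push can cost a full copy of the array, but under doubling the total number of copies over n pushes stays below 2n, so the cost per push is constant only in the amortized sense.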
There are some famous examples of tweaks that don't make the paper, but are actually critical to successful methods. Antony Jameson's aerodynamic codes are prime examples. He made good money for decades because Boeing and some other majors couldn't reproduce his code on their own, but his code gave better predictions compared to the wind tunnel, so they used it to design several generations of aircraft.
Reviewers don't get to see the code and most don't have the time or inclination to reimplement based on what is described in the paper to check whether it tells the whole story or not. Yes, this is different from a theorem which should come with a rigorous proof. This is a great talk on the subject:
"Generally, academic software is stapled together on a tight deadline; an expert user has to coerce it into running; and it's not pretty code. Academic code is about "proof of concept." These rough edges make academics reluctant to release their software. But, that doesn't mean they shouldn't. "
I agree with the writer, in many ways. But it's going to be a huge uphill battle.
I'm on the other side of this, too, as a researcher in algorithms. Earlier this year, I submitted the first paper I've ever written, in which I would consider some related code to be an essential part of my research. Now, consider what this means.
- I had to spend a lot of time getting the code into shape. Documentation, general clean-up, etc. This was time I could have spent writing other papers. Publication output is a huge determining factor in academic tenure & promotion decisions. Seriously, the way things are now, you're asking people to risk not getting tenure, in order that you might have their code.
- The idea of a published paper is a permanent record of the research. Repositories for code that are of a similar permanent nature are few & far between. Some journals are starting to allow arbitrary attachments to papers, which can then be obtained online. I submitted my paper -- along with the code -- to such a journal. But this greatly restricted the list of journals I could submit to. And it pretty much left out all of the truly prestigious ones. Again, a problem.
Not to be harsh, but the code I've seen taken from "scientific" papers was the worst code I've ever seen. One function of a couple of thousand lines in C++ (but using a mix of C and C++ functions) using variables named with 1 or 2 letters.. without comments, gasp.. I still have nightmares about that (I had to refactor that code.)
But I do agree with Jacques that I so much prefer a graph with a summary to a couple of pages of unreadable mathematical expressions. A high-level pseudocode of the general idea might also be great. Sometimes, I feel like the paper wasn't written for people to understand.. but as a way to make it seem difficult. It's not rare that I talk with an author of a scientific paper and can easily summarize an entire paragraph with a couple of simple sentences..
Of course it wasn't written for people to understand -- it was written for their colleagues in their field, and all they care about are these pages of unreadable mathematical expressions.
It's not rare that I talk with an author of a scientific paper and can easily summarize an entire paragraph with a couple of simple sentences..
And when I talk about math with my friends over lunch, much, much more information can be conveyed than by spending the same amount of time on reading papers. However, it does not prove that papers are written in an unintelligible way, only that human-to-human communication is much more effective than paper-to-human.
I would be interested to see more CS research happen in a more open source like way, like how we see projects work on github where it is easy for anyone with an interest in an area to get involved, run the code and contribute.
Obviously this wouldn't be possible in all branches of CS research and there is the danger that these kinds of projects would get stuck in local maxima without a few minds driving the research and the ability to understand it well enough to make large scale changes.
Still, a lot of good could probably come out of it: there are plenty of smart CS minds who don't have time to commit to the reading and work required to drive their own research project, but who have the expertise to contribute to small parts of other people's.
That's funny, because, you see, actual research really works like this. Instead of Github you've got journals, and seminars at universities. The smart minds contributing small parts to other people's research are graduate students.
The thing is, actually contributing to cutting-edge research is much harder than helping FOSS projects. Most researchers I know think about their problems day and night, while to be helpful in community software projects it's enough to spend a few hours on weekends.
This won't fly in the real world. If an academic writer includes a link to their code in their paper, when it goes through peer review before being published, it will no longer be a blind review; the reviewer will know who wrote the code by virtue of the github account or domain it's uploaded to. Blind reviews are important, as if you know you're reviewing a paper by someone who rejected your paper, you're more inclined to reject it. That really happens, people are that petty.
Additionally there's the risk that the reviewer will reject the paper, and download the code and publish a paper about it quickly in some other journal/conference.
Your counterpoints are irrelevant. You can obviously attach code to the article, so that there's no need to include any link or other similar thing. If a reviewer wants to do the thing you described, the actual possession of the code is not a big help for him -- what actually keeps him from doing that is honesty, decency and the respect he has among his colleagues, which he would instantaneously lose.
Why not just fill in the link after review but before publication? Presumably that happens with other information which identifies authors, such as their names and email addresses.
Although 'blind' reviewing is something of a misnomer in small research fields where the reviewer can probably guess at least one of the authors.
Solution: you anonymize the link for peer-review, and only include the real link in the final version. This way the code is not peer-reviewed of course, but peer review can look at the pseudo-code given in the paper.
Your last point really has nothing to do with giving away code, nor with peer review: you can plagiarize just about anything. This generally doesn't happen because scientists care a lot about giving credit and recognizing priority.
I would like more code simply to prove the things described in the papers. The exploration just feels incomplete to me if I can't run some code and hack on it to test how it changes behavior.
I read a lot of CS papers and am not particularly strong on formal math (that is, I have a lot of gaps) but I'd rather not have actual code to demonstrate concepts. Instead, I prefer it when they use pseudocode because it forces them to boil their implementation down to the simplest parts possible rather than cut and paste code into a paper.
It seems to me like most CS research projects wouldn't necessarily have much code behind them. Obviously, data structures and algorithms could, but implementations are generally trivial once you've got the academic paper.
Sorry. A peer-reviewed research paper in computer science is supposed to be an original contribution to knowledge meeting the usual criteria of new, correct, and significant.
Okay, how do we present knowledge? Text? Yes. Text with math as in, say, a book on math or mathematical physics? Yes. Code? Nope! Sorry 'bout that!
Yes, coming from the programming side, many programmers have accepted that code, especially with mnemonic identifiers, actually is meaningful and a substitute for good text possibly with good math. Sorry, guys: In principle, it just ain't. In simple terms, the code is meaningless; code with mnemonic identifier names and careful indenting is still meaningless and at best a puzzle problem to solve to identify the intended meaning. Sorry 'bout that. Yes, often in practice, with enough context from outside the code, one can get by with just mnemonic names and pretty printing. Still, it just ain't knowledge.
Really, for being knowledge, the code should be like the math in, say, a physics text, that is, surrounded by text. The math does not, Not, NOT stand alone and by itself is just meaningless. E.g., F = ma is no more meaningful than a = bc. For either equation, the meaning has to come from surrounding text where the equation becomes just a nice abbreviation for what is in the text. So, for code, the comments play the role of text in a physics book and the code plays the role of the math in a physics book. As in a physics book, the text (comments) is MORE important than the math (code). Again, code just AIN'T knowledge. Sorry guys.
I think you are missing the point. The article is not about "give us the code instead of a paper", but instead about "give us the code as a supplement to the paper". The basic argument is that sometimes text alone is not sufficient to explain - a point you concede to by mentioning that the formulae exist in textbooks along with the description. The reason they include those formulae in physics books is because the text is one way of describing something, and the math explains the relationships. Both have their strong points.
Take for instance many protocol standards: they come with a reference implementation, because sometimes the text is harder to understand when you can't compare it to code which actually does what is being described.
"The article is not about 'give us the code instead of a paper', but instead about 'give us the code as a supplement to the paper'."
is an appropriate interpretation of the paper. But I can't have much sympathy for that interpretation because the code can be so darned long and obscure.
Some 'pseudo-code' of, say, a dozen lines as a 'figure' is okay. In nearly all cases, any code very close to something that actually runs in practice would tend to be too darned long.
The 'knowledge' of the paper is supposed to be clear just from the paper. Then the code is supposed to be 'routine'.
If the main body of the paper does not make the knowledge clear, then including a few hundred lines of code has little promise of improving the exposition.
There's a good history supporting my point in books by Knuth, Ullman, Sedgewick, the paper
Ronald Fagin, Jurg Nievergelt, Nicholas Pippenger, H. Raymond Strong, 'Extendible hashing—a fast access method for dynamic files', "ACM Transactions on Database Systems", ISSN 0362-5915, Volume 4, Issue 3, September 1979, Pages: 315 - 344.
and more.
For your
"The basic argument is that sometimes text alone is not sufficient to explain - a point you concede to by mentioning that the formulae exist in textbooks along with the description."
I believe you are asking too much of the equations, not a lot too much but too much.
In simple terms, even pure math is, usually is, at best is, is supposed to be written in complete sentences in a natural language, say, English. E.g., in a physics book describing Newton's second law, we discuss force, mass, and acceleration, explain that force is a vector F, mass is a scalar m, and acceleration is a vector a, say that force is the product of mass and acceleration, e.g.,
F = ma
So, by the time we get to the equation, we almost don't need it. Of course, not ALL writers write this way!
A long sequence of algebraic derivations is supposed to have a start much like what I gave for F = ma and after that is just algebraic derivations, often with not much need for more text.
For
"Take for instance many protocol standards: they come with a reference implementation, because sometimes the text is harder to understand when you can't compare it to code which actually does what is being described."
yes, there is a practical challenge there. Finally the Unix world decided to 'go empirical' and have 'code runoffs' or whatever they were called where people just got together to see if the actual code, say, the TCP/IP stack with SSL, would actually appear to work. Of course, that empirical approach was always risky because it is a long way from a mathematical proof of correctness or even a solid engineering document. And we've seen problems, e.g., with security in DNS.
For more, the AI world decided to f'get about 'theory' or even usual scientific documentation and make the criterion just 'does the code appear to work'. This, too, has proven to be risky and, of course, is a big step back in documentation quality from good applied math, mathematical physics, and engineering of past decades. E.g., how the heck to know that the Manhattan Project was on track before the first test? Sure: There were a LOT of derivations and calculations; that is, the project didn't rely on just the empirical approach. Indeed, the uranium bomb was dropped on Japan as the first 'test'; the engineering was so solid there was no doubt it would explode.
Sure, there is a big role for testing, but, still, the 'knowledge' in a journal is supposed to be able to stand on its own without much help from detailed documentation on code, electronic circuits, mechanical mechanisms, or chemical reactions. In particular, in experimental science, the published paper is supposed to be clear enough to permit others to reproduce the experiment, and to do so without detailed photographs and drawings of apparatus.
When I was an undergrad, one summer I worked in the lab of a famous scientist. He explained to me how a scientist is supposed to keep a good notebook and really good records. Then if another scientist wants to check details or reproduce results, sending the notebooks and data is supposed to be an obligation, and the materials are expected to be handled with great care and then returned.
So, yes, at times, in practice, there can be more that is relevant than just the published paper.
Still, communicating the 'knowledge' of a research paper with a few KLOC of code is a bit much!
I really think you are fighting a strawman here. Your argument is all over the place too... Is the math supposed to be clearly explained in text or just the starting conditions from which the rest of the formulae just tell the story for you?
Moving on to DNS having security holes and other problems with protocols made with reference implementations -- how is this relevant? Lots of scientific papers get published that later turn out to be wrong, whole fields of study sometimes look really good and push our understanding of the world but ultimately go away as "the wrong approach". This is just progress, not a fundamental flaw in making code available.
When you say the "knowledge of a paper" should stand on its own, without the need for the circuits, code, etc to understand it -- this is true of many fields, but not true when the paper is about the circuit, the code, etc. Then that very thing should indeed be presented, otherwise no amount of text will properly explain it, other than a 1000 words where a picture would suffice.
As for your famous lab scientist -- why is it ok that he wants to make notebooks and other lab stuff available, but when we ask for code to be put in a public repo when reviewing papers we are suddenly breaking knowledge?
And finally, about your last sentence -- supplemental code is not an attempt to replace the knowledge in a research paper with code. The word supplemental means "in addition to"; the paper should still happen!
A last, slightly tangential thing: No matter what you say about text being the only way to transmit knowledge, there are plenty of occasions where a simple diagram transmits much more meaning to me than reams of text; I am a visual thinker and learner. Similarly there are times when a few lines of code can tell me as much as paragraphs describing them. There are times when I see an equation and get what the paper is trying to discuss. There are also times when the text is invaluable, and without it I would be lost, no matter how good the diagram or equations or code. The methods of knowledge transmission are many -- to be stuck on a single type as the "true way" is a bit absurd and/or pretentious.
The resolution appears to be there are two quite different worlds: First there is the world of peer-reviewed research papers in computer science where the goal is to present knowledge that is new, correct, and significant and that sometimes results in code. And second there is the world of commercial computing that is heavily about just code. But these two worlds are very different. In particular, the research papers are not trying to document or describe code. Moreover, in the knowledge presented in a research paper, that knowledge is supposed to be in the form of text, possibly with some math, but not in code. E.g., the original paper with heap sort didn't need code; Knuth's presentation of heap sort in TAOCP has only a little code, in his language MIX which few people bother to understand, that is not essential.
"Is the math supposed to be clearly explained in text or just the starting conditions from which the rest of the formulae just tell the story for you?"
Again, you are asking too much from the math. Again, 'math' is supposed to be written in complete sentences. In particular there is no 'new language'. I illustrated with F = ma: You are supposed to get nearly all "the story" from the text and not the equations. Well written math doesn't ask a reader to get "the story" "just" or even primarily from the equations. If you don't want to accept this description of good writing in math, then so be it, and our main difference will be right there and not "all over the place".
Again, yet again, once again, the knowledge is supposed to be in the text, in well written paragraphs, in complete sentences, and there is no 'new language'. For an equation such as F = ma, that is a part of speech, a noun. And again, yet again, the surrounding text is supposed to be so good that the equation is hardly needed. Again, yet again, one more time, please read it this time, you are not supposed to have to dig the meaning out of the equations in more than a peripheral sense.
"Moving on to DNS having security holes and other problems with protocols made with reference implementations -- how is this relevant?"
Because, again, the code, including in a 'reference implementation', doesn't mean anything. We shouldn't ask to determine correctness, or even the quality of the design or engineering, just from the code, not even code with mnemonic identifier names and many in-line comments. The 'relevance' is that we see that the DNS code had security problems, that is, was not high quality design or engineering, and the reason for those problems is that high quality design and engineering are in text and NOT in code. So, with just the code, we have no solid, rational support of the quality of the design or engineering. Instead, about all we have is the code. Then to know if the code has high quality, we just deploy it and wait for the bug reports. Then we study the code, see where the problem is, patch the code, deploy it again, and wait for more bug reports. NOT good. NOT high quality design or engineering.
We do NOT design or engineer airplanes like in a 'reference implementation' of DNS. Instead, before the plane carries passengers, we have a mountain of rock solid engineering, heavily in text, saying that the plane is safe. We do NOT primarily determine the safety of an airplane by putting one million passengers in it. In DNS, we determined the quality of the work primarily by deploying it, using it in real networks, getting the bug reports, and then fixing the bugs. So, the DNS engineering was shoddy work compared with nearly all of the rest of engineering.
Broadly we just cannot expect to have delivered high quality software when what is delivered is mostly just the software. Instead, the quality has to be in a document almost entirely in text. In high quality? Yes. In common commercial practice? No.
Fundamentally it is the text that really matters, NOT the code. Again, yet again, given the text, the code is supposed to be routine. This holds in 'research level' 'knowledge'. Sure, this doesn't hold in routine commercial practice.
"Lots of scientific papers get published that later turn out to be wrong, whole fields of study sometimes look really good and push our understanding of the world but ultimately go away as 'the wrong approach'. This is just progress,"
No, you are not describing "progress" but are describing just mistakes, that are sometimes costly, and are to be avoided strongly.
"When you say the 'knowledge of a paper' should stand on its own, without the need for the circuits, code, etc to understand it -- this is true of many fields, but not true when the paper is about the circuit, the code, etc."
No, not really: We're talking about peer-reviewed research publications of 'knowledge'. That 'knowledge' is not really supposed to be about the circuit or the code. Instead, the circuit or code are supposed to be routine implementations of the knowledge in the paper.
You want to elevate the circuit or code to the level of knowledge, and that is backwards and wrong. Your goal of elevating code to the level of knowledge and what is primary is not promising. In simple, blunt terms, so far without serious exceptions, 'knowledge' is communicated in text, possibly with some math, in a natural language. That's all we've got. There ain't no more.
"Then that very thing should indeed be presented, otherwise no amount of text will properly explain it, other than a 1000 words where a picture would suffice."
In the communication of knowledge, we don't depend primarily on pictures, figures, diagrams, schematics or code. Instead we communicate knowledge in text, possibly with some math. Again, good examples are in books by Knuth, Ullman, Sedgewick, etc.
Again, yet again, once again, please try actually to read it this time, there has been a long tradition in math that there should be no diagrams or pictures at all, not even in vector calculus. Much of the motivation was to have the discipline to make SURE that the content was in text, just text, with some math, and NOT in pictures. Again, we didn't want the content to be in pictures. This point is SERIOUS about keeping up the quality of the material.
For pedagogy, a picture can be terrific, but 100% of the content should also be in the text.
"As for your famous lab scientist -- why is it ok that he wants to make notebooks and other lab stuff available, but when we ask for code to be put in a public repo when reviewing papers we are suddenly breaking knowledge?"
We're not. Put all the code in public repositories you want. Put in 1000 KLOC. Fine with me. But the code in the repository is not going to be part of the paper, e.g., it will not be reviewed in the peer-review process. Moreover, the paper just MUST make sense with no reference at all to the code. Again, the code is NOT the knowledge or the subject of the research paper. Again, the paper is NOT about the code; it is not a paper describing the code; the code is not the 'contribution to knowledge'. Instead, the code is at most a hopefully routine implementation of the knowledge in the research paper.
"No matter what you say about text being the only way to transmit knowledge, there are plenty of occasions where a simple diagram transmits much more meaning to me than reams of text, I am a visual thinker and learner."
Fine. So are many people. But such ways of 'transmitting knowledge' are NOT what we are talking about. We're talking about peer-reviewed research papers. Tutorial presentations can be very different. Movies are very different. TV is very different, even some of the TV cooking shows that try to be instructional are very different.
Maybe the main point of misunderstanding is, practical computing is nearly all about code, just the code, and, thus, you have accepted that computer science research papers can be about describing code. No. Instead, for such papers, any code is supposed to be just an implementation of the knowledge in the research paper and routine. The knowledge in the research paper is NOT 'code'.
The knowledge in a peer-reviewed research paper is supposed to be rock solid, and, again, yet again, we don't communicate that with pictures or code.
Pictures and code? In tutorials, fine. In commercial work, sure. In research papers, only a few lines of 'pseudo-code' in a picture that is not essential to the paper. Understand now?
"Really, for being knowledge, the code should be like the math in, say, a physics text, that is, surrounded by text..."
You kind of set yourself up for the rebuttal here. I don't think many people would suggest that presenting code devoid of context would be useful. Usually, really good writing about code, whether in papers or in books, is written in a kind of literate style, a la Knuth. Even if it's not formally done using literate tools, in many cases the code is accompanied and surrounded by text, just like mathematics.
This isn't some kind of pissing match between code and math, as you seem to want to make it out to be. Someone publishing a CS paper presumably has some interest in communicating something to the reader. The OP is just suggesting that for some subset of the readership, code might aid in that goal.
A published paper is supposed to present 'knowledge' in a clear and solid way. We know how to do that in, say, a physics text in a natural language, e.g., English, with text and some math.
So, now there's 'code' that is neither text nor math. So, in communicating knowledge, we don't know what role the code has. Fundamentally, the role is not promising: The code alone doesn't mean anything; mnemonic identifier names do not make a fundamental difference; with a lot of explanatory comments we are still a very long way from good examples such as a physics text.
In practice, programmers are often forced to look at code and discover its 'meaning', but that unfortunate situation does not mean that code should be in a published paper.
In simple terms, there are two cases of 'code' here:
(1) It is common to have a dozen or half-dozen lines of, say, 'pseudo-code' in a figure in a paper. The figure can help, but the paper should be plenty clear and solid without the figure. E.g., there was a long theme in much of the best in math that the writing should be so clear that no figures are needed; or, the content should not be in 'figures'.
But, I'm 'liberal' and can agree that intuition is important, that often there is an appropriate 'figure', that the reader should not always be expected to draw their own, either in their mind or on paper, and that, similarly, a few lines of pseudo-code can be helpful.
Still, the pseudo-code is not the solid, clear content of the paper.
(2) In practice, 'code' tends to grow quickly from dozens of lines to KLOC. Putting such code in a paper would be a bit much!
There is a real danger in computing: In nearly universal computing practice in writing and documentation, the content is not all in just text or math but in mnemonic identifiers and code, and that danger is real and a big step backwards in science. While it is clear where the danger comes from in computing practice, still the results don't belong in journals.
You mentioned Knuth's 'literate programming', i.e., his shot back at 'structured programming'! For the code itself, I was always in favor of his 'literate' idea. Although TeX is my main source of high quality word whacking, I never actually got through enough of his Web, Weave, and Tangle to write 'literate' code. The best I do is just use a good text editor.
In our AI project at Yorktown Heights, I tried to get people interested in literate programming and Knuth's tools, but people called it just 'pretty printing' and laughed.
Most often they don't have any code. Most of the action takes place inside the head and not inside the computer. What they have are proofs of correctness, invariant enforcement, complexity bounds etc. I doubt that reading the code will improve understanding of the algorithm any more than a thorough description of the algorithm.
While I do not agree with the fashion in which the above comment was made, I do agree with the sentiment.
I know a few computer scientists who are really good at algorithms and theoretical computer science but will not write, or are not willing to write, code. Often there are constraints, either because the language lacks a proper representation or because actual code carries more complexity than pseudocode (edge cases, boilerplate needed just to make the code run, etc.).
Additionally, many times the paper might be presenting part of a larger algorithm/solution, and that part by itself might not be useful or compilable.
I think that pseudocode fulfills the need in most cases; when it doesn't, peer reviewers would ask for code, and it would be provided if possible.
Oh and papers without implementation make up a significant part of a lot of undergraduate course projects. I am actually looking for an interesting research paper to implement for Digital Image Processing.
We have discussed this topic on HN a number of times, for example:
http://news.ycombinator.com/item?id=2735537 http://news.ycombinator.com/item?id=2006749