About Problem 1, I must stay vague, but the code I read was understandable. We tracked the time it took us to review the artifacts, and since how to efficiently and fairly review artifacts is still an open problem, the process may change in the future.
To address the potential negative impact on the authors: I believe the papers that did not get a good review from the artifact evaluation committee simply did not receive a special mention. Since it is not possible to know who submitted an artifact and who did not (unless you got a mention), no harm is done for now.
The potential negative impact on the authors' reputation was something that concerned me, because I've been burned in the past by Ph.D. students who were unable to use the tools I published and claimed my tools were buggy when they simply did not know how to install Eclipse...
About Problem 2, the conference organizers promised to keep the data confidential, but that might not be enough in some cases. For example, I would never show my interview transcripts to anyone, but there is some intermediate data that I could show and describe. We did not need to reproduce everything; we just wanted to see reasonable evidence that the approach described in the paper had been validated as advertised.
I'm not sure I understand the difference between Problems 2 and 3. I must say that in my research area, I don't see many approaches and studies that rely exclusively on proprietary data. Often, some part of the data or technique is publicly available, or the approach has been tried on both proprietary and open-source data.
Overall, I think the artifact evaluation committee is a nice initiative and a step in the right direction. It needs to be carefully monitored and adapted, though, to ensure that nobody gets burned for a bad reason.
As far as examples for 2/3 go, Google is a common source. Many of their papers report on experiments conducted with 1) massive-scale proprietary data and 2) proprietary infrastructure. They do sometimes share data, but often don't, and in cases where they don't, there is not always equivalent publicly available data (especially at anywhere near the same scale). And I suspect that sharing the code is out of the question in many cases; if they're writing a paper on improving an aspect of Google Translate (which they do fairly regularly), they aren't going to send the entire Google Translate source code to a committee.
Yes, and Microsoft Research also publishes papers without any numbers on the axes of their graphs (e.g., number of bugs per component in Windows Vista). There is also a debate about whether this can be seen as a significant contribution to science. Some think it is, some don't.
In software engineering, these papers are still a minority, and researchers (again, from Microsoft Research or IBM Research) often try to test their hypotheses or approaches on open data sets (e.g., the Eclipse bug repository).
Google just doesn't publish in SE venues, so I never encountered this example. I'm sure they publish more in distributed systems, though. By the way, if Google is writing a paper on improving an aspect of Google Translate, I don't need their code, but I do need at least the pseudo-code or the general strategy, plus their methodology and some intermediate data. Otherwise, it's not really a scientific contribution, just a tech report on something nice they did.