About Problem 1, I must stay vague, but the code I read was understandable. We tracked the time it took us to review the artifacts, and since how to efficiently and fairly review artifacts is still an open problem, the process may change in the future.
To address the potential negative impact on the authors: I believe the papers that did not get a good review from the artifact evaluation committee simply did not receive a special mention. Since it is not possible to know who submitted an artifact and who did not (unless you got a mention), no harm is done for now.
The potential negative impact on the authors' reputation was something that concerned me, because I've been burned in the past by Ph.D. students who were unable to use the tools I published and claimed my tools were buggy when they simply did not know how to install Eclipse...
About Problem 2, the conference organizers promised to keep the data confidential, but that might not be enough in some cases. For example, I would never show my interview transcripts to anyone, but there is some intermediate data that I could show and describe. We did not need to reproduce everything; we just wanted to see reasonable evidence that the approach described in the paper had been validated as advertised.
I'm not sure I understand the difference between Problems 2 and 3. I must say that in my research area, I don't see many approaches and studies that rely exclusively on proprietary data. Often, some part of the data or technique is publicly available, or the approach has been tried on both proprietary and open-source data.
Overall, I think the artifact evaluation committee is a nice initiative and a step in the right direction. It needs to be carefully monitored and adapted, though, to ensure that nobody gets burned for a bad reason.
As far as examples for 2/3 go, Google is a common source. Many of their papers report on experiments conducted with 1) massive-scale proprietary data and 2) proprietary infrastructure. They do sometimes share data, but often don't, and in cases where they don't, there is not always equivalent publicly available data (especially at anywhere near the same scale). And I suspect that sharing the code is out of the question in many cases; if they're writing a paper on improving an aspect of Google Translate (which they do fairly regularly), they aren't going to send the entire Google Translate source code to a committee.
Yes, and Microsoft Research also publishes papers without any numbers on the axes of their graphs (e.g., number of bugs per component in Windows Vista). There is also a debate about whether this can be seen as a significant contribution to science. Some think it is, some don't.
In software engineering, these papers are still a minority, and researchers (again, from Microsoft Research or IBM Research) often try to test their hypotheses or approaches on open data sets (e.g., the Eclipse bug repository).
Google just doesn't publish in SE venues, so I never encountered this example. I'm sure they publish more in distributed systems, though. By the way, if Google is writing a paper on improving an aspect of Google Translate, I don't need their code, but I do need at least the pseudo-code or the general strategy, plus their methodology and some intermediate data. Otherwise, it's not really a scientific contribution, just a tech report on something nice they did.