Warning Signs in Experimental Design and Interpretation (2007) (norvig.com)
93 points by SatvikBeri on April 16, 2014 | 21 comments



Readers who have observed my behavior over 1976 days of participation on Hacker News know that this is by far my favorite link to share in a comment, so I'm glad to see it on the front page as an article submission today.

My one comment about the article is that much of what gets submitted to HN as a breathless press release on a research "breakthrough" is not even based on experimental research, but rather on correlational research, so the study goes wrong in ways that Peter Norvig's excellent article doesn't much discuss. Many, many submissions to HN are based at bottom on press releases, and press releases are well known for spinning preliminary research findings beyond all recognition. This has been commented on in the PhD comic "The Science News Cycle,"[1] which exaggerates the process only a very little. More serious commentary in the edited group blog post "Related by coincidence only? University and medical journal press releases versus journal articles"[2] points to the same danger of taking press releases (and news aggregator website articles based solely on press releases) too seriously. Press releases are usually misleading.

But, yes, definitely read the submission here, as it will help you check each Hacker News submission you read for how many of the important issues in interpreting research are NOT discussed in it.

[1] http://www.phdcomics.com/comics.php?f=1174

[2] http://www.sciencebasedmedicine.org/index.php/related-by-coi...


Wow, this page has no <html> or <body> tags, it just gets down to business with <div>. Is it a fragment meant to be included elsewhere?

Can... can the whole Web be like this?



It should have been...


This is an excellent article. It helped put numbers and precise definitions behind several things I had sorta intuited before and also corrected me on a couple of others. :)

As part of internalizing the bit about P(H|E) vs P(E|H) [Warning sign I4], I wrote up a quick gdocs spreadsheet to let me play with the numbers: https://docs.google.com/spreadsheets/d/10JrG42iKY-LhcnaKU7O7...
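
In the same spirit, here's a minimal sketch of the calculation itself (the 1% prior and the test accuracies below are invented for illustration, not taken from the article):

    # Bayes' rule: P(H|E) can be small even when P(E|H) is large.
    # All three input numbers here are made up for illustration.

    def posterior(prior, p_e_given_h, p_e_given_not_h):
        """P(H|E) = P(E|H) P(H) / P(E), with P(E) by total probability."""
        p_e = p_e_given_h * prior + p_e_given_not_h * (1 - prior)
        return p_e_given_h * prior / p_e

    # P(E|H) is 0.95, but with a 1% prior, P(H|E) comes out around 0.16.
    print(posterior(prior=0.01, p_e_given_h=0.95, p_e_given_not_h=0.05))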


It pretty actively depresses me that my statistics class never discussed how loose the connection actually is between hypothesis testing (what we normally call a statistical test; frequentist) and a genuine updating of hypothesis probabilities (Bayesian).
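
To see just how loose, here's a toy simulation (the base rate and effect size are assumptions I picked, not anything from the article): among results that clear p < 0.05, only about half of the underlying hypotheses turn out to be true under these numbers.

    import numpy as np
    from scipy import stats

    # Assumed: 10% of tested hypotheses are true, with effect size d = 0.5.
    rng = np.random.default_rng(0)
    n_experiments, n_per_group = 10_000, 30
    base_rate, effect = 0.10, 0.5

    true_hyp = rng.random(n_experiments) < base_rate
    significant = np.empty(n_experiments, dtype=bool)
    for i in range(n_experiments):
        treated = rng.normal(effect if true_hyp[i] else 0.0, 1.0, n_per_group)
        control = rng.normal(0.0, 1.0, n_per_group)
        significant[i] = stats.ttest_ind(treated, control).pvalue < 0.05

    # Fraction of "significant" results whose hypothesis is actually true:
    # roughly 0.5 here, nowhere near the 0.95 people read into p < 0.05.
    print(true_hyp[significant].mean())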

Dear frequentists and Bayesians: could you people kiss and make up already? The rest of us are pretty tired of having to run bad numbers to get papers published and of having computationally intractable statistics to run. Please come up with a compromise.


I think this conflates poorly designed studies with preliminary or hypothesis-generating studies. If not for uncontrolled proof-of-concept studies, or pseudorandomized designs such as those used for post-marketing studies, we would a) not have any new medications and b) rarely learn about unanticipated toxicities. It turns out that we clinical trial folks are more Bayesian than the cult of p < 0.05 would lead you to believe.

At one end of the spectrum, we rely heavily on uncontrolled single- or multiple-ascending-dose studies to prove to ourselves that a treatment is likely to be safe in next-step studies, and to guess at the optimal dose. At the other, we learn a great deal from post-marketing surveillance about unanticipated toxicities - because our priors based on big phase 3 studies may still be insufficient to accurately estimate risk. Neither of these designs is randomized or placebo-controlled - and in neither case is that an indication that 'something is wrong', even though an RCT would be better in both contexts. Better, if cost were no object and patient safety were not a concern.

I realize my fellow Brunonian does include some offhand caveats - but it worries me to read comments about how this negates most social science research.


Not at all... He merely states that if someone has the option to have a control and chooses not to, you should be concerned. Specifically, he hits on the fact that some fields can't have a control (sometimes just for ethical reasons).


I think the distinction is that we /often/ choose not to have a control, for a multitude of reasons, so as a cause for concern it's highly nonspecific. Much like the critique of small samples - there are underpowered 5000-patient multicenter trials, and well-powered 30-patient trials.
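
A rough back-of-the-envelope under a two-sample normal approximation makes that concrete (the effect sizes here are hypothetical, just to show that n alone doesn't determine power):

    from math import sqrt
    from scipy.stats import norm

    def power(d, n_per_group, alpha=0.05):
        """Approximate power of a two-sided two-sample z-test at effect size d."""
        z_crit = norm.ppf(1 - alpha / 2)
        return norm.cdf(d * sqrt(n_per_group / 2) - z_crit)

    print(power(d=0.05, n_per_group=2500))  # 5000 patients, tiny effect: ~0.42
    print(power(d=1.20, n_per_group=15))    # 30 patients, large effect:  ~0.91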

As I said, I think this is a nice introduction but can lead one to discount, well, anything other than large randomized placebo-controlled trials.


I wonder how one moves in the opposite direction. I am not even sure how to frame my question; I have been given quite a lot of hell for trying to explore it conversationally. But say you have a working hypothesis and are getting results, and you know of some research that fits the situation. How does one effectively present something like that? Not as "proof" but as a place for others to start thinking about the problem space?

I don't ever seem to see scenarios like that addressed.


Have you read http://blog.sethroberts.net/ ? It's not clear to me whether that addresses your question, but Roberts criticizes professional science for focusing too much on expensive testing of popular hypotheses and not enough on generating ideas and winnowing them cheaply.


I have not seen that before, thank you, but it currently isn't opening properly for me.


No idea about your access problem, but posts like these are why it came to mind: http://blog.sethroberts.net/category/personal-science/ http://blog.sethroberts.net/category/health/


Those links work. Thanks!


Great post! This shreds much of the research in social science. Should be required reading for every science major or science journalist.


I love Norvig's articles, but since they carry no dates, I wonder if I am reading the news or ancient history.


This one is timeless anyway.


This is definitely not new; I recall reading it a while back (at least a year ago, if I recall correctly).

Still good, nonetheless.



And we get 503 responses already.

There won't be much discussion if no one can read the article.


Working here. It's also cached in Google (search on the title + Norvig, click the 'cached' link).



