I think the biggest problem is that for most of my statistical work, with Python I'd have to check whether an implementation exists, while with R I can safely assume one (or several) does.
So, I'm not a statistician, but I do wind up using a lot of (R) statistics tools in my various writings: http://www.gwern.net/tags/statistics R is a terrible, terrible language from my perspective as a Haskeller, but the libraries are just too fantastic for me not to use it.
A crude bit of shell (`fgrep --no-filename 'library(' *.page | sed -e 's/library//' | tr -d ')' | tr -d '(' | sort -u`) on my writings and a bit of cleanup gets me the following list of R libraries I use:
MASS
MKmisc
Rcapture
XML
arm
biglm
boot
changepoint
ggplot2
insol
languageR
lme4
lubridate
metafor
moonsun
prodlim
pwr
randomForest
randomForestSRC
reshape2
rjson
rms
stringr
survival
verification
Some of them aren't important, and I'm sure Python has equivalents to things like 'rjson', but what about the others? Let's look at a few.
- metafor: meta-analysis library (fixed- and random-effects models, forest plots, bias checks, etc.). I use this in 3 meta-analyses I've done, so it's kind of important. A google of 'python meta-analysis' turns up... nothing? Well, it turns up some Bayesian stuff, but nothing jumps out as a finished, polished library that one could easily reimplement my work in. ('METASOFT' sounds promising, but actually turns out to be for genomics and in Java.)
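(For context, the fixed-effect half of what metafor does is simple enough to sketch by hand; the random-effects model, forest plots, and bias diagnostics are where the library earns its keep. A minimal sketch of inverse-variance pooling, with made-up study numbers:

```python
import math

def fixed_effect_meta(effects, variances):
    """Inverse-variance-weighted fixed-effect pooling: each study
    is weighted by 1/variance, so more precise studies count more."""
    weights = [1.0 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    se = math.sqrt(1.0 / sum(weights))
    return pooled, se

# three hypothetical study effect sizes with their sampling variances
pooled, se = fixed_effect_meta([0.30, 0.10, 0.25], [0.04, 0.02, 0.05])
```

The point is not that this replaces metafor, but that everything past this first step is exactly what you don't want to reimplement yourself.)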
- randomForest, randomForestSRC: random forests are apparently implemented by Scikit. Great. What about random *survival* forests, which I use in my Google shutdowns analysis to see if the standard survival model can be improved upon? 5th hit for 'python random survival forests' is... an R library.
- Rcapture: capture-recapture analysis, which I employ in my hafu analysis of anime/manga to try to estimate how many I may be missing. 'python capture-recapture analysis': the best-looking hit (http://www.mikemeredith.net/) is a false-positive. The relevant code is - you guessed it - in R.
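(The core two-sample estimator behind capture-recapture is easy to state, even if Rcapture's loglinear machinery for multiple lists is not. A sketch with invented counts, using Chapman's bias-corrected version of the Lincoln-Petersen estimator:

```python
def chapman_estimate(n1, n2, m):
    """Chapman's bias-corrected Lincoln-Petersen estimator:
    n1 = items seen in the first sample, n2 = items seen in the
    second, m = items seen in both; returns the estimated total
    population size, seen and unseen."""
    return (n1 + 1) * (n2 + 1) / (m + 1) - 1

# hypothetical counts: 120 titles in one listing, 150 in another,
# 90 appearing in both
chapman_estimate(120, 150, 90)
```

The heavy overlap between the two samples here would suggest few unseen items; Rcapture's job is to do this properly across many partially-overlapping lists with heterogeneous capture probabilities.)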
- biglm: a package for incrementally updating a linear model, the idea being that you can load into memory as much data as fits, update the linear model, and load the next batch. I need this sucker for running even simple analyses on the Mnemosyne spaced repetition data set, where the database (~18GB) won't fit in my laptop's RAM. Some googling suggests that Scikit may support this. Or may not.
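(biglm's trick rests on the observation that least squares only needs a handful of running sums, not the raw data in memory. A minimal single-predictor sketch of the same idea; the real biglm maintains an incrementally-updated decomposition to handle many predictors, but the memory story is identical:

```python
class IncrementalLinearFit:
    """Simple 1-D least squares updated batch by batch: only running
    sums are kept, so memory stays constant however big the data is."""
    def __init__(self):
        self.n = self.sx = self.sy = self.sxx = self.sxy = 0.0

    def update(self, xs, ys):
        # fold one batch into the sufficient statistics, then discard it
        for x, y in zip(xs, ys):
            self.n += 1
            self.sx += x
            self.sy += y
            self.sxx += x * x
            self.sxy += x * y

    def coefficients(self):
        slope = ((self.n * self.sxy - self.sx * self.sy)
                 / (self.n * self.sxx - self.sx ** 2))
        intercept = (self.sy - slope * self.sx) / self.n
        return intercept, slope

# feed the data in two "batches", as if streaming from disk
fit = IncrementalLinearFit()
fit.update([0, 1, 2], [1, 3, 5])
fit.update([3, 4], [7, 9])
fit.coefficients()  # → (1.0, 2.0), i.e. exactly y = 1 + 2x
```

Whether Scikit exposes an equivalent for ordinary linear models on out-of-core data is exactly the thing I'd have to go check.)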
- lme4: the most famous library for doing multilevel models (indispensable generalizations of linear models, great for dealing with multiple measurements from the same subject; I use them on psychology or sleep data, usually). All the Python alternatives appear to either be exercises or Bayesian.
- survival: survival analysis library, used for Google shutdowns. https://stats.stackexchange.com/questions/1736/survival-anal... says you can sorta do them in Python, but you're cautioned against it and it looks like the Python alternatives don't have the Cox proportional hazards needed to use covariates.
I could go on to look at the others (any ordinal logistic regression in Python? how're the bootstrap libraries? what if I need to do power calculations for standard tests?), but I hope this gives you an idea of why I'm not doing everything in Python.
Can you please list some important tools implemented in R but not in Python? I'd like to know how big the gap is.