Confession: after doing Data Science work for the past 4 years I STILL don't rea...

hervature · on April 30, 2021

Well, R isn’t the best language when it comes to building systems. Most R code is essentially one file written to produce an output once (for a paper, project, etc.). This means that people want a better language for building systems, which Python fits. That explains why people moved to Jupyter.

I don’t like RStudio for the same reason I don’t like Matlab. I already have my editor and terminal workflow. I don’t want to use/learn a new tool for the privilege to use the language. Notebooks hit an acceptable middle ground where I can launch them via terminal. Notebooks have plenty of problems. Mainly, running cells out of order is just an incredibly dumb thing to be possible. This same problem is present in RStudio which you seem to enjoy (highlight and REPL) and you want it in other languages. If the code isn’t written to run in an order, a tool shouldn’t allow it.
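To make the out-of-order complaint concrete, here is a minimal sketch in plain Python. The "cells" are hypothetical functions sharing one namespace dict, standing in for notebook cells; none of the names come from a real notebook API. It shows how re-running a single non-idempotent cell changes the result of a later one.

```python
# Hypothetical notebook "cells", modelled as functions over a shared namespace.

def cell_1(ns):
    # Load step: defines the data.
    ns["prices"] = [10.0, 20.0, 30.0]

def cell_2(ns):
    # Transform step: overwrites the data in place, so it is not idempotent.
    ns["prices"] = [p * 1.1 for p in ns["prices"]]

def cell_3(ns):
    # Analysis step: depends on however many times cell_2 has run.
    ns["total"] = sum(ns["prices"])

def run(order):
    """Run the cells in the given order against a fresh namespace."""
    ns = {}
    for cell in order:
        cell(ns)
    return ns["total"]

in_order = run([cell_1, cell_2, cell_3])
cell_2_rerun = run([cell_1, cell_2, cell_2, cell_3])

print(in_order, cell_2_rerun)  # the two totals differ
```

The saved document looks identical in both cases, which is what makes this kind of hidden state hard to spot.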

disgruntledphd2 · on April 30, 2021

> Well, R isn’t the best language when it comes to building systems. Most R code is essentially one file written to produce an output once (for a paper, project, etc.). This means that people want a better language for building systems, which Python fits. That explains why people moved to Jupyter.

I definitely agree that Python is a better general purpose computing language than R, but R's deployment story (i.e. packages) is much, much better than that of Python (pip/poetry/pipenv/conda/whatever came out this week). I honestly don't think that's the reason though, it's more that Python has much, much, much better developer mindshare.

Jupyter is a whole other world though. IPython was the best thing ever as a proper REPL for Python, and Jupyter was good for being able to do graphics with your code. That was all standard in the R world, with Sweave (which I wrote my thesis in), so it didn't appear to add a lot of value (to me, at least).

> I don’t like RStudio for the same reason I don’t like Matlab. I already have my editor and terminal workflow. I don’t want to use/learn a new tool for the privilege to use the language.

I am 100% with you on this, but RStudio is just a nicer interface over the tools for literate programming in R, and the wonderfulness of Rmd vs ipynb is a thing of joy (to me, at least).

> Mainly, running cells out of order is just an incredibly dumb thing to be possible. This same problem is present in RStudio which you seem to enjoy (highlight and REPL) and you want it in other languages. If the code isn’t written to run in an order, a tool shouldn’t allow it.

So, this is a tricky one. I agree in principle, and I have a habit of continually re-running my documents to ensure that this doesn't cause problems, but there are definitely valid use cases for out-of-order execution. Consider that you may often fit a model (which can take ages) and then iterate on the visualisation/analysis code. You don't want to re-run the modelling code every time you change a plot, which your solution would require.

Most of the tools claim to allow you to cache particular blocks, but I've never been able to get it to work reliably.
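For illustration, a hand-rolled version of block caching might look like the sketch below. This is not how any particular tool implements it; `disk_cached`, `fit_model`, and `fit_calls` are hypothetical names, and the cache key is assumed to be just the function name plus its arguments.

```python
import hashlib
import pickle
import tempfile
from pathlib import Path


def disk_cached(cache_dir):
    """Decorator factory: store a function's result on disk, keyed by its arguments."""
    cache_dir = Path(cache_dir)

    def decorator(func):
        def wrapper(*args, **kwargs):
            # Key on the function name and arguments only (an assumption of this sketch).
            key_bytes = pickle.dumps((func.__name__, args, sorted(kwargs.items())))
            path = cache_dir / (hashlib.sha256(key_bytes).hexdigest() + ".pkl")
            if path.exists():
                return pickle.loads(path.read_bytes())
            result = func(*args, **kwargs)
            cache_dir.mkdir(parents=True, exist_ok=True)
            path.write_bytes(pickle.dumps(result))
            return result
        return wrapper
    return decorator


fit_calls = []  # records each real (uncached) fit


@disk_cached(tempfile.mkdtemp())  # fresh temp dir here; a fixed path would persist across sessions
def fit_model(n_points, slope=2.0):
    # Stand-in for a slow model fit.
    fit_calls.append(n_points)
    return {"slope": slope, "n_points": n_points}


first = fit_model(1000)   # runs the fit
second = fit_model(1000)  # served from disk
third = fit_model(2000)   # different arguments, so the fit runs again
```

Note the weak spot: the key ignores the function body and any upstream data, so editing `fit_model` would still return the stale result. That kind of invalidation gap is one plausible reason caching feels unreliable in practice.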

extr · on April 30, 2021

Yeah, I find that the out-of-order execution issue is common with people who have a software development mindset, but for data analysis/science it is basically the only sensible way to work. The "load data" command might be one line but take 3 minutes to run, while a huge chunk of code that plots the data might take 1 second, and I might want to tweak it 50 different ways before settling on something that I like or that delivers insight. Producing a standalone script that develops the same insight you get from "playing" with the data is an afterthought in some cases.

disgruntledphd2 · on April 30, 2021

As long as you're aware of the dangers, it's fine. Personally I try to model offline from analysis to avoid this issue, and set eval to no in org for those cases where I've built the model inline with the analysis.

Unfortunately, it generally takes a couple of terrible situations before people learn the problems with this.

hervature · on April 30, 2021

I agree that data analysis needs a tool to persist data while iterating over certain functions. But in this vein, said tool should aim to try to prevent the user from having to run the load_data() function more than once. Not encourage it by allowing someone to permanently manipulate the output of load_data().
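A rough sketch of that idea, assuming a hypothetical `load_data()` whose slow body runs once per session and whose callers only ever receive copies:

```python
import copy
import functools

load_count = 0  # counts how often the slow load really runs


@functools.lru_cache(maxsize=None)
def _load_data_once():
    # Stand-in for the slow load; hypothetical data.
    global load_count
    load_count += 1
    return {"values": [1, 2, 3]}


def load_data():
    # Hand out a deep copy so callers cannot permanently alter the cached original.
    return copy.deepcopy(_load_data_once())


scratch = load_data()
scratch["values"].append(99)  # mutates only this caller's copy

fresh = load_data()           # still the original data, and no second load
```

The trade-off is the cost of the copy on every call, which may matter for large datasets; a read-only view would avoid that but depends on the data structure in use.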

disgruntledphd2 · on April 30, 2021

This is an option in many tools, but it doesn't tend to work that well in practice.

I do agree that this is the ideal though (as an example, if Pluto is always reactive, then this workflow becomes much more difficult, because when you change a downstream datapoint the model will be re-run).

oivey · on April 30, 2021

That workflow has worked for me for Julia in Spacemacs and VS Code. Pretty sure it works in Atom, too.

nerdponx · on April 30, 2021

Spyder is basically an RStudio clone for Python, but I never had a great experience using it. Not really sure why, somehow I just ended up using Jupyter because that's what my coworkers all used. When doing "solo" stuff, it doesn't matter because I dislike every interface so I'm never happy anyway...

KMag · on April 30, 2021

> I dislike every interface so I'm never happy anyway...

It sounds like you might have some good constructive criticism after trying several options. Care to elaborate on the shortcomings of various options?