
Confession: after doing Data Science work for the past 4 years I STILL don't really understand why people like Jupyter.

R was my first programming language and I got really spoiled with RStudio where everything "just works" and the "highlight code -> run in REPL" workflow is super smooth and tightly integrated. All I want is for that to work in other languages, but it seems like if you want it in Python you need to be running PyCharm or a similarly-heavyweight IDE (seriously, despite all the hype of VSCode there are still a ton of issues with just highlighting code and running it in an IPython terminal) and for Julia it just doesn't exist. If you really want a Jupyter-like workflow you can just use R Notebooks, which are literally just better in every way.




Well, R isn’t the best language when it comes to building systems. Most R code is essentially one file written to produce an output once (for a paper, project, etc.). People wanted a better language for building systems, and Python fits that need. That explains why people moved to Jupyter.

I don’t like RStudio for the same reason I don’t like Matlab. I already have my editor and terminal workflow. I don’t want to use/learn a new tool for the privilege to use the language. Notebooks hit an acceptable middle ground where I can launch them via terminal. Notebooks have plenty of problems. Mainly, running cells out of order is just an incredibly dumb thing to be possible. This same problem is present in RStudio which you seem to enjoy (highlight and REPL) and you want it in other languages. If the code isn’t written to run in an order, a tool shouldn’t allow it.


> Well, R isn’t the best language when it comes to building systems. Most R code is essentially one file written to produce an output once (for a paper, project, etc.). People wanted a better language for building systems, and Python fits that need. That explains why people moved to Jupyter.

I definitely agree that Python is a better general purpose computing language than R, but R's deployment story (i.e. packages) is much, much better than that of Python (pip/poetry/pipenv/conda/whatever came out this week). I honestly don't think that's the reason though, it's more that Python has much, much, much better developer mindshare.

Jupyter is a whole other world though. IPython was the best thing ever as a proper REPL for Python, and Jupyter was good for being able to do graphics with your code. That was all standard in the R world with Sweave (which I wrote my thesis in), so it didn't appear to add a lot of value (to me, at least).

> I don’t like RStudio for the same reason I don’t like Matlab. I already have my editor and terminal workflow. I don’t want to use/learn a new tool for the privilege to use the language.

I am 100% with you on this, but RStudio is just a nicer interface over the tools for literate programming in R, and the wonderfulness of Rmd vs ipynb is a thing of joy (to me, at least).

> Mainly, running cells out of order is just an incredibly dumb thing to be possible. This same problem is present in RStudio which you seem to enjoy (highlight and REPL) and you want it in other languages. If the code isn’t written to run in an order, a tool shouldn’t allow it.

So, this is a tricky one. I agree in principle, and I have a habit of continually re-running my documents to ensure that this doesn't cause problems, but there are definitely valid use cases for out-of-order execution. Consider that you may often fit a model (which can take ages) and then iterate on the visualisation/analysis code. You don't want to re-run the modelling code every time you change a plot, which your solution would require.

Most of the tools claim to allow you to cache particular blocks, but I've never been able to get it to work reliably.
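
For what it's worth, the manual version of block caching can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any notebook tool implements its cache: `cached`, `fit_model`, `CACHE_DIR` and the `key` string are hypothetical names made up here, and the cache is only invalidated when you change `key` by hand.

```python
import hashlib
import pickle
import tempfile
from pathlib import Path

# Hypothetical cache location; any writable directory would do.
CACHE_DIR = Path(tempfile.gettempdir()) / "analysis_cache"


def cached(name, compute, *, key=""):
    """Return compute()'s result, reusing a pickle on disk if one exists.

    `key` is a free-form string describing the inputs (assumption: the
    caller changes it whenever the result should be recomputed).
    """
    CACHE_DIR.mkdir(parents=True, exist_ok=True)
    digest = hashlib.sha256(key.encode()).hexdigest()[:12]
    path = CACHE_DIR / f"{name}-{digest}.pkl"
    if path.exists():
        with path.open("rb") as f:
            return pickle.load(f)
    result = compute()
    with path.open("wb") as f:
        pickle.dump(result, f)
    return result


def fit_model():
    # Stand-in for a model fit that takes ages.
    return {"coef": [1.0, 2.0]}


# The first run computes and stores; later runs of the whole script,
# top to bottom, load the pickle instead of refitting.
model = cached("model", fit_model, key="formula=y~x")
```

With something like this the document can still be re-run in order while only the plotting code actually executes. The weak point is the same one the tools have: nothing notices when the data or the model code changes unless `key` changes too.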


Yeah, I find that the out-of-order execution complaint is common among people who have a software development mindset, but for data analysis/science out-of-order execution is basically the only sensible way to work. The "load data" command might be one line but takes 3 minutes to run, while a huge chunk of code that plots the data might take 1 second and I might want to tweak it 50 different ways before settling on something that I like/delivers insight. Producing a standalone script that develops the same insight you get from "playing" with the data is an afterthought in some cases.


As long as you're aware of the dangers, it's fine. Personally I try to model offline from analysis to avoid this issue, and set eval to no in org for those cases where I've built the model inline with the analysis.

Unfortunately, it generally takes a couple of terrible situations before people learn the problems with this.


I agree that data analysis needs a tool to persist data while iterating over certain functions. But in this vein, said tool should try to prevent the user from having to run the load_data() function more than once, not encourage it by allowing someone to permanently manipulate the output of load_data().


This is an option in many tools, but it doesn't tend to work that well in practice.

I do agree that this is the ideal though. (As an example, if Pluto is always reactive, then this workflow becomes much more difficult, as the model will be re-run when you change a downstream datapoint.)


That workflow has worked for me for Julia in Spacemacs and VS Code. Pretty sure it works in Atom, too.


Spyder is basically an RStudio clone for Python, but I never had a great experience using it. Not really sure why, somehow I just ended up using Jupyter because that's what my coworkers all used. When doing "solo" stuff, it doesn't matter because I dislike every interface so I'm never happy anyway...


> I dislike every interface so I'm never happy anyway...

It sounds like you might have some good constructive criticism after trying several options. Care to elaborate on the shortcomings of various options?



