Hacker News new | past | comments | ask | show | jobs | submit login

This is a really important comment. I started my career in scientific computing, went over to software/web development for a while, and am back over to scientific computing. I've been involved in teaching efforts lately, mainly for analysts who want to do data wrangling and stats, and I think it is a good environment for some of what they want to do.

I never used Jupyter before getting involved in this kind of coding, data wangling and stats. And it took me a while to get used to it. The "lack of state" is confusing, as well as the apparent non-linearity of a jupyter notebook (it can appear that certain commands are done in order from top to bottom, when if you look at the number next to them, they may have been executed in a completely different order). I actually chalked this up to my background in software development, figuring I was more wired to think of state as something that would be cleared every time a script was run. Analysts (statisticians, engineers, data scientist types) like to interact back and forth very iteratively with their data, and I think they're less likely to be tripped up by some back of mind assumption that you can determine the state of variables and so forth by looking at a bunch of commands as if they were run top to bottom in a file.

It got me thinking about when I'd use jupyter notebooks for my own work. In short, if I were writing a program, even just a one page script, I wouldn't use an interactive notebook (I have opened .py files in jupyter and just used it as a text editor).

But if I wanted to "do my math homework"? A one off where I need to get an answer to some complicated questions that will require a lot of data wrangling and stats? Yeah, I'd probably reach for jupyter notebook or something like it.

Otherwise? I can't remember the O'Reilly book where this was written (sorry I wish I had the cite) but the author wrote that his favorite dev environment is python and a text editor. That's still the case for me as well.

I hate to conclude with the disappointing "the problems are when you use it for the wrong task", mainly because I've objected when people have made this argument in the past. If a tool tends to get used for the wrong task, that's partly a problem with who is using it... but we shouldn't let the tool itself off the hook too easily. It's still worth thinking through. I'm worried I'm falling into the trap of making this argument when I like the tool, and objecting to it when I don't. I'll have to think about that a bit.

As it stands, though, I haven't had a huge problem with Jupyter Notebook, and I like it a lot, probably because I don't use it when Python + vi (or a "better" but still basic text editor) would do the trick.




For the times that you mention that you currently use Jupyter, how is it better than simply randomly selecting a small subset of data at the beginning and then writing your code in a full featured IDE with debugging functionality (including the ability to add breakpoints and see variable contents at those points, etc.)? The one line at the beginning to randomly subset the large data set is easy to delete or comment out once you have the code working the way you like - to me it seems more effective to do that in a native IDE like Eclipse+PyDev or in a lightweight IDE like Spyder than to use Jupyter, which requires so many compromises.




Consider applying for YC's W25 batch! Applications are open till Nov 12.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: