I think this is the Alan Kay future of computing. Right now we're in this weird hybrid state where we still work with digital documents primarily using the physical paper interface.
Imagine digital academic "papers" in STEM fields that natively ran the simulations the paper describes. Jupyter sort of delivers that, but it still feels like early days for interactive, digital-first documents (or, as Steve Jobs has been credited with saying, "bicycles for the mind").
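A minimal sketch of what a "live" result could look like, in Python: the simulation here is hypothetical (a toy logistic-growth model with invented parameters), but the point is that a reader re-runs it and changes the inputs instead of staring at a static figure.

```python
# A toy "live figure": the paper's simulation as re-runnable code.
# All names and parameter values here are invented for illustration.

def logistic_growth(p0: float, r: float, capacity: float, steps: int) -> list[float]:
    """Discrete logistic growth: p_next = p + r * p * (1 - p / capacity)."""
    trajectory = [p0]
    for _ in range(steps):
        p = trajectory[-1]
        trajectory.append(p + r * p * (1 - p / capacity))
    return trajectory

# The "published" run...
published = logistic_growth(p0=10.0, r=0.3, capacity=1000.0, steps=50)
# ...and a reader's what-if run with a different growth rate.
what_if = logistic_growth(p0=10.0, r=0.6, capacity=1000.0, steps=50)
```

In an interactive paper, `r` and `capacity` would be sliders rather than constants, and the figure would redraw as the reader moves them.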
While that's a good point, at the moment the balance is shifted much more towards dead media than towards wasted resources. At best, the document doesn't get as much engagement as it could. At worst, you get non-reproducible research papers, where you're lucky if you can even find the code in open access and compile it, let alone get the same results.
And sure, some simulations are very heavy, but those are the exception rather than the rule. It's also possible to have the best of both worlds and offer both the live simulation and a static snapshot.
Often it’s the first step to reproducibility, though. An enormous amount of scientific effort is spent figuring out how a researcher did something they published.
Basically, you're adding another whole project on top of the original project this way.
Imagine trying to figure out some 2001 JS paper thing, for example. But applied to every generation of technical development.
There are always standards, of course, but we’ve seen those go sideways enough times to make one cringe at the thought of ‘dynamic papers’ via some new medium.
The kind of thing that sounds amazing on the surface, until you remember the sort of crazy IT departments that thousands of universities run and forget the whole thing.
> Often it’s the first step to reproducibility though.
It shouldn't be! Reproduction needs to involve the interaction of human brain meats with a human level description of the solution. This is how we make sure that people aren't talking about something different than what was actually done, and how we make sure our conclusions are robust against the things we've failed to specify.
Imagine saying the same thing for physics: I start replication by running a time machine and using the same apparatus as the original experiment under the same conditions. Impracticality aside, this would be potentially useful to suss out fraud and certain kinds of errors, but what successful replication tells us is manifestly less powerful than successful replication on a new apparatus in a new location at a new time, with new values for everything we've failed to control.
Not everything is resource constrained, though. Imagine being able to easily make interactive content that illustrates what you're trying to convey and allows the user to "play with it."
For things that are heavily resource constrained, it still could be a boon to have interactive access to the data that comes out of it.
Even if it's not practical to re-run all of the computation, in many cases it would be nice to have the output data stored in the document in a form where you can interact with it rather than just having static pixels.
Even for non-academic reporting: imagine if instead of 'dead' news articles about some tax reform, or climate change, or whatever, you had an interactive model you could play with (and, for example, plug in your own numbers if you disagree with some of the inputs).
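As a sketch, that "plug in your own numbers" model could be as small as a parameterized function whose defaults are the article's assumptions. Every name and number below is invented for illustration:

```python
# A tiny two-bracket income-tax model. The thresholds and rates are made up;
# in an interactive article each one would be a slider or input field.

def tax_owed(income: float, threshold: float = 40_000.0,
             low_rate: float = 0.20, high_rate: float = 0.40) -> float:
    """Marginal two-bracket tax: low_rate below the threshold, high_rate above."""
    below = min(income, threshold)
    above = max(income - threshold, 0.0)
    return below * low_rate + above * high_rate

# The article's default assumptions...
baseline = tax_owed(60_000.0)
# ...versus a skeptical reader's own numbers.
readers_view = tax_owed(60_000.0, threshold=50_000.0, high_rate=0.35)
```

The article's charts would then be functions of the reader's inputs rather than of one fixed scenario.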
Sorry to horrify anyone but we actually do this at work (mechanical engineering company) - JavaScript calculated component dimensions as form fields based on user input (e.g. pressure or load rating) overlaid on technical drawings.
The reason it's done in PDF is that a lot of our technical documentation is spat out in PDF format (generated from CAD - SolidWorks).
There are other options like Traceparts or setting up a variable input SolidWorks model to generate loads of static outputs, if you have the time and money.
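For illustration, the kind of sizing calculation such a form field might run, sketched here in Python rather than Acrobat JavaScript, using Barlow's formula for thin-walled pipe, t = P*D/(2*S). The numbers below are made up, not from any real drawing:

```python
# Minimum pipe wall thickness via Barlow's formula: t = P * D / (2 * S).
# Illustrative only; the units and values are invented for this sketch.

def wall_thickness(pressure_mpa: float, outer_diameter_mm: float,
                   allowable_stress_mpa: float) -> float:
    """Minimum wall thickness (mm) for a given design pressure."""
    return pressure_mpa * outer_diameter_mm / (2 * allowable_stress_mpa)

# e.g. 10 MPa design pressure, 114.3 mm OD pipe, 138 MPa allowable stress
t = wall_thickness(10.0, 114.3, 138.0)
```

In the PDF version, the same arithmetic would live in a field's calculation script and update as the user types a pressure rating.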
I think we have a lot of things like this now: Jupyter, Matlab, etc., for creating engineer-centric, general-purpose interactive documents. We have labor-heavy ways to make end-user-focused ones in the browser. We have spreadsheets.
But-- wouldn't it be cool if there were a way for ordinary people to create interactive content for exploring data in a rich, intuitive way?
Because it's not installed, and they don't want to and shouldn't have to learn something new when there's something not new already at hand which suffices.
If you ever find yourself saying something like, "people can just do X" and wondering why they don't, turn it around and ask yourself, "why can't I just do Y?" In this case, that would be: "Why can't I just make my notebooks work in the viewers everyone has already agreed on (i.e. the WHATWG/W3C hypertext system, i.e. the web browser) instead of asking them to futz around with installing and learning Jupyter?" The moment you start making excuses for why not is the moment you should be able to understand another person's reasons for "why not Jupyter".
My feelings about this aspect of Jupyter are two-fold:
1. On the creation side, it requires someone to be comfortable with Python (or another Jupyter language) to some degree. Right now, programming is still considered a career skill rather than something "ordinary people" should be expected to know. Perhaps layering a graphical programming interface on top of this, an approach UE4 seems to have had some success with in their Blueprint system, would get "ordinary people" over the mental hurdle of being intimidated by code-as-text. Just look at the mental gymnastics people will engage in with Excel while thinking it's not programming.
I see this as more of a social problem than a technical one, at any rate.
2. Once you build an interactive Jupyter document (especially if you use interactive widgets), it's not necessarily easy to share in its original state without requiring that the reader also have a Jupyter environment set up or access to a server running Jupyter. I would like to be able to share the document in a way that someone can access offline without needing to set up the whole environment. Maybe an "Adobe Reader"-like application for Jupyter notebooks that "ordinary people" can install with a click?
re #1: I think it's a technical problem too. I'm technically competent and enjoy programming, but I'd still like it if sometimes I could ask questions and get answers with less or no code. BI platforms are a pain in the ass for many reasons, but they often make it very easy to ask simple questions and organize the data in simple ways. A document that could do similar things without all the scaffolding would be cool.
#2-- Or just use the browser. It's capable enough, even if large datasets are somewhat problematic. The hard part is the UI and identifying the correct subset of functionality to surface.
MHTML is a neat format; it's unfortunate it never gained much steam. I think it could have been more popular if web browsers had defaulted to it when saving pages, rather than this weird html + _files/ directory (which on Windows is mysteriously linked so that when you delete one, you delete the other - no idea how they do that!).
What I've read of the EPUB spec is also pretty disappointing. Seeing as it's a packaged format anyway, instead of going the route of a zip file + a bunch of HTML inside + a specific layout, we could have had a subset of HTML in .mhtml.gz with, say, metadata in a <script type="application/json" id="x-epub-metadata">. And then, guess what, web browsers would have been able to read it natively…
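A sketch of that idea in Python, stdlib only. The `x-epub-metadata` script-tag convention is the hypothetical one from the comment above, not any real EPUB mechanism:

```python
# A single-file "book": HTML with its metadata embedded as JSON in a
# <script> block, then gzipped. Titles and ids here are invented.
import gzip
import json

metadata = {"title": "An Example Book", "language": "en"}

page = f"""<!DOCTYPE html>
<html>
<head>
<script type="application/json" id="x-epub-metadata">{json.dumps(metadata)}</script>
<title>{metadata['title']}</title>
</head>
<body><p>Chapter one…</p></body>
</html>"""

blob = gzip.compress(page.encode("utf-8"))          # the whole "book file"
restored = gzip.decompress(blob).decode("utf-8")    # any reader can unzip it
```

A browser that understood the format would gunzip, render the HTML, and read the metadata straight out of the script tag.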
> (which on Windows is mysteriously linked so that when you delete one, you delete the other - no idea how they do that!)
Probably just some special code in Windows Explorer watching for the combo of an .htm(l) file plus a similarly-named folder – via the command line I can delete just the HTML file or just the folder without problems.
I've been doing a lot of research that applies here. The answer comes down to a few things:
1. using vector graphics wherever possible and then encoding it as SVG
2. if bitmap graphics are absolutely required and they can be procedurally generated, then do that
3. if large photographic data, video, or any other kind of data is required that can't be handled with the above steps, then separate that data set as you normally would using file system directories, place the data set subtree into a ZIP archive, write your code so it references items by file paths relative to the ZIP root, and put your page into the root of the ZIP file too, e.g. as index.html. Your readers and reviewers follow along by using their system's native ZIP support to explore the archive, locating index.html, and double-clicking it; index.html then opens with an "open dataset" button that you use to feed in its own parent ZIP archive
The last part might sound complicated, but it's not much different from asking someone to use MS Office or VS Code or an IDE to open a file/project. (It's just that instead of requiring them to already have that IDE installed, you're giving them the IDE they need at the same time as the document/dataset they're actually interested in.)
These approaches are robust enough that they're very unlikely to be broken by future browser changes. It's not that the tech is lacking right now, it's that human habits are lagging behind and we haven't yet established this as a cultural norm/protocol/expectation.
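A minimal sketch of step 3's packaging convention, using Python's stdlib zipfile; the file names and dataset contents are invented for illustration:

```python
# One ZIP with index.html at the root and the data set under a
# subdirectory, referenced by paths relative to the ZIP root.
import io
import zipfile

index_html = b"""<!DOCTYPE html>
<html><body>
<h1>Paper</h1>
<!-- the page's code loads items relative to the ZIP root -->
<img src="data/figures/fig1.svg">
</body></html>"""

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as z:
    z.writestr("index.html", index_html)
    z.writestr("data/figures/fig1.svg",
               b'<svg xmlns="http://www.w3.org/2000/svg"></svg>')
    z.writestr("data/measurements.csv", b"t,value\n0,1.0\n1,1.3\n")

# A reader's native ZIP tooling sees exactly this layout:
with zipfile.ZipFile(buf) as z:
    names = z.namelist()
```

The reader extracts (or browses) the archive, double-clicks index.html, and feeds the parent ZIP back in via the page's "open dataset" button.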
There are also other situations where the data is neither procedurally generated nor large enough† to warrant this kind of treatment: photographs, video, (non-MIDI) sound…
†IMHO, as long as your document doesn't cross 10 MB, you shouldn't have to separate out the data…
I don't understand your comment. It sounds like an argument against a process for manually creating these kinds of files, which is not at all what my comment was about. It was about accessibility, real-world engineering, and describing a file format/packaging convention.
The packaging convention I described is similar to the container formats used and created by MS Office apps. The difference is that DOCX, XLSX, etc. rely on XML, whereas HTML can be used without requiring a separate proprietary app. People create and exchange those files every day (even for things as trivial as a single-page flyer) without knowing or caring whether it "warrants this kind of treatment". Worrying about a purported edge case for <10 MB(?) of data sounds like an imaginary concern.
> Embedding a MP4 can work great, but your text editor will likely hate it.
Well, LibreOffice Writer deals with (multiple, 100 KB < size < 10 MB) MP4s just fine. It's when the ODT is converted to PDF that most(?) PDF readers seem unable to play those MP4s properly.