
I think this is the Alan Kay future of computing. Right now we're in this weird hybrid state where we still work with digital documents primarily using the physical paper interface.

Imagine digital academic "papers" in STEM fields that natively ran the simulations the paper describes. Jupyter sort of delivers that, but it still feels like early days for interactive digital-first documents (or, in the phrase credited to Steve Jobs, "bicycles for the mind").




Why compute twice? It's a waste of resources. Some simulations also have serious hardware requirements that might not even be possible to meet locally.


While that's a good point, at the moment the balance is shifted much more towards dead media than towards wasted resources. At best, the document doesn't get as much engagement as it could. At worst you get non-reproducible research papers, where you're lucky if you can even find the code openly available and get it to compile, let alone get the same results.

And sure, some simulations are very heavy, but they are the exception. It's also possible to have the best of both worlds and include both a live simulation and a static snapshot.


> At worst you get non-reproducible research papers

Ability to rerun programs is great, but we should be careful to remember that it's a different thing than reproducibility.


Often it’s the first step to reproducibility though. An enormous amount of scientific effort goes into figuring out how a researcher did something they published.


This basically adds another whole project on top of the original project.

Imagine trying to get some paper's 2001-era JavaScript running today, for example. Now apply that to every generation of technical development.

There are always standards, of course, but we’ve seen those go sideways enough times to make one cringe at the thought of ‘dynamic papers’ via some new medium.

It's the kind of thing that sounds amazing on the surface, until you remember the sort of crazy IT depts that thousands of universities run, and you forget the whole thing.


> Often it’s the first step to reproducibility though.

It shouldn't be! Reproduction needs to involve the interaction of human brain meats with a human-level description of the solution. This is how we make sure that people aren't talking about something different from what was actually done, and how we make sure our conclusions are robust against the things we've failed to specify.

Imagine saying the same thing for physics: I start replication by running a time machine and using the same apparatus as the original experiment under the same conditions. Impracticality aside, this would be potentially useful to suss out fraud and certain kinds of errors, but what successful replication tells us is manifestly less powerful than successful replication on a new apparatus, in a new location, at a new time, with new values for everything we've failed to control.


Today was the first time I encountered a paper with a Docker image. Fantastic to be able to try it out with no effort.

I suppose this only works in a few fields though.


Not everything is resource constrained, though. Imagine being able to easily make interactive content that illustrates what you're trying to convey and allows the user to "play with it."

For things that are heavily resource constrained, it still could be a boon to have interactive access to the data that comes out of it.


Even if it's not practical to re-run all of the computation, in many cases it would be nice to have the output data stored in the document in a form where you can interact with it rather than just having static pixels.


It’s possible to also include the results, so no dilemma there. (I think current notebook formats already do this.)


Even for non-academic reporting: imagine if instead of 'dead' news articles about some tax reform, or climate change, or whatever, you had an interactive model you could play with (and, for example, plug in your own numbers if you disagree with some of the inputs).


Sorry to horrify anyone, but we actually do this at work (a mechanical engineering company): JavaScript-calculated component dimensions shown as form fields based on user input (e.g. pressure or load rating), overlaid on technical drawings.

The reason it's done in PDF is that a lot of our technical documentation is spat out in PDF format (generated from CAD - SolidWorks).

There are other options like Traceparts or setting up a variable input SolidWorks model to generate loads of static outputs, if you have the time and money.


Tons of articles in NYT, WaPo, FiveThirtyEight, and ProPublica have these. ProPublica also open-sources all their data and code on GitHub.


Good news: the software you describe has existed since 1985.


I think now we have a lot of things like this-- we have Jupyter, Matlab, etc, to create engineer-centric general purpose interactive documents. We have labor-heavy ways to make end-user focused ones in the browser. We have spreadsheets.

But-- wouldn't it be cool if there was a way ordinary people could create interactive content to interact with data in a rich, intuitive way?


Why can’t ordinary people use Jupyter? Or put another way, what’s missing from Jupyter that would get ordinary people to use it?


> Why can’t ordinary people use Jupyter?

Because it's not installed, and they don't want to and shouldn't have to learn something new when there's something not new already at hand which suffices.

If you ever find yourself saying something like, "people can just do X" and wondering why they don't, turn it around and ask yourself, "why can't I just do Y?" In this case, that would be, "Why can't I just make my notebooks work in the viewers that everyone has already agreed on using (i.e. the WHATWG/W3C hypertext system, i.e. the web browser) instead of asking them to futz around with installing and learning Jupyter?". When you start making excuses for why not, that's the moment you should be able to understand another person's reasons for not using Jupyter.


My feelings about this aspect of Jupyter are two-fold:

1. On the creation side, it requires someone to be comfortable with Python (or another Jupyter language) to some degree. Right now, programming is still considered a career skill rather than something "ordinary people" should be expected to know. Perhaps layering a graphical programming interface on top of this, which UE4 seems to have had some success with via its Blueprint system, would get "ordinary people" over the mental hurdle of being intimidated by code-as-text. Just look at the mental gymnastics people will engage in with Excel while thinking it's not programming.

I see this as more of a social problem than a technical one, at any rate.

2. Once you build an interactive Jupyter document (especially if you use interactive widgets), it's not necessarily easy to share in its original state without requiring the reader to also have a Jupyter environment set up or to access a server running Jupyter. I would like to be able to share the document in a way that can be accessed offline, without the recipient needing to set up the whole environment. Maybe an "Adobe Reader"-like application for Jupyter notebooks that "ordinary people" can just install with a click?


re #1: I think it's a technical problem too. I'm technically competent and enjoy programming, but I'd still like it if sometimes I could ask questions and get answers with less or no code. BI platforms are a pain in the ass for many reasons, but they often make it very easy to ask simple questions and organize the data in simple ways. A document that could do similar things without all the scaffolding would be cool.

#2-- Or just use the browser. It's capable enough, even if large datasets are somewhat problematic. The hard thing is the UI and identifying what the correct subset of functionality to surface is.


How can I send a Jupyter page as a standalone, offline document?


Matlab doesn't even have proper text (Unicode) support… (And Octave even less so.)


OK, I'll byte (pun intended): which software are you referring to here?


Sounds like a spreadsheet.


This already exists, with a focus on machine learning: https://distill.pub/


I would already settle for non-obsolete animation support:

- GIF is obsolete (~100x heavier than MP4 in my use-case, so out of the question)

- MP4 has poor support in PDF readers

(- Besides, PDF is not appropriate for electronic documents.)

- EPUB doesn't seem to support MP4 at all

(- EPUB does support PNG, not sure about APNG, will have to try it out...)

- MHTML=EML support has been dropped from browsers, which is completely baffling to me. There are alternatives like SingleFile, but they feel like dirty hacks: https://addons.mozilla.org/en-US/firefox/addon/single-file/

- And what future is there for AV1 support?


MHTML is a neat format, it's unfortunate it never got much steam. I think it could have been more popular if web browsers had defaulted to it when saving pages, rather than this weird html + _files/ directory (which on Windows is mysteriously linked so that when you delete one, you delete the other - no idea how they do that!).

What I've read of EPUB is also pretty disappointing. Seeing as it's a compiled format, once again: instead of going the zipfile-with-a-bunch-of-HTML-inside-plus-specific-layout route, we could have had a subset of HTML in .mhtml.gz with, say, metadata in a <script type="application/json" id="x-epub-metadata">. And then, guess what, web browsers would have been able to read it natively…
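For what it's worth, the layout EPUB mandates is indeed only a little more than a ZIP with a marker file and a pointer to the package document. Here's a minimal sketch in Python; the chapter and package file names are illustrative, but the uncompressed-first `mimetype` entry and `META-INF/container.xml` requirements come from the EPUB OCF spec:

```python
import zipfile


def make_minimal_epub(path: str) -> None:
    """Sketch of the EPUB/OCF container layout: a ZIP whose first entry is
    an uncompressed 'mimetype' file, plus META-INF/container.xml pointing
    at the package document. Content files are placeholders."""
    with zipfile.ZipFile(path, "w") as zf:
        # Per the OCF spec, 'mimetype' must come first and be stored
        # uncompressed, so readers can sniff it at a fixed offset.
        zf.writestr(zipfile.ZipInfo("mimetype"), "application/epub+zip",
                    compress_type=zipfile.ZIP_STORED)
        zf.writestr("META-INF/container.xml", """<?xml version="1.0"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>""")
        zf.writestr("content.opf", "<!-- package metadata would go here -->")
        zf.writestr("chapter1.xhtml", "<html><body><p>Hello</p></body></html>")
```

So the container really is "zipfile + HTML inside + specific layout" — the disappointment above is that browsers never learned to open it directly.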


> (which on Windows is mysteriously linked so that when you delete one, you delete the other - no idea how they do that!)

Probably just some special code in Windows Explorer watching for the combo of an .htm(l) file plus a similarly-named folder – via the command line I can delete just the HTML file or the folder separately without problems.


I’ve been surprised by what you can accomplish with data URIs. Embedding an MP4 can work great, but your text editor will likely hate it.
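A data URI is just the file's bytes, base64-encoded, behind a MIME-type prefix, so a self-contained page can be assembled mechanically. A small Python sketch (the clip.mp4 in the usage comment is hypothetical):

```python
import base64


def to_data_uri(data: bytes, mime: str = "video/mp4") -> str:
    """Encode raw bytes as a data: URI usable in an HTML src attribute."""
    return f"data:{mime};base64,{base64.b64encode(data).decode('ascii')}"


# Hypothetical usage: inline a clip directly into a single-file page.
# with open("clip.mp4", "rb") as f:
#     html = f'<video controls src="{to_data_uri(f.read())}"></video>'
```

Note that base64 inflates the payload by roughly a third, and the result lands on one enormous line — which is exactly why text editors hate these files.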


I've been doing a lot of research that applies here. The answer comes down to a few things:

1. using vector graphics wherever possible and then encoding it as SVG

2. if bitmap graphics are absolutely required and they can be procedurally generated, then do that

3. if large photographic data, video, or any other kind of data is required that can't be handled by the above steps, then separate that data set as you normally would using file system directories, place the data set subtree into a ZIP archive, and write your code so it references items by file paths relative to the ZIP. Then put your page into the root of the ZIP file too, e.g. as index.html. Your readers and reviewers follow along by using their system's native ZIP support to explore the archive, locate index.html, and double-click it; index.html opens up with an "open dataset" button which they use to feed in its own parent ZIP archive

The last part might sound complicated, but it's not much different from asking someone to use MS Office or VS Code or an IDE to open a file/project. (It's just that instead of requiring them to already have that IDE installed, you're giving them the IDE they need at the same time that they're getting the document/dataset they're actually interested in.)

These approaches are robust enough that they're very unlikely to be broken by future browser changes. It's not that the tech is lacking right now, it's that human habits are lagging behind and we haven't yet established this as a cultural norm/protocol/expectation.
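The packaging step in point 3 can be sketched in a few lines of Python (directory and file names here are hypothetical):

```python
import zipfile
from pathlib import Path


def package_paper(dataset_dir: str, index_html: str, out_zip: str) -> None:
    """Put the interactive page at the archive root and the data set
    subtree under data/, so index.html can reference items by relative
    paths like 'data/run1/results.csv'."""
    root = Path(dataset_dir)
    with zipfile.ZipFile(out_zip, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.write(index_html, arcname="index.html")
        for f in sorted(root.rglob("*")):
            if f.is_file():
                # Preserve the subtree layout under a data/ prefix.
                zf.write(f, arcname="data/" + f.relative_to(root).as_posix())
```

The reader then opens the ZIP with their OS's built-in support, double-clicks index.html, and points its "open dataset" button back at the archive itself.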


There are also other situations where the data is neither procedurally generated nor large enough† to warrant this kind of treatment: photographs, video, (non-MIDI) sound…

†IMHO, as long as your document doesn't cross 10 MB, you shouldn't have to separate out the data…


I don't understand your comment. It sounds like an argument against a process for manually creating these kinds of files, which is not at all what my comment was about. It was about accessibility, real-world engineering, and describing a file format/packaging convention.

The packaging convention I described is similar to the container formats used and created by MS Office apps. The difference is that DOCX, XLSX, etc. rely on XML, whereas HTML can be used without requiring a separate proprietary app. People create and exchange those files every day (even for things as trivial as a single-page flyer) without knowing or caring about whether it "warrants this kind of treatment". Worrying about a purported edge case for <10 MB(?) of data sounds like an imaginary concern.


My bad, I had indeed misunderstood what you were saying.


> Embedding a MP4 can work great, but your text editor will likely hate it.

Well, LibreOffice Writer deals with multiple MP4s (100 KB < size < 10 MB) just fine. It's when the ODT is converted to PDF that most(?) PDF readers seem to be unable to play those MP4s properly.


> I’ve been surprised with what you can accomplish with data URIs.

Yeah, if I'm not mistaken, this is what SingleFile uses?



