Show HN: How I Wish Scientific Papers Were Displayed (egocodedinsol.github.io)
64 points by egocodedinsol on July 14, 2013 | 62 comments



Stepping into an area about which I have strong opinions. I appreciate your effort, I just think you're way off base.

Physical Review Letters in journal (aka pdf) format are ideal. Nothing is especially stylish, but it's a simple, easy to read format that highlights figures as a way to convey the meaning of the work. I much prefer the PDF to the online version.

Some criticisms of your site (all in the spirit of "I hope you take this as a challenge to do better or maybe just realize there's not a lot wrong in the first place"):

- your color contrast hurts my eyes (light grey on blue...)
- the graphs are hard to understand (no axis labels on some; no captions to explain what's going on, and captions are what most people read first anyway; lack of contrast)
- typeset equations are ugly (can't beat LaTeX)
- the drastic font size difference between code and text is jarring
- no authors listed, so I can instantly disregard your work because I've learned not to trust your methods (I kid, I kid... kind of)
- the draggable figures did nothing for me
- your column width was too narrow. Aim for 10-14 words per line.
- because I have to scroll, it's hard to quickly scan for the parts I actually want to read. When you read 10-20 papers a day, this is important.

One thing I really like: click the Fig. # and it scrolls to be beside the text. That is a nice feature that online journals can learn from.

In short, I think you're trying to solve a non-existent (or at least minimally existent) problem. LaTeX is beautiful, but perhaps you don't appreciate it. Journals in print (except for Science and Nature which blend all their articles together) generally look really nice. The PDFs of individual articles are the thing to read online, and I think your solution is a long way from any sort of reasonable online reading experience.


As a grad student who reads lots of papers (with a typical length of 30+ pages), I too have a strong opinion on this. I think PDFs do not serve the needs of the day and such efforts are a step in the right direction.

PDFs are okay if you plan to print out the papers to read and are just about bearable if you read the papers on a large monitor. They are quite unfriendly for tablets or smaller laptops -- starting with the multiple vertical columns, which are taller than the screen height. The font size is dictated by the aim of reducing printing costs rather than reading convenience. Just for starters, a "responsive" reader with the ability to resize fonts would make a big difference. Notes/references which appear on the side (rather than the bottom of the page) or links to interactive data (which scientists will have to figure out a good way of using) would drastically improve the reading experience and the efficacy of this mode of scholarly communication.


You mention printing. This is the singular reason why I prefer pdfs myself. PDFs are viewable in the browser and resizeable. They are operating system agnostic. Most importantly, they preserve the formatting that the original author intended.

This is not to say it can't be all done natively in a browser, with resizing based on screen or print resolution. Just that most paper authors will not be savvy enough to do it right. It's enough for them to learn LaTeX properly (the only option for typesetting equations) and I doubt tooling for doing what you suggest will come any time soon.


> PDFs are viewable in the browser and resizeable. They are operating system agnostic. Most importantly, they preserve the formatting that the original author intended.

So are most webpages.

> Just that most paper authors will not be savvy enough to do it right. [...]

I believe this can be done gracefully in a manner where the authors just supply a source file and the software chain will generate the requisite output formats (much like pdflatex or Pandoc do the job today). Scientists should NOT have to muck around with HTML/CSS/JS/etc else the solution would be DOA.


> So are most webpages

View hacker news on multiple browsers and operating systems. It won't look the same. My gmail looks different too. Fonts are different which means the numbers of words per line are different.

> I believe this can be done gracefully in a manner where the authors just supply a source file and the software chain will generate the requisite output formats (much like pdflatex or Pandoc do the job today). Scientists should NOT have to muck around with HTML/CSS/JS/etc else the solution would be DOA.

Agreed but this is much much easier said than done.


PDFs are resizable, but still terrible for viewing on a computer screen, which uses a landscape orientation, whereas papers are printed in portrait orientation. (Sure, some monitors can be rotated, but most can't. A tablet can show stuff in portrait, but not at readable font sizes.)


One of the downsides to the formats (including PDF) currently used in academia is that their layout and typography are fixed by the author, and optimized for printing. You can't reflow the text for the screen unless you have the original source document.

For example, most papers use multiple columns. This optimizes for readability when printed in A4/Letter, but on a computer screen it's much harder to scan and read pages this way, compared to a strictly linear layout -- especially when the text is interspersed with graphs and tables.

Another example: Paragraphs are usually signaled with a line break and an indent. Papers seldom use any spacing between paragraphs. I think indented paragraphs work well for fiction, less so for academic texts.

The papers mentioned in this thread by dfc are good examples of those two problems.


Well, have you tried, or do you just assume that you can't reflow the text? ;)

I read academic papers well enough on a Nexus 7 and I assure you it's not much more annoying than reading them in print. Search and zoom in for those [scale=0.2] figures come in handy too.


Your criticisms are of the submission's design, not the idea. I don't think that this submission fully emphasizes the advantages of a dynamic, interactive document as a medium for conveying ideas. I think this does it much, much better:

http://worrydream.com/LadderOfAbstraction/

https://news.ycombinator.com/item?id=3099595

see also: Khan Academy


Having been offered a part-time job to typeset LaTeX, I do really like it :) I used MathJax, which was the best math typesetting I could find for the web.

Without disrespecting your criticisms at all, is there any sort of dynamism which would make you want to read something other than the pdf?


If anyone is left wanting an example of the PDFs from Physical Review Letters, this is my best attempt at providing one:

http://prl.aps.org/pdf/PRL/v111/i1/e012001

or

http://prl.aps.org/pdf/PRL/v105/i25/e252302


Also as someone who has strong opinions, I concur that PDFs are great, especially for archiving purposes. As a distillation of important information, I think they're ideal.

That said I think we need something else in addition to PDFs, rather than to replace them. Dynamic graphs, for example, and the ability to export raw data. In my field, rotatable/zoomable 3D views for molecules and crystals and the like would be great, as well as being able to measure bond lengths and angles. This would be infinitely superior to a static 2D projection. But these should all be in addition to: I still want a PDF I can take with me and read offline if necessary.


> rotatable/zoomable 3D views for molecules and crystals and the like would be great

Adobe Reader does this (and maybe Evince?). http://help.adobe.com/en_US/reader/using/WSebddb957d123ebb01...


There are two problems with this:

1) There's not good broad support for 3D views in PDFs; I imagine it won't work on most things other than Adobe Reader. What does it fall back on if it doesn't work? etc. And in any case, it's a corruption of what the PDF document format was "meant" to be, which is a document format guaranteed to display the same on all devices.

2) The 3D data isn't useful. It's raw information, rather than the metadata. If I'm looking at a 3D view of a crystal, I may want to export it to manipulate the structure myself or use it in my own simulations, so I want to know what atoms are there and information about boundary conditions, symmetry and the like. I specifically don't want a list of vertices and edges that, say, make up the individual spheres that represent the atoms.
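
To make point 2 concrete, here is a minimal Python sketch of what a structure-level export might look like, as opposed to a mesh of vertices and edges. Everything in it is hypothetical: the `Atom` and `Crystal` classes, the field names, the units, and the example values are made up for illustration and are not taken from any real file format or PDF viewer.

```python
import json
import math
from dataclasses import dataclass, asdict
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Atom:
    element: str    # chemical symbol; "X" and "Y" below are placeholders
    position: Vec3  # Cartesian coordinates, assumed to be in angstroms


@dataclass
class Crystal:
    lattice: Tuple[Vec3, Vec3, Vec3]  # three lattice vectors (boundary conditions)
    space_group: str                  # symmetry label
    atoms: List[Atom]

    def to_json(self) -> str:
        # asdict() recurses into the nested Atom dataclasses;
        # tuples are serialized as JSON arrays.
        return json.dumps(asdict(self), indent=2)


def bond_length(a: Atom, b: Atom) -> float:
    # Plain Euclidean distance; ignores periodic images for simplicity.
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a.position, b.position)))


# Made-up example cell, not a real material.
example = Crystal(
    lattice=((10.0, 0.0, 0.0), (0.0, 10.0, 0.0), (0.0, 0.0, 10.0)),
    space_group="P1",
    atoms=[Atom("X", (0.0, 0.0, 0.0)), Atom("Y", (3.0, 4.0, 0.0))],
)

if __name__ == "__main__":
    print(example.to_json())
    print(bond_length(example.atoms[0], example.atoms[1]))
```

The point of the sketch is only that a reader can pull atoms, lattice and symmetry straight out of the export and measure things like bond lengths themselves, which a list of sphere vertices does not allow.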


I doubt anyone will move away from LaTeX, though I also like the one-click alignment of figure and text.

If LaTeX were to add a way to associate figures with text, then someone could write a converter from LaTeX to HTML to implement this idea.


Not to pile on but ..

.. and some lines were far too tight and some were far too loose, at least in my browser. I am 54 and I can't see some things that I expect a 24 year old can see, and I depend on interword space to be there.


Slightly off topic, but I wish there was a change to how scientific papers were written (and this comment doesn't pertain to this article - it's just a general comment). For some inexplicable reason, it sometimes feels like authors are challenging one another to write the most convoluted, unclear account of what is actually a fairly intuitive idea.

Clearly, this is not always the case, and obviously complexity doesn't lend itself to nice 10 word summaries (hat-tip to Jed Bartlet), but equally the choice of language used and the pace of arguments can make a huge difference. There is no need to use "big" words, when simple ones will do just fine. There is no need to use technical language in place of standard vocabulary where it adds nothing to further your ideas. Don't "hit the ground running" but introduce a new idea with an analogy, or a toy example.

Write papers so humans can understand them - often the science being portrayed is complicated enough, there's no need to further complicate your ideas to the point where you leave English speaking audiences trying to decipher the message, let alone what foreign readers must think.

The best papers are the papers you can read abstract-to-conclusion without stopping and asking yourself, "What does that mean?". This is all too rare an occurrence for me. /rant


The fundamental issue is that papers are often incremental, that is, they build upon some previous work, which is thereby referenced. The issue this creates is that the papers become hard to follow for anyone who is not already familiar with the previous work in the cited papers. Even if someone goes on to read the referenced papers, it often does not help as the problem is recursive.

This keeps on happening till the research becomes old and mature enough for someone to write a book on the topic which then is nearly self-contained (or assumes some prior background like mathematics, or undergrad-level courses, etc.). MOOCs are also helping the situation since the instructors create nearly self-contained courses which often also include very recent research results.

To solve the issue you mention, this basic notion may need to be broken, such that each paper explicitly explains its subject matter without relying too much on the references. Since much research does not go anywhere, this may be a premature optimization really though.

A thing that does bother me, and signifies the extent to which what you say is true, is that "talking" to the authors of these papers nearly always delivers those good insights within a few minutes that you would not get after spending an hour on the paper they wrote.


> A thing that does bother me, and signifies the extent to which what you say is true, is that "talking" to the authors of these papers nearly always delivers those good insights within a few minutes that you would not get after spending an hour on the paper they wrote.

This. A thousand times this.

I'm not talking about technical language - there are often (if not constantly) times when technical language is required to convey a precise meaning within the context of a paper. All I'm talking about here is the use of complexity where it's not required. It's a comment on the quality of writing.


With some exceptions, like the one Osmium describes below, I do not think the authors are using unneeded complexity purposefully. I think complexity still ends up happening because of (A) the reason I mentioned, (B) their not being as good at explaining [1], and (C) possibly the added effort required. Of these, reason A does not apply when talking to the authors in person.

I would like to hear if you have further explanations for what is behind this. For example, I read in [2] that reading is a much more complex operation for the human mind than seeing or hearing because, assuming you believe in biological evolution, reading and writing came to humans only about 6000 years ago while verbal language and hearing came a few million years back. I do sometimes find listening to Coursera videos more helpful than just reading the slides, but then feel that this is generally because the slides are often incomplete (they often do not really "say" it).

[1] Many people, especially engineers, often lack good presentation skills. I am extrapolating this to writing since that sounds natural, even though I have not directly experienced this.

[2] http://www.amazon.com/Designing-Mind-Simple-Understanding-In...


Scientific papers aren't written for you, the layman. They are written specifically for scientists in that particular field. I'm a PhD student, and never have I read a paper with wording that I would describe as "convoluted."


Lucky you! I agree, though, that the problem isn't as severe as the OP suggests, but it does exist. You see it especially in papers that aren't that great, where the author obfuscates what they've done because the reality isn't as impressive as they'd like (which is a shame, because that doesn't mean it's not useful research -- and if it was presented more concisely, it'd probably be more accessible by more people -- not everything needs to be groundbreaking!).


I am not the lay person, and I am not talking about communicating your findings to the public, which obviously requires an entirely different set of skills than communicating your work to fellow scientists (another problem, but one step at a time!). This is purely a comment on the typical language and style of writing used in papers - not on the content itself, per se, although the two are to a certain extent linked.

> I'm a PhD student, and never have I read a paper with wording that I would describe as "convoluted."

What field are you in, because I wonder if this makes a difference? I'm in a highly interdisciplinary field, so perhaps that's part of the problem.


You nailed it. If someone cares about laypeople understanding his research, he writes a blog about it. However, most research is of no interest to people who don't work on nearly exactly the same thing. And these people will not like to read the extra pieces needed to make something self-contained over and over again.


A great example of a web-friendly, interactive view of a scientific paper is the eLife Lens, example here: http://lens.elifesciences.org/#00380 . They announced the open source project back in June (blog post http://www.elifesciences.org/lens/). This is the direction that web-based paper views are moving toward, and some publishers (notably PLOS) already have quite nice interactive components.

PubMed, the government-run biomedical abstract database, also recently introduced its PubReader app for reading biomedical papers (http://www.ncbi.nlm.nih.gov/pmc/about/pubreader/). But my money's on eLife here — NCBI is better known for its databases than its user interface design (to say the least).

It would be great to see some innovation here (some publishers have absolutely awful web-based journal article views). But as the main format for paper publishing and dissemination, I think the PDF won't be going away anytime soon.


Thanks for the refs!

I really like plos, but one criticism I have of their interface is that the figure takes up the entire page, which destroys my easy reference to the rest of the content. e.g. at http://www.ploscompbiol.org/article/info%3Adoi%2F10.1371%2Fj... Figure 1 takes up the entire height of my screen on my laptop and then I have to scroll back to where it is in the text.


Thanks for the Lens link! I may have to try this with my next paper.


Lens reads JSON, and we convert from XML to JSON using this component: https://github.com/elifesciences/refract. We are working on updated documentation and a getting started guide. Our hope is to make the base system flexible enough to display any "component".


Any existing tools to help author NLM XML?


A great example in this vein:

http://worrydream.com/ScientificCommunicationAsSequentialArt...

You can mess with the sliders, etc.

Video @ http://vimeo.com/67076984


I love BV!

fwiw I considered using the Strogatz paper as an example for consistency (maybe it could become something of a standard for reimagining paper layouts), and use tangle or knockout, but I decided to get feedback sooner rather than later after reading a line in pg's essay today re: procrastination.


I like it!

Clicking on a link and having a figure appear solves a small but real problem.

In mathematical papers it'd be nice to be able to do the same with equation references --- click on the equation number, and the equation would magically appear in the margin. Ditto bibliography entries, theorem statements, definitions, and so on. I guess there are possible solutions other than having them appear in the margin, too.


Thanks! I was thinking about doing the same with bib refs since I don't like being sent to the bottom of the page like a footnote. I really like the idea of doing the same thing with theorems, defs, etc.

I'm a bit worried about what to do with column overload, though, if I add a col for X, a col for Y, etc. Perhaps a hover-text is better but that would obscure the content ...


I'm not sure I see why you'd want separate columns for X, Y, etc. Why not have one margin, and use it for theorems, definitions, etc?


My★ PDF viewer can do something similar to that:

http://skim-app.sourceforge.net/

If the PDF contains proper links, just CMD-click them to bring up the linked position.

★ - one I use, not authored


This is quite unreadable with the low contrast colors, and the columns are much too small. If you pick a random paper on the arxiv[1] and imagine how much vertical space it will take just to show the text, you'll see the problem.

[1] http://arxiv.org/list/hep-ex/new


I think you're right re: contrast, but I'm still conflicted about column size. On the one hand, a wider column gives better justified results, but on the other hand, it makes the sliding figure column more difficult on 13 in. screens. Which did you find more beneficial, wider columns, or sliding figures?


Contrast[1] is too low for me as well, and text should not be justified[2] IMO.

I too appreciate your effort, but keep in mind the differences between print and web layout design. Most scientific papers are published in PDF, using LaTeX. These two work great for print: correct hyphenation, good justified text, etc. Browsers are not that far along yet, unfortunately.

[1] http://contrastrebellion.com/

[2] http://www.rnib.org.uk/professionals/webaccessibility/articl...
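
For anyone who wants a number rather than an opinion on "too low", here is a rough Python sketch of the WCAG 2.0 contrast-ratio calculation. The function names are my own, and the two colors in the demo are hypothetical stand-ins for "light grey on blue", not the actual values used on the submitted page.

```python
from typing import Tuple

RGB = Tuple[int, int, int]


def _linearize(channel_8bit: int) -> float:
    # Convert an 8-bit sRGB channel to linear light, per the WCAG 2.0 definition.
    c = channel_8bit / 255.0
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb: RGB) -> float:
    r, g, b = (_linearize(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(fg: RGB, bg: RGB) -> float:
    # Ratio ranges from 1 (identical colors) to 21 (black on white).
    l1, l2 = relative_luminance(fg), relative_luminance(bg)
    lighter, darker = max(l1, l2), min(l1, l2)
    return (lighter + 0.05) / (darker + 0.05)


if __name__ == "__main__":
    # Hypothetical colors chosen only for illustration.
    light_grey = (200, 200, 200)
    blue = (70, 110, 160)
    print(round(contrast_ratio(light_grey, blue), 2))
    print(round(contrast_ratio((0, 0, 0), (255, 255, 255)), 2))
```

WCAG level AA asks for at least 4.5:1 for normal-size body text, so the result can be compared against that threshold when picking a color scheme.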



I came in to mention the tufte latex class. It is unfortunate that you posted the link with no context. If you are going to post a link with no context I would think that one of the two samples would have been a better choice:

sample handout: http://mirrors.ctan.org/macros/latex/contrib/tufte-latex/sam...

sample book: http://mirrors.ctan.org/macros/latex/contrib/tufte-latex/sam...

For those looking for some context:

"The Tufte-LATEX document classes define a style similar to the style Edward Tufte uses in his books and handouts. Tufte’s style is known for its extensive use of sidenotes, tight integration of graphics with text, and well-set typography. This document aims to be at once a demonstration of the features of the Tufte-LATEX document classes and a style guide to their use."


The layout is nice, but I found the text really difficult to read with the lack of contrast. I imagine it would be next to impossible for someone with a vision impairment.


Thanks. Did you like the auto scrolling of each column depending on what was clicked?

re: contrast, I've found black on white is too harsh for me (I like the 'lights out' setting on project euler for instance.) Do you prefer dark text on light bg, or light text on dark bg?


Everyone seems to have different preferences. Maybe you could provide a means to select from among several color schemes.

But I really wanted to say that this is a wonderful idea. I agree that there is a great deal of friction in flipping back and forth between text and figures plus text and references. Your solution is a great step in the direction of solving this.

EDIT: Just went back for another look and realized that you can move and resize the figures. Nifty! Now if only we could combine this kind of convenient online reading with decent typography - but that's a huge issue in itself.


Black on off-white works well and is fairly universal (like here on HN). Unless you know you're aiming for a particular crowd with a specific style (eg green-on-black to evoke hacker-geeks), stick with something close to fairly universal.


I like black on white, because a lot of the displays I use (e.g. my mobile phone) are too washed out to display more subtle differences in color.

Have you considered using a user stylesheet to override the default font colors in the browser?


Ah, thanks. I considered doing a lights on/off button like project euler, but decided not to. Part of it was that it seemed like additional complexity and I was worried it might overload the minimalist design, and part of it was that I'm not a javascript developer, so additional overhead is more of an issue than it would be, say, for an intern at the Khan Academy ;)

Ultimately, it would be great for someone to do that, but this is more proof of concept than a product per se.


Oh, User CSS is something you do in your browser to override what site authors do.

Here's the docs for Opera, although I think this is also a feature in Firefox (and maybe Chrome; I'm not as sure about that one).

http://www.opera.com/docs/usercss/


I don't generally read scientific papers, but I found it annoying that I had to click on the "Figure 1" in the text to see the figure, and that the space in the figure seemed useless until you hit the reference in the text.

Part of me wants parallax scroll, but I'm sure that would be a terrible fit unless you had the right number of diagrams and the references to diagrams were equally spaced out (as opposed to having two in the same line).

This also reminds me of good old <frame>s from the 90s...

Edit: I realized afterwards that you refer to some diagrams multiple times, such as Figure 4. I guess this is a step towards not having to flip back three pages in the PDF to get to that diagram, but I'm not sure it's a "300x improvement" that would cause scientists/academics to switch away from LaTeX (or use your new LaTeX module, if you went that way) any time soon.


I'm not sure what you mean re: click on "Figure 1" to see it. You could scroll back up, but I thought that was the equivalent of flipping back pages. In a lot of scientific papers I read the figure is not on the same page as the first introduction. I considered fixing them to one position, or placing them within the text like most people do, but this interrupted the flow for me.

re: "Edit" It's not a 300x improvement by any means. Definitely less cool than the research I get to do ;) That said, I would like to see things switch away from LaTeX as typesetting gets better on the web. I love LaTeX, and was even offered a part-time gig as a LaTeX editor for an economist that has 3 full-time typesetters (I'm not even kidding, he must take division of labor seriously). But it doesn't seem to allow for the same kind of freedom we have now as a result of newer media: it's optimized for print, even with addons like hypertext.


So I find that in order to see the diagram, I need to either scroll manually (O(n) linear search) or click on the reference in the text (makes my hand hurt).

I 100% agree with the problem you've identified, but I find your solution confusing.

Then again, I found LaTeX confusing when I first used it, so I guess it might just be me not being used to it.


You should make the source more discoverable, particularly on your 'homepage'.

Particularly on a site like this, but also in general, it doesn't take much more effort to write a pull request than it does to write a comment.

For example, modifying the colour scheme or layout would be a few lines changed in the css. Encourage people to give the most useful feedback they can - patches!

I should probably go and raise a PR implementing this now, but I've already spent so long writing this comment... ;)

[EDIT] Adding a Readme to the repository would help people understand how everything is organised as well (for people who want to look at the nuts and bolts), and a License file would clarify what, if any, modification of your work is allowed.


it's all there at github.com/egocodedinsol/egocodedinsol.github.io, or did you mean adding a direct link to it?

I'm not sure how much it's a software project so much as a proof of concept. If you did want it to be more of a project, what would you suggest doing?


Sorry I missed this earlier!

I do mean explicitly adding a link to the source from the site itself. Sure, you may not get anyone to contribute, but at least there are no excuses not to. The best feedback almost always comes in the form of a patch :)

If you wanted to make specific components more of a project, consider splitting them into their own repositories. If you have a branch 'gh-pages' it will be served under username.github.io/repository

If you know of existing issues or feature requests, make use of the Github issue tracker. This shows people that you are thinking about the future of the code, so they feel comfortable that if they contribute it won't be wasted effort. This also lets people know of an easy way to provide feedback that won't get lost into the aether.

Lastly, continue to invite discussion and ask for contributions. If somebody says "The contrast is too high" say to them "here is where the colour scheme is defined, what should the values be?"

Hope that all helps, I really like the ideas that you have. Innovation is always important!


To use: click 'text' to sync text col to reference of figure in text, multiple times if there are multiple references, drag a figure, resize a figure, click 'reset', click a figure reference to sync to the figure col.

I know there are a few editors that read this, I would love to see better interfaces for your journals ;).


what on earth is happening to the kerning? chrome/linux the line "but is better than the hand-waving I was doing." is awful (no space between words).

edit: word-spacing: -2px; wtf?


Narrow column + no hyphenation + justified = typesetting disaster, especially with the crude type handling in web browsers.
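
A toy way to see this, sketched in Python under heavy assumptions: monospaced characters, greedy line breaking, no hyphenation, and extra spaces handed out left to right. Browsers do not work exactly like this, and the function names are my own; it only illustrates why the gaps grow as the column narrows.

```python
import re
from typing import List


def justify(text: str, width: int) -> List[str]:
    """Greedy line fill, then pad inter-word gaps so each line is `width` wide."""
    lines: List[List[str]] = []
    current: List[str] = []
    for word in text.split():
        # Start a new line when adding the next word would overflow the column.
        if current and len(" ".join(current + [word])) > width:
            lines.append(current)
            current = []
        current.append(word)
    if current:
        lines.append(current)

    out: List[str] = []
    for i, words in enumerate(lines):
        # Last line and single-word lines are left ragged.
        if i == len(lines) - 1 or len(words) == 1:
            out.append(" ".join(words))
            continue
        gaps = len(words) - 1
        spaces = width - sum(len(w) for w in words)
        base, extra = divmod(spaces, gaps)
        padded = "".join(
            w + " " * (base + (1 if j < extra else 0))
            for j, w in enumerate(words[:-1])
        )
        out.append(padded + words[-1])
    return out


def widest_gap(lines: List[str]) -> int:
    # Longest run of spaces found in any line.
    return max((len(run) for line in lines for run in re.findall(r" +", line)), default=0)


if __name__ == "__main__":
    sample = "Narrow column plus no hyphenation plus justified text is a typesetting disaster"
    for width in (20, 40):
        result = justify(sample, width)
        print("\n".join(result))
        print("widest gap:", widest_gap(result))
```

With hyphenation, long words could be split to absorb the slack; without it, every bit of leftover width has to go into the spaces between words.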


The justified cols aren't great yet on web browsers and I was getting too much white space. I think I'll go back to left-align given your feedback, thanks. For what it's worth, the extent of my design knowledge is "Design for Hackers" so I don't claim to be an expert or anything.


I wish they simply hyperlinked their sources.


You guys should take a look at eLife sciences journal then... I think it is what you want...


Oh, why the color! I can't read anymore.


onfirefox22.0onwindowsthetextlookslikethis thanksbutnothanks


Odd, it doesn't do that for me on ff22windows. I originally messed with word spacing because it looked too sparse, but I'm going to switch it back to something safer. Thanksforthefeedback;)



