Hacker News new | past | comments | ask | show | jobs | submit login
How to Recalculate a Spreadsheet (lord.io)
298 points by todsacerdoti on Nov 9, 2020 | hide | past | favorite | 72 comments



>The few people with the problem “my Excel spreadsheet is too slow” have already written enough Excel formulas that migration to any other platform is impossible. Fortunately, I have no understanding of business, and so we’re going to look at further improvements anyway.

That's how you know you're reading a really solid tech piece and not some blogspam!


I had this line on my clipboard, ready to paste into this comment thread:

> Fortunately, I have no understanding of business, and so we’re going to look at further improvements anyway.

That line grabbed me the same way it did you. I think of Electron chat apps, and how JIRA pages take 15 seconds to load, all the while the UI is glitching and twitching and daring you to click early[0]. And the only reason it's allowed to continue is because very large and complicated processes have been built on top of them.

[0] (What I want to know is, how do these lazy-load UIs know I'm about to click before I do? Because the buttons seem to dodge my click with uncanny timing.)


Alright, we’re piling onto this comment. So let me complain about clicking a Google result, immediately realizing I want the result beneath, hitting back and clicking, just in time for a “more like this” panel to drop down below the first result.

Man, Google is so frustrating to use.


It’s really not so bad. I only do this about 300 times per day.


Surely Google has to know how bad this is. Why have they not changed it? It makes no sense to me. It is so frustrating how slowly these big companies move.


Because fixing this is not a new feature that would get the project manager a promotion. (I think that's the usual answer for the multitude of silly bugs while harebrained new "features" creep in.)


The same stupid behavior is what often gets people to click ads. Surely they could load the ads at the same time as the results, but no, they make them come in just slightly later, right as you're about to click a link.


I tried to hide it with stylus addon, but they keep changing the id.


Isn't that something they penalise other websites for?


Most annoying thing is even gmail does this to me all the time now. Enter the recipient, then click down to subject line.. but in that two-second delay gmail suggests new contacts, just in time for me to accidentally click on them.

Not to mention the whole UI taking forever to load, with the trash button seeming to be the last icon up.


I spent a few hours migrating my email away from Gmail to Fastmail. Probably had to update 100+ sites with my new email address.

I you ask me why I did this, I might grumble about Google becoming too powerful, and my desire to decouple my life from them. And that's true.

But the concrete thing that pushed me over the edge was the UI getting worse and worse and worse.

I'm sure there are a dozen great web mail services out there, but Fastmail is the one I landed on and it's been fantastic. But the moral of the story is that I might have stayed with "evil Google" longer if they also didn't screw up the UX.


> Probably had to update 100+ sites with my new email address

This is one of the main reasons I have my own domain. When I migrated to Fastmail, I just had to tell Fastmail that I wanted it to handle mail from the domain, set the MX records at my domain's nameserver to point to Fastmail, add a TXT record there for SPF, and add some CNAME records for DKIM. Fastmail gives instructions for all that here [1].

There was no need to update any sites with my email address because my address did not change. And there will be no need to update any sites if I ever decide to move from Fastmail to some other mail service, or to go back to running my own mail server.

Even simpler, you can use Fastmail as your DNS provider. Then it is just a matter of registering a domain, and telling your registrar that Fastmail is your DNS provider. Fastmail takes care of setting up the mail-related DNS records then. Here are there instructions for this approach [2].

The only serious things I use addresses that aren't @mydomain for are things that I might need to receive email from in order to resolve problems with @mydomain.

I strongly recommend this approach.

[1] https://www.fastmail.com/help/receive/domains-setup-mxonly.h...

[2] https://www.fastmail.com/help/receive/domains-setup-nsmx.htm...


I have my own domain and email host, but still ship it through Gmail because I like it. At least I know that I'm not tied to Gmail.


I'm in the middle of doing exactly this right now. I'm at the point where I have set up email forwarding from Gmail to Fastmail, but not switched any service accounts over to my new address. Even just this quarter-measure has improved the UX of email for me by roughly 573%

It's not strictly the web interface that is better/faster/etc either. I get notifications from the Fastmail app for forwarded messages ~30-60 seconds before the Gmail app knows about the original.


What’s wrong with labels and tabs and important and starred and blah blah blah?


I don't suppose Fastmail has labels, does it? Labels are the one thing that's keeping me from switching from Gmail full-time.


I was going to reply "no", but I checked the settings just now and it appears it does!

This is from the settings:

    Labels
    ======
    Organize messages with:
    [ ] Folders: file each message into a single folder
    [ ] Labels: add multiple labels to each message
    Switch methods at any time. Folders will become labels and vice versa.
I don't use labels, so I don't know if it's the same as how Gmail does it. But multiple labels per message is a thing it looks like.


What's wrong with the UX? I haven't used anything else in a while.


I haven't used Gmail for a couple years now, but when I decided to leave, it was slow. Loading the page took awhile, sifting through emails felt sluggish, etc.

Nothing like JIRA or other really slow sites, but just enough to be noticeable and annoying. Like I could "feel" the JS-heavy frontend doing more than it needs to in order to display a list of emails, or show a single one. Meanwhile it's nagging me on the left to add contacts to my GTalk, or Hangouts, or whatever they keep renaming that thing to.

The whole tabbed inbox thing didn't really help me. There are some "promotional" emails that I want to see in my main inbox, but most I don't. But with the tab system, I still have to check it all the time to avoid missing the ones I care about. So instead of decluttering things, I now have more places to click on to see what I have.

There was something about them disabling Exchange support a few years back. This made emails arrive much slower in my inbox on my iOS native client. I can't remember the specifics of that, or if IMAP is now functionally equivalent, but I don't care anymore.

And I just really would like to see my inbox and the currently-selected message at the same time. AFAIK, you have to choose between the message list or a single message; can't see both.

-------

So, after typing all that, I just logged into it again to see if my complaints held up. It seems faster now. I discovered I can disable the tabbed inbox (probably always could). I can enable a right reading pane (probably always could), and change the theme to a dark background (again...).

I got it almost looking like my Fastmail setup. But they're 2 years too late, and the font rendering is kinda crappy on my non-HiDPI external display anyway

So I guess now the only reason I don't use Gmail anymore is because Google.


> just in time for me to accidentally click on them.

And somewhere a product manager earns their bonus. Engagement! KPIs! metrics! Omg people really love this feature, they keep clicking on it <3


Can you enter the subject first, then to recipient?


Zendesk has a horrible UI for that reason. You open the dropdown menu to choose the operation to be applied, try click "pending" and inadvertently end up pressing "solved" when the UI changes just in time. Infuriating. I've been trained to patiently wait until the flickering ends, which is such a shame in 2020.


"Until the Flickering Ends" is going to be the name of my autobiography


> JIRA pages take 15 seconds to load, all the while the UI is glitching and twitching

In the biz we call that "an improved user experience"


The “Build Systems a la Carte” paper famously compares Excel with build systems such as Make and Bazel. Recalculating a spreadsheet efficiently is the same problem as compiling source code.

https://www.microsoft.com/en-us/research/uploads/prod/2018/0...


Yes, or (in this case?) interpreting it. I think of spreadsheets as visual functional programming environments. The humble spreadsheet has an interesting kinship with the most trendy, and, for some (me!), difficult-to-grok programming paradigm.


YES! This is exactly how I see it. I feel like I'm in this weird intersection of people who've spent A LOT of time in Excel and also significant time in programming to feel this way about spreadsheets but I don't know who else to talk to..

I tried making the same point on reddit the other day

https://www.reddit.com/r/excel/comments/jpb2ud/edit_a_spread...


The soon-to-come LET function agrees :)

https://support.microsoft.com/en-us/office/let-function-3484...

Disclaimer: I work at Microsoft


That's awesome. So you basically allow implicit creation of off-spreadsheet named cells (sub-cells, in a sense, as they're only locally available). Very useful.


See also "Spreadsheets: Functional Programming for the Masses" by Simon Peyton-Jones et al.


This is an interesting paper - thanks for linking!


Topological sorting has many applications. I built this a couple years ago https://github.com/atlassubbed/atlas-relax.

It's a DAG-computing framework in JS that allows you to listen to updates in a compute graph. It's a set of legos that I used to build both a MobX clone (https://github.com/atlassubbed/atlas-munchlax) and a React clone (https://github.com/atlassubbed/atlas-mini-dom). React and MobX are actually the same thing mathematically: reactive DAGs, so they can both be built using the exact same abstraction.


FWIW, I found it weird that the article didn't mention "DAG" once...


If you're in TypeScript/JavaScript world, check out HyperFormula [1]. There's a summarized design document in the documentation that explains the AST, dependency graph, formula evaluation and lazy CRUD operations [2].

Disclaimer, I'm one of the founders.

[1] https://github.com/handsontable/hyperformula [2] https://handsontable.github.io/hyperformula/guide/key-concep...


Looks nice. Do you have plans to allow it to implement nicely with grid frameworks such as Ag Grid, Kendo etc?


We concentrate on making it a general purpose calculation engine and bring value by adding more formula functions (now there are about ~200 of them).

And we will integrate it with our other product, Handsontable.

It should be fairly easy to make any integration. If not, then we will be happy to help.

So far I have mostly seen people integrate it with vanilla JS and Node. Any framework should be possible, as long as the data CRUD operations are performed through the HyperFormula API.


The dependency graph is why you should always prefer INDEX over OFFSET, unless you really can't avoid it:

https://docs.microsoft.com/en-us/office/client-developer/exc...

Every instance of OFFSET in your sheet is recalculated after each cell change. I learnt that the hard way.


Article doesn't explain the issue, but OFFSET lets you access cells that aren't referenced in the formula.

For example, OFFSET(B2,1,1,1,1) is a live reference to cell C3, which means you can use functions like COLUMN to investigate the range. C3 shows up nowhere in the formula, so there's no non-volatile way to implement it.

The first argument to INDEX is a "sqref" (cell, range, or set of ranges) and INDEX will error if you try to reference a cell outside of the sqref, so use of INDEX doesn't break the obvious dependency structure


or worse, INDIRECT, which allows you to arbitrarily refer to any cell anywhere with a string.


"Volatile functions are always dirty."


I found this: https://www.geeksforgeeks.org/topological-sorting/ very helpful when I was trying to figure out how to solve a toy spreadsheet I was working on. Includes a couple alternative implementations in a variety of languages.


The issued patents are published by the patent office so you can find the whole long list of them and see exactly what they say. Of course, you couldn't actually read that whole list as there are too many of them. In the US, there are hundreds of thousands of software patents.

There is no way you can keep track of what they are all about. You would have to try to search for relevant ones. Some people say that should be easy in these modern days of computers. You could search for key words and so-on. That one works to a certain extent. You will find some patents in the area. You won't necessarily find them all however. For instance, there was a software patent which may have expired by now on natural order recalculation in spread sheets. This means basically that when you make certain cells depend upon other cells, it always recalculates everything after the things it depends on, so that after one re-calculation, everything is up to date. The first spread sheets did their recalculation top-down, so if you made a cell depend on a cell lower down, and you had a few such steps, you had to recalculate several times to get the new values to propagate upwards. You were supposed to have things depend upon cells above them. Then someone realized why don't I do the recalculation so that everything gets recalculated after the things it depends upon? This algorithm is known as topological sorting. The first reference to it I could find was in 1963. The patent covered several dozen different ways you could implement topological sorting but you wouldn't have found this patent by searching for spreadsheet. You couldn't have found it by searching for natural order or topological sort. It didn't have any of those terms in it. In fact, it was described as a method of compiling formulas into object code. When I first saw it, I thought it was the wrong patent.

— Richard Stallman, 2002 (https://www.gnu.org/philosophy/software-patents)


USPTO is only obligated to search their own database for prior art. Others must pay to invalidate patents that are obvious next steps, just math (which is indeed unpatentable even in the US), infringing other (defensive) patents, or which are invalidateable due to published prior art.

It's hard to believe that topological sort (available in the GNU `tsort` and `make` apps, for example) was ever patentable.

Perhaps Wikipedia contains references to published prior art?

If they don't even look at everyone else's registered-as-unique ideas, how could they have been knowingly infringing?

Everyone wants their cut of Progress in the Useful Arts and Sciences. Nobody will invalidate patents for everyone else; USPTO has no obligation to even check a search engine for the subject matter.

And then, prove the date that published prior art was published without a distribted time-stamping blockchain. Find the nonce that makes it all worth it!

Topological Sort: https://en.wikipedia.org/wiki/Topological_sorting

Reactive programming: https://en.wikipedia.org/wiki/Reactive_programming

DOM implementations typically don't repaint the whole screen. (All browsers do graph traversal in order to determine what to redraw or draw first): https://en.wikipedia.org/wiki/Document_Object_Model#Implemen...


While it's cool to see the CS theory used to optimize spreadsheet calculations, it also shows some important points. You can ship the first version with a poor algorithm that works. Performance improvement matters to users, but after doing what they need correctly. If you don't lead all the way to a solid solution you will be replaced by someone who does.


Ik not sure the algorithm used by VisiCalc was poor for the constraints set by its operating environment. Fancier algorithms and their fancy dirty bits and inter-cell links take memory, and that was in short supply (VisiCalc was initially designed to run in 16kB of RAM, but ended up needing 48kB)

Also, in most spreadsheets (certainly at the time), evaluating in top to bottom, left to right order is the correct way. That the code had to go through all cells a second time to discover that you’re done didn’t matter, given that there were no other processes that could want to use the CPU, as long as you interrupted calculations on key presses.

Also, I’m not sure 123 beat VisiCalc because of its smarter recalculation. I think it were the charts and its macro system (awful by today’s standards, but way better than nothing)


Topological sorting - that's a very nice idea.

In the past I've implemented dirty marking and, while fun to figure out the appropriate logic, topological sorting seems much neater.


The Jane Street talk on this is interesting - they walk through how their incremental system evolved as they addressed issues like performance and garbage:

https://www.janestreet.com/tech-talks/seven-implementations-...


The reactive programming model (or whatever you call it, honestly don’t know) is at the core of so many great tools and products - eg Terraform/Pulumi, Kubernetes, React, Mobx, Observable...

At the same time, we’ve struggled to find decent reference material or libraries for building a reactive framework for X (for our use case, X = data science workflows). Most of these libraries seem to implement all the primitives from scratch.

I’d be interested to here other people’s thoughts on this!


As mentioned in the article, Salsa is really good for this


Reactive programming is a very solid paradigm. It is hard to get it right though. Especially when calculations are asynchronous. Excel is ubiquitous but is universally painful to use for anything beyond quick prototyping.

This was the major motivation for us to create

https://ellx.io

- a platform for visual reactive programming, working in your browser.

ObservableHQ was obviously an inspiration, but we concentrated on making it more production oriented: you can literally build a fully-fledged web application without leaving your browser, around solid reactive abstractions.

Some examples built with Ellx:

[1] https://ellx.io/matyunya/tensorflowjs-simple-demo

[2] https://ellx.io/matyunya/simple-voyager-clone

[3] https://ellx.io/dmaevsky/monte-carlo


React Hooks reminds me a lot of spreadsheets. Change a dependency and everything needs to be recalculated that depends on it and so on.


I came here to say this. The whole thing reminds me of shouldComponentUpdate, and I wonder if immutable data structures could be another solution style, or if they turn out to just work like this at a lower level.


Yes, there's a very deep correspondence. My RustLab talk is about pretty much exactly this topic.


I'm not a frontend developer, but I think functional reactivity is also pretty core to how Svelte.js works[1].

https://svelte.dev/tutorial/reactive-statements


It's interesting to note that the improved algorithm also has a downside. With the VisiCalc algorithm, it is possible to model iterative fixpoint computations, so you could have a process that "converges" to a value, even having cycles in the computation graph. This is not true for the topological-sort based variants.


The VisiCalc story blew my mind here. It calculated everything from left to right over and over?

Around 2000 we had to make a spreadsheet in Java as an assignment in an introductory programming class. That it would be calculated in "demand-order" was such an obvious requirement it wasn't even mentioned in the assignment. This says something about what modern tools and languages does (Not that Java and Swing feels all that modern any more). Because they surely understood the problem way better than introductory programming students 20 years later.


Back then there was no memory vs speed tradeoff as the constraint is to minimize memory. It makes sense when you realize it has to be as simple as possible: there is no memory budget for the code or data structures to be more computationally efficient.


Interesting read, thanks for sharing! I have played around with formulas for my Excel-like data table product [DataGridXL](https://datagridxl.com) for quite a while but I decided not to include the feature since it was a little more difficult to implement than I had imagined, exactly because of these update chain rules. Perhaps one day, in a v2.

Most other Javascript spreadsheet products are slow even without formulas, with a very slow DOM structure.


Does anyone know how LibreOffice Calc does this?

It is much slower than Microsoft Excel and WPS Office for complex loads [1, 2]

[1] https://github.com/amzn/computer-vision-basics-in-microsoft-...

[2] https://news.ycombinator.com/item?id=22357374


Observable works like this (for arbitrary JavaScript):

https://observablehq.com/@observablehq/how-observable-runs

Here’s the runtime source:

https://github.com/observablehq/runtime


The journalist quoted, Tracey Robnett Licklider, is the child of JCR Licklider whose vision for DARPA gave us all the toys we enjoy today. If you haven't read Mitchell Waldrop's "The Dream Machine" then you should remedy that. It's a great read and a hell of a story, and suddenly all the names of computing history will fit into a picture.


Interesting. I've never seen salsa available by weight, only by volume. I'd like the spicy salsa please, yes the 50 gram - pesado y picante.

To be honest you lost me at the end with the burrito code, I was left wondering why we have burrito code in our spreadsheet calcs.


burrito code vs spaghetti code, which is worse?


I've always thought that these "dependency"-driven recalculations was akin to functional programming paradigms. Or am I misunderstanding functional?


excel is like pure FP in that

- everything you write is an expression that yields a value

- those expressions are "pure"/"referentially transparent" bc there's no way to do side effects (like writing to another cell)

the whole "recalculate value if any dependency changed" thing would probably fall more under "reactive" systems - you need the notion of inputs that a user can change.

but you're right that efficient recalculation is enabled by the calculations being side-effect-free! we know that changing a cell can only influence cells that reference it, so we only need to recalculate those, not the whole spreadsheet!


This was a fantastic deep dive, and it made me think about things I had not thought about in a very long time, as well as some things that were entirely new to me.

Bravo.


This was quite fun, this type of problem solving deserves more attention in the business world.


so basically a DAG


Excel allows for cyclic dependencies.


Or you could just use alteryx


Don't know alteryx but at 5k/user/yr, I'm not sure this classifies as a "just use"


It's a silly comparison, they're different products ultimately.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: