The Tyranny of Spreadsheets

narush · on July 22, 2021

Fantastic blog post. I highly recommend reading it in full, and also checking out the work of the European Spreadsheet Risks Group, and Felienne Hermans specifically (referenced in the original post). I've been working on a spreadsheet startup [1] for the past 8 months or so and those folks have a large amount of absolutely upsetting-but-helpful research on spreadsheet usage/errors.

After the past few years working with spreadsheet power users, here's how I like to think about them: 1. Most spreadsheets are just used a lists/trackers [2]. 2. Some spreadsheets are very calculation heavy, and are better understood as complex software projects (usually modeling something; s/o to those PE mfs) than as anything else. 3. Spreadsheets make the transition between (simple list/tracker) and (complex software project very fluid. This flexibility usually means that the (complex software project) that is created is buggy as hell.

Spreadsheets give not-super-technical users an incredible visibility into their data. Spreadsheets give not-super-technical users a way to program data transformations in what I would argue is the most generally intuitive way that exists. Spreadsheets give not-super-technical users tools to build software without introducing them to proper software development methods. Ya know, like maybe a test. Or a code review. Or no global variables. Etc.

If you want to see how we're attacking the spreadsheet problem, check us out. Feedback highly appreciated! [1]

[1] https://trymito.io/hn [2] https://www.joelonsoftware.com/2012/01/06/how-trello-is-diff...

dgudkov · on July 23, 2021

>Spreadsheets give not-super-technical users a way to program data transformations in what I would argue is the most generally intuitive way that exists

I don't think spreadsheets are a good way to program data transformations at all. Data transformations are inherently a pipeline/function abstraction, and even non-technical users understand that. Spreadsheets don't offer a good way to decompose a data transformation process and inspect each step for correctness. This alone makes them prone to calculation errors. Also, tables are an afterthought to spreadsheets, therefore even trivial joins are always a pain in Excel.

I have a good practical experience with building visual pipeline-based data transformation tools [1] for non-technical users and they have been a success.

[1] https://easymorph.com

tomnipotent · on July 23, 2021

> good way

But it is an accessible way. Your average Excel user can perform any number of useful transformations with simple formulas, and then throw that into a pivot table for useful aggregations (which is also a transformation).

> tables are an afterthought to spreadsheets

Excel power users have been solving this with VLOOKUPs for decades.

> therefore even trivial joins are always a pain in Excel

Trivial to a trained user. Excel has more advanced data modeling capabilities, which includes implicit joins when creating PivotTables from tables with relationships.

https://support.microsoft.com/en-us/office/create-a-data-mod...

jsmith99 · on July 23, 2021

Lookups in excel are fine, at least now we have XLOOKUP but there's something depressing about an excel model which contains multiple massive grids of lookups. I find such things are much easier in a real join, where you can get your data into long format then write a single line of code to manipulate meaningful named columns.

dgudkov · on July 23, 2021

The link leads to an article on PowerQuery which is a Power BI product. PowerQuery can surely do joins, but it doesn't operate within Excel's data model - cells and sheets. It's an external tool from the Power BI family from which integrates with Excel to a certain extent.

In a similar fashion you could've posted a link to running an SQL join query from Excel (which is surely doable for a trained user). Technically it's possible, but it would be wrong to pretend like the join is done in Excel.

tomnipotent · on July 23, 2021

> but it doesn't operate within Excel's data model

Power Query is an Excel add-on, and has been around much longer than Power BI and has nothing to do with it. This has been possible for almost a decade, since Excel integrated a robust in-memory columnar database capable of dealing with millions of rows.

> it would be wrong to pretend like the join is done in Excel

The join is done in Excel, I don't need to pretend. That you think Excel is just a bunch of sheets and cells is your own mistaken mental model of what the program is and can do.

Here's another reference:

"Now that Excel has a built-in Data Model"

https://support.microsoft.com/en-us/office/create-a-relation...

dgudkov · on July 23, 2021

Can you point me to an article that shows a simple way how to do a simple SQL-like join (no aggregation) of two regions in the same spreadsheet? Two source tables, and one result table - all in the same spreadsheet.

Something that is as simple as this: https://www.youtube.com/watch?v=RYCtoRTEk84

tomnipotent · on July 23, 2021

I sent you two links with information, including images that show the process.

Here's a video of the process:

https://support.microsoft.com/en-us/office/merge-queries-and...

Here's another video:

https://www.youtube.com/watch?v=BV3srtI20Bo

dgudkov · on July 24, 2021

Thank you. I didn't know this. I wouldn't call the process easy, but it's relatively straightforward.

monocasa · on July 23, 2021

> Data transformations are inherently a pipeline/function abstraction, and even non-technical users understand that. Spreadsheets don't offer a good way to decompose a data transformation process and inspect each step for correctness.

They absolutely give you that ability; you break the transformation across many cells. Spreadsheets are probably the most successful functional reactive programming paradigm set of languages out there.

dgudkov · on July 23, 2021

>you break the transformation across many cells

In theory, yes. But the reality, nobody does that unless an error is apparent and needs to be investigated. The vast majority of calculations in Excel spreadsheets are done using cumbersome, multi-line, unreadable formulas that try to pack as much logic as possible into one expression. Such obscure formulas are the main reason why lots of business-critical Excel spreadsheets contain grave errors, many of which are are never discovered.

pjmlp · on July 26, 2021

It is like programming, there are people that keep doing goto and methods that take two screens in Excel, and there are those that bothered to learn lambda, powerquery and addons.

taneq · on July 23, 2021

The fact that you had to use so many big words to explain why spreadsheets aren't a good way for users to "write some numbers here and then calculate some numbers from those first numbers over here" says to me that your definition of "good" varies a fair bit from the definition these users are using.

Joins are great and relational databases are super useful but these peoples' problems are 99% solved by "store a list, maybe do some arithmetic."

hcarvalhoalves · on July 23, 2021

Spreadsheets provide a spatial mapping (cells) to represent data transformations over time. Users are more comfortable working w/ spatial paradigms, as humans have a better sense of location in space than in time. Thinking in terms of "transformations over time" is difficult to reason about, specially in programming.

dgudkov · on July 23, 2021

The point about spatial representation is spot on. The problem is that in Excel calculations tend to be presented as huge unreadable formulas with no intermediate steps. Thus the spatial representation actually only covers the first and the last step, but rarely something in between.

stjohnswarts · on July 23, 2021

sometimes all you have is a hammer and a screw though. Not everyone is a computer scientist.

Ntrails · on July 23, 2021

Sometimes I want to do something quickly, and I dump 20,000 rows from a db query into csv/excel and fiddle. Tableau is often nicer for pure visualisations.

The downside is that something useful you make keeps getting extended and added on to. It is hard to identify when you should switch and do that extra work to move it. And then, you know, tech debt forevs

asdff · on July 22, 2021

I don't know why we think the user is some child and try to abstract away any and all complexity from their job, which ends up leading to some proprietary solution with huge inefficiencies somewhere compared to a flat file and a script in python or R. Software like excel is often seen as a way to do stuff you could do in R but without having to write code.

Imo that thinking is wrong because it makes this assumption that doing anything at all with a gui is easy and frictionless. I don't know how to do anything in excel. If I have to look up how to make a plot in excel, I will have to find some article on the internet of how to do that in my particular excel version. Excel is tough software to use that only becomes easier after you spend time with it (hint hint, just like programming). If I don't know how to make a plot in R, I might also have to find an article. It's no different, no harder, usually easier in fact, to just do the job in R. All these common tasks in R or excel are all 5 min articles worth of information you have to cram in your head.

It's not hard to code, children do it. I'd love to see spreadsheet power users realize that they could be working with much simpler things like parsing plain text files with a few lines of code rather than trying to open some macro heavy spreadsheet on the shared network drive that takes 5 mins just to open on the work issued workstation. I'd guess it would be up to business programs and accounting programs and all these other college majors to actually teach classes in python and R, rather than what they do now which is teach classes in excel. Imo the entrenchment in excel is rooted in ignorance to other (often simpler) options available with python or R, than in any actual critical proprietary features offered by excel. Microsoft is probably pretty happy that their software is so baked into these different academic curriculums, keeping it alive in industry going forward for an entire working persons career after college.

CJefferson · on July 22, 2021

People aren't learning excel because Universities are part of some Microsoft conspiracy. They are learning it themselves, or from each other.

My mother self-learnt excel, first as a to-do list, then a money tracker, then she learned some simple equations to keep track of weekly spending.

How in R would someone do that? With a nice graphical view? I agree Excel has many many issues, but people use it because, in my experience, it super easy to use and in particular let's you easily mix data and code.

Zababa · on July 22, 2021

People are also learning excel because that's the only way to program in a big company without going through "official IT", which won't listen to what you have to say if you're not a $100 000 project they can offload to a third-party.

thefr0g · on July 23, 2021

> People aren't learning excel because Universities are part of some Microsoft conspiracy. They are learning it themselves, or from each other.

At least in germany children are often forced to learn excel in school because the official curriculum includes "office software" which is an euphemism for "microsoft product training". This has certainly not developed from necessity. Even someone who thinks school should only be a preperation for the job market would agree that employees who are able to help themselves are better than ones who know how to use a specific version of Excel.

mavhc · on July 23, 2021

I'd rather everyone was taught Excel than a very minor set of people learn spreadsheets by themselves. Barely anyone uses spreadsheets in UK it seems

thefr0g · on July 23, 2021

I'm really not against teaching spreadsheat software since it's a really useful tool. I just don't like that it's thought very microsoft specific and in depth. Students could benefit way more from some basic computer (science) literacy.

nicoffeine · on July 23, 2021

> I'd guess it would be up to business programs and accounting programs and all these other college majors to actually teach classes in python and R, rather than what they do now which is teach classes in excel. Imo the entrenchment in excel is rooted in ignorance to other (often simpler) options available with python or R, than in any actual critical proprietary features offered by excel.

Every time a business user sits at a computer, are they going to spend 10 minutes getting it done, or 30 minutes trying to figure out how to get pip env configured so they can maybe eventually get it done? If by some miracle they are supernaturally gifted at programming and succeed after some weeks, they now have:

  - a nonstandard file format that will eventually break 
  - no real time feedback of formatting or calculation as they input/manipulate data 
  - no charting 
  - no GUI 
  - no "undo" button 
  - no import wizards 
  - no support (but I guess someone light years ahead of any dev talent I have ever seen wouldn't need it)

And all of this for a single use case.

Microsoft (and I am no fan) has the best tool for data manipulation that exists for most of humanity. Excepting the awful ribbon UI redesign, you could take any Excel 97 user, put them in front of the latest M365 version, and they could get the same work accomplished in the same amount of time.

You cannot say the same thing about almost anything else in the tech industry. Excel is terrible in a lot of ways. But it's a mostly reliable tool, has worked approximately the same way for 25 years, so people are going to use it and teach it. There's no point in wasting someone's valuable time with the horrifying fragility of the python/R/whatever ecosystem.

qwrshr · on July 23, 2021

there are languages that have better support with dependencies though, e.g. Julia (every time I have to debug dependency issues in some other person's Python code/requirements.txt, I find myself wishing it had been written in Julia). I'm not disagreeing with the overall tenor of your post, but I do think there's a medium that's possible to strike here: maybe something like a simplified DSL in Julia, or something like https://www.visidata.org/

CRConrad · on July 23, 2021

> Microsoft (and I am no fan) has the best tool for data manipulation that exists for most of humanity.

Sigh... "Microsoft has". Time was, spreadsheets was a generic category of software, made by lots of different software makers. Nowadays, it feels like people not only don't know this, but can hardly even conceive of the possibility that "spreadsheet" could mean something other than "Excel". At least I think that more people are still able to think of, say, word processors or presentation software as not necessarily Microsoft products; the spreadsheets battle seems to have been their most crushing victory in the "Office" or "productivity software" wars.

It's sad not only for the IT world in general or from a free-market perspective, but for spreadsheets in specific, too: Who knows what stuff like Quattro Pro or Lotus Improv would be capable of now, if they were still around?

pjmlp · on July 26, 2021

True, however Microsoft is not to blame for the management errors of others, and there are other spreadsheets on the market still.

Except instead of trying to be the C++ of Spredsheets, they target Go like capabilities, and then wonder why business keeps adopting Excel.

Jiro · on July 22, 2021

>It's not hard to code, children do it.

It is hard to code, and average children don't do it. The fact that exceptional children do it more than they do other adult things is because coding doesn't require a lot of expensive equipment and you can afford to do it as a kid.

IggleSniggle · on July 23, 2021

Yes, and also I think also because it is possible for it to be a self-isolated system; you don’t need as much greater context of the world to make sense of its rules.

duped · on July 24, 2021

I strongly disagree with this. It is easy to code and any child or adult can do it. The typical problems we solve with code are hard, and can only be solved with code.

Consider the quadratic formula. It's trivial to write this code. That's not a hard problem to teach anyone to solve with code. Or calculating compound interest. Or solving payroll and basic accounting.

All these things are done with spreadsheets as the abstraction instead of code because of the myth that "code is hard." Code is not hard, it's just we use it to solve problems that are so hard we can't do it with spreadsheets.

Jiro · on July 25, 2021

The steps needed to convert the quadratic formula to code are absolutely not trivial, unless you're a geek. If I went to my mother trying to tell her to code a quadratic formula, she'd have no idea how to do it, even if I explained everything.

Spreadsheets are hard too.

UncleMeat · on July 23, 2021

> It's not hard to code, children do it.

My wife knows R and Python. 90% of the pain of any new project is in environments and dependencies. Some weird error message shows up because there are multiple python executables on her machine or it isn't speaking with jupityer properly and we are on a wild goose chase of googling and command lining.

For excel, you open it and run.

Even if you know how to code, setting up a coding environment has enough "wtf does this mean" roadblocks for nonexperts that it becomes a terrible experience.

analog31 · on July 23, 2021

>>> It's not hard to code, children do it.

I think a paradox of programming is that we know it's easy, yet we also know that a certain fraction of people -- perhaps a majority -- will never grasp how to do it effectively. And the ones who can do it well enough to get paid for it, aren't sticking around in low paying administrative jobs.

And coding well enough to really replace a spreadsheet without making something even worse is yet a higher level of programming skill.

Given the virtually ubiquitous knowledge that programming skill is of market value, I think the existing proportion of people who can program is indicative of how many people are potentially capable of it. I don't know a single manager or engineer who doesn't think that learning a little bit of Python would be useful.

throwaway675309 · on July 23, 2021

Wow, I don't know how you arrived at the conclusion that coding is somehow easier than excel. Excel/Google sheets is tabular wysiwyg and about as simple as it gets... even for children who get acquainted with tabular data relatively early (reading simple scientific graphs, etc)

arthurcolle · on July 22, 2021

I never thought I'd be someone to do this but 'mito' is slang for pathological liar in French, i.e. short for 'mythomane' haha, kind of funny for something that's supposed to be a persistent source of truth. Names don't matter at all, but just wanted to comment on this. Lol!

johnwalkr · on July 23, 2021

“European Spreadsheet Risks Group” is the most European thing ever.

Andy_G11 · on July 23, 2021

'000s of rows of data in a spreadsheet - sigh. But say 'database' to some people and they hear 'The Devil'.

Excel is a glorious tool which welcomes all, the savvy and the unskilled but imaginative newbies alike. There is something about all those little cells that presents an itch everyone wants to scratch, and you just know that for some that scratching is going to produce something akin to a spreadsheet version of gangrenous melanoma.

On the other hand, some people are happy to say that the complexity of sophisticated Excel models means they would have been better off built in a code interface. Ha ha ha!

Even with the right mix of functional and object-oriented code and suitable documentation and version control applied to code, I respectfully disagree.

Data handling capabilities could benefit from skill in all of these tools: error checking (including analytical review); pen & paper; calculators; databases; spreadsheets; math & stat techniques; Word processors (& clear logical explanations).

Together, they form a rich menagerie of tools which will probably all be around until we have an AI that follows us around and we just tell it what we want. Actually, even then, it would be good to know them so that we can grasp the underlying logic and frame the concepts leading to the actions that we ask the AI to assist with...

zoomastigophore · on July 23, 2021

I think one of the main problem with databases is how do you populate them.

I wish there were simple-as-Excel frontends for databases where a normal user could input his data like in preformated Excel table without having to deal with the database mechanics.

rnavi · on July 23, 2021

> I wish there were simple-as-Excel frontends for databases

This is exactly what we are attempting to solve at nocodb : https://github.com/nocodb/nocodb

Which is, nocodb gives you a google-drive like collaborative spreadsheets on your existing databases (MySQL, Postgres etc)

And the original problem in article could be countered with nocodb as we keep audit of all changes done to the database.

tutuncommon · on July 23, 2021

This looks like a good tool.

II2II · on July 23, 2021

> I think one of the main problem with databases is how do you populate them.

There are several products that will let you populate databases in a table view and easily import CSV files into a database. The problem is with almost everything else.

Databases are designed to address a different set of problems. The design of database applications reflects the needs of people who need to maintain a database and make sense in that context. Unfortunately, those design decisions do not make sense in the context of people who use spreadsheets. Doubly unfortunate, a spreadsheet makes a better database than a database makes a better spreadsheet. If anything, the shortcomings of spreadsheets implementation dependent in most cases while the shortcomings of databases are due to the domain they are designed to address.

First example: creating a table is more involved than creating a new file and being presented with a blank table. First, a database is created. Then a table is created within that database. Then columns are created with each column, as a minimum, having a type. Only then can the database be populated.

Second example: doing a quick calculation is more involved than finding an empty cell and inserting a formula. The only way I can think of doing a calculation within a database involves performing a query. If the calculations are performed within a row, a new column can be created to contain the results. If it makes sense to store a result for each row, but it pulls or consolidates data from other rows, the complexity of the query is going to be interesting. If the calculation involves several rows, there is no obvious place to store the result within the table. There are several ways to deal with that, yet none of them are particularly convenient. If you need to perform a calculation on arbitrarily selected data, well, good luck with performing the query. Oh, and automatic recalculate ...

Third example: putting the data into a presentable form is less convenient with databases. It certainly isn't done in the table view since that is just a raw presentation of the data. A form is useful here, but it is a separate step and it isn't going to address all of the functionality found in spreadsheets.

dsr_ · on July 23, 2021

What do we know about most database projects? They need CRUD plus queries.

What's the easiest way to generate a CRUD form?

HyperCard.

We need a ubiquitous HyperCard equivalent that talks to the Real Database(TM) of your choice on the backend -- sqlite, postgresql, mysql/mariadb, Oracle, MS-SQL, whatever -- with easy functions for "get a sequence number", "get a unique identifier cookie" and "do this block atomically".

I bet there's a few out there right now.

CRConrad · on July 23, 2021

Delphi has been around (and better than HyperCard) for twenty-five years. Free Pascal for at least as long, but it has only had the visual component aspect provided by Lazarus for ten or fifteen years.

gopher_space · on July 23, 2021

Filemaker is a neat program.

otherme123 · on July 23, 2021

Sadly databases are hard because data is hard. I've seen people trying to stuff one-to-many relationships in a single Excel tab duplicating the one side as many times as needed, e.g.

  Author_A Book_1
  Author_A Book_2
  Author_A Book_3
  Author_B Book_4
  Author_B Book_5

Another strategy I've seen is to stuff the many-side into a single cell, splitted with some less used symbol like |, ; or : . I'm curious to find what would they do when they meet a many-to-many relationship.

Horrors surfaced when we analyzed the data: unintended duplications because ids and uniqueness checking was not a thing, shifted columns, undocumented column "formats"... In the end we parsed the whole mess into a DB.

I don't see how you could implement a frontend for a database simpler than Access. It was as easy as Excel, with relationships. But reasoning a schema is harder than stuffing all things in a 2D table, and querying a multitable is harder than filtering a few columns.

zoomastigophore · on July 23, 2021

I agree data is hard.

If we could provide a properly designed database which has already defined datatypes but that would be as easy to fill for a standard user as a preformated Excel table that would be great. You're right, Access has this simplicity, I just wish such tools would be available out-of-the-box for other DB engine.

chux52 · on July 23, 2021

The author/book example is actually the preferred method to format data in Excel. From there you can take the entire table of data and put it in a pivot table and answer most questions pretty quickly.

For the cells with multiple values, I agree that is a terrible idea.

beckingz · on July 23, 2021

This format is referred to as third normal form (boyce-codd 3NF). Also referred to as "tidy data" in the hadley wickham world.

rrrrrrrrrrrryan · on July 23, 2021

Such tools exist, but aren't popular because they're frustrating. Relational databases are much more rigid than flat files. Columns have datatypes which must be enforced, there are foreign keys, some (combination of) columns may have uniqueness constraints, and so on.

All this adds up to a bunch of errors when you're trying to input and manipulate data, because what you've entered is violating some rule someplace in the database.

It would be awesome if someone could build an interface that gives users the flexibility of Excel, and communicated the issues with the data they're trying to write in a really intuitive way. (e.g. syntax highlighting for data with issues, mouseovers / tooltips, dropdowns & autocompletes for foreign keys, etc)

segfaultbuserr · on July 23, 2021

Microsoft Access?

cutler · on July 23, 2021

Filmmaker Pro? Please, no.

mavhc · on July 23, 2021

What we really wish for is a modern replacement for MS Access

kitkat_new · on July 23, 2021

modern in what way?

phoyd · on July 23, 2021

"More Modern" such that it does not contain 8 year old unfixed bugs.

https://stackoverflow.com/questions/68034271/how-can-i-get-t...

mushi · on July 23, 2021

You know what database doesn’t contain 8 year old unfixed bugs?

A database that’s 7 years old.

mavhc · on July 23, 2021

Faster, better at handling multiple users, smarter, web based.

And also free with whatever package you're already paying for :-)

IanSanders · on July 23, 2021

Written in javascript

ComodoHacker · on July 23, 2021

And wrapped in Electron.

_prhk · on July 23, 2021

You mean MongoDB Compass?

https://www.mongodb.com/products/compass

Andy_G11 · on July 23, 2021

MS Access can be directly linked to a spreadsheet for upload / download, or you can copy and paste data directly or import data files. Other databases (MySQL) also allow for import of CSV data.

cutler · on July 23, 2021

MySQL connector for Excel.

_prhk · on July 23, 2021

My preferred database is MongoDB, which comes with a GUI called MongoDB Compass that allows you to visually explore your JSON documents with full CRUD functionality.

https://www.mongodb.com/products/compass

ubermonkey · on July 23, 2021

There used to be many such tools, or tools that tried to do these things.

We lost them.

yodelshady · on July 23, 2021

Say "database" and IT says "no, unless we spend a year reviewing requirements".

This is only a failure story for every software development route that involves procurement, which is every route except Excel. Those routes tracked, within the only timescale that mattered, precisely zero patients.

You want to give smart users DB and DVCS tooling? I'm right behind you. Start asking why they don't have those things already.

imwillofficial · on July 23, 2021

I found your point to be incomprehensible.

Jaepa · on July 22, 2021

This was adapted from Tim Harharford's podcast Cautionary Tales eps Wrong Tools Cost Lives

https://timharford.com/2021/05/cautionary-tales-wrong-tools-...

If you have not listened to the series I highly recommend it. His episode "LaLa Land: Galileo’s Warning" is by far one of my favorite pieces of media. In brief it is why redundant tightly coupled fail safes will often lead to cascading failures.

https://timharford.com/2019/11/cautionary-tales-ep-3-lala-la...

dwohnitmok · on July 22, 2021

To what extent is the Galileo effect described in that episode really just a kind of survivorship bias? If you've eliminated all the failures that can be caused by non-redundant systems with no failsafes, then the only remaining failures are those that can be caused by redundant systems with failures (or at least slipped through). The way to see whether those redundancies and failsafes are working is to see whether the overall failure rate is decreasing, not solely whether failures are being caused by those redundancies and failsafes.

john-tells-all · on July 22, 2021

As a DevOps consultant, I highly recommend this series. The amount of things that can go wrong in the real world is jaw-dropping, let along what can happen when computers get involved! My DevOps buddies love this stuff.

I also adore the specific episode mentioned by Jaepa. It clearly describes why "hey this bandaid helps us, so let's do more of it" can actually cause problems, vs helping us be more safe.

helsinkiandrew · on July 23, 2021

For more by Tim Harford checkout the "More or Less" podcast "explaining the numbers and statistics in the news and in life."

https://www.bbc.co.uk/programmes/p02nrss1/episodes/downloads

vxNsr · on July 22, 2021

Oh man, it reads exactly like his podcast 50 things that made the modern economy. I was having flashbacks but couldn’t pinpoint why. Thanks for sharing.

timvdalen · on July 23, 2021

I really like this series and I can wholeheartedly recommend it to anyone that is on HN.

catherd · on July 23, 2021

The thing that's always baffled me about Excel is why you must always work in "minified" mode when composing formulas.

It seems like just adding the ability to spread a calculation out over multiple lines and add some indentation would make the bugs everyone complains about go down by... a lot.

quietbritishjim · on July 23, 2021

You can just spread the formula over multiple cells. Instead of

  A3: IF(<boolean>, <result if true>, <result if false>)

where each of the three parameters are complex formulae, you can do:

  A3: IF(B3, C3, D3)
  B3: <boolean>
  C3: <X>
  D3: <Y>

Not only is the formula now broken down into simpler chunks, you also get to inspect the component results (like watches in a breakpoint! sorta...). Then you can just hide the relevant columns if you like (B,C,D in this case). You can even use a separate sheet and hide the whole sheet if you wish.

DataGata · on July 27, 2021

You say this but it still feels gross to do.

akho · on July 23, 2021

No, it wouldn’t. Case in point: Excel has ability to spread a calculation over multiple lines, but the bugs are not down.

In fact, people who are aware of Alt-enter produce buggier code: they end up writing longer formulas, with fewer intermediate results displayed, and have less visibility of the functioning of their spreadsheets.

Write simpler formulas.

mkl · on July 23, 2021

> Write simpler formulas.

Excel's formula language seems deliberately designed to prevent that.

quietbritishjim · on July 23, 2021

They mean simpler formula per cell, by splitting the formula across multiple cells (as I mentioned in my sibling comment).

chrispsn · on July 23, 2021

A nice shortcut is Ctrl-Shift-U to toggle single- or multi-line view.

FriedrichN · on July 23, 2021

I'll give a free cookie to whoever gets this done.

Sometimes I'll get these spreadsheets with byzantine formulas that I have to copy it to a text editor and format myself to make sense of all the parenthesis.

reportgunner · on July 24, 2021

It's called "VBA Macros"

pjmlp · on July 26, 2021

Extended with Lambda, PowerQuery and AddOns.

skipnup · on July 23, 2021

But you can just drag the input field larger and add line breaks?

FriedrichN · on July 23, 2021

You could, but when you finally press enter it's all compressed back to one line.

HPsquared · on July 23, 2021

It doesn't get compressed back, I often break complex formulas into multiple lines with indentation.

The only disadvantage is if someone else isn't expecting the formulas to be like this, then gets confused when they can only see the first line.

catherd · on July 23, 2021

Huh. Well, LibreOffice still can't preserve the formatting. Not sure if the last version of real-Excel I tried could.

Having the compose box fit itself to the formula size or give other indication that there is more to see is still a head-scratcher why they didn't do it.

CRConrad · on July 23, 2021

> real-Excel

???

:-(

rossdavidh · on July 22, 2021

A nice post, brought low by the unfortunate fact that it misses the primary reason we were able to eradicate smallpox (and it wasn't data tracking):

smallpox had no animal reservoirs

Most viruses (and other diseases) have animal reservoirs, and that's why few diseases which ever get widespread, are ever eradicated, and it's nothing to do with spreadsheets.

Now if he wanted to make a point about data, it would be good to mention that we have very little data on, say, how many species have which of these diseases, and how that's changing over time. But that wouldn't have much to do with spreadsheets either.

mulmen · on July 23, 2021

The article makes two points. First that data is important but must be curated and maintained. Second that spreadsheets require specific methodologies and expertise to prevent data quality issues from propagating.

I don't think that part of the story had anything to do with spreadsheets. It just illustrated the value of (timely) data, especially when fighting a virus and specifically in distributing scarce vaccine doses.

A virus with animal reservoirs could similarly be controlled based on monitoring where outbreaks occur and taking some action. That could be distributing vaccines, limiting access to the animals or taking some action to reduce the animal reservoir.

markus_zhang · on July 22, 2021

In my last company we keep a spreadsheet that has tabs occupying almost the maximun number of columns allowed and in different headers. Updating that Excel file needs the full team to work for two full days each month. That was after I created a VBA script to scrape data from internal dashboards and put them directly in the spreadsheet.

From my experience these monstrosities are usually created by two reasons: Push from business side and a data team too slow to react. Thus the analysts have to create the first version that seems to be OK and smart but eventually evolved into a monstrosity.

kwyjobojoe · on July 23, 2021

We use spreadsheets to manage data that would 1000% be better suited for storage in a database (and extracted the data would be sooooo much simpler than my monthly half day job of updating management reports).

Why? Because we don't have access to any databases. I dread to think of the requests and committees I would need to find and traverse to get a database set up for my team, let alone a system that I can write and run code on. Yet we can guarantee that every internal and external user has access to Excel.

Unless you are working for a software firm, or are part of a dedicated analytics team with access to suitable systems, large companies and government pretty much have no chance of being able to roll your own database.

Don't suggest Access. Never suggest Access.

markus_zhang · on July 23, 2021

Now that you mentioned, we didn't have access to SQL Server dwh until 6 months before I left for good.

Can you try sqlite? It's usually enough for simple things I think.

kwyjobojoe · on July 23, 2021

But then we need something to interface with the db. Getting any programming language installed isn't going to happen in government.

regularfry · on July 23, 2021

Best of both worlds: libsqlite.dll is almost certainly on any corporate Windows rollout somewhere already, and you can load it into Excel as an extension for a pretty front-end that happens to have a programming language built in.

andrew_jef · on July 23, 2021

Nocodb is trying to solving the issues of implementing a spreadsheet-like frontend on top of a database backend https://www.nocodb.com/

rvba · on July 22, 2021

Maximum number of columns is 16 384 (in modern Excel). PostreSQL has a limit of 16 000.

So Excel > PostrGRE? :)

I wonder what kind of data did you use and why it had to be stored in so many columns. This approach would probably kill a "real" database too.

rrrrrrrrrrrryan · on July 23, 2021

The reason why you often encounter spreadsheets with so many columns is because spreadsheets really, really want data to be flat. Analysts often like to have all the possible datasets pre-joined into one monstrous sheet, then they can easily slice and dice it up however they want.

With a relational database this would be kind of insane, as it's much easier to normalize data into separate tables, then just join up them up when you need to.

Someone · on July 22, 2021

PostreSQL has way more data types than Excel. You can have arrays, json, xml, ranges, polygons, etc. in columns (https://www.postgresql.org/docs/current/datatype.html)

Because of that, chances are the natural format for a data item taking 16,384 columns in excel likely takes fewer than 16,000 columns in PostreSQL.

markus_zhang · on July 22, 2021

Oops I definitely got it wrong. Probably a bit more than a thousand columns now that I think about it. Still a beast. The thing is a database with every column a product id.

mkl · on July 23, 2021

I was thinking maybe that meant an older version had a limit of 1024, but according to https://www.askingbox.com/info/xls-and-xlsx-maximum-number-o... it has always been 256 for .xls and 16384 for .xlsx.

spoonjim · on July 22, 2021

Excel has its flaws, as do spreadsheets in general, but the upside is that spreadsheets allow people to become programmers who never would grok programming in purely abstract terms.

Spreadsheets allow the user to see each intermediate state in their application, as well as each iteration over the input data set, laid out in a 2-dimensional grid. There just isn't any substitute I've seen that is as accessible to the computing "layperson."

asdff · on July 22, 2021

I really don't think excel is accessible. I don't know how to do anything in it. If I had to make a plot I'd have to search for an article that would probably be about the same word length as the one I'd find searching "how to make a plot in python." Imo excel isn't easier for the lay person, it's just entrenched because you have people now in the workforce using excel since windows 95.

It allows people to become programmers by making them learn this clumsy tough gui that isn't readily apparent how anything works, and changes between versions. I don't think the learning curve is any easier than what it would take to make a person actually into a programmer with a language like python or R. A two hour tutorial in either language is probably all you need to do 90% of what people use excel for. Why not just skip the abstraction and learn how to actually program if you have to spend time learning how to faux program in excel anyway?

CJefferson · on July 22, 2021

In excel, I can make a plot by typing in my numbers, then clicking the 'plot' button. Excel shows a selection of plots of my data and lets me choose the one I like the most.

To do that in Python, I'd have to first choose a format to enter my numbers (I'd probably use Excel, then export CSV). Then I have to install Python, pip install a graphing library, import the CSV, then plot. I have to go through the different types of plots one by one.

I've taught python and graphing, you need much more than a 2 hour lesson before people can be left alone without help I'm afraid, if they have zero programming experience.

I don't claim Excel is good programming practice, or scalable, but it is much easier than Python or R. The best future thing (I think) would be an excel like interface which would show you, and let you edit, Python that did what you'd just chosen in the GUI.

spoonjim · on July 22, 2021

Because the layperson I'm talking about can't think in terms of abstractions, kind of like a math student who hasn't learned algebra yet. They can only verify the correctness of their algorithm by inspecting the intermediate values at every step, and scrolling down through the input set to inspect the corner cases. The problems with this approach are extremely obvious (which is why spreadsheets error out so often, or even worse, produce the wrong answer without any visible errors), but in many cases this is the only way for the people with the domain knowledge of what needs to be done to build an application to meet their need (they don't know how to code and they don't know how to write a functional spec).

ncphil · on July 22, 2021

This is a really important point that underscores the persistent tech education gap that exists across our economies. Mike Duncan observed some time ago that our politicians can barely handle e-mail on their own, making them particularly vulnerable to infrastructure disruption. Low and behold, we've now got an army of firms _legally trading_ in tools whose only purpose is to disrupt (or capture) infrastructure.

newbie2020 · on July 22, 2021

Conversely, I am a professional programmer and I haven’t found any easier way to plot your data. What could be easier than highlight your data, insert, scatter chart?

lightbendover · on July 23, 2021

I get paid way too much to develop software, but most of what I actually do is work with data in Excel. There is no easier way to get quick answers on a small amount of data regardless of skill level with alternatives and the ability to then share those answers and the work leading to them with non-technical stakeholders is what really separates Excel from the "just use Python/awk/Jupyter/ad hoc solution X" sprinkled all over this thread. With Excel, there is a single solution that works for me, my engineers, my PM, my program manager, my account team, my customers, and my finance stakeholders. Asking any of those parties outside of engineering to go learn R would be quite embarrassing.

max46 · on July 23, 2021

I think I big mistake people make here is that they think they are far better at everything on a computer than a non-programmer. They don't realize how much you can do in Excel and how much they suck at it. If you can't use it without a mouse and/or if you don't know pivot tables, you are just as much of a beginner as a C++ programmer who doesn't know what a pointer is.

It's only after trying to convert an excel sheet to [insert your favorite language] that they realize it would take them 6 month with a team of 5 to replicate what a single accountant did in a week.

AnIdiotOnTheNet · on July 23, 2021

Indeed. Excel gets a lot of (pretty deserved, actually) shit from our industry, but the reality is that Excel does what computers are meant to do: enable users to create sophisticated applications[0] that solve their problems with no coding expertise required, nor kowtowing to some developer, with the option to onramp into adding some code via macros.

Excel is the most widely used programming environment on earth because it is so good at it. That said, I think the fact that we haven't made anything better speaks to the change in philosophy our industry has undergone, from one where we wanted to enable users with computing to one where we want to herd and farm them for money.

[0] yes, both 'sophisticated' and 'application' are defined pretty loosely here.

beefield · on July 23, 2021

Opposing anecdote: I have converted excel spreadsheets built over years to SQL, and typically it did not take weeks to do that.

( Any organization that has a smallest reason to care about their data should remove save button from excel and start educating their personnel. Using excel in any important role should be seen as making the eventual mistakes on purpose and someone should be kept responsible.)

rrrrrrrrrrrryan · on July 23, 2021

Getting rid of lousy Excel-driven processes is a big part of my current job. SQL only solves half the problem, though: Excel is also a tool for manual data-input and manipulation. To solve this piece you often need to create CRUD webapps, which can be much more complex to develop and maintain than some SQL in the database.

CRConrad · on July 23, 2021

> To solve this piece you often need to create CRUD webapps, which can be much more complex to develop and maintain than some SQL in the database.

Why do they need to be webapps? If you've done it with Excel so far, you're obviously on a LAN, so why not traditional client/server native apps?

PeterisP · on July 26, 2021

Deployment and authentication is going to be a problem, but the big thing is that in many such environments users would simply not be able to execute your native client app on their controlled workstation by default. But for most software teams nowadays creating a CRUD webapp is simpler and cheaper than traditional client/server native apps; I've seen major companies maintain their existing client/server native apps but whenever they have the choice (e.g. greenfield development), they'd rather not have a native app because, frankly, there's no good reason to do so as the maintenance is much simpler if you don't have to worry about the deployed client instances.

benhurmarcel · on July 23, 2021

> I have converted excel spreadsheets built over years to SQL, and typically it did not take weeks to do that.

The "years" of work are typically not for implementing the spreadsheet, but to define what it does. The requirements evolve over many versions.

Of course it's fast to re-implement it after all the work to define the requirement is done, but it's only possible because Excel allowed all those prototypes.

beefield · on July 23, 2021

Grandparent claimed that something done in excel by one person in a week takes a team of five and half a year to reimplement in some other language. I disagree with that as a general rule (of course, exceptions may apply, as usual).

I fully agree that Excel is great for quick drafting, visualizing data quickly, and prototyping. But it should be left there. Anything you do that lasts even overnight and has any significant numbers in it should be done with something else than Excel.

benhurmarcel · on July 23, 2021

Fair enough, the issues for me are that: once something works in a spreadsheet there are no money assigned and willpower to redo it clean in something else, because most of the value is already there; and the people who are available and know the requirements aren’t skilled in anything else.

So prototypes are done in Excel because it’s the fastest and cheapest way to do it, and they don’t get redone in something else afterwards for the reasons above.

tomnipotent · on July 23, 2021

> done in excel by one person in a week

Done by a domain expert in their field. If you had to create the project from scratch without aid of an existing Excel file to clone, you would not be able to do it even remotely as quickly as the domain expert, if at all.

tomnipotent · on July 23, 2021

> and typically it did not take weeks to do that.

You're not seeing the process and effort that went into it, which can be considerable.

beefield · on July 23, 2021

I think you misread my comment. Some of the excel sheets were a result of considerable effort. Years of different people working on them. Converting those to SQL was not a tremendously large effort. That was made by me, so I kind of know the effort:)

swader999 · on July 22, 2021

Many businesses risk it all by building actual applications with Excel. Security, version control, automated testing are pretty much set aside. Case in point: https://insideparadeplatz.ch/wp-content/uploads/2013/01/ubs-...

LeifCarrotson · on July 22, 2021

It's not like they had a meeting where a consultant said "We reviewed all your requirements, and you could build this with normal software engineering practices, or we could build this in spreadsheets with associated deleterious effects on privacy, correctness, durability, extensibility, readability, and interoperability, likely resulting in an estimated $N to $NN million shortfall in 5 years. What do you want? Yeah, let's do the spreadsheets."

It's a two-pronged attack of using tools that you have figured out as you go (not everyone is an educated, practiced, professional programmer, or even good at Excel) and a sunk cost fallacy/resistance to changing what "works".

catherd · on July 23, 2021

Spreadsheets are like the ubiquity of motorcycles in 3rd world countries. Jumping on one and blasting off without any safety equipment is cheap, fast, everyone uses them, and they get 80% of the job done. And they will kill you, but that's life.

The thing about spreadsheets is you don't have to commission bespoke software. And just the value of that swamps everything else in the vast majority of cases.

In small businesses I doubt the success rate is anywhere near 50% if a non-technical person recognizes a business need for a software solution, hires someone to build it for them, and then it actually reaches a useful conclusion. But if you are numerate at all you can drive a spreadsheet. Not having to swim with the sharks for months or years while you hemorrhage cash is hard to appreciate for the sharks (us, the people who build technical things). Most small businesses and many large ones don't have anyone who can manage even contracting something like this out.

If things have gone on long enough that the problem is stable, clearly defined, and the costs of 3rd order effects like lack of revision control are losing enough money to justify the risk of a custom software project, sure, it needs to be done. But for many companies that risk will always be bigger than the ones that come with spreadsheets.

ERP systems are supposedly finished software that you deploy and tweak for particular business needs. The success rate on deploying those is waaaay below 100%, and the expenses are huge. If your company or department is non-technical the time for custom business software is usually when your hair is on fire and it's the only option.

swader999 · on July 22, 2021

Very true, especially for small business. But large multi national enterprises are also doing this.

recursive · on July 22, 2021

I worked at a company that did product testing from spreadsheets. And not statistically modelling them. Like physical products. The spreadsheet logic was controlling power supplies on a physical test bench.

kwyjobojoe · on July 23, 2021

Impressive.

I worked at a financial software firm written in python that used Excel behind the scenes to run calculations and modelling.

thefr0g · on July 23, 2021

> written in python that used Excel behind the scenes

That's from my worst nightmares :(

rexreed · on July 22, 2021

* The Utility of Spreadsheets

No single application has been as widely adopted by as wide a number of people for such a wide range of uses as the spreadsheet. Love them or hate them, but the spreadsheet metaphor is at once highly useful, highly adaptable, and highly usable requiring minimal support to extract needed value.

The spreadsheet, with all its questionable glory, is here to stay. From the first Visicalc to Excel 2300 I don't think we'll see the end of spreadsheets for a long time coming.

navneetloiwal · on July 23, 2021

This! As programmers, it is easy to list N reasons why spreadsheets suck and deserve to die. But the reality is that it is the most ubiquitous business software and the non-tech/business users would rather be in a spreadsheet than anywhere else. Companies run on spreadsheets, even if they have the best "data stack" and tools. Because few, if any, software can beat the

* flexibility (throw data anywhere and link it to each other, make edits, write notes),

* power (formulas, etc), and

* familiarity (the most underrated factor) of spreadsheets.

Nothing else allows the non-tech user to feel empowered like spreadsheets do.

Spreadsheet evolution has been slow though. Google Sheets added cloud + collaboration 10 years ago. We (https://coefficient.io) are adding the layer of connectivity to sheets so they can remain in sync with the actual sources of data (Cloud apps like Salesforce, DBs, BI tools, etc) so sheets actually become "live" (even though they have been in the cloud for a while) and to reduce manual work and increase trust/accuracy. There is so much more that can be done to leverage this largest software platform that is out there.

grp000 · on July 23, 2021

I used to work at in construction as an estimator and we used spreadsheets. No backups, no version control, a single value change took 10s of minutes to propagate through the many sheets in the file. We fixed errors when we saw them, but I always felt it was an exercise in pointlessness.

These were not small companies either, ~$800M in yearly revenue for one, and ~$150M for the other.

Then, afterwards, the end value would be massaged to what felt right. And yet, these sheets were seen as part of the "secret sauce" of the business.

It's one of the reasons I really wanted out.

Programmers have a reputation of being arrogant assholes, but I think this push-back and ridiculing other industries of using excel for stuff like this is completely justified. Excel spreadsheets let these people FEEL productive and like masters of their own fate with a bunch of numbers neatly encapsulated in their own little cells in a table, but their actual usefulness is questionable. For construction, it gives a rough feel for a project, but a lot of it is smoke and mirrors.

throwawayboise · on July 23, 2021

I can't remember where I read it but at one point Boeing was maintaining the entire bill of materials for one (maybe more) of their aircraft in a spreadsheet. I tried to google it but could not find the reference. I did find that Boeing actually had their own in-house spreadsheet software for a while, which was kind of interesting.

https://en.wikipedia.org/wiki/Boeing_Calc

anonymouse008 · on July 23, 2021

> I used to work at in construction as an estimator and we used spreadsheets. No backups, no version control, a single value change took 10s of minutes to propagate through the many sheets in the file. We fixed errors when we saw them, but I always felt it was an exercise in pointlessness. These were not small companies either, ~$800M in yearly revenue for one, and ~$150M for the other.

Jesus... want to start a competitor to these guys!? Email me

max46 · on July 23, 2021

I think I big mistake people make here is that they think they are far better at everything on a computer than a non-programmer. They don't realize how much you can do in Excel and how much they suck at it. If you can't use it without a mouse and/or if you don't know pivot tables, you are just as much of a beginner as a C++ programmer who doesn't know what a pointer is.

It's only after trying to convert an excel sheet to [insert your favorite language] that they realize it would take them 6 month with a team of 5 to replicate what a single accountant did in a week.

raxxorrax · on July 23, 2021

Entirely true. Many physicists also basically live in Excel. And replicating the functionality of pivot tables isn't at all trivial and they also have a lot of helpers by now. In many cases you just need to click it and have all column neatly formatted so that you don't have to do much anymore.

I also have seen Access applications with more features than a common Salesforce CRMs. I have seen it as a navigation system that could bring you to the nearest partners or shops that provide necessary parts. Yes, there is code of doom behind it and if those maintainers leave the company, problems arise. But the feature set was often extremely hard to beat.

And I am not sure how to respond to non-technical users about better alternatives. They use it as a front end for SQL manipulation and it isn't easy to come up with something better for that target group.

Even worse is SharePoint. It is a complete abomination, it can easily beat most horror novels. But the workflow engines are just extremely practical for corporate processes. Only recently some alternatives came to the market... I just wish I could kill it...

sheetjs · on July 23, 2021

> Spreadsheet evolution has been slow though.

Excel has to contend with nearly 40 years of backwards compatibility (MultiPlan, the predecessor to Excel, was released in 1982) and a deep userbase that literally has decades of experience and muscle memory with the software. The Symbolic Link "SYLK" file format introduced in MultiPlan is still supported in recent versions of Excel, leading to the infamous CSV "ID" issue.

Many of our users still run very old versions of Excel and Windows (e.g. Excel 5.0 on Windows 95) because a change in a future version of Excel caused problems or gave different results.

coolaliasbro · on July 23, 2021

> Nothing else allows the non-tech user to feel empowered like spreadsheets do.

I'd say it allows the same for technical users as well since it helps bridge communication/knowledge gaps and move things along. It's typically not the solution but often plays a critical supporting role.

echelon · on July 22, 2021

You're correct, but this entirely misses the point of the article. It's not calling for the end of spreadsheets -- it's stating that there's an unspoken problem that demands our attention.

Spreadsheets aren't 100% reliable for use cases where you need to collaborate and share immutable health records. Especially during a time of global emergency when tensions are heightened.

Spreadsheets don't impose validation, schema correctness, constraints, etc. and can amplify human errors. They can also inject errors of their own (eg. turning March1 the gene [1] into a date) when they're simply trying to be helpful.

How do multiple people manage a spreadsheet? How do you safely merge spreadsheets? How do you keep records from being duplicated or replaced? How do you do double bookkeeping? Can you atomically identify individual records?

This article is saying that we have to realize spreadsheets can be a source of scientific error simply because of their design intentions and ergonomics.

The closing remarks estimate that 1,500 people died as a result of spreadsheet error. That's remarkable.

[1] https://www.proteinatlas.org/ENSG00000145416-MARCH1

rexreed · on July 22, 2021

You're also correct, but I think miss the point as well. The spreadsheet isn't going anywhere. Curse it all we might, blame it for deaths or attribute any aspect of malignment to it, and you'd be right. But the spreadsheet isn't going anywhere.

Spreadsheet mistakes have been directly attributed not only to thousands of deaths but billions of dollars of mistakenly wasted money. Heck, in the UK alone, they have attributed billions of pounds and thousands of deaths specifically to spreadsheet errors and flaws, and that's not even counting the COVID-related issues from the article. Check this scary article: https://theconversation.com/excel-errors-the-uk-government-h...

There have been hordes of purpose-built apps that have aimed to replace the common spreadsheet for any number of tasks from inventory management to financial planning to even tracking nuclear weapons (true story: our nation's nuclear weapons stockpile is tracked in a spreadsheet). And yet, the spreadsheet is still here. The spreadsheet is the cockroach of all apps. At once utterly adaptable and seemingly indefeatable.

The spreadsheet is a horror show with lack of control, schema, and collaborative management. We surely can do better. But we haven't. Google and Microsoft have the best developers in the world who can do anything. And they have produced more spreadsheets. But spreadsheets are the bane of our existence! And yet they are still here.

The spreadsheet has survived from the mainframe era to the cloud-based era in pretty much the same form with enhancements. The spreadsheet is not going away, even with all the complaints in the article. The points are well made and well founded. But unfortunately they miss the point. 20 years from now we'll still be cursing the tyranny of the spreadsheets because of their utility.

I close with a Haiku:

| Rows and cells, alas |

| What makes your simple structure |

| So very useful? |

kfk · on July 23, 2021

It is very clear that they are useful but it is also very clear that they can be problematic. In Finance people check each number - multiple times per day - because there is no way to write tests in Excel. Sounds good? Until you can trust the people crunching the numbers, of course. Would you build a bridge like that? Or a building? I think it is worth to talk about this problem and, at the very minimum, inform people there are alternative ways to crunch numbers that are more reliable than Excel.

HPsquared · on July 23, 2021

Lots of engineering calculations are indeed done in Excel, it's very common.

benhurmarcel · on July 23, 2021

> Would you build a bridge like that? Or a building?

That's exactly what's being done. I work in mechanical engineering and Excel is the primary calculation tool all over the industry.

jay_kyburz · on July 22, 2021

It would be hard, but I think a new kind of spreadsheet that was easier to validate, easily impose a schema and constraints and helps us identify human errors would be helpful.

I bet excel has some excellent features around this, but they are not front and center in the UI.

throwawayboise · on July 23, 2021

Lotus Improv was a bit like that I think. It's been a long time since I used it. Wikipedia says:

Conventional spreadsheets used on-screen cells to store all data, formulas, and notes. Improv separated these concepts and used the cells only for input and output data. Formulas, macros and other objects existed outside the cells, to simplify editing and reduce errors. Improv used named ranges for all formulas, as opposed to cell addresses.

https://en.wikipedia.org/wiki/Lotus_Improv

Like most of the software on the NeXT platform, it was ahead of its time.

arthurcolle · on July 23, 2021

> easier to validate, easily impose a schema and constraints and helps us identify human errors would be helpful.

yeah, it's called IF

denimnerd42 · on July 23, 2021

Right.. at my company we had to purge spreadsheets from all processes due to these issues. Instead we allow import and export of spreadsheets but they aren't the source of truth.

guyomes · on July 23, 2021

> From the first Visicalc to Excel 2300 I don't think we'll see the end of spreadsheets for a long time coming.

We might even argue that spreadsheets were already used in 1295 BCE, as shown in Fig. 1.1 of [0].

[0]: http://uruk-warka.dk/mathematics/ER6%20tables.pdf

LargoLasskhyfv · on July 23, 2021

Whoa!

mr_toad · on July 23, 2021

Spreadsheets are the private motor vehicles of computing.

alephu5 · on July 23, 2021

I think the best approach in many cases is to bring data management into a specialised application, rather than using spreadsheets as a database. They can use the CRUD applications we all write or some off-the-shelf solution, but the important thing is that it provides validation, schemas etc.

Then you provide a spreadsheet export or direct connection that can be hooked into another spreadsheet with all the formulas, pivot tables and so on.

This way the core data is safeguarded and kept clean while not requiring constant development for every report requested.

ttul · on July 22, 2021

Personally, I strongly prefer working with GSheets to Excel. It handles large sheets better and you never worry about the app crashing and taking your precious work along with it.

dan-robertson · on July 22, 2021

Instead one only needs to worry that one can’t get one’s work done.

I don’t think I’ve ever had a time when I wanted to use a spreadsheet for anything moderately complicated where gsheets was up to snuff. That’s not to say that excel doesn’t have problems or does things easily, but it doesn’t lead to quite the same level of frustration and hair-pulling, and indeed it’s usually possible to achieve things in excel when they are impossible in gsheets. Indeed that app even struggles at the ‘easy’ task of being able to fiddle around with table formatting. Its sole advantage is the collaboration. It can be used as a place to dump small amounts of varied tabular data to share but not really for calculation or analysis.

r00fus · on July 22, 2021

As far as competing anecdata, I found gsheets absolutely much better for joining together multiple sheets of data. The lookups are so much less touchy simply because "it's all in the cloud" (as long as you have permissions).

With Excel you have to ensure the file is there, and readable, not tampered with, etc.

This allowed us to create better multi-workbook "apps" that have a bit more modularity - you could even embed sheets into a webpage as a view-only client.

In the end, Excel is good for us only for interoperability with external groups.

LasEspuelas · on July 23, 2021

I have. Especially before Excel released their dynamic array features. I love the sort function in gsheets. Combined with filters it allowed me to automate things that relied on vba in Excel.

benhurmarcel · on July 23, 2021

Google Sheets has more useful formulas than Excel. For example query(), but also all the array formulas that they include.

plaidfuji · on July 23, 2021

Wait, seriously? I find the exact opposite! I would love it if Gsheets were as performant as Excel with large tables but scrolling gets bogged way down. If you add too many formulas, Gsheets will get stuck recalculating formulas endlessly. I love Gsheets because of the ability to link between documents easily, their comment system, filter views, and native regex… but performance?? It’s terrible!

arthurcolle · on July 22, 2021

Large sheets? Google Sheets has a much lower cell and row limit compared to Excel

rtkwe · on July 22, 2021

I assume they mean performance wise. Excel could have larger hard limits but handle complex sheets worse.

arthurcolle · on July 23, 2021

Google Sheets is an unusable dumpster fire. I just tried to import a 3 million row CSV yesterday and it wasn't able to do it. INDEX/MATCH don't even work properly! Basically unusable unless you need it for scheduling parties or whatever people use Airtable for.

thanhhaimai · on July 23, 2021

Have you considered using a different tool for the job? Processing 3 millions rows of CVS sounds like a task that is more fitting for a small script than Excel/Google Sheet.

Google Sheet excels at many other tasks, but it's not a replacement for specialized tool (and it should not).

arthurcolle · on July 23, 2021

Yes of course, I originally generated the file using Pandas but it's always a pain in the ass to remember the magical incantations to make pretty 2D graphs that aren't fugly. That being said, I have a knack for making great 3D graphs that are awesome! In any case, it's hard when my employer doesn't just gimme that juicy Excel subscription because cost-cutting and efficiencies.

In the end I put on my robe and wizard hat and used matplotlib. (╯°□°）╯︵ ┻━┻

popinman322 · on July 23, 2021

Seaborn in Python to make matplotlib pretty by default + plot some uncommon graphs. `%matplotlib widget` to make matplotlib interactive. Plotly in Python if you want decent graphs that are also interactive by default.

Observable if you want an interactive, hosted visualization notebook (and all their libraries are open source, except their UI). Plotly, Observable's Plot library, or D3 if you don't mind JavaScript. Can bundle libraries + data + visualization into a single html file and deliver that as an interactive report; once you have the base template down it's decently ergonomic.

Plotly is my current default when I'm working with Jupyter, though. Sane defaults are nice.

elefanten · on July 23, 2021

I'd love to learn if I'm doing something wrong, but Sheets has terrible performance compared to Excel in my experience.

Not just sheet sizes, but Sheets feels much slower and has a vastly weaker toolset. Just the fact that the browser takes precedence for keyboard commands from the web app makes the usability suffer massively.

arcturus17 · on July 23, 2021

And controversial as this may be, JavaScript.

arthurcolle · on July 23, 2021

performant JavaScript exists apparently - I was shocked to learn that there are some new DeFi projects that are executing DEX arb bots and write all their code in TypeScript (lmao).

It will be extremely funny to see how Google does this whole "rewrite all our office products to use <canvas> instead of the DOM" project over the next couple quarters... he says, laughing, shaking his head at the wasted effort... Caesar wept for there were no more worlds to conquer

sheetjs · on July 23, 2021

Newer versions of Excel have native JavaScript support https://docs.microsoft.com/en-us/office/dev/add-ins/referenc...

cratermoon · on July 23, 2021

People do multi-million dollar deals with Excel spreadsheets. I know personally of one deal involving multiple multi-national petrochemical companies, pipelines, and refineries in the US. The whole deal was done on a spreadsheet, the rest of the legal paperwork was just formalities.

zwieback · on July 22, 2021

Spreadsheet is the greatest SW ever invented, that's why it's so dangerous.

HPsquared · on July 23, 2021

With power comes responsibility, the latter can be lacking in some users who then like to blame their tools.

lhh · on July 23, 2021

He mentions several times that accountants do their work in Excel, but any account would consider it barbaric to actually try to do transaction-level double-entry bookkeeping in Excel instead of a purpose-built system like QuickBooks/Xero/NetSuite/etc. I think I've also probably seen more broken financial models than logically sound ones in Excel, so the integrity of spreadsheets is oftentimes not even safe in the hands of finance professionals.

Data integrity is definitely a hard problem.

hartator · on July 22, 2021

Yes, hijack my scrolling. I really don't like have a consistent feedback when I am scrolling pages.

sahila · on July 22, 2021

Dunno why you're being downvoted but it's true. I noticed it pretty quickly when I tried going back to the previous page and instead it scrolled up.

stevenpetryk · on July 22, 2021

They're being downvoted because it's against the rules to criticize the design of the site on which the content is hosted.

cwillu · on July 23, 2021

And the content doesn't load if you disable javascript!

ectopod · on July 23, 2021

This. I should take the risk of enabling javascript to read about the risks of spreadsheets? No thank you.

simonw · on July 22, 2021

This is a great read - I love the Italian accountancy history lesson at the start.

version_five · on July 22, 2021

I actually stopped reading there and went to the comments. I dont mean any disrespect, just trying to give a counterpoint: I like to have some more upfront info about an article's thesis before I invest in reading about an Italian history lesson hoping it will be relevant. I see this style a lot for example on substack, as if people are always trying to weave in a relevant historical anecdote or clever fact, which is fine, but it sets a much higher bar if the article is going to delay the build up to a point without first telling me what that point will be. It's easy to get bored along the way and bail.

simlan · on July 22, 2021

Well tastes are different. I liked it because of the interspersed backdrop. Granted i am reading this on a chill evening not on the go. Would be different if i had expected a to the point article.

version_five · on July 22, 2021

Yeah I definitely don't find it odd to like that style, I was just reacting because I found it interesting that the thing someone highlighted as being a strong point was exactly the thing that had put me off. Different tastes as you say.

gcthomas · on July 22, 2021

Anything from Tim Harford is going to be good. His "More or Less" Radio 4 series is a must listen podcast.

ineedasername · on July 22, 2021

This seems less about spreadsheets in general and much more about Microsoft not wanting Excel to eat into sales of Access and SQL server.

Even now the limit is ~1million rows, meaning that if you have even a very simple table but lots of rows then you have to pay more for Access.

joe_the_user · on July 22, 2021

Data caught in spreadsheets is kind of an instance of the general siloing of data.

The situation is something of a general problem of a data-driven Society. One group of people want information available with one interface and spends resources only in putting it in a format suitable - to the detriment of others who/want need it. With spreadsheets in particular, naturally you have a data-integrity but that's still an instance of the broader problem.

themodelplumber · on July 22, 2021

It's a general psychological problem, too. A huge part of storage organization psychology, whether in software or in a physical room, is subjective/proprietary perception and judgment. People who prefer and naturally warp any job role or position to use these tools, as all humans do with subjectively preferred processes and perspectives, will never stop creating them.

The good part of this not-invented-here mindset is that it's also how technology moves forward. "I need this new system to be mine/fit my way of organizing" is also often another way of creating a "this never existed before" product or outcome.

These storage and organization tools and silos are born as subjective (context-fitting) design processes. So they tend to get locked up behind depth-oriented (subjective) processes, and eventually frozen by stabilizer groups.

Stabilizer groups follow up to use the tools and promote the idea of keeping the data siloed the way it is. They don't like change at work because change breaks their preferred psychological processes, causes them to have to re-build their perceptual frameworks, and because they're humans, this makes them do stupid things at work, like lash out or become passive-aggressive or detonate their new diet plan or become late for a baseball game. They will blame all of this on "open data" or whatever it is that caused their stable workflow to destabilize.

There are lots of solutions to this, and Excel by itself can be seen as an attempt at a solution to the problem...also the problem can be moderated by the system of energy surrounding the data valuation and access to the data.

And "just don't silo stuff, guys" seems to be the proposal with the worst track record so far.

CJefferson · on July 22, 2021

So, what's your alternative?

Spreadsheets have many advantages, they are easy to edit, copy around, and quickly filter and do simple calculations (like find how many students are in a class if size smaller than 10, for example). I can't think of a sensible alternative given current tools -- we should make better tools of course.

HPsquared · on July 23, 2021

In my use cases, the siloing is an advantage: the project I'm working on is contained in a couple of files in a folder somewhere that can be archived and opened/rerun with confidence years from now. Most other systems don't allow this level of encapsulation / archiving ability.

CivBase · on July 22, 2021

I hate Excel, not because it's a bad program but because in my experience people misuse it more often than not.

Excel excels at creating spreadsheets. Spreadsheets are pretty reports used to report tabular data. Excel is capable of so much more, but for eveything beyond spreadsheets it's medicore at best and its complexity often makes it a liability.

It's a bad calculator. It's a bad database. It's a bad medium for transferring data. It's a bad front-end. It's a bad notetaking app. If your use case goes beyond making tabular data look pretty, Excel is probably bad at it.

People use Excel for everything because they're comfortable with it. But other solutions are often much better and I'm so tired of cleaning up all the issues Excel has caused (and continues to cause) me.

bartread · on July 22, 2021

I think this is a really unfair comment. Excel is a very powerful tool. It's great for financial modelling and prototyping, for example. It can be great for testing out different scenarios and investigating likely outcomes.

It's also something that most people in most businesses have installed and as such, can be incredibly helpful for sharing data, models, etc. The collaborative editing in Office 365 adds a new dimension in that (finally) we don't have to email round documents and implement half-cocked versioning any more (though plenty of people still do), and multiple people can make changes at the same time.

If you want to slag off any of the applications in Microsoft's Office suite then the correct answers (and I don't think this is particularly controversial) are Outlook and Teams. If you want real productivity destroyers and daily pain, look no further.

Excel? It's great.

Yes, it can be misused, but so can Python, Rust, C++, JavaScript, and every other tool, language, and platform at our disposal. So can a circular saw for that matter. That doesn't make any of them bad tools.

CivBase · on July 22, 2021

I think you missed my point.

Yes, Excel is a very powerful tool and I would never argue otherwise. I definitely wouldn't argue against using it altogether. But I have come to loathe it because of how often it is misused.

Sure, any tool can be misused. But in my experience Excel gets misused more than anything else combined. Excel's accessibility and ubiquity is part of what makes it so powerful... and what compels people to use it where other tools are much better suited.

Don't get me wrong. It's okay to use a suboptimal tool... to a point. Users should be aware of a tool's limitations and seek out better solutions when it's clear those limitations will cause problems.

Excel is not a bad tool. It's so popular because it really is a good tool. It's just not the ultimate tool.

hodgesrm · on July 22, 2021

We run our company finances off Google Sheets, which is essentially Excel, but shared. It's an outstanding tool for financial modeling as well modeling in many other domains. We would not be able to function without it. The fact that others may misuse it does not change things for me.

p.s., Google Sheets is one of the more underrated cloud apps of all time.

greazy · on July 22, 2021

I completely agree with your main point: excel is terrible because people use it for everything.

I've noticed there different ways people use excel: as a dashboard, to store data, to modify data.

IMO storing data is the worst use of excel, especially when it's done in a non 'tidy' fashion , with highlights and formatting to encode data.

This is from my experience as an R and python user in science.

romwell · on July 22, 2021

Nah. Spreadsheets are poor man's MapReduce.

Most Excel tables I've seen aren't just data and presentation. It's computing the data as well as presenting it.

And how is the data computed?

* Applying formula to a row/column range to get another range

* Applying formula to a range to get a single-cell output out of it

Obviously, I am not talking about clustering and performance here, but a computational paradigm.

CivBase · on July 22, 2021

> Spreadsheets are poor man's MapReduce.

Yes. Emphasis on "poor man's". Excel is mediocre at many things, but it's highly accessible so people use it anyways. That can be good or bad depending on the situation.

Need to quickly crunch numbers for a small dataset? Excel is perfect. You can probably put something together quickly to get what you want.

Need to maintain a contact tracing database? Not so great. Excel scales poorly and the low implementation cost isn't worth the limitations and risks.

mikewarot · on July 23, 2021

I strongly doubt there is any other reactive programming environment that is easier to teach to novices than a spreadsheet.

One wonders what would happen if you could extend lisp to do reactive programming, and then extend it a bit further to load an excel workbook into an abstract syntax tree.

hanche · on July 23, 2021

There was the common lisp cells project. But I think it’s gone dormant.

https://common-lisp.net/project/cells/

gcthomas · on July 22, 2021

Spreadsheets are a hammer so if that's all you know then all problems look like the nail. Really, many power users of Excel ought to be looking at more capable, testable and readable solutions from professional data analysts and scientists. Python would be a good start, but there are many options better than Excel for critical systems.

Closi · on July 22, 2021

> Really, many power users of Excel ought to be looking at more capable, testable and readable solutions from professional data analysts and scientists. Python would be a good start.

I can write python, but I can get the answer out of excel in a fraction of the time for most use cases.

As an added bonus, excel documents are far more ubiquitous and I can share them with clients, and they can see my workings. Python isn’t very transparent and business users can’t usually check the logic.

Excel is also much more interactive and allows for much more “discovery” and playing with the data.

What I would say though, is most power users don’t know about functionality in excel like PowerQuery, relational data models and DAX that actually do turn it into a serious and repeatable tool.

(I’m not arguing that Python isn’t better for many use cases, and excel definitely isn’t right for most critical applications, but I also think Python is much slower for ad-hoc data analysis so both have different places in the market. Anyone think python is faster? I’m down for a race!).

codesnik · on July 22, 2021

> they can see my workings

How? Do they check that every cell in a column actually has the same formula? Do they check data format everywhere?

Python script is something that is possible to be actually reviewed, and results - reproduced, and "formula's" there are actually readable.

But for some stupid reason excel files are still shared over email.

eropple · on July 22, 2021

It's only "stupid" if one suffers from a catastrophic lack of empathy for people who are not programmers and for the incentives to which they are exposed--which do not include using a software developer's preferred tools, nor do they include the carving out of time with which to learn them.

I would hope that we would be generally wiser than that here.

codesnik · on July 28, 2021

Like being a "programmer" is some genetic trait or something, or like using excel in any more or less productive manner doesn't require carving out hell a lot of time, it's just that time is taken from user's lives in small pieces, and doing PROGRAMMING seems like taking a university course.

I actually have a lot of empathy for people who forced to deal with all those problems, otherwise I just wouldn't care.

Closi · on July 23, 2021

> Do they check that every cell in a column actually has the same formula?

They just have to check the top cell as most of my formulas are array formulas.

You don’t have to drag a formula down - that’s a common misconception in the latest versions. If you do =A1 + B1 and want to apply it to the 1000 cells below you just write = A1:A1000 + B2:B1000.

That’s still not that readable though, so I’ll apply those cells two named ranges “Sales” and “Taxes”.

Then the formula is = Sales + Taxes once and that will populate the whole column of data.

Then there’s M Code and PowerQuery which literally allows you to review the data cleaning line by line and even see the data state at any intermediary step. It also has

> Python script is something that is possible to be actually reviewed

The problem for me is that, as someone who works in consulting, it can’t be reviewed by my boss or a client, neither of whom can program. But they can review a tidy excel sheet.

And then they can’t edit it either, so if I go on holiday and I’ve built some sort of model nobody else can make progress until I come back, or if I move projects I’m also stuck maintaining the model on the old one.

LordEthano · on July 23, 2021

That's terrible practice, ironically, as it's extremely unreadable. How could someone looking at it know what Sales and Tax actually are? You have to go into the formula name box and dig in to find sales = "yada yada" etc. That doesn't seem too bad until you have a decently sized file and you have to dig into 40 formulas to find the one you want, and go check that it's actually referencing what you want.

I work as a banker, and what you do is one of the very first things new employees are taught not to do in excel. It's an amazing solution for the person that built it, but a terrible one for anyone looking to check the work.

Closi · on July 23, 2021

> That's terrible practice, ironically.

Can you point me to the 'best practice' guide you are referring to? What authority on excel standards said this?

Personally I tried to find articles saying it's not best practice by typing in "dont use named ranges best practice" or "named ranges in excel are bad" into google, but it mostly brings up articles stating that using named ranges is best practice and improves readability!

> How could someone looking at it know what Sales and Tax actually are?

If you really want to use that example, you would click in the formula bar and it will highlight the ranges, and colour code them automatically.

If it's on a different sheet, you just hit ctrl + g and type in the name, and it will take you directly to the cell it's linked to (which usually in my case, is linked to a sheet that contains all my model's assumptions in one place, each one with a named range describing what it is). It's much easier and quicker than going to =Assumptions!G52.

> I work as a banker, and what you do is one of the very first things new employees are taught not to do in excel.

Seems like a silly thing to teach people IMO. In my experience it makes formulas much more readable (both writer and reader), makes it much faster to build models, and cuts down errors substantially.

I personally find that formulas are much easier to review, because the named ranges provide some intent. If someone writes =(A2Assumptions!92)/Assumptions!91 I've got to really unpick it to work out if it's right, but if someone labels it =(A2Miles_Per_Hour)/Average_Miles_Per_Vehicle then I can see that the formula is wrong almost instantly.

Additionally if I want to write another formula using those values, I can just type it straight into the formula bar without having to go and click on the right cell reference in another sheet.

LordEthano · on July 24, 2021

>Can you point me to the 'best practice' guide you are referring to? What authority on excel standards said this?

Well investment bankers tend to be pretty darn good at excel, and the banks I've been at would scorn you for doing it, as well as all the standardized training given out to fresh recruits to the industry. We had a buy-side deal recently fall through partially because the (fairly sophisticated) model the company selling itself used was absolutely unreadable and unaccountable. They did exactly what you said (naming), including with their assumptions. It was 20 sheets of unpenetrable mass, and we were all turned off by the fact that you couldn't follow it whatsoever, and was basically unaccountable.

>If you really want to use that example, you would click in the formula bar and it will highlight the ranges, and colour code them automatically. If it's on a different sheet, you just hit ctrl + g and type in the name, and it will take you directly to the cell it's linked to (which usually in my case, is linked to a sheet that contains all my model's assumptions in one place, each one with a named range describing what it is). It's much easier and quicker than going to =Assumptions!G52.

This how I can tell you've never seriously worked with excel, because that's MUCH slower than using the native/macabacus auditing tools. Adds up over thousands of times. And what if Sales isn't just a simple cell in another sheet, but is actually tied to a named formula itself, tied to a named formula itself, tied to another worksheet (with named formulas in them!). That's a deep rabbit hole and you to be going down with little transparency, where you're having to look up the formula manager to find whatever the fuck the named variables are actually referring to (what if someone follows your advice and names the tax rate as "Tax" and now ctrl-g doesn't work!)

>Seems like a silly thing to teach people IMO. In my experience it makes formulas much more readable (both writer and reader), makes it much faster to build models, and cuts down errors substantially.

It makes formulas much easier to read, but with zero accountability, and considering someone will very likely be looking at the excel at some point, with no idea what you did, they have to check each and every one out to make sure it's not bullshit. Plus, it doesn't really make modeling faster assuming you've built out your source numbers/assumptions well and use best practices (like A24+A25+A26 instead of A25+A26+A24).

>I personally find that formulas are much easier to review, because the named ranges provide some intent. If someone writes =(A2Assumptions!92)/Assumptions!91 I've got to really unpick it to work out if it's right, but if someone labels it =(A2Miles_Per_Hour)/Average_Miles_Per_Vehicle then I can see that the formula is wrong almost instantly.

Use tracing, and if the assumption tab is well built out it really isn't faster, at all. +alt w,n

>Additionally if I want to write another formula using those values, I can just type it straight into the formula bar without having to go and click on the right cell reference in another sheet.

Use multiple windows, makes life much easier.