Hacker News new | past | comments | ask | show | jobs | submit login
The Tyranny of Spreadsheets (timharford.com)
405 points by shortleash on July 22, 2021 | hide | past | favorite | 276 comments



Fantastic blog post. I highly recommend reading it in full, and also checking out the work of the European Spreadsheet Risks Group, and Felienne Hermans specifically (referenced in the original post). I've been working on a spreadsheet startup [1] for the past 8 months or so and those folks have a large amount of absolutely upsetting-but-helpful research on spreadsheet usage/errors.

After the past few years working with spreadsheet power users, here's how I like to think about them: 1. Most spreadsheets are just used a lists/trackers [2]. 2. Some spreadsheets are very calculation heavy, and are better understood as complex software projects (usually modeling something; s/o to those PE mfs) than as anything else. 3. Spreadsheets make the transition between (simple list/tracker) and (complex software project very fluid. This flexibility usually means that the (complex software project) that is created is buggy as hell.

Spreadsheets give not-super-technical users an incredible visibility into their data. Spreadsheets give not-super-technical users a way to program data transformations in what I would argue is the most generally intuitive way that exists. Spreadsheets give not-super-technical users tools to build software without introducing them to proper software development methods. Ya know, like maybe a test. Or a code review. Or no global variables. Etc.

If you want to see how we're attacking the spreadsheet problem, check us out. Feedback highly appreciated! [1]

[1] https://trymito.io/hn [2] https://www.joelonsoftware.com/2012/01/06/how-trello-is-diff...


>Spreadsheets give not-super-technical users a way to program data transformations in what I would argue is the most generally intuitive way that exists

I don't think spreadsheets are a good way to program data transformations at all. Data transformations are inherently a pipeline/function abstraction, and even non-technical users understand that. Spreadsheets don't offer a good way to decompose a data transformation process and inspect each step for correctness. This alone makes them prone to calculation errors. Also, tables are an afterthought to spreadsheets, therefore even trivial joins are always a pain in Excel.

I have a good practical experience with building visual pipeline-based data transformation tools [1] for non-technical users and they have been a success.

[1] https://easymorph.com


> good way

But it is an accessible way. Your average Excel user can perform any number of useful transformations with simple formulas, and then throw that into a pivot table for useful aggregations (which is also a transformation).

> tables are an afterthought to spreadsheets

Excel power users have been solving this with VLOOKUPs for decades.

> therefore even trivial joins are always a pain in Excel

Trivial to a trained user. Excel has more advanced data modeling capabilities, which includes implicit joins when creating PivotTables from tables with relationships.

https://support.microsoft.com/en-us/office/create-a-data-mod...


Lookups in excel are fine, at least now we have XLOOKUP but there's something depressing about an excel model which contains multiple massive grids of lookups. I find such things are much easier in a real join, where you can get your data into long format then write a single line of code to manipulate meaningful named columns.


The link leads to an article on PowerQuery which is a Power BI product. PowerQuery can surely do joins, but it doesn't operate within Excel's data model - cells and sheets. It's an external tool from the Power BI family from which integrates with Excel to a certain extent.

In a similar fashion you could've posted a link to running an SQL join query from Excel (which is surely doable for a trained user). Technically it's possible, but it would be wrong to pretend like the join is done in Excel.


> but it doesn't operate within Excel's data model

Power Query is an Excel add-on, and has been around much longer than Power BI and has nothing to do with it. This has been possible for almost a decade, since Excel integrated a robust in-memory columnar database capable of dealing with millions of rows.

> it would be wrong to pretend like the join is done in Excel

The join is done in Excel, I don't need to pretend. That you think Excel is just a bunch of sheets and cells is your own mistaken mental model of what the program is and can do.

Here's another reference:

"Now that Excel has a built-in Data Model"

https://support.microsoft.com/en-us/office/create-a-relation...


Can you point me to an article that shows a simple way how to do a simple SQL-like join (no aggregation) of two regions in the same spreadsheet? Two source tables, and one result table - all in the same spreadsheet.

Something that is as simple as this: https://www.youtube.com/watch?v=RYCtoRTEk84


I sent you two links with information, including images that show the process.

Here's a video of the process:

https://support.microsoft.com/en-us/office/merge-queries-and...

Here's another video:

https://www.youtube.com/watch?v=BV3srtI20Bo


Thank you. I didn't know this. I wouldn't call the process easy, but it's relatively straightforward.


> Data transformations are inherently a pipeline/function abstraction, and even non-technical users understand that. Spreadsheets don't offer a good way to decompose a data transformation process and inspect each step for correctness.

They absolutely give you that ability; you break the transformation across many cells. Spreadsheets are probably the most successful functional reactive programming paradigm set of languages out there.


>you break the transformation across many cells

In theory, yes. But the reality, nobody does that unless an error is apparent and needs to be investigated. The vast majority of calculations in Excel spreadsheets are done using cumbersome, multi-line, unreadable formulas that try to pack as much logic as possible into one expression. Such obscure formulas are the main reason why lots of business-critical Excel spreadsheets contain grave errors, many of which are are never discovered.


It is like programming, there are people that keep doing goto and methods that take two screens in Excel, and there are those that bothered to learn lambda, powerquery and addons.


The fact that you had to use so many big words to explain why spreadsheets aren't a good way for users to "write some numbers here and then calculate some numbers from those first numbers over here" says to me that your definition of "good" varies a fair bit from the definition these users are using.

Joins are great and relational databases are super useful but these peoples' problems are 99% solved by "store a list, maybe do some arithmetic."


Spreadsheets provide a spatial mapping (cells) to represent data transformations over time. Users are more comfortable working w/ spatial paradigms, as humans have a better sense of location in space than in time. Thinking in terms of "transformations over time" is difficult to reason about, specially in programming.


The point about spatial representation is spot on. The problem is that in Excel calculations tend to be presented as huge unreadable formulas with no intermediate steps. Thus the spatial representation actually only covers the first and the last step, but rarely something in between.


sometimes all you have is a hammer and a screw though. Not everyone is a computer scientist.


Sometimes I want to do something quickly, and I dump 20,000 rows from a db query into csv/excel and fiddle. Tableau is often nicer for pure visualisations.

The downside is that something useful you make keeps getting extended and added on to. It is hard to identify when you should switch and do that extra work to move it. And then, you know, tech debt forevs


I don't know why we think the user is some child and try to abstract away any and all complexity from their job, which ends up leading to some proprietary solution with huge inefficiencies somewhere compared to a flat file and a script in python or R. Software like excel is often seen as a way to do stuff you could do in R but without having to write code.

Imo that thinking is wrong because it makes this assumption that doing anything at all with a gui is easy and frictionless. I don't know how to do anything in excel. If I have to look up how to make a plot in excel, I will have to find some article on the internet of how to do that in my particular excel version. Excel is tough software to use that only becomes easier after you spend time with it (hint hint, just like programming). If I don't know how to make a plot in R, I might also have to find an article. It's no different, no harder, usually easier in fact, to just do the job in R. All these common tasks in R or excel are all 5 min articles worth of information you have to cram in your head.

It's not hard to code, children do it. I'd love to see spreadsheet power users realize that they could be working with much simpler things like parsing plain text files with a few lines of code rather than trying to open some macro heavy spreadsheet on the shared network drive that takes 5 mins just to open on the work issued workstation. I'd guess it would be up to business programs and accounting programs and all these other college majors to actually teach classes in python and R, rather than what they do now which is teach classes in excel. Imo the entrenchment in excel is rooted in ignorance to other (often simpler) options available with python or R, than in any actual critical proprietary features offered by excel. Microsoft is probably pretty happy that their software is so baked into these different academic curriculums, keeping it alive in industry going forward for an entire working persons career after college.


People aren't learning excel because Universities are part of some Microsoft conspiracy. They are learning it themselves, or from each other.

My mother self-learnt excel, first as a to-do list, then a money tracker, then she learned some simple equations to keep track of weekly spending.

How in R would someone do that? With a nice graphical view? I agree Excel has many many issues, but people use it because, in my experience, it super easy to use and in particular let's you easily mix data and code.


People are also learning excel because that's the only way to program in a big company without going through "official IT", which won't listen to what you have to say if you're not a $100 000 project they can offload to a third-party.


> People aren't learning excel because Universities are part of some Microsoft conspiracy. They are learning it themselves, or from each other.

At least in germany children are often forced to learn excel in school because the official curriculum includes "office software" which is an euphemism for "microsoft product training". This has certainly not developed from necessity. Even someone who thinks school should only be a preperation for the job market would agree that employees who are able to help themselves are better than ones who know how to use a specific version of Excel.


I'd rather everyone was taught Excel than a very minor set of people learn spreadsheets by themselves. Barely anyone uses spreadsheets in UK it seems


I'm really not against teaching spreadsheat software since it's a really useful tool. I just don't like that it's thought very microsoft specific and in depth. Students could benefit way more from some basic computer (science) literacy.


> I'd guess it would be up to business programs and accounting programs and all these other college majors to actually teach classes in python and R, rather than what they do now which is teach classes in excel. Imo the entrenchment in excel is rooted in ignorance to other (often simpler) options available with python or R, than in any actual critical proprietary features offered by excel.

Every time a business user sits at a computer, are they going to spend 10 minutes getting it done, or 30 minutes trying to figure out how to get pip env configured so they can maybe eventually get it done? If by some miracle they are supernaturally gifted at programming and succeed after some weeks, they now have:

  - a nonstandard file format that will eventually break 
  - no real time feedback of formatting or calculation as they input/manipulate data 
  - no charting 
  - no GUI 
  - no "undo" button 
  - no import wizards 
  - no support (but I guess someone light years ahead of any dev talent I have ever seen wouldn't need it)
And all of this for a single use case.

Microsoft (and I am no fan) has the best tool for data manipulation that exists for most of humanity. Excepting the awful ribbon UI redesign, you could take any Excel 97 user, put them in front of the latest M365 version, and they could get the same work accomplished in the same amount of time.

You cannot say the same thing about almost anything else in the tech industry. Excel is terrible in a lot of ways. But it's a mostly reliable tool, has worked approximately the same way for 25 years, so people are going to use it and teach it. There's no point in wasting someone's valuable time with the horrifying fragility of the python/R/whatever ecosystem.


there are languages that have better support with dependencies though, e.g. Julia (every time I have to debug dependency issues in some other person's Python code/requirements.txt, I find myself wishing it had been written in Julia). I'm not disagreeing with the overall tenor of your post, but I do think there's a medium that's possible to strike here: maybe something like a simplified DSL in Julia, or something like https://www.visidata.org/


> Microsoft (and I am no fan) has the best tool for data manipulation that exists for most of humanity.

Sigh... "Microsoft has". Time was, spreadsheets was a generic category of software, made by lots of different software makers. Nowadays, it feels like people not only don't know this, but can hardly even conceive of the possibility that "spreadsheet" could mean something other than "Excel". At least I think that more people are still able to think of, say, word processors or presentation software as not necessarily Microsoft products; the spreadsheets battle seems to have been their most crushing victory in the "Office" or "productivity software" wars.

It's sad not only for the IT world in general or from a free-market perspective, but for spreadsheets in specific, too: Who knows what stuff like Quattro Pro or Lotus Improv would be capable of now, if they were still around?


True, however Microsoft is not to blame for the management errors of others, and there are other spreadsheets on the market still.

Except instead of trying to be the C++ of Spredsheets, they target Go like capabilities, and then wonder why business keeps adopting Excel.


>It's not hard to code, children do it.

It is hard to code, and average children don't do it. The fact that exceptional children do it more than they do other adult things is because coding doesn't require a lot of expensive equipment and you can afford to do it as a kid.


Yes, and also I think also because it is possible for it to be a self-isolated system; you don’t need as much greater context of the world to make sense of its rules.


I strongly disagree with this. It is easy to code and any child or adult can do it. The typical problems we solve with code are hard, and can only be solved with code.

Consider the quadratic formula. It's trivial to write this code. That's not a hard problem to teach anyone to solve with code. Or calculating compound interest. Or solving payroll and basic accounting.

All these things are done with spreadsheets as the abstraction instead of code because of the myth that "code is hard." Code is not hard, it's just we use it to solve problems that are so hard we can't do it with spreadsheets.


The steps needed to convert the quadratic formula to code are absolutely not trivial, unless you're a geek. If I went to my mother trying to tell her to code a quadratic formula, she'd have no idea how to do it, even if I explained everything.

Spreadsheets are hard too.


> It's not hard to code, children do it.

My wife knows R and Python. 90% of the pain of any new project is in environments and dependencies. Some weird error message shows up because there are multiple python executables on her machine or it isn't speaking with jupityer properly and we are on a wild goose chase of googling and command lining.

For excel, you open it and run.

Even if you know how to code, setting up a coding environment has enough "wtf does this mean" roadblocks for nonexperts that it becomes a terrible experience.


>>> It's not hard to code, children do it.

I think a paradox of programming is that we know it's easy, yet we also know that a certain fraction of people -- perhaps a majority -- will never grasp how to do it effectively. And the ones who can do it well enough to get paid for it, aren't sticking around in low paying administrative jobs.

And coding well enough to really replace a spreadsheet without making something even worse is yet a higher level of programming skill.

Given the virtually ubiquitous knowledge that programming skill is of market value, I think the existing proportion of people who can program is indicative of how many people are potentially capable of it. I don't know a single manager or engineer who doesn't think that learning a little bit of Python would be useful.


Wow, I don't know how you arrived at the conclusion that coding is somehow easier than excel. Excel/Google sheets is tabular wysiwyg and about as simple as it gets... even for children who get acquainted with tabular data relatively early (reading simple scientific graphs, etc)


I never thought I'd be someone to do this but 'mito' is slang for pathological liar in French, i.e. short for 'mythomane' haha, kind of funny for something that's supposed to be a persistent source of truth. Names don't matter at all, but just wanted to comment on this. Lol!


“European Spreadsheet Risks Group” is the most European thing ever.


'000s of rows of data in a spreadsheet - sigh. But say 'database' to some people and they hear 'The Devil'.

Excel is a glorious tool which welcomes all, the savvy and the unskilled but imaginative newbies alike. There is something about all those little cells that presents an itch everyone wants to scratch, and you just know that for some that scratching is going to produce something akin to a spreadsheet version of gangrenous melanoma.

On the other hand, some people are happy to say that the complexity of sophisticated Excel models means they would have been better off built in a code interface. Ha ha ha!

Even with the right mix of functional and object-oriented code and suitable documentation and version control applied to code, I respectfully disagree.

Data handling capabilities could benefit from skill in all of these tools: error checking (including analytical review); pen & paper; calculators; databases; spreadsheets; math & stat techniques; Word processors (& clear logical explanations).

Together, they form a rich menagerie of tools which will probably all be around until we have an AI that follows us around and we just tell it what we want. Actually, even then, it would be good to know them so that we can grasp the underlying logic and frame the concepts leading to the actions that we ask the AI to assist with...


I think one of the main problem with databases is how do you populate them.

I wish there were simple-as-Excel frontends for databases where a normal user could input his data like in preformated Excel table without having to deal with the database mechanics.


> I wish there were simple-as-Excel frontends for databases

This is exactly what we are attempting to solve at nocodb : https://github.com/nocodb/nocodb

Which is, nocodb gives you a google-drive like collaborative spreadsheets on your existing databases (MySQL, Postgres etc)

And the original problem in article could be countered with nocodb as we keep audit of all changes done to the database.


This looks like a good tool.


> I think one of the main problem with databases is how do you populate them.

There are several products that will let you populate databases in a table view and easily import CSV files into a database. The problem is with almost everything else.

Databases are designed to address a different set of problems. The design of database applications reflects the needs of people who need to maintain a database and make sense in that context. Unfortunately, those design decisions do not make sense in the context of people who use spreadsheets. Doubly unfortunate, a spreadsheet makes a better database than a database makes a better spreadsheet. If anything, the shortcomings of spreadsheets implementation dependent in most cases while the shortcomings of databases are due to the domain they are designed to address.

First example: creating a table is more involved than creating a new file and being presented with a blank table. First, a database is created. Then a table is created within that database. Then columns are created with each column, as a minimum, having a type. Only then can the database be populated.

Second example: doing a quick calculation is more involved than finding an empty cell and inserting a formula. The only way I can think of doing a calculation within a database involves performing a query. If the calculations are performed within a row, a new column can be created to contain the results. If it makes sense to store a result for each row, but it pulls or consolidates data from other rows, the complexity of the query is going to be interesting. If the calculation involves several rows, there is no obvious place to store the result within the table. There are several ways to deal with that, yet none of them are particularly convenient. If you need to perform a calculation on arbitrarily selected data, well, good luck with performing the query. Oh, and automatic recalculate ...

Third example: putting the data into a presentable form is less convenient with databases. It certainly isn't done in the table view since that is just a raw presentation of the data. A form is useful here, but it is a separate step and it isn't going to address all of the functionality found in spreadsheets.


What do we know about most database projects? They need CRUD plus queries.

What's the easiest way to generate a CRUD form?

HyperCard.

We need a ubiquitous HyperCard equivalent that talks to the Real Database(TM) of your choice on the backend -- sqlite, postgresql, mysql/mariadb, Oracle, MS-SQL, whatever -- with easy functions for "get a sequence number", "get a unique identifier cookie" and "do this block atomically".

I bet there's a few out there right now.


Delphi has been around (and better than HyperCard) for twenty-five years. Free Pascal for at least as long, but it has only had the visual component aspect provided by Lazarus for ten or fifteen years.


Filemaker is a neat program.


Sadly databases are hard because data is hard. I've seen people trying to stuff one-to-many relationships in a single Excel tab duplicating the one side as many times as needed, e.g.

  Author_A Book_1
  Author_A Book_2
  Author_A Book_3
  Author_B Book_4
  Author_B Book_5
Another strategy I've seen is to stuff the many-side into a single cell, splitted with some less used symbol like |, ; or : . I'm curious to find what would they do when they meet a many-to-many relationship.

Horrors surfaced when we analyzed the data: unintended duplications because ids and uniqueness checking was not a thing, shifted columns, undocumented column "formats"... In the end we parsed the whole mess into a DB.

I don't see how you could implement a frontend for a database simpler than Access. It was as easy as Excel, with relationships. But reasoning a schema is harder than stuffing all things in a 2D table, and querying a multitable is harder than filtering a few columns.


I agree data is hard.

If we could provide a properly designed database which has already defined datatypes but that would be as easy to fill for a standard user as a preformated Excel table that would be great. You're right, Access has this simplicity, I just wish such tools would be available out-of-the-box for other DB engine.


The author/book example is actually the preferred method to format data in Excel. From there you can take the entire table of data and put it in a pivot table and answer most questions pretty quickly.

For the cells with multiple values, I agree that is a terrible idea.


This format is referred to as third normal form (boyce-codd 3NF). Also referred to as "tidy data" in the hadley wickham world.


Such tools exist, but aren't popular because they're frustrating. Relational databases are much more rigid than flat files. Columns have datatypes which must be enforced, there are foreign keys, some (combination of) columns may have uniqueness constraints, and so on.

All this adds up to a bunch of errors when you're trying to input and manipulate data, because what you've entered is violating some rule someplace in the database.

It would be awesome if someone could build an interface that gives users the flexibility of Excel, and communicated the issues with the data they're trying to write in a really intuitive way. (e.g. syntax highlighting for data with issues, mouseovers / tooltips, dropdowns & autocompletes for foreign keys, etc)


Microsoft Access?


Filmmaker Pro? Please, no.


What we really wish for is a modern replacement for MS Access


modern in what way?


"More Modern" such that it does not contain 8 year old unfixed bugs.

https://stackoverflow.com/questions/68034271/how-can-i-get-t...


You know what database doesn’t contain 8 year old unfixed bugs?

A database that’s 7 years old.


Faster, better at handling multiple users, smarter, web based.

And also free with whatever package you're already paying for :-)


Written in javascript


And wrapped in Electron.



MS Access can be directly linked to a spreadsheet for upload / download, or you can copy and paste data directly or import data files. Other databases (MySQL) also allow for import of CSV data.


MySQL connector for Excel.


My preferred database is MongoDB, which comes with a GUI called MongoDB Compass that allows you to visually explore your JSON documents with full CRUD functionality.

https://www.mongodb.com/products/compass


There used to be many such tools, or tools that tried to do these things.

We lost them.


Say "database" and IT says "no, unless we spend a year reviewing requirements".

This is only a failure story for every software development route that involves procurement, which is every route except Excel. Those routes tracked, within the only timescale that mattered, precisely zero patients.

You want to give smart users DB and DVCS tooling? I'm right behind you. Start asking why they don't have those things already.


I found your point to be incomprehensible.


This was adapted from Tim Harharford's podcast Cautionary Tales eps Wrong Tools Cost Lives

https://timharford.com/2021/05/cautionary-tales-wrong-tools-...

If you have not listened to the series I highly recommend it. His episode "LaLa Land: Galileo’s Warning" is by far one of my favorite pieces of media. In brief it is why redundant tightly coupled fail safes will often lead to cascading failures.

https://timharford.com/2019/11/cautionary-tales-ep-3-lala-la...


To what extent is the Galileo effect described in that episode really just a kind of survivorship bias? If you've eliminated all the failures that can be caused by non-redundant systems with no failsafes, then the only remaining failures are those that can be caused by redundant systems with failures (or at least slipped through). The way to see whether those redundancies and failsafes are working is to see whether the overall failure rate is decreasing, not solely whether failures are being caused by those redundancies and failsafes.


As a DevOps consultant, I highly recommend this series. The amount of things that can go wrong in the real world is jaw-dropping, let along what can happen when computers get involved! My DevOps buddies love this stuff.

I also adore the specific episode mentioned by Jaepa. It clearly describes why "hey this bandaid helps us, so let's do more of it" can actually cause problems, vs helping us be more safe.


For more by Tim Harford checkout the "More or Less" podcast "explaining the numbers and statistics in the news and in life."

https://www.bbc.co.uk/programmes/p02nrss1/episodes/downloads


Oh man, it reads exactly like his podcast 50 things that made the modern economy. I was having flashbacks but couldn’t pinpoint why. Thanks for sharing.


I really like this series and I can wholeheartedly recommend it to anyone that is on HN.


The thing that's always baffled me about Excel is why you must always work in "minified" mode when composing formulas.

It seems like just adding the ability to spread a calculation out over multiple lines and add some indentation would make the bugs everyone complains about go down by... a lot.


You can just spread the formula over multiple cells. Instead of

  A3: IF(<boolean>, <result if true>, <result if false>)
where each of the three parameters are complex formulae, you can do:

  A3: IF(B3, C3, D3)
  B3: <boolean>
  C3: <X>
  D3: <Y>
Not only is the formula now broken down into simpler chunks, you also get to inspect the component results (like watches in a breakpoint! sorta...). Then you can just hide the relevant columns if you like (B,C,D in this case). You can even use a separate sheet and hide the whole sheet if you wish.


You say this but it still feels gross to do.


No, it wouldn’t. Case in point: Excel has ability to spread a calculation over multiple lines, but the bugs are not down.

In fact, people who are aware of Alt-enter produce buggier code: they end up writing longer formulas, with fewer intermediate results displayed, and have less visibility of the functioning of their spreadsheets.

Write simpler formulas.


> Write simpler formulas.

Excel's formula language seems deliberately designed to prevent that.


They mean simpler formula per cell, by splitting the formula across multiple cells (as I mentioned in my sibling comment).


A nice shortcut is Ctrl-Shift-U to toggle single- or multi-line view.


I'll give a free cookie to whoever gets this done.

Sometimes I'll get these spreadsheets with byzantine formulas that I have to copy it to a text editor and format myself to make sense of all the parenthesis.


It's called "VBA Macros"


Extended with Lambda, PowerQuery and AddOns.


But you can just drag the input field larger and add line breaks?


You could, but when you finally press enter it's all compressed back to one line.


It doesn't get compressed back, I often break complex formulas into multiple lines with indentation.

The only disadvantage is if someone else isn't expecting the formulas to be like this, then gets confused when they can only see the first line.


Huh. Well, LibreOffice still can't preserve the formatting. Not sure if the last version of real-Excel I tried could.

Having the compose box fit itself to the formula size or give other indication that there is more to see is still a head-scratcher why they didn't do it.


> real-Excel

???

:-(


A nice post, brought low by the unfortunate fact that it misses the primary reason we were able to eradicate smallpox (and it wasn't data tracking):

smallpox had no animal reservoirs

Most viruses (and other diseases) have animal reservoirs, and that's why few diseases which ever get widespread, are ever eradicated, and it's nothing to do with spreadsheets.

Now if he wanted to make a point about data, it would be good to mention that we have very little data on, say, how many species have which of these diseases, and how that's changing over time. But that wouldn't have much to do with spreadsheets either.


The article makes two points. First that data is important but must be curated and maintained. Second that spreadsheets require specific methodologies and expertise to prevent data quality issues from propagating.

I don't think that part of the story had anything to do with spreadsheets. It just illustrated the value of (timely) data, especially when fighting a virus and specifically in distributing scarce vaccine doses.

A virus with animal reservoirs could similarly be controlled based on monitoring where outbreaks occur and taking some action. That could be distributing vaccines, limiting access to the animals or taking some action to reduce the animal reservoir.


In my last company we keep a spreadsheet that has tabs occupying almost the maximun number of columns allowed and in different headers. Updating that Excel file needs the full team to work for two full days each month. That was after I created a VBA script to scrape data from internal dashboards and put them directly in the spreadsheet.

From my experience these monstrosities are usually created by two reasons: Push from business side and a data team too slow to react. Thus the analysts have to create the first version that seems to be OK and smart but eventually evolved into a monstrosity.


We use spreadsheets to manage data that would 1000% be better suited for storage in a database (and extracted the data would be sooooo much simpler than my monthly half day job of updating management reports).

Why? Because we don't have access to any databases. I dread to think of the requests and committees I would need to find and traverse to get a database set up for my team, let alone a system that I can write and run code on. Yet we can guarantee that every internal and external user has access to Excel.

Unless you are working for a software firm, or are part of a dedicated analytics team with access to suitable systems, large companies and government pretty much have no chance of being able to roll your own database.

Don't suggest Access. Never suggest Access.


Now that you mentioned, we didn't have access to SQL Server dwh until 6 months before I left for good.

Can you try sqlite? It's usually enough for simple things I think.


But then we need something to interface with the db. Getting any programming language installed isn't going to happen in government.


Best of both worlds: libsqlite.dll is almost certainly on any corporate Windows rollout somewhere already, and you can load it into Excel as an extension for a pretty front-end that happens to have a programming language built in.


Nocodb is trying to solving the issues of implementing a spreadsheet-like frontend on top of a database backend https://www.nocodb.com/


Maximum number of columns is 16 384 (in modern Excel). PostreSQL has a limit of 16 000.

So Excel > PostrGRE? :)

I wonder what kind of data did you use and why it had to be stored in so many columns. This approach would probably kill a "real" database too.


The reason why you often encounter spreadsheets with so many columns is because spreadsheets really, really want data to be flat. Analysts often like to have all the possible datasets pre-joined into one monstrous sheet, then they can easily slice and dice it up however they want.

With a relational database this would be kind of insane, as it's much easier to normalize data into separate tables, then just join up them up when you need to.


PostreSQL has way more data types than Excel. You can have arrays, json, xml, ranges, polygons, etc. in columns (https://www.postgresql.org/docs/current/datatype.html)

Because of that, chances are the natural format for a data item taking 16,384 columns in excel likely takes fewer than 16,000 columns in PostreSQL.


Oops I definitely got it wrong. Probably a bit more than a thousand columns now that I think about it. Still a beast. The thing is a database with every column a product id.


I was thinking maybe that meant an older version had a limit of 1024, but according to https://www.askingbox.com/info/xls-and-xlsx-maximum-number-o... it has always been 256 for .xls and 16384 for .xlsx.


Excel has its flaws, as do spreadsheets in general, but the upside is that spreadsheets allow people to become programmers who never would grok programming in purely abstract terms.

Spreadsheets allow the user to see each intermediate state in their application, as well as each iteration over the input data set, laid out in a 2-dimensional grid. There just isn't any substitute I've seen that is as accessible to the computing "layperson."


I really don't think excel is accessible. I don't know how to do anything in it. If I had to make a plot I'd have to search for an article that would probably be about the same word length as the one I'd find searching "how to make a plot in python." Imo excel isn't easier for the lay person, it's just entrenched because you have people now in the workforce using excel since windows 95.

It allows people to become programmers by making them learn this clumsy tough gui that isn't readily apparent how anything works, and changes between versions. I don't think the learning curve is any easier than what it would take to make a person actually into a programmer with a language like python or R. A two hour tutorial in either language is probably all you need to do 90% of what people use excel for. Why not just skip the abstraction and learn how to actually program if you have to spend time learning how to faux program in excel anyway?


In excel, I can make a plot by typing in my numbers, then clicking the 'plot' button. Excel shows a selection of plots of my data and lets me choose the one I like the most.

To do that in Python, I'd have to first choose a format to enter my numbers (I'd probably use Excel, then export CSV). Then I have to install Python, pip install a graphing library, import the CSV, then plot. I have to go through the different types of plots one by one.

I've taught python and graphing, you need much more than a 2 hour lesson before people can be left alone without help I'm afraid, if they have zero programming experience.

I don't claim Excel is good programming practice, or scalable, but it is much easier than Python or R. The best future thing (I think) would be an excel like interface which would show you, and let you edit, Python that did what you'd just chosen in the GUI.


Because the layperson I'm talking about can't think in terms of abstractions, kind of like a math student who hasn't learned algebra yet. They can only verify the correctness of their algorithm by inspecting the intermediate values at every step, and scrolling down through the input set to inspect the corner cases. The problems with this approach are extremely obvious (which is why spreadsheets error out so often, or even worse, produce the wrong answer without any visible errors), but in many cases this is the only way for the people with the domain knowledge of what needs to be done to build an application to meet their need (they don't know how to code and they don't know how to write a functional spec).


This is a really important point that underscores the persistent tech education gap that exists across our economies. Mike Duncan observed some time ago that our politicians can barely handle e-mail on their own, making them particularly vulnerable to infrastructure disruption. Low and behold, we've now got an army of firms _legally trading_ in tools whose only purpose is to disrupt (or capture) infrastructure.


Conversely, I am a professional programmer and I haven’t found any easier way to plot your data. What could be easier than highlight your data, insert, scatter chart?


I get paid way too much to develop software, but most of what I actually do is work with data in Excel. There is no easier way to get quick answers on a small amount of data regardless of skill level with alternatives and the ability to then share those answers and the work leading to them with non-technical stakeholders is what really separates Excel from the "just use Python/awk/Jupyter/ad hoc solution X" sprinkled all over this thread. With Excel, there is a single solution that works for me, my engineers, my PM, my program manager, my account team, my customers, and my finance stakeholders. Asking any of those parties outside of engineering to go learn R would be quite embarrassing.


I think I big mistake people make here is that they think they are far better at everything on a computer than a non-programmer. They don't realize how much you can do in Excel and how much they suck at it. If you can't use it without a mouse and/or if you don't know pivot tables, you are just as much of a beginner as a C++ programmer who doesn't know what a pointer is.

It's only after trying to convert an excel sheet to [insert your favorite language] that they realize it would take them 6 month with a team of 5 to replicate what a single accountant did in a week.


Indeed. Excel gets a lot of (pretty deserved, actually) shit from our industry, but the reality is that Excel does what computers are meant to do: enable users to create sophisticated applications[0] that solve their problems with no coding expertise required, nor kowtowing to some developer, with the option to onramp into adding some code via macros.

Excel is the most widely used programming environment on earth because it is so good at it. That said, I think the fact that we haven't made anything better speaks to the change in philosophy our industry has undergone, from one where we wanted to enable users with computing to one where we want to herd and farm them for money.

[0] yes, both 'sophisticated' and 'application' are defined pretty loosely here.


Opposing anecdote: I have converted excel spreadsheets built over years to SQL, and typically it did not take weeks to do that.

( Any organization that has a smallest reason to care about their data should remove save button from excel and start educating their personnel. Using excel in any important role should be seen as making the eventual mistakes on purpose and someone should be kept responsible.)


Getting rid of lousy Excel-driven processes is a big part of my current job. SQL only solves half the problem, though: Excel is also a tool for manual data-input and manipulation. To solve this piece you often need to create CRUD webapps, which can be much more complex to develop and maintain than some SQL in the database.


> To solve this piece you often need to create CRUD webapps, which can be much more complex to develop and maintain than some SQL in the database.

Why do they need to be webapps? If you've done it with Excel so far, you're obviously on a LAN, so why not traditional client/server native apps?


Deployment and authentication is going to be a problem, but the big thing is that in many such environments users would simply not be able to execute your native client app on their controlled workstation by default. But for most software teams nowadays creating a CRUD webapp is simpler and cheaper than traditional client/server native apps; I've seen major companies maintain their existing client/server native apps but whenever they have the choice (e.g. greenfield development), they'd rather not have a native app because, frankly, there's no good reason to do so as the maintenance is much simpler if you don't have to worry about the deployed client instances.


> I have converted excel spreadsheets built over years to SQL, and typically it did not take weeks to do that.

The "years" of work are typically not for implementing the spreadsheet, but to define what it does. The requirements evolve over many versions.

Of course it's fast to re-implement it after all the work to define the requirement is done, but it's only possible because Excel allowed all those prototypes.


Grandparent claimed that something done in excel by one person in a week takes a team of five and half a year to reimplement in some other language. I disagree with that as a general rule (of course, exceptions may apply, as usual).

I fully agree that Excel is great for quick drafting, visualizing data quickly, and prototyping. But it should be left there. Anything you do that lasts even overnight and has any significant numbers in it should be done with something else than Excel.


Fair enough, the issues for me are that: once something works in a spreadsheet there are no money assigned and willpower to redo it clean in something else, because most of the value is already there; and the people who are available and know the requirements aren’t skilled in anything else.

So prototypes are done in Excel because it’s the fastest and cheapest way to do it, and they don’t get redone in something else afterwards for the reasons above.


> done in excel by one person in a week

Done by a domain expert in their field. If you had to create the project from scratch without aid of an existing Excel file to clone, you would not be able to do it even remotely as quickly as the domain expert, if at all.


> and typically it did not take weeks to do that.

You're not seeing the process and effort that went into it, which can be considerable.


I think you misread my comment. Some of the excel sheets were a result of considerable effort. Years of different people working on them. Converting those to SQL was not a tremendously large effort. That was made by me, so I kind of know the effort:)


Many businesses risk it all by building actual applications with Excel. Security, version control, automated testing are pretty much set aside. Case in point: https://insideparadeplatz.ch/wp-content/uploads/2013/01/ubs-...


It's not like they had a meeting where a consultant said "We reviewed all your requirements, and you could build this with normal software engineering practices, or we could build this in spreadsheets with associated deleterious effects on privacy, correctness, durability, extensibility, readability, and interoperability, likely resulting in an estimated $N to $NN million shortfall in 5 years. What do you want? Yeah, let's do the spreadsheets."

It's a two-pronged attack of using tools that you have figured out as you go (not everyone is an educated, practiced, professional programmer, or even good at Excel) and a sunk cost fallacy/resistance to changing what "works".


Spreadsheets are like the ubiquity of motorcycles in 3rd world countries. Jumping on one and blasting off without any safety equipment is cheap, fast, everyone uses them, and they get 80% of the job done. And they will kill you, but that's life.

The thing about spreadsheets is you don't have to commission bespoke software. And just the value of that swamps everything else in the vast majority of cases.

In small businesses I doubt the success rate is anywhere near 50% if a non-technical person recognizes a business need for a software solution, hires someone to build it for them, and then it actually reaches a useful conclusion. But if you are numerate at all you can drive a spreadsheet. Not having to swim with the sharks for months or years while you hemorrhage cash is hard to appreciate for the sharks (us, the people who build technical things). Most small businesses and many large ones don't have anyone who can manage even contracting something like this out.

If things have gone on long enough that the problem is stable, clearly defined, and the costs of 3rd order effects like lack of revision control are losing enough money to justify the risk of a custom software project, sure, it needs to be done. But for many companies that risk will always be bigger than the ones that come with spreadsheets.

ERP systems are supposedly finished software that you deploy and tweak for particular business needs. The success rate on deploying those is waaaay below 100%, and the expenses are huge. If your company or department is non-technical the time for custom business software is usually when your hair is on fire and it's the only option.


Very true, especially for small business. But large multi national enterprises are also doing this.


I worked at a company that did product testing from spreadsheets. And not statistically modelling them. Like physical products. The spreadsheet logic was controlling power supplies on a physical test bench.


Impressive.

I worked at a financial software firm written in python that used Excel behind the scenes to run calculations and modelling.


> written in python that used Excel behind the scenes

That's from my worst nightmares :(


* The Utility of Spreadsheets

No single application has been as widely adopted by as wide a number of people for such a wide range of uses as the spreadsheet. Love them or hate them, but the spreadsheet metaphor is at once highly useful, highly adaptable, and highly usable requiring minimal support to extract needed value.

The spreadsheet, with all its questionable glory, is here to stay. From the first Visicalc to Excel 2300 I don't think we'll see the end of spreadsheets for a long time coming.


This! As programmers, it is easy to list N reasons why spreadsheets suck and deserve to die. But the reality is that it is the most ubiquitous business software and the non-tech/business users would rather be in a spreadsheet than anywhere else. Companies run on spreadsheets, even if they have the best "data stack" and tools. Because few, if any, software can beat the

* flexibility (throw data anywhere and link it to each other, make edits, write notes),

* power (formulas, etc), and

* familiarity (the most underrated factor) of spreadsheets.

Nothing else allows the non-tech user to feel empowered like spreadsheets do.

Spreadsheet evolution has been slow though. Google Sheets added cloud + collaboration 10 years ago. We (https://coefficient.io) are adding the layer of connectivity to sheets so they can remain in sync with the actual sources of data (Cloud apps like Salesforce, DBs, BI tools, etc) so sheets actually become "live" (even though they have been in the cloud for a while) and to reduce manual work and increase trust/accuracy. There is so much more that can be done to leverage this largest software platform that is out there.


I used to work at in construction as an estimator and we used spreadsheets. No backups, no version control, a single value change took 10s of minutes to propagate through the many sheets in the file. We fixed errors when we saw them, but I always felt it was an exercise in pointlessness.

These were not small companies either, ~$800M in yearly revenue for one, and ~$150M for the other.

Then, afterwards, the end value would be massaged to what felt right. And yet, these sheets were seen as part of the "secret sauce" of the business.

It's one of the reasons I really wanted out.

Programmers have a reputation of being arrogant assholes, but I think this push-back and ridiculing other industries of using excel for stuff like this is completely justified. Excel spreadsheets let these people FEEL productive and like masters of their own fate with a bunch of numbers neatly encapsulated in their own little cells in a table, but their actual usefulness is questionable. For construction, it gives a rough feel for a project, but a lot of it is smoke and mirrors.


I can't remember where I read it but at one point Boeing was maintaining the entire bill of materials for one (maybe more) of their aircraft in a spreadsheet. I tried to google it but could not find the reference. I did find that Boeing actually had their own in-house spreadsheet software for a while, which was kind of interesting.

https://en.wikipedia.org/wiki/Boeing_Calc


> I used to work at in construction as an estimator and we used spreadsheets. No backups, no version control, a single value change took 10s of minutes to propagate through the many sheets in the file. We fixed errors when we saw them, but I always felt it was an exercise in pointlessness. These were not small companies either, ~$800M in yearly revenue for one, and ~$150M for the other.

Jesus... want to start a competitor to these guys!? Email me


I think I big mistake people make here is that they think they are far better at everything on a computer than a non-programmer. They don't realize how much you can do in Excel and how much they suck at it. If you can't use it without a mouse and/or if you don't know pivot tables, you are just as much of a beginner as a C++ programmer who doesn't know what a pointer is.

It's only after trying to convert an excel sheet to [insert your favorite language] that they realize it would take them 6 month with a team of 5 to replicate what a single accountant did in a week.


Entirely true. Many physicists also basically live in Excel. And replicating the functionality of pivot tables isn't at all trivial and they also have a lot of helpers by now. In many cases you just need to click it and have all column neatly formatted so that you don't have to do much anymore.

I also have seen Access applications with more features than a common Salesforce CRMs. I have seen it as a navigation system that could bring you to the nearest partners or shops that provide necessary parts. Yes, there is code of doom behind it and if those maintainers leave the company, problems arise. But the feature set was often extremely hard to beat.

And I am not sure how to respond to non-technical users about better alternatives. They use it as a front end for SQL manipulation and it isn't easy to come up with something better for that target group.

Even worse is SharePoint. It is a complete abomination, it can easily beat most horror novels. But the workflow engines are just extremely practical for corporate processes. Only recently some alternatives came to the market... I just wish I could kill it...


> Spreadsheet evolution has been slow though.

Excel has to contend with nearly 40 years of backwards compatibility (MultiPlan, the predecessor to Excel, was released in 1982) and a deep userbase that literally has decades of experience and muscle memory with the software. The Symbolic Link "SYLK" file format introduced in MultiPlan is still supported in recent versions of Excel, leading to the infamous CSV "ID" issue.

Many of our users still run very old versions of Excel and Windows (e.g. Excel 5.0 on Windows 95) because a change in a future version of Excel caused problems or gave different results.


> Nothing else allows the non-tech user to feel empowered like spreadsheets do.

I'd say it allows the same for technical users as well since it helps bridge communication/knowledge gaps and move things along. It's typically not the solution but often plays a critical supporting role.


You're correct, but this entirely misses the point of the article. It's not calling for the end of spreadsheets -- it's stating that there's an unspoken problem that demands our attention.

Spreadsheets aren't 100% reliable for use cases where you need to collaborate and share immutable health records. Especially during a time of global emergency when tensions are heightened.

Spreadsheets don't impose validation, schema correctness, constraints, etc. and can amplify human errors. They can also inject errors of their own (eg. turning March1 the gene [1] into a date) when they're simply trying to be helpful.

How do multiple people manage a spreadsheet? How do you safely merge spreadsheets? How do you keep records from being duplicated or replaced? How do you do double bookkeeping? Can you atomically identify individual records?

This article is saying that we have to realize spreadsheets can be a source of scientific error simply because of their design intentions and ergonomics.

The closing remarks estimate that 1,500 people died as a result of spreadsheet error. That's remarkable.

[1] https://www.proteinatlas.org/ENSG00000145416-MARCH1


You're also correct, but I think miss the point as well. The spreadsheet isn't going anywhere. Curse it all we might, blame it for deaths or attribute any aspect of malignment to it, and you'd be right. But the spreadsheet isn't going anywhere.

Spreadsheet mistakes have been directly attributed not only to thousands of deaths but billions of dollars of mistakenly wasted money. Heck, in the UK alone, they have attributed billions of pounds and thousands of deaths specifically to spreadsheet errors and flaws, and that's not even counting the COVID-related issues from the article. Check this scary article: https://theconversation.com/excel-errors-the-uk-government-h...

There have been hordes of purpose-built apps that have aimed to replace the common spreadsheet for any number of tasks from inventory management to financial planning to even tracking nuclear weapons (true story: our nation's nuclear weapons stockpile is tracked in a spreadsheet). And yet, the spreadsheet is still here. The spreadsheet is the cockroach of all apps. At once utterly adaptable and seemingly indefeatable.

The spreadsheet is a horror show with lack of control, schema, and collaborative management. We surely can do better. But we haven't. Google and Microsoft have the best developers in the world who can do anything. And they have produced more spreadsheets. But spreadsheets are the bane of our existence! And yet they are still here.

The spreadsheet has survived from the mainframe era to the cloud-based era in pretty much the same form with enhancements. The spreadsheet is not going away, even with all the complaints in the article. The points are well made and well founded. But unfortunately they miss the point. 20 years from now we'll still be cursing the tyranny of the spreadsheets because of their utility.

I close with a Haiku:

| Rows and cells, alas |

| What makes your simple structure |

| So very useful? |


It is very clear that they are useful but it is also very clear that they can be problematic. In Finance people check each number - multiple times per day - because there is no way to write tests in Excel. Sounds good? Until you can trust the people crunching the numbers, of course. Would you build a bridge like that? Or a building? I think it is worth to talk about this problem and, at the very minimum, inform people there are alternative ways to crunch numbers that are more reliable than Excel.


Lots of engineering calculations are indeed done in Excel, it's very common.


> Would you build a bridge like that? Or a building?

That's exactly what's being done. I work in mechanical engineering and Excel is the primary calculation tool all over the industry.


It would be hard, but I think a new kind of spreadsheet that was easier to validate, easily impose a schema and constraints and helps us identify human errors would be helpful.

I bet excel has some excellent features around this, but they are not front and center in the UI.


Lotus Improv was a bit like that I think. It's been a long time since I used it. Wikipedia says:

Conventional spreadsheets used on-screen cells to store all data, formulas, and notes. Improv separated these concepts and used the cells only for input and output data. Formulas, macros and other objects existed outside the cells, to simplify editing and reduce errors. Improv used named ranges for all formulas, as opposed to cell addresses.

https://en.wikipedia.org/wiki/Lotus_Improv

Like most of the software on the NeXT platform, it was ahead of its time.


> easier to validate, easily impose a schema and constraints and helps us identify human errors would be helpful.

yeah, it's called IF


Right.. at my company we had to purge spreadsheets from all processes due to these issues. Instead we allow import and export of spreadsheets but they aren't the source of truth.


> From the first Visicalc to Excel 2300 I don't think we'll see the end of spreadsheets for a long time coming.

We might even argue that spreadsheets were already used in 1295 BCE, as shown in Fig. 1.1 of [0].

[0]: http://uruk-warka.dk/mathematics/ER6%20tables.pdf


Whoa!


Spreadsheets are the private motor vehicles of computing.


I think the best approach in many cases is to bring data management into a specialised application, rather than using spreadsheets as a database. They can use the CRUD applications we all write or some off-the-shelf solution, but the important thing is that it provides validation, schemas etc.

Then you provide a spreadsheet export or direct connection that can be hooked into another spreadsheet with all the formulas, pivot tables and so on.

This way the core data is safeguarded and kept clean while not requiring constant development for every report requested.


Personally, I strongly prefer working with GSheets to Excel. It handles large sheets better and you never worry about the app crashing and taking your precious work along with it.


Instead one only needs to worry that one can’t get one’s work done.

I don’t think I’ve ever had a time when I wanted to use a spreadsheet for anything moderately complicated where gsheets was up to snuff. That’s not to say that excel doesn’t have problems or does things easily, but it doesn’t lead to quite the same level of frustration and hair-pulling, and indeed it’s usually possible to achieve things in excel when they are impossible in gsheets. Indeed that app even struggles at the ‘easy’ task of being able to fiddle around with table formatting. Its sole advantage is the collaboration. It can be used as a place to dump small amounts of varied tabular data to share but not really for calculation or analysis.


As far as competing anecdata, I found gsheets absolutely much better for joining together multiple sheets of data. The lookups are so much less touchy simply because "it's all in the cloud" (as long as you have permissions).

With Excel you have to ensure the file is there, and readable, not tampered with, etc.

This allowed us to create better multi-workbook "apps" that have a bit more modularity - you could even embed sheets into a webpage as a view-only client.

In the end, Excel is good for us only for interoperability with external groups.


I have. Especially before Excel released their dynamic array features. I love the sort function in gsheets. Combined with filters it allowed me to automate things that relied on vba in Excel.


Google Sheets has more useful formulas than Excel. For example query(), but also all the array formulas that they include.


Wait, seriously? I find the exact opposite! I would love it if Gsheets were as performant as Excel with large tables but scrolling gets bogged way down. If you add too many formulas, Gsheets will get stuck recalculating formulas endlessly. I love Gsheets because of the ability to link between documents easily, their comment system, filter views, and native regex… but performance?? It’s terrible!


Large sheets? Google Sheets has a much lower cell and row limit compared to Excel


I assume they mean performance wise. Excel could have larger hard limits but handle complex sheets worse.


Google Sheets is an unusable dumpster fire. I just tried to import a 3 million row CSV yesterday and it wasn't able to do it. INDEX/MATCH don't even work properly! Basically unusable unless you need it for scheduling parties or whatever people use Airtable for.


Have you considered using a different tool for the job? Processing 3 millions rows of CVS sounds like a task that is more fitting for a small script than Excel/Google Sheet.

Google Sheet excels at many other tasks, but it's not a replacement for specialized tool (and it should not).


Yes of course, I originally generated the file using Pandas but it's always a pain in the ass to remember the magical incantations to make pretty 2D graphs that aren't fugly. That being said, I have a knack for making great 3D graphs that are awesome! In any case, it's hard when my employer doesn't just gimme that juicy Excel subscription because cost-cutting and efficiencies.

In the end I put on my robe and wizard hat and used matplotlib. (╯°□°)╯︵ ┻━┻


Seaborn in Python to make matplotlib pretty by default + plot some uncommon graphs. `%matplotlib widget` to make matplotlib interactive. Plotly in Python if you want decent graphs that are also interactive by default.

Observable if you want an interactive, hosted visualization notebook (and all their libraries are open source, except their UI). Plotly, Observable's Plot library, or D3 if you don't mind JavaScript. Can bundle libraries + data + visualization into a single html file and deliver that as an interactive report; once you have the base template down it's decently ergonomic.

Plotly is my current default when I'm working with Jupyter, though. Sane defaults are nice.


I'd love to learn if I'm doing something wrong, but Sheets has terrible performance compared to Excel in my experience.

Not just sheet sizes, but Sheets feels much slower and has a vastly weaker toolset. Just the fact that the browser takes precedence for keyboard commands from the web app makes the usability suffer massively.


And controversial as this may be, JavaScript.


performant JavaScript exists apparently - I was shocked to learn that there are some new DeFi projects that are executing DEX arb bots and write all their code in TypeScript (lmao).

It will be extremely funny to see how Google does this whole "rewrite all our office products to use <canvas> instead of the DOM" project over the next couple quarters... he says, laughing, shaking his head at the wasted effort... Caesar wept for there were no more worlds to conquer


Newer versions of Excel have native JavaScript support https://docs.microsoft.com/en-us/office/dev/add-ins/referenc...


People do multi-million dollar deals with Excel spreadsheets. I know personally of one deal involving multiple multi-national petrochemical companies, pipelines, and refineries in the US. The whole deal was done on a spreadsheet, the rest of the legal paperwork was just formalities.


Spreadsheet is the greatest SW ever invented, that's why it's so dangerous.


With power comes responsibility, the latter can be lacking in some users who then like to blame their tools.


He mentions several times that accountants do their work in Excel, but any account would consider it barbaric to actually try to do transaction-level double-entry bookkeeping in Excel instead of a purpose-built system like QuickBooks/Xero/NetSuite/etc. I think I've also probably seen more broken financial models than logically sound ones in Excel, so the integrity of spreadsheets is oftentimes not even safe in the hands of finance professionals.

Data integrity is definitely a hard problem.


Yes, hijack my scrolling. I really don't like have a consistent feedback when I am scrolling pages.


Dunno why you're being downvoted but it's true. I noticed it pretty quickly when I tried going back to the previous page and instead it scrolled up.


They're being downvoted because it's against the rules to criticize the design of the site on which the content is hosted.


And the content doesn't load if you disable javascript!


This. I should take the risk of enabling javascript to read about the risks of spreadsheets? No thank you.


This is a great read - I love the Italian accountancy history lesson at the start.


I actually stopped reading there and went to the comments. I dont mean any disrespect, just trying to give a counterpoint: I like to have some more upfront info about an article's thesis before I invest in reading about an Italian history lesson hoping it will be relevant. I see this style a lot for example on substack, as if people are always trying to weave in a relevant historical anecdote or clever fact, which is fine, but it sets a much higher bar if the article is going to delay the build up to a point without first telling me what that point will be. It's easy to get bored along the way and bail.


Well tastes are different. I liked it because of the interspersed backdrop. Granted i am reading this on a chill evening not on the go. Would be different if i had expected a to the point article.


Yeah I definitely don't find it odd to like that style, I was just reacting because I found it interesting that the thing someone highlighted as being a strong point was exactly the thing that had put me off. Different tastes as you say.


Anything from Tim Harford is going to be good. His "More or Less" Radio 4 series is a must listen podcast.


This seems less about spreadsheets in general and much more about Microsoft not wanting Excel to eat into sales of Access and SQL server.

Even now the limit is ~1million rows, meaning that if you have even a very simple table but lots of rows then you have to pay more for Access.


Data caught in spreadsheets is kind of an instance of the general siloing of data.

The situation is something of a general problem of a data-driven Society. One group of people want information available with one interface and spends resources only in putting it in a format suitable - to the detriment of others who/want need it. With spreadsheets in particular, naturally you have a data-integrity but that's still an instance of the broader problem.


It's a general psychological problem, too. A huge part of storage organization psychology, whether in software or in a physical room, is subjective/proprietary perception and judgment. People who prefer and naturally warp any job role or position to use these tools, as all humans do with subjectively preferred processes and perspectives, will never stop creating them.

The good part of this not-invented-here mindset is that it's also how technology moves forward. "I need this new system to be mine/fit my way of organizing" is also often another way of creating a "this never existed before" product or outcome.

These storage and organization tools and silos are born as subjective (context-fitting) design processes. So they tend to get locked up behind depth-oriented (subjective) processes, and eventually frozen by stabilizer groups.

Stabilizer groups follow up to use the tools and promote the idea of keeping the data siloed the way it is. They don't like change at work because change breaks their preferred psychological processes, causes them to have to re-build their perceptual frameworks, and because they're humans, this makes them do stupid things at work, like lash out or become passive-aggressive or detonate their new diet plan or become late for a baseball game. They will blame all of this on "open data" or whatever it is that caused their stable workflow to destabilize.

There are lots of solutions to this, and Excel by itself can be seen as an attempt at a solution to the problem...also the problem can be moderated by the system of energy surrounding the data valuation and access to the data.

And "just don't silo stuff, guys" seems to be the proposal with the worst track record so far.


So, what's your alternative?

Spreadsheets have many advantages, they are easy to edit, copy around, and quickly filter and do simple calculations (like find how many students are in a class if size smaller than 10, for example). I can't think of a sensible alternative given current tools -- we should make better tools of course.


In my use cases, the siloing is an advantage: the project I'm working on is contained in a couple of files in a folder somewhere that can be archived and opened/rerun with confidence years from now. Most other systems don't allow this level of encapsulation / archiving ability.


I hate Excel, not because it's a bad program but because in my experience people misuse it more often than not.

Excel excels at creating spreadsheets. Spreadsheets are pretty reports used to report tabular data. Excel is capable of so much more, but for eveything beyond spreadsheets it's medicore at best and its complexity often makes it a liability.

It's a bad calculator. It's a bad database. It's a bad medium for transferring data. It's a bad front-end. It's a bad notetaking app. If your use case goes beyond making tabular data look pretty, Excel is probably bad at it.

People use Excel for everything because they're comfortable with it. But other solutions are often much better and I'm so tired of cleaning up all the issues Excel has caused (and continues to cause) me.


I think this is a really unfair comment. Excel is a very powerful tool. It's great for financial modelling and prototyping, for example. It can be great for testing out different scenarios and investigating likely outcomes.

It's also something that most people in most businesses have installed and as such, can be incredibly helpful for sharing data, models, etc. The collaborative editing in Office 365 adds a new dimension in that (finally) we don't have to email round documents and implement half-cocked versioning any more (though plenty of people still do), and multiple people can make changes at the same time.

If you want to slag off any of the applications in Microsoft's Office suite then the correct answers (and I don't think this is particularly controversial) are Outlook and Teams. If you want real productivity destroyers and daily pain, look no further.

Excel? It's great.

Yes, it can be misused, but so can Python, Rust, C++, JavaScript, and every other tool, language, and platform at our disposal. So can a circular saw for that matter. That doesn't make any of them bad tools.


I think you missed my point.

Yes, Excel is a very powerful tool and I would never argue otherwise. I definitely wouldn't argue against using it altogether. But I have come to loathe it because of how often it is misused.

Sure, any tool can be misused. But in my experience Excel gets misused more than anything else combined. Excel's accessibility and ubiquity is part of what makes it so powerful... and what compels people to use it where other tools are much better suited.

Don't get me wrong. It's okay to use a suboptimal tool... to a point. Users should be aware of a tool's limitations and seek out better solutions when it's clear those limitations will cause problems.

Excel is not a bad tool. It's so popular because it really is a good tool. It's just not the ultimate tool.


We run our company finances off Google Sheets, which is essentially Excel, but shared. It's an outstanding tool for financial modeling as well modeling in many other domains. We would not be able to function without it. The fact that others may misuse it does not change things for me.

p.s., Google Sheets is one of the more underrated cloud apps of all time.


I completely agree with your main point: excel is terrible because people use it for everything.

I've noticed there different ways people use excel: as a dashboard, to store data, to modify data.

IMO storing data is the worst use of excel, especially when it's done in a non 'tidy' fashion , with highlights and formatting to encode data.

This is from my experience as an R and python user in science.


Nah. Spreadsheets are poor man's MapReduce.

Most Excel tables I've seen aren't just data and presentation. It's computing the data as well as presenting it.

And how is the data computed?

* Applying formula to a row/column range to get another range

* Applying formula to a range to get a single-cell output out of it

Obviously, I am not talking about clustering and performance here, but a computational paradigm.


> Spreadsheets are poor man's MapReduce.

Yes. Emphasis on "poor man's". Excel is mediocre at many things, but it's highly accessible so people use it anyways. That can be good or bad depending on the situation.

Need to quickly crunch numbers for a small dataset? Excel is perfect. You can probably put something together quickly to get what you want.

Need to maintain a contact tracing database? Not so great. Excel scales poorly and the low implementation cost isn't worth the limitations and risks.


I strongly doubt there is any other reactive programming environment that is easier to teach to novices than a spreadsheet.

One wonders what would happen if you could extend lisp to do reactive programming, and then extend it a bit further to load an excel workbook into an abstract syntax tree.


There was the common lisp cells project. But I think it’s gone dormant.

https://common-lisp.net/project/cells/


Spreadsheets are a hammer so if that's all you know then all problems look like the nail. Really, many power users of Excel ought to be looking at more capable, testable and readable solutions from professional data analysts and scientists. Python would be a good start, but there are many options better than Excel for critical systems.


> Really, many power users of Excel ought to be looking at more capable, testable and readable solutions from professional data analysts and scientists. Python would be a good start.

I can write python, but I can get the answer out of excel in a fraction of the time for most use cases.

As an added bonus, excel documents are far more ubiquitous and I can share them with clients, and they can see my workings. Python isn’t very transparent and business users can’t usually check the logic.

Excel is also much more interactive and allows for much more “discovery” and playing with the data.

What I would say though, is most power users don’t know about functionality in excel like PowerQuery, relational data models and DAX that actually do turn it into a serious and repeatable tool.

(I’m not arguing that Python isn’t better for many use cases, and excel definitely isn’t right for most critical applications, but I also think Python is much slower for ad-hoc data analysis so both have different places in the market. Anyone think python is faster? I’m down for a race!).


> they can see my workings

How? Do they check that every cell in a column actually has the same formula? Do they check data format everywhere?

Python script is something that is possible to be actually reviewed, and results - reproduced, and "formula's" there are actually readable.

But for some stupid reason excel files are still shared over email.


It's only "stupid" if one suffers from a catastrophic lack of empathy for people who are not programmers and for the incentives to which they are exposed--which do not include using a software developer's preferred tools, nor do they include the carving out of time with which to learn them.

I would hope that we would be generally wiser than that here.


Like being a "programmer" is some genetic trait or something, or like using excel in any more or less productive manner doesn't require carving out hell a lot of time, it's just that time is taken from user's lives in small pieces, and doing PROGRAMMING seems like taking a university course.

I actually have a lot of empathy for people who forced to deal with all those problems, otherwise I just wouldn't care.


> Do they check that every cell in a column actually has the same formula?

They just have to check the top cell as most of my formulas are array formulas.

You don’t have to drag a formula down - that’s a common misconception in the latest versions. If you do =A1 + B1 and want to apply it to the 1000 cells below you just write = A1:A1000 + B2:B1000.

That’s still not that readable though, so I’ll apply those cells two named ranges “Sales” and “Taxes”.

Then the formula is = Sales + Taxes once and that will populate the whole column of data.

Then there’s M Code and PowerQuery which literally allows you to review the data cleaning line by line and even see the data state at any intermediary step. It also has

> Python script is something that is possible to be actually reviewed

The problem for me is that, as someone who works in consulting, it can’t be reviewed by my boss or a client, neither of whom can program. But they can review a tidy excel sheet.

And then they can’t edit it either, so if I go on holiday and I’ve built some sort of model nobody else can make progress until I come back, or if I move projects I’m also stuck maintaining the model on the old one.


That's terrible practice, ironically, as it's extremely unreadable. How could someone looking at it know what Sales and Tax actually are? You have to go into the formula name box and dig in to find sales = "yada yada" etc. That doesn't seem too bad until you have a decently sized file and you have to dig into 40 formulas to find the one you want, and go check that it's actually referencing what you want.

I work as a banker, and what you do is one of the very first things new employees are taught not to do in excel. It's an amazing solution for the person that built it, but a terrible one for anyone looking to check the work.


> That's terrible practice, ironically.

Can you point me to the 'best practice' guide you are referring to? What authority on excel standards said this?

Personally I tried to find articles saying it's not best practice by typing in "dont use named ranges best practice" or "named ranges in excel are bad" into google, but it mostly brings up articles stating that using named ranges is best practice and improves readability!

> How could someone looking at it know what Sales and Tax actually are?

If you really want to use that example, you would click in the formula bar and it will highlight the ranges, and colour code them automatically.

If it's on a different sheet, you just hit ctrl + g and type in the name, and it will take you directly to the cell it's linked to (which usually in my case, is linked to a sheet that contains all my model's assumptions in one place, each one with a named range describing what it is). It's much easier and quicker than going to =Assumptions!G52.

> I work as a banker, and what you do is one of the very first things new employees are taught not to do in excel.

Seems like a silly thing to teach people IMO. In my experience it makes formulas much more readable (both writer and reader), makes it much faster to build models, and cuts down errors substantially.

I personally find that formulas are much easier to review, because the named ranges provide some intent. If someone writes =(A2Assumptions!92)/Assumptions!91 I've got to really unpick it to work out if it's right, but if someone labels it =(A2Miles_Per_Hour)/Average_Miles_Per_Vehicle then I can see that the formula is wrong almost instantly.

Additionally if I want to write another formula using those values, I can just type it straight into the formula bar without having to go and click on the right cell reference in another sheet.


>Can you point me to the 'best practice' guide you are referring to? What authority on excel standards said this?

Well investment bankers tend to be pretty darn good at excel, and the banks I've been at would scorn you for doing it, as well as all the standardized training given out to fresh recruits to the industry. We had a buy-side deal recently fall through partially because the (fairly sophisticated) model the company selling itself used was absolutely unreadable and unaccountable. They did exactly what you said (naming), including with their assumptions. It was 20 sheets of unpenetrable mass, and we were all turned off by the fact that you couldn't follow it whatsoever, and was basically unaccountable.

>If you really want to use that example, you would click in the formula bar and it will highlight the ranges, and colour code them automatically. If it's on a different sheet, you just hit ctrl + g and type in the name, and it will take you directly to the cell it's linked to (which usually in my case, is linked to a sheet that contains all my model's assumptions in one place, each one with a named range describing what it is). It's much easier and quicker than going to =Assumptions!G52.

This how I can tell you've never seriously worked with excel, because that's MUCH slower than using the native/macabacus auditing tools. Adds up over thousands of times. And what if Sales isn't just a simple cell in another sheet, but is actually tied to a named formula itself, tied to a named formula itself, tied to another worksheet (with named formulas in them!). That's a deep rabbit hole and you to be going down with little transparency, where you're having to look up the formula manager to find whatever the fuck the named variables are actually referring to (what if someone follows your advice and names the tax rate as "Tax" and now ctrl-g doesn't work!)

>Seems like a silly thing to teach people IMO. In my experience it makes formulas much more readable (both writer and reader), makes it much faster to build models, and cuts down errors substantially.

It makes formulas much easier to read, but with zero accountability, and considering someone will very likely be looking at the excel at some point, with no idea what you did, they have to check each and every one out to make sure it's not bullshit. Plus, it doesn't really make modeling faster assuming you've built out your source numbers/assumptions well and use best practices (like A24+A25+A26 instead of A25+A26+A24).

>I personally find that formulas are much easier to review, because the named ranges provide some intent. If someone writes =(A2Assumptions!92)/Assumptions!91 I've got to really unpick it to work out if it's right, but if someone labels it =(A2Miles_Per_Hour)/Average_Miles_Per_Vehicle then I can see that the formula is wrong almost instantly.

Use tracing, and if the assumption tab is well built out it really isn't faster, at all. +alt w,n

>Additionally if I want to write another formula using those values, I can just type it straight into the formula bar without having to go and click on the right cell reference in another sheet.

Use multiple windows, makes life much easier.


> Well investment bankers tend to be pretty good at excel

> This is how I can tell you have never seriously worked in excel.

Ah, so the style guide and best practice is based on “Trust me - I rock and you suck”.

I can guarantee that I do use excel seriously, and my personal experience is the exact opposite. You wouldn’t see beautiful code with variable names like A2, and it’s exactly the same for me with excel.

Besides - a feature you can turn off in about 5 minutes stopped you from doing a deal?! Why not just explode the named ranges out? I suspect the issue here is an overly complex model rather than named ranges - I’ve seen lots of these without named ranges too and they are even less manageable in that state!


This debate about whether to use named-ranges in Excel formulas brings back some distant memories. I was on a team at Goldman Sachs that built Excel tools for the investment bankers and because of that we'd help out with the new banker Excel training held at the giant Chelsea Piers sports complex.

Every year freshly minted MBA's would begin the training and immediately become aghast when the trainers told them to never, ever, use named-ranges in formulas. Not only had the trainees been explicitly taught to use named-ranged in their MBA programs, but any idiot could plainly see that [=enterprise_value/ebitda] was better in every way than [=C13/F22]. More expressive, more readable, easier to spot errors, etc.

The trainers would argue that in an MBA program you build your models, submit them, and move on to the next assignment, so you don't get a view of the longer-term problems that arise from named-ranges. What does [=ebitda] actually tell you? Is that the last quarter's actual number, is it the current estimate of the next fiscal quarter, is it a 12-month blended forward estimate?

And as you try to solve those questions with [=ebitda_est_next_fiscal_qtr] you often end up with two more problems: an unambiguous name for you can unambiguously mean something else to someone else and as you update your model over time, if you forget to update your named-range references, you have formulas that look right, but are wrong, e.g., [=ebitda_est_next_fiscal_qtr] now refers to an actual reported ebitda number, not next quarter's estimate.

[=C47] while not telling you much at all, is unambiguous. It doesn't look "right" or "wrong". It can't mislead you. If you want to know what it is, go look at cell C47. The new MBA's would argue, "Wait a second, C47 could now point to the wrong data as well, and it could be mislabeled with a stale row or column header. And the trainers would reply, "Exactly, and when you go to see what C47 represents you will have the context to recognize those errors and fix them."


You put it in much better words than I could, thanks haha.

Also, very cool experience at Goldman!


heh, your exchange looked pretty much like a holywar between programmers. Except problems looked like if you're discussing merits of different old varieties of BASIC, with GOTO and line numbers. I really don't see how excel is more visual, or easier to grasp after adding just a several columns of data or formulas.

I wonder if there could be made some middleground language, visual enough to not being scary for spreadsheet people, and debuggable and readable enough to not become a mess when model becomes big. Jupyter comes to mind, but it looks like it didn't get any traction outside data science.


Many power users of excel do use more capable, testable and readable solutions. But they also use excel. Programming sucks in many ways that excel does not. Excel is incredibly flexible and adaptable.

If you need to treat one row differently as a special case that is easy in excel but can be a lot of work with programming, both for technical reasons (you need to handle the special case with special code) and for silly reasons (in excel you click the row which is special; in a program you must describe how to find the row.) Certainly there is an argument that one shouldn’t have such special cases as they make maintainability a nightmare but sometimes it matters a lot more to have something now rather than something later.

In excel, the data is the most important thing and in the real world it often is too. In a program, the data (especially the intermediate data) is quite invisible. Power users may spot errors because intermediate values look wrong, rather than by carefully reading the formulas. Though if the values look right or there are a lot of them, errors may not be spotted.

Excel is very ad-hoc which is the problem people complain about but also it’s power in dealing with the changing real world and anything exploratory. Once it is working and stable in excel, it is more wise to do it in python.


Excel is more than just a hammer it’s the fork that is in every single drawer of every single company/org in the world.

It’s not about every problem looking like a nail, it’s that, nobody had to pay for this fork, somehow everyone has it and it’s such a great fork that it can do 90% of the problems you throw at it.

Yea sure some problems would be better solved with specialized tools, (eg knife, spoon, etc) but those require training and budget. No one has those. Everyone has a fork.


Cutting steak with a fork is a pretty good metaphor for working with a spreadsheet heavy enterprise team.


> nobody had to pay for this fork

Actually, everyone had to pay for that fork. Excel is a paid product, and isn't even included in the cost of Windows. It is, however, bundled with MS Office, which includes Word, which is even more of a must have at almost every single company/org.


I know very few people who have paid for excel specifically. everyone pays for MS office, albeit because it includes excel, but in an workplace setting the main thing you're paying for is outlook/exchange the fact that word/excel/powerpoint are bundled is a plus.

The main point I'm making is that no one has to go to their manager and justify paying for it. It's already bought and paid for, and because of its ubiquity most people have a pretty good idea of how to use it.


Even better, because some of those deals include MSDN, when Excel users grow beyond VBA, they just get VB.NET installed and keep doing AddOns in .NET.


While technically true, the office coffee costs more usually. It is generally just there, and no one usually is thinking of the costs, because it’s too small to care about.


I disagree.

Spreadsheets are like a Swiss Army Knife.

I have used them for

- Templates for Printing - Data Analysis - Financial Tracking - Report Generation - Calculation sheet for engineering stuff

and a multitude more uses that I cannot even remember now.

They are not a hammer, though that is one form in which they can work.

Ultimately they are a tool, and a damn fine tool at that.

Their potential is limited only by the creativity of the user using them.


Spreadsheets are great. I just wish folks would use a more appropriate tool for what they really want: a very light-touch database. Something like Airtable is perfect for structuring a bit of data into rows and columns.

But, nothing beats the availability (both of sharing and editing) of an open Google Sheet.


And then there's the pricing.. "We already have Google". No way my company would pay for Airtable, Notion, etc

Its spreadsheets all the way.


Shameless plug: I'm building a Desktop app that serves the niche between Excel and Python.

It enables you to load CSVs and write SQL on those CSVs.

The main benefit is that it is very fast when compared to Excel or Python.

Excel cannot really handle 1GB CSV file. With Python, just loading 1GB file and iterating through every line probably already takes more than a minutes.

Superintendent pushes most logic to C and Sqlite, so it's super fast.

Here's the app: https://superintendent.app


I think of Excel as the modern back of the envelope. You can prototype some things, proof of concept etc.

But if you're running things in prod via excel you haven't actually thought enough about it to call it prod.


How many professional construction sites do not have a supply of hammers? I think your analogy actually fits pretty well seeing as how Excel is an invaluable tool that fits into a larger enterprise.


Spreadsheets are sometimes more readable than code. It’s harder for certain kinds of bugs to get through when you see the whole program, inputs, and outputs, and all intermediate states at once.


what would python be working on then? a .csv, textfile, database, something else? Or would the data be stored as part of the script itself?


It varies and how you're outgrowing it. I heavily use spreadsheets, but if I start to hit some limits or need to formalize a process I'll switch to something like Python. As a data store I may still query the spreadsheet, or use csv, sqlite, or something that scrapes the data source. I really like columnar data because I like mucking with it by piping data on the command line (which has its own, well known ways to bungle data).

As for reasons you're outgrowing a spreadsheet; access and permissions, friendlier error messages or a better "wizard-like" process, formalizing input data, outgrowing data size, smarter caching for computing values, protection or clarity around modifying "constant" or "magic" numbers. Many of those you could implement in a spreadsheet, but its often more complex than using something else. That something else probably varies with which problem you're trying to solve.


You would have the script take in the data as input from somewhere else, maybe your organizations shared drive. could be a .csv or whatever format. could be a database like sqlite. Much better than some big hulking macro heavy excel workbook.


Then how do I edit, extend and quickly filter and search that CSV? I'd probably do it with Excel to be honest.

Lots of data in business isn't generated by another program, it's built and maintained by users.


I think you raise a good point that often gets forgotten. When you switch from a general purpose tool to something specific, you lose visibility and introspection tools.

I see people move from a filesystem based workflow to a database and while that may be faster and more efficient you lose those tools to look at and tweak data. You either need to write those or teach them how to navigate the database. People rightfully hate black-boxes as soon as stuff stops working as expected.

Personally, I love the Linux command line and will pipe stuff in and out or stash things in flat files. I may move the data back into a spreadsheet to get more visibility or look for trends.


Python or R have tons of ways to edit, extend, and quickly filter and search within stuff like a CSV. You could even do all of that with a bash script. It will probably run a whole lot faster on your computer than opening the excel workbook.


I think we might be talking at cross purposes. How, in Python, do I extend a CSV with new numbers, or edit existing values? Use 'input'? I could implement an excel-like interface, but then I already have Excel.

I could just open it in a text editor I suppose, but then I have to deal with escaping, and moving around cells, and that sounds painful.


The Pandas library makes manipulating tabular data easy. And it's better than Excel when it comes to cleaning messy data.


Plus 100 for pandas. It is a truly amazing library. If I need to do some exploratory data analysis with it I'll often do so in a Jupyter notebook, but always make sure serious coding happens in a real editor like vscode.

But things really are hopeless.

Even Microsoft can't control the monster they unleashed on the world. Execs and managers using Excel instead of Planner for project tracking ("Spreadsheets are for manipulating numbers, not word processing!"). Used in that way, Excel isn't so fragile, but why is it being used at all? For f*cks sake, hasn't anyone ever heard of Word tables?

Honestly, anyone who uses Excel as a database should be exiled to... someplace where they can't do any more damage. Maybe connectivity-poor rural Arkansas. That's technological malpractice. It's indefensible, and an embarrassment to the whole tech sector that anyone is trying to excuse it.


You would read in the csv, edit what you need or add/remove data, then write a new file or overwrite the old one. You don't need to implement any interface, there are tools that do all of this. You could probably write the code up for something simple like this in the time it takes for excel to open the file and you to make the change manually by finding the field, clicking away with the mouse, and typing it into the cell. Specifics beyond that can be searched for on the internet as there are a wide variety of methods you could tailor to your purpose.


TSV, you have several tools to parse it. Even AWK.


This simple dilemma comes up again and again: the people who made this can a) use Excel, warts and all, b) contract out for a proper system, and wait at least a year. Who am I kidding... a decade.

There is simply no point complaining about Excel unless you have a viable alternative, and that alternative is not coming from traditional IT depts.


i think spreadsheets could represent an interesting future programming paradigm, but excel is not the tool that'll do it.


I feel like the next major advancement in spreadsheets is nested hierarchy.

Specifically, iTunes (especially the older interface from iTunes 5 to 10) allowed for Playlist Folders and Smart Playlists.

Workbooks can have several sheets, but not in a nested structure. Same with SQL databases and tables. If I want to organise data (e.g. Country > Social group > Friend name/email/phone number) then it's easy to do in iTunes but difficult with typical Contacts apps.

Web browser bookmarks are pretty good, but the end object has only 2 fields (name and URL). So what I do is I have a fake-radio track object in iTunes, with all the extra tags. Double-clicking it loads a page on localhost, which launches a PHP script that then opens a URL in my browser (for the friend list) or document.


Notebooks are the next-gen spreadsheets, except they use programming languages instead of formulas.


And they require so much additional setup and processing power that they will never supplant spreadsheets.


Jupyter is a joke compared to spreadsheets, it doesn't auto-recalculate when you edit cells.


Observable does. So does Julia’s Pluto.jl.

You’re not paying attention to the space but recalculating notebooks are gaining popularity.


Things are getting better. I was aware of them, but jupyter is the 600 pound gorilla in this area.

I wouldn't use observable, as you are tied to their web service, and I've had too many companies close to use such a service for important data. Pluto is promising, but (I believe?) Only supports Julia, which is growing but still not that popular, or known.


True, but you have access to whatever Python libraries are installed, and your output is web-based. So you can potentially output anything a browser can display, including interactive 3D graphics and widgets. You also have access to the command line in Jupyter, so you can run whatever shell commands you like.


Not only that, it doesn't even keep track of the calculation graph, so it doesn't know what needs calculating, or in which order to calculate.


yeah, and also it's 1-dimensional; you can only go up or down, never to the right


Because the data is inside a data structure like a table or dictionary that you can output, iterate and map over however you like with the various Python and library methods or functions.


But the spreadsheet abstraction really works like a charm, it gives the flexibility and the power to do a lot.

On a humorous side note ... competitive excel https://www.youtube.com/watch?v=xubbVvKbUfY


Somewhat related, this is a fantastic talk about the design of the first electronic spreadsheet VisiCalc by Dan Bricklin:

https://www.youtube.com/watch?v=YDvbDiJZpy0


“For example, if you spend 100 florins on wool, that is recorded as a credit of 100 florins in your cash account and a debit of 100 florins worth of wool in your assets account.”

Didn’t he say this backward? I feel like I’m about to lose my mind


nope, in accounting terminology "credit" is taking money away and "debit" is adding it.


Just as an aside, eradicating small pox really doesn't get enough press. The day the last case of small pox happened should be a global holiday celebration. It is probably the greatest achievement of humanity in the last few hundred years, and is a gift to all future humans with unbounded upside. Sure we could have done it earlier, better, with more equality etc, but we still did it and we did it everywhere. I try to remember this on the days when the world seems to be going to hell.


Content of this article aside, man, I hate it when sites choose to hijack the scroll event to roll their own (janky) version of smooth scrolling.


Why don't excel and all the things that can read excel kill the old xls (2^16 limit) format? I mean still let excel read the file, but don't allow changing the file. Only allow saving the old file to the new format.


What's your opinion on VBA? I have successfully used it to build quite complex applications (position management in trading) and found it fast, reliable and easy to develop in a fraction of the time it took me to build an Electron application.. I have a really hard time going back to modern JS frameworks after having been exposed to Excel..


I used it for the same back in 2003, so maybe my views are outdated. I think it only works because when you're on the trading desk things need to be done yesterday.

For scripting a thing or two it's not terrible, but back then it wasn't so easy to split out the code for version control. Some formula for pricing this or that is gonna work in whatever language anyway and doesn't change a whole lot though.

It was when I started to pull FIX connections into it that I thought maybe it's time to do things properly. Then you're suddenly in need of understanding how the execution model works.

Also as soon as you need to follow a schema Excel is too soft. It's trivial for someone to just add an extra bit of info on a row that throws everything.


If Microsoft would update the core VB language to be a bit more modern, I bet a lot more professional developers would descend upon it. As it stands now, it’s in limbo because you can’t use it in the web version of excel.


They introduced Lambda, M Code and PowerQuery instead.

I guess F# like is modern enough.


Shouldn't Excel warn you if your data is going to be truncated? Or did they use Excel in a way that suppressed such warnings?


Looks like it was automated tool that mistakenly generated an XLS file instead of XLSX.


It’s still amazing to me that we use spreadsheets to handle big systems like the contact tracing system. I know the reasons – it’s probably the most effective, low-price, available at any time, but it is just so amazing that the same exact program I use to do simple calculation can also handle nation-scale contact tracing.


Disclaimer: I am building a spreadsheet alternative for javascript hackers based on Observable https://www.jigdev.com

For those interested it’s all in the disclaimer ;-)


> Here at Jig, we believe that BI should be done by developers.

> Jig apps are written in Javascript.

Validate the market ASAP.


I work in a global firm that would be the poster child for using spreadsheets, especially spreadsheets that are used for calculations or drawing conclusions.

The firm was made a huge move to Alteryx as a future replacement and is an immediate improvement over Excel. Which in my opinion is fantastic and a much much easier to review platform than Excel. For those that may not have used Alteryx, I'd highly recommend taking a look. It's essentially a GUI based Excel, which is the best way I can describe it, but still offers more functionality than that of Excel with Python integration etc etc


Other, less expensive, alternatives are also available such as Easy Data Transform and Knime.


Absolutely, Alteryx isn't cheap by a long mile.

It's the only one I've had experience with.


spreadsheets is where productivity goes to die.

The second you make a spreadsheet with a lot of data on it, it’s a commitment.


Blocking save to xls via group policy would be a good start to prevent a recurrence


I love Tim Harford's podcast Cautionary Tales, I warmly recommend that.


The next big thing will integrate excel with Jupyter notebooks.


xlwings gets you part of the way there. You can automate Excel from a Jupyter notebook.


O, the tyranny of web page animations...


> Foege would show up in a village in eastern Nigeria, all six foot seven of him, and the elders would put out the word, Come and see the tallest man in the world! And people did. Foege reckons he once vaccinated 11,600 people in a single day. It wasn’t enough to quash periodic outbreaks.

If only today's governments had the creativity to spark interest in vaccines.

In Romania, people distrust the vaccine, and I believe we're going to have a tough COVID wave again.


what if i told you spreadsheets are matrices


You'd probably come off as missing the point


Spreadsheets are DAGs.


DA*Gs

*mostly, as TFA indicates


So... We should be using spreadsheets for machine learning and 3D graphics?


The article is a nice read, but very inaccurate.

Most companies do not keep their books in spreadsheets. Perhaps there are some poor souls who use Excel for bookkeeping, but this is generally impossible for any company that is audited (so basically everything bigger than some micro company). Auditors will demand to store all bookkeeping data in some ERP system - that generally allows double entry bookkeeping and many other things "from the box".

There are hundreds of better or worse bookkeeping/ERP solutions around word - some are expensive (and can cost millions to implement and still suck), some are cheap and some are even free (OpenERP) -> they are the tools used to input and store bookkeeping data, not Excel. Those solutions often come with own fair share of problems, especially in implementation. There are various business and programming horror stories.

So the article talks first about double-entry, which is most basic form of bookkeeping, then proceeds to call it accounting (which is like a "bigger" term - because it involves much more things, e.g. reporting, calculation of accruals, tax..).

Then somehow we move from double-entry (not done to Excel), to the UK mishap, where some low level public office clerk used Excel to combine tables that show number of coronavirus cases. What does it have to do with bookkeeping or accounting? I dont know. I guess for some, keeping something on a list = "bookkeeping". What is just plain wrong terminology. If some programmer makes some table in a relational datbase - you probably dont call him a bookkeeper, or accountant. But using the logic of this article - they should be, what is obviously wrong.

In the UK case, Excel was used as a database - and something went wrong when combing data from multiple sources, because the people responsible did not check at all what they were doing. There are multiple questions here: did they use the correct tool? Why didnt they check via some report if all data was uploaded? Why did they use XLS format, which is from 2003, so 18 years old? Also "not checking what are you doing" happens not only to public servants, for example in programming there is the very well known case of MongoDB, that used math.random to randomly log just 10% of issues.

The whole "16k missing cases issue" was reported multiple times by multiple sources and the narrative changed few times: IMHO the most plausible explanation is that UK government wanted to cook the books, so they made a bad way of consolidating data (that reduced number of cases) and when someone detected that it does not add up, they blamed it "on the computer". Public will eat it up, since everyone made some mistake. What is confusing for me that programmers also buy this. If your program does something wrong, probably there is the issue with the program. If some Excel VBA macro did something wrong when consolidating data - this is somehow problem with Excel, not the macro, or the ones who used it and didnt check it if even works.

In fact the first time they reported the problem, they said that they stored each data in a separate column. Excel 2007+ had a limit of 16384 columns, so one could argue that Excel is better than PostgreSQL, because PosgreSQL has a limit of only 16 000 columns ;)

So if some bad programmer also used a real database wrong, same issue could happen. And please dont act there arent many bad programmers in the wild - when a new guy comes usually you hear that the previous code is spaghetti garbage, what is often quite true. There are lots of programming horror stories. There are also lots of accounting horror stories. And lots of Excel horror stories. But the missing 16k cases has nothing related to double-entry, or accounting. You dont store cases on some income statement or balance sheet account...

If someone confused a motorcycle with a car, they would probably receive a lot of shit. Here we have same case: collecting few tables to make 1 big list has nothing to do with accounting. Or double entry. As I said, it is nice read, but completely wrong terminology.


"Most companies do not keep their books in spreadsheets." Iceberg issue (only n% visible). Most companies do not have the self awareness to know that their ERP systems and server class database apps are shadowed by an enormous volume of spreadsheets created by their own employees. Audits and Auditors will never see them because they are not part of the official work flow.

I have seen a case where entry happened in the spreadsheet and "locally processed/validated" in some way that satisfied the clerical person. Then the now "correct" mass of data was cut and pasted from Excel into the ERP system which used a development control that worked like a crippled Excel sheet.

They did this to avoid the cryptic and most unhelpful errors from the ERP system, if the data was not ERP rules "perfect".


How about a book on "The Tyranny of SPAM" or perhaps "How I Market My Books Online by Rewriting Redundant News Stories Already Better Written by Others" ?

See

https://hn.algolia.com/?dateRange=all&page=0&prefix=false&qu...


This is more shameful:

Scientists rename human genes to stop Microsoft Excel from misreading them as dates

https://www.theverge.com/2020/8/6/21355674/human-genes-renam...

Seriously. If I was a data scientist I wouldn't touch that turn even from 300,000 kms away.

This mediocrity kill lives, FFS.




Consider applying for YC's Fall 2025 batch! Applications are open till Aug 4

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: