Hacker News new | past | comments | ask | show | jobs | submit login
Excel as Code (github.com/gavinmendelgleason)
166 points by LukeEF on Sept 20, 2021 | hide | past | favorite | 144 comments



> People refuse to stop using Excel because it empowers them and they simply don't want to be disempowered.

That is not always true in my experience. Many people use Excel because it's one of the two programming tools allowed by the IT department, the other being a web browser. Even if you manage to install Python or something (good luck getting the package management working from behind your corporate proxy), your collegue will not have it, so it's useless. And distributing executables is usually not tolerated either. So you use excel, and share Excel files.

I'll add that another big problem I have with Excel is usually the lack of database support. Moving data around by copy/pasting it in Excel with macros is a pain, and IT didn't allow Microsoft Access either so I can't comment on that. But I think it would have made my life easier.


I somewhat disagree. I work with a lot of excel power users. We have some massive spreadsheets which are collaboratively worked on and do complicated things. The first thing to say about them is that they are very valuable to the business so it is important to be able to do some of the things excel does.

Excel has a lot of advantages compared to regular programming:

- It is quick to change. The programs I work on take nearly an hour to go from code review completion to production, even with manual poking to speed up continuous deployment. It can be valuable to be able to change things quickly.

- In excel the main thing you interact with is the data. If you are a domain expert then you should be able to look at outputs and see if they seem right. When you change a formula or add a column, you are, in some sense, also getting to run it on realistic data instead of needing to try to construct realistic tests.

- There isn’t much difference between config parameters and hard coded values. In the programming language I use, you can’t really have globally readable configs so any new parameter must be threaded through from app startup to the place you want to use it, discouraging configuration parameters. Which means it is often slow to change something that ought to have been configurable. In excel you can make a quick cell for some Config parameter (changing a lot of formulas is not so fun though.)

- Functional and declarative, Excel tends to give you internally consistent output. There is less need to worry about incorrect state updates.

- Its maybe better for producing graphs. I never really liked making graphs in excel and I thought the defaults were bad for good data visualisation but then other systems have bad defaults (when I draw a graph I often use GNU Calc with gnuplot…)

- Pivot tables are great for ad-hoc analysis (indeed Excel is pretty good for as-hoc analysis in general.) The pivoting operation is trivial in excel and a big pain to with tools like grep or awk or sed.

These Excel users are generally capable of programming too and may use jupyter notebooks with python or R, or something more fully featured when required. And some things will get outsourced to software engineers, but excel is still clearly useful (so long as it scales) and people don’t just use it because they are desperate for some kind of ‘real’ programming language.


I agree, what I shared was mostly my experience with Excel. We did mostly lots of data transformation, and for that it was far from ideal. But for your use case, it seems great. My father works as a CFO and uses Excel in the way you describe it, and it's really hard to find a better tool for the task. There is another limitation that I forgot to mention: if you are not using Windows, you don't have acces to Pwoer Query and things like that, so for huge datasets, Excel becomes very slow.


To be honest half of these points sound like company process issues (you can make changes quickly with code, just not with this particular review process/CI), and half of them sound like programming-language issues (lots of languages are functional and/or let you make a quick global config parameter)


It’s never so simple. I use a functional programming language but generally in functional programming you don’t want to have global configs but rather to pass functions the things they need. Compilation is mostly fast but there is a lot of code (and we get roughly exponentially faster at building it. The problem is that we also have an exponentially increasing codebase.) Faster deployments could be possible (eg deploying dev builds rather than release builds made in the standard way) but it’s better to have the system work automatically and sensibly in the ordinary case than to try to support quick changes in the extraordinary case. It’s not really a good practice to support deploying random binaries of unknown origin, and certainly not great to do it routinely.) To be clear, excel can have scaling issues too (as well as issues with multiple people trying to change a single sheet) and we ended up recreating some excel-like features in some of our own applications.


If you substitute "spreadsheet" for "Excel" than i agree. But i have tried everything possible to avoid Excel as every iteration just brings new problems instead of fixes. I used Clarisworks years after it was EOL and now am trying very hard to convert to Libreworks. Admittedly i can't completely escape Excel yet, but i am hopeful and it's getting to the point where the bugs in Libreworks are no worse the bugs in Excel. If not for all the Legacy Excel sheets i have, i'd be off it completely.


As a counterpoint, I'm supporting a lot of corporate environments and I'd love if just one user anywhere expressed a desire to use Python. I've encouraged people look at it several times, along with R Studio or anything else that might help.

But politically it won't happen. So instead, I get handed badly coded Excel macros and told "there's a server problem", or a document that just slides in the 2GB limit by a few KB and told "it's a bit slow".


> badly coded Excel macros

I think you mean badly recorded Excel macros.

A lot of problems, and their solutions have been rethought in the form of excel. Rows, columns and sheets, copy/paste pivot. As much as I detest excel, it does have its place. At a 2gb file size however, that place doesnt really exist anymore.


> I'll add that another big problem I have with Excel is usually the lack of database support.

SharePoint site as the backend. Spare PC in a cupboard somewhere chugging away as a substitute for cron, or if you want to updating a spreadsheet middle layer that the dumb terminal sheets connect to via a shared drive... with a script running simulating moving the mouse so the mandatory screensaver timeout doesn't kick in. Been there, done that.


If they look around enough, there is also VBScript, JScript, Powershell, and basic C# and VB.NET compilers as well.

Naturally it depends how locked down the domain policies are.


I offer my master's thesis "Compiling and optimizing spreadsheets for FPGA and multicore execution"

https://dspace.mit.edu/handle/1721.1/45983


I'm really enjoying reading this. Ahead of its time.

I really like your RISC CPU in a spreadsheet in Table 1-4. Have you seen others do this since you wrote it?


Thanks. It is still crazy to me that this was 15 years ago since I still have nightmares about completing it.

I used the approach to design a PDP-11 Floating Point Unit commercially, but haven't really seen more Spreadheet-to-FPGA work or RISC emulation -- it is clearly my thing to do.

Microsoft just added LAMBDA to Excel but didn't copy my approach of capturing a table as the formula--Like most people if I start with a big data table to analyze, I start by spreading my formulas across a row with short intermediate results, and then test it on a few rows before applying it to the whole table. My lambda would let you select your input cells and define your output cells and capture the dataflow graph between them, with any external references as globals and then spare you the results in intermediate cells.

I spent some time trying to sell this to Wall Street folks in 2007-08. I remember presenting on this at a supercmputing conference on Wall Street exactly today 13 years ago, and the Lehman guy on the panel didn't show up because they shut down that day.


Your step-by-step approach would be vastly more user friendly than the current LAMBDA implementation (where the user needs to put the entire nested function in the Name Manager).

Simon Peyton Jones advocated for a similar step-by-step approach: https://www.microsoft.com/en-us/research/uploads/prod/2018/1....

At least, step-by-step can be facilitated with a programming language (Visual Basic, C++, Python, JavaScript). ...That could be a sweet Excel add-in.


Thanks for sharing, this is really interesting


I've been tasked with migrating an excel model to a "real language" (usually by breaking it apart and re-implementing it via a combination of ETL and data warehouse jobs). I've never found a great way to run excel in a headless way, so in addition to not having version control for it, it's hard to "deploy" it when it grows beyond a single person's machine. I wish there was more of a gradient between Excel and "real systems".


A few years back, I was tasked with a similar thing. A government ministry was creating a complex calculator for a (very anticipated) public project that was supposed to go on their website. We started out using pure JS but the mathematician working on the project kept giving us new Excel documents with extremely heavy changes to the algorithms.

In the end I gave him a location where he could upload the document and told him to just make sure inputs and outputs were always in the same predefined cells. Then we used Java and Apache POI to load the Excel document and run the actual calculations on the website. Best decision ever.


The Microsoft Graph APIs in Microsoft 365/Office 365 give you pretty much all of the Excel execution engine as REST endpoints "in the Cloud" if you just store your Excel files in SharePoint.

It's not surprising the number of turducken business applications being built exactly this way. With Named Cells you don't even have to hard-code cell numbers, just tell them to name them specific things, and Excel users are very happy with the amount of flexibility to rewrite the spreadsheets at will.

It's not necessarily the sanest approach to building software, but no one ever accused most enterprise software development of being sane.


As a rapid prototyping tool, it doesn’t sound terrible, honestly. Many people are comfortable with Excel, so let them use it! You’re gonna use some calculation engine on the backend, might as well be the tool that contains the ”reference” calculations.


> turducken business applications

Great analogy


>In the end I gave him a location where he could upload the document and told him to just make sure inputs and outputs were always in the same predefined cells. Then we used Java and Apache POI to load the Excel document and run the actual calculations on the website. Best decision ever.

This is the kind of simple and effective solution that programmers who think they know everything would scoff at.

Love it.


I have used the google sheets API to implement something similar when working with a nonprofit. They needed a fairly complex listing on their website that needed search/sort/filter/mapping and needed to update this list regularly. So I just took their existing google sheets document, and accessed it as a read-only database in the browser using Google's REST api and it was fairly painless! If they ever broke anything I could easily go into the spreadsheet and fix it. This approach really reduced the effort needed. If I had to write a "proper" interface for them to enter and update their data I wouldn't have had time to work on their project.


No.

I flinched hard at this, because this only works until it doesn't.

I've done the exact same thing: give a user a location to upload an Excel set up just the way we'd want to parse it.

Good luck dealing with the absolute morass of formatting troubles that Excel throws at you because:

1) The user didn't format a date input correctly and now Excel treats it as a simple string instead of its internal Date representation

2) Excel mysteriously treats a random entry in a numeric or date column (PEBKAC? Who knows! The user denies all wrong-doing!) as a string and now it's got a leading apostrophe

3) The user used someone else's computer which has different Regional Settings, and suddenly:

3a) Commas are decimal points in numbers instead of periods

3b) Date formatting is messed up

3c) Months and days have their names in a different language

...

And these are just the issues that I can remember without having to dredge through painful memories.


Most of these are easy to work around our aren't unique to Excel: 1 & 2) isn't unique to Excel, incorrect inputs will give incorrect results in code too 3, 4 & 5) worked around by getting the internal raw numeric value then formatting it as required.


Sure, you know better than the person that actually implemented it and tells us it worked out fine. Because you've done something similar before and failed.

The exact kind of know-it-all attitude I was talking about. Thanks for the demonstration :)


To be honest I also had bad experiences with processes like this - very simple data entry mistakes which were basically invisible to the user made the entire thing fail.


It's simple and effective until a part of the rube goldberg machine breaks...


I did something similar for a few clients, but mainly for automating documents. People were copying and pasting between Excel and Word, so I made them systems that link the two together. I had enough clients asking for something similar that I made a SaaS product that does it. Gives them a nice little interface that links web form fields to named ranges, and then a simple templating language to insert those fields into a Word document. Instead of writing a calculation engine for our webforms, I just used Excel. It's pretty powerful, and more than I could have implemented if starting from scratch.


Personally, I've found that Jupyter notebooks occupy that niche pretty well.

When authoring, you have something that shows intermediate results just like Excel, making troubleshooting without dedicated debugging still pretty doable. And then you can still run them headless, and you can check them into version control, and diffs are readable enough.


There are a few. I use xlwings for https://github.com/asemic-horizon/stanton , which is some bits of code to specify expert-led sensitivity analysis from Excel and use the results to emulate the spreadsheet from a ML model.


That's github based for collaboration I think. The one mentioned in the post VersionXL [1] is based on the cloud version of TerminusDB (co-founder here), which is an open source revision control database. It uses delta encoding for updates, but is a proper DB optimized for the task. You get transaction processing and updates to an immutable database with version control features: branch, merge, rollback, searchable diffs, and time-travel. It also ships with a mature python client to allow you to manipulate the Excel data.

[1] https://versionxl.com/ [2] https://github.com/terminusdb/terminusdb


There are systems that can do this e.g. https://github.com/stephenrauch/pycel as described here https://web.archive.org/web/20210308015732/https://dirkgoris...

If you're deploying large-scale models, https://timelydataflow.github.io/differential-dataflow/ may be of interest (this is used, for instance, by https://materialize.com/)


So I started on the same work with the idea of extracting all relevant information from all cells (value ranges, values, formulas), exporting to TXT and then building a conversion layer. I didn’t get there but what was informative was the sparsity of information in a big spreadsheet. Most cells in the few models I had for testing have only a few that actually do meaningful stuff. Most are fluff. One of the yet untouched harder parts is the conversion of Excel-specific formulas. The Excels I had used mainly basic math and lookups (thousands of them), not many of the built-in Excel formulas. (Any tips on my approach are very welcome, VBA for Excel isn’t hard but quite hard to Google.)


Not sure what you are trying to achieve but the majority of Excel formulas I see used most should be relatively easy to reimplement. Some like SUMPRODUCT, AGGREGATE and the specialised finance ones might be harder to reimplement. The trickier bit might be getting the evaluation order right.


In the past I wrote a simple formula evaluator in Python I used to replicate some multicell calculation - the spreadsheet I had took the form of mostly simple algebra being performed in a scanning pattern against various small windows in time (rows) from a set of columns. I just extracted the cell formula definitions and transformed them.

It may not be that hard to replicate the set of formulas you need to get 90%+ of your excel model.

If someone implements a 90% reimplementation of Excel in Python that would be a really useful library for stuff like this. You could do some neat stuff with dependency identification too.


There is https://pyspread.gitlab.io/, not sur if it fits your use case?


> I've never found a great way to run excel in a headless way

Do not ask me for source, but I remember opting to do just that and making it work after being unable to replicate Excel's results in an app that was supposed to replicate it. This was using Delphi, and the solution was to load the Excel spreadsheet as a COM object, programmatically write input data and collect the results. Loading the COM object was as slow as starting Excel but no Excel window ever showed up. This was circa Windows NT 4.0, so possibly 1998? And I would bet this would still work.


"And I would bet this would still work."

Yes, it would. You can operate a spreadsheet in that manner using any .NET[1] language. You could do it with any COM aware platform before .NET existed. Today you could use Python if you want to remain among the cool kids while doing it[2]. Judging by the popularity of that repo there are a bunch of people doing exactly that.

This is a late 90's era problem. I have a hard time imagining any programmer having difficulty with this. Then or now. Literally anything that could run a VB macro could programmatically manipulate an Excel spreadsheet.

Such approaches are fragile. No doubt about it. They rarely survive a major version upgrade of any component without some fussing. On the other hand the same basic APIs that emerged in the 90s work today with little conceptual change, so the value of the investment in this knowledge has never been wasted.

[1] https://docs.microsoft.com/en-us/dotnet/api/microsoft.office... [2] https://github.com/mhammond/pywin32


Yeah, you either build a pipeline that generates/updates excels that get emailed or self-service downloaded, or you teach them how to use powerquery to get the data from the enterprise db.


Depending on budget, it might be less expensive to look at a tool like https://app.molnify.com/#ajax/examples (or its 5 competitors from a google search.)

It feels like a subset of this should be an open source app (that is, turn an excel spreadsheet into a C# app) for anyone looking for an idea.


Isn't that the only reason access exists? Import from excel and build forms on top.


Have you tried AirTable & their API?


I haven't tried it, it does look really interesting, although most of the time the problem is that the finance/ops/etc. team already had something really complicated in Excel and the question is "what should stay in excel, and what should be reimplemented in some other system".


What about Powerapps?


I get the spirit of the document, but disagree with the goal. I'm biased, I've kind of made my career writing web applications for people reliant on Excel. While I've come to respect it's power – I had a colleague in architecture school design buildings using excel and I've seen some ridiculous formulas based on crazy pivot tables and conditionals.

I've seen more spreadsheets than I would care to admit, and what drives me crazy about each and everyone is that it is not readily apparent where the work is being done. I think you could say the same about a "programming language" except that the programming language is usually not also the product. When the interface is the code and the output, the lack of consistent implementation is something I find frustrating.

It's a nice thought experiment, but in my mind I think the world would be a better place without excel.


"It's a nice thought experiment, but in my mind I think the world would be a better place without excel. "

I agree with most everything you said, however, proliferation of programming and automation is a net win in my books, no matter the medium, and good spreadsheet software does this incredibly well. It makes programming in its very basic form accessible to a wide amount of users with a relative gradual and easy to grasp learning curve. Sure you can always improve on it, but I think the world would most definitely not be better off without it.

I do agree that the work is hidden, they can be a nightmare to audit, and I think it would scare a lot of people on this board the amount of business critical functions that are completed by excel and other spreadsheets. However, I like to think this a short term problem, and to the authors point, the industry and the sw needs and will improve, and we should all be trying to eventually close the gap.


Honestly, I think nearly the exact reverse -- the world would be a better place with more Excels and fewer "languages as we think of them."

Separation of "developer" and "user" is artificial and more should be done to recognize that.


In my professional life, I've come to understand the difference between a developer and a user is the degree to which they can think about a problem riggeriously. Most users work at a very high level, a developer helps them break problems down and think about things concretely.


> Separation of "developer" and "user" is artificial and more should be done to recognize that.

I've seen some company ROI models built in excel that I think would change your mind about this.


I bet they wouldn't be much different from some C++/Perl, and VB 6 / COM stuff I have seen as well.


> When the interface is the code and the output, the lack of consistent implementation is something I find frustrating

This is the reason why spreadsheets are popular in the first place, though. I won't ever defend them - I'm on a project right now that's been working on Excel for years, I know the pain! - but this is something that's worth thinking about.

See also Jupyter Notebooks, yet another invention from the deep pits of hell. The popularity of the interactive paradigm is undeniable. Would the world be better if everyone started using something sane instead? Definitely so. But the world would also be better if every day was Christmas and that's not going to happen either.

So while I share most of your concerns, I'm mostly sympathetic with the OP.


I don’t know why you would say that.

Excel is democratizing tool for programming. It’s a true WYSIWYG for databases, calculations, plotting, and more. And it’s just a regular app that every PC has.

Everyone needs a table. Hey did you know your table can do math automatically? It actually can fetch live forex data too. And infinitely more.


>"It's a nice thought experiment, but in my mind I think the world would be a better place without excel."

I stopped using excel after this whole subscription madness started and switched to native Softmaker Office. I keep countless small spreadsheets for various money related tasks and absolutely not prepared to spend any time / effort on doing it "the right way". My brain cells are much better off working on software design (the stuff that actually makes me money).


Your points are definitely valid criticisms.

And the world (for CS-type folks) would certainly be better. For everyone else, I think it would be a whole lot worse. I don't think its a stretch to say excel had enabled billions (trillions?) of dollars to be realized without the need for CS staff, where CS/software purchases would have otherwise been required. I wonder if this is partly why its a reoccurring meme on HN, which has a heavy CS following?


Perhaps the world would be a better place with an excel replacement of some sort.


Excel is kind of WYSIWYG programming. I use it for quick stuff frequently and I’m amazed at what it makes easier than e.g. numpy. There’s a whole class of error you don’t make because you see the whole intermediate state all together (there are also whole classes of error you do make that you wouldn’t make in normal programming).

I have been using it for character sheets in tabletop RPGs I’m playing lately, and it’s great. With a line of js, you can add an arbitrary button to google sheets, and then it turns into a quick, dirty UI that’s transparent (click on the cell and see that AC=10 plus dexterity modifier) and on-the-fly editable by everyone together.


>What needs to change is the idea that they are not programmers, so they can join us in using modern software practices.

Most of them don't want to use modern software practices, the want their formulas and their macros no matter the security risks. They don't remove unnecessary code because they don't want to read and learn what others had done before in the spreadsheet. Excel is easy and successful because you don't need to follow any software practice in the first place and that's also the reason why it's a pain in the ass for all that have to maintain them and keep them secure.


You keep saying what they don't want to, but remember that most users of Excel are not programmers in the traditional sense. I'd wager that most of them aren't even aware of the things you say they don't want.

It's not that you don't need to follow modern software practices. It's that they don't know about them to follow them or not. Further, Excel is almost pure thought stuff. The distance between the user's idea and their implementation is almost as small as we can get without investing in a lot of educational outreach. And then they end up going crazy with macros because they don't know there are other or better ways to do it.

Also, "modern" Excel (not sure which version, maybe 2007 or 2010?) has largely obviated the need for macros with the addition of tables and functions for interacting with them. It turns Excel into a kind of relational database permitting something close to functional relational programming as described in "Out of the Tar Pit" [0].

[0] http://curtclifton.net/papers/MoseleyMarks06a.pdf


Just to add on to this - remember the first time you saw syntax highlighting? And before that, the code was all in one color? You didn't know you needed it before, but you didn't go back, did you?


Yes, yes I did. As fast as humanly possible.

Syntax highlighting seems to help most people but I find it a horrible impediment to reading code.


I think you're in a tiny minority but I'd love to hear more about how it impedes you.


It's very easy to get wrong. Often, the syntax categorization itself is wrong because of parser issues (it's hard to get right in C IDEs. Java should be fine here).

Then, it's a question of what should be emphasized. Many syntax highlighters emphasize syntactic structure too much, making it hard for the programmer to concentrate on the content. I use Visual Studio for example, which is relatively unobtrusive (pastel colors) compared to e.g. the typical VIM color schemes. Visual Studio will color macro names, function names, variable names all in different colors - roughly purple, bronze, black. I just noticed it even colors function argument variables different from global variables/functions and member names. I don't think that's helpful for reading code. It's a distraction.


I certainly often find it distracting - it's notable that e.g. elvis' default highlighting, which mostly just slightly bolds {} and () and similar, doesn't bother me at all, though I get most of the same advantage out of 'showmatch' in vi.

(see my sibling comment for the fully verbose version)


So, most highlighting stuff frankly comes across to me as more like https://kassandry.net/misc/english_syntax.jpg than anything else - highlighting trivially inferrable structure at the expense of overall comprehension (note this entire comment is written under the assumption whatever the highlighting is doing it does perfectly, "bugs exist sometimes" is of course both trivially true and not the point).

Gentler colour schemes and things like e.g. rust highlighters that use a colour per lifetime are interesting and certainly I'm less likely to go out of my way to disable those.

The big problem I have though is that they are, by nature, highlighting at a specific granularity, and I'm generally not reading that way. My conceptualisation of code as I'm reding it jumps between the symbol, expression, statement, block, function, class levels all the time and it's very easy for the discontinuity when my granularity just changed and the colourisation didn't to snap me completely out of flow. Which is clearly an oddity of my brain, and unfixable absent focus-follows-mind, so oh well.

An interesting thing I've noticed is that having the code visually be a single artifact rather than heavily distinguished individual syntax artifacts means that I can often skim a file of code, have a "wait, that looked odd" moment, and find a bug that was completely unrelated to anything I was currently doing but had been hiding in there for quite some time - so, I mean, yes I'm weird, but there advantages to having one of this sort of weird around for review and debugging.

I agree I'm very much in a minority though I've encountered rather more (albeit still relatively very few) people who've found that synhi is helpful when -learning- a language, then when they're familiar they prefer turning it off at least some of the time to take advantage of the gestalt effects I was talking about.

This leads me to suspect that there might be people who'd enjoy this approach some of the time, later on, but never find out because it does absolutely take a while to get used to so "always use synhi" is a local maximum they don't escape - though even if my suspicion is correct, I'm making no claim there'd be that many people in that category either, just 'some'.

Certainly I've suggested to a few people over the years to try going without for a few weeks if they're curious, and of the ones who've tried a non-zero percentage have ended up either 'sometimes synhi' and at least one joining me in 'basically never synhi' - but they were all already experienced devs and weird enough to be willing to try it in the first place, so there's obviously going to be selection effects there.

It would be absolutely fascinating to run studies on a bunch of this - especially the "which bugs are easier to spot in which situation" part - but for the moment all I can really say is "yes, I'm definitely an outlier, but anecdotally sometimes usefully so".

(excuse the wall of text but hopefully it's enough to give you some idea)


You should ask that question in the Go and Plan 9 forums.


I have to maintain and support lots of those Excel "solutions" and the users actively block learning any software practices.

We are talking about people that still use spaces to right align a date in word or create the table of contents by hand. The least that they want to be is programmers.


Most Excel users are unaware of modern software practices, and wouldn't know where to begin. The software developers don't know how to teach us. Perhaps someone should create a series of practical 15 minute lessons, each of which can be applied to an existing spreadsheet, to make it better.

I have literally improved my Python code in 15 minute chunks, but it's a lot easier to find good coding advice for Python, plus the editors provide things like built-in linters and style checkers that help quite a lot. And Python wasn't my first rodeo, so I already understood the value of good coding practice and some of the theory behind it.


A lot of comments are criticizing Excel users as if we are resistant to learning more about other programming languages. Resistant as in hard-headed or lazy.

One thing to remember is that the vast majority of Excel users aren't fully in IT or tech. We have to deal with data but the roles aren't primarily data roles.

- Customer Service Reps

- Admin Assistants

- Warehouse Managers

- Non-profit Fundraisers

- Sales Reps

- Realtors

- Inventory Managers

- Insurance Agents

I've taught at non-profit conferences and saw how people were torn. The fundraiser who uses Excel every day has to decide: do I spend 4 hours in an Excel session or 4 hours in a session on fundraising trends?

===

So many roles require some kind of data use, and Excel is immediately accessible, even if all it is is typing numbers into a cell, hand-coloring certain values and getting a sum.

Here's the question: WHEN is a person best served to put in the time and effort required to learn Python, JavaScript or another formal programming language? WHEN should a Warehouse Manager be sent to a Python class? What would that situation look like?

Personally, I hate true programming--and I've done a lot of it. But true programming is a whole different mindset. I like the visual aspect of Excel. But when I open a code editor and there's this wall of letters, numbers, indents, curly-brackets ... WOAAAHHHHHHH! No. HELL NO!

Even with WordPress and the templates that are supposedly drag-&-drop, I still found myself writing CSS and HTML.

===

One other thing. Don't forget looking the opposite way. Too many coders don't know what Excel can do. I watched a presentation on 6 hours of JavaScript that someone wrote to accomplish a task. That same task would have taken less than 5 minutes in Excel.


I think a lot more automation/computing should be done in these more approachable “citizen programming” tools.

“Job done”, “I did it myself”, and “I understand how it works” are three qualities that are often undervalued when “real programmers” look at the work of “citizen programmers”. I say this as someone who loves and makes a living at “real programming”.

We need more not less sub-real programming.


WOW! Excellent perspective. And I've never heard the term "citizen programmer" before.

You're right. "I did it myself" and "I understand how it works" are definitely undervalued. And that plays into a lot of the empowerment/disempowerment conversation.

I had a client who would have me build prototypes in Excel, then he'd hand them over to his in-house development team. I asked him why he uses me in the middle. He explained that he can guide me and kinda understand what I'm doing, and we can test and tweak formulas really easily. He can stop me and ask questions if I start doing something that seems wrong.

Then he said, "but, when my devs open that code editor, I don't know what the hell I'm looking at."

That was a different kind of disempowerment that he felt vis-a-vis his own devs.


I first read “citizen programmer” here on HN and it immediately resonated. I’m happy to pass it along, but I can’t properly attribute from whom I first read it.


One story:

https://news.ycombinator.com/item?id=8612828 - The Salesforce Platform: The Return of the Citizen Programmer - by leephillips, 7 years ago

Three comments:

https://news.ycombinator.com/item?id=27651486 - by patentatt, 3 months ago, on: Why did we ever think a student's first programming language didn't matter?

https://news.ycombinator.com/item?id=17384284 - by DonHopkins, 3 years ago, on: Ted Nelson struggles with uncomprehending radio interviewer (1979) [audio]

https://news.ycombinator.com/item?id=16228498 - by dragonsky67, 4 years ago, on: Ted Nelson on What Modern Programmers Can Learn from the Past [video]


Thanks for the research! None of those ring a bell, but I can’t be certain (and can’t be certain it was directly from here).


A killer app would be a spreadsheet format that worked as well as source code as storage format for a spreadsheet application. Something that was designed for manual editing in two different ways, in text editors and in a "cell editor". That would support all version control use cases that developers are familiar with and that have been best practice for decades.

Perhaps all that is needed is to port OpenOffice to the sc format (and extend it in the spirit it works now)


At my last job at OurWorldInData we made something like this. One of the head researchers would build sophisticated spreadsheets containing all the transformations and views our users could do, and rather than re-implement that logic in Typescript, we saved it as TSV and built a spreadsheet editor for the researchers to use. From a code perspective it was just a tree to traverse.

Demo: https://www.youtube.com/watch?v=0l2QWH-iV3k

Changes in the spreadsheet UI then work really well with git. For example: https://github.com/owid/owid-content/commit/37ef12d65655fa14...


I did not really understand that demo. What was different compared to a regular spreadsheet?


It is a strongly typed grammar backed spreadsheet that maps to a tree. The grammar enables the autocomplete, error checking, secondary notations, and so forth.

Not sure if that explains it.

From a different perspective: other programmers can work with the programs generated by it without knowing that it’s a spreadsheet.


This might be able to be achieved with a new serialization format for xls files. Something line-based, with canonicalized sorting of cells.


Excel IS code. It's a dataflow language combined with a visual/spatial language. It is hard to migrate or transfer to other languages because other language don't have these features/architecture.

The other side of this coin is that spreadsheets have NOT BEEN IMPROVED significantly since VisiCalc. Excel has some window dressing and intentional obfuscation by moving UI elements around to make it seem improved but it really isn't at all.


>spreadsheets have NOT BEEN IMPROVED significantly since VisiCalc

Not true. Excel added dynamic-array formulas a few years ago (where a single formula automatically spills into applicable cells below the edited cell) — game changer. And LAMBDA functions are currently in the Excel beta version (create your own (recursive) functions directly in Excel) — another game changer.


The improvements you mentioned are honestly esoteric.

I'm more curious on why we can't have built-in version control, or have unlimited rows, or have a linter in the formula bar in excel by now.


You forgot about the F# flavoured query language as well.


> spreadsheets have NOT BEEN IMPROVED significantly since VisiCalc.

False -- they have indeed been improved. Unfortunately the Improv-ement didn't stick: https://en.wikipedia.org/wiki/Lotus_Improv


Only for a lack of imagination would you fail to perfectly model your target problem domain in terms of tables & columns... You would have a fucking monster of a time trying to describe to me a practical problem that I could not hypothetically wrangle & demonstrate with Excel. Just think about it. You can model a ray tracer in Excel if you have the patience for it.

The magic of Excel is that it runs everywhere and is very intuitive to work with. I honestly can't recall any users who were simply unable to function in a basic read-only way with Excel. Iterating complex problem domains in excel workbooks is a low-friction way to collaborate with your business stakeholders.

Once you get it nice in Excel, the next steps are compelling. Using an obvious 1:1 mapping between Excel worksheets and SQL tables, you trivially move all data items into a realm to be easily queried using a declarative, domain-specific language. You can also sprinkle in views and user-defined functions for maximum happiness on the business-side of the house.

The richer and better-normalized the relational model, the better your SQL interface will be. If you ignore the performance equation for just a few seconds, you might see the blinding luminosity of cleanliness that emerges from normal forms beyond the 3rd one. We are going to investigate a variation on 6NF for the next major version of our product.

I will conclude my rant by saying that there is no logical determination/interpolation/projection of facts which is unachievable in an ideal SQL representation. It is very easy to teach SQL to non-wizards by way of the mighty example. Excel is the most important starting point on this journey, because it defines the common language and relations that you and the business will use to refer to all of the things.


It seems like most graph traversal and high-dimension problems would be difficult to model in Excel’s 2D data structures.

(Don’t get me wrong, I think more of the world runs on Excel than most people think and it’s perfectly well-suited for it, but “it can do anything” does not seem practically true to me.)


SUM(Sheet1:Sheet3!A1:B3) is a valid function. Excel is a 3D data structure.


Graph structures in general are painful in Excel. You can generate adjacency lists or matrices and do stuff that way, but it's painful and non-intuitive. I'm currently helping some people who started modeling such a datastructure in Excel. You end up with massive sparse matrices that aren't intuitive to work with, visualize, traverse, and so forth.


Actual tensor math would be fun to see. Definitely possible but it'd be ugly.


As in something like MMULT, or deeper functionality?


I was thinking about how wild it would be to build up full support for the operations in tensor algebra for arbitrary orders. Doing that in Excel seems like it'd be wild. And then build up (numerical) tensor calculus operations on top of that if you really wanted to go nuts. That would probably be the worst hell I could imagine.


> Unfortunately, none of this applies to Excel because Excel doesn't work well with revision control. Why? Because Excel is not a source file. It is a database coupled with code. [...] The path to enlightement is a more sophisticated revision control systems - ones that can understand Excel.

This is where the author lost me. The "path to enlightenment" is not to build new VCS software. The solution is simply to stop coupling your database with your code. Embrace the Unix philosophy and stop perpetuating monolithic software.

Excel is a spreadsheet editor. It was never designed to be a database. It can act as a quick-and-dirty database with minimal setup and training required. Sometimes that's all you need and Excel is a fine tool for those situations. But it has limitations.

Stop trying to force Excel as the solution to all your problems and don't be afraid to learn a new tool once in a while.


I used to work with someone who refused to learn another programming language besides VBA in Excel. He slowed everyone down and it got to the point where he had implemented a JSON parser and generator in Excel 97. Badly. It's one of the worst experiences of my professional life. I dislike VBA because it convinces those who learn it that it is a programming language and that Excel is a programming environment just like Python or another popular programming language with their standard libraries. That's just not the case, but its very hard to convince business people who have spent their whole professional life using MS Office that there are better choices for building their business apps than MS Office and VBA. Just let Excel and VBA die.


I’ve been using VBA for a long time. It’s not great but allows for some things that wouldn’t otherwise be possible. I like to think I use it as a last resort. That said, I’ve been following the development around using JS and others within excel and I just don’t get it. Or, when I do, it seems ass backwards to me (like a hosted app on onedrive). I don’t want that. And I don’t really see what good using JS is if it’s just a wrapper for the quirky VBA I already am familiar with.


I worked in a bank where we still used paper because the lead supervisor told us we had to. I'm not talking like paper that was needed so we just kept it in the file, I'm talking "Print out the entire loan profile in paper simply because the supervisor refused to learn how to do use a computer" paper. We're talking thousands of pages a day for ONE LOAN! All because this woman was lapsed by technology and HR had no clue she was so out of touch.


I know exactly what you mean. MS Office, even on the web, is the equivalent of that kind of mentality.


Years ago I wrote some VBA that exports all the VBA in an Excel file. I ran this script manually from time to time so I could add my code to version control. Excel should make it easier to separate the code from the data. For the former you probably want the entire commit history, for the latter you usually only want the current state.


When I was doing a lot of VBA work like this, I used an OSS tool that would export all of the code and check it into SVN. I can't remember the name of it for the life of me, though, but it's probably hiding on a drive somewhere.

edit: This was probably it https://www.codeproject.com/Articles/18029/SourceTools-xla


I work in a lab where there are lots of excel sheets floating around. I went one step further - when I save an xltm (an excel template), the code is exported and then a bash script automatically uploads it into my git repo. The VBA asks for a commit message and then all the rest is automatic. It's worked pretty well, all things considered.


I once wrote an MSBuild script that decompiled a workbook into XML and VBA code. It used a CLI tool I wrote that opened the workbook using VSTO to access the VBA objects.

I never finished it, though. I only spent enough time on it to realize that it was going to be too big of a project to make it worth it.

So now I just rename the workbook to a zip file, extract it, and check that into git. Only drawback is that the VBA macros are in an OLE container. But I stay away from VBA, so it's not that big of a deal for me.


Excel makes programming easy because all of the intermediate values are visible left to right and the loop iterations are visible top to bottom. This makes it easy to iterate towards a solution by visual inspection, but also creates spreadsheets as buggy as you’d expect if you only tested by visual inspection.


That’s true for only a subset of spreadsheets. There’s no requirement for formulas to work left to right and top to bottom.

Also, the moment you write = (A1 + A2)/2, not all intermediate values are visible anymore (although Excel has support for temporarily making them visible (https://support.microsoft.com/en-us/office/evaluate-a-nested...))

Also, in my experience, it’s fairly normal to have hidden rows or columns (https://support.microsoft.com/en-us/office/hide-or-show-rows...) or hide entire sheets (https://support.microsoft.com/en-us/office/hide-or-unhide-wo...)

And of course, the ultimate “not all intermediate values are visible” is the use of macro functions or iterations (https://support.microsoft.com/en-us/office/change-formula-re...)


Sounds like you've never seen advanced Accounting spreadsheets because Excel definitely does not have left-to-right/top-to-bottom or even intermediate values restrictions. There are some amazing Gordian knots people have programmed in Excel.

You haven't really seen the horrors of programming in Excel until you've needed to use the "Formula Auditing" group of the Formulas tab in the Excel ribbon. Admittedly "Trace Dependents" and "Trace Precedents" are still rather more visual tools than their source code equivalents, but they are their own sort of fun.


I share your pain and have seen people doing everything from numerical integration to curve fitting in Excel, all of it terribly. I worry what new special versions of recursive hell the new lambda functions will unleash upon us.


Lots of SaaS services, like Google sheets, go the quick and dirty route: one central database and the UI displays a view which you all work on together. That's not collaboration imho - and no dev shop would accept that as a reasonable way to work (lets work on the code in a google doc!).


It's sad that after nearly 50 years, the way we write programs has not changed. We still use keyboards and write code one line at a time. Sure, there are auto-complete extensions and helpers, but the basic idea is still the same: write your instructions for the computer to perform them.

When it comes to making programming approachable for the masses, it's actually kinda funny to think that Excel (and spreadsheets in general) have been way ahead of traditional programming software.

I hoped that new tech (AR/VR/etc) would help shift the focus from "typing" programs to "drawing" programs. But efforts to visualize programming only remain at the conceptual level and never gained traction.

It's hard to imagine 100 years from now we will still be typing code.


Typing speed is not my bottleneck for generating code, it's renaming variables and rewriting it 5 times until it's no longer a mess.

A Vulcan mind meld would be nice but lacks precision.


Musicians have been happy with their simple keyboards for hundreds of years. Why wouldn't software developers be using theirs in a hundred years?


Creating complex things using drawing tools is physically laborious, and suffers from readability problems when things get too complicated to fit on one screen. I've seen this with mechanical and electrical CAD.


As long as human still use a list of characters to represent things. Typing textual code is inevitable, even if you have a fancy editor to edit its semantic, there will always text here and there in whatever kind of language.


At https://www.xltrail.com, we wrote an open-source Git extension that allows you to diff the VBA part of your Excel workbooks. The extension also integrates with SourceTree, Atlassian's free Git client. You can see some screenshots on my blog post: https://dev.to/fzumstein/how-to-diff-excel-vba-code-in-sourc...


So this is an ad for some new merge tool I suppose.

Is there a solid open source tool for merging Excel files? Or CSVs or SQLite files for that matter?

I think this is probably best seen as a shortcoming of our current general VCS. At the moment we're stuck with newlines as the main means of merge semantics. That really restricts what we can put in VCS. Even with custom merge tools, its quite cumbersome as git does not allow this to be preconfigured.


> Git was not built for this - ...

But it does have a sort of plugin system to support other formats, right? Does an Excel format lend itself to being supported in this way?


You can still use 'straight Git', maybe with some PR management system like GitHub/GitLab/... . The difference is you can't rely on the diff to be useful, instead you'd need to provide a good commit message summarising the changes (which you should do anyway!) and then reviewers will need to check out the relevant version to poke it directly in Excel.

But I agree that being able to manage an XLS(X) as plain text and having a proper diff would be incredibly useful. :)


I highly doubt you'd ever get diff for XLS in a general or universal case. That format is so old and crusty that it's only really defined by what Excel will do with it.

XLSX, on the other hand, at least has to follow XML conventions and basic ZIP file structure, even if the open specification for the XML is really now a strict subset of what the current version of Excel will accept.


Git supports extension-specific overrides which enables things like textual comparison of Office files. https://tech.marksblogg.com/git-track-changes-in-media-offic...


I explored storing file types like XLSX as the deconstruction of their zip file into individual XML/etc files. In my cases my focus was DOCX rather than XLSX, and I originally targeted a different VCS than git so I built it as precommit/postmerge hooks rather than git's diff hooks/attributes plugins. I got some interesting results with my tool and it wasn't a bad experience. Just not one I could suggest to novice users (fixing XML in a merge conflict is not entirely fun and very different from say Word's own review tools designed for higher level merge fixing).


> Because Excel is not a source file.

Well, it is a zip-archive with XML files, so it's close.


I recommended exploring this approach here <https://news.ycombinator.com/item?id=27998733>:

> Hot tip for handling office file formats or anything that uses a ZIP container: just unzip them and commit _that_ to the repo.

Even modern (zipped XML-based) office file formats do make some limited use of binary blobs. You can either keep these intact, or write a small objdump-like tool that serializes them to text†. For portability, it might be best to write the serializer/deserializer in JS dumped into a thin HTML wrapper, so you pretty much anyone can double click to "run" it. (My experiments on roundtrippability with including that file in the ZIP container yielded poor results.)

† I've used this strategy for Oberon .rsc binaries. Due to Wirth's affinity for single-pass compilers, the Oberon toolchain doesn't involve a discrete assembler or AOT linker tool, so there is no assembly format or linker scripts. However, Wirth's distribution of the Oberon system does have an ORTool utility <https://people.inf.ethz.ch/wirth/ProjectOberon/Sources/ORToo...> (in the vein of objdump/readelf/nm) that will dump a textual description of the binary you give it. I realized that with some slight tweaks, you can use the output of ORTool.DecObj as a de facto "assembly" format—just write a tool capable of parsing it and then write out the corresponding binary.


>> Hot tip for handling office file formats or anything that uses a ZIP container: just unzip them and commit _that_ to the repo.

What is the point if that? I think neither binary nor XML output would be meaningful in the diff output.


Notably: The VBA stuff is stored as a binary OLE2 blob thing inside of the xlsm file.

(Or at least it is in the few spreadsheets I checked, no clue if there's some way to change that behavior)


The VBA blob is documented: https://interoperability.blob.core.windows.net/files/MS-OVBA...

111 pages, and it looks non-trivial to implement something to tear it apart. But, I give MS some credit for documenting it.


I built a tool to explore version control of files like that by decompressing their contents and version controlling those. It was an interesting experiment.


Surprised nobody has mentioned this. There is a company called Boardwalktech with a tool called "Excel Cloud" which adds a native extension into Excel which includes a change log and (i think) realtime collaboration, among other things.

They call their underlying tool a "digital ledger" which sounds very blockchain-y, but it's not a distributed public ledger so there's no crypto here, just a centralized, Boardwalktech controlled ledger.

https://www.boardwalktech.com/products/boardwalk-excel-cloud

They're already integrated with some very big companies like Accenture, Ernst and Young, Coca-Cola, Mars, Facebook, etc etc.

Personally, I can't imagine company leaders really investing tens to hundreds of thousands of dollars leaving their processes in Excel and not instead buying a real system, but I'm not running all of the companies mentioned above.


How is this different than the Office 365 version of Excel? It produces change logs/version management and real time collaboration.


There is dolthub.com which is Git for data. But there is only an SQL interface. No way to source control the style of the data in Excel.

Last time I looked into Dolt there were no commit hooks either. That would let you add linting or other data validation.


I think this is spot on. I agree that Excel users want to stick with Excel, but they do run into major issues that are solved by code. Namely: their data size is too large, Excel is too slow, and they struggle to get repeatability from their work.

I am building Mito[1], a spreadsheet interface for Python. Every edit you make in the spreadsheet generates the equivalent Python. It is a bridge between the workflows of Excel users and Python users, and allows Excel users to reap Python's benefits without needing to know how to code.

[1] https://docs.trymito.io/


Excel as code is a main spreading vector for bad practices like copy & paste, monolithic procedural monsters, bad databases with duplicate entries and so forth.

The reason why management cant perceive code-quality, is because there main tool, does not allow for good code-quality. In fact it does not even allow for abstractions..

If you ever wondered, why management does not blink and recoil one description of coding horrors..


I truly believe that Excel is the most abused software of all time. It has been mangled, malformed, smashed, and manipulated to do stuff that I don't believe the creators ever intended. Adding a scripting capability to it has unlocked the spirit of challenge in all SME's of finance related fields to make Excel the sole software they will use for all problems.


This depends on the evidence you want to highlight.

There are a lot of truly amazing things people use Excel for. And they work. There's no denying that.


It mind-boggles me that microsoft are not investing in VBA more, its userbase is massive. Sure its old and has its problems but I'm sure continuing to develop it alongside more modern solutions would help them rather than hinder their efforts. Make it more similar to other things out there and eventually people will change over.


Microsoft borked it with the migration to .NET. Instead of making VB.NET 100% compatible with VBA they created a unnecessary C# clone with a VB skin. That decision ended VB as a viable product and any migration path for VBA in Office.

If they had made VB.NET fully compatible, then we'd all just have the CLR in Office and we could be using any number of languages to write Office integrated software.


That's interesting.


I have recently discovered macros on google sheets. With an option of scheduling them in the cloud and simplicity of writing them, even with my limited coding skills , it allows me to put together pretty complex dashboards.


Yep Excel is great, actually working on a Minix like Kernel in it.


Excel is a language alright, a hideous and perverse one!


No-one is going to pay to version control Excel.


Actually, people do. But it's not terribly fine-grained. SharePoint offers version control of MS Office documents and is used in many businesses as an improvement over shared drives and files named: Foo_Report_v3_FINAL_20210928_FINAL_DRAFT_FINAL.xlsx

I don't think you get branching with SharePoint, though.


I'm about to pitch this to my manager. We have automated manufacturing going through Excel. It allows the domain experts to tweak the manufacturing process without having to learn to code.

The price is going to make this a difficult sell. If it was one off at $1000 easy but monthly per user...


They do, it is called Sharepoint.


Nice perspective. Thanks.


"virtually nobody treats Excel seriously like a programming language."

Because Excel was not Turing complete until recently.


VBA was not recent. Also, you'd be amazed by Turing Complete things like what someone determined can do with just VLOOKUP(). That's even before you get into truly abstract models of things proven Turing Complete such as Rule 110 of Cellular Automata and how easy/hard you can implement them in Excel without VBA Macros or "advanced functions".


Can't you just implement a turing machine in Excel by using the cells in a row as your tape?

* Store the initial internal state in A1.

* Store the initial head position B1.

* Store the initial state as boolean values in the rest of row 1.

* Write simple lookup formulas in row 2 to compute the next state from the previous row.

* Fill down. Look for the halting state in column A and your output will be written in that row.

What am I missing?


Yeah you can, I made one here a long time ago: https://www.felienne.com/archives/2974 :)




Consider applying for YC's W25 batch! Applications are open till Nov 12.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: