Is Microsoft Excel an Adequate Statistics Package?

avs733 · on Aug 28, 2016

A couple notes from a heavy Excel and heavy R (and heavy several other similar products) user...

factories run on excel...terrifyingly so. I know semiconductor semiconductor fabs that basically run on queries in a many sheeted workbooks. The reason is because not everyone codes, especially not factory foreman, technicians, etc. They don't need a continuously running app (well they do but they don't know it) they just need formatted data.

That links to comment two. The issue that Excel solves over code goes back (IMHO) to two issues that come up with coders/noncoders. First, most users of excel are interested in the answer rather than the process. I get the value of being able to audit your steps visually, but the transition from Excel to R is a transition from the primary visual being data (i.e., results) to process. Depending on your perspective that can be highly meaningful.

Second, and probably more importantly, Excel lets you visualize your data as it progresses. For those with lower (or just different/less abstract) spatial and visual reasoning ability, seeing the progress of data from column to column can have a fairly profound effect. This extends to students who are trying to learn by seeing the progression as steps of a processing algorithm are applied. Doing so in code abstracts that process heavily. For some, especially those not used to treating data in the abstract/blind via code.

SmokyBorbon · on Aug 28, 2016

> factories run on excel...terrifyingly so

Chemical plants too. But that does not mean there is no code.

An Excel document can be programmed with VBA to make connections to network resources, read/write files, send emails, etc.

I might not be the most effective tool for every task but it's everywhere and allows workers to automate work without making a request to hire a developer or buy new software. Someone can gradually automate business tasks on their own initiative rather than go through multiple layers of corporate bureaucracy.

avs733 · on Aug 28, 2016

absolutely agreed. I mean terrifying more in the sense of the lack of controls that I have typically seen in place and the way 'pull request equivalents' are handled and modifications are made.

kprybol · on Aug 27, 2016

The other reason to not use Excel for stats? It's virtually impossible to reproduce your work unless you documented every step in some other format. Excel is easily one of the worst tools you can use if you ever need to refer back to/redo your original work at any point in the future or if you need to be able to validate your result with any confidence. Yea I know there are some people who are freaking magicians with VBA who can "fix" many of the problems listed, but that doesn't absolve excel of its statistical sins.

kprybol · on Aug 27, 2016

Oh, almost forgot! (can't edit so I'll reply to myself) - there's no way for you to set a fixed seed using the built in random number generator. REALLY hope you don't need to reproduce your results because it's essentially impossible. Again, it's technically possible to implement your own Lehmer random number generator within Excel and rely on that and again VBA is always and option, but let's be realistic. The people who understand the need for setting fixed seeds, can implement their own random number generator, and/or are comfortable actually programming aren't going to using Excel anyway (willfully at least)

MlCROSOFT · on Aug 27, 2016

I think that's where R stats package (R Project) really shines, I love the command line console that displays every input / action so you have a track record of what you did. Used it for T-tests, linear regression, ANOVA, MANOVA. For those working in biological / environmental science fields, I would definitely recommend it.

okket · on Aug 27, 2016

There is a free R course on Codeschool if anyone wants to try it for an hour or two. It's fun.

https://www.codeschool.com/courses/try-r

Nicholas_C · on Aug 28, 2016

If you're ripping through the command line you can run into the same problem.

If someone is manipulating data in excel and erasing the original data they are not a good analyst. Further, it is very easy to add documentation of steps taken to files.

hexane360 · on Aug 28, 2016

I've spent a lot of time using Excel in ways it was never meant for.

If anyone is trapped using Excel to make a box plot, it is possible (disclamer please for the love of god never do this): Make a stacked column chart. The first series should be equal to Q1 minus the graph minimum. Format this to be invisible. Series two should be median - Q1, formatted as a black outline on a white bar. Series three should be Q3 - median, formatted like series two. Error bars can be added to give the whiskers. Here's an example: https://i.ytimg.com/vi/ucWmfmXb1kk/maxresdefault.jpg

Again, don't do this.

For a histogram, just make a min and max for each bin, and use countifs(range, ">=" & min, range, "<" & max) to get the amount in each bin.

bereasonable · on Aug 28, 2016

Excel 2016 supports box and whisker plots.

twunde · on Aug 27, 2016

I disagree. Where I work, we have one person that does all the hedging, price calculations and other financial modeling and he only uses Excel. The only practical downsides to his use of Excel it's that he doesn't have direct database access so we generally need to create the initial reports to give him the data and secondly Excel can only hold a little over 2M rows.

dandermotj · on Aug 27, 2016

So just to give you an opposing perspective of what you just said: "we have __one__ person that does all...". So apart from the fact he can't access a database appropriately, he's the _only_ person who does this work and none of it is documented or reviewable because that's the nature of Excel. There's some serious operational risk in that. This guy disappears tomorrow and what do you do?

twunde · on Aug 27, 2016

It's a misconception that work in Excel can't be documented or reviewed. We have the seed data he uses and the resulting Excel files (with the formulas he used still in them). This is the equivalent to version control for programming. The results are also reviewed against 1-2 appropriate sources (invoices, historical data, multiple other reports built by IT) for accuracy. To your very legitimate point that he's the only one doing this work, if he left, long-term we would probably need 3-4 people to take over his work. Short-term, the company would probably need to put together a triage team of 5-6 people to take over his existing processes. We actually moved some of the hedging to a different team last year and we had to hire two business analysts and implement a industry-specific system to do so.

gjm11 · on Aug 28, 2016

> if he left, long-term we would probably need 3-4 people to take over his work.

I hope you're paying him really well.

fnord123 · on Aug 28, 2016

Or they could drop him to sort out the operational risk aspects.

euyyn · on Aug 30, 2016

I don't understand your version control equivalence. If someone introduces a non-obvious bug in his formulas, how can you discover it, and restore the file to its previous good state?

rabboRubble · on Aug 27, 2016

The data, information layouts, and APIs provided by information brokers are well tested with Excel and straight forward. Excel really shines when used in financial environment.

The problems with Excel appear when the incoming data is "dirty". Outside the finance industry there less financial incentive to ensure data is reliably interpreted in Excel.

fnord123 · on Aug 27, 2016

>Where I work, we have one person that does all the hedging, price calculations and other financial modeling and he only uses Excel.

We used to buy in signals from a company who did their work in Excel. I wrote some scripts to export the data and recalculate it in Python. Almost every month I found errors in their reports and had to ask them to fix it.

So I recommend you fight HARD to get someone to reproduce his work in a language that is visible and reproducible.

nightski · on Aug 27, 2016

Your conclusion is that it was excel causing these errors and you are implying similar errors would not be made in Python. I think it is more because of your experience as a developer why you were able to spot and correct errors.

dandermotj · on Aug 27, 2016

It is absolutely the interface to excel that causes errors. I can take your excel spreadsheet, format cells, add things, change a reference and generally fuck it up and hand it back to you and you would never know. Whereas, if I change a text file then you can see what's changed. Similar errors are much less common in a programming language.d

nightski · on Aug 29, 2016

You can write tests for an excel spreadsheet if that is your thing. In fact Excel can be driven and automated by any .NET language. It is quite extensible. But most people don't do that because they are not developers and the word "unit test" is not part of their routine.

Which is why they will not be switching to python or R any time soon and even if they do, it will still have similar issues as the Excel version.

Just because you know how to use git history, diffs, etc.. to spot differences in code doesn't mean that is going to help the layperson.

twunde · on Aug 27, 2016

When you're working with people who are good with Excel, they will notice. If you're only ok with excel and just use vlookups, then you're probably going to have problems, especially since those people don't tend to keep a versioned history of their files. Really the main difference is that version control has become an ingrained habit in software development whereas with Excel it's rare

xapata · on Aug 28, 2016

Ah, a True Scotsman is good at Excel. I understand now.

fnord123 · on Aug 28, 2016

Sure, you can make errors in Python. But you can also write tests to validate the data in Python. And there is a culture of doing so in Python.

>I think it is more because of your experience as a developer why you were able to spot and correct errors.

I don't feel like I spotted errors. I wrote a script to validate the data and the script told me if there were errors (there were).

nightski · on Aug 29, 2016

The act of writing a script to validate the data says otherwise.

Also the people who use Excel as a primary tool are not the type that write unit tests in Python generally. Or would even think to do something like that. That is my point. You, as a developer, would think of something like that. It's not that you couldn't write similar tests with excel (you could, in any .NET language). But that you thought of doing so.

fnord123 · on Aug 30, 2016

Ok then I think we're agreeing but using different bits of the same point. In Excel there is no culture of good data validation. My experience working as a developer in Python gave me that culture. Agreed.

In GGGP's case they almost certainly have no tests so I stand by the recommendation that they fight hard to get a validation system in place. Probably by moving to Python and/or a RDBMS.

blahi · on Aug 27, 2016

It can hold many more if you use Power Pivot.

twunde · on Aug 27, 2016

Good to know.

bereasonable · on Aug 28, 2016

Excel can handle millions of rows if he brings the data into PowerPivot.

Note: I'm a PM on Excel

newman314 · on Aug 28, 2016

Random point but Excel sure could use a CONTAINS function for substring matches.

I was looking for something like this yesterday and the solutions I found were quite ugly.

Ended up having to use DSUM/DCOUNT instead which is still inelegant when one has to to multiple lookups with slightly varying parameters.

bereasonable · on Aug 28, 2016

We're planning on adding support for regex. The current state of the work is shown on our User Voice site. https://excel.uservoice.com/forums/304921-excel-for-windows-...

throw_away_777 · on Aug 27, 2016

How do you verify that his calculations are correct?

shostack · on Aug 28, 2016

Why do you think this can't be verified? You can see the formulas in cells, or you can see the macro code. Further, you could auto import older data inputs and outputs via data connections in Excel and set up some automated checks to flag if something is off.

Is there a use case where you think this wouldn't be possible?

throw_away_777 · on Aug 28, 2016

You can verify it sure, but it is much harder to verify excel equations than code. There is no github for Excel (besides saving with different names). You generally don't see the equations when you look at Excel - this is one of the reasons Excel is so error prone.

shostack · on Aug 28, 2016

Fully agree. My point was that it can indeed be verified though in many circumstances.

That said, I think there's always a balance between a quick down and dirty business solution that gets the job done, vs. something fully engineered. Additionally, it is much easier for business users to shoulder more of the workload while letting people with programming knowledge focus on other tasks.

twunde · on Aug 27, 2016

His calculations are usually compared to invoices (ie do the numbers match what we were charged for), other reports and historical data. Similarly when one of our DBAs or BAs creates an important report, we check the results against other data sources.

lordnacho · on Aug 27, 2016

Why does he not simply read a tutorial on how to pull in data from a database?

twunde · on Aug 27, 2016

I think it comes down to the fact that he hasn't needed to and that he prefers the tools he already knows. He's also not part of the IT department, so it would be similar to having your CFO have direct database access. Possible but probably not the best use of time.

nerdponx · on Aug 28, 2016

I'm pretty sure there is an ODBC plug-in for Excel

ygra · on Aug 28, 2016

I understood it as that the person didn't have database access, not Excel.

ramblenode · on Aug 27, 2016

A point I have not seen mentioned is that Excel encourages bad practices for data visualization. No other statistics or data analysis software I am aware of gives you the option of 3D-ifying a 2D plot. This adds negative value to the plot for anyone who is actually interested in data over eye candy. Simple example is a pie chart. Give it depth and it becomes much more difficult to reason about, and one's reasoning could easily change if the chart was rotated.

learningman · on Aug 28, 2016

Agreed for the most part, although there are rare valid uses for 3d graphs. I think most contemporary data vis people would say that Excel makes it possible to make pie charts at all means it may encourage bad practices.

ramblenode · on Aug 28, 2016

I was hesitant about the pie chart example but it was the simplest I could think of. I'm glad someone brought this up. ;)

infinite8s · on Aug 28, 2016

What are those rare valid uses for 3D graphs?

learningman · on Aug 28, 2016

I can think of one-- we had a 96-element datatable (12x8) and limited space in our biochemical journal article. We were looking to graphically show differences in orders of magnitude to help explain the assay. A 3d column chart fit the bill because we could show lots of data in a small space, and readers could quickly see the range of values.

ee8aq3g5c6 · on Aug 27, 2016

Let's not forget the time Excel's bad UI contributed to a bombshell paper's faulty evidence in favor of fiscal austerity, probably prolonging the European recession: http://www.bloomberg.com/news/articles/2013-04-18/faq-reinha...

nolok · on Aug 27, 2016

Yeah no, if people who were decided by this paper took it at face value without checking the numbers made sense, it's not an excel ui's fault. As for the others, they would have picked that policy choice with or without that paper.

ee8aq3g5c6 · on Aug 27, 2016

Sure, we could argue over multiple possible sources of blame and maybe we will conclude that Excel is not the most important. It seems nearly impossible to have that debate in a rigorous way. In any case, "the evidence will be ignored anyway" is not a good argument to use a tool that will produce bad evidence.

teamonkey · on Aug 27, 2016

> "Changes in Excel 2010 have improved its use for statistics considerably. For earlier versions of Excel, however, the answer is generally ‘No’. The following refers to versions prior to 2010 (2011 on the Mac)."

tangue · on Aug 27, 2016

Considering Microsoft recent involvment with R I have the crazy hope that future versions of Excel will be shipped with R as a scripting language.

Mikeb85 · on Aug 27, 2016

IMO adding scripting to a spreadsheet is a waste of time when R can simply import a spreadsheet, perform whatever operations needed, then spit out another spreadsheet/database/whatever format you want.

actuallyalys · on Aug 27, 2016

Plus, if you use a tool like RStudio or Jupyter, it's pretty easy to see the data as you manipulate it, which is a commonly cited advantage of Excel, with none of the awkwardness of trying to look at cell formulas.

blahi · on Aug 27, 2016

That would depend on what you want to do. If by statistical analysis you mean regression or similar tasks, then sure R is the way to go.

However there are other family of analytics tasks which can be summed up as "decision analysis". Taking regression models (for example, maybe even built in R) and simulating them under different inputs, getting quantiles, sensitivity, etc. This goes much further with multiple sets, multiple outcomes with decision trees (NOT regression trees/CART) and even further with solvers. Excel is the best option in those tasks most of the time because of quick data entry and already built output reports and interfaces.

tangue · on Aug 28, 2016

For people used to code, yes. But many excel user won't leave the ui they're familiar with. Including R will build a bridge between Excel user and modern development features (testing, vc ...)

dandermotj · on Aug 27, 2016

So is Hadley Wickham... https://twitter.com/hadleywickham/status/748392441154248704

peatmoss · on Aug 28, 2016

Well, later in that thread he did write:

"I am worried by the number of people that couldn’t tell that this was sarcasm"

dandermotj · on Aug 28, 2016

That was the joke I was making...

peatmoss · on Aug 29, 2016

Ah dang, I got whooshed.

KhalilK · on Aug 27, 2016

There is an Excel addin (RExcel) that allows the use of R from within Excel. But it's proprietary.

blahi · on Aug 27, 2016

RExcel blows.

https://bert-toolkit.com/

godzillabrennus · on Aug 27, 2016

Harper Reed gave a fireside chat at 1871 in Chicago a year back where he said that he's yet to see any meaningful data opportunity that can't be addressed by Excel. Most are just trying to build a solution where there isn't a problem.

ramblenode · on Aug 27, 2016

He has apparently never worked with a dataset of more than 1,048,576 observations, Excel's row limit [0].

[0] https://support.office.com/en-us/article/Excel-specification...

lordnacho · on Aug 27, 2016

Never, ever use Excel for anything other than prototyping, spot-checking data, or as a makeshift GUI while you develop a proper one.

I've seen several financial shops where people were moving millions of dollars around using Excel. IMHO, it's always an indicator of deficient processes and lack of coding skill. Yes, I do know that some clever people use it. They are productive in spite of excel, not because of it.

Your main problem with excel is not that some statistical function is missing, or wrong, or misleading. Sure, that's an issue, but you can live with a few things being wrong if you can detect them and fix them yourself. I'll come back to stats later...

The problem with Excel is it's damn near undebuggable. There's simply nothing in the way of someone making a beast of a calculation, with the flow going all over several sheets. You can even make things circular if you want. The data and the code are all together, mixed up. Was a number in a given cell written there by the VB code, or was it an input? You can use the auditing functions, but chances are you will see a spaghetti of arrows.

It's also non trivial to find differences between different versions. Typically the dude who is using excel has also never heard of Git or SVN, so you will see a load of sheets like "portfolio1" and "portfolio2_new_old" and so on. I don't know if it's changed, but when I was using Excel, the files had code files within them, rather than separate files, like we do with most other languages.

Of course you aren't forced to write crappy spreadsheets, but there's simply a tendency for people who don't code to be a bit messy. But Excel is positively inviting trouble. It's so flexible that anyone under a little pressure will hack in some extra bell or whistle, building up tech debt for future generations. It basically lulls the novice into thinking they can build anything.

Amazingly I've met several people in finance who pride themselves on spreadsheets that stretch over thousands of lines. I remember a billion dollar merger where the analyst in charge of the "modelling" showed me how they reached the line limit (I think that's gone now, so good luck!). It's as if making things complicated justified their salaries, so maybe that's why excel is so popular.

Now, about statistics. If you're doing anything non-trivial, you absolutely do not have space to look at all the individual numbers in your matrices. Just like if you're solving equations on a piece of paper, you need your own symbolism. You need to be able to give things names and see short statements at the appropriate level of abstraction. You probably need to be able to verify the pieces independently, too. So unit tests for various operations, that aren't intruding into the business logic.

ZanyProgrammer · on Aug 27, 2016

I could've sworn I've seen people who work in the financial and maybe data science industries who've praised Excel to the heavens on HN. Like on the same level as Python, R, etc.

I don't work in finance (or data science really) so I can't comment, but it seems merely a tool to me.

Nicholas_C · on Aug 28, 2016

I'm probably one of those people. You can't build a model in R or Python and walk a sales or finance VP through it. The other day I walked some auditors through a regression model built in excel. I could not imagine doing that in R or Python, their eyes would glaze over and think it's a black box. That's the beauty of excel, very easy for everyone to understand as the numbers are right in front of you.

throw_away_777 · on Aug 27, 2016

There are lots of people who are data analysts who use Excel. Data scientists would more generally use more powerful tools, but Excel is still useful for some things. For example if you have to manually enter in a small amount of data and then perform some simple calculations on this data, Excel is a fine tool. If your data is stored in a machine readable format or you want to do complex analysis with it you generally shouldn't be using Excel.

huac · on Aug 27, 2016

+1 for manual entry; or if you need to fix some cells in the data.

lordnacho · on Aug 27, 2016

No way, not data science. How are you going to load in millions, possibly billions of individual numbers?

With finance there's a lot of things that seem like good fits, but only if you keep them small. Something like a personal budget is fine, where everything fits on a screen.

dandermotj · on Aug 27, 2016

The "big data vs. small data" argument isn't the issue for data scientists. It's the total lack of decent statistical libraries, visualisation tools, programming capabilities and on and on. Don't think anything I've worked on would come close to hitting the excel memory ceiling, but without a doubt completing it in excel would be nigh on impossible.

PeterisP · on Sept 1, 2016

A simple excel pivotchart on 10k-100k financial entries often works better than the best BI / MIS packages.

For most companies, all their data science fits in excel easily. Full transaction history with each individual item sold since founding the company? No problem. Every single visitor/page view on their website? I've seen such logs imported to excel to do some rough aggregation. Manufacturing data about each particular widget sub-part that ever went off your conveyor belt? Again, for many companies with hundreds of employees that would still fit in excel without issues.

There are so many companies who speak about big data while their largest datasets can fit into RAM of a cheap laptop. That doesn't mean that data science and analysis is worthless to them, quite contrary.

blahi · on Aug 27, 2016

I have 2.4 billion data points loaded in Excel at the moment.

Not that it matters. "Data science" is not something that magically kicks in after you go beyond some "big data" threshold.

Excel is heavily used in managerial science type of positions and I can assure you those are rather heavy on the "data science" workflows.

ramblenode · on Aug 27, 2016

The maximum number of rows in the most recent version is 1,048,576 [0]. I'm deducing then that your data is in wide format. What would happen if you wanted it in long format? It seems you would just be out of luck with Excel, which is not a problem in any other major statistics software.

[0] https://support.office.com/en-us/article/Excel-specification...

blahi · on Aug 27, 2016

Except that there is Power Pivot which is xVelocity - the same columnar store database use in SQL Server.

goatlover · on Aug 28, 2016

Why on Earth would you use Excel for that many data points?

blahi · on Aug 28, 2016

Quik self-service OLAP. Blazing fast slice & dice and multidimensional expressions (cube formulas). I would need a corporate BI solution to get that and it won't be self-service.

Nothing beats Excel + Power BI in reporting.

kgwgk · on Aug 27, 2016

> Yes, I do know that some clever people use it. They are productive in spite of excel, not because of it.

Would they be more productive if you took Excel from them?

> I remember a billion dollar merger where the analyst in charge of the "modelling" showed me how they reached the line limit. It's as if making things complicated justified their salaries, so maybe that's why excel is so popular.

How should the analyst write his model in a simple way? As a Java (or Haskell, or whatever firs your ideal of simplicity) program? Or maybe he could simplify the model until he can write it down in a piece of paper.

jnbiche · on Aug 27, 2016

> How should the analyst write his model in a simple way? As a Java (or Haskell, or whatever firs your ideal of simplicity) program?

In Python or R, like the rest of us do. Or SAS, Stata, or even SPSS if they want a familiar interface (although using the SPSS GUI instead of its scripting interface will put them at risk of the same kinds of mistakes as Excel is).

Ensuring reproducibility and robustness isn't rocket science, and it doesn't require the obscurity of Haskell or the verbosity of Java to do properly. But it does require learning to script to program their models, instead of relying on GUI (which should be perfectly natural to analysts and their long Excel formulas).

Edit: To be clear, when I say "scripting" here, I mean text-driven programming, as opposed to using a GUI interface (which is rarely robust or reproducible, and certainly isn't in plain Excel).

kgwgk · on Aug 27, 2016

I don't think you know what a financial model is: https://en.m.wikipedia.org/wiki/Financial_modeling#Accountin...

jnbiche · on Aug 27, 2016

Well, I've never worked in finance, but if these are like the mathematical models that I used to make predictions in grad school and for the year after grad school when I worked as a statistical analyst, I do. I presume these are predictive mathematical models, used for forecasting?

Why in the world would Python or R be inadequate for that type of work, if that's what you're implying? What do you think they lack that Excel has, other than a friendly GUI interface? Or SAS, Stata, or SPSS (all of whom have versions that they market heavily toward finance).

kgwgk · on Aug 27, 2016

No, these are not mathematical models used for forecasting. In this context, "financial statement forecasting" means making up financial statements. Something like this: https://mycourses.aalto.fi/pluginfile.php/144740/mod_resourc...

lordnacho · on Aug 27, 2016

Yes, if people didn't think that crappy tools were OK, they would have better tools, and they'd have a better idea of what they could do. For instance, I did a lot of fixed income spreadsheets at one point. Instructive for learning how the business works, but ultimately if you stay with the spreadsheets, you are not full exploiting opportunities. If I were to come back to that I'd automate the opportunity finding as well, going back through time to validate. That's something you just can't do with an excel sheet.

Models that take up entire spreadsheets are not models. There is never enough data to validate the sheer number of degrees of freedom that such models contain. There would be things like sub-models of entire divisions of firms. Who even has data that could validate all the potential things that could happen? These spreadsheets are pure ludic fallacy; advisors pretending they know how adding or removing some employees will affect some merger.

kgwgk · on Aug 27, 2016

> These spreadsheets are pure ludic fallacy; advisors pretending they know how adding or removing some employees will affect some merger.

Yes, I agree. And I think Excel is an appropriate tool for that job.

itisbiz · on Aug 28, 2016

I've been recommending to all the people I work with to use Excel add-in Power Query to connect to data and do all calculations and transformations. Then they can just replace/update data sources and refresh queries. It's somewhat more accessible for non technical people than Power Pivot/ DAX/MDX. it's easy to make tabular dimensional record sets ready for consumption by pivot tables, Tableau, etc. You can choose to keep data in model instead of showing in worksheet to bypass row limits and connect model to pivot table. Best of all it teaches people ETL, better data management, separation of data and report.

princeb · on Aug 27, 2016

there are a few unique issues i face with excel beyond the stats package.

1. if you shift cells (ctrl c, ctrl v) around, delete or insert rows, you may mess up existing cell references in formulas without realizing it. your vlookups, hlookups will not change your column numbers just because you did. your vba code will not change your A1 cell references. things will blow up here in spectacular fashion.

2. if you have a massive spreadsheet with a lot of lookups, UDFs, non-static cell values (like a Bloomberg real time feed), it's not so clear which UDFs in which cells get calculated in what order. sometimes it results in #value errors, which is a million times more desirable than if there was a iferr(..., 0) or iferr(..., "") and you can't tell if there was a failure.

3. your macros will happily destroy your work if you let it by mistake (writing over formulas in the wrong sheet etc). python will generally not destroy the code it's running.

4. AUTOFORMAT will destroy, without any honor or humanity, any data if it just barely looks like it should be something else. I've had strings get converted into dates, 0s get stripped off (I think geneticists also face that same issue), all kinds of nonsense.

some of the problems I see raised here in HN (such as errors in formulas, sanity checking) are also issues in other tools like scipy, matlab. common errors in these languages are off-by-one matrix references, terminating loops prematurely (esp for numerical solutions), formulas not written correctly, brackets in the wrong place or + instead of -, typos in variable names, nan versus 0 vs na, these are things that affect excel equally.

otherwise, excel is pretty good. it's quick to prototype, it gives passable charts if all you need are passable charts. it's very good at displaying intermediate results. it's pretty ok for WSYWIG presentation, formatting, especially if you have custom reports to produce every week rather than regular ones that tex can solve. the biggest thing about excel is that everyone uses excel and if you try to send over results in a non-excel format they'll (clients or whoever) ask you to send it back in xls.

oh also... mediocre workers can produce excel sheets of passable quality. mediocre works may not even produce a single scipy script of any quality. I have seen some horrific matlab code, written by people with engineering background. i've seen one guy, in his desire to make a programming language look just like excel, write a single line for each of the 50 charts he creates and calls them Chart1, chart2, chart3, rather than use a for loop even though it's just 2 lines. it's totally bizarre.

viraptor · on Aug 28, 2016

> your vlookups, hlookups will not change your column numbers

If you're not using tables yet, you should. They solve that problem easily, and make addressing more explicit (relative to the table name rather than the sheet)

princeb · on Aug 28, 2016

you can still use table names in vlookups, and it will still not solve your problem because vlookups do not work by table headers but by column numbers

viraptor · on Aug 28, 2016

You can use either index/match as avs733 suggested, or alternatively:

    =VLOOKUP(value, Sometable, MATCH("Column2", Sometable[#Headers]))

avs733 · on Aug 28, 2016

don't use v/hlookup...use index(match()) and you will solve that problem

Also, the folks running R should pay Matlab/Mathworks to give them a workshop on writing help files.

fithisux · on Aug 28, 2016

Never, there are better alternatives and especially FOSS. For windows, PSPP is there to make your life easier. If you are a spreadsheet guy Libreoffice will make you never turn back to excel and can be programmed in StarBasic. R is amazing, but if you do not want to invest Scilab and Octave are there. if you are the freeware proprietary guy Google Sheets, WPS Office or FreeOffice are better alternatives than Excel. And if you have a good PC, why not give the FOSS giac/xcas a try, you may find your sanity.

ramblenode · on Aug 27, 2016

Related discussion: https://news.ycombinator.com/item?id=12370605

galkk · on Aug 27, 2016

"statistics don't lie quote but liars use statistics"

Well, the similar can be said about authors of that article (don't know about newer version of Excel though).

Basically they are saying "Of course, it all appears only to old versions of Excel but there were sooo much problems with them".

So, what?

nimish · on Aug 27, 2016

Excel can calculate whatever you need but the main issue is that it will modify your data to be helpful like converting number strings to floats unless you tell it not to and even then...

Fej · on Aug 27, 2016

When an article title poses a question like that... the answer is probably no.

chris_wot · on Aug 27, 2016

How does LibreOffice compare?

gaius · on Aug 27, 2016

It's just as unsuitable. But there are plenty of open source packages that R the right tool for the job...

nn3 · on Aug 27, 2016

The article disagrees with you

""" Solution #2: Alternatives to Excel Yalta (ref 1) states that p-values [inverse probability distributions] reported by the free OpenOffice’s Calc spreadsheet and the open-source Gnumeric spreadsheet do not have the same numerical problems as does Excel - their programmers used accurate algorithms."""

It is not surprising because with an open source program everyone who can program can fix such issues, while with Microsoft you are at the mercy of the likely overworked Excel team.

okket · on Aug 27, 2016

At this point I doubt Microsoft can change the fundamental way how Excel calculates any more, because it would break too many existing workflows, see

https://xkcd.com/1172/

[meta: If you want to quote, use two spaces at the beginning of every line, see https://news.ycombinator.com/formatdoc]

sandworm101 · on Aug 27, 2016

Isn't breaking existing workflows their business model? I have yet to see an update from the linux community result in rushed boardroom-level meetings about whether we are ready for the next software mandate or update deadline (win10). For the last few decades, every couple years MS comes out with some new thing that we all have to prepare for as if it were y2k all over again.

gaius · on Aug 27, 2016

Right, but that isn't the only problem with Excel in particular or spreadsheets in general

Delmania · on Aug 27, 2016

> It is not surprising because with an open source program everyone who can program can fix such issues, while with Microsoft you are at the mercy of the likely overworked Excel team.

Yep, that's the theory behind open source applications. The reality is that in a company, people will prefer Excel because Microsoft is a point of contact that can work with, blame, or yell at to fix because you're paying them. With OpenOffice or LibreOffice, sure, you could have your engineering department fix it, or they could work on the software you need for your business.

chris_wot · on Aug 28, 2016

In the rather large Australian company I work for currently my manager asked himself out loud "how many tines did we ask Microsoft for support with Office"?

The answer was - "never". They wouldn't have listened and so it was not only pointless, but it was actively frowned upon!

You can purchase a rather less expensive support contract with Collabora. You don't have to build or fix the software yourself.

0x145555 · on Aug 27, 2016

Excel is the de facto standard for my STAT 301 class.