
Interesting article. I have thought a lot about this frustration on the job.

However, I’m on the opposite side of the fence.

In summary, there are two fundamental ideas in software engineering that people, especially new grads, constantly grapple with:

1) If it works, it’s not broken.

Or

2) It’s broken until it’s not.

(this can also be seen as “right” vs. “right now”)

Let me explain further:

- The first point refers to the idea that if the constraints provided are achieved, there are no tasks left to be done.

- The second point refers to the idea that there is a correct way to do something.

I’m not going to take long, but I strongly stand in the first camp. There is no value in building kingdoms in details that aren’t required. Could we have it run in 10ms vs 250ms? Could we make this look better? Could we do this in a different way? Etc. None of this matters if it works.

Now, you might ask the following questions: What about scalability? What if I’ve done this before and I know the pitfalls of xyz?

And my answer is, politely: iterative improvements after a working model if those improvements become necessary. Those improvements are not de facto necessary. And when I say necessary, I mean customers are returning the product, not that an engineer can’t sleep at night because of his or her neurosis.

In summary, the world you live in was, at best, designed with an acceptable tolerance for error. I call that “good enough”. There is no other measure than “good enough” when it comes to the acceptance criteria of anything.




It sounds like you just don't include performance in your acceptance criteria. You're using a lot of words to basically say that you don't care about performance.

I don't find it acceptable to merge something that I know could be 10X faster with a bit more work. Part of my acceptance criteria is "has reasonable performance".

Performance optimization is contentious because everyone has to draw the line somewhere. You can sink an almost unlimited amount of time into making a complex program faster. But I've seen so many devs just ignore it altogether, even in cases where half a day of work literally makes the thing they built 10x faster.
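As a hypothetical illustration of the kind of cheap win I mean (a made-up example, not any particular codebase): replacing repeated linear scans with a hash set lookup.

    // Hypothetical example of a half-day 10x-style win: membership tests against
    // a Vec are O(n) each, so the loop below is O(n*m); building a HashSet once
    // makes each lookup O(1) on average.
    use std::collections::HashSet;

    fn slow(needles: &[u64], haystack: &[u64]) -> usize {
        needles.iter().filter(|&n| haystack.contains(n)).count()
    }

    fn fast(needles: &[u64], haystack: &[u64]) -> usize {
        let set: HashSet<u64> = haystack.iter().copied().collect();
        needles.iter().filter(|&n| set.contains(n)).count()
    }

    fn main() {
        let haystack: Vec<u64> = (0..100_000).collect();
        let needles: Vec<u64> = (50_000..150_000).collect();
        assert_eq!(slow(&needles, &haystack), fast(&needles, &haystack));
    }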


I feel that doing more work than is necessary is always a waste.

Unless there is a business requirement for something to be 10x faster, and that requirement is rooted in dollars, there is no reason to go beyond 1x.

Performance criteria should only be measured by whether or not you lose money/customers from it.

It’s not an EM’s job to set performance criteria themselves. That should be dictated by business needs.


> Performance criteria should only be measured by whether or not you lose money/customers from it.

While I agree that performance optimization for no tangible benefit isn't generally useful, I find it quite cynical to think of the loss of customers as the only measure that could or should matter.

If a product's user experience is measurably or subjectively worse, but not enough to drive people away, it's still a worse experience.

That may or may not matter to the owners of the company, and of course putting too much effort into details that don't affect the business should be avoided. It's also a reasonable view that one needn't do better than is necessary for the bottom line.

Some people like to take pride in building good products, though, and care about the experiences of their users. It sounds rather cynical to think that one should refrain from ever making anything better, or that it's somehow wrong to care about users' experience, if it isn't enough to immediately affect the bottom line.

(Also, perceived quality could affect user opinions in the long run and, when compounded with other things, could also affect the bottom line even if the effects aren't immediate. Trying to build products of high perceived quality may be a reasonable strategy, and a part of that might be to use good quality, possibly including performance, as a heuristic. But that's a bit of a different matter.)


My problem with that argument is that the business requirement is based on what we can convince the user to buy. Users won't pay more for a better product if they don't understand it's better.

And one may say that "if they can't tell the difference, then the bad products are fine after all", but I think that's shortsighted. Continuously favouring the worse but cheaper design is what has brought the industry to where it is today: "better to write bullshit that ships now than a good product that users won't buy anyway, because they already bought the bullshit from the competitor".


Commonly, the cost is externalized to millions of people who spend hours waiting on the thing.



And in turn, instead of doing extra work once, you are making every computer that runs your software do extra work every time.


And every user that runs their software is also probably a bit less productive than they could be, since they have to wait around.


> iterative improvements after a working model if those improvements become necessary.

This doesn't actually work if you've shotgunned the bloat uniformly across your system.

My personal experience was tracking down a 10% performance degradation that turned out to be a worthless branch in memory allocation, inlined prolifically throughout the codebase. It didn't show up in any profile, and if the group I was with hadn't been ruthlessly disciplined about maintaining performance, such that its effects were felt immediately, I strongly doubt it could have been reversed after the fact without serendipitous re-discovery.
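For illustration only (a made-up sketch, not the actual allocator in question), the shape of the problem was roughly this: a cheap-looking branch in an allocation helper that gets inlined into every call site, so no single site looks hot in a profile, yet every allocation pays for it.

    // Hypothetical sketch of an "invisible" cost: a rarely-taken branch in an
    // allocation helper. Because the helper is inlined everywhere, the branch
    // and the atomic load are duplicated across the whole codebase, and the
    // overhead is smeared too thinly to show up in any one profile entry.
    use std::sync::atomic::{AtomicBool, Ordering};

    static TRACK_ALLOCS: AtomicBool = AtomicBool::new(false);

    #[inline(always)]
    fn alloc_block(len: usize) -> Vec<u8> {
        // Almost never taken, but paid for on every allocation at every call site.
        if TRACK_ALLOCS.load(Ordering::Relaxed) {
            note_allocation(len);
        }
        vec![0u8; len]
    }

    #[inline(never)]
    fn note_allocation(_len: usize) {
        // Stats hook; disabled in production builds.
    }

    fn main() {
        let buf = alloc_block(4096);
        assert_eq!(buf.len(), 4096);
    }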


> Could we have it run in 10ms vs 250ms?

Modern games can render hundreds of thousands of polygons, run a physics simulation, run complex audio processing and mixing, run several post-processing effects, and provide consistent multiplayer results in under 7ms (144 FPS) on a VR headset (ever played Half-Life: Alyx?).
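The back-of-envelope frame budget behind those numbers (my arithmetic, not a measured benchmark):

    // Frame budget at 144 Hz, and how many frames a 250ms operation would span.
    fn main() {
        let fps = 144.0_f64;
        let frame_budget_ms = 1000.0 / fps; // ~6.94 ms per frame
        let slow_task_ms = 250.0;
        println!("frame budget: {frame_budget_ms:.2} ms");
        println!("a {slow_task_ms} ms task spans ~{:.0} frames", slow_task_ms / frame_budget_ms);
    }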

There's no excuse for why any algorithm should take more than 100ms unless it's communicating over a crappy network, or operating on obscene amounts of data (think 100s of GB of data). No excuse.


So as a specific example, would you say there's no excuse for not being able to decode a 1GB PNG image in less than 100ms? The state-of-the-art implementations, in the Rust and Wuffs programming languages, have reached around 500MB/second decode speed on desktop x86.
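To put numbers on the gap (my arithmetic, using the ~500MB/second figure above):

    // Rough decode-time estimate for a 1GB PNG at state-of-the-art throughput.
    fn main() {
        let size_gb = 1.0_f64;            // compressed size
        let throughput_gb_per_s = 0.5;    // ~500 MB/s
        let decode_s = size_gb / throughput_gb_per_s;
        println!("decode time: {decode_s:.1} s");                    // ~2.0 s
        println!("over a 100 ms budget by {:.0}x", decode_s / 0.1);  // ~20x
    }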

If I had specified a 1GB image with a PNG-like compression ratio and allowed you to define the data format along with the algorithm, then the challenge would not be so difficult. But the moment the requirement is an image format that's actually widely used...


PNG is a compressed format, so right off the bat it would be weird to have 1GB of compressed image data. According to this[0], out of 23 million analyzed images from web pages, PNGs comprised around 14.4% of those images and averaged around 4.4KB. So this scenario isn't likely to occur in the real world, and in the typical case, decompressing 4.4KB happens well under 100ms.

Secondly, PNG is an unnecessarily slow format, as we've found out from QOI[1]. From the results, it can encode over 300 megapixels per second in some cases. 300 megapixels means 300 million pixels, so if we assume 4 bytes per pixel (8-bit RGBA), that means it can encode raw pixel data at over 1 GB/s while achieving very similar compression ratios to PNG. Oh, and by the way, this algorithm isn't even using multithreading, so I assume it can be sped up even more.
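Spelling out that conversion (my arithmetic, assuming 8-bit RGBA at 4 bytes per pixel):

    // Megapixels per second -> bytes per second of raw pixel data.
    fn main() {
        let megapixels_per_s = 300.0_f64; // QOI encode rate from the benchmark
        let bytes_per_pixel = 4.0;        // 8-bit RGBA
        let gb_per_s = megapixels_per_s * 1e6 * bytes_per_pixel / 1e9;
        println!("~{gb_per_s:.1} GB/s of raw pixels"); // ~1.2 GB/s
    }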

So I wouldn't insinuate that the developers who coded the implementations to decode PNG are programming slow inefficient code, but they're stuck with a bad algorithm from the outset because of the requirements, as you've already pointed out.

I think my original statement still stands.

> There's no excuse for why any algorithm should take more than 100ms

The PNG algorithm is inherently flawed, IMO. We could do better. The developers who are stuck with this specific compression algorithm have no choice but to do the best with what they've got. But you can always transform the PNG image into a faster lossless format like QOI and then use that. So it's still not worthwhile to give up and say we can't do better just because PNG is inherently slow.
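A minimal sketch of that transcode step, assuming the third-party Rust crates image and qoi (the crate choice and the encode_to_vec helper are my assumption, not something established in this thread):

    // Decode a PNG once, then re-encode the raw pixels as QOI for faster reads later.
    // Assumes the `image` and `qoi` crates are listed in Cargo.toml.
    use std::fs;

    fn main() -> Result<(), Box<dyn std::error::Error>> {
        let img = image::open("input.png")?.to_rgba8();          // decode PNG to RGBA8
        let (w, h) = (img.width(), img.height());
        let qoi_bytes = qoi::encode_to_vec(img.as_raw(), w, h)?; // pack pixels as QOI
        fs::write("input.qoi", qoi_bytes)?;
        Ok(())
    }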

Edit: I think my original statement almost stands. Processing 100s of GBs of data in under 100ms definitely overestimates our current computing capacity. I would amend it to something like: processing anything under a few GB of data (3-5 GB) is definitely possible in under 100ms.

[0]: https://www.pingdom.com/blog/new-facts-and-figures-about-ima...

[1]: https://qoiformat.org/benchmark/


> The developers who are stuck with this specific compression algorithm have no choice but to do the best with what they've got.

Right. The point I wanted to make is that with the exception of game developers, who have a unique freedom to define their set of requirements and design a vertically integrated system to solve them, every developer faces their own version of this.

So I agree games are a useful reference point to conceptualize what's possible after an extended transition plan to migrate ecosystems to new standards. But I don't agree with the framing of "no excuses". It takes heroic effort and wisdom to break out of ecosystem local maxima. For example, when transcoding images to newer formats has been tried, users hated how other software didn't read them correctly when they tried to save or reshare them. So with the latest attempt (JPEG XL), they're aiming to gather industry-wide support instead of acting unilaterally.

We should cheer orders-of-magnitude performance improvements in the rare cases when a migration can be successfully coordinated, instead of being unduly negative about the normal, expected case when it can't.


The PNG example is a case of a well-established standard limiting what you can do - basically, improvement is only possible by breaking compatibility.

But this is not the case for the vast majority of slow software. Nor is there any equivalent problem there. It really is, as the article describes, a case of devs not caring to put in the time because the users tolerate the status quo.


I'm not sure, but I smell intolerance for craft, and micromanagement. You keep going on with this “optimize for efficiency” and “good enough” rhetoric, but tell me: how are you going to make a bunch of professionals care about your widgets if you keep mortifying their interest in them?

And let’s assume they deliver on your acceptance criteria: do you really think you have enough requirements to keep them busy and interested?

Are you going to fire them? No, because you might need them back!

Are you going to let them do what they’re good at? No because that’d be waste since it wasn’t vetted!

So are you going to let them play pool at the office?


> Could we have it run in 10ms vs 250ms?

If it runs once per hour, then it may be inconsequential; if it runs every 30s, then it might be an extra hour of battery life for your phone, but people will still keep using it and not complain. Is there a problem? Maybe.


Running every 30s is 120 times per hour. Or, 30 seconds of difference per hour between 250ms and 0ms (instantaneous).

To get an hour of difference, my phone would need a battery life of 120 hours. For a more reasonable 10 hours of battery life, I would get about 5 minutes saved.

I can see why no one will complain about that difference.
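Spelled out (my restating of the arithmetic above, with the same simplifying assumption that CPU time maps one-to-one to battery time):

    // The comment's arithmetic: a 250ms task fired every 30s, saved entirely.
    fn main() {
        let runs_per_hour = 3600.0_f64 / 30.0;                   // 120 runs per hour
        let saved_per_run_s = 0.250;                             // 250ms -> ~0ms
        let saved_per_hour_s = runs_per_hour * saved_per_run_s;  // 30 s/hr
        let battery_hours = 10.0;
        let saved_min = saved_per_hour_s * battery_hours / 60.0;
        println!("~{saved_min:.0} minutes saved over {battery_hours} h of battery"); // ~5 min
    }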


You might be right, but there's more to power consumption than CPU time. Something that takes 250ms vs 10ms might wake more cores and/or prevent them from sleeping, might use high-performance cores instead of efficiency cores, might read from storage instead of from cache, etc.

I've certainly seen a 10x improvement in one bottleneck turn into a 20x-30x improvement holistically, due to reductions in contention, back-pressure, etc.; there is almost always a cascading/multiplier effect, in my experience.
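One hedged illustration of where that multiplier can come from, using a textbook M/M/1 queue (my model choice; real systems are messier): mean time in the system is service time divided by (1 - utilization), so cutting service time 10x at high load shrinks end-to-end latency by far more than 10x.

    // M/M/1 sojourn time: S / (1 - rho), where rho = arrival_rate * service_time.
    fn sojourn_time(service_s: f64, arrival_rate_per_s: f64) -> f64 {
        let utilization = arrival_rate_per_s * service_s;
        assert!(utilization < 1.0, "queue would be unstable");
        service_s / (1.0 - utilization)
    }

    fn main() {
        let arrival = 3.8; // requests per second
        let slow = sojourn_time(0.250, arrival); // 250ms service time, ~95% utilization
        let fast = sojourn_time(0.025, arrival); // 25ms after a 10x fix, ~9.5% utilization
        println!("latency: {:.2} s -> {:.3} s ({:.0}x better)", slow, fast, slow / fast); // ~180x
    }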



