Why is it so hard to avoid premature optimizations? (justinappears.com)
26 points by bentlegen on July 21, 2010 | 14 comments



I've always thought there were 2 types of things that could be optimized:

1. Things that need to be "cleaned up".

2. Things that never should have been written in the first place.

Simple example of Type 1: You rush to get something up and running, and in your first code review, you find the exact same code multiple times. So you write a function, parameterize a few variables, tighten it up, and reference it all over the place. Cool.
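
In code, that cleanup might look like this (a quick sketch; the helper and its parameters are invented for illustration):

  # Before: the same formatting logic pasted into several call sites.
  # After: one parameterized helper, referenced everywhere.
  def format_price(amount, currency="USD", decimals=2):
      # hypothetical helper extracted from the duplicated code
      return f"{amount:,.{decimals}f} {currency}"

  print(format_price(1234.5))            # 1,234.50 USD
  print(format_price(99.999, "EUR", 3))  # 99.999 EUR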

Simple example of Type 2: You have an SQL SELECT inside an iteration. At 500 iterations it runs smoothly. At 50,000 iterations, it becomes non-functional. Your only hope to scale this thing is to rethink the whole process to run with one SQL SELECT (and maybe a database redesign) outside the iteration. You basically have to start over. What were you thinking?
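
The pattern and its fix, sketched with sqlite3 (table and column names are made up):

  import sqlite3

  conn = sqlite3.connect(":memory:")
  conn.execute("CREATE TABLE last_trade (symbol TEXT PRIMARY KEY, price REAL)")
  conn.executemany("INSERT INTO last_trade VALUES (?, ?)",
                   [("AAA", 1.0), ("BBB", 2.0), ("CCC", 3.0)])
  symbols = ["AAA", "BBB", "CCC"]

  # Type 2 anti-pattern: one SELECT per iteration (N round trips)
  prices = {}
  for s in symbols:
      row = conn.execute("SELECT price FROM last_trade WHERE symbol = ?",
                         (s,)).fetchone()
      prices[s] = row[0]

  # Rethought: a single SELECT outside the loop
  marks = ",".join("?" * len(symbols))
  prices = dict(conn.execute(
      "SELECT symbol, price FROM last_trade WHERE symbol IN (%s)" % marks,
      symbols))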

You need to trust your "process" that Type 1 things will rise to the surface in due time, thus avoiding premature optimization.

For Type 2 things, there is no such thing as "premature optimization". They need to be designed and written properly in the first place.


Your example 2 is actually not as ridiculous as you state. For example, the following lines are from Market Watch Frame 1 in the TPC-E benchmark.

  open stock_list
  do until(stock_list.end_of_cursor) {
     fetch from
       stock_list cursor
     into
       symbol

     select
       new_price = LT_PRICE
     from
       LAST_TRADE
     where
       LT_S_SYMB = symbol
  }
  close stock_list
We spend a lot of time trying to optimize these types of OLTP transactions and in certain cases you may be better off using this structure than a more esoteric, less optimized construct.

Edit: I guess I should mention that the cursor in there is going to hurt. Just an example -- don't always actually do that and rely on us to speed up your transaction.


As a gradually reforming software performance engineer, I spent a lot of time discovering Type 2 items in enterprise software. I think your background is similar.

What is more amazing is that on a number of projects I worked on as a performance tester, for large enterprise software products, there was no developer on the staff or team who could determine that a Type 2 error had occurred. I think this was mainly the result of developers having a background in only one specific tech stack (I saw lots of low-hanging opportunities for doing set operations in the database rather than pulling data out with J2EE, or even in DB code written in an imperative style... <shudders when thinking of T-SQL in a legacy product>).


It's too easy to wrongly believe you have a good Type 2 optimization. One might be tempted to replace their language's brute-force string searching with Boyer-Moore... after all, it obviously uses fewer comparisons. But brute force is usually fast enough in regular text for non-search intensive applications - it can find the last sentence of Moby Dick in 5ms on my laptop. And most of us aren't searching through Moby Dick each time a webpage is loaded. I hope.
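
That claim is easy to check for yourself (a rough sketch; the file path is a placeholder for any plain-text copy of the book):

  import time

  text = open("moby_dick.txt").read()    # placeholder path, ~1.2 MB
  needle = "only found another orphan."  # tail of the last sentence

  start = time.perf_counter()
  pos = text.find(needle)   # built-in search; even a naive loop
                            # is fast enough at this scale
  ms = (time.perf_counter() - start) * 1000
  print(f"found at offset {pos} in {ms:.2f} ms")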

Optimizations should really only be made at the Type 1 level. I'm all for cleaning up and refactoring code. Your Type 2 example is really a Type 1 problem in disguise, since it's not the simplest way to do it! If you want to make a Type 2 change, measure it and prove to yourself it's worth your time. Otherwise, just solve the problem simply and cleanly and hope for the best. The huge performance hogs are almost never where you expect them to be anyway.


I think whenever the topic of premature optimization comes up, most people forget the whole Knuth/Hoare quote:

> We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil.

Everyone seems to latch onto the second part, but the first part, and the context, are very important. At the time the quote was coined, "premature optimization" meant worrying about things like using "inc eax" instead of "add eax, 1". The key, I think, is the bit about small efficiencies: we shouldn't fret from the outset that "inc eax" is two bytes shorter than "add eax, 1", but cache thrashing should definitely be a concern.
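
A rough illustration of the difference in scale, assuming numpy is available: the row sums below read contiguous memory, while the column sums stride across the whole array and pay for cache misses.

  import time
  import numpy as np

  a = np.zeros((3000, 3000))   # ~72 MB, far larger than any cache

  def by_rows(m):
      return sum(m[i, :].sum() for i in range(m.shape[0]))  # contiguous

  def by_cols(m):
      return sum(m[:, j].sum() for j in range(m.shape[1]))  # strided

  for fn in (by_rows, by_cols):
      t = time.perf_counter()
      fn(a)
      print(fn.__name__, f"{time.perf_counter() - t:.3f}s")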

Optimization is something that should always be in a programmer's mind, especially when programming embedded machines. If you're not at a point where you are optimizing the code you are working with, then you should be writing it in a way that it can be optimized down the road.
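
One habit that helps (a sketch, not a prescription): keep the likely hot spot behind a narrow interface, so the clear version can be swapped for a fast one later without touching callers.

  # Clear reference version; callers depend only on the signature.
  def dot(weights, values):
      return sum(w * v for w, v in zip(weights, values))

  # Later, if a profiler says this is hot, only the body changes, e.g.:
  # import numpy as np
  # def dot(weights, values):
  #     return float(np.dot(weights, values))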

I would think that anyone who uses "premature optimization is the root of all evil" as a retort is fair game for being challenged on how they would actually optimize their code when they have the chance :)

(disclaimer: I write code for video games; my world view may not be the same as yours, though I still believe this applies globally.)


Avoiding premature optimization involves looking at the whole situation, instead of the optimization.

* Working at a strategic level is often nebulous and requires some definition.

* Little changes are simple, concrete, and instantly rewarding (if only I could do X, this would work better).


Premature optimization can be a form of procrastination.

If you're looking for a way to avoid something, PO is an easy out. It even looks like you're doing something.


Sadly, I've done this even knowingly. It's like mentally pacing from one side of the room to the other: you're aware you're doing something, but it's not necessarily getting you anywhere.


Completely agree.

To make matters worse, the optimization usually results in making things at least slightly more complex. When you actually do get around to doing what really needs to be done, you'll often find that the job is now at least slightly more difficult.


In workplaces, it tends to happen because optimization is something the developers control, rather than something other people control (e.g. marketing, sales, support, ...). Control makes some people happy.

In my personal projects, it happens because I delude myself into thinking that the optimization is the "next big thing", and so I'm blinded from seeing it as a premature optimization. Of course, over the years I have tempered myself: when I think I've discovered the "next big thing", I try to be objective and think of it in terms of investment and cost; sometimes it works.


Various forces drive a programmer to do various things, and one that rises to the top is motivation. When you write a lot of code, you want a reward: to make something run, so that programming doesn't feel like an endless treadmill. Mix that with "do it right the first time", and with the fact that a coder is often chasing a moving target (his understanding of what is best for the project differs from what was planned at first), and early optimization becomes incredibly difficult to get right.

If you have a self-contained piece of software like TeX, which Donald Knuth wrote knowing what he would get in the end, then you can be quite methodical. But when you are shipping an end-user product composed of various systems that are moving targets, somewhat incomplete and sometimes missing certain things, the dance becomes a shuffle: you keep things on an even keel while keeping the general plan of action in sight.

If you have ever poured a coarse material, like grains of salt or sand, and shaken the jar so the vibration compresses the grains, just to make room for extra stuff: well, development out there in the wild is like that. We coders always keep Knuth's axiom on early optimization in mind. But we are human too, and we'd like some love from the object of our undivided attention.


Because it takes hard work and discipline to objectively determine where to optimize or whether an optimization is necessary - let alone to determine which situations need to be optimized for in the first place.[1]

And, frankly, quite a bit of premature optimization takes less time than doing that work.

[1] Optimizing for readability has definite pay-off over the life of a project, but good luck quantifying it ahead of time.


When you have written lots of code, you remember some of the things you had to optimize in the past, and when you find yourself in a similar situation you're tempted to just do it now so that you don't have to do it later.

The fact that sometimes you're right just reinforces this behavior, but when you get it wrong it tends to blow up in your face.


I do believe you should make sure the algorithms aren't crud. Also, any project where there is a separate DB team kinda requires you to get your tables right very early, because of the "cost of change" in most organizations.
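
The most common crud is accidentally quadratic code, and that kind of check is cheap to do early (a sketch with invented names):

  # O(n^2): membership test against a list inside a loop
  def dedupe_slow(items):
      seen, out = [], []
      for x in items:
          if x not in seen:        # linear scan on every iteration
              seen.append(x)
              out.append(x)
      return out

  # O(n): identical logic, but 'seen' is a set
  def dedupe_fast(items):
      seen, out = set(), []
      for x in items:
          if x not in seen:        # constant-time lookup
              seen.add(x)
              out.append(x)
      return out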



