What is the ratio of retrieving (reporting) to updating? You may just want to ask that question before you decide how far to normalize your data. Got 13 instances of the same thing? EVERY ONE of them must be updated simultaneously with EVERY update (including all the necessary locking, committing, etc.).
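To make that trade-off concrete, here's a rough sketch (table and column names are made up for illustration): when a value is copied onto many rows, changing it means touching every copy inside one transaction, while the normalized layout touches a single row.

    -- Denormalized: the owner's name is copied onto every photo row,
    -- so a rename has to hit all the copies in one transaction.
    BEGIN;
    UPDATE photos SET owner_name = 'New Name' WHERE owner_id = 42;
    COMMIT;

    -- Normalized: the name lives in one place, so one row changes.
    UPDATE users SET name = 'New Name' WHERE id = 42;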
Calling people sissies is what sissies do. (Oops, does that make me a sissy?)
I agree. I think the best strategy is to start with a normalized database, and only denormalize when you measure a problem that can't be fixed by adding an index.
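For instance, if a hypothetical report is slow because it filters on an unindexed column, this is the cheap fix to try before duplicating any data:

    -- Made-up example: a slow report filtering orders by customer
    -- usually just needs an index, not a denormalized copy.
    CREATE INDEX idx_orders_customer_id ON orders (customer_id);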
I'm working on a project where we're in the process of storing de-normalized links for a recursive table relationship, so we can query from the top-level table directly to the bottom-level table without worrying about the messy in between. It's one of the few non-speed reasons I've ever come across where denormalization seems justified.
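Roughly speaking (these names are invented for illustration, not our actual schema), it amounts to keeping the normal parent_id hierarchy and adding a precomputed link table from the top-level rows straight to the bottom-level ones:

    -- Hypothetical hierarchy: each node points at its parent.
    CREATE TABLE nodes (
        id        INTEGER PRIMARY KEY,
        parent_id INTEGER REFERENCES nodes (id),
        name      TEXT NOT NULL
    );

    -- De-normalized shortcut: precomputed (top-level, bottom-level) pairs.
    CREATE TABLE node_links (
        top_id  INTEGER NOT NULL REFERENCES nodes (id),
        leaf_id INTEGER NOT NULL REFERENCES nodes (id),
        PRIMARY KEY (top_id, leaf_id)
    );

    -- Query the bottom level directly; no walking the messy in between.
    SELECT n.*
    FROM node_links l
    JOIN nodes n ON n.id = l.leaf_id
    WHERE l.top_id = 1;

The catch, of course, is that node_links has to be kept in sync whenever the hierarchy changes.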
Thanks. I just overnighted it from Amazon. My coding partner wants to use recursive database queries to do this--I want to avoid it because it's slower, breaks our object model, and makes the application non-database-agnostic. (Currently, we're using NHibernate and can run on most major databases.)
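For comparison, the recursive-query route being argued against would look roughly like this on engines that support recursive common table expressions (reusing the hypothetical nodes table sketched above); syntax and support vary by database, which is part of the portability worry:

    -- Recursive CTE: walk from a top-level node down through its descendants.
    WITH RECURSIVE subtree AS (
        SELECT id, parent_id, name FROM nodes WHERE id = 1
        UNION ALL
        SELECT n.id, n.parent_id, n.name
        FROM nodes n
        JOIN subtree s ON n.parent_id = s.id
    )
    SELECT * FROM subtree;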
On the other hand, just de-normalizing ahead of time can really save you on code complexity, and it's usually the highest-performance setup.
So I would say de-normalize ahead of time, and if you find you have performance problems, then normalize it. That's the most probable way of avoiding premature optimization. :)
Well, the smart thing would be to store the data in normalized tables like you're supposed to, and then denormalize it with a fast-refresh materialized view (or one that is refreshed on a schedule, depending on how much staleness can be tolerated), or add the extra speed with a caching layer.
If you're hitting Flickr-level traffic, then you have some tough problems to solve, but I don't think starting your app with non-normalized data would be very smart!
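In PostgreSQL terms (a rough sketch with made-up table names; Oracle's fast-refresh-on-commit variant is the fancier version of the same idea), that means keeping the normalized tables as the source of truth and letting the database maintain the denormalized copy:

    -- The base tables (orders, customers) stay normalized.
    CREATE MATERIALIZED VIEW order_report AS
    SELECT o.id, o.total, c.name AS customer_name
    FROM orders o
    JOIN customers c ON c.id = o.customer_id;

    -- Re-run on whatever schedule the staleness tolerance allows, e.g. from cron.
    REFRESH MATERIALIZED VIEW order_report;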
"the scalability problems being described in your post, and in his presentation, could easily be rectified through the use of indexes and stored procedures."
Not all scalability problems can be solved by throwing indexes and stored procedures around. An improperly designed schema, normalized just for the sake of normalizing everything, can cost a lot in terms of performance. On the other hand, keeping 13 copies of the same data around sounds scary.
I remember working for a small company with a moderately successful web app for the real estate industry. I had to make a modification to one of the data models, so I downloaded a trial of an expensive ER app to reverse engineer the SQL Server DB. The tool revealed zero foreign keys - I thought it was an error until I confronted the on-his-way-out developer who had written the app. He said he got tired of having to change the data model constantly, so he had decided to wipe out all the FK constraints. In his mind, the constraints were in the code.
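The point of keeping them in the database, of course, is that a declared constraint like the one below (hypothetical tables, loosely in the spirit of that real estate schema) rejects orphan rows no matter which application code path does the insert:

    -- Hypothetical: every listing must reference a real agent.
    ALTER TABLE listings
        ADD CONSTRAINT fk_listings_agent
        FOREIGN KEY (agent_id) REFERENCES agents (id);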
For one, while we're on this denormalization topic, PostgreSQL refuses to use indexes sanely when I run queries against views. Heck, I don't see why PostgreSQL even has views if they can't really be used.
Yes, I am complaining about the performance of current implementations of relational databases, not how good they "theoretically" could be. Still, after this many years, it's apparent that they aren't going to be getting better any time soon!
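For what it's worth, the usual way to pin down that kind of complaint about views is to compare query plans (made-up names below, assuming an index on the filtered column):

    -- Hypothetical setup: a view over an indexed base table.
    CREATE INDEX idx_orders_customer ON orders (customer_id);

    CREATE VIEW recent_orders AS
        SELECT * FROM orders
        WHERE created_at > now() - interval '30 days';

    -- If the planner pushes the predicate through the view, both plans
    -- should show the same index usage on orders.
    EXPLAIN SELECT * FROM recent_orders WHERE customer_id = 42;
    EXPLAIN SELECT * FROM orders
    WHERE customer_id = 42
      AND created_at > now() - interval '30 days';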