What is the ratio of retrieving (reporting) to updating? You may just want to ask that question before you decide how far to normalize your data. Got 13 instances of the same thing? EVERY ONE of them must be updated simultaneously with EVERY update (including all the necessary locking, committing, etc.).
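To make that trade-off concrete, here's a rough sketch (table and column names are made up for illustration): when a value is copied onto many rows, changing it means touching every copy inside one transaction, while the normalized layout touches a single row.

    -- Denormalized: the owner's name is copied onto every photo row,
    -- so a rename has to hit all the copies in one transaction.
    BEGIN;
    UPDATE photos SET owner_name = 'New Name' WHERE owner_id = 42;
    COMMIT;

    -- Normalized: the name lives in one place, so one row changes.
    UPDATE users SET name = 'New Name' WHERE id = 42;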
Calling people sissies is what sissies do. (Oops, does that make me a sissy?)
I agree. I think the best strategy is to start with a normalized database, and only denormalize when you measure a problem that can't be fixed by adding an index.
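For instance, if a hypothetical report is slow because it filters on an unindexed column, this is the cheap fix to try before duplicating any data:

    -- Made-up example: a slow report filtering orders by customer
    -- usually just needs an index, not a denormalized copy.
    CREATE INDEX idx_orders_customer_id ON orders (customer_id);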
I'm working on a project where we're in the process of storing de-normalized links for a recursive table relationship, so we can query from the top-level table directly to the bottom-level table without worrying about the messy in between. It's one of the few non-speed reasons I've ever come across where denormalization seems justified.
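Roughly speaking (these names are invented for illustration, not our actual schema), it amounts to keeping the normal parent_id hierarchy and adding a precomputed link table from the top-level rows straight to the bottom-level ones:

    -- Hypothetical hierarchy: each node points at its parent.
    CREATE TABLE nodes (
        id        INTEGER PRIMARY KEY,
        parent_id INTEGER REFERENCES nodes (id),
        name      TEXT NOT NULL
    );

    -- De-normalized shortcut: precomputed (top-level, bottom-level) pairs.
    CREATE TABLE node_links (
        top_id  INTEGER NOT NULL REFERENCES nodes (id),
        leaf_id INTEGER NOT NULL REFERENCES nodes (id),
        PRIMARY KEY (top_id, leaf_id)
    );

    -- Query the bottom level directly; no walking the messy in between.
    SELECT n.*
    FROM node_links l
    JOIN nodes n ON n.id = l.leaf_id
    WHERE l.top_id = 1;

The catch, of course, is that node_links has to be kept in sync whenever the hierarchy changes.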
Thanks. I just overnighted it from Amazon. My coding partner wants to use recursive database queries to do this--I want to avoid it because it's slower, breaks our object model, and makes the application non-database-agnostic. (Currently, we're using NHibernate and can run on most major databases.)
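For comparison, the recursive-query route being argued against would look roughly like this on engines that support recursive common table expressions (reusing the hypothetical nodes table sketched above); syntax and support vary by database, which is part of the portability worry:

    -- Recursive CTE: walk from a top-level node down through its descendants.
    WITH RECURSIVE subtree AS (
        SELECT id, parent_id, name FROM nodes WHERE id = 1
        UNION ALL
        SELECT n.id, n.parent_id, n.name
        FROM nodes n
        JOIN subtree s ON n.parent_id = s.id
    )
    SELECT * FROM subtree;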
On the other hand, just de-normalizing ahead of time can really save you on code complexity, and it's usually the highest-performance setup.
So I would say de-normalize ahead of time, and if you find you have performance problems, then normalize it. That's the most probable way of avoiding premature optimization. :)
Well, the smart thing would be to store the data in normalized tables like you're supposed to, and then denormalize it with a fast-refresh materialized view (or one that is refreshed on a schedule, depending on how much staleness can be tolerated), or add the extra speed with a caching layer.
If you're hitting Flickr-level traffic, then you have some tough problems to solve, but I don't think starting your app with non-normalized data would be very smart!
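In PostgreSQL terms (a rough sketch with made-up table names; Oracle's fast-refresh-on-commit variant is the fancier version of the same idea), that means keeping the normalized tables as the source of truth and letting the database maintain the denormalized copy:

    -- The base tables (orders, customers) stay normalized.
    CREATE MATERIALIZED VIEW order_report AS
    SELECT o.id, o.total, c.name AS customer_name
    FROM orders o
    JOIN customers c ON c.id = o.customer_id;

    -- Re-run on whatever schedule the staleness tolerance allows, e.g. from cron.
    REFRESH MATERIALIZED VIEW order_report;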
"the scalability problems being described in your post, and in his presentation, could easily be rectified through the use of indexes and stored procedures."
Not all scalability problems can be solved by throwing indexes and stored procedures around. An improperly designed schema, normalized just for the sake of normalizing everything, can cost a lot in terms of performance. On the other hand, keeping 13 copies of the same data around sounds scary.
I remember working for a small company with a moderately successful web app for the real estate industry. I had to make a modification to one of the data models, so I downloaded a trial of an expensive ER app to reverse engineer the SQL Server DB. The tool revealed zero foreign keys - I thought it was an error until I confronted the on-his-way-out developer who had written the app. He said he got tired of having to change the data model constantly, so he had decided to wipe out all the FK constraints. In his mind, the constraints were in the code.
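The point of keeping them in the database, of course, is that a declared constraint like the one below (hypothetical tables, loosely in the spirit of that real estate schema) rejects orphan rows no matter which application code path does the insert:

    -- Hypothetical: every listing must reference a real agent.
    ALTER TABLE listings
        ADD CONSTRAINT fk_listings_agent
        FOREIGN KEY (agent_id) REFERENCES agents (id);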
For one, while we're on this denormalization topic, PostgreSQL refuses to use indexes sanely when I run queries against views. Heck, I don't see why PostgreSQL even has views if they can't really be used.
Yes, I am complaining about the performance of current implementations of relational databases, not how good they "theoretically" could be. Still, after this many years, it's apparent that they aren't going to be getting better any time soon!
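For what it's worth, the usual way to pin down that kind of complaint about views is to compare query plans (made-up names below, assuming an index on the filtered column):

    -- Hypothetical setup: a view over an indexed base table.
    CREATE INDEX idx_orders_customer ON orders (customer_id);

    CREATE VIEW recent_orders AS
        SELECT * FROM orders
        WHERE created_at > now() - interval '30 days';

    -- If the planner pushes the predicate through the view, both plans
    -- should show the same index usage on orders.
    EXPLAIN SELECT * FROM recent_orders WHERE customer_id = 42;
    EXPLAIN SELECT * FROM orders
    WHERE customer_id = 42
      AND created_at > now() - interval '30 days';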