A schema is a way of designing data structures such that they are efficiently or...

HeroOfAges · on Feb 14, 2020

I agree, and I've come to the conclusion that you should avoid designing a database schema until you have some clear understanding about how the application you're persisting data for will be used. The data structures and schemas will practically jump out at you. It seems, at this point in my career, a very obvious thing, and yet I cannot communicate this point of view to my colleagues in a way that will effect change.

ak39 · on Feb 14, 2020

>I agree, and I've come to the conclusion that you should avoid designing a database schema until you have some clear understanding about how the application you're persisting data for will be used.

I don't buy this argument, with due respect. (I see a lot of this thinking these days, and I'm genuinely worried we are about to swing the pendulum of software design back into the dark ages, where focus is taken away from the data model in favour of the convenience of process design, front-end design and data access frameworks.)

In my experience, the objects from the problem domain will always have certain immutable relationships with each other right from the beginning. You have to look for them first, before you do anything else. They are clearly identifiable. What to persist and what not to persist is also discernible from existing manual business processes and the documents operational staff use daily. Many of these are legal documents like tax invoices, contract confirmations, mandates and policy agreements. (A great way to start is to look at what reports users might need. Yes, those boring reports! Start there.) Then, cardinalities and ordinalities will be known the moment you understand the objects' relationships too (can a client with product A also have product B? etc.). Identifiers, business keys and other coding formats are also determined very early on in the project.

All of this before you even choose a front-end framework or the people's favourite ORM library.
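To make the cardinality and business-key point concrete, here is a minimal sketch using Python's built-in `sqlite3`. The table and column names (`client`, `product`, `client_product`, `client_code`, `product_code`) are made up for illustration, and it assumes the answer to "can a client with product A also have product B?" is yes, i.e. a many-to-many relationship.

```python
import sqlite3


def build_schema() -> sqlite3.Connection:
    """Create a hypothetical client/product model in an in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
    conn.executescript("""
        CREATE TABLE client (
            client_id   INTEGER PRIMARY KEY,
            client_code TEXT NOT NULL UNIQUE   -- business key (assumed format)
        );
        CREATE TABLE product (
            product_id   INTEGER PRIMARY KEY,
            product_code TEXT NOT NULL UNIQUE  -- business key (assumed format)
        );
        -- Junction table: one row per (client, product) pair, so a client
        -- may hold many products, but never the same product twice.
        CREATE TABLE client_product (
            client_id  INTEGER NOT NULL REFERENCES client(client_id),
            product_id INTEGER NOT NULL REFERENCES product(product_id),
            PRIMARY KEY (client_id, product_id)
        );
    """)
    return conn


def assign(conn: sqlite3.Connection, client_code: str, product_code: str) -> None:
    """Link a client to a product, looking both up by business key."""
    conn.execute(
        "INSERT INTO client_product (client_id, product_id) "
        "SELECT c.client_id, p.product_id FROM client c, product p "
        "WHERE c.client_code = ? AND p.product_code = ?",
        (client_code, product_code),
    )


conn = build_schema()
conn.execute("INSERT INTO client (client_code) VALUES ('C-001')")
conn.executemany("INSERT INTO product (product_code) VALUES (?)", [("A",), ("B",)])
assign(conn, "C-001", "A")
assign(conn, "C-001", "B")  # allowed: the relationship is many-to-many

products = [
    row[0]
    for row in conn.execute(
        "SELECT p.product_code FROM client_product cp "
        "JOIN product p USING (product_id) "
        "JOIN client c USING (client_id) "
        "WHERE c.client_code = 'C-001' ORDER BY p.product_code"
    )
]
print(products)
```

If the business rule were instead "one product per client", the junction table would collapse into a single foreign key column on `client`, which is exactly the kind of decision that falls out of the analysis rather than the framework.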

I start with the data model. I use a good ER modeling tool to avoid frustrating myself when large changes are needed for my model. The models, when sufficiently complete, are printed in large format and pasted on walls for all to see daily. That is then considered the holiest of holies.

(I understand that data models that need changes other than adding new columns or new sub entities can be painful to refactor ... but the fear of this eventuality should not automatically translate into turning the development process on its head.)

Just my 2c.

DanielBMarkham · on Feb 14, 2020

Analysis is great and absolutely necessary.

Too often we confuse analysis with design, then the heartaches start. Some problem domain things are immutable, some are not, some change over a fixed range, some are firm but expected to change, etc. All that stuff is critical to know as part of analysis.

I'd argue that you can do analysis incrementally right along with everything else. Reports sound like a great starting point. I agree with your advice, even the part about the pendulum swinging too far the other way. The problem is that, honestly, we suck no matter which way the pendulum swings. We continue to confuse the process with the goals.

ak39 · on Feb 14, 2020

>All that stuff is critical to know as part of analysis.

Exactly. I wonder if the rush to get started as soon as possible ultimately results in data models that aren't ready yet and are therefore brittle (IOW: insufficient analysis and logical testing of the data models before work on the UI and other process logic commences).

AlphaSite · on Feb 14, 2020

I think the fear of defining the models too early is that you won’t see what’s not needed; this approach probably tends toward adding unnecessary models, indices, etc.
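One rough way to spot an index that was added up front and never earned its keep is to ask the database which indices the queries you actually run would use. The sketch below does this with SQLite's `EXPLAIN QUERY PLAN` via Python's `sqlite3`; the `users` table, its indices, and the query list are all hypothetical, and matching index names against the plan text is a crude heuristic rather than a proper audit.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (
        id          INTEGER PRIMARY KEY,
        email       TEXT,
        name        TEXT,
        signup_date TEXT
    );
    CREATE INDEX idx_users_email  ON users(email);
    CREATE INDEX idx_users_signup ON users(signup_date);  -- added "just in case"
""")

# The queries the (hypothetical) application really issues.
app_queries = [
    "SELECT * FROM users WHERE email = 'a@example.com'",
    "SELECT * FROM users WHERE id = 1",
]

all_indexes = {
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'index' AND tbl_name = 'users'"
    )
}

used_indexes = set()
for sql in app_queries:
    for row in conn.execute("EXPLAIN QUERY PLAN " + sql):
        detail = row[-1]  # the human-readable plan step is the last column
        used_indexes.update(name for name in all_indexes if name in detail)

unused_indexes = all_indexes - used_indexes
print("never used by the listed queries:", sorted(unused_indexes))
```

An index that no real query touches still costs something on every write, which is the sort of thing that is hard to see before the access patterns exist.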

nabeards · on Feb 14, 2020

For my last few projects, I’ve started by building with a document-based database, knowing a rewrite will come within 6-12 months. On the rewrite, I can design the new relational DB in 5NF very quickly.
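As a rough illustration of what that rewrite step looks like, here is a sketch in plain Python that splits nested documents into flat rows for separate tables. The document shape (`_id`, `customer`, `items`) and the target tables are invented for the example, and this only shows the basic decomposition, not a full 5NF design.

```python
from typing import Any

Row = dict[str, Any]


def normalize(docs: list[dict[str, Any]]) -> dict[str, list[Row]]:
    """Split nested order documents into customer, order and order_line rows."""
    customers: dict[str, Row] = {}  # keyed by email so each customer is stored once
    orders: list[Row] = []
    order_lines: list[Row] = []

    for doc in docs:
        cust = doc["customer"]
        # The embedded customer is repeated in every document; keep one copy.
        customers.setdefault(
            cust["email"], {"email": cust["email"], "name": cust["name"]}
        )
        orders.append({"order_id": doc["_id"], "customer_email": cust["email"]})
        # The embedded array becomes child rows keyed by (order_id, line_no).
        for line_no, item in enumerate(doc["items"], start=1):
            order_lines.append(
                {
                    "order_id": doc["_id"],
                    "line_no": line_no,
                    "sku": item["sku"],
                    "qty": item["qty"],
                }
            )

    return {
        "customer": list(customers.values()),
        "order": orders,
        "order_line": order_lines,
    }


# Hypothetical documents as they might sit in the document store.
docs = [
    {
        "_id": "o1",
        "customer": {"email": "ann@example.com", "name": "Ann"},
        "items": [{"sku": "X", "qty": 2}, {"sku": "Y", "qty": 1}],
    },
    {
        "_id": "o2",
        "customer": {"email": "ann@example.com", "name": "Ann"},
        "items": [{"sku": "X", "qty": 5}],
    },
]

tables = normalize(docs)
print({name: len(rows) for name, rows in tables.items()})
```

After months of real use, the document shapes tell you which nested objects repeat and which arrays grow, so the table boundaries are mostly already decided.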

pc86 · on Feb 14, 2020

> I would design the schema for a three-microservice widely-distributed high-performance application far differently than I would a local checkbook app for my brother.

I think it might be helpful to expand on what some of those differences are. I think most of us here would be able to come up with at least a satisfactory schema for a local checkbook app. What are the changes you need to make to that type of schema to make it appropriate for a multi-service app, specifically? What about distributed? High-performance? Are any of these changes in conflict with each other? Etc.

rguzman · on Feb 14, 2020

i agree with everything you said, except for the part of your being unable to help. i think OP is asking for what information informs schema decisions, and what are some heuristics to use. expanding on the differences between your two examples would be very valuable!

DanielBMarkham · on Feb 14, 2020

I get what you're saying, that by moving to the meta level we can talk about heuristics and patterns of development.

Unfortunately, and I apologize for sounding difficult, this is still far too broad to gain traction on.

I think the thing to remember when you're learning various architectures, from database schemas to build pipelines, is that many times the people teaching you are teaching you from a position of having a completed project and then looking back on the lessons learned and applying some heuristic-making to them.

For instance, if you look at database normalization, which I started with when I started coding, it makes total sense for a small-ish project. Back in the day, you controlled the app, the machine, the storage, and the code. You owned it all. So changes to the schema involved a finite and easyish-to-do set of practices.

This started falling apart really quickly, though, with folks talking about impedance mismatch just a few years after relational databases went mainstream. If I had to generalize, everything got more and more complicated and the assumption that you could grab the entire application in your head easily and change it was no longer true. Then came a ton of CASE tools, now ORMs, and so forth, all in an effort to get us back to easily owning and changing data schemas.

But the real problem was there all along; we just didn't realize it: thinking you knew everything and could manage it. This idea worked great in classrooms, and worked great in personal and small apps. It even worked great in larger apps with tight control. But at the level of complexity we have now, it just doesn't seem realistic to be teaching people the perfect way to do things. (Of course, they should be aware of it!) Instead, what's needed is how to gradually get from here to there in a complex world without getting lost. So the heuristics you'd get would be for perfect-world completed apps, and what you really need to know is, well, how to develop software. That kind of advice ain't happening in an HN thread.

Hope that made some sense. Ping me offline if I can help explain any more.