Marriage and Object Databases (response to the similarly-named article from earlier today)

giardini · on Nov 22, 2008

From the article:

"Polymorphism is where OO excels and where the relational model fails..."

But nothing can be done with OO structure that cannot be just as easily modeled in a relational form. One only need think in Prolog to find the appropriate model. And that's the beauty of relational databases: all constraints are in the form of relations (i.e., not some as relational constraints and others as inheritance constraints).

Various bugs (there are probably more) in the implementation: The article's database allows two entries of the form: (marriage_id1, Tom, Mary) and (marriage_id2, Mary, Tom). to coexist in the database,

The article's database also allows such transitivities as (marriage_id1, Tom, Mary), (marriage_id2, Mary, Bill), (marriage_id3, Bill, Tom). Relational or OOP, a constraint to eliminate these cases requires a search of the database. It is insufficient to require only that one cannot marry oneself.

The article's database also fails to handle the not uncommon case where two persons marry, divorce and then remarry (often consecutive times).

About performance: There's never been a question about performance: object databases can perform faster than relational databases, usually by an order of magnitude. Most OO database use primarily pointers while relational databases use primarily indexes. Computers are fast enough today that in most instances we don't notice the difference. But where speed is absolutely necessary OOP databases (and similarly structured databases such as CODASYL, ISAM, etc.) are used.

Some reasons relational databases are preferred: - ease of maintenance, - the relational model allows a standardized query language, - constraints are handled within the database engine proper.

jrockway · on Nov 23, 2008

Thanks for your comments.

The article's database allows two entries of the form: (marriage_id1, Tom, Mary) and (marriage_id2, Mary, Tom). to coexist in the database,

This mostly stems from using a set based on eq. If it used a custom equality function, this wouldn't problem. (The code implementing this involves a long trip into the weeds, and I wrote the article for people that have never heard of object databases... and for people that aren't necessarily comfortable with CL.)

The article's database also fails to handle the not uncommon case where two persons marry, divorce and then remarry (often consecutive times).

I would just add a fresh marriage object to the set. The reason we "change-class" on the marriage when divorcing is because a divorce cannot exist without a marriage. A remarriage is no different from a fresh marriage... in my mind anyway.

If you want to query on remarriages, though, then you should probably make a type for this. A quick "defclass" and you are set.

Some reasons relational databases are preferred: - ease of maintenance

Not sure there is a big difference here. There is some care and feeding that BDB requires, but nothing extreme. (You can also use a SQL-based database as the backend for Elephant.)

the relational model allows a standardized query language

There is no reason you couldn't declare classes so that information necessary for querying is present, and then add a query langauge on top of that. AllegroCache does exactly this.

constraints are handled within the database engine proper

I'm not sure why you see a difference between (network + relational database engine) and (rpc frontend + object database). I mean sure, you can touch the files on disk and ignore the RPC interface, but I can rm -rf /var/lib/mysql too. I don't see a difference in safety, other than the many years of production testing that the SQL databases have.

This is an implementation detail, though, and not a fundamental difference.

neilc · on Nov 22, 2008

I don't know what would be going through the mind of someone that thinks that Enums should be modeled by using separate tables (i.e. one table for males, one for females; modeling the enum Gender = Male | Female), so I am not even going to talk about those. I am going to jump right to a model that won't make people cry.

I didn't see anywhere in the original article that models "gender = male | female" as two additional tables. There is a schema variant that uses a single additional table "genders", with foreign keys pointing into that table from the "humans" table. That seems like quite a plausible choice for modeling an enum in a normalized fashion, and it provides a clean way to handle when the set of enum choices changes (that is, new choices can be added trivially, and data integrity is automatically checked when options are removed). It's not the most efficient or concise representation, of course, but those are not the only criteria when choosing a data model.

whatusername · on Nov 23, 2008

he was talking about the start of the OP: There was a "Male" table and a "Female".

It was options 1 through to 6 I believe.

Hexstream · on Nov 22, 2008

I really don't get why he declares that a &rest variable is of type list... Of course it is.

jrockway · on Nov 23, 2008

Indeed, but delete-duplicates wants to know that it's a list and not a vector.

Hexstream · on Nov 24, 2008

But the compiler already knows that &rest variables are lists...

jrockway · on Nov 24, 2008

SBCL doesn't. Try it and see.

bhiggins · on Nov 22, 2008

you can model type inheritance relationally. it's completely orthogonal.