Hacker News new | past | comments | ask | show | jobs | submit login

Last time I saw the EAV pattern used was in Magento. It was an absolute nightmare to query against.



I loved using Datomic, but I got too used to its features. I'm now back in a PostgreSQL shop and it feels like we're back in 2002. So many times have folks asked for features that would have been available for free in Datomic or another datastore that are just way too complex to maintain on top of a traditional rdbms.


Could you elaborate some of those features?


There always seems to be some demand to see how records change over time. If you plan ahead, know which tables the business wants historic values for, then you can build in that ability from the start. Unfortunately, these demands always arise at a later point in time or as one-off uses, and are hard to justify the effort to ever implement. With datomic, looking across history is free, fast, and easy.

Additionally, the ability to pin a database at a specific point in time is useful for APIs. For one, on pagination so that you don't see pages change out underneath as records are updated. Another is in APIs that join configuration values with time-series data or aggregations. If our data decisions drive revenue, for example, looking at 2018-08-31 revenues with today's business data has a lot less insightful value than being able to see them with other data pinned at that same date. Pointing the queried DB to a fixed timestamp in Datomic is free.


Interesting, I was fairly confident you were going to write about Datalog and about the flexibility of having such a minimal and evolvable schema :)

One of the motivations for Crux was that `as-of` queries still weren't powerful enough for what the business really needed, which was something like `as-of` that could cope with retroactive corrections for out-of-order and delayed ingestion (of upstream transaction data).


PostgreSQL is behind on this particular SQL feature, but MariaDB has it: https://mariadb.com/kb/en/library/temporal-data-tables/


That’s a very cool addition! I wonder if this can be used in an event sourcing architecture with minimal overhead.


I think to be really useful an event sourcing data store needs `as at` support in addition to `as of` (i.e. bitemporality) to cope with retroactive and proactive events and corrections. There are a couple of references for this in the Crux docs: https://juxt.pro/crux/docs/bitemp.html#_known_uses


I agree that the experience of querying EAV databases using SQL can feel cumbersome [0]. But Datomic's dialect of Datalog [1] is a great fit for EAV and can produce much more readable and simpler queries.

[0] https://gist.github.com/grafikchaos/1b305a4e0b86c0a356de

[1] https://docs.datomic.com/on-prem/query.html


EAV is not for everyone. You have to have a data set for which each entity can have values defined for some of, but not all of a large set of attributes. It’s a bit like sparse matrices, actually.

And, yes, you want your tools and application to abstract away some of the messiness involved in something as simple as “load/save this object from the database,” otherwise you’ll end up with a much larger amount of code for such a simple operation. I’ve seen it done in a Django app, and it’s not bad to work with once those abstractions are in place.


Why not just use a json store for those sparse attributes?

EAV performance at large scale is really dreadful. So many queries.


EAV with a query engine and syntax not designed for it is awful. EAV (and AEV and AVE and VAE) with datalog queries is performant, a joy to use, and way more flexible than json stores.


I don't think Magento used it right.

I used EAV in an application and it was fine (I used Datascript on the client-side, with server-side data being stored in a JSON document database, RethinkDB). I did run into some annoying issues with the lack of nil value handling. I'd say writing queries was difficult, but not significantly more difficult than in any other language, if you care about performance and want to know what the query actually does.

Overall, I felt there was a good mapping between my domain model and EAV.

I eventually dropped this solution in favor of Clojure data structures: if you have an in-memory (in-browser) database anyway, why keep the data in Datascript, if you can simply keep it as ClojureScript data structures?


As I mentioned here: https://news.ycombinator.com/item?id=20308175 , I think Magento did use it right, but EAV is impractical on a relational database not designed for EAV. This fact, however, doesn't show itself until you've reached a certain (arguably low by today's standards) scale.


I also started with Datascript both on frontend and backend, but ended up using instead its core data structure - persistent-sorted-set [1], which works really well if all you need is indexing.

[1] https://github.com/tonsky/persistent-sorted-set


I don't know for sure, but I suspect there is a significant usability difference between an "EAV pattern" and something engineered from the ground up to make EAV practical.


For those of us who have never used such a thing, what didn't you like about it?


The EAV model is great in theory and is probably great in practice _if_ the backing software has been _designed_ with EAV in mind.

In the context of Magento, it is a real _nightmare_ and is one of the major contributor of the slowness of the Magento platform (at least for magento < 2.0).

The reason for this slowness is that in a relational database, the EAV model makes it so that e-v-e-r-y s-i-n-g-l-e SQL query is one gigantic query made of tons of JOINs.

To give you an example, querying a product in Magento may need to join no less than 11 tables!

catalog_product_entity, catalog_product_entity_datetime, catalog_product_entity_decimal, catalog_product_entity_int, catalog_product_entity_gallery, catalog_product_entity_group_price, catalog_product_entity_media_gallery, catalog_product_entity_text, etc, etc.

In order to fix this issue, the Magento team created what they call "flat tables" which are tables that are created by querying the database with an EAV query (i.e. the query with a million joins) and putting the results in a table with as many columns as there is attributes being returned by the original query.

In theory choosing to use EAV was an amazing idea. In practice, this idea did not scale for large Magento stores and it has made Magento hugely complex, slow and hard to use.

We use Magento at betabrand.com and I can confidently say 90% of the slowness of our website is due to Magento's EAV tables and we have spent a humongous number of engineering hours optimizing this.


Are you on Magento 1 or 2? We've been building a site in Magento 2 and while the number of joins, hassle with re-indexing performance, and lack of null vs zero vs empty string handling in the flat tables are all very annoying, the actual site is quite fast with all the caching turned on on a server with enough gerbils behind it.

My main complaint with Magento 2 is with the feature that is it's biggest selling point: its flexibility. The fact that any public method on any class is wrappable/replaceable, and that any class in the system can be replaced wholesale, and any javascript or any template in the system can be wrapped or replaced, from anywhere, by any module, at a distance, just makes the whole thing a huge cluster to deal with once you get any number of 3rd party modules.




Consider applying for YC's Spring batch! Applications are open till Feb 11.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: