What I meant by that is that the joins happen explicitly in code (as opposed to implicitly in queries). I got burned by another framework where my lack of knowledge of how the ORM generated DB queries led to really inefficient behavior - basically really bad joins that I could have avoided if I had known what the framework would do.

When I say no magic, what I mean is that you can understand what data is being loaded at every step of the way, so the mistake I mentioned previously is harder to make.

Also, I wanted to avoid query-level joins to keep sharding easy (if it's ever needed), but that adds an extra round trip, so where performance matters the developer can write their own joins (giving up a bit of that sharding flexibility in the process).
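
As a rough sketch of the difference (the tables and columns here are hypothetical, not the framework's actual schema):

    -- Instead of one implicit join generated by an ORM:
    --   SELECT u.*, p.* FROM users u
    --   JOIN posts p ON p.user_id = u.id
    --   WHERE u.id = 42;
    -- the joins happen as two explicit single-table queries,
    -- which costs an extra round trip but keeps each table shardable:
    SELECT * FROM users WHERE id = 42;
    SELECT * FROM posts WHERE user_id = 42;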




Thanks for replying. Have you tried this at scale? What is the performance like when you have thousands of records? Have you looked into traversals? I ask because I tried to build a product using MySQL that should have used a proper graph DB, and I ran into a lot of issues. Now that I'm a bit wiser, I think SQL may be able to do similar things, but I haven't tried.

edit: the model layer is the most impressive part about this. You should consider making it a stand-alone package.


In general it scaled pretty well as long as you avoided loading tens of thousands of edges in a single call. A similar system was used on an app that tried to find the connection strength between people, using sent emails as a signal. At its peak the node table had tens of millions of rows, with some of the nodes (users) having thousands of edges each. The main pitfalls are:

* Loading too many edges (10K+) and their associated nodes will be slow.
* Traversing nodes in a meaningful way can be difficult.

To solve these, the schema has the following indices:

On the edge table: (`from_node_id`, `type`, `updated`)
On the node_data table: (`type`, `data`(128))
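
To make that concrete, here is a minimal sketch of what the schema could look like; the column types and the node table layout are my guesses, only the two indices above come from the actual design:

    CREATE TABLE node (
      id   BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
      type VARCHAR(64) NOT NULL
    );

    CREATE TABLE edge (
      from_node_id BIGINT UNSIGNED NOT NULL,
      to_node_id   BIGINT UNSIGNED NOT NULL,
      type         VARCHAR(64) NOT NULL,
      updated      TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
      PRIMARY KEY (from_node_id, to_node_id, type),
      KEY edge_page (from_node_id, type, updated)
    );

    CREATE TABLE node_data (
      node_id BIGINT UNSIGNED NOT NULL,
      type    VARCHAR(64) NOT NULL,
      data    VARCHAR(255) NOT NULL,
      PRIMARY KEY (node_id, type),
      KEY data_lookup (type, data(128))
    );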

Since edges rarely change, the first index lets you paginate over a node's edges using `updated` as the sort order. As long as you request a reasonable number of edges, things should work OK. The second index is needed to find a node given some data, but a secondary use is sorting: by precomputing some score and saving it in node_data you can traverse nodes in that order (this is not currently built but is simple to do in SQL).
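
Both patterns, sketched out (the LIMIT sizes are arbitrary, and a numeric score would need to be stored zero-padded so string order matches numeric order):

    -- Paginate one node's edges via (from_node_id, type, updated);
    -- pass the last `updated` value seen to fetch the next page:
    SELECT to_node_id, updated
    FROM edge
    WHERE from_node_id = ? AND type = ? AND updated > ?
    ORDER BY updated
    LIMIT 100;

    -- Find a node given some data, via (type, data(128)):
    SELECT node_id FROM node_data
    WHERE type = 'email' AND data = 'alice@example.com';

    -- The secondary use: traverse nodes by a precomputed score
    -- stored in node_data (the part not built yet):
    SELECT node_id FROM node_data
    WHERE type = 'score'
    ORDER BY data DESC
    LIMIT 100;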

All this being said, the schema is pretty index-heavy, so if MySQL is forced to kick some of those indices out of memory it may lead to a bad time.

Thanks for the kind words on the model. I never thought about making it separate, but it makes 100% sense.
