What I meant by that is that the joins happen explicitly in code (as opposed to implicitly in queries). I got burned by another framework where my lack of knowledge of how the ORM generated DB queries led to really inefficient behavior - basically really bad joins that I could have avoided if I had known what the framework would do.

When I say no magic, what I mean is that you can understand what data is being loaded at every step of the way, so the mistake I mentioned previously is harder to make.

Also, I wanted to avoid query-level joins to keep sharding easy (if it's ever needed), but that adds an extra round trip, so where performance matters the developer can write their own joins (giving up a bit of that sharding flexibility in the process).
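
As a rough sketch of the difference (the tables and columns here are hypothetical, not the framework's actual schema):

    -- Instead of one implicit join generated by an ORM:
    --   SELECT u.*, p.* FROM users u
    --   JOIN posts p ON p.user_id = u.id
    --   WHERE u.id = 42;
    -- the joins happen as two explicit single-table queries,
    -- which costs an extra round trip but keeps each table shardable:
    SELECT * FROM users WHERE id = 42;
    SELECT * FROM posts WHERE user_id = 42;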




Thanks for replying. Have you tried this at scale? What is the performance like when you have thousands of records? Have you looked into traversals? I ask because I tried to build a product using MySQL that should have used a proper graph DB, and I ran into a lot of issues. Now that I'm a bit wiser, I think SQL may be able to do similar things, but I haven't tried.

edit: the model layer is the most impressive part about this. You should consider making it a stand-alone package.


In general it scaled pretty well as long as you avoided loading tens of thousands of edges in a single call. A similar system was used on an app that tried to find the connection strength between people, using sent emails as a signal. At its peak the node table had tens of millions of rows, with some of the nodes (users) having thousands of edges each. The main pitfalls are:

* Loading too many edges (10K+) and their associated nodes will be slow.
* Traversing nodes in a meaningful way can be difficult.

To solve these, the schema has the following indices:

On the edge table: (`from_node_id`, `type`, `updated`)
On the node_data table: (`type`, `data`(128))
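
To make that concrete, here is a minimal sketch of what the schema could look like; the column types and the node table layout are my guesses, only the two indices above come from the actual design:

    CREATE TABLE node (
      id   BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
      type VARCHAR(64) NOT NULL
    );

    CREATE TABLE edge (
      from_node_id BIGINT UNSIGNED NOT NULL,
      to_node_id   BIGINT UNSIGNED NOT NULL,
      type         VARCHAR(64) NOT NULL,
      updated      TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
      PRIMARY KEY (from_node_id, to_node_id, type),
      KEY edge_page (from_node_id, type, updated)
    );

    CREATE TABLE node_data (
      node_id BIGINT UNSIGNED NOT NULL,
      type    VARCHAR(64) NOT NULL,
      data    VARCHAR(255) NOT NULL,
      PRIMARY KEY (node_id, type),
      KEY data_lookup (type, data(128))
    );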

Since edges rarely change, the first index lets you paginate over a node's edges using `updated` as the sort order. As long as you request a reasonable number of edges, things should work OK. The second index is needed to find a node given some data, but a secondary use is sorting: by precomputing some score and saving it in node_data you can traverse nodes in that order (this is not currently built but is simple to do in SQL).
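
Both patterns, sketched out (the LIMIT sizes are arbitrary, and a numeric score would need to be stored zero-padded so string order matches numeric order):

    -- Paginate one node's edges via (from_node_id, type, updated);
    -- pass the last `updated` value seen to fetch the next page:
    SELECT to_node_id, updated
    FROM edge
    WHERE from_node_id = ? AND type = ? AND updated > ?
    ORDER BY updated
    LIMIT 100;

    -- Find a node given some data, via (type, data(128)):
    SELECT node_id FROM node_data
    WHERE type = 'email' AND data = 'alice@example.com';

    -- The secondary use: traverse nodes by a precomputed score
    -- stored in node_data (the part not built yet):
    SELECT node_id FROM node_data
    WHERE type = 'score'
    ORDER BY data DESC
    LIMIT 100;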

All this being said, the schema is pretty index-heavy, so if MySQL is forced to kick some of those indices out of memory it may lead to a bad time.

Thanks for the kind words on the model. I never thought about making it separate, but it makes 100% sense.
