I’m really excited this was finally shared. A little backstory. This framework is the manifestation of ideas that Mike developed at Mixtent, a startup that produced three products in 2 years and eventually got acquired by Facebook.
The biggest advantage this framework has over other more traditional ones like RoR or Django is being able to model product ideas as a graph in code abstractions. This enables product engineers to rapidly prototype ideas (no need to interact with the DB), and jump into features built by other engineers (the node-edge API is standardized).
While the first product Mixtent built used more traditional Django-style models, it resulted in features that became hard to manage over time. Each model had its own DB table and making changes was painful. The next two were built using a similar graph framework on top of CodeIgniter, and the benefits to prototyping speed and ease of understanding were visibly felt by all engineers (including myself).
What I meant by that is that the joins happen explicitly in code (as opposed to implicitly in queries). I got burned using another framework where my lack of knowledge of how the ORM managed DB queries led to really inefficient results - basically really bad joins that I could have avoided had I known what the framework was going to do.
When I say no magic, what I mean is that you can see exactly what data is being loaded at every step, so the mistake I mentioned previously is harder to make.
Also, I wanted to avoid query joins so that sharding stays easy (if it's ever needed). That does add an extra round trip, so a developer who needs to can still write their own joins, giving up a bit of flexibility in the process.
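To make that concrete, here's a rough sketch of the two-round-trip pattern versus the single-query join an ORM might generate. The `edge`/`node` table and column names follow the schema discussed further down; `to_node_id` and the example IDs are my assumptions:

    -- Round trip 1: fetch the edge rows for one node (a shard-local lookup)
    SELECT to_node_id
      FROM edge
     WHERE from_node_id = 42 AND type = 'FRIEND';

    -- Round trip 2: fetch the target nodes by primary key,
    -- which could live on different shards
    SELECT * FROM node WHERE id IN (7, 19, 23);  -- IDs returned by round trip 1

    -- The implicit join a traditional ORM might emit instead; avoided here
    -- because it ties both tables to the same database instance:
    -- SELECT n.* FROM edge e JOIN node n ON n.id = e.to_node_id
    --  WHERE e.from_node_id = 42 AND e.type = 'FRIEND';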
Thanks for replying. Have you tried this at scale? What is the performance like when you have thousands of records? Have you looked into traversals? I ask these things because I tried to build a product using MySQL that should have used a proper graph DB, and I ran into a lot of issues. I think now that I'm a bit wiser, SQL may be able to do similar things, but I haven't tried.
edit: the model layer is the most impressive part about this. You should consider making it a stand-alone package.
In general it scaled pretty well if you avoided loading tens of thousands of edges in a single call. A similar system was used on an app that would try to find connection strength between people using sent emails as a signal. At its peak the node table had tens of millions of rows, with some of the nodes (users) having thousands of edges each. The main pitfalls are:
* Loading too many edges (10K+) and their associated nodes will be slow.
* Traversing nodes in a meaningful way can be difficult.
To solve these, the schema has the following indices:
On the `edge` table: (`from_node_id`, `type`, `updated`)
On the `node_data` table: (`type`, `data`(128))
Since edges rarely change, the first index allows you to paginate over edges using `updated` as the order. As long as you request a reasonable number of edges, things should work OK. The second index is needed to get a node given some data, but a secondary use is sorting: by precomputing some score and saving it in `node_data` you can traverse nodes in that order (this is not currently built but is simple to do in SQL).
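For concreteness, here's a rough sketch of those indices and the queries they serve. Only the indexed columns above come from the discussion; the DDL form, the `to_node_id`/`node_id` columns, and the cursor value are my assumptions:

    -- The two indices described above
    ALTER TABLE edge      ADD INDEX from_type_updated (from_node_id, type, updated);
    ALTER TABLE node_data ADD INDEX type_data_prefix  (type, data(128));

    -- Paginate one node's edges, newest first, using `updated` as the cursor
    SELECT to_node_id, updated
      FROM edge
     WHERE from_node_id = 42
       AND type = 'FRIEND'
       AND updated < '2014-01-01 00:00:00'   -- cursor from the previous page
     ORDER BY updated DESC
     LIMIT 100;

    -- Find a node given some data (the second index)
    SELECT node_id
      FROM node_data
     WHERE type = 'email'
       AND data = 'alice@example.com';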
All this being said, the schema is pretty index heavy so if MySQL is forced to kick some of those out of memory it may lead to a bad time.
Thanks for the kind words on the model. I never thought about it being separate, but it makes 100% sense.
Based on the example code: nothing. In theory you could map URLs to the graph, but here it looks like an HTTP framework that uses MVC, where the Model maps to a graph database (backed by MySQL).
Recently I had some related ideas, although from a completely different background. Here is what I would have done differently:
1) Implement other storage backends. Although databases are a natural part of web applications, storing directly in files and/or directories may be justified. Also, for most applications the full dataset fits easily in RAM on modern systems.
2) Keep full history, or at least provide for the possibility. Adding history features to classic database models becomes cumbersome quickly, but for a simple schema it may be provided directly by the framework. This makes for a great audit log when something goes wrong. It could probably be disabled if it's really unwanted and/or storage size is an issue.
Regarding the different background: although most people want graph databases when they don't want to enforce a certain schema, my desire is the exact opposite. I want more constraints, as much data integrity as possible. I want more than can easily be achieved even in PostgreSQL with user-defined functions, such as constraints across foreign keys. So a separate checker is needed, and I believe a graph structure with plain links provides a good, simple base to define a constraints framework on.
1) I like the RAM idea and in general more storage adapters. I hadn't really thought of local storage, but it makes sense for super simple prototyping.
2) I want to add DB profiling at some point (right now Libphutil does this for me at a query level, but I want to do it at a graph abstraction level). History could presumably be similar to a permanent profiler.
I think constraints vs flexibility is always a tradeoff. The main benefit of this model is very rapid prototyping where design decisions can be changed or reversed with minimum effort.
> I think constraints vs flexibility is always a tradeoff.
Well, that depends on the application. In the applications I have in mind, I have to make tradeoffs in the opposite direction: I know that some additional constraints make total sense. But is it worth implementing them, considering how hard it is to express them in the database?
So I have to make tradeoffs between missing constraints and ease/feasibility of implementation. Or check some constraints only in the application layer, but sometimes that's even more cumbersome there than with triggers etc. in the database.
The README talks about a bank account.
How would a transactional update look?
If I want to transfer $20 from user A to user B, the following four things need to happen transactionally:
- make sure user A has the necessary funds (reserve)
- put money into user B's account (credit)
- take money from user A's account (debit)
- commit
At the same time, no other operation may observe partial results, and we can't let two operations reserve the same funds in parallel.
I don't see how this use case is supported, but that may just be because I don't know where to look?
Yeah, bank account may have been a bad example, but it's still solvable:
You can use MySQL's SELECT FOR UPDATE (not currently implemented in graphp) to do this transaction safely.
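As a minimal sketch of that approach (not graphp code - a hypothetical `account` table with a `balance` column stands in for wherever the framework would keep the balance), the locking pattern looks like this:

    START TRANSACTION;

    -- Lock both rows; a second transfer touching the same rows will block here,
    -- so two operations can't reserve the same funds in parallel.
    -- Locking in a consistent order (e.g. by user_id) avoids deadlocks.
    SELECT balance FROM account WHERE user_id = 1 FOR UPDATE;  -- user A
    SELECT balance FROM account WHERE user_id = 2 FOR UPDATE;  -- user B

    -- After the application checks that A's balance covers the $20:
    UPDATE account SET balance = balance - 20 WHERE user_id = 1;
    UPDATE account SET balance = balance + 20 WHERE user_id = 2;

    COMMIT;  -- both updates become visible atomically; ROLLBACK if the check failed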
Alternatively, within the framework you can model the reserved amount as a timestamped node and connect it to the user. Then you query for reserve nodes and process them in order.
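Roughly, and reusing the edge index described earlier (the 'RESERVE' edge type and the node IDs are my assumptions), a worker could pick up pending reserves like this:

    -- Pending reserve nodes attached to user A, oldest first,
    -- served by the (from_node_id, type, updated) index on edge
    SELECT to_node_id AS reserve_node_id, updated
      FROM edge
     WHERE from_node_id = 1        -- user A's node id
       AND type = 'RESERVE'
     ORDER BY updated ASC;

    -- The worker then applies each reserve in timestamp order and
    -- deletes (or marks) the reserve node once the transfer is done.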