Thanks for posting my article. Quick update, someone did more benchmarks with different versions of Ruby and adding MongoMapper: https://gist.github.com/1894055
Thanks for the update. I like seeing articles that cover tangible, measured detail. It's the kind of stuff that's often overlooked in favor of gee-whiz syntactic sugar.
It's been a long time, but I think your assumption is probably flawed.
DataMapper isn't faster by accident. This benchmark is trivial. Try loading up 1,000 objects with an actual query for each O/RM and benchmark that. Then try a single object 1,000 times.
Instantiation is neat, but "back in the day" my findings were that you often spent just as much time if not more simply building the objects on the Ruby side as any reasonable database query you might be executing.
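The gap is easy to see without involving a database at all. Here's a minimal sketch (plain Ruby, hypothetical row data, no O/RM involved) comparing raw row hashes against per-row object materialization:

```ruby
require "benchmark"
require "ostruct"

# Hypothetical rows, standing in for a 1,000-row result set.
ROWS = Array.new(1_000) { |i| { id: i, name: "user#{i}" } }

User = Struct.new(:id, :name)

# Pass the rows through untouched vs. build an object per row.
hashes   = Benchmark.realtime { 100.times { ROWS.each { |r| r[:name] } } }
structs  = Benchmark.realtime { 100.times { ROWS.map { |r| User.new(r[:id], r[:name]) } } }
ostructs = Benchmark.realtime { 100.times { ROWS.map { |r| OpenStruct.new(r) } } }

puts format("raw hashes:  %.4fs", hashes)
puts format("Struct:      %.4fs", structs)
puts format("OpenStruct:  %.4fs", ostructs)
```

And a Struct is about the cheapest wrapper you can build; a real O/RM's materialization does far more work per row than this.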
In fact, the cost was so prohibitive it quickly became obvious that things that were simple in ASP Classic, like selecting 2,000 rows and rendering them in a table on a web page, could in many situations be so slow as to be impractical in Ruby. Especially since the entire request is buffered (that's more or less still true today; the flushing available in Rails isn't nearly the equal of writing directly to a TCP socket).
Method dispatch kills.
The example of selecting a single object 1,000 times nicely illustrates this since it stresses the "front end" of your query-interface as opposed to the "back end" of materialization.
The price of admission in Ruby is _absurdly_ high. It doesn't take years to build an O/RM because it's easy. ;-)
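To make "method dispatch kills" concrete, here's a sketch that assumes nothing about any particular O/RM: a plain attr_reader against the method_missing-style attribute lookup many Ruby O/RMs have used.

```ruby
require "benchmark"

class DirectUser
  attr_reader :name
  def initialize(name)
    @name = name
  end
end

class DynamicUser
  def initialize(name)
    @attrs = { name: name }
  end

  # The classic O/RM trick: attributes resolved through method_missing.
  def method_missing(sym, *args)
    @attrs.key?(sym) ? @attrs[sym] : super
  end

  def respond_to_missing?(sym, include_private = false)
    @attrs.key?(sym) || super
  end
end

d = DirectUser.new("alice")
m = DynamicUser.new("alice")

direct  = Benchmark.realtime { 1_000_000.times { d.name } }
dynamic = Benchmark.realtime { 1_000_000.times { m.name } }

puts format("attr_reader:    %.4fs", direct)
puts format("method_missing: %.4fs", dynamic)
```

Every extra layer of dispatch between your call and the actual data costs like this, a million times over.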
> It's been a long time, but I think your assumption is probably flawed.
My assumption? My assumption was merely that you should measure it before you start worrying about it. I'm not trying to say that isn't a problem (it might very well be), just that you should measure first.
> In fact the cost was so prohibitive it quickly became obvious that simple things in ASP Classic like selecting 2,000 rows and rendering them in a table on a web-page could in many situations be so slow as to be impractical in Ruby.
I agree, rendering 2,000 rows isn't the regular use-case for ActiveRecord. If you're attempting to do so, it would be silly to create objects for all of them. AR isn't "magic sauce"; it's a convenience library for making things easy to work with.
> Especially since the entire request is buffered (that's more or less still true today, the flushing available in Rails isn't nearly the equal of writing directly to a TCP socket).
Buffering is the only sane default if you want safety. If you don't buffer, and there's an error in the middle of the response, you've already sent a broken page with a 200 response (cacheable and everything).
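A sketch of the failure mode, using a StringIO as a stand-in for the socket (hypothetical, not Rails code):

```ruby
require "stringio"

# Once the status line is on the wire, an error mid-body
# can't turn the 200 into a 500.
def stream_response(socket)
  socket.write("HTTP/1.1 200 OK\r\n\r\n") # status line already sent
  socket.write("<html><body>")
  raise "template blew up"                # error halfway through rendering
  socket.write("</body></html>")          # never reached
end

wire = StringIO.new
begin
  stream_response(wire)
rescue RuntimeError
  # Too late to recover: the client has already seen "200 OK"
  # plus half a page, and may even have cached it.
end

puts wire.string.start_with?("HTTP/1.1 200 OK") # → true: the 200 went out anyway
```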
> Buffering is the only sane default if you want safety.
A LOT of sites on the web survived for many years without it. And they felt more responsive because of it.
I think it's a fair trade-off to optimize for your user experience instead of the machines. Honestly, Google is not going to ding you for this, and as long as you've got an exception notifier going and are on top of it, there's no reason it should become a real problem for most sites. Being able to make a choice per-request/action would be nice of course.
> My assumption was merely that you should measure it before you start worrying about it.
I have. I wrote the original versions of DataMapper (which BTW, was a DataMapper despite popular belief. The methods hanging off the models were just helpers to the Session originally, but it lost its way at some point).
I can tell you for a fact that it's slow. Sure, go ahead and measure it. I'm not saying that's bad. Just apply some judgement to the situation. Are you grabbing a lot of rows? Are you performing lots of individual selects? (Or using AR+include options where each include is effectively the same as another query?)
You can feel free to write slow code until it becomes a problem, or you can develop some good practices up front, before ever needing to profile, to prevent much of that. Disregard for reasonable optimization and forming good habits based on generalized rules of thumb is just as evil as premature optimization. We're not talking about twiddling minutia here.
> it would be silly to create objects for all of them
Why?
Hibernate can do it. NHibernate can do it. LLBLGenPro could do it (probably the closest to Ruby's Sequel, but here it was a non-issue even with Models). Simpler AR-style O/RMs like Wilson O/R Mapper could do it.
Given that AR doesn't really provide a decent DAL tool that I can recall, I'd say it's perfectly reasonable to expect that the "Full Stack" framework you're using covers the bases without nasty surprises. Especially if you've come from a background where millions of method-dispatch events impose nanoseconds of overhead instead of milliseconds (Java and .NET at least).
I don't think it's at all "silly". I think it's a perfectly reasonable goal.
Sequel looks pretty interesting. I've really fallen out of love with ActiveRecord, mostly because it really only simplifies the most basic scenarios, and as an abstraction it's so very "leaky," as Joel would say. I've begun work on an idea for an ORM that just simplifies the most tedious parts of writing SQL but requires no boilerplate classes and is light and speedy: https://github.com/iaindooley/PluSQL (it currently only supports MySQL because it requires buffered query sets).
My experience has been that ORMs generally 'suck' outside the trivial CRUD use-cases, and it's usually faster and simpler just to write native queries against your database.
I've used Hibernate, LINQ, ActiveRecord, and Mongoose in real production applications, and pretty much every time the same sequence of events happens.
1) Wow, this ORM is so cool, I've saved so much time as I don't need to learn anything about the underlying DB.
2) New feature comes in that requires something other than CRUD. Oh, how do I do this with the ORM? Either you extend it with plugins, or learn the ORM query language that's kinda like the native query language, but worse.
3) Now we've got some real data, why is it slow as hell? Oh, those 'cool' abstractions are creating n^2 queries, implicit joins, etc.
4) Let's rip apart the ORM, and use a thin layer that just executes the native queries.
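Step 3 is usually the N+1 pattern. A sketch with an in-memory stand-in for the database (hypothetical data, no ORM) showing per-row lookups versus one batched lookup:

```ruby
# In-memory stand-in for two tables (hypothetical data).
POSTS   = Array.new(100) { |i| { id: i, author_id: i % 10 } }
AUTHORS = Array.new(10)  { |i| { id: i, name: "author#{i}" } }

# N+1 style: one author lookup per post, i.e. one "query" per row.
n_plus_one = POSTS.map do |post|
  AUTHORS.find { |a| a[:id] == post[:author_id] }
end

# Batched style: one lookup up front, then join in memory.
by_id   = AUTHORS.each_with_object({}) { |a, h| h[a[:id]] = a }
batched = POSTS.map { |post| by_id[post[:author_id]] }

puts n_plus_one == batched # → true: same rows, 1 lookup instead of 100
```

With an in-memory array the difference is invisible; swap each `find` for a real round-trip to the database and it's the difference between 1 query and 101.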
A few years back I worked on a quant finance application that had 100GBs of tick data in an Oracle DB, and the only way to get reasonable performance AND abstraction was to write optimised stored procs for each feature in the end application, and have the middle tier just call them through a thin abstraction layer.
At our startup dump.ly, we started with using Mongoose with Mongo because it was so easy just to persist some JSON data. However, very quickly we went through the above loop, and are now 100% Redis, with a thin layer that executes native Redis commands. It just ended up being simpler and faster.
Takeaway is that in general ORMs are an anti-pattern. Every DB (noSQL or yesSQL) is a compromise between a set of features, performance, consistency, scalability etc. You need to understand these in detail and hence should not be abstracted away.
I worry that this sort of article could be misinterpreted easily, even with the disclaimer.
It's talking about optimizing something that takes less than 1% of the vast majority of an application's cycles. Which doesn't really mean anything... Be careful using this sort of benchmark to disproportionately affect decision making.
Developers perhaps not as well informed could do themselves a real disservice by reading your comment to mean this stuff doesn't matter.
I think we'd agree that yes, Model instantiation most likely doesn't matter.
On the other hand, Model materialization and query-interfaces most certainly DO.
In fact, in your _average_ Ruby Web Application, there's a good chance that the O/RM and template rendering makes up the vast majority of your request time.
Not database calls. Not controller logic. Only the Ruby side of O/RM interaction and rendering templates.
It's impressive that 37Signals has gotten most requests to process in under 50ms for Basecamp Next.
The truth, though, which you frankly don't hear often enough, is that that's not exactly a great accomplishment for many languages/frameworks. I've worked on ASP3 and ASP.NET applications that had the same performance on hardware easily five times slower than anything you could buy today. The kicker is that that kind of performance came with entirely uncached pages delivering hundreds, and oftentimes thousands, of rows of financial reports.
Back to basics, if you want it fast, it's a good bet that a simple DAL using Sequel Datasets is going to outperform just about anything else. Skip the models when all you're doing is dumping data for display.
After that, see what you can do about limiting interpolation and String instantiation in your templates. That's where all your Template time is going.
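A minimal sketch of what that String work costs, using table cells as a stand-in for a template (hypothetical markup, no template engine involved):

```ruby
require "benchmark"

rows = Array.new(1_000) { |i| "value#{i}" }

# += allocates a brand-new String on every iteration.
interp = Benchmark.realtime do
  50.times do
    html = ""
    rows.each { |v| html += "<td>#{v}</td>" }
  end
end

# << mutates one buffer in place; far fewer intermediate Strings.
append = Benchmark.realtime do
  50.times do
    html = String.new
    rows.each { |v| html << "<td>" << v << "</td>" }
  end
end

puts format("interpolation + +=: %.4fs", interp)
puts format("append with <<:     %.4fs", append)
```

Same output either way; the difference is purely how many throwaway Strings the GC has to clean up after.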
It's something you can build in a little bit at a time, and worrying about 5% here, and 20% there definitely will add up.
I hear you, but my point wasn't to show how to optimize AR but instead what's going on when you initialize a model.
In other words it's more about software design and architecture decisions than performance.
My take-away was that it's a good example of how to think through software behavior and measure things. When and where the results might matter are a different issue.
At least in my own apps. Model instantiation being a bottleneck practically implies that you generate models in a tight loop and do little else with them beyond persisting them.
But, people do precisely that. I was parsing log files, and converting each row to a Mongoid object took forever. It was in a tight loop and did nothing but persist it. So, I had to drop into the Ruby Mongo driver, avoiding the ORM altogether.