What ORMs have taught me: just learn SQL (2014) (wozniak.ca)
379 points by kermatt on June 26, 2016 | 295 comments



Ten years ago, there was a blog post every other week bemoaning ORMs. Ten years ago, those posts often had merit.

In 2016, this sentiment is outdated. A few points:

1. If you think that using an ORM means you don't have to learn SQL, then you're going to have a bad time. This is where most of the bad press originates... from people who never really learned SQL or their chosen ORM. An ORM provides type checking at your application layer, and typically better performance (unless you plan on hand-rolling your own multi-level cache system). But you still must understand relational database fundamentals.

2. If you're not using an ORM, then you ultimately end up writing one. And doing a far worse job than the people who focus on that for a living. It's no different from people who "don't need a web framework", and then go on to re-implement half of Rails or Spring (without accounting for any CSRF protection). Learning any framework at a professional level is a serious time investment, and many student beginners or quasi-professional cowboys don't want to do that. So they act like their hand-rolled crap is a badge of honor.

3. Ten years ago, it was a valid complaint that ORMs made custom or complex SQL impossible, didn't play well with stored procedures, etc. But it's 2016 now, and this is as obsolete as criticizing Java by pointing to the EJB 2.x spec. I can't think of a single major ORM framework today that doesn't make it easy to drop down to custom SQL when necessary.
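
A minimal sketch of what that "drop down to custom SQL" escape hatch can look like, with SQLAlchemy as one example of such a toolkit; the table and column names are invented for illustration:

    from sqlalchemy import create_engine, text

    engine = create_engine("postgresql://localhost/appdb")

    with engine.connect() as conn:
        # The toolkit still handles parameter binding and result mapping,
        # even though the SQL itself is hand-written.
        rows = conn.execute(
            text("SELECT id, name FROM users WHERE created_at > :cutoff"),
            {"cutoff": "2016-01-01"},
        )
        for row in rows:
            print(row.id, row.name)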


> 2. If you're not using an ORM, then you ultimately end up writing one. And doing a far worse job than the people who focus on that for a living. It's no different from people who "don't need a web framework", and then go on to re-implement half of Rails or Spring (without accounting for any CSRF protection). Learning any framework at a professional level is a serious time investment, and many student beginners or quasi-professional cowboys don't want to do that. So they act like their hand-rolled crap is a badge of honor.

As a professional cowboy -- I detest layers; if it's running on my server, I need to support it, even if I didn't write it. An ORM or a web framework adds thousands of lines of code and usually two or three layers of indirection. It's way less effort to just output some HTML, or just make a database query, than to learn a framework or ORM, and as a bonus it's usually faster too. Also, it's a lot easier to fix slow queries when there's one place that has the whole query (with placeholders where possible -- I may be crazy, but I'm not insane).
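
A minimal sketch of that "one place that has the whole query, with placeholders" style, using Python's sqlite3 DB-API as a stand-in; the schema is invented:

    import sqlite3

    conn = sqlite3.connect("app.db")

    def orders_for_customer(customer_id):
        # The whole query lives here; ? placeholders keep it injection-safe
        # and make a slow query easy to trace back to its one source.
        return conn.execute(
            "SELECT id, total, placed_at FROM orders "
            "WHERE customer_id = ? ORDER BY placed_at DESC",
            (customer_id,),
        ).fetchall()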


I'm currently at a company that is not using an ORM - what has happened is that developers have written endpoints with inconsistent data formatting for similar data types, which prevents the possibility of creating abstractions cleanly on the client. It would have been nice to have a lightweight ORM, if only for object consistency.


I've been at companies that did use ORMs, and still ended up with that kind of mess.

No tool is ever a substitute for discipline.


> No tool is ever a substitute for discipline.

My experience is exactly the opposite. Humans are too fallible for discipline to ever be reliable. If you want something done, automate it.


In my experience you're both right.

The enterprise wants everyone to be a jack of all trades and to understand everything. Part of the reason for the massive layering that takes place is so that things are compartmentalized and easier for an individual to pick up and put down as they're shuffled around from project to project. Standardization, documentation, consistency, and discipline are essential for enterprise developers to be able to quickly grok a project and make the necessary changes.

Automation also plays an important role because it allows less skilled and knowledgeable individuals at lower pay grades to monitor the blinky lights and only call on the big boys when things fail.


ORMs let you structure your queries in a way that allows the app to understand them and analyze them, for sharding purposes. True, you can write SQL and then parse it yourself in a proxy, but why?


If your ORM allows you to drop down to SQL, which the parent claims all modern ORMs do, then you will still need that SQL parser in your ORM.


You can have a small subset of your queries in raw SQL and if you really need to shard them, you mark them up with structured metadata. But the rest would be automatically structured, analyzed and sharded.


Sure. So you can use the SQL parser from a mature, widely used library, or you can write your own. I know which option I would prefer, even if I was writing every single query by hand (not that I'd ever do that).


> It's way less effort to just output some HTML, or just make a database query, than to learn a framework or ORM, and as a bonus it's usually faster too. Also, it's a lot easier to fix slow queries when there's one place that has the whole query (with placeholders where possible -- I may be crazy, but I'm not insane).

Do you work in a team? Or do you plan to hand over that code to someone else for maintenance at some point in time? If not, then by all means, go ahead. Another advantage of ORMs/web frameworks is that they are properly documented and only have to be learned once. I once had the displeasure of maintaining someone else's custom 'framework' with no documentation.


>Do you work in a team? Or do you plan to hand over that code to someone else for maintenance at some point in time? If not, then by all means, go ahead.

Did anything the parent wrote sound like it was worse for teams?


Because toast0 knows what they wrote, but a team member wouldn't. So instead of googling "Flask add cookie to response" you have to start digging through all of toast0's code to see if they even have a way.

At least that's why I would consider working on a team like that much worse.


This is the first link for 'php send cookie' or 'php add cookie'

http://php.net/manual/en/function.setcookie.php

grep -r setcookie in my code will let you find out if I sent any cookies. Incidentally, this is the same grep I would have to do on a framework, but the framework (pick one) is probably bigger than all of my code.

As an example -- we used to have a WordPress blog; now we have a custom blog that's a total of 3628 lines of PHP that lives on the frontend, and I included the Makefile for deployment and some extra include files because I don't want to spend the time to consider which ones aren't needed. Content is from text files pushed with another process.

By comparison, WordPress includes 298,643 lines of PHP. Wouldn't you rather dig through all of my code than all of theirs? (Yes, WordPress has a bunch of features -- but even when you turn them off, the code is there lurking, and sometimes it still runs.)

Someone else from my team fixed up the mobile support and handles the CSS/JavaScript -- they were able to just look at the code and do what needed to be done. (The mobile page was broken in our install of WordPress too, so I got it to parity anyway.)

It took me about a week to write it and polish it (three days dedicated, including exporting the content from MySQL, two days I was doing other things too), but on the plus side, I don't need to spend half a day figuring out how to deploy a WordPress upgrade every time they have a security release; there have been 12 security releases since then, so I've earned a day of my life back.


>Because toast0 knows what they wrote, but a team member wouldn't.

They wouldn't know vanilla SQL, HTML, WSGI, etc. -- decade-old standards -- but they would know some random framework like Flask?


"Custom" framework.. idiosyncratic code. Code that is probably not properly documented, not properly vetted, or maintained as technology evolves. It is not to say it can't be done (e.g. fb, google, ms), but the cost of doing so greatly outweighs the benefits. For example, security concerns of your "custom" framework and onboarding of new engineering into your team. And are you really in the business fo "custom" framework?


>"Custom" framework.. idiosyncratic code. Code that is probably not properly documented, not properly vetted, or maintained as technology evolves.

Well, Flask is 95% just two people [1] and 5% spread among 10 others. And the one responsible for creating it and writing 60% of it doesn't even have it as his first priority (has multiple other libs, codes Go, etc).

A lot of widely used Javascript libs are not even that lucky. Heck, GTK+ itself, one of the pillars of Gnome and used by millions, was down to 1 maintainer a few years ago, who openly complained about it (not sure if things improved since).

Unless it's some large, focused project with big traction, tons of books and courses on it, etc., like React, Angular, etc. (usually with some corporate backing too), a lot of the things that pass for serious frameworks and libraries are anything but, and a lot are hardly "community" projects either.

Some person slapped together something and thousands adopted it because "frameworks are better than writing your own code". Then the person moves on, or some small team that leads the project decides to rewrite it from scratch to try new stuff because they have short attention spans, and you're left with abandoned code on your hands, which you didn't write yourself and don't have time to understand fully.

A framework can be good to start with if you're doing stuff close to its confines, but otherwise it's how Joe Armstrong described OOP: "You wanted a banana but what you got was a gorilla holding the banana and the entire jungle".

Notice how all major companies, from Facebook and Google, to Twitter and Microsoft and Apple etc have written their own frameworks.

Perhaps their programmers didn't know that already existing frameworks are better? Or were they focused on their actual problems and pain points, instead of lowest common denominator solutions?

[1] https://github.com/pallets/flask/graphs/contributors


But that doesn't really matter, because those frameworks have mindshare. If I have a specific issue that I don't know how to solve in Angular or Flask or whatever, I can search online, and chances are good that someone has run into the exact same issue and solved it. If I'm using someone else's custom-built framework, I have to either consult the original author, which may not be practical, or figure it out myself by digging through the code. For anything that needs to be maintained in the long run, I'd take something written using a popular framework, with all its flaws, over something written from scratch any day.


A purpose-built application, if built well, will not only do the job more efficiently but also be easy to maintain. If you write everything tied to a framework, it creates just as much work for someone who knows the underlying technologies but not the framework as hand-rolled code does for someone who only knows the framework. You'll end up tying your application to that framework. Sure, if you want to quickly hire 20 coders to work on a project, use a framework (or be like Facebook or Google and create your own...). I've found hiring framework developers is a good path to getting work only in that framework. Devs who are more versed in the underlying technologies have in my experience been more flexible and put out better products for the domain they're working in.


In some cases you are right. If all you need is a simple landing page, with a simple form or two, then by all means - do it.

However, if your application is more complex, you should be careful. Chances are, you will add your own layers. And chances are, they will suck more than some popular framework or ORM. It's not that you make a conscious decision or plan to do so; usually it happens bit by bit. Having seen all the custom frameworks and shitty ORMs (built because it seemed easier at some point early in the project): don't write your own ORM, for the sake of the maintainers' mental health if not your own.


> If you're not using an ORM, then you ultimately end up writing one.

I disagree with this. A lot of things people use ORMs for are rather easily solved with stored procedures, especially in Postgres where you can write stored procedures in Perl, Ruby, etc. Validations, “fat models”, etc. are all easily managed with SQL (and this means you get that functionality from _anywhere you access the database_, not just from your framework with an ORM). For convenient access you can roll 30 lines of Perl to wrap DBI or whatever (I use Perl for most web backends these days) and call your stored procedures in normal syntax with a little metaprogramming.

Maybe this sort of scheme (heavy usage of stored procedures, offloading almost everything to the DB) doesn't work for everyone, but I like databases and it works for me.
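
A rough sketch of that scheme's shape (the comment describes Perl/DBI; this is a Python analogue using psycopg2, and the procedure name is invented):

    import psycopg2

    conn = psycopg2.connect("dbname=appdb")

    def call(proc, *args):
        # Thin wrapper: invoke a stored procedure/function and return its rows.
        # The validation and "fat model" logic lives in the database itself.
        with conn.cursor() as cur:
            cur.callproc(proc, args)
            return cur.fetchall()

    orders = call("get_orders_for_customer", 42)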


> A lot of things people use ORMs for are rather easily solved with stored procedures

More like 1 thing. ORMs are meant to make interfacing through the object/relational impedance mismatch easier from the regular code in your application. Stored procedures do not come anywhere close to this and are usually the same as just calling any other SQL query as you would when not using an ORM.

If you think the majority of what an ORM is used for can be replaced by stored procedures, then you're not really using an ORM for much at all.


I agree with this, but I'd also add that if your ORM is doing a lot beyond the capabilities of your database, it's an indicator of a hacky design in your application. If there's a large object/relational impedance mismatch, then either the objects or the relational DB are a poor fit for the problem you're trying to solve, and an ORM can't really fix that.

If your ORM is just providing a mapping between select/map, where/filter, join/zip, etc., you have a fairly list-of-records-ish functional application and your objects are only nominally objects. The ORM is only providing an interface that lines your types up (and the benefits of that are not to be underestimated--it allows, for example, running your app on different database engines).

But if your ORM is doing more than that, it's usually because your objects have a lot of complicated graph vertices that are poorly represented by tables. ORMs can reduce the difficulty of this object/relational impedance mismatch, but ultimately they can't provide a general solution for it, because the data structure they're sitting on top of doesn't have the capability. ORMs can make your code simpler, but they can't make SQL performant on, for example, directed graph or deep tree structures. Ultimately, they only mitigate the object/relational impedance mismatch, they don't solve it.

Again, as I parenthesized earlier, the benefits of ORMs still aren't to be underestimated. But I think a lot of those benefits can be realized with a simple wrapper around SQL (i.e. LINQ). Beyond that, all ORMs provide is a little fudge factor which lets you get away with some things that SQL doesn't support well, but ultimately ORMs aren't a general solution to those kinds of problems.


> If there's a large object/relational impedance mismatch, then either the objects or the relational DB are a poor fit for the problem you're trying to solve.

Why? You're just making that assumption, but the fact is that relational databases and the normalized storage of data are completely different from the way OO languages deal with rich nested objects. And there's nothing wrong with that mismatch, because there will always be a mismatch. It's just 2 different paradigms of handling data. All an ORM is doing is giving you a tool to make that translation easier, if you want it. If you really can't have the mismatch then there are document stores available, but in most cases the O/R mapping is just not a big deal.

> But I think a lot of those benefits can be realized with a simple wrapper around SQL (i.e. LINQ).

That's basically still an ORM. Again, why the assumption that just because you have an ORM, anything and everything must be piped through it? The modern ones let you use ORM methods in your code and chain them with custom raw SQL too. ORM usage is on a spectrum; it's not binary, and there's definitely no "right" way to use them.


I suspect you're viewing my previous comment as a criticism of ORMs. I tried to make it clear that it was not a criticism of ORMs. It's more a criticism of people who try to use ORMs to whitewash a bad design or get around a problem that relational databases can't solve. Relational databases lend themselves to a very specific way of structuring data, and if you don't structure your data that way or your data can't be structured that way, an ORM won't fix it. That doesn't mean ORMs are bad; it means that ORMs have to be used as intended, and often people don't use them as intended. That's not the fault of the ORM any more than it's the fault of a screwdriver when it's used to saw wood.


Nobody uses a screwdriver to saw wood. But they're great as replacement paint stirrers and hole punches.

That's my bad ORM misuse analogy.

As a maintenance programmer, it really fries my bacon when I find all the screwdrivers next to the old paint cans.


I think when people talk about using an ORM or not they're mostly talking about using a LINQ-like wrapper or not. There's maybe a conversation to be had about fancier features on top of your ORM and whether they're worth it, but I don't think those kinds of features are what people usually mean when they talk about ORMs (whether they're technically what the acronym "should" mean or not).


> interfacing through the object/relational impedance mismatch

Do we really need the regular code with its traditional OOP and fat application server to convert a result set to JSON and spit it out -- which, most of the time, is all that code is actually doing? I suppose these typical tasks might be pretty much covered not only by PostgREST [1], but with just a sweet combination of two PostgreSQL functions: array_to_json() and array_agg().

[1] http://postgrest.com/
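
A minimal sketch of that combination (psycopg2 assumed as the driver; table and column names invented), letting PostgreSQL build the JSON so the app layer just passes it through:

    import psycopg2

    conn = psycopg2.connect("dbname=appdb")

    with conn.cursor() as cur:
        cur.execute("""
            SELECT array_to_json(array_agg(row_to_json(u)))
            FROM (SELECT id, name, email FROM users) u
        """)
        payload = cur.fetchone()[0]  # JSON, ready to hand to the client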


> you're not really using an ORM for much at all.

The ORM can do transient fault recovery (including other cloud patterns) as well as modernise the interface (Micro-ORMs in general).


One problem with the stored procedure approach is that it scales terribly.

If your logic is in app servers and your state in a DB, you can add more app servers, and it will be a long time until your DB gets overwhelmed.

When your DB is doing both, the choke point comes much earlier.


Look, this is just wrong. All the best-performing research databases use stored procedures because they perform much better than other approaches. Your bottleneck when it comes to validating your data is, generally speaking, your database, so you're not going to scale better by moving the logic farther away from it (and introducing a bunch of round-trip latency) and this is empirically confirmed in benchmarks of stuff like TPC-C. It's a really tired argument, especially when virtually every aspect of stored procedures but their performance is irritating.


Whether stored procedures scale badly is a different question from whether they perform badly.

With application logic on separate stateless app servers, you can quickly spin up as many of those as you need when your site goes viral. If it's in your DB, you can't.

Whether the stored procedures are fast or slow doesn't really enter into that equation. It's about the difference between 1 and N.

I actually did work at a startup that got into serious trouble because much of their application logic was in the DB.


Were you serving something that had to live in the database, or were you serving cached copies that didn't actually need to be updated or read within a transaction? There's a crucial difference there. The moment you are required to access the database at all, the equation changes (which is usually true for data validation, inserts, updates or deletes, and constraint management). If all you're doing is reading static data, there's no need to touch the database more than once and you can do much better. Moreover, if all you need is causal or eventual consistency (for application-specific reasons), you can use databases that provide those semantics. In pretty much all cases, once you've chosen the correct database / semantics for your use case, you'll be better off from a performance perspective with stored procedures; 1:N doesn't really enter into it because databases can run with their own isolated state on multiple nodes just as well as applications can (indeed, there are some caching systems built around this observation)!


IME, the win in the application layer is when you can have faster copies pulled from a cache or validate rules that don't relate to other records.


I've yet to see empirical proof that stored procedures don't scale well. If you write an application in your SQL dialect of choice, that likely won't scale well, but putting data access behind an API should scale well no matter if that API is in Rails, Spring, or a stored procedure.


I don't know what you consider "empirical proof", but it seems obvious to me. And should be obvious to anyone competent who has ever had to scale stuff. A system fails to scale when it has a bottleneck, and that bottleneck gets overwhelmed. You make it scale better by scaling the bottleneck, which can be done by moving work out of the bottleneck, or by parallelizing it in some way.

The natural bottleneck for any system that has to synchronize data is the locking around synchronizing that data. That is because things that do not need synchronization can easily be parallelized. You therefore scale that bottleneck until the more fundamental one emerges.

In a standard database driven website, that bottleneck is always in the database. And therefore your scaling limit is the capacity of your database. As follows normal scaling advice, you need to move work out of the database, or remove the database as a scaling limit.

Moving work from stored procedures to the application is an example of moving work out of the database. So is having queries run against read-only replicas instead of the read/write master. Sharding your database and moving to a distributed NoSQL architecture are examples of removing the bottleneck.

Of the two approaches, the much simpler and safer one is to move work out of the database. Going NoSQL is cool, but unless you really know what you're doing, it is both unlikely to buy you what you wanted, and leaves you open to obscure data consistency problems.


When it comes to data validation, none of what you just said applies, because you need to perform your work inside a transaction. You can validate the data outside your database and then confirm that nothing's changed (optimistic concurrency control) but you can do that just as easily inside the database, with lower latency and greater throughput (and in situations with lots of contention this can lead to many more aborts than other concurrency control mechanisms, so be careful!) because many databases have OCC built in. If you can afford to relax consistency due to aspects of your data model, you can use a database with a relaxed consistency model and--again--get far better performance than an ad-hoc solution in your application.

It's hugely unclear to me why you think skirting transactional requirements by performing work in your application is less complex than using a NoSQL database (or using a database that utilizes MVCC or can otherwise provide long-lived read snapshots).

Frankly, I also disagree that for most websites the bottleneck is the database. For many websites, database latency / throughput constraints don't ever become the dominant factor in end-to-end requests because of all the layers they have to get through in order to get to the database in the first place, combined with a relatively low number of requests per second (commodity relational databases on commodity hardware can easily handle many thousands per second, and IIRC Google Search only had to handle 40k rps from real clients in a recent press release) and inefficient code elsewhere in the stack.


Just put transactions in the application. With row level locking, the odds of running into contention problems are low. If you've been careful to always lock tables in the same order, then deadlocks are a non-issue. At that point, having longer running transactions does not matter. What matters is contention for the internal locking mechanisms inside of the database. And limiting how much extraneous stuff you have improves that.
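
A minimal sketch of that "transactions in the application, locks taken in a consistent order" discipline, assuming psycopg2 and an invented accounts table:

    import psycopg2

    conn = psycopg2.connect("dbname=appdb")

    def transfer(src, dst, amount):
        # Always lock account rows in ascending id order, so two
        # concurrent transfers can never deadlock on each other.
        first, second = sorted((src, dst))
        with conn:  # commits on success, rolls back on exception
            with conn.cursor() as cur:
                cur.execute("SELECT balance FROM accounts WHERE id = %s FOR UPDATE", (first,))
                cur.execute("SELECT balance FROM accounts WHERE id = %s FOR UPDATE", (second,))
                cur.execute("UPDATE accounts SET balance = balance - %s WHERE id = %s", (amount, src))
                cur.execute("UPDATE accounts SET balance = balance + %s WHERE id = %s", (amount, dst))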

Now it is easy to screw up an application. It is easy to screw up a database design. It is easy to screw up queries and query plans. But all of those are fixable in relatively straightforward ways. And once you do that, you will wind up with database throughput as your bottleneck.

As for NoSQL, the problem is this. Moving to that architecture requires taking an up-front hit on transactional complexity, usually requires several times the hardware (data needs to be stored multiple times to survive hardware failure), puts a lot of stress on your network and latency, and is really easy to screw up. Just using a popular out-of-the-box solution is not enough - see https://aphyr.com/tags/jepsen for a list of real failure modes on stuff that will look perfectly fine in testing.

It is a necessary challenge to accept if you want to go beyond a certain scale. But you should not accept that challenge unless you have good reason to do so.


The thing you're missing is that unless your validation logic is really complicated or you have a really fast network (think Infiniband, which is still not commodity), it will take less time and fewer resources to perform the validation logic (once the locks are taken, which has to happen regardless) than it does to marshal the data over the network to your application for processing, often by orders of magnitude. That is why more or less everyone trying to win benchmarks at transaction processing uses stored procedures.

FWIW, there's lots of concrete evidence of this beyond the fact that benchmarks almost always use stored procedures. If you look at the very best-performing storage systems, like MICA or FaRM, which achieve 100 million tps or more, you'll see that a huge amount of their optimization comes from circumventing the OS networking stack, taking advantage of built-in queueing mechanisms in NICs, and offloading work to avoid taking up cache lines and cores from the processing CPUs. Many extremely recent database designs also take this approach (separating the transaction processors from the executors) for similar reasons, like Bohm, Deuteronomy, and the SAP HANA Scale-out Extension, as well as deterministic database systems like Calvin. Given that these are literally the best-performing database systems out there, I have a hard time accepting that reducing network latency and doing as much processing work as possible in the database isn't the correct way to achieve high performance.

Using stored procedures also allows for (in theory) analysis of the allowed transactions and conflicts between them, which can boost performance further (e.g., if you can guarantee that they can't conflict with each other ahead of time, or if you can reuse cached data because nothing could have changed, or you might be able to order data in epochs and guarantee that there are no deadlocks, or you might be able to periodically check to see if a record was particularly highly contended and rebalance or split, if commutativity is possible--all of these options and more have been explored in recent database research). It's often difficult for an application to guarantee this because several web servers may be contacting the database at once, and they don't all know what the other web servers are doing, and this is doubly true if you allow ad-hoc queries. Not knowing whether a transaction will finish quickly (as is the case if an application is allowed to hold locks) also greatly increases chances of contention and/or disconnection, which can lead to further issues.

To be doubly clear--I think stored procedures are a massive pain in the ass and there are lots of good reasons to do exactly what you're proposing above. But performance is not one of them.


I wrote https://news.ycombinator.com/item?id=11989138 while you were writing that, and addressed your point there.

Performance and scalability are very different things. Stored procedures are great for performance, but bad for scalability.


Please see my response there. It's not as simple as you're implying; often your throughput suffers if you can't handle requests fast enough. MICA, for instance, injects packets directly into L3 to avoid cache misses; it's important that MICA handle these requests as fast as possible because otherwise the L3 cache will fill up, the queues will start to back up, and ultimately they'll put backpressure on incoming packets and slow down the whole system (greatly decreasing throughput). The fact that it has tail latencies measured in microseconds is good on its own, but its real benefit is its effect on throughput!


I believe poor database performance is often caused by people who are new to the subject writing applications in such a way that they run into the N+1 query problem.


I think we're in agreement writing applications in the database isn't a good idea.

By "empirical proof", I mean that I have never seen anyone conclusively prove that switching data access from stored procedures to ad hoc SQL through an application was the cause of performance improvement. I've seen proof that removing complex application logic from the database fixed a problem, or that better indexing improved performance, but I've never seen anyone prove that removing stored procedures alone fixed the problem.

Further, there's nothing in your proposed solution of scaling out with readable replicas that precludes the use of stored procedures for data access. As a huge fan of both databases and separation of concerns, stored procedures make tremendous sense. They let a domain expert tune the database as needed. Even going so far as allowing that expert to re-write queries, provide optimizer hints, or even change the physical data model for better performance without ever changing application code.

Although I may or may not be competent, I have scaled "stuff" in a database. But I do appreciate you getting a vague ad hominem into the first sentence. Bravo.

The typical bottlenecks that I found were almost always IO - either through crappy storage, crappy indexing, or some combination of the two. Modern databases typically aren't bottlenecked by the lock manager.

I'd also agree that moving work out to a NoSQL database is particularly tricky. For three years I maintained the .NET Riak client and helped developers make better decisions when they were considering moving away from an RDBMS.


Scalability IS NOT performance.

I like to say that scalability is like a Mack truck. Good if you have to move a lot of stuff, but not necessarily the best tool for getting your groceries. Moving logic from stored procedures to the application adds latency and network traffic. This is never going to be good for how fast you process an individual request. However, it can let your system handle more requests per second.


That's only true if there aren't resources taken up by your in-flight transactions, and if those resources don't exceed the cost of just running the logic in-place. Otherwise, you're not sacrificing latency for throughput, you're just sacrificing latency and throughput. Unless you have a very heavy application-level thing to do that doesn't require periodic new access to data, that's usually not the case, especially if you're trying to process requests at line rate (NIC queues will get exhausted just trying to hold all the pending requests in memory, and you'll start to suffer from lock manager contention, page buffer swapping, etc.).


True.

But my experience with Oracle specifically is that handling connections is just general overhead, while the real failure modes have to do with overloading random latches somewhere deep inside of the system. I also found that logic in stored procedures caused us to hit limits faster than leaving it in the application.

The specific case I saw this in was a simple set of queries to test if you were in an A/B test, and if not to assign you to a random variant. After I left my application logic was moved into a complex stored procedure, and they then had scalability limitations that they didn't have before. For political reasons they declared success and ran fewer A/B tests...

(I've used a lot of other databases as well, but Oracle is the only one I've really pushed to its scalability limits.)


The database is usually the bottleneck because of IO, not because of load on the actual database in my experience. Assuming you know how to design and index a database.


That's a valid point. Seeing as I don't build anything that has to scale well I can't really comment on this, but I don't think it scales too badly. With stored procedures it's more or less just your model in the database, not your whole application logic. Using a database to replace your controllers wouldn't scale so well, but replacing fat models would still scale fine.


It significantly depends on the application, but it shouldn't be more than a factor of 5-10 or so...

A factor of 5-10 can matter. A lot.


Many applications don't need Facebook/Twitter scale and will never have this problem.


How do you manage version-control on stored procedures? Can they be checked in with the rest of the application logic?


That's the way I've done things in the past -- save stored procs in a file in a project repo, along with deploy and rollback scripts.


In Microsoft land we have SSDT and stored procedures live in source control right next to the schema and any CLR functions. The whole thing is built, compile-time checked, and deployed. It's all pretty wonderful.


I myself am wary of stored procedures except in very specific and uncommon circumstances. That said, you could absolutely version control your stored procedures by creating them from within database migration files that are version controlled by default.
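
A minimal sketch of that migration approach, assuming Alembic and PostgreSQL; the revision ids and the function body are invented:

    # versions/xxxx_add_active_user_count.py
    from alembic import op

    revision = "a1b2c3d4e5f6"
    down_revision = None

    def upgrade():
        op.execute("""
            CREATE OR REPLACE FUNCTION active_user_count() RETURNS bigint AS $$
                SELECT count(*) FROM users WHERE active;
            $$ LANGUAGE sql STABLE;
        """)

    def downgrade():
        op.execute("DROP FUNCTION IF EXISTS active_user_count()")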


What is the source of the wariness?


Don't know about the parent, but I always felt that stored procedures end up incorporating a good share of the business logic, extracting it from the main code of the application. This creates a messy situation in which you have your business logic split between two completely separate and different systems, one of which (the stored procedures) is much harder to read, write, maintain and test. That said, I also deeply hate ORMs; I much prefer to use a thinner layer to interface with a database (query builders and row mappers).


That is normally considered a good thing; otherwise different applications accessing the same data have to roll their own -- do you really want multiple versions of biz logic?

e.g. a large organization like a big telco may have multiple applications that update customer records.


Should they do that through a single service api instead of directly hitting the database?


Yeah, the proper way to be doing that is via a middleware API that can be maintained and versioned by its own team, rather than coupling all these applications to a very specific database. If you have 4 apps talking to the same Oracle database, and you decide to ditch the license, that's going to be way harder to deal with than if you just had one API being maintained by a team whose job is to know databases.


Middleware is the normal term.


I also do think putting too much logic in stored procs is a danger, and was/is a scourge for some shops. But I was curious whether that is the primary objection.


I try to avoid anything but mapping or querying data in sprocs, but sometimes they are a better place to put logic that is needed to ensure consistency. Your database will outlive your application, or you will eventually need to integrate data from a third-party system; as a result, your database engine is always the last line of defense to protect your data, and if that means using a sproc then so be it. I'll admit, I've written some business logic into stored functions in PostgreSQL, but it's mostly a background job to update cached data, because it is much faster to do it in the database than pulling a bunch of records down to a service and sending them back up.


It's fat models vs thin models, just with fat database vs thin database. You have to be careful.


I am working on an app where the original developer has followed a "thin models fat controllers philosophy". I have never seen this referred to as a good thing anywhere else (and Django works far better the other way around). Is there any information on why this might be a good idea / benefits to this approach?


How about the fact that in MySQL you need root access to add / drop stored procedures? So my app's installer requires the regular user to have admin credentials to MySQL instead of just creating a database for them?

There is hardly any reason to use stored procedures anyway, when your favorite MySQL library supports multiquery.


Yes, there are tools to do this. It varies by database server and may need custom integration with your deployment system but they definitely exist.

The alternative is the old StoredProc_v1, StoredProc_v2 etc in the database and have the application check what's available. This allows you to roll the parts independently.

The big advantage with stored procedures is they can benefit from the DBMS caching and applying additional optimisations (depending on the DBMS of course).


Yes. Data warehousing projects I have worked on have lots of stored procedures in with the code.

One way that works is to tag your code and the stored procedures that go with it simultaneously. You really want development to run hand in hand with stored procedure development, not as two parallel processes.


I have found Sqitch (http://sqitch.org) to be a useful solution.


> 30 lines of Perl to wrap DBI

In Java it's even easier with JPA @NamedStoredProcedureQuery annotations

http://www.thoughts-on-java.org/call-stored-procedures-jpa/


But once you have run the stored proc how do you display it to the user? How do you get the data from the UI to the stored proc to execute? You write some code or it happens by magic? If you write some code, then you have just written an ORM.


It's overly reductive to consider any code that interacts with a database an ORM. Broadly speaking, the big claims ORMs have made are 1) the ability to automatically map between an object model and a relational data model and 2) portability between database vendors. #1 falls apart when the object model becomes complex enough to suffer from object-relational impedance mismatch. #2 may work for smaller or simpler applications, but you're generally going to have to take advantage of vendor-specific database features to achieve optimal performance with applications that have anything more than trivial scale or performance requirements. Going with stored procedures generally means giving up on both #1 and #2 in exchange for a more flexible design (relational model can be very different from your object model) and higher performance (at the cost of having to hand-write all your own SQL, of course).


> If you write some code, then you have just written an ORM.

What? No, I've written code to query a database. Not an Object-Relational Mapper.

  my $q = $db->prepare("get_my_stuff_by_name(?);");
  $q->execute($name);
or with fancy metaprogramming:

  use DBMagicStuff 'postgres://…';
  get_my_stuff_by_name($name);
That's not really an ORM.


You are both partially right. An ORM deals with the black magic of connections, parameters, transactions, etc. If you don't use an ORM, then you still have to deal with those semantics. But I do agree that you aren't writing an ORM.


> connections, parameters

The database driver takes care of that, not the ORM.

(It's also not black magic.)

> transactions

Huh? That's a database feature. BEGIN/COMMIT/ROLLBACK are ANSI SQL.

ORM is an _Object Relational Mapper_. You're confusing it with a database driver (or, in the case of transactions, just a database in general).


In Go I only need to annotate the struct fields with column names to have them load automagically from the DB using just SQL and a little bit of tooling (sqlx). 95% of the convenience of an ORM, none of the problems.


That's because sqlx (and the Go foundation it builds on) are already a basic ORM.


You could argue that anything that interfaces with a database is an ORM, but it is certainly no ORM in the Hibernate/ActiveRecord/EntityFramework sense.

It does very little "magic" and you're in full control of the SQL from the start.


Sure, let's say an ORM is anything capable of turning normal app code into SQL statements by itself.

What do ORMs do that's magic? What are the problems? Are you not in full control of them? It's your code after all that's using them.

I really don't get how they suddenly force any issues on you that you don't create yourself. The output SQL doesn't really matter if it gets the job done, and in cases where it does matter you still have full control to write it yourself, and even use the same ORM to save you time in executing it.


>What do ORMs do that's magic? What are the problems? Are you not in full control of them? It's your code after all that's using them.

http://www.joelonsoftware.com/articles/LeakyAbstractions.htm...


Thanks for these questions. Initially it wasn't clear to me what the difference was either, but now I understand the distinction a little more.

The ORMs discussed in the article map a whole object model to a whole relational model. The idea is that a single piece of primitive data, say a bool flag on an entity, will have one location in the relational model, and one location in the object model, and the mapper's job is to get the data from one to the other and back again.

Hibernate will guarantee things about the object model, for example within a particular session, the entity A which maps to a record with id 1 in the database will be represented by the same object instance, so even if A is referenced indirectly through two different routes, eg, obj.foo.a and obj.bar.a, it would reference the same instance of A. The instance will only be fetched once, can be mutated through either route, and then saved back to the database using the mapping.
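
A toy sketch of that identity-map idea in Python (not Hibernate's actual implementation): within one session, a given id always resolves to the same in-memory instance, whichever route reaches it.

    class Session:
        def __init__(self, fetch_row):
            self._fetch_row = fetch_row   # function: id -> freshly loaded object
            self._instances = {}

        def get(self, entity_id):
            if entity_id not in self._instances:
                self._instances[entity_id] = self._fetch_row(entity_id)
            return self._instances[entity_id]

    session = Session(fetch_row=lambda i: {"id": i})
    assert session.get(1) is session.get(1)  # same instance via any route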

That's the sort of thing which is possible with a single mapping, but the abstraction breaks down outside that mapping. For example, questions like this on StackOverflow:

http://stackoverflow.com/questions/2470129/how-can-one-fetch...

He wants to select part of a database record, not the whole record. The mapping will only be defined for whole entities, so he has to use the complex projection syntax instead of a normal fetch. The top answer suggests mapping the data into a User object anyway, which will result in his code having some User objects with all their properties set, and some with just a subset.

This puts pressure on the programmer to stick to the single mapping. Partial fetches like this could lead to an object which represents a particular user which may or may not still be the same instance, and it may or may not support being saved back into the database.

> Are you not in full control of them?

So, yes, you are in full control, but the benefits come when there is a single mapping between the relational model and the object model. In practice, there isn't a single object model, let alone a single mapping.


The issue with it is that SQL is a different kind of language: when using it we don't tell the database what to do; instead we say what we want. That does not translate well.

The second issue is that tables and their relations don't map too well to objects and their relations.


> If you're not using an ORM, then you ultimately end up writing one.

Only if you assume that Object Relational Mapping is the only way to express type checked database access at the application layer.

ORMs are broken by design; there's no lossless bridge between relational theory and OO -- only inconsistent, error-prone, complex approximations.

Instead of bringing relational theory to the programming language, ORMs try to emulate OO behavior/method dispatch/etc in a world of transactions, deadlocks requiring retry of previous operations, on-disk serialization of data, etc.

In 2016, it's ORMs that are outdated.


This. Ye Gods, this.

A project with nontrivial data relationships should start with the data model, not with a code model superimposed on data. ORM leads to bad data models and performance problems that are unfixable.


If only people would listen to this advice. The number of times I have started a project by trying to think about the data model only to get told "just build the models, we can change it later" drives me insane. Code is easy to change, your data model is not once it is in production.


Only if you use a single, concrete data model. If your domain supports the use of something like event sourcing, your "data model" is surprisingly easy to change. Or in this case, is = are, and data model = data models, and they're actually derived from your domain events, which can be replayed. At any time.


testify Brother/Sister Testify


Yep, for years my bias has been trending to the relational method of modeling data rather than the object oriented way.

The relational model is more flexible -- you can slice data up in any number of ways, rather than through a fixed hierarchy.

The relational model was developed by Date/Codd to address the needless complexity and brittle nature of applications built around the network or hierarchical databases of the 60s/70s, which are in many ways analogous to today's OO relationships.


More like relational databases are outdated. There was one truly committed effort to bring the database model into the programming language - EJB - and there's a reason it's now a curse word. The successful systems of the past ten years have been those that moved away from the relational model, using simple datastores controlled by application code.


I'd bet that there is more code written in the past 10 years that uses SQL, than that which uses "simple datastores". You don't hear it much, because it just works, all the problems are well known (and so are the various solutions & workarounds), and so it's boring. Whereas, "Company X has adopted No-SQL solution Y for its project Z! That means it works!" makes headlines.


Probably true as far as it goes. But I think that SQL is a smaller percentage of systems now than it was 10 years ago, and that the trend continues to be in that direction.


Only if you believe type theory and typed programming languages are outdated.

I'm not impressed with NoSQL; they have no answer to the limitations of relational theory beyond simply abandoning formalism entirely.


I believe type theory and typed programming languages are very valuable. But if you keep your logic in code and your store as a dumb store then the store doesn't need to know about them.


Marshalling support is useful. SQL result -> struct and struct -> SQL insert code is repetitive to write. Getting fancier than that may be overkill.

Marshalling in general needs more compile-time support. Kludges such as Google protocol buffer preprocessors are a fast but clunky way to do it. It would be useful if languages could be given a reference to an SQL CREATE TABLE and could use that information usefully. Field names, type information, and enum values should come from the CREATE TABLE info.
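
A minimal sketch of deriving that mapping from the live schema at runtime (a stand-in for the compile-time support wished for above), assuming psycopg2 and an invented users table:

    from collections import namedtuple
    import psycopg2

    conn = psycopg2.connect("dbname=appdb")

    def record_type_for(table):
        # Read column names from information_schema and build a record type,
        # so query results unpack into named fields without hand-written glue.
        with conn.cursor() as cur:
            cur.execute(
                "SELECT column_name FROM information_schema.columns "
                "WHERE table_name = %s ORDER BY ordinal_position",
                (table,))
            cols = [r[0] for r in cur.fetchall()]
        return namedtuple(table.capitalize(), cols)

    User = record_type_for("users")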


This is the sweet spot of ORMs. A team lead early in my career wrote a tool that read the database schema and generated all the marshaling code at the application layer. We got a degree of type safety and relief from a lot of boilerplate code but no promises of automatically generating SQL or abstracting away the database schema, etc. It worked so well that I wrote one from scratch in another language a decade later for another database-backed application.


We released a tool, xo, that generates safe and idiomatic Go code from a database schema, without making any use of an ORM. I have been meaning to add support for generating code for other languages but have not yet had time to do that. That said, it is very easy and very straightforward to do so.

Check it out: https://github.com/knq/xo


xo is great, we've used it a few times for existing databases, great timesaver. Thanks!


Despite not being a huge Go fan myself, I do appreciate this aspect of Go's database/sql and sqlx packages.


One of the problems with learning a web framework is that there are a lot of them, and the effort it takes to find out which one best serves your needs can seem to be on par with rolling your own, without the assurance that you'll actually get your needs met. And I suppose the ORM issue faces the same decision-making problem.

Hand-rolled crap is at least crap that was rolled to suit the problem you're facing, and you'll do it using tools you understand. You're not learning "any" framework, you're learning a bit of them all and hoping that's enough to point you in the right direction. Sometimes it's not, and then you're stuck with a hammer for screws and someone else's code that you don't understand.


I see learning a framework as less about trying to get all the things you need, and more about 'borrowing' someone else's architectural thinking. When you encounter something the framework doesn't provide, you probably have all the architectural tools available to you to create something good that fits in well with the rest of the framework. Also, when you need to move to another framework, you have a base of experience you can draw from to help you learn much faster.


> An ORM provides type checking at your application layer

This. When composing complex queries, we really want type checking and SQL injection safety.


Don't parameterized queries provide all the safety one might need?


Parameterized queries are to prevent injection. They don't tell you, at compile time, when you're building an invalid/unrepresentable SQL query.
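
A small demonstration of the distinction, in Python with sqlite3: the placeholder stops injection, but a typo in the SQL itself only surfaces at runtime.

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")

    # Parameterization makes this injection-safe...
    conn.execute("SELECT name FROM users WHERE id = ?", (1,))

    # ...but a misspelled query still compiles; it fails only when it runs:
    try:
        conn.execute("SELEC name FROM users WHERE id = ?", (1,))
    except sqlite3.OperationalError as err:
        print("caught only at runtime:", err)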


> They don't tell you, at compile time, when you're building an invalid/unrepresentable SQL query.

Sure they do, with a proper library. Haskell's persistent library does this very well.


The Persistent library is an ORM. It provides syntax and safety for SQL. For example:

    delete $ from $ \t -> do
        where_ $ (t ^. TutorialAuthor) ==.
                 (sub_select $ from $ \a -> do
                     where_ (a ^. AuthorEmail ==. val "anne@example.com")
                     return (a ^. AuthorId))
    tuts <- select $ from $ \t -> do
        where_ (t ^. TutorialSchool !=. val True)
        return (t ^. TutorialTitle)
looks a lot like something you might get with a good ORM. It might be more general and slightly different, but Haskell usually does things slightly differently.


Looks like Persistent has no support for table joins... bit of a showstopper, but there's a companion library[1] which looks quite powerful (similar to Scala's Slick wrt type safety and composability).

Can't say I care for the syntax much (`^.`, `where_`, etc.) but I guess that's par for the course in Haskell given the open world (i.e. module system leaves something to be desired relative to OCaml and SML).

[1] http://www.yesodweb.com/book/sql-joins#sql-joins_esqueleto


Haskell doesn't have OO, so it can hardly be an ORM.

The library in question is modeling relational theory at the language/type level. That's not remotely like an ORM.


Haskell does OO just fine. It doesn't do nominal subtyping, but that's really a misfeature in OO anyway.

That said, persistent doesn't do OO.


Data encapsulation (e.g. via ADTs) does not equal "OO".


I would argue that OO is polymorphism + innate object identity. Though the latter only matters if you have mutability; but then, whether OO without mutability is "true OO" is one of those deeply philosophical questions.


I find the dividing line in formalism; we can always find similarities with language-level features from OO, but OO is by definition an informal set of ideas about how programming is best done and understood, rooted deeply in empiricism, not formal theory and analysis.

If I can model your type/language model as a higher-kinded dependently typed lambda calculus (e.g. using Boehm-Berarducci encoding for non-turing complete recursive ADT definitions), then it's not OO -- it's functional programming, rediscovered.


Existential types are kind of like OO interfaces (allow storing a collection of things that only have in common that they implement the interface), and typeable can be used for casting them back to their base types.


Yes, but that library is using Haskell functions to compose SQL strings at run-time. I should have clarified that by interpolation I meant manually writing the string you send to the DBMS.


No, but your unit tests should catch bad queries and then you're good to go.


You can use that argument against any kind of static analysis/checking.


Unfortunately, ORMs sometimes undermine safety for the sake of convenience. Even ORMs which use parameterized queries (which any sane framework will) may be vulnerable if they build dynamic queries using string concatenation.


No ORM will pass queries using string concatenation? (Right? I know nothing about ORMs written in PHP by beginners who wouldn't know SQL if it jumped up and bit them in the ass... But surely no half-decent ORM would concatenate strings to pass arguments?)

Anyway. Type-safe queries like QueryDSL are extremely nice to work with.

But as was mentioned, it all boils down to this: There IS NO silver bullet.

You have to learn SQL, and then the ORM tool. And the abstractions will leak, and you will be pissed off sometimes, but It Is Worth It because you will save a lot of development time.

There are some pain points. Large joins, where you'd need to eager-fetch a few one-to-many "leaves" at the end of a huge and complex join, are a bit of a pain, as the ORM will need to split the joins for efficiency, and there is no obvious way for the user of the ORM to reuse the complex part between the calls.

(Like how you could use a temporary table when using Oracle, for instance. Nothing should stop an ORM from using that as a join strategy though, come to think of it...)

The other obstruction is mindset. To use an ORM efficiently, the developer needs to step away from the data-layer model.

There is no separate data access layer when working with persistent objects. The objects represent the model, and are simply persistent. If they are changed, the change stays.

Preferably, the model should be available to the whole application, and the objects should be changed and used where it is suitable, not restricting access based on the fact that they will trigger a database access. (The important thing should be to keep the model and its rules together, in some sort of abstraction, not that some things happen to write to a database.)


> But surely no half-decent ORM would concatenate strings to pass arguments?

I wish... Drupal had an issue in the ORM itself. https://www.drupal.org/SA-CORE-2014-005


>And the abstractions will leak, and you will be pissed of sometimes, but It Is Worth It because you will save a lot of development time.

Only in some languages. As I mentioned in another comment, in Go I'm very productive using simple database/sql + sqlx. In C# I could die writing mapping boilerplate before getting any business logic done.


Does database/sql check syntax at compile time? Or do you just try to unit-test your way out of that?


No, but even though Entity Framework (as an example) does check types at compile time it doesn't prevent you from making mistakes that will cause runtime errors (oh the number of times I have run into issues trying to do simple things like ThingsToCheck(x => x.Created <= DateTime.Now.AddHours(-1)) which you would think works, after all it compiles, but it will just throw a runtime error because LINQ to Entities doesn't know how to parse the expression tree).

Type-safe ORMs don't preclude integration testing, end of story.


No ORM is building query bits from user input with string concat.


I don't need a full O/RM to provide query composition, type checking and protection against SQL injection. SQLAlchemy Core provides all of these (you can use the ORM if you really want, and I have done so, but when I am pulling in 16K records the whole object mapping layer becomes expensive when a simple tuple works fine).


The problem is that doing queries over relational databases in a type-safe way requires either 1) some form of structural typing (for projections), or 2) ignoring projections altogether, which brings a slew of perf problems to the table.

Ironically, this means that OCaml is probably the best [reasonably popular] language to use an ORM in.


> 2. If you're not using an ORM, then you ultimately end up writing one.

There are other ways, like using "event sourcing", or servers like postgrest [0] that give you a REST API on top of your database.

ORMs are mainly useful if you have objects to create, read, update and delete.

[0] http://postgrest.com/


>If you're not using an ORM, then you ultimately end up writing one. And doing a far worse job than the people who focus on that for a living.

I don't understand why people keep repeating this myth. I do not use an ORM. I did not write one, or anything resembling one. I have absolutely no problems with accessing my database from my applications.


Same here, and after heavy testing, the old-school simple approach is much faster. With an ORM, you never know how many lines of code are involved.


> 2. If you're not using an ORM, then you ultimately end up writing one.

Why? That presumes that you are using objects, doesn't it?


It's 2016, and this article from 2014 has very valid points.

Maybe read the article first? The headline isn't a complete summary. You wouldn't have written points 2 and 3 otherwise - have you read what he wrote about the DB schema in the article?


I like to say "ORMs make easy things easier and hard things harder".

What I mean by that is: simple CRUD operations are much easier with an ORM. The hard things, meaning complex queries that need more than one join, you are probably better off writing yourself.

In the end I prefer to do inserts, updates and deletes with an ORM (or some other database abstraction tool), but most SELECTs I write myself, fetching exactly what I need and, if needed, mapping the result to objects manually.


I pretty much do the same: the CUD goes through the ORM, along with some simple selects in C# Linq, but as soon as the query is a bit more complicated I crack out the SQL. It does not have to be all or nothing.


Do you use metadata from the db driver to do type mappings?


Not sure what you're asking about. Do you mean whether I define the database structure in my code? The answer is: yes. I don't know of any database abstraction library that would provide you with convenience methods to query the database with type checking and such without giving it the schema.


Not this again. Why is this coming up at all in 2016? There isn't even a valid debate here.

ORMs are a tool, that's it. The relational operations of SQL and the object-oriented (or functional) logic of your application code are usually very different, and it's nice to have a mapper that lets you interact with your app's language while it takes care of automatically mapping it to SQL.

For 99% of database ops where it's pulling some stuff out, editing some fields and then saving it back, an ORM will save you a massive amount of time with better safety, security and performance. Even complicated queries/mappings/procedures run pretty well and you can always write raw SQL when you need it. Modern ORMs will even let you execute custom SQL and still give you conveniences like easy parameterization and hydrating the results back as objects.

Maybe I've been spoiled by the .NET ecosystem with great ORMs like NHibernate, EntityFramework, and Dapper (along with C# features like Linq), but it seems like most of these complaints are from people who use shitty ORMs or can't comprehend that it's just a tool they can choose to use or skip at a very granular, method-by-method level. There's absolutely no reason for any extreme here.


> ORMs are a tool, that's it. The relational operations of SQL and the object-oriented (or functional) logic of your application code are usually very different, and it's nice to have a mapper that lets you interact with your app's language while it takes care of automatically mapping it to SQL.

Functional and relational models work really well together.

At Standard Chartered we even went so far as to add relations as a datatype to our Haskell-like language. It's a charm; and comparable for me to my experience first going from C-style arrays only to eg Python's dicts.

Only that this time the dict-style data organization was the `before'. Dicts are essentially equivalent to hierarchical databases, a model whose flaws relational databases were invented to address.


> At Standard Chartered we even went so far as to add relations as a datatype to our Haskell-like language. It's a charm; and comparable for me to my experience first going from C-style arrays only to eg Python's dicts.

Have you written about this? It sounds really interesting.


I haven't written about this yet, alas. There's a bit of information at https://www.reddit.com/r/haskell/comments/2u0380/slides_from...


Could you provide some examples?


Sorry, lacking the time for that. But basically all the advantages of relational over hierarchical databases apply for relations vs dicts as datastructures.


My experience with Hibernate, Entity Framework, NHibernate, and lastly Dapper has basically left me thinking Dapper is all you'll ever want. I can't imagine a use case where I'd rather opt for NHibernate or EF at the moment.

(I might add, Dapper in combination with C#6 even gives me enough type safety to be happy. String interpolation and the nameof operator complement Dapper's DTO approach nicely.)

Since you seem to have similar experiences, I'm curious: when would you ditch Dapper for the others?


> basically left me thinking Dapper is all you'll ever want

Same here. And I don't know anybody who has worked on large enterprise EF systems that hasn't come to roughly the same conclusion. The type safety EF offers is extremely nice to have, no doubt, but the problem is it makes it so easy to create performance problems that _every_ system ends up with them. Probably half of my billable hours in the last few years has been addressing EF performance problems.

Dapper, with a few extensions, can give you type safety for 80% of your typical queries, and the rest can easily be done in stored procedures or with (my preference) Dapper's SqlBuilder. And using SSDT instead of EF code-first, you can easily manage your schema, views, stored procedures, indices, etc. in a version-controlled way, simplifying performance management and getting static analysis on your SQL while still keeping straightforward and configurable migrations.


Type safety (no strings) and just as fast as Dapper (actually faster in some cases):

https://github.com/linq2db/linq2db


Do you use a Dapper extension for populating and persisting entities? (Last I looked, as I recall, this wasn't default functionality.)


Dapper does have very low-level basic functionality to insert or update entities. You need to write the SQL yourself so this gets very painful very quickly for complicated models where you want to persist child objects.

That someone would say "Dapper is all they need" leads me to think they only work on very small projects. You will drown in large projects if you use Dapper everywhere. Better to use a high level ORM like EntityFramework and sprinkle in some Dapper in performance critical code.

The only time EF creates performance problems is when the developer doesn't understand the concepts behind SQL. This isn't an ORM problem, it's a training problem.


As I see it, if your model is trivially mapped, you could probably go with something like OrmLite on top of Dapper. But if you need to custom-map your model anyway, you might just as well write SQL to map it as use some custom DSL mapping thing for the ORM in question.

And I don't use Dapper for performance; I actually like its transparency and friction-free interface to the db. It lets me get things done without workarounds. And most importantly, it lets me use SSMS with a REPL-like flow, copying SQL verbatim between VS and SSMS.


I think ORMs are a great tool to get something off the ground quickly. Like with most tools you will hit a point where they make things more difficult and then it's probably time to switch to SQL only or mix SQL with ORM especially for performance critical queries. In most applications I have seen the ORM provided a lot of value but there were cases where it needed to be augmented with raw SQL.

I never understand why people want one or the other exclusively. Both have their place.

I think stored procedures would be very useful if they integrated better with source control and the app code. Maybe we need an ORM for stored procedures that automatically creates stored procedures from the project code?


I completely agree.

90% of the queries in my app are no more complex than selecting from a table with a simple condition. I definitely find

  users = User.where(has_foo: true).limit(10)
to be a lot more readable than

  rows = connection.exec_query("SELECT * FROM users WHERE has_foo=true LIMIT 10")
  users = rows.map { |row| User.build(row) }
(And that's an example with no user-provided input)

Likewise, any app of sufficient size seems to end up with a handful of queries that really are a pain (or impossible) to cram into an ORM. Trying to do so would result in an unreadable mess, and using raw sql improves the situation immensely.


Here's the thing: anyone who knows SQL will find the second one readable, and only Ruby programmers who have used ActiveRecord will know what the first one does.


I've never written a line of Ruby in my life and I know what that code does, because it's essentially identical to what you'd write in C# LINQ to query with Entity Framework, LINQ2SQL, etc.

    var users = Users.Where(u => u.HasFoo).Take(10);


Unfortunately that argument works both ways, and we are saddled with a majority of mediocre developers who know an ORM but can barely read trivial SQL and can't really write it at all.


> and only Ruby programmers who have used ActiveRecord will know what the first one does.

do you really think the first one is /that/ hard to understand?


In my experience, programmers are often very fuzzy on all of the types involved in such libraries. I don't even mean "strong typing", but just the operations that they can perform with the various chunks of that expression. This is often complicated by the fact that the library itself often has bizarre limitations that have more to do with its internal implementation than with SQL. It's rare for me to see anyone use a library like that with fluency. Which seems to me to be a valid way of saying that, yes, in practice, it does seem to be harder to understand the first version in Ruby than you might initially think. Using it isn't particularly harder, but understanding it is.

Also note this is an observation of mine, not a logical argument, so trying to logically argue about why it shouldn't be harder would be arguing a point I'm not making. As much as I don't like people bashing strings together programmatically to generate SQL queries due to the ease of screwing it up, I observe that a lot more programmers are capable of this (even if they screw up the security) than seem to understand how to use things like ActiveRecord equally fluently. YMMV.


Really? connection.exec_, .map, .build – there's a lot more non-SQL going on in the second example than the first. The first may be syntactically far from SQL, but the use of familiar vocabulary makes it pretty understandable from a SQL point-of-view.


And what are the connection and allocation semantics of the first? (Yes, the second way is more explicit.)


What's your concern here?

Readability by maintenance programmers? How many Rails programmers will not know ActiveRecord? How many are going to be better at maintaining SQL than they are at maintaining ActiveRecord queries?

Readability by business people who don't know any programming languages? The first is a lot clearer than the second IMO.


Not true. They're both readable, but trying to do anything more interesting becomes painful, and this is where having a common tool for the job helps.

It's like regexps - imagine if there were 30-40 different implementations instead of 2 or 3.


For me it's the opposite. I prefer "rows = connection.exec_query("SELECT * FROM users WHERE has_foo=true LIMIT 10")", because I can read and understand the query. By reading it I can already tell whether it is a good query, etc., while "User.where(has_foo: true).limit(10)" tells me nothing.


Read and understand, yes, the first is slightly simpler. Modify? Now I want to add an 'and' to the where clause. What's easier to figure out how to modify, for a random developer? I'd contend the latter, and that that is true for any modification you have to make.


In this example, it seems like it could be an issue that you take everything and then only use the first 10, instead of only taking the first 10 to begin with. Is there a way to not make it take them all, like putting the limit parameter in where()?


Rails doesn't execute the query until you actually use it in some way, by which point it knows you've added a limit clause.


No professional developer should EVER use Select * in working code.


At least it is an obvious fix. :) The AR example/solution would tend to just assume you want all the fields. To those unfamiliar with how AR behaves, it isn't obvious.


It's quite trivial to keep stored procedure code in source control with the app code or in its own repo. I am always mystified when I hear the complaint that this is difficult because in my experience it's no more difficult than managing any other code in a SCM repo.


My biggest concern has always been around zero downtime deployment with stored procedures. Your database has to work with both the old and new versions of web code during the deploy in case of rollback to make that process work and that always struck me as the weird edge case that makes things tough. If a procedure never changes or remains backward compatible then it should still be fine though.


Give the new proc a different name (e.g. append a version number). On the n+1 deployment, clean up the old proc.


And so you've reinvented the version control system on top of your deployment system, on top of your version control system.

That also sounds fun if you use the proc from more than one location...


Zero downtime deployment with stored procedures should not be confused with "version control".

It's not a version control scheme; once deployed the procs are never updated.


Yea, I suppose backwards compatibility is really the best policy there. If a proc is going to be updated to be more efficient it's one thing. If it's going to change what it does then we just need a new proc.


Yep. (If there are schema changes too, zero downtime & backward compatibility are difficult to pull off, but that's a chore for any strategy.)


In my limited experience with the enterprise world, the problem with using stored procedures was that the devs had no control over the database. We could have come up with some nicer solutions to certain problems, but no one wanted to deal with the bureaucracy necessary to create and maintain parts of our applications within the database.


Even if devs had control, it's difficult to know which version is actually on the database without pulling it up (this is a timesink), especially if you have multiple environments/configurations.

AFAIK there's no way to diff or view the history of stored procs in the database (and certainly not against your VCS), so large companies usually put comment blocks at the top of each one.


> AFAIK there's no way to diff or view the history of stored procs in the database (and certainly not against your VCS), so large companies usually put comment blocks at the top of each one.

Sqitch: go check it out. Versioning database code with linear migrations always has this issue: you add a column, and where's your diff on that without going through your list of migrations? Stored procedures are no different, and the sooner people use better tools the happier they will be with maintaining their database migrations.


Also Java and C# developers often produce bad SQL procedures because they tend to think procedurally about problems.


I think if they were better integrated with the IDE and easier to debug then they would probably write better stored procedures.


And easier to unit-test, and profile, and had a better standard library available, and...

SQL is a terrible general-purpose programming language by the standards of today, which makes it a terrible way to express business logic.


As a C# dev myself, fuck stored procedures, to be quite honest.


Please elaborate


1. They are a pain in the ass to keep track of in version control. 2. They are hard to debug, especially when lots of business logic gets dumped into them.


Totally agree. It's hard to do code and stored procedures right with the same level of quality. The tools are different and the thinking is different.


What would be the better thought process for writing sql procedures? It seems like PL/SQL is fairly procedural.


The rule of thumb would be, use SQL statements to model and execute business logic, not cursors/variables. While the execution is still procedural, the goal is to make all the decisions 'at once' rather than explicitly looping. Behind the scenes a similar set of steps are taken, but the dbms is free to perform the operations as it sees fit.


This is a great point.

In a smaller company, those barriers aren't really meaningful. In an enterprise, you're easily adding a week or more to the change-control process to ship.

I think another factor in why database-focused solutions aren't popular in small companies is that MySQL historically hasn't been the best platform for that approach, and you need more expertise to scale the database.


It's not only easy to manage stored procs separately, it has huge advantages: you can play with your SQL in an IDE built for the task with luxuries like real-time error highlighting, profiling, debugging, and auto-complete, and in some cases you can even iterate on your code without restarting your application (just update the proc).


Just once I'd like to see someone write an opinion about ORMs without resorting to sweeping generalizations ("Vietnam of computer science") or making assumptions about how they are being used ("it wasn't a good fit for my use case so therefore it must never be").

Maybe, just maybe, it's possible that ORMs are useful for solving certain types of problems and less suitable for other types. If you work on web stuff or applications that are report-oriented where most of the time you're just fetching data (possibly with complex queries) and rendering it to display, then maybe ORMs aren't a good fit.

On the other hand if you work on client-side apps where your objects are backed by a database but are otherwise long-lived, then sometimes the other features (beyond SQL generation) that ORMs provide come in handy (tracking units of work over the object graph, maintaining an identity map for consistent access to objects, and providing change notifications when an object or collection of objects is manipulated).

If you've never needed any of these features that's perfectly fine. I've never needed to use a bulldozer either. But I'm not going around declaring bulldozers useless just because I've only ever needed a shovel.


Is this not the general sentiment about any piece of software? It was implemented to serve a purpose; if that purpose doesn't match a goal of yours, it is not relevant to you.


Indeed. It would be nice though, if people, having realized that a particular tool isn't useful for their particular goals, could still acknowledge that it could be useful in other situations, instead of blaming the tool itself.


I've come to love jOOQ, the Java library that I can write type-safe pure SQL in (and with code that is immediately understandable to anyone who knows SQL). It provides mechanisms to smooth the clash of Java/SQL worlds, but otherwise never assumes it is smarter than the wisdom accumulated through decades of database usage. I've come to appreciate many aspects of databases again that tools like Hibernate try to hide from me.

http://www.jooq.org/

I do still use Hibernate, probably because my usual framework of choice makes it so easy to, but anything above medium complexity goes through jOOQ nowadays.


I was intrigued by jOOQ, but the huge amount of generated code for internal metadata of the DB was a bit offputting. I don't want 4MB of generated classes for 3 very small tables.


Use the code generator's <inputSchema> or <excludes> flag or other means and you'll only get those 3 small tables generated...


Ah, thank you, this does the trick; I first tried to use my database name as inputSchema, but I had to use "public" in PG.


I see, yes. The database corresponds to jOOQ's Catalog, not the Schema (except in MySQL, which mixes up these concepts)


The thing I love about ActiveRecord is that it makes it easy, anywhere you want and at any level of the abstraction stack, to just toss in SQL fragments. This gets you the best of both worlds: the fluidity of being able to just define methods on model objects, and the ability to utilize database tech to the fullest. You can take any query and call .to_sql on it and it shows you exactly what it's passing to the database. I prefer Sequel's semantics, but ActiveRecord is an excellent workhorse that I have no problems relying on.


But how do you do:

    SELECT
      posts.*,
      (SELECT COUNT(1) FROM comments WHERE post_id = posts.id) AS comments_count
    FROM posts;
In ActiveRecord, without 1+N queries, or caching comments_count in a column somewhere?

Admittedly, that was not the best example. The last time I needed something more intertwined than a simple COUNT in a subquery, the answer was "give up and just use Arel." But at that point it is no longer quite ActiveRecord, but rather SQL without strings.

This is my biggest gripe against the Active Record pattern in general, as it ties its model too tightly to the underlying database. It is convenient for simple CRUD tasks, which may fit about 90% of use cases, but that's not the only thing the database is capable of.


I don't know about ActiveRecord, but with Django it's simply:

  Post.objects.all().annotate(Count('comments'))
Which produces (simplified, Django would actually explicitly select each column):

  SELECT posts.*, COUNT(comments.id) AS comments_count
  FROM posts
  LEFT OUTER JOIN comments ON (posts.id = comments.post_id) GROUP BY posts.id
Nice, easy and without 1+N queries.


That is actually very nice and elegant!


I've actually never used ActiveRecord myself, but when an ORM breaks down and it becomes hard to make certain queries I've had good luck with other ORMs by creating a view in the database and telling the ORM that it's a table. Obviously you can't insert into arbitrary views, but most are happy to give you an object back, with appropriate relationships to the actual database-writable objects.

I'm surprised there isn't more explicit support for this in more ORMs (things like not having 'save' methods on the ORMed classes).


That's really easy.

    Post.select("posts.*, count(comments.*) as comments_count").joins(:comments)


That is not the same query at all. You need to do a left join and a group by for it to become the same query. Without a left join, posts without comments won't show up in the result.


You're totally right. My example is an inner join. I will admit that doing outer joins in ActiveRecord is quite painful. Here's a more realistic example, which is not nearly as pretty:

  Post.select("posts.*, COUNT(comments.*) AS comments_count").joins("LEFT JOIN comments ON comments.post_id = posts.id").group("posts.id")


Thanks for the answer!

Unfortunately, what I needed to do back then was more complex than that (say, performing a query on top of a subquery, e.g. the "SELECT ... FROM (SELECT ...) AS a GROUP BY ... HAVING ..." sort of thing), which seems to be something ActiveRecord wasn't designed for.

In the end, I solve the problem by using Arel with an ActiveModel and unwrap all the data manually.


Wouldn't you need to do a group by in your query as well?


Yes


What's more, even with just bare prepared statements... how do I use dynamically built SQL queries and prepared statements together? And please don't just say "don't", at least not without telling me how to achieve what I need the proper way :)

For example, let's say you have a query that gets search results, and depending on whether the visitor is logged in or not you also may want to know whether a given search result happens to be a favourited item. The way I understand it, I would have to do for example:

    $query_text = 'blah';
    if (user is logged in)
    {
        $query_text .= 'SQL pertaining to favourites';
    }
    $stmt = $dbh->prepare($query_text);
    if (user is logged in)
    {
        $stmt->bindParam(':user_id', $user_id);
    }
Or am I missing something?


Just use separate statements. You should already know before that point whether or not the request is coming from a logged-in user, so to me even the if statements are redundant - have functions or methods that just take parameters for, and return the result set of, a single SQL query, and figure out which function to call and how valid the results are elsewhere:

    function getResultsForUser($DB, $user_id)
    {
      $query="search for user :user_id, SQL pertaining to favourites";   
      $stmt=$DB->prepare($query);
      $stmt->execute(["user_id"=>$user_id]);
      return $stmt->fetchAll(PDO::FETCH_ASSOC);
    }

    function getResultsForVisitor($DB)
    {
        $query="blah";
        $stmt=$DB->prepare($query);
        $stmt->execute();
        return $stmt->fetchAll(PDO::FETCH_ASSOC);
    }
Yes, it does mean more code, but that code will be much cleaner, more readable, more easily transportable (by not depending on a third party library) and less bug-prone.


Deciding between user or not is the most simple case though. The most monstrous function I have by far considers things such as

    - is the user logged in, and maybe even owner of the profile we're at?
    - are we looking at all nodes, or the subnodes/subtree (don't ask) of a node?
    - do we want to display tags/authors/sources?
    - are we filtering by tag/author?
    - what's the display mode: list or full?
There might be more too; that's from the top of my head... Yes, it's messy, but also kinda DRY. To "unroll" that would be even more monstrous, that can't be the way D:


This actually feels a lot like localization -- unrolling all the ifs is actually the best way to do it. It feels like you're repeating yourself, but having the whole sentence / query together as one unit gives you the ability to understand the whole context -- often times in a database context, you can omit parts that are useless if you get the full context.


Unrolling would mean a lot of permutations.. in my example 576 if I'm not mistaken. Understanding the context or dead SQL code isn't really a problem in my case, since the various query bits are mostly orthogonal to each other, it's that they're optional that makes it a bit complex.

But as I said in my other comment, I realized I can just remember the parameters/values as I build the query string, and then bind them all after creating the query. That way I can have my messy cake (that should feel so wrong but tastes so right) and eat it with a plate and a fork like a proper person.


Queries with and without user_id probably have different plans (e.g., table scan vs. index lookup) and need to be prepared separately. I've never seen an ORM that takes prepared statements and query planning seriously, generally they only emit trivial queries.


Since the queries only get re-used per connection IIRC, that's moot for my case (a personal, derpy CMS) anyway: they all just get used once per request, and anything loop-ish is admin-related and fast enough anyway.

What really bugged me, though, was the idea of having to repeat, twice, the decision logic that determines what the query ends up being... though I just realized that putting name/value pairs into an array while building the query string, and then binding them all once the query is built, is probably fine.


I haven't read all the comments and maybe somebody has already mentioned this, but here are my few cents:

I've worked with multiple ORM frameworks during my career, and I always reached the point where the developer needs to focus on framework internals instead of delivering new features.

Yes, usually most situations the developer didn't expect can be solved by configuring the ORM framework: changing some configuration, providing additional parameters or calling some methods. But what does that mean? You need to fully understand your framework if you want to be sure your application does exactly what you want and doesn't misbehave.

So next time you're thinking of grabbing an ORM to make things simpler, ask yourself whether you really have enough time to fully learn the framework.


I'm a bit surprised by these criticisms from someone using SQLAlchemy. I had the same opinions of ORMs in the past, until I learnt SQLAlchemy.

Partial objects, attribute creep, and foreign keys: defer the loading of columns [1] to suit your use-case. You can even have different mappings that deal with different subsets of columns depending on your situation.

Data retrieval: sqlalchemy is so good for querying that I've mostly stopped using sql for anything other than complex exploratory analysis work. Sure, you need to know sql to be effective, but whatevs - eg Window functions? Done [2]

Dual schema dangers: I understand the concern, but again, you can choose to map this as you please and I don't see what doing raw sql gains you here. You need to change the schema, if that affects your application, you need to change your application, if it doesn't, do you need to change your orm code? I routinely migrate my db and release changes to the orm code later.

Identities: I admit there are a couple of times I need to flush in my app outside of the normal lifecycle, which I don't like. With sqlalchemy, for the most part, you just connect objects and don't worry about the ids.

Transactions: Whatever happens you'll need some sort of transaction boundary in your code. Removing the orm doesn't gain you anything there, does it?

I've written about some of this before on here so I won't rehash it https://news.ycombinator.com/item?id=9180831

There's a time and a place for sql, orms and storedprocs. The more you know about each of them, the more effective you can be. As ever, learn the tools and never throw the baby out with the bathwater.

[1] http://docs.sqlalchemy.org/en/latest/orm/loading_columns.htm...

[2] http://docs.sqlalchemy.org/en/latest/core/tutorial.html#wind...


I prefer to use functional frameworks. They 1) express the SQL with map, flatMap, filter, groupBy, etc. functions and 2) use classes that are equivalent to tuples.

Since I develop in Scala, both are very intuitive to write.

Apparently LINQ for F# is really good as well, as someone else mentioned in a comment.

I get a programming interface that doesn't attempt to hide SQL, and still my code is fully typed. And it's really easy to make your query logic modular: make the SQL fragment a function that you can call, like any other code you reuse.

I can't think of an advantage of ORMs over this functional approach, apart from ORMs being more well known.


I, like many other people, also took a stab at my own ORM / ActiveRecord lib (first for PHP on MySQL; later I ported it to JavaScript on WebSQL), and I've been following exactly the sentiment that people here seem to have too: don't make your ORM want to do everything. If you are going to need computed N+1 queries, write a view, or use custom SQL.

ORM / ActiveRecord is a great pattern imo, but if you want to make it do everything, like so many other techniques, you're gonna end up with a behemoth of a thing.

I've tried many different ORMs in the last couple of years, and while mine may not be the most complete or have a sexy API for joining custom query parameters into the results, I still feel my own provides the one and only syntax / workflow 'as it should be' in Javascript:

    var serie = new Serie();
    serie.name = 'Arrow';
    serie.TVDB_ID = '257655';
    serie.Persist().then(function(result) {
      console.log("Serie persisted! ", result);
    });
I've got loads of examples with jsfiddles if you want to find out more :-)

https://github.com/SchizoDuckie/CreateReadUpdateDelete.js

(browser that supports WebSQL required, obviously)


Using an ORM is for advanced users, and I think it's a bad default for beginners. A good one certainly saves time when you know how to use it, but it also adds complexity and narrows the perception of what a database can do.

Narrowing the perspective is especially bad for beginners, because some options will simply never occur to them and the ORM will prevent learning by osmosis.

Let's say you need reliable, efficient cache invalidation (say, to re-render a static page when the data changes). Triggers and LISTEN/NOTIFY in postgres might save a huge amount of time (and even more in maintenance costs) over a custom solution that probably isn't right anyway.

Or maybe you need to prevent schedule conflicts. Postgres offers range types and exclusion constraints, which aren't supported in all ORMs.

Or to keep a remote system reliably in sync (even if it's a very different kind of system), logical decoding is a very powerful feature.

But how would you encounter any of these potential solutions if you are always behind an ORM?


When considering an ORM we need to answer some simple questions. Does it decrease code size? In most cases ORM code is the same size as the SQL query. Does it hide complexity? No, it just adds one more layer that in fact increases complexity and makes it harder to debug. Finally, does it protect us from errors or improve code quality? That is rarely the case.

Instead of using an ORM I prefer moving data-retrieval code to the database layer with the help of views. There are also updatable views, which we can use to simplify database updates on the code side and sometimes avoid transactions. Separation of code and data logic is a great concern when deciding whether to implement an ORM. Nowadays databases are very smart and convenient, so there is no need to use an ORM.


In most cases it's smaller (in the Django ORM anyway):

  myobject, created = MyObject.objects.get_or_create(field1='1', field2=field2)

That would otherwise take a number of lines: one query to check whether the object exists, another to create or retrieve it, plus application code to deal with that SQL.


OK, for this particular example: is the operation wrapped in a transaction? And how will the code evolve if we need to check some condition on field2 and occasionally update it?


Never seen a large enough project that relies on an ORM be anything other than a giant mess. I mean never. The conclusion is correct. From an application perspective the db is just another API and should be treated that way and the ORM should just be thought of as a convenient DSL for creating queries on top of that API.


To be fair, I've never seen a large project be anything other than a giant mess. This is the nature of large projects. If it's not a giant mess, you are probably leaving user happiness (= $$$) on the table.


And it comes at the expense of craftsmanship and programmer happiness. Lately I've been thinking about how much initial architecture affects the mental well-being of all subsequent programmers that join the project. This is one of the reasons there is such a high turnover and burnout rate in programming. If you're not consistently paying back technical debt and improving the architecture of the project, then you're paying for it in other ways with the revolving door of programmers.

Then again if you're a big enough business it probably doesn't matter as long as customers are paying you. All you have to do is plug-in another cog/programmer into the machine when the previous one wears out.


All this is true, but it's also the reason why programmers are paid as much as they are. In general, in a market economy, you are paid to do things that other people don't want to do. The whole point of a market is to create an incentive to do things that are unpleasant.

Beautiful code does exist in the software world. It's usually found in hobby, open-source, and research projects without profit motives, userbases, deadlines, and all those other complications that result in shitty hacks.

In the meantime, you could look at the crap that is most large-scale codebases as a barrier to entry that keeps demand for programmers high. Enjoy the money you get for cleaning up other peoples' messes, and then use that to buy time to make the beautiful, useless stuff.


It strikes me as a thick/thin client tug of war. You can supplant ORMs by e.g. moving your logic into stored procedures on the db side, until you find an inconvenience there, then back and forth until the end of time. As with many dyadic architectural choices, there are good arguments on both sides.


I agree. The problem is that striking that balance is hard. If you go all-in with an ORM then trying to backpedal and figure out at which point you went wrong becomes almost impossible and so now you have a tangled web of ORM queries and no clear path to untangling things.


Sure, but the balance is entirely context-dependent. But hey, I guess this is where all polemics fall down.

That said, where is the academic problem in refactoring into the db and decomposing ORM queries into straight SQL where necessary/desired? I'm not a Fortune 500 Systems Architect, but I'm under the impression that this is a fairly straightforward process. Now, management of the systems will necessarily incur extra complexity, but that's a second order effect that should be assessed in the decision process as part of the cost estimations. Extra technical debt, for sure.

So, choosing one is technically cheaper and more simple and maintainable. Choosing multiple may be faster (or, in many cases, "academically satisfying"), but more complex with a steeper learning curve. Be sure to hire the best of the best!


Not so simple. Every big enough place will be a disjointed mess of cross-cutting concerns. The ORM (usually another custom mess on top of some open source thing) ends up being owned by everyone and in effect ends up being owned by no one. So then you end up going around asking each team in turn to fix stuff but each team consists of domain experts and not ORM/SQL experts so no one does anything. So then you bring in ORM/SQL experts but they don't have any domain knowledge so they just chip away at the edges and end up making no improvements and you still are stuck with an ORM nightmare.

The bottleneck in fixing technical things is never the technical stuff. After all, most of this stuff is 70s tech to begin with and the science has been worked out by much smarter people already. The social stuff on the other hand is a different matter entirely and once you're in an ORM quagmire it is impossible to get out.


I've worked on a 400kloc Java system that was beautiful in the half that used Hibernate, and horrific in the half that didn't.


400kloc Java. That's cute. Not saying that's not impressive. The scales I've seen you're off by a few orders of magnitude.


Oh sure, it's not the largest system I've worked on by any means, it just seemed relevant because at the time I was working on it it was almost exactly a 50/50 split between the hibernate parts and the not. And honestly I find beyond a certain size the techniques you have to use are pretty similar - there's very little practical difference between working on a 400kloc and a 4mloc project.


What do you define as "large enough"? There have been a ton of fairly big Ruby on Rails apps written over the last few years, the majority of them using ActiveRecord to handle everything.


Really? How many of those backends have you seen and worked on? Heroic effort does not qualify as success in my book. You can maintain any nightmare by just throwing enough bodies at it. What counts is the maintenance burden and the cost of change.


ORMs tend to assume that there is "the application" with "its database". If the application changes, so does the database. If the data is used by more than one application, it's better to have the data defined in the database and write applications as database clients.


At the big tech firm I work at, there's a best practice where any database (whether that's a traditional RDBMS or a NoSQL client) is abstracted away by a microservice with a defined API, and every other application that wants to get that data needs to interact with the microservice. That way, the database schema can change without it affecting multiple applications. There's still the traditional mismatch between ORM and database, but doesn't feel as painful because only a single application is using that data, and that application can have special knowledge of the persistence layer underneath (meaning, use database hints if necessary, defer to raw SQL, etc.)


We have decades of research into filtering, joining, and aggregating across a complex set of tables and views. With microservices you have to roll your own query planning and stream all the intermediate results on the wire even when you're throwing away most of them.


The thing is, letting the system handle it isn't good enough. We have decades of research into this stuff, so 90% of the time the database gets it right - but the interface for explicitly taking control when the DB gets it wrong is black magic at best. The value of having an exposed representation of filtering/joining/aggregation in a general-purpose programming language where you can interact with it is big enough to outweigh that of a database that can do it for you most of the time.


It does sound like an extra layer of unnecessary complexity to solve a problem that could be solved in other ways.


Unfortunately, SQL servers aren't generally scalable. Offloading joins and aggregation onto (inexpensive) app servers can increase over all system performance, despite lack of advanced query planning.


Bollocks. SQL scales fine unless you are stupid in the way you use it. All the crap performance I have seen in the last couple of years has been from doing crap like you suggest - doing joins and aggregation at the application level. Our front page currently makes over 1000 database calls because of this sort of nonsense.


That probably is true - if you have a crappy API design


That kind of sounds like a microservices anti-pattern in disguise. It's thought by many to be bad practice for microservices to share a common database, but that's pretty much what you're doing, just with an API instead of JDBC for the access.


An API is much easier to version-control, automatically check for compatibility and so on.


OK, I have only used Django's ORM and a little bit of SQLAlchemy, but in both cases you are defining the database in a Python file. Django's models.py file usually maps directly to database tables, and is less verbose than writing SQL CREATE statements.

Maybe I don't see any advantage in what you're doing, as I treat the database model as the most important part of the application (or maybe that's the same as what you are saying). Get that correct, and code falls into place easily. Start hacking rules in at the application level and eventually it gets messy. Linus said something similar about bad programmers worrying about the code, and good ones worrying about the data and its relationships.


So I've slathered on yet another confusing layer by creating an application-specific view.

Now you've got the ORM layer, which possibly translates to SQL correctly, talking to a view, which is a front on the real, untouched table. It's kind of a hassle.


ORMs can alleviate a lot of tedium, until you get to a level of complexity where they start to create it.

I created my own ORM-like access library which seems to work pretty well.

https://github.com/dieselpoint/norm

Comments welcome.


"quasi-professional cowboy" here. I have used both ORM and direct SQL systems with the problems outlined above. I finally gave up and wrote my own row mapper:

1. Using the DB Schema, generate stored procedures to load and save data together with code/validation in partial C# classes.

2. Transport data in XML format using single letter (or two) table/element and column/attribute names (auto generated). Use .Net to automatically build/ serialize the XML into objects. SQL Server/.Net have some nice XML features that make this simple.

This means that the only user access to the database is through authorized stored procedures. The users have no access to database views or tables.

I have implemented the means to load and save arbitrary hierarchical objects (e.g. an Order / Order Lines). A single request to the database can return a complex three-level object which would otherwise take hundreds of round trips to load.

I agree with the observation that this would be hell to maintain. However, the people likely to maintain this system would be in just a different hell if they had to work with Hibernate or MS Entity Data Model.


I've written a fair number of C# LOB apps and use LINQ quite a bit with MySQL. I don't even want to talk about Java and some of its ORMs, as it's too painful to think about. I agree with the sentiment of the post, but in compiled languages I really want an ORM to simplify unpacking result sets. LINQ is great when it works, but joins sort of suck, as does calling built-in SQL functions, and it can sometimes generate highly unoptimized queries. I frequently pull in data via LINQ to SQL and then massage the data with LINQ to Objects, where I have better control of performance and operations.

Trying to write pure SQL leads to lots of manual unpacking of result sets, which I generally dislike; it's much harder to maintain and doesn't work well in practice compared to when LINQ actually works.

I think maybe what I've really learned is to use scripting / loosely typed languages when working with SQL. In Python, I usually just call SQL directly rather than use SQLAlchemy, and it's fine because of the loose typing; result set unpacking isn't terrible.


On C#, Dapper's useful, but not perfect, for letting you write your own queries and then making it easy to unpack the result sets.

Unfortunately, it relies on property setters for doing the unpacking, so it doesn't interact super well with your code if you like to avoid unnecessary mutability.

The only publicly-available lightweight ORM I know of that does a good job with that is the SQL type provider in F#.Data. That one is head-and-shoulders above any other option I've found for working with databases. It does require that you write your data access layer in F#, though, which may make it a hard one to sell at work.


Thanks. That actually doesn't look that bad. It would be nice to have anonymous type objects but I recognize the difficulty in that. This looks like a nice compromise. Now if they could also fix passing in arrays as part of a parameterized query "select x from y where z in ?" where ? is a collection of strings or integers it would be perfect but I think that is a driver/interface problem.

Edit: Looks like it actually does say it supports enumerables, but I'm questioning whether it will do what I want. Will have to test it out, but I'm guessing there is a limit to the size of the array.


Dapper accepts anonymous objects for passing query parameters. It will return an expando if you don't specify the type of object you want back, too. That can be convenient, but I don't personally like having dynamic objects running too wild in my code so I end up manually mapping it to another type before returning anyway. (AutoMapper can help here.)

The IEnumerable support will generate (parameterized, I believe) inline arrays in the query. Not really my favorite, but it works.

The thing that I couldn't get over, and which ultimately led me to write my own micro-ORM at my last job, was the weak support for table-valued parameters. But I gather they've fixed that since then, so hopefully it's not a big deal anymore.


Dapper supports IN statements being passed an array:

SELECT x, y FROM A WHERE z IN @ids

Works if you pass in a parameters object containing a property called ids that is a list/array of some sort (see https://github.com/StackExchange/dapper-dot-net#parameterize...).

On SQL 2016 this uses STRING_SPLIT across an nvarchar(MAX) which makes it effectively unbounded. Earlier versions and other platforms have restrictions on the number of elements in the array/list.


Dapper will use a constructor with parameter names that match your query's columns if you need immutability.


unpack result sets into what?


What reading articles on how bad ORMs are has taught me: More people need to use Django.

Django does a lot and does it well.


I wouldn't say the ORM is Django's strongest point. It's great for simple stuff, but not great for moderately complex queries (e.g. extra conditions on a join, subqueries, conditional aggregates - though I think that last one may be solved in the most recent version).

I don't think they have done a bad job with the ORM, but you really ought to know SQL as well.


I think the ORM is Django's strongest point. Take the ORM away and ModelForms and the Admin go with it. I think Django makes little sense if you're not using the ORM.

That said, you're absolutely right that it is really bad at certain types of queries. There's no way to really influence joins. Subqueries are coming (https://github.com/django/django/pull/6478), and conditional aggregates became possible in Django 1.8 with Expressions support (which I authored). Queries that map over many to many relations that have .exclude() or .annotate() generally produce incorrect results. Most of the problems with the ORM are as a result of Django handling joins for the user - which is also one of its biggest strengths.

You should absolutely know SQL if you're using an ORM. If you're doing anything other than basic CRUD, you have to know SQL. An ORM should give you an escape hatch to write that custom SQL. Django's ORM does not save you from all the problems mentioned in the article.


I'm not arguing that people shouldn't learn SQL, I'm just saying that a lot of people who bemoan all ORMs should try using a really good one (like Django's) and see if all their complaints still ring true. Your examples might have been true a while ago, but:

- Extra conditions on a join (eg a prefetch filter): https://docs.djangoproject.com/en/1.9/ref/models/querysets/#...

- I'm not sure what you'd want explicit subqueries for, at least in a context that annotations, prefetch, etc. don't already cover. If it's that weird, just use `.extra(..)`

- Conditional aggregates: https://docs.djangoproject.com/en/1.9/ref/models/conditional...

- Not that you asked but also have a look at the new database functions stuff. http://stackoverflow.com/questions/38017076/annotate-a-comma...

And it keeps getting better. The list of "ORMs are rubbish because I can't do X" is evaporating. Yeah there's probably some stuff that never comes up but you can still run raw SQL through the ORM.

When you consider the development time saved and performance improvements (caching, good query writing, etc) an ORM delivers, I personally think it's hard to argue against them.


> ORMs are more detriment than benefit

The author mentions Hibernate and SQLAlchemy, which are both DataMapper ORMs. But what about Active Record? It's true that AR will provide even more abstraction and distance from the database, but it also provides a lot more convenience, which for me in small CRUD projects (as most are) has been worth the downsides.

And as another poster mentioned, you can always optimize by replacing slow AR queries with custom SQL ones, and restructure your database as your project scales.


I love ORMs: they help abstract the database technology, which has been really useful in a recent project switching an app from MSSQL to PostgreSQL. I also personally, as a developer, appreciate the programmatic syntax.

I would caveat the performance issue: sometimes there is no better way. However, I would always try to make things work in the ORM first before switching to native SQL to get the job done.


> they help abstract the database technology

If you are going to do anything even remotely complex you're going to need to know that database technology otherwise your ORM is going to spit out queries that are dead stupid and a performance nightmare (pages taking 1 full minute to load). Then you end up fighting with the ORM to do what you want.

That's my experience.

I guess there are projects that are so simple the database layer can be abstracted away but I've never worked on one.


I've always said: "ORMs are for people who don't know SQL!"


ORMs are not for people who don't know SQL.

The secret to using an ORM is knowing exactly the SQL it will generate underneath. The advantage of using one is static typing, as well as writing orders of magnitude less code than SQL, and orders of magnitude more understandable code than the equivalent SQL.

If you work on heavy enterprise applications the ORM can be your best friend or worst enemy. It comes down to knowing the tool you're using inside and out. It will make you much more productive in the long run. And sometimes yea, you need straight SQL or a stored proc to get the job done. That's OK.


> I've always said: "ORMs are for people who don't know SQL!"

False. To use an ORM you must know and understand data modelling and the Entity/Relationship model. I'd say, people who don't like ORMs are often those who don't understand that model.

If you do understand it, then you know that relationships can be represented 100% by code in an OO language, thus completely automated.

Why bother writing your own hydrators for each entity when you can model them directly in a library? It doesn't make a codebase more maintainable or readable.

Furthermore, you could have an ORM that uses SQL directly for queries and still eagerly resolves relationships. It's just not practical; that's why ORMs often come with query builders.

That is the thing that abstracts SQL, not the ORM per se. ORMs only deal with relationships between entities. Query builders build SQL queries.


This doesn't ring true for me, at least - I've been programming professionally (using SQL) since the mid-90s, and these days very much appreciate the time-savings that ORMs (on Windows, for the most part) provide. With my favourite ORM - XPO from Developer Express - there is a decent tool for monitoring queries and sessions, and if I need to I can write custom SQL to improve performance (which almost never happens).

Another important point is that it's, IMO, much easier to maintain ORM-based code, and another programmer coming along after me can more easily get started on understanding what I've implemented. One place I worked in the past, we had a 'SQL guru' who wrote these massively complicated SQL queries that could be understood by no one but him.


Like the others below/above I agree that you should certainly know SQL if you're going to use an ORM. ORMs can be a really great tool if you use them correctly.

More to your point, I think the problem is the vast number of developers using ORMs without knowing SQL and running into big problems with correctness and performance.


I began programming in languages where SQL was a requirement in 1999; only in the past 3-4 years have I used ORMs. I often write a lot of raw SQL, but I find a well-written ORM statement communicates intent more quickly.


What Django gave us out of the box would have been impossible with SQL in the same timeframe with the resources we had. If we had the luxury of doing things properly and fully the first time, we never would have made it to market with our robots.

This isn't to say that one day we won't throw away Django or use more SQL. We just had real-world schedules to deal with.


Oh, finally!!! I've typically had an ambivalent relationship with ORMs. The mass of config files and auto-generated code is a huge pain, especially when things go wrong. Doing multiple DBs can also be an issue and requires messing around with config/annotations and whatnot...

Just write the SQL :) - relational algebra is not that hard...


As much as I advocate for properly using SQL, I still want a (lightweight) ORM to manage the small stuff that happens repeatedly throughout an application. I'm all about custom SQL or stored procedures when the time comes, though.


I find ORMs great in some scenarios.

Take a really thick client, say a graphical diagramming tool that makes your machine's fan whir loudly when you start it up. Here an ORM can be a great tool to efficiently manage a constantly evolving, large cache of your hot objects, and keep them synced with the backing RDBMS.
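
In SQLAlchemy terms (toy model, purely illustrative), the Session's identity map is that object cache:

    from sqlalchemy import Column, Integer, String, create_engine
    from sqlalchemy.orm import declarative_base, sessionmaker

    Base = declarative_base()

    class Node(Base):
        __tablename__ = "nodes"
        id = Column(Integer, primary_key=True)
        label = Column(String(50))

    engine = create_engine("sqlite://")
    Base.metadata.create_all(engine)
    session = sessionmaker(bind=engine)()
    session.add(Node(id=1, label="start"))
    session.commit()

    a = session.get(Node, 1)
    b = session.get(Node, 1)
    assert a is b      # one in-memory object per row

    a.label = "renamed"
    session.commit()   # the change is synced back to the backing RDBMS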

Big batch programs can be another good use case.

Server-side web apps and API servers, though, are the opposite of this. Web pages and API responses should be fast and small, so we don't normally build up a big cache, and in a stateless architecture we are normally throwing the cache away at the end of each request. In this case raw SQL is often easier to work with.


If you don't want to deal with the quirks of an ORM, use a lightweight micro-ORM. You basically get some productivity, simple type checking, a consistent way to access data, and SQL injection protection, and you still deal with hand-rolled SQL queries.
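
Something like this tiny sketch (plain Python over sqlite3, names all hypothetical) is most of what a micro-ORM buys you:

    import sqlite3
    from collections import namedtuple

    def query(conn, sql, params=()):
        """Run hand-rolled SQL with placeholders (no injection risk),
        mapping each row onto a lightweight typed record."""
        cur = conn.execute(sql, params)  # placeholders, never string concatenation
        Row = namedtuple("Row", [c[0] for c in cur.description])
        return [Row(*r) for r in cur.fetchall()]

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("INSERT INTO users (name) VALUES (?)", ("alice",))

    for row in query(conn, "SELECT id, name FROM users WHERE name = ?", ("alice",)):
        print(row.id, row.name)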


Sigh. Whenever I have to hurt myself at work on such issues, I find myself thinking of the web that could have been, if only...

Let me backtrack a bit. There's a much better way to do things than the way we do it right now. A way that completely obviates the need for ORMs, or indeed any way to deal with the interface between application and database.

You see, there would be no object-relational Impedance Mismatch, or ORIM (and therefore no need for clunky kludges like ORMs), if the database and the application were both written in languages of the same paradigm, either both OOP or both relational. Ideally, that would be a single language that could handle both data and application logic.

There is such a language: Prolog [1]. Prolog is implemented as a relational database, so your program itself is the database. There is no separation between data and operations, and therefore no ORIM and no need for ORMs, or stored procedures, or anything, really.

And, yes, it's perfectly possible to do your web dev in Prolog. Here, see this explainer on the Swi-Prolog website, titled "Can I replace a LAMP stack with SWI-Prolog?":

http://www.swi-prolog.org/FAQ/PrologLAMP.txt

Hint: Yes. Yes, you can. You can replace LAMP (or LAM-whatever) with an LP stack, where all you need on top of your OS is Prolog itself. Its built-in Definite Clause Grammars notation [2] can be used to parse and generate JavaScript, HTML, XML, CSS, YAML, whatever you like. Swi-Prolog even offers translation to RDF, which is as natural as you can expect, given that RDF is also a relational language.

Here's a big fat tutorial to help get you started:

www.pathwayslms.com/swipltuts/html/index.html

Can this be done for real? OMG yes it can. The Swi-Prolog website itself runs on an LP stack, and a few other websites do too.

In short: you don't need to hurt yourself as badly as you're currently doing. There's no reason not to stop, other than, of course, that nobody wants to learn Prolog. I know that, I've made my peace with it, and I've spent all of my career so far hurting myself against the ORIM just like everyone else.

... but we could have had a better web.

___________________

[1] https://en.wikipedia.org/wiki/Prolog

[2] https://en.wikipedia.org/wiki/Definite_clause_grammar


This sounds really interesting, but I'm having a lot of trouble trying to follow how all that code fits together, from what code I could find. I also strongly dislike the object/function-based approach to constructing HTML (which, funnily enough, is just like using an ORM to construct SQL, which I like, heh).

Really, I don't see enough offered to counteract what little pain I feel using an ORM. The LP stack actually looks more painful (to me). Obviously I'm totally unfamiliar with this stack, which makes it hard for me to see where the benefits lie.


>> Really, I don't see enough offered to counteract what little pain I feel using an ORM.

In that case you'd probably find no benefit in giving up the use of one, regardless of what replaced it.


Hibernate is a very complex library. SQL+JDBC, on the other hand, is easy (JDBC is tedious, and some complex SQL might be hard, but generally it's easy). Then again, Hibernate can significantly reduce development time, if developers know it and the application is a good fit (CRUD, filters).

My advice is to avoid ORMs unless your project is big and its database schema is a good fit. And if it is, don't think that the ORM is easy; it's not. Learn how it works, read its source code, understand its inner workings. Or find someone who does.

I didn't have much experience with other ORMs, though.


I expect I am approaching things improperly, but the only time I end up fighting ORMs is when I am trying to create complex aggregation queries, or other responses used on "reporting" pages.



Domain objects in the middle tier only make sense if there is pervasive reuse of component objects. I extensively chased the unicorn of component orientation, but it never pans out.

The logical place for domain objects is in the database. Yes, stored procedures, with object-layer mappings of views. But the reality of the practice is that (a) the toolchain for the DB backend is quite firmly stuck in the 80s, and (b) most IT programmers lack SQL skills.


The trick to using ORM is being smarter than it, so that you can ultimately rewrite everything it once did for you incredibly inefficiently.


Use the best tool for the job.

ORMs might do it 80% of the time. For the other 20%, most mature ORMs provide a way to execute your own SQL.


I'm curious how the author would like MyBatis, which is just a thin wrapper for calling SQL, and then translating results to POJOs.

I like it, and IMO it keeps the API distinct from the access to the DB. You have to write your own SQL, which is good and bad: huge amounts of control and performance, but higher portability costs.



One important benefit of using an ORM is not being locked in to a particular DB server technology. Good luck migrating a large app that doesn't use an ORM from, say, MySQL to Postgres if, for example, Postgres replication is something you are suddenly interested in.
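
In SQLAlchemy, for instance, the swap can be as small as the connection URL (URLs are illustrative, and the matching driver still has to be installed):

    from sqlalchemy import create_engine

    # The application code stays the same; only the URL changes.
    engine = create_engine("mysql+pymysql://app:secret@dbhost/appdb")

    # ...and when Postgres replication suddenly becomes interesting:
    engine = create_engine("postgresql+psycopg2://app:secret@dbhost/appdb")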


Has anyone ever actually migrated a database on a non-trivial application? I want to go from MySQL to Postgres, but I am not sure if it's worth the risk/effort. We got told about the standard nature of SQL in university, and how you could swap engines 'easily'. Then we started using them in real life and the reality of that situation hit.


For me, the time I loved SQLAlchemy the most was when I migrated my database from MySQL to Postgres. Almost all of the issues people warned me about did not matter, as SQLAlchemy caught them automatically.


I came to the same conclusion, and wrote this ORM for SQL lovers:

https://github.com/iaindooley/PluSQL


There are two types of ORMs:

1. ORMs for SQL dummies

2. ORMs for SQL gurus

You really need to be clear, when discussing ORMs, which type you are discussing.


Use ORMs for complex CRUD. Use SQL for complex queries.


Enterprise Objects Framework just worked.

Pretty much every other ORM I've worked with has been terrible by comparison.


(= 'ORM 'OMG) #Clojure


SQL is a write-only language. You'd better hope you don't have to rewrite your complicated raw SQL queries in the future.


You can write well written modular SQL that supports changes and can be read by a competent developer.


Not in my experience. Please show me actual examples of "well-written modular SQL" in a real application.


I'm taking a bit of creative license with the term "modular". CTEs and views can help with this, and are closer to true modules.

I like to encapsulate inside of stored procedures, and write my code in such a way that it reads almost like a procedural language. Heavy use of indentation, subqueries, and temp tables lets you carry results through the proc. I like to hang conditions off of joins, so related logic is in the same place.
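
A toy sketch of that style (invented schema, runnable against SQLite): each CTE reads like a named module, and the related condition hangs off the join.

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE users  (id INTEGER PRIMARY KEY, name TEXT, deleted_at TEXT);
        CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER);
    """)

    REPORT = """
        WITH active_users AS (
            SELECT id, name
            FROM users
            WHERE deleted_at IS NULL
        ),
        order_counts AS (
            SELECT user_id, COUNT(*) AS n
            FROM orders
            GROUP BY user_id
        )
        SELECT u.name, COALESCE(o.n, 0) AS n_orders
        FROM active_users AS u
        LEFT JOIN order_counts AS o
               ON o.user_id = u.id   -- related logic lives on the join
    """
    print(conn.execute(REPORT).fetchall())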


Would modular SQL be wrapping up subqueries inside of views, and then selecting from those views?


Common Table Expressions (CTEs) help with modular SQL.


If you do SQL full-time, probably yes. The problem is that most developers don't work with it enough, so their skills are lacking.



