Goodbye MongoDB, Hello PostgreSQL

jgrahamc · on March 10, 2015

As a greying developer I am most amused by people discovering that 'old' technologies like SQL databases work really well.

The only useful piece of advice I can give a younger developer is... be careful when drinking the newtech koolaid.

And one more thing:

    star = Sequel.lit('*')

    User.select(:locale)
      .select_append { count(star).as(:amount) }
      .select_append { ((count(star) / sum(count(star)).over) * 100.0).as(:percentage) }
      .group(:locale)
      .order(Sequel.desc(:percentage))

just makes me want to cry. Learn SQL rather than wrapping it.

YorickPeterse · on March 10, 2015

It's not so much about not wanting to write/understand SQL (both are still very much required), but about composability. If you want to re-use bits of a SQL query written as a string literal your only option is string concatention or using some kind of string builder/template system. In both cases there's little validation of the query's correctness (syntax wise) until you actually run it.

While I agree that many ORMs go too far or even worse, not implement certain powerful features, Sequel (and similar tools I imagine) strikes a nice balance. The particular examples I gave on their own are not super useful, but we already have quite a few queries that are composed/re-used without having to concat strings.

So tl;dr: it's about composability, not being "lazy" or "ignorant" to SQL.

bigpeopleareold · on March 10, 2015

I once agreed with this, but now I don't. I just want to write SQL (dammit!).

I can never, ever remember the intricacies of the Sequel API or any one of these query builder APIs. I am always looking up something that is rather trivial because I am thinking in SQL, the language, and always have to convert back to Ruby or whatever language I am working in.

CTEs and SQL functions in PostgreSQL strike a good balance in the composibility department for me, while still being able to write SQL.

ad_hominem · on March 10, 2015

Let's say you have a product search screen in your application. There's a text field for filtering on product title (WHERE title LIKE), one for filtering on UPC (WHERE upc LIKE), a couple range filters for min/max prices (WHERE price <= and/or WHERE price >=), and then on the results screen the user sort on a few different columns (ORDER BY) as well as paginate and set number of results per-page (LIMIT + OFFSET).

How exactly are you going to just write SQL if the actual query statement needs to change based on the user input? Are you going to hardcode every possible permutation, or are you going to use a query builder? Or are there some advanced Postgres features that I'm not aware of that would allow all these combinations in some kind of prepared statement?

edit: The question I'm asking is pointed towards the people who are implying that you can "just use SQL" as static statements that are not dynamically assembled. Like a static function or prepared statement that takes some parameters, and at runtime you only pass in those parameters - not rejigger the actual SQL statement fragments. Your ad-hoc query builder implementations with SQL injection vulnerabilities are not relevant to what I'm asking.

moe · on March 10, 2015

How exactly are you going to "just write SQL" if the actual query statement needs to change based on the user input?

How about something like this:

  s = Select.new
  s.add "WHERE title LIKE #{title}" if title
  s.add "WHERE price <= #{price_range}" if price_range
  s.add "LIMIT #{limit}" if limit
  s.add "OFFSET #{offset}" if offset
  s.add "id, blah, boo FROM products"
  s.execute

Note how I deliberately shuffled the order and didn't bother with escaping.

Also note how anyone who knows SQL could immediately work with this, learning curve: 5 seconds.

Why is there no ORM that works like this?

zo1 · on March 10, 2015

Python SQLAlchemy works a bit like this, if you use it that way:

    query = session.query(Products)

    if limit:
        query = query.limit(limit_val)

    if offset:
        query = query.offset(offset_val)

Sort is a bit tricky, but as I'm a noob at SQL Alchemy, I had some not-so-pretty boilerplate code implemented for decorating queries with custom sorts. But essentially it boiled down to:

    if sort_type == SORT_REVERSE_CONST:
         sort_obj = sort_obj.desc()
    else:
         sort_obj = sort_obj.asc()

    query = query.order_by(sort_obj)

And if you want to search through dynamic tables, change the initial query instantiation to something along these lines:

    query = session.query(retrieve_db_object(table_name))

And to execute:

    return query.all()

moe · on March 10, 2015

Python SQLAlchemy works a bit like this

All ORMs work "a bit like this". I don't want "a bit like this". I want exactly like this.

Because with my proposed interface I could be productive immediately and permanently. I would never have to refer to any documentation. Not once.

With every other ORM this is a pipe dream. Their "fancy" chainable wrapper-APIs are nothing but a ball on a chain.

aidos · on March 10, 2015

Well, no, they're more than a "ball on a chain" - they're a tradeoff - like everything in software.

They give you the ability to manipulate the query in interesting ways at any point before you execute it. They let you join different queries together, built by different parts of the system, in a safe way. They let you work with the native language you're working in instead of having to construct clauses in a foreign language, using strings. They let you post optimise your loading strategies.

You might want "exactly like that", but it's not going to be as flexible as a chainable wrapper system.

Hell, I could probably give you pretty close to that in python with a little work, but it's not something I'd want to use myself.

I get it. I like working in SQL too. I know it really really well, and I cringe when I see developers writing totally sub-optimal code because they don't understand the relational data model.

But there are other ways to do things, and what you're describing doesn't give you much more than raw SQL, so why not just use raw SQL? You've added some syntactic sugar that, in my language (Python), would be a bit of a horror show (strings access local variables implicitly, no thanks). What else do you gain?

This is more verbose, granted:

    q = session.query(Products)
    if title:
        q = q.filter(Products.title.like(title))
    if price_range:
        q = q.filter(Products.price.lte(price_range))
    if offset:
        q = q.offset(offset)
    if limit:
        q = q.limit(limit)
    products = q.all()

But then you get more stuff for free, like drilling down into the other tables:

    p = products[0]
    p.supplier.contracts[0]

But that's rubbish, because you'll be loading in a really inefficient way. That's ok though, tell the system how you're going to want to load the additional data.

    q = q.options(
        joinedload('supplier').
        subqueryload('contracts')
    )

See what I got with my "ball and chain"? Turns out it was actually the anchor for the whole boat. Sure, you have to learn a new syntax, sure, it's not sql, but that doesn't make it bad or wrong.

Use whatever makes sense for your use-case. Don't limit yourself because you'd have to learn something new. Honestly, before using SQLAlchemy I mostly felt the same as you do. Many ORMs get in the way, but that's not really a problem intrinsic to ORMs.

hueving · on March 11, 2015

>See what I got with my "ball and chain"? Turns out it was actually the anchor for the whole boat. Sure, you have to learn a new syntax, sure, it's not sql, but that doesn't make it bad or wrong.

If you are fine with the new debugging this entails. Take the joined load example you gave. Almost every time I encounter SQLAlchemy code, the author didn't understand the lazy joining semantics and every attribute access on each record is emitting a new query and you now have O(n) queries. It's not obvious that is happening until it bites you once you have thousands of records.

For another example, I can count on one hand the number of developers I have encountered that can understand the expiration of the objects in the session or what the engine builder stuff is doing in the boilerplate they paste in to get it to work. It always requires experimentation, a trip back to the SQLAlchemy docs, and then finally logging all SQL queries to see if it's finally emitting the optimal queries with the transaction guarantees you are looking for.

My gripe about SQLAlchemy is that it separates you too much from what is happening underneath.

aidos · on March 11, 2015

Yup. I'll give you most of that - there's a pretty big learning curve with SQLAlchemy.

But, taking the lazy joining example. Someone didn't understand how it worked and now you have to fix it, which is probably as simple as changing that loading strategy to something more suitable for the way you're using the data.

Let's rewind to the late 90s. If I wanted two tables of data to write into the page in an interlaced fashion I would look at how wide they were. I'd make a decision about using two queries or using one larger join. The two query approach was sometimes required due to the size of the data, but it meant cryptic output logic to track positions of cursors relative to each other. The single joined query was simpler to deal with, but still required tracking a last_parent_id so you could swap during the interlacing.

Other developers (those same ones that didn't understand lazy joining) would loop over the first query, running extra queries in the loop (I saw this a lot). Same bad performance as the lazy join.

When you discovered this issue in the code it was a total pain to fix. You're talking about rewriting whole load of code to the point of being unrecognisable from the code you started with.

Contrast that with how easy it is to fix in the SQLAlchemy case. I mostly don't worry about loading strategies now until I'm deep in development. Something's running a bit slow, take a look at some logs, tweak a couple of things and you're golden again. That's such a powerful abstraction.

Regarding the config of the engines etc, again, it's something you need to learn. But really, someone's starting an application from scratch and they just want to dump some code in to handle all their db interactions, but they don't want to know how it works? That's on them, either learn it or use the sensible defaults (in, for example, flask-sqlalchemy).

SQLAlchemy ORM separates you from what's underneath, but it's an ORM, that's kind of the point. If you need to be closer to the metal, use SQLAlchemy Core.

emodendroket · on March 10, 2015

Good news! Almost every language offers some sort of string concatenation with no protection whatsoever against injection, which seems to be what you're asking for.

nknighthb · on March 10, 2015

His proposal is not string concatenation, and it is not vulnerable to injection.

djur · on March 10, 2015

"WHERE price <= #{price_range}" looks like raw interpolation to me. How do you make that not vulnerable to injection unless you're escaping all variables that might be used in a query?

nknighthb · on March 10, 2015

It's pseudocode for an implementation that doesn't exist. There is nothing in his proposal that requires this hypothetical implementation to copy the value of the price_range variable into the string at all, much less unescaped.

The strings don't even have to be sent to the DB at all. You've utterly missed his point. He wants the language to be intelligent about what the SQL means and do the right thing.

It's not raw SQL, it's an abstraction.

djur · on March 11, 2015

The psuedocode in question happens to be syntactically valid Ruby. If the goal was to demonstrate how simple query generation could be without abstraction layers like Sequel, it is a valid criticism to note that the example given is eliding a necessary feature. Especially since helping avoid injections is one of the primary advantages of such an abstraction layer.

The original argument was that if one could "just write SQL" one wouldn't need to refer to documentation to become productive, "not once". I don't personally think it's practical to use any database library without reading the documentation, but certainly the moment you get away from "the SQL I type in goes straight to the database" you're going to need documentation to tell you what's going to change.

What's being described here sounds like a library that parses SQL fragments, combines them into whole queries, and sends them off to the database. To implement this this without resorting to simple string concatenation needs an intermediate representation of SQL language components. That intermediate representation is going to look a lot like SQLAlchemy, or Sequel, or Arel. That means that instead of "just writing SQL" you're actually adding an extra layer of abstraction and another chunk of documentation to read.

All that said, the "just use SQL" approach is already fairly well served. Most languages have mature bindings to specific databases as well as at least one ODBC-style API to simplify access to those bindings. Nearly all of these have support for client-prepared statements, which handles the interpolation problem. If you want to build more complex queries on the fly, you can use string concatenation or CTEs or decide to use a SQLAlchemy/Sequel/Arel-type library for just that one case.

And, honestly, you don't need to give up SQL to make use of Sequel. I personally really dislike the query composition API demonstrated at the top of this thread, but you can do useful things with Sequel without it:

    db = Sequel.sqlite
    db.execute "create table foos (id int not null, name text)"
    db.execute "insert into foos (id, name) values (1, 'bar')"
    # or
    db[:foos].insert(id: 2, name: "baz")
    db["select * from foos where id = ?", 1].first # ==> {:id=>1, :name=>"bar"}
    # or
    db[:foos]["id = ?", 2] # ==> {:id=>2, :name=>"baz"}

No need to go any further than raw SQL, but increasing levels of abstraction available if you feel like it.

nknighthb · on March 11, 2015

> The psuedocode in question happens to be syntactically valid Ruby.

So is everything else anyone types, code-like or not. He even said "Note how I deliberately shuffled the order and didn't bother with escaping.".

The response was flippant, intelligence-insulting, and obviously the result of failing to read thoroughly.

And speaking of intelligence-insulting, we all know you can run raw SQL through Sequel.

You're not having a useful dialogue, you're being combative, like the person I initially replied to.

emodendroket · on March 11, 2015

As everyone has repeatedly pointed out, once you deal with the obvious issues you quickly end up with something that looks like the gazillions of other ORMs on the market; sorry if I hurt anyone's feelings.

djur · on March 11, 2015

I said the code snippet looked like raw interpolation to me, and I asked how it could be made not vulnerable to injection. It was an honest observation and a genuine question. No flippancy was involved. You're free to think I'm an idiot, but you are the one being insulting and combative here.

moe · on March 10, 2015

How do you make that not vulnerable to injection unless you're escaping all variables that might be used in a query?

It was just a mockup. But you are right, in reality it would end up looking more like this (and use custom interpolation for escaping):

   s.add "WHERE foo > $(bar)"

Likewise a smart syntax for clause combining (AND/OR) and some kind of nesting would probably be needed.

I believe both of these problems should be solvable without compromising the simplicity of the approach.

grey-area · on March 11, 2015

You still need to supply the local variables to your ORM-like so that it can perform interpolation into the string (a feature which is actually best done by the database driver, NOT your ORM-like). Perhaps we can suggest this syntax:

s.add "WHERE foo > ?", bar

Or even this:

s.Where "foo > ?", bar

At that point, you've reinvented ActiveRecord or hundreds of other query builders, it also avoids you having to remember the ordering rules of sql as to which part of the query must be built first. There's a reason these query builders have converged on similar syntax.

I see where you're coming from - you don't want to learn two sets of syntax, but many query builders nowadays are very well thought out, and have a simple syntax which just echoes SQL while avoiding sqli and statement order issues, they also usually have an sql() option which lets you just send SQL if you wish for a complex query.

There are very good reasons people keep inventing layers over SQL:

They help avoid sqli

They centralise db access so you can optimise in one place by say adding caching layers to queries

They mean you don't have to deal with the vagaries of SQL for simple queries unless you want or need to

qooleot · on March 11, 2015

"You still need to supply the local variables to your ORM"

That can be done automatically in many languages. At least in the sense you can do:

"where value=$(bar)"

and not:

"where value = ?", bar

OpenACS was an old web framework for the Tcl language, but it even had that 20 years ago!. The company backing it failed in the .com era, and the lack of types was a turnoff for many - but it made SQL much much more readable. And of course the queries were turned into prepared statements. A query looked like this:

select foo from table where age>:min_age and name != :user_name

Things like this can also be done in node.js (disclaimer: I wrote the blog post below):

http://ivc.com/blog/better-sql-strings-in-node-js/

moe · on March 11, 2015

Perhaps we can suggest this syntax: s.add "WHERE foo > ?", bar

I'd say stick with keeping it a string and use:

  s.add "WHERE foo > $(bar)"

but many query builders nowadays are very well thought out, and have a simple syntax which just echoes SQL

And that's the problem. Everyone "echos" SQL, just slightly differently, each with their own idiosyncraticies.

If all you do is echo anyway, why don't you just let me write plain SQL, for christ's sake?

It's much easier for humans to memorize a slight SQL dialect than a different god-awful, nested ORM API boilerplate for every language they're working with.

I can write this without thinking, for queries of almost any complexity:

  s.add "SELECT id from users WHERE"
  s.add "name == $(name)" unless name.nil?
  s.add "AND age > $(age)" unless age.nil?
  s.add "GROUP BY age"

A parser/query-builder doesn't need amazing smarts to infer exactly what I want from this sloppy syntax; e.g. drop the "AND" if the first condition was false and present me with meaningful error messages when something is ambiguous.

Ask me to formulate the same query in any ORM and I'll have to refer to the documentation. And almost every time I'll waste a stupendous amount of time on getting some irrelevant, syntactic detail just right. Only because the ORM insists on an autistic API.

They mean you don't have to deal with the vagaries of SQL for simple queries unless you want or need to

After having worked with Hibernate, SQLAlchemy (the one-eyed amongst the blind), ActiveRecord, DataMapper, Sequel and a few others that I don't even remember, I conclude with passion: They have it backwards.

It's not the vagaries of SQL that I don't want to deal with. It's the vagaries of the ORM du jour that I don't want to deal with.

grey-area · on March 11, 2015

If all you do is echo anyway, why don't you just let me write plain SQL, for christ's sake?

They usually do more. SQL is not very pretty or friendly IMO, ORMs clean it up a bit (for example allowing arbitrary ordering, sqli, construction of multiple queries from one base). The example I gave was an arel query - very similar to yours if you move the WHERE out into the function name.

Your ORM proposal above looks fine (and very similar to existing ones but with more strings), but personally I find it uglier than the many which are available, and you haven't actually tackled the protection against sqli or explained how that would work. Normally you'd pass that through to the db driver as a $n param, rather than trying to escape a string at the ORM level. If your ORM is going to infer details from the sql, it basically has to parse that sql first and handle all the same things that current ORMs do by using named functions to add SQL - you'd find the same issues with consistency and edge cases I imagine, and of course have to be very careful about parameters and sqli.

So if you don't like ORMs, don't use them, but you do have to be sure that your own ad-hoc home-grown ORM that you're writing instead covers all the bases current ones do. I prefer the syntax of the ones you're denigrating for simple queries, even if for complex queries it might be better to write sql directly (via the functions they provide), but would be interested to see it in action - when are you going to start :)

vinceguidry · on March 11, 2015

Or you could just use Sequel and not have to build up your own library.

ivanr · on March 10, 2015

There is. MyBatis, which is a minimal ORM that aims to keep you as close to SQL as possible, has support for dynamic SQL: https://mybatis.github.io/mybatis-3/dynamic-sql.html

Edit [responding to moe, below]: that's a matter of taste. I prefer to have my SQL _outside_ my code. If I have to write a little XML to make it happen, so be it. Additionally, unlike your example, a strict separation of SQL and data ensures SQL injection is not possible. That's also a worthy goal.

moe · on March 10, 2015

MyBatis

Does MyBatis have an API similar to what I outlined above?

Because the page that you link to shows a horror that I can't even begin to describe:

  <select id="findActiveBlogWithTitleLike"
       resultType="Blog">
    SELECT * FROM BLOG
    WHERE state = ‘ACTIVE’
    <if test="title != null">
      AND title like #{title}
    </if>
  </select>

SQL mixed with proprietary XML soup? Talk about combining the worst from all worlds...

bigpeopleareold · on March 10, 2015

Get rid of the XML and you have a good start, I'd say.

solidsnack9000 · on March 10, 2015

> Why is there no ORM that works like this?

Because no one wants to write a SQL fragment parser. You would have write a parser that:

a) Could be started at an arbitrary place in a query and figure out what it needed to do.

b) Would be SQL-dialect aware. (Maybe -- it might be possible to skip this.)

Were it not for (a), I'd imagine you could quickly build this by using some kind of debug feature of the database -- sending it a query to parse and getting back a parse tree.

neilk · on March 10, 2015

What you're describing isn't an ORM, it's a SQL query abstraction.

Knex is a very convenient, Promise-oriented query abstraction, and it has "raw" methods when only SQL snippets will do. http://knexjs.org/

lugg · on March 10, 2015

Generally I'll go with something like this for these conditional search based queries

    select id, blah, boo
    from products
    where 1
    and (title like :title} or if(:title = '', false, true))
    and (price <= :price_range or if(:price_range = -1, false, true) 
    limit :limit
    offset :offset

Its a trade off, bit of extra complexity in the query for simpler code logic. Really, I avoid using sql for these types of "search" / find methods, they're probably going to be slow as hell as soon as you get a sizable data set.

I really despise orm's its essentially taking a black box (sql/database) and throwing it inside a much less tested, less optimized, less documented black box.

For what reason, I have no idea, nobody has ever been able to convince me there is a problem that needs solving. Almost every argument i've heard amounts to problems sql already solved a decade ago. I can only speculate but I'll hazard a guess it comes down to lack of enthusiasm to really learn sql properly (even though the person uses it / debugs it daily)

GFischer · on March 10, 2015

Microsoft's Entity Framework and LINQ work pretty much like that, only with nice IDE support.

I'll try to write an example like that when I get home.

Of course, they're not "just SQL", and I do understand where the above comments come from.

And yes, sometimes I have to go look up how to do things I'd do trivially on SQL.

encoderer · on March 10, 2015

> Why is there no ORM that works like this?

Because you're only showing a query builder, the "relational" not the "object mapper".

From an OOD point of view, if the end result of that query will be Product instances, why am I using a Select object to create them and why is it having to do some sort of string parsing to determine the objects I'm loading?

moe · on March 10, 2015

if the end result of that query will be Product instances, why am I using a Select object to create them

Because we can just infer the type to be returned via the FROM-clause of the query.

and why is it having to do some sort of string parsing to determine the objects I'm loading?

Because, to cite the immortal Larry Wall:

   The computer should be doing the hard work.
   That's what it's paid to do, after all.
   -- Larry Wall

It has to do the string parsing, escaping and intelligent validation so I, the human, can write

   s.add "count(*) / sum(count(*)) * 100 as percentage"

instead of

  .select_append { ((count(star) / sum(count(star)).over) * 100.0).as(:percentage) }

The difference is that most people can read and write the former without thinking.

The latter version may not look much more complicated at a glance. But as we all know these seemingly trivial fragments, more often than not, take ungodly amounts of very frustrating trial & error before they play along.

djur · on March 11, 2015

You don't have to pass a block to select_append. With Sequel, you can do this:

   # db is a Sequel::Database
   s = db[:foobars]
   s = s.select Sequel.lit("max(id) as best_id")
   s = s.select_append Sequel.lit("count(*) / sum(count(*)) * 100 as percentage")
   s.sql
   # ==> "SELECT max(id) as best_id, count(*) / sum(count(*)) * 100 as percentage FROM `foobars`"

The only real difference here is Sequel.lit, which is needed for security (any secure DB interface needs to somehow be notified that strings are safe to put into a query without escaping). If I'm writing code that leans heavily on Sequel to build queries, I'll make a private method #sql that is an alias for Sequel.lit.

It's entirely feasible to use Sequel this way. I've used it like that in production. I loathe the query building DSL, myself, but it's strictly optional.

moe · on March 11, 2015

The only real difference here is Sequel.lit

Well, no.

I repeat: The real difference is that most people can't write even this utterly trivial snippet without studying the Sequel documentation first.

Now what if I want a WHERE-clause? Do I have to use s.where? Or s.select_append("where ...")? What if I need to combine them with AND/OR?

It's not ok that we have to think about all this boilerplate that has nothing to do with our actual query.

We shouldn't have to translate our perfectly unambiguous request from english (SQL) to klingon (Sequel API) in order to have it processed.

djur · on March 11, 2015

I'm skeptical that it's possible to assemble a working SQL query of much complexity unambiguously from an arbitrarily-ordered set of clauses. CTEs, subselects, and boolean conditions would require a nesting construct. Boolean conditions, order clauses, and some joins need to be assembled in a meaningful order. Eventually you're going to find a level of complexity in the query where it's necessary for a developer to read the documentation, either to figure out the API or to understand the method the library uses to stitch queries together.

There might be a sweet spot yet unreached in terms of allowing developers to work with SQL on top of as minimal a native binding as possible. I'm certainly not claiming that Sequel/SQLAlchemy/etc. is at or near that sweet spot. It might be necessary to use a language with language-level support (or macros) to really reach it.

encoderer · on March 12, 2015

How often do you have a need for non-engineers to be able to alter your database queries? Why is this a requirement or even desirable? There seem to be few cases where a query change needs to be made in isolation, without having to touch any of the downstream consumers of that data.

So if we remove this non-requirement of "anybody can write it", we're back to my original question: Ok, we the coders know we want a Product instance, but to build it we create a Select instance and then ask a library do so some sort of parsing?

Possibly no ORM exists that meet your criteria because there are issues with the design goals you're trying to achieve?

qooleot · on March 11, 2015

Have you thought about JSON as a partial query placeholder, and all the native methods available for traversal, merging, plucking, etc.?

For example, here is :

{ type: 'select', columns: [{ expression: 'orgJobs.org_job_code', alias: 'job_code' }], joins: [{ table: 'org_jobs', alias: 'orgJobs' }], limit: 100, offset: 0 }

which yields:

select orgJobs.org_job_code as job_code from org_jobs as orgJobs limit 100 offset 0

and so its easy to do things like:

query.limit = 100;

and of course:

query.columns.push(additionalColumn);

and use something like lodash/underscore (or equivalent for your language) to add collection/array/object helper methods so you never have to write for-loops for things like mapping and property plucking.

rmc · on March 11, 2015

Now you have to be aware of SQL escaping to prevent SQL injection attacks.

onion2k · on March 11, 2015

Everyone who works with SQL should be aware of SQL escaping. There's nothing wrong using a library to do it for you, but what you should have an understanding of what the library is actually doing. Otherwise how could you possibly test it? Just by hoping it's right?

cmdkeen · on March 11, 2015

One of the things I like about EntityFramework is that you write LINQ, which odds are these days every C# dev should be familiar with. So, standby a few gotchas (which SQL has as well), the code you write to query an in memory collection is the same code you write to query the database. This also makes it fairly easy to test because you can create in memory representations of your data and check you filters, projections, orders etc.

tracker1 · on March 11, 2015

Stuff like your example is where, in general I prefer a sproc, or something more like the following...

http://www.ivc.com/blog/better-sql-strings-in-io-js-nodejs-p...

merb · on March 12, 2015

there is, it's called anorm. https://www.playframework.com/documentation/2.3.x/ScalaAnorm

You don’t need another DSL to access relational databases SQL is already the best DSL for accessing relational databases. We don’t need to invent something new. Moreover the SQL syntax and features can differ from one database vendor to another.

jbergens · on March 10, 2015

Micro ORMs usually works very much like this. There is a problem when you want sub records, they are handled differently in each library or not handled at all.

hammerandtongs · on March 10, 2015

You've formulated the composibility reasonably in that sentence but it's important to remember that is not the only problem the world faces in data manipulation.

I think the irony is that much of the problems of sql are related to a lack of tooling.

The lack of tooling is result of large numbers of devs being distracted by ORM which moves that particular problem into the language ide/editor.

Essentially we lack great *.sql editing, ide, macro, refactoring because of the lack of focus on sql itself.

edit: as an aside, string concatenation in the language of choice is its own road to hell.

rdtsc · on March 10, 2015

> Essentially we lack great *.sql editing, ide, macro, refactoring because of the lack of focus on sql itself.

Except that my other tools are just an editor like emacs and makefiles. So now to be productive I have to have and ide for sql, learn sql, learn macros.

Interesting that SQL came first. If it was so easy and obvious to use it we would not have seen any ORMs by now. But every other project that uses SQL databases ends up with some kind of ORM.

One of the reason is probably because there is an impedance mismatch. Code is already written dealing with objects, functions, pointers, data. But when it talks to the database it has to talk via a command line string interface so to speak.

There is another type of impedance mismatch and that is learning a new language. One can argue SQL is there to keep & manipulate the most precious things you have -- data, so one should spend time learning it well. But the other point is, SQL is there to keep & manipulate the most precious things you -- data, so don't mess with it if you don't know SQL very well. Let someone else learn it (the author of the ORM).

I am not arguing for one point or the other, just kind saying how it is interesting how we have been going in a circle the last 15 years or so.

tracker1 · on March 11, 2015

I think it's a mixed bag... and depends on what you want to accomplish... a lot of the time, I will find myself bypassing an ORM about 20% of the time... the other 80%, the ORM model is simpler.

I find LINQ + Entity Framework to be a blessing in C. Though I use the monad/function syntax not the LINQ query sugar when I use it.

hammerandtongs · on March 10, 2015

I also use emacs(evil-mode actually). An ide is not much different then a fully configured vim or emacs.

sql-mode for some databases(mysql and oracle iirc) reads the DDL of the tables you are typing about, why is this not better and more general?

Sql is so well established and general as a solution that not learning it is a poor choice.

emodendroket · on March 10, 2015

You mean something like this? https://www.jetbrains.com/dbe/

hammerandtongs · on March 10, 2015

Yes that was in my mind as I was typing it actually :)

Notice that its only a yearish old I think?

nileshk · on March 11, 2015

The IntelliJ IDEA Database plugin [1] on which 0xDBE is based on has existed for quite a while. I'm not sure when it become very useful as I've only used it for a few years, but it is quite powerful and useful.

The SQL code completion is quite good (and it caches your entire database schema so it is very quick). That's not particularly special on it's own, but what makes this especially useful is IDEA's "language injection" [2] feature. This allows you to, for example, get completion for SQL when it is contained in some other language, which could be anything (e.g. a Java String, Ruby String, XML, etc). It will also analyze and report errors in this SQL on the fly.

And SQL statements contained in concatenated strings are no longer a problem because you can edit those in a separate editor window where you are only editing the SQL, and it automatically gets placed into the concatenated string.

Not sure if it addresses all of your specific concerns, though.

[1] https://www.jetbrains.com/idea/features/database_tools.html

[2] https://www.jetbrains.com/idea/help/using-language-injection...

bigpeopleareold · on March 10, 2015

I am rather fond of Sqitch and pgTAP as tools for database development.

hammerandtongs · on March 10, 2015

Sqitch problem domain I handle a different way.

Pgtap is on my list of things to poke around with.

edit: thanks btw

joshuahutt · on March 10, 2015

I would write a SQL function that takes all of those as optional parameters and includes a lot of these:

    WHERE (_title    IS NULL OR title LIKE _title)
    AND   (_minPrice IS NULL OR price > _minPrice)
    AND   (_maxPrice IS NULL OR price < _maxPrice)

schrodinger · on March 10, 2015

That ends up being very inefficient as the optimizer will come up with one mediocre plan that works for all those criteria.

ad_hominem · on March 10, 2015

This actually answers the question, although I would imagine the ORDER BY handling will look pretty messy (CASE statement perhaps?).

And I don't know what it would look like if the requirements changed to allow ordering by multiple columns with different possible sort directions... that might get back into dynamic SQL using a RETURN QUERY EXECUTE type of thing, which is basically using a query builder in your query language.

babuskov · on March 11, 2015

Most SQL engines support ordering by column number.

select c1, c2, c3, ... order by 2;

would order by "c2"

wallyhs · on March 10, 2015

> Or are there some advanced Postgres features that I'm not aware of that would allow all these combinations in some kind of prepared statement?

I can't speak specifically to Postgres, but, in my experience, there is nearly always a way to do what you want with bound parameters. No advanced features are required. It often results in poor performance and redundant code that is hard on the eyes. You will get what you deserve, but sometimes you don't have a choice.

Here is an example of the horror:

    WHERE (:1 IS NULL OR :1 = item_id)
      AND item_num BETWEEN NVL(:2,0) AND NVL(:2,9999999999)
      AND item_date BETWEEN NVL(:3,'01-jan-1900') AND NVL(:3,'31-dec-4712')
      AND item_name LIKE NVL(:4,'%')
    ORDER BY
      CASE :5 WHEN 'NUM' THEN item_num WHEN 'DATE' THEN item_date ELSE item_name END,
      CASE :6 WHEN 'NUM' THEN item_num WHEN 'DATE' THEN item_date ELSE item_name END

Edit: I suppose you could also parameterize the ascending vs. descending sort, although I have never tried. My first thought is to duplicate each line in the ORDER BY clause: one bind parameter for ASC and another for DESC. Have each CASE return a constant if the bound value is NULL, and then bind NULL for the direction you do not want. Yuck.

I am not advocating any of this but am pointing out that bind parameters can be abused in surprising ways if you are backed into a corner.

mbreese · on March 10, 2015

You might be over thinking this... something like this might work out just fine. (I wouldn't necessarily do things this way, but rather keep a list of clauses and join them with " AND " to avoid keeping track of the "WHERE"s and "AND"s, but you get the point...)

    sql = 'SELECT * FROM products'
    args = []

    if title:
        sql += ' WHERE title LIKE %?%'
        args.append(title)

    if upc:
        if args:
            sql += ' AND'
        else:
            sql += ' WHERE'
        sql += ' upc LIKE %?%' 
        args.append(upc)

    if price_min:
        if args:
            sql += ' AND'
        else:
            sql += ' WHERE'
        sql += ' price >= ?'
        args.append(price_min)

    if price_max:
        if args:
            sql += ' AND'
        else:
            sql += ' WHERE'
        sql += ' price <= ?'
        args.append(price_max)

    if order_by_name:
        sql += ' ORDER BY name'
    elif order_by_price_low:
        sql += ' ORDER BY price'
    elif order_by_price_high:
        sql += ' ORDER BY price DESC'


    conn.query(sql, args)

ad_hominem · on March 10, 2015

This is a query builder. If you're using your programming language to dynamically assemble the SQL statement fragments at runtime, then you're using a query builder regardless of if it is a library with a fancy DSL that assembles an in-memory SQL AST or some kind of ad-hoc string concatenation you rolled yourself like this.

The question I'm asking is pointed towards the people who are implying that you can "just use SQL" as static statements that are not dynamically assembled. Like a static function or prepared statement that takes some parameters, and at runtime you only pass in those parameters - not rejigger the actual SQL statement fragments.

mbreese · on March 10, 2015

Yes, logically, it's a query builder. But it's one that doesn't force any extra dependencies in my project and is very easy to troubleshoot.

Seriously, how is this different from "just use SQL". SQL was never really intended to be used by itself (except for manually typing queries into a console). Yes, what I wrote is code that writes code, but it's far different from a larger ORM or querying framework.

ad_hominem · on March 10, 2015

Disregarding my edit and how your query builder is irrelevant to my question anyway, if I change the requirement to allow the user to sort by multiple columns instead of one, that case statement is going to either combinatorially explode or you'll also have to add some ad-hoc SQL escaping to guard against SQL injection since you can't use bound parameters in ORDER BY clauses.

It's easy to see how libraries like Sequel get invented to help people do this, and I don't really see what the problem is when the DSL stays close to SQL semantics.

mbreese · on March 10, 2015

Dealing with multiple ORDER BY clauses isn't any more difficult, but you'd have to figure out how to get the user input into some kind of list. That will require validation too... Just using an ORM doesn't let you ignore input validation. You wouldn't trust a user to use their own column names in a WHERE clause anymore than you'd trust them in an ORDER BY clause.

Trying to differentiate between a DSL that interfaces with SQL and SQL is nonsensical. SQL is the original DSL.

rdtsc · on March 10, 2015

> SQL was never really intended to be used by itself.

Hmm I thought that is how it was intended to be used? If it didn't, it would look like datalog, lisp or some binary protocol with prefixed lengths and whatnot.

It was intended to be typed in by analysts at a console. Who would then print the report on the dot matrix printer and mail to the headquarters or something of that sort.

mbreese · on March 10, 2015

That data has to get into the RDMBS somehow...

I guess a better wording may have been to say that SQL was never intended to be used in isolation. It can be, but you've always had interfaces to other languages / platforms. SQL is a data manipulation and retrieval language - the rest of the business logic has to be performed somewhere else (excepting complex stored procedures).

collyw · on March 11, 2015

I was told it was intended to be easy to use for non techies at University. It is quite wordy, kind of the opposite of some Perl.

jbergens · on March 10, 2015

SQL was often combined with Cobol (might still be where they have Cobol).

collyw · on March 11, 2015

was? You sound like lots of paces have got rid of it.

adrianmsmith · on March 10, 2015

I know this isn't central to your point, but I thought I'd say it anyway. These days I have taken to doing:

    sql = "SELECT * FROM products WHERE TRUE"
    ....
    if upc:
        sql += ' AND upc LIKE %?%'
        ....

This saves having to worry about "AND" and "WHERE", and the extra "WHERE TRUE" results in the same execution plan (thus the same performance).

chaz · on March 11, 2015

I also add an extra column to my SELECTS:

  SELECT
    users.id,
    users.email,
    1
  FROM
    users

I can just use "--" to comment out any line without having to manage the last comma. Especially useful when building aggregations.

clebio · on March 11, 2015

Awesome. That's such a straight-forward solution to one of the crappy edge-cases that SQL is so bad about (ending lists without commas, like JSON requires too, being another).

jimbokun · on March 10, 2015

Now how is this an improvement over the Sequel example?

I understand the impulse to "Just write SQL." But in practice, with all the string concatenation needed to generate actual queries, you can't really see what the SQL will be without running all the code in your head anyway.

mbreese · on March 10, 2015

In practice, I'd reduce this to a query building function in my own code, so the readability would be much better. The benefit is that you don't need to include another library for it to work. Well, that and it is a strategy that works for all languages, not just Ruby.

scott_s · on March 10, 2015

And the downside is now you need to spend effort maintaing and debugging your custom query builder.

IanCal · on March 10, 2015

Yeah but a lot of it is quite generic, so could be reused in a few places and therefore make that extra effort worthwhile. In fact, it might be handy in a few places so you could package it up and then have a single place you have to maintain the query building code for several projects.

Wait a minute...

mbreese · on March 10, 2015

I think you're over estimating the amount of effort that I spend on this.

falcolas · on March 10, 2015

I've done this quite a bit as well: string building is pretty well understood, and works remarkably well.

hobs · on March 11, 2015

Dynamic prepared(NO ADHOC, THIS IS CRAP) TSQL is a common occurrence for me in production, it is often a better choice that writing a bunch of conditional logic in SQL (to get sargeable queries) or a bunch of really unperformant case statements in where clauses.

I have written TSQL that changes the entire query (which was on the order of 80 joins) dynamically based on about ~20 different parameters, only a few actually passed into the stored procedure, the rest based on settings and configuration. It took a few hours to grok how everything came together at first, but a few tricks that the optimizer will remove means you can add arbitrary conditionals in whatever order you please.

The example you give is VERY trivial for dynamic SQL. postgres - http://www.postgresql.org/docs/9.1/static/ecpg-dynamic.html mssql - https://msdn.microsoft.com/en-us/library/ms188001.aspx

rpedela · on March 10, 2015

That can be solved by writing a SQL function and the user input values are arguments to the function.

falcolas · on March 10, 2015

    query = sprintf("select count(*) from data where %s = ?", column)
    results = sql.execute(query, filter_value);

So long as you leave the "value" portion of the query as a '?' (or %s, or whatever your connector requires) and don't use direct user input for the column names, you're still safe from SQL injection.

solidsnack9000 · on March 10, 2015

Even with PL/pgSQL, there is no way to build a SQL statement dynamically _and_ safely -- it's all just string concatenation there, too. However, using `CASE`, `WITH` and `LATERAL` you can have a root `SELECT` that returns one of a few different queryable code paths, and `WITH`/`LATERAL` allow you to reuse definitions.

rpedela · on March 11, 2015

Use format() for safely constructing dynamic queries.

http://www.postgresql.org/docs/9.4/static/functions-string.h...

solidsnack9000 · on March 11, 2015

You end up doing a lot of string concatenation even with that, though. It's not quite the same thing as having some kind of literal or combinator based setup.

  select(from('tab'), where(and(like('name','Mo%'),not(is('id',NULL)))

rpedela · on March 11, 2015

You can get a similar syntax with format() by nesting format() calls such as:

  EXECUTE format('SELECT * FROM tab %s', format('WHERE %I IS NOT NULL', 'id'));

snambi · on March 11, 2015

Answer : Learn SQL. It is very easy to pass variables.

bigpeopleareold · on March 10, 2015

I'll deal with the problem when I have to.

However, the point is that I'd rather use SQL, it's simple as that.

dustingetz · on March 10, 2015

Orms are not about syntax, they're about things you can't natively "think about" in SQL, like inheritance, composition, references and graph navigation

collyw · on March 11, 2015

I prefer them for taking away the grunt work of needing to code updates and inserts for everything.

jballanc · on March 10, 2015

I'd argue it's less about composability, and more about laziness (in the computational sense). As pointed out elsewhere, if you have all of the information needed to generate a dynamic query, it's often not a huge leap from an ORM to composing strings (especially given how relatively flexible SQL syntax can be).

However, sometimes I want one part of my program to be responsible for one bit of a query, and a separate piece to be responsible for something else. To take a trivial example, say I want one object/function to be responsible for doing the right sorting, another to be responsible for any pagination, and yet another to be able to group results when needed. In that case, having a programmatic abstraction over a query (whether it be an object or a datatype, doesn't matter) can be very useful.

Udik · on March 10, 2015

I agree with you example, in that case it is useful. It's just that it is a pain in the ass in so many other cases.

I always wonder why developers (usually young and enthusiastic ones...) pick a set of the most challenging requirements they can think of and then use them to justify the usage of some library or pattern that makes life a nightmare in the simpler cases, which are obviously the majority of the use cases.

It's like using a tank for going to work, because the roads might possibly have been destroyed and there might be enemies shooting at you. Then ending up destroying and having to rebuild your own garage because the tank is too big to fit in. Complete nonsense.

jballanc · on March 10, 2015

> I agree with you example, in that case it is useful. It's just that it is a pain in the ass in so many other cases.

It's a pain, because what we really want in such cases is a datatype server, not a relational database. For all of SQL's merits, the NoSQL backlash was not without cause. For too long the SQL hammer has been wielded to pound in too many screws.

Of course, programmers being programmers, NoSQL turned into just another hammer used in just as many inappropriate situations. The example I outlined is not a theoretical one. I've dealt with such situations...but I've done so with Datomic, which is designed with datatype-composability in mind. Similarly, I've found a healthy mix of SQL, Redis, memcache, and Datomic all in moderation can go a long way.

...but I've still not found a solution where Mongo is the obvious answer.

bigpeopleareold · on March 10, 2015

The biggest challenge is to understand and find the simplest case and deal with the hard cases when they happen. I had to learn this over time.

rikkus · on March 10, 2015

Here's an example in C#. Imagine you're querying a database of products (here represented by integers). Users can enter filter parameters - you want to build your query dynamically based upon what they enter. With LINQ, you can do this kind of composing with no effort. You also get to run the same code on any kind of Queryable, so if you feel like doing some of the work in RAM and some using a DB, your query is usually going to be very similar, if not the same.

Yes, you still have to understand what you're querying against and how you should build your queries to make best use, but I'd much rather write this kind of code than try to concatenate SQL.

  IQueryable<int> Source()
  {
    return Enumerable.Range(0, int.MaxValue).AsQueryable();
  }

  class UserFilter
  {
    public bool? EvensOnly { get; set; }
    public int? Minimum { get; set; }
    public int? Maximum { get; set; }
  }

  IEnumerable<int> Search(IQueryable<int> source, int currentPage, int pageSize, UserFilter filter)
  {
    var result = source;

    if (filter.EvensOnly.HasValue && filter.EvensOnly.Value)
    {
      result = result.Where(i => i % 2 == 0);
    }
		
    if (filter.Minimum.HasValue)
    {
      result = result.Where(i => i >= filter.Minimum.Value);
    }
		
    if (filter.Maximum.HasValue)
    {
      result = result.Where(i => i <= filter.Maximum.Value);
    }
		
    return result.Skip(currentPage * pageSize).Take(pageSize);
  }
	
  void Main()
  {
    var currentPage = 4;
    var pageSize = 5;
		
    var filter = new UserFilter {
      EvensOnly = true,
      Minimum = 1000
    };
		
    var searchResults = Search(Source(), currentPage, pageSize, filter);
  }

zaroth · on March 10, 2015

FYI, you can also write it this way, although you may find it less readable, it emits the exactly same query to the DB, and I personally find it much more pleasing;

  IEnumerable<int> Search(...) {
     return result
        .Where(i => !f.EvensOnly        || i % 2 == 0)
        .Where(i => !f.Minimum.HasValue || i >= f.Minimum)
        .Where(i => !f.Maximum.HasValue || i <= f.Maximum)
        .Skip(currentPage * pageSize).Take(pageSize); 
  }

Nit: EvensOnly should not be nullable since it has only two states.

rikkus · on March 11, 2015

I find this more readable. In fact, the code here looks more like this:

    .FilterByEvensOnly()
    .FilterByMinimum()
    .FilterByMaximum()

Obviously the example is trivialised and the real filters are more complex, requiring joins, but the pattern gives a very readable way of writing an efficient query.

Unfortunately EF produces an unreadable query when there are a few dozen filters, but the LINQ code is readable, so there hasn't been any difficulty debugging.

bmurphy1976 · on March 10, 2015

Works great until you need some feature that is only implemented in SQL and in my experience that's unfortunately about 30% of the time.

mbreese · on March 10, 2015

How often do you really reuse bits of a SQL query? SQL is the language that an ORM will generate. The question is, how much work is done to avoid using SQL and is it worth it?

The only time that I've found ORMs useful for composing queries when I have to dynamically create a query at runtime based upon user input. And even in that case, today, I'd probably still just concatenate strings for a proper SQL statement.

scardine · on March 10, 2015

I use this a lot for report-builder interfaces, and according to my personal experience SQLAlchemy in Python maps really well to raw SQL while adding a lot in terms of DRY, portability and security.

I guess the underlying question is how much your development style involves meta-programming (and how much someone enjoy reinventing the wheel with their own query builder engine instead of trusting a mature ORM).

vidarh · on March 10, 2015

With Sequel I reuse bits all the time, because you can generally very easily chain statements that return dataset's until you force evaluation of the query.

collyw · on March 11, 2015

I think there is a problem of tooling in SQL, which means it is difficult to reuse it.

I spent the other day writing a big query, joining about 10 tables. So usually I start by joining the two main ones. Check that gives me the results I want. Add in another table or conditions, check again. Repeat until all tables are joined into the query and conditions are added.

Then I noticed the results of my GROUP_CONCAT were not as I expected. I had a couple of suspect joins that I tried removing. Same problem. In these situations, it is often easier to go through the same process from scratch of adding in one table at a time and ensure that it is working.

We need some kind of unit test equivalent for SQL.

johan_larson · on March 12, 2015

Mature relational databases support views, pre-built queries that developers can define and other developers can then use as "tables" in higher-level queries, allowing some code reuse. This has been around a long time.

jbergens · on March 10, 2015

It happens a lot for me. Basically you have to choose between copy-pasting a lot of sql code or trying to reuse parts when the business logic has multiple cases that are similar but not exactly equal.

Udik · on March 10, 2015

The composability requirement only justifies itself when... your queries need composability.

I'm absolutely in favour of using a query builder when it avoids the need to fiddle with strings, but at the same time any decent programmer can write in half a day a query builder that satisfies the vast majority of the use cases.

So the question is: once you have covered those cases, do you need to use the query builder to write static queries? No. Do you need an extremely complex query builder to build extremely complex queries? Again, no. SQL is great for writing very readable and optimized static queries, which are almost all the SQL queries you need when you're not dealing with search forms.

Bottom line: use your favourite query builder or write a tiny one, but use it only when the resulting code is simpler than the plain SQL + string joining.

YorickPeterse · on March 10, 2015

We already have quite a bunch of query fragments we use to compose bigger queries, so I'd say in those cases it's more than justified.

gfody · on March 10, 2015

> So tl;dr: it's about composability, not being "lazy" or "ignorant" to SQL.

If you want composability you should factor the common sql fragments out to parameterized views (table valued functions), or just regular views. Doing this across a large project can really improve performance as well.

bborud · on March 11, 2015

> In both cases there's little validation of the query's > correctness (syntax wise) until you actually run it.

Yes? So? When is this really a problem? Even if you do generate queries dynamically. When does this become such a big problem that it is worthwhile killing readability and bogging down the system with lots of extra complexity?

As for ORMs: I've done software for 25 years. I still have not seen a single example of ORMs resulting in better software. Not a single time.

kiyoto · on March 10, 2015

I agree with this 100%. SQL works incredibly well for purely declarive, relatively simple operations. However, the moment you start doing "Do this, then that, then that", it starts to look (literally the appearance) daunting.

I myself come from the R/Python background, and my hypothesis is this difficulty around composability (especially for dplyr/pandas crowd) is what deters data scientists from writing more SQL.

frik · on March 11, 2015

Use an IDE like JetBrain's IDEA that checks the SQL syntax against the database.

Pxtl · on March 10, 2015

I use Entity Framework as an ORM and I have to say I like it. As much as there have been a lot of false-starts related to trying to reinvent SQL, I think Entity Framework hits many of the right notes for me:

1) All my code lives in Visual Studio with compile-time type-checking. No maintaining stored procedures outside of my main codebase, no mucking about with strings. And because Entity Framework puts the Select after the From clause, I even get good intellisense on column-names.

2) I can quickly shift code into being in-memory queries from database queries.

3) While it's a little muddy mixing SQL and C#, there are places where C#/EF's syntax is so much more brief and clean since it's hyper-aware of relationships. For example, instead of writing 2 joins to get the country of a user, I can say User.Centre.Country.

Of course, the Entity Framework is adding a layer of complexity to my work - you still have to know what's going on in SQL, and that mental overhead is a downside. But it pays off.

SQL is a brilliant language, and that's why it's stood the test of time in spite of its advanced age.

But seriously, it has a lot of bad flaws - it's often difficult to make reusable components in SQL like you can in other platforms.

The underlying relational algebra is brilliant. That's why I like ORMs - they admit that SQL is ideologically beautiful, but the SQL language itself could use some loving.

jgrahamc · on March 10, 2015

Yes. I pretty much agree with you and am happy for people to use ORMs (or raw SQL if that's their thing and they are into maintaining it). It's actually the underlying 'throw away all this old SQL database stuff' that is the biggest loss for people because they miss out on the power and optimization that's gone into those databases.

masklinn · on March 10, 2015

But doesn't the Sequel snippet just write SQL as Ruby and get Ruby objects out? It's still pretty raw SQL, just not SQL-as-embedded-strings-guess-the-syntax-error. And surely it's easier to extract, compose or otherwise manipulate reified queries than by concatenating floating bits of strings.

p8952 · on March 10, 2015

Yep, even better imo is that when you run a query it outputs the generated SQL. Often I'll have a psql console open next to vim while creating Sequel queries and just copy this output over to verify the generated SQL does what I expect.

atburrow · on March 10, 2015

When writing raw SQL, I tend to test my queries against the database first with sample queries, then carry them into code once all the logic has been figured out.

wavefunction · on March 10, 2015

I will just say that no one should have inline SQL in their code. For security's sake, use a parameterized query!

snuxoll · on March 10, 2015

Entity Framework still generates pretty crappy looking SQL. It's nice to get a project going, but every time I try to look at why we are having performance issues in production and see the horrible query EF is generating I cry.

Someone1234 · on March 10, 2015

While I agree the SQL it produces looks ugly, performance wise I haven't found too many issues (using EF 5 then 6). The few cases where it produced legitimately bad SQL (slow), was due to the developer failing to grasp how queries are built (in particular abusing "in" and having multiple layers of selects instead of just a flat query which could have done the same thing).

I will say EF requires extra training/knowledge. So it isn't a "free" framework. However being able to migrate your entire code base from one database to another only changing a single annotation in the Code First classes is pretty nice. As is being able to generate your databases via the initialisers.

Is it for every project? No. However for "classical" database projects where you're going to be writing a bunch of CRUD and shoving it into C# classes, it is very helpful.

mattmanser · on March 10, 2015

No-one ever migrates their db unless they really are in some sort of serious shit. I'll also guess you've never done it because the EF + MySQL = a world of pain. MySQL does not like nested queries, the EF uses them like they're crack, one of the reasons the SQL it produces is so hard to read. I'd guess that the EF + [any db that's not MS SQL] probably suffers from similar "holy shit why did the DB just die, oh it's the EF" problem.

I'm also guessing John's problem with that query is that it's a simple aggregate query in SQL, there's no good reason to use the ORM, mainly because you have no idea what convoluted SQL it's going to spit out. There's only downsides with no upsides.

Someone1234 · on March 10, 2015

> No-one ever migrates their db unless they really are in some sort of serious shit.

I've been involved in several migrations. All of which were planned over years in some cases. There was no "serious shit" at any point in the process.

One popular reason for migrations is licensing costs (e.g. escaping Oracle or IBM), or due to mergers where the other company had a different database system and they wanted to consolidate both technology but also expertise.

> I'll also guess you've never done it because the EF + MySQL = a world of pain.

I've never done what? I've never done EF with MySQL. I never claimed I did however.

I've used EF with other database systems including Oracle 11g, MS SQL (several), and other integrated it with some smaller database systems that had EF providers available.

> MySQL does not like nested queries, the EF uses them like they're crack, one of the reasons the SQL it produces is so hard to read.

You, the developer, formulate the queries. If you're seeing excess nesting then examine how you're doing things.

> I'd guess that the EF + [any db that's not MS SQL] probably suffers from similar "holy shit why did the DB just die, oh it's the EF" problem.

That's not been my experience. While I have found some other EF providers "limited" compared to Microsoft's provider, most of the problems encountered were due to underlying bad configuration or bad entity layout rather than EF itself. I will say Code First is significantly better than EDMX-style modelling. In particular in larger projects (the Visual Studio designer kind of sucks).

> mainly because you have no idea what convoluted SQL it's going to spit out.

It is very easy to see exactly what EF spits out. You can even pipe that output all over the place (e.g. error system, performance logs, etc). There's no excuse for being ignorant of what EF is doing under the hood anymore.

cmdkeen · on March 11, 2015

Indeed for web development tools like MiniProfiler give you access to the queries in the page along with execution time and duplicate query detection. We use it on Oracle with the Devart provider and it's pretty snappy even with the somewhat bonkers looking queries it sometimes generates.

Because in my office the alternative is PL/SQL functions that dynamically build WHERE clauses, and I'll choose EF over that any day.

spacemanmatt · on March 11, 2015

Given a best-case MySQL database that needs to be migrated elsewhere for data quality or performance reasons, it could be a very straightforward task that any qualified DBA could manage. That isn't the most common case, but it's one you can achieve by intention.

br1 · on March 10, 2015

Don't try to read the generated SQL. Debug the query plan directly.

emodendroket · on March 10, 2015

Yes, but using projections and generally avoiding descending deep down object graphs for queries of multiple objects generally yields acceptable performance.

radicalbyte · on March 10, 2015

> And because Entity Framework puts the Select > after the From clause

I think that you're confusing LINQ with the ORM.

LINQ is a language feature, Language INtegrated Query. It operates on IEnumerable<T>. They introduced a new interface, IQueryable<T> which extends IEnumerable<T> with Expression. Basically it exposes your "query" as an Expression Tree - this can then be used by QueryProviders to do cool stuff like generate SQL.

A number of ORMs implement LINQ QueryProviders; Entity Framework, NHibernate, BLToolkit are all good examples.

EntityFramework, like a lot of fully-featured ORMs suffers from all of the traditional ORM problems: runtime performance is sub-optimal, it's query heavy and if you use the visual designer it encourages bad development practices.

On the other hand it allows people with very little knowledge to be productive; for small-scale projects the bang-for-buck is hard to compete against.

platz · on March 10, 2015

EF has too much churn to be relied upon. They completely re-write it every few years. This time they are really taking an axe to the whole existing structure

emodendroket · on March 10, 2015

I like EF well enough but if you don't know what you're doing it definitely makes it very easy to write code that works but runs really slowly when n is bigger than like, 10.

JonnieCache · on March 10, 2015

Sequel is the best ORM I've ever seen or used, but that is (probably IMO) not a good application of it.

It gives you total choice over what level of abstraction you want, so this is a particularly egregious use of it, being as nothing about that query is dynamic :)

You can use it purely to execute handwritten SQL queries loaded from files, or stored procedures, or any level of abstraction between raw SQL and the monstrosity you posted. The model layer is totally optional and is built cleanly on top of the query/relational layer, not inside of it.

The thing is a fucking work of art to be honest. The design shows impeccable taste, ruby's more exotic features are deployed only when really needed, everything is so clear, especially if youre used to activerecord.

BTW also the developer is super helpful and pops up everywhere to answer questions.

jonahx · on March 10, 2015

jeremy is incredible. he is more helpful and available (and certainly more knowledgeable) than most paid support people for premium products.

if you are learning sequel, stop by the irc channel and he'll probably answer your question.

JonnieCache · on March 10, 2015

seriously the man is a hero. I owe him about a brewerys worth of drinks.

EDIT: I just went to look at the github: 1821 stars, zero issues. And yes, github is the projects official bug tracker. There are 662 closed issues.

I must see if I can arrange for work to send him some money...

netghost · on March 10, 2015

Or a brewery!

yawniek · on March 11, 2015

sequel + postgresql is pretty much a dream team. i really miss it now that i am on a python stack.

api · on March 10, 2015

Agreed, but it's important to understand what drove the adoption of NoSQL and schema-less stuff as well as the related trend of dynamic languages like Ruby and JavaScript.

(1) SQL server software itself was clunky, hard to scale, complex to deploy in a clustered fashion, and generally "old" in a bad way. This made NoSQL seem like a breath of fresh air.

(2) Startups! Ship now! MVP! Fail fast! The whole industry has been operating on this ship-fast-fail-fast way of doing things that prioritizes development speed and agility over correctness. That's because half the time what you are developing will be thrown away anyway -- it will fail, or you'll "pivot" and rip it all apart and almost start over. No point in static languages, SQL schemas, or other explicit design elements in that world.

#2 is I think the more significant of the two for things like schema-less MongoDB or dynamic languages. Once you get product/market fit, you typically end up taking the crap you wrote with no types and schemas and re-doing it in a "real language" with a "real database." But as they say in Startupistan, that's a "good problem to have."

dasil003 · on March 11, 2015

Where this line of reason goes completely off the rails is in thinking that schema-less databases are a hedge for future uncertainty. Yes they let you churn out code slightly faster, but then your data becomes a ball of mud just as quickly as your code base, except the former is much much worse because when you pivot you still need your data even if you decide to chuck out all the code that goes with it.

In fact, a traditional RDBMS is designed to allow for any kind of ad-hoc querying you desire with reasonable performance, and the ability to normalize/denormalize, index and materialize views in order to optimize unforeseen use cases. The excuse of poor scalability is just a rationalization that some kids who didn't understand SQL used to justify using something that has a shiny marketing page that they ignorantly found more viscerally appealing. The tradeoff was all wrong, because 99.9% of projects will never need to scale beyond a single DB server, and for those that do, the flexibility of a well-defined schema and ad-hoc queryability will give them an early advantage in what kind of pivots are reasonable—if they are lucky enough to have to scale then they can replace the RDBMS with the context of knowing what exactly their use case that they must scale is. And at that point you'll presumably have the resources to do it, which is much more effective than prematurely attempting to design a scalable infrastructure and discovering that you didn't have the first clue what the real bottlenecks or even the full use case would be.

collyw · on March 12, 2015

The "schemaless" argument is nonsense as well. No schema enforcement at the database level means more checking at the application level (more work basically).

pjmlp · on March 10, 2015

I imagine "whole industry" == Silicon Valley?

I am yet to do any project that doesn't use SQL servers for data storage.

api · on March 10, 2015

Yeah, I mean "Greater Silicon Valley" by which I mean SV itself and its wider diaspora. Other names include Startupistan, Hackerdom, etc.

Sivart13 · on March 10, 2015

I assure you that whoever wrote that code knows SQL.

The builder syntax is convenient because it allows queries to be composable without ugly string-smashing. For instance, you could take the object built out of that "User.select..." and pass it around into other functions that append extra 'where' clauses, etc.

chousuke · on March 10, 2015

I may just be rehashing sibling arguments here, but to me that particular API looks very much what I think an SQL-wrapping library ought to be: a replacement for string concatenation and something that allows you to treat SQL queries as data. I don't know anything about Sequel, but that example still feels close enough to SQL.

My experience with ORMs has been that I eventually end up regretting using one if I try to model my data as objects because with ORMs it's easy to code yourself into a corner where you end up wishing that your design supported the relational model instead.

I do understand the desire for a good ORM, though. SQL is extremely powerful and a well-designed database is a joy to work with, but a sequence of tuples is often not the most convenient datastructure to process in most programming languages.

fennecfoxen · on March 10, 2015

In my experience, it's great to start with ORMs, and they help you go really fast, but if the project is important then sooner or later you're going to find a case where you need to go around it and write custom SQL (probably for performance reasons).

And that's fine, but it does mean that you should pick an ORM that plays nicely with that workflow (or otherwise design your ORM to be compatible with it). Back when I was working with Perl I found that you could do some pretty good stuff with RoseDB -- it could load objects from your custom queries and they'd still work like normal ORM objects (i.e. you can load related objects from them). I've missed that capability on several occasions since.

collyw · on March 12, 2015

This sums up my experience as well. Great for all the tedious insert update type stuff, but more hassle than they are worth for complex queries.

drzaiusapelord · on March 10, 2015

>As a greying developer

The problem with being a young person who wants to make his or her mark on the world is that if the people before you did an excellent job, you can only make things worse. In fact, I think this is one of our larger problems not only in technology but in society (see the recent move to extremism in many aspects of our political and religious life the people who grew up under more moderate times are pushing for).

Some things should be boring and stable. Build on top of those things. Redoing the foundation every six months is just sub-optimal. Accept that you may not make that big mark in the world, or if you do, it'll be in ways you completely did not expect, just like everyone else who made their mark.

JustSomeNobody · on March 10, 2015

I agree with you. Also, we should be developing products that last. And be proud of that. Why does everything always have to be new?

threeseed · on March 10, 2015

You sound like someone who thinks we thinks we should still be using Cobol and IBM Series mainframes.

Technology changes. It improves. It gets faster, easier and more responsive to business requirements. If you don't embrace change in the IT industry then get out. Because you simply won't survive.

Silhouette · on March 11, 2015

Technology changes.

Yes.

It improves.

Sometimes.

It gets faster, easier and more responsive to business requirements.

Occasionally.

There are a lot of ideas in technology that are widely accepted, particularly by younger, less experienced generations, as being The Best Way to do things. However, if challenged, most of those people don't really know why they believe that. They have never personally seen any hard evidence that The Best Way is actually better than known alternatives. They have never spent significant time trying those alternatives for themselves to gain first-hand experience of the pros and cons. In fact, their belief is usually nothing more than a matter of faith, a blind trust in the dogma of the day and the rhetoric they find on-line or hear from their peers or managers.

See also: OOP, design patterns, UML and round-trip development processes, frameworks, ORMs, XML, dynamically typed languages, more frameworks, TDD, NoSQL databases, and so on, many of which were either invented or popularised as a way to work around the artificial problems created by one or more of the preceding items.

Of course I'm being a little facetious here. But if you think I'm entirely kidding, just spend a few minutes with Google and see how much advocacy there has been for each of those ideas in their day and how much actual data there has ever been to justify that advocacy. And really I'm being kind here, because you talked about things that are supposedly "faster, easier and more responsive to business requirements" and I didn't even mention words like "agile" or "craftsmanship" in my list.

If you don't embrace change in the IT industry then get out. Because you simply won't survive.

I embrace using the right tool for the job. Whether it's new and shiny or old but tried-and-tested doesn't really matter, though if I had to pick one or the other for some bizarre reason, I'd pick tried-and-tested eight days a week. There have been only a few truly significant advances in the past decade or two, but very many technologies that were initially much-hyped but then failed to stand the test of time for exactly the reasons that critics outside the hype bubble had always predicted.

rewqfdsa · on March 11, 2015

I've noticed this phenomenon as well, and it's very frustrating. Let's add async-and-future-everything to the list as well. I honestly don't understand how to convince people to make an honest assessment of the alternatives, especially when they're unable to articulate what exactly they're purchasing with the complexity they're adding.

Will our profession ever stop being fad-driven?

JustSomeNobody · on March 11, 2015

That's rather going to extremes with what I said. So, no, I don't think like that at all.

coriny · on March 10, 2015

PostgreSQL is a very different beast to what it was 10 years ago.

So for a while it didn't support use cases as well as various NoSQL/NewSQL databases. However, PostgreSQL has also been built by great bunch of developers who are happy to adapt and implement the new, instead of harping on about the past.

So personally I adapt that advice to instead "use evidence to guide decisions".

stcredzero · on March 10, 2015

The lack of a schema may sound interesting, and in some cases it can certainly have its benefits. However, for many the usage of a schemaless storage engine leads to the problem of implicit schemas.

As another greying developer, one thing that was seen with ORMs were implicit schemas, often motivated as bureaucracy shortcuts.

So, you need a dictionary of configuration data for your user, but the bureaucratic overhead of adding that to your schema is too much? Why, then take your fancy OO serialization technology and save that little dictionary as a binary blob and stick it into just one new column in the database.

Problem solved...until years later when information in one of those blobs becomes tied (by a maintenance programmer who didn't entirely understand the system architecture) to what is effectively an implicit schema.

pbowyer · on March 10, 2015

And I thought I was the only one who'd encountered that...

matwood · on March 10, 2015

I started with SQL in the 90s and completely agree with you. People who do not know SQL tend to reinvent it poorly in a language not designed to work with sets of data.

We have started using jOOQ and it strikes a great balance of raw sql/typed sql and result -> object mapping. I don't ever seen going back to a heavyweight ORM again. There is just too much magic and lost time spend figuring out performance issues for a minimal gain in ramp up for those who do not know SQL.

serve_yay · on March 10, 2015

Gosh, no kidding. (I sorta disagree about ORMs though -- if you're selecting by ID it's monkey work to write those queries, but anything complex, sure, use SQL)

macavity23 · on March 11, 2015

Right. A good ORM should deal with all your common cases (if not, your schema probably needs improving), and it saves you writing a lot of boilerplate code.

For the edge cases where you need complex joins and/or extra performance, don't try and bother with the ORM, just drop to SQL.

silon3 · on March 11, 2015

Agreed. Also it should be generating queries based on foreign keys/indexes (including joins).

Others should not be allowed in an interactive app anyway: developer should be warned/prevented using a "batch" query by accident.

mtanski · on March 11, 2015

I would like to say lets step back and not conflate SQL and relational databases together. Clearly SQL as the language the primary way most people interact with relational database.

In my my mind SQL as a language is a huge PITA. First, parsing of complex statements is expensive (there's workloads where SQL parsing takes more time then processing the results).

Second, as SQL exists today (SQL2011) it's a large, complex language that's not implemented uniformly. So I understand why people want to build programmable interfaces for generating queries versus writing giant string statements that expand to SQL.

I personally would wish that there was an alternative language for interacting with relational databases that isn't SQL. Just expose the whole relational expression tree to users say ala S expressions. It's not like the relational engine is going to optimize / re-order it anyways.

I mean something along the lines of:

  [ GROUP_AGGREGATE,
    [ "name" ],
    [ [ SUM, "COUNT(*)" ],
      [ SUM, "cost" ]],
    [ JOIN, [ ==, "user_id", "customer_id"],
      [ FILTER, [
         [ &&, 
           [ >=, "age", "30" ],
           [ IN, "state", "NY", "CT", "NJ"]],
         [ TABLE, "customer" ]
      [ TABLE, "orders" ]]]]]

Is it more verbose, yes. But much easier to compose, parse and machine transform by software (code is data). Also, makes you think in terms of relational operations/expressions versus SQL the language.

jrapdx3 · on March 11, 2015

Actually, some work on this has been done for CHICKEN Scheme in extensions "ssql" and "ssql-postgresql"[0]. I'm not too familiar with these particular eggs, but the flavor might be conveyed by this example:

    (ssql->sql #f '(select (columns (col actors name) 
                                    (count (col roles id)))
                     (from roles actors)
                     (where (= (col roles actor_id) (col actors id)))
                     (group (col actors name))))

translates to "SELECT actors.name, COUNT(roles.id) FROM roles, actors WHERE (roles.actor_id = actors.id) GROUP BY actors.name"

In the above sexpr, the SQL query is expressed as a Scheme/Lisp hierarchical list, which I think is a quite elegant notation. I'm sure other Scheme implementations provide variations on this theme of SQL generation. Using some macro magic, there would certainly be a lot of ways to organize it and probably it's not even all that hard to do.

[0] documentation at: http://wiki.call-cc.org/eggref/4/ssql

mtanski · on March 11, 2015

It's def a step in the right direction. At least in terms of letting you mechanically transform queries.

The downside (IMO) is that it's still reflecting SQL the language ... not the relational model.

meistro · on March 10, 2015

You should learn SQL and understand relational databases. But using an abstraction (ORM) to cover 80% of the use cases is much better than writing tons of boilerplate SQL. http://java.dzone.com/articles/martin-fowler-orm-hate

clebio · on March 11, 2015

Thank you. I searched to see if his article had been mentioned, and glad to find you did. The link he gives to Ted Neward's post is gone, but I found it in the Wayback Machine: https://web.archive.org/web/20141205230114/http://blogs.tedn...

Seems like this thread has gone down a very old, well worn rabbit hole.

baldfat · on March 10, 2015

The real issue isn't "newtech koolaid" it is the definition of the problem.

I find that people that don't understand the problem like MongoDB for the ability to be "flexible" in designing and modifying down the road. Though I like to argue you have flexibility in SQL also.

warble · on March 10, 2015

I agree, I've read a few 'screw Mongo, I'm going to SQL' type blogs, it really seems like they either didn't grasp how to architect Mongo correctly and tried to do things that don't work well with it, or they had a problem that was better solved by SQL in the first place.

SQL does many things very well that noSQL stuff won't. Also vice-versa.

Twirrim · on March 10, 2015

This is really what he's driving at with the newtech koolaid. People jumped on MongoDB etc. bandwagon because they drank the koolaid of "Schema less is best!" and "It's so fast and shiny!" without really understanding the problem, let alone the right solution.

Unfortunately this seems to be a regular pattern in tech, a mistake (mostly) repeated by the young fresh developers coming in. Usually after you get burned once or twice you stop making that mistake.

I'm not a Comp Sci grad so can't speak from experience, but most of the syllabuses I've seen seem to focus on a very short term, single semester (or at most year) approach to projects. It would be good if they could start with a project and be required to re-write, adapt or improve some kind of project from beginning of the college time to the end. It would be a good introduction to technical debt, amongst other problems. (Heck if you really want to go for broke, maybe even mix things up a little, next semester you'll be working with someone elses code! That would be painful but might teach people the value of good comments, tests etc.)

dspillett · on March 11, 2015

> This is really what he's driving at with the newtech koolaid. People jumped on MongoDB etc. bandwagon because they drank the koolaid of "Schema less is best!" and "It's so fast and shiny!" without really understanding the problem, let alone the right solution.

Exactly. People report "X makes Y so easy!" and because of the breathy excited "look what I did in just an hour" blog posts the naive believe that using X for Y or similar things negates the need to properly design and think ahead.

If you take that approach then even if you are accidentally using the right tool doe the job, you may well be using it very wrongly and creating an unmaintainable and/or inefficient mess that you'll have to fix later.

A few days coding can save you a good hour or three of design and thinking beforehand.

emodendroket · on March 10, 2015

I guess but college isn't a vocational school. The point is to understand the CS concepts and not necessarily to know all the engineering kind of stuff you'd need in a "real-world" project (which often won't be "challenging" in the CS department).

SnacksOnAPlane · on March 10, 2015

I honestly don't think I've ever seen a valid use case for Mongo. If you're going to query your data, you have to know what fields you're looking for, right? So why not create a schema that has those fields?

rmrfrmrf · on March 10, 2015

If you're doing a "tracer bullet"-style method of development where the requirements aren't quite set in stone, MongoDB and other schemaless databases can really save time when it comes to adding basic functionality quickly.

Another use case: if you have an ad-hoc group of developers working on a project in a decentralized fashion, they can each work on local codebases of the project without needing an up-to-date setup script or migration scripts.

I'd maybe even argue that a lot of these "goodbye MongoDB" developer blog posts are sort of unnecessary. MongoDB allows for fast deployment and the convenience of a flexible database at the cost of speed, so eventual migration to a more solid relational database could/should be part of a long-term plan when devs choose to use MongoDB. It's all about using the right tool for the job.

nawitus · on March 10, 2015

You can know a field or fields of the data while leaving the possibility of extra fields that you don't need to know. In practice if you're making a web application and dealing with JSON, it's typically very easy to work with JSON-based databases instead of using serialization functions. You can enforce the schema with various tools at the application level when needed.

MongoDB also has some performance benefits over many traditional databases, and has "native" support for binary data.

I personally prefer CouchDB over MongoDB for most application that are well suited to use a "NoSQL" database, but MongoDB is sometimes decent too.

stephenboyd · on March 10, 2015

Mongo doesn't stop you from using a schema, but you'll have to enforce it in your application code rather than the database itself.

darkarmani · on March 11, 2015

> but you'll have to enforce it in your application code rather than the database itself.

You mean in every one of your application's code. You are using a schema no matter what, but it's a missing feature in mongo.

platz · on March 10, 2015

the only valid use case is when your data is so unstructured so that you don't care what the fields in it are and you treat it like a black box. So maybe some kind of user-generated markup that you just shift in and out.

those kinds of applications are not usually kinds of problems we have to solve.

jonaldomo · on March 10, 2015

I couldn't agree more with you. I love SQL for when I have a concrete data model, when I don't I love NoSQL. I could use both if I really wanted to, but you need to know when to force a square peg into a circle. Mongo has a good how to on it here. http://blog.mongodb.org/post/72874267152/transitioning-from-...

jgreen10 · on March 10, 2015

SQL databases had their imminent death coming. They were stuck. However, PostgreSQL is under more active development than most other databases, SQL or NoSQL. It is new and cool technology.

In the end, whether you speak SQL or use NoSQL construct matters as much as whether you write a program in C or Pascal. You can have endless debates about form and function and you might want to see and identify with a winner, but in the end whether the tools and the community surrounding a technology fits people's needs is the only thing that really matters.

collyw · on March 11, 2015

Whenever I see someone shoehorning something into MongoDB or something else I ask the reasoning for not using a relational database. Its always something along the lines of "We don't do that now" or "schemaless" (which as the article points out means more work for the dev) something similar, with the "fast" added somewhere into the description.

The crazy thing is most of them are working on a single node, so no horizontal scaling. And they seem to think non-relational is the new way to do things.

theseoafs · on March 10, 2015

If the API surrounding the ORM is well-designed, using an SQL wrapper like this eliminates the possibility of SQL injection. There are arguments for not always doing things the same way.

DennisP · on March 10, 2015

Using sprocs or parameterized queries also prevents SQL injection.

joevandyk · on March 10, 2015

The advantage of this is that you can pass the relational dataset that's a result of this code around to other methods. Those methods can add additional filters, different ordering, etc. You can't do that with raw sql strings.

jjtheblunt · on March 10, 2015

Agreed. And one might note that there's a really good reason Oracle has dominated for decades, and Larry Ellison (who read the SQL research paper, and formed a company to implement it nearing 40 years ago) is a billionaire.

threeseed · on March 10, 2015

Oracle's success is due to their aggressive expansion into business applications. Their database products have been taking a hammering as enterprises moved to data lakes and horizontally scaled databases like Casssandra.

Your argument would've made sense a few years ago but not now.

spydum · on March 11, 2015

i think there is a line to be drawn.. If the ORM is saving you no effort, I fully agree. However, I suspect most ORMs in the short term boost early productivity quite a bit.

It's like writing an app from scratch in assembly, versus C.. The compiler will do quite a bit of the dirty work for you, and if you look at it in a disassembler no doubt you could find tons of improvements to be made (errmmm depending on the compiler I suppose). Doesn't mean either one is the wrong way to go about it. Now if all you do is stuff ASM code in your C app, yeah.. Why bother?

scott_s · on March 10, 2015

I agree with the commenters who argue for the value in a library that wraps strings that happen to be SQL statements, but none that I saw touched on this point: I think most users of such a library will have to know SQL. These libraries seem to have a one-to-one matching between concepts in them, and concepts in SQL. Hence, the point is not protecting developers from learning SQL (because you can't, for these needs), but in finding the least painful way of integrating SQL into the application logic.

Animats · on March 10, 2015

Right. If you write SQL directly, you can also use commands such as EXPLAIN to see what the lookup strategy will be. If there's a full table scan of a large table involved, maybe you need a different query or a new index. If the SQL generation is hidden by some library, you can't do that.

Wikipedia runs on MySQL. They have replicated read-only copies of the database and ngnix caches which handle most read-type operations. Is your site busier than Wikipedia?

evantahler · on March 10, 2015

Wait... does preferring to write SQL (and knowing how) make me old now?

Dang.

jakejake · on March 10, 2015

I think the problem is that NoSQL has been the cool new thing, which causes people to use it without really understanding why it is better/worse than other solutions.

Tools like MongoDB can do things that are extremely difficult or impossible with MySQL/PostgreSQL and they are a great choice for those situations. Using it simply out of laziness or misunderstanding, though, is probably going to create problems later.

ttty · on March 10, 2015

I sincerely still prefer mongodb syntax because:

- Fits well with a programming language; {a: data.x} is better than 'WHERE A="' + data.x + '"' (sanitize?) or similar which are harder to read. SQL queries are good for direct input, mongodb queries styles are better to be used with a programming language;

TylerE · on March 10, 2015

If you're building queries in ANY language by concatenating strings you're doing it wrong.

In a sane language that query might look something like

cursor.execute("select field from tbl where long_obtuse_column_name=:foo and status != :status", foo=query, status='open')

All parameters are properly escaped by the api of course, so even if foo is "'; drop table audit; " you don't have any problems.

hackhat · on March 10, 2015

I've said similar because of this, so your comment is included in my statement.

chc · on March 10, 2015

That is not how you do parameterized queries. With any civilized database library, it would be something along the lines of `"WHERE a = ? AND b = ?", data.x, data.y`, so that the parameters are like function parameters.

ttty · on March 11, 2015

I've said similar because of this, so your comment is included in my statement. That doesn't change much

EpicEng · on March 10, 2015

Your users don't give two craps about the syntax. The technology choice is one of application requirements. If you chose Mongo over e.g. a SQL DB because you like your code to look pretty then I have to admit that I wouldn't trust your software at all.

ttty · on March 11, 2015

>Your users don't give two craps about the syntax.

Of course not, but your developers yes. Use a complex syntax and more bugs will appear. Do you care about bugs? No, but your users will.

> If you chose Mongo over e.g. a SQL DB because you like your code to look pretty then I have to admit that I wouldn't trust your software at all.

This is not my only reason.

aethertap · on March 10, 2015

Your comment made me realize why I prefer to work in ORMs instead of raw SQL, even though I'm frequently frustrated by their limitations - it all comes down to composability.

The mongodb syntax looks like it supports composing statements much more readily than SQL. It makes me wonder if there's been any serious work done making a query language that's fully as general as SQL, but is designed to be safely and easily composable so that you can build queries up from parts.

Anyone know of such a thing? The mongodb approach of query-as-data-structure is as close as I've seen.

masklinn · on March 10, 2015

> The mongodb syntax looks like it supports composing statements much more readily than SQL. It makes me wonder if there's been any serious work done making a query language that's fully as general as SQL, but is designed to be safely and easily composable so that you can build queries up from parts.

Haven't you noticed that it's exactly what the Sequel snippet does? It's also what SQLAlchemy's Expression Language does. It's a reification of SQL queries in the host language, so it can/should give pretty much all the tools of SQL.

TylerE · on March 10, 2015

In some ways it does, but also remember it's a MUCH less rich query language. It's a lot easier to make a simple api mapping when you don't have such exotic things as "joins".

aethertap · on March 11, 2015

The ORMs I've used have always tried to map tables to classes (or whatever similar language structure is available). That works well almost all the time, but I get tripped up when the queries get to be more complex or specialized to a particular use.

I haven't used SQLAlchemy's expression language, but glancing over it now it looks like it might be just what I was looking for (at least for python), so thanks for the pointer.

aikah · on March 10, 2015

If you really care about your data and you don't think a database should just be a bag where you put stuff but something that can be trusted , and efficient when it comes to storing data then you wouldn't be using MongoDB.

jv22222 · on March 10, 2015

I agree, using ORMs and such hide the real SQL and make it much harder to optimize by using explains etc.

SnacksOnAPlane · on March 10, 2015

Just have Rails dump the query and stick "EXPLAIN" before it.

JustGotHere · on March 10, 2015

What I'd like to see is a universal SQL that can be translated to whatever dialect of SQL my current database is using. That way I won't have to relearn SQL every time I start a project with a different database engine.

tom9729 · on March 11, 2015

You could use Teiid (https://github.com/teiid/teiid). The SQL dialect is similar to Postgres, and it has built-in translators to handle all of the popular relational databases (Postgres, Oracle, MySQL). Added bonus is that even some NoSQL databases are supported, and you can do things like join a table from a MySQL database against a collection in MongoDB. Full CRUD is supported for most translators.

If you're using Java (JDBC), there's an embedded kit so you don't have to run a standalone server. If you're not using Java, you can run the standalone server which also emulates the Postgres ODBC protocol so you can (for example) use `psql` to connect and run queries.

towelguy · on March 11, 2015

Very interesting, thanks!

billrobertson42 · on March 11, 2015

Relevant: https://twitter.com/stuartsierra/status/571386786238369796

dchest · on March 11, 2015

Stuart Sierra (@stuartsierra): "SQL is already a DSL"