Sure, but consider how this works in ReQL: r.table('foo').filter(...) r.table('f...

meritt · on March 26, 2014

Are you sure about that? Looking at even your example page [ http://www.rethinkdb.com/docs/sql-to-reql/ ] this appears to be wrong or at least confusing?

It's suggesting:

    SELECT category,
       SUM(num_comments)
    FROM posts
    GROUP BY category
    HAVING num_comments > 7

and:

    r.table("posts")
     .filter(r.row['num_comments']>7)
     .group('category')
     .sum('num_comments')

are identical.

I don't think they are? Your ReQL to me looks like it's applying WHERE num_comments > 7 and then aggregating that.

I mean, regardless, your example SQL should be applying the >7 to an aggregate (e.g. sum(num_comments)), not a field; that SQL does not work as written.
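To make the difference concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table contents are made up for illustration, and SQLite is only a stand-in for whatever SQL database the docs had in mind.

```python
import sqlite3

# In-memory database with made-up sample rows (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (category TEXT, num_comments INTEGER)")
conn.executemany(
    "INSERT INTO posts VALUES (?, ?)",
    [("db", 5), ("db", 4), ("web", 9), ("web", 1), ("ops", 3)],
)

# Filter rows first, then aggregate: this is what the ReQL in the docs does.
where_rows = conn.execute(
    "SELECT category, SUM(num_comments) FROM posts "
    "WHERE num_comments > 7 GROUP BY category ORDER BY category"
).fetchall()

# Aggregate first, then filter on the aggregate: this is what HAVING is for.
having_rows = conn.execute(
    "SELECT category, SUM(num_comments) FROM posts "
    "GROUP BY category HAVING SUM(num_comments) > 7 ORDER BY category"
).fetchall()

print(where_rows)
print(having_rows)
```

With this data the two queries return different categories and different sums, so they cannot be treated as the same query.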

coffeemug · on March 26, 2014

Whoops, that's a great catch -- it's a bug in the docs. Will fix momentarily. Thanks!! (that is indeed embarrassing)

In ReQL any command you call after `group` runs on each group. So once you've called `group`, you can run anything you could run on a table on each group and that just works.

meritt · on March 26, 2014

So, to be clear here, you have created two fundamental things (both called "filter") -- a pre-group and a post-group filter. Users must still understand the difference and when to utilize them.

SQL just happens to call those WHERE and HAVING instead of "filter" both times.

coffeemug · on March 26, 2014

You're right, but it's not just `filter`. Any command that can run on a group can run on a full table and vice versa. For example:

  # get a sample of 3 elements from a table
  r.table('foo').sample(3)
  
  # get a sample of 3 elements from each group
  r.table('foo').group('category').sample(3)

You could say that we created two versions of `filter`, and `sample`, and every other command. But another way to say it is that we use polymorphism, which is widely considered an advantage in modern programming languages.
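As a rough illustration of that polymorphism, here is a toy Python model. It is not how RethinkDB is implemented, and `Stream` and `Grouped` are hypothetical names invented for this sketch. The only point is that a grouped value forwards every command to each of its groups, so the same method works in both places.

```python
import random
from collections import defaultdict


class Stream:
    """Hypothetical stand-in for a table or any other sequence of rows."""

    def __init__(self, rows):
        self.rows = list(rows)

    def filter(self, pred):
        return Stream(row for row in self.rows if pred(row))

    def sample(self, n):
        return Stream(random.sample(self.rows, min(n, len(self.rows))))

    def count(self):
        return len(self.rows)

    def group(self, field):
        buckets = defaultdict(list)
        for row in self.rows:
            buckets[row[field]].append(row)
        return Grouped({key: Stream(rows) for key, rows in buckets.items()})


class Grouped:
    """Hypothetical grouped stream: any command is run on each group."""

    def __init__(self, groups):
        self.groups = groups

    def __getattr__(self, name):
        # Only reached for names not defined here, e.g. filter/sample/count.
        def run_on_each_group(*args, **kwargs):
            return Grouped({
                key: getattr(stream, name)(*args, **kwargs)
                for key, stream in self.groups.items()
            })
        return run_on_each_group


# Made-up rows for illustration.
posts = Stream([
    {"category": "db", "num_comments": 5},
    {"category": "db", "num_comments": 4},
    {"category": "web", "num_comments": 9},
    {"category": "web", "num_comments": 1},
    {"category": "ops", "num_comments": 3},
])


def busy(row):
    return row["num_comments"] > 2


table_count = posts.filter(busy).count()                           # whole table
group_counts = posts.group("category").filter(busy).count().groups  # per group
sample_sizes = posts.group("category").sample(1).count().groups     # per group
```

Nothing in `Grouped` knows about `filter` or `sample` specifically, which is the sense in which there is one command rather than two.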

meritt · on March 26, 2014

Yeah, it's definitely more expressive and terse; I just don't think your argument against WHERE/HAVING is incredibly strong. Someone could also just do subqueries and only use WHERE and achieve the exact same behavior that you offer (albeit with a ton more typing).

HAVING exists because it was created prior to subqueries/dynamic tables, otherwise we'd have been likely to just use:

    select * from (select category, sum(num_comments) as comments from posts group by category) as temp where comments > 7
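A quick way to check that equivalence is a sketch like the one below, using Python's built-in sqlite3 module with made-up rows. SQLite is an assumption here, chosen only because it needs no setup.

```python
import sqlite3

# In-memory database with made-up sample rows (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (category TEXT, num_comments INTEGER)")
conn.executemany(
    "INSERT INTO posts VALUES (?, ?)",
    [("db", 5), ("db", 4), ("web", 9), ("web", 1), ("ops", 3)],
)

# Post-aggregation filter written with HAVING.
with_having = conn.execute(
    "SELECT category, SUM(num_comments) AS comments FROM posts "
    "GROUP BY category HAVING SUM(num_comments) > 7 ORDER BY category"
).fetchall()

# The same filter written as WHERE over a subquery, with no HAVING at all.
with_subquery = conn.execute(
    "SELECT * FROM ("
    "  SELECT category, SUM(num_comments) AS comments "
    "  FROM posts GROUP BY category"
    ") AS temp WHERE comments > 7 ORDER BY category"
).fetchall()

print(with_having == with_subquery)
```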

br1 · on March 27, 2014

As a LINQ user, I expected group('category').sample(3) to sample 3 categories, not to sample inside each category.

coffeemug · on March 27, 2014

In ReQL you can do it with group('category').ungroup().sample(3). It's really powerful once you get the hang of it.
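The two readings can be sketched in plain Python. This is a toy model rather than the RethinkDB driver: the `group` and `ungroup` helpers below are hypothetical, and the only detail borrowed from ReQL is that ungrouping yields a flat list of objects with `group` and `reduction` fields.

```python
import random
from collections import defaultdict

# Made-up rows: four categories with at least three rows each.
categories = ["db"] * 5 + ["web"] * 4 + ["ops"] * 6 + ["ml"] * 3
rows = [{"id": i, "category": c} for i, c in enumerate(categories)]


def group(rows, field):
    """Hypothetical helper: bucket rows by the value of one field."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[field]].append(row)
    return dict(groups)


def ungroup(groups):
    """Hypothetical helper: turn the groups back into one flat sequence."""
    return [{"group": key, "reduction": members} for key, members in groups.items()]


# group('category').sample(3): three rows from inside each category.
per_group = {key: random.sample(members, 3)
             for key, members in group(rows, "category").items()}

# group('category').ungroup().sample(3): three whole categories.
sampled_groups = random.sample(ungroup(group(rows, "category")), 3)
```

In the first result every category is still present with three rows each; in the second, one of the four categories has been dropped.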

haberman · on March 26, 2014

I don't think this is a "good language" / "bad language" dichotomy. SQL just sits at a different level of abstraction.

From the examples you have posted, it appears that ReQL is a lower-level abstraction than SQL. In SQL you specify what you want logically and the DB turns this into a query plan. It appears that ReQL is more like a query plan itself, where you explicitly specify the data flow from stage to stage of query evaluation.

As a more specific example of this, it appears that in many cases in ReQL the user specifies what index should be used in the query itself. SQL is more abstract than this; the idea is for the query planner to figure out what index(es) should be used.
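A small sketch of the SQL side of that contrast, using Python's built-in sqlite3 module. SQLite and the index name `idx_posts_category` are assumptions for illustration. The query never mentions an index; the planner's choice only shows up when you ask for the plan.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE posts (id INTEGER PRIMARY KEY, category TEXT, num_comments INTEGER)"
)
# Hypothetical index name, chosen for this example.
conn.execute("CREATE INDEX idx_posts_category ON posts (category)")

# The query itself says nothing about which index to use.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM posts WHERE category = 'db'"
).fetchall()

# The last column of each plan row is a human-readable description.
plan_text = " ".join(str(row[-1]) for row in plan)
print(plan_text)
```

The exact wording of the plan varies between SQLite versions, but it should name the index the planner picked.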

boomzilla · on March 26, 2014

In theory, yes. In practice, the best language is one that maps developer thought process into working code in the most natural ways. I don't know the community that ReQL is targeting, but for the current crop of data scientists, SQL is a more natural language in my opinion. (There is a reason why Hive is so popular among big companies, where you'd expect to find big data sets.)