I’ve been using this for a bit more than a week now, and I’d say it’s a great feature for iteration and interactive queries. Here’s my review:
For data exploration, analysis, and cleaning, this is way more productive than plain SQL. As an example, fusing the aggregate functions with the group-by keys makes for a much faster workflow. I like that it unifies WHERE/HAVING/QUALIFY, and the set/extend/drop operators help (though Snowflake still wins for being able to declare a column and use it in the same SELECT). Ultimately this gives me a one-liner for situations where I’d otherwise be creating hard-to-name intermediate CTEs, and that’s awesome!
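To make that concrete, here’s a rough sketch in pipe syntax (table and column names are invented for illustration): the aggregate and its keys sit in one step, and WHERE does triple duty before and after aggregation.

```sql
-- Hypothetical orders table; every name here is made up.
FROM orders
|> WHERE order_date >= '2024-01-01'          -- pre-aggregation filter (classic WHERE)
|> EXTEND price * quantity AS line_total     -- derived column, no subquery or CTE
|> AGGREGATE SUM(line_total) AS revenue
   GROUP BY customer_id                      -- aggregates fused with their keys
|> WHERE revenue > 1000                      -- post-aggregation filter (plays the HAVING role)
```

In plain SQL, that post-aggregation filter alone would have pushed me into a HAVING clause or a wrapper CTE.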
Iterating on array columns or parsing non-trivial JSON is much better with this syntax too.
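For instance, unpacking a JSON array column step by step might look like this — a sketch assuming an `events` table with a JSON-string `payload` column (all names invented), using BigQuery’s JSON functions:

```sql
-- Hypothetical events table; payload is a JSON string like '{"items": [{"sku": "A1"}, ...]}'.
FROM events
|> EXTEND JSON_EXTRACT_ARRAY(payload, '$.items') AS items   -- pull out the array
|> CROSS JOIN UNNEST(items) AS item                         -- one row per element
|> EXTEND JSON_VALUE(item, '$.sku') AS sku                  -- scalar field per element
|> AGGREGATE COUNT(*) AS n GROUP BY sku
```

Each `|>` step is independently runnable, which is exactly what makes iterating on nested data pleasant — you can execute the query after every line you add.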
This is a shift back toward the data frame APIs of a few years ago, but we’re still missing the typing helpers and tooling support that data frame APIs provide. It would be nice to have a system like this that plugs into language servers, can list the available fields at each step on a mouse-over/cursor hover, and offers editor completions (the BQ console does an okay job).
This syntax is great for dbt macros. You can drop entire transforms into a model without worrying about polluting the query namespace or working around existing columns on your referenced tables.
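As a sketch of what I mean — a hypothetical dbt macro (macro name, arguments, and table are all invented) that emits a self-contained pipe step, so it composes with whatever columns the model already has:

```sql
-- Hypothetical macro: keep only the latest row per key.
-- In pipe syntax, WHERE can use window functions directly (the QUALIFY role).
{% macro dedupe_latest(key, ts) %}
|> WHERE ROW_NUMBER() OVER (PARTITION BY {{ key }} ORDER BY {{ ts }} DESC) = 1
{% endmacro %}

-- Usage inside a model:
FROM {{ ref('raw_customers') }}
{{ dedupe_latest('customer_id', 'updated_at') }}
```

Because the macro appends a pipe operator rather than wrapping the query, it doesn’t need to know or re-list any of the model’s columns.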
There’s a dark side to this syntax. The imperative coding style comes with a tradeoff: the reader has to track internal state through many steps. It’s the same reason SELECT * is often discouraged.
As a best practice I like to throw a `|> SELECT X,Y,Z` at the end of these pipe blocks to reinforce to the reader what the output looks like.
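In practice that looks something like this (names invented): the final step spells out the output shape so the reader doesn’t have to replay the whole pipeline in their head.

```sql
-- Hypothetical sessions table.
FROM sessions
|> WHERE country = 'DE'
|> AGGREGATE COUNT(*) AS visits GROUP BY device
|> SELECT device, visits   -- explicit final shape for the reader
```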
I should mention that it’s not as portable, but frankly none of the major DBs are truly portable anyway, and other DBs are now adopting this syntax too.
tl;dr: I like this feature, but use it sparingly. Avoid overuse in model files; definitely lean on it in analyses.
Just a question:
if you're adding stuff at the end of the query... do the previous steps get cached?
I mean, if you type first:
FROM customers
|> WHERE signup_date >= '2024-01-01T00:00:00Z'
And then you add (and execute)
|> AGGREGATE COUNT(*) AS tally GROUP BY channel