I'm quite opposed to the idea "from should be first".
I want to understand what exactly the query returns, not the implementation detail of the source of this data (that can later be changed).
Literally first example from page - I have no idea what is being returned:
from employees
filter country = "USA" # Each line transforms the previous result.
let gross_salary = salary + payroll_tax # This _adds_ a column / variable.
let gross_cost = gross_salary + benefits_cost # Variables can use other variables.
filter gross_cost > 0
aggregate by:[title, country] [ # `by` are the columns to group by.
average salary, # These are the calcs to run on the groups.
sum salary,
average gross_salary,
sum gross_salary,
average gross_cost,
sum gross_cost,
count,
]
sort sum_gross_cost # Uses the auto-generated column name.
filter count > 200
take 20
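To make that concrete, here's a minimal sketch (the schema, data, and exact SQL are my own guesses; real PRQL compilation may differ) of roughly what that pipeline corresponds to, run against sqlite. Note that the output columns only become knowable at the very end:

```python
import sqlite3

# Made-up schema and data; only for illustrating the shape of the result.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (
        title TEXT, country TEXT,
        salary REAL, payroll_tax REAL, benefits_cost REAL
    );
    INSERT INTO employees VALUES
        ('Engineer', 'USA', 100000, 8000, 12000),
        ('Manager',  'USA', 150000, 11000, 15000);
""")
cur = conn.execute("""
    SELECT title, country,
           AVG(salary)                               AS average_salary,
           SUM(salary)                               AS sum_salary,
           AVG(salary + payroll_tax)                 AS average_gross_salary,
           SUM(salary + payroll_tax)                 AS sum_gross_salary,
           AVG(salary + payroll_tax + benefits_cost) AS average_gross_cost,
           SUM(salary + payroll_tax + benefits_cost) AS sum_gross_cost,
           COUNT(*)                                  AS "count"
    FROM employees
    WHERE country = 'USA'
      AND salary + payroll_tax + benefits_cost > 0
    GROUP BY title, country
    HAVING COUNT(*) > 200
    ORDER BY sum_gross_cost
    LIMIT 20
""")
# The full list of result columns, recovered programmatically:
columns = [d[0] for d in cur.description]
```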
of course, similar things are happening to SQL too, with CTEs becoming more widespread and "real" list of the columns hidden somewhere inside, but it's still parseable
Quick, what is this query about? What's ironic is that I think you have it backwards: the columns are the implementation detail, not the table. The table is the context: you can't change that without having to change everything else. But columns are the last step, the selection after the filters, joins, etc. They can be changed at any time without affecting the logic.
This is... an odd choice. I'd assume I'm rarely looking at a query without the context to know why I would want those columns.
And the autocomplete story is backwards. Often I know what columns I want, but I'm not clear what table I need to get them from. Such that, if you made the suggestions in the from clause smarter, so they only include tables that have the columns, I'd be much happier.
Just throwing in another point of anecdata onto this pile: "Often I know what columns I want, but I'm not clear what table I need to get them from" does not make sense to me. I don't relate at all to there being a global namespace of columns, rather than a namespace of tables, each with its own columns specific to its context.
I challenge this. I accept that there are ambiguities, but I assert that you can go really fast by just telling someone to fetch a few columns by name.
I further assert that if your database is filled with "Id" and "name" columns, instead of "department_name" and similar, you are probably as likely to mess up a join as any benefit you get from the name being short. (And really, what advantage is there in short names nowadays?)
That all said. I worded my take too strongly. My point should have been that auto suggest should not be confined in either direction.
I think we have just done most of our data work in different environments.
When I'm trying to query stuff, the first question is "which service's database is that in?", so I can guess "user_service" (or whatever I think it is called), but I have no idea what they call anything in their schema. Now that the autocomplete system knows what table I'm interested in, it can help me figure that out.
I'm used to emacs, with a global namespace. Such that I'm used to searching all variables globally. Feels that searching all columns would be just as easy, all told.
That said, I want to be clear that I think both methods are valid and work.
Yeah, it's just that I don't know what the columns are called. The namespaces (databases and tables) are what guides me to the columns, not vice versa.
I can think back to projects / companies where this may have been different, basically just with fewer different distinct schemas, but it just isn't my recent experience.
(I appreciate your magnanimity in these later comments by the way.)
It's a query asking for the id, name, and author fields. Very straightforward; I have no idea how this is confusing.
> The table is the context: you can't change that without having to change everything else.
Except even in the provided single-table example this isn't true - you're getting subselected/CTEd results. No functional joins are demonstrated unfortunately.
For example:
from employees
left_join positions [id=employee_id]
...is equivalent to...
SELECT * FROM employees LEFT JOIN positions ON id = employee_id
No data is selected from positions in either example, and it's unclear on why we're joining that table (other than just for the heck of it). It's not a workable example.
You restated the query; I was asking what it's about. Is it a query across publications? Or is it a query over news articles? That context changes everything: how the query is written, what it can be joined with, how it can be filtered, how it is used, etc. Putting the FROM clause first means that you immediately have context to understand the rest of the query.
What fields am I expecting in the resultset? Sure I have some context, I know it'll be about articles, but I have no idea what actual data I care about.
You're arguing that:
FROM articles
SELECT id, name, author
Is substantially superior to:
SELECT id, name, author
FROM articles
I don't see them as markedly different with such a small example.
HOWEVER, where the difference comes in is the "at a glance what am I getting in my resultset" data that is much easier to see in the latter, second easiest in the former, and not at all present in the linked article.
I think what this boils down to is what you (any reader, not you specifically) individually expect to need knowledge of when you're writing SQL. In most cases when sitting down to write a brand new piece of code to pull some data from the database, putting the list of tables involved in the query first matters most to some, whereas putting the list of fields to expect in the resultset matters most to others (I put myself in this camp).
For what it's worth, as I've stated elsewhere while I don't prefer it and would find it annoying to debug personally, I do recognize that the idea of SQL that allows you to list FROM / JOIN / etc. first is very appealing to some. What I think is completely off is the near-obfuscation of the examples in the linked article.
> I don't see them as markedly different with such a small example.
Of course not, because we're talking about a fundamental change in syntax that affects more than just two-line queries. PRQL puts the SELECT clause practically at the end of the query, after every join, filter, and aggregate. If you only care about the output, that's cool too: just look at the end. But if I want to understand a query in SQL, I have to read the entire thing backwards, clause by clause! By contrast, if the SELECT is at the end, that's not much of a problem.
Now you'll come back and say "but it's the same if you care about context: just look at the end!" And that's where we differ. I care more about writing, debugging, and understanding queries, whereas you think it's more important that the column names are up front, even if it makes understanding a nontrivial query much, much harder.
The big advantage of "from first" like we have in Kusto/KQL (a query language we use at Microsoft) is that it provides much better autocomplete (if I write the `from`, it can easily autocomplete the projection).
If you want an interesting example of how a query language built for developer experience and autocompletion looks, definitely check it out!
That's interesting because it also explains why I was going to say I do like having from first. When trying to reason about a query, I mentally go through the following:
1. What tables are being pulled from? This speaks to the potential domain of the query.
2. What data is being selected (I can now know what is or isn't being pulled from the aforementioned tables...)
3. What operations, aggregations, groupings, etc. are being performed on the pulled data.
Of course from vs select ordering is completely arguable, but my thinking process seems to follow that of the autocomplete; in other words, my cognitive load when looking at the select statement is lessened when I know what the columns are being selected from.
It also follows (at least to me) the mental process of writing the query. First look at the tables, then decide what columns, then decide what functions to apply.
I said it in a sibling, but I feel this is somewhat missed. Auto complete that simply lists the tables is easier if from is first. But... Auto complete that helps me know what tables can give me my requested columns works the other direction.
You would think that but having used both I find writing Kusto/KQL much smoother, neater and faster and if I have to choose between writing a query in either one I'd pick KQL.
I understand this is just an opinion but it's an opinion held by everyone in my org who writes both.
(I can see the result type both by hovering over the query and by just looking at the end of it; and in SQL most of the SELECTed items in complex queries come from subqueries anyway, at least in my use case.)
Building for autocomplete is building for human understanding. If it is impossible for a computer to determine the context of your query, why would a human do much better?
Technically to be equivalent you need to wrap the second one in parentheses so you can use ToList() on it. Unfortunately a bit ugly. I'm not sure why they didn't add one more keyword to handle pipelining into other functions. Something like "feed", "into", or "pipe". Or just pluck the |> operator from F#.
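The `|>` idea can be sketched in a few lines in Python terms (a toy helper of my own, not anyone's actual proposal): feed a value through functions left to right, so a final "materialize" step like ToList() is just one more stage instead of wrapping parentheses.

```python
from functools import reduce

def pipe(value, *funcs):
    # Feed value through each function in turn, left to right.
    return reduce(lambda acc, f: f(acc), funcs, value)

evens = pipe(
    range(10),
    lambda xs: (x for x in xs if x % 2 == 0),
    list,  # the "materialize" step, analogous to ToList()
)
```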
I’ve been frustrated by toilets where I have to contort my body to reach the dispenser. Similarly, I’ve had dispensers intrude on the space where my legs would normally be and make it awkward to even just sit on the toilet.
Toilets are absolutely designed to make the dispenser placement convenient. You just don’t think about it because 95% of toilets get it right, so it just doesn’t bother you that much that it can be wrong.
In SQL, some decisions are right about 10% of the time and are annoying and awkward the other 90%.
That’s why the order matters. Because everything else got it right.
I'd agree if there was any way whatsoever of fixing this issue, but there simply isn't. The editor can't even begin to guess what you might want until you write your FROM.
Maybe in gigantic systems with more tables than makes sense. Realistically, all of the columns available in a database can fit in memory with ease.
Then, the IDE could basically fill my from out for me, based on what I'm asking for. It could even suggest what join I will need, if I list columns from multiple tables.
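That "reverse" autocomplete is cheap to sketch: index which tables contain which columns, then suggest the tables that cover everything the user has typed so far (the schema below is entirely made up):

```python
# Hypothetical schema: table name -> set of its column names.
schema = {
    "employees": {"id", "name", "salary", "country"},
    "positions": {"id", "employee_id", "title"},
    "invoices":  {"id", "customer_id", "amount"},
}

def tables_for(columns):
    # Suggest every table containing all of the requested columns.
    wanted = set(columns)
    return sorted(t for t, cols in schema.items() if wanted <= cols)

suggestions = tables_for(["name", "salary"])
```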
>Maybe in gigantic systems with more tables than makes sense. Realistically, all of the columns available in a database can be fit in memory with ease.
Every table has more than one column, so there's always more columns to remember than tables, and table names are generally pretty easy: users, invoices, blabla.
I worked with systems that had something like 500 tables, some of them with 20-50 columns.
You really want good intellisense in such an environment.
500 times 50 is still not a big number. And you could make decent statistical suggestions based on the columns typed in so far.
Good intelligent suggestions are, of course, helpful. And I agree that suggesting one out of 500 is easier than the other way around. That said, neither is hard for a computer. And even when asking friends which table I want, I'll often start with the actual columns I want.
And one of the use cases is writing queries, which it helps with immensely.
Best of both worlds would allow both orders.
Just automatically transform the query to the usual form after its execution.
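A toy sketch of that "allow both orders" idea (a throwaway regex of my own, nowhere near a real SQL parser): accept a FROM-first query and rewrite it into the conventional SELECT-first form, leaving anything else untouched.

```python
import re

def normalize(query):
    # Rewrite "FROM t SELECT cols" into "SELECT cols FROM t".
    m = re.fullmatch(r"\s*FROM\s+(\S+)\s+SELECT\s+(.+?)\s*", query, re.S)
    if m:
        return f"SELECT {m.group(2)} FROM {m.group(1)}"
    return query  # already in the usual form

canonical = normalize("FROM articles SELECT id, name, author")
```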
I think it's quite a common convention in engineering - not just software - that the input to a process "goes in the top and out the bottom". We humans read top->bottom (regardless of left/right/vertical, I don't know any languages that write bottom up). Conventional current in circuit diagrams usually flows top to bottom. Gravity loads in schematics flow top to bottom. Chemical pathways are usually written top to bottom. And of course functions take arguments up top and return at the bottom, maybe with some short circuits. I think the only counterexample of note is distillation columns.
Where is the data coming from? The employees table. What's coming out? 20 rows, sorted by sum_gross_cost.
What could improve this is function signatures. It's kind of nice to have the whole abstraction up top...like an abstract.
I agree that the columns of the results should be more obvious. But I am a proponent of "from should be first". I have never written a SQL query without thinking about the contents of a table or its relations. If it were my way, I would describe where I'm pulling the data from, then describe any filters/joins, then describe the columns that I'm interested in (last).
It's a fair sentiment, but it can be handled without losing directional flow and composability, some of the bigger advantages of reworking SQL.
One idea would be along the lines of a function prototype: a declaration, up front, about the columns and types that a query is expected to return. It's a good place to put documentation, it's redundant information which should protect against mistakes but not so redundant that it would be too taxing - the author should know what the query returns. The prototype would only be used for validation of column names and types.
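That prototype idea could be sketched as follows (the dict-based representation and all names are my own invention, just to show the validation mechanics):

```python
# Declared up front: the columns and types the query is expected to return.
PROTOTYPE = {"id": int, "name": str, "author": str}

def validate(rows, proto=PROTOTYPE):
    # Check each result row against the declared column names and types.
    for row in rows:
        if set(row) != set(proto):
            raise ValueError(f"column mismatch: {sorted(row)}")
        for col, typ in proto.items():
            if not isinstance(row[col], typ):
                raise TypeError(f"{col} should be {typ.__name__}")
    return rows

ok = validate([{"id": 1, "name": "PRQL", "author": "someone"}])
```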
Another idea would be requiring the last element in a query to be a projection, a bit like the return statement in a function body: here's what I'm returning out of the grand set of symbols available (e.g. via various joins) in scope from previous operations in the flow.
Without the `let` I would imagine having trouble reading it as well, I'm not sure if that would go away with familiarity but my instinct is that it's a useful addition.
This feels like an English-language thing. In English we tend to put our adjectives first; it feels natural: "Where is my red, round ball?", rather than some other languages (like German) where the noun comes first, the equivalent of "Where is my ball, red & round?"
While it inherently feels unnatural, I do agree with the others here that the context is actually easier to understand once past the initial discomfort.
_from_ provides some of the most important context about the data being returned: the type information. Columns you select are just properties of that type.
In SQL, where _from_ is placed at the end, we are essentially writing the equivalent of 'property.object', e.g.: name.person, age.person.
Both CTEs and this idea address the same problem: poor readability of complex SQL queries. Compared to CTEs, the author takes the idea of splitting a complex query into parts to the next level.
To your point - a solid IDE will show you what's being processed at each line (or returned, if the cursor is on the last line) - in an autocomplete window or a side panel.