I'm quite opposed to the idea "from should be first".
I want to understand what exactly the query returns, not the implementation detail of the source of this data (that can later be changed).
Literally first example from page - I have no idea what is being returned:
from employees
filter country = "USA" # Each line transforms the previous result.
let gross_salary = salary + payroll_tax # This _adds_ a column / variable.
let gross_cost = gross_salary + benefits_cost # Variables can use other variables.
filter gross_cost > 0
aggregate by:[title, country] [ # `by` are the columns to group by.
average salary, # These are the calcs to run on the groups.
sum salary,
average gross_salary,
sum gross_salary,
average gross_cost,
sum gross_cost,
count,
]
sort sum_gross_cost # Uses the auto-generated column name.
filter count > 200
take 20
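To make that concrete, here's a minimal sketch (the schema, data, and exact SQL are my own guesses; real PRQL compilation may differ) of roughly what that pipeline corresponds to, run against sqlite. Note that the output columns only become knowable at the very end:

```python
import sqlite3

# Made-up schema and data; only for illustrating the shape of the result.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (
        title TEXT, country TEXT,
        salary REAL, payroll_tax REAL, benefits_cost REAL
    );
    INSERT INTO employees VALUES
        ('Engineer', 'USA', 100000, 8000, 12000),
        ('Manager',  'USA', 150000, 11000, 15000);
""")
cur = conn.execute("""
    SELECT title, country,
           AVG(salary)                               AS average_salary,
           SUM(salary)                               AS sum_salary,
           AVG(salary + payroll_tax)                 AS average_gross_salary,
           SUM(salary + payroll_tax)                 AS sum_gross_salary,
           AVG(salary + payroll_tax + benefits_cost) AS average_gross_cost,
           SUM(salary + payroll_tax + benefits_cost) AS sum_gross_cost,
           COUNT(*)                                  AS "count"
    FROM employees
    WHERE country = 'USA'
      AND salary + payroll_tax + benefits_cost > 0
    GROUP BY title, country
    HAVING COUNT(*) > 200
    ORDER BY sum_gross_cost
    LIMIT 20
""")
# The full list of result columns, recovered programmatically:
columns = [d[0] for d in cur.description]
```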
of course, similar things are happening to SQL too, with CTEs becoming more widespread and "real" list of the columns hidden somewhere inside, but it's still parseable
Quick, what is this query about? What's ironic is that I think you have it backwards: the columns are the implementation detail, not the table. The table is the context: you can't change that without having to change everything else. But columns are the last step, the selection after the filters, joins, etc. They can be changed at any time without affecting the logic.
This is... an odd choice. I'd assume I'm rarely looking at a query without the context to know why I would want those columns.
And the autocomplete story is backwards. Often I know what columns I want, but I'm not clear what table I need to get them from. Such that, if you made the suggestions in the from clause smarter, so they only include tables that have the columns, I'd be much happier.
Just throwing in another point of anecdata onto this pile: "Often I know what columns I want, but I'm not clear what table I need to get them from" does not make sense to me. I don't relate at all to there being a global namespace of columns, rather than a namespace of tables, each with its own columns specific to its context.
I challenge this. I accept that there are ambiguities, but I assert that you can go really fast by just telling someone to fetch a few columns by name.
I further assert that if your database is filled with "Id" and "name" columns, instead of "department_name" and similar, you are probably as likely to mess up a join as any benefit you get from the name being short. (And really, what advantage is there in short names nowadays?)
That all said. I worded my take too strongly. My point should have been that auto suggest should not be confined in either direction.
I think we have just done most of our data work in different environments.
When I'm trying to query stuff, the first question is "which service's database is that in?", so I can guess "user_service" (or whatever I think it is called), but I have no idea what they call anything in their schema. Now that the autocomplete system knows what table I'm interested in, it can help me figure that out.
I'm used to emacs, with a global namespace. Such that I'm used to searching all variables globally. Feels that searching all columns would be just as easy, all told.
That said, I want to be clear that I think both methods are valid and work.
Yeah, it's just that I don't know what the columns are called. The namespaces (databases and tables) are what guides me to the columns, not vice versa.
I can think back to projects / companies where this may have been different, basically just with fewer different distinct schemas, but it just isn't my recent experience.
(I appreciate your magnanimity in these later comments by the way.)
It's a query asking for the id, name, and author fields. Very straightforward; I have no idea how this is confusing.
> The table is the context: you can't change that without having to change everything else.
Except even in the provided single-table example this isn't true - you're getting subselected/CTEd results. No functional joins are demonstrated unfortunately.
For example:
from employees
left_join positions [id=employee_id]
...is equivalent to...
SELECT * FROM employees LEFT JOIN positions ON id = employee_id
No data is selected from positions in either example, and it's unclear on why we're joining that table (other than just for the heck of it). It's not a workable example.
You restated the query; I was asking what it's about. Is it a query across publications? Or is it a query over news articles? That context changes everything: how the query is written, what it can be joined with, how it can be filtered, how it is used, etc. Putting the FROM clause first means that you immediately have context to understand the rest of the query.
What fields am I expecting in the resultset? Sure I have some context, I know it'll be about articles, but I have no idea what actual data I care about.
You're arguing that:
FROM articles
SELECT id, name, author
Is substantially superior to:
SELECT id, name, author
FROM articles
I don't see them as markedly different with such a small example.
HOWEVER, where the difference comes in is the "at a glance what am I getting in my resultset" data that is much easier to see in the latter, second easiest in the former, and not at all present in the linked article.
I think what this boils down to is what you (any reader, not you specifically) individually expect to need knowledge of when you're writing SQL. In most cases when sitting down to write a brand new piece of code to pull some data from the database, putting the list of tables involved in the query first matters most to some, whereas putting the list of fields to expect in the resultset matters most to others (I put myself in this camp).
For what it's worth, as I've stated elsewhere while I don't prefer it and would find it annoying to debug personally, I do recognize that the idea of SQL that allows you to list FROM / JOIN / etc. first is very appealing to some. What I think is completely off is the near-obfuscation of the examples in the linked article.
> I don't see them as markedly different with such a small example.
Of course not, because we're talking about a fundamental change in syntax that affects more than just two-line queries. PRQL puts the SELECT clause practically at the end of the query, after every join, filter, and aggregate. If you only care about the output, that's cool too: just look at the end. But if I want to understand a query in SQL, I have to read the entire thing backwards, clause by clause! By contrast, if the SELECT is at the end, that's not much of a problem.
Now you'll come back and say "but it's the same if you care about context: just look at the end!" And that's where we differ. I care more about writing, debugging, and understanding queries, whereas you think it's more important that the column names are up front, even if it makes understanding a nontrivial query much, much harder.
The big advantage of "from first" like we have in Kusto/KQL (a query language we use at Microsoft) is that it provides much better autocomplete (if I write the `from`, it can easily autocomplete the projection).
If you want an interesting example of how a query language built for developer experience and autocompletion looks, definitely check it out!
That's interesting because it also explains why I was going to say I do like having from first. When trying to reason about a query, I mentally go through the following:
1. What tables are being pulled from? This speaks to the potential domain of the query.
2. What data is being selected (I can now know what is or isn't being pulled from the aforementioned tables...)
3. What operations, aggregations, groupings, etc. are being performed on the pulled data.
Of course from vs select ordering is completely arguable, but my thinking process seems to follow that of the autocomplete; in other words, my cognitive load when looking at the select statement is lessened when I know what the columns are being selected from.
It also follows (at least to me) the mental process of writing the query. First look at the tables, then decide what columns, then decide what functions to apply.
I said it in a sibling, but I feel this is somewhat missed. Auto complete that simply lists the tables is easier if from is first. But... Auto complete that helps me know what tables can give me my requested columns works the other direction.
You would think that but having used both I find writing Kusto/KQL much smoother, neater and faster and if I have to choose between writing a query in either one I'd pick KQL.
I understand this is just an opinion but it's an opinion held by everyone in my org who writes both.
(I can see the result type both by hovering over the query and by just looking at the end of it; and in SQL most of the SELECTed items in complex queries come from subqueries anyway, at least in my use case.)
Building for autocomplete is building for human understanding. If it is impossible for a computer to determine the context of your query, why would a human do much better?
Technically to be equivalent you need to wrap the second one in parentheses so you can use ToList() on it. Unfortunately a bit ugly. I'm not sure why they didn't add one more keyword to handle pipelining into other functions. Something like "feed", "into", or "pipe". Or just pluck the |> operator from F#.
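The `|>` idea can be sketched in a few lines in Python terms (a toy helper of my own, not anyone's actual proposal): feed a value through functions left to right, so a final "materialize" step like ToList() is just one more stage instead of wrapping parentheses.

```python
from functools import reduce

def pipe(value, *funcs):
    # Feed value through each function in turn, left to right.
    return reduce(lambda acc, f: f(acc), funcs, value)

evens = pipe(
    range(10),
    lambda xs: (x for x in xs if x % 2 == 0),
    list,  # the "materialize" step, analogous to ToList()
)
```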
I’ve been frustrated by toilets where I have to contort my body to reach the dispenser. Similarly, I’ve had dispensers intrude on the space where my legs would normally be and make it awkward to even just sit on the toilet.
Toilets are absolutely designed to make the dispenser placement convenient. You just don’t think about it because 95% of toilets get it right, so it just doesn’t bother you that much that it can be wrong.
In SQL, some decisions are right about 10% of the time and are annoying and awkward the other 90%.
That’s why the order matters. Because everything else got it right.
I'd agree if there was any way whatsoever of fixing this issue, but there simply isn't. The editor can't even begin to guess what you might want until you write your FROM.
Maybe in gigantic systems with more tables than makes sense. Realistically, all of the columns available in a database can fit in memory with ease.
Then, the IDE could basically fill my from out for me, based on what I'm asking for. It could even suggest what join I will need, if I list columns from multiple tables.
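That "reverse" autocomplete is cheap to sketch: index which tables contain which columns, then suggest the tables that cover everything the user has typed so far (the schema below is entirely made up):

```python
# Hypothetical schema: table name -> set of its column names.
schema = {
    "employees": {"id", "name", "salary", "country"},
    "positions": {"id", "employee_id", "title"},
    "invoices":  {"id", "customer_id", "amount"},
}

def tables_for(columns):
    # Suggest every table containing all of the requested columns.
    wanted = set(columns)
    return sorted(t for t, cols in schema.items() if wanted <= cols)

suggestions = tables_for(["name", "salary"])
```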
>Maybe in gigantic systems with more tables than makes sense. Realistically, all of the columns available in a database can be fit in memory with ease.
Every table has more than one column, so there's always more columns to remember than tables, and table names are generally pretty easy: users, invoices, blabla.
I worked with systems that had something like 500 tables, some of them with 20-50 columns.
You really want good intellisense in such an environment.
500 times 50 is still not a big number. And you could make decent statistical suggestions based on the columns typed in so far.
Good intelligent suggestions are, of course, helpful. And I agree that suggesting one out of 500 is easier than the other way around. That said, neither is hard for a computer. And even when asking friends which table I want, I'll often start with the actual columns I want.
And one of the use cases is writing queries, which it helps with immensely.
Best of both worlds would allow both orders.
Just automatically transform the query to the usual form after its execution.
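A toy sketch of that "allow both orders" idea (a throwaway regex of my own, nowhere near a real SQL parser): accept a FROM-first query and rewrite it into the conventional SELECT-first form, leaving anything else untouched.

```python
import re

def normalize(query):
    # Rewrite "FROM t SELECT cols" into "SELECT cols FROM t".
    m = re.fullmatch(r"\s*FROM\s+(\S+)\s+SELECT\s+(.+?)\s*", query, re.S)
    if m:
        return f"SELECT {m.group(2)} FROM {m.group(1)}"
    return query  # already in the usual form

canonical = normalize("FROM articles SELECT id, name, author")
```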
I think it's quite a common convention in engineering - not just software - that the input to a process "goes in the top and out the bottom". We humans read top->bottom (regardless of left/right/vertical, I don't know any languages that write bottom up). Conventional current in circuit diagrams usually flows top to bottom. Gravity loads in schematics flow top to bottom. Chemical pathways are usually written top to bottom. And of course functions take arguments up top and return at the bottom, maybe with some short circuits. I think the only counterexample of note is distillation columns.
Where is the data coming from? The employees table. What's coming out? 20 rows, sorted by sum_gross_cost.
What could improve this is function signatures. It's kind of nice to have the whole abstraction up top...like an abstract.
I agree that the columns of the results should be more obvious. But I am a proponent of "from should be first". I have never written a SQL query without thinking about the contents of a table or its relations. If it were my way, I would describe where I'm pulling the data from, then describe any filters/joins, then describe the columns that I'm interested in (last).
It's a fair sentiment, but it can be handled without losing directional flow and composability, some of the bigger advantages of reworking SQL.
One idea would be along the lines of a function prototype: a declaration, up front, about the columns and types that a query is expected to return. It's a good place to put documentation, it's redundant information which should protect against mistakes but not so redundant that it would be too taxing - the author should know what the query returns. The prototype would only be used for validation of column names and types.
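That prototype idea could be sketched as follows (the dict-based representation and all names are my own invention, just to show the validation mechanics):

```python
# Declared up front: the columns and types the query is expected to return.
PROTOTYPE = {"id": int, "name": str, "author": str}

def validate(rows, proto=PROTOTYPE):
    # Check each result row against the declared column names and types.
    for row in rows:
        if set(row) != set(proto):
            raise ValueError(f"column mismatch: {sorted(row)}")
        for col, typ in proto.items():
            if not isinstance(row[col], typ):
                raise TypeError(f"{col} should be {typ.__name__}")
    return rows

ok = validate([{"id": 1, "name": "PRQL", "author": "someone"}])
```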
Another idea would be requiring the last element in a query to be a projection, a bit like the return statement in a function body: here's what I'm returning out of the grand set of symbols available (e.g. via various joins) in scope from previous operations in the flow.
Without the `let` I would imagine having trouble reading it as well, I'm not sure if that would go away with familiarity but my instinct is that it's a useful addition.
This feels like an English-language thing. In English we tend to put our adjectives first; it feels natural: "Where is my red, round ball?", rather than some other languages (like German) where the noun comes first, the equivalent of "Where is my ball, red & round?"
While it inherently feels unnatural, I do agree with the others here that the context is actually easier to understand once past the initial discomfort.
_from_ provides some of the most important context about the data being returned: the type information. Columns you select are just properties of that type.
In SQL, where _from_ is placed at the end, we are essentially writing the equivalent of 'property.object', e.g.: name.person, age.person.
Both CTEs and this idea address the same problem: poor readability of complex SQL queries. Compared to CTEs, the author takes the idea of splitting a complex query into parts to the next level.
To your point - a solid IDE will show you what's being processed at each line (or returned, if the cursor is on the last line) - in an autocomplete window or a side panel.