SQL is having somewhat of a moment in the big data world, thanks in part to the 'modern data stack' and new-age data warehouses like Snowflake and BigQuery.
However, there is a lot of pushback from 'traditional' data engineers who were trained on Spark/Scala. It's a bit of a hard sell to go from a strongly typed language to free-for-all text-based logic.
I think the following is needed for SQL to finally be accepted as a 'serious' contender:
Create a compiled SQL language (not pandas) that:
1. compiles to SQL and addresses some of the issues brought up in the post, like nested aggregations.
2. makes code reusable. E.g., apply year-over-year growth to any table that has the requisite columns; the compiler should check this in the IDE (see the sketch after this list).
3. makes materializations a first-class concept in the language, with no separate dbt layer.
4. provides a way to package and distribute libraries that you can import into your project.
5. ships with a unit testing framework that makes it easy to test logic without having to set up test tables in the database.
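To make point 2 concrete, here is a hedged sketch in Scala of what a compiler-checked, reusable transformation could look like. Rel, Col, and yoyGrowth are hypothetical names invented for illustration, not an existing library; the check below runs when the query is built, whereas the imagined language would surface it statically in the IDE:

final case class Col[A](name: String)
final case class Rel(name: String, cols: Set[String])

object YoyDemo extends App {
  // Reusable year-over-year growth, applicable to any relation that
  // carries the required year and metric columns (hypothetical API).
  def yoyGrowth(rel: Rel, year: Col[Int], metric: Col[Double]): String = {
    val needed = Set(year.name, metric.name)
    require(needed.subsetOf(rel.cols),
      s"${rel.name} is missing columns: ${needed -- rel.cols}")
    s"""SELECT ${year.name},
       |       ${metric.name} / LAG(${metric.name}) OVER (ORDER BY ${year.name}) - 1 AS yoy_growth
       |FROM ${rel.name}""".stripMargin
  }

  println(yoyGrowth(Rel("sales", Set("yr", "revenue")),
                    Col[Int]("yr"), Col[Double]("revenue")))
}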
Yeah, I do that via dbt by setting up mock data tables in the database and using a macro to use those as sources/refs when run in test mode.
However, what we are doing here isn't 'unit testing'; it's black-box integration testing. When I write equivalent code in Scala, I just test the logic via unit tests.
E.g., for logic that filters out orders that don't qualify, I extract a method in the Scala code and test that logic as part of the development lifecycle (sketched below). There is no dependency on a database.
The analog here is using Selenium or some UI testing framework to test whether a button turns blue when the order exceeds a limit. That's not unit testing.
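For concreteness, a minimal sketch of that kind of extracted, database-free test in Scala. Order, qualifies, and the 100 threshold are all made up for illustration, and plain assert stands in for a real test framework:

final case class Order(id: Long, amount: BigDecimal, cancelled: Boolean)

object OrderRulesSpec extends App {
  // The extracted business rule: a pure function, no database involved.
  def qualifies(o: Order): Boolean =
    !o.cancelled && o.amount > BigDecimal(100)

  assert(qualifies(Order(1, BigDecimal(150), cancelled = false)))
  assert(!qualifies(Order(2, BigDecimal(150), cancelled = true)))
  assert(!qualifies(Order(3, BigDecimal(50), cancelled = false)))
  println("all order-rule checks passed")
}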
> I extract a method in the Scala code and test that logic as part of the development lifecycle. There is no dependency on a database.
Unit testing seems to depend on where the unit of code being tested lives. At the middle tier, you may mock out parts of the code so the tests aren't reliant on external sources (APIs, databases, libraries, etc.).
It seems that unit testing database code would happen at the database layer: it's still a unit test, as the test isn't dependent on external sources.
I was going to say the same. With the prevalence of embedded databases, and how cheap it is to stand up a container for the non-embedded options, build-time testing of queries has never been easier.
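As an example, a build-time smoke test of a query against an embedded in-memory database, sketched in Scala. It assumes the H2 driver (com.h2database:h2) is on the classpath; the table and query are invented for illustration:

import java.sql.DriverManager

object QuerySmokeTest extends App {
  // H2 in-memory database: nothing to install, nothing external to hit.
  val conn = DriverManager.getConnection("jdbc:h2:mem:test;DB_CLOSE_DELAY=-1")
  try {
    val st = conn.createStatement()
    st.execute("CREATE TABLE orders (id INT, amount DECIMAL(10,2))")
    st.execute("INSERT INTO orders VALUES (1, 50.00), (2, 150.00)")

    // The query under test: count orders above a limit.
    val rs = st.executeQuery("SELECT COUNT(*) FROM orders WHERE amount > 100")
    rs.next()
    assert(rs.getInt(1) == 1, "expected exactly one qualifying order")
    println("query behaves as expected")
  } finally conn.close()
}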
It seems like you're suggesting that someone could design a functional-style programming language that compiles to SQL.
2 & 3 are my biggest pain points. I can't just extract functions like I can with a regular programming language. Instead, SQL queries get increasingly complex with no great tools to manage that.
For 3, products like https://materialize.com/ look interesting for being able to create derived materialized views that can efficiently be kept up to date.
I'm not familiar with Ecto, dplyr, or DBT, but I would love an ML-like language to replace SQL. I'm imagining being able to pass a table (or any table-oriented data, like a sub-query) to functions that would type-check columns and would return table-oriented or scalar data. I'm not sure if this is actually possible in practice, but one can dream.
For instance, imagine a "top 10" function that could be re-used on any table (apologies for my pseudo types and code).
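A sketch of what that might look like, in Scala-flavored pseudocode; Table, Col, and topN are invented names, purely illustrative:

final case class Col[A](name: String)
final case class Table(name: String, cols: Set[String]) {
  // Reusable "top 10": rejects tables that lack the ordering column.
  def topN(orderBy: Col[_], n: Int = 10): String = {
    require(cols.contains(orderBy.name), s"$name has no column ${orderBy.name}")
    s"""SELECT * FROM (
       |  SELECT t.*, row_number() OVER (ORDER BY ${orderBy.name} DESC) AS _row
       |  FROM $name AS t
       |) ranked
       |WHERE _row <= $n""".stripMargin
  }
}

// Reusable on any table that actually has the ordering column:
// Table("orders", Set("id", "amount")).topN(Col[BigDecimal]("amount"))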
Not 100% sure about what you're suggesting, but wouldn't it be easier to pass your functions to your table/sub-query?
And that's exactly what you're able to do in most modern data warehouse services, such as Snowflake. Inferences can be contained within internal/external user-defined functions.
This is very reminiscent of what made the big-data/map-reduce movement so notable: sending your query to the data instead of moving your data to the query; sending your model to the data instead of sending the data to the model.
I don't quite follow what you are saying. In some SQL engines you can use the row_number() function and derived tables (or CTEs) to get the top 10:
SELECT Id, col1, col2
FROM (
    SELECT
        Id,
        col1,
        col2,
        row_number() OVER (PARTITION BY columnkey1, columnkey2 ORDER BY anycolumnwilldohere DESC) AS _row
    FROM _table
) AS anytablealiaswilldohere
WHERE _row <= 10
My idea is you wouldn't have to cut and paste implementations like this. You would be able to just call the function with a column name and table name.
I'm currently doing work that uses Elixir's Ecto, which goes a great deal toward what I think you're aiming at. I can write my SQL in a familiar yet functional and composable style, while knowing what SQL will be produced in the end. Ecto, as I understand it, was inspired by Microsoft's C# LINQ. I've not worked with LINQ, but I've heard praise for it similar to what Ecto gets.
I'm saying this with most of my experience being in SQL.