
Since you're using python, have you looked into sqlglot? I think it has some pretty-print options.

https://github.com/tobymao/sqlglot
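Rough sketch of the pretty-printing (the SQL here is just a made-up illustration):

    # sqlglot's transpile() accepts a pretty flag
    import sqlglot

    sql = "select a, b from t where a > 1 order by a"
    print(sqlglot.transpile(sql, pretty=True)[0])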



Probably also worth mentioning that DuckDB can interact with SQLite dbs.

https://duckdb.org/docs/extensions/sqlite.html https://duckdb.org/docs/guides/database_integration/sqlite.h...

Thus potentially making duckdb an HTAP-like option.
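Something like this, as a sketch from Python (the file and table names are made up):

    # attach an existing SQLite file and query it through DuckDB
    import duckdb

    con = duckdb.connect()
    con.execute("INSTALL sqlite;")
    con.execute("LOAD sqlite;")
    con.execute("ATTACH 'app.db' AS app (TYPE sqlite);")
    con.sql("SELECT * FROM app.some_table LIMIT 5").show()  # some_table is hypothetical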


Probably not exactly what you mean or are asking for, but the work Motherduck is doing looks promising.

https://motherduck.com/blog/differential-storage-building-bl...

Hopefully it finds its way into duckdb's repo some day.


Ibis looks very promising.

There are also other ways to use both.

https://duckdb.org/docs/guides/python/polars.html
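A minimal sketch of that interop (the data is made up):

    # DuckDB can scan a Polars DataFrame by its variable name (replacement scan)
    import duckdb
    import polars as pl

    df = pl.DataFrame({"id": [1, 2, 3], "val": ["a", "b", "c"]})
    out = duckdb.sql("SELECT val, id * 10 AS id10 FROM df").pl()  # back to Polars
    print(out)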

All of this dataframe compatibility is awesome. (much thanks to Arrow and others)


Perhaps not exactly what you're talking about, but maybe? (unsure, because WITH statements are sometimes called "temp tables")

https://duckdb.org/docs/sql/query_syntax/with#cte-materializ...
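Small sketch of the directive, run here through the Python client (the query itself is just an illustration):

    # force materialization of a CTE with the MATERIALIZED hint
    import duckdb

    duckdb.sql("""
        WITH t AS MATERIALIZED (
            SELECT range AS x FROM range(10)
        )
        SELECT sum(x) AS total FROM t
    """).show()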

Obviously, the materialization is gone after the query has ended, but it's still a very powerful and useful directive to add to some queries.

There are also a few DuckDB extensions for pipeline SQL languages.

https://duckdb.org/community_extensions/extensions/prql.html

https://duckdb.org/community_extensions/extensions/psql.html
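Install sketch for those community extensions (prql shown; psql works the same way):

    # community extensions install with "FROM community"
    import duckdb

    con = duckdb.connect()
    con.execute("INSTALL prql FROM community;")
    con.execute("LOAD prql;")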

And of course dbt-duckdb https://github.com/duckdb/dbt-duckdb


Although only experimental and probably off topic for this discussion, it's worth mentioning that DuckDB also provides a Spark API implementation.

https://duckdb.org/docs/api/python/spark_api
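Rough sketch of what that looks like (import path per the docs; the data and column names are made up):

    # experimental Spark-compatible API backed by DuckDB
    import pandas as pd
    from duckdb.experimental.spark.sql import SparkSession
    from duckdb.experimental.spark.sql.functions import lit

    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame(pd.DataFrame({"x": [1, 2, 3]}))
    df = df.withColumn("tag", lit("demo"))
    df.show()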

And while on the subject of syntax, duckdb also has function chaining

https://duckdb.org/docs/sql/functions/overview.html#function...
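For example, a sketch run through the Python client (inline made-up data):

    # dot/function chaining: x.f(y) is rewritten to f(x, y)
    import duckdb

    duckdb.sql("""
        SELECT s.lower().replace('duck', 'goose') AS chained
        FROM (VALUES ('DUCKDB')) t(s)
    """).show()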


Really awesome project. This would be even better if some of the differential storage functionality made its way into duckdb.

https://motherduck.com/blog/differential-storage-building-bl...

Unsure if it's even possible, but if there were a way to write out (simulate) the streams to WAL files, you might be able to accomplish the same thing without having to consolidate the duckdb file every time.

A couple of other ideas that may or may not diverge from your project's purpose or simplicity...

It's also too bad that duckdb isn't included in the out-of-the-box Snowpark packages, since that would potentially allow you to run all of this in a Snowflake procedure/DAG and persist the duckdb file to object storage. You could of course achieve this with Snowflake containers. (But this would probably ruin any plans to support Redshift/BigQuery/etc as sources.)

If the source tables were Iceberg, then you could have an even simpler duckdb file to produce/maintain that would just wrap views around the Iceberg tables using the duckdb iceberg extension.

    create view x as
    select *
    from iceberg_scan('metadata_file');
If you're going to enable change tracking on tables, then perhaps another solution is to use CHANGES to select the data from the table instead of reading streams. That way you could use the same timestamp across all the selects and get better consistency between your materializations to duckdb.

    set ts = {last_materialization_ts};
    select * from X CHANGES(INFORMATION => DEFAULT) AT(timestamp => $ts);
    select * from Y CHANGES(INFORMATION => DEFAULT) AT(timestamp => $ts);

    -- triggered task to kick off materialization to duckdb
    create task M
    when
        SYSTEM$STREAM_HAS_DATA('X_STREAM') AND
        SYSTEM$STREAM_HAS_DATA('Y_STREAM') AND ...
    AS 
        CALL MY_MATERIALIZE(); -- send SNS, invoke ext-func, etc

Here's another similar (but different) project that I've been tracking https://github.com/buremba/universql


Perhaps similar to https://github.com/duckdb/dbt-duckdb , but with SQLMesh instead of dbt, obviously.


Ah gotcha! Do you have a use case where you'd look to remodel/transform the data between warehouses?


Not the original parent, so unsure of their use case. But I've seen the approach where some basic development can be done on duckdb before making its way to dev/qa/prd.

Something like your project might enable grabbing data (subsets of data) from a dev environment (seed) for offline, cheap (no SF warehouse cost) development/unit-testing, etc.


This makes sense, thank you!


It should be just as easy as adding:

    authenticator="externalbrowser"
Adding a Snowflake connection configuration option that allowed for standard Snowflake connection conventions might be a good approach. That way you could connect to Snowflake with your existing configurations (.snowsql/, .snowflake/), or explicitly specify one by adding matching args/params to your project's config.

    # myconf.toml
    [test-connection]
    account = "mysfaccount"
    authenticator = "externalbrowser"
    ...

    # config/config.yaml
    source:
        type: snowflake
        connection:
            name: test-connection
            file_path: ./myconf.toml
        change_tracking_database: melchi_cdc_db
        change_tracking_schema: streams

    sf.connect(connection_name="test-connection", connections_file_path=Path("./myconf.toml").resolve())


I just pushed the external_browser branch to GitHub. It should offer 1/ externalbrowser authentication and 2/ the ability to use TOML files with profiles, with instructions on how in the readme. If you run the following you should be able to test it out.

     git clone https://github.com/ryanwith/melchi.git
     cd melchi
     git checkout external_browser
     # remaining steps are the same
Would you mind testing it out and letting me know if it works for you? Would really appreciate it!

You can also let me know if it works in discord here: https://discord.gg/bTg9kJ92


Thank you! Will take a look at this over the next few days

