Hacker News new | past | comments | ask | show | jobs | submit login

Interesting project! Thank you for open sourcing and sharing. Agree that local and embedded analytics are an increasing trend, I see it too.

A couple of questions:

* Iā€™m curious what the difficulties were in the implementation. I suspect it is quite a challenge to implement this support in the current SQLite architecture, and would curious to know which parts were tricky and any design trade-off you were faced with.

* Aside from ease-of-use (install extension, no need for a separate analytical database system), I wonder if there are additional benefits users can anticipate resulting from a single system architecture vs running an embedded OLAP store like DuckDB or clickhouse-local / chdb side-by-side with SQLite? Do you anticipate performance or resource efficiency gains, for instance?

* I am also curious, what the main difficulty with bringing in a separate analytical database is, assuming it natively integrates with SQLite. I may be biased, but I doubt anything can approach the performance of native column-oriented systems, so I'm curious what the tipping point might be for using this extension vs using an embedded OLAP store in practice.

Btw, would love for you or someone in the community to benchmark Stanchion in ClickBench and submit results! (https://github.com/ClickHouse/ClickBench/)

Disclaimer: I work on ClickHouse.




Thanks for your thoughtful questions!

* SQLite has rock solid support for transactionally persisting data to disk, which is a difficult and complex thing that stanchion gets almost for free. But the downside is fitting the architecture of stanchion into sqlite's way of doing things. So that is a long way of saying yes, the biggest difficulty was first learning how sqlite works and the implementing features to work with sqlite.

* It's a good question and certainly an area worth exploring. I don't know the answer, but doing, for instance, a comparison of power consumption and performance on a phone for the same use case between sqlite+stanchion and chdb running side-by-side with sqlite would be very interesting. To spitball some ideas of areas that may benefit stanchion: caching (both code and data, application caching and hardware caching), data sharing (no need for decoding and re-encoding either within the host application or between databases), and unified transactions (this one is a stretch). As you mention, chDB (and also DuckDB) benefit from having an architecture designed for analytics.

* As mentioned elsewhere in these comments, the sqlite virtual table system does have some limitations, so I think you're right when it comes to query performance. However, those limitations are limited to the way sqlite queries data in virtual tables, so I think stanchion can be competitive on data size on disk and potentially on insert performance.

I do plan to run and publish some benchmarks of stanchion against chDB and DuckDB in the near term. So far, I haven't focused on performance with stanchion, but that will be more of a focus going forward. Plus it's good to measure first and use that to track improvements. Stay tuned!


Judging from [1] I think your intuition about performance and particularly storage improvements is correct.

1: https://cldellow.com/2018/06/22/sqlite-parquet-vtable.html




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: