Moving data between systems is problematic. Where this product is actually needed (multi-TB databases under load) is where logical replication won't be able to sync your tables in time. Conversely, small databases where this will work don't really need columnar storage optimizations.
Fair point. We think BemiDB can currently be useful with small and medium Postgres databases. Running complex analytics queries on Postgres can work, but it usually requires tuning and adding indexes tailored to those specific queries, which can hurt write performance on the OLTP side, or may not be possible at all when the queries are ad-hoc. See the sketch below for the kind of index we mean.
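For example, a rough sketch with a hypothetical `orders` table (placeholder names, not a schema we prescribe): a partial covering index tuned to one query shape speeds that query up, but every index like it adds write amplification on the OLTP side.

    -- Hypothetical orders table: a partial covering index tuned to one
    -- analytics query shape (revenue from completed orders by day).
    CREATE INDEX CONCURRENTLY idx_orders_completed_created
      ON orders (created_at)
      INCLUDE (total_amount)
      WHERE status = 'completed';

    -- The query it serves; each new ad-hoc query shape would need
    -- another index like this, slowing down writes a little more.
    SELECT date_trunc('day', created_at) AS day, sum(total_amount)
    FROM orders
    WHERE status = 'completed'
    GROUP BY 1;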
> (multi-TB databases under load) is where logical replication won't be able to sync your tables in time
I think the ceiling for logical replication (and the optimization techniques around it) is quite high. But I wonder what people do when it stops working at that scale?
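For context, the baseline setup in question is the standard Postgres publication/subscription pair (names and connection details here are placeholders):

    -- On the source (OLTP) database:
    CREATE PUBLICATION analytics_pub FOR ALL TABLES;

    -- On the analytics side:
    CREATE SUBSCRIPTION analytics_sub
      CONNECTION 'host=oltp-primary dbname=app user=replicator'
      PUBLICATION analytics_pub;

My understanding is the usual bottleneck is the single apply worker per subscription, which is why people end up splitting tables across multiple publications and subscriptions before giving up on it.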
What would you consider to be small or medium? I have a use case for analytics on ~1 billion rows that are about 1TB in postgres. Have you tried on that volume?
We haven't tested this with 1TB Postgres databases yet, assuming that most companies operating at this scale have already built analytics data pipelines :) I'm curious whether you currently move the data from this Postgres instance somewhere else, or not yet?
Not yet, we've mostly just kicked the can down the road due to costs. Like you said in another post, careful indexes on Postgres get you quite far, but they're not nearly as flexible as a columnar DB.
I think your project is great. I suspect incremental updates will be the big feature driving uptake (it's one we'd need before we could try this out, at least).