
> I've loaded a 100 Billion Rows

Have you done any load tests that would more closely mirror a production environment, such as performing queries while ClickHouse is handling a heavy insert load?




I'm working on developing benchmarking tools for internal testing, but both Yandex and Cloudflare use ClickHouse for real-time querying. I'm still in the development phase for my product, but I'll make sure to post information & results here when we launch.

https://blog.cloudflare.com/http-analytics-for-6m-requests-p...

But I've spent a long time looking at the various solutions, and while ClickHouse is not perfect, I think it's the best multi-purpose database out there for large volumes of data. TimescaleDB is another contender, but until it gets sharding it's dead on arrival.


I'd say kdb (from Kx Systems) is the best database for this problem space, but it is prohibitively expensive for the HTTP analytics use case. It is also a pain to query, but it's unbelievable what it can do.


Very cool, I'll check this out!


It's a quirky piece of software and has limitations that need to be considered when standing up a production cluster, such as the fact that you currently cannot reshard it. If you have a 3-node cluster, adding another node is messy and requires downtime.


Still a bit messy, but the clickhouse-copier utility helps somewhat: https://github.com/yandex/ClickHouse/issues/2579


I have open issues about it on GitHub; it does not work correctly at this time. If you dig through them, there are statements from the devs saying the tool has been neglected.


We load millions of rows per second and use a handful (or more?) of materialized views to build appropriate summaries. Various clients query both the "raw data" and the views, and it all works fine, basically.
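
For anyone unfamiliar with the pattern, here's a minimal sketch of what such a rollup can look like. The table and column names are made up for illustration; our real schemas differ:

    -- Hypothetical raw table: one row per event.
    CREATE TABLE events
    (
        ts     DateTime,
        status UInt16,
        bytes  UInt64
    )
    ENGINE = MergeTree()
    ORDER BY ts;

    -- Materialized view that maintains a per-minute summary as rows arrive.
    -- SummingMergeTree collapses rows that share a sorting key by summing
    -- the numeric columns on merge, so the summary stays compact even
    -- under a heavy insert load.
    CREATE MATERIALIZED VIEW events_per_minute
    ENGINE = SummingMergeTree()
    ORDER BY (minute, status)
    AS SELECT
        toStartOfMinute(ts) AS minute,
        status,
        count()             AS requests,
        sum(bytes)          AS total_bytes
    FROM events
    GROUP BY minute, status;

Cheap dashboard-style queries can then hit events_per_minute while ad-hoc analysis still goes against the raw table.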


> it all works fine, basically

The "basically" is what intrigues me :D



