
> I've loaded a 100 Billion Rows

Have you done any load tests that would more closely mirror a production environment, such as performing queries while ClickHouse is handling a heavy insert load?




I'm working on developing benchmarking tools for internal testing, but both Yandex and Cloudflare use ClickHouse for real-time querying. I'm still in the development phase for my product, but I'll make sure to post information & results here when we launch.

https://blog.cloudflare.com/http-analytics-for-6m-requests-p...

But I've spent a long time looking at the various solutions, and while ClickHouse is not perfect, I think it's the best multi-purpose database out there for large volumes of data. TimescaleDB is another contender, but until it gets sharding it's dead on arrival.


I'd say kdb (from Kx Systems) is the best database for this problem space, but it is prohibitively expensive for the HTTP analytics use case. It is also a pain to query, but it's unbelievable what it can do.


Very cool, I'll check this out!


It's a quirky piece of software and has limitations that need to be considered when standing up a production cluster, such as the fact that you currently cannot reshard it. If you have a 3-node cluster, adding another node is messy and requires downtime.


Still a bit messy, but the clickhouse-copier utility helps somewhat: https://github.com/yandex/ClickHouse/issues/2579


I have open issues about it on GitHub; it does not work correctly at this time. If you dig through them, there are statements from the devs saying the tool has been neglected.


We load millions of rows per second and use a handful (or more?) of materialized views to build appropriate summaries. Various clients query both the "raw data" and the views, and it all works fine, basically.
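
For anyone unfamiliar with the pattern, here's a minimal sketch of what such a rollup can look like. The table and column names are made up for illustration; our real schemas differ:

    -- Hypothetical raw table: one row per event.
    CREATE TABLE events
    (
        ts     DateTime,
        status UInt16,
        bytes  UInt64
    )
    ENGINE = MergeTree()
    ORDER BY ts;

    -- Materialized view that maintains a per-minute summary as rows arrive.
    -- SummingMergeTree collapses rows that share a sorting key by summing
    -- the numeric columns on merge, so the summary stays compact even
    -- under a heavy insert load.
    CREATE MATERIALIZED VIEW events_per_minute
    ENGINE = SummingMergeTree()
    ORDER BY (minute, status)
    AS SELECT
        toStartOfMinute(ts) AS minute,
        status,
        count()             AS requests,
        sum(bytes)          AS total_bytes
    FROM events
    GROUP BY minute, status;

Cheap dashboard-style queries can then hit events_per_minute while ad-hoc analysis still goes against the raw table.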


> it all works fine, basically

The "basically" is what intrigues me :D



