Ask HN: Time Series Database?
4 points by thisisbrians on June 9, 2020 | 3 comments
I'm curious what other HNers are using for their time series data needs. At my company, we work in IoT, with hundreds of thousands of sensors cumulatively sending in ~5k observations per second (~5 TB of uncompressed data on disk). We've tried all sorts of options and are currently using TimescaleDB, but so far we're very unsatisfied with query speeds. Typically, we just want to see hourly averages for a few sensors over some arbitrary time period, but this can take several minutes for just 3 sensors over a year, for example. As the data has grown over the years, experimenting with new options has become more and more burdensome, so I figured I'd poll the crowd to see what approaches are working for others.
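For a rough sense of scale, the figures in the post can be turned into row counts. This is only a back-of-the-envelope sketch: the ~5k observations per second comes from the post, while the constant year-round rate and the 365-day year are assumptions made for the arithmetic.

```python
# Back-of-the-envelope sizing from the figures in the post.
OBS_PER_SECOND = 5_000      # approximate rate quoted in the post
SECONDS_PER_DAY = 86_400
DAYS_PER_YEAR = 365         # assumption: constant rate, non-leap year

# Raw observations written per year across all sensors.
raw_rows_per_year = OBS_PER_SECOND * SECONDS_PER_DAY * DAYS_PER_YEAR

# The example query: hourly averages for 3 sensors over one year.
SENSORS = 3
HOURS_PER_YEAR = 24 * DAYS_PER_YEAR
result_rows = SENSORS * HOURS_PER_YEAR

print(f"raw rows per year:    {raw_rows_per_year:,}")
print(f"rows in query result: {result_rows:,}")
```

Under these assumptions the query returns only about 26 thousand rows, so the cost most likely lies in scanning and aggregating raw observations rather than in the size of the answer.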



I am also using TimescaleDB and face similar issues with OLAP workloads. I tested ClickHouse and QuestDB, both of which claim superior OLAP performance; however, given the ingest rates and cardinality of my unstructured data, neither performed on par with TimescaleDB in my limited tests. Lacking time, I instead implemented a number of strategies within Postgres: trading space for speed with additional computed index columns and partitioning around these, layering tables at various time resolutions, and using Real-time Aggregates as introduced in TimescaleDB 1.7. Real-time Aggregates significantly improve performance on large datasets for very little effort and appear well suited to your problem. This has improved performance enough to keep me looking within the Postgres ecosystem, and for now it is within target budgets again. Going forward, time permitting, I'll be looking at additional Postgres extensions that target OLAP performance using combinations of columnar storage and SIMD processing: Swarm64, VOPS (https://github.com/postgrespro/vops), and PG-Strom.
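The idea of keeping extra tables at coarser time resolutions can be sketched roughly as follows. This is a minimal illustration using Python's built-in sqlite3 module as a stand-in, not TimescaleDB's continuous aggregate API; the table names, column names, and helper functions are all hypothetical, and timestamps are assumed to be integer Unix seconds.

```python
import sqlite3

def create_tables(conn):
    # Raw observations plus a coarser hourly layer (hypothetical schema).
    conn.executescript("""
        CREATE TABLE observations (sensor_id INTEGER, ts INTEGER, value REAL);
        CREATE TABLE hourly_rollup (
            sensor_id  INTEGER,
            hour_start INTEGER,
            sum_value  REAL,
            n          INTEGER,
            PRIMARY KEY (sensor_id, hour_start)
        );
    """)

def refresh_rollup(conn, since_ts, until_ts):
    # Recompute hourly sums and counts for raw rows in [since_ts, until_ts).
    # The bounds should fall on hour boundaries so each hour is rebuilt whole.
    conn.execute("""
        INSERT OR REPLACE INTO hourly_rollup (sensor_id, hour_start, sum_value, n)
        SELECT sensor_id, (ts / 3600) * 3600, SUM(value), COUNT(*)
        FROM observations
        WHERE ts >= ? AND ts < ?
        GROUP BY sensor_id, (ts / 3600) * 3600
    """, (since_ts, until_ts))

def hourly_averages(conn, sensor_ids, start_ts, end_ts):
    # Answer the dashboard query from the small rollup table only.
    placeholders = ",".join("?" for _ in sensor_ids)
    cursor = conn.execute(f"""
        SELECT sensor_id, hour_start, sum_value / n
        FROM hourly_rollup
        WHERE sensor_id IN ({placeholders})
          AND hour_start >= ? AND hour_start < ?
        ORDER BY sensor_id, hour_start
    """, (*sensor_ids, start_ts, end_ts))
    return cursor.fetchall()

conn = sqlite3.connect(":memory:")
create_tables(conn)
conn.executemany(
    "INSERT INTO observations VALUES (?, ?, ?)",
    [(1, 0, 10.0), (1, 1800, 20.0), (1, 3600, 30.0), (2, 10, 5.0)],
)
refresh_rollup(conn, 0, 7200)
result = hourly_averages(conn, [1, 2], 0, 7200)
print(result)
```

Storing the sum and count rather than the average keeps the rollup re-aggregatable into daily or weekly layers. A real deployment would also need to handle late-arriving data and the most recent, not-yet-rolled-up hour, which this sketch leaves out.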


I'd take a look at what Prometheus is using; I think that is IndexDB.


ClickHouse can bring a performance advantage on the query side and, with a proper cluster config, scale for ingest.

Cloudflare has some useful blog posts on their use of it.



