
Do you have a schema available publicly? I would like to build a similar system using custom software + S3 + Parquet + Athena for this task and see if it works.



The schema is

    CREATE TABLE points (
        Timestamp DateTime,
        Client String,
        Path String,
        Value Float32,
        Tags Nested(Key String, Value String)
    )
    ENGINE = MergeTree()
    ORDER BY (Client, Timestamp, Path)
    PARTITION BY toStartOfDay(Timestamp)
And this is a LIKE query I was using:

    SELECT
        (intDiv(toUInt32(Timestamp), 15) * 15) * 1000 AS t,
        Path,
        Value AS c
    FROM points_dist
    WHERE Path LIKE 'tst_val1'
      AND Tags.Value[indexOf(Tags.Key, 'server')] = 'node'
      AND Timestamp >= toDateTime(1543421708)
    GROUP BY t, Path, Value
    ORDER BY t, Path
This table was created on 5 servers via a distributed table partitioned on the timestamp, so the distribution was even.
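
For anyone porting this to another engine, here is a rough Python sketch of what the two less obvious expressions in the query compute: the 15-second bucketing into epoch milliseconds, and the tag lookup over the parallel `Tags.Key` / `Tags.Value` arrays. The function names `bucket_ms` and `tag_value` are made up for illustration. Returning an empty string for a missing key is my understanding of how ClickHouse behaves when `indexOf` yields 0, so treat that part as an assumption.

```python
from typing import Sequence


def bucket_ms(epoch_seconds: int, width_s: int = 15) -> int:
    """Mirror (intDiv(toUInt32(Timestamp), 15) * 15) * 1000.

    Floors the timestamp to the start of its width_s-second bucket
    and converts the result to epoch milliseconds.
    """
    return (epoch_seconds // width_s) * width_s * 1000


def tag_value(keys: Sequence[str], values: Sequence[str], key: str) -> str:
    """Mirror Tags.Value[indexOf(Tags.Key, key)].

    keys and values are parallel arrays, as in a Nested column.
    Assumption: a missing key yields '' (the String default).
    """
    for k, v in zip(keys, values):
        if k == key:
            return v  # first match wins, like indexOf
    return ""


if __name__ == "__main__":
    # 1543421708 is the lower bound used in the query above.
    print(bucket_ms(1543421708))  # 1543421700000
    print(tag_value(["server", "dc"], ["node", "eu"], "server"))  # node
```

Because the query groups by `t, Path, Value`, rows in the same bucket are only merged when their `Value` is also identical.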


Thanks! How big are the 100B rows in your system?


It takes up 200 GB or so across 5 servers (according to ClickHouse's query stats). Actual disk usage might be a bit higher.
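
As a back-of-the-envelope check on those numbers, here is a small sketch of the implied storage per row and per server. It assumes decimal gigabytes, takes "100B rows" and "200 GB or so" at face value, and assumes the data is spread evenly over the 5 servers, so the result is only a rough estimate.

```python
ROWS = 100 * 10**9          # "100B rows" from the question
TOTAL_BYTES = 200 * 10**9   # "200 GB or so", assuming decimal GB
SERVERS = 5


def bytes_per_row(total_bytes: int, rows: int) -> float:
    """Average stored bytes per row."""
    return total_bytes / rows


def gb_per_server(total_bytes: int, servers: int) -> float:
    """Average stored gigabytes per server, assuming an even split."""
    return total_bytes / servers / 10**9


if __name__ == "__main__":
    print(bytes_per_row(TOTAL_BYTES, ROWS))     # 2.0
    print(gb_per_server(TOTAL_BYTES, SERVERS))  # 40.0
```

That works out to roughly 2 bytes per row and about 40 GB per server under those assumptions.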


Thanks!



