
> They expect seconds to minutes scale queries on hundreds of GB of data.

Use BigQuery from Google.




On-premise cluster.

Cloud solutions are totally out due to the nature of the data. Not everything can be done in the cloud.

If you have such a huge amount of data, the total time it takes to transfer it there and then compute on it is not as competitive as an on-premise solution, unless all your data already lives in the cloud.
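
A rough back-of-envelope sketch of the transfer-time point. The 500 GB size and the link speeds are assumed values picked for illustration (the thread only says "hundreds of GB"), and `transfer_seconds` is a hypothetical helper that ignores latency, protocol overhead, and compression:

```python
def transfer_seconds(size_gb: float, link_gbit_per_s: float, efficiency: float = 1.0) -> float:
    """Estimate seconds needed to move size_gb (decimal gigabytes) over a link.

    efficiency is the assumed fraction of nominal bandwidth actually achieved.
    """
    if link_gbit_per_s <= 0 or not (0 < efficiency <= 1):
        raise ValueError("link speed must be positive and efficiency in (0, 1]")
    bits = size_gb * 8e9  # 1 GB = 8e9 bits
    return bits / (link_gbit_per_s * 1e9 * efficiency)


if __name__ == "__main__":
    # Assumed uplink speeds: 100 Mbit/s, 1 Gbit/s, 10 Gbit/s.
    for link in (0.1, 1.0, 10.0):
        minutes = transfer_seconds(500, link) / 60
        print(f"{link:>4} Gbit/s: {minutes:.1f} min")
```

Under these assumptions, the upload alone is already in the minutes-to-hours range before any query runs, which is the overhead an on-premise cluster avoids when the data is generated locally.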


I would look into https://spark.apache.org/ then. You can get quite good performance out of it, but you need to put more effort into babysitting your data.



