
"I think one interesting project in the near future could be to try and build a column-oriented storage engine that's "good enough" for both OLAP and OLTP workloads." SAP HANA is an example of a system that fits into this category. This isn't new either; it has existed since the 90s. Sybase IQ (which SAP acquired) was the first commercially successful columnar database. They have an in-memory row engine to handle OLTP. OLAP queries perform exceptionally well due to the column-oriented nature of the storage. Customer deployments are in the 100s of TBs and low PBs these days. It blows most open source software away in terms of performance if you are willing to shell out the $. Source: I work at SAP.



Oh, I don't mean a database frontend that can handle both OLTP and OLAP workloads, usually by having some kind of OLAP column-store and some kind of OLTP main-memory row-store. I know there are a lot of those (not only HANA, but also MemSQL, SQL Server, etc.)

The fun thing to try and imagine here is having literally the same physical data format that works for both kinds of workloads.
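To make the distinction concrete, here is a minimal sketch of the two physical layouts being contrasted. The `Row` type, the sample data, and the function names are all made up for illustration; real engines add compression, paging, and much more.

```python
from dataclasses import dataclass


@dataclass
class Row:
    id: int
    region: str
    amount: float


rows = [Row(1, "EU", 10.0), Row(2, "US", 25.5), Row(3, "EU", 4.5)]

# Row layout: each record is stored together, so fetching one
# whole record by key is a single access (the OLTP-friendly shape).
row_store = {r.id: r for r in rows}

# Column layout: each attribute is stored together, so an aggregate
# over one attribute touches only that column (the OLAP-friendly shape).
column_store = {
    "id": [r.id for r in rows],
    "region": [r.region for r in rows],
    "amount": [r.amount for r in rows],
}


def point_lookup(row_id: int) -> Row:
    """Fetch a whole record from the row layout."""
    return row_store[row_id]


def total_amount() -> float:
    """Aggregate one attribute from the column layout."""
    return sum(column_store["amount"])


def column_point_lookup(row_id: int) -> Row:
    """Rebuild one record from the column layout: one access per column."""
    i = column_store["id"].index(row_id)
    return Row(
        column_store["id"][i],
        column_store["region"][i],
        column_store["amount"][i],
    )
```

A hybrid system keeps both `row_store` and `column_store`; the single-format question is whether one layout can serve both `point_lookup` and `total_amount` well enough on its own.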


You actually don't need to have the same storage data layout if you use a time series as a starting point, because you can maintain different data layouts in parallel, and the time dimension permits strong consistency across them all.

If this is what you mean by a "database frontend", I am really confused as to why you object to this?

I think this property of time series is going to prove very important in the 2020s.
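One possible reading of this idea, sketched under assumptions: an append-only log assigns each write a strictly increasing logical timestamp, two layouts are updated from it, and any read "as of" a timestamp sees the same prefix of writes in both. The class and method names below are hypothetical and not taken from any particular system.

```python
import bisect


class TimeOrderedStore:
    """Toy store keeping two layouts in step via logical timestamps."""

    def __init__(self):
        self.log = []         # append-only source of truth: (ts, key, value)
        self.by_key = {}      # row-like layout: key -> [(ts, value), ...]
        self.ts_col = []      # column-like layout: timestamps in order
        self.value_col = []   # column-like layout: values in the same order

    def append(self, key, value):
        ts = len(self.log) + 1  # logical timestamp, strictly increasing
        self.log.append((ts, key, value))
        self.by_key.setdefault(key, []).append((ts, value))
        self.ts_col.append(ts)
        self.value_col.append(value)
        return ts

    def get_as_of(self, key, ts):
        """Latest value for key written at or before ts (row-like read)."""
        latest = None
        for t, v in self.by_key.get(key, []):
            if t > ts:
                break
            latest = v
        return latest

    def sum_as_of(self, ts):
        """Sum of all values appended at or before ts (column-like read)."""
        n = bisect.bisect_right(self.ts_col, ts)
        return sum(self.value_col[:n])
```

Because both reads are parameterized by the same timestamp, they agree on which writes are visible even though the layouts differ. In this toy version both layouts are updated synchronously; a real system would presumably let them lag and only serve reads up to the slowest layout's timestamp.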


Again, I don't care about distributed consistency here, nor is it mutually exclusive with what I'm talking about.

The question we're trying to answer is whether there exists at all a storage engine, at the scope of a single node (obviously generalizable/scalable), that can fit all use cases well. Once you figure out the answer to that, obviously having one storage engine is simpler than having two storage engines.


Parallel but consistent indexes could be done in a single process too. The tradeoff is natural reads at the cost of an extra constant factor in write computation and storage.

Which is exactly the tradeoff RDBMS indexes already make.
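A minimal sketch of that tradeoff, assuming a toy in-memory table with one secondary index (all names here are illustrative): every write does extra work to keep the index in step, and in exchange reads by the indexed attribute avoid a full scan.

```python
from collections import defaultdict


class IndexedTable:
    """Toy table with a primary store and one secondary index."""

    def __init__(self):
        self.rows = {}                     # primary: id -> (region, amount)
        self.by_region = defaultdict(set)  # secondary index: region -> ids

    def upsert(self, row_id, region, amount):
        # Extra write work: drop the stale index entry, then add the new one.
        old = self.rows.get(row_id)
        if old is not None:
            self.by_region[old[0]].discard(row_id)
        self.rows[row_id] = (region, amount)
        self.by_region[region].add(row_id)

    def ids_in_region(self, region):
        # Cheap read: one index lookup instead of scanning every row.
        return sorted(self.by_region.get(region, ()))
```

Both structures are updated inside the same `upsert` call, so they never disagree; the cost is the extra bookkeeping and the extra storage for `by_region`.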


That's still not what I'm talking about. https://cs.brown.edu/~ugur/fits_all.pdf


Thank you for that link!



