One of the founders of Rerun here. I don't remember exactly where the idea came from, but I think it was basically two things. First, creating chunks of columns to store or pass around is a pretty common approach in data systems. Parquet files, for instance, have the concept of row groups, which is pretty similar (the main difference is that chunks don't have to include all columns). Second, it was just quite obvious that we needed to amortize the fixed costs better for small data somehow.
This part really caught my attention:
Where'd you guys get the idea for this approach? Did you know you could get this kind of improvement?