Awesome work. I tried something similar a while back, but I gave up once I realized that the memory overhead of Ruby's numeric primitives makes it nearly impossible to massage large data sets in Ruby.
Pandas and numpy skirt this with custom numeric types. I've seen some efforts to build pandas clones in Ruby, but none have come close to the performance needed to handle a few gigs of data.
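To make the memory point concrete, here is a rough sketch, assuming 64-bit CRuby, that compares a plain Array of small integers with the same values packed into a String as native 32-bit integers. The variable names and the choice of 32-bit width are mine, purely for illustration; this is not how pandas or numpy are implemented, just the general idea of fixed-width storage.

```ruby
require 'objspace'

n = 1_000_000
values = Array.new(n) { |i| i % 1000 }

# Each Array slot holds a full machine word (8 bytes on 64-bit CRuby),
# no matter how small the integer stored in it is.
array_bytes = ObjectSpace.memsize_of(values)

# Pack the same values as native signed 32-bit integers ('l'), 4 bytes each.
packed = values.pack('l*')
packed_bytes = packed.bytesize

puts "Array:  #{array_bytes} bytes"
puts "Packed: #{packed_bytes} bytes"

# Read one element back without unpacking the whole buffer.
index = 123_456
element = packed.byteslice(index * 4, 4).unpack1('l')
puts "values[#{index}] = #{element}"
```

The catch is that every read or write goes through pack/unpack, so this only saves memory; it does nothing for compute speed, which is where a real typed-array library would have to do the work in C.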
I think they were adding something like Python's buffer protocol in Ruby 3, which should pave the way for something like numpy if there is enough demand. I'm referencing https://bugs.ruby-lang.org/issues/14722