WebAssembly can certainly be faster than JS in some situations, but well-structured, monomorphic JS with the kind of numeric type hints that asm.js is based on can be extremely fast as well.
It'd be great to check the assumption that WASM is actually faster in this case, and by how much, especially given the friction of sending anything other than numbers across the JS/WASM boundary.
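For a concrete sense of what those hints look like, here is a minimal TypeScript sketch. The function names are hypothetical. The `| 0` and unary `+` coercions are the annotations asm.js uses for ints and doubles. Modern engines don't require them, but they may help keep a hot loop monomorphic. Whether this actually beats a WASM build is exactly what would need benchmarking.

```typescript
// asm.js-style annotations: `x | 0` marks a 32-bit int, unary `+x` marks a double.
// Every value in the loop keeps one numeric type, so the call site stays monomorphic.
function sumColumn(values: Float64Array): number {
  let total = 0.0;
  const n = values.length | 0;
  for (let i = 0; (i | 0) < n; i = (i + 1) | 0) {
    total = +(total + +values[i]);
  }
  return +total;
}

// Polymorphic counterpart: elements may be numbers or numeric strings,
// so the engine has to handle several value shapes in the same loop.
function sumMixed(values: Array<number | string>): number {
  let total = 0;
  for (const v of values) {
    total += typeof v === "number" ? v : Number(v);
  }
  return total;
}
```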
You will get me into trouble! I said 80x faster than the Python prototype. I have benchmarked this against Pandas over the years, and depending on the use case the two codebases trade blows for static data. For streaming datasets, however, Perspective has a large advantage over Pandas, since Pandas does a full group-by on every update while Perspective does its work incrementally using deltas.
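A minimal sketch of the full-vs-incremental difference, assuming a simple sum aggregate over appended rows. `Row`, `fullGroupBy` and `IncrementalSum` are hypothetical names, not Perspective's or Pandas' API. A sum is easy to maintain from deltas. Updating an existing row would need its old value subtracted first, and aggregates like min/max need extra bookkeeping.

```typescript
type Row = { key: string; value: number };

// Full recomputation: rescan every row on each update, like a naive group-by.
function fullGroupBy(rows: Row[]): Map<string, number> {
  const sums = new Map<string, number>();
  for (const r of rows) {
    sums.set(r.key, (sums.get(r.key) ?? 0) + r.value);
  }
  return sums;
}

// Incremental: keep the aggregate around and fold in only the delta rows,
// so each update costs O(delta) instead of O(all rows).
class IncrementalSum {
  readonly sums = new Map<string, number>();

  applyDelta(delta: Row[]): void {
    for (const r of delta) {
      this.sums.set(r.key, (this.sums.get(r.key) ?? 0) + r.value);
    }
  }
}
```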
We do some benchmarking, mainly to catch performance regressions and to validate assumptions against the non-WebAssembly version, but we definitely need more in this regard.
[benchmark for wasm build](https://jpmorganchase.github.io/perspective/examples/benchma...)
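One simple way to turn benchmark runs into a regression check is to compare medians against a stored baseline. This is a sketch under assumptions: `median`, `isRegression`, `timeRuns` and the 10% default tolerance are hypothetical choices, not what the linked benchmark does. The clock is injected (e.g. `() => performance.now()`) so the logic stays testable.

```typescript
// Median is less sensitive than the mean to GC pauses and JIT warm-up outliers.
function median(samples: number[]): number {
  if (samples.length === 0) throw new Error("no samples");
  const sorted = samples.slice().sort((a, b) => a - b);
  const mid = sorted.length >> 1;
  return sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Flag a regression when the current median is more than `tolerance`
// (0.1 = 10%) slower than the baseline median.
function isRegression(currentMs: number[], baselineMs: number[], tolerance = 0.1): boolean {
  return median(currentMs) > median(baselineMs) * (1 + tolerance);
}

// Run `fn` `runs` times and record each duration using the injected clock.
function timeRuns(fn: () => void, runs: number, now: () => number): number[] {
  const durations: number[] = [];
  for (let i = 0; i < runs; i++) {
    const start = now();
    fn();
    durations.push(now() - start);
  }
  return durations;
}
```

Running the same harness against both the WASM and the plain-JS build would also give the "by how much" number asked about above.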
I'd also be interested in some performance comparisons with a library like Crossfilter [1]. Does the improvement outweigh the penalty of crossing the JS/WASM boundary?
The boundary crossing is definitely the bottleneck right now. We are putting a lot of work into Apache Arrow support specifically to avoid this crossover, which will let us send data from the server in binary and skip parsing in the browser.
Can you elaborate on how that would work? Does Arrow really remove the need for serialization even in JS-to-server scenarios? I thought it was more of a shared-memory data frame utility.
There is an early Arrow example in the examples package, superstore-arrow.html. The idea is that, instead of going internet->text->JSON->ArrayBuffer, you keep the data in binary and write it directly into the C++ heap ArrayBuffer as-is. We currently don't read it in C++ directly, for various reasons related to how Emscripten allocates memory, but the general idea is the same.
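A rough sketch of the two paths, assuming a single column of float64 values. This is not the Arrow IPC format, which also carries a schema and other metadata. It only shows the "copy bytes as-is instead of parsing" idea. `heap` is a plain `ArrayBuffer` standing in for the WASM heap. In an Emscripten build it would typically be the module's `HEAPU8` buffer, with the offset coming from `_malloc`, though the exact exports depend on build flags. `viaJson` and `viaBinary` are hypothetical names.

```typescript
// Text path: parse JSON into JS numbers, then write them into the heap element by element.
function viaJson(payload: string, heap: ArrayBuffer, byteOffset: number): Float64Array {
  const values: number[] = JSON.parse(payload);
  const view = new Float64Array(heap, byteOffset, values.length);
  view.set(values);
  return view;
}

// Binary path: the payload already holds float64 bytes (platform byte order,
// little-endian on practically every target), so it is copied as-is, with no parsing.
function viaBinary(payload: Uint8Array, heap: ArrayBuffer, byteOffset: number): Float64Array {
  if (byteOffset % 8 !== 0 || payload.byteLength % 8 !== 0) {
    throw new RangeError("float64 data must be 8-byte aligned and sized");
  }
  new Uint8Array(heap, byteOffset, payload.byteLength).set(payload);
  return new Float64Array(heap, byteOffset, payload.byteLength / 8);
}
```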
We (Graphistry) recently contributed a native JS reader/writer to the Apache Arrow project, so it may help both teams! We did it as legwork for our beyond-native efforts (GPU cloud streaming) and for taming our JS datatypes, so I'm guessing you have similar needs here!
Funnily enough, I was in NYC today talking with banking teams about related tech. Too bad we didn't know about this effort; we would have loved to meet!
In addition to what texodus said below, Crossfilter only implements a small subset of what Perspective provides. For example, it has no streaming (although there is a related project from the Heer lab that supports incremental updates: https://github.com/jheer/datavore), only a single level of grouping, only in one dimension, and it can only support 16 dimension fields at once (without increasing a constant in the codebase).
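One common way a fixed dimension cap like that arises is storing each record's filter state as a bitmask, one bit per dimension. The sketch below is a generic illustration of that technique, not Crossfilter's actual code. `BitmaskFilter` is hypothetical, and `MAX_DIMENSIONS = 16` just mirrors the figure quoted above. With a `Uint16Array`, a 17th dimension has no bit left.

```typescript
// Hypothetical cap mirroring the "16 dimension fields" figure; one bit per dimension.
const MAX_DIMENSIONS = 16;

class BitmaskFilter {
  // masks[i] has bit d set when record i is excluded by dimension d.
  private readonly masks: Uint16Array;
  private dims = 0;

  constructor(recordCount: number) {
    this.masks = new Uint16Array(recordCount);
  }

  // Reserve the next free bit; fails once every bit in the mask is used.
  addDimension(): number {
    if (this.dims >= MAX_DIMENSIONS) {
      throw new RangeError(`at most ${MAX_DIMENSIONS} dimensions fit in a Uint16 mask`);
    }
    return this.dims++;
  }

  // Re-filter one dimension, touching only that dimension's bit.
  filter(dim: number, keep: (record: number) => boolean): void {
    const bit = 1 << dim;
    for (let i = 0; i < this.masks.length; i++) {
      if (keep(i)) this.masks[i] &= ~bit;
      else this.masks[i] |= bit;
    }
  }

  // A record is selected when no dimension excludes it.
  selected(): number[] {
    const out: number[] = [];
    for (let i = 0; i < this.masks.length; i++) {
      if (this.masks[i] === 0) out.push(i);
    }
    return out;
  }
}
```

The upside of this layout is that changing one dimension's filter is a single pass flipping one bit. The cost is that the mask width fixes how many dimensions can be active at once.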