Did you experiment with doing it JS-only prior to the C++ version? If so, what k...

spankalee · on Feb 6, 2018

This++

WebAssembly can certainly be faster than JS in certain situations, but well-structured, monomorphic JS with the kind of type hints for numeric values that asm.js is based on can be extremely fast as well.
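
As a rough illustration of that style (a sketch, not a benchmark; `sumSquares` is a made-up name), asm.js marks integers with `| 0` and doubles with a unary `+`, and the function stays monomorphic as long as it is always called with the same argument type:

```typescript
// asm.js-style numeric hints: `| 0` coerces to a 32-bit integer,
// unary `+` coerces to a double. Always passing a Float64Array keeps
// the call site monomorphic.
function sumSquares(values: Float64Array): number {
  const n = values.length | 0;
  let total = 0.0;
  for (let i = 0; i < n; i = (i + 1) | 0) {
    const v = +values[i];
    total = +(total + v * v);
  }
  return +total;
}

const result = sumSquares(new Float64Array([1, 2, 3]));
```

Whether a given engine actually turns this into code competitive with WASM is exactly the assumption that would need measuring.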

It'd be great to check the assumption that WASM is actually faster in this case, and by how much, especially given the friction of sending anything other than numbers across the JS/WASM boundary.

kodablah · on Feb 6, 2018

And to be fair, you'd have to build it with something like prepack to get some of the optimizations LLVM gives you.

dman · on Feb 6, 2018

No. When I first prototyped it, I wrote a Python version. From memory, the first C++ build was ~80x faster than the Python prototype.

goatlover · on Feb 6, 2018

80x faster than using Pandas? What if you just used the Numpy values array?

dman · on Feb 6, 2018

You will get me into trouble! I said 80x faster than the Python prototype. I did benchmark this against Pandas over the years, and depending on the use case the two codebases trade blows with each other for static data. For streaming datasets, however, Perspective has a large advantage over Pandas, since Pandas does a full group-by on every update while Perspective does its work incrementally using deltas.
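
To make the difference concrete, here is a minimal sketch of the two strategies for a grouped sum. All names (`Row`, `fullGroupBySum`, `IncrementalGroupBySum`) are hypothetical and this is not Perspective's or Pandas's actual API. It assumes append-only deltas; a real engine would also have to handle updates and removals.

```typescript
type Row = { key: string; value: number };
type Totals = Record<string, number>;

// Full recomputation: touches every row seen so far on each update.
function fullGroupBySum(rows: Row[]): Totals {
  const totals: Totals = {};
  for (const row of rows) {
    totals[row.key] = (totals[row.key] ?? 0) + row.value;
  }
  return totals;
}

// Incremental: keeps running totals and touches only the new rows (the delta).
class IncrementalGroupBySum {
  private totals: Totals = {};

  applyDelta(delta: Row[]): void {
    for (const row of delta) {
      this.totals[row.key] = (this.totals[row.key] ?? 0) + row.value;
    }
  }

  snapshot(): Totals {
    return { ...this.totals };
  }
}

const batch1 = [{ key: "a", value: 1 }, { key: "b", value: 2 }];
const batch2 = [{ key: "a", value: 3 }];

const incremental = new IncrementalGroupBySum();
incremental.applyDelta(batch1);
incremental.applyDelta(batch2); // work proportional to batch2 only

const recomputed = fullGroupBySum(batch1.concat(batch2)); // work proportional to all rows
```

Both paths end up with the same totals; the incremental one just avoids rescanning old rows each time new data arrives.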

carapace · on Feb 6, 2018

I'm curious if you did any profiling? If the intent was always to rewrite the whole prototype in C++ then it kinda wouldn't matter, eh?

texodus · on Feb 6, 2018

We do some benchmarking, mostly to catch performance regressions and validate assumptions against the non-WebAssembly version, but we definitely need more in this regard. [benchmark for wasm build](https://jpmorganchase.github.io/perspective/examples/benchma...)

sciyoshi · on Feb 6, 2018

I'd also be interested in some performance comparisons with a library like CrossFilter [1]. Does the improvement outweigh the penalties of crossing the JS/WebASM boundary?

[1] http://square.github.io/crossfilter/

texodus · on Feb 6, 2018

The boundary-crossing is definitely the bottleneck right now. We are currently putting a lot of work into the Apache Arrow support specifically to avoid this crossover, which will allow us to send data from the server in binary and avoid parsing in the browser.

polskibus · on Feb 6, 2018

Can you elaborate on how that could work? Does Arrow really abstract away the need for serialization, even in JS-server scenarios? I thought it was more of a shared-memory data frame utility?

texodus · on Feb 6, 2018

There is an early Arrow example in the examples package - superstore-arrow.html. The idea is that, instead of converting your data along the path internet->text->JSON->ArrayBuffer, you just keep the data in binary and write it directly into the C++ heap ArrayBuffer as-is. We currently do not read this in C++ directly for various reasons related to how Emscripten allocates memory, but the general idea is the same.
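
A minimal sketch of that general idea, with loud assumptions: a plain `ArrayBuffer` stands in for the WASM heap, and `alloc` and `writeColumn` are hypothetical helpers, not Perspective's or Emscripten's API. The point is only that a binary payload can be copied into the heap in one step, with no text decoding or JSON parsing.

```typescript
// Stand-in for the WASM heap (assumption: a real build would use the
// module's own linear memory instead of a fresh ArrayBuffer).
const heap = new ArrayBuffer(1024);
let nextFree = 0;

// Hypothetical bump allocator; returns an 8-byte-aligned offset into `heap`.
function alloc(size: number): number {
  const ptr = (nextFree + 7) & ~7; // round up to an 8-byte boundary
  if (ptr + size > heap.byteLength) {
    throw new Error("stand-in heap exhausted");
  }
  nextFree = ptr + size;
  return ptr;
}

// Copy a binary column into the heap as-is: one bulk copy, no parsing.
function writeColumn(payload: Uint8Array): number {
  const ptr = alloc(payload.byteLength);
  new Uint8Array(heap, ptr, payload.byteLength).set(payload);
  return ptr;
}

// Pretend these bytes arrived from the server already in binary form.
const source = new Float64Array([1.5, 2.5, 3.5]);
const payload = new Uint8Array(source.buffer);

const ptr = writeColumn(payload);

// The same bytes can now be viewed as doubles in place, without conversion.
const column = new Float64Array(heap, ptr, source.length);
```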

BenGosub · on Feb 6, 2018

Bringing Apache Arrow to the browser alongside wasm is exciting to say the least! Amazing capabilities are coming to browsers...

lmeyerov · on Feb 7, 2018

We (Graphistry) recently contributed a native JS reader/writer into the Apache Arrow project, so may help both teams! We did it as legwork for our beyond-native efforts (GPU cloud streaming) and taming our JS datatypes, so similar needs here I'm guessing!

Funny enough: was in NYC today talking with banking teams about related tech. Too bad we didn't know about this effort, would have loved to meet!

texodus · on Feb 7, 2018

Yes, this is the library we use.

We have met before actually, you did a demo at JPM in midtown several years back. Graphistry has come a very long way since then - impressive work!

lmeyerov · on Feb 7, 2018

Ah, small world. That was probably when we first started on client<>server GPU streaming. Looking forward to digging into the Perspective source!

dman · on Feb 7, 2018

Drop me a note the next time you are in NY, would love to meet up.

infinite8s · on Feb 6, 2018

In addition to what texodus has said below, Crossfilter implements only a small subset of what Perspective provides. For example, it has no streaming (although there is a related project from the Heer lab that supports incremental updates - https://github.com/jheer/datavore), it offers only a single level of grouping, and only in 1 dimension, and it can support only 16 dimension fields at once (without increasing a constant in the codebase).