
Did you experiment with doing it JS-only prior to the C++ version? If so, what kind of performance increase did you see with WebAssembly?



This++

WebAssembly can certainly be faster than JS in certain situations, but well-structured, monomorphic JS with the kind of type hints for numeric values that asm.js is based on can be extremely fast as well.

It'd be great to check the assumption that WASM is actually faster in this case, and by how much, especially given the friction of sending anything other than numbers across the JS/WASM boundary.
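To make the asm.js-style type hints mentioned above concrete, here is a minimal sketch. The function name `sumFloat64` and the sample data are made up for illustration, and whether a given engine actually benefits from these coercions is an assumption that would need to be benchmarked.

```typescript
// Sum a typed array using asm.js-style coercions.
// `| 0` marks a value as an int32 and unary `+` marks it as a double.
// Calling the function with one argument type only (here always a
// Float64Array) keeps the call site monomorphic.
function sumFloat64(values: Float64Array): number {
  let total = 0.0;
  const n = values.length | 0;
  for (let i = 0; (i | 0) < n; i = (i + 1) | 0) {
    total = +(total + values[i]);
  }
  return +total;
}

// Hypothetical sample input, chosen so the sum is exactly representable.
const sample = new Float64Array([1.5, 2.5, 4.0]);
const sampleSum = sumFloat64(sample);
```

Only numbers and a typed array are involved here, which is also the kind of data that is cheap to hand across the JS/WASM boundary.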


And to be fair, you'd have to build it with something like prepack to get some of the optimizations LLVM gives you.


No. When I first prototyped it, I wrote a Python version. From memory, the first C++ build was ~80x faster than the Python prototype.


80x faster than using Pandas? What if you just used the Numpy values array?


You will get me into trouble! I said 80x faster than the Python prototype. I did benchmark this vs Pandas over the years, and depending on the use case the two codebases trade blows with each other for static data. However, for streaming datasets Perspective has a large advantage over Pandas, since Pandas does a full group by on every update while Perspective does its work incrementally using deltas.
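The difference between a full group by on every update and incremental work on deltas can be sketched as follows. This is only an illustration of the general idea, not Perspective's or Pandas' actual implementation; the `Row` type, `groupSumFull`, and `IncrementalGroupSum` are hypothetical names.

```typescript
type Row = { key: string; value: number };

// Full recompute: scans every row each time it is called, so the cost of
// an update grows with the total size of the dataset.
function groupSumFull(rows: Row[]): Map<string, number> {
  const sums = new Map<string, number>();
  for (const { key, value } of rows) {
    sums.set(key, (sums.get(key) ?? 0) + value);
  }
  return sums;
}

// Incremental: keeps running sums and touches only the newly arrived rows
// (the delta), so the cost of an update grows with the size of the delta.
class IncrementalGroupSum {
  private sums = new Map<string, number>();

  applyDelta(delta: Row[]): void {
    for (const { key, value } of delta) {
      this.sums.set(key, (this.sums.get(key) ?? 0) + value);
    }
  }

  get(key: string): number {
    return this.sums.get(key) ?? 0;
  }
}

// Hypothetical stream: two rows arrive first, then one more.
const firstBatch: Row[] = [
  { key: "a", value: 1 },
  { key: "b", value: 2 },
];
const secondBatch: Row[] = [{ key: "a", value: 3 }];

const incremental = new IncrementalGroupSum();
incremental.applyDelta(firstBatch);
incremental.applyDelta(secondBatch);

// The full version has to rescan both batches to reach the same answer.
const full = groupSumFull([...firstBatch, ...secondBatch]);
```

Sums are the easy case because they can be updated in place; aggregates such as medians need more state to maintain incrementally.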


I'm curious if you did any profiling? If the intent was always to rewrite the whole prototype in C++ then it kinda wouldn't matter, eh?


We do some benchmarking just for catching performance regressions and validating assumptions against the non-WebAssembly version, but definitely need more in this regard. [benchmark for wasm build](https://jpmorganchase.github.io/perspective/examples/benchma...)


I'd also be interested in some performance comparisons with a library like CrossFilter [1]. Does the improvement outweigh the penalties of crossing the JS/WebAssembly boundary?

[1] http://square.github.io/crossfilter/


The boundary-crossing is definitely the bottleneck right now. We are currently putting a lot of work into the Apache Arrow support specifically to avoid this crossover, which will allow us to send data from the server in binary and avoid parsing in the browser.


Can you elaborate on how that could work? Does Arrow really allow for abstracting away the need for serialization even in JS-server scenarios? I thought it was more of a shared-memory data frame utility?


There is an early Arrow example in the examples package - superstore-arrow.html. The idea is that, instead of converting your data along the path internet->text->JSON->ArrayBuffer, you just keep the data in binary and write it directly into the C++ heap ArrayBuffer as-is. We currently do not read this in C++ directly for various reasons related to how emscripten allocates memory, but the general idea is the same.
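A rough sketch of the "keep it binary and write it into the heap" idea described above. The `heap`, `nextFree`, and `copyIntoHeap` names are hypothetical stand-ins: a plain `Uint8Array` plays the role of the WebAssembly heap and a bump offset plays the role of an allocator, so this shows only the byte copy, not how emscripten or Perspective actually manage memory.

```typescript
// Stand-in for WebAssembly linear memory (assumption: a real build exposes
// its own heap and allocator instead of this fixed-size array).
const heap = new Uint8Array(1024);
let nextFree = 0; // hypothetical bump-allocator offset

// Copy raw bytes into the heap and report where they landed.
function copyIntoHeap(bytes: Uint8Array): { offset: number; length: number } {
  const offset = nextFree;
  if (offset + bytes.length > heap.length) {
    throw new Error("heap full");
  }
  heap.set(bytes, offset);
  // Round the next free offset up to a multiple of 8 so that later
  // Float64Array views stay aligned.
  nextFree = (offset + bytes.length + 7) & ~7;
  return { offset, length: bytes.length };
}

// Pretend this column arrived from the server already in binary form.
const column = new Float64Array([1.25, 2.5, 3.75]);
const slot = copyIntoHeap(new Uint8Array(column.buffer));

// Read the copied bytes back as doubles: no text, no JSON.parse, one copy.
const view = new Float64Array(heap.buffer, slot.offset, column.length);
```

The JSON route would instead decode text, build JS objects, and then write each number into a typed array one by one before anything reaches the heap.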


Bringing Apache Arrow to the browser alongside wasm is exciting, to say the least! Amazing capabilities are coming to browsers...


We (Graphistry) recently contributed a native JS reader/writer to the Apache Arrow project, so it may help both teams! We did it as legwork for our beyond-native efforts (GPU cloud streaming) and for taming our JS datatypes, so similar needs here, I'm guessing!

Funny enough: was in NYC today talking with banking teams about related tech. Too bad we didn't know about this effort, would have loved to meet!


Yes, this is the library we use.

We have met before actually, you did a demo at JPM in midtown several years back. Graphistry has come a very long way since then - impressive work!


Ah, small world. That was probably when we first started on client<>server GPU streaming. Looking forward to digging into the Perspective source!


Drop me a note the next time you are in NY, would love to meet up.


In addition to what texodus has said below, Crossfilter only implements a small subset of what Perspective provides. For example, there is no streaming (although there is a related project from the Heer lab that supports incremental updates - https://github.com/jheer/datavore), only a single level of grouping, only in 1 dimension, and you can only support 16 dimension fields at once (without increasing a constant in the codebase).



