WebAssembly can certainly be faster than JS in certain situations, but well-structured, monomorphic JS with the kind of numeric type hints that asm.js is based on can be extremely fast as well.
It'd be great to check the assumption that WASM is actually faster in this case, and by how much, especially given the friction of sending anything other than numbers across the JS/WASM boundary.
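For anyone unfamiliar with the asm.js-style hints mentioned above, here is a minimal sketch. The function names are made up for illustration, and whether a given engine actually optimizes on these coercions is an assumption that should be benchmarked rather than taken on faith. The idea is that `x | 0` pins a value to int32 and `+x` pins it to a double, while each function only ever sees one typed-array type, so the call sites stay monomorphic.

```typescript
// Integer sum with asm.js-style `| 0` coercions.
// The result stays in the int32 range and wraps on overflow.
function sumInt32(values: Int32Array): number {
  let total = 0;
  for (let i = 0; i < values.length; i = (i + 1) | 0) {
    total = (total + values[i]) | 0;
  }
  return total;
}

// Dot product with the unary `+` coercion marking the accumulator as a double.
function dot(a: Float64Array, b: Float64Array): number {
  const n = Math.min(a.length, b.length) | 0;
  let acc = 0.0;
  for (let i = 0; i < n; i = (i + 1) | 0) {
    acc = +(acc + a[i] * b[i]);
  }
  return acc;
}

// Each function is only ever called with one argument type.
const intTotal = sumInt32(new Int32Array([1, 2, 3, 4]));
const dotProduct = dot(new Float64Array([1, 2, 3]), new Float64Array([4, 5, 6]));
const wrapped = sumInt32(new Int32Array([2147483647, 1]));
```

A fair comparison against a WASM build would run the same loop over the same typed arrays on both sides and include the cost of getting the data across the boundary.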
You will get me into trouble! I said 80x faster than the Python prototype. I did benchmark this vs Pandas over the years, and depending on the use case the two codebases trade blows with each other for static data. However, for streaming datasets Perspective has a large advantage over Pandas, since Pandas does a full group by on every update, while Perspective does its work incrementally using deltas.
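To make the "full group by vs deltas" distinction concrete, here is a rough sketch. It is not Perspective's actual engine or Pandas' implementation; the `Trade` shape and both helpers are hypothetical. The full version rescans every row on each update, while the incremental version keeps the aggregate around and folds in only the newly arrived rows.

```typescript
// Hypothetical row shape, for illustration only.
interface Trade {
  symbol: string;
  qty: number;
}

// Full recompute: regroup every row on each update, O(total rows) per tick.
function fullGroupSum(rows: Trade[]): Map<string, number> {
  const sums = new Map<string, number>();
  for (const r of rows) {
    sums.set(r.symbol, (sums.get(r.symbol) ?? 0) + r.qty);
  }
  return sums;
}

// Incremental: keep the aggregate and apply only the delta, O(delta rows) per tick.
class IncrementalGroupSum {
  private sums = new Map<string, number>();

  update(delta: Trade[]): void {
    for (const r of delta) {
      this.sums.set(r.symbol, (this.sums.get(r.symbol) ?? 0) + r.qty);
    }
  }

  get(symbol: string): number {
    return this.sums.get(symbol) ?? 0;
  }
}

const history: Trade[] = [
  { symbol: "A", qty: 10 },
  { symbol: "B", qty: 5 },
];
const tick: Trade[] = [{ symbol: "A", qty: 3 }];

const incremental = new IncrementalGroupSum();
incremental.update(history); // initial load
incremental.update(tick);    // only the new row is touched

const recomputed = fullGroupSum([...history, ...tick]); // rescans everything
```

Both paths give the same totals. The difference is how much work each new tick costs once the history is large.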
We do some benchmarking just for catching performance regressions and validating assumptions against the non-WebAssembly version, but definitely need more in this regard.
[benchmark for wasm build](https://jpmorganchase.github.io/perspective/examples/benchma...)
I'd also be interested in some performance comparisons with a library like CrossFilter [1]. Does the improvement outweigh the penalties of crossing the JS/WebASM boundary?
The boundary-crossing is definitely the bottleneck right now. We are currently putting a lot of work into the Apache Arrow support specifically to avoid this crossover, which will allow us to send data from the server in binary and avoid parsing in the browser.
Can you elaborate on how that could work? Does Arrow really allow for abstracting away the need for serialization even in JS-to-server scenarios? I thought it was more of a shared-memory data frame utility?
There is an early arrow example in the examples package - superstore-arrow.html. The idea is that, instead of converting your data to internet->text->JSON->ArrayBuffer, you just keep the data in binary and write it directly into the C++ heap ArrayBuffer as-is. We currently do not read this in C++ directly for various reasons related to how emscripten allocates memory, but the general idea is the same.
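A toy illustration of "keep the data in binary" follows. The layout here (an 8-byte header holding a little-endian row count, followed by raw float64 values) is invented for the example and is not the Arrow format, and `encodeColumn`, `decodeColumn` and `decodeJson` are hypothetical helpers. The point is only that the binary path ends in a typed-array view over the bytes that arrived, whereas the JSON path has to parse text and then copy into a typed array.

```typescript
// Hypothetical wire layout (NOT Arrow): [uint32 row count, 4 bytes padding, float64 values...].
// The 8-byte header keeps the float64 data 8-byte aligned.
function encodeColumn(values: number[]): ArrayBuffer {
  const buf = new ArrayBuffer(8 + values.length * 8);
  new DataView(buf).setUint32(0, values.length, true);
  // Values are written in platform byte order (little-endian on typical hardware).
  new Float64Array(buf, 8, values.length).set(values);
  return buf;
}

// Binary path: a view over the received bytes, with no parsing and no copy.
function decodeColumn(buf: ArrayBuffer): Float64Array {
  const rowCount = new DataView(buf).getUint32(0, true);
  return new Float64Array(buf, 8, rowCount);
}

// Text path: parse JSON into a JS array, then copy it into a typed array.
function decodeJson(text: string): Float64Array {
  return Float64Array.from(JSON.parse(text) as number[]);
}

const wire = encodeColumn([1.5, 2.5, 3.5]);
const binaryColumn = decodeColumn(wire);
const jsonColumn = decodeJson("[1.5,2.5,3.5]");
```

Getting such a view into the C++ heap is a separate step and, as noted above, depends on how emscripten allocates memory.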
We (Graphistry) recently contributed a native JS reader/writer into the Apache Arrow project, so may help both teams! We did it as legwork for our beyond-native efforts (GPU cloud streaming) and taming our JS datatypes, so similar needs here I'm guessing!
Funny enough: was in NYC today talking with banking teams about related tech. Too bad we didn't know about this effort, would have loved to meet!
In addition to what texodus has said below, Crossfilter only implements a small subset of what perspective provides. For example, no streaming (although there is a related project from the Heer lab that supports incremental updates - https://github.com/jheer/datavore), only a single level of grouping, only in 1 dimension, and you can only support 16 dimension fields at once (without increasing a constant in the codebase).
I believe you'd need to port Python to WebAssembly to make much use of pandas - unaware of any projects attempting the former.
The grid is just a plugin for the excellent [Hypergrid](https://github.com/openfin/fin-hypergrid), which is editable, but you'd need to manually push those edits back to the engine for now.
Briefly in the animation I see there's hierarchical (or multi-dimensional) data used. Could you tell me about what sort of data (statistics, monitoring) this supports?
Also, what methods are available for getting data streamed [into Perspective]? I'm skimming through the docs, but am not quite getting it...
Ok, thank you for this. I'll have to look into that Arrow support at some point. JSON and CSV are already plenty.
I didn't have time for a proper look yet, so no worries. If at all possible post the code and data (or something like it) that is in those animations you have. That would perhaps allow users to dive in by tinkering here and there.
Just stalked you on linkedin with a similar question, but I was wondering if you were involved with the actual process of getting the library open sourced, and how that works at a large security focused company like JPMorgan Chase.
(former Chase employee)
Not the op, but there's more to frontend development than JS. You need to account for the API of browsers, the DOM, CSS and the myriad of frameworks and libraries built for the specific UI requirements of frontend development (e.g. React). So even if your favorite language of choice becomes compilable to WebAssembly, you'll still be learning an entirely different way of development that will be completely unfamiliar regardless.
The only thing I see languages compiled to WebAssembly taking over is development of highly resource intensive applications, algorithms or libraries. Or if there's a quantum leap with some specific language + framework that eliminates a lot of work with current frontend development. Otherwise, there will be too much fragmentation and you'll see the regression to the mean effect as has been the case with predecessors such as CoffeeScript, Elm, PureScript, ReasonML, and every other compile-to-JS language du jour.
A paradigm shift could happen and what's cool is that it could be entirely something new. This reminds me of the early days of the WWW when the first programming paradigm was Perl/CGI, then Cold Fusion, then Legacy ASP, and then ActiveX, and so on.
The possibility of some company creating, from scratch, a new web development paradigm, given everything we know today, is exciting.
Sure, and based on historical precedent, that's almost guaranteed to happen. But I doubt it will be whatever-my-favorite-backend-language-of-choice-is and more likely some-domain-specific-language-and-new-framework. In other words, we'll all be relearning everything again 5 years from now.
Java applets were supposed to provide that in the 90s. Netscape was going to actually embed Java in the browser as an alternative to JavaScript, but there wasn't enough time, so they just shipped with JS, and Java was provided via a plugin.
As a fellow backend developer I certainly hope so. I think web assembly gets us very close. Only two things I missed were direct canvas access from C++ without going through a JS shim and access to some kind of blob storage api in the browser.
By virtualization I meant also not loading entire dataset to memory, but being able to work only on a constrained view of the data. For example, database has 1TB dataset, I only view 100 MB, further columns/sections are loaded when I scroll.
It is virtual in the sense that it does not realize the entire dataset in the JS side of the WebAssembly bridge - this is a performance optimization but entirely in-memory. While you can run the engine in node.js and use it to efficiently stream updates to a symmetric engine in the browser, we do not currently implement server-virtualized views.
Though, we have quite a lot of experience doing this in the past, and the design of Perspective is very much a reaction to what we learned, at least in regards to the typical financial dataset, which is much smaller than 1TB.
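As a rough picture of what "not realizing the entire dataset on the JS side" can look like, here is a sketch under stated assumptions: `ColumnStore` and its `window` method are hypothetical and not Perspective's API. The full columns stay in typed arrays (standing in for the engine's memory), and only the rows currently in view are turned into JS objects. Everything is still in memory, which matches the caveat above that this is not server-side virtualization.

```typescript
// Hypothetical columnar store standing in for the engine side of the bridge.
class ColumnStore {
  constructor(
    private columns: Record<string, Float64Array>,
    readonly rowCount: number,
  ) {}

  // Materialize only rows [start, end) of the requested columns as JS objects.
  window(start: number, end: number, names: string[]): Array<Record<string, number>> {
    const lo = Math.max(0, start);
    const hi = Math.min(this.rowCount, end);
    const out: Array<Record<string, number>> = [];
    for (let i = lo; i < hi; i++) {
      const row: Record<string, number> = {};
      for (const name of names) {
        const col = this.columns[name];
        if (col !== undefined) {
          row[name] = col[i];
        }
      }
      out.push(row);
    }
    return out;
  }
}

// 1000 rows live in the store; the grid asks for just the three it is showing.
const prices = new Float64Array(1000);
for (let i = 0; i < prices.length; i++) {
  prices[i] = i * 0.5;
}
const store = new ColumnStore({ price: prices }, 1000);
const visibleRows = store.window(100, 103, ["price"]);
const tailRows = store.window(998, 2000, ["price"]); // clamped to the last two rows
```

A server-virtualized view would use the same windowing interface, but fetch each window over the network instead of reading it from local memory.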
Not through the UI yet, only in the engine API - but the pivoting itself is quite fast, so should still be suitable for drilling down "on the fly" so to speak.
Even outside of C/C++, Emscripten seems to be required in some fashion for just about every "compile X to webasm". Which I'm not that happy about, as it's quite heavy and takes a long time to compile, which makes it harder for newcomers to try it out because it requires a fairly large time investment to even compile hello world.
Yeah I'm super excited about Rust's new webasm compiler, as well as some new-ish stuff with .Net that works without Emscripten.
I get the ecosystem is super young, but it was honestly what kept me from playing with Webasm for a while, because each time I'd sit down to play with it I'd be in for a 3 hour compile after spending an hour getting my windows machine to have the right compilers and not stomping over other things I have setup on here.
Sounds like you were seeing an old emscripten SDK bug, where it compiled LLVM+clang unnecessarily and with debug info (which is very slow).
Currently there shouldn't be anything like that, it will download a binary build of LLVM+clang for your system. It should be ready to use immediately after that download.
No, he is right. cargo-web supports three backends, of which only two involve emscripten; the third uses a native Rust backend.
https://github.com/koute/cargo-web#features
Rust can be compiled to wasm without emscripten using wasm32-unknown-unknown.
Does that mean I can install cargo-web without emscripten or do I still need it even though I am not going to use it just because the two other targets use it? In other words, is emscripten an optional or a hard dependency?
Yes for much of its life perspective has been a python library. It offers a streaming dataframe abstraction in Python. I believe the Python bindings have not been open sourced yet.
Having worked at a branch off a university devoted to its online school and content marketing where the inputs and outputs of production are entirely digital, I can say they sure as hell aren't a software company; nobody above the on-the-ground grunts thinks much about even the big pictures of domain modeling / workflows, data ingestion / transformation, computing / efficiency / performance, software development lifecycle / product portfolio management / QA, growth hacking / optimization / testing etc. because they simply aren't interested in the slightest. I'm sure plenty of organizations with at least one foot in pre-internet industries are in the same boat, and it's exciting to see counterexamples.
I think it is more correct to say that a large subunit of JPM works as a software company -- for its internal use. All modern financial companies are heavily tech oriented, even those you wouldn't normally think of that way. The best tech firms I've ever been involved with were financial firms, from front desk trading on back. One of the best tech teams I've ever met with has been one of Bloomberg's teams dealing with their internal core services and bond platform (I was totally floored at how amazing these guys were). These companies are too large to say they are entirely software, but there are definitely Directors and CTOs that live and breathe all those things you mention.
Sure yes, in modern business environments, everyone's business interacts at least in some way with computing, digital products like media and software, and the internet. Even if you had a remote industrial site cut off from the wider internet, there would be some automation to be done and some code to do it. Every organization should have a CTO, if for nothing else than to make procurement and outsourcing decisions. Organizations shouldn't outsource that which differentiates them, and there's lots of room for differentiation in the quality and degree of automation, thus internal software teams. Etc.
All I'm getting at is that this present set of truths, this reality, hasn't spread everywhere yet, and it's still worth celebrating when it has and when it goes well.
There are plenty of people given plenty of power in leadership that won't bring themselves to understand technology and have yet to retire.
You would be pleasantly surprised, then, that this reality has already permeated the modern financial industry. For instance, see http://www.businessinsider.com/goldman-sachs-wants-to-become... . While there are still “relationship bankers” in expensive suits, most acknowledge that software is key to giving them an edge over the competition, and that building a culture around software only helps them.
J.P.Morgan is not "the stock market" and financial companies are not software companies. They are not selling software. They are selling financial services.
They consider themselves a banking company. It's about the culture, so it is indeed refreshing to see non-traditional tech companies open sourcing things.
This is great. Finance people especially love this sort of thing - they do want to browse through entire datasets vs seeing higher level metrics in a lot of cases.
I love seeing new UX on Web Assembly. Someone needs to port or create a standard layout engine like WPF to build on this. If the tools were there, C# + VS Code + solid layout engine...I'd dump JS + HTML + CSS in a "flash".
Sure. I've played with Blazor. It's very raw at this point and just a side project. I'd like to see MS Research or even a full blown R&D team at MS go full force at this.
Presumably, the point of using WASM and service workers is for performance, but I see no benchmarks. I also have trouble imagining that this actually improves performance, unless you’re doing all of your compute in the browser and it’s CPU bound (this seems like bad design).
What is the performance of doing it this way vs. without WASM and workers?
Until we see numbers, this smells like a recruiting play.
For ticking data being pushed into bespoke pivots, running this on the backend and, say, pushing renders to the frontend isn't generally better. You're not likely to be sharing much of the grunt. Also, latency will be an issue so you need to be pretty careful with GC runtimes that aren't optimised for latency.
> What is the performance of doing it this way vs. without WASM and workers?
@dman says in another thread that the original was in Python and this was 10x faster. If this met their performance goals, it does the job. It may be possible that another way is better, for some definition of better, but who cares if it works as required?
> Until we see numbers, this smells like a recruiting play.
It's presumably OK not to use it if you don't want to.
And what if they think this may help recruiting? Seems like a reasonable trade to me.
We use Web Workers principally to separate data CPU load from rendering load, as the datasets we deal with update very frequently and are quite large, and e.g. Highcharts can take on the order of 100ms+ on detailed charts.
We have some light benchmarks we use for regression testing, but definitely need more work in this area
First, let me say this looks awesome! I’m curious if you considered React.js for the presentation layer (i.e. the js/html5 parts) - on the surface it seems like it would be a good fit - and if so, what you saw as the benefits or drawbacks of it vs the approach you went with.
We chose to go with a Web Components based interface for compatibility across frameworks, but a lot of where we go in the future will be determined by the expertise of the developers we hire to work on it, and the community if there is interest there.
First of all, great work everyone involved! I know first-hand how hard it is to push something like this past compliance ;)
Going with WC is a sound decision since it's trivial to wrap it into ember, react, or any other framework-du-jour... especially so since more and more of them are adopting WC patterns and design practices.
I might try to wrap this into react and will see if bosses permit opensourcing it!
For those interested, keep in mind the only way to use WebAssembly with CSP in Chrome is by turning on 'unsafe-eval'. FF/Edge/Safari all at least support compilation from URLs with more locked down policies
This is seriously cool. Unfortunately I have no use-case for it in the product I currently work on, but still fascinating to flick through the source code and play with it.
I followed the directions to the letter, and everything installed correctly on MacOS. When I tried the same on Linux, though, I got tons of the following errors on the build step (keep in mind I do have boost development libraries installed in /usr/include):
(Sorry about the formatting.... I tried <code>).
Anyway, TL;DR: it cannot find boost.
$ ./node_modules/.bin/lerna run start --stream
lerna info version 2.8.0
@jpmorganchase/perspective: > @jpmorganchase/perspective@0.1.1 start /home/dj/usr/src/perspective-clone/packages/perspective
@jpmorganchase/perspective: > npm run compile && (npm run perspective & npm run compile_test & npm run compile_node & wait)
@jpmorganchase/perspective: > @jpmorganchase/perspective@0.1.1 compile /home/dj/usr/src/perspective-clone/packages/perspective
@jpmorganchase/perspective: > mkdir -p build build/wasm_async build/wasm_sync build/asmjs && (cd build/; emcmake cmake ../; emmake make -j8; cd ..)
@jpmorganchase/perspective: -- Configuring done
@jpmorganchase/perspective: -- Generating done
@jpmorganchase/perspective: -- Build files have been written to: /home/dj/usr/src/perspective-clone/packages/perspective/build
@jpmorganchase/perspective: Scanning dependencies of target psp
@jpmorganchase/perspective: [ 1%] Building CXX object CMakeFiles/psp.dir/src/cpp/base_impl_win.cpp.o
@jpmorganchase/perspective: [ 2%] Building CXX object CMakeFiles/psp.dir/src/cpp/arg_sort.cpp.o
@jpmorganchase/perspective: [ 4%] Building CXX object CMakeFiles/psp.dir/src/cpp/calc_agg_dtype.cpp.o
@jpmorganchase/perspective: [ 5%] Building CXX object CMakeFiles/psp.dir/src/cpp/aggspec.cpp.o
@jpmorganchase/perspective: [ 8%] Building CXX object CMakeFiles/psp.dir/src/cpp/aggregate.cpp.o
@jpmorganchase/perspective: [ 8%] Building CXX object CMakeFiles/psp.dir/src/cpp/base.cpp.o
@jpmorganchase/perspective: [ 9%] Building CXX object CMakeFiles/psp.dir/src/cpp/base_impl_linux.cpp.o
@jpmorganchase/perspective: [ 10%] Building CXX object CMakeFiles/psp.dir/src/cpp/build_filter.cpp.o
@jpmorganchase/perspective: [ 12%] Building CXX object CMakeFiles/psp.dir/src/cpp/column.cpp.o
@jpmorganchase/perspective: In file included from /home/dj/usr/src/perspective-clone/packages/perspective/src/cpp/calc_agg_dtype.cpp:11:
@jpmorganchase/perspective: In file included from /home/dj/usr/src/perspective-clone/packages/perspective/src/include/perspective/calc_agg_dtype.h:12:
@jpmorganchase/perspective: In file included from /home/dj/usr/src/perspective-clone/packages/perspective/src/include/perspective/schema.h:13:
@jpmorganchase/perspective: /home/dj/usr/src/perspective-clone/packages/perspective/src/include/perspective/base.h:29:10: fatal error: 'boost/unordered_map.hpp' file not found
@jpmorganchase/perspective: #include <boost/unordered_map.hpp>
@jpmorganchase/perspective:          ^~~~~~~~~~~~~~~~~~~~~~~~~
</code>
If anyone finds this project exciting and is interested in learning more about working on Open Source at J.P.Morgan, feel free to send me a message - we are always looking to hire experienced, passionate talent!
I'm not knocking the code...I think it's brilliant. I just think you're going to cross wires with Google here. Having "Perspective" mean either "filtering troll comments" or "displaying analytics in web assembly" is an unfortunate ambiguity.
You'd be surprised how many naming collisions we have. I assume most teams that code-name their software hit the same issue when they go to open source.
We very much plan to continue developing this entirely in the open! It is reputationally important we do not just dump our dead projects on the internet as Open Source, which is one of the reasons why we chose Perspective for this project in the first place.