I do not believe this is the right approach to the problem, but I do appreciate the problem you're trying to solve here. In my opinion, though, Clojure libraries shouldn't be trying to reinvent the wheel. If your goal is to expose a better interface for vector arithmetic in Clojure, write a library that does that really well.
But if your primary concern is performance, please don't roll your own vector or matrix "native" interface. You will certainly never come close in speed to what has come before (BLAS implementations galore, et al). Also it's just a lot of work that is basically keeping you from working on the higher order problems out there that we desperately need to tackle.
If your goal is more "Clojurey" syntax then just spend a day or two wrapping the functions you want over a tried and tested numerics implementation. Additionally, there is likely a pre-existing Java wrapper which does just that for whatever you need considering that Java is still beloved by university professors, a key demographic for fast math libraries.
On the other hand, I think Vertigo ( github: https://github.com/ztellman/vertigo ) is taking a very interesting approach to the Clojure->Native problem, which I believe might be of use to any library wanting to bring performant numerics to Clojure. Unfortunately, ztellman has deprecated his OpenGL and OpenCL libraries, but I think that Vertigo in combination with OpenCL and the kernels courtesy of clMAGMA would be fantastic.
> If your goal is more "Clojurey" syntax then just spend a
> day or two wrapping the functions you want over a tried
> and tested numerics implementation.
This is exactly what we're trying to do: provide some Clojure macros that give nicer syntax for interacting with Java arrays with high performance. We're explicitly not introducing a new vector type.
Most of the work here wasn't in the wrapping -- hiphip itself consists of very little code -- but in figuring out what's fast and what's not, documenting this, and making it easy to do things the fast way.
I think what the GP is arguing is that if you want to be really fast you should forget using Java arrays and just wrap BLAS, LAPACK, etc., which are written in "close to the metal" languages, optimized within an inch of their life, have been around for decades, and are used by others with similar goals (numpy/scipy, etc.). As the GP says, Java libraries that already do this are probably available, so this may be a pretty trivial task.
I don't know why no one has answered this (it is brought up in a few places in this thread), but if I had to guess why they didn't want to go this route, I would say it's the trade-off of not having your data be native. They presumably have a fairly involved pipeline/topology of computations that data flows through. In the interests of good, readable, and maintainable code, having a nice declarative data representation is a big plus, and doing the computation with native Java data structures is apparently fast enough for their needs.
You've just pretty much described the motivation for core.matrix: it's an API that wraps various other back end vector/matrix libraries (including JBlas etc.) with a nice, standard Clojure API.
Yup. I'm really glad to see that you guys are working with Incanter now as well.
BTW, as a total aside, I'm a big fan of Clisk. I think you nailed the API for functional image manipulation. I'm interested in seeing an OpenCL backend for it. If I get anything working well, I'll be sure to contact you.
> All arithmetic operations on these boxed objects are
> significantly slower than on their primitive counterparts.
> This implementation also creates an unnecessary intermediate sequence
> (the result of the map), rather than just summing
> the numbers directly.
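To make the two costs in that quote concrete, here is a rough Java analogue of a dot product written both ways. It is only a sketch: the class and method names (`DotProduct`, `dotBoxed`, `dotPrimitive`) are made up for illustration and are not taken from hiphip or from the post being quoted.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical names, for illustration only.
final class DotProduct {
    // Boxed version: every element is a java.lang.Double, each product is
    // boxed again, and all products are collected into an intermediate list
    // before anything is summed.
    static double dotBoxed(List<Double> xs, List<Double> ys) {
        List<Double> products = IntStream.range(0, xs.size())
                .mapToObj(i -> xs.get(i) * ys.get(i))
                .collect(Collectors.toList());
        double sum = 0.0;
        for (Double p : products) {
            sum += p; // unboxes on every iteration
        }
        return sum;
    }

    // Primitive version: one pass over double[] arrays, no boxing and no
    // intermediate collection.
    static double dotPrimitive(double[] xs, double[] ys) {
        double sum = 0.0;
        for (int i = 0; i < xs.length; i++) {
            sum += xs[i] * ys[i];
        }
        return sum;
    }
}
```

Both methods return the same value; the second is the shape of loop that array-based code is aiming for.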
Clojure's Reducers framework might address the described issues in a future when, in Rich's words, "those IFn.LLL, DDD etc primitive-taking function interfaces spring to life". For now, they only solve the intermediate-collections part of the problem.
We're also anxiously awaiting this -- it seems with gvecs and reducers and primitive fns the pieces are all there, we just need the glue to put them all together. Unfortunately, for now I think we're stuck with arrays, and we're trying to make the most of it :)
Another author here (Emil). It's been a pleasure and a great learning experience working with Prismatic (and Climate) on this. Hopefully it'll show that, given enough macros and coffee, all problems are shallow, or something to that effect.
I've been a big follower of your entire team, specifically on their talents in machine learning and NLP (e.g. http://nlp.stanford.edu/jrfinkel/papers/jrfinkel-thesis.pdf). The fact that you also use Clojure and give back so much is icing on the cake. Thank you!
Looks awesome! One data issue I've seen go relatively unaddressed in the Clojure community is the serialization of big matrices and arrays.
There's a start on a clojure hdf5 (hdf5 is a container format common in scientific circles) implementation, but it's a long ways from done. https://github.com/clojure-numerics/clj-hdf5 I'm not the author, but I am the negligent steward.
I'd love it if someone smarter / better at Clojure than me was interested in helping to think about useful, idiomatic high-level abstractions on top of this high-performance data store.
PyTables does a great job of making gobs of hdf5 data easy to work with for analysts--I'm just too novice at Clojure/FP to know what is a reasonable analogue for Clojure.
Without knowing anything about hdf5 specifically, Vertigo [1] will let you treat a memory-mapped file (or a piece of one) as a normal Clojure data structure, as long as the element types are fixed-layout.
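For anyone unfamiliar with the underlying idea, here is a minimal sketch in plain Java NIO of treating a memory-mapped file as an indexable sequence of fixed-layout elements (8-byte doubles here). This is not Vertigo's API and says nothing about hdf5's on-disk format; `MappedDoubles`, `map`, and `sum` are hypothetical names for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.DoubleBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper, for illustration only.
final class MappedDoubles {
    // Map `count` doubles from the start of the file. With a READ_WRITE
    // mapping the file is extended if it is shorter than the mapped region.
    static DoubleBuffer map(Path file, int count) {
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            // The mapping stays valid after the channel is closed.
            return ch.map(FileChannel.MapMode.READ_WRITE, 0, (long) count * Double.BYTES)
                     .asDoubleBuffer();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Read elements by index straight out of the mapped region.
    static double sum(DoubleBuffer buf) {
        double total = 0.0;
        for (int i = 0; i < buf.limit(); i++) {
            total += buf.get(i);
        }
        return total;
    }
}
```

Because every element has the same size, element `i` is simply at byte offset `i * 8`, which is what makes this kind of view cheap.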
This seems like one of the 10x productivity / mastery cases where your three minutes and creativity have produced something that would take me ages, even if you set me to the task of duplicating your design.
Are you willing to spend another 3 minutes producing a logo for an unrelated programming project? :-P
Cool library, can imagine how moving away from boxing / unboxing can be a huge boost for them.
I've been looking for something that gave SIMD intrinsics to Java programmers - does anyone know if such a thing exists? Could be a nice addition to this lib.
What brought you to develop this library rather than relying on Incanter/Colt?
The scope of HipHip seems different, of course, but there is enough of an overlap to warrant the question.
Incanter's default Matrix implementation is now Clatrix as well, which is a Clojure friendli(er) wrapper over jBLAS matrices. https://github.com/tel/clatrix Check out the source.
We did, and we've been talking to the developers about a potential future collaboration. Our goals are really complementary; hiphip is about getting your code into the inner loop of Java bytecode (not just a set of canned operations), whereas core.matrix is about abstractions for a fixed set of operations across different matrix types. There may eventually be overlap, if core.matrix gets into compiling expressions into new operation types, which sounds like something they're interested in.
core.matrix developer here :-) we're definitely looking at expression compilation. Also hiphip could be very useful for writing fast core.matrix implementations for the standard API. So definitely good room for collaboration.
One thing I've found is that with macros, it can actually be easier to write performant primitive-reliant code. Still not up to Common Lisp standards, but much better than, e.g., having to use a scripting language to generate all the primitive specializations of your data structure, like Trove and Fastutil do.
Having written my own naive Clojure dot product, I can definitely appreciate what you guys have done!
Any plans to attack sparse vectors? Performance on the sparse vector operations I wrote was poor, but being new to Clojure it wasn't a great implementation.
vectorz-clj has sparse vector support... it's a bit of a hidden feature at the moment (you'll have to use Java interop to instantiate a SparseIndexedVector) but it works and is pretty fast for many operations.
Huh, cool! I kinda assumed the JIT already took care of this sort of low-hanging fruit. We'll test this out and, if it works, include it in the next version of hiphip.