Benchmarks for Blaze, A high-performance C++ math library (code.google.com)
63 points by wall_words on Aug 25, 2015 | 30 comments



I remember this library being discussed on the Eigen list in 2012. Here's the thread:

http://thread.gmane.org/gmane.comp.lib.eigen/3423

The main critique at the time was that Blaze always assumes perfectly aligned data, including inside(!) the matrix, and pads the data with zeros if that is not the case. Of course, this makes it impossible to map external data, which is a huge downside. I'm not sure if that is still the case, but from skimming through the docs it doesn't look to me like this has changed.
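For reference, this is the kind of zero-copy mapping of external data that the padding would rule out; a minimal sketch using Eigen's Map (the Map API is real, but the buffer and sizes here are just illustrative):

    #include <Eigen/Dense>
    #include <vector>

    int main() {
        // An external buffer, e.g. handed over by a C API or another library.
        std::vector<double> buffer(6, 1.0);

        // Eigen wraps the memory in place: no copy, no padding requirement.
        // The Map then behaves like an ordinary 2x3 matrix.
        Eigen::Map<Eigen::MatrixXd> m(buffer.data(), 2, 3);
        m *= 2.0;  // writes straight through to the original buffer
    }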


An old thread on the Eigen list also mentioned that the Blaze folks were a little tricky with their benchmarks. Some of the reported Blaze performance numbers were for calls out to Intel MKL routines. Eigen also supports MKL as a kernel backend, but the Blaze folks failed to enable this feature for the reported comparisons, if I recall correctly.


> a little tricky

That's rather generous. If you're going to do serious work with Eigen, or Armadillo, or Blaze, you're going to include a BLAS library like OpenBLAS or Intel MKL (if you can afford it). Not including them is dubious, at best.
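For instance, pointing Eigen at MKL is a one-line switch plus linking against MKL; roughly like this (a sketch, assuming MKL is installed — the matrix sizes are just for illustration):

    // Define before including any Eigen header, and link against MKL.
    #define EIGEN_USE_MKL_ALL
    #include <Eigen/Dense>

    int main() {
        Eigen::MatrixXd a = Eigen::MatrixXd::Random(1024, 1024);
        Eigen::MatrixXd b = Eigen::MatrixXd::Random(1024, 1024);
        Eigen::MatrixXd c = a * b;  // product now dispatched to MKL's dgemm
    }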



Or the analog of the link in the OP, https://bitbucket.org/blaze-lib/blaze/wiki/Benchmarks

HN mods: could someone change the discussion to point to this link? It has working pictures.


Thanks!

Off-topic: I like Bitbucket better, so it is nice to see some projects moving over there.


Thanks, I tried to find the Bitbucket repository but had a hard time finding the link via Google.


It actually took me more than a few minutes as well; I had to use Bitbucket's search since Google (rarely enough) was returning crap.


I have noticed Google not being too useful for some queries this week... I'm wondering if they are trying out new updates or something like that...


I am considering converting a C++03 math library to C++14 as a side project to learn C++14, and I examined Eigen and Blaze. Eigen's code size seems to be a fraction of Blaze's, even though their functionality is similar. Eigen also has some design documents, while Blaze has papers but not much more. It seems I will try my hand at the Eigen library for now. It is amazing what a couple of people could do in a few years; Blaze has hundreds of thousands of lines of code.


If you are interested in C++14 then you are "looking to the future", as it were, in which case I would suggest Eigen. Why? One word: tensors. Eigen already has a quite usable tensor implementation, whereas as far as I can descry Blaze has no plans in that direction. Tensors show up everywhere in scientific computing. They are incredibly useful. For this reason (along with several others, which I can elaborate on if you are interested) I feel like Eigen will become the NumPy of C++ numerical array software in the future.
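To give a flavor, here's a minimal sketch with Eigen's Tensor module (it lives in the unsupported/ tree; the shapes here are arbitrary):

    #include <unsupported/Eigen/CXX11/Tensor>

    int main() {
        // A rank-3 tensor, 4x4x4.
        Eigen::Tensor<double, 3> t(4, 4, 4);
        t.setRandom();

        // Contract the last index of t against the first index of t:
        // a rank-3 x rank-3 contraction yields a rank-4 result.
        Eigen::array<Eigen::IndexPair<int>, 1> dims = {
            Eigen::IndexPair<int>(2, 0) };
        Eigen::Tensor<double, 4> u = t.contract(t, dims);
    }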


Eigen is header only and heavily templated, which certainly keeps the source down.

However, SLOC is a poor metric for the quality of a codebase, especially for scientific code. I've certainly found many instances where code with a higher line count is more performant, for instance with hand-unrolled loops (a very rare edge case; I'm not suggesting doing this as a rule!).

Unless you intend to become involved in development of the library, I see no reason you would care about the lines of code. Even at that point, design philosophy, features, etc. are more likely to be major factors in your choice.

FYI: Computational Scientist here. I am neither affiliated with Blaze nor Eigen.


I'd say lines of code does matter if the set of functionality is the same: I'd wager that if you have the same functionality in fewer lines of code, it's generally easier to verify that it's correct and to avoid daft bugs.

...Unless it's written in a completely incomprehensible way, of course (such as some meta C++ stuff), though that's less of a problem in languages with good metaprogramming support.


Scientific software should absolutely, always be verified through regression and unit tests. Anything less is unacceptable.

In a decade of work in hpc and computational science, I have very seldom found looking at the code to be a useful tool for either verification or debugging.

Instead, use the scientific method: hypothesis testing by constructing simple examples with known analytic solutions and using that for clues as to where the real problem lies.
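In code, that hypothesis-testing loop often looks like a "manufactured solution" check; a minimal sketch using Eigen (the size and tolerance are illustrative):

    #include <Eigen/Dense>
    #include <cassert>

    int main() {
        // Build an SPD system whose solution we know exactly.
        const int n = 50;
        Eigen::MatrixXd a = Eigen::MatrixXd::Random(n, n);
        a = a * a.transpose() + double(n) * Eigen::MatrixXd::Identity(n, n);

        Eigen::VectorXd x_true = Eigen::VectorXd::Ones(n);  // known answer
        Eigen::VectorXd b = a * x_true;                     // manufactured RHS

        // The solver should reproduce x_true to near machine precision;
        // where a check like this first fails is a clue to where the
        // real problem lies.
        Eigen::VectorXd x = a.llt().solve(b);
        assert((x - x_true).norm() < 1e-10 * x_true.norm());
    }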


> Scientific software should absolutely, always be verified through regression and unit tests. Anything less is unacceptable.

I like your world. Let's live there. :)


I agree with Arcanus; lines of code isn't a good measurement here. It's not uncommon for high performance math and science libraries to have specialized code to handle a lot of different cases the fastest way possible, or the most accurate way possible, etc. and some libraries are even able to switch between them heuristically based on the data they're being used on.

A common example is matrix multiplication. For smaller matrices it's faster to use the naive O(n^3) multiply, because Strassen's algorithm carries a large constant factor. At some point the n^3 term dominates that constant factor, and Strassen's becomes better. To get the best performance in all cases, both algorithms need to be implemented, which increases the code size.
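A sketch of what that dispatch might look like (the cutoff is illustrative and would be tuned per machine; the Strassen body is a stand-in so the snippet compiles):

    #include <vector>

    using Matrix = std::vector<double>;  // n*n entries, row-major

    constexpr int CUTOFF = 128;  // illustrative; real libraries tune this

    void multiply_naive(const Matrix& a, const Matrix& b, Matrix& c, int n) {
        for (int i = 0; i < n; ++i)
            for (int k = 0; k < n; ++k)      // i-k-j order for cache locality
                for (int j = 0; j < n; ++j)
                    c[i * n + j] += a[i * n + k] * b[k * n + j];
    }

    void multiply_strassen(const Matrix& a, const Matrix& b, Matrix& c, int n) {
        // Stand-in so this sketch compiles. A real Strassen splits each
        // operand into four n/2 blocks, forms 7 recursive products, and
        // recombines; the recursion bottoms out in the naive kernel.
        multiply_naive(a, b, c, n);
    }

    void multiply(const Matrix& a, const Matrix& b, Matrix& c, int n) {
        if (n <= CUTOFF)
            multiply_naive(a, b, c, n);     // low constant factor wins
        else
            multiply_strassen(a, b, c, n);  // better asymptotics win
    }

    int main() {
        int n = 256;
        Matrix a(n * n, 1.0), b(n * n, 1.0), c(n * n, 0.0);
        multiply(a, b, c, n);  // n > CUTOFF, so the Strassen path is taken
    }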


Take a look at Armadillo as well. I did the same thing about a year ago, and looked at all of them. In the end, Armadillo made writing things very easy. Of course, I was really only focused on matrix multiplication, so you may have different requirements.


Does Blaze implement its own BLAS subroutines? Or does it wrap around an existing BLAS library?


It turns out that it does implement its own, but it can also wrap existing libraries:

https://bitbucket.org/blaze-lib/blaze/src/e09d62ee714745b297...

Of course, if you're seriously going to use this library, you're going to use an established BLAS library like OpenBLAS or Intel MKL.


It would be interesting to see benchmarks of sparse operations, as well.


I've had great success with Blaze, despite the fact that it has received little publicity compared to alternatives like Eigen, Armadillo, etc. Blaze is consistently the leader of the pack in benchmarks, and even outperforms Intel MKL on the Xeon E5-2660 (the CPU for which the benchmark results are shown).


For what problems? General statements like this are hard to back up, especially in the wild world of numerical linear algebra.

From my experience, there are currently no good distributed-memory open source sparse-direct solvers.

No good distributed-memory ILU implementation, either. Scalability is almost non-existent beyond 100 cores.



I've used Blaze for machine learning applications, where I've relied on the performance of elementwise operations and dense matrix multiplication on a single machine (the results advertised in the benchmark). Eigen has more functionality, but in my experience is not always optimized as well as Blaze. Neither has support for distributed computing, but I believe this is a problem that HPX is trying to address: https://github.com/STEllAR-GROUP/hpx
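For a sense of what that usage looks like, here's a minimal sketch (going from memory of Blaze's API, so treat the details as approximate):

    #include <blaze/Math.h>

    int main() {
        blaze::DynamicMatrix<double> a(512, 512, 1.0);
        blaze::DynamicMatrix<double> b(512, 512, 2.0);

        blaze::DynamicMatrix<double> c = a * b;  // dense matrix product
        blaze::DynamicMatrix<double> d = a % b;  // elementwise (Schur) product
    }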


That's because direct solvers can't scale. If you want to solve a large (distributed over hundreds of nodes) sparse linear algebra problem as fast as possible, decades of research have been poured into efficient techniques (Krylov methods, multigrid, preconditioners) for solving such problems iteratively.
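As a single-node illustration of the iterative style (Eigen's ConjugateGradient on a 1D Poisson matrix; the distributed case adds communication, but the structure is the same):

    #include <Eigen/Sparse>
    #include <vector>

    int main() {
        const int n = 1000;
        // 1D Poisson (tridiagonal) matrix: a classic SPD test problem.
        std::vector<Eigen::Triplet<double>> trip;
        for (int i = 0; i < n; ++i) {
            trip.emplace_back(i, i, 2.0);
            if (i > 0)     trip.emplace_back(i, i - 1, -1.0);
            if (i < n - 1) trip.emplace_back(i, i + 1, -1.0);
        }
        Eigen::SparseMatrix<double> a(n, n);
        a.setFromTriplets(trip.begin(), trip.end());

        Eigen::VectorXd b = Eigen::VectorXd::Ones(n);
        Eigen::ConjugateGradient<Eigen::SparseMatrix<double>> cg;
        cg.compute(a);
        Eigen::VectorXd x = cg.solve(b);  // Krylov iterations, no fill-in
    }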


Can't scale in a weak, strong, or asymptotic complexity sense? And for what sorts of problems (I assume you're thinking of 2D and 3D PDEs discretized with local basis functions)?


Yes, I'm thinking of discretizations of elliptic 2D/3D PDEs. They don't scale in either the weak or the strong sense, and they can't achieve O(n log n) asymptotic complexity due to fill-in from Cholesky/LU-style factorizations.


In the first plot, why do all libraries slow down at the n=1000 mark? Something to do with cache?


Yep. A 32 KB L1 cache can hold at most 4096 doubles, and 4096 / (2 input vectors + 1 result) ≈ 1365.

http://danluu.com/3c-conflict/


I'm guessing that's the point at which the working set exceeds the L1 cache size. You can see a few more subtle dips in the performance graph at later points; these correspond to working set spilling out of the L2 and L3 caches.
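A quick back-of-the-envelope check (the cache sizes here are the typical Xeon E5-2660 figures — 32 KB L1d, 256 KB L2 per core, 20 MB shared L3 — so treat them as illustrative):

    #include <cstdio>

    int main() {
        // Working set of c = a + b on length-n double vectors: 3 * n * 8 bytes.
        const long caches[] = {32L << 10, 256L << 10, 20L << 20};
        const char* names[] = {"L1", "L2", "L3"};
        for (int i = 0; i < 3; ++i)
            std::printf("%s spills at n ~ %ld\n", names[i],
                        caches[i] / (3 * 8));
    }

This prints crossover points around n ≈ 1365, 10922, and 873813, which lines up with the big dip near n=1000 and the subtler ones later in the plot.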



