
While I did not succeed in making the matmul code from https://github.com/Mozilla-Ocho/llamafile/blob/main/llamafil... work in isolation, I compared Eigen, OpenBLAS, and MKL: https://gist.github.com/Dobiasd/e664c681c4a7933ef5d2df7caa87...

In this (very primitive!) benchmark, MKL was a bit faster than Eigen (~10%) on my machine (i5-6600).

Since the article https://justine.lol/matmul/ compared the new kernels with MKL, we can (by transitivity) roughly compare the new kernels with Eigen this way, at least for this one use case.




Here's a complete working example for POSIX systems showing how to reproduce my llamafile tinyBLAS vs. MKL benchmarks: https://gist.github.com/jart/640231a627dfbd02fb03e23e8b01e59... This new generalized kernel does even better than what's described in the blog post, and it works well on oddly shaped matrices. It does, however, need a good malloc function, which I've included in the gist, since having a good memory allocator is what makes the simple implementation possible.



