In this space, it's more that you don't want to lose all of the progress made in these older libraries. Coming up with a new ecosystem means an immediate race to parity with the older one, and a lot of smart people were involved in building it. (So you aren't competing with a single idea or implementation, but with an ecosystem of them.)
Hats off for taking a good shot at it. But don't be surprised to see reluctance to move.
There's very promising work on upgrading older libraries/legacy code with a technique called verified lifting. The technique has been used successfully at Adobe to automatically lift image processing code written in C++ to use Halide. The technique also guarantees semantic equivalence so users can trust the lifted code.
Before reading the paper: is "verified lifting" kind of like "deterministic decompilation" — where you ensure, with every modification to the generated HLL source, that it continues to compile back into the original LL object code?
(See e.g. the Mario 64 decompilation, https://github.com/n64decomp/sm64, whose reverse-engineering process at all times kept a source tree that could build the original byte-identical ROMs of the games, despite being increasingly [manually] rewritten in an HLL.)
I'm not an expert. But with current CPUs, doesn't HPC critically depend on memory alignment? (For two reasons: Avoiding cache misses, and enabling SIMD.) Sure, algorithms matter, probably more than memory alignment. But when you've got the algorithm perfect and you still need more speed, you're probably going to need to be able to tune memory alignment.
C lets you control memory alignment, in a way that very few other languages do. Until the competitors in HPC catch up to that, C is going to continue to outrun those other languages.
(Now, for doing the tweaking to get the most out of the algorithm, do I want to do that in C? No. I want to do it in something that makes me care about fewer details while I experiment.)
Nothing C does is even remotely special anymore. If anything, it's total crap by modern standards because it makes performant abstractions harder. C code uses a lot of linked lists because linked lists are easy.

C isn't the Lingua Franca of fast software anymore. The reason why C programs can sometimes be faster today, however, is not an intrinsic property of the language but rather its inability to allow incompetent programmers to hide their bad data structures.

C also encourages programming styles that are harder to optimize, so some loop optimizations are no longer possible.
> C code uses a lot of linked lists because linked lists are easy.
Strong disagree. I don't see why a competent programmer would hand-code a linked list, which is more of a hassle to write than a simple array (e.g. a buffer pointer plus capacity and/or length fields), unless the linked list actually makes sense for performance.
I'm not saying you're duty-bound to do it; rather, I'm giving an example of something that, in my experience, C encourages because it's the path of least friction.
Try writing a type in C that can automatically switch its layout from AOS to SOA; you just can't do it.
Well, picking an invalid example (C encourages the use of linked lists in situations where they aren't a good idea) and a very questionable example (automatic conversion between SOA/AOS representations is still mostly at an experimental stage rather than an established feature of performance programming in 2022) is not a good way to support a claim that isn't empirically evident.
The only language I know of for sure that will do it for you (as in, you don't have to write the type yourself) was Jai a while back (I'm told Blow removed that feature).
The only language I've actually done it in is D. It's probably doable in many other nu-C languages these days, but D at the very least can make it basically seamless, as long as you do some try-and-break-things testing to make sure nothing relies on saving pointers when it shouldn't. This obviously constrains the definition of "automatic" ;)
I don't have my implementation to hand because it grew out of a patch that failed due to the aforementioned pointer-saving, in code that I'm not paid enough to refactor. But here's one someone else made: https://github.com/nordlow/phobos-next/blob/master/src/nxt/s... (there's another one in that repository too). I've never used those particular implementations, but they're both by people I know, so hopefully they're not too bad.
A more subtle thing, which I haven't used in anger but would like to try at some point, is to use programmer annotations (probably in the form of user-defined attributes) to group things so they're stored such that temporal locality <=> spatial locality. I've never gotten around to actually doing it, though.
There are some arrays of structs in an old bit of the D compiler that are roughly the size of a cacheline, and aren't accessed particularly uniformly. I profiled this and found that something like 75% of all LLC misses (hitting DRAM) were due to 2 particularly miserable lines... inside an O(n^2) algorithm.
> C isn't the Lingua Franca of fast software anymore.
What is it, then?
> The reason why C programs can sometimes be faster today, however, is not an intrinsic property of the language but rather its inability to allow incompetent programmers to hide their bad data structures.
I'm not sure I can parse that properly.

What's wrong with not being able to hide bad data structures? And how does this make C faster?
You're correct that low-level details very much matter in the HPC space. The types of optimizations described in this paper are exactly that! To see what I mean, check out Halide's cool visualizations/documentation (this paper compares itself to Halide): https://halide-lang.org/tutorials/tutorial_lesson_05_schedul...
Memory alignment is important, and languages that let you control it and/or align data automatically under sensible assumptions do have an edge here.
Just a random example: I wrote a two-line Python function using the Numba JIT, because I knew that some Python behavior was going to make it less performant, and it's a high-performance kernel, so we needed it fast (with a lower memory footprint, too). Compiling with the Numba JIT is a no-brainer because I just add one more line; about 10 seconds of effort.

But the codebase I'm merging into has a policy against Numba (reasonably, as we're targeting an HPC platform where Numba has a performance problem related to oversubscription if not set up carefully). So I ended up rewriting it in C++ and wrapping it with pybind11. The result is around 30% faster.

Since the algorithm is entirely trivial, the only explanation I have is exactly memory alignment: I can control that in C++, but in the Numba JIT there's no way to either guarantee that the allocated array is aligned or tell the compiler to assume that it is. (The 30% number is also in line with textbook examples.)