Interesting, never heard about them asking about ISA changes. Where was that done?

There are some lovely new instructions in Power10 but until they get the firmware source out fully I won't use them in the Firefox JIT.




It was through the IRC channel. I replied by direct message and we exchanged a couple of sentences, then they gave me their IBM email address so I could submit a detailed proposal. My understanding was that the proposal would have to be justified, which meant showing that it could be more efficient than an implementation based on the existing SIMD instructions, and I’m not familiar with those. I suspect that the kind of instructions that could realistically be added might not perform any better.

I regret not sending at least an amateurish proposal. I still think it would be a good idea to have no-cost UTF-8 en-/decoding, not just for text but for general variable-length encoding of other kinds of data.


Very interesting. Which channel specifically?

VSX is really where the new development is happening, but it's become quite complete. The PC-direct instructions first made available in P9 also really closed a gap (beforehand you had to do bl with a weird flag to get PC in LR without trashing the history table).


The IRC channel was #talos-workstation on Freenode, now on Libera.Chat.

Do you know of some minimal example of calling VSX instructions from C?

Ideally just one .c file and one Makefile or README with the exact GCC command to compile it, plus a pointer to documentation describing each instruction. I’ve seen assembler inserted into C source code with GCC, but I’ve never done it and I assume there are some non-obvious details to take into account.


Sorry, didn't see this until now (out all day). Here is a very stupid example that uses `xxbrd` to byteswap a 64-bit quantity.

  #include <errno.h>
  #include <stdio.h>
  #include <stdint.h>
  #include <stdlib.h>
  
  int main(int argc, char **argv) {
        uint64_t v = 0;
        double o;
  
        if (argc != 2) {
                fprintf(stderr, "usage: %s quantity\n", argv[0]);
                return 1;
        }
  
        errno = 0; /* strtoull only sets errno on failure, so clear it first */
        v = strtoull(argv[1], NULL, 0);
        if (errno == EINVAL || errno == ERANGE) {
                perror("strtoull");
                return 1;
        }
  
        /* "f" picks an FPR; FPRs alias VSRs 0-31, so xxbrd accepts the number */
        __asm__(
                "xxbrd %0, %1\n"
                :"=f"(o)
                :"f"(*(double *)&v)
        );
        fprintf(stderr, "0x%lx\n", *(uint64_t *)&o);
        return 0;
  }
  
  % gcc -o xxbrd xxbrd.c
  % ./xxbrd 0x123456789abcdef
  0xefcdab8967452301
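
For anyone without POWER9 hardware at hand, the expected output above can be cross-checked portably. This is only a minimal sketch, assuming GCC or Clang (for `__builtin_bswap64`); the helper name `bswap64_ref` is hypothetical and not part of the example above.

```c
#include <stdint.h>

/* Hypothetical reference helper: byte-reverse a 64-bit value with the
   GCC/Clang builtin, using no inline assembly at all. */
uint64_t bswap64_ref(uint64_t v) {
        return __builtin_bswap64(v);
}
```

Calling `bswap64_ref(0x123456789abcdef)` gives `0xefcdab8967452301`, the same value the `xxbrd` run prints.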


I don't know what you want from VSX, but if you want vectorized code, what do you expect to gain over letting the compiler vectorize your C? If you want examples, there are the kernels in OpenBLAS and FFTW (and BLIS, but that seems to be broken on POWER9).
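
To see what the compiler does on its own, a simple loop like the following could be a starting point. It is only a sketch with a hypothetical function name, `utf8_count`; whether GCC actually vectorizes it depends on the compiler version and flags. Compiling with something like `gcc -O3 -mcpu=power9 -fopt-info-vec -c` should make GCC report the loops it vectorized.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical example: count UTF-8 code points in a buffer by counting
   every byte that is not a continuation byte (bit pattern 10xxxxxx).
   The loop body has no branches, which makes it a reasonable candidate
   for auto-vectorization. */
size_t utf8_count(const uint8_t *s, size_t n) {
        size_t count = 0;
        for (size_t i = 0; i < n; i++)
                count += (s[i] & 0xC0) != 0x80;
        return count;
}
```

Comparing the generated assembly (`-S`) with and without `-mcpu=power9` shows whether any VSX instructions were emitted for the loop.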

There's an IBM web page somewhere with three(?) alternatives for using VSX, one of which is just using SSE intrinsics -- I don't know how well that works -- and another is a library that's now in Fedora, whose name I forget.

That said, it's obviously not competitive with AVX2 or, presumably, SVE, unless you can win on parallelization (or plain clock speed, which you probably can't).


What I’d like to do is a quick proof-of-concept to see whether whatever instructions are available in my CPU can be leveraged for UTF-8 en-/decoding.

For instance, does it work any better than my C implementation? https://github.com/Sentido-Labs/cedro/blob/master/src/cedro....

Maybe the compiler already compiles that to an optimal SIMD version, I don’t know. That’s what I would like to find out. And if the VSX instructions are not a good fit for this task, which instructions would be needed? Can I come up with a combination of logic gates that does that? Maybe not; there might be no way of implementing any significant part of the algorithm without branches or lookup tables.
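
As one small data point on the branches-or-tables question, the length of a UTF-8 sequence can be derived from its lead byte by counting leading one bits, with no table. The following is a sketch only, not the Cedro implementation: the name `utf8_len` is hypothetical, it assumes GCC or Clang (for `__builtin_clz`), and it classifies by bit pattern alone, so it does not reject overlong or out-of-range lead bytes such as 0xC0 or 0xF5.

```c
#include <stdint.h>

/* Hypothetical helper: sequence length implied by a UTF-8 lead byte.
   Returns 1 for ASCII, 2 to 4 for multi-byte leads, and 0 for
   continuation bytes (10xxxxxx) or patterns with 5+ leading ones. */
unsigned utf8_len(uint8_t lead) {
        /* Move the byte to the top of a 32-bit word and invert it, so that
           leading ones become leading zeros. The low 24 bits become ones,
           so the argument to __builtin_clz is never zero. */
        unsigned ones = (unsigned)__builtin_clz(~((uint32_t)lead << 24));
        unsigned is_ascii = (ones == 0);
        unsigned is_multi = (ones >= 2) & (ones <= 4);
        /* Comparisons yield 0 or 1, so this selects without an if. */
        return is_ascii + is_multi * ones;
}
```

Whether the compiler turns this into a count-leading-zeros instruction plus compares without branches is something to confirm in the generated assembly rather than assume.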

The thing is that I need to start somewhere, and for that classichasclass’ example is exactly what I need.


Just keep in mind that the FPRs and vector registers are now aliased together (in VMX-only CPUs this wasn't necessarily the case). What is particularly stupid about my example is that it may have to spill to memory to move the uint64_t (a GPR) into the VSX register (an FPR) and then move it back because PowerPC famously had no direct GPR-FPR moves for quite a while. Since I didn't specify -mcpu=power8 (or higher), gcc doesn't issue the new instructions and I'm not sure it would know how to.

A better way would be to explicitly use the newer mtvsrd (mtfprd) and mfvsrd (mffprd) instructions and avoid the spill. So here's a revision 2.

  #include <errno.h>
  #include <stdio.h>
  #include <stdint.h>
  #include <stdlib.h>
 
  int main(int argc, char **argv) {
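        uint64_t v = 0;
        double t; /* scratch FPR; FPRs alias VSRs 0-31 */
 
        if (argc != 2) {
                fprintf(stderr, "usage: %s quantity\n", argv[0]);
                return 1;
        }
 
        errno = 0; /* strtoull only sets errno on failure, so clear it first */
        v = strtoull(argv[1], NULL, 0);
        if (errno == EINVAL || errno == ERANGE) {
                perror("strtoull");
                return 1;
        }
 
        __asm__(
                "mtfprd %1, %2\n"  /* GPR -> FPR, no trip through memory */
                "xxbrd %1, %1\n"   /* byte-reverse the doubleword */
                "mffprd %0, %1\n"  /* FPR -> GPR */
                :"=r"(v), "=f"(t)
                :"r"(v)
        );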
        fprintf(stdout, "0x%lx\n", v);
        return 0;
  }

If v is already in a register, then it can just stay there.



