Hacker News

Isn't there a sort of obvious "best of both worlds" by having a vector instruction ISA with a vector length register, plus the promise that a length >= 4 is always supported and good support for intra-group-of-4 shuffles?

Then you can do whatever you did for games and multimedia in the past, except that you can process N samples/pixels/vectors/whatever at once, where N = vector length / 4, and your code can automatically make use of chips that allow longer vectors without requiring a recompile.
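A rough scalar sketch of that idea (everything here is hypothetical: `hw_vector_length` is a stand-in for a real query such as RVV's `vsetvl`, and the inner loop stands in for a single vector instruction). The point is that the loop is written against the reported lane count, so the same binary handles more groups of 4 per step on wider hardware:

```c
#include <stddef.h>

/* Hypothetical probe for the hardware's vector length in lanes.
   Modeled as a constant here; real vector-length-agnostic code
   would ask the CPU (e.g. RVV's vsetvl). */
static size_t hw_vector_length(void) { return 16; }

/* Scale the alpha channel of n RGBA pixels. Each pixel is a group
   of 4 lanes, so a VL-lane machine covers N = VL / 4 pixels per
   "vector" step, with no recompile needed for wider parts. */
void scale_alpha(unsigned char *rgba, size_t n_pixels, unsigned char s)
{
    size_t vl = hw_vector_length();   /* lanes per vector op */
    size_t pixels_per_step = vl / 4;  /* N = VL / 4          */
    size_t i = 0;
    while (i < n_pixels) {
        size_t step = n_pixels - i < pixels_per_step
                    ? n_pixels - i : pixels_per_step;
        /* Stand-in for one vector instruction over `step` pixels. */
        for (size_t p = 0; p < step; p++)
            rgba[(i + p) * 4 + 3] =
                (unsigned char)((rgba[(i + p) * 4 + 3] * s) / 255);
        i += step;
    }
}
```

On a 32-lane part the same loop would cover 8 pixels per step instead of 4, which is exactly the "longer vectors without a recompile" property.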

Mind you, I don't know if that's the direction that the RISC-V people are taking. But it seems like a pretty obvious thing to do.




In my opinion, "the best" would be to support only a single, fixed (largest) vector size and emulate legacy instructions for smaller sizes in microcode (slow, but requiring a minimal transistor count). There is no need to optimize for legacy software; it should be recompiled instead, with the compiler generating versions of the code for all existing generations of CPUs.

This way the CPU design can be simplified and all the complexity moved into compilers.
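A minimal sketch of that "one binary, per-generation versions" model in C, with the feature probe as a placeholder (real code would use CPUID, `getauxval`, or let the compiler do all of this via something like GCC's `target_clones` attribute):

```c
#include <stddef.h>

/* Placeholder feature probe; a real one would inspect CPUID or
   similar. Returning 0 forces the baseline path in this sketch. */
static int cpu_has_wide_vectors(void) { return 0; }

/* Version built for the newest vector ISA (same scalar body here,
   standing in for wide-vector code). */
static void saxpy_wide(float *y, const float *x, float a, size_t n)
{
    for (size_t i = 0; i < n; i++) y[i] = a * x[i] + y[i];
}

/* Legacy-safe baseline version for older CPU generations. */
static void saxpy_baseline(float *y, const float *x, float a, size_t n)
{
    for (size_t i = 0; i < n; i++) y[i] = a * x[i] + y[i];
}

/* Resolved once at startup; later calls are one indirect jump,
   with no per-call feature check. */
void (*saxpy)(float *, const float *, float, size_t);

void saxpy_init(void)
{
    saxpy = cpu_has_wide_vectors() ? saxpy_wide : saxpy_baseline;
}
```

The CPU stays simple because only the newest version needs fast hardware paths; older instructions can be trapped into microcode, since only the fallback versions ever use them.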


The entire Wintel monopoly has been built on people NOT recompiling old software. They still want it to run as fast as currently possible.


That would be awful for power consumption, and it only works well if everyone compiles everything from source. Otherwise every binary has awful performance for almost everyone.


> everyone compiles everything from source

One thing I learned from the M1 transition is that people will do it if someone tells them to. I bought a Mac this weekend to do exactly that; lots of users were complaining about the lack of M1 support. Time to add it (in a way that I can test). I have no choice.


The M1 has a very good x86 emulator for the moment. The users probably underestimate how good it is.

But GPU programs get compiled from source at runtime all the time, and they are sort of all about vectors. (The M1's GPU doesn't actually have vectors.)


Intel tried that (Itanium, with a variation of VLIW called EPIC). Didn't go so well. It was more like Epic Fail.

They poured a lot of money into the compiler. It didn't work out.

Again and again, complexity is proven cancerous. RISC is the way to go.


Itanium tried to move branch prediction from the CPU to the compiler. Besides, its failure was not necessarily related to the CPU design; it could well have been bad business execution.

The grandparent comment is about a compiler generating multiple copies of the code for different iterations of a CPU architecture, so that later CPUs don't have to support all legacy instructions, or can at least implement them via microcode emulation.


> and all the complexity moved into compilers.

Do not want. Especially seeing how hostile to programmers they've become.



