> Are those Intel vector register sizes going to increase until they catch up to the old Cray? Or was going up from 256 to 512 bit chosen to fit something else in the CPU architecture, like that you can fill the register in so many clock cycles?

The opposite. GPUs seem to be converging on 1024 bits wide (32x 32-bit lanes).

AMD's GPUs used to be 64x 32-bit wide (2048 bits), but both AMD and NVidia seem to have settled on 32x 32-bit wide (1024 bits).

It seems that at around 1024 bits wide, it's more appropriate to add more processors than to keep increasing the vector size. Ex: instead of 32x compute units with 64x threads per CU (32-bit each), you'd build 64x compute units with 32x threads per CU (32-bit each).
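
To make the arithmetic in that example concrete, here's a quick sketch. The function names are just mine for illustration, and the 32 / 64 compute-unit counts are the hypothetical ones from the example above, not any particular product:

```python
LANE_BITS = 32  # every lane ("thread") operates on a 32-bit value


def vector_bits(lanes_per_cu):
    """Logical vector width of one compute unit, in bits."""
    return lanes_per_cu * LANE_BITS


def total_lanes(compute_units, lanes_per_cu):
    """Total number of 32-bit lanes across the whole chip."""
    return compute_units * lanes_per_cu


# Wide design: 32 compute units, 64 lanes each.
wide_total = total_lanes(32, 64)
# Narrow design: 64 compute units, 32 lanes each.
narrow_total = total_lanes(64, 32)

print(vector_bits(64), vector_bits(32))  # 2048 1024
print(wide_total, narrow_total)          # 2048 2048
```

Same number of lanes either way. The only thing that changes is how many of them are forced to march in lockstep.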

The smaller size (32x instead of 64x) makes thread-divergence easier to handle.
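
A rough way to see why, as a toy model only: assume each lane takes a given branch independently with probability `p_taken` (real workloads are correlated, so treat this as an illustration, not a measurement). A SIMD group diverges whenever its lanes disagree, and then it has to execute both sides of the branch.

```python
def divergence_probability(lanes, p_taken):
    """Chance that a SIMD group of `lanes` threads disagrees on a branch.

    Toy assumption: every lane takes the branch independently with
    probability p_taken. The group avoids divergence only if all lanes
    take the branch or none of them do.
    """
    all_taken = p_taken ** lanes
    none_taken = (1.0 - p_taken) ** lanes
    return 1.0 - all_taken - none_taken


# A rare branch (1% of lanes) compared on 32-wide and 64-wide groups.
for width in (32, 64):
    print(width, round(divergence_probability(width, 0.01), 3))
```

Under that assumption the 64-wide group diverges noticeably more often than the 32-wide one for the same branch, so more of its lanes sit idle while the other path runs.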

--------

AMD Vega64 was logically 64x wide (2048 bits), although it was physically a 16x wide processor: the 16x cores per vALU would repeat themselves for 4 clock cycles. Logically 64x cores, but physically only 16x cores.
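
Sketching that logical-vs-physical split: the helper below is hypothetical, and the assumption that consecutive lanes are grouped into the same cycle is mine, purely for illustration.

```python
def issue_schedule(logical_width=64, physical_lanes=16):
    """Split one logical wavefront into per-cycle batches of lanes.

    Assumes (for illustration) that lanes are issued in consecutive
    groups: lanes 0-15 in cycle 0, lanes 16-31 in cycle 1, and so on.
    """
    cycles = logical_width // physical_lanes
    return [
        list(range(c * physical_lanes, (c + 1) * physical_lanes))
        for c in range(cycles)
    ]


schedule = issue_schedule()
print(len(schedule))     # 4 cycles to push one 64-wide instruction through
print(schedule[0][:4])   # [0, 1, 2, 3]
print(schedule[3][-1])   # 63
```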

By switching to 32x wide with NAVI, efficiency went up but overall TFlops went down. The AMD 5700 XT is 40x 2x32x 32-bit in organization (40x compute units, 2x 32x-wide SIMDs per compute unit, 32 bits per lane). Total of 2560 cores.

Vega64 was 64x 4x16x 32-bit in organization, for a total of 4096 SIMD cores.

Vega64 and 5700 XT are roughly the same speed in practice, despite the 5700 XT having only 62% of the cores and fewer TFlops than the Vega64. I guess the CryptoCoin miners preferred the ol' Vega64, but in practice, it's easier to write efficient programs for a narrower SIMD unit.
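
For anyone who wants to check the core counts, here's the multiplication spelled out (the function name is just for illustration; the numbers are the organizations quoted above):

```python
def simd_lanes(compute_units, simds_per_cu, lanes_per_simd):
    """Total 32-bit SIMD lanes ("cores") on the chip."""
    return compute_units * simds_per_cu * lanes_per_simd


vega64 = simd_lanes(64, 4, 16)  # 64 CUs, 4x 16-wide SIMDs per CU
navi = simd_lanes(40, 2, 32)    # 40 CUs, 2x 32-wide SIMDs per CU (5700 XT)
ratio = navi / vega64

print(vega64, navi)    # 4096 2560
print(f"{ratio:.1%}")  # 62.5%
```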