While I agree that the RISC-V V extension is questionable, it does not necessarily make decoding into uops more complex.
Once a vector register is allocated, its size is known, and there are then three ways to go:
1) Producing several uops, one for each cut of the vector register;
2) Producing a single uop with a size field; the vector unit knows how to split the operation;
3) Producing a single size-agnostic uop; the vector unit knows the size from the vector register metadata.
The worst is probably 1): it stresses the ROB, the uop queue and the scheduler, and makes decoding much more complex. Solutions 2) and 3), by contrast, keep decoding simple and are much more resource efficient, since each vector instruction occupies a single entry in those structures.
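To make the uop-count difference concrete, here is a minimal Python sketch of the three strategies. Everything in it is hypothetical: the `Uop` fields, a 128-bit datapath, 512-bit registers, and the `reg_size_bits` table standing in for the per-register metadata recorded at allocation. It only counts the uops the decoder emits and the slices the vector unit ends up processing.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical parameter: width the vector unit processes per slice.
DATAPATH_BITS = 128

@dataclass
class Uop:
    opcode: str
    vd: int
    vs1: int
    vs2: int
    part: Optional[int] = None  # strategy 1: which slice this uop covers
    size: Optional[int] = None  # strategy 2: operation size in bits

# Hypothetical metadata: size recorded when each vector register is allocated.
reg_size_bits: Dict[int, int] = {1: 512, 2: 512, 3: 512}

def decode_split(op: str, vd: int, vs1: int, vs2: int) -> List[Uop]:
    """Strategy 1: the decoder emits one uop per datapath-wide slice."""
    slices = reg_size_bits[vd] // DATAPATH_BITS
    return [Uop(op, vd, vs1, vs2, part=i) for i in range(slices)]

def decode_sized(op: str, vd: int, vs1: int, vs2: int) -> List[Uop]:
    """Strategy 2: one uop carrying the size; the vector unit splits it."""
    return [Uop(op, vd, vs1, vs2, size=reg_size_bits[vd])]

def decode_agnostic(op: str, vd: int, vs1: int, vs2: int) -> List[Uop]:
    """Strategy 3: one size-agnostic uop; the decoder never looks at the size."""
    return [Uop(op, vd, vs1, vs2)]

def vector_unit_slices(uop: Uop) -> int:
    """Number of slices the vector unit processes for one uop."""
    if uop.part is not None:  # already split by the decoder
        return 1
    # Strategy 2 uses the size field; strategy 3 reads the register metadata.
    size = uop.size if uop.size is not None else reg_size_bits[uop.vd]
    return size // DATAPATH_BITS

for decode in (decode_split, decode_sized, decode_agnostic):
    uops = decode("vadd.vv", 1, 2, 3)
    work = sum(vector_unit_slices(u) for u in uops)
    print(f"{decode.__name__}: {len(uops)} uop(s) tracked, {work} slices executed")
```

In this model all three strategies perform the same four slices of work; what differs is how many entries the ROB, uop queue and scheduler must track, and whether the decoder needs to know the register size at all.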
One implementation aspect that may be complicated is vector register allocation, depending on the vector-unit microarchitecture (more specifically, on the vector register file microarchitecture).
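As one illustration of how the register file organization can complicate allocation, here is a minimal sketch that assumes, hypothetically, a register file built from fixed-size chunks in which a register must occupy a contiguous run of chunks matching its size. `ChunkedVRF`, the 128-bit chunk size and the first-fit policy are illustrative assumptions, not a description of any real design.

```python
from typing import List, Optional

class ChunkedVRF:
    """Hypothetical VRF split into fixed-size chunks; a register occupies a
    contiguous run of chunks (an assumption; other organizations exist)."""

    def __init__(self, num_chunks: int, chunk_bits: int) -> None:
        self.chunk_bits = chunk_bits
        self.free: List[bool] = [True] * num_chunks

    def _chunks_needed(self, size_bits: int) -> int:
        return -(-size_bits // self.chunk_bits)  # ceiling division

    def allocate(self, size_bits: int) -> Optional[int]:
        """First-fit: return the start chunk of a free contiguous run, or None."""
        need = self._chunks_needed(size_bits)
        run = 0
        for i, is_free in enumerate(self.free):
            run = run + 1 if is_free else 0
            if run == need:
                start = i - need + 1
                for j in range(start, i + 1):
                    self.free[j] = False
                return start
        return None  # enough free chunks may exist, just not contiguously

    def release(self, start: int, size_bits: int) -> None:
        for j in range(start, start + self._chunks_needed(size_bits)):
            self.free[j] = True

vrf = ChunkedVRF(num_chunks=8, chunk_bits=128)
regs = [vrf.allocate(256) for _ in range(4)]  # fills chunks 0-7
vrf.release(regs[0], 256)
vrf.release(regs[2], 256)
# Four chunks are free, but not contiguously, so a 512-bit register fails.
print(sum(vrf.free), vrf.allocate(512))
```

Under this assumed organization, fragmentation can block an allocation even when enough total space is free; a different register file organization would shift the difficulty elsewhere.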