Onur Mutlu has a similar (though definitely not the same) idea: Processing in Memory. The basic idea is to move some operations nearer to the data. Your idea sits nearer to the registers, while his sits in the memory controller, nearer to memory.
I could see memory getting a vector FPU that takes an entire DRAM row (64 Kbit these days) and does things like scalar/vector MAC. Since DRAM is so slow, it could be a relatively slow FPU (10 cycles or more). The biggest issue would be standardization: how do you send it instructions and operands? How do you manage the internal state? A standard would be difficult and seems premature given how rapidly even FP data formats are changing. Oh, looks like that paper goes into some depth on this stuff!
https://arxiv.org/pdf/2012.03112
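
To make the row-wide MAC idea a bit more concrete, here's a rough software model of what I have in mind. It's just a sketch, not anything from the paper: the names (`lanes_per_row`, `row_mac`) are made up, FP32 is an assumed element format, and I'm treating the 64 Kbit row as a flat vector with no banking or timing.

```python
# Toy model of a scalar/vector MAC applied across one DRAM row.
# Assumptions (mine, not from any standard): 64 Kbit row, FP32 elements,
# one MAC lane per element, no modeling of latency or bank structure.

ROW_BITS = 64 * 1024  # 64 Kbit row, as mentioned above


def lanes_per_row(elem_bits, row_bits=ROW_BITS):
    """Number of elements that fit in one row for a given element width."""
    if elem_bits <= 0 or row_bits % elem_bits != 0:
        raise ValueError("element width must evenly divide the row size")
    return row_bits // elem_bits


def row_mac(acc_row, src_row, scalar):
    """Return acc[i] + scalar * src[i] for every lane in the row."""
    if len(acc_row) != len(src_row):
        raise ValueError("rows must have the same number of lanes")
    return [acc + scalar * src for acc, src in zip(acc_row, src_row)]


if __name__ == "__main__":
    lanes = lanes_per_row(32)          # FP32 lanes in one row
    acc = [1.0] * lanes
    src = [2.0] * lanes
    out = row_mac(acc, src, 3.0)       # every lane: 1.0 + 3.0 * 2.0
    print(lanes, out[0])
```

Note how the lane count depends entirely on the element width (2048 lanes for FP32, 4096 for a 16-bit format), which is part of why I think standardizing this is hard while FP formats keep changing.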