We used to use fixed-point multiplication (Q format) in DSP algorithms on various DSP architectures: https://en.wikipedia.org/wiki/Q_(number_format). It was very fast and nearly as accurate as floating-point multiplication. Perhaps those DSP blocks could be incorporated into tensor cores/GPUs to get both fast multiplication and parallelism.
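
As a quick illustration of the idea, here is a minimal Python sketch of a Q15 multiply (1 sign bit, 15 fractional bits); the function names and rounding/saturation choices are my own assumptions, not any particular DSP's API:

```python
Q = 15
ONE = 1 << Q  # 32768 represents 1.0 in Q15

def to_q15(x: float) -> int:
    """Quantize a float in [-1, 1) to a Q15 integer, with saturation."""
    return max(-ONE, min(ONE - 1, round(x * ONE)))

def q15_mul(a: int, b: int) -> int:
    """Multiply two Q15 values: Q30 product, round, shift back, saturate."""
    prod = a * b                          # Q30 intermediate (fits in 32 bits)
    prod = (prod + (1 << (Q - 1))) >> Q   # round-to-nearest, back to Q15
    return max(-ONE, min(ONE - 1, prod))  # saturate (handles -1.0 * -1.0)

def from_q15(x: int) -> float:
    return x / ONE

# Example: 0.5 * 0.25 = 0.125, done entirely in integer arithmetic
r = q15_mul(to_q15(0.5), to_q15(0.25))
print(from_q15(r))  # prints 0.125
```

On real DSPs this whole multiply-round-shift-saturate sequence is typically a single-cycle hardware instruction, which is where the speed advantage over floating point comes from.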