I'm not saying this to be rude, but I think you have a deep misunderstanding of how AI training works. You cannot just skip the matrix multiplications necessary to train the model, or get current hardware to do it faster.
No offence taken! As far as my (shallow!) understanding goes, the main challenge is the need for many GPUs with huge amounts of memory, and it still takes ages to train the model. So regarding the use of consumer GPUs, some work has been done already, and I've seen setups where people combine several of these successfully. As for the other aspects, maybe at some point we distill what is really needed into a smaller but excellent dataset that would give similar results in the final models.
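For what it's worth, the "combining consumer GPUs" part usually just means data parallelism: each card holds a full copy of the model and gradients get averaged across them. A minimal sketch with PyTorch's DistributedDataParallel, assuming torchrun launches one process per GPU (the model and batch here are placeholders, not a real training setup):

```python
# Minimal data-parallel training sketch across a few consumer GPUs.
# Assumes PyTorch with the NCCL backend; launch with:
#   torchrun --nproc_per_node=<num_gpus> train.py
import os
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # torchrun sets RANK, LOCAL_RANK and WORLD_SIZE for each process
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Tiny placeholder model; a real run would load an actual architecture
    model = nn.Sequential(
        nn.Linear(1024, 4096), nn.ReLU(), nn.Linear(4096, 1024)
    ).cuda(local_rank)
    model = DDP(model, device_ids=[local_rank])
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

    for step in range(100):
        x = torch.randn(8, 1024, device=local_rank)  # stand-in for a real batch
        loss = model(x).pow(2).mean()                # stand-in for a real loss
        loss.backward()                              # gradients are all-reduced across GPUs here
        optimizer.step()
        optimizer.zero_grad()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

The catch, as you say, is memory: plain data parallelism still needs the whole model (plus optimizer state) to fit on each consumer card, which is why the setups that succeed tend to add sharding or offloading on top of this.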