"Made for inference" just means "too slow for training" if you are pessimistic or "optimized for power efficiency" if you are optimistic.
Otherwise, training and inference are basically the same.
Training and inference are only similar at a high level, not in actual practice.
(ETA: In case it's not obvious, I'm agreeing with david-gpu's comment, and adding more reasons that training currently differs from inference.)
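To make the "only similar at a high level" point concrete, here is a minimal sketch (PyTorch assumed; the model, batch, and sizes are hypothetical stand-ins): inference is a single forward pass with no gradient bookkeeping, while a training step also stores activations for backprop, runs a backward pass (roughly twice the forward FLOPs), and reads and writes optimizer state. That extra memory traffic and compute is a big part of why hardware tuned for one can be a poor fit for the other.

    # Minimal sketch of the workload difference (PyTorch assumed;
    # model and data are hypothetical stand-ins, not a benchmark).
    import torch
    import torch.nn as nn

    model = nn.Linear(1024, 1024)      # stand-in for a real network
    x = torch.randn(32, 1024)          # hypothetical input batch
    target = torch.randn(32, 1024)     # hypothetical targets

    # Inference: forward pass only. No activations are kept for
    # backprop, no gradients are computed, no optimizer state exists.
    with torch.no_grad():
        y = model(x)

    # Training: the forward pass saves activations, the backward pass
    # costs roughly 2x the forward FLOPs, and the optimizer keeps extra
    # state (Adam stores two additional tensors per weight tensor).
    optimizer = torch.optim.Adam(model.parameters())
    loss = nn.functional.mse_loss(model(x), target)
    loss.backward()                    # backward pass over saved activations
    optimizer.step()                   # updates weights + optimizer state
    optimizer.zero_grad()              # clear gradients for the next step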