Your CPU gets maybe 700-800 GFLOPS, depending on your all-core frequency (that's fp32, because you don't have Sapphire Rapids). The T4 that was benchmarked peaks at ~65 TFLOPS (fp16 tensor). Newer GPUs hit around 300 TFLOPS (4090) or even nearly 2 PFLOPS (H100).
That should give you an idea of the order-of-magnitude difference in compute. Sapphire Rapids has AMX and fp16 AVX-512 to close the gap a little, but it's still massive.
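To put rough numbers on that gap, here's a quick sketch that just divides the peak figures quoted above. Nothing here is measured: the 0.75 TFLOPS CPU value is my midpoint of the 700-800 GFLOPS estimate, the H100 is rounded up to 2000 TFLOPS, and the names `CPU_TFLOPS`, `GPU_TFLOPS` and `speedup_table` are made up for the example.

```python
import math

# Peak throughput figures as quoted in this comment, in TFLOPS.
# These are theoretical peaks, not benchmark results.
CPU_TFLOPS = 0.75  # assumed midpoint of the 700-800 GFLOPS fp32 estimate
GPU_TFLOPS = {
    "T4 (fp16 tensor)": 65.0,
    "RTX 4090": 300.0,
    "H100": 2000.0,  # "nearly 2 PFLOPS", rounded up for simplicity
}


def speedup_table(cpu_tflops, gpus):
    """Return {name: (ratio, orders_of_magnitude)} relative to the CPU."""
    return {
        name: (tflops / cpu_tflops, math.log10(tflops / cpu_tflops))
        for name, tflops in gpus.items()
    }


for name, (ratio, oom) in speedup_table(CPU_TFLOPS, GPU_TFLOPS).items():
    print(f"{name}: ~{ratio:.0f}x the CPU (about {oom:.1f} orders of magnitude)")
```

Keep in mind this compares fp32 on the CPU against lower-precision tensor throughput on the GPUs, so it's a ballpark for peak compute only, not a prediction of real-world speedup.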