I wonder why they didn't test against the broadly available Strix Halo: 128GB of memory at 256 GB/s, a 16-core full-fat Zen 5 with AVX-512, all for $2k... it is a mystery...
Strix Halo has the problem that prefill is incredibly slow if your context is not very small.
The only thing that might be interesting about this DGX Spark is that its prefill manages to be faster due to better compute. I haven't compared the numbers yet, but they are included in the article.
Something is wrong with your numbers: gpt-oss-20b and gpt-oss-120b should be much, much faster than what you are seeing. I would suggest you familiarize yourself with llama-bench instead of ollama.
Running gpt-oss-120b with an RTX 5090 and 2/3 of the experts offloaded to system RAM (less than half of the memory bandwidth of this thing), my machine gets ~4100tps prefill and ~40tps decode.
Your spreadsheet shows the spark getting ~94tps prefill and ~11tps decode.
Now, it's expected that my machine should slaughter this thing in prefill, but decode should be very similar or the spark a touch faster.
Your system RAM is probably 1/20th the VRAM bandwidth of the 5090 (way, way less than half) unless you're running a workstation board with quad- or 8-channel RAM, in which case it's still only about 1/10th or 1/5th, respectively.
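For anyone who wants to check the 1/20th, 1/10th and 1/5th figures, here is a rough sketch of the arithmetic. The numbers are assumptions, not from the thread: DDR5-5600 DIMMs, a 64-bit (8-byte) bus per channel, and the spec-sheet 1792 GB/s for the 5090. These are theoretical peaks, so real-world throughput will be lower on both sides.

```python
# Back-of-the-envelope peak bandwidth comparison.
# Assumptions (not from the thread): DDR5-5600, 8 bytes per transfer per channel,
# and 1792 GB/s as the RTX 5090 spec-sheet memory bandwidth.

GPU_BW_GBS = 1792.0   # assumed GPU VRAM bandwidth, GB/s
DDR5_MTS = 5600       # assumed DIMM speed, mega-transfers per second


def ddr_bandwidth_gbs(mt_per_s: float, channels: int, bus_bytes: int = 8) -> float:
    """Theoretical peak: transfers per second x bytes per transfer x channels."""
    return mt_per_s * 1e6 * bus_bytes * channels / 1e9


def ratio_table() -> dict:
    """System RAM bandwidth as a fraction of GPU bandwidth, per channel count."""
    return {ch: ddr_bandwidth_gbs(DDR5_MTS, ch) / GPU_BW_GBS for ch in (2, 4, 8)}


if __name__ == "__main__":
    for ch, ratio in ratio_table().items():
        bw = ddr_bandwidth_gbs(DDR5_MTS, ch)
        print(f"{ch}-channel: {bw:.1f} GB/s, roughly 1/{1 / ratio:.0f} of the GPU")
```

Swap in your own DIMM speed and channel count; faster or slower memory shifts the ratios a bit but not the order of magnitude.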
We actually profiled one of the models and saw that the last GEMM, which is completely memory bound, takes a lot of time, which reduces the token speed by a lot.
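To illustrate why a GEMM during decode ends up memory bound, here is a minimal sketch that computes arithmetic intensity (FLOPs per byte moved) for a matrix multiply. The shapes are hypothetical, not the profiled model's: a 4096-wide hidden state projected onto a 200,000-entry vocabulary with 16-bit weights. The 256 GB/s bandwidth is just the Strix Halo figure quoted earlier in the thread, used as an example.

```python
# Arithmetic intensity of an (m x k) @ (k x n) GEMM.
# All shapes and the bandwidth below are illustrative assumptions.

def gemm_bytes(m: int, n: int, k: int, bytes_per_elem: float = 2.0) -> float:
    """Bytes moved if each input and the output cross the memory bus once."""
    return (m * k + k * n + m * n) * bytes_per_elem


def gemm_intensity(m: int, n: int, k: int, bytes_per_elem: float = 2.0) -> float:
    """FLOPs per byte: 2*m*n*k multiply-adds over the bytes moved."""
    return (2 * m * n * k) / gemm_bytes(m, n, k, bytes_per_elem)


def min_time_ms(m: int, n: int, k: int, bandwidth_gbs: float,
                bytes_per_elem: float = 2.0) -> float:
    """Lower bound on runtime when the GEMM is limited by memory bandwidth."""
    return gemm_bytes(m, n, k, bytes_per_elem) / (bandwidth_gbs * 1e9) * 1e3


if __name__ == "__main__":
    HIDDEN, VOCAB = 4096, 200_000   # hypothetical model dimensions
    BANDWIDTH_GBS = 256.0           # example bandwidth, GB/s

    for m, label in ((1, "decode, one token"), (512, "prefill, 512 tokens")):
        ai = gemm_intensity(m, VOCAB, HIDDEN)
        t = min_time_ms(m, VOCAB, HIDDEN, BANDWIDTH_GBS)
        print(f"{label}: {ai:.2f} FLOP/byte, at least {t:.2f} ms from bandwidth alone")
```

With one token per step the weight matrix is read in full for very little math, so the intensity sits near 1 FLOP/byte and the runtime is set by bandwidth rather than compute. Batching more tokens (as in prefill) reuses the same weights and raises the intensity by orders of magnitude.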