This is meant to run on GPUs with 16GB of RAM. Most M1/M2 users have at least 32GB (unified memory), and you can configure an MBP or Mac Studio with up to 96/128GB.
The Mac Pro is still Intel, but it can be configured with up to 1.5TB of RAM; you can imagine the M* replacement will have equally gigantic options when it comes out.
If you look closely, there's 16GB of GPU memory and over 200GB of CPU memory. So none of the currently available M* machines has the same kind of capacity. Let's hope this changes in the future!
Apple silicon has unified memory: the GPU has access to the entire 32/64/96/128GB of RAM. It's part of the appeal.
I would really like to see how stuff performs on a Mac Studio with 128GB of memory and an 8TB SSD (at 6GB/s), not to mention the extra 32 "neural engine" cores. It seems the performance of these machines has barely been explored so far.
I think the main bottleneck here is data movement. If you are streaming weight data from a 6GB/s SSD you'll get under 10% of the performance shown for the 3090 (which will be moving data at PCIe 4 speeds of 64GB/s).
Once in unified memory, the weights are accessible at about half the rate they are on the 3090 (400GB/s on the M2 Max vs 936GB/s on the 3090).
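To put rough numbers on this, here's a back-of-the-envelope sketch: if inference is memory-bandwidth bound and every token requires touching all the weights, the time per pass over the weights scales inversely with bandwidth. The 30GB model size below is a hypothetical figure just for illustration; the bandwidths are the ones cited above.

```python
# Back-of-the-envelope: seconds to stream a full set of model weights once,
# assuming inference is memory-bandwidth bound.
model_size_gb = 30  # hypothetical example size for a quantized model

bandwidths_gb_s = {
    "SSD at 6GB/s": 6,
    "PCIe 4.0 at 64GB/s": 64,
    "M2 Max unified memory at 400GB/s": 400,
    "RTX 3090 GDDR6X at 936GB/s": 936,
}

for name, bw in bandwidths_gb_s.items():
    seconds_per_pass = model_size_gb / bw
    print(f"{name}: {seconds_per_pass:.3f} s per full pass over the weights")
```

Under those assumptions the SSD path is roughly 10x slower than the PCIe path (matching the "under 10%" estimate), and keeping the weights resident in fast memory is another order of magnitude better, with the M2 Max at a bit under half the 3090's rate.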