An NVIDIA RTX 4090 delivers 73 TFLOPS; this iPad gives you nearly half that. Its memory bandwidth of 120 GB/s is roughly 1/10th of the NVIDIA hardware's, but who's counting!
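A quick sanity check on those two ratios. The 4090's memory bandwidth isn't stated above; ~1008 GB/s is assumed here from the published spec, and the iPad's compute is taken as "nearly half" of 73 TFLOPS:

```python
# Rough ratio check for the comparison above.
# Assumptions (not stated in the thread): 4090 bandwidth ~1008 GB/s;
# iPad compute taken as roughly half of the 4090's 73 TFLOPS.
gpu_tflops = 73.0
ipad_tflops = gpu_tflops / 2    # "nearly half"
gpu_bw_gbps = 1008.0            # assumed 4090 spec figure
ipad_bw_gbps = 120.0

compute_ratio = ipad_tflops / gpu_tflops      # 0.50
bandwidth_ratio = ipad_bw_gbps / gpu_bw_gbps  # ~0.12, i.e. roughly 1/10th

print(f"compute ratio:   {compute_ratio:.2f}")
print(f"bandwidth ratio: {bandwidth_ratio:.2f}")
```

So the "nearly half the compute, a tenth of the bandwidth" framing checks out arithmetically under those assumed numbers.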
The 4090 costs ~$1800, and it doesn't have dual OLED screens, doesn't have a battery, doesn't weigh less than a pound, and doesn't actually do anything unless it's plugged into a larger motherboard.
When you line it up like that, it's kinda surprising the 4090 is just $1800. They could sell it for $5,000 a pop and it would still be better value than the highest-end Apple Silicon.
Comparing these directly like this is problematic.
The 4090 is highly specialized and not usable for general-purpose computing.
Whether or not it's a better value than Apple Silicon depends heavily on what you intend to do with it, especially if your goal is a device you can put in your backpack.
I'm not the one making the comparison; I'm just providing the compute numbers to the people who did. Decide for yourself what that means. The only conclusion I drew was about compute-per-dollar.
And Nvidia annihilates those scores with cuBLAS. I'm going to play nice and post the OpenCL scores, since both sides get a fair opportunity to optimize for that.
Yeah, where are the bfloat16 numbers for the Neural Engine? For AMD you can at least divide by four to get the real number. 16 TOPS -> 4 TFLOPS within a mobile power envelope is pretty good for assisting CPU-only inference on device. Not so good if you want to run an inference server, but that wasn't the goal in the first place.
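The "divide by four" rule of thumb above can be sketched as a one-liner. The divisor is the commenter's heuristic for translating a headline low-precision TOPS figure into an effective higher-precision TFLOPS number; the function name and default are illustrative, not from any vendor API:

```python
# Hypothetical helper illustrating the divide-by-four heuristic from the
# comment above: marketed NPU TOPS are quoted at low precision, so a rough
# effective TFLOPS figure divides the headline number (divisor assumed = 4).
def effective_tflops(marketed_tops: float, divisor: float = 4.0) -> float:
    """Rough effective TFLOPS from a headline TOPS figure."""
    return marketed_tops / divisor

print(effective_tflops(16))  # the 16 TOPS -> 4 TFLOPS example from the comment
```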
What irritates me the most, though, is people comparing a mobile accelerator with an extreme high-end desktop GPU. Some models only run on a dual-GPU stack of those. Smaller GPUs are not worth the money; NPUs are primarily eating the lunch of low-end GPUs.