I was kinda stunned when I found out how much my computer can actually do. I've been playing with Halide[0], and I wrote a simple bilinear demosaic implementation in it; before any schedule tuning it processed ~80 megapixels/s (MP/s).
After optimising the scheduling a bit (which, thanks to Halide, is only 6 lines of code), I got that up to 640MP/s.
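For context, here's roughly what that looks like. This is a minimal sketch, not my actual code: the Func below only interpolates the green channel of an RGGB mosaic, and the tile/vector sizes are just plausible values, not the ones I tuned.

    // Rough sketch in Halide's C++ front end -- not the exact pipeline.
    // Only the green channel of an RGGB Bayer mosaic is interpolated,
    // and the tile/vector sizes are illustrative guesses.
    #include "Halide.h"
    using namespace Halide;

    int main() {
        ImageParam raw(UInt(16), 2, "raw");   // Bayer mosaic input
        Var x("x"), y("y");

        // Clamp at the edges and widen so the 4-tap sum can't overflow.
        Func in("in");
        in(x, y) = cast<int>(BoundaryConditions::repeat_edge(raw)(x, y));

        // Green sits at (x + y) odd sites in RGGB; elsewhere take the
        // average of the four green neighbours (bilinear interpolation).
        Func green("green");
        Expr interp = (in(x - 1, y) + in(x + 1, y) +
                       in(x, y - 1) + in(x, y + 1)) / 4;
        green(x, y) = select((x + y) % 2 == 1, in(x, y), interp);

        // The schedule: this handful of lines is the part that gets tweaked.
        Var xo("xo"), yo("yo"), xi("xi"), yi("yi");
        green.tile(x, y, xo, yo, xi, yi, 256, 64)
             .vectorize(xi, 16)
             .parallel(yo);

        green.compile_jit();   // JIT-compile for the host CPU
        return 0;
    }

The point is that the algorithm (what green computes) and the schedule (the tile/vectorize/parallel lines) are separate, which is why swapping schedules is so cheap.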
When I scheduled it for my Iris 6100 (integrated) GPU through Metal (replace the 6 lines of CPU schedule with 6 lines of GPU schedule), I got that up to ~800MP/s.
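The GPU version is the same pipeline with just the schedule swapped out, something along these lines (again a sketch; the 16x16 tile size is a guess):

    // Hypothetical Metal schedule, reusing the Funcs/Vars from the sketch
    // above in place of the CPU tile/vectorize/parallel lines.
    green.gpu_tile(x, y, xo, yo, xi, yi, 16, 16);
    Target t = get_host_target().with_feature(Target::Metal);
    green.compile_jit(t);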
Compare this to naïvely written C and the difference is massive.
I think it's amazing that my laptop can process nearly a gigapixel worth of data in under a second. Meanwhile it takes ~7s to load and render The Verge.
[0]: http://halide-lang.org/