In retrospect, it makes sense that the fastest general-purpose inflate implementations can be beaten by fine-tuning for the specific kind of non-text data found in PNG images.
That's great. Having a usable baseline is important for shipping it in more than a handful of hand-picked functions.
But the whole fixed-width SIMD approach seems terrible to me. It takes Intel a decade to add another batch of instructions for a new width, and existing applications don't benefit from the new instructions even if they already process wide batches of data.
The CUDA approach is so much more appealing: here's my data, process it in as many small or large units as you want.
The CUDA approach is just a software abstraction layer. NVIDIA GPU hardware is no more similar to the CUDA model than AVX-512 is (NVIDIA GPUs use 1024-bit vectors instead of 512-bit ones).
There is also a compiler for Intel AVX/AVX-512 that implements the CUDA approach: the Intel Implicit SPMD Program Compiler (ISPC). Such compilers could be written to translate any programming language into AVX-512 while using the same concurrency model as CUDA.
Moreover, as a software model the "CUDA approach" is essentially the same as the OpenMP approach, except that the NVIDIA CUDA compilers know the structure of NVIDIA GPUs, so they can automatically map the concurrent threads specified by the programmer onto GPU hardware cores, threads, and SIMD lanes.
The car isn't the problem, but it's tied to an incredibly overvalued stock, which in turn made it possible for Musk to buy massive influence and political power, and wield it in irresponsible, self-serving ways.
Quite the opposite. Pebble is great at sleep tracking. They manage to do a better job with Pebble's limited sensors than Apple can with all of their hardware. I have both, I use sleep tracking a lot, and I've compared them.
It's a chaotic system (turbulent flow is chaotic). Even the tiniest differences between the real and simulated state will add up and amplify over time.
Fluid simulation is a notoriously hard problem. We don't have a general solution to the Navier-Stokes equations. Practical implementations have limited resolution in time and space, and plenty of simplifying assumptions.
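As a toy illustration of that sensitivity (the logistic map here is just a stand-in for a chaotic system, not an actual fluid solver), two states that start 10^-12 apart become completely uncorrelated within a few dozen steps:

    // Rust sketch: chaotic divergence in the logistic map.
    fn main() {
        let step = |x: f64| 3.9 * x * (1.0 - x); // r = 3.9 is in the chaotic regime
        let (mut a, mut b) = (0.5_f64, 0.5_f64 + 1e-12);
        for i in 0..=50 {
            if i % 10 == 0 {
                println!("step {i:2}: diff = {:.3e}", (a - b).abs());
            }
            a = step(a);
            b = step(b);
        }
    }

No amount of extra floating-point precision fixes this; the divergence is a property of the system itself.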
If you don't count manual SIMD intrinsics or inline assembly as C, then Rust and FORTRAN can be faster than C.
This is mainly thanks to having pointer aliasing guarantees that C doesn't have. They can get autovectorization optimizations where C's semantics get in the way.
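As a rough sketch of what that buys in practice (the function is made up; the point is the aliasing guarantee):

    // `xs` and `out` cannot overlap, because `&mut` borrows are exclusive.
    // The optimizer can vectorize this loop without emitting runtime
    // overlap checks or falling back to scalar code. The equivalent C
    // needs `restrict` (and callers who honor it) to promise the same thing.
    pub fn scale_add(xs: &[f32], scale: f32, out: &mut [f32]) {
        for (o, x) in out.iter_mut().zip(xs) {
            *o += x * scale;
        }
    }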
Buggy unsafe blocks can affect code anywhere (through Undefined Behavior, or breaking the API contract).
However, if you verify that the unsafe blocks are correct, and the safe API wrapping them rejects invalid inputs, then they won't be able to cause unsafety anywhere.
This does reduce how much code you need to review for memory safety issues. Once it's encapsulated in a safe API, the compiler ensures it can't be broken.
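A minimal sketch of the pattern (a hypothetical function, not code from any particular crate): the unsafe block relies on an invariant that the surrounding safe code establishes, so no caller of the safe API can break it.

    /// Sums every other element of `data`.
    pub fn sum_even_indices(data: &[f32]) -> f32 {
        let mut total = 0.0;
        let mut i = 0;
        while i < data.len() {
            // SAFETY: the loop condition guarantees `i < data.len()`,
            // so the unchecked access stays in bounds.
            total += unsafe { *data.get_unchecked(i) };
            i += 2;
        }
        total
    }

Reviewing this one function for correctness is enough; the rest of the program can only reach the unsafe block through the bounds-respecting safe wrapper.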
This encapsulation also prevents a combinatorial explosion of complexity when multiple (unsafe) libraries interact.
I can take zlib-rs and some multi-threaded job executor (also unsafe internally), and I don't need to specifically check how these two interact.
zlib-rs needs to ensure they use slices and lifetimes correctly, the threading library needs to ensure it uses correct lifetimes and type bounds, and then the compiler will check all interactions between these two libraries for me. That's like (M+N) complexity to deal with instead of (M*N).
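For example (a hedged sketch; `compress` stands in for a zlib-rs-style safe API, and std's scoped threads stand in for the job executor):

    use std::thread;

    // Stand-in for a compression function from a library that is unsafe
    // inside but exposes a safe, slice-based API.
    fn compress(chunk: &[u8]) -> Vec<u8> {
        chunk.to_vec() // real compression elided
    }

    // Stand-in for a multi-threaded job executor.
    fn compress_in_parallel(data: &[u8]) -> Vec<Vec<u8>> {
        thread::scope(|s| {
            let jobs: Vec<_> = data
                .chunks(64 * 1024)
                .map(|chunk| s.spawn(move || compress(chunk)))
                .collect();
            jobs.into_iter().map(|j| j.join().unwrap()).collect()
        })
    }

The compiler checks every point where the two pieces touch: the borrowed chunks must outlive the scope, and everything crossing the thread boundary must be Send. I never have to audit how one library's internals interact with the other's.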
Like any internal-combustion engine, it still emits NOx pollution that cities want to get rid of.
It is even less efficient than hydrogen fuel cells: it combines the energy inefficiency of an ICE with the energy inefficiency of hydrogen generation and distribution.
Hydrogen is a worse fuel than gasoline, so these engines are more complex and deliver less power.
Such engines in buses would be more expensive to run, more expensive to maintain, and still have tailpipe emissions.
The Rust PNG crate took the same approach: https://lib.rs/fdeflate