pornel's comments

In retrospect, it makes sense that the fastest general-purpose inflate implementations can be beaten by fine-tuning for the specific kind of non-text data found in PNG images.

The Rust PNG crate took the same approach: https://lib.rs/fdeflate


They just need to rewrite the rest of Chrome to use the native Rust<>Rust interface.

(in reality Google is investing a lot of effort into automating the FFI layer to make it safer and less tedious)


That's great. Having a usable baseline is important for shipping it in more than a handful of handpicked functions.

But the whole approach with fixed-length instructions seems terrible to me. It takes Intel a decade to add another batch of instructions for another width, and the existing applications don't benefit from the new instructions, even if they already process wide batches of data.

The CUDA approach is so much more appealing: here's my data, and you can process it in as many small or large units as you want.


The CUDA approach is just a software abstraction layer. The hardware of the NVIDIA GPUs is no more similar to the CUDA model than AVX-512 (the NVIDIA GPUs use 1024-bit vectors instead of 512-bit vectors).

There is also a compiler for Intel AVX/AVX-512 that implements the CUDA approach (the Intel Implicit SPMD Program Compiler, ISPC). Similar compilers could be written to translate any programming language into AVX-512 code, while using the same concurrency model as CUDA.

Moreover, as a software model the "CUDA approach" is essentially the same as the OpenMP approach, except that the NVIDIA CUDA compilers know the structure of the NVIDIA GPUs, so they can automatically map the concurrent threads specified by the programmer onto GPU hardware cores, threads, and SIMD lanes.


In addition to ISPC, it is possible to do this kind of vector-length abstraction at the library level, e.g. in our Highway library.

We routinely write code that works on 128-512 bit vectors. Some use cases are harder than others, e.g. transposing.


The car isn't the problem, but it's tied to an incredibly overvalued stock, which in turn made it possible for Musk to buy massive influence and political power, and wield it in irresponsible, self-serving ways.


Quite the opposite. Pebble is great at sleep tracking. They manage to do a better job with Pebble's limited sensors than Apple can with all of their hardware. I have both, I use sleep tracking a lot, and I've compared them.


Pebble is still way better at sleep (and nap) tracking than Apple Watch.

I have a tendency to stay up late and get up at random times, so I need to track if I get enough sleep.


The fluid can't be "easily" simulated.

It's a chaotic system (turbulent flow is chaotic). Even the tiniest differences between the real and simulated state will add up and amplify over time.

Fluid simulation is a notoriously hard problem. We don't have a general solution to the Navier-Stokes equations. Practical implementations have limited resolution in time and space, and plenty of simplifying assumptions.
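A toy illustration of the chaos point (not a fluid solver, just a textbook chaotic system): the logistic map at r = 4 amplifies a 1e-9 error in the initial state to order-1 divergence within a few dozen steps, which is exactly why a simulation of a chaotic system stops tracking reality.

```rust
// Sensitive dependence on initial conditions, shown with the
// logistic map x' = 4x(1-x), a textbook chaotic system. The same
// effect is why turbulent-flow simulations drift from reality.

/// Largest divergence seen between two trajectories that start
/// a mere 1e-9 apart.
fn max_divergence(steps: u32) -> f64 {
    let mut a = 0.400000000_f64; // "real" state
    let mut b = 0.400000001_f64; // "simulated" state, off by 1e-9
    let mut max_diff = 0.0_f64;
    for _ in 0..steps {
        a = 4.0 * a * (1.0 - a);
        b = 4.0 * b * (1.0 - b);
        max_diff = max_diff.max((a - b).abs());
    }
    max_diff
}

fn main() {
    // The 1e-9 error roughly doubles each step, so it reaches
    // order 1 long before 60 iterations are up.
    println!("max divergence after 60 steps: {:.3}", max_divergence(60));
}
```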


If you don't count manual SIMD intrinsics or inline assembly as C, then Rust and Fortran can be faster than C. This is mainly thanks to pointer-aliasing guarantees that C doesn't have: they get autovectorization in places where C's semantics get in the way.
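A minimal sketch of the aliasing point (function name is mine): Rust's borrow rules guarantee that a `&mut` slice can't overlap any other live reference, which is the same promise C only gets from `restrict`.

```rust
// `dst: &mut [f32]` and `src: &[f32]` are guaranteed disjoint by
// the borrow rules, so the optimizer may vectorize this loop
// without runtime overlap checks. The equivalent C code needs
// `float *restrict dst, const float *restrict src` to make the
// same promise.
fn add_assign(dst: &mut [f32], src: &[f32]) {
    for (d, s) in dst.iter_mut().zip(src) {
        *d += *s;
    }
}

fn main() {
    let mut dst = vec![1.0_f32; 8];
    let src = vec![2.0_f32; 8];
    add_assign(&mut dst, &src);
    println!("{:?}", dst); // every element is now 3.0
}
```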


Buggy unsafe blocks can affect code anywhere (through Undefined Behavior, or breaking the API contract).

However, if you verify that the unsafe blocks are correct, and the safe API wrapping them rejects invalid inputs, then they won't be able to cause unsafety anywhere.

This does reduce how much code you need to review for memory safety issues. Once it's encapsulated in a safe API, the compiler ensures it can't be broken.

This encapsulation also prevents combinatorial explosion of complexity when multiple (unsafe) libraries interact.

I can take zlib-rs, and some multi-threaded job executor (also unsafe internally), but I don't need to specifically check how these two interact. zlib-rs needs to ensure they use slices and lifetimes correctly, the threading library needs to ensure it uses correct lifetimes and type bounds, and then the compiler will check all interactions between these two libraries for me. That's like (M+N) complexity to deal with instead of (M*N).
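A minimal sketch of such encapsulation (illustrative names, not zlib-rs code): the `unsafe` block is justified by a check the safe wrapper performs up front, so no caller can reach the unchecked access with invalid inputs.

```rust
/// Sums a window of a slice. The single `unsafe` block is
/// justified by the bounds check in this function; callers only
/// see the safe signature, so the compiler rules out misuse at
/// every call site.
fn window_sum(data: &[u32], start: usize, len: usize) -> Option<u32> {
    let end = start.checked_add(len)?;
    if end > data.len() {
        return None; // invalid input rejected at the safe boundary
    }
    let mut sum = 0u32;
    for i in start..end {
        // SAFETY: `start..end` lies within `data` per the check above.
        sum += unsafe { *data.get_unchecked(i) };
    }
    Some(sum)
}

fn main() {
    let data = [1, 2, 3, 4, 5];
    assert_eq!(window_sum(&data, 1, 3), Some(9)); // 2+3+4
    assert_eq!(window_sum(&data, 3, 9), None);    // rejected, no UB
    println!("ok");
}
```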


This is a dead-end.

Like all ICE engines, it still emits NOx pollution that cities want to get rid of.

It is even less efficient than hydrogen fuel cells. It combines the energy-inefficiency of an ICE with the energy-inefficiency of hydrogen generation and distribution.
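A back-of-the-envelope version of that efficiency chain, using assumed round-number ballpark efficiencies (not measured data): each conversion stage multiplies in, so the losses compound.

```rust
// Rough well-to-wheel comparison of the two hydrogen paths.
// All figures are assumed ballpark values for illustration only.
fn main() {
    let electrolysis = 0.70; // electricity -> H2 (assumed ~70%)
    let distribution = 0.85; // compression/transport (assumed ~85%)
    let h2_ice = 0.35;       // hydrogen combustion engine (assumed ~35%)
    let fuel_cell = 0.55;    // fuel cell + motor (assumed ~55%)

    // Stage efficiencies multiply, so losses compound.
    let ice_path = electrolysis * distribution * h2_ice;
    let fc_path = electrolysis * distribution * fuel_cell;
    println!("H2 ICE path:       {:.0}%", ice_path * 100.0);
    println!("H2 fuel-cell path: {:.0}%", fc_path * 100.0);
}
```

Under these assumptions roughly a fifth of the source energy reaches the wheels via a hydrogen ICE, versus about a third via a fuel cell.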

Hydrogen is a worse fuel than gasoline, so these engines are more complex and deliver less power.

Such engines in buses would be more expensive to run, more expensive to maintain, and still have tail-pipe emissions.

