> With just 32 bits of state, with AVX512 we can be running 512 copies of this in parallel if we like, per core! AKA, compute will not be the bottleneck.
How can the state machine be run in parallel, when the next state always has a dependency on the previous state?
Also, how exactly would the state register be decoded? After you XOR it with 4 bytes of input, it could be practically any of the roughly 4.3 billion (2^32) possible values, in the case of an unexpected place name.
And even for expected place names longer than 4 bytes, wouldn't they need several states each, to be properly distinguished from other names with a common prefix?
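To make the common-prefix concern concrete, here is a rough sketch. The update rule is my assumption, since the quoted comment doesn't spell it out: the state starts at 0 and is XORed with the next 4 input bytes, read as a little-endian word. The names `step` and `state_after_first_chunk` are hypothetical, for illustration only.

```python
def step(state: int, chunk: bytes) -> int:
    """XOR a 32-bit state with up to 4 input bytes (assumed update rule).

    Short chunks are zero-padded; bytes are read as a little-endian word.
    """
    word = int.from_bytes(chunk.ljust(4, b"\0"), "little")
    return (state ^ word) & 0xFFFFFFFF


def state_after_first_chunk(name: bytes) -> int:
    """State after consuming only the first 4 bytes of a name, starting from 0."""
    return step(0, name[:4])


# Two names that share their first 4 bytes (b"San ") are indistinguishable
# after one step, so telling them apart needs at least one more state each.
a = state_after_first_chunk(b"San Francisco")
b = state_after_first_chunk(b"San Jose")
print(a == b)  # True
```

Under this assumed rule, any set of names sharing a 4-byte prefix collapses to one state after the first step, which is why I'd expect longer names to need a chain of states rather than a single one.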