The CTR holds minimum line length values for:
- the instruction caches
...this value is the most efficient address stride to use to apply to a sequence of address-based maintenance operations to a range of addresses...
The documentation for the CTR_EL0 register talks about "caches under this processor's control", which you could argue doesn't include other cores' caches. But if you allow migration between cores, the line size can change underneath running software, and that breaks the "most efficient address stride" assumption above. So you can't do that.
It boils down to this: if you can't assume that the cache line size stays constant underneath you, then you can't invalidate line by line at all, and would have to go word by word. That's terrible for performance (and a huge waste), which is why the spec says to use the above value for those operations.
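To put rough numbers on the line-by-line versus word-by-word point, here is a minimal sketch in C. It decodes the `IminLine` (bits [3:0]) and `DminLine` (bits [19:16]) fields of a CTR_EL0 value, which encode log2 of the line length in 4-byte words, and counts how many address-based maintenance operations a range needs at a given stride. The function names and the register values used below are made up for illustration; on real hardware the value would come from reading `ctr_el0`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* CTR_EL0.IminLine, bits [3:0]: log2 of the smallest instruction cache
 * line, counted in 4-byte words. Returns the line length in bytes. */
uint32_t ctr_imin_line_bytes(uint64_t ctr)
{
    return 4u << (ctr & 0xf);
}

/* CTR_EL0.DminLine, bits [19:16]: same encoding for data/unified caches. */
uint32_t ctr_dmin_line_bytes(uint64_t ctr)
{
    return 4u << ((ctr >> 16) & 0xf);
}

/* Hypothetical helper: number of address-based maintenance operations
 * needed to cover [start, start + len) when stepping by `stride` bytes.
 * `stride` must be a power of two. */
size_t ops_for_range(uintptr_t start, size_t len, uint32_t stride)
{
    assert(stride != 0 && (stride & (stride - 1)) == 0);
    if (len == 0)
        return 0;

    uintptr_t mask  = ~(uintptr_t)(stride - 1);
    uintptr_t first = start & mask;             /* align start down */
    uintptr_t last  = (start + len - 1) & mask; /* last unit touched */
    return (size_t)((last - first) / stride) + 1;
}
```

With a made-up CTR value of `0x00040004`, both helpers return 64 bytes. Covering a 4096-byte range then takes `ops_for_range(0x1000, 4096, 64) == 64` operations at line stride, against 1024 if you had to step by 4-byte words.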
I appreciate you digging up the relevant text from the manual, but I don't think you should accuse Samsung of wrongdoing based on such far-fetched assumptions.
That document does not explicitly forbid this. In addition, take a look at their sample code, which indeed reads some cache configuration registers during each call. The same code is found verbatim in the Linux kernel, in case you don't trust the ARM engineers :)
I don't think considering how to actually implement the basic operation without a terrible performance penalty is a "far fetched assumption".
> That document does not explicitly forbid this.
Yet it makes it clear software can be written to assume it doesn't happen, which is the same thing.
> In addition, take a look at their sample code which indeed reads some cache configuration registers during each call.
This doesn't prevent the problem at all; it just narrows the window in which things can go wrong.
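A toy model of that window, as a sketch only: suppose the stride was read on a core reporting one line size, and the loop then runs on a core whose real lines are a different size. The hypothetical `lines_missed` below counts how many real cache lines in the range are never touched by any operation. It models each operation as affecting exactly the one real line containing its address, which is an assumption of this sketch, not a statement about any specific core.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: operations are issued at `stride`-byte steps over
 * [start, start + len), but the cache actually has `line`-byte lines.
 * Returns how many real lines in the range no operation lands in.
 * Both `stride` and `line` must be powers of two. */
size_t lines_missed(uintptr_t start, size_t len, uint32_t stride, uint32_t line)
{
    assert(stride != 0 && (stride & (stride - 1)) == 0);
    assert(line != 0 && (line & (line - 1)) == 0);

    uintptr_t end       = start + len;
    uintptr_t line_mask = ~(uintptr_t)(line - 1);
    size_t missed = 0;

    /* Walk every real line overlapping the range. */
    for (uintptr_t l = start & line_mask; l < end; l += line) {
        int hit = 0;
        /* Walk every address the maintenance loop would issue. */
        for (uintptr_t a = start & ~(uintptr_t)(stride - 1); a < end; a += stride) {
            if ((a & line_mask) == l) {
                hit = 1;
                break;
            }
        }
        if (!hit)
            missed++;
    }
    return missed;
}
```

In this model, `lines_missed(0x1000, 4096, 128, 64)` is 32: a 128-byte stride on 64-byte lines skips every other line. A stride that is too small, as in `lines_missed(0x1000, 4096, 64, 128)`, misses nothing and only issues redundant operations. Re-reading the register on every call makes the bad case rarer, but a migration between the read and the end of the loop still lands you in it.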
> The same code is found verbatim in the Linux kernel, in case you don't trust the ARM engineers :)
I trust the ARM engineers. As I already said, their code is right, Samsung got it wrong. Note that the libgcc code that breaks was ALSO written by ARM.