Yeah, replacing the hash would take a fork. Note that this implementation spends only about 20% of CPU time in SHA-3, so the gain wouldn't be massive. That proportion would probably grow after optimizing the field implementation, but almost certainly not enough to make it worth using a non-standardized, less-tested mode.
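To put a rough number on "wouldn't be massive", here is a minimal Amdahl's-law sketch. The 20% share comes from the figure above. The function name and the 4x hash speedup are made up for illustration and are not measurements.

```python
import math


def overall_speedup(hash_fraction: float, hash_speedup: float) -> float:
    """Amdahl's law: whole-program speedup when only the hashing part gets faster.

    hash_fraction: share of total CPU time spent hashing (0.20 per the estimate above).
    hash_speedup:  how many times faster the replacement hash is (hypothetical).
    """
    remaining = (1.0 - hash_fraction) + hash_fraction / hash_speedup
    return 1.0 / remaining


if __name__ == "__main__":
    # Illustrative assumption: a replacement hash that is 4x faster than SHA-3.
    print(f"4x faster hash: {overall_speedup(0.20, 4.0):.3f}x overall")
    # Upper bound: the hash costs nothing at all.
    print(f"free hash:      {overall_speedup(0.20, math.inf):.3f}x overall")
```

Even a free hash caps the overall gain at about 1.25x while hashing is 20% of the runtime. If the share grew to, say, 40% after field optimizations, the cap would rise to roughly 1.67x.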
How hard would it be to support a fast Kyber-90s mode, without SHA-3? (I suppose you would have to break the abstraction for that one.)