We did it for 4k intros back in the day: https://www.pouet.net/prod.php?which=289.
One fun optimization trick was the 0x66 prefix: switching from 16-bit to 32-bit mode also flips the trade-off in opcode sizes, since the prefix is needed for whichever operand size is not the mode's default. So in that intro most of the audio code runs in 16-bit mode, while the graphics (which are actually not palettized but full 32-bit color) run in 32-bit mode.
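
To make the size trade-off concrete, here is a rough Python sketch that hand-encodes `mov ax/eax, imm` for both modes. It is not code from the intro; the function name `encode_mov_acc_imm` and its parameters are made up for illustration, and it only covers this one instruction (opcode 0xB8 with an immediate).

```python
OPERAND_SIZE_PREFIX = 0x66  # x86 operand-size override prefix
MOV_ACC_IMM = 0xB8          # mov ax, imm16 / mov eax, imm32 (B8+r with r = 0)


def encode_mov_acc_imm(mode_bits: int, operand_bits: int, value: int) -> bytes:
    """Encode `mov ax/eax, value` (hypothetical helper, illustration only).

    mode_bits is the default operand size of the code segment (16 or 32),
    operand_bits is the operand size the instruction should actually use.
    """
    if mode_bits not in (16, 32) or operand_bits not in (16, 32):
        raise ValueError("only 16-bit and 32-bit sizes are handled here")

    # The prefix is only emitted when the operand size differs from the default.
    prefix = bytes([OPERAND_SIZE_PREFIX]) if operand_bits != mode_bits else b""

    # The immediate is stored little-endian and is as wide as the operand.
    masked = value & ((1 << operand_bits) - 1)
    immediate = masked.to_bytes(operand_bits // 8, "little")

    return prefix + bytes([MOV_ACC_IMM]) + immediate


if __name__ == "__main__":
    for mode in (16, 32):
        for operand in (16, 32):
            code = encode_mov_acc_imm(mode, operand, 0x1234)
            print(f"{mode}-bit mode, {operand}-bit operand: "
                  f"{len(code)} bytes ({code.hex(' ')})")
```

In 16-bit mode the 16-bit load is 3 bytes and the 32-bit load is 6; in 32-bit mode they are 4 and 5 bytes. So code that mostly shuffles 16-bit values comes out smaller in 16-bit mode, and code that works on 32-bit values comes out smaller in 32-bit mode.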