Interesting that shifts are in the <1 IPC set; I thought those were fairly cheap with a barrel shifter. Does this simply omit one for space purposes, or are they more expensive than I expect?
Barrel shifters are huge in the context of small CPUs (especially on FPGAs). To do a barrel shift, you need roughly (input width) * (number of shift-amount bits) 2:1 muxes, as you need one mux "stage" per bit of the shift amount. That means 32*5=160 on RV32, as the shift amount is 5 bits (shifts of 0 to 31).
OP's CPU takes up around 400 LUTs. Since a 2:1 mux takes up 1 LUT, a barrel shifter would add 160 LUTs (although it seems the numbers are for a LUT6-based device, where one LUT can hold a 4:1 mux, so maybe the amount could be a bit lower?). That's quite a lot: roughly 40% on top of the existing design.
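To make the 160 figure concrete, here is a rough Python model of a logarithmic barrel shifter (left shift only). It is a sketch, not OP's design: the names `mux2` and `barrel_shift_left` are made up for illustration, and it assumes one 2:1 mux per bit per stage, with no LUT packing.

```python
WIDTH = 32
STAGES = 5  # log2(WIDTH): one stage per bit of the shift amount


def mux2(sel, a, b):
    """2:1 mux on single bits: returns b when sel is 1, else a."""
    return b if sel else a


def barrel_shift_left(value, shamt):
    """Logical left shift built from STAGES layers of 2:1 muxes.

    Returns (result, mux_count). Hypothetical model for illustration only.
    """
    bits = [(value >> i) & 1 for i in range(WIDTH)]  # bits[0] is the LSB
    mux_count = 0
    for stage in range(STAGES):
        sel = (shamt >> stage) & 1  # this stage shifts by 2**stage or by 0
        step = 1 << stage
        shifted = []
        for i in range(WIDTH):
            moved = bits[i - step] if i >= step else 0  # zero fill at the bottom
            shifted.append(mux2(sel, bits[i], moved))
            mux_count += 1
        bits = shifted
    result = sum(bit << i for i, bit in enumerate(bits))
    return result, mux_count
```

Every call walks through 5 stages of 32 muxes, so `mux_count` is always 160 regardless of the shift amount. A real implementation would also need right and arithmetic shifts, which this sketch leaves out.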
I don't think this is true - on Xilinx, you can coax a DSP48 macro into implementing a barrel shifter, but the underlying primitive is a multiplier and not a barrel shifter.
Unlike adders, a barrel shifter does not generalize well enough to be implemented as a hard block in its own right.
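The reason a multiplier can stand in for a shifter is just arithmetic: shifting left by n is multiplying by 2^n, and a logical right shift by n is the upper half of a multiplication by 2^(32-n). A minimal Python sketch of that identity follows. The function names are made up, and this says nothing about how the operands would actually be mapped onto a DSP48.

```python
MASK32 = 0xFFFFFFFF


def shl_via_mul(x, n):
    """32-bit logical left shift expressed as a multiply by 2**n.

    In hardware the 2**n operand would come from a one-hot decode of n,
    not from another shifter.
    """
    return (x * (1 << n)) & MASK32  # keep the low 32 bits of the product


def shr_via_mul(x, n):
    """32-bit logical right shift: high 32 bits of x * 2**(32 - n)."""
    product = x * (1 << (32 - n))  # up to 64 bits wide
    return (product >> 32) & MASK32  # taking the high word is just wiring
```

So the shifter comes "for free" only if a wide enough multiplier is already sitting there unused.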