VHDL generics and generate work nicely for 1d cases, but for 2d cases (systolic arrays), it's difficult to make the scripting really work without hard-coding a bunch of corner cases.
Another example is that defining barrel shifters is impossible to parameterize, because you need to hardcode the mux cases (see the Xilinx datasheet[1]). That's kind of insane, considering that bit-shifting is a very common and basic operation. This is particularly problematic if you're trying to describe something without having to resort to platform-specific instantiation blocks.
It's a little frustrating that VHDL doesn't have a higher-level standard instantiation library, because you're chained to a platform the moment you start doing anything other than basic flip-flops.
Well, I suppose you could say assembly is an analog of writing hdl that is very structural.
I haven't done it myself, but generates can be nested. You'd have to check to see if your tools support it or not though.
With the xilinx example, I'm not sure what you mean. Is it choosing to do multi-level muxing vs a naive muxing solution? I'd start by just writing a simple behavioral version, and only if that didn't meet performance constraints would I bother doing anything structural about it.
It's late and maybe I'm just not thinking it through. I'll take a stab at some of this and maybe it'll be clearer to me.