I don't think a lot of people understand that some of these features need more than a cleared chicken bit to enable. Some chips can be incompatible with features due to physical reasons.
Others of these features make little sense for some chips and some markets, for example some of the virtualization features (VT-x and VT-d). The majority of personal users aren't running more than one or two VMs at a time. Do we really need VPID and larger TLBs for such workloads? No. So why include the feature?
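For readers unfamiliar with VPID: it tags TLB entries with a virtual-processor ID so that switching between guests does not force a TLB flush. The toy model below is a simplification under stated assumptions (unbounded TLB, no associativity, only VM-to-VM switches modeled; `count_tlb_misses` is a made-up name). It counts misses for two guests taking turns, with and without tagging.

```python
from typing import List, Optional, Set, Tuple

def count_tlb_misses(trace: List[Tuple[int, int]], use_vpid: bool) -> int:
    """Count TLB misses for a trace of (vm_id, virtual_page) accesses.

    Toy model: unbounded capacity, no associativity, and VM exits to the
    hypervisor are not modeled (without VPID, real hardware flushes on those too).
    """
    tlb: Set[Tuple[int, int]] = set()
    current_vm: Optional[int] = None
    misses = 0
    for vm_id, page in trace:
        if vm_id != current_vm:
            if not use_vpid:
                # Untagged entries belong to the outgoing VM, so they must go.
                tlb.clear()
            current_vm = vm_id
        # With VPID each entry is tagged with its VM; without it, one shared tag.
        key = (vm_id if use_vpid else 0, page)
        if key not in tlb:
            misses += 1
            tlb.add(key)
    return misses

# Two VMs taking turns, each touching the same four pages per time slice.
trace = [(vm, page) for _ in range(3) for vm in (1, 2) for page in range(4)]
print(count_tlb_misses(trace, use_vpid=False))  # 24: every slice starts cold
print(count_tlb_misses(trace, use_vpid=True))   # 8: only the first slice per VM misses
```

Because the toy ignores VM exits, a single-VM trace shows no difference between the two modes; on real hardware without VPID, exits to the hypervisor also cost TLB contents.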
If binary size becomes an issue, then we need a better solution. Maybe it's deferring some or all optimization until install time. Maybe it's providing individual binaries for different targets.
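Whether the per-target variants live in separate binaries picked at install time or in one binary picked at load time (GCC's `target_clones` function attribute does the latter for compiled code), the selection step looks roughly like the sketch below. It models only that selection logic: the kernel functions are hypothetical placeholders, and the flag names assume the spelling Linux uses in `/proc/cpuinfo` on x86 (e.g. `avx2`).

```python
from typing import Callable, FrozenSet, List, Sequence, Tuple

Kernel = Callable[[Sequence[int], Sequence[int]], List[int]]

def add_scalar(a: Sequence[int], b: Sequence[int]) -> List[int]:
    # Baseline variant that requires no optional CPU features.
    return [x + y for x, y in zip(a, b)]

def add_avx2(a: Sequence[int], b: Sequence[int]) -> List[int]:
    # Hypothetical placeholder for a variant built with AVX2 enabled;
    # in a compiled language this would hold the vectorized loop.
    return [x + y for x, y in zip(a, b)]

# Most specialized first; the first entry whose requirements the CPU meets wins.
KERNELS: List[Tuple[FrozenSet[str], Kernel]] = [
    (frozenset({"avx2"}), add_avx2),
    (frozenset(), add_scalar),
]

def select_kernel(cpu_flags: FrozenSet[str]) -> Kernel:
    for required, kernel in KERNELS:
        if required <= cpu_flags:  # all required features are present
            return kernel
    raise RuntimeError("no compatible kernel")  # baseline requires nothing

def read_linux_cpu_flags() -> FrozenSet[str]:
    """Best-effort read of x86 feature flags from /proc/cpuinfo (Linux only)."""
    try:
        with open("/proc/cpuinfo") as f:
            for line in f:
                if line.startswith("flags"):
                    return frozenset(line.split(":", 1)[1].split())
    except OSError:
        pass
    return frozenset()  # unknown platform: fall back to the baseline

add = select_kernel(read_linux_cpu_flags())
print(add.__name__, add([1, 2], [3, 4]))
```

The ordered table keeps the fallback explicit: a chip with the feature fused off simply lands on the baseline entry.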
Some chips can be incompatible with features due to physical reasons
I think this is a good point. So far, I think the AVX/AVX2 distinction is based only on release date: the newer chips support the newer instruction set. This is a good sort of improvement. And it's probably only practical to support AVX and AVX2 if you have 256-bit registers to work with.
Does the lower-end, mobile Pentium-branded line even have full physical support for 256-bit vectors? That is, do they execute 128-bit SSE instructions in a full 256-bit register but not provide any instructions that use the upper 128-bit lane?
There is a strong advantage to a homogeneous ISA. I am glad to see that newer chips have more complete support for these instructions. Even if some of them still emulate these instructions with micro-ops rather than accelerating them natively, that is still convenient.
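On the micro-op point: an architecturally 256-bit instruction does not require a 256-bit datapath. A core with 128-bit execution units can split it into two 128-bit micro-ops (AMD's first-generation Zen cores are a commonly cited example). The sketch below models only the semantics, using eight 32-bit lanes with wraparound in the spirit of AVX2's `vpaddd`; the function names are made up for illustration.

```python
from typing import List

MASK32 = 0xFFFFFFFF

def add_epi32(a: List[int], b: List[int]) -> List[int]:
    """Lane-wise 32-bit add with wraparound, like a packed integer add."""
    assert len(a) == len(b)
    return [(x + y) & MASK32 for x, y in zip(a, b)]

def add_256_native(a: List[int], b: List[int]) -> List[int]:
    # One operation over all eight 32-bit lanes (a full 256-bit datapath).
    assert len(a) == len(b) == 8
    return add_epi32(a, b)

def add_256_cracked(a: List[int], b: List[int]) -> List[int]:
    # Same architectural result from two 128-bit micro-ops: lanes 0-3,
    # then lanes 4-7. Software cannot tell the difference, only timing can.
    assert len(a) == len(b) == 8
    return add_epi32(a[:4], b[:4]) + add_epi32(a[4:], b[4:])

a = list(range(8))
b = [MASK32] * 8  # adding 0xFFFFFFFF wraps around to "minus one"
print(add_256_native(a, b) == add_256_cracked(a, b))
```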
Regarding the mobile lineup and AVX2, honestly I wouldn't know. It is quite possible that they disabled it to avoid competing with higher lineups. But a more likely reason (IMO) is that adding 256-bit support added enough cost that it was either unacceptable or simply not justified by the target workloads when evaluated in simulation.
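Whatever the reason a part ships without AVX or AVX2, software learns about it through CPUID: parts with the feature disabled report it as absent. The usual check also asks the OS, via the XCR0 register read by XGETBV, whether it saves YMM state. Python cannot execute those instructions, so this sketch takes the raw register values as parameters (in C they would typically come from `<cpuid.h>` and `_xgetbv`). The bit positions follow Intel's documented layout; the function names are illustrative.

```python
CPUID1_ECX_OSXSAVE = 1 << 27  # leaf 1: OS has enabled XSAVE and XGETBV
CPUID1_ECX_AVX = 1 << 28      # leaf 1: CPU implements AVX
CPUID7_EBX_AVX2 = 1 << 5      # leaf 7, subleaf 0: CPU implements AVX2
XCR0_SSE_STATE = 1 << 1       # OS saves XMM registers on context switch
XCR0_AVX_STATE = 1 << 2       # OS saves the upper halves of YMM registers

def avx_usable(cpuid1_ecx: int, xcr0: int) -> bool:
    # The CPU must implement AVX *and* the OS must expose XGETBV.
    if not (cpuid1_ecx & CPUID1_ECX_AVX) or not (cpuid1_ecx & CPUID1_ECX_OSXSAVE):
        return False
    # The OS must also save both XMM and YMM state, or AVX code is unsafe.
    needed = XCR0_SSE_STATE | XCR0_AVX_STATE
    return (xcr0 & needed) == needed

def avx2_usable(cpuid1_ecx: int, cpuid7_ebx: int, xcr0: int) -> bool:
    return avx_usable(cpuid1_ecx, xcr0) and bool(cpuid7_ebx & CPUID7_EBX_AVX2)
```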
The physical reasons I was alluding to revolve around power, area, and capacitance (delay). These determine the cost ($$$) and performance of chips. Some of these big features (like AVX3, i.e. AVX-512) push the limits of what you can do without resorting to high-performance transistors or dealing with intense power draw.
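For reference, the standard first-order CMOS relations behind this argument: dynamic power grows with the activity factor α, the switched capacitance C, the square of the supply voltage, and the clock frequency, while gate delay grows with load capacitance. A wider vector datapath adds switched capacitance and wire load, which is one way a feature can run into the limits described above.

```latex
P_{\text{dyn}} \approx \alpha \, C \, V_{dd}^{2} \, f
\qquad
t_{\text{delay}} \propto \frac{C \, V_{dd}}{I_{\text{on}}}
```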
Another physical reason is that parts sometimes suffer defects during manufacturing. Modern chip design is inherently modular, so sometimes the only option is to disable the affected part of the chip.
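To see why disabling a block is attractive, here is a sketch using the classic Poisson yield model, in which the chance of a defect-free region of area A is exp(-D·A) for defect density D. This is a simplifying assumption (fabs use more refined yield models), and the parameter values are purely illustrative.

```python
import math
from typing import Tuple

def die_outcomes(defect_density: float, die_area: float,
                 block_area: float) -> Tuple[float, float]:
    """Return (P(die fully good), P(die sellable only after disabling the block)).

    Assumes defects land independently and uniformly (Poisson model), and that
    any number of defects inside the block is handled by fusing the block off.
    """
    if defect_density < 0 or not 0 <= block_area <= die_area:
        raise ValueError("need defect_density >= 0 and 0 <= block_area <= die_area")
    fully_good = math.exp(-defect_density * die_area)
    # No defects outside the block, whatever happens inside it...
    clean_outside_block = math.exp(-defect_density * (die_area - block_area))
    # ...minus dies clean everywhere = at least one defect, all inside the block.
    salvageable = clean_outside_block - fully_good
    return fully_good, salvageable

# Illustrative numbers only: 0.1 defects/cm^2, 1.5 cm^2 die, 0.3 cm^2 fusable block.
good, salvage = die_outcomes(0.1, 1.5, 0.3)
print(f"fully good: {good:.3f}, salvageable: {salvage:.3f}")
```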
VT-d is branded as a virtualization feature, but it's also a killer security feature. It should be treated as a hard requirement for any system that has DMA-capable external interfaces like FireWire or Thunderbolt. It can be used to protect the system from exploits that run on the GPU or on any of the other processors included in peripheral devices. But nobody's going to completely redesign the driver model of their OS if only 3% of their users can reap the benefits.
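The core of that protection is that a device's DMA addresses are translated through per-device tables the OS controls, so a compromised peripheral can reach only the buffers its driver mapped for it; everything else faults. The toy model below illustrates just that idea. It is not VT-d's actual multi-level table format, and the class, method, and device names are hypothetical.

```python
from typing import Dict, Tuple

PAGE_SIZE = 4096

class DmaFault(Exception):
    """Raised when a device touches memory it was not granted."""

class ToyIommu:
    """Per-device I/O page tables: device-visible page -> (host frame, writable)."""

    def __init__(self) -> None:
        self.tables: Dict[str, Dict[int, Tuple[int, bool]]] = {}

    def map(self, device: str, io_page: int, host_frame: int, writable: bool) -> None:
        # The driver grants one page to one device, optionally read-only.
        self.tables.setdefault(device, {})[io_page] = (host_frame, writable)

    def unmap(self, device: str, io_page: int) -> None:
        # Revoke access once the transfer is done.
        self.tables.get(device, {}).pop(io_page, None)

    def translate(self, device: str, io_addr: int, write: bool) -> int:
        page, offset = divmod(io_addr, PAGE_SIZE)
        entry = self.tables.get(device, {}).get(page)
        if entry is None:
            raise DmaFault(f"{device}: no mapping for I/O address {io_addr:#x}")
        host_frame, writable = entry
        if write and not writable:
            raise DmaFault(f"{device}: write to read-only page at {io_addr:#x}")
        return host_frame * PAGE_SIZE + offset

iommu = ToyIommu()
iommu.map("nic", io_page=0, host_frame=0x8000, writable=True)
print(hex(iommu.translate("nic", 0x10, write=True)))
try:
    # A hostile peripheral probing memory it was never given.
    iommu.translate("thunderbolt_dev", 0x1000, write=True)
except DmaFault as exc:
    print("blocked:", exc)
```

The per-device map/unmap bookkeeping is exactly the work the driver model has to take on, which is why OS support lags behind the hardware.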
I actually never thought of it from that perspective. Can you expand on this? Is the primary threat DoS or malicious DMA operations from external interfaces and devices, and is the primary advantage mitigation by rerouting I/O from that device to a "null" VM?