> And even then, I can see where AMD was going. The main point of SMT is to shar...

monocasa · on Jan 23, 2023

> Sure, but wouldn't it be ideal that if a thread wasn't using its integer unit and the other thread had code that could run on it, you'd allow the other thread to run?

> "CMT" is literally just "SMT with dedicated resources" and that's a suboptimal choice because it impairs per-thread performance in situations where there's not anything to run on that unit. Sharing is better.

> If the scheduler is insufficiently fair, that's a problem that can be solved. Guarantee that if there is enough work, that each thread gets one of the integer units, or guarantee a maximum latency of execution. But preventing a thread from using an integer unit that's available is just wasted cycles, and that's what CMT does.

Essentially, no. What you're suggesting is a really poor choice for the gate count and number of execution units in a Jaguar. The most expensive parts are the ROBs and the bypass networks between the execution units. Doubling that combinatorial complexity would probably lead to a much larger, hotter single core that either wouldn't clock nearly as fast or would need so many pipeline stages that branches become way more expensive (aka the NetBurst model).
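To put a rough number on that scaling, here's a back-of-the-envelope sketch. It assumes a full bypass network where every unit's result can be forwarded to every source operand of every unit, with two source operands per unit; `bypass_paths` is a made-up name for illustration, not a figure for any real core.

```python
def bypass_paths(num_units: int, operands_per_unit: int = 2) -> int:
    """Count forwarding paths in a full bypass network (toy model).

    Assumption: each of the num_units producers can forward its result
    to each operand input of each of the num_units consumers.
    """
    return num_units * num_units * operands_per_unit


if __name__ == "__main__":
    for units in (2, 4, 8):
        print(f"{units} units -> {bypass_paths(units)} forwarding paths")
```

In this model, doubling the number of units quadruples the number of paths, which is the kind of growth I mean by "combinatorial complexity".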

> And you can note the "such as" in the summary, even. That is an expansive term, meaning "including but not limited to".

Well, except that I argue it doesn't include those at all; shared L2 is extremely common, and shared FPU is common enough that people don't really bat an eye at it.

> If you feel that was not addressed in the lawsuit and it was incorrectly settled... please cite.

I'm going off your own citation. If you think these points were in fact raised in the court case itself, you're more than welcome to cite another source (ideally not a literal tabloid, but something that keeps to the standard of the court documents you cited before).

> With AMD, "cores" that have to alternate their datapath on every other cycle are pretty damn bottlenecked and that's not what consumers generally think of as "independent cores".

That's not how these work. OoO cores are rarely cranking away their frontends at full tilt; instead, they tend to work in batches, filling up a ROB with work that then executes as memory dependencies are resolved. The modern solution for taking advantage of that is to aggressively downclock the frontend when it isn't being used, to save power, but I can see the idea of instead keeping it clocked with the rest of the logic and simply sharing it between two backends as a valid option.
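A very crude toy model of that argument, under heavy assumptions: one backend with a fixed-size ROB, a frontend that can deliver `fetch_width` ops per cycle, and a backend that sustains `retire_width` ops per cycle (a stand-in for being limited by memory dependencies). With `shared=True` the backend is only fed on even cycles. All names and numbers here are invented for illustration; this is not how any real core is modeled.

```python
def simulate(cycles: int, fetch_width: int, retire_width: int,
             rob_size: int, shared: bool) -> int:
    """Return the number of ops retired by one backend (toy model)."""
    rob = 0       # ops currently waiting in the ROB
    retired = 0   # total ops retired so far
    for cycle in range(cycles):
        # A shared frontend feeds this backend only on even cycles.
        has_frontend = (not shared) or (cycle % 2 == 0)
        if has_frontend:
            rob += min(fetch_width, rob_size - rob)
        # The backend drains at most retire_width ops per cycle.
        done = min(retire_width, rob)
        rob -= done
        retired += done
    return retired


if __name__ == "__main__":
    for retire_width in (1, 4):
        own = simulate(1000, 4, retire_width, 32, shared=False)
        alt = simulate(1000, 4, retire_width, 32, shared=True)
        print(f"retire_width={retire_width}: dedicated={own}, shared={alt}")
```

In this model, when the backend drains slower than the frontend can fill the ROB, alternating the frontend costs nothing; throughput only halves in the case where the backend could keep up with the frontend every single cycle, which is the case I'm saying is rare.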