This is pretty important for AMD because they've been having a terrible time matching Intel on per-cycle efficiency. Basically, ever since the Core i series started, AMD has been behind on single-threaded performance, especially clock for clock. They've been trying to make up for it by offering more cores/threads than comparably priced Intel parts and by scaling up their clock rates. Unfortunately for AMD, their last few generations have fallen short on those clock-rate targets, so while the comparably priced AMD chip would typically have a clock speed advantage over the Intel one, it was often not enough to overcome Intel's per-clock efficiency.
This product release is kind of an attempt to show that AMD can actually deliver on their planned strategy.
As for why AMD is going this route rather than trying to beat Intel on per-clock efficiency? Probably because AMD's resources are severely limited compared to Intel's, and this approach offered lower risk at lower cost.
> As for why AMD is going this route rather than trying to beat Intel on per-clock efficiency?
Well, because clock speeds are something they can improve now, not in $n years' time when their next major microarchitecture is ready. Intel had the exact same problem with the Pentium 4, and they were similarly stuck with minor tweaks and desperately increasing clock rates for years before Core was ready.
It was my understanding that P-IV was originally conceived as an architecture that could be clocked up and up for years to come. Developing a new architecture is expensive and risky (see P-IV, Itanium), so the hope was to design something that would scale up so well as manufacturing improved that a few generations of architectures could be skipped, so to speak.
They had hoped the P-IV would eventually reach 10GHz or so, which made it OK that the P-IV retired fewer instructions per clock than the P-III that came before. Scaling up like that isn't such a radical idea; the P6/i686 architecture behind the Pentium Pro, Pentium II and Pentium III had spanned a spectrum from 150MHz to 1.4GHz, after all - nearly an order of magnitude.
But it turned out that somewhere between 3 and 4 GHz, things got really difficult.
"Minor tweaks and desperately increasing clock rates" was more or less the P-IV plan from the get go. It just turned out not to work.
Anandtech discussed this in their Bulldozer review:
> AMD's architects called this pursuit a low gate count per pipeline stage design. By reducing the number of gates per pipeline stage, you reduce the time spent in each stage and can increase the overall frequency of the processor. If this sounds familiar, it's because Intel used similar logic in the creation of the Pentium 4.
There is no trick, just desperate attempts to survive.
10 years ago AMD had the more efficient architecture and Intel had more GHz (remember 2.2GHz Athlons carrying a "3700" PR rating?). Intel's approach was commercially more successful - customers were still buying GHz, and AMD was trying to educate the market about real performance while radiating the impression of a loser who just couldn't get good process and fabs. AMD gave up and decided to pursue a P4-like approach for their new architecture, while Intel hit the GHz "sound barrier" and went the efficiency way by resurrecting a PIII-style architecture, which resulted in the Core CPUs. AMD made a huge, strategic mistake 10 years ago. How the execs in charge of Bulldozer have been BS-ing their way inside AMD for the last 5 years - that is a typical everyday miracle of big-company internal life.
I actually take notice of how many consumer laptops I see at Walmart / Target / Best Buy that are running AMD APUs. The big brands might not, but shoppers see bigger numbers on the A6 than on a Pentium and feel better about the buy, even if the Pentium dominates it.
The trouble is that most computers aren't bought at Best Buy or Walmart, and the ones that are tend to be the low end garbage with no margins for the hardware vendor.
Keeping AMD out of Dell and HP is what kept them out of corporate America. Corporations literally buy PCs by the pallet, and then they pass on the volume discount to employees who want to buy one for home.
> they pass on the volume discount to employees who want to buy one for home
Really? I don't see the use case for ordering a PC this way for the home. When I'm buying a home PC, or recommending one for others, it's either:
(a) A bottom-of-the-barrel PC. As long as it has 1GB of memory and more than one core, you can use it for web browsing, Youtube, email, and word processing. This is what non-techies usually want (but they don't know they want it and may get upsold by good marketing). This is what I want unless I'm planning on running a specific application that requires more.
(b) A powerful PC for gaming. It needs a decent discrete GPU if it's going to play current games. Most office PCs don't have one, unless you work for Pixar.
AFAIK the machines purchased by corporations for general office use are usually middle-of-the-road beasts that cost more than category (a) but don't have the discrete GPU of category (b). I'd guess they'd be a waste of money for home use, even after the volume discount.
"Per-clock efficiency" is generally not goal in itself in CPU design, absolute performance and efficiency are. Where AMD has stumbled is getting the clock up, probably partly related to their unfortunate fab situation
The speed-demon strategy has seen successes historically, Pentium 4's fate notwithstanding. See e.g. the DEC 21164 and the IBM z196.
There was a lot more wrong with the P4 than the clock strategy (not the least of which was the absolutely abysmal chipset Intel married it to), and that's what rendered it such a disaster.
That is what Steamroller will do: increase IPC. This is an interim fix. If they can run Steamroller at these clocks and Intel doesn't do something radical, AMD will be "top dog" again for the first time since the Athlon 64s were killing P4s.
Based on how Bulldozer performed, it'll end up as: sure, it's 5GHz, but did we mention all our instructions take twice as many clocks now?
Virtualization host performance on our 8-core Bulldozer (ESXi 5.1, private KBs from VMware to try to help, 32GB RAM, RAID-10 ZFS SAN) was so bad (think P4 era) that we finally tracked down how to force the CPU into only using 4 cores, one per real FP core.
The reality is that there is no mainstream scheduler out there that can efficiently use cores set up like that, especially with the long pipelines. I'm not sure it can't be done, but what improvements have been made have been minimal, or exist only in academic, not-a-real-OS situations.
That's why Intel ships a compiler, duh.
It is true that the number-one thing holding that part back was raw clock speed (as long as you view it more like a 4-core, 8-thread part à la Intel), but I've gone back to speccing Intel - it's just not worth being that much of a guinea pig for a firm that's basically trying to scrape by until the ARM64 parts start getting stamped.
If it was anything like the Niagara processors, the shared FP units are normally a bottleneck for FP. But the larger problem was the register remapping/pipelines. They were fast, if you were running certain workloads. God help you if you had to compress anything on those systems - without pbzip2 or pigz it took forever. Really bad example, but Bulldozer seemed way too Niagara-ish to me based on its goals.
Running threaded floating point workloads on bulldozer-derived architectures is just folly. If you have parallel floating point code you should in general be running it on a GPU.
These weren't FP-intensive workloads at all - mostly your typical IT I/O workloads. I don't know the internals well enough to say exactly what or why, but something seems to go really wrong on Bulldozer when you try to schedule two different VMs on the same coupled pair of cores.
It's because they're not independent cores. You're pretty much never going to get the same single-thread performance with two threads running on a module as with one. The idea is that you ought to get better than 0.5X the single-thread performance, so that with two threads, 2 x 0.75X is better than X, while still letting you get X (or better, with turbo) on strictly single-threaded workloads.
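A toy version of that arithmetic, reusing the same illustrative 0.75X-per-thread assumption (not a measured figure):

```python
# Toy throughput model for one two-thread module (numbers are illustrative assumptions).
single_thread = 1.0           # one thread alone on the module, normalized to "X"
per_thread_shared = 0.75      # assumed per-thread speed when two threads share the module

module_total = 2 * per_thread_shared   # 1.5X aggregate: more total work than one thread...
print(f"one thread:  {single_thread:.2f}X")
print(f"two threads: {module_total:.2f}X aggregate, {per_thread_shared:.2f}X each")
# ...but each individual thread runs slower than it would with the module to itself.
```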
Where this can fall apart is if you're trying to run eight homogeneous threads at once and the threads have large working sets, such that the second thread causes spill-out from the per-module caches. Then you have eight threads contending for L3 bandwidth, or, if you're really screwed, you fill up the L3 and start hitting main memory.
Out of curiosity, have you tried any of the Abu Dhabi Opterons? They doubled the L3 from 8MB to 2x8MB, which I would expect to help by both keeping you out of main memory and reducing contention by splitting each L3 between half as many cores (assuming you don't get the new twice-as-many-cores models).
Sorry for the late reply. I am just guessing that it was the shared FP, since that's what I thought the major shared resource was. The workloads were bog-standard kind of stuff, so I assume mostly integer work, so it's absolutely possible that it wasn't the FP sharing but something else. I thought that the integer cores all had their own registers, though? Nevertheless, I don't know enough about CPU internals to say - I was just assuming.
Do you know if it used Bulldozer or the updated Piledriver architecture? I just set up a personal home server yesterday with an 8350/16GB and ESXi 5.1, and the performance seems fine enough. Are there any kinds of tasks where the slowness becomes most apparent?
Your parent was using a SAN, which means your parent was in an enterprise setting and probably running 8 guests minimum (1 per core). I'd hazard that is where issues started to crop up.
Sorry for the late reply - that machine was using Bulldozer cores, not Piledriver. Also, I was exaggerating for effect, I guess; that thing has left a bad taste in my mouth. What I really mean is that compared to a mainstream i7 it sucks - you can effectively put twice as much work on the i7, and it has the added bonus of being able to run a CPU-hungry single-threaded job basically twice as fast as the Bulldozer when there isn't much/any CPU contention.
You're comparing the performance of the respective top of the line models. That only matters for bragging rights. Most people don't buy that one, which leaves AMD open to sell chips to people who would have bought midrange Intel chips -- or people who are willing to sacrifice 312.4/300.88 -> ~3.8% performance (which is almost certainly within the margin of error) in order to keep competition alive or because AMD offers a lower price.
Well, I just wanted to guess at what that 5GHz announcement would actually mean in terms of performance.
I'm not convinced ~11 pixels per second is within the margin of error (the numbers were from the single-threaded POV-Ray test) -- but a 3.8% difference certainly means very little in the real world. I'd guess it falls within the bracket that is measurable but ignorable ;-)
Also, the "5GHz chip" will most certainly be AMD's top-of-the-line model?
Indeed. We have a few zEnterprise systems in our corp data center (geeky enough looking that I want one in my basement), running with 5.5GHz chips. https://en.wikipedia.org/wiki/IBM_System_z
They didn't ship the first 64-bit CPU to run Windows either. Windows NT ran on Alpha (among other architectures). Granted the OS was still 32-bit, but the CPU wasn't.
But what do you expect from a press release? It's written by marketing trolls, not engineers.
Does it really qualify as a 5 GHz processor if it only runs at that speed in Turbo mode? (which I assume can only kick in for a few milliseconds...) What is the "normal" speed that it can maintain for more reasonable time periods? How come this isn't mentioned in the press release?
If older chips from Intel and AMD could be overclocked to twice their baseline rates with high-end cooling, I wonder what you could clock this beast up to?
And yes there is a market for this. There are certain workloads that are simply not parallelizable -- they're linear chains of dependencies where the output of the first process goes into the second and so on and each step depends on all N-1 steps.
A classic example being numeric approximations of ordinary differential equations. Parametric curve fitting, however, is an embarrassingly parallel application of chained ODE solutions.
That is to say, for most strictly serial processes, there's an application where you'll want to run it many times on independent data.
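A minimal sketch of that pattern: a deliberately simple forward-Euler ODE solver (serial in time, so a single solve can't be parallelized) swept over independent parameter values in parallel. The equation and parameter grid here are made up for illustration.

```python
from concurrent.futures import ProcessPoolExecutor

def solve_ode(k, y0=1.0, dt=1e-4, steps=100_000):
    """Forward-Euler solve of dy/dt = -k*y: each step depends on the previous one,
    so an individual solve is a strictly serial chain of dependencies."""
    y = y0
    for _ in range(steps):
        y += dt * (-k * y)
    return y

if __name__ == "__main__":
    ks = [0.1 * i for i in range(1, 17)]   # independent parameter values (illustrative)
    # The sweep over parameters is embarrassingly parallel: one serial solve per worker.
    with ProcessPoolExecutor() as pool:
        results = list(pool.map(solve_ode, ks))
    for k, y in zip(ks, results):
        print(f"k={k:.1f} -> y(T)={y:.4f}")
```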
This shows at least part of what I mean - Vishera (slightly lower clocks than what was just announced) loses to Ivy Bridge by a mile in most single-thread tests, but nearly ties in multithreaded ones.
Unfortunately, heat is precisely the reason why no one's competing on clock rates any longer. It's called a thermal wall for a reason. To be honest, this is a pretty sad thing for AMD to do. It'll be interesting to see what kind of coolers will be capable of running this thing at 5GHz all the time, if any. And even if it is possible to run this in turbo mode all the time, what next? Can AMD make a 5.1GHz chip? 5.2? What would they need to compromise on?
Not even remotely the same, beyond the fact that NVIDIA is also upping clocks. On NVIDIA GPUs, all the cores are active at all speeds. You're not going to see the equivalent of the case where 1 core at 2.6GHz beats 2 cores at 1.3GHz.
Do you really want your processors going full bore 24/7 just to prove they can?
At least from what I have observed on my Intel CPU with Turbo Boost, it spends nearly all of its time in the turbo modes. It's only when all the cores are being utilised that it starts reaching its TDP limits and throttling down. Keep in mind that, due to I/O (like memory), the CPU isn't always in its active state (C0) even if a process is using that core 100%.
This is common on multicore Intel chips as well -- the chip itself has a max heat profile that it can't exceed, so if several of the cores are quiet a single core can go much higher than it normally can, indefinitely if it remains the primary worker and there is enough heat dissipation.
The i5 in the new MacBook Air runs at 1.3GHz if both cores are active, but if a single core is active and the environment isn't too hot and ventilation is working, that core can hit 2.6GHz. Which is quite humorous -- you might get much better real-life performance simply by disabling a core.
EDIT: To clarify, I replied because of the supposition by the parent that this turbo mode is "for a few milliseconds". In actual practice it is usually a very significant contributor to performance on modern chips, and as mentioned can be indefinite in some circumstances. Ergo, dramatically more important than implied.
AMD is behind, and they'll use gimmicks where they can. Though since nvidia came up elsewhere, note that nvidia effectively advertises their "turbo" speed as the base and max speed, but in practice you'll often find it regulated to lower speeds for heat reasons.
In this case, however, it's an 8-core chip. Very few current workloads will saturate 8 cores (even on heavily taxed database servers), meaning there's a good chance there is always thermal headroom for individual cores (and thus individual threads) to run at 5GHz.
> Very few current workloads will saturate 8 cores (even on heavily taxed database servers)
It depends. Some tasks - such as processing incoming HTTP requests and building responses, as web servers do - are embarrassingly parallelizable. And if you have an architecture that scales horizontally, with enough network bandwidth, you can saturate as many cores as you want.
Oh for sure there are cases that might saturate all cores. They're just incredibly rare, even on machines that are working at "100%". The case of web servers is an interesting one because benchmarks seldom see them actually running at 100% despite putting all of their combined resources at a problem -- there is usually something else synchronously slowing the flow, or a simple bottleneck like Gbps networking. Even on virtualization servers, one generally leaves enough headroom that the machine is nowhere near saturated.
Not really. In most cases your software is single-threaded anyway, unless you implement some sort of parallelism yourself, and when you do that it usually should make sense. But in the end it's the OS that shuffles the threads around on the cores. If you disable a core, the others can run at higher clock speeds, which might or might not be beneficial to your program depending on the work it does.
Don't let the numbers fool you, though. POWER6 achieved those speeds due to a change in chip architecture that actually wound up making them worse processors than the much lower clocked POWER5s in a lot of situations. IBM reversed course and changed the chip architecture back for the POWER7s, which are clocked lower and outperform the POWER6s.
I'm not an expert here, but it looks like IBM is back up to shipping 5.5GHz chips in the zEC12 as of December 2012. Are these still POWER6 chips as an option, or fixed and faster POWER7?
A lot of that comparative slowness was caused by mechanical storage, replacing it with parallel and lower-latency SSDs makes it a lot easier to get full use out of more and faster cores. It doesn't cost much to set up a system that's completely CPU-limited on OLAP database-like workloads these days.
I suspect that as more software stops being optimized for ~10ms serial disk I/O with huge caches, this will become more common, and more and faster cores will be a big(ger) deal.
Can you hear it - the shrieks of all those D-14s and Phanteks coolers screaming.
I would like to see a review, though. And pricing. If it has decent single-thread performance, then with that number of cores and all next-gen games being multithreaded by default it could be a compelling processor, if it's in the 4770 price range.
Not really, because AMD's 8-physical-core layout is still 4 modules, each with 2 integer register sets sharing a front end and FPU. It is exactly like a hyperthreaded Intel part, but Intel has much better per-clock performance.
That, and I see next-gen games more likely favoring OpenCL / GL 4.3 compute shaders to offload their parallel workloads than aggressively optimizing for greater-than-4-core processors. Moving traditionally CPU-bound workloads (per-agent logic, pathfinding, collision detection) to compute-class GPUs (when available, with a CPU fallback for now) gives you significantly more return than optimizing for the CPU.
Also, you can take a 4770K to near 5GHz on air. This part is already pushing the thermal limits of the Bulldozer architecture; AMD is only shipping them at this high a speed because they are floundering in the low per-clock performance rut the entire architecture put them in.
Now, I would point any budget-oriented gamer to the 4- or 6-core AMD models around $120-130, because since they are all unlocked, you can get real performance gains (but terrible power efficiency) over Intel parts below the single 4-core unlocked part Intel puts out each generation. Since they are effectively 2/3-module parts, they are well suited to the next gen of GPGPU-everything-in-the-engine, letting the CPU handle control flow.
If you even approach $200, the performance gains from jumping from any non-K part to a 4.8GHz 4670K are huge, and that alone outclasses every AMD CPU for gaming, though it does trade blows on some titles with the 8-core parts.
It'll be interesting to see if AMD cornering both the new Xbox and PS4 has an effect on game engines for the PC as well -- specifically whether the tuning that will probably go into console versions will translate to PC, and whether or not Intel (and NVIDIA) will end up being penalized as a result.
The problem is that clock doesn't really mean anything concrete in terms of real world performance. It's strictly a marketing thing.
For an example, what if a chip used a 10 GHz clock for distribution, and divided it down to 5 GHz everywhere it was actually used (not that I know of any reason to do such a thing besides marketing). Would it be marketable as a 10 GHz chip? The manufacturer would certainly be in hot water if enthusiasts ever found out...
Even without such contrived scenarios, CPUs get different amounts of stuff done per clock.
Something I keep seeing, even on Slashdot and Hacker News, is the idea that a CPU that has to clock higher for a given performance will use more power. It seems to me that if you've got double the clock, the likely explanation is that half the transistors are switching per clock, and power consumption should be orthogonal to clock/IPC ratio.
If anyone's got any contrary ideas on that, I'd love to hear them. All I can think of is that higher clocks would correlate with longer pipelines, but Bulldozer's pipeline isn't even that long.
"is the idea that a CPU that has to clock higher for a given performance will use more power."
This is like a dog whistle to the EEs; they're going to get all riled up by programmers with screwdrivers. You can model a stereotypical FET gate as a capacitor: all you're really doing is charging and discharging capacitors, either in FET gates or the theoretical capacitance of the transmission lines. Right out of the C=Q/V definition of what capacitance is, mushed up against some Ohm's law and some algebra, you end up with P = C times V squared times F. So you can see the intense excitement in lowering core voltages and making gates and lines smaller (lowering C), all in a tradeoff to improve the P/F or F/P (whatever) ratio.
The important part is it's pretty easy: right out of Ohm's law and the definition of what capacitance is, power is directly proportional to frequency.
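A back-of-the-envelope rendering of that C*V^2*F relationship; the capacitance, voltage, and frequency figures below are invented for illustration, not specs for this chip:

```python
def dynamic_power(c_switched_f, v_core, f_hz):
    """Classic CMOS dynamic-power estimate: P = C * V^2 * f."""
    return c_switched_f * v_core**2 * f_hz

# Made-up effective switched capacitance of 20 nF; real chips vary widely.
base = dynamic_power(c_switched_f=20e-9, v_core=1.2, f_hz=4.0e9)   # ~115 W
turbo = dynamic_power(c_switched_f=20e-9, v_core=1.4, f_hz=5.0e9)  # higher V needed for higher f
print(f"base:  {base:.0f} W")
print(f"turbo: {turbo:.0f} W  ({turbo / base:.2f}x the base figure)")
# Frequency alone scales power linearly; the voltage bump needed to reach that
# frequency scales it quadratically, which is why the combined hit is so large.
```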
> The important part is it's pretty easy: right out of Ohm's law and the definition of what capacitance is, power is directly proportional to frequency.
There's also the fact that your transistors have a particular voltage that they switch state at, which means that they switch faster if you drive the gate/line capacitance with a higher voltage.
Which means that chips designed for lower frequencies can be designed to use lower voltages, which can save far more power than what would be directly proportional to the lower frequency.
In "CS" terms that may be better understood on HN than "EE" terms: electrical power scales O(n^2) with voltage and O(n) with frequency.
If you really wanna get people riled up and talking you can roll out the old power "EE" stuff about maximum power transfer happening when source and sink impedance are the same, and you want to get the most bang for your buck so you'd like that, right, and a transistor gate being near infinite resistance would imply ... Or if you like to think about interconnects being signal to noise level limited, then an RF analysis about noise voltage across a resistor vs preamp noise figure vs current bias from a communications standpoint would imply... But it turns out in practice most of the time, the first mental model is by far the most effective way to look at it compared to these.
> It seems to me that if you've got double the clock, the likely explanation is that half the transistors are switching per clock
Suppose CPU A has an adder, that takes one clock cycle to run an add instruction. When two registers are being added, the instruction goes thru the entire adder in one clock cycle and affects on average some % of the transistors.
Suppose CPU B has a pipelined adder that takes two clock cycles to run an add instruction. When two registers are being added, the instruction goes thru half of the adder in one cycle, and the other half in the next cycle, and affects about half of that same % of the transistors each time. BUT! This is a pipelined adder, and doesn't just do one instruction at a time. During the first cycle, when our instruction is in the first part of the adder, some other add instruction is still going thru the second part of the adder and affecting the other half of whatever % of the transistors. And during the second cycle of our instruction, the next instruction is going thru the first half. So even tho any one instruction only affects half of the adder at a time, the entire adder still gets affected every clock cycle.
In that example, CPU B's adder can also be clocked twice as fast. If so, it's getting twice the work done and using twice the power (ignoring cache misses and the like for the moment). If it's clocked the same as A, its performance and power usage will be almost the same as A's.
Roughly speaking, power used = transistors switching per unit time. Performance should also follow that pretty closely, depending on the efficiency of the design. At some level, you should be able to look at any instruction and find a corresponding number of transistors that need to switch for it to execute.
Deep pipelining keeps more silicon active at any given time, increasing both performance and power consumption. Because of cache misses and the like, efficiency will drop somewhat. Double the stages also doesn't quite equal double the switches per time, for various reasons. Therefore, deeper pipelines = worse performance per watt but better performance per dollar (not sure how well that'll hold in ridiculous cases like Prescott).
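To put rough numbers on the adder example above, here's a toy model of pipeline depth versus clock rate; the logic depth and latch overhead are invented for illustration:

```python
def pipeline_model(total_logic_ns, stages, latch_overhead_ns=0.05):
    """Toy model: splitting the same logic across more stages shortens the critical
    path per stage, so the clock can rise, while throughput stays at one
    instruction per cycle once the pipeline is full."""
    cycle_ns = total_logic_ns / stages + latch_overhead_ns
    freq_ghz = 1.0 / cycle_ns
    latency_ns = stages * cycle_ns          # one instruction still makes the full trip
    return freq_ghz, latency_ns

for stages in (1, 2, 4, 8):
    f, lat = pipeline_model(total_logic_ns=1.0, stages=stages)
    print(f"{stages} stage(s): {f:.2f} GHz clock, {lat:.2f} ns latency per instruction")
```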
From what I heard, Bulldozer only has one more stage than Haswell (15 vs. 14, don't quote me on that) - not nearly enough to account for the differences we see between them.
What I'm noting is that there are many, many more factors at play than just pipelining. In the case of Bulldozer, I've been hearing quite a bit about minor parts that they found needed more work, most notably branch prediction. It sounds like they've got lots of things that will improve performance with no power or die size downsides. The number I saw bandied about for Steamroller was a 30% performance increase. I have some trouble believing it's quite that big, but if they pull it off, that will be an amazing chip for being 32nm. It hints to me that the macroscale architecture is A-OK, and they just screwed up some small but important things.
> It seems to me that if you've got double the clock, the likely explanation is that half the transistors are switching per clock, and power consumption should be orthogonal to clock/IPC ratio.
Nope; a lot of the latches are switching every cycle, so power is higher at higher frequency. This is what doomed NetBurst-style design.
Couldn't a 90nm transistor switch at 8 GHz or so in this kind of application? I'm not sure of the exact numbers, but at 1/16th the area occupied, capacitance is much lower, letting it switch far faster.
Just making up some numbers, how about 30% of gates switch on every clock, and 3x the switching speed for modern gates (it's probably much higher, but I'm being conservative here).
So basically, NetBurst is ridiculous, though that shouldn't be news to anyone. Bulldozer doesn't look to be doing so bad as all that, and the numbers improve if the speed is more than 3x.
(I have no idea what the real numbers are, if someone tells me I'll update this.)
I remember, like ~10 years ago on Slashdot, some people overclocking to 7-8GHz. Of course this was on single-core chips, but we've really pretty much completely stalled on the MHz progression, haven't we?
Clock speed is not a meaningful end unto itself, and that's why it stalled. It was used as a proxy for speed for many years, and this led to its rampant inflation. Instructions per second (IPS) is a more meaningful metric for CPU speed, and that has by no means stalled, even on a per-core basis.
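Made-up numbers showing why instructions per second, not clock, is the figure that matters:

```python
# Two hypothetical chips; the GHz and IPC figures are invented for illustration.
chips = {
    "high-clock, low-IPC":   {"ghz": 5.0, "ipc": 1.2},
    "lower-clock, high-IPC": {"ghz": 3.9, "ipc": 2.0},
}
for name, c in chips.items():
    gips = c["ghz"] * c["ipc"]   # billions of instructions per second per core
    print(f"{name}: {c['ghz']} GHz x {c['ipc']} IPC = {gips:.1f} GIPS")
```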
I don't think these are practical limitations, more like limitations to be able to sell laptops and desktops.
If we told Intel they could burn up to 350 watts on the CPU and a 25lb heatsink was acceptable, we'd probably have 10GHz processors. Problem is, there isn't a large market for that. Home users don't want a big, ugly, noisy box, and server buyers would prefer power and heat savings. Supercomputers just tie all this stuff together instead of creating some monster single core.
Actually, this was the strategy with the Pentium 4. It was a fast and power-hungry single core. Turns out, efficiency per cycle and multicore are just superior solutions.
Also, I think there are some physical limitations that keep chips below a certain clock speed. Besides, the bet has been on "smarter instead of faster", i.e. producing chips that suit our computing needs, which are more adequately supported by parallel processing.
High-performance cores are useful for problems that are hard to parallelize, but so far it seems the breakthrough only occurs when a new approach makes the problem feasible on multiprocessing platforms (e.g. graph processing is hard to parallelize due to dependencies among graph nodes; Pregel and similar offer a different approach)... a 50GHz CPU won't save you if you need to process a huge graph (billions of nodes) on a single thread; it'll always take a lot of time.
As to the "record", I think IBM already had a System z running over 5GHz.
> I think there are some physical limitations that keep chips below a certain clock speed.
Not hard limits, but yes, to my knowledge it is primarily physics that keep chips where they are. The requirements for power and heat dissipation start to balloon.
The name is FX-9590. It has 8 compute cores. AMD's internal designation for this generation is "Piledriver". They've chosen to name their high-end compute family after construction equipment (their previous ones were named after racing tracks). 5GHz Max Turbo is also not part of the name; it is a description of its performance. Its "baseline" performance is probably something like 4GHz (pulling that out of my ass). The Max Turbo refers to the fact that, using their thermal management system, they can peak at least one of the cores at 5GHz for some period of time. The "Max" is in there because there are intermediate turbo speeds for varying thermal situations and CPU loads.
AMD used to do something like this back in the K5 era. Instead of advertising megahertz, the processors were sold based on a "performance rating" which attempted to match them up to equivalent Pentium chips based on a set of benchmarks.
So a K5 PR-200 was actually a 133MHz chip, but it could match or exceed a Pentium 200MHz in some well-selected benchmarks.
There is no such thing as a _universally_ "relevant benchmark". They will never agree on a testing suite since performance varies too much. They have no reason to believe the test manufacturer is impartial.
The CPU has different clock speeds depending on how much it is being used. The upper limit is set by thermal and power constraints. If you're using all the cores, the limit is relatively low. If you're only using one or two cores, the power management system will clock them higher. Since the higher clock speed is only available under certain workloads, it's called "Turbo".
It was a wrestling move before it was a sex position. And the wrestling move was named after an actual tool to drive piles. https://en.wikipedia.org/wiki/Pile_driver
Back in the early 90s, it was an actual scandal that the Pentium had such a huge heat sink. The magazines (at that time, computing magazines were quite popular) were joking that next they'd put a fan on the CPU. Haha.
Those were the days, really, because there was still the possibility that in the future you'd have any damned thing in your computer, not necessarily an x86. You could have an Alpha or a SPARC or PPC or maybe an i960. And it would be silent and use no power and you'd install it in your bitchin' conversion van.
I wonder how two of these (16 cores) would compare against the new Mac Pro with the best CPU, in software that benefits from many cores, like 3D rendering and virtualization.
I'd bet they are close, while the AMD only costs a fraction of the Xeon. I know it's not a fair comparison since the FX-9000 is not a workstation CPU, but still...
The multi-threaded POV-Ray and Cinebench tests are just about the only two benchmarks where the AMD 8-cores beat the i7 2600K, and even there just barely.
The Intel chips soundly win in anything else (encoding, Photoshop...), and by almost 2X in some of the single-threaded tests.
Yes AMD, we all know you like your big numbers, like core counts and clock speed. It would, however, be just excellent if you could put out a product whose single-threaded performance isn't garbage. I mean, Thubans are beating your newest and greatest!
But at least you can say you've got a bigger cache, clock speed, core count, and debt than Intel.
In the end it's not the frequencies nor number of cores but performance per watt that matters.
Most computers run on batteries these days, and those that don't drain ever more expensive electricity from the wall socket and at the same time waste a lot of it producing huge amounts of heat.
The more you get out of a watt the better. You can either trade in speed for lower power or trade in power for better performance, but in either case you want the performance/watt ratio to be the highest.
I would guess the power consumption of running the chip at 5GHz is pretty high. And running temperatures as well. And yet there are fewer and fewer of those huge tasks that you can only do with one core.
Since the Pentium 133 I've never had an Intel processor in a desktop computer. I wanted one a few times, but AMD was always cheaper for the same speed. Sure, the fastest parts were almost always the Intel ones, but the additional bit of speed never justified the price.
Hyperthreading keeps two threads "hot" in each physical core. When one thread is waiting on memory access, the core can do work on the other thread rather than sitting idle. (Memory access isn't that slow, so switches need to be fast to capture those otherwise-wasted cycles, which is why this is a CPU hardware feature rather than an OS-level software feature.)
Purely CPU-bound tasks [1] don't get any performance gains from HT. But almost all real-world applications spend a lot of time reading and writing memory, and memory access is pretty slow compared to CPU speeds, so in practice HT helps (otherwise Intel wouldn't have bothered to develop it and put it on their chips, which probably cost a lot of money).
> Have you checked how disabling it influences compiling speed?
No. But I'd guess it would be substantially less than 100% speedup since they aren't actual, physical cores; but substantially more than 0% speedup since the compiler uses dozens or even low hundreds of megabytes of memory.
[1] By "CPU-bound" I mean register-to-register arithmetic. You might also be able to get away with hitting the L1 cache, which is a few KB, without triggering an HT context switch.
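A back-of-the-envelope model of the stall-filling argument; the miss rate, miss penalty, and the share of stall cycles the sibling thread can actually reclaim are all assumptions, not measurements:

```python
# Rough issue-slot model for one core, with and without a second hardware thread.
cpi_compute = 1.0        # cycles per instruction if memory were free (assumed)
miss_rate = 0.02         # cache misses per instruction (assumed)
miss_penalty = 200       # stall cycles per miss (assumed)

cpi_one_thread = cpi_compute + miss_rate * miss_penalty    # 5.0 cycles per instruction
busy_fraction = cpi_compute / cpi_one_thread               # core issues work ~20% of cycles
idle_fraction = 1 - busy_fraction                          # ~80% of cycles are stalls

reclaim = 0.5            # assumed share of stall cycles the sibling thread can actually use
throughput_one = 1 / cpi_one_thread
throughput_two = throughput_one + reclaim * idle_fraction / cpi_compute
print(f"one thread:  {throughput_one:.2f} instructions/cycle")
print(f"two threads: {throughput_two:.2f} instructions/cycle total")
# Real gains are usually far smaller: the two threads also fight over caches,
# TLBs, and memory bandwidth, which is part of why the speedup is well under 100%.
```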
How ironic that AMD are now suffering from the same thing that once gave them the edge (that is, Pentium 4's overly aggressive clock speed roadmap and lacklustre per-clock efficiency).
I guess their marketing was out of other ideas and just went back to the well one more time. It has probably been 10 years now since I really considered CPU clock speed as a factor when buying a computer.
I've yet to find an actual source for that figure whenever it comes up. Is it just some tech-site comment-section spitballing, or did they actually disclose the TDP?
If there's a dimensioned pic of the heatsink, a bored-enough engineer could calculate the theoretical degC/W rating of the heatsink, and given a presumably constant deltaT, there's your wattage.
Doesn't have to be dimensioned that accurately. To a first approximation a 1% error in surface area would be about a 1% error in TDP.
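A rough version of that calculation; the thermal resistance and temperatures below are placeholders, not figures for this part:

```python
def power_estimate(theta_c_per_w, t_sink_c, t_ambient_c):
    """Steady-state estimate: dissipated power ~= deltaT / thermal resistance."""
    return (t_sink_c - t_ambient_c) / theta_c_per_w

# e.g. a heatsink around 0.30 degC/W, 65 C at the sink, 25 C ambient air:
watts = power_estimate(theta_c_per_w=0.30, t_sink_c=65.0, t_ambient_c=25.0)
print(f"~{watts:.0f} W dissipated")   # ~133 W; a 1% error in theta shifts this by ~1%
```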
I'd like to see very high temp CPU technology. That would be an interesting, challenging direction for hardware tech to move. A tiny lightweight 5 deg C/W heatsink is plenty if you're allowed to run at, say, vacuum tube redhot glow temperatures. I'm well aware of the solid state physics challenges of this, that's why I think it would be very interesting to see if anyone could pull it off.
I doubt that a heatsink on an engineering-sample test board would be sized to within a 1% margin. Seems more likely they'd err generously on the side of big.
Agreed - I don't know how you're going to cool that thing quietly or cheaply. I think the thermal load would be less on two CPU chips running at two-thirds the GHz. It might be cheaper to build as well.