This is pretty important for AMD because they've been having a terrible time matching Intel on per-cycle efficiency. Basically, ever since the Core i series started, AMD has been behind on single-threaded performance, especially clock for clock. They've been trying to make up for it by offering more cores/threads than comparably priced Intel parts and by scaling up their clock rates. Unfortunately for AMD, their last few generations have fallen short on those clock-rate targets, so while the comparably priced AMD chip would typically have a clock speed advantage over the Intel one, it was often not enough to overcome Intel's per-clock efficiency.
This product release is kind of an attempt to show that AMD can actually deliver on their planned strategy.
As for why AMD is going this route rather than trying to beat Intel on per-clock efficiency? Probably because AMD's resources are severely limited compared to Intel's, and this approach offered lower risk at lower cost.
> As for why AMD is going this route rather than trying to beat Intel on per-clock efficiency?
Well, because clock speeds are something they can improve now, not in $n years' time when their next major microarchitecture is ready. Intel had the exact same problem with the Pentium 4, and they were similarly stuck with minor tweaks and desperately increasing clock rates for years before Core was ready.
It was my understanding that P-IV was originally conceived as an architecture that could be clocked up and up for years to come. Developing a new architecture is expensive and risky (see P-IV, Itanium), so the hope was to design something that would scale up so well as manufacturing improved that a few generations of architectures could be skipped, so to speak.
They had hoped the P-IV would eventually reach 10GHz or so, which made it OK that the P-IV retired fewer instructions per clock than the P-III that came before. Scaling up like that isn't such a radical idea; the P6/i686 architecture behind the Pentium Pro, Pentium II and Pentium III had spanned a spectrum from 150MHz to 1.4GHz, after all - nearly an order of magnitude.
But it turned out that somewhere between 3 and 4 GHz, things got really difficult.
"Minor tweaks and desperately increasing clock rates" was more or less the P-IV plan from the get go. It just turned out not to work.
Anandtech discussed this in their Bulldozer review:
> AMD's architects called this pursuit a low gate count per pipeline stage design. By reducing the number of gates per pipeline stage, you reduce the time spent in each stage and can increase the overall frequency of the processor. If this sounds familiar, it's because Intel used similar logic in the creation of the Pentium 4.
There is no trick, just desperate attempts to survive.
10 years ago AMD had the more efficient architecture and Intel had more GHz (remember 2.2GHz Athlons carrying a "3700" PR rating?). Intel's approach was commercially more successful - customers were still buying GHz, and AMD was trying to educate the market about real performance while radiating the impression of a loser who just couldn't get good process and fabs. AMD gave up and decided to pursue a P4-like approach for their new architecture, while Intel hit the GHz "sound barrier" and went the efficiency way by resurrecting a PIII-style architecture, which resulted in the Core CPUs. AMD made a huge, strategic mistake 10 years ago. How the execs in charge of Bulldozer have been BS-ing their way inside AMD for the last 5 years - that is a typical everyday miracle of big-company internal life.
I actually take notice of how many consumer laptops I see at Walmart / Target / Best Buy that are running AMD APUs. The big brands might not, but shoppers see bigger numbers on the A6 than on a Pentium and feel better about the buy, even if the Pentium dominates it.
The trouble is that most computers aren't bought at Best Buy or Walmart, and the ones that are tend to be the low end garbage with no margins for the hardware vendor.
Keeping AMD out of Dell and HP is what kept them out of corporate America. Corporations literally buy PCs by the pallet, and then they pass on the volume discount to employees who want to buy one for home.
> they pass on the volume discount to employees who want to buy one for home
Really? I don't see the use case for ordering a PC this way for the home. When I'm buying a home PC, or recommending one for others, it's either:
(a) A bottom-of-the-barrel PC. As long as it has 1GB of memory and more than one core, you can use it for web browsing, Youtube, email, and word processing. This is what non-techies usually want (but they don't know they want it and may get upsold by good marketing). This is what I want unless I'm planning on running a specific application that requires more.
(b) A powerful PC for gaming. It needs a decent discrete GPU if it's going to play current games. Most office PCs don't have one, unless you work for Pixar.
AFAIK the machines purchased by corporations for general office use are usually middle-of-the-road beasts that cost more than category (a) but don't have the discrete GPU of category (b). I'd guess they'd be a waste of money for home use, even after the volume discount.
"Per-clock efficiency" is generally not goal in itself in CPU design, absolute performance and efficiency are. Where AMD has stumbled is getting the clock up, probably partly related to their unfortunate fab situation
The speed-demon strategy has seen successes historically, Pentium 4's fate notwithstanding. See e.g. the DEC 21164 and the IBM z196.
There was a lot more wrong with the P4 than the clock strategy (not the least of which was the absolutely abysmal chipset Intel married it to), and that's what rendered it such a disaster.
That is what Steamroller will do: increase IPC. This is an interim fix. If they can run Steamroller at these clocks and Intel doesn't do something radical, AMD will be "top dog" again for the first time since the Athlon 64s were killing P4s.
Based on how Bulldozer performed, it'll end up as: sure, it's 5GHz, but did we mention all our instructions take twice as many clocks now?
Virtualization host performance on our 8-core Bulldozer (ESXi 5.1, private KBs from VMware to try to help, 32GB RAM, RAID-10 ZFS SAN) was so bad (think P4 era) that we finally tracked down how to force the CPU into only using 4 cores, one per real FP core.
The reality is that there is no mainstream scheduler out there that can efficiently use cores set up like that, especially with the long pipelines. I'm not sure it can't be done, but what improvements have been made have been minimal, or exist only in academic, not-a-real-OS situations.
That's why Intel ships a compiler, duh.
It is true that the number-one thing holding that part back was raw clock speed (as long as you view it more like a 4-core, 8-thread part à la Intel), but I've gone back to speccing Intel - it's just not worth being that much of a guinea pig for a firm that's basically trying to scrape by until the ARM64 parts start getting stamped.
If it was anything like the Niagara processors, the shared FP units are normally a bottleneck for FP. But the larger problem was the register remapping/pipelines. They were fast, if you were running certain workloads. God help you if you had to compress anything on those systems - without pbzip2 or pigz it took forever. Really bad example, but Bulldozer seemed way too Niagara-ish to me based on its goals.
Running threaded floating point workloads on bulldozer-derived architectures is just folly. If you have parallel floating point code you should in general be running it on a GPU.
These weren't FP-intensive workloads at all - mostly your typical IT I/O workloads. I don't know the internals well enough to say exactly what or why, but something seems to go really wrong on Bulldozer when you try to schedule two different VMs on the same coupled pair of cores.
It's because they're not independent cores. You're pretty much never going to get the same single-thread performance with two threads running on a module as with one. The idea is that you ought to get better than 0.5X the single-thread performance, so that with two threads, 2 x 0.75X is better than X, while still letting you get X (or better, with turbo) on strictly single-threaded workloads.
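A toy version of that arithmetic, reusing the same illustrative 0.75X-per-thread assumption (not a measured figure):

```python
# Toy throughput model for one two-thread module (numbers are illustrative assumptions).
single_thread = 1.0           # one thread alone on the module, normalized to "X"
per_thread_shared = 0.75      # assumed per-thread speed when two threads share the module

module_total = 2 * per_thread_shared   # 1.5X aggregate: more total work than one thread...
print(f"one thread:  {single_thread:.2f}X")
print(f"two threads: {module_total:.2f}X aggregate, {per_thread_shared:.2f}X each")
# ...but each individual thread runs slower than it would with the module to itself.
```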
Where this can fall apart is if you're trying to run eight homogeneous threads at once and the threads have large working sets, such that the second thread causes spill-out from the per-module caches. Then you have eight threads contending for L3 bandwidth, or, if you're really screwed, you fill up the L3 and start hitting main memory.
Out of curiosity, have you tried any of the Abu Dhabi Opterons? They doubled the L3 from 8MB to 2x8MB, which I would expect to help by both keeping you out of main memory and reducing contention by splitting each L3 between half as many cores (assuming you don't get the new twice-as-many-cores models).
Sorry for the late reply. I am just guessing that it was the shared FP, since that's what I thought the major shared resource was. The workloads were bog-standard kind of stuff, so I assume mostly integer work, so it's absolutely possible that it wasn't the FP sharing but something else. I thought that the integer cores all had their own registers, though? Nevertheless, I don't know enough about CPU internals to say - I was just assuming.
Do you know if it used Bulldozer or the updated Piledriver architecture? I just set up a personal home server yesterday with an 8350/16GB and ESXi 5.1, and the performance seems fine enough. Are there any kinds of tasks where the slowness becomes most apparent?
Your parent was using a SAN, which means your parent was in an enterprise setting and probably running 8 guests minimum (1 per core). I'd hazard that is where issues started to crop up.
Sorry for the late reply - that machine was using Bulldozer cores, not Piledriver. Also, I was exaggerating for effect, I guess; that thing has left a bad taste in my mouth. What I really mean is that compared to a mainstream i7 it sucks - you can effectively put twice as much work on the i7, and it has the added bonus of being able to run a CPU-hungry single-threaded job basically twice as fast as the Bulldozer when there isn't much/any CPU contention.
You're comparing the performance of the respective top of the line models. That only matters for bragging rights. Most people don't buy that one, which leaves AMD open to sell chips to people who would have bought midrange Intel chips -- or people who are willing to sacrifice 312.4/300.88 -> ~3.8% performance (which is almost certainly within the margin of error) in order to keep competition alive or because AMD offers a lower price.
Well, I just wanted to guess at what that 5GHz announcement would actually mean in terms of performance.
I'm not convinced ~11 pixels per second is within the margin of error (the numbers were from the single-threaded POV-Ray test) -- but a 3.8% difference certainly means very little in the real world. I'd guess it falls within the bracket that is measurable but ignorable ;-)
Also, the "5GHz chip" will most certainly be AMD's top-of-the-line model?
Indeed. We have a few zEnterprise systems in our corp data center (geeky enough looking that I want one in my basement), running with 5.5GHz chips. https://en.wikipedia.org/wiki/IBM_System_z
They didn't ship the first 64-bit CPU to run Windows either. Windows NT ran on Alpha (among other architectures). Granted the OS was still 32-bit, but the CPU wasn't.
But what do you expect from a press release? It's written by marketing trolls, not engineers.
Does it really qualify as a 5 GHz processor if it only runs at that speed in Turbo mode? (which I assume can only kick in for a few milliseconds...) What is the "normal" speed that it can maintain for more reasonable time periods? How come this isn't mentioned in the press release?
If older chips from Intel and AMD could be overclocked to twice their baseline rates with high-end cooling, I wonder what you could clock this beast up to?
And yes there is a market for this. There are certain workloads that are simply not parallelizable -- they're linear chains of dependencies where the output of the first process goes into the second and so on and each step depends on all N-1 steps.
A classic example being numeric approximations of ordinary differential equations. Parametric curve fitting, however, is an embarrassingly parallel application of chained ODE solutions.
That is to say, for most strictly serial processes, there's an application where you'll want to run it many times on independent data.
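A minimal sketch of that pattern: a deliberately simple forward-Euler ODE solver (serial in time, so a single solve can't be parallelized) swept over independent parameter values in parallel. The equation and parameter grid here are made up for illustration.

```python
from concurrent.futures import ProcessPoolExecutor

def solve_ode(k, y0=1.0, dt=1e-4, steps=100_000):
    """Forward-Euler solve of dy/dt = -k*y: each step depends on the previous one,
    so an individual solve is a strictly serial chain of dependencies."""
    y = y0
    for _ in range(steps):
        y += dt * (-k * y)
    return y

if __name__ == "__main__":
    ks = [0.1 * i for i in range(1, 17)]   # independent parameter values (illustrative)
    # The sweep over parameters is embarrassingly parallel: one serial solve per worker.
    with ProcessPoolExecutor() as pool:
        results = list(pool.map(solve_ode, ks))
    for k, y in zip(ks, results):
        print(f"k={k:.1f} -> y(T)={y:.4f}")
```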
This shows at least part of what I mean - Vishera (slightly lower clocks than what was just announced) loses to Ivy Bridge by a mile in most single-thread tests, but nearly ties in multithreaded ones.
Unfortunately, heat is precisely the reason why no one's competing on clock rates any longer. It's called a thermal wall for a reason. To be honest, this is a pretty sad thing for AMD to do. It'll be interesting to see what kind of coolers will be capable of running this thing at 5GHz all the time, if any. And even if it is possible to run this in turbo mode all the time, what next? Can AMD make a 5.1GHz chip? 5.2? What would they need to compromise on?
Not even remotely the same, beyond the fact that NVIDIA is also upping clocks. On NVIDIA GPUs, all the cores are active at all speeds. You're not going to see the equivalent of the case where 1 core at 2.6GHz beats 2 cores at 1.3GHz.
Do you really want your processors going full bore 24/7 just to prove they can?
At least from what I have observed on my Intel CPU with Turbo Boost, it spends nearly all of its time in the turbo modes. It's only when all the cores are being utilised that it starts reaching its TDP limits and throttling down. Keep in mind that, due to I/O (like memory), the CPU isn't always in its active state (C0) even if a process is using that core 100%.
This is common on multicore Intel chips as well -- the chip itself has a max heat profile that it can't exceed, so if several of the cores are quiet a single core can go much higher than it normally can, indefinitely if it remains the primary worker and there is enough heat dissipation.
The i5 in the new MacBook Air runs at 1.3GHz if both cores are active, but if a single core is active and the environment isn't too hot and ventilation is working, that core can hit 2.6GHz. Which is quite humorous -- you might get much better real-life performance simply by disabling a core.
EDIT: To clarify, I replied because of the supposition by the parent that this turbo mode is "for a few milliseconds". In actual practice it is usually a very significant contributor to performance on modern chips, and as mentioned can be indefinite in some circumstances. Ergo, dramatically more important than implied.
AMD is behind, and they'll use gimmicks where they can. Though since nvidia came up elsewhere, note that nvidia effectively advertises their "turbo" speed as the base and max speed, but in practice you'll often find it regulated to lower speeds for heat reasons.
In this case, however, it's an 8-core chip. Very few current workloads will saturate 8 cores (even on heavily taxed database servers), meaning there's a good chance there is always thermal headroom for individual cores (and thus individual threads) to run at 5GHz.
> Very few current workloads will saturate 8 cores (even on heavily taxed database servers)
It depends. Some tasks - such as processing incoming HTTP requests and building responses, as web servers do - are embarrassingly parallelizable. And if you have an architecture that scales horizontally, with enough network bandwidth, you can saturate as many cores as you want.
Oh for sure there are cases that might saturate all cores. They're just incredibly rare, even on machines that are working at "100%". The case of web servers is an interesting one because benchmarks seldom see them actually running at 100% despite putting all of their combined resources at a problem -- there is usually something else synchronously slowing the flow, or a simple bottleneck like Gbps networking. Even on virtualization servers, one generally leaves enough headroom that the machine is nowhere near saturated.
Not really. In most cases your software is single-threaded anyway, unless you implement some sort of parallelism yourself, and when you do that it usually should make sense. But in the end it's the OS that shuffles the threads around on the cores. If you disable a core, the others can run at higher clock speeds, which might or might not be beneficial to your program depending on the work it does.
Don't let the numbers fool you, though. POWER6 achieved those speeds due to a change in chip architecture that actually wound up making them worse processors than the much lower clocked POWER5s in a lot of situations. IBM reversed course and changed the chip architecture back for the POWER7s, which are clocked lower and outperform the POWER6s.
I'm not an expert here, but it looks like IBM is back up to shipping 5.5GHz chips in the zEC12 as of December 2012. Are these still POWER6 chips as an option, or fixed and faster POWER7?
A lot of that comparative slowness was caused by mechanical storage, replacing it with parallel and lower-latency SSDs makes it a lot easier to get full use out of more and faster cores. It doesn't cost much to set up a system that's completely CPU-limited on OLAP database-like workloads these days.
I suspect that as more software stops being optimized for ~10ms serial disk I/O with huge caches, this will become more common, and more and faster cores will be a big(ger) deal.
Can you hear it - the shrieks of all those D-14s and Phanteks coolers screaming.
I would like to see a review, though. And pricing. If it has decent single-thread performance, then with that number of cores and all next-gen games being multithreaded by default it could be a compelling processor, if it's in the 4770 price range.
Not really, because AMD's 8-physical-core layout is still 4 modules, each with 2 integer register sets sharing a front end and FPU. It is exactly like a hyperthreaded Intel part, but Intel has much better per-clock performance.
That, and I see next-gen games more likely favoring OpenCL / GL 4.3 compute shaders to offload their parallel workloads than aggressively optimizing for greater-than-4-core processors. Moving traditionally CPU-bound workloads (per-agent logic, pathfinding, collision detection) to compute-class GPUs (when available, with a CPU fallback for now) gives you significantly more return than optimizing for the CPU.
Also, you can take a 4770K to near 5GHz on air. This part is already pushing the thermal limits of the Bulldozer architecture; AMD is only shipping them at this high a speed because they are floundering in the low per-clock performance rut the entire architecture put them in.
Now, I would point any budget-oriented gamer to the 4- or 6-core AMD models around $120-130, because since they are all unlocked, you can get real performance gains (but terrible power efficiency) over Intel parts below the single 4-core unlocked part Intel puts out each generation. Since they are effectively 2/3-module parts, they are well suited to the next gen of GPGPU-everything-in-the-engine, letting the CPU handle control flow.
If you even approach $200, the performance gains from jumping from any non-K part to a 4.8GHz 4670K are huge, and that alone outclasses every AMD CPU for gaming, though it does trade blows on some titles with the 8-core parts.
It'll be interesting to see if AMD cornering both the new Xbox and PS4 has an effect on game engines for the PC as well -- specifically whether the tuning that will probably go into console versions will translate to PC, and whether or not Intel (and NVIDIA) will end up being penalized as a result.
The problem is that clock doesn't really mean anything concrete in terms of real world performance. It's strictly a marketing thing.
For an example, what if a chip used a 10 GHz clock for distribution, and divided it down to 5 GHz everywhere it was actually used (not that I know of any reason to do such a thing besides marketing). Would it be marketable as a 10 GHz chip? The manufacturer would certainly be in hot water if enthusiasts ever found out...
Even without such contrived scenarios, CPUs get different amounts of stuff done per clock.
Something I keep seeing, even on Slashdot and Hacker News, is the idea that a CPU that has to clock higher for a given performance will use more power. It seems to me that if you've got double the clock, the likely explanation is that half the transistors are switching per clock, and power consumption should be orthogonal to clock/IPC ratio.
If anyone's got any contrary ideas on that, I'd love to hear them. All I can think of is that higher clocks would correlate with longer pipelines, but Bulldozer's pipeline isn't even that long.
"is the idea that a CPU that has to clock higher for a given performance will use more power."
This is like a dog whistle to the EEs; they're going to get all riled up by programmers with screwdrivers. You can model a stereotypical FET gate as a capacitor: all you're really doing is charging and discharging capacitors, either in FET gates or the theoretical capacitance of the transmission lines. Right out of the C=Q/V definition of what capacitance is, mushed up against some Ohm's law and some algebra, you end up with P = C times V squared times F. So you can see the intense excitement in lowering core voltages and making gates and lines smaller (lowering C), all in a tradeoff to improve the P/F or F/P (whatever) ratio.
The important part is it's pretty easy: right out of Ohm's law and the definition of what capacitance is, power is directly proportional to frequency.
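A back-of-the-envelope rendering of that C*V^2*F relationship; the capacitance, voltage, and frequency figures below are invented for illustration, not specs for this chip:

```python
def dynamic_power(c_switched_f, v_core, f_hz):
    """Classic CMOS dynamic-power estimate: P = C * V^2 * f."""
    return c_switched_f * v_core**2 * f_hz

# Made-up effective switched capacitance of 20 nF; real chips vary widely.
base = dynamic_power(c_switched_f=20e-9, v_core=1.2, f_hz=4.0e9)   # ~115 W
turbo = dynamic_power(c_switched_f=20e-9, v_core=1.4, f_hz=5.0e9)  # higher V needed for higher f
print(f"base:  {base:.0f} W")
print(f"turbo: {turbo:.0f} W  ({turbo / base:.2f}x the base figure)")
# Frequency alone scales power linearly; the voltage bump needed to reach that
# frequency scales it quadratically, which is why the combined hit is so large.
```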
> The important part is it's pretty easy: right out of Ohm's law and the definition of what capacitance is, power is directly proportional to frequency.
There's also the fact that your transistors have a particular voltage that they switch state at, which means that they switch faster if you drive the gate/line capacitance with a higher voltage.
Which means that chips designed for lower frequencies can be designed to use lower voltages, which can save far more power than what would be directly proportional to the lower frequency.
In "CS" terms that may be better understood on HN than "EE" terms: electrical power scales O(n^2) with voltage and O(n) with frequency.
If you really wanna get people riled up and talking you can roll out the old power "EE" stuff about maximum power transfer happening when source and sink impedance are the same, and you want to get the most bang for your buck so you'd like that, right, and a transistor gate being near infinite resistance would imply ... Or if you like to think about interconnects being signal to noise level limited, then an RF analysis about noise voltage across a resistor vs preamp noise figure vs current bias from a communications standpoint would imply... But it turns out in practice most of the time, the first mental model is by far the most effective way to look at it compared to these.
> It seems to me that if you've got double the clock, the likely explanation is that half the transistors are switching per clock
Suppose CPU A has an adder, that takes one clock cycle to run an add instruction. When two registers are being added, the instruction goes thru the entire adder in one clock cycle and affects on average some % of the transistors.
Suppose CPU B has a pipelined adder that takes two clock cycles to run an add instruction. When two registers are being added, the instruction goes thru half of the adder in one cycle, and the other half in the next cycle, and affects about half of that same % of the transistors each time. BUT! This is a pipelined adder, and doesn't just do one instruction at a time. During the first cycle, when our instruction is in the first part of the adder, some other add instruction is still going thru the second part of the adder and affecting the other half of whatever % of the transistors. And during the second cycle of our instruction, the next instruction is going thru the first half. So even tho any one instruction only affects half of the adder at a time, the entire adder still gets affected every clock cycle.
In that example, CPU B's adder can also be clocked twice as fast. If so, it's getting twice the work done and using twice the power (ignoring cache misses and the like for the moment). If it's clocked the same as A, its performance and power usage will be almost the same as A's.
Roughly speaking, power used = transistors switching per unit time. Performance should also follow that pretty closely, depending on the efficiency of the design. At some level, you should be able to look at any instruction and find a corresponding number of transistors that need to switch for it to execute.
Deep pipelining keeps more silicon active at any given time, increasing both performance and power consumption. Because of cache misses and the like, efficiency will drop somewhat. Double the stages also doesn't quite equal double the switches per time, for various reasons. Therefore, deeper pipelines = worse performance per watt but better performance per dollar (not sure how well that'll hold in ridiculous cases like Prescott).
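To put rough numbers on the adder example above, here's a toy model of pipeline depth versus clock rate; the logic depth and latch overhead are invented for illustration:

```python
def pipeline_model(total_logic_ns, stages, latch_overhead_ns=0.05):
    """Toy model: splitting the same logic across more stages shortens the critical
    path per stage, so the clock can rise, while throughput stays at one
    instruction per cycle once the pipeline is full."""
    cycle_ns = total_logic_ns / stages + latch_overhead_ns
    freq_ghz = 1.0 / cycle_ns
    latency_ns = stages * cycle_ns          # one instruction still makes the full trip
    return freq_ghz, latency_ns

for stages in (1, 2, 4, 8):
    f, lat = pipeline_model(total_logic_ns=1.0, stages=stages)
    print(f"{stages} stage(s): {f:.2f} GHz clock, {lat:.2f} ns latency per instruction")
```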
From what I heard, Bulldozer only has one more stage than Haswell (15 vs. 14, don't quote me on that) - not nearly enough to account for the differences we see between them.
What I'm noting is that there are many, many more factors at play than just pipelining. In the case of Bulldozer, I've been hearing quite a bit about minor parts that they found needed more work, most notably branch prediction. It sounds like they've got lots of things that will improve performance with no power or die size downsides. The number I saw bandied about for Steamroller was a 30% performance increase. I have some trouble believing it's quite that big, but if they pull it off, that will be an amazing chip for being 32nm. It hints to me that the macroscale architecture is A-OK, and they just screwed up some small but important things.
> It seems to me that if you've got double the clock, the likely explanation is that half the transistors are switching per clock, and power consumption should be orthogonal to clock/IPC ratio.
Nope; a lot of the latches are switching every cycle, so power is higher at higher frequency. This is what doomed NetBurst-style design.
Couldn't a 90nm transistor switch at 8 GHz or so in this kind of application? I'm not sure of the exact numbers, but at 1/16th the area occupied, capacitance is much lower, letting it switch far faster.
Just making up some numbers, how about 30% of gates switch on every clock, and 3x the switching speed for modern gates (it's probably much higher, but I'm being conservative here).
So basically, NetBurst is ridiculous, though that shouldn't be news to anyone. Bulldozer doesn't look to be doing so bad as all that, and the numbers improve if the speed is more than 3x.
(I have no idea what the real numbers are, if someone tells me I'll update this.)
I remember, like ~10 years ago on Slashdot, some people overclocking to 7-8GHz. Of course this was on single-core chips, but we've really pretty much completely stalled on the MHz progression, haven't we?
Clock speed is not a meaningful end unto itself, and that's why it stalled. It was used as a proxy for speed for many years, and this led to its rampant inflation. Instructions per second (IPS) is a more meaningful metric for CPU speed, and that has by no means stalled, even on a per-core basis.
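Made-up numbers showing why instructions per second, not clock, is the figure that matters:

```python
# Two hypothetical chips; the GHz and IPC figures are invented for illustration.
chips = {
    "high-clock, low-IPC":   {"ghz": 5.0, "ipc": 1.2},
    "lower-clock, high-IPC": {"ghz": 3.9, "ipc": 2.0},
}
for name, c in chips.items():
    gips = c["ghz"] * c["ipc"]   # billions of instructions per second per core
    print(f"{name}: {c['ghz']} GHz x {c['ipc']} IPC = {gips:.1f} GIPS")
```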
I don't think these are practical limitations, more like limitations to be able to sell laptops and desktops.
If we told Intel they could burn up to 350 watts on the CPU and a 25lb heatsink was acceptable, we'd probably have 10GHz processors. Problem is, there isn't a large market for that. Home users don't want a big, ugly, noisy box, and server buyers would prefer power and heat savings. Supercomputers just tie all this stuff together instead of creating some monster single core.
Actually, this was the strategy with the Pentium 4. It was a fast and power-hungry single core. Turns out, efficiency per cycle and multicore are just superior solutions.
Also, I think there are some physical limitations that keep chips below a certain clock speed. Besides, the bet has been on "smarter instead of faster", i.e. producing chips that suit our computing needs, which are more adequately supported by parallel processing.
High-performance cores are useful for problems that are hard to parallelize, but so far it seems the breakthrough only occurs when a new approach makes the problem feasible on multiprocessing platforms (e.g. graph processing is hard to parallelize due to dependencies among graph nodes; Pregel and similar offer a different approach)... a 50GHz CPU won't save you if you need to process a huge graph (billions of nodes) on a single thread; it'll always take a lot of time.
As to the "record", I think IBM already had a System z running over 5GHz.
> I think there are some physical limitations that keep chips below a certain clock speed.
Not hard limits, but yes, to my knowledge it is primarily physics that keep chips where they are. The requirements for power and heat dissipation start to balloon.
The name is FX-9590. It has 8 compute cores. AMD's internal designation for this generation is "Piledriver". They've chosen to name their high-end compute family after construction equipment (their previous ones were named after racing tracks). 5GHz Max Turbo is also not part of the name; it is a description of its performance. Its "baseline" performance is probably something like 4GHz (pulling that out of my ass). The Max Turbo refers to the fact that, using their thermal management system, they can peak at least one of the cores at 5GHz for some period of time. The "Max" is in there because there are intermediate turbo speeds for varying thermal situations and CPU loads.
AMD used to do something like this back in the K5 era. Instead of advertising megahertz, the processors were sold based on a "performance rating" which attempted to match them up to equivalent Pentium chips based on a set of benchmarks.
So a K5 PR-200 was actually a 133MHz chip, but it could match or exceed a Pentium 200MHz in some well-selected benchmarks.
There is no such thing as a _universally_ "relevant benchmark". They will never agree on a testing suite since performance varies too much. They have no reason to believe the test manufacturer is impartial.
The CPU has different clock speeds depending on how much it is being used. The upper limit is set by thermal and power constraints. If you're using all the cores, the limit is relatively low. If you're only using one or two cores, the power management system will clock them higher. Since the higher clock speed is only available under certain workloads, it's called "Turbo".
It was a wrestling move before it was a sex position. And the wrestling move was named after an actual tool to drive piles. https://en.wikipedia.org/wiki/Pile_driver
Back in the early 90s, it was an actual scandal that the Pentium had such a huge heat sink. The magazines (at that time, computing magazines were quite popular) were joking that next they'd put a fan on the CPU. Haha.
Those were the days, really, because there was still the possibility that in the future you'd have any damned thing in your computer, not necessarily an x86. You could have an Alpha or a SPARC or PPC or maybe an i960. And it would be silent and use no power and you'd install it in your bitchin' conversion van.
I wonder how two of these (16 cores) would compare against the new Mac Pro with the best CPU, in software that benefits from many cores, like 3D rendering and virtualization.
I'd bet they are close, while the AMD only costs a fraction of the Xeon. I know it's not a fair comparison since the FX-9000 is not a workstation CPU, but still...
The multi-threaded POV-Ray and Cinebench tests are just about the only two benchmarks where the AMD 8-cores beat the i7 2600K, and even there just barely.
The Intel chips soundly win in anything else (encoding, Photoshop...), and by almost 2X in some of the single-threaded tests.
Yes AMD, we all know you like your big numbers, like core counts and clock speed. It would, however, be just excellent if you could put out a product whose single-threaded performance isn't garbage. I mean, Thubans are beating your newest and greatest!
But at least you can say you've got a bigger cache, clock speed, core count, and debt than Intel.
In the end it's not the frequencies nor number of cores but performance per watt that matters.
Most computers run on batteries these days, and those that don't drain ever more expensive electricity from the wall socket and at the same time waste a lot of it producing huge amounts of heat.
The more you get out of a watt the better. You can either trade in speed for lower power or trade in power for better performance, but in either case you want the performance/watt ratio to be the highest.
I would guess the power consumption of running the chip at 5GHz is pretty high. And running temperatures as well. And yet there are fewer and fewer of those huge tasks that you can only do with one core.
Since the Pentium 133 I've never had an Intel processor in a desktop computer. I wanted one a few times, but AMD was always cheaper for the same speed. Sure, the fastest parts were almost always the Intel ones, but the additional bit of speed never justified the price.
Hyperthreading keeps two threads "hot" in each physical core. When one thread is waiting on memory access, the core can do work on the other thread rather than sitting idle. (Memory access isn't that slow, so switches need to be fast to capture those otherwise-wasted cycles, which is why this is a CPU hardware feature rather than an OS-level software feature.)
Purely CPU-bound tasks [1] don't get any performance gains from HT. But almost all real-world applications spend a lot of time reading and writing memory, and memory access is pretty slow compared to CPU speeds, so in practice HT helps (otherwise Intel wouldn't have bothered to develop it and put it on their chips, which probably cost a lot of money).
> Have you checked how disabling it influences compiling speed?
No. But I'd guess it would be substantially less than 100% speedup since they aren't actual, physical cores; but substantially more than 0% speedup since the compiler uses dozens or even low hundreds of megabytes of memory.
[1] By "CPU-bound" I mean register-to-register arithmetic. You might also be able to get away with hitting the L1 cache, which is a few KB, without triggering an HT context switch.
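A back-of-the-envelope model of the stall-filling argument; the miss rate, miss penalty, and the share of stall cycles the sibling thread can actually reclaim are all assumptions, not measurements:

```python
# Rough issue-slot model for one core, with and without a second hardware thread.
cpi_compute = 1.0        # cycles per instruction if memory were free (assumed)
miss_rate = 0.02         # cache misses per instruction (assumed)
miss_penalty = 200       # stall cycles per miss (assumed)

cpi_one_thread = cpi_compute + miss_rate * miss_penalty    # 5.0 cycles per instruction
busy_fraction = cpi_compute / cpi_one_thread               # core issues work ~20% of cycles
idle_fraction = 1 - busy_fraction                          # ~80% of cycles are stalls

reclaim = 0.5            # assumed share of stall cycles the sibling thread can actually use
throughput_one = 1 / cpi_one_thread
throughput_two = throughput_one + reclaim * idle_fraction / cpi_compute
print(f"one thread:  {throughput_one:.2f} instructions/cycle")
print(f"two threads: {throughput_two:.2f} instructions/cycle total")
# Real gains are usually far smaller: the two threads also fight over caches,
# TLBs, and memory bandwidth, which is part of why the speedup is well under 100%.
```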
How ironic that AMD are now suffering from the same thing that once gave them the edge (that is, Pentium 4's overly aggressive clock speed roadmap and lacklustre per-clock efficiency).
I guess their marketing was out of other ideas and just went back to the well one more time. It has probably been 10 years now since I really considered CPU clock speed as a factor when buying a computer.
I've yet to find an actual source for that figure whenever it comes up. Is it just some tech-site comment-section spitballing, or did they actually disclose the TDP?
If there's a dimensioned pic of the heatsink, a bored-enough engineer could calculate the theoretical degC/W rating of the heatsink, and given a presumably constant deltaT, there's your wattage.
Doesn't have to be dimensioned that accurately. To a first approximation a 1% error in surface area would be about a 1% error in TDP.
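A rough version of that calculation; the thermal resistance and temperatures below are placeholders, not figures for this part:

```python
def power_estimate(theta_c_per_w, t_sink_c, t_ambient_c):
    """Steady-state estimate: dissipated power ~= deltaT / thermal resistance."""
    return (t_sink_c - t_ambient_c) / theta_c_per_w

# e.g. a heatsink around 0.30 degC/W, 65 C at the sink, 25 C ambient air:
watts = power_estimate(theta_c_per_w=0.30, t_sink_c=65.0, t_ambient_c=25.0)
print(f"~{watts:.0f} W dissipated")   # ~133 W; a 1% error in theta shifts this by ~1%
```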
I'd like to see very high temp CPU technology. That would be an interesting, challenging direction for hardware tech to move. A tiny lightweight 5 deg C/W heatsink is plenty if you're allowed to run at, say, vacuum tube redhot glow temperatures. I'm well aware of the solid state physics challenges of this, that's why I think it would be very interesting to see if anyone could pull it off.
I doubt that a heatsink on an engineering-sample test board would be sized to within a 1% margin. Seems more likely they'd err generously on the side of big.
Agreed - I don't know how you're going to cool that thing quietly or cheaply. I think the thermal load would be less on two CPU chips running at two-thirds the GHz. It might be cheaper to build as well.