An Inferno on the Head of a Pin (codinghorror.com)
258 points by ingve on Jan 17, 2017 | 114 comments



The talk of older processors not needing heatsinks/fans reminded me of my first (and thankfully last) lesson on this matter back around 2001-ish.

In the past I'd assembled several PCs and my less-than-rigorous setup process typically involved slotting the CPU, RAM, and other cards, then plugging it into a monitor and powering up to run through a quick POST just to make sure nothing was misaligned or improperly seated. Then I'd power off, put down some thermal paste, mount the heatsink/fan, and connect any remaining cables for drives, etc.

Well, in the past, this was fine. Sure, I wouldn't want to run things without at least the crappy little Intel or AMD CPU fan but 5 or 10 seconds was no big deal. But this time I was building for a friend who had me order one of those fancy new 1GHz Athlon chips. I was psyched as well because the small markup I'd added for my time helped me afford my own fancy 1GHz CPU and new motherboard for my own PC.

So I followed my usual procedure and slotted everything for the initial POST (nothing worse than getting everything connected in one of those old PC cases with IDE cables and wires everywhere just to have to unhook it all because something wasn't seated properly) and powered on.

Blink.

Nothing.

Hmm...Did I not get the CPU seated properly? Let's pop it out and reseat it. OWW! Burned the crap out of my finger and got a blister!

Uh oh. It sorta smells like the magic smoke even though I don't see any. Well, eventually I realized what had happened and learned that even a few seconds without a heatsink/fan had been enough to fry that brand new 1GHz Athlon.

Thankfully I had the one I'd ordered for myself and I was able to finish the build that my friend had paid for. But it was a rather expensive (for me at the time at least) lesson in the need for proper cooling on these modern CPUs.


You were lucky that's all that happened. Athlons ran notoriously hot, to the point that they would literally catch fire after a few seconds if no cooling was applied.


I scorched at least two Athlons back in the day. Those buggers ran crazy hot.


You're just not looking at Intel processors that are old enough :)

80286 needed no heat sinks, even 20-MHz models.

80386 used to do well without radiators, too, or with entirely passive cooling, even the faster DX variants.

Frankly, I don't remember low-power i486s (e.g. 40-MHz versions) having heat sinks either. You definitely wanted a heat sink, though usually a passive one, on a DX2 (66 MHz) or DX4 (100 MHz).


The column "The Hard Edge" in Computer Shopper magazine (Bill O'Brien and Alice Hill) teased Intel's heat output on the new "Pentium" chip by saying they'd get a tiny frying pan made to fry an egg on it. :-) (can't find the original article, but it's referred to in this 1997 article https://www.highbeam.com/doc/1G1-19334858.html )

Before then, I don't remember my IBM 386/75 or 486 needing fans ... but the 486DX2/50 had problems with RF emissions from the corners of motherboard traces (IIRC) so they left external clock speed lower for many moons ...


My 486/DX 33MHz didn't have a heat sink, and while it ran hot, it wasn't too hot to touch.


IIRC, my old 486SX at 25 MHz didn't use a heatsink.


I bought a desktop computer in ~2004 and as usual I just picked the best hardware I could afford at the time.

I still feel cheated by Intel, because working on that crap was like working next to an airplane turbine. The noise and the heat were impossible to tolerate, to the point that companies started selling dedicated cooling solutions back then, some even using water. I ended up buying a huge blue heatsink.

Intel used me to test their new Hyper-Threading technology; the processor was called Prescott, a.k.a. "Pres-Hot".

https://en.wikipedia.org/wiki/Pentium_4#Prescott


I actually applied an absolutely-final overkill solution to the noise & heat problem on my desktop - I run a full liquid cooling setup with a giant radiator with three fans on it, mounted fully outside the case. It sits on my desk. The noise is unnoticeable and the CPU temp is 27C at idle, 60C at full burn.


I wonder if some processors like Prescott actually need to run hot


> I wonder if some processors like Prescott actually need to run hot

Generally speaking, no. Some are more tolerant of high temperatures, but that's it. And higher temperatures in general will degrade the processor over time.


I wonder how much of that reputation is based on absolute power consumption, and how much comes from the huge jump relative to the preceding Pentium IIIs. Checking ARK, when looking at mainstream models the first-gen P4s consumed 60-ish watts, which grew to 85-ish watts by the end (Prescott). That sounds pretty much in line with all the CPUs that followed. But it was a radical rise from the <30-ish watts that PIIIs consumed. Of course the top-of-the-line P4s were even hotter (in the 115-130 watt range), but that is actually also still in line with more recent CPUs. It's just less visible, as Intel in their great wisdom has limited the top models to server Xeons and thus kept them out of the hands of consumers.


>That sounds pretty much in line with all the CPUs that followed.

Because the Intel CPUs that followed actually had power saving features, previously found only on mobile versions.

Those P4s would just run hot all the time; not pleasant being in a computer lab with two dozen of them plus CRTs and an AC that couldn't keep up.


At the time water-based heatpipes weren't used for coolers, which made all coolers much, much less efficient. Today practically every CPU cooler uses 2-8 heatpipes and that's what makes all the difference. (Well that and not using top-blowers anymore).


I remember those damn Prescott cores. I had one on a custom build I did for myself back in High School.

After setting that bad boy up, it ran so hot that my room was about 5 degrees hotter than the rest of the house. I finally started putting it to sleep for most of the day just to keep the heat down.


Great story. When I upgraded from a 486DX2/50 to a Pentium 133, I thought for sure there was something wrong with the voltage regulator on the motherboard because the chip was throwing off so much heat. Spent hours on the phone with Tyan technical support (which was great), hours with a multimeter, and even swapped out the motherboard at one point. Turned out the new chips just ran that hot!


Those Athlons make for a good cooking plate:

https://www.youtube.com/watch?v=atBb9JruXBw


Back when I was starting to build PCs, the heatsink application step was the scariest. I had read SO MANY horror stories. When it came around to actually doing it (on one of those naked die Athlons, no less), I made sure to read the instructions a few times. I didn't have to pray or sacrifice a goat, but I've never fried a CPU.


Story reminds me of the time that I literally squished a CPU with a fan. Being fairly inexperienced with those things, I had somehow managed to turn the fan 180 degrees before forcefully attaching it to the CPU. I even remember the hassle involved in making the little power wires extend all the way to the far side of the fan but at no time did my brain enter the loop.

Took the CPU back to the store the next day to complain that it didn't work and it took them about epsilon seconds to figure out what had happened. I bought a new CPU and hastily left the building. They must have been laughing at that for days.


This is partly why I had historically gotten everything else in place before messing with the heatsink/fan. Those older ones could be really touchy and I was petrified of crunching the exposed die of one of those older processors by having to remove or otherwise futz around with the mount after getting it attached.

One time I really did think I heard a crunching/grit type sound when mounting a heatsink and my stomach dropped out until I got the system powered on and verified that nothing was broken.

Seriously, these things were terrifying to mount a heatsink on back in the day with the way that thing sticks out, rubber pads be damned: https://i.imgur.com/6w1oVMS.jpg


My expensive CPU lesson: the person who designed LGA 2011 had the brilliant idea of making the notches on the CPU almost symmetrical. So it is possible, with juuuust a tiny bit of force, to put it in the wrong way around.


My parents used to have some "junk" computers out in the garage. One of them had a Cyrix 486 clone. These are the pins on that guy: http://www.amoretro.de/wp-content/uploads/cyrix_cx486dlc-40g...

While trying to get as many of those machines working as possible, I unseated the CPU (can't remember why), and I guessed the orientation wrong when putting it back into the machine. I don't remember seeing or smelling any literal magic smoke, but it must've gotten let out, because that machine never booted again.

Luckily, that wasn't any kind of new hardware. I've got an active imagination though, and it always comes to mind when I am working with new and expensive equipment.


I miss old PGA CPUs like that. They'd inevitably get bent, and you could almost always use the tip of a mechanical pencil (with the lead removed) to bend things back into almost exactly the right place and boot the machine. Nowadays everything is LGA and if you bend one of the tiny in-socket pins you're just sunk.


I liked clicking down the locking arm. It made a satisfying click, after an equally-satisfying increase in tension. A lot of the LGA ones now make a sickly-sounding "crinkle" that I've never liked much.


AMD is still old school and uses PGA sockets, at least for their desktop parts. I wish they would get with the program (they have for their server CPUs).


Ah, so you're the reason newer LGA2011 socket caps come with a yellow-and-red warning sticker about "NOT USING ANY FORCE"? ;)


From the article:

> I remember cooling the early CPUs with simple heatsinks; no fan. Those days are long gone.

Interestingly, for desktop machines, this is not quite correct... there is still a "fanless" movement going on. My roommate has a PC that doesn't have any physical moving components: no fans, no hard drives.

Fanless designs are not space-efficient, though; see this fanless CPU heatsink: https://www.quietpc.com/nof-icepipe which is rated for up to a 95W TDP CPU - enough to run an Intel i7-6700K, which has a TDP rating of 91W.

If you're willing to build a very quiet machine instead of a silent machine, liquid cooling with very quiet fans (that eventually ramp up based on temps) is very workable, and means your machine is effectively silent (i.e. at or around ambient noise levels) 99% of the time.


Due to office noise I needed a way to work 'at my desktop' but from somewhere quieter in the office, e.g. an unused meeting room.

So I invested in a cheap and cheerful Chromebook, put Linux on it (GalliumOS is the easiest operating system to install, ever), got the NFS mounts working, got my desktop to proxy serve my dev domains, put 'Synergy' on there (so I can use the same keyboard/mouse on all machines from the Chromebook) and made sure I could also work fully remote with it too (local repo). I also got 'X Windows' to work nicely.

As a glorified terminal my Chromebook has full HD, massive battery life (all day) and zero noise. In fact I wish it was 'warmer' in use, such is the low-end Celeron's feeble and fanless heat output.

It seems to me that laptops actually do not last the distance if they are over a certain size - 15". The thermal management is just not up to it and it is only a matter of time before the fan is running permanently with the CPU throttled. After a couple of dead laptops one thinks 'desktop/server', somehow silent and low power so it can be left on... But this doesn't really exist unless you invest in silent cooling.

With my bargain-basement Chromebook I can do everything I want to do, although for some things, like graphics work, I will go to the faster machine. This faster machine no longer has lots of cables attached to it; the keyboard/mouse is shared from the Chromebook, so it becomes a box with power in, network, and HDMI out, neatly tucked out of sight in an adjacent cupboard rather than roaring away on/under my desk.

Another bonus of the Chromebook is that its lameness is a feature. People have rubbish computers as well as posh ones; I need to test for all devices, and what works on a low-end Chromebook will fly along on anything more normal.


GalliumOS is easy to install, on something that's not a Chromebook perhaps. In order to install it, I had to:

> Put the chromebook into 'developer mode', wiping the hard drive

> Physically open the device, breaking the 'warranty void' sticker, in order to remove a write-protect screw

> Change firmware flags to allow booting off of unsigned partitions

> Replace the firmware entirely, risking bricking the device

Apart from that it was easy, and the installation program was faultless. I do use my Chromebook as a dev machine, and I agree with you that the poor performance is more of a help than a hindrance. Code editors don't stress out any remotely modern computer, and if your code runs well on the Chromebook then it should be fine anywhere else; very few machines have worse specs. I think they're fine machines, and GalliumOS is everything one would wish, but I really couldn't count ease of installation amongst their features.


My hunch is that you opened up a Chromebook Pixel (2013). I thought about it but decided against 'mutilating' the design classic that is the original Pixel, stepped back from the edge and bought an Acer 14" full HD Chromebook with 4GB RAM for £250.

One thing though - sound. This only works on HDMI which again is a feature - I can't procrastinate with videos. Installing 'WinZip' on a PC back in the day when I used 'Windows' was harder and certainly more fraught with danger.


I didn't realize how massive that heat sink was until I saw the little "how to install" animation at the bottom. It basically occupies the entire case O.O


Yep. :-)

The heatsinks for fanless designs are enormous, but the result (no moving parts in your PC, if you eschew hard drive(s) as well) is really neat in my opinion, and great if you do audio work - as my roommate does. Totally silent, guaranteed.

You definitely need to plan for it, though. The heatsink doesn't offer enough clearance for some of the taller RAM sticks on some motherboards.

The temps aren't all that bad either. If I remember correctly, my roommate's PC operates at temps that are roughly equivalent to the stock Intel heatsink/cooler. Not the best, but given that it doesn't have a fan, pretty nice.


I don't really understand the allure of going entirely fanless. Adding a couple of fans that spin so slowly they're inaudible makes a huge difference in cooling efficiency, with no real downside. For example, I use one of these.[1] It's several years old, but it does a good job of cooling, can't be heard, and doesn't fill up the whole damn case like that one you linked to... I can only imagine newer heatsink/fan designs are even better.

[1]http://noctua.at/en/nh-u12p


I don't personally run a fanless build... I use the design I mentioned at the end of my post, liquid cooling with almost silent fans (Noctua brand, like you linked).

The defining feature of a fanless build really is cost. It's hard to build a machine that will stay silent under load for cheaper than the heatsink I linked above. At the same time, fanless builds don't dissipate enough heat to enable high overclocking (so you can't eke out the same performance as you could using fans).


Indeed. My main desktop PC has zero moving parts and a massive heatsink and copes fine with medium- to heavy-usage software development needs.


Which PSU are you using? Do you have a graphics card?


The PSU is a Nofan P-400A: https://www.quietpc.com/nof-p-400a

No graphics card, just the internal GPU of a 4th gen i7 (purchased 2013).


For desktop machines, the biggest challenge is not cooling the CPU, but the GPU, for several reasons:

1. Mid-range cards and higher consume twice as much power as a quad-core CPU.

2. Much more vertically constrained (other expansion cards on the bottom) than a CPU, so heatsink designs are very limited

3. The heatsink is much less standardized, partly because of (2), and partly because of different PCB sizes and the fact that it has to cover the VRM chips as well.


There are also cases where most of the computer case surface is an actual heat sink and radiator. Heat pipes are provided that you can attach to the processor and graphics card.

A free business idea: create decent looking fanless radiator cases for computers. Make them look like minimalistic furniture, not like kid's toys or faux Apple imitations.


Oh yeah those are amazing. Fanless is the holy grail but it is very risky and super hard. See also related review: http://www.silentpcreview.com/NoFan_CR-80EH_CS-60


The article mentions microcontrollers that use 100 milliwatts as the lower end of embedded CPUs.

There are actually microcontrollers that use around 1 milliwatt for very low power applications, for example the MSP430. TI have a neat video of one running on power generated from grapes:

https://youtu.be/nPZISRQAQpw


Last time this came up someone pointed me at the new generation of ultra-low-power ARMs:

http://www.atmel.com/products/microcontrollers/arm/sam-l.asp...

35µA/MHz, so with careful selection of peripherals you probably don't even need a whole milliamp.
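
For a rough sense of scale, here's a back-of-the-envelope sketch in Python (the 35µA/MHz figure is the one above; the clock speed and supply voltage are just assumed example values, not datasheet numbers):

  # Estimate active-mode current and power for an ultra-low-power MCU.
  ua_per_mhz = 35        # active current per MHz, quoted above
  clock_mhz = 8          # assumed example clock
  vdd = 1.8              # assumed supply voltage, volts
  current_ma = ua_per_mhz * clock_mhz / 1000.0   # 0.28 mA, well under a milliamp
  power_mw = current_ma * vdd                    # roughly half a milliwatt
  print(current_ma, power_mw)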


Designing microamp-level systems can be interesting. From http://www.ganssle.com/rants/leaks_and_drains.html :

  I put one of the boards under a microscope and looked at 
  some of the ancillary parts. There's a 22 uF decoupling 
  capacitor. No real surprise there. It appears to be one of 
  those nice Kemet polymer tantalums designed for high-density 
  SMT applications.
  
  The datasheet pegs the cap's internal leakage at 22 uA. That 
  means the capacitor draws a thousand times more power than 
  the dozing CPU.


Yes, I play with little robots built with microcontrollers. For some of the little boards my friends and I make, we leave off the LEDs because they consume more battery than the CPU. (Of course the motor makes up for that quickly...)


No issues with the article as a whole, but comparing Intel's processors solely on #cores and clock speed isn't right. The table that compares a E5-1630 with a E5-1680, for example, omits the information that the E5-1680 has twice the amount of cache space, despite only having 1.5x the number of cores.


The odd thing here is that no DC I know of has the power and cooling to support a rack full of these things without surrounding them with empty space to meet the watts/sqft budget. That picture of racks full of 1U servers is basically a lie -- for more reasons than power, but power is the killer.

The follow up would be "an Inferno in your Rack".


I'd disagree; the fairly newish DC we're in is designed for supplying and cooling 20kW/rack. That's enough for 40 or so 1U servers burning ~450 watts.


One thing that burned us in the past was a DC that provided 20kW per rack in power, but required you to dissipate only 5kW in heat. Needless to say we changed DC providers within about a month after they made this clear (and they combined that with an offer of some kind of "cloud provider" package, which meant 15kW of both and triple the price). After a few more such DC-related incidents (mostly unbelievable ones: circuit breakers catching fire, a 600V peak on a 230V AC line, double flooring collapsing from overload and such) I'm quite relieved that I don't deal with DC procurement anymore.


What the...? Were they expecting you to radiate the remaining 15kW as microwaves or something?


That's exactly the question I asked them, along with "or do you want us to radiate the remaining 15kW into this single-mode fiber going to your switch? We can get a laser capable of that."


Does the wattage coming in entirely convert to heat? I always assumed it didn't, but have no idea in what proportion.


Energy in has to equal energy out over the long term. For electronics equipment that isn't doing mechanical work then there aren't a lot of options for how that energy can come out: it'll either be electromagnetic radiation of some kind (i.e. light or radio or similar) or heat. Most computer equipment emits relatively little electromagnetic radiation, so it's essentially all heat.


Isn't heat just lower frequency EM radiation?


Heat can be transmitted through EM radiation (of all frequencies, but IR is predominant for temperatures we're accustomed to) but also by direct contact between materials. For typical temperatures, radiation doesn't transmit much heat. Vacuum makes for a pretty good insulator. If you want to remove 20kW of heat from a small space, you'll need to do most of that removal by transferring the heat to a fluid of some kind, e.g. by putting it into the air.

To see the difference in practical terms, pick an item that's noticeably warm, but not hot enough to burn you. Hold your finger very close to it. The heat you feel there is radiated. Then touch it, and feel the heat that's conducted through touch. You'll feel much more heat with the latter.
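
As a rough illustration of how little heat radiation carries at everyday temperatures, here's a Stefan-Boltzmann back-of-the-envelope sketch (the surface area, temperatures and emissivity are assumptions picked for illustration, not measurements):

  # Radiated power from a warm surface, vs. the ~20 kW a dense rack has to shed.
  sigma = 5.67e-8     # Stefan-Boltzmann constant, W / (m^2 K^4)
  area = 1e-2         # m^2, roughly a heatsink-sized surface (assumed)
  t_hot = 333.0       # K, about 60 C (assumed)
  t_ambient = 298.0   # K, about 25 C (assumed)
  emissivity = 0.9    # assumed
  radiated_w = emissivity * sigma * area * (t_hot**4 - t_ambient**4)
  print(radiated_w)   # about 2 W - nowhere near kilowatts, hence airflow or liquid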


The external signalling power used by computers is completely negligible compared to the heat output. Computers and IT equipment in general are essentially ideal heat converters.


Hmm, good to know. And to update another assumption: my guess is that this would be a pretty expensive configuration, and you may be better off picking a lower-TDP server that can still get the job done, yes?

Also, do you actually have room for 40 nodes in a rack? Between ToR switches, storage, and miscellaneous other gear, it seems unlikely.


I would think 40 nodes in a rack is realistic if your battery is located elsewhere and you use side mount PDUs. Keep in mind in many places (NYC is a great example here) that real estate space for racks is the killer and not cooling or power.


250 kW per rack is possible using immersion cooling: http://investors.3m.com/news/press-release-details/2015/3M-N...


Just not in a real datacenter.


>Is this extreme? Putting 140 TDP of CPU heat in a 1U server? Not really. Nick at Stack Overflow told me they just put two 22 core, 145W TDP Xeon 2699v4 CPUs and four 300W TDP GPUs in a single Dell C4130 1U server. I'd sure hate to be in the room when those fans spin up. I'm also a little afraid to find out what happens if you run MPrime plus full GPU load on that box.

What could Stack Overflow be doing that requires such a dense GPU/CPU configuration? I didn't think a commenting site would require that level of parallel processing.



The GPU use is perhaps the most interesting here... Marc Gravell wrote a series of articles about it: http://blog.marcgravell.com/2016/05/how-i-found-cuda-or-rewr...


I know that several years back, they ran all of Stack Overflow from one multi-core Windows server (DB, backend, and frontend). And they mocked the traditional Linux setup of multiple frontends talking to multiple DB instances with a huge number of small VMs.

Makes sense the server should be really massive.


He doesn't work at StackOverflow anymore.


He was referring to Jeff's conversation with Nick Craver who does work at StackOverflow.


As a hardware guy first, and a (wannabe) software guy second, this post made me really happy.


Hardware guy now doing software and I was all tingly inside reading about hacks to get a happy CPU under unrealistic load.

(Anyone else out there run Prime95 and FurMark for fun in the past?)


That's how I validate my overclocks. And then after that it's 24 hours of memtest86+


How do you measure TDP with a Kill-a-Watt? Energy consumption does not equate to thermal design power. Nor is TDP even a measure of peak thermal output ...


>Energy consumption does not equate to thermal design power.

Sure it does, the energy has to go somewhere. If it's not being stored in some way or emitted as EM then heat is pretty much all that is left. If you measure voltage and current for Vcore going into the CPU then you can easily calculate the amount of heat it's generating since the CPU can't really store any appreciable amount of energy and there's basically nothing else that would allow that energy to leave the CPU.
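
A minimal sketch of that accounting (the voltage and current figures below are hypothetical, just to show the arithmetic, not measurements of any particular chip):

  # With no meaningful energy storage and negligible EM output, essentially all
  # electrical power delivered to the CPU leaves it again as heat.
  vcore = 1.2              # volts, assumed core voltage
  icore = 90.0             # amps, assumed current under heavy load
  heat_w = vcore * icore   # ~108 W dissipated as heat
  print(heat_w)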


Yet, he is not measuring the Vcore going into the CPU. He is measuring the overall wattage consumed by the PSU at the outlet using a Kill-a-Watt. This contains a lot more draw than just the CPU, itself. That's an overwhelming lack of concern for a wealth of other variables that can amount to tens of watts. So to be more concise:

Overall energy consumption of a computer does not equate to only the TDP of the CPU, itself.


> Unfortunately, here's what I actually measured with my trusty Kill-a-Watt for each server build as I performed my standard stability testing, with completely identical parts except for the CPU:

The two CPUs were in identical test rigs. There is an 80W difference changing only the CPU. While you can expect some of this is lost in the PSU, the simple fact is that more power is ending up inside the computer, and it has nowhere to go but out as heat. The most reasonable explanation, given otherwise identical builds, is that this difference is due to the changed component.

So if we assume that Intel's 4-core is actually 140W TDP, then there's no way this 6-core can also be 140W.

Yes, this isn't an exact, scientific test, but it's certainly reliable enough to say "this 6-core processor is emitting more heat than the 4-core, although they are rated identically" which it seems to me was the only point he was trying to make given the context which was "two more cores, slightly lower clock speed, that might be an okay tradeoff - OH WAIT, MORE HEAT".
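
For a rough sense of how much of that wall-power delta survives the PSU, here's a small sketch (the 80W delta is the figure quoted above; the efficiency number is an assumption, not something measured here):

  # Estimate the extra DC power reaching the components for a given
  # increase in draw measured at the wall with the Kill-a-Watt.
  wall_delta_w = 80.0      # extra wall draw after swapping only the CPU (from above)
  psu_efficiency = 0.90    # assumed efficiency at this load point
  dc_delta_w = wall_delta_w * psu_efficiency   # ~72 W more ending up as heat in the box
  print(dc_delta_w)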


> So if we assume that Intel's 4-core is actually 140W TDP, then there's no way this 6-core can also be 140W.

Why? How can you say that under typical usage the thermal dissipation for both chips isn't the same? Atwood's numbers measure overall power consumption while idle and under heavy load with mprime, neither of which is what TDP seeks to measure.

TDP is like fuel economy for cars. You don't claim it's a lie when you go one hundred and fifty miles an hour for 20 miles and burn two gallons of gas. You simply realize those highway numbers are meant for more of a sixty mile per hour journey over the same distance.


Okay:

Yes, TDP is an inexactly defined and meaningless term and while to the layman it would generally be understood to have some relation to the heat emitted by the CPU during normal operation, it's possible that both chips in fact only generate 1W of heat and were given a 140W TDP because Intel had a pre-existing cooling solution and a warehouse full of spare parts. Yes, you are correct on the semantics. Power consumption has no relation to TDP because heat generated has no defined relation to TDP.

However, given any meaningfully bounded definition of TDP, the case is still made that the TDP of these chips should be dissimilar. The later numbers show that under full load the power draw at the wall increases by 20W/core used. Unless your "typical load" used for determining TDP does not actually make use of the full number of cores (which I would think could be fine for a desktop processor, certainly not a server) then it's clear that the heat generated by these two processors should be dissimilar under any load.

If the 4 core actually needs to dissipate 140W under a typical use case, then the 6 core should absolutely need to dissipate more unless the "typical use case" is uselessly applied.
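
To put that per-core arithmetic in concrete terms (the 20W/core figure is the one quoted above; the rest is just multiplication):

  # Expected difference in full-load dissipation between the two parts if each
  # additional active core adds roughly the same amount of power.
  watts_per_core = 20                           # per-core increase quoted above
  expected_delta_w = watts_per_core * (6 - 4)   # ~40 W more on the 6-core
  print(expected_delta_w)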

If we want to talk cars... Let's say I sell a base model with a top speed of 80mph, and a sport model with a top speed of 120mph. But I tell you you only need to put 80mph rated tires on the sport model because that's as fast as a typical person drives. Would you really claim that the sport model's tires are correctly rated? Would you not be surprised when you drove 100mph in the sport model and the tires exploded? Why on Earth would anyone even buy the sport model if it's crippled to nearly the same performance as the base model?


Let's get to the crux, here:

> Unless your "typical load" used for determining TDP does not actually make use of the full number of cores

> If the 4 core actually needs to dissipate 140W under a typical use case, then the 6 core should absolutely need to dissipate more unless the "typical use case" is uselessly applied.

Taken from Intel's own specs for the E5-1650:

"Thermal Design Power (TDP) represents the average power, in watts, the processor dissipates when operating at Base Frequency with all cores active under an Intel-defined, high-complexity workload. Refer to Datasheet for thermal solution requirements."

Immediately notice "Base Frequency", not Max Turbo, "all cores active", and "Intel-defined, high-complexity workload." Until you can perform the same test on both chips, you cannot assume that "these two processors should be dissimilar under any load."

In regard to your car analogy, if the sport version had a management interface to limit speed due to the rating on the tires, then it would be like an Intel CPU. Read this:

http://www.intel.com/content/dam/www/public/us/en/documents/...


"This contains a lot more draw than just the CPU, itself."

Not really. Modern RAM uses a couple of watts. Fans use a typical 2.5W each. Drives only a few watts (the SSD variety). The PSU in servers tends to be fairly efficient, maybe losing 50-ish watts as heat, and they run more efficiently at lower draws than at higher draws. So using a Kill-A-Watt gives you a better idea of how much power the CPU is truly sucking down (especially when no GPU is installed).
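
Adding those rough figures up (the per-component numbers are the estimates above; the fan count is assumed):

  # Rough estimate of how much of a Kill-A-Watt reading is *not* the CPU.
  ram_w = 3            # "a couple of watts" for modern RAM
  fans_w = 2.5 * 3     # ~2.5 W per fan, assuming three fans
  ssd_w = 3            # a few watts for an SSD
  psu_loss_w = 50      # upper-end PSU heat loss mentioned above
  non_cpu_w = ram_w + fans_w + ssd_w + psu_loss_w   # ~63.5 W, dominated by the PSU loss
  print(non_cpu_w)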

This is how I catch LED lighting companies lying about their actual power usage on their lights.


When you're arguing over a 110 watt difference, 50 watts lost as heat in the PSU is not insignificant and certainly enough to skew a test.


The author is comparing how off the numbers are from the stated specs, not establishing the accuracy of individual figures. If the extraneous power-draw is reasonably constant, it can provide an interesting comparison.


The author is comparing how off overall power consumption is from the stated thermal design power under a sustained, heavy load; which is not what TDP is intended to measure ...

"TDP is not the maximum power that the processor can dissipate." -- Intel

Keep in mind that TDP is only a target number for typical use, not mprime. Not to mention Intel and AMD define it differently, see:

http://www.silentpcreview.com/article169-page3.html

"Intel is listing TDP numbers that are significantly lower than the actual maximum power draw of their CPUs."

That article is THIRTEEN YEARS OLD!


Yes, so the obvious point for the reader to glean is that Intel is sneaky about using different ways of calculating TDP for different products. Why they would do that is anyone's guess. Obviously they wouldn't want to give out false information to system builders who might end up with underpowered PSUs and overheating CPUs.


So if TDP is a measure, given in wattage, of a CPU's thermal dissipation under a typical load, why can't I simply assume double that value for the CPU's actual power consumption under heavy load?

I mean, using that logic the max power consumption for the same six core processor Atwood tested would be around 280 watts. A number only 40 watts higher than his for heavy load. Too bad I used "false information" to reach such a usable number.

Now if the actual power consumption was double that, I'd definitely be berating Intel. But alas, no.


Practically, sure, you pretty much have to do whatever it takes to get it working. But it's not like we're talking about trade secrets here. They simply have to release a coherent definition of what they consider a typical load.

I get your point, but nothing wrong in drawing attention to the specs being incomplete to the point of being useless.


> They simply have to release a coherent definition of what they consider a typical load.

Have fun getting AMD and Intel to agree on something.

> I get your point, but nothing wrong in drawing attention to the specs being incomplete to the point of being useless.

Vague or averaged is not useless.


What does Intel documenting their own specs have to do with AMD?


TDP is used by multiple manufacturers, so getting a "coherent definition" is going to take work. If you want to understand more about the thermal specs from just Intel, you can read their thermal guide:

http://www.intel.com/content/dam/www/public/us/en/documents/...


Sorry, I don't understand your point. Each vendor already defines it in their own way. The point of the article is that without proper guidance you end up guessing. I'm afraid the guide you linked to doesn't help much.

Intel confusingly tells us "TDP: Thermal solution should be designed to dissipate this target power level. TDP is not the maximum power that the processor can dissipate."

So according to Intel [1] - "The best way to measure a server’s power consumption is the power meter, an inexpensive tool that is plugged into the wall, and then your device, like a server, can be plugged into the power meter. The meter displays the wattage drawn "at the wall" and allows you to analyze the power consumption under a variety of different utilization levels.". Strange.

I was looking up docs on AMD. They seem[2] to kinda get it: "To allow optimal reliability of the AMD Opteron and AMD Athlon 64 processor-based systems, the thermal and cooling solution should dissipate heat from a processor operating at its maximum thermal power.". I could find some old docs[3] that did give the maximum power, but can't seem to find any on their recent CPUs.

[1] http://www.intel.com/content/dam/doc/white-paper/resources-x...

[2] ftp://ftp.sgi.com/public/Technical%20Support/Pdf%20files/AMD/26633_5649.pdf

[3] http://hackipedia.org/Platform/x86/AMD/AMD%20Thermal,%20Mech...


Refer to the following, taken from that guide:

  4.1 T_CASE and DTS-Based Thermal Specification Implementation

  Thermal solutions should be sized such that the processor complies to the T_CASE
  thermal profile all the way up to TDP, because, when all cores are active, a thermal
  solution sized as such will have the capacity to meet the DTS thermal profile, by
  design. When all cores are not active or when Intel Turbo Boost Technology is active,
  attempting to comply with the DTS thermal profile may drive system fans to speeds
  higher than the fan speed required to comply with the T_CASE thermal profile at TDP.

  In cases where thermal solutions are undersized, and the processor does not comply
  with the T_CASE thermal profile at TDP, compliance can occur when the processor power
  is kept lower than TDP, AND the actual T_CASE is below the T_CASE thermal profile at that
  lower power.

  In most situations, implementation of DTS thermal profile can reduce average fan
  power and improve acoustics, as compared to T_CONTROL -based fan speed control. When
  DTS < T_CONTROL , the processor is compliant, and T_CASE and DTS thermal profiles can
  be ignored.

  5.3.1 Intel ® Turbo Boost Technology

  Intel ® Turbo Boost Technology is a feature available on certain Intel ® Xeon ®
  processor E5-1600 and E5-2600 v3 product families SKUs that opportunistically, and
  automatically allows the processor to run faster than the marked frequency if the part
  is operating below certain power and temperature limits. With Turbo Boost enabled,
  the instantaneous processor power can exceed TDP for short durations resulting in
  increased performance.

  (http://www.intel.com/content/dam/www/public/us/en/documents/guides/xeon-e5-v3-thermal-guide.pdf)
This means that as long as you give the CPU a thermal solution sized so that its case temperature stays on the T_CASE thermal profile all the way up to TDP, you're good. However, if you push the processor into Turbo Boost (like Atwood did with mprime), the CPU can exceed TDP for short durations. And, while all this is happening, the Thermal Control Circuit (TCC) is managing the thermal output by adjusting the clock frequency and input voltage automatically so the CPU stays away from its operational limits. Therefore, if you intend to run this CPU under a sustained heavy load you must supply a thermal solution beyond TDP.

Intel literally spelled that out in this PDF which they linked from their TDP definition on every CPU specification page.

Now, in terms of power consumption: you must consider the entire system, as the CPU is going to manage itself to fit its environment given the TCC. That is why Intel suggests you measure overall consumption for the server using a power meter, as each implementation can yield different results. Therefore, what Atwood is doing is actually Intel's recommendation for considering actual power consumption. He, as I originally stated who knows how many replies ago, is making the mistake of directly equating power consumption to heat dissipation. All you can really be sure of is that if a CPU consumes 1 watt of power it can dissipate up to the same in heat. But, as thermodynamics will tell you, it will always be a bit less on the output, as nothing is 100% efficient.

Therefore, all Atwood's test proves is an Intel® Xeon® Processor E5-1650 v3 has the potential to dissipate up to 250 watts of heat while in Turbo, given its recorded power consumption, in that specific computer configuration while running mprime. Nothing more, nothing less.

That is my point.


Thanks for explaining in detail.


That 50 watts only happens under a fully-loaded PSU scenario. A 500W PSU at full load would, at an 80 Plus rating, deliver ~400W. That same PSU running just barely enough equipment to keep a processor fully fed might draw 300W, of which maybe 20W is wasted.

This is a basic power supply principle.


As I said, "variables that can amount to TENS of watts."


He's not measuring TDP directly, he's measuring the difference between TDP of two different processors. The test setups were identical except for the processor.


No. He's measuring the difference in overall power consumption of the same computer using two different processors under two different states: idle and heavy load. Two states which TDP does not seek to measure.

Again, TDP is NOT peak thermal dissipation.


Is there a standard load for measuring TDP then? If not, how would you attempt to test it other than placing the processor under heavy load?


>Is there a standard load for measuring TDP then?

No. Different manufacturers use different definitions of what a typical load means.

> If not, how would you attempt to test it other than placing the processor under heavy load?

Well given that Intel has stated, "The TDP is not the maximum power that the processor can dissipate." Then testing under heavy load is most certainly not correct.


I think the idea was just to show the magnitude of the difference. They are advertised as having the exact same TDP, so they should have approximately the same power draw at the wall, but the 6-core actually drew quite a bit more power.


From Intel:

"Thermal Design Power (TDP) should be used for processor thermal solution design targets. The TDP is not the maximum power that the processor can dissipate."

"Analysis indicates that real applications are unlikely to cause the processor to consume maximum power dissipation for sustained periods of time. Intel recommends that complete thermal solution designs target the Thermal Design Power (TDP) indicated in Table 26 instead of the maximum processor power consumption. The Thermal Monitor feature is intended to help protect the processor in the unlikely event that an application exceeds the TDP recommendation for a sustained period of time."

Therefore, are Atwood's test results really a shock? At idle both chips are within 15 watts of each other, yet under sustained load (which is not what TDP seeks to measure) they both violate their advertised TDP.


This is cool. I knew a guy who literally painted gallium onto his processors as thermal paste. He said it worked really well.


Gallium-based thermal compounds are commercially available, often called "liquid metal". See "Coollaboratory Liquid Ultra" for one example.

They're very effective, but must be used with extreme care because aluminum is highly soluble in liquid gallium. Aluminum is the most common material for heatsinks, and it will literally dissolve if the gallium touches it. Gallium is also electrically conductive, so if you accidentally dripped any into your processor socket I assume you're going to have a problem.

If you're careful with it and make sure to get a cooler with a copper contact area, it'll cool more effectively than traditional thermal pastes. I thought about trying it, but it seemed like more hassle than it was worth.


> Aluminum is the most common material for heatsinks,

Aluminium is the most common material for heatsink fins. The cold plate is almost always (nickel-plated) copper.


Gallium does really weird things to aluminium.

https://www.youtube.com/watch?v=JHHI2Lk79cY


This sounds like a recipe for disaster. Aluminum doesn't last very long near Gallium:

https://www.youtube.com/watch?v=4HKpMYJ-6go

Did he apply it directly to the top of the chip die or to one of those aluminum package covers? Did he use a copper heat-sink? Is somebody's leg being pulled?


The chip package cover is usually nickel, not aluminum, and even if it was, it could be sanded away to copper.

The other side (heatsink with fan) probably had to have been full copper (not the pipe/aluminum)

Either way, it doesn't seem worth the effort for marginal gains.


I can't see a table like what's in the article without poking at it some more in Excel. Anyway, as you go down the table, the increasing cores do indeed have a higher cores*GHz. If you look at dollars per core-GHz (yes, I recognize this is silly), you get a generally increasing trend as you go down the table, but the E5-1680 is $63.35 per core-GHz and the E5-2680 is only $60.59 per core-GHz.
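
The arithmetic behind that metric is just this (the numbers in the example call are placeholders chosen for illustration, not the article's actual table values):

  # Price-efficiency metric used above: dollars per core-gigahertz.
  def dollars_per_core_ghz(price_usd, cores, base_ghz):
      return price_usd / (cores * base_ghz)
  # Hypothetical 8-core, 3.4 GHz part with a $1723 list price:
  print(round(dollars_per_core_ghz(1723, 8, 3.4), 2))   # 63.35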


Yeah that is a logical way to look at it, and better than $ per core.


I love how Intel got caught on the TDP lie. I know for a fact many Intel processors run just as hot as AMDs, despite Intel blatantly lying about it.


As a mere desktop guy, where could I find a quality resource on safe ambient temperatures for desktop computers? I tend to assume that the magnetic HDD would be the limiter.


For me, the Pentium 60 was the turning point for heat. The 486 was pretty easy, but those Pentiums sure put off a lot of heat. I seem to remember that the Pentium 90 was cooler.


Why try to 'accept' the thermal challenge if you can just buy an HP/Dell/whatever 1U box that already has the engineering effort done for you? Or even better, why buy a box at all? Just use the cloud.

In the old days, I used to tinker with my machines, improve airflow, better CPU cooler, liquid cooling etc. Now I just want stuff to work, so I buy a laptop.


Why not just use cloud? Because it costs way more, and the convenience factor is rarely worth it, especially when you consider that you could just run a hypervisor yourself and get most of the same conveniences within the raw hardware limitations of your cluster.


That's private cloud in marketing speak. :)


It says so in the article: For therapeutic reasons.


I think when one works on software long enough, a sort of pressure starts to build up making the person long to work on hardware in any way possible. I've resorted to woodworking, DIY car repairs, and VHDL... all very therapeutic :)


This longing is even worse when one started out one's career working in hardware.


The 'cloud' is still composed of physical machines which somebody is responsible for. Many of those people (as compared to the general public) read Hacker News.



