I really wish iFixit would do a formal root cause analysis every so often. As an electrical engineer running a small manufacturing operation, I'm always very interested in the failure rates and root causes that much higher volume products see.
Lead-free solder melts somewhere in the 200C-300C range. The solder we use melts at 230C, and we reflow our PCBs in a Vapor Phase Soldering oven, which precisely limits the PCB temperature to 230C. Reflowing a PCB multiple times risks solder paste flux exhaustion, and also risks parts on the bottom side of the PCB falling off. Also, some parts are rated only for a very limited time at these temperatures, even when they are not running. Cooking these parts multiple times results in a dramatic reduction in lifespan.
It wouldn't surprise me if Apple did an x-ray inspection of every BGA part on every device they produce. This isn't common practice in the manufacturing lines that I know about, but I know that it's done in some cases. This would help catch cracks or other defects that would result in a reduced lifespan of the device, as well as detect show stopper issues.
There are a lot of other things that can fail on a PCB due to heat long before the solder will melt on a BGA pin. My guess is that this was a mechanical failure inside the PCB: either a through-hole failure (such as a via disconnecting from a PCB trace, since the PCB undoubtedly has a very high layer count, producing many fragile connections to long copper vias, which would expand vertically and in radius during temperature cycles) or an inner layer connection defect due to PCB or copper expansion. In either case, cycling the temperature by a large amount could temporarily fix the problem by creating a good enough connection for operation.
The bottom line is that higher operating temperature always hurts mean time between failures. This is well studied, and many manufacturers will include plots of operating temperature vs MTBF. Even though the laptop is working now, all of the components have been exposed to extreme additional thermal stress. Everything in the laptop is now much closer to failure. I don't expect the laptop will last much longer.
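To put a rough number on the temperature vs MTBF relationship, here is a minimal sketch using the Arrhenius acceleration model, which is one common way those manufacturer plots are generated. The 0.7 eV activation energy and the example temperatures are my own assumptions for illustration, not figures for any part in this laptop. Real parts have different activation energies per failure mechanism.

```python
import math

# Boltzmann constant in eV/K
BOLTZMANN_EV_PER_K = 8.617333262e-5


def arrhenius_acceleration(t_low_c, t_high_c, activation_energy_ev=0.7):
    """Estimate how much faster a part ages at t_high_c than at t_low_c.

    Uses the Arrhenius model: AF = exp(Ea/k * (1/T_low - 1/T_high)),
    with temperatures in kelvin. The default activation energy of
    0.7 eV is an assumed, illustrative value.
    """
    t_low_k = t_low_c + 273.15
    t_high_k = t_high_c + 273.15
    exponent = (activation_energy_ev / BOLTZMANN_EV_PER_K) * (
        1.0 / t_low_k - 1.0 / t_high_k
    )
    return math.exp(exponent)


if __name__ == "__main__":
    # Hypothetical case: a component running at 80C instead of 60C.
    factor = arrhenius_acceleration(60.0, 80.0)
    print(f"Estimated aging speed-up: {factor:.1f}x")
```

Under these assumptions, a part running 20C hotter ages several times faster, which is why a poorly cooled board eats into its expected life so quickly.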
I'm not familiar with the guts of a MacBook Pro, but if there was a larger than normal gap between the chips and their heat sinks, then this would definitely explain the higher than normal system temperature. Even if the heat sink compound was able to bridge the gap with minimal air bubbles, it's still a pretty poor thermal conductor. This may have been a manufacturing defect that Apple could check for in the future. (Although I would hope they already run the devices for some length of time in their final enclosures while monitoring component temperatures.)
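As a back-of-the-envelope illustration of why the gap matters, the sketch below treats the heat sink compound as a flat slab and uses the one-dimensional conduction formula R = t / (k * A). Every number here is an assumption I picked for illustration (a 4 W/(m*K) compound, a 1 cm^2 contact area, a 30 W chip), not a measurement from a MacBook Pro.

```python
def interface_thermal_resistance(gap_m, conductivity_w_per_mk, area_m2):
    """Thermal resistance in K/W of a flat slab of compound: R = t / (k * A)."""
    return gap_m / (conductivity_w_per_mk * area_m2)


def temperature_rise(power_w, gap_m, conductivity_w_per_mk=4.0, area_m2=1e-4):
    """Temperature drop in K across the compound for a given heat flow.

    Defaults are assumed values: 4 W/(m*K) compound and a 1 cm^2 contact area.
    """
    resistance = interface_thermal_resistance(gap_m, conductivity_w_per_mk, area_m2)
    return power_w * resistance


if __name__ == "__main__":
    # Compare a thin, well-seated layer with a gap ten times larger,
    # for a hypothetical 30 W chip.
    for gap_mm in (0.05, 0.5):
        rise = temperature_rise(30.0, gap_mm * 1e-3)
        print(f"{gap_mm} mm gap: about {rise:.1f} K across the compound")
```

The temperature rise scales linearly with the gap, so a gap ten times thicker than intended costs ten times the temperature drop across the compound before the heat sink even gets involved.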