Jeez - how is this supposed to work with OS scheduling? The charm of big.little on arm is that the instruction sets between the big and the little cores are identical. Now the OS has to pin processes based on support for different instructions? What a ridiculous nuisance. Intel really couldn’t discipline themselves just this once and actually implement the same instructions, even if in microcode, for both types of cores?
Are the little cores’ instructions at least a complete subset of the big cores’? Are we going to have some ridiculous situations where the little cores are completely pegged but the OS can’t migrate their processes off to the big core?
Or do kernel programmers need to start chasing Intel and trap/software implement every single future AVX8192AESNISSSE instruction Intel jams into future instruction sets to provide Xeon market differentiation?
NVIDIA had a much funnier (infuriating) one, where Tegra X1 had caches between the big and little cores that weren't always coherent. Their solution was to just turn off the little cores while continuing to spew marketing bullshit about it being an 8-core chipset.
Yes, you look to be right that the Samsung-designed Mongoose core in the Exynos SoC had a 128-byte cache line size, while ARM designed their generic Big cores to use 64 bytes like their Little cores. Still, it seems ARM did not provide guidance that these specs must match between Big & Little cores, and Samsung didn't catch it either.
Different instruction sets are not too bad. Copied from my comment elsewhere.
> The OS can install an illegal-opcode exception handler. When a process is first run on the small CPU, the unsupported opcode will raise the exception. The exception handler can simply set the processor affinity of the process to the main CPU and put it to sleep. The OS will handle it like it normally would - putting the process in the run queue of the affined processor.
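Roughly, a user-space sketch of the same idea (hedged: a kernel would do this in its invalid-opcode handler, and I'm assuming logical CPU 0 is the big core):

    /* Sketch of trap-and-migrate from a process's own SIGILL handler.
     * Assumes CPU 0 is the big core; a real OS would do the equivalent
     * in its invalid-opcode exception handler. */
    #define _GNU_SOURCE
    #include <sched.h>
    #include <signal.h>
    #include <string.h>
    #include <unistd.h>

    static void on_sigill(int sig)
    {
        (void)sig;
        cpu_set_t big;
        CPU_ZERO(&big);
        CPU_SET(0, &big);                       /* only the big core allowed */
        if (sched_setaffinity(0, sizeof big, &big) != 0)
            _exit(1);
        /* Returning re-executes the faulting instruction, now on CPU 0.
         * Caveat: a genuinely invalid instruction would loop forever. */
    }

    int main(void)
    {
        struct sigaction sa;
        memset(&sa, 0, sizeof sa);
        sa.sa_handler = on_sigill;
        sigaction(SIGILL, &sa, NULL);
        /* ... run code that may contain instructions the little cores lack ... */
        return 0;
    }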
What if my program has like 1 avx2 instruction? Then we still wake up the big core and run my process there? And keep running it there? Or do we try to migrate back to the little core every once in a while? And then go back to the big core when we trap again?
It's not quite relevant to your point, but generally you wouldn't want to have one AVX2 instruction as they are only really useful if you have a bunch of data.
Yes, even if the process has 1 avx2 instruction, it will be scheduled to run on the main CPU. I mean you could switch the process back to the small CPU when you detect a long stretch of opcodes without any avx2 instructions, but the constant switching between the main CPU and small CPU probably is not good for performance.
Thread migration only costs on the order of 100 microseconds, including the effect of cold caches. If you keep the AVX thread on the big core for at least 100 milliseconds at a time, you only lose ~0.2% performance.
There are processor architectures (MIPS) that can't do unaligned memory access where Linux will catch the fault, emulate the memory access in kernel and return to the program. On every unaligned access.
At least this only has to be done once, and frankly these AVX instructions are slow initially anyway.
> At least this only has to be done once, and frankly these AVX instructions are slow initially anyway.
FWIW I believe only some of them, and those are just some (or all?) AVX-512 instructions. I think AVX2 is implemented on the main die and doesn't have to be powered up first.
AVX (256 bit) instructions also suffer a penalty.
Both the 256-bit and 512-bit instructions resulted in a slowdown lasting about 9 microseconds at roughly a quarter of the usual instructions per clock, but the 512-bit instructions incurred an additional penalty of about 11 microseconds during which no instructions executed.
The first penalty was associated with the voltage transition, and the second with the frequency transition. Heavier 256-bit instructions would probably have triggered the frequency transition as well.
It only needs to be done once at the process launch. Once the processor affinity is set, it will stick to the main CPU at full speed for the entirety of the process lifetime.
Once per process. I doubt operating systems will start modifying executables and libraries on-disk to tag them as using relevant instruction set extensions.
And generally speaking, when an application first starts issuing SIMD instructions, that's probably not a great time to be interrupting it, even if it only needs to happen once.
the issue is that once you have an active process using a feature only available on the larger cores, you can't shut off the larger cores to save power without paying a large latency to wake up that process.
You could imagine it as an ELF note of some kind. It's not a direction I'd love to go in, and catching illegal-instruction traps seems fast enough that the more complicated design doesn't seem worthwhile.
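For illustration only, such a note could be as small as this (the ".note.isa" section, the "ISA" owner name and the 0x20 payload are all made up, not any real convention):

    /* Hypothetical ELF note tagging a binary as wanting AVX2, so a loader
     * or scheduler could read it up front instead of waiting for a trap. */
    __attribute__((section(".note.isa"), used, aligned(4)))
    static const struct {
        unsigned int namesz, descsz, type;   /* standard ELF note header */
        char         name[4];                /* "ISA" plus NUL padding */
        unsigned int desc;                   /* pretend 0x20 means "uses AVX2" */
    } isa_note = { 4, 4, 1, "ISA", 0x20 };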
Well, once when it's about to try to wring maximum performance out of the processor, and if you only do it once you're ensured that your smaller cores never get used…
A trap, followed by migrating the process to a different core, which may have been powered off and definitely has cold caches. Context switches don't get any worse than that.
Oh, I thought the suggestion was to trap to the large core every time an unsupported instruction was executed and let the process drift back to the smaller one later. Which would be slow I would assume. If you never switched back, wouldn't any process using advanced vectorized instructions (like anything using a decent libc) be permanently pinned to the large core?
> Oh, I thought the suggestion was to trap to the large core every time an unsupported instruction was executed and let the process drift back to the smaller one later.
Yes that's the idea.
> Which would be slow I would assume.
How expensive do you think a trap is? It's on the order of 10 billionths of a second.
> If you never switched back, wouldn't any process using advanced vectorized instructions (like anything using a decent libc) be permanently pinned to the large core?
I think you can switch back next time you schedule.
> How expensive do you think a trap is? It's on the order of 10 billionths of a second.
are you sure about that? I would expect at least a couple of orders of magnitude more just for the userspace->kernel transition.
edit: for what it's worth, a syscall takes 250ns on my (admittedly vintage) machine. That's using the low-latency sysenter path. An interrupt is probably going to cost more.
Anyway the cost of scheduling on another core is going to dwarf that.
edit2: for reference, this was a Sandy Bridge turboing at 3.5 Ghz during the test. With spectre mitigations on (which is going to be a good chunk of that overhead).
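If anyone wants to reproduce that kind of number, a crude measurement looks something like this (Linux; results swing a lot with the CPU and mitigation settings):

    /* Rough estimate of raw syscall round-trip cost. */
    #include <stdio.h>
    #include <sys/syscall.h>
    #include <time.h>
    #include <unistd.h>

    int main(void)
    {
        const long iters = 1000000;
        struct timespec t0, t1;

        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (long i = 0; i < iters; i++)
            syscall(SYS_getpid);            /* forces a real kernel entry */
        clock_gettime(CLOCK_MONOTONIC, &t1);

        double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
        printf("~%.0f ns per syscall\n", ns / iters);
        return 0;
    }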
> I think you can switch back next time you schedule.
Ok, yes, then we're on the same page. I would still think that would be slow? You'd need a full transition-to-kernel and context switch before you could execute again, which AFAIK would take at least microseconds…unless you think there would be a faster path to resume execution?
I believe the OS can turn off features in the big cores, disabling the corresponding bits in CPUID so that the two sets of cores will advertise the same capabilities.
If this is going to be anything like how arm did it, intel most likely upgraded the atom CPUs to support all the instructions of the big core, probably just very slowly. For example, you will find that the lowly Cortex-A7 supports virtualization extensions. Why? Because it was paired with big cores that supported them.
The only way to make big.LITTLE work and allow task migration between them is to have them support the same instruction set exactly. So, I suspect that that is why AVX512 was removed from the big core here, and I suspect the atom cores aren't exactly garden variety either. They probably support most other extensions of modern x86 CPUs, at a slow speed.
For example, AVX2 can be microcoded by running multiple pieces of the data through the ALU, one at a time.
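Purely as an illustration of "one piece at a time" (not how the hardware literally does it), a 256-bit packed add is architecturally just this per-lane loop:

    /* What one AVX2 vpaddd (eight 32-bit lanes at once) computes,
     * performed one lane at a time; the result is identical. */
    #include <stdint.h>

    static void paddd_256_one_lane_at_a_time(const uint32_t a[8],
                                             const uint32_t b[8],
                                             uint32_t out[8])
    {
        for (int lane = 0; lane < 8; lane++)
            out[lane] = a[lane] + b[lane];   /* per-lane wrap-around add */
    }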
Brings me back to coding for the ARM7TDMI on the Gameboy Advance - it supported both 16-bit THUMB[0] and the standard 32-bit instruction sets, and you would switch modes at runtime. Fun times. This does seem more annoying, though.
I don’t think users would be happy if it were a lottery whether their code runs slow, using microcoded instructions, or fast, on the better CPU, especially given that, I guess, the difference in instruction set is for vector instructions (because adding those is what makes a CPU big nowadays).
So, performance oriented code will pin its performance-critical threads to the fast CPU. Now, the question is: what code won’t try to claim the faster CPU, given that benchmarks typically run programs while there is no contention from other programs?
I'm not sure how this is different from the existing big.LITTLE "lottery", between being scheduled on a CPU with a deep speculative pipeline and a high clock rate or a CPU with less.
If the microcoded vector ops cause your code to run slow, then you will be consuming lots of CPU time, and should be migrated to the big core anyway, right?
> So, performance oriented code will pin its performance-critical threads to the fast CPU. Now, the question is: what code won’t try to claim the faster CPU
Do apps have CCX pinning logic for Ryzen? No? Why would they add it for an Intel processor that probably won't even be sold in volume.
That seems hard to do with preemptive scheduling. AFAIK, the decision is made at the function level, and done at startup. Even if you can decide at call time, it's still only at the function level, and preemption can happen mid-function.
I have to imagine the differing instruction sets means different extensions, not entirely different ISA.
I'm not a programmer, but doesn't compiled code often offer multiple codepaths, checking feature flags at runtime? So if your code runs on the big core, it finds AVX2 is supported and uses the AVX2 codepath. If it runs on the little core, it finds AVX2 is not supported and takes a legacy floating-point codepath instead.
The problem is that you want to be able to migrate processes between cores. It's not the end of the world if a number-crunching process gets stuck on the big core. But if a process detects an instruction that it's only using for convenience purposes, and that causes it to get stuck on the big core or stuck on the little cores, that's not good. In a different scenario, if it was set up so that a process starting on a little core never detected an important instruction, and was stuck in slow mode forever, that would also not be good.
It depends a lot on which instructions differ and how the feature flags work, of course. They might have set it up in a way that's fine.
Tons of asterisks here, but: you'll have to explicitly write the code to do that at runtime, if APIs are provided to do so. When you build for a target, say x86, you're usually targeting the lowest common denominator you're willing to support, and the compiler won't emit less widely supported instructions even if the CPU it's compiling on supports them.
Yes, most programs target the generic x86-64 ISA, which means SSE2 at most. Still, some have accelerated paths that are enabled by looking at feature bits. So a simple solution is for the OS to mask out some of those bits for processes that may migrate to the less featured cores, while giving the user the option to "pin" a process to the fast cores and enable all features.
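Those accelerated paths usually boil down to something like this (GCC/Clang builtins; sum_avx2 and sum_scalar are hypothetical stand-ins, and the AVX2 one would normally live in a file compiled with -mavx2):

    #include <stddef.h>

    static void sum_scalar(const float *x, size_t n, float *out)
    {
        float s = 0.0f;
        for (size_t i = 0; i < n; i++)
            s += x[i];
        *out = s;
    }

    static void sum_avx2(const float *x, size_t n, float *out)
    {
        sum_scalar(x, n, out);          /* stand-in: imagine 256-bit loads + vaddps */
    }

    void sum(const float *x, size_t n, float *out)
    {
        __builtin_cpu_init();           /* needed before the feature checks on GCC */
        if (__builtin_cpu_supports("avx2"))
            sum_avx2(x, n, out);        /* fast path, chosen once at runtime */
        else
            sum_scalar(x, n, out);
    }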
Maybe it wouldn't be possible to have several processes working in parallel, but instead a single process with 5 cores available for threads? Consider the PS3 architecture, for instance: it had 7 cores, 1 for management and the rest for computing. Could this be the way Intel envisions its future lines of products?
> Now the OS has to pin processes based on support for different instructions?
The only complication here would be if they have differing extensions like AVX512, but that's easily solved by the OS by just advertising the common baseline. Nothing about this looks difficult to support?
Runtime detection is the process querying what extensions are available, and then selectively using those. You adjust what the query returns to only return the common set.
Runtime detection has been a pretty standard thing for well over a decade now - it's how we all manage to run the same compiled binaries over the years despite variability in SSE & AVX support. You don't download different versions of Chrome/Photoshop/Gimp/Premiere/Blender/Whatever compiled for different CPU micro-architectures, do you? You might if you run Gentoo I suppose, but that'd be about it.
> Does it mean that I can’t compile with -mavx2 anymore?
You already can't if you're shipping binaries to users unless you only support Skylake & newer? There's a lot of CPUs currently in use that don't support AVX2. So... you either already have this problem and you're familiar with it, or you're not doing this and it's moot.
> Runtime detection is the process querying what extensions are available, and then selectively using those. You adjust what the query returns to only return the common set.
Except that they almost always do this runtime detection once, on startup, and then choose/thunk codepaths accordingly. If the OS just happens to start my avx2 process on a little core (and how is it going to know better?), that's going to turn off all of my optimizations, regardless of where the process subsequently gets migrated to.
> You already can't if you're shipping binaries to users unless you only support Skylake & newer? There's a lot of CPUs currently in use that don't support AVX2. So... you either already have this problem and you're familiar with it, or you're not doing this and it's moot.
Except nobody in 30 years of x86 dev expects to get a different answer from CPUID during runtime.
But that defeats the purpose of supporting any extensions at all in the big core that the little core doesn't support. Software will get the lowest common denominator answer and just not use avx2. So why support it in the first place? Why not just do the right thing and have uniform extension support like big.LITTLE?
Currently Intel likes to disable features on some models of Core cores for product segmentation reasons. I'm pretty sure processors sold under the Pentium or Celeron brands already have enough features fused off that they're roughly comparable to the Atoms this is being paired with. You also fuse off things like sections of cache or whole cores that have manufacturing defects.
Chips aren't designed from scratch. They're assembled out of previously designed components and in this case Core cores and Atom cores were never designed to work together.
No - but that just means that Intel shouldn’t do this at all. Either don’t support stuff like avx and avx2 in the big core by disconnecting those blocks or support a slow microcode version of avx and avx2 in the little cores. Supporting different extensions for a CPU used with modern preemptive OSes doesn’t make any sense.
> Either don’t support stuff like avx and avx2 in the big core by disconnecting those blocks
That's partly what they did. From the article: "One thing we can confirm in advance – the Sunny Cove does not appear to be AVX-512 enabled."
Maybe they also fused off AVX & AVX2 support in the Sunny Cove core as well, we'll see.
And disabling AVX in cores that otherwise support it is already a common thing - see the Pentium & Celeron lineups that Intel currently sells. They don't have AVX/AVX2, even though the cores inside them definitely could offer it.
The typical x86 extension query (cpuid) is an unprivileged user-mode instruction; unless the application takes care to ask the OS, there's no real way for the OS to select the common denominator.
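i.e. any process can just ask the silicon directly, with no syscall involved and nothing for the kernel to filter (short of running everything under a hypervisor that traps CPUID):

    #include <cpuid.h>
    #include <stdio.h>

    int main(void)
    {
        unsigned int eax, ebx, ecx, edx;

        /* CPUID leaf 7, sub-leaf 0: EBX bit 5 is the AVX2 feature flag. */
        if (__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
            printf("AVX2: %s\n", (ebx & (1u << 5)) ? "yes" : "no");
        return 0;
    }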
Edit: Also: AVX2 is a lot older than Skylake. You're probably thinking of AVX512.
The OS could still intercept CPUID - this is, after all, what VMs do.
But it looks like Intel is doing this anyway, as Sunny Cove in this application has had its AVX-512 removed: "One thing we can confirm in advance – the Sunny Cove does not appear to be AVX-512 enabled."
Running every thread in a hypervisor just to trap cpuid sounds like extreme overkill. And probably some applications want the features only available on the big core, and masking those out doesn't solve that end of the problem.
> You adjust what the query returns to only return the common set.
But this shoots the big core in the foot. You run at the lowest common denominator and the big core doesn't have the advantage of higher clocks. And this might just lead to a lot of software that only runs on the big core.
Imagine you wanted to fly around between multiple points but if you want the option to ever switch to a bus then your plane will circle around each airport until it's as slow as the bus. Or you can opt for "plane only".
It only shoots the big core in the foot for things that would make meaningful use of the extensions present on the big cores but not on the little ones. Which in a 7w application is what, exactly?
Anything that uses any AVX or FMA3 off the top of my head.
A better mix would have been simply using lower clocked, lower powered cores of the same type or really close derivatives of the big core where the manufacturing process and clocks are what keep power low. Not a mix of Ice Lake and Atom. But right now Intel would throw everything at the wall to see what sticks.
And it seems like a good way for developers to make sure their software stays on the big core.
The fact that they put the extensions in implies they expect meaningful use, doesn't it? If not on this specific part, on a future part with multiple big cores.
>> Does it mean that I can’t compile with -mavx2 anymore?
> You already can't if you're shipping binaries to users unless you only support Skylake & newer?
Huh? My 6+ years old gaming PC supports AVX2, and it definitely doesn't have a "Skylake & newer" CPU!
AVX2 support started at Haswell, or Intel Core 4th generation. We're at gen 10 now.
While there certainly are still a lot of systems without AVX2 support, new games (and other performance hungry software) requiring it would not be completely unreasonable.
Intel is still launching new processors with AVX2 disabled. They use AVX2 support for product segmentation and disable it on low-end parts. So a Comet Lake Pentium Gold CPU launched in Q2 2020 doesn't support AVX2.
(I recall a recent news story about how unawareness of this among people writing or documenting compilers has started to cause problems.)
The OS can just turn off the extensions that have XSAVE-managed state (AVX, AVX-512) by setting the XCR0 register to whatever it wants. Unsupported instructions will then cause invalid-opcode faults (SIGILL on Linux). Whether that's what the OS wants to do is a totally separate question. It would be kinda funny if this thing led to the ability of a process to request a certain XCR0 that the kernel would then serve, because I've been asking for that feature for unrelated reasons for quite some time and didn't get the warmest response to it on the mailing lists.
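For reference, the check applications (or their runtimes) already do before touching AVX reads exactly that register back via XGETBV, so masking bits in XCR0 really does make the extension disappear from the standard detection sequence:

    #include <cpuid.h>
    #include <stdio.h>

    int main(void)
    {
        unsigned int eax, ebx, ecx, edx;
        __get_cpuid(1, &eax, &ebx, &ecx, &edx);

        if (!(ecx & (1u << 27))) {              /* OSXSAVE: has the OS enabled XSAVE? */
            puts("OS has not enabled XSAVE; XGETBV would fault");
            return 0;
        }

        unsigned int lo, hi;
        __asm__ volatile("xgetbv" : "=a"(lo), "=d"(hi) : "c"(0));
        unsigned long long xcr0 = ((unsigned long long)hi << 32) | lo;

        /* Bits 1 (SSE/XMM) and 2 (AVX/YMM) must both be set for AVX to be usable. */
        puts((xcr0 & 0x6) == 0x6 ? "OS allows AVX state"
                                 : "AVX state masked off in XCR0");
        return 0;
    }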
This is not ARM; on Intel, the CPUID instruction can also be used by normal unprivileged programs. Not all programs look only at /proc/cpuinfo or the ELF auxiliary vector.
SSE itself tops out at 4.2, but Tremont does support the newer SHA extensions.
AVX appears to be the only missing thing, but AVX isn't even standard across Intel's other lines, either. The Pentium line doesn't support AVX either, for example, even though they are using Skylake & newer micro-architectures.
So you currently can't assume AVX support, and you still won't be able to assume AVX support. Why does this matter?
Operating systems move threads between cores. If those cores support different features, threads that are migrated to low-feature cores might experience illegal instruction traps despite correctly checking for instruction features.
Maybe standard application code all runs on the big core, and the little cores are like the system assist processors on IBM mainframes - used to run OS jobs or specific application support code, to free up the main processor for other work (or idling).
For this to work, there would need to be a small set of undemanding tasks that account for a lot of machine time. Feeding video to the GPU? All sorts of GUI compositing and housekeeping? Handling network connections in the browser?
I don't think this is a good explanation, but it's fun to think about.
A lot of things are background tasks until suddenly, without warning, the user ends up waiting for them to happen. Take your example of handling network connections, for example: this can definitely be a background thing! Your computer might want to keep an IMAP connection open, periodically poll a CalDAV server, sync photos with your phone, etc., and all of these would be very reasonable things to run on low-power CPU cores. Kick them over to a wimpy core, throttle down the frequency to its most power-efficient setting, insert big scheduler delays to coalesce timer wake-ups, whatever. Good stuff.
But what happens when the user opens up a photo viewer app and suddenly wants those photos to be synced right now?
If your code is running on a recent iPhone -- the heterogeneous-core platform I'm most familiar with -- then the answer is that the kernel will immediately detect the priority inversion when a foreground process does an IPC syscall, bump up the priority of the no-longer-background process, probably migrate it to the fastest core available, and make it run ASAP. Then, once the process no longer has foreground work to do, it can go back to more power-efficient scheduling.
This kind of pattern is super common, and it would be way more annoying and perilous to try to split tasks into always-foreground and always-background.
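On those platforms the split isn't static at all; the work is just tagged with a quality-of-service class and the kernel decides placement. A loose sketch with libdispatch (Apple platforms; sync_photos is a made-up stand-in):

    #include <dispatch/dispatch.h>
    #include <sys/qos.h>
    #include <stdio.h>

    static void sync_photos(void *ctx) { (void)ctx; puts("syncing photos..."); }

    int main(void)
    {
        /* Background QoS: the scheduler is free to park this on an
         * efficiency core with relaxed timer coalescing. */
        dispatch_async_f(dispatch_get_global_queue(QOS_CLASS_BACKGROUND, 0),
                         NULL, sync_photos);

        /* When the user is suddenly waiting on it, the same work can be
         * issued (or boosted) at a user-facing QoS instead. */
        dispatch_async_f(dispatch_get_global_queue(QOS_CLASS_USER_INITIATED, 0),
                         NULL, sync_photos);

        dispatch_main();   /* park the main thread; async work keeps running */
        return 0;
    }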
The standard operating system model today is mostly to do what the application(s) request and then get out of the way quickly. I'd expect to cede all cores to applications most of the time, rather than reserving the low power cores for OS tasks.
That would be pretty disappointing if they can't handle normal processes too. It's easy to end up with a bunch of processes that are using small amounts of CPU but aren't dedicated background tasks.
Even just looking at a browser, I might have half a dozen generic tab processes open, each using a small amount of CPU. But then I navigate one to a game, and I want that particular tab to get near-exclusive access to the big core while the others use only the little cores.
It will. The performance difference between an Intel Atom and a Core i3 has basically nothing to do with differing instruction set extensions. AVX2 support is not why a Core i3 runs circles around an Atom, particularly since the overwhelming majority of instructions executed are not AVX anyway.
The only mechanism I can come up with is to detect illegal instruction traps on small cores, then flag the thread as big-core-only and re-start the execution at the bad instruction. That's not ideal but maybe workable.
(If it traps on the big core, too, it's just a bad instruction and SIGILL is raised to userspace like usual.)
I think the obvious solution is that AVX (or any other extension not in Tremont) will not be supported at all on these CPUs. These are 7W chips, after all.
There's no mixed instructions as Anandtech speculated. The Sunny Cove core was cut down to match what the Tremont cores support. So no AVX at all, no weird extension mismatch, no OS headaches beyond the expected big.LITTLE headaches.
How much would the lack of AVX hurt performance apart from scientific computing? It's interesting to see Intel offering a chip with a clear focus on tablet computing, so the lack of AVX might not be so bad if the Tremont cores improve battery life significantly.
I can answer that: a lot. AVX can speed things up anywhere from a modest 20% (if not very well implemented) to a massive 15-30x when carefully used (usually things that are very good fits for AVX/AVX2 instructions/use cases).
Anecdotally, I have a "netbook" with a Goldmont+ Celeron N4000 that works very respectably for everyday business (web browsing, office suites, watching videos, etc.) but crawls when trying to run scientific code. 100x slowdowns at worst, 10x slowdowns on the parts it takes "nice and easy" (wrt a 7200u notebook that I usually carry around).
You mean not much. The question was apart from scientific computing how much does it hurt. Per your own comment it's only worth maybe 20%, and that everyday needs work fine.
It'll show up occasionally in some things that would be relevant to a device with this SoC, like noise cancellation, but very little else. And even then you can do noise cancellation even better on a GPU (RTX Voice says hi), and this does still have a GPU and a decent one at that (~500 gflops), so it's not even that simple.
Read my second paragraph. Also, Intel, despite having theoretically amazing GPUs, has for years been a lackluster player in that area. Ironic too, since they're one of the largest iGPU manufacturers in the world.
I'm really curious how they envision this working, given the different instruction sets between the main core and the small cores. That seems like a bit of a nightmare to handle on the OS side of things, unless you turn things in to a two tier environment somehow, and require software (users?) to explicitly opt-in to the small cores somehow.
IBM did this with the PowerPC chips used in the Playstation 3. It was a development headache but part of that was the low power cores were extremely limited... no shared memory so you had to DMA data over and double buffer. They ran mighty fast though. It was a lot like programming a DSP really. If this is anything like that I would expect only some workloads to be pushed to the small cores in a similar way to the way stuff is pushed to GPU today. Frameworks will pop up to make it easier for developers but it will be rough at the get-go.
> IBM did this with the PowerPC chips used in the Playstation 3
It was a completely different situation: the Cell's SPEs were vector coprocessors. They were a completely different architecture from the PPE, and they had to be driven explicitly from the PPE.
How did Cell compare to doing compute on a GPU? Was the difference in complexity due to support libraries etc, or because it required non-standard thinking about how it can be utilised?
I remember reading or watching something by Naughty Dog about how they maximised use of Cell which was fascinating. (And something about how they ported to PS4 and made extremely efficient use of the eight cores by running game/physics in parallel with the previous frame’s rendering/graphics. Or something like that. It was ages ago and I can’t seem to find the talk online.)
The Cell was very different, and horrendously hard to program. The SPUs had no direct access to memory and had to be fed via DMA transfers coordinated from the PPUs.
Yeah, there was no way to pretend the Cell was anything close to a traditional SMP design. But Intel's Lakefield and Arm's big.LITTLE are very much meant to be treated as SMP systems, with just a few tweaks on top to deal with/take advantage of the fact that they're not completely symmetrical. That means the CPU designers actually have to be more careful about what functional differences they allow to creep in.
My guess is the little cores are mainly for OS background jobs and a few applications which explicitly opt in to them.
Also, while they have different instruction sets they don't have a different architecture, i.e. they have a "shared base" of instructions which is likely good enough for many background applications, like updaters, downloaders, background mail fetchers, etc.
In the end it fits in with their "always connected" approach. Though I'm still somewhat skeptical about that approach: even in many first-world countries, always connected is just not a thing for many users. Even on phones it's not fully a given, though much more so than on laptops.
I would imagine the difference between the instruction sets of the two CPU types is small since the small CPU is Atom which is not too different from the main CPU.
The program machine code can be scanned when loaded to look for specific assembly opcodes to determine the required capability of the CPU to execute it on. The code with instructions not fitted for Atom will be sent to the main CPU only.
Edit: Just a thought. The OS can install an illegal-opcode exception handler. When a process is first run on the small CPU, the unsupported opcode will raise the exception. The exception handler can simply set the processor affinity of the process to the main CPU and put it to sleep. The OS will handle it like it normally would - putting the process in the run queue of the affined processor.
> The program machine code can be scanned when loaded to look for specific assembly opcodes to determine the required capability of the CPU to execute it on.
That works as a heuristic, but it's not perfect, since JITs and self-modifying code are a thing.
I expect the chip will raise a fault and the OS will move the process.
This is one of the things we tried when I worked (2011-12) at Intel on QuickIA [0] (dual socket, Atom on one side Xeon on the other). One of my first projects was to write a vectorized matrix multiplication that could only run on the Xeon, so we could demo the fault-to-big-core behavior.
That's true. The code could be modified on the fly after the scanning so it's good to have a last ditch illegal-opcode exception and let the OS catch the exception to migrate the process.
You would only need to do that scanning if the program had hard requirements like it was compiled to assume AVX512 support was present. Which is going to be extremely rare, since that would already mean an x86 program that's highly non-portable.
Runtime detection would be the only actual concern here, but you can easily just advertise the common baseline. As in, just pretend the sunny cove core doesn't support AVX512. The only problem then becomes the big core is potentially slower than it could be, but given how rare things like AVX512 is in typical desktop applications will anyone actually care?
EDIT: Oh, and this appears to be what they're basically doing. Even though Sunny Cove itself supports AVX-512, it's being disabled in this application: "One thing we can confirm in advance – the Sunny Cove does not appear to be AVX-512 enabled."
Seems like the OS could solve it easily by trapping on undefined operation exceptions and retrying on the big core. If it still works simply mark the process as big core only.
Trap the undefined instructions, and dynamically replace them with a jump to emulation code if running on a little core.
Then let the OS and software folks do a systemwide profile to find out which bits of code most frequently run on which cores. Then configure the compiler not to output any instructions not available on the little cores for code which usually runs on the little cores.
> Trap the undefined instructions, and dynamically replace them with a jump to emulation code if running on a little core.
This was more or less what VMware pioneered to deal with privileged instructions. I expect there are thickets of patents involved, even if the earliest have expired. But it's likely there is or would be some licensing agreement.
Disclosure: I work for VMware, though not in VMs per se. Speaking for myself only.
No, unfortunately not. Only if you implement it exactly as it was done back in the day for FPUs. Any deviation and you might run into a more recent patent that covers some aspect of a later implementation, maybe even something seemingly obvious or necessary. This is why it is called a patent “minefield.”
I was going to joke about Intel "discovering the big.LITTLE architecture" until you said different instruction sets. It will be interesting to hear how they hope that will work out...
Sure but that’s still a nuisance from the point of view of OS scheduling. You can’t transparently move a process from a big core to a little core or vice versa the way you can with big.little.
No, but it can be a small nuisance: you can leave a process eligible to run on any core until it faults for an illegal instruction; then you pin it to the big core.
The fact that it's heterogenous kills it for me. Remember the Cell Broadband Engine? You could get some pretty crazy performance out of that thing, but nobody used the execution units because the instruction set was different.
GPUs get away with it because they are really a completely different kind of processor, but even there GPU processing is under-utilized for this reason. It's really a pain.
There was a proof-of-concept "smart network adapter" done by either Intel or MS (don't remember the exact details), which allowed finishing the download of a large file, or receiving emails while the host computer was asleep. I'd imagine tasks like these are well suited for the smaller cores.
A lot of game consoles have this model too. PS3, PS4, Wii, and WiiU off the top of my head run pretty complex operations on the I/O processor even when the main system is off.
The big.LITTLE concept was announced in 2011. It's not the concept of using differently sized cores which is odd, it's using cores with different instruction sets.
What's the user and developer experience, do programs need to be recompiled? Are the extra cores meant to typically run something like the JVM so no recompilation is needed?
I doubt you will need to recompile, although doing so is often a good idea as new compilers can use new instructions/optimizations.
I’d bet money that the different ISAs are full x86 and a subset of x86, which is a great idea. x86 has a lot of old instructions that almost no one uses. Perhaps the small cores will trap and move the process to the big core if it uses an old instruction.
I don't see how the cost of supporting outmoded instructions would be significant. The decoder is a tiny proportion of the die, and not a bottleneck at all. Obsolete instructions that can't be translated to the same uops everything else uses are already microcoded anyway.
I'd bet the instruction set difference is mainly level of SIMD support. Sunny Cove will have AVX2, Tremont won't have enough execution units to go wider than SSE2.
Sure big.LITTLE chips have been around for a while. But my understanding is the instruction set is identical between the cores, they just operate at different speeds. That in itself must be a fun balancing act for the scheduler.
To clarify my concerns (I haven't dug in to the specific instruction set differences):
Assume the main core supports AVX2 and the smaller cores don't. Which core do you execute the code on? Which one will get you the best performance per watt? How do you account for that in the OS scheduler? What do you want to optimise for?
If your code is compiled for AVX2, it'll fail on the small cores unless it does continuous runtime checking (which is expensive, but given processes can migrate between cores, presumably necessary).
It's not so much the speeds that differ, but the micro-architecture. The big cores are typically out-of-order cores with a large amount of cache, to get as many instructions as possible per cycle. The SMALL cores are in-order cores with a small amount of cache. These have a lower IPC, but they also use much less energy per instruction compared to the big cores.
You do the same with FP. The first floating-point instruction traps and the process is then flagged as needing FP. That flag is then used in context switches to also save and restore the FP registers.
The advantage is you avoid storing fp registers unless you are going to use them.
That flag could easily determine what you can run where.
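A conceptual sketch of that classic lazy-FP trick (not real kernel code; all names here are made up), including where the "what can run where" flag would live:

    /* The first FP/AVX instruction traps (#NM), the handler flags the task,
     * and only flagged tasks pay for FP state on context switches.
     * save_fp_state/restore_fp_state stand in for fxsave/xsave etc. */
    struct task {
        int           uses_fp;          /* set on first FP trap; a big.LITTLE
                                           scheduler could also read this */
        unsigned char fp_state[512];
    };

    static void save_fp_state(struct task *t)    { (void)t; /* fxsave  */ }
    static void restore_fp_state(struct task *t) { (void)t; /* fxrstor */ }

    void handle_device_not_available(struct task *current)
    {
        current->uses_fp = 1;           /* from now on, FP state is switched */
        /* re-enable the FPU for this task and retry the faulting instruction */
    }

    void context_switch(struct task *prev, struct task *next)
    {
        if (prev->uses_fp)
            save_fp_state(prev);
        if (next->uses_fp)
            restore_fp_state(next);
    }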
As mentioned in your link, that's ultimately against ARM's requirements for big.LITTLE and was due to buggy Samsung patches. A fixed kernel only exposes the subset of features available on all CPUs.
How does the scheduler know which process will use CPU instructions that the small cores don't support? Will it just try it out and then move the process to the higher support core if it detects an error?
>How does the scheduler know which process will use CPU instructions that the small cores don't support?
I don't see that as required? If you catch the illegal-instruction signal the CPU throws, you can just run it on the other CPU, since the program counter won't have advanced past the faulting instruction. There's a delay in the catch and retry on the other CPU, but I don't see the big deal here?
I once ran into the same sort of issue with this very chip, in that the little A55 cores supported half-float compute, while the big Mongoose cores didn't. Seriously a pain.
Can you give an example? I've seen ARM chips with co-processors with different features (like no MMU for instance) but not a completely different instruction set. Besides in my experience in these configurations the OS doesn't really do much with the other core, it's generally left to specific applications to have code explicitly meant to run on it, there's no automatic scheduling.
Unless you think about embedded co-processors that are generally here to control a complex subsystem (like video processing or something like that, arguably GPUs fit in that description) but in general those aren't handled like real CPU cores at the OS level, they have dedicated drivers or userland libraries dedicated to a specific purpose.
Something tells me that if Intel wants this architecture to be popular they'll have to work on a tighter and more transparent integration, otherwise this is going to end up like the Cell.
It's a pretty interesting approach though, I'm genuinely curious to see how that's going to end up working.
I'm very familiar with some of these chips but my point still stands, in my experience it generally requires tailor-made software to really benefit from these architectures.
If these Intel CPUs are meant to be general purpose I wonder how that's going to work out.
With the heavy rumours of Apple moving forward with ARM chips in Macs, Intel clearly showed the plans for this chip to Apple last year, and Apple said no thank you.
While Intel showed this, Apple was currently selling iPhone 11s with processors that would clearly outpace this. The A13 has two large cores and four small cores, whereas either one of the large cores outperform the comparable 'Core' core in this, and I'm betting the smaller cores do as well, possibly at less power. Supposedly their ARM Mac CPU is a 12-core architecture, I imagine with four large cores and 8 small cores. That's a massive hardware difference...did I mention that the 12-core CPU is supposed to be built on TSMC's 5nm? Intel is trying to get current and honestly, maybe Apple will ship a MacBook Air with it.. but probably not.
> The A13 has two large cores and four small cores, whereas either one of the large cores outperform the comparable 'Core' core in this
The evidence of this is flimsy at best. It's possible the A13's big cores are faster than Intel's Sunny Cove, but also not very likely, and certainly not established fact. The only real evidence here is Geekbench, which is a highly questionable benchmark that also has extreme OS dependencies (compare results for the same CPU on Windows & Linux, for example).
When/if Apple releases something running macOS with their custom ARM cores, then you'd finally get a good comparison to isolate the CPU's performance by itself.
This is not true; we have a pretty good idea of where the A13 stands. Anandtech has run SPECint and SPECfp on the A13 and shows it to be in the same range of performance as a Xeon Silver 4112 (2.6GHz base/3.0GHz turbo), which is a 14nm chip from 2017.
So not as fast as the latest processors on a per core basis, but still pretty damn fast.
Can't really make assumptions, but you would think with the higher thermal envelope Apple can boost clock speeds making them even more competitive.
There are lots of other tests, from SPEC to JavaScript benchmarks (using the same JS engine), that show what the A13 is capable of. And so far the results are comparable in many cases, better in some, and slower in others.
We already knew what to expect from Intel Willow Cove, we will have to wait and see what A14 has in store for us.
A12's spec results are close to Intel's single-thread performance at comparable clock speeds in SPEC, yes, but that wasn't the claim. The claim was that A12 was faster and that is what doesn't really have much evidence to support. Particularly since at comparable clock speeds is great and all, but not if your competitor can just outclock you by a landslide. The A12 can almost certainly go higher than 2.5ghz when actively cooled. But can it hit 4ghz? That's a different story.
>Geekbench, which is a highly questionable benchmark that also has extreme OS dependencies (compare results for the same CPU on Windows & Linux, for example)
OS and compiler dependencies, really. On the other hand, comparing macOS and iOS results may well be more meaningful (they use the same compiler and much of the OS is shared, especially at layers likely to significantly affect performance).
Geekbench (v4 especially) is definitely flawed, but the results in this case don't seem out-of-line with what we see on specINT for example.
e.g., https://browser.geekbench.com/v4/cpu/compare/15541405?baseli... the comparison shows an A12X roughly matching a Sunny Cove 1060NG7 on single-thread performance (and it mostly follows where I'd expect Intel's strengths to be, with a wider vector unit and specialized instructions); in multi-thread performance the A12X unsurprisingly wins, but it also has double the number of cores (okay, four of them are little, but that's still providing extra capacity to get work done).
IMHO, Apple's phones were considered fantastic for a few reasons, in order: brand (perceived quality, luxury status, pricing, "Apple"), software lock-in (iMessage, iCloud), and privacy (also a branding thing since they still use Google for Safari search). The hardware and camera were usually matched or exceeded by top-tier Android manufacturers. Now, Apple-designed CPUs are leaving their competitors in the dust, and provide, perhaps for the first time in a long time, hardware superiority. I don't think consumers quite understand the actual performance gap yet. But once it gets out, it's probably going to be second or third in the list for why people buy iPhones.
I'm going to switch from a Pixel to an iPhone this year. The CPU is just clearly better, and seems to actually matter for taking photos and web browsing. (The Pixel still takes like 5 seconds to post-process a photo. iPhones do it instantly.)
I switched for the opposite reason. Phone hardware is plenty fast for anything I want to do, but Apple's really restrictive closed software ecosystem and lack of hardware variety drove me to Android last year after being on Apple since iPhone 3G.
I just wanted a Touch ID, notchless, fast, and current phone, and Apple did not offer one until recently.
I am not going back, and I am also now seeing the same thing with their laptops. Crappy keyboards, non optional and useless touch bars, low performance per dollar and software that gets in your way. No thanks. Windows 10, WSL, Surface Book/Go/Alienware, wow. Just good, open, usable stuff.
I'd do the same if I wasn't concerned about Android privacy. My pi-hole is blocking literally 10x as much phoning home stuff on the Android device I've got vs iPhone. And that is with the iPhone having way more crap installed on it.
The Pixel 4 has renamed it the "Pixel Neural Core" and has given it even more machine learning tasks and offloading such as Google Assistant and face unlock.
What are your sources for the claim that the hardware superiority actually matters in real life? Every once in a while when I'm looking to buy a phone, I like to check out videos that do real life speed tests (mainly opening apps but other tasks as well) and iPhones have never been on top. Curious to see if that has changed recently.
You're asking for sources and then citing unnamed “real life speed test” videos without any way for people to see what they're measuring or how solid their methodology was?
The main area where normal people notice this is in web usage, where Mobile Safari has handily outpaced Android browsing for many years — see e.g. https://discuss.emberjs.com/t/why-was-ember-3x-5x-slower-on-... from 2014. How much that matters depends on how much a particular website is limited by single-core JavaScript performance — well-engineered sites probably don't have a huge impact but it's quite noticeable on anything which has a bloated SPA and the web has been moving in the latter direction for years.
Gaming is the other area where this is fairly noticeable but that varies both in where the bottlenecks are (CPU vs. GPU), how prominent the effect is, and the relative quality of the ports so it's harder to do a fair comparison.
Burden of proof is on the one who makes the claim. I'm just asking for some links and I explained why I doubted the claim.
I appreciate your effort with giving more background, although I think 2014 is a bit dated with Firefox Quantum becoming more of a thing on Android. Can't remember if the Preview has it yet or not.
Just to add color to that - Apple doesn't want external control of their semiconductors. They want to do their own thing starting with the mobile SOCs, then the secure enclave, then the M2 chip, and the GPU, and now mainstream high power processor. What Intel offers is orthogonal.
What Intel offers they still have to compete with.
It always seems like a good idea to do your own thing when your supplier is stumbling. The problem then is you're on your own. If they find their footing you're in trouble -- either you've blown a huge pile of money developing something which you then don't use because the competition has something better, or you use it anyway and get to relive the final days of Sun Microsystems.
And the same thing happens if anybody can beat you. If Intel can't but AMD does, you lose. If AMD can't but Qualcomm does, you lose.
Worse, success precipitates failure. If you make an in-house processor which is only a couple of percent faster, that's boring. You spend a lot for a little. If you make one which is more than a couple of percent faster, that's war. Intel can't have that. Google can't have that. Samsung can't have that. Microsoft can't have that. Even if you're bigger than any of them, you're not bigger than all of them. So they double their R&D, combine their resources, whatever it takes, and soon you're Sun Microsystems again.
When was the last time that a commodity mobile phone soc was even remotely competitive with Apple's latest? 2012? That seems like rather strong evidence against your theory.
I see a lot of Android devices at the top of those lists.
Apple likes to cherry pick. For example, they require everything on iOS to use their browser engine, then they spend a lot of time optimizing their browser engine for their CPUs. But that's not superior hardware performance, it's software optimization, which they could do with whichever commodity CPU they chose as well.
Intel does the same thing. They spend resources optimizing popular software for their CPUs. Qualcomm not so much.
Agreed- all those competitors have been shown up by Apple for years. They've had to take it because they haven't been able to come up with a technical response- even those using the same foundries...
no, it's not purely about control. apple wants reliability and (predictable) advancement in its supply chain. if intel had delivered on that by meeting its own roadmap, apple probably wouldn't be looking to move away from them.
it's not apple simply being control freaks (although they are and can afford to be), it's intel shooting itself in the foot.
Even when Intel started recycling 14nm why would Apple actually care? Some of their computers have gone years without updates despite newer processors being available, it doesn't seem like an important feature of most of their computers.
In 2008 Apple bought PA Semi to design processors for iOS; making processors for Macs is likely something they've been working on since way before Intel was late on 10nm.
I have only seen a sentence or two mentioned in Bloomberg, etc. But it’s widely known that Intel haven’t delivered a true generational upgrade since Skylake.
10nm Icelake in the new 13" MBP is looking like the first solid generational upgrade in recent memory, but yeah, one qualified success in the last 4ish years does not a roadmap make.
It's one of a few reasons. Macs have worse thermal characteristics than they should; part of that is Apple's fault for making such thin computers with poor ventilation, but part of that is Intel's fault for making such shitty hot chips. Of course there are other matters that can't be laid on Intel, like the keyboard fiasco, but that's beside the point I think.
The display cables fraying "Flexgate"[1], the numerous battery recalls[2][3], that the iPhone and the MBP, as shipped, could not be connected to each other (the MBP came with USB-C to USB-C; the iPhone with USB-A to Lightning; the magic trackpads also suffered this)[4], the kernel panics with USB-C until they finally got patched, that the flagship monitor (admittedly an LG device, but advertised on Apple's website at the time) had issues with … Wi-Fi (just being near it, from improper shielding), that same display also had huge issues with disconnecting peripherals (you would connect to the display and get picture/power, but no peripherals), the crappy noisy RSI-inducing redesign of the already poor keyboard (admittedly subjective except for the objective pain in my wrist, but, e.g., [5]¹), the poor design of the device s.t. any breakage in a component results in the complete waste of the remaining good hardware as nothing is independently replaceable/serviceable, routinely receiving a score of 1 out of 10 by iFixit, Apple fighting right to repair[6], … oh, and the keyboards breaking the entire laptop from a single grain of sand, the terrible suspend/resume times, … and I haven't upgraded to Catalina yet. I hear that's gone so well for so many.
Yeah, Intel's chips are not the biggest fish in the Apple fish fry.
I'm not even an Apple customer, I'm just forced to use them by my employer. (And it feels like most employers nowadays.)
From what I heard when I worked there, Lakefield was purpose-built for this new Microsoft platform, so it's possible that they would not have shown this to Apple until after some time.
*Edit: I wanted to add that I left about 2 years ago, and thought the project had been cancelled, along with so many other 10nm products.
Yes, BK was fired right around the time I gave my notice.
> Why left?
A lot of reasons that would sound like griping. I was recruited by Google for a similar job, but with a chance to build something from the ground up. That sounded like the adventure I was looking for, so I took the offer. I must say, it's been an adventure!
Thanks. I guess that does sum it up and aligns with lots of other similar sentiments I've heard over the years.
Off topic: I think we live in a world where we put too much emphasis on the positives and neglect any negatives, so we end up having any valid complaints viewed as moaning or griping. I wish as a society we had better ways to handle these rather than simply ignoring them, which leads to all sorts of bad things happening, as we've seen in the world today.
When Kaby Lake-G (an Intel CPU with a Radeon iGPU) was released, I thought it was for Apple devices, but Apple never released devices that use it. I'm curious what its purpose was.
This has a lot of parallels to how Intel incorporated some of the best ideas from RISC processor designs into their x86 CPUs to produce a better product under intense competition in the 90s. They're doing the same thing here with ARM processor design ideas, like the big.LITTLE ARM core designs for low power, translated to x86.
Why? I do not understand. It would make sense if the Atoms were performance-per-watt champions, but https://www.cpubenchmark.net/power_performance.html#intel-cp... in this chart you basically only see various generations of ultra low (Y) and some low (U) Core chips, going back to Broadwell when the Y chips were introduced. Atoms make a very rare appearance. The top-performing Sunny Cove (which itself has a much poorer score than the latest Amber Lake Y, but I digress) brings in 623 CPU Mark / Max TDP, where the top-performing Atom, a Pentium Silver N5000, features 455. You can argue the finer details of this particular benchmark, but we are talking about an absolutely brutal 40% difference. Core CPUs switch P-states extraordinarily quickly, so if less performance is needed, they will consume less power just fine. What's the point...?
Cheat sheet for reading Intel CPU codes: if the number starts with 10 and contains a G, it's Ice Lake. Otherwise, if there is a Y somewhere, it's ultra low power. A final U means 15W, rarely 28W. A final T means a 35W desktop chip. E means embedded. H means 45W mobile. All of these are Core chips; a first letter of N, J, or Z means Atom. The first digit (or two digits for 10) is the generation, which became an absolute mess past the 7th gen; the most important change is that 8-U is quad core where 7-U was dual core.
They really aren't comparable. For example, standby power for Lakefield appears to be 2.5mW. Idle is probably significantly lower as well. They're targeting different use cases.
I can't find any figures for the standby power for Ryzen 4000 series, but standby is basically off. The standby power consumption should be negligible in both cases.
It's also not obvious what that use case would be. You can put most laptops into standby and they'll run on battery like that for many weeks. What's the thing that needs more than that?
It would be interesting to compare idle power consumption, but for that we'd have to know what it actually is.
Also, don't most x86 CPU's support being powered off entirely during standby? Ie. flush the caches, write the registers back to RAM, set RAM to self-refresh, and then write some config registers in some power control IC to cut the power to the CPU.
Then it doesn't matter what the standby power consumption is - it simply matters how quickly it can get back into a working state from OFF.
However those are Zen+ fabricated on 14nm, so not really the best that AMD could do in this area. They could fight either with a slowed down Renoir (that could meet the TDP requirements while not requiring any development) or a specific processor based on Zen2.
The battery costs money too, so roll it into the denominator.
The other questions are perf/gram and perf/W (the latter matters more for temperature management if you’ve rolled batteries into the other two metrics)
They do say they’ve improved idle wattage by an order of magnitude. That’s good. (Imagine a laptop that stays on with the screen off for a week. Some idle cell phones can do that.)
It all sounds like a respin of the Cell Processor concept from the PS3 .. big standard CPU core surrounded by little DSP like cores that you have to farm out work to and manage and keep fed.
I'm unsure of how this is being positioned in terms of CPU "horsepower". Is this supposed to be something like the next gen version of their "U" chips? More powerful? Less powerful but more portable with lower watt/TDP?
Also the "stacking" approach is really interesting, I wonder how far that approach may go though-- heat dissipation seems like it would quickly become a problem with multiple layers.
It would be nice if they could shift some of the CPU cores towards asynchronous circuit design (https://en.wikipedia.org/wiki/Asynchronous_circuit) and that way be able to run those parts at a variable clock rate, instead of one part at this frequency and another at a slightly lower frequency.
For power efficiency under dynamic loads, an asynchronous circuit design would win over current solutions. However, designing asynchronous circuits is an order of magnitude more complex than designing synchronous ones.
But for parts like AVX, extensions that will only be available on the main core, that would pay dividends. Then again, they may find they can offer such extensions as a separate chiplet/stack and negate the question of whether a given core supports them, since all cores could tap into it.
What I'd find interesting would be the actual design and instruction set under the hood; x86 gets translated via microcode, and I do wonder how much the underlying way things work has diverged from that original instruction set.
Async design is basically snake oil AFAIK that's only produced cores in the 10Ks of gates.
And there are still benefits to lower-speed sections, as there are physical differences in the transistors that trade power consumption against switching speed, and that would carry over into async designs.
On top of that, part of what we're seeing is dark silicon and the specifics of Dennard scaling. You can't light up the whole chip and not melt the chip, so you're going to see mobile TDP chips where you turn half the chip on or off at a time either way.
Thank you for that; I had somewhat overlooked the aspect that the parts not in use would in effect act as heat sinks, and the whole big/little approach does in many ways give you easily controlled areas, which for multi-core designs can work well.
But the biggest issue with any growth in adoption of async design is the tools and skills to do such work. It does prove hard to compare, though, and most of the development in the async area has been driven by the EM advantages for space-based usage.
It's easier to trademark words you make up, is what I imagine the explanation is. Apparently they use face-to-face connections instead of vias with that, which limits them to two layers, but that's all that's needed here.
I mean, if it can do minimalist 3D with OpenGL ES, with good performance, or the same performance as a PlayStation 2, it would still be interesting, since high-end, bleeding-edge GPU graphics are not always interesting for everybody (and developing on bleeding-edge GPUs seems to require a lot of work).
A dedicated GPU always made sense for high performance, but at some point, having a console or a gaming PC that can run games with a single big chip might not be a bad idea. I have been able to play wow classic on a laptop's i5 with integrated graphics, and it was just fine.
Although I'm not sure if that new CPU is just that kind of "hybrid" design.