Apple and Intel first to use TSMC 3nm

jiggawatts · on July 3, 2021

I just spent a month upgrading a bunch of cloud servers to AMD EPYC, and it was such a nostalgic feeling to be able to speed up systems with a simple, risk-free hardware upgrade.

Remember those days? When every server upgrade magically sped up everything by a factor of two or three? The stop button was pressed on that wonderful time for about a decade, but what seemed like an end of an era wasn't. It was just a pause, and now we're playing with ever faster tin once again.

It feels almost strange for that era of ever increasing hardware speed to make such a forceful comeback after having been stuck at Intel 14nm in the server space for years. The upgrades to 7nm AMD are already pretty impressive. I hear good things about Apple's 5nm laptop chips. Server chips on 3nm should blow everyone's frigging minds.

Exciting times!

hyperpallium2 · on July 3, 2021

noob question, but how is the performance better? I thought clocks weren't increasing (due to heat), so is it that smaller chips mean more per wafer, therefore cheaper and you can buy more?

I recall a Sophie Wilson talk about how things will never get faster, past 29nm.

ksec · on July 3, 2021

A lot of comment mentioned about IPC, but something obvious no one has mentioned ( yet ).

For Server is it also about core count. For the same TDP AMD offer 64 Core option. On a Dual Socket System that is 128 Core. Zen 3 Milan is also socket compatible with Zen 2 Rome.

Basically for Server Intel has been stuck on 14nm for far too long. The first 14nm Broadwell Xeon was released in 2015, and as of mid 2021 Intel barely started rolling out 10nm Xeon Part based on IceLake.

That is half a decade of stagnation. But we are now finally getting 7nm server part, 5nm with Zen 4 and and SRAM Die Stacking. I am only hoping EUV + DDR5 will bring Server ECC DRAM price down as well. In a few years time we will have affordable ( relatively speaking ) Dual Socket 256 Core Server with Terabyte of RAM. Get a few of those and be done with scaling for 90% of us. ( I mean the whole StackOverflow is Served with 9 Web Server [1] on some not very powerful hardware [2] )

[1] https://stackexchange.com/performance

[2] https://nickcraver.com/blog/2016/03/29/stack-overflow-the-ha...

gameswithgo · on July 3, 2021

instructions per clock have been going up, l3 cache sizes going up, core count going up, new wider instructions becoming available (avx 2 and 512 ), branch predictors improving.

overall speed increases are still nothing like the old days and spectre style mitigations have been eating away at the improvements

bahmboo · on July 3, 2021

Yes. Without a L1 cache a cpu is just a hyper active kid in a box. L2 important too. If one cares about performance learn about caches.

ben-schaaf · on July 3, 2021

Clocks aren't increasing much, but IPC still is. Density is also still increasing meaning lower latencies, lower power and therefore higher performance per area.

Causality1 · on July 3, 2021

IPC is going up but slowly compared to previous generations. If AMD can sustain the kind of generation on generation increases it achieved between Zen 2 and 3 I will be tremendously impressed, but as it stands my brand new 5600X has barely 50% more single-core performance than the seven year old 4670K I replaced.

flumpcakes · on July 3, 2021

I have a hunch that it would be much greater than 50% if the software was recompiled to take advantage of the newer instruction sets. Then again, you won't be comparing exact binary copies of software, which some people might say invalidates it as a benchmark/comparison.

I remember many, many years ago my PC couldn't run the latest adobe software as I was missing SSE2 instructions on my then current CPU. I believe there was some hidden compatability mode, but the software ran extremely poorly vs. the version of software that was out before that didn't require SSE2.

aseipp · on July 3, 2021

There have been almost no new broad "performance-oriented" instruction sets introduced since the Haswell era, all the way to the Zen 3 era (e.g. SIMD/vector) for compilers to target. At least not any instructions that the vast majority of software is going to see magical huge improvements from and are wide spread enough to justify it (though specific software may benefit greatly from some specific tuning; pext/pdep and AVX-512 for instance.)

The microarchitectures have improved significantly though, which does matter. For instance, Haswell-era AVX2 implementations were significantly poorer than the modern ones in, say, Tiger Lake or Zen 3. The newer ones have completely different power usage and per-core performance characteristics for AVX code; even if you could run AVX2 on older processors, it might not have actually been a good idea if the cumulative slowdowns they cause impacts the whole system (because the chips had to downclock the whole system so they wouldn't brownout). So it's not just a matter of instruction sets, but their individual performance characteristics.

And also, it is not just CPUs that have improved. If anything, the biggest improvements have been in storage devices across the stack, which now have significantly better performance and parallelism, and the bandwidth has improved too (many more PCIe lanes). I can read gigabytes a second from a single NVMe drive; millions of IOPS a second, which is vastly better than you could 7 years ago on a consumer-level budget. Modern machines do not just crunch scalar code in isolation, and neither did older ones; we could just arbitrage CPU cycles more often than we can now in an era where a lot of the performance "cliffs" have been dealt with. Isolating things to just look at how fast the CPU can retire instructions is a good metric for CPU designers, but it's a very incomplete view when viewing the system as a whole as an application developer.

AtlasBarfed · on July 3, 2021

I'm not deep on the weeds of high-performance compilers, but just because ISA evolution hasn't happened doesn't mean compilers can't evolve to use the silicon better.

There always was an "Intel advantage" to compilers for decades (admittedly Intel invested in compilers more than AMD, but they also were sneaky about trying to nerf AMD in compilers), but with AMD being such a clear leader for so many years, I would hope at least GCC has started supporting AMD flavors of compilation better.

Anyone know if this has happened with GCC and AMD silicon? Or at least is there a better body of knowledge of what GCC flags help AMD more?

aseipp · on July 3, 2021

Yes, I consider this to fall under the umbrella of general microarchitectural improvements I mentioned. GCC and LLVM are regularly updated with microarchitectural scheduling models to better emit code that matches the underlying architecture, and have featured these for at least 5-7 years; there can be a big difference between say, Skylake and Zen 2, for instance, so targeting things appropriately is a good idea. You can use the `-march` flag for your compiler to target specific architectures, for instance -march=tigerlake or -march=znver3

But in general I think it's a bit of a red herring for the thrust of my original post; first off you always have to target the benchmark to test a hypothesis, you don't run them in isolation for zero reason. My hypothesis when I ran my own for instance was "General execution of bog standard scalar code is only up by about 50-60%" and using the exact same binary instructions was the baseline criteria for that; it was not "Does targeting a specific microarchitecture scheduling model yield specific gains." If you want to test the second one, you need to run another benchmark.

There are too many factors for any particular machine for any such post to be comprehensive, as I'm sure you're aware. I'm just speaking in loose generalities.

jmgao · on July 3, 2021

> You can use the `-march` flag for your compiler to target specific architectures, for instance -march=tigerlake or -march=znver3

Note that -march will use instructions that might be unavailable on other CPUs of the target. -mtune (which is implied by -march) is the flag that sets the cost tables used by instruction selection, cache line sizes, etc.

jiggawatts · on July 4, 2021

I recently watched the CppCon 2019 talk by Matt Godbolt "Compiler Explorer: Behind The Scenes"[1], and a cool feature he presented is the integrated LLVM Machine Code Analyser tool. If you look at the "timeline" and "resource" views of how a Zen 3 executes a typical assembly snippet, it is absolutely mind blowing. That beast of a CPU has so many resources it's just crazy.

[1] https://www.youtube.com/watch?v=kIoZDUd5DKw

scns · on July 3, 2021

march=native

jbluepolarbear · on July 3, 2021

I doubt that very much. I have a i7 4790 and I have Ryzen 7 3700X. In my testing single core speed is nearly 2.5 times in favor of my Ryzen. What where you using as bench marks?

My test was a single threaded software rasterizer with real-time vertex lighting. That I compiled in vs c++ 2008 around 2010.

aseipp · on July 3, 2021

Basic single-core scalar-only workloads of my own corroborate the grandparent, as well as most of the other benchmarks I've seen. My own 5600X is "only" about 50-60% better than my old Haswell i5-4950 (from Q2 '14) on this note.

But the scalar speed isn't everything, because you're often not bounded solely by retirement in isolation (the system, in aggregate, is an open one not a closed one.) Fatter caches and extraordinarily improved storage devices with lots of parallelism (even on a single core you can fire off a ton of asynchronous I/O at the device) make a huge difference here even for single-core workloads, because you can actually keep the core fed well enough to do work. So the cumulative improvement is even better in practice.

jbluepolarbear · on July 3, 2021

Now I’m curious, is this testing against some requirements for an application? Are they any applications that would benefit solely from scalar performance vs scalar, cache size, and memory speed.

speeder · on July 3, 2021

Need to point out in particular that that best Haswell i5 was FASTER than the best i7 for single-core workloads, so this might be a factor in OP post.

And also that is the reason why I use such processor on my own computer, it was already "outdated" when I bought it, but since one of the things I like to do is play some simulation style games that rely heavily on single-core performance, I choose the fastest single-core CPU I could find without bankrupting myself. (the i5 4690k that with some squeezing can even be pushed past 4ghz, it is a beastly CPU this one)

Causality1 · on July 3, 2021

Exactly. The only thing I used my system for that was really begging for more CPU was console emulation, and that depends more on single core performance than anything else.

epmaybe · on July 3, 2021

Haha I’m in almost the exact same boat, I replaced a haswell i5 with a 5800x. I definitely went overkill with my system and still haven’t gotten a GPU upgrade yet due to cost/laziness.

Causality1 · on July 4, 2021

Indeed. I feel like I could've gotten most of the speedup by wiping windows and reinstalling but I had to make a clean break from Windows 7 or I was never going to stop using it. Hopefully GPU prices come down soon.

deviledeggs · on July 3, 2021

Even crazier, my i5 750 (from 2009) was overclocked to 3.5ghz and per core Ryzen 3 isn't even twice as fast.

We're near the end of IPC scaling per core. And it was never that good in the first place. Pentium 3 IPC is only 3-4x worse than the fastest Ryzen. Most of our speed increases came from frequency.

IMO we need to get off silicon substrate so we can frequency scale again.

I wonder if the end of scaling will push everyone into faster languages like Rust. You can't sit around 2 years for your code performance to double anymore. Will this eventually kill slow languages? I think so.

michaelmrose · on July 3, 2021

Hardware increases of speed began to slow down 20 years ago and people are still using software that is 50x slower than the fastest possible technology. If this alone was going to kill slow languages one would suppose it would have already done so.

The arguable shift that is more significant is not about hardware its hopefully about newer languages like Rust making this performance cost less in terms of safety and development time which is a more recent development.

deviledeggs · on July 4, 2021

I agree partially. But excuses for using dog slow languages, at least where I've worked, revolved around "it will be twice as fast in two years anyways"

It's clear that's no longer true, which gives less excuses for using say Ruby over Java.

slver · on July 3, 2021

Clockrate is not a bottleneck on how fast your computer is. It's just a synchronization primitive.

Think about it like the tempo of a song. The entire orchestra needs to play in sync with the tempo, but how many notes you play relative to the tempo is still up to each player. You can play multiple notes per "tempo tick".

amelius · on July 3, 2021

You will have to use a faster internal clock to play those faster notes, though.

can_count · on July 3, 2021

The point is that you don't. A clock tick is not the smallest unit of operation. It's the smallest unit of synchronization. A lot of work can be done in-between synchronization points.

_xy8h · on July 3, 2021

IPCs. Instructions per clock.

We haven't had a strict clock speed = performance ratio for over a decade, it's just one component of it now.

https://en.wikipedia.org/wiki/Cycles_per_instruction

https://en.wikipedia.org/wiki/Instructions_per_second

Rather than just bumping clock speed, they've made improvements on what can be done within the clock cycle.

I recently did a desktop rebuild.

Went from a Ryzen 7 2700X to a Ryzen 5 5600X.

On paper, this looks like a downgrade. After all, the 7 is higher than a 5, right? I have two more cores with a 2700X...

However, the 5600x has about a 28% IPC gain over the 2700x in single core performance despite running at the same base clock speed. Literally 20%+ faster in the same tasks even when taking the turbo boost out of the equation.

The 2700x was at 12/14nm process while the 5600x is at 7nm which also helps with the power consumption as the 5600x has a 40 watt lower TDP.

Since what I need is better single core performance over more cores, the 5600X is quite an upgrade despite only being two years newer. (A LOT has happened in 2 years with AMD) With two less cores but significantly higher single core performance, the 5600X outperforms the 2700X on both single core and multicore performance.

Unfortunately for Intel, they did a lot of little tweaks and cheats to gain performance at the expense of security and now the mitigation patches pulls their chips back to the Bulldozer era in terms of current performance. They were also stuck at the same node for almost a decade now (Broadwell, 2014), so they had no gains from a shrink (speed of light propagation gain from a shrink, less heat and higher clocks also).

My i7-4790k is full of jank and microstuttering now, it has become unusable. Imagine how the servers might be doing if (and they should be) they are being kept up to date on security and microcode patches.

Then there also comes with spec upgrades within the processor generations. Ryzen 2700X to 5600X also means PCI Express went from 3.0 to 4.0... Not hugely significant among desktop users, but substantial for servers that need that amount of link bandwidth for compute cards and storage.

TLDR: Magic.

mbfg · on July 3, 2021

i think for the data center the key number is performance per watt. With smaller the power requirements go down. As the power goes down, heat goes down, so you can push the processors harder.

There are of course architectural improvements that improve performance with space to add gates etc, but when you have 1000s of chips cost of energy is the big thing.

elif · on July 3, 2021

Less heat = more cores

dmitriid · on July 3, 2021

Software will inevitably eat all those gains

api · on July 3, 2021

Yeah, now we can have Electron apps compiled to WASM running in virtual browser instances hosted remotely in the server with feeds sent back via H.264 streams that then have to be decoded by a local browser instance to be rendered to...

AussieWog93 · on July 3, 2021

I know this is a meme, but back in around 2012 or so, one of my grandpa's friends pulled out a Windows 2000 laptop and loaded up a spreadsheet in Excel 2000.

I was blown away by how snappy and responsive the whole experience was compared to the then-not-bad Core2 Duo Grandpa had in his machine, running Windows 7 and Office 2010.

VortexDream · on July 3, 2021

Honestly, it doesn't feel like personal computing has improved much at all. I remember using Windows 2000. Run it on an SSD and it flies (tried it recently). Yet I can't identify anything that W10 does better (for me) than W2000 that justifies its sluggishness on a C2D.

kilburn · on July 3, 2021

While it is true that older software is extremely snappy if you compare it with what we use today, it is not that hard to find examples where we have come a long way. Off the top of my head:

- You can mix chinese and russian characters in a document [pervasive use and support for unicode]

- Your computer won't get zombified minutes after you connect it to the internet [lots of security improvements]

- You can connect a random usb thingy with a much much lower probability of your computer getting instantly owned [driver isolation]

- You can use wifi with some reliability [more complex, stable and faster communication protocols]

- You can have a trackpad that doesn't suck [commodization of non-trivial algorithms/techniques that did not exist back then]

- Files won't get corrupted time and again at every power failure [lots of stability improvements]

Whether all of the above could be achieved with the _very performance oriented_ techniques and approaches that used to be common in older software is debatable at least. In any case, a lot of the slowness we pay for toady is in exchange of actually being able to deal with the complexities necessary to achieve those things in reasonable time/cost.

VortexDream · on July 3, 2021

Honestly, I don't see why any of these things require such terrible performance characteristics.

kilburn · on July 4, 2021

I tried to use examples where some inherent performance penalties where easy to see:

- Unicode text: All text consumes more memory (because the character space is much larger). Basic text processing ("wrap this paragraph at 80 characters") becomes much harder, not just because bytes != glyphs, but also because glyphs can combine.

- Security improvements: we now have various sandboxing, isolation, execution-protection, etc.. features in OSes. The performance impact of some of those is negligible thanks to new hardware features to help with them, but others still have a significant cost. Furthermore, some performance tricks used in old systems would simply violate the current security models and are hence impossible to do anymore.

- Driver isolation: this is similar to the above. The OS is now doing more work to ensure drivers behave, the isolation forbids some more performant pathways, etc.

- Wifi with some reliability: this was an example of progress being achieved. Wifi protocols are a nightmare, and I'm still amazed that they work at all.

- Better trackpads: another progress example. The big issue here has been the development of smarter algorithms (and the hardware refinement to back them up). This is something that seems simple, but it took a very long time to get this anywhere acceptable (even after Apple showed the world it was possible). I can only assume that it is actually a pretty hard problem underneath.

- Files that don't get corrupted: we have needed years and years of iterative improvements to finally get here, both at the FS level and on the programs above (think DBs). We are now going through journal logs, memory barries, checksumming and verifying data, using copy-on-write for FSs, etc.. All these things have non-negligible runtime and camplexity costs.

In general, we have been prioritizing to make more stuff and/or making the stuff more correct, disregarding the performance aspect so long as it remains good enough (from the POV of the developers).

dmitriid · on July 4, 2021

> Security improvements: we now have

Everyone agrees that M1 is a very fast chip, much faster than the current Intel chips shipping with Macbooks.

And yet: when I trigger the native "Open File" dialog from IDEA, it still takes MacOS up to a second to verify permissions on a list of directories and mark them as available.

So, given that:

- M1 is blazingly fast

- modern SSDs pump GBs of data per second

- RAM on M1 is almost literally a part of the CPU, and the bandwidth of modern RAM is also GBs per second

why does it take up to a second to verify permissions on a list of five directories?

> Unicode ... Wifi ... Trackpads ...

None of these require GBs of RAM and 16-core processors to barely run.

bruce343434 · on July 3, 2021

> - You can have a trackpad that doesn't suck

Go on...

dmitriid · on July 3, 2021

> In any case, a lot of the slowness we pay for toady is in exchange of actually being able to deal with the complexities necessary to achieve those things in reasonable time/cost.

This is also debatable at least.

Just a few weeks ago it turned out that the new Windows Terminal can only do color output at 2fps [1].

The very infuriating discussion in the GitHub tracker ended up with a Microsoft tea member saying that you need a " an entire doctoral research project in performant terminal emulation" to do colored output. I kid you not. [2]

Of course, the entire "doctoral research" is 82 lines of code [3]. There will be a continuation of the saga [4]

And that is just a very small, but a very representative example. But do watch Casey's rant about MS Visual Studio [5]

You can see this everywhere. My personal anecdote is this: With the introduction of new M1 macs Apple put it front and center that now Macs wake up instantly. For reference: in 2008 I had the exactly same behaviour on a 2007 Macbook Pro. In the thirteen years since the software has become so bad, that you need a processor that's anywhere from 3 to 15 times more powerful to barely, just barely, do the same thing [6].

The upcoming Windows 11 will require 4GB of RAM and 64GB of storage space just for the empty, barebones operating system alone [7]. Why? No "wifi works reliably" or "trackpad doesn't suck" can justify any of this.

[1] https://twitter.com/cmuratori/status/1401761848022560771

[2] https://github.com/microsoft/terminal/issues/10362#issuecomm...

[3] https://twitter.com/cmuratori/status/1405356794495442945

[4] https://twitter.com/cmuratori/status/1406755159347130371

[5] https://www.youtube.com/watch?v=GC-0tCy4P1U

[6] https://gadgetversus.com/processor/apple-m1-vs-intel-core-2-...

[7] https://www.microsoft.com/en-us/windows/windows-11-specifica...

dmitriid · on July 3, 2021

As luck would have it, here's the continuation to Windows Terminal. Casey Muratori made a reference terminal renderer: https://github.com/cmuratori/refterm

This uses all the constraints that the Windows terminal team cited as excuses: it uses Windows subsystems etc. One person, 3k lines of code, it runs 100x the speed of Windows terminal.

See the epic demo (and stay till the end for color output): https://www.youtube.com/watch?v=hxM8QmyZXtg

kilburn · on July 4, 2021

To play devil's advocate, what I see in that demo is that:

- Yeah, some lower-level stuff seems to be broken (the windows console I/O stuff).

- Microsoft's devs linked to a very nice website that quickly and nicely explains some of the quirks of modern text rendering [1].

- Casey proceeds to demonstrate a (very fast!) approach that completely ignores some of the problems laid out in that site. Namely, his entire approach is based on caching rendered glyphs, which the document explicitly states is something you can't do naively and expect correct results (section 5).

- In his very demo some of these issues pop up (terrible-looking emojis, misaligned ascii art) and he shrugs those off as if they were easy jobs.

- Other issues are never tested in his demo. Examples include: (i) dealing with text selection, which is hard according to the linked site; (ii) how is he "chunking" non-asscii character runs (also hard to do correctly); (iii) handling ligatures (terminal programs oftentimes use ligatures between basic ascii character combinations such as => to make some source code more readable)

In other words: I see an incomplete solution that addresses only the easy parts of the problems (in a very performant way!) that fundamentally cannot be extended to a correct solution for the actual/full problem. And a lot of arrogance while presenting it.

In no way does this mean that a much more performant solution doesn't exist. But Casey's cute demo is not a proof that it does because it does not solve the actual/full problem.

PS: I don't work for MS, I don't know Casey nor any of MS's devs, and I don't even use windows. I do hold a PhD though, and I know plenty of PhD's dedicated to exploring the nitty-gritty details that some people with only cursory knowledge about the problem would dismiss as "this must be a quick job".

[1] https://gankra.github.io/blah/text-hates-you/

jiggawatts · on July 4, 2021

> caching rendered glyphs, which the document explicitly states is something you can't do naively and expect correct results

But very slightly less naively you can get correct results. One trick is to realise that ClearType only increases the effective horizontal resolution. Also, despite appearing to have three times the horizontal texels available to it, the final post-processing to reduce the colour artefacts means that the effective resolution increase is only about double the pixel count. So you can get very good results by horizontally stretching your glyph cache buffer by a small factor, typically three. This is not a huge increase in memory usage, and provides more than sufficient subpixel positioning quality. Where this breaks down is if your pipeline has very complex text special effect support, such as arbitrary transforms.

But a terminal emulator needs very few special effects. It doesn't need to be rotated smoothly through arbitrary angles, which DirectText supports. It doesn't need perspective transforms, or transparent text, or a whole range of such exotic features.

In fact, 99.99% of the characters drawn by a terminal emulator will be plain ASCII aligned to a simple grid. This is a trivial problem to solve, as demonstrated in this thread. It's not PHD work, it's homework.

dmitriid · on July 4, 2021

> what I see in that demo is that:

Remember: this is a demo written by one person over a few days, and not, you know, a full team of people at a multi-billion-dollar corporation.

> Microsoft's devs linked to a very nice website

> some of these ... hard according to the linked site ...

1. It's more excuses, and 2. we are not talking about "everything on the website".

He took all the constraints that the Windows terminal team imposed (you have to use DirectDraw, you have to go through conio, you have to use the unicode subsystem in Windows), and:

- even using those constraints he sped up the terminal 10x

- he already does things than windows terminal doesn't, for example: correct Arabic with correct ligatures despite "omg this website shows why it is so hard". So yes, he caches glyphs, and still displays proper Arabic text (something Windows terminal doesn't do) with varying widths etc.

And that 10x speedup? Instantly available to Windows terminal if only they use a cache of glyphs instead calling DirectDraw on every character. This is not rocket science. This is not "phd level research thesis". This is just junior-student-level work.

As to the issues:

- he was the very first one to point them out because he a) knows about them and b) knows how to solve them

- yes, the solutions he proposes are easy: the main "issue" is DirectDraw improperly handling font substitution, and his proposed solutions (resample glyphs after draw or skip DirectDraw and write your own glyph renderer) are easy. He literally does that for a living.

- omg text selection. Yes, text selection is a hard problem. But it's not an impossible problem, and nothing about text selection should bring the renderer from 5k fps to 2fps

> some people with only cursory knowledge about the problem would dismiss as "this must be a quick job"

These are not "nitty-gritty" details. And they are definitely not from a "person with cursory knowledge". There's nothing new in terminal rendering.

AussieWog93 · on July 3, 2021

>But do watch Casey's rant about MS Visual Studio [5]

"As an online discussion about software bloat grows longer, the probability of a Casey Muratori rant being posted approaches one."

astrange · on July 3, 2021

> In the thirteen years since the software has become so bad, that you need a processor that's anywhere from 3 to 15 times more powerful to barely, just barely, do the same thing [6].

That is not "the software" unless you count EFI, the sleep process is mostly a hardware thing and controlled by Intel.

PragmaticPulp · on July 3, 2021

> Yet I can't identify anything that W10 does better (for me) than W2000 that justifies its sluggishness on a C2D.

Core 2 Duo was introduced 15 years ago (2006). Almost a decade before Windows 10 was released.

Windows 10, and all modern operating systems, are designed around the availability of modern GPUs. If you try running it on an ancient machine without modern graphics acceleration and without enough CPU power to handle it in software, it's going to feel sluggish.

Windows 10 is perfectly fine and snappy on every machine I've used it on in recent history.

hashhar · on July 3, 2021

Hardware getting 100x faster isn't a good reason to make your software 100x slower so that the user-experience feels the same as from 20 years ago.

speedgoose · on July 3, 2021

Have you tried to do heavy computing on an old windows? In my experience the multi-tasking under load is so bad that you can't really use the machine while it's busy.

And then you can also think about the security improvements.

zozbot234 · on July 3, 2021

The multitasking is just as bad on modern Windows, tbh. Given reasonably up-to-date hardware, a lightweight Linux install really can be as snappy as Windows 2000 was back in the day, and that's with a lot of security and usability improvements.

speedgoose · on July 3, 2021

My windows 10 laptops can redraw the mouse cursor and windows without much issues when it's rendering a video using all the cpu cores for example.

VortexDream · on July 3, 2021

I genuinely don't think modern OS's are any better at multitasking with heavy loads. Particularly Linux and Windows are terrible at heavy mtitasking loads, at least as a software developer working with both environments.

I also don't see why the security improvements lead to such a massive decrease in performance.

api · on July 3, 2021

It spies on you and monetizes you better.

choeger · on July 3, 2021

... fill a spreadsheet.

prox · on July 3, 2021

But we can’t go much smaller can we? Totally not well versed in the chip space, so what can we expect?

MayeulC · on July 3, 2021

A usual reminder that we're not getting smaller. This is marketting speech, transistor gates are stuck at 20ish nanometers.

What's still increasing is transistor density, but "Dennard's Scaling" is dead. we stopped decreasing voltages some time ago.

We have more transistors, so we can make smarter chips. but we can't turn them on at the same time ("dark silicon"), we don't want to melt the chips.

Short of using other materials such as GaN, frequency won't really go above 5 GHz.

There remain plenty of ways to improve performance though: improvements to system architecture (distributed, non von neumann, changing the ISA), compilers, etc. Adiabatic computing, 3D integration, carbon nanotubes, tri-gate transistors, logic in memory, "blood" (cooling + power) and other microfluiduc advances, modularization with chiplets.

The "simple" Dennard's scaling is over though, and we need to move beyond CMOS and Von Neumann to really leverage increasing density without melting away.

ac29 · on July 3, 2021

> A usual reminder that we're not getting smaller. This is marketting speech, transistor gates are stuck at 20ish nanometers.

A quick perusal of WikiChip seems to suggest that this isnt true. Pretty much everything is getting smaller, including Fin pitch which should directly affect transistor size (not an EE, certainly could be wrong there). You're absolutely right that terms like "7nm" have become decoupled from a specific measurement and are largely marketing terms, though.

https://en.wikichip.org/wiki/10_nm_lithography_process

https://en.wikichip.org/wiki/7_nm_lithography_process

perl4ever · on July 4, 2021

> You're absolutely right that terms like "7nm" have become decoupled from a specific measurement and are largely marketing terms, though

Everybody says this, if I've read it once on HN, I've read it a million times.

But it bothered me, why should it be difficult to come up with a reasonably meaningful number? Just find out how many transistors can be placed in a given area, pretend they are laid out in a square, and find the number on a side, to calculate the linear density.

Well, I looked up and calculated this for several chips/processes, and I found that the number was consistently around 10 times the published figure, in nm.

The further interesting thing I discovered is that this seemed to go way back with no particular change in the ratio.

So it appears to me that the "decoupling" of the number from reality is a myth; it's not "real" but it's the same ratio it's always been.

But then, there's no obvious (to me) reason why a realistic number is out of the question, either.

MayeulC · on July 5, 2021

> The further interesting thing I discovered is that this seemed to go way back with no particular change in the ratio.

Exactly, as that's more or less what "process node" refers to today. I guess they don't want to change the metric as some could be confused by it.

> But it bothered me, why should it be difficult to come up with a reasonably meaningful number? Just find out how many transistors can be placed in a given area, pretend they are laid out in a square, and find the number on a side, to calculate the linear density.

There are plenty of other metrics. As usual, measuring only one number hides half the story. A process node is characterized by gate leakage and capacitance among others, for the electrical characteristics. Then, more meaningful surface area metrics are the size of an actual logic cell, like SRAM or a flipflop. Some processes lend themselves to optimizations that wouldn't be visible if just placing transistors side by side.

MayeulC · on July 5, 2021

Sorry, I should have been clearer that I was specifically talking about gate length.

Transistor size (occupied area) decreases, but that's mainly because FinFET uses vertical gates, not planar technology, so you can stack them closer (well, it also has other advantages).

Thanks for the links, that's an interesting website. If you look at the 10nm one, they explicitly call out what I said, and write 20nm for gate length (tunneling losses start increasing a lot if you reduce that, so I'm not sure how to read the 7nm page, maybe they got it wrong?)

In any case, process node once reffered to gate length; it doesn't anymore: https://en.wikichip.org/wiki/technology_node

We're not getting smaller, not that we couldn't (e-beam and ALD give us atomic resolution), but because it's useless (at that point, gate leakage and doping issues become hard to overcome). Instead we're improving precision, and our control along the 3rd dimension. Integration technologies are also progressing: flip-chip, TSVs. That will allow features like integrated HBM, on-chip lasers and photonics circuits, etc.

dfdz · on July 3, 2021

> 3D integration

In case anyone missed it, AMD announced a basic "3D" design last month

https://www.hpcwire.com/2021/06/02/amd-introduces-3d-chiplet...

Essentially AMD "stacked a 64MB 7nm SRAM directly on top of each core complex, tripling the L3 cache available to the Zen 3 cores."

I am excited to see what comes next!

kasperni · on July 3, 2021

Jim Keller believes that at least 10-20 years of shrinking is possible [1].

[1] https://www.youtube.com/watch?v=Nb2tebYAaOA&t=1800

agumonkey · on July 3, 2021

the last 20 years people had serious doubts on breaching 7nm (whatever the figure means today) but, and even if Keller is a semigod (pun half intended) .. I'm starting to be seriously dubious on 20 years of continued progress.. unless he means a slow descent to 1-2nm .. or he's thinking sub-atomic electronics / neutronics / spintronics (in which case good on him).

Nokinside · on July 3, 2021

Jim Keller is legend in microarchitecture design, not in process technology. All his arguments seem to be just extrapolating from the past.

Process engineers&material scientists seem more cautious. I'm sure shrinking goes but gains are smaller from each generation.

TSMC 3nm Process is something like 250 MTr/mm² and single digit performance increase and 15-30% power efficiency increase compared to older prosess.

tyingq · on July 3, 2021

It does, though, reduce heat, right? Which ultimately is more cores per socket. Which hits the thing that actually matters...price/performance.

Nokinside · on July 3, 2021

Yes. But that's a huge decline compared to even recent past.

Performance increases from generation to generation used to be much faster. TSMC's N16 to N7 was still doubling or almost doubling performance and price/performance over the long term. N5 to N3 is just barely single digits.

Every fab generation is more expensive than in the past. Soon every GIGAFAB costs $30 billion while technology risk increaseses.

Robotbeat · on July 3, 2021

That’s true, but because Moore’s Law has slowed, you’ll be able to amortize that $30 billion over a longer time.

ac29 · on July 3, 2021

> because Moore’s Law has slowed

Not sure that is really true based on the data. Remember, Moore's law says the number of transistors in an IC doubles every two years, which doesnt necessarily mean a doubling of performance. For a while in the 90's, performance was also doubling every two years, but that was largely due to frequency scaling.

https://upload.wikimedia.org/wikipedia/commons/0/00/Moore%27...

Robotbeat · on July 3, 2021

To be precise, Moore’s Law says the number of transistors per unit cost doubles (every two years). https://newsroom.intel.com/wp-content/uploads/sites/11/2018/...

A lot of the new processes have not had the same cost reductions. Also, some increase in transistor count is due to physically larger chips. Also, you have “Epyc Rome” on that graph, which actually isn’t a single chip but uses chiplets.

analognoise · on July 3, 2021

Yeah and after you have a working $30B fab, how many people are going to follow you to build one?

The first one built will get cheaper to run every year - it will pay for itself by the time a second company even tries to compete. The first person to the "final" node will have a natural, insurmountable monopoly.

You could extract rent basically forever after that point.

atq2119 · on July 4, 2021

I don't think we'll see a final node in our lifetimes. Improvements are slowing down and will become a trickle, but that doesn't mean research stops entirely.

Consider other mature technology, like the internal combustion engine. ICEs have been improved continuously, though the changes have become marginal as the technology matured. However, if research and improvements on ICEs ends entirely it's not because the technology has been fully explored but because they're obsoleted by electric cars.

fshbbdssbbgdd · on July 3, 2021

I thought the drivers of cost are lots of design work, patents, trade secrets etc. involved with each process. If there’s a “final” node, those costs should decrease over time and eventually become more of a commodity.

labawi · on July 3, 2021

That's only true if the supply satisfies demand.

prox · on July 3, 2021

The video that was posted goes into that (30min mark) and seems to reflect what you are saying.

agumonkey · on July 3, 2021

he might know some about the material science behind things but yeah, that said I'd like to hear about actual semi/physics researchers on the matter

barbacoa · on July 4, 2021

If we ever figure out a way to make caron nanotube transistors in volume, expect another 50 years of Moore's law.

dkersten · on July 3, 2021

Since the "nm" numbers are just marketing anyway, I think they don't mean much in regards to how small we can go. We can go small until the actual smallest feature size hits physical limitations, which is so decoupled from the nm number that we can't possibly tell how close "7nm" is (well, I mean, we can, there's a cool youtube video showing the transistors and measuring feature size with a scanning electron microscope, but I mean we can't tell just from the naming/marketing).

rorykoehler · on July 3, 2021

Check out the lex fridman Jim Keller podcast on YouTube

bertday · on July 3, 2021

On the same podcast you can find David Patterson (known for writing some widely used computer architecture books), who disputes this claim.

https://www.youtube.com/watch?v=naed4C4hfAg

At 1:20:00

someperson · on July 3, 2021

David Patterson is not disputing that there's decades left of transistor shrinking, he's just saying that the statement of "transistor count doubling every 2 years" doesn't hold up empirically.

David Patterson is saying he considers Moore's Law is dead because the current state of say, "transistor count doubling every three years" doesn't match the Moore's Law exact statement.

In other words, he is simply being very pedantic about his definition. I can see where he's coming from with that argument.

zsmi · on July 3, 2021

It's more than that though as it's important to remember why Moore made his law in the first place.

The rough organizational structure of a VLSI team that makes CPUs is the following pipeline:

architecture team -> team that designs the circuits which implement the architecture -> team that manufactures the circuits

The law was a message to the architecture team that by the time your architecture gets to manufacture you should expect there to be ~2x the number of transistor you have today available, and that should influence your decisions when making trade-offs.

And that held for a long time. But, if you're in a CPU architecture team today, and you operate that way, you will likely be disappointed when it comes to manufacture. Therefore one should consider Moore's law dead when architecting CPUs.

bertday · on July 3, 2021

I don’t think it’s irrelevant to look at changing timescale. If the law broke down to be 3 years, there isn’t any reason it won’t be 4, 5, or some other N years in the future.

vlovich123 · on July 3, 2021

Every 2 years

Robotbeat · on July 3, 2021

Right. But it is no longer 2 years so it’s not Moore’s Law any more.

stingraycharles · on July 3, 2021

And for those who don’t know, Jim Keller is a legend.

https://en.m.wikipedia.org/wiki/Jim_Keller_(engineer)

prox · on July 3, 2021

That was a nice watch! Thanks!

tyingq · on July 3, 2021

A revisit of how to do parallelism. Hopefully more successfully than Itanium and its compilers fared. The Mill CPU has some ideas there as well.

matthewfcarlson · on July 3, 2021

It was interesting working at microsoft next to some folks that were around in the Itanium days and worked on it. Hearing their stories and theories was really cool. I wonder if now is the time of alternative ISAs given that JIT and other technologies have gotten so good

xmodem · on July 3, 2021

We're already at the point where a transistor is a double-digit number of silicon atoms. The cost of each node shrink is growing at an insane rate.

The good times might be back for now - and don't get me wrong, I'm having a blast - but don't expect them to last for long. I think this is probably the last sputter of gas in the tank, not a return to the good times.

hutrdvnj · on July 3, 2021

But what happens if we hit the final plateau, same processor speed (+ minor improvements) for the decades to come?

jl6 · on July 3, 2021

We start optimizing software, and then we start optimizing requirements, and then computing is finally finished, the same way spoons are finally finished.

GuB-42 · on July 3, 2021

Are spoons really finished? I am sure plenty of people are designing better/cheaper spoons today. I love looking at simple, everyday objects and look at how they evolved over time. Like soda cans, water bottles. Even what may be the oldest tool, the knife is constantly evolving. Better steels, or even ceramic, folding mechanisms for pocket knives, handle materials, and of course all the industrial processes that gets us usable $1 stainless steel knives.

Computers are the most complex objects man has created, there is no way it is going to be finished.

agumonkey · on July 3, 2021

you can also optimize society because every time a human gets in the loop, trillions of cycles are wasted, and people / software / platforms are really far from efficient.

actually if companies and software were designed differently (with teaching too, basically an ideal context) you could improve a lot of things with 10x factors just from the lack of resistance and pain at the operator level

MR4D · on July 3, 2021

This is a really good point you make.

A simple example for me is how the ATM replaced the bank teller, but the ATM has been replaced with cards with chips in them. It’s a subtle, but huge change when magnified across society.

agumonkey · on July 3, 2021

are chips an issue ?

having worked in various administrations, the time / energy / resources wasted due to old paper based workflow is flabbergasting

you'd think after 50 years of mainstream computing they'd have some kind of adequate infrastructure but it's really really sad (they still have paper inbox for internal mail routing errors)

api · on July 3, 2021

Computing will never be finished like spoons in the software realm because software is like prose. It's a language where we write down our needs, wants, and desires, and instructions to obtain them, and those are always shifting.

I could definitely see standard classical computer hardware becoming a commodity though.

There will also be room for horizontal expansion for a LONG time. If costs drop through the floor then we could see desktops and servers with hundreds or thousands of 1nm cores.

imtringued · on July 3, 2021

The hardware will be finished. But the food we eat (the software) will keep changing.

ghaff · on July 3, 2021

You optimize software--which means more time/money to write software for a given level of functionality. More co-design of hardware and software, including more use of ASICs/FPGAs/etc. And stuff just doesn't get faster/better as easily so upgrade cycles are longer and potentially less money flows into companies creating hardware and software as a result. Maybe people start upgrading their phones every 10 years like they do their cars.

We probably have a way to go yet but the CMOS process shrink curve was a pretty magical technology advancement that we may not see again soon.

native_samples · on July 5, 2021

To some extent yes, but there's a lot of low hanging fruit.

Java is only just now getting value types, even though flattened value types are fundamental to getting good performance on modern CPUs. Much software is HTML and JavaScript which is so far from having value types it's not even worth thinking about. Simply recoding UIs from JS -> Java/Kotlin+value types would result in a big win without much productivity loss.

TheRealSteel · on July 3, 2021

Transistor size is not the only factor in processing speed, architecture is also important. We will still be able to create specialised chips, like deep learning accelerators and such.

gsnedders · on July 3, 2021

As already mentioned, there's plenty of innovation happening still with packaging, plus even on the IC level there's all kinds of possibilities for advancement: new transistor designs (to reduce power consumption or to increase density by decreasing spacing), monolithic 3D ICs (the vast majority of current 3D approaches are manufacturing multiple wafers or dies then wiring them together; if you can do it on a single wafer you can do a lot more movement between layers). Plus there's always the potential to move away from silicon to make transistors even smaller.

Away from the IC level itself we're only just starting to scratch the surface of optimisation algorithms for many NP-hard problems that occur in IC design, like floor plan arrangement.

MangoCoffee · on July 3, 2021

advance packaging, chiplet, 3DFabric...etc

peheje · on July 3, 2021

If history has taught us anything it's that technology won't stop evolving. And whenever humans thinks surely we've reached the peak of technological advancements, time proves us wrong. One thing is for sure, it's going to look very different from what we can imagine today.

Retric · on July 3, 2021

Technology stagnated all the time in history. The pattern is new approach gets perfected resulting in ever smaller gains and different tradeoffs. Look at say rings where personal skill plays a larger role than the advancements in technology. We might be better at mining gold today, but that doesn’t translate into a better ring.

Include longevity as one of the points of comparison and a lot of progresses looks like a step back. Cheaper and doesn’t last as long has been a tradeoff people have been willing to make for thousands of years.

oblio · on July 3, 2021

Technology also gets as good as necessary and not necessarily further, for long periods of time.

Babies have "colics", which are probably some kind of pain that we haven't identified yet, but because they go away on their own and parents are taught that they just have to deal with them, we still apply a medieval solution to the problem ("tough it out").

Rings seem to be the kind of problem where our current solution is good enough.

I don't foresee computing power to be the same. We'll want more and more of it.

So stagnation will be due to gaps in basic research, not due lack of interest.

rvanlaar · on July 3, 2021

I see that computing power is in high demand for quite some time to come.

I do believe there will be stagnation unless a different way is found. In the same way Henry Ford said people wanted a faster horse not a car.

And regarding travel, I would like to have faster, much faster transportation. However it hasn't come, yet. There seems to be a local optimum between costs and practicality.

andrewjl · on July 3, 2021

> Look at say rings where personal skill plays a larger role than the advancements in technology.

What about advancements metallurgy? And to use an adjacent example, cold-pressed steel techniques increase the number of places steel can be used. (See here https://en.wikipedia.org/wiki/Cold-formed_steel#Hot-rolled_v...).

More often that not, technology enables trade-offs to be made that couldn't be before. Making something cheaper and more fragile does lose something, service life, and so on. In exchange the cheaper thing is now used more widely. Perhaps this more widespread use unlocks something that only a few expensive yet high-quality units could not. Think of smartphones and the resulting network effects.

Retric · on July 3, 2021

18k Gold jewelry has been using 18 parts gold, 3 parts copper, and 3 parts silver for a long time.

In theory sure we could probably improve on it, but it works.

cwizou · on July 3, 2021

Can't help but feeling it's a bit of a diversion to make everyone forget the just announced delays of their 10nm server chips.

The original source, Nikkei, points to their own sources that claim that both Apple and Intel are currently doing early testing on 3nm. The article doesn't imply they are the only ones and according to Daniel Nenni, AMD did get the PDK too.

Then it's about who's ready first and who bought the most allocation. Rumours points to Intel buying a lot of it, although when that allocation comes is a bit unclear. Nobody should expect anyone but Apple to have first dib at anything at TSMC.

Plus it is Intel's first use of a bleeding edge PDK from TSMC, which is dramatically different from the older nodes they used so far (EUV and all). But it's been a long coming commitment (pre Gelsinger) from them to outsource their high perf DT and server chips for this node.

Their volume needs for this are high (not quite iPhone high, but still far above non-Apple customers) and it will be interesting to see what, and how soon they can launch in 2023.

I would probably expect Intel's lineup, especially the articulation around their so called HEDT (High End Desktop) segment to be shaken up a bit.

So would their margins.

https://www.intel.com/content/www/us/en/newsroom/opinion/upd...

https://asia.nikkei.com/Business/Tech/Semiconductors/Apple-a...

https://semiwiki.com/forum/index.php?threads/apple-and-intel...

ac29 · on July 3, 2021

Intel is already shipping 10nm server chips. See, for example: https://www.anandtech.com/show/16594/intel-3rd-gen-xeon-scal...

The article you linked is about their next generation server chip (which is also 10nm).

ksec · on July 3, 2021

I thought it was worth pointing out, Nikkei, ( the original source of this pieces ) and DigiTimes has exceptionally poor track record on anything TSMC.

So exceptional that every prediction or rumours have been wrong. Everything they got right was restating what TSMC said.

My suggestion for the HN community is whenever you see TSMC in the headline read it with the biggest grain of salt. Unless it is coming from a fairly reputable site. ( Anandtech for example )

alkonaut · on July 3, 2021

> If Intel can get back to a two-year node cadence with 2x density improvements around mid-decade

If I cut my kilometer time in half every six months I’ll do that 3h marathon next year!

heisenbit · on July 3, 2021

> “If Intel can get back to a two-year node cadence with 2x density improvements around mid-decade they can be roughly at density parity with TSMC,” says Jones

A guarded statement indeed. It is hard to see how Intel can catch up on semiconductor technology. At this point of Moore‘s law it is about financial capabilities and while Intel‘s revenues may be still a little bigger than TSMC the semiconductor related part measured in capacity is a third and Intel is not in the top-5.

Intel diverting money from own fabs in order to keep its options open and maintain a tech lead position is telling. Watch what they do not what they tell you.

specialist · on July 3, 2021

> it is about financial capabilities

Request for future: More about this, please.

Since binging on the acquired.fm podcast, I'm becoming more aware of the money side of things.

By way of analogy:

"Amateurs talk strategy. Professionals talk logistics."

I spent my career solving cool problems ("strategy"). While the geeks minding the business and finance stuff ("logistics") faired quite a bit better.

MangoCoffee · on July 3, 2021

>while Intel‘s revenues may be still a little bigger than TSMC the semiconductor

TSMC's Q1 2021 revenues is 362.41B

Intel's Q1 2021 revenues is 19.67B

TSMC market cap @553.16B

Intel market cap @229.20B

TSMC already surpass Intel.

https://www.google.com/finance/quote/INTC:NASDAQ

https://www.google.com/finance/quote/TSM:NYSE

greenknight · on July 3, 2021

TSMCs Q1 Revenue was 12.92B USD -- https://investor.tsmc.com/english/encrypt/files/encrypt_file...

The 362.41B is in TWD

MangoCoffee · on July 3, 2021

ah ok. google show b. i assumed its in USD.

greenknight · on July 3, 2021

In any case, in this nanometer war, ASML will come out the victor.

colinmhayes · on July 3, 2021

You actually believed TSMC had more than a trillion dollars of revenue a year?

greenknight · on July 3, 2021

Weren't Intel to use TSMC on 5nm in 2H 2021? -- https://www.extremetech.com/computing/319301-report-intel-wi...

I doubt Intel would move any of their core designs across to TSMC. Only fringe products... Its not like their CPU designs port across natively and from my understanding, they would have to have an entirely different team that would work with TSMCs IP.

baybal2 · on July 3, 2021

From my passing birdie, Intel will still be doing TSMC 5nm, but chose to announce the far in the future 3nm first because it "sounds big," and make it look as if they are ahead of AMD to Wall street people.

I haven't heard anything certain of AMD, but they are guaranteed to do very aggressive TSMC roadmap following, being fully aware that cutting edge node capacity is bought years in advance, and that Intel wanted it for the last ~3 years.

They probably simply don't have things set in stone for so far in the future.

belval · on July 3, 2021

It's not to impress wall street, so much as they can hijack TSMC production and constrain AMD's volume.

adventured · on July 3, 2021

Exactly. Intel's profits are still running near all-time record highs (which is 12x that of AMD). They can trivially constrain AMD by taking away their access to fabrication, buy as much of it out as possible. If nothing else it'll buy them more time to try to dig out of their mess.

samus · on July 3, 2021

It is in TSMCs own interest to not burn their existing customers because Intel still has their own fabs and can drop TSMC the moment they catch up. Rebuilding business relationships with clients takes time, and they are also not really without options: Samsung is very much also a key player in the market after all.

andy_ppp · on July 3, 2021

Surely that would be anti competitive in the extreme? I'm surprised TSMC are just taking the money now... I'd be really surprised if Intel don't learn a load of tricks as the drill into getting silicon onto a different fab provider - both about their own processes and the underlying details of the technology.

zsmi · on July 3, 2021

> Surely that would be anti competitive in the extreme?

Now prove it.

TSMC accepts paying customers. Intel is a paying customer too.

Haven't we spent years bemoning how Intel is behind TSMC? It seems natural Intel would take advantage of TSMC's expertise if they are ahead. And Intel probably is happy to use TSMC. Two rabbits with one shot.

Even if you could prove it, where do you plan to file the complaint?

xiphias2 · on July 3, 2021

You can't have fringe products on 3nm, they have to be profitable ones. It seems like a risky, but exciting strategy. I prefer that over the old boring Intel.

sharken · on July 3, 2021

It's a much better strategy for sure, as for AMD it seems that hard times are ahead if they can not move to 3nm after 5nm.

It seems very likely that AMD has no other option than moving most of their products to Samsung 3nm.

I hope that works out as the competition is needed.

https://www.gizmochina.com/2021/02/02/amd-outsource-gpu-apu-...

smoldesu · on July 3, 2021

Samsung's fabs have been getting better and better over the years, we could see a surprise blowout on the 3nm node. Hell, just the other day Samsung was testing an RDNA2 iGPU that was faster than the GPU on the iPhone 12... on the 7nm node. I'm no expert here, but AMD might finally have the silicon it needs to compete with Nvidia in raw compute.

atty · on July 3, 2021

That’s exciting news, but I can’t help but feel that at least in my field (machine learning), Nvidia is far more sticky than just their compute dominance. As long as CUDA is Nvidia proprietary, I just don’t think our team can afford to move.

I’m seeing lots of very impressive movements from Tensorflow and Pytorch to support ROCm, but it’s just not at a level yet where it would make good business sense for us to switch, even if AMD GPUs were 50% faster than Nvidia. And it seems like Nvidia is improving and widening their compute stack faster than AMD is catching up.

smoldesu · on July 3, 2021

I'm right there with you, I have an Nvidia GPU on all my machines too. With that being said, however, there are plenty of software-agnostic workloads that I run that could benefit from a truly open, powerful GPU.

veselin · on July 3, 2021

I was wondering why AMD didn't finally get access to the top notch process at TSMC. They can have some of the best margins in their products, just because they ended being the best now and they certainly have the volume already. They didn't even start addressing notebooks or the lower ends of the market with Zen 3.

Zen 3 is probably the fastest CPU on the market, it is not even scaled down to N5 or N6 which could get power consumption down to the best we have seen for notebook market.

Instead the rumors are that in 2022 the improvements will go mostly into packaging extra cache. And now the next processes are already booked by others.

selectodude · on July 3, 2021

Because AMD doesn’t have any money. Intel can fire off a 1 billion dollar wire transfer tomorrow. That would wipe out half of AMD’s cash reserves. Money talks.

colinmhayes · on July 3, 2021

Intel has 20 billion in profit a year. AMDs revenue is half that. AMD can't compete with apple and intel's cash flow.

the_lucifer · on July 4, 2021

But tbh Apple has been consistent in their TSMC orders so far. Intel is the dark horse here. It looks like Intel's trying to throw a wrench in AMD's future plans while they try hard to catch up.

jnjj33j3j · on July 3, 2021

Meanwhile, zero innovation in Europe... No wonder considering that even in Germany (country with highest SW engineer salaries in Europe), an average developer will only make 2500 per month after tax... For that money, you will only be able to buy a bread and butter, nothing else.

akvadrako · on July 3, 2021

The machines that power TSMC are exclusively from Europe.

Also the average programmer salaries might be low, but there is plenty of full-time freelance work in Europe with competitive rates; a decent gig in NL is around €100/hour.

moooo99 · on July 3, 2021

> For that money, you will only be able to buy a bread and butter, nothing else.

Maybe you're trolling or maybe you're just actually clueless, but when looking at how much people can afford, its not sufficient to convert the euro to dollar amounts

RealityVoid · on July 3, 2021

Last time I checked ASML is an European company. Regardless, things are... Different in the EU and there are plenty of complaints regarding innovation, but this tired talking point about EU stagnation lacks nuance and is not true.

cma · on July 3, 2021

At this point ASML has a higher market cap than Intel.

jonplackett · on July 3, 2021

> It is thought that the iPad will be first to get 3nm chips

Is anyone really pushing the current iPads tech specs? Until they massively beef up iPadOS I’m not sure this will benefit anyone particularly. The battery already lasts all day.

Tempest1981 · on July 3, 2021

High-end games

elorant · on July 3, 2021

Are there triple A games for iPad?

jonplackett · on July 4, 2021

Exactly. And if there were is that a reason anyone’s buying an iPad?

gameswithgo · on July 3, 2021

you could shrink the battery

swiley · on July 3, 2021

We reached the point a long time ago where the freedom of the platform is far more important than performance for all but the most demanding tasks (really just video editing at this point.)

ghaff · on July 3, 2021

>performance for all but the most demanding tasks

Client-side performance is increasingly irrelevant except for, as you say, video editing and a vanishingly small slice of gaming (and that is at least as much about GPUs anyway). But then, clients outside of phones are also increasingly commodities anyway.

However, performance on servers remains incredibly important. Of course, you can just throw more hardware at the problem which increases costs but is otherwise perfectly doable for most applications.

zozbot234 · on July 3, 2021

> Client-side performance is increasingly irrelevant except for, as you say, video editing and a vanishingly small slice of gaming

I really don't see it that way. Web browsing used to be considered a very light task that almost any hardware could handle, but the performance demands have been steadily climbing for quite some time.

ghaff · on July 3, 2021

Performance and memory requirements have doubtless gone up but I use a five year old MacBook Pro and it's perfectly fine for browsing. Performance isn't really irrelevant of course but browsing generally doesn't push anywhere close to the limits of available processors.

nly · on July 3, 2021

One has to wonder how long it will be before Apple build their own fabs

eyesee · on July 3, 2021

I wouldn’t count on it. Apple is vertically integrated, but doesn’t do their own manufacturing. It’s not necessary when They can dominate the supply chain and hold a practical monopsony on the best production capacity anywhere. This is why everyone else was lagging to produce 5nm parts -- Apple simply bought 100% of TSMC’s capacity.

narrator · on July 3, 2021

The amount of centralization of fab tech is astonishing. You've got all the sub 7nm fab tech that the whole world depends on for the next generation of technology coming out of a few fab plants in Taiwan. Software is easy. Material science is hard. There is one company that makes the machines that the Fab uses. They cost several hundred million each, have a multi-year backlog and only TSMC has really mastered integrating them into a manufacturing process that can scale.

Thankfully, things are getting more diversified. TSMC just started construction of one of several new plants in Arizona last month: https://www.phonearena.com/news/construction-of-tsmc-5nm-fab...

0xy · on July 3, 2021

I wonder how large the king's ransom was that Intel threw at TSMC to make this happen. Seems like it was necessary given their faltering fab strategy, and faced with years of a limping 10nm and 14nm+++++++++ products.

test_epsilon · on July 3, 2021

Are you generally asking how much it costs to purchase early slots on leading edge nodes?

I'm not sure if being Intel would have made TSMC jack up the price or anything. It is in their interest to get large profitable customers, and it is in their interest to take business away from Intel's fabs. TSMC might well have offered significant discounts or other special treatment to land Intel contracts.

0xy · on July 3, 2021

TSMC are experiencing over 100% demand, they don't need to offer discounts. In fact, they stopped giving any discounts months ago even on older processes.

Apple has deep pockets and spends heavily to ensure supply chain stability, but we haven't seen this behavior from Intel so I'm curious how big of a price they had to pay was.

test_epsilon · on July 3, 2021

> TSMC are experiencing over 100% demand, they don't need to offer discounts. In fact, they stopped giving any discounts months ago even on older processes.

I stand by what I wrote.

> Apple has deep pockets and spends heavily to ensure supply chain stability, but we haven't seen this behavior from Intel so I'm curious how big of a price they had to pay was.

Would obviously highly depend on volumes, timelines they committed to, layers, materials, and specific devices and tweaks they are paying for. Could be less per transistor than Apple, I can't see hwy it would be vastly more (unless we're talking about extremely limited volums).

smoldesu · on July 3, 2021

> I wonder how large the king's ransom was that Intel threw at TSMC to make this happen.

Probably similar to how much Apple paid to edge out the competition on 5nm.

sschueller · on July 3, 2021

From what I have gathered the smallest trace spacing is 3nm but not all traces. So it would be interesting to know what percentage is and the progress in getting all or most to 3nm.

rob74 · on July 3, 2021

I'm not an expert, but from what I have been reading these "node names" have been pretty much decoupled from actual trace dimensions for a few years already. So I wouldn't be surprised to find out that most of the structures are in fact bigger than 3nm.

To quote Wikipedia (on the 5nm process!):

The term "5 nanometer" has no relation to any actual physical feature (such as gate length, metal pitch or gate pitch) of the transistors. It is a commercial or marketing term used by the chip fabrication industry to refer to a new, improved generation of silicon semiconductor chips in terms of increased transistor density, increased speed and reduced power consumption.

intrasight · on July 3, 2021

My understanding is that it's more like the marks on a ruler. And the process shrinks the whole ruler. So it does ultimately relate to the feature size.

dilawar · on July 3, 2021

How these nm based metric relates with transistor density or flops per watt?

bullen · on July 3, 2021

I think we're going to be stuck at ~2Gflops/W CPU.

If we ever see 3+ on cheap low power consumer hardware I'll eat my shorts.

tyingq · on July 3, 2021

Any thoughts on MicroMagic? https://arstechnica.com/gadgets/2020/12/new-risc-v-cpu-claim...

bullen · on July 3, 2021

Ok, I should have added with enough total power to do some meaningful work...

amelius · on July 3, 2021

Is the US even aware at a political level that Taiwan is now ahead of the US and that US businesses are giving up?

xiphias2 · on July 3, 2021

They are giving incentives for TSMC to build factories in US for a good reason. Both China and US are aware how important it is.

amelius · on July 3, 2021

Yeah but a Taiwanese factory on US soil still isn't US tech. Shouldn't the US invest more, while we can still catch up?

xiphias2 · on July 3, 2021

TSMC is both a monopoly for the newest chips and a geopolitical risk in the case of China taking over TSMC in Taiwan.

I think that if TMSC has a subsidary in US, it would be really hard for TSMC Taiwan to block TSMC US from licensing IP from it after the knowledge is transferred. Regarding the monopoly status: sure, healthy competition is good, but it's something competitors (Samsung, Intel) should tackle on the market.

lotsofpulp · on July 3, 2021

US should be handing out green cards to TSMC workers and their families.

smoldesu · on July 3, 2021

If your house was built using "US tech", you'd be living an Amish-adjacent lifestyle.

yyyk · on July 3, 2021

Intel isn't out of the fab business, just behind. Being behind is much less relevant economically than some people think, especially in this seller's market.

It's not like 14nm processors will magically stop working, you just pay more TDP for performance. If one day the US decides Taiwan is not worth it, it would just pay a bit more until Intel catches up (which will happen eventually).

amelius · on July 3, 2021

The problem is that Intel is already more than 2 technology nodes behind. Catching up could be exponentially expensive, especially if all orders are now going to Taiwan.

yyyk · on July 3, 2021

Intel 10mm is approximately like TSMC 7mm (different measurements but similar transistor density). So I'd say 1.5 nodes behind? AMD and TSMC used to be behind like that, and eventually they caught up. Intel has the resources, and given shortages, they have a bit of time too.

amelius · on July 3, 2021

> So I'd say 1.5 nodes behind?

A node corresponds to a factor of .7 in size, see [1].

> AMD and TSMC used to be behind like that

Yes, but as a technology evolves it's getting harder to catch up.

[1] https://en.wikichip.org/wiki/technology_node

samus · on July 3, 2021

Since 20nm, a node step does not necessarily have anything to do with sizes and is purely a marketing term. Even if there are size shrinks, they are hardly comparable if a switch to another technology (FinFET -> GAAT) is involved as well.

mensetmanusman · on July 3, 2021

EUV is the unknown here. It helps skip a lot of density doubling steps, so we might see a quick catch-up.

ohazi · on July 3, 2021

Half the world won't even acknowledge that Taiwan is a country.

loyukfai · on July 3, 2021

Publicly, currently, yes.

emayljames · on July 3, 2021

Part of China, yes. You are right.

jtdev · on July 3, 2021

The Taiwanese people don’t seem to agree with your statement.

AussieWog93 · on July 3, 2021

Publicly, many of them do.

A formal declaration of independence would force Beijing to either resume war with Taiwan or lose face domestically.

Pretending that your homeland isn't technically a real country is a small price to pay to prevent insane Mainlander nationalists from launching a flurry of DF-41s into the heart of Taipei.

amelius · on July 3, 2021

The people in Hong Kong didn't agree either ... :(

rlanday · on July 3, 2021

Hong Kong has been indisputably part of China since July 1, 1997.

jlokier · on July 3, 2021

The consent of the Hong Kong people in 1997 was entirely dependent on the Hong Kong Basic Law and Sino-British treaty being honoured by China.

China his indisputably reneged on both agreements since 1997. Virtually nobody in Hong Kong or associated with Hong Kong would have accepted the 1997 handover if they had known in advance.

This should be borne in mind by anyone evaluating whether to trust China on any important agreement in future.

"I am altering the deal. Pray I don't alter it any further" comes to mind.

hnfong · on July 4, 2021

What consent of the Hong Kong people?

The "consent" of the UK government was obtained because it was cornered by treaty terms of 99 years. I don't think the UK government thought it expedient to obtain the consent of the HK people. Point me to a pre-97 poll of Hong Kongers with a majority supporting the handover and we'll talk about whether China actually reneged on their agreement (singular, mind you. The Basic Law is a domestic, national law of the PRC, not a treaty).

jlokier · on July 10, 2021

Fair enough regarding the people's consent. Looking at your subsequent comment you obviously know a lot more about the situation than I do. My mother was born in Hong Kong because my grandmother lived there for some years, but we barely have any connection to it.

(It is interesting that I can't tell from your comments whether you are arguing in support of the HK people or in support of the PRC. I find your writing curiously ambiguous on this.)

I would agree there was no people's consent in the usual meaning. But what my GP comment really means is a kind of passive consent of the people (along with active consent of the British government on their behalf), where the expressed anxieties and anger of the Hong Kong people were much more restrained in 1997 due to the Sino-British treaty and the adoption of the Hong Kong Basic Law, compared with what they would have been if they'd known China was going to dishonour those after signing them.

Even not considering consent, it looks clear to me that PRC has reneged on the high profile agreement it signed with Britain that was designed to protect the Hong Kong people and way of life as Britain departed.

From that I stand by my view that: "This should be borne in mind by anyone evaluating whether to trust China on any important agreement in future."

If I understand correctly: Although the Hong Kong Basic Law is a PRC national law as well as effectively the constitution of Hong Kong, it's existence and content is connected to the Sino-British treaty, and its adoption is a condition of the handover.

It cannot be regarded as an entirely "domestic" PRC thing, and it was not created by the PRC in isolation. There is ample evidence that the PRC no longer follows some fundamental tenets of the Basic Law (for example freedom of speech and freedom of assembly), and that now looks like a blatant contravention of the Sino-British treaty.

It looks to me like the situation is heading towards one where even if a resident has a BNO passport (a kind of British passport) and could theoretically move to Britain, they may be prevented from leaving Hong Kong by the PRC authorities.

jtdev · on July 4, 2021

You speak with what seems to be the perspective of someone who has lived under authoritarian rule and doesn’t grasp that although “consent” may not always be explicit, it is in fact required for any state to maintain legitimacy.

hnfong · on July 5, 2021

Funny, I didn't expect to hear the CCP's narrative here.

You're so wrong I don't know where to start.

What you seem to be implying is that, as long as the government stays in power and maintains a facade of stability, they can presume consent of the people for any policy they manage to implement.

This doesn't sound like somebody from a democracy would say, this sounds like what the CCP says. (Which is fine, I've said this myself (with some reservations).) But it doesn't even make sense in this context.

If you even had a inkling of knowledge of what happened during the 1980s when the talks were taking place:

- The vast majority of Hong Kong people preferred keeping status quo (UK continue to govern HK). There were polls to this effect. You could not have lived in the 1980s and 1990s in Hong Kong and not notice that almost _everything_ in the media (films, TV shows, songs) expressed the anxiety of the Hong Kong people to the brave new world after 1997

- Many people started migrating away upon hearing news of handover in 1997. It resulted in a huge brain drain.

- As such the Brits had to grant UK citizenship to a bunch of Hong Kong elites, middle class professionals and government officials, to placate them and convince them stay (because with UK citizenship they can leave any time they want, no hurry)

- The ones who did not have means to leave were mostly skeptical about the whole situation, but since people with influence were "bought out" by the Brits with UK citizenship, and you can't "fight" a government to "force" them to govern you, nothing serious happened.

I don't think _anyone_ who knows this part of Hong Kong history could even argue that there was consent among the Hong Kong people for the handover. It was a closed-doors deal between the UK and China with little regard to what Hong Kong people thought. In all fairness the UK did what it could do at the time, but claiming that Hong Kong people had "consent" is simply a misrepresentation of history. Using your weird arguments to negate historical fact is either ignorance, or willful negation of clearly recorded history.

People under authoritarian rule know full well that the fact that the government maintains power is not evidence that they support their policies. People running authoritarian governments use your argument all the time. I really don't know where you got your ideas from.

Also, it reeks of ad hominem - presuming where I have lived and my ignorance due to it. Perhaps you're just trolling and I shouldn't feed a troll, but this is one of the few topics where I'm obliged to keep the record straight.

jtdev · on July 6, 2021

Be that as it may… the people of Hong Kong (and Taiwan) seem to be very emphatically (with few exceptions) communicating a distrust of and desire to be independent from China and the CCP.