I've often pondered: why does this problem submit to sustained exponential improvement for such a long time? There don't seem to be any physical characteristics of miniaturization that map directly to exponential functions, in the way many rate-of-change problems do when you first learn differential equations.
I've largely concluded that it's a combination of three things.
1. As Feynman said, there's plenty of room at the bottom. This is why so many orders of magnitude of improvement have been available.
2. As Turing proved, computers are general purpose. A consequence of this is we have only a small number of problems to solve that apply to all software, and so as the industry has grown exponentially, we've been able to put concentrated resources into making hardware better.
3. Exponential growth is simply the best we can do as human innovators.
Exponential growth isn't a property of the problem; it's a limit of our innovation. We are able to hit the limit due to the economic consequences of computers being general purpose, and able to sustain it due to there being plenty of room at the bottom.
Computers are also used for their own future designs.
I would imagine that faster computers could help design better computers in the future, thus there's some level of exponential growth there. However, I think computer-aided design is reaching the limit of the help it can give, and the bottleneck is human intelligence and materials engineering difficulties (which computers cannot help with).
That's why Moore's law was more exponential early on in the miniaturisation of chips, but it's slowed down more and more, till one day it's gonna be sublinear.
Of course, this doesn't preclude a new breakthrough in a different medium (optical computers, rather than electronic?), or a different form like quantum.
I think there are two components that are required for exponential growth, with the second one being a corollary of the other:
1) There needs to be some feedback from the effect of the process back to the constraints on the process
2) There must not be a significant external limitation
where 2) implies that the total (economic) effort expended towards the process is small compared to the total (economic) benefit derived from the process. As the sibling comment already states: We could not produce modern chips without (at least modernish) chips to execute the design tools. But I'm not sure how critical that actually is for the exponential growth (extrapolating from the inefficiency of all other software written today).
The interesting question is whether we are close to a limit yet. That, I think, translates to the question of whether we are already close enough to the edge of what is physically possible to incur some significant economic cost (with respect to the derived economic benefit).
What makes me think right now is your 3). What would a (sustained) better-than-exponential growth require, and how would we identify such a thing/opportunity?
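One way to make the feedback condition concrete is a toy model. This is an illustration, not something claimed in the thread, and the symbols x (capability), k (feedback strength), and ε are assumptions introduced here. If the rate of improvement is proportional to the current capability, you get an exponential. Growth that is better than exponential would need feedback that is stronger than proportional, and that runs away in finite time:

```latex
\begin{align*}
\frac{dx}{dt} &= kx
  &&\Longrightarrow\quad x(t) = x_0 e^{kt} \\
\frac{dx}{dt} &= k x^{1+\varepsilon},\ \varepsilon > 0
  &&\Longrightarrow\quad x(t) = \frac{x_0}{\left(1 - \varepsilon k x_0^{\varepsilon} t\right)^{1/\varepsilon}}
\end{align*}
```

The second solution diverges at t* = 1/(ε k x_0^ε), so under this model any sustained super-exponential regime would run into an external limitation quickly, which is condition 2) above.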
10 x 10 = 100 and 14 x 14 = 196. So in order to double the transistors we only need to shrink by 70% each time.
So I wonder if the emphasis on exponentials is misleading. A geometric decrease in feature size is sufficient, there were lots of atoms to work with while making wires ever smaller, and up till 2010 or so the regime of Dennard scaling gave shrinking transistors the very nice property of not affecting power consumption.
I know the article goes into more complex techno-economic reasons (and I guess that is what you are really discussing as well), but most IEEE papers cite Dennard scaling as a critical enabling driver. The end of Dennard scaling is what made the IRDS roadmap identify an "end" to Moore's law.
Quick aside, I get what you are saying (that is: multiply the original size by 0.7), but as a non-native speaker of English: isn't "shrink by" the opposite of "shrink to", and wouldn't this wording technically imply shrinking to 30% of the original size?
If you're shrinking to a fixed(ish) percent at a fixed(ish) interval, I'd call that exponential. The difference between exponential and geometric is really just continuous vs discrete, isn't it?
Innovations in transistor design help to shrink it on one dimension (e.g. the semiconductor size gap). This translates to a silicon wafer in two dimensions.
Therefore, for every factor (n) decrease we get in transistor length, we can fit (n^2) more transistors on a chip. Therefore we don't need (as) exponential amounts of innovation to get exponential outcomes.
Maybe I'm being crazy, Babbage you can correct me if this is way off :)
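The arithmetic in this subthread can be checked in a few lines. This is a minimal sketch, assuming an idealised planar layout where device count scales with the inverse square of the linear feature size. The function names are made up for illustration.

```python
import math


def linear_scale_for_density_gain(density_gain: float) -> float:
    """Factor to multiply linear dimensions by so that `density_gain` times
    as many devices fit in the same area (idealised 2D scaling)."""
    return 1.0 / math.sqrt(density_gain)


def density_gain_from_linear_scale(scale: float) -> float:
    """Inverse of the above: density gain from scaling linear dimensions."""
    return 1.0 / scale ** 2


# Doubling density needs linear dimensions multiplied by about 0.707,
# i.e. shrinking *to* ~70% (equivalently, shrinking *by* ~30%).
scale = linear_scale_for_density_gain(2.0)
print(f"scale per doubling: {scale:.3f}")

# Shrinking to exactly 70% gives slightly more than a doubling.
print(f"density gain at 0.7x: {density_gain_from_linear_scale(0.7):.3f}")

# Ten doublings (1024x density) only need a 32x linear shrink overall.
print(f"total linear scale after 10 doublings: {scale ** 10:.5f}")
```

Both quantities are still geometric sequences; the linear one just has a gentler ratio per generation.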
>the end of Dennard Scaling around 2006 has meant that the rate of increase in performance has slowed even as Moore’s Law has continued.
This is why parallel compute (multi core/GPUs) has taken off since 2006 - heat puts a cap on clock speed, so the only option is to do more operations per cycle. Even some microcontrollers are dual-core now.
I don't see this trend stopping. Massive parallelism is the future of computing.
There are limits to how much programs can be parallelized, see Amdahl's law. Even if Moore's law technically continues, the extra capacity isn't adding the same amount of extra value to the workflow of end users that it once did. Here's a video that talks about diminishing returns as it relates to graphics in computer games https://m.youtube.com/watch?v=VUCYAws6oW0. Diminishing returns in regards to polygon counts themselves have also been discussed: https://www.polygon.com/2013/12/10/5192058/opinion-stop-dwel...
Also, even if Moore's law remains in place for now, it will end at some point and the world needs to think about what happens afterward.
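For reference, Amdahl's law can be sketched as below. The 95% parallel fraction is an arbitrary example value, not a measurement of any real workload.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Speedup of a fixed-size job when `parallel_fraction` of its run time
    is spread perfectly over `workers` and the rest stays serial."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


# Example value only: a job that is 95% parallelisable.
for workers in (1, 8, 64, 1024):
    print(workers, round(amdahl_speedup(0.95, workers), 2))

# However many workers are added, the speedup stays below
# 1 / serial_fraction, which is 20x for this example.
```

The serial remainder sets the ceiling, which is why extra cores add less and less for a fixed-size task.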
What day to day problems do people have that require parallel processing to solve anyway? Sure graphics and neural networks, but is having a neural network the end problem that needs solving? Maybe I'm just a grumpy old programmer but I believe the human mind is specialised to think in terms of serial solutions. I think that's no accident as race conditions and other synchronization issues are difficult to avoid in parallel algorithms. That may be a fundamental property of processing in general, even for algorithms some AI may come up with, which remains mostly fictional at this point.
>I believe the human mind is specialised to think in terms of serial solutions
The brain is massively parallel - 80 billion neurons operating asynchronously at only 100 Hz.
>is having a neural network the end problem that needs solving?
Absolutely! Neural networks are a general-purpose way to solve a huge number of problems, especially problems that involve raw data or the messy real world.
I know I think in parallel often enough that two threads end up disagreeing with each other and a 3rd one has to sort it out. I can’t begin to describe the internal fork() sensation that happens in my noggin constantly.
This is true, but parallel programs are not more limited - they're just good at different things. We will write different kinds of programs to do different kinds of tasks because of access to massive parallel compute.
Parallelism allows you to do computations that integrate huge amounts of information, like checking every pixel of an image against thousands of weak rules all at once. This is very inefficient to do serially because you can only check one at a time. It's also a huge use case right now because machine learning lets you learn weak rules from data.
It’s funny, because while technically correct, you may end up fundamentally wrong that there was a limit to the value we might find in parallel compute. After all, we may have line of sight to AGI through parallel scale alone.
What you've identified is known as Gustafson's Law. If you can keep solving larger and larger problems, your solution time doesn't necessarily go down, but "speedup" is potentially uncapped.
Of course there may be some practical limit to Gustafson's Law, but I don't think we've found it yet, at least in many scientific domains.
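A minimal sketch of Gustafson's scaled speedup follows. The 95% parallel fraction is again just an example value, and here it is measured on the scaled run, where the problem grows with the number of workers.

```python
def gustafson_speedup(parallel_fraction: float, workers: int) -> float:
    """Scaled speedup when the problem size grows with the worker count and
    `parallel_fraction` of the parallel run's time is parallel work."""
    serial_fraction = 1.0 - parallel_fraction
    return serial_fraction + parallel_fraction * workers


# Example value only: 95% of the scaled run is parallel work.
for workers in (1, 8, 64, 1024):
    print(workers, round(gustafson_speedup(0.95, workers), 2))

# The result grows linearly with the worker count instead of saturating,
# because the serial part is assumed not to grow with problem size.
```

The assumption that the serial part stays fixed while the problem grows is where any practical limit would show up.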
Intel's Arc GPUs keep improving with driver updates; if they ship another generation I can see them eventually properly disrupting the current duopoly.
Arc is still very far away from even the lower end of what Nvidia offers. And on the enterprise side Intel is also well behind with their Gaudi chips. I'd love cheaper GPUs as much as everyone, but I don't see anyone who could be considered a threat to Nvidia's position over the next few years.
"very far away" isn't accurate - the A770 is pretty close to a 4060 - perfectly adequate for 1080p/1440p 60fps gaming on high settings in modern titles.
Jim Keller gave a talk at Berkeley titled "Moore's Law is not dead" (2019) where he summarized a bunch of up and coming technologies that will enable the continued scaling of transistor density - https://www.youtube.com/watch?v=oIG9ztQw2Gc
Moore's Law has been entirely coopted by the media and, as such, lost all meaning and predictive power, which is what made it famous in the first place. Once upon a time it was a useful tool to plan programs with.
Spectre is just about timing attacks on branch prediction. It has nothing to do with transistor density, and it can be remediated with a reasonable impact on performance, that is if remediation is necessary. These vulnerabilities are also hard to exploit in real life and not always relevant.
Rowhammer has to do with RAM density, but again, hard to exploit and there are mitigations.
Not that important depending on use-case. Lots of people have been running for years with `mitigations=off`. If someone could steal their crypto, they would have.
> Gargini, who is chairman of the IEEE International Roadmap for Devices and Systems (IRDS), proposed in April that the industry “return to reality" by adopting a three-number metric that combines contacted gate pitch (G), metal pitch (M), and, crucially for future chips, the number of layers, or tiers, of devices on the chip (T).
It’s not my field of expertise, but couldn’t we adopt a measure of density of transistors? Billions per mm or something like it.
A single number is also good for marketing as the MHz of the 90s have shown us
> It’s not my field of expertise, but couldn’t we adopt a measure of density of transistors? Billions per mm or something like it.
That's actually what the current "marketing nm" are meant to do. They're the notional extrapolation of simple planar transistor density to complex 3d transistor architectures.
People like to stand on a soap box and stand impressed with themselves for pointing out these are "just marketing numbers not the real half pitch anymore" but they're missing the point. The industry continues to use the nm nomenclature because it's clear to them what everyone is referring to.
Modern fab processes are so complex you're not going to characterize their actual technical details by any single number.
Is there a resource you'd recommend that would help one understand what `marketing nm` means technically, and why it's a quantity that the industry uses "because it's clear to them"?
All it means is average transistor density went up. It's supposed to mean by 2x but, in practice, it doesn't even mean that really. i.e. "At the 2020 IEEE IEDM conference, TSMC reported their 5nm process had 1.84x higher density than their 7nm process." [2]
And there is a lot of stuff in there besides transistors: contacts, metal traces, power routing, ...
And there's also the quality of the software that draws the transistors. (No one draws billions of transistors by hand.) And to further complicate things, there are many good reasons to make your transistors larger than the minimum size, and in fact many are.
So, it vaguely means density went up and unfortunately there isn't a single resource that says how to measure it or what the limit is.
Come to think of it, it's kind of like Moore's Law (the topic of TFA) in that it used to have a definite meaning. But then the meaning sort of evolved over time into something else without society actually agreeing on what that "else" was.
It isn't complex at all. Take the average density (average of sram and logic), take the square root of that, and you have (barring a factor) the node.
The best way to understand the dimension is to look at the contacted gate pitchm2 pitchcell height,and take the square root. The pitches obviously make an area, the cell height sets the 'design' component. So you get a reasonably scaled number.
Putting it that way also clarifies the technology. Why EUV: to tighten contacted gate pitch. Why cobalt: to tighten up metal 2 pitch (without killing resistance). Why backside metal: it lowers cell height. There is of course more to it, but it is a good way to coarsely understand it.
A good way to know if it is marketing is if someone talks about feature sizes instead of pitches. Patterning is done by pitch. For example, you don't do EUV to make smaller features. You do it to make more complex layouts at tight pitch.
> contacted gate pitchm2 pitchcell height,and take the square root
I am having trouble parsing `pitchm2 pitchcell height`.
Assuming that you meant to use `*` and then HN's markdown assumed it was italics, and placing `×` where italics start and end:
sqrt(pitch × m2 pitch × cell height)
Then I am left with the questions:
What are the units of pitch? Best guess: length units? (Based on: cell height having length units, `m2 pitch` being dimensionless?)
Why do the pitches "obviously make an area" then?
What are the units of metal 2 pitch? Best guess: dimensionless. (Based on the final units of the sqrt being reported in length, and the cell height presumably having units of length.)
What are the units of cell height? Best guess: length units, based on what the variable is called.
I suppose here is the best place to ask: Isn't Moore's law capped by fundamental physics (i.e. tunneling effects once you get around / below 1nm sizes)?
EDIT: did some snooping around, Moore himself said it in 2005:
"We have another 10 to 20 years before we reach a fundamental limit."
It depends on how one defines transistor but I believe there is no known fundamental limit. If transistor as switch is sufficient then single-atom transistors have been demonstrated. https://en.wikipedia.org/wiki/Single-atom_transistor
In that (extreme) example, each transistor requiring at least 1 atom, would be your fundamental limit.
New semiconductor materials or entirely different approaches (like quantum computing, optical ICs, 3D stacking etc) may give us a few more orders of magnitude.
But as the article states: most likely it's economic barriers. Market conditions must support the technical effort required. Otherwise that tech is a dead end no matter how far it could go in theory.
FWIW: in a way I'd welcome an end to Moore's law. It's about time software bloat gets addressed more aggressively.
Some discussion around applying Ngrams to your investigation: It's really easy to make ngrams say something wrong by accident. I don't think these ones turned out too poorly for what the conclusions were trying to convey, but for modern terms there are a few things you want to keep an eye out for or adapt for:
1) Enable case insensitivity, particularly for things like "transistor" which are rarely capitalized or "integrated circuit" but even for "should be capitalized things" like "Moore's Law" vs "Moore's law".
2) (Especially on things spanning less than a century) check the impact of smoothing. It can "smooth out" the introduction of a term so that it appears to take off years early, or hide the true peaks/variability because they only lasted a couple of years instead of the decades the default smoothing might be better suited for.
3) (Not as applicable here) make sure the terms are popular enough you're not just comparing noise.
4) Adjust your range so the data is properly scaled. If your term was coined around the 1970s (and comparatively uncommon prior to that) then the 1870s shouldn't be midway through your graph.
5) If various case insensitive spellings create a messy set of graph lines, adding in a less popular term like "exaflops" will cause them to be merged and you can extract that line.
6) Be careful comparing things that can be pluralized or otherwise modified with things that aren't. E.g. "Moore's Law" vs "Integrated Circuit[s]?"
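The smoothing hazard in 2) is easy to reproduce with made-up numbers. The sketch below assumes smoothing behaves like a centered moving average over `window` years on each side; the series is synthetic, not real Ngram data, and `smooth` is a hypothetical helper rather than anything from the Ngram Viewer itself.

```python
def smooth(values, window):
    """Centered moving average: each point is averaged with up to `window`
    neighbours on each side (truncated at the ends of the series)."""
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - window)
        hi = min(len(values), i + window + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed


def first_year_above_zero(years, values):
    """First year in which the series is non-zero."""
    return next(year for year, value in zip(years, values) if value > 0)


# Synthetic data: a term that appears only in 1996.
years = list(range(1990, 2003))
raw = [0, 0, 0, 0, 0, 0, 10, 0, 0, 0, 0, 0, 0]
smoothed = smooth(raw, 3)

print(first_year_above_zero(years, raw))       # 1996
print(first_year_above_zero(years, smoothed))  # 1993
print(max(raw), round(max(smoothed), 2))       # 10 1.43
```

With a window of 3, the term seems to appear three years before it actually did, and its peak is flattened to a seventh of its true height.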
Combining some of these, notes like "Here is a further ngram of the ‘End of Moore’s Law’ which has clearly been a live topic of discussion since the late 1990s." actually look to be a bit off when you switch the graph to zoom in on the last 30 years, set smoothing to 0, and enable case insensitivity to catch "[Ee]nd of [Mm]oore's [Ll]aw". Case insensitivity changes the shape of the humps, and 2019 ends up significantly higher than the peak in the early 2000s. Shrinking the x-axis and not smoothing out the peak to be wider than it is shows there is very little in the way of the end of Moore's Law in the late 90's, only really starting to take off in 99 and not peaking until 2003.
A similar story repeats for the opening graph: the gap between variants of "Moore's Law" and variants of "Integrated Circuit" is more than double the frequency, and variants of "Integrated Circuits" are more frequent than either and are counted separately.
Overall the points turn out right enough (a few years difference in when the phrase found its first wave doesn't impact the story, nor does how frequent the use of Moore's law is vs ICs as much as the general rise of one and decline of the other) but I always want to highlight how the friendly interface of Google's Ngram viewer can easily lead you to the wrong conclusion on the data.
I'd also like to point out the above tips are neither exhaustive nor universally applicable to any type of search. Sometimes you want more smoothing, sometimes you do want an exact version of a phrase, sometimes you do want to show the term was used long before the modern meaning, and sometimes there are additional things that can bite you. E.g. one I was worried about here was "Moores Law" typos excluding the ' but it turned out to not be common at all. You really don't know until you check each possibility though.
Pretty much every time I’ve seen an article that used Google ngrams as a secondary anecdote, it’s a bit of introductory fluff, and I pay zero attention to the integrity of the analysis. It’s the equivalent of a journalist writing “people are saying…”.
One example: the phrase "sweet summer child" once came up, with someone asking where it originated and why it was popular all of a sudden. Some were insistent it was a common phrase in recent times prior to A Game of Thrones using it, and others were saying it's always been a common thing and GoT was just the latest media but not related to its modern usage. A default ngrams search will show some decent activity in the 1800s, then the phrase catching on a bit in the early 90s, staying steady through the late 90s, until it took off in the late 2000s. The phrase turns out to have definitely been said in the 1800s, but maybe not with the frequency the graph might hint at initially, because of the infrequency of recorded material at the time. Then, unsmoothing things and shrinking the date range, you'll see it really didn't take off until just after 1996, after which it died down pretty quickly until 2011.
I.e. what initially looked like something that had been a common phrase before and came into popularity in recent times turned out to be something that had been previously mentioned but not really in use, and its rise lines up exactly with the release dates of the book and TV series for A Game of Thrones rather than slightly preceding them. This shows how easy it is to end up with two opposite conclusions by using Ngram Viewer slightly differently.
Alright, enough rambling about the hazards of Ngram Viewer in a post focused on Moore's Law :p.