In 1988 a friend and I presented at the 3rd Conference on Hypercubes and Concurrent Computers. We had spent the summer and fall programming an Intel Scientific Hypercube. Fractal calculations, heat diffusion, chemical reactions, that sort of thing. Used to take us 20 minutes a run.
We’re on the vendor display floor walking around, talking to folks, watching demos. A guy has a board in an expansion slot in his desktop computer. The cover is off and he’s pointing out the 2 transputer chips on it. Meanwhile the display screen is filling in a familiar drawing of a fractal. I ask how long it took them to render the drawing. “Oh, no, it’s doing that work now.” A cute little board readily outstripping our refrigerator-sized machine back home.
I ended up talking with David May. The thing he seemed most proud of at the moment was their floating point work. They had written a description of their new floating point unit in Occam. They had tested and debugged it. They had formally proved the implementation of a standard (IEEE 754?). And then they had built the silicon.
I returned home with a set of inmos manuals. Never could get anyone back at uni to see the potential. Sigh. Still, in that moment I knew I was truly in the presence of the state of the art.
Could you be a bit more specific about how the two companies and/or products are related, apart from the nameplay? I noticed that the then chief architect of Inmos also co-founded XMOS.
“ The name XMOS is a loose reference to Inmos. Some concepts found in XMOS technology (such as channels and threads) are part of the Transputer legacy.”
I have an XMOS dev board sitting in a drawer basically unused since I found I had to use their proprietary extended C dialect. But I understand they may have fixed this since?
You have to use their XC dialect, but it is mostly C anyway.
And you can include regular h-files to easily link to and build regular C. With the provided macros you can have h-files that use some XC features but are still compatible with both XC and C files (making it easy to wrap XC code and call it from C, and vice versa).
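For what it's worth, here is a minimal sketch of what such a dual-use header can look like, assuming the toolchain defines __XC__ when compiling XC sources; the macro usage, the wrapper typedef and the function are my own illustrations, not the actual XMOS-provided compatibility macros:

    /* robot_link.h -- hypothetical header shared between XC and C code. */
    #ifndef ROBOT_LINK_H
    #define ROBOT_LINK_H

    #ifdef __XC__
    /* XC side: the function can take a channel end directly. */
    void send_sample(chanend c, int value);
    #else
    /* C side: hide the XC-only channel type behind an opaque handle,
     * so plain C callers can still link against the XC implementation. */
    typedef unsigned int chanend_handle;
    void send_sample(chanend_handle c, int value);
    #endif

    #endif /* ROBOT_LINK_H */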
But I don't find XC to be bad, and it's likely a much better starting point than the library above.
The compiler is a fork of gcc from 2006 (if I remember correctly) and that shows its age somewhat. The many protections in XC can be a bit tedious as well.
I had existing code written in C++. I ran it using GCC on the Parallax Propeller instead. I had no desire to port to XC as I was trying to keep a core of it relatively cross platform.
Of course this is just another of my unfinished hobby projects, so :-)
Transputers were and are interesting: they are the closest thing to a dataflow or systolic array processor with enough power and forethought gone into their design that you could use them for real-world applications. But the MHz wars killed all those efforts, and only now that we've exhausted the easy gains does it make sense to review the past to see what we can salvage in terms of ideas.
Transputers weren't dataflow. They were parallel processors. The Transputer had a program counter (instruction pointer). Dataflow machines don't need one: when an instruction has all of its inputs, it executes on the next available processor. Only a few experimental dataflow machines were ever built, e.g. the one at Manchester University.
Technically true, which is probably the best kind of true, but if we're going to get technical I did say closest. You could program a Transputer to use its links as inputs and outputs in a fabric where the implementation details of program counters and firmware running on a particular Transputer were hidden from sight. This allowed all kinds of interesting architectures to be created 'ad hoc' without having to go through the stages of circuit design and so on, and it allowed for much more complex operations than you'd be able to get out of a 'real' dataflow processor on a single tick or message passed from one processor to another, because of the much higher level at which they operated.
So sure, they weren't true dataflow processors right off the bat but you could make them look like a pretty good simulation of one on the outside, even if you didn't have access to a crossbar switch (which would be cheating) and absent the real thing that was as close as I was going to get at the time.
I've worked on a ton of dataflow systems which were macro-dataflow, i.e. they executed on conventional processors. Great programming model in many instances.
You could program a dataflow system on a network of Transputers but the hardware isn't dataflow. We would have called it a General Purpose Parallel machine.
The two are related. The Wikipedia article I posted has appropriate links that explain the relationship. Note that I replied to someone who said “dataflow“ and not dataflow hardware or architecture.
> Note that I replied to someone who said “dataflow“ and not dataflow hardware or architecture.
The comment you directly replied to, yes, but it was itself responding to a comment saying "they are the closest to a dataflow... processor" which seems to be referring to dataflow architecture?
The Maxeler hardware claims to be dataflow, and there is some in these parts. However, I couldn't find out much about it except that it doesn't seem to support languages with a Lucid/SISAL-type heritage.
MIT Tagged Token machine is another TTDA. I actually wonder if cheaper CAM or Qubits would help, even though debugging would still be very, very painful.
I read this as information density in theory, not necessarily in practice.
For this result to hold in practice, we would need evidence that ternary hardware can be manufactured and operated at the scale of binary hardware, e.g. with comparable clock speed, error rates and power draw. Everything I have heard prior pointed to the conclusion that binary is more efficient when accounting for the physical properties of the substrate.
I'm sceptical of claims in the realm of physics on the basis of pure math without empirical confirmation.
Is there any modern research into manufacturing ternary chips?
Binary also has mathematical advantages over ternary; for instance, 2 is the smallest integer greater than 1 (no shit), so powers of 2 are as close together as you can get in the integers.
Bitwise operations correspond to the finite field GF(2); while there is a GF(3), it's not nearly as interesting or useful.
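To make the GF(2) connection concrete, here's a tiny sketch (mine, not from the thread) in C: addition in GF(2) is XOR and multiplication is AND, so ordinary word-wide bitwise operations perform many independent GF(2) operations in parallel.

    #include <stdio.h>
    #include <stdint.h>

    int main(void) {
        uint8_t a = 0x0B;            /* bits 1011 */
        uint8_t b = 0x06;            /* bits 0110 */
        uint8_t sum  = a ^ b;        /* GF(2) addition per bit: 1+1=0, no carry */
        uint8_t prod = a & b;        /* GF(2) multiplication per bit */
        printf("a+b = 0x%02X, a*b = 0x%02X\n", sum, prod);
        return 0;
    }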
Binary has advantages in a discrete/logical sense (namely bitwise operations).
Balanced ternary (not plain ternary) has the advantage that the sign is trivially included in the number, so there is no need to think about one's/two's complement or signed/unsigned extension.
This is partially a lie, as the same digits can also be read as normal ternary, giving only non-negative integers... Still, it is a more natural encoding of arithmetic.
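To illustrate "the sign is trivially included", here's a small sketch (my own, not from the thread) that converts an int to balanced ternary with digits -1, 0, +1 (written T, 0, 1); negative numbers fall out of the same loop with no complement or sign-extension logic:

    #include <stdio.h>

    /* Print n in balanced ternary, digits T (-1), 0 and 1.
     * Negative n needs no special sign handling. */
    static void print_balanced_ternary(int n) {
        char buf[64];
        int len = 0;
        if (n == 0) { putchar('0'); return; }
        while (n != 0) {
            int r = n % 3;            /* C gives r in -2..2 */
            if (r > 1)  r -= 3;       /* 2  becomes -1, carry 1 upward */
            if (r < -1) r += 3;       /* -2 becomes 1, borrow 1 downward */
            buf[len++] = (r == -1) ? 'T' : (char)('0' + r);
            n = (n - r) / 3;
        }
        while (len--) putchar(buf[len]);
    }

    int main(void) {
        print_balanced_ternary(8);   putchar('\n');   /* 10T =  9 + 0 - 1 */
        print_balanced_ternary(-8);  putchar('\n');   /* T01 = -9 + 0 + 1 */
        return 0;
    }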
a) Numeral Systems (e.g. ternary) are just trees, and specific numerals are just paths from root to leaf.
b) A 6-digit numeral roughly corresponds to a tree of height 6.
c) Base_10 corresponds to a tree with 10 possible children for each node.
d) e is the most efficient multiplier when trying to achieve compound growth in the fewest iterations of multiplication.
> Also, why does Euler's constant appear all over the place?
e is special because e^x is its own derivative. It also acts as a "bridge" between addition and multiplication. It often appears where growth or trees are involved.
Thank you for taking the time to explain, with a diagram even.
Another comment mentioned "radix economy", which, together with your description helped me understand (generally) why Euler's constant is the most efficient base in terms of number of digits needed to express numbers.
The cost of finding a leaf in a balanced tree (with data only at the leaves, which is how we represent integers in base B) of size N is on average proportional to its branching factor B multiplied by its height H. Height is (log N)/(log B), so cost is B * (log N)/(log B).
By taking the derivative, that's minimized when B satisfies

    0 = (log N) * ( 1/(log B) + B * (-1/(log B)^2) * (1/B) )

which simplifies to

    0 = 1/(log B) - 1/(log B)^2
    1 = log B
    B = base of the logarithm.

This looks like you can pick any logarithmic base b you want, and so any B you want, but in fact the derivative I wrote assumes e is the base (hence the term "natural" logarithm). Other bases b would yield a scaling factor of e/b, since d(e^x)/dx = e^x and b^x = e^((ln b) x).
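If you'd rather check this numerically than by calculus, a quick sketch (mine, under the same cost model) tabulates B * ln(N)/ln(B) for integer bases; base 3 comes out as the cheapest integer base, and the continuous minimum sits at B = e:

    #include <stdio.h>
    #include <math.h>

    /* Radix economy: cost of representing N in base B,
     * proportional to B * log(N) / log(B). */
    static double cost(double B, double N) {
        return B * log(N) / log(B);
    }

    int main(void) {
        const double N = 1e6;
        const double E = exp(1.0);
        for (int B = 2; B <= 10; B++)
            printf("base %2d: cost %.2f\n", B, cost(B, N));
        printf("base e : cost %.2f  (the minimum)\n", cost(E, N));
        return 0;
    }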
You can chase "deeper" reasons for this all day long by digging deeper into the many definitions/properties of e and proving they are equivalent.
Playing with e is the most fun you can have in pre/calculus.
Frankly, it's just a pattern I've observed from playing with numbers myself. I'm unsure how to explain it properly, and I have no academic sources to point toward. You can probably find a better explanation somewhere in an article on optimization problems.
The intuition is that e is the optimal water level when a limited amount of water is distributed among a variable number of buckets and all the filled buckets multiply each other. Alternatively, it's like how volume(x, y, z) is maximized, where

    volume(x, y, z) = x * y * z
    const = x + y + z

when x = y = z. Except in our original situation, the number of dimensions is arbitrary instead of fixed.
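For a concrete instance of that: with x + y + z = 6 fixed, the product x * y * z is maximized at x = y = z = 2, giving 8, whereas an uneven split like 1 * 2 * 3 only gives 6.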
Balanced ternary is insane, and most data formats would need to be adapted to deal with it. Except in specialized processors, it is ludicrous to even consider it until almost every other avenue has been explored.
Binary has so many useful properties that are relatively intuitive to understand. Binary is a uniquely convenient number system. Even day to day, base-two number systems are hyper-convenient.
One thing I can think of where balanced ternary has interesting properties is 2D space partitioning. Balanced ternary partitioning is an interesting tool; then again, you could just use modulo binary space partitioning (i.e. wrap the space around by half the width of a binary partition), which is almost as good. I wonder what the balanced ternary equivalent of an octree would be. I guess it would be a 27-tree, kinda cool, like the 3³ tree.
> used in the Soviet Union until 1970
So was a planned economy, and they still haven't recovered. The KGB sure have staying power though, must be the balanced ternary advantage! ;- )
It's not like balanced ternary has no positives, but you have to acknowledge that well before even decent relay computers were invented, binary was already much better explored than balanced ternary; obviously relay computers helped solidify that (though bi-quinary did not survive, popular as it was since Colossus).
Signed arithmetic is elegant enough in binary, I guarantee you will fail to express even basic arithmetic concepts in balanced ternary to any lay person.
As for experts implementing them in a system, there are advantages to balanced forms, but good luck finding enough experts. The benefits probably start kicking in for very large integers, where the carry itself can take longer than you might like.
This is a matter of physical implementation. Using normal transistors, ternary is horrible, but there are other physical media where ternary is more natural (often involving magnetic fields).
There was a range of transputer add-ons and dedicated systems all the way up to the very pretty and very expensive Meiko Computing Surface[1] with something like 100 (300? honestly can't remember) transputers in it. It seemed, for a while, like they were being shoehorned into everything. There was a distinct belief that transputers were the way the future was going. I think the expectation was that before too long everything would gain a set of transputers as a replacement for, or in addition to, a single single-threaded CPU, much like machines gained graphics cards.
I had an encounter with a Meiko Computing Surface (64 or 128 Transputers, I think?) around 1990, as I was studying Occam at the time. It was impressive to watch it rendering high-res 3D fractal landscapes in real time.
Ditto a few years before that - we had a Meiko box in the visuals R&D dept of the flight simulator company where I was doing my electronic engineering apprenticeship, although I didn't really get hands-on with it.
I was at Atari when, one fine day, some folks in the UK that we had never heard of did a product announcement about an Atari computer that definitely was not coming out of our engineering group. (We were down to one building at that point, if you don't count our remote office in Monterey as part of the GEM porting effort).
I remember the Tramiels being a little flabbergasted by the Transputer announcement as well.
I was a summer intern at the UK company (Perihelion) doing the Atari-based Transputer machines. The word there was that because Atari had invested in the UK company (?) the Atari machine was essential to include as the front-end, even though it didn't really fit the UK designers' idea of what a good front-end machine would be... (nothing against the Atari design, just that its quirks/shortcuts didn't match up with the Transputer cluster backend 'vision').
I’ve always thought of the Zachtronics game TIS-100 [1] to resemble programming an extremely simple transputer. Definitely worthwhile to try the game, if only to find out about all new kinds of problems you run into if you have to distribute your processing over multiple independent nodes that communicate over serial links!
I hypothesize what was really missing to make the transputer successful was a language/compile-target to express propagators: https://youtu.be/nY1BCv3xn24
Is there an algebraic logic strongly associated with what we conventionally think of as types (see https://cuelang.org/docs/concepts/logic/) with which we can effectively and efficiently compute unions and Boolean satisfiability?
I'm not sure who is "They" here. Well...ok I have a bit of an idea of one or two people ;)
But for 90% of the transputer group at the time it was very obvious that we needed alien language support. We had to spin up a compiler group, which took time. I seem to remember we also OEM'ed some compilers from outside vendors before the internal ones were ready (memory hazy and I didn't work directly on software there). I had direct contact with many customers, specifically discussing C language support and I don't at all remember anything like what you describe where 3rd party products were deliberately not mentioned.
Again, I'm talking about 1986/7 or so. In that time frame the T4/T8 were performance competitive with anybody's CPU. However you could only program in Occam. Huge hygiene factor.
Of course today we have golang which has much the same features as Occam and everybody thinks it is the best thing since sliced bread. And we have folks seriously re-writing dusty decks in Rust. But back then you just couldn't seriously sell a CPU that forced you to program in an unknown, fairly basic, programming language.
Also, internally we still used a lot of BCPL. So the company itself didn't eat its own dogfood in the way it expected customers to.
http://tardis.dl.ac.uk/computing_history/parallel.html confirms my memory, but perhaps the timeframe is wrong. (I rather thought Bill Purvis wrote a compiler, but it seems that was Occam for the T machine.) Thanks for the history, anyhow.
I programmed in Occam for a time. It used two spaces for indentation-significant code, which I thought a great idea then -- and which is one reason I liked Python and took easily to it. https://en.wikipedia.org/wiki/Occam_(programming_language)
For a time around 1986-1987 I had several Transputer boards in a robot lab I worked in at Princeton University. I liked the INMOS sales pitch that someday you could have a transputer at every robot joint. Something similar is relatively easily accomplished now using various serial connections between microcontrollers -- even if it might still be easier to just run cables to the joints from boards elsewhere. That combination of Transputer boards (including several loaner boards from another lab on campus which had got them for NSA-funded signal processing stuff) for a time gave me the most powerful computer on the Princeton campus in terms of total memory and CPU cycles -- even if the IBM 308X mainframe in the computer center no doubt had more total I/O capacity.
One other thing I liked about transputers was the (for the time) high speed serial links between them which could also go to special chips to read or write parallel data. I used such chips to interface a network of transputers to the teach pendant of an Hitachi A4 industrial robot so they could essentially press the buttons on the pendant to drive the robot around. Interesting how modern USB in a way is essentially just a serial transputer link. A PU undergrad student I helped used that setup for a project to use sensing whiskers to "feel" where objects were (inspired by a robotics magazine article I had seen by a group in Australia who had done that for a different robot).
While transputers not supporting C was an issue early on, another issue was just that Occam did not easily support recursion. Transputers at the start also did not have standard code libraries like C had from Unix. And transputers were expensive (as was the software development kit), and they were not easy to get in the USA, with various supply chain issues and delays.
Occam was mind-expanding in its own way. I had previously networked a couple of Commodore VICs and a Commodore 64 for a robot project (via modems to an IBM mainframe) for my undergraduate senior project at Princeton a couple of years earlier using assembler, BASIC, and C -- so the transputers with Occam were a big step up from that conceptually. Still, practically I had gotten the Commodore equipment plus a parallel code simulator (a VM in C) I wrote for the IBM mainframe running under VMUTS to do more interesting stuff -- perhaps because I had years of experience programming in those other languages by then whereas Occam was so different and I did not spend that much time with it before leaving that job. Even without Occam and transputers, that robot lab job was the best paying job working for someone else I ever would have in many ways as far as autonomy, mastery, and purpose -- but I did not appreciate it then as it was my first real job beyond college and I figured the grass would be greener somewhere else -- not realizing that grass tends to be greener where you water it. Thanks for creating such a great working situation for me, Alain!
> While Inmos and the transputer did not achieve this expectation, the transputer architecture was highly influential in provoking new ideas in computer architecture, several of which have re-emerged in different forms in modern systems.
I have a vague sense that there's a rich vein of potential projects and maybe businesses in "before their time" ideas in the world of computing.
If you want inspiration, The Computer Chronicles was a show on PBS documenting in real time the progress and rise of computers. I believe all episodes are on YouTube here: https://www.youtube.com/user/ComputerChroniclesYT
This is incredible. I think I've just found my new late-night / fall-asleep 'fictional' TV show — because it really feels that way, in our weird cognitive perception of the past.
Thanks a lot for the pointer. This may spark funny and uncanny world-domination ideas! ( :
A serious hint/note for the "futurologists" / innovation-driven folks out there — my people: look for what was true back then but is no longer, especially in the form of limitations, roadblocks or axioms (hard limits) of a design. Remove that piece and see what you get...
Couldn't agree more. A lot of things were invented in the 60's-80's which didn't work out due to lack of applications, technological limitations of those times etc.
Yes, hundreds, many of which are now being 'reinvented' or 'rediscovered'. Time to figure out how to persuade investors to get on board - processor architecture isn't one of their fads at the moment.
The Japanese tried to build massively parallel computers back in the 80s. The project ultimately failed because the performance of normal CPUs continued to improve. It looks like we are once again in an age where parallelism seems to be the only way out: CPUs have hit a thermal ceiling, speculative execution has led to vulnerabilities... Given the popularity of today's GPUs and their applications, I'd say the 5th generation computer project was ahead of its time.
USB
On-chip PLL clock generator
Automatic power on reset
On-chip programmable DRAM controller
Hardware threading
On-chip I/O DMA
Concurrency and parallelism in programming languages
«It means that no more than three of the SPE's can talk to the memory controller at one time using full capacity. Four SPE's will fill the bus, and the CPU controller will not be able to access the memory at all.
[...]
What you can do is have all the SPE's work at the same time, but using nowhere near the capacity they each have. Basically the PS3 gets punished for performing well, and tweaking cell for better results in the future seems to be a nightmare if not impossible.»
The memory read speeds for the SPEs (Synergistic Processing Elements, or "APUs" these days) were horrible. They were about 3 orders of magnitude slower than they were meant to be.
This is why the chip basically died. Memory read speeds for the SPUs were impossibly slow, meaning that they were crippled into uselessness, AIUI.
On the 40th anniversary of Inmos, creators of Transputer, we filmed this set of talks discussing the legacy and impact of Inmos in Bristol, UK. You'll find some fun insight into Inmos and the transputer in some of the longer talks.
Question to people who know anything about circuit and processor design: are we confident that current designs and paradigms are almost-as-good-as-it-gets for our classes of materials, or is it just the result of 'good enough design + scale economics = winner CPU arch' (resp. all types of processors), and there could be many great "unknown unknowns" out there?
If Intel were given a container ship full of gold and a mandate to redo the basic underpinnings of computation, and given a ten-year alternative timeline, I'd wager they'd be about 30% faster than an equivalent Intel that maintained the status quo.
Honestly I think we're more held up by programming languages that make concurrent programming hard than by the underlying architecture. It's not that hard to compile an existing C++ or Java codebase to a new ISA. It's hard to rewrite it in a language that scales well to thousands of threads. And that language doesn't even exist. The closest we have is whatever Nvidia is calling C++ on CUDA and that sucks.
Current CPUs are nowhere near "almost-as-good-as-it-gets" for the simple reason that compatibility with existing software (OS and consumer software wise) is still the main driving force in CPU design. If you come up with something revolutionary, you have to convince a hell of a lot of software engineers to port to your arch, because without software no one will buy your hardware.
I guess the choice thus comes down to how much potential benefits may be possible, versus what software actually needs? (I can see first-hand we don't need more single-thread performance in so many use cases)
I just have no idea "how good it can get", compared to what we have now. Is there an alternative world where CPUs vastly outperform ours in performance/power using the same materials? But more importantly how much can "vastly" be?
A huge portion of the power a CPU consumes is simply due to the clock running, which is why modern CPUs go through tremendous efforts to disable circuits not in use and to lower the clock speed when the CPU is idle.
However, imagine if we didn't have a clock at all. If, instead, the only thing that caused CPU transistors to switch was new instructions being executed.
In that world, the only power a CPU would consume would come from gate leakage and from when the CPU is actually doing stuff. That would translate into extremely low power draw.
Performance would also be way up. Data would mainly be constrained by switching speed, which can be super fast. Today, data speed is mostly constrained by how big the largest section of the pipeline is and how fast the clock moves.
So why don't we do this today? Mostly because the entire industry is built around synchronous (with a clock) CPU design. Switching to an async design would be both super difficult (lack of tools to do so) and very different from the way things work now. Just like parallel programming is very hard, async circuit design requires a large amount of verification that we just lack. Further, HDLs are simply not well built in general, and not well built for async in particular.
Async circuitry usually requires more transistors and more lines. That, however, isn't really a problem anymore. Today, the vast majority of transistors in a modern CPU are spent not on logic, but on the cache.
It'd be super expensive to adopt. It would be totally worth it. But I doubt we'll see it happen until AMD and Intel both completely run out of other advancements.
I was in a startup trying to commercialise async technology back in 2002; we wound it back to just doing better clock gating and eventually sold out to Cadence.
It's not a silver bullet. It gives you maybe 30% less power? Gate leakage has been creeping upwards too, since there's a direct speed/leakage tradeoff.
There's been a few advances that have limited gate leakage (primarily finfets that I'm aware of). But it is still there.
I agree though, not a silver bullet. It would buy one generation of power gains and performance.
Marketing wouldn't like it either because clock speed has so often been used to sell CPUs.
I'm also not sure how much modern CPUs can incrementally add async rather than needing a ground-up redesign, and whether that would get them close to the same power gains. Already, modern CPUs have impressive latencies for most instructions.
Real gains, though, are somewhat unknowable. If I were to guess, the first place we'll see an async CPU will be mobile. After that happens, we should have a much clearer picture of the real gains it grants.
There was apparently (I didn't see it) a neat demo where you could run a live benchmark and spray freeze spray on the processor, which would cause it to speed up, since the propagation delay was inherently dependent on gate temperature.
I don't expect to see it commercialised any time soon. Too much retraining and retooling required.
> One very notable feature due to the asynchronous design is the drop of power dissipation to 3 μW when not in use
That's impressively low power!
I wasn't aware that AMULET was a thing. Neat to see that someone put the effort into making an async CPU.
I had heard from one of my professors that Intel tried the same thing with a Pentium 1 and ultimately gave up due to poor tooling. (I don't know the exact timeframe of this, but I believe it was around the P2 or P3.)
Yes. When I said "Too much retraining and retooling required", I meant on the IC design side. An AMULET user would see nothing unusual about the processor apart from uneven execution speed.
(If you're worried about side-channel attacks, you definitely don't want asynchronous technology as it's going to leak data-dependent timing information!)
This is a very interesting perspective. The evident benefit of async helps us see a world/civilization where indeed computing is pervasive to a much, much deeper/higher degree.
Positive note: it could certainly help take silicon-based electronics further in a resource-starved world; it could/should also simply be part of the paradigm of the next thing if it comes soon enough — photo-hype, buzz-ristors, whatever tech wins.
(Thanks for an uplifting glimpse at the "TO DO" list of humanity, and one more spark of interest as a programmer!)
This actually seems like it would be somewhat ideal for many web servers, where in a large number of cases the only interesting things going on are in response to events.
These kinds of questions have an essential unknowability to them, in that every phenomenon you try to model through computing is an inexact approximation - whether it's something like the precision of a mathematical computation or "what is this human being's real name".
What we have done to date is the low-hanging fruit: take known categories of application and juice them up by applying CPU power. And for that, the amount of power you need is "enough to have an effect". A lion's share of the benefits of putting computers in the loop were realized in the 70's or 80's, even though the measured amount of power that could be applied then was relatively miniscule. There were plenty of alternatives at various times that were "better" or "worse" in some technical sense, but succeeded or failed based on other market factors. The real story going forward from then has been changes in I/O interfaces, connectivity and portability - computing has merely "kept up with" the demands imposed by having higher definition display, networking, etc.
So then we have to ask, what are the remaining apps? If we can engineer a new piece of hardware that addresses those, it'll see some adoption. And that's the thrust of going parallel: It'll potentially hit more categories at once than trying to do custom fast silicon for single tasks.
But the long-term trend is likely to be one of consolidating the bottom end of the market with better domain solutions. These solutions can be software-first, hardware-later, because the software now has the breathing room to define the solution space and work with the problem domain abstractly, instead of being totally beholden to "worse-is-better" market forces.
What an intelligent bird's-eye view of the problem. Thank you. I love the perspective, you kind of blend the evolution of simple bits at the hardware level (increasingly lots of them, but nonetheless "known category of applications") with complex high-level software space. The clarity, looking forward, is greatly improved through this lens.
That last paragraph confirms my own vision for the next cycle, the next decade or so.
About "remaining apps": the current cycle of ML, post-parallism (GPUs, DL, etc. since early 2010s) is imho one such candidate for transformative computing, where like vapor or hydrocarbons or electricity, computing lends itself to enabling a whole new category of 'machines', of tools. That's really interesting (just not the end-all be-all some seem to think, but a whole new solution space to build upon).
I guess robotics is a fitting candidate as well (insofar as interacting with real physical objects changes everything) but we are many years away from commercially-viable solutions, afaik.
I also like to think there are (potentially transformative) social or behavioral use of compute-enabled machines that we haven't scratched much yet. Areas of life/civilization generally too complex to be brute-forced or even fully modeled, like biology and health, or the more advanced social behaviors (topics best described by Shakespeare, Tocqueville, Stephen Covey, Robert Greene...); or things we 'just' need to brute-force e.g. life-like graphics/VR or seamless/ambient/'augmented' computing/reality. Some of these may be in for the taking during the next cycle or two.
Single thread performance is still the limiting factor for things like perceived web browser performance. Raw MIPS is available in absurd amounts, but things like memory round trip delay become limiting.
Is that why Zen's architecture, with the memory controller central to the chip (close to memory I/O up/down) and 'chiplets' containing cores left and right, makes so much sense?
I'm partial to announcements of "3D" stacking chips; wherein memory cells, controller and CPU cores are arranged like cake layers, thus with extremely short memory round trips. A clever design could entirely remove the need for L3 cache I suppose, make RAM behave essentially as such. It's elegant I think, somewhat balanced in design, essential. And I find it crazy and awesome that cooling those is even possible.
I don't think this is the case. We have seen MIPS and ARM appear, and all sorts of ISA extensions (like NEON). Amazon even offers non-x86 machines now!
The old mantra "without software no one will buy your hardware" is no longer generally true. There are lots of programmers nowadays, and as long as your new chip can show improvement in one area, someone may use it. We've even had chips which execute Java bytecode directly!
“Worse is better” applies on all levels: microarchitectures, ISAs, operating systems, applications, languages, computing paradigms, etc. Once Moore’s law is truly over, we will be forced to make some real progress in areas other than shrinking silicon features.
That was kind of my intuition, from what little I know about these things (nerdy culture, but tangential and second-hand at best).
A tangential remark, however: I hope there's room for post-silicon / post-electronics technology. We are far from some of the more advanced use cases of computing (currently "sci-fi" but hard/real science, just hard problems in technology as well), in ways where no cleverness nor cheat (e.g. qubits) could substitute for actual processing capability. I don't know how many orders of magnitude silicon still has to go, but my impression is that it's relatively limited physically, compared to what could be. But that's a topic for the 2030's and later, probably, hopefully. Here's to the quest for Landauer's limit!
No signup needed, just change your user agent to "Googlebot" :)
>“Moore’s ‘law’ came to an end over 20 years ago,” says David May, professor of Computer Science at Bristol University and lead architect of the influential "transputer" chip. “Only the massive growth of the PC market then the smartphone market made it possible to invest enough to sustain it.
>“There’s now an opportunity for new approaches both to software and to processor architecture. And there are plenty of ideas – some of them have been waiting for 25 years. This presents a great opportunity for innovators – and for investors.”
I feel the same. I long for a stagnation in processing tech so that we can go back to weird assembly tricks being commonplace and necessary to keep up. But GaN or diamond will probably step in and prevent that.
Could you please elaborate? I'm pretty sure I only get 10% at best of the implications of this. I understand crystal clear what you mean — very much agree.
FWIW I can't stress enough that this is just a question / thought experiment, whose point is probably to explain why we could do better but don't, and with good reason (boiling down to "not worth it, at least yet, possibly ever" I suppose, but the details are the meaty part for me).
If you're interested in more academic Transputers take a look at the CPA (Communicating Process Architectures) conferences [0]. Just last year it featured two papers on Transputers.
Nice to see that there's still Transputer work going on at Kent - I studied Occam briefly there in the late 80s/early 90s and was allowed to look at (not touch!) their Meiko Computing Surface rendering fractal landscapes in real time.
I remember seeing these advertised ubiquitously in computer magazines in the late 1980s. Was always a bit curious about them. This Wikipedia link 30 years later is the closest I ever came to interacting with one myself.
Green Arrays suffer from the issue of low compute power. They target the extreme low power market using a proprietary Forth dialect. It is difficult to benchmark how it performs against existing general purpose ARM/x86-64/RISC-V or even NVIDIA chips other than their claims of efficiency gains due to a parallel architecture. Most of their existing press (especially on HN) is because their founder is the inventor of Forth and that everything is allegedly built on a single bootstrapped end-to-end Forth system with its own EDA toolkit. Would be interesting if somebody can do a rigorous test against e.g. Parallela chips.