This article is based on a faulty premise. The A10 is still far from the performance of recent Intel CPUs (a quick browse of Geekbench shows roughly 2x the single-core and 4x the multi-core performance for Intel). Apple is closing in on the limits of Moore's law so quickly not because of a different model of computation, but because those gains had not yet been realised for mobile CPUs. As the performance gap narrows, it is likely that the year-on-year improvements in Apple's processors will slow down too.
Which is not to take away from the achievements of the A10 design team; considering performance per watt, this chip is incredible.
Yes, an A10 core is only 60% as fast as a single Intel core. But the Intel part has a 91 W TDP.
IMO 60% as fast is not "far away". Clearly, if Apple wanted to spend more power and/or transistors, they could improve performance further.
Apple is so competitive in that space that Intel is actually in retreat from mobile.
Sure, Intel is much faster in multicore, but that's because it has much more power to work with. Intel has tried to scale decent performance down to lower power levels; it hasn't worked out particularly well so far.
The more interesting question is whether Apple will have better luck scaling 64-bit ARM up than Intel has had scaling x86 down.
I don't disagree. What I found interesting is the bigger picture of hardware.
By which I mean that 'performance' is a slippery eel of a concept. The A10 fits in my pocket and the Xeon Phi does not...or rather it doesn't in a way that provides me with useful computations...and the latest i7 doesn't hold a candle to the GPU in my $40 graphics card when I want to rotate the 16 million pixels my camera stuffs into a RAW file...and if I want to build a Kubernetes cluster, I can throw Raspberry Pi's at the problem.
the latest i7 doesn't hold a candle to the GPU in my $40 graphics card
I think I understand your point (unless you meant iPhone 7?), but interestingly, recent Intel processors have impressively powerful graphics processors built in: https://software.intel.com/sites/default/files/managed/c5/9a.... So if you were using the full capacity of that i7, I think it would beat your graphics card handily. The issue is that practically no one is writing code to use the full capabilities of these modern CPUs. A friend did a write-up here:
http://lemire.me/blog/2016/09/19/the-rise-of-dark-circuits/
Even without the built-in GPU, I'd bet that the right software running on that i7 would blow away that $40 graphics card. I think the problem is that we don't really have the right tools for writing low-level multi-core software. Image rotation parallelizes and vectorizes really well, and the x64 side of those processors has excellent vector capabilities, but it still requires hand coding to get top performance.
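Something like this hand-rolled sketch (not the article's or Darktable's actual code) is what I mean: a 90-degree rotation is just an index remap and every output row is independent, so it spreads across cores trivially; real code also tiles for cache, and arbitrary-angle rotation adds interpolation but has the same shape.

    #include <stdint.h>

    /* src is w x h pixels, dst is h x w; build with -fopenmp so the
       pragma spreads the outer loop across cores */
    void rotate90_cw(const uint32_t *src, uint32_t *dst, int w, int h)
    {
        #pragma omp parallel for
        for (int y = 0; y < h; y++)
            for (int x = 0; x < w; x++)
                dst[x * h + (h - 1 - y)] = src[y * w + x];
    }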
if I want to build a Kubernetes cluster, I can throw Raspberry Pi's at the problem
A thought experiment: how fast a cluster could you make from a few dozen iPhone 7s connected wirelessly? The processors are surprisingly fast, and I think they support 802.11ac at gigabit-plus speeds. Could you do distributed computing on an ad-hoc network of iPhones that happen to be nearby? An app with a sandboxed work queue that accepts local connections? There are lots of reasons it makes no practical sense, but it would be quite a demo.
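For the back-of-the-napkin version of that app: each phone could run a tiny work-queue server over plain POSIX sockets (which iOS does expose). A toy sketch, with discovery, scheduling and sandboxing all hand-waved and the port number picked arbitrarily:

    #include <stdint.h>
    #include <sys/socket.h>
    #include <netinet/in.h>
    #include <arpa/inet.h>
    #include <unistd.h>

    int main(void)
    {
        int srv = socket(AF_INET, SOCK_STREAM, 0);
        struct sockaddr_in addr = {0};
        addr.sin_family = AF_INET;
        addr.sin_addr.s_addr = htonl(INADDR_ANY);
        addr.sin_port = htons(9999);           /* arbitrary demo port */
        bind(srv, (struct sockaddr *)&addr, sizeof addr);
        listen(srv, 4);

        for (;;) {
            int c = accept(srv, NULL, NULL);
            uint64_t range[2];                 /* toy task: sum [start, end) */
            if (read(c, range, sizeof range) == sizeof range) {
                uint64_t sum = 0;
                for (uint64_t i = range[0]; i < range[1]; i++)
                    sum += i;
                write(c, &sum, sizeof sum);    /* result back to the scheduler */
            }
            close(c);
        }
    }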
An Intel i7-6700 will do about 200 single precision GFLOPS[1] and costs about $300 [2]. An Nvidia GeForce 710GT will do about 350 single precision GFLOPS [3] and costs about $40 [4]. A pixel pipeline is one of those 'embarrassingly parallel' workloads and the software I use, Darktable [5] is tuned to take advantage of GPU parallelism...the tools are there in no small part due to the gaming industry.
Thinking about an iPhone cluster, the hurdle seems to be software that is designed to create friction against implementing such a thing: drivers and firmware in particular.
An Intel i7-6700 will do about 200 single precision GFLOPS[1] and costs about $300 [2]. An Nvidia GeForce 710GT will do about 350 single precision GFLOPS [3] and costs about $40 [4].
You are right, and I stand (mostly) corrected. Put another way, on Skylake with unrolled 256-bit FMA you can do close to 16 single-precision floating-point operations per cycle per (physical) core: two vector loads, one vector store, and the 8x32-bit FMA itself.
At 4 GHz that's about 64 Gflop/s per core, or roughly 250 Gflop/s for 4 cores, the same ballpark as the $40 graphics card you pointed to. And realistically, loop overhead will knock you down another 10%, and if you run for any length of time you'll probably be thermally throttled back to something closer to the 200 Gflop/s you cite.
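For the curious, the loop shape being described is basically a single-precision a*x + y. A minimal sketch; built with something like gcc -O3 -march=skylake -fopenmp, the compiler turns it into unrolled 256-bit FMAs:

    #include <stddef.h>

    /* Per element: two vector loads (x and y), one 8-wide FMA, one
       vector store -- that mix is where the ~16 flops/cycle/core
       estimate above comes from. */
    void saxpy(size_t n, float a, const float *restrict x, float *restrict y)
    {
        #pragma omp simd
        for (size_t i = 0; i < n; i++)
            y[i] = a * x[i] + y[i];
    }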
My general question about why no one is interested in using the built-in Tflop/s capable side of the Skylake die stands, but I was wrong to think that a well-tuned desktop processor could come anywhere close to the price/performance of a cheap graphics card for raw flop/s. Thanks for pointing out the real numbers.
The Iris Pro 560 is about as fast as an ultra-low-end graphics card. It's simply memory starved, so it might do OK on some benchmarks, but in the real world it's a ~2010 graphics card.
Sure, as a standalone graphics card for gaming it's low end.
But as an accelerator to a CPU, it seems like a phenomenally underutilized resource. It has a direct connection to RAM, a 128MB cache, and the high tier can do over a TFlop/s[1]. For a use case like the one Ben mentioned, rotating a RAW file that fits in that cache, it's almost a perfect fit, yet almost no one would think of using it, so it stays dark.
Why is this? Not too many years ago it would have been thought insane to have a TFlop of unused capability on die, and now it's a non-story. I don't think it's because we have no use for the speed-up. Rather I think that the tooling just isn't there for most programmers to be able to make use of it.
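To be fair, the hardware is at least visible to standard OpenCL; finding the integrated GPU takes only a few lines (a minimal sketch, assuming an OpenCL runtime such as Intel's is installed, linked with -lOpenCL). It's everything after this point, the kernels, the memory juggling, the tuning, where the tooling gets thin:

    #include <stdio.h>
    #include <CL/cl.h>

    int main(void)
    {
        cl_platform_id plats[8];
        cl_uint nplat = 0;
        clGetPlatformIDs(8, plats, &nplat);

        for (cl_uint p = 0; p < nplat; p++) {
            cl_device_id devs[8];
            cl_uint ndev = 0;
            if (clGetDeviceIDs(plats[p], CL_DEVICE_TYPE_GPU,
                               8, devs, &ndev) != CL_SUCCESS)
                continue;
            for (cl_uint d = 0; d < ndev; d++) {
                char name[256];
                clGetDeviceInfo(devs[d], CL_DEVICE_NAME,
                                sizeof name, name, NULL);
                printf("GPU: %s\n", name);     /* e.g. the Iris part */
            }
        }
        return 0;
    }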
Well, there are a bunch of problems:
1) If you are going to completely rewrite your code for an accelerator, you are likely going to use CUDA to target a MUCH better accelerator.
2) If you don't, you are likely going to ignore the decent Intel GPUs, because an astonishingly small number of Intel SKUs (let alone non-Intel) have a decent GPU. Try to find a regular product/laptop with an i5-5675C or i7-5775C in it, for instance.
3) CUDA was available first and seems to have the lion's share of the mindshare. OpenCL is a distant second.
4) Even if you have the right SKU, devices with that SKU are often not designed for heavy GPU use and will throttle heavily under thermal load.
I've played this game hoping for a Linux laptop with a non-crappy GPU, and I didn't want to play games with Nvidia feeding pixels through the Intel GPU to get to the screen. Very few laptops have the Iris chips. The Lenovo 721s has the Iris 540, but they locked Linux out with their broken BIOS that disables AHCI. The XPS 13 has it as an option, but you have to get the i7 and the 3200x1800 shiny/reflective screen that halves your battery life.
I did manage to buy an Iris 540 NUC; nice little machine, and the GPU seems to keep up quite well for the light usage I've tried: most web games, full-screen 1080p movies, and Minecraft.
GPUs and i7 CPUs have radically different design philosophies. The i7 has great per-core performance and giant caches, while the GPU has lots of (relatively) slow cores with tiny caches and, effectively, only SIMD instructions.
For simple, highly parallel floating-point number crunching, nearly any GPU will blow your i7 out of the water, no matter how you program the i7. Only a tiny fraction of the i7's die area is useful for that task, whereas GPUs are designed for exactly that kind of work. Conversely, any GPU will be absolutely terrible at running Microsoft Word.
Not clustering, but Apple has hinted that they're exploiting (at least when they're plugged in) the computational power of hundreds of millions of phones to do work like training image recognition algorithms.
This perspective fits into the top-down evolution narrative: Write highly specialised solutions in the most general way in software until essential, generic parts are discovered and stable enough to be translated into transistor logic. Repeat.
Or like the Bitcoin "evolution": CPU -> GPU -> FPGA -> ASIC (a very simple, single-purpose optimisation, but it illustrates part of the picture).
Only optimising transistor speed, transistor size, or the whole CPU/GPU package obviously has its limits and may be a dead end.
Dead end is far too negative. There is nothing wrong with improving general purpose computing. It's the very best thing you can do - when you can. Just be ready to adapt (by going for specialization) when you can't.
Huh, not where I thought this article was going from the title. I thought it'd be about software bloat eating into the gains given by hardware (which, in hindsight, is exactly backwards from the title).
One important aspect of Apple's position is that it had a new set of limitations other processor designers weren't dealing with: power consumption and size. Yes, there were small processors and there were processors that didn't use tons of power, but they weren't fast. Apple had a very specific set of needs and, Apple being Apple, was able to completely customize its design and supply chain.
My speculation: Intel doesn't want to threaten its core business in the PC market (which is heavily dependent on backwards compatibility), and the experience with Itanium in the HPC market probably left a bad taste in their mouth.
Again, that is how it was back in the day: CPU and supporting hardware, OS, compiler, standard libs etc. all co-engineered. Though in the case of the Amiga the CPU was part of the supporting hardware; the heart of the system was something else that farmed work out to the CPU or a co-pro as appropriate...
Intel seems to be captured in its own x86 world. There have been no big changes to the instruction set for ages; even the x64 extensions were designed by AMD. So while Intel still excels at manufacturing, the few projects for breaking out of their box failed (Itanium, Larrabee). And also: why are there no Intel chips yet that include all the nice things USB-C can offer, e.g. Thunderbolt and the newest DisplayPort?
That seems more like a vague rumour than anything, at least until someone leaks more info. (That it hasn't occurred makes me somewhat doubtful. Maybe the various undocumented instructions which have been found over the years are part of this...)
This approach was common in mainframes - you would buy a box that contained everything, but only the parts that you'd paid for would be accessible. If you wanted more, you could just apply an unlock code that you'd bought, and new features would come online, that were lying dormant all along. Even up to the level of more CPUs.
To add to the other comments, Intel has already integrated an FPGA directly into some of its Xeon server CPU packages to allow for custom logic in hardware:
One thing I find disconcerting lately is that all of Intel's recent innovations appear to be on items that are very hard to buy as a solitary developer (Knights Landing, Optane, FPGAs).
>Intel seems to be captured in its own x86 world.
Eh, no.
>even the x64 extensions were designed by AMD.
AMD (not working solo) published the spec in 2000. AMD, Intel, VIA and a few others each have a pretty different implementation of that spec; Intel's x64 extension is drastically different from AMD's implementation.
Intel has been adding new extensions with virtually every generation; just look at how many generations of virtualization support extensions we've had.
>the few projects for breaking out of their box failed (Itanium, Larrabee).
Itanium died because the industry didn't want to take a "RISC" (;)).
Larrabee isn't dead; in fact it is very much alive, just inside your CPU: the AVX extension set and the silicon that supports it are Larrabee's vector processing units.
Intel more or less had the insight to see that Larrabee wouldn't really go anywhere and that they could shrink it and implement it in every CPU within a few years.
Larrabee was interesting but it was neither here nor there. Intel went on to implement the good parts of it in their own CPUs, and for HPC/GPGPU-like computing it went a different way by releasing the Xeon Phi.
>And also: why are there no Intel chips yet that include all the nice things USB-C can offer, e.g. Thunderbolt and the newest DisplayPort?
This just shows your utter lack of understanding of the technology landscape.
I was tempted to cynically ask you about what version of Thunderbolt you have on your non-Intel machine to see if you fall for it, but alas I don't want to spread misinformation.
USB-C doesn't bring Thunderbolt nor DisplayPort to the market; USB-C can be mechanically compatible with those two.
Thunderbolt is Intel's proprietary technology and it's up to the system integrator to decide whether to implement it; don't buy cheap-ass laptops and you'll get the latest Thunderbolt.
As for DisplayPort, could you please tell me what the "latest" is? If it is 1.4, there isn't a single desktop GPU that "technically" supports it yet; even the GTX 1080 is only certified for 1.2 (in theory it can support 1.3/1.4, but the ink on the specs isn't dry yet and the certification process is, well, meh).
Thunderbolt 3 is the only current interface that is actually certified for DP 1.3; it should probably support DP 1.4, but that spec was only finalized in March this year, so...
The performance didn't suck that much, not for the RISC part.
The problem was that too many IA-64 applications relied on the x86 CPU emulation, which was, well, very shitty.
The Itanium is a VLIW, not a RISC (although they have some similarities). It executes "bundles" of 3 instructions in parallel. It was excellent in certain benchmarks because the compiler could schedule sequences to make full use of this; but it turns out that general-purpose code isn't really so parallel all the time, so the compiler would have to fill 2 or 3 of the slots in each bundle with NOPs, wasting space (and thus cache usage and fetch bandwidth) and leaving much of the CPU's execution units idle.
It was great at highly parallel benchmarks, but much slower than contemporary x86 (the P4, which wasn't that great either) for everything else.
As for the performance, I don't have enough experience to pinpoint exactly all the problems, but I think it really ended up being a software issue.
Most complaints I've encountered were about the "emulation" part with the dynamic translation libraries and instruction set emulation.
Overall it seems to me like they just tried to tackle too many things: keep some compatibility with x86, take on SPARC/PowerPC, and do HPC, mainframes and tons of other applications at the same time.
That said, this is still a mid-90s ISA for a very niche market; with all these issues it's surprising it lasted so long. I never understood why HP kept paying Intel to produce these chips in the first place.
The performance for native code wasn't bad but it was never the game-changer the early marketing pitch predicted and that was especially true compared with the competition after adjusting for price or power consumption.
I used to work with a group of scientists who wanted as much performance as they could eke out. They had a large C codebase which had already been ported to every major architecture, and it did a ton of floating point math along with a fair amount of integer work, etc.: basically the best-case scenario for IA-64. Unfortunately, even with Intel's compiler and tuning tools, the performance just never panned out: the best case was that if someone diverted years of staff time into tuning, IA-64 might take the performance crown, but that would be an expensive gamble and would probably lock us into a single server vendor (HP). It was much safer to simply take that time and money and buy x86-64 boxes which were faster and available from many vendors. (The same story repeated later with Cell: neat possibilities, but nowhere near enough benefit to justify the risk of locking into a single vendor architecture with an unclear future.)
Ignoring the architectural debates, Intel made two fatal implementation mistakes with IA-64: the most obvious one was never finding a deadline they couldn't miss but a less obvious problem was failing to take software, and particularly open-source software, seriously enough. IA-64 performance critically depended on using Intel's proprietary compilers. Since those were extremely expensive, almost nobody used them and thus they had compatibility issues with many large Unix or Windows codebases, and the focus was clearly on the optimizations needed to deliver good SPEC benchmark results rather than other features.
I don't know how much difference it would have made in the end, but I think having a solid open-source or at least free toolchain would have made it less expensive to try the platform. It's very easy to imagine that it would have gone considerably differently had something like LLVM existed back then and had Intel simply contributed a high-quality backend, not to mention running a broad shell-server program for open-source projects to use for porting and optimizing.
Itanium came out of a spec that HP initially designed, so that most likely had a lot to do with it being locked to a single vendor (I think SGI played with Itanium workstations, but my memory fails me at this point).
I'm not defending IA-64; overall it had issues, but it wasn't as bad as some people claim ("OMG INTEL CAN'T DO X64 THEY SUCK").
Intel also took quite a long time to bring Itanium CPUs to the forefront of Intel's own technology; Itanium only got DDR3 and QPI in 2010.
By the time Intel brought all the good things to Itanium it was pretty much already dead in the water and on life support. But that doesn't really bear on the viability of IA-64 as an architecture; it bears more on the viability of Itanium as a product line.
Part of the problem was that Intel spent the late 90s hyping Itanium and VLIW as the future of computing. The idea wasn't obviously wrong – the ancestor (PA-RISC) was competitive at the time if expensive – but the heavy sales-pitch set them up for failure if they couldn't deliver and left both early vendors and purchasers feeling betrayed. Failing to hit volume cascaded to the later feature failures you mentioned since they never managed to catch up to the x86 world.
As far as the architecture goes, the main conclusion I draw is that VLIW was a plausible idea but failed because it overestimated the amount of instruction-level parallelism and perhaps because it assumed too much of compilers. The GPU world has massive parallelism, enormous market volume, and the toolchains are more advanced, but even there AMD went from the VLIW-style TeraScale design to a RISC-style Graphics Core Next to compete with nVidia's RISC-y Tesla. It definitely seems reasonable to ask whether VLIW is an evolutionary dead-end.
SGI fell for the Itanium hype hook, line and sinker, and announced they would be discontinuing the MIPS line with the R12000 (an excellent CPU for its day) and migrating everything to Windows running on Itanium... But Itanium was late, MIPS had to rush the R14000 out and it wasn't so competitive, IRIX had had no serious work done on it in years, and customers were thinking, well, if we are going to have to switch architectures anyway, why don't we shop around... And people stopped buying SGIs.
The manager behind this decision, Rick Belluzzo, jumped ship and got a sweet job at Microsoft.
Everything promised about the Itanium relied on a "sufficiently smart compiler" which proved harder to write than anyone expected. It's being kept alive now solely because of legal agreements (e.g. see the recent HP vs Oracle case). Technologically it's a dead end, I don't think too many people dispute that in 2016. It's just a shame that it killed SGI.
The AMD Opteron introduced x64 while Intel was still going full steam on the P4 architecture and trying to push Itanium. Only some years later did Intel license the x64 instruction set; of course, on chip they then did their own implementation of it.
Itanium failed because, first, it was delayed several years and got very expensive and power hungry. For a while it had quite good performance, but due to the price it gained only a small foothold in the very expensive workstation market. The final nail in the coffin was that the Opteron really launched the x86-based server market: with the Opteron, companies would for the first time replace SPARC etc. in mid-range servers, and when Intel also offered x64 chips, the RISC server market started to fade.
I am not buying cheap-ass laptops. But a lot of Apple laptops are held back by having to include extra chips to support Thunderbolt, while this feature is promised to be built into the CPU in one of the next Intel generations. Had the plain MacBook had Thunderbolt via USB-C, it would be more useful.
The "AMD 64" spec was openly published in 2000, Intel did not license anything.
Intel was planning to license the AMD64 ISA directly from AMD for Yamhill; however, that did not happen and they ended up implementing the spec on their own.
Intel's x86-64 ISA is actually quite different from AMD's implementation, and funnily enough neither of them is 100% in line with the original spec.
The spec was published because it was an extension of the x86 instruction set, not a replacement; the x86 instruction set itself is available to AMD through the cross-license deal they have with Intel.
>I am not buying cheap-ass laptops. But a lot of Apple laptops are held back by having to include extra chips to support Thunderbolt, while this feature is promised to be built into the CPU in one of the next Intel generations. Had the plain MacBook had Thunderbolt via USB-C, it would be more useful.
Thunderbolt will always require additional chips (just like DisplayPort, USB, and any other port on your computer); that is how the spec works, it was never designed to be implemented on-die in the CPU.
USB-C has nothing to do with Thunderbolt and vice versa; there is a mechanical compatibility standard that allows you to use the same physical port for Thunderbolt and USB via the Type-C connector.
This isn't any different from Thunderbolt using the Mini DisplayPort mechanical standard.
Thunderbolt has multiple mechanical standards, including a dedicated TB electrical port, a fiber-optic port, the DP electrical port, and the USB-C electrical port.
Nothing Intel is doing is holding Apple back; the reason Thunderbolt 3 isn't available on Apple laptops yet is that Apple is 2-3 generations behind: the mid-2015 MacBook Pro is using a 4th-gen Haswell CPU.
It's worth considering, although I don't think an architecture requires backward compatibility by definition; that's more a way to keep users or earnings reliably. You'd be saying that Mac OS 9 wasn't an architecture since Mac OS X was different, or that the PS2 -> PS3 mismatch meant they weren't architectural designs. The Android API isn't backward compatible with GNOME or KDE on Linux. I'm just not seeing how a lack of backward compatibility means there's no architecture or platform.
If it did, though, we might resolve the issue with a standard API for various things where the implementation can be ignored (and improved with SW or HW). A modern example of an Amiga-like architecture might be Microsoft's DirectX technologies, where there are APIs for networking, I/O, audio, and video. Any of those could have been accelerated with dedicated coprocessors while apps stayed the same; at least one or two were in practice.
Likewise, the programming model might obviate the need for this consideration: one where work is broken into tasks called as needed by control code, with optional acceleration, might mean only the control code or OS interface needs to change per release or platform.
I think one of the things this article gets wrong is the software used to design ASICs (custom chips). Synopsys, Cadence, Mentor, and Magma (when it was around) all made pretty good tools. The thing is, they weren't free. Also you had to use Tcl, but they could take RTL (Verilog, the design language) to GDS (what gets fabbed). Heck, there always seemed to be some startup claiming it had come up with a better routing tool or a better timing tool. They weren't super complicated to use. If you had too many gates you had to script a bunch of hacks, but it isn't too hard.
Simple summary, you can use transistors in your chips to speed up software. That comes at the cost of flexibility of course but I think it was one of the secrets of the iPad's initial success.
That said, I think the ability to make inexpensive custom chips is going to power a wave of new hardware gizmos. Unfortunately all of that capacity is in China, so if you don't read Chinese datasheets you're probably not going to be able to use those chips. (Not that autotranslation is "bad", just that it doesn't always make technical documentation actionably understandable.)
Why is it underwhelming? What do people do with phones?
I exchange text, sound, and pictures with people. Sometimes I play stupid games. The sound/pictures/text sharing long ago hit a limit where there's a big question as to why a person needs a faster processor. Better performance per watt helps, sure. But what else? What besides bloat requires a Moore's law increase in a phone processor?
What do I want to do with my phone that I don't yet know I want?
> What besides bloat requires a Moore's law increase in a phone processor?
Here are some relatively boring answers:
AI stuff - including, among other things, voice recognition, other parts of virtual assistants, photo capture (which involves a lot of processing in modern smartphones to improve the image), and photo auto-tagging - is always happy to guzzle extra cycles to improve accuracy a few percent. Some of this can be (and is) done on the cloud instead, but at the cost of latency, availability (when reception is poor), and, of course, privacy. Much better to do it on-device where possible.
Games, even short of VR/AR (which is definitely a "I don't yet know I want" candidate), similarly often push the limits of the GPU to achieve graphical fidelity. True, the kinds of simple, repetitive, low production value games that tend to grace the top charts of mobile app stores today have little need for increased fidelity. But I think that phenomenon is more about market dynamics than inherent to the form factor. Dedicated portable consoles like the 3DS have many excellent games that would so benefit, and while some depend on more precise controls than available on mobile (i.e. buttons), many do not, e.g. RPGs, and of course smartphones have their own unique input methods. (Given that the 3DS has atrocious graphics, games clearly don't need good graphics to be fun, but they can still benefit from them, as shown by home consoles/PC gaming - anyway, larger and higher resolution screens increase the baseline level of detail required for acceptability.)
It's perhaps unsurprising that both of these examples tend to rely on the GPU and other specialized processors more than the CPU.
For this use case, phones are kinda powerful enough.
Not quite, though: applying photo filters (even just HDR+ to improve the picture) can take a lot of time. I am not sure whether Prisma's slowness is due to mobile SoCs or would be helped by a better implementation, though (Google Photos and Snapseed rely heavily on RenderScript and have been able to deliver very fast filters that way).
HDR+ is still slow though... and I don't think it is unrealistic to ask to see the resulting picture in less than one second.
So better SoCs would help there.
Also, for motion design, fluidity is very important: if you miss frames, the illusion is shattered (at 60 Hz you only have about 16.7 ms per frame). And that also matters for the average user: having things move into place helps in understanding what is going on. When you delete a mail (or get a new one), having the list of emails move into place instead of blinking from one state to the other helps the user understand the event. Same thing for screen transitions.
Android is not quite there yet (and to be fair neither is iOS; it is even surprisingly worse in many regards). Again, it is not entirely clear how much of this is due to software (not Java, but the OS debuted as a camera OS, then as a BlackBerry competitor, and only after that as a modern mobile OS), but surely better single-core performance would help, and Apple wrecks Android in that specific area.
> What do I want to do with my phone that I don't yet know I want?
Untethered VR and AR. Rendering views in stereo for an extremely pixel-dense screen (>4K) at the highest frequency possible (>90Hz) while processing camera feeds and depth sensors for positional tracking (>90Hz).
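Back-of-the-envelope, taking a single 4K panel at 90 Hz as the floor:

    3840 x 2160 pixels x 90 Hz ~ 750 million shaded pixels per second

and that's before the per-eye geometry passes, the camera feeds, and the tracking math, all on a battery with no fan.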
We're kind of done with this now though, right? I can pick a mobile device with anything from a tiny screen of a few inches to a monster the size of a largish book. I can get pixel density finer than my eyes are able to detect. There's not much room left for improvement.
And likewise not much else has improved since iOS/Android were released. What I mean by "improved" is real differences in the kinds of things you can do.
Windows 95 is 21 years old; my first computer ran it. It's been over two decades and I don't do anything with a shiny new MacBook Pro that I didn't do with Windows 95. I browse the web, read news and comments, watch videos, exchange text with people. Sure, the video resolution is better, but that has a lot more to do with the network than with the computer hardware. The UI is a little shinier, but I'm not enabled to do anything new that I couldn't do before.
If I don't buy a new computer every few years, what I have will be nearly worthless because software has a funny habit of finding ways to spend all available resources, even if it does most of the same things. I just don't think I would be that disappointed if I was still using Win 95.
> With greater capability comes more, even if we don't yet know what "more" will be.
If you discount network speed and pixel density, "more" hasn't amounted to much in over 20 years as far as I can see. Miniaturization means I can carry a laptop around with me, OK.
Likewise with phones. Android and iOS are about 8 years old now. What can we do that we couldn't then? _not much_ as far as I can see.
We're definitely past the point of diminishing returns when it comes to resolution, and near or already at the point where your eye physically couldn't tell if the pixel density were higher. So what's left?
Mobile devices are ideal candidates to be thin clients, because they are battery powered.
If most instances of greater capability can be implemented as a thin client, then it's not CPU performance that is the bottleneck, but mobile ISPs and aged infrastructure.
Yea, my $100 low-budget phone is fast enough. I can use it as satnav, watch 1080p videos, browse the internet, and so on. I can even play some 3D games and things like that.
I'd like the battery to last longer. 14 days - like my dumb phone. That would be a meaningful improvement.
How bad is debugging logic running purely on hardware nowadays? Is there anything more user-friendly than tapping specific lines and watching the output on an oscilloscope?
Usually you start at the Verilog level with simulators (I've done this professionally with zero hardware experience and discovered many critical bugs in a chip that's now taped out; of course the real engineers had to fix them). Once the chip goes physical, after a series of very low-level tests, they take similar software, adapt it to the hardware interface, and repeat all the tests.
Yes, you can debug on an FPGA first, for example, allowing you to put the hardware-equivalent of a printf statement in places where you need them.
Or, you can configure all of your flipflops into a barrel shift register, so you can read out their contents serially (and shift their contents back in circularly).
You're correct; honestly it seemed like Sony was expecting ALL the graphics work to be done on the SPUs. The whole design of the Cell Broadband Engine is basically one SMT PowerPC core (the PPU) and a bunch of stream processors (the SPUs). Given their choice of an extremely weak GPU compared to the Xenos in the 360, a lot of games had to move a bunch of work off the GPU and onto the SPUs instead (which were nowhere near as nice to work with: they couldn't just pull data from system memory without being handed a pointer translated to an address they could DMA from, and you couldn't just throw GL or RSX commands at them, so you were stuck writing all the nitty-gritty yourself).
Possibly because developing games for the weird PS3 Cell architecture turned out to be so complicated, particularly for games that were intended to be cross-platform with the x86 Xbox and PC platforms.
TL;DR: Moore's law can no longer be counted on for performance gains, so speeding things up will now depend on replacing general-purpose hardware with hardware specifically designed to implement specific algorithms.
Of course, work on custom chips has been going on for as long as chips have been a thing. The article just underlines that this is now pretty much the only way forward.
That, and optimizing the software we use. There is so much bloatware, so much technical debt collected through incremental improvements over decades, that we could get a decent speed boost just from optimizing it.
>Aren't optical transistors another option though?
It will take a minimum of a couple more decades of research and hundreds of billions of dollars of retooling before any of the potential replacements for silicon surpass what exists today at the current price and volume of mass production.
Optical switches are extremely large compared to contemporary transistors, and there's no obvious way to scale them smaller than the diffraction limit.
Like all research into the unknown, it might prove easy (5-10 years), it might prove hard (20 years), it might always be 50 years away (hehe, fusion), or it might just not be possible (faster-than-light travel).