
I am sorry but saying that RISC-V is a winner in code density is beyond ridiculous.

I am familiar with many tens of instruction sets, from the first vacuum-tube computers to all the important instruction sets that are still in use, and there is no doubt that RISC-V requires more instructions and a larger code size than almost all of them for any given task.

Even the hard-to-believe "research" results published by RISC-V developers have always shown worse code density than ARM; the so-called better results were for the compressed extension, not for the normal encoding.

Moreover, the results for RISC-V are hugely influenced by the programming language and the compiler options that are chosen. RISC-V has an acceptable code size only for unsafe code: if the programming language or the compiler options require run-time checks to ensure safe behavior, the RISC-V code size increases enormously, while for other CPUs it barely changes.

The RISC-V ISA has only 1 good feature for code size, the combined compare-and-branch instructions. Because there typically is 1 branch for every 6 to 8 instructions, using 1 instruction instead of 2 saves a lot.

Except for this good feature, the rest of the ISA is full of bad features which frequently require at least 2 instructions where any other CPU needs 1, e.g. the lack of indexed addressing, which is needed to implement, with a minimum number of instructions, any loop that must access some aggregate data structure.
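
For illustration (exact instruction sequences depend on the compiler and options; the snippet and the assembly in the comment are only a sketch of the pattern, not output from any particular toolchain):

    /* On an ISA with a reg+reg (indexed) addressing mode the load in the
       body below can be a single instruction, while base RV32I/RV64I must
       form the address separately, roughly:
           slli t0, a1, 2        # scale the index
           add  t0, a0, t0       # base + scaled index
           lw   t2, 0(t0)        # the actual load
       versus something like a single "ldr w2, [x0, x1, lsl #2]" on AArch64.
       A compiler may strength-reduce this to a bumped pointer, but the
       general pattern is what the complaint above is about. */
    long sum(const int *a, long n) {
        long s = 0;
        for (long i = 0; i < n; i++)
            s += a[i];
        return s;
    }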




You seem to be basing your whole argument on some facts which you got wrong. The central points of your argument are often used in FUD, so they are definitely worth tackling here.

>Even the hard-to-believe "research" results published by RISC-V developers have always shown worse code density than ARM

The code size advantage of RISC-V is not artificial academic bullshit. It is real, it is huge, and it is trivial to verify. Just build any non-trivial application from source with a common compiler (such as GCC or LLVM's clang) and compare the sizes you get. Or look at the sizes of binaries in Linux distributions.

>the so-called better results were for the compressed extension, not for the normal encoding.

The C extension can be used anywhere, as long as the CPU supports the extension; most RISC-V profiles require it. This is in stark contrast with ARMv7's thumb, which was a literal separate CPU mode. Effort was put into making this very cheap for the decoder.

The common patterns where the number of instructions is larger are made irrelevant by fusion. RISC-V has been thoroughly designed with fusion in mind, and is unique in this regard. It is within its rights in calling itself the 5th generation RISC ISA because of this, even if everything else is ignored.

Fusion will turn most of these "2 instructions instead of one" cases into what is actually one instruction from the execution unit's perspective. There are opportunities for fusion everywhere; the patterns are designed in. The cost of fusion on RISC-V is also very low, often quoted as 400 gates, allowing even simpler microarchitectures to implement it.
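
For example, one commonly cited fusible pair is the address-add-plus-load idiom; a hedged sketch (which pairs a given core actually fuses is up to the microarchitecture):

    /* An access like this typically lowers on base RISC-V to an address add
       followed by a load, roughly:
           add t0, a0, a1        # base + index
           lbu t1, 0(t0)         # load the byte
       A fusing core can spot the pair (the add's result is consumed only by
       the immediately following load) and issue it as one internal
       indexed-load operation, so two architectural instructions take one
       slot in the execution units. */
    unsigned char byte_at(const unsigned char *base, long index) {
        return base[index];
    }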


I was surprised to find the top gOggle hits for "RISC-V fusion" (because I don't know WTF it even is) point to HN threads. Is this not discussed prominently elsewhere on the 'net?

https://news.ycombinator.com/item?id=25554865

https://news.ycombinator.com/item?id=25554779

Is the Googrilla search engine really starting to suck more and more, or is there something else going on in this case?

The threads read more like an incomplete explanation with a polarized view than anything useful for understanding what fusion means in this context.

Overall I give the ranking a score of D-.


> ... is there something else going on in this case?

There is: the term comes from general CPU design terminology and is not specific to RISC-V, although Google does find some RISC-V-specific materials for me given your query[1–3]. Look for micro- and macro-op fusion in Agner Fog’s manuals[4] or Wikichip[5], for example.

[1]: https://riscv.org/wp-content/uploads/2016/07/Tue1130celio-fu...

[2]: https://reviews.llvm.org/D73643

[3]: https://erik-engheim.medium.com/the-genius-of-risc-v-micropr...

[4]: https://www.agner.org/optimize/#manuals

[5]: https://en.wikichip.org/wiki/macro-operation_fusion


Try "RISC-V instruction fusion" or "RISC-V macro op fusion". Not that hard really. It is a well-known subject.

I wish people would stop whining every time they mess up a search query.


>I wish people would stop whining every time they mess up a search query.

I don't know. I have never seen anyone using the term "fusion" by itself; maybe it is specific to the RISC-V crowd? It is always "macro-op fusion". So your parent's search query isn't out of order for someone who knows very little about hardware. And HN is full of web developers so far up the abstraction hierarchy that they know practically zero about hardware.

And to be quite honest, the GP's point about fusion had me confused for a sec as well.


> This is in stark contrast with ARMv7's thumb, which was a literal separate CPU mode.

This is disingenuous. arm32's Thumb-2 (which has been around since 2003) supports both 16-bit and 32-bit instructions in a single mode, making it directly comparable to RV32C.


Your statement does not run counter to the one of mine you quoted.

Thumb-2 is better designed than Thumb was, but it is still a separate CPU mode.

And it got far less use than it deserved, because of this. It doesn't do everything, and switching has a significant cost. This cost is in contrast with RISC-V's C extension.


Comparing RISC-V's "C" extension to classic Thumb when Thumb-2 is 17 years old is like comparing RISC-V's "V" extension to classic SSE when AVX-512 and SVE2 are already available. It's an insidious form of straw-man attack that preys on the reader's ignorance.

> [Thumb-2] doesn't do everything, and switching has a significant cost.

Technically true, but irrelevant. Cortex-M is thumb-only and can't switch. Cortex-A processors that support both Thumb and ARM instructions almost never actually switch at all.


> Cortex-A processors that support both Thumb and ARM instructions almost never actually switch at all.

That is not correct. At least before ARMv8, most processors that could run both Thumb and ARM switched very frequently, up to the point that some libraries could be Thumb while others were ARM (i.e. within the same task!). A lot (but not all) of Android for ARMv7 is actually Thumb(-2). This is why "interworking" is such a hot topic.

Also, contrary to what the above poster says, switching does not have a "high cost"; it is rather similar to the cost of a function call.


> it is rather similar to the cost of a function call.

It literally is a function call, most of the time.

And yeah, thumb-2 was the preferred encoding for 32b iOS and Android, and the only supported encoding for Windows phone, so it was used on billions of devices.


The ARMv8-M profile is Thumb-only, so on ARM microcontroller platforms there is no switching at all, and it does do everything, or at least everything you might want to do on a microcontroller, and has of course gotten a very large amount of use, considering how widely deployed those cores are.


Is thumb-only particularly good for density, compared to being able to mix instruction sizes?


Thumb has both 16-bit and 32-bit instructions.


Oh, you meant thumb and thumb-2.


"thumb-2" isn't really a thing. It's just an informal name from when more instructions were being added to thumb. it's still just thumb.


Thumb2 is a thing. Thumb is purely 16 bit instructions. Thumb2 is a mix of 16 bit and 32 bit instructions.

As an illustrative example, in Thumb when the programmer writes "BL <offset>" or "BX <offset>" the assembler creates two undocumented 16 bit instructions next to each other which together have the desired effect. If you create those instructions yourself using e.g. .half directives (or if you're writing a JIT or compiler) then you can actually put other instructions between them, as long as you don't disturb the link register.

In Thumb2 the bit patterns for BL and BX are the same, but they are an ACTUAL 32 bit instruction which can't be split up like it can in Thumb.
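
A hypothetical emitter sketch of the Thumb-1 behaviour described above (the function name and buffer handling are made up; the bit layout follows the classic ARMv4T "long branch with link" format and is worth double-checking against the ARM ARM):

    #include <stdint.h>

    /* On original Thumb these are two independent 16-bit instructions, so
       other code can in principle sit between them as long as LR is left
       alone; on Thumb-2 the same bit pattern is one indivisible 32-bit BL. */
    static void emit_thumb1_bl(uint16_t *out, int32_t byte_offset) {
        out[0] = (uint16_t)(0xF000 | ((byte_offset >> 12) & 0x7FF)); /* prefix: high part of offset into LR */
        out[1] = (uint16_t)(0xF800 | ((byte_offset >> 1)  & 0x7FF)); /* suffix: low part, performs the call */
    }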


The main distinction is that the 16-bit RISC-V C instructions map exactly onto existing 32-bit RISC-V instructions; the implementation only touches the decode pipe stage.


The C extension is just that, an extension. A RISC-V core with the C extension should still support the long encoding as well. There is no 16-bit variant specified, only 32, 64 and 128.

There is an E version of the ISA with a reduced register set, but this is a separate thing.


You are mixing up integer register size and instruction length.

RISC-V has variants with 32 bit, 64 bit, or (not yet fully specified or implemented) 128 bit registers.

RISC-V has instructions of 32-bit length and, optionally but almost universally, 16-bit length.


Ah yes, I misunderstood the original comment as implying that RISC-V C had 16 bit register length, rather than opcode length.


The E version only has half as many registers


So there really are more instructions in RAM, right?

And then they get combined in the CPU, right?

Won't those instructions need to be fetched / occupy cache?


> the so-called better results were for the compressed extension, not for the normal encoding.

Ignoring RISC-V’s compressed encoding seems a rather artificial restriction.


Indeed.

The "C" extension is technically optional, but I'm not aware of anyone who has made or sold a production chip without it -- generally only student projects or tiny cores for FPGAs running very simple programs don't have it.

My estimate is that if you have even 200 to 300 instructions in your code, it's cheaper to implement "C" than to build the extra SRAM/cache to hold the bigger code without it.
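
A rough back-of-the-envelope behind that estimate, using the ~25-30% size reduction usually cited for the C extension and the ~400-gate decompressor figure quoted elsewhere in this thread (ballpark numbers only):

    300 instructions x 4 bytes x 0.25..0.30  ≈ 300..360 bytes  ≈ 2400..2900 bits of SRAM
    vs. a decompressor of roughly 400 gates; at ~6 transistors per SRAM bit and a
    handful per gate, the break-even point lands in the low hundreds of instructions.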


The compressed encoding has good code density, but low speed.

The compressed RISC-V encoding must be compared with the ARMv8-M encoding, not with the ARMv8-A.

The base 32-bit RISC-V encoding may be compared with the ARMv8-A, because only it can have comparable performance.

All the comparisons where RISC-V has better code density compare the compressed encoding with the 32-bit ARMv8-A. This is a classical example of apples to oranges, because the compressed encoding will never have performance in the same league as ARMv8-A.

When the comparisons are matched, 16-bit RISC-V encoding with 16-bit ARMv8-M and 32-bit RISC-V with 32-bit ARMv8-A, RISC-V loses in code density in both, because only the RISC-V branch instructions are frequently shorter than those of ARM, while all the other instructions are frequently longer.

There are good reasons to use RISC-V for various purposes where either the lack of royalties or the easy customization of the instruction set is important, but claiming that it should be chosen not because it is cheaper but because it is better looks like the story of the sour grapes.

The value of RISC-V is not in its instruction set, because there are thousands of people who could design better ISAs in a week of work.

What is valuable about RISC-V is the set of software tools, compilers, binutils, debuggers etc. While a better ISA can be done in a week, recreating the complete software environment would need years of work.


> The compressed encoding has good code density, but low speed.

That's 100% nonsense. They have the same performance, and in fact some pipelines can get better performance, because they fetch a fixed number of bytes and with compressed instructions that means more instructions fetched.

The rest of the argument falls apart resting on this fallacy.


They have the same performance only in low performance CPUs intended for embedded applications.

If you want to use a RISC-V CPU at a performance level good enough for something like a mobile phone or a personal computer, you need to simultaneously decode at least 8 instructions per clock cycle, and preferably many more, because to match 8 instructions of other CPUs you need at least 10 to 12 RISC-V instructions and sometimes many more.

Nobody has succeeded in simultaneously decoding a significant number of compressed RISC-V instructions, and it is unlikely that anyone will attempt this, because the cost in area and power of a decoder able to do this is much larger than the cost of a decoder for simultaneous decoding of fixed-length instructions.

This is also the reason why ARM uses a compressed encoding in its -M CPUs for embedded applications, but a 32-bit fixed-length encoding in its -A CPUs, for applications where more than 1 watt per core is available and high performance is needed.


You're just making stuff up.

ARM doesn't have any cores that do 8 wide decode. Neither do Intel or AMD. Apple has, but Apple is not ARM and doesn't share their designs with ARM or ARM customers.

Cortex-X1 and X2 have 5-wide decode. Cortex-A78 and Neoverse N1 have 4-wide decode.

ARM uses compressed encoding in their 32 bit A-series CPUs, for example the Cortex A7, A15 and so on. The A15 is pretty fast, running at up to 2.5 GHz. It was used in phones such as the Galaxy S4 and Note 3 back before 64 bit became a selling point.

Several organisations are making wide RISC-V implementations. Most of them aren't disclosing what they are doing, but one has actually published details of how its 4-8 wide RISC-V decoder works -- they decode 16 bytes of code at a time, which is 4 instructions if they are all 32-bit instructions, 8 instructions if they are all 16-bit instructions, and somewhere in between for a mix.

https://github.com/MoonbaseOtago/vroom

Everything is there, in the open, including the GPL licensed SystemVerilog source code. It's not complex. The decode scheme is modular and extensible to as wide as you want, with no increase in complexity, just slightly longer latency.

There are practical limits to how wide is useful, not because you can't build it, but because most code has a branch every 5 or 6 instructions on average. You can build a 20-wide machine if you want -- it just won't be any faster, because it doesn't fit most of the code you'll be executing.


Doesn't decompression imply that there is some extra latency?


No, it is just part of the regular instruction decoding. It is not like it is zip compressed. It is just 400 logic gates added to the decoder… which is nothing.


All the implementations I know of do the same thing: they expand the compressed instruction into a non-compressed instruction. For all (most?), this requires an additional stage in the decoder. So in that sense supporting C means a slight increase in the branch mispredict penalty, but the instruction itself takes the same path with the same latency regardless of whether it is compressed or not.

Completely aside, compressed instructions hurt in a completely different way: as specified, RISC-V happily allows instructions to be split across two cache lines, which could even be from two different pages. THIS is a royal pain in the ass and rules out certain implementation tricks. Also, the variable-length instructions mean more stages before you can act on the stream, including, for example, renaming. However, a key point here is that it isn't a per-instruction penalty; it is a penalty paid for all instructions if the pipeline supports any variable-length instructions.


Yes, but the logic signal needs to ripple through those gates, which takes time.


It potentially adds latency, but doesn't drop throughput.


> The RISC-V ISA has only 1 good feature for code size, the combined compare-and-branch instructions. Because there typically is 1 branch for every 6 to 8 instructions, using 1 instruction instead of 2 saves a lot.

Which isn't really a big advantage, because ARM and x86 macro-op fuse those instructions together. (That is, those 2 instructions are decoded and executed as 1 macro-op in practice.)

cmp /jnz on x86 is like, 4-bytes as well. So 4-bytes on x86 vs 4-bytes on RISC-V. 1-macro-op on x86 vs 1-instruction on RISC-V.

So they're equal in practice.

-----

ARM is 8-bytes, but macro-op decoded. So 1-macro op on ARM but 8-bytes used up.


The fusion influences only the speed, not the code size and the discussion was about the code size.

For x86, cmp/jnz must be 5 bytes for short loops or 9 bytes for long loops, because the REX prefix is normally needed. x86 does not have address modes with auto-update, like ARM or POWER, so for a minimum number of instructions the loop counter must also be used as an index register, to eliminate the instructions for updating the indices.

Because of that, the loop counter must use the full 64-bit register even if it is certain that the loop count would fit in 32-bit. That needs the REX prefix, so the fused instruction pair needs either 5 bytes (for 7-bit branch offsets) or 9 bytes, in comparison with 4 bytes for RISC-V.
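
For concreteness, a rough tally of the encodings being compared (standard x86-64 encoding rules, worth double-checking with an assembler; exact bytes vary with register choice and branch distance):

    cmp ecx, edx  (2 bytes, no REX)  + jnz rel8   (2 bytes) = 4 bytes
    cmp rcx, rdx  (3 bytes, REX.W)   + jnz rel8   (2 bytes) = 5 bytes
    cmp rcx, rdx  (3 bytes, REX.W)   + jnz rel32  (6 bytes) = 9 bytes
    bne a0, a1, label = 4 bytes on RISC-V (only beqz/bnez against zero
                        have 2-byte compressed forms)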

So RISC-V gains about 1 byte for every 20 bytes from the branch instructions, i.e. about 5%, but then it loses more than this on other instructions, so it ends up with a code size larger than Intel/AMD by between 10% and 50%.


By combining instruction compression and macro-op fusion you get the net effect of looking like you have a bunch of extra higher level opcodes in your ISA.

Compress a shift and a load into a 32-bit word, macro-op fuse those, and you have in effect an indexed load instruction, without sucking up ISA encoding space for it.


I agree that this is about the code size, but you seem to be doing your back-of-the-envelope estimates based on RISC-V uncompressed instructions, which is a mistake and explains why your estimates came out with nonsense results like "code size larger than Intel/AMD".


> For x86, cmp/jnz must be 5 bytes for short loops or 9 bytes for long loops, because the REX prefix is normally needed.

It's a shame the x32 architecture didn't catch on. https://en.wikipedia.org/wiki/X32_ABI


ARM64 has cbz/tbz compare-and-branch instructions that cover many common cases in a single 4-byte instruction as well.


There are cases when cbz/tbz are very useful, but for loops they do not help at all.

All the ARMv8 loops need 2 instructions, i.e. 8 bytes, instead of the single compare-and-branch of RISC-V.

There are 2 ways to do simple loops in ARM: you can either use an addition that sets the flags, then a conditional branch, or you can use an addition that does not set the flags, then a CBNZ (which tests whether the loop counter is zero). Both ways need a pair of instructions.

Nevertheless, ARM has an unused opcode space equal in size to the space used by CBNZ/CBZ/TBNZ/TBZ (bits 29 to 31 equal to 3 or 7 instead of 1 or 5).

In that unused opcode space, 4 pairs of compare-and-branch instructions could be encoded (3 pairs corresponding to those of RISC-V plus 1 pair of test-under-mask, corresponding to the TEST instruction of x86; each pair being for a condition and its negation).

All 4 pairs of compare-and-branch would have 14-bit offsets, like TBZ/TBNZ, i.e. a range larger than that of the RISC-V branches.

This addition to the ARM ISA would decrease the code size by 4 bytes for each 25 to 30 bytes, so a 10% to 15% improvement.
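
A rough check of that arithmetic, taking the 1-branch-per-6-to-8-instructions figure from earlier in the thread:

    1 branch per 6..8 instructions  ≈ 1 per 24..32 bytes of A64 code
    4 bytes saved per branch        → 4/32 .. 4/24  ≈ 12% .. 17%

which is broadly consistent with the 10% to 15% figure above.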


> I am sorry but saying that RISC-V is a winner in code density is beyond ridiculous.

You have no idea what you're talking about. I've worked on designs with both ARM and RISC-V cores. The RISC-V core outperforms the ARM core, with a smaller gate count, and has similar or higher code density in real-world code, depending on the extensions supported. The only way you get much lower code density is without the C extension, but I haven't seen it left out of a real-world commercial core, and where it was, I'm sure that was because of some benefit (FPGAs sometimes use ultra-simple cores for some tasks, and don't always care about instruction throughput or density).

It should be said that my experience is in embedded, so yes, it's unsafe code. But the embedded use case is also the most mature. I wouldn't be surprised if extensions that help with safer programming languages were added for desktop/server-class CPUs, if they haven't been already (I haven't followed the development of the spec that closely recently).


>> RISC-V has an acceptable code size only for unsafe code

> You have no idea what you're talking about.

> It should be said that my experience is in embedded, so yes, it's unsafe code.

Just going based off your reply it certainly sounds like they had at least some idea what they were talking about? In which case omitting that sentence would probably help.


Textbook example of the kind of hostility and close-mindedness that is creeping into our beloved site. Why are we dick-measuring? Why are we comparing experience like this? So much "I", "I", "I"...

I have no horse in the technical race here, but I certainly am put off from reading what should be an intellectually stimulating discussion by the nature of replies like this.


It was likely instigated by its parent trying to inflate themselves by citing some credentials, to try and give their voice more weight.

All of it is pretty sad, but I believe we should focus on the technical arguments and try to put everything else aside in order to steer the discussion somewhere more useful.


Somehow I find it refreshing to see flamewars about ISAs, for I hadn't seen any in the last 15 years (at least). It makes me feel young again. :-)


Oh no. We don’t maintain this site with these types of comments no matter your feelings. It’s the internet. Don’t get heated!


> Be kind. Don't be snarky. Have curious conversation; don't cross-examine. Please don't fulminate. Please don't sneer, including at the rest of the community.

> Comments should get more thoughtful and substantive, not less, as a topic gets more divisive.

> When disagreeing, please reply to the argument instead of calling names. "That is idiotic; 1 + 1 is 2, not 3" can be shortened to "1 + 1 is 2, not 3."

https://news.ycombinator.com/newsguidelines.html


Memory-safe programming does not need any special ISA extension compared to traditional, C-like unsafe code. Even the practical overhead of bounds- and overflow checking is all about how it impedes the applicability of optimizations, not about the checks themselves.


It doesn't need them, but having hardware checks is generally more performant than having software ones.


RISC-V doesn't hinder safe code; that was an incorrect claim. Bounds checks are done with one instruction - bltu for slices, bgeu for indexes. On an Intel processor you need a cmp+jb pair for this.
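
An illustrative sketch (not from the linked message): a bounds check as a safe language's compiler might emit it. On RISC-V the guard below can lower to a single compare-and-branch, roughly "bgeu a2, a1, .trap" (i >= len, unsigned), while on x86-64 the same guard is typically a cmp plus a conditional jump. Exact lowering depends on the compiler and options.

    #include <stddef.h>

    int checked_get(const int *a, size_t len, size_t i) {
        if (i >= len)             /* the bounds check */
            __builtin_trap();     /* GCC/Clang builtin, standing in for the runtime's error path */
        return a[i];
    }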

The linked message is about the carry propagation pattern used in gmp. As I understand it, optimized bignum algorithms accumulate carry bits and propagate them in bulk, and don't benefit from an optimal one-bit-at-a-time carry propagation pattern.


> Except for this good feature, the rest of the ISA is full of bad features

What are your thoughts on the way RISC-V handled the compressed instruction subset?


It's not too surprising. Load, store, move, add, subtract, shift, branch, jump. These are definitely the most common instructions used.

Put it side-by-side with Thumb and it also looks pretty similar (thumb has a multiply instruction IIRC).

Put it side-by-side with short x86 instructions accounting for the outdated ones and the list is pretty similar (down to having 8 registers).

All in all, when old and new instruction sets are taking the same approach, you can be reasonably sure it's not the absolute worst choice.


It was more a question of the way it was handled (i.e. it's not a different mode and can be mixed) than what the opcode list looked like.


Mode switching bloats the instruction count by shifting in and out. RISC-V does well here.

If there's a criticism, it's that reserving encoding space for the two-byte instructions means the range available to 32-bit instructions is MUCH smaller overall, until you switch to 48-bit instructions, which are then much bigger.


It only addresses a subset of the available registers. Small revisions in a function which change the number of live variables will suddenly and dramatically change the compressibility of the instructions.

Higher-level languages rely heavily on inlining to reduce their abstraction penalty. Profiles which were taken from the Linux kernel and (checks notes...) Dhrystone are not representative of code from higher-level languages.

3/4 of the available prefix instruction space was consumed by the 16-bit extension. There have been a couple of proposals showing that even better density could be achieved using only 1/2 the space instead of 3/4, but they were struck down in order to maintain backwards compatibility.


This is just rubbish.

Small revisions to a function that increase the number of live variables to more than the set covered by the C extension mean that references to THAT VARIABLE ONLY have to use a full-size instruction. There is nothing sudden or dramatic.

Note that a number of C instructions can in fact use all 32 registers. This includes stack-pointer-relative loads and stores, load immediate ({-32..+31}), load upper immediate (4096 * {-32..+31}), add immediate and add immediate word ({-32..+31}), shift left logical immediate, register-to-register add, and register move.

It's certainly possible that another compressed encoding might do better using fewer opcodes, and I've seen the suggestions. The main thing wrong with the standard one in my opinion is that it gives too much prominence to floating point code, having been developed to optimise for SPEC including SPECFP (no, not the Linux kernel or Dhrystone ... I have no idea where you got that from).

But anyway it does well, and the opcode space used is not excessive. If anything it's TOO SMALL. Thumb2 gets marginally better code size while using 7/8ths of the opcode space for the 16 bit instructions instead of RISC-V's 3/4.


> The main thing wrong with the standard one in my opinion is that it gives too much prominence to floating point code, having been developed to optimise for SPEC including SPECFP (no, not the Linux kernel or Dhrystone ... I have no idea where you got that from).

The RISC-V Compressed Spec v1.9 documented the benchmarks which were used for the optimization. RV32 was optimized with Dhrystone, Coremark, and SPEC-2006. RV64GC was optimized using SPEC-2006 and the Linux kernel.


> It's certainly possible that another compressed encoding might do better using fewer opcodes, and I've seen the suggestions. The main thing wrong with the standard one in my opinion is that it gives too much prominence to floating point code, having been developed to optimise for SPEC including SPECFP (no, not the Linux kernel or Dhrystone ... I have no idea where you got that from).

JavaScript is limited to 64-bit floats, as are Lua and a couple of other languages.

Sure, you can optimize to 31/32-bit ints, but not always and not before the JIT warms up.


The compressed instruction encoding is very good and it is mandatory for any use of RISC-V in embedded computers.

With this extension, RISC-V can be competitive with ARM Cortex-M.

On the other hand, the compressed instruction encoding is useless for general-purpose computers intended as personal computers or as servers, because it limits the achievable performance to much lower levels than for ARMv8-A or Intel/AMD.


This is wrong. RISC-V's instruction length encoding is designed in such a way that compressed instructions can be decoded very quickly. It doesn't pose the same kind of performance problem amd64's variable instruction length encoding does. Even if it did, it would be obvious nonsense to claim that this made it impossible for a RISC-V implementation to achieve amd64-like performance; if it were true, it would also make it impossible for amd64 implementations to achieve amd64-like performance, which is clearly a contradiction. And ILP and instruction decode speed are also improved by other aspects of the RISC-V ISA: the combined compare-and-branch instructions you mentioned, but also the MIPS-like absence of status flags.


This is of course utter nonsense. There's nothing different about the performance of compressed instructions.


For competitive performance in 2021 with CPUs that can be used at performance levels at least as high as those required for mobile phones, it is necessary to decode simultaneously at least 8 instructions per clock cycle (actually more for RISC-V, because its instructions do less than those of other CPUs).

The cost in area and power of a decoder for variable-length instructions increases faster with the number of simultaneously-decoded instructions than the cost of a decoder for fixed-length instructions.

This makes the compressed instruction encoding incompatible with high-performance RISC V CPUs.

For the lower performance required in microcontrollers, the compressed encoding is certainly needed for adequate code density.

The goals of minimum code size and of maximum execution speed are contradictory and the right compromise is different for an embedded computer and for a personal computer.

That is why ARM has different ISAs for the 2 domains and why also RISC-V designs must use different sets of extensions, depending on the application intended for them.


RISC-V compressed instructions cannot be compared to CISC variable-length instructions. The instruction boundaries are easy to determine in parallel for multiple decoders, something which is hard for e.g. x86. Compressed instructions don't have arbitrary lengths: two of them fit in one 32-bit word.
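
A minimal sketch of why the boundaries are cheap to find (this follows the length-encoding scheme in the RISC-V spec; the 48/64-bit formats it also reserves aren't used by the standard extensions):

    #include <stdint.h>

    /* The low two bits of the first 16-bit parcel distinguish compressed
       from 32-bit instructions, so every parcel in a fetch group can be
       classified independently and the boundaries resolved with a short
       parallel scan. */
    static int rv_insn_length(uint16_t first_parcel) {
        return (first_parcel & 0x3) == 0x3 ? 4 : 2;  /* 0b11 => 32-bit, else 16-bit */
    }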

Decompression is part of the instruction decoding itself. It only requires a minuscule 400 logic gates to do.

In fact RISC-V is very well designed for doing out-of-order execution of multiple instructions as instructions have been specifically designed to share as little state as possible. No status registers or conditional execution bits. Thus most instructions can run in separate pipelines without influencing each other.


8-wide is the absolute state of the art. Last I checked, AMD’s fastest core was 4-wide, and Intel’s was 6-wide. I only know of Apple doing 8-wide, and not anyone else. So branding this as the minimum necessary for mobile devices when even most desktops do not achieve it is silly.



