J2 open processor: an open source processor using the SuperH ISA (j-core.org)
101 points by rdpintqogeogsaa on April 19, 2021 | 45 comments



The predecessor to the SuperH is the H8 family of devices, which notably included the H8/3292, the microcontroller in the Lego Mindstorms RCX. Obviously an irrelevant implementation detail to 99% of users, but deeply important if you were one of those weirdos who compiled BrickOS for your system so that your Lego robot could run a real C program, and even more so if you needed to rewrite parts of your program in native assembly language to make it run fast enough to be usable.

(This was me, in my bedroom at age 16, furiously preparing a robot capable of playing Connect-4 for an rtlToronto Lego robotics competition called Deep Yellow...)


The H8 or H8S is also used in many ThinkPads as the embedded controller:

https://www.thinkwiki.org/wiki/Embedded_Controller_Chips


>The rest of this page explains how to compile and install a "bitstream" file to implement this processor in a cheap (about $50) FPGA board, then how to build Linux for that board and boot it to a shell prompt.

>Numato: The cheapest usable FPGA development board ($50 US) the j2 build system currently targets is the Numato Mimas v2 (also available on Amazon). It contains a Xilinx "Spartan 6" LX9 FPGA that can run a J2 at 50 MHz, 64 megs of SDRAM, USB2 mini-B, and a micro-sd card slot.

PDS: Nice!

But it would be a serious additional "would be nice"(!) if this could run on Lattice FPGAs with the IceStorm open source toolchain:

https://www.latticesemi.com/Products

http://www.clifford.at/icestorm/

https://github.com/YosysHQ/icestorm


You might be interested in https://github.com/j-core/jcore-j1-ghdl


I think https://github.com/j-core/j-core-ice40 is the better repo to check out.


actually yes.


Previously on HN:

https://news.ycombinator.com/item?id=12105913

https://news.ycombinator.com/item?id=20658584

Is the project still alive? There's been no news in over four years.


Yes, the first Turtle Boards have been delivered and kernel support on SH has been improving here and there.

I'm Debian's sh4 maintainer and I have done quite a lot to keep the port alive and improve it.


> I'm Debian's sh4 maintainer and I have done quite a lot to keep the port alive and improve it.

Neat! What's your reason for doing this? Just a fan of retro architectures, or something else?


Just for fun and for educational purposes.

I'm maintaining most unofficial architectures in Debian that are part of "Debian Ports".


You should do an AMA.

What does that entail?

What skillset does one need?

How do you connect with your users? The ones with the weird hardware?


Nice! Are there still plans for J3/J4?


The J-Core people are planning to release a J4 CPU, but I don’t know when that will happen.

I just received my Turtle Board with a J2 and will give it a go to see how far I get with it.


I'm in contact with the company which open sourced the J-Core family of processors. They have much in the pipeline that they want to open source. However, they just haven't found the time to open source the next batch, since they need to clean it up before release and don't want to just occasionally dump a tarball, but rather provide a full versioned history of their repository.


The mailing list[0] has been trucking along. Not with a lot of force, but at least one post per month.

[0] https://lists.j-core.org/pipermail/j-core/


There's some activity in the FPGA retrogaming emulation community around the Sega Saturn and other consoles that use the SH2, along with FPGA reimplementations of the graphics chips: https://twitter.com/srg320_/status/1363880878263443457

I wonder if this is the CPU core that's being used.


The Sega Saturn is not a good target to emulate, as a lot of code relies on cycle-perfect timing due to the lack of locking between the two cores. The J2 has different cycle times from the SH2 used in the Saturn.

Since the J2 is quite modular it might be possible to modify the timings, but I'm somewhat doubtful that two vector units and two cores would fit on an FPGA affordable to hobbyists. I honestly don't know, though.

Edit: Changed DreamCast -> Saturn


That's interesting about the different cycle times, given that the SH2 is probably one of the purest 'one instruction per cycle' RISCs out there, even breaking apart the iterative divide into one operation per cycle.
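
(For reference, that step-wise divide is exposed as a DIV1 instruction that performs one shift-subtract step per quotient bit. A rough, untested sketch of the unsigned 32÷16 sequence, along the lines of the example in the SH-2 programming manual:)

    shll16  r1          ! move the 16-bit divisor into the upper half of r1
    div0u               ! initialize the divider flags (M, Q, T)
    .rept   16          ! one DIV1 per quotient bit...
    div1    r1, r2      ! ...each a single shift-subtract step, one cycle
    .endr
    rotcl   r2          ! rotate the final quotient bit in through T
    extu.w  r2, r2      ! r2 = the 16-bit quotient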

Is there any chance you can expand on the issues there?


The Saturn has a complex array of custom chips, including a second SH2, two graphics chips, a DSP, etc. All sharing access to various resources such as memory.

https://www.copetti.org/writings/consoles/sega-saturn/

So, it's notoriously difficult to code for (if you want peak performance) and tough to emulate. I think each individual core is easy to emulate, but the whole salad of chips is another story.

Coding Secrets on YouTube has some videos on writing code for the Saturn, like this one. Not SH2 specific, but wow.

https://www.youtube.com/watch?v=czqwd43WQWM

https://www.youtube.com/watch?v=wU2UoxLtIm8 (mentions the incomplete documentation from Sega, as if the system wasn't complicated enough without iffy documentation as well)


The Sega Saturn was ridiculous. "Chip salad" is a good way to start characterizing it, but I feel the need to drive it home.

* Two SH-2s. These CPUs occupied different places on the buses (plural) and were generally running different code to perform different tasks. It is incorrect to think of the two CPUs on the Saturn as being equivalent to x86 SMP (or dual core, for that matter); the two CPUs were not interchangeable.

* An SH-1. This controlled the CD-ROM, and was only able to run code baked into ROM.

* The VDP1. (video display processor #1) It drew sprites and polygons.

* and VDP2, which drew backgrounds.

* A bus controller. The two SH-2s were hard pressed to deliver the computation needed for geometry transforms, so the bus controller contained a DSP which was often used to calculate geometry.

* A 68000. The Amiga, Macintosh, and Sega Genesis (and lots of others) used a 68000 as a CPU. This was generally given control of the...

* FM synthesizer. It was based on the YMF262, aka the OPL3, which drove the Sound Blaster 16.

I think that one of the SH-2s was tied more tightly to VDP1 and the bus controller DSP, and the other was tied more tightly to VDP2. I think I remember reading that a general common architecture was for game logic and VDP2 control to run on one SH-2, and the other SH-2 would feed geometry into the DSP which would feed into VDP1.

The VDP1 was a strange beast. Basically every hardware graphics accelerator you've ever heard of has used triangles as its basic primitive, but not the Saturn. The Saturn used quadrilaterals. It's not incorrect to think of the VDP1 as being a super duper sprite engine that was fast enough to build 3D stuff with sprites.

The VDP1 was not powerful enough to draw the whole screen. If a 3D Saturn game was to look good, significant parts of the screen had to be drawn by the VDP2. It was mostly up to the task; it could display something like seven layers of background, with two layers having full perspective control and five more with decreasing levels of scaling/translation. (SNES Mode 7 gave two layers: one with partial perspective control (I'm forgetting the technical term) and another with just translation.) A good-looking level/game required incorporating the VDP2's perspective background layer into the level. But since the VDP2 took different parameters than the VDP1 did, lots of games matched the VDP2's background very poorly with the VDP1's foreground.

That system was wild.


I honestly do sometimes feel a kind of nostalgia for the diversity and exploration offered by classical console systems. It felt like any unique, but great, hardware architectural idea could take the world by storm. Now it feels like everything has become fairly uniform - an application specific personal computer with essentially the same design. But then again, nostalgia is never what it used to be... So... Eh. You know.


I know what you mean. Things are objectively better now, but they used to be weird and a lot of us love that about the Saturn. (I just started exploring this wonderful mess of a console this year)

What's tragically hilarious about the Saturn is that despite the deep weirdness and absolute gobs of silicon, it's still not particularly good at anything.

...relative to its contemporary peers, I mean. The Saturn was an absolute beast at traditional 2D games, but even then, due to the paltry onboard RAM, it required the 4MB RAM expansion cart to really shine and do things that the PlayStation couldn't.

A good counterexample might be the original Amiga, which was "weird" and employed gobs of custom silicon and was indeed actually a generation or so ahead of its peers when it came to graphics and sound.


I would have to ask or rewatch all the J-Core talks. I recall something like this:

"We traded die space for speed so one instruction which takes one cycle on an SH takes 2 on J-Core."

If you are really interested, I can see if I can get back to you.


> The J2 has different cycle times from the SH2 used in the dream cast.

The Dreamcast uses an SH4, not an SH2; the SH4 is a superset of the SH2 instruction set, to say nothing of its completely different timings.


You are correct. My bad. But the timing of the J2 and SH2 also differ.


So this feels like it raises the question: what's the big deal with RISC-V if this exists? If we've had a Linux-capable, BSD-licensed, patent-free processor since 2015, why did we spend so much time and effort and money to make another completely new ISA?


They started before RISC-V existed.

The reason they built their own implementation of a CPU rather than using some open MIPS or SPARC is that they found those used too much bus bandwidth for their use case. SuperH was a really well designed architecture with really good instruction set density.

Here, instruction set density is not something tacked onto a spec which many cores don't implement, and which, where it is implemented, leaves you with a mess of mixed 32- and 16-bit instructions. SuperH is cleaner in that everything is a 16-bit instruction, which helps make the decoder cheaper.
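
(A quick hand-assembled illustration; the hex encodings follow the SH-2 manual's opcode tables, so treat them as a sketch rather than verified assembler output:)

    mov.l   @r4, r1     ! 0x6142  load a 32-bit word     (16-bit opcode)
    add     r1, r2      ! 0x321C  r2 += r1               (16-bit opcode)
    mov.l   r2, @r5     ! 0x2522  store a 32-bit word    (16-bit opcode)

Every instruction, loads and stores included, fits the same fixed 16-bit width, so fetch and decode never deal with mixed instruction sizes.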


But that's an even stronger reason for the J2 to have taken off and for RISC-V never to have been created, isn't it? So what went wrong?


SH processors are very good contenders in their specific gate-count niche, but they have a lot of headaches for implementing higher performance cores, which RISC-V was explicitly attempting to address. A lot of exposed-pipeline kind of stuff. It only specified a software-filled TLB. The ISA is very constrained by being 16-bit only, which gets in the way of density when compared to other, more modern, instruction-bandwidth-conscious RISCs (although it can't be stated enough how much SH inspired all of the later examples, including Thumb and RV-C).

Additionally, the SH variants with full MMUs only lost their patent protection after RISC-V was already released and gaining steam.


Very good summary.

One example of such a headache would be the branch delay slot. It works really well if you have the compiler reorder instructions for you so the chip can do more while waiting for memory.

In a modern pipelined CPU with register renaming and out-of-order execution, this ISA-level reordering at best complicates the implementation.

The branch delay slot makes sense where you expect to wait 1-3 cycles for memory loads, but it becomes a meaningless complication in the face of the tens or hundreds of cycles of memory latency seen on highly clocked CPUs today.
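
(To illustrate, a hand-written SH sketch, not taken from the J-Core sources: the instruction placed after the branch executes before control actually transfers, so the compiler tries to hoist something useful into that slot:)

    mov     #1, r0      ! r0 = 1
    bra     .done       ! branch taken, but not before...
    add     #2, r0      ! ...the delay slot runs: r0 is now 3
    mov     #9, r0      ! never executed
    .done:
    rts                 ! return (rts has a delay slot too)
    nop                 ! nothing useful to hoist here, so the slot is wasted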


I never thought about it that way. So let me first answer your question and then expand on the underlying issue.

You might say "J-Core failed" because of a lack of privilege and because of scope.

J-Core just wasn't pushed by a top university, nor did it have multiple big companies investing in it.

J-Core was never meant to be an industry standard covering use cases from embedded to server to supercomputer accelerators, with an ISA designed by a committee (I don't use "designed by committee" sneeringly here). J-Core was meant as a patent-troll-proof way for a small company to design a competitive chip for their use case on a "shoestring budget" (as far as CPU design budgets are concerned). The only reason the wider world heard about it is that people in the company believe in open source and didn't want anybody to have to redo this work if it fits someone's use case.

RISC-V was originally created because some greedy company wanted a lot of money for their CPU architecture to be used in a course taught by a professor at a prominent university.

A CPU architecture is a set of trade-offs. RISC-V traded compactness of the instruction set for the flexibility a 32-bit instruction set allows, gaining back some density through an optional compressed instruction set at the cost of making the decoder slightly more complex. J-Core always has a multiply-accumulate unit, while this is optional for RISC-V. RISC-V traded suitability for embedded use cases for agreeableness (you don't have to implement some stuff if you don't need it), flexibility, and mass appeal, and I think those trade-offs worked out for what they had in mind.

If the team behind J-Core had taken that route, they wouldn't have shipped a product; they'd have a half-finished, obscure standards document instead.


Sure, it's an open ISA with some advantages, but if I were going to have an undergrad design a small processor in a single semester, I'd certainly go with RISC-V just because it's so simple and regular. And likewise with a simple out-of-order processor for a grad student or senior project.

And then there's the design for extensibility, and the "only implement what you need" approach, that makes RISC-V appealing in the embedded space even apart from being free to use.


Notice that it does not implement an MMU, nor was 64-bit ever implemented in the instruction set.


> J2 is a nommu processor because sh2 (the processor in the Sega Saturn game console) was, and the last sh2 patent expired in October 2014. The sh4 processor (dreamcast) has an mmu, but the last sh4 patents don't expire until 2016. (Update: we're probably implementing a simpler MMU design which will run the same userspace software but require kernel and QEMU updates, which we'll submit upstream when ready.)

Seems like it was in the pipeline, and it still should be easier to add an MMU to a fully fleshed-out system than to create a whole new ISA. For that matter, there are plenty of RISC-V implementations without MMUs, aren't there? So they should be exactly the same.

The 64-bit thing I'll give you, although it isn't obvious to me how much that would matter for most places where people are using ARM or RISC-V so far; it limits you to 4 GB of memory, but in embedded uses (phones, routers, Raspberry Pi) who cares?


To my knowledge, a J32 with an MMU exists. It is just not open sourced yet.


The SH-5 was the 64-bit model that never really launched, although it did have gcc support at one point.


> SH2 made it to the United States in the Sega Saturn game console, and SH4 powered the Sega Dreamcast.

And the 32X used the SH2!


Some versions of the HP Jornada handheld had a Hitachi SH3 processor. I remember running JLime Linux on a 690e. The Linux system had to be installed on a CF card along with a binary that would kill Windows CE and boot the Linux kernel.


Is it possible to predict SuperH's performance per watt compared to RISC-V or ARM?

It seems to be a great candidate for IoT CPUs.


SuperH and RISC-V and ARM are instruction sets, not microprocessors. The performance per Watt depends on the specific microprocessor architecture, the physical manufacturing technology, etc., much more than the instruction set.


Thanks. I assumed that the ISA influences the architecture more, because there are ARM IoT processors but not x86.


I think that it's actually not an unreasonable question and the prior answer was a little short!

For example, ARM has different ISA profiles depending on the application: A for more complex application processors, R for real-time, and M for microcontrollers. The ISA for M is less complex, cheaper to implement, and likely to use less power than A (or than a full x86 implementation).

Not an expert, but from what I've seen SuperH seems like a good candidate for low power applications.


There definitely is an influence, but the ISA's effect on power consumption is much smaller than what engineering within an ISA allows for. An UltraSPARC with its huge register windows is never going to be power-saving compared to an Atom, but an i9 thread can eat more power than an UltraSPARC II, as Intel is currently keen on showing.


> because there are ARM IoT processors but not x86.

Intel did try... https://en.wikipedia.org/wiki/Intel_Quark


Since no J-Core chip, as far as I am aware, was produced on a recent node, comparisons would need to be based on simulations.

If you were to talk with a specific foundry and shell out the money to get access to proprietary simulators and other proprietary information, you could get some simulated numbers. I am not aware of any such numbers.

I heard "decade on a button cell at chip inside free toy inside kids magazine cost" in a presentation once and that probably refers to a 180 nm process but this is hardly a concrete number.



