A trustworthy, free (libre), Linux capable, self-hosting 64bit RISC-V computer (cmu.edu)
246 points by caned 7 months ago | 57 comments



> The chip foundry wouldn't know what the FPGA will be used for, and where the proverbial "privilege bit" will end up being laid out on the chip, which mitigates against Privilege Escalation hardware backdoors. Exposure is limited to DoS attacks being planted into the silicon during FPGA fabrication, which yields a significantly improved level of assurance (i.e., the computer may stop working altogether, but can't betray its owner to an adversary while pretending to operate correctly).

I suppose in theory the FPGA could contain a hidden CPU that has full read/write access to the FPGA program.

Further, if the system becomes popular and more FPGAs need to be produced for the same system or the next generation, then the foundry has additional information and they can make a good guess of where the privilege bit will be. Even simpler, they could program an FPGA with the code and figure it out manually.


> I suppose in theory the FPGA could contain a hidden CPU that has full read/write access to the FPGA program.

All of them do at this point. It isn't hidden.

You can't buy a large FPGA without an ARM core in it. The ARM cores all have an opaque signed blob running in EL3 that you can't replace. This isn't a soft core on the fabric; it's dedicated silicon. And it has access to the ICAP (internal configuration access port) on Xilinx devices, and the equivalent on all the other manufacturers.


I'm not sure what you mean. The biggest FPGAs do not have hard processors built in. Those which do feature hard cores usually advertise it specifically.

See Xilinx's UltraScale+ Kintex and Virtex series, both of which can have more logic elements than the Zynq MP equivalents: https://docs.amd.com/v/u/en-US/ds890-ultrascale-overview


Yeah, OP (the person you're responding to) is wearing a tinfoil hat. Source: I work for AMD (formerly Xilinx). There are about a billion reasons it isn't true, starting with the fact that we'd charge you more money for the IP instead of hiding it, and ending with the fact that UltraScale parts are sold to the DoD, where there's very little room for shenanigans.


What is it for?


The ARM Core?

It's usually used for whatever software you need, while the FPGA fabric handles domain-specific circuits for signal processing and the like. The ARM core usually runs Linux or some other OS and handles communication with other devices. It can run at a higher frequency than most soft cores because it's a fully integrated, optimized circuit. So in theory that's a nice thing, and it frees up space for custom circuits on the programmable logic.

PS: The latest PolarFire chips from Microchip use a RISC-V CPU. Since AMD has announced MicroBlaze-V (their proprietary soft-core CPU with the RISC-V ISA), I assume they will soon (tm) release their Zynq range with dedicated RISC-V cores too. But even if it's an open instruction set, it's still a closed-source CPU.


I think backdooring the RAM would be easier. Modern DRAM has lots of complicated features (e.g. link training, targeted refresh, on-die ECC). I don't know exactly how it's implemented, but that's plenty of complexity to provide cover for backdoors.

It should be possible to add something that watches for specific memory access patterns and provides arbitrary read/write capabilities when the correct pattern is detected. This could be used for privilege escalation from untrusted but sandboxed code, e.g. JavaScript. It could work with any CPU architecture or OS, because the arbitrary memory reads could be used to detect the correct place to write.
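As a toy software model of that trigger idea (the addresses, sequence, and class names are invented for illustration; a real implant would live in the DRAM die's own logic, not in software):

    # Toy model of a DRAM-level trigger: the "chip" watches the address stream
    # and, after seeing a magic sequence of accesses, treats the next two
    # accesses as (target address, value) for an arbitrary write.
    # Purely illustrative; addresses and sequence are made up.
    MAGIC_SEQUENCE = [0x31000, 0x31040, 0x31080]   # hypothetical trigger pattern

    class BackdooredDram:
        def __init__(self, size=1 << 20):
            self.mem = bytearray(size)
            self.progress = 0            # how much of the trigger has been seen
            self.armed = False
            self.pending_target = None

        def access(self, addr, value=None):
            """Model a single read (value=None) or write to the DRAM."""
            if self.armed:
                if self.pending_target is None:
                    self.pending_target = addr                  # first access picks the target
                else:
                    self.mem[self.pending_target] = value & 0xFF  # second access writes it
                    self.armed, self.pending_target, self.progress = False, None, 0
                return None
            # advance or reset the trigger state machine
            if addr == MAGIC_SEQUENCE[self.progress]:
                self.progress += 1
                if self.progress == len(MAGIC_SEQUENCE):
                    self.armed = True
            else:
                self.progress = 1 if addr == MAGIC_SEQUENCE[0] else 0
            # normal behaviour
            if value is None:
                return self.mem[addr]
            self.mem[addr] = value & 0xFF

    # Sandboxed code only performs "innocent" reads, yet gains an arbitrary write:
    dram = BackdooredDram()
    for a in MAGIC_SEQUENCE:
        dram.access(a)             # trigger
    dram.access(0x4242, 0)         # choose target (e.g. a privilege flag)
    dram.access(0x0, 0x01)         # value to plant there
    assert dram.mem[0x4242] == 0x01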

This would be less effective with DIMMs or other multi-chip memory modules, but RISC-V computers are usually small single-board computers that only have a single DRAM chip.


Could you encrypt the data you send to the RAM?


Yes, and hardware support for encrypted RAM already exists:

https://en.wikipedia.org/wiki/Trusted_execution_environment

However, this will never be perfectly secure against backdoored RAM in a multitasking environment, because the memory access patterns alone leak information. Additionally, I don't think any of these systems support authenticated encryption, which means you could do things like corrupt branch targets and hope to land on a big NOP slide you control.
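A small Python sketch of why authentication matters here, using the cryptography package (the "memory word" framing is just illustrative):

    # Sketch: unauthenticated encryption (AES-CTR) lets an attacker flip
    # ciphertext bits and silently corrupt plaintext; authenticated encryption
    # (AES-GCM) rejects the tampered data instead. Requires `pip install cryptography`.
    import os
    from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes
    from cryptography.hazmat.primitives.ciphers.aead import AESGCM
    from cryptography.exceptions import InvalidTag

    key, nonce = os.urandom(32), os.urandom(16)
    word = b"\x00" * 8                   # pretend this is a memory word

    # Unauthenticated: flip one ciphertext bit, decryption "succeeds" with garbage.
    enc = Cipher(algorithms.AES(key), modes.CTR(nonce)).encryptor()
    ct = enc.update(word) + enc.finalize()
    tampered = bytes([ct[0] ^ 0x80]) + ct[1:]
    dec = Cipher(algorithms.AES(key), modes.CTR(nonce)).decryptor()
    print(dec.update(tampered) + dec.finalize())   # silently corrupted plaintext

    # Authenticated: the same bit flip is detected and decryption fails.
    aead = AESGCM(key)
    ct2 = aead.encrypt(nonce[:12], word, None)
    tampered2 = bytes([ct2[0] ^ 0x80]) + ct2[1:]
    try:
        aead.decrypt(nonce[:12], tampered2, None)
    except InvalidTag:
        print("tampering detected")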


This sort of thing is analogous to the "Thompson hack" [1], where a malicious compiler has a self-propagating backdoor. It never shows up in the source code, but self-injects into the binaries.

Thompson demonstrated this under controlled conditions. But realistically, such a backdoor would need to approach AGI-level cunning to evade attempts at detection. It has to keep functioning and propagating as the hardware and software evolve, while keeping its profile (size, execution time, etc.) low enough to continue evading detection.

Work like this, which rebuilds modern computing on a completely different foundation, would seriously disrupt and complicate the use of this type of backdoor.

[1] https://en.wikipedia.org/wiki/Backdoor_(computing)#Compiler_...
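For the curious, a toy Python model of the self-propagation trick (hypothetical names; the real attack targets a binary compiler, not a string rewriter):

    # Toy "trusting trust" model: a tiny "compiler" that (a) inserts a backdoor
    # when it compiles the login program and (b) re-inserts its own injection
    # logic when it compiles a clean copy of the compiler, so the source stays clean.
    BACKDOOR = 'if password == "letmein": return True  # injected\n    '
    PROPAGATE = "# (in the real attack, the compiler's own injection logic goes here)"

    def evil_compile(source: str) -> str:
        if "def check_password" in source:            # target 1: the login program
            return source.replace("return password == stored",
                                  BACKDOOR + "return password == stored")
        if "def compile(" in source:                  # target 2: a clean compiler
            return source + "\n" + PROPAGATE          # re-infect the new "binary"
        return source                                 # everything else untouched

    clean_login = (
        "def check_password(password, stored):\n"
        "    return password == stored\n"
    )
    # The backdoor appears only in the output, never in the source:
    print(evil_compile(clean_login))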


FFS - when a smart person designs something clever, it "begins to approach AGI-level cunning"? AGI doesn't exist; at the moment it's purely mythical.


I think they meant that a realistic attack would need to be an AGI, not that an AGI was built.


I wonder as well whether it wouldn't just be easier to snoop I/O and somehow exfiltrate the data. (This would be completely impractical for dragnet surveillance, of course – but I'm sure if a state actor knew that some organization was using this technique to avoid surveillance, _and_ was using a predictable software setup...)


> I suppose in theory the FPGA could contain a hidden CPU that has full read/write access to the FPGA program.

Even if it did, it would be exceptionally difficult for that CPU to identify which registers/gates on the FPGA were being used to implement which components of the soft CPU. The layout isn't fixed; there's no consistent mapping of hardware LUTs/FFs to synthesized functionality.


Even if the mapping changes, the network (graph of logic gates) will locally be similar. So a subgraph matching algorithm might be all that is needed.
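For a sense of how off-the-shelf that is, here's a toy sketch with networkx (the graphs are made-up stand-ins for a placed netlist and a known privilege-check pattern):

    # Sketch: find where a small known circuit (e.g. a privilege-check pattern)
    # embeds inside a larger netlist graph, regardless of how it was placed.
    # Requires `pip install networkx`. Toy graphs, not real netlists.
    import networkx as nx
    from networkx.algorithms import isomorphism

    # "Known" pattern: a 3-node chain of gates the privilege logic might form.
    pattern = nx.DiGraph([("cmp", "and"), ("and", "priv_ff")])

    # "Placed design": the same structure buried among unrelated logic, renamed.
    design = nx.DiGraph([
        ("lut_17", "lut_98"), ("lut_98", "ff_451"),    # the hidden privilege chain
        ("lut_3", "lut_17"), ("ff_2", "lut_3"), ("lut_98", "ff_9"),
    ])

    matcher = isomorphism.DiGraphMatcher(design, pattern)
    for mapping in matcher.subgraph_isomorphisms_iter():
        print(mapping)   # e.g. {'lut_17': 'cmp', 'lut_98': 'and', 'ff_451': 'priv_ff'}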


That would mean connecting your hidden CPU to essentially every wire inside the FPGA. Trivial to detect, extremely expensive, and probably impossible given the timing model.


There's no need for such complexity. FPGAs read their programming from an external SPI flash/EEPROM when they boot; the hidden CPU just has to sniff that bus to get the entire bitstream and know the mapping.


And then you know the mapping. You would still need to connect to arbitrary wires. If you have the mapping but aren't connected to the wire you want to disrupt or sniff, then tough luck, you can't do anything.

Theoretically what you could do is MITM the bitstream and upload it to a server. Resynthesize, place and route with your sniff wires connected, and write that back to flash. But now you have to hide a radio, and either force a restart or hope a restart will happen.


It's certainly non-trivial to put a hidden CPU in an FPGA that has full read/write access. The wire configuration inside the FPGA will be different for every design loaded into it; hell, even for the same design the place-and-route will do different things. So what will you connect your hidden CPU to?


Without having thought it through fully, I feel like the classic "trusting trust" attack could work at the FPGA/bitstream level.


For a nation state the most useful thing would be a "kill bit" where you can broadcast some signal or key and disable all your enemy's computers. That's fairly easy to do in an FPGA - the signal would be detected by the serdes block(s) and the kill bit could just kill the power or clock or some other vital part of the chip.


> For a nation state the most useful thing would be a "kill bit" where you can broadcast some signal or key and disable all your enemy's computers.

CNE (computer network exploitation) is generally considered to be far more valuable than CNA (computer network attack). First of all, keep in mind that all genuinely sensitive systems are air gapped, so you can't effectively broadcast a signal to them.

Second, CNA is a one-off; after the attack you will typically lose access. CNE access, on the other hand, can persist for years or even decades, and is beneficial both in "cold" scenarios (for political and economic maneuvers) and closer to a flashpoint. CNA is usually only relevant when a conflict is turning hot.


No, it's the opposite. The FPGA makes it much harder to hide a trojan in the silicon. If the LUTs were biased, it would be detected fairly quickly. A dedicated circuit with an RF interface would be equally obvious in terms of chip usage and power draw.


I didn't say anything about modifying the LUTs or adding RF interfaces, I don't know where you got that from.


How would the in-field FPGA receive the broadcast in your scenario?


Even better, a logical fuse that would make recovery prohibitively expensive and time-consuming.


It's really quite amazing to log in to a Linux shell on an OrangeCrab FPGA running a RISC-V softcore, built using an open source toolchain. That was impossible not so long ago! At best you'd have something like Xilinx PetaLinux and all their proprietary junk.


Fun thing is that the OrangeCrab's FPGA is not even a requirement.

A tiny iCE40 LP1K will fit SERV (and even QERV) no prob.

It's amazing how small a fully compliant RISC-V implementation can be.


This is, and will soon be, a rallying moment for the community: open hardware and open software finally working together! This will be huge by the end of the decade.


I guess using some local definition of "huge".


> This is and will be a rallying moment soon for the community

My guy this thing is 4 years old. Spoiler alert: it wasn't.


I'm kind of going in the same direction, but by a different route. My design is based on VexRiscv and all the hardware is written in SpinalHDL. It does not run Linux yet because of the limited SRAM (512KB) on my Karnix board, but it has Ethernet and HDMI. I have coded a CGA-like video adapter with an HDMI interface that supports graphics (320x240x4) and text (80x30x16) modes with hardware-assisted smooth scrolling. :)

If someone is interested, here's a rather brief README: https://github.com/Fabmicro-LLC/VexRiscvWithKarnix/blob/karn...

KiCAD project for the board: https://github.com/Fabmicro-LLC/Karnix_ASB-254
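For scale, the video modes described above fit easily in that 512KB of SRAM; a quick back-of-the-envelope check (the two-bytes-per-text-cell assumption is mine, borrowed from CGA convention):

    # Back-of-the-envelope framebuffer sizes for the modes described above,
    # against the Karnix board's 512KB of SRAM.
    SRAM = 512 * 1024

    graphics = 320 * 240 * 4 // 8      # 320x240, 4 bits per pixel
    text     = 80 * 30 * 2             # 80x30 cells, char byte + attribute byte

    print(f"graphics mode: {graphics} bytes ({100 * graphics / SRAM:.1f}% of SRAM)")
    print(f"text mode:     {text} bytes ({100 * text / SRAM:.1f}% of SRAM)")
    # graphics mode: 38400 bytes (7.3% of SRAM)
    # text mode:     4800 bytes (0.9% of SRAM)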


This is cool. I was happy to see the prominent reference to my work on diverse double-compiling (DDC), which counters the trusting trust attack. If you're interested in DDC, check out https://dwheeler.com/trusting-trust
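Very roughly, the DDC comparison step looks like this (a heavily simplified Python sketch with hypothetical compiler paths; see the link above for the real treatment):

    # Simplified sketch of the diverse double-compiling (DDC) check:
    # build the suspect compiler's source twice, starting from an independent
    # trusted compiler, and compare the result against what the suspect binary
    # produces. Paths and commands here are hypothetical.
    import hashlib, subprocess

    def sha256(path):
        with open(path, "rb") as f:
            return hashlib.sha256(f.read()).hexdigest()

    # Stage 1: compile the compiler's source with an independent, trusted compiler.
    subprocess.run(["trusted-cc", "compiler.c", "-o", "stage1"], check=True)
    # Stage 2: compile the same source again, this time with the stage-1 output.
    subprocess.run(["./stage1", "compiler.c", "-o", "stage2"], check=True)

    # If the suspect binary is an honest compilation of compiler.c, stage2 should
    # be bit-for-bit identical to what the suspect binary produces from compiler.c.
    subprocess.run(["./suspect-cc", "compiler.c", "-o", "suspect-out"], check=True)
    if sha256("stage2") == sha256("suspect-out"):
        print("binaries match: no self-reproducing backdoor detected")
    else:
        print("mismatch: investigate")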


Rebuilding the system on itself and validating that the bitfile is the same is nice.

I'm amazed that it could be rebuilt in 512MB (and in "only" 4.5 hours on a ~65MHz CPU). My experience with yosys (and Vivado etc.) is that they seem to want many gigabytes.

> A 65MHz Linux-capable CPU inevitably invokes memories of mid-1990s Intel 486 and first-generation Pentium processors.

50-65MHz* and 512MB seems comparable to an early 1990s Unix workstation. Arguably better on the RAM side.

*4.5 Mflops on double precision linpack for lowRISC/50MHz


I did something similar in 2022, also with LiteX, but not self-hosting because it used a Kintex-7 FPGA, which at least at the time required Vivado for the actual place-and-route. It did result in an open gateware laptop running Linux and Xorg, though (thanks to Linux-on-LiteX-VexRiscV): https://mntre.com/media/reform_md/2022-09-29-rkx7-showcase.h...


Also see the RISC-V based Shakti: Open Source Processor Development Ecosystem from IIT-Madras, India - https://shakti.org.in/

Good overview at Wikipedia - https://en.wikipedia.org/wiki/SHAKTI_(microprocessor)


Hey, this is the same guy who did some work on running OS X on QEMU/KVM: https://www.contrib.andrew.cmu.edu/~somlo/OSXKVM/


This is very, very cool. I've been thinking for a while that a fully self-hosted RISC-V machine is sorely needed. The biggest limiting factor at the moment actually seems to be finding an FPGA board which has enough RAM on board. The target board here has 512 megabytes, I think - but FPGA toolchains are much happier with several gigabytes to play with.


While I love the idea of self-hosting HW and SW, I can't even imagine the pain of building stuff like GCC on a 60MHz CPU. Not to mention that the Rocket CPU is written in Scala. I recently stopped using Gentoo on a RockPro64 because the compile times were unbearable, and that's a system orders of magnitude faster than what they want to use.


You can definitely go considerably faster. A lot of these FOSS cores are either outright unoptimized or target ASICs, and so end up performing very badly on FPGAs. A well-designed core on a modern FPGA (not one of these bottom-of-the-barrel low-power Lattice parts) can definitely hit 250+ MHz with a much more powerful microarch. It's neither cheap nor easy, which is why we tend not to see it in the hobby space. That, and better FPGAs tend not to have FOSS toolchains, so they don't quite meet the libre spirit.

But, yes, even at 250MHz trying to run Chipyard on a softcore would certainly be an exercise in patience :)


People used 50MHz SPARC systems to do real work, and the peripherals were all a lot slower (10Mbps Ethernet, slower SCSI drives), with less and slower RAM. But it might take a week to compile everything you wanted, I agree; of course there is always cross-compiling as well.


That was before everything became a snap package in a docker image.


> That was before everything became a snap package in a docker image.

A modern app should consist of dozens of docker images in k8s on remote cloud infrastructure, all running "serverless" microservices in optimized python*, connected via REST* APIs to a javascript front-end and/or electron "desktop" app, with extensive telemetry and analytics subsystems connected to a prometheus/grafana dashboard.

That is ignoring the ML/LLM components, of course.

If all of this is running reliably, and the network isn't broken again, then you may be able to share notepad pages between your laptop and smartphone.

*possibly golang/protobufs if your name happens to be google and if pytorch and tensorflow haven't been invented yet


Oh, I believe in theory a 50MHz CPU is capable of doing almost everything I need; it just lacks software optimized for it. I think a week to compile everything is too optimistic.


Old compilers/IDEs like Turbo Pascal or Think C were/are usably fast on single-digit MHz machines and emulators.

And even if the CPU is 50 MHz, modern DRAM and NVMe flash are very fast compared to memory and storage on 1990s (or older) machines.

Older versions of Microsoft Office (etc.) ran about the same on 50 MHz systems as Office 365 runs today.


I did valuable work on a 2 MHz Apple II with a 4 MHz Z80 add-on running CP/M that I used to write the documentation. The documentation part was just as fast forty years ago as it is now, but assembling the code was glacially slow. The 6502 macro assembler running on the Apple took forty minutes to assemble code that filled an 8K EPROM.


6502 assemblers are amazingly fast on more recent hardware. Something like 60-70ms to run a script to assemble and link a version of msbasic (AppleSoft) on my old laptop.

https://github.com/mist64/msbasic


I usually only notice typos after hn has disabled editing... ;-(


> I can't even imagine the pain of building stuff like GCC on 60Mhz CPU

Some of us remember what that sort of thing was like, not so very long ago...


I remember when I got CodeWarrior on my PowerMac 6100/60 and suddenly I could answer questions online about weird MacApp problems by making a temporary project with their code and compiling the whole of MacApp in 5 minutes.

Previously that had taken about 2 hours (Quadra with MPW), and I did clean builds only when absolutely necessary.

Truly painful was trying to write large programs in Apple (UCSD) Pascal on a 1 MHz 6502.


Back then GCC was much smaller, and only contained C code, not C++. But sure, let's compare apples and ... much bigger heavier apples.


I made a meme with two guys from the office looking pensive and sent it to my even older coworker. Titled 'The Build Failed Saturday and Again Sunday Night'.


At one time many of us dreamed of having a computer that could run as fast as 60MHz. The first computers I used ran around 1MHz. Compilation will take longer on a slower machine, but that really isn't a big deal. If the computer is reliable and the build scripts are correct, you can just let the process run over days or weeks. I've run many tasks in my life that took days or weeks. Cue "compiling": https://xkcd.com/303/

The real problem is debugging. Debugging the process on a slow system can be unpleasant due to long turn-arounds. Historically the solution is to work in stages & be able to restart at different points (so you don't have to do the whole process each time). That would work here too. In this case, there's an additional option: you can debug the scripts on a much faster though less trustworthy system. Then, once it works, you can run it on the slower system.
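For example, a build driver that stamps each completed stage so an interrupted run can resume where it left off (a sketch; the stage names and commands are made up):

    # Sketch of stage-based restart: each completed stage drops a stamp file,
    # so rerunning the script on a slow machine skips work that already finished.
    # Stage names and commands are invented for illustration.
    import os, subprocess

    STAGES = [
        ("fetch",     ["./fetch-sources.sh"]),
        ("toolchain", ["make", "toolchain"]),
        ("kernel",    ["make", "kernel"]),
        ("userland",  ["make", "userland"]),
    ]

    for name, cmd in STAGES:
        stamp = f".done-{name}"
        if os.path.exists(stamp):
            print(f"skipping {name} (already done)")
            continue
        print(f"running {name}: this may take days on a 60MHz machine...")
        subprocess.run(cmd, check=True)      # abort on failure; stamp not written
        open(stamp, "w").close()             # mark the stage complete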


Wow, I am starting to read all the material you have put up.

It's really what I have always wanted to do, and it's more than that because you are using FPGAs. I am from India and I want to help you in any way I can, because I have also wanted to go on this journey. It's just amazing; I wish you all the blessings.


Looks like all the good stuff is here (linked in the OP article):

https://github.com/litex-hub/linux-on-litex-rocket

Cool, I didn't know that you could use screen to connect to a tty/serial port. Neat.


(2020)


There's nothing free about FPGAs...



