I am running fp16 LLaMA 30B (via vanilla-llama) on six AMD MI25s. The computer has 384 GB of RAM, but the model fits entirely in VRAM: it takes up about 87 GB of the 96 GB available across the six cards. Performance is about 1.6 words per second on an IRC chat log continuation task, and the machine draws about 400 W of additional power when "thinking."
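For anyone who wants to sanity-check those numbers, here is a rough back-of-envelope sketch. The VRAM, throughput, and power figures are taken from the post. The parameter count of roughly 32.5 billion for the "30B" model is my assumption, and so is treating GB as decimal gigabytes. The function name `vram_budget` is made up for illustration. Whatever is left after the weights is presumably activations, cache, and runtime overhead, but that breakdown is a guess, not a measurement.

```python
# Back-of-envelope budget for the setup described above.
# Figures marked "from the post" are quoted; the rest are assumptions.

def vram_budget(num_cards=6,
                total_vram_gb=96.0,      # from the post
                used_vram_gb=87.0,       # from the post
                params_billion=32.5,     # ASSUMPTION: approx. size of LLaMA "30B"
                bytes_per_param=2,       # fp16 stores each weight in 2 bytes
                words_per_second=1.6,    # from the post
                extra_watts=400.0):      # from the post
    """Return a dict of derived figures (hypothetical helper, not part of any tool)."""
    # Billions of parameters times bytes per parameter gives decimal GB.
    weights_gb = params_billion * bytes_per_param
    return {
        "per_card_gb": total_vram_gb / num_cards,
        "weights_gb": weights_gb,
        # Remainder of the used VRAM that is not raw weights.
        "non_weight_gb": used_vram_gb - weights_gb,
        "headroom_gb": total_vram_gb - used_vram_gb,
        "utilization": used_vram_gb / total_vram_gb,
        # Watts are joules per second, so W / (words/s) = joules per word.
        "joules_per_word": extra_watts / words_per_second,
    }


if __name__ == "__main__":
    for name, value in vram_budget().items():
        print(f"{name}: {value:.2f}")
```

Under these assumptions the fp16 weights alone account for most of the 87 GB in use, which fits with the model needing all six cards rather than four or five.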
I wanted to use my inexpensive Chinese fiber laser engraver without the buggy, Windows-only EzCAD2 software, so I reverse-engineered the protocol and wrote some simple tools for interfacing with it. Licensed under GPLv3.