I actually just tried this on my own domain, and although it enabled WebGPU, I hit another error:
"Find an error initializing the WebGPU device Error: Cannot initialize runtime because of requested maxBufferSize exceeds limit. requested=1024MB, limit=256MB. This error may be caused by an older version of the browser (e.g. Chrome 112). You can try to upgrade your browser to Chrome 113 or later."
I can do 60 tokens in 24s (about 2.5 tokens/sec) on my 2020 ASUS G14, Thorium 111 + Windows 10. nvidia-smi says my RTX 2060 is 24% loaded with ~1.1 GB of VRAM in use.
That's slower than Vicuna 7B (a LLaMA-7B derivative, roughly GPT-3-class) running natively on Linux on the same machine, where I get about 3.5 tokens/sec at 97% GPU usage. So... yeah, performance is not so great yet.
I don't have exact numbers, but having tried all three: on a local 4090 this seems mildly slower than GPT-4 and dramatically slower than GPT-3.5 (both served from cloud GPUs, of course). That said, all three are within the realm of usability speed-wise, though GPT-3.5 is really in a different class: it can be used nearly interactively, without noticeable delay.
It's really a shame that there is no 8-bit float support in the WebGPU spec. Even though few cards support it natively, it'd still massively benefit ML workloads.
Another annoying constraint, though specific to wgpu (the Rust implementation of WebGPU), is that it doesn't support f16 yet (which IS in the spec, as the optional shader-f16 feature) except through SPIR-V passthrough...
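On the browser side, f16 has to be opted into when the device is created. A minimal TypeScript sketch of the feature check (generic WebGPU API usage, not tied to this demo):

  // Detect and request the optional shader-f16 feature on the adapter.
  const adapter = await navigator.gpu.requestAdapter();
  if (!adapter) throw new Error("WebGPU is not available in this browser");
  const requiredFeatures: GPUFeatureName[] = adapter.features.has("shader-f16")
    ? ["shader-f16"]
    : [];
  const device = await adapter.requestDevice({ requiredFeatures });
  // WGSL compiled against this device may then start with `enable f16;`.

Without the feature, shaders that declare `enable f16;` simply fail to compile.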
I've got it running on Chrome v113 beta on Ubuntu with an older AMD RX 580. The feature flags don't seem to stick for me in the chrome://flags GUI, but if you start Chrome from the terminal with the flags passed as command-line switches, it works.
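Something along these lines (exact switches and binary name may differ by channel and Chrome version):

  # Launch the beta channel with WebGPU force-enabled (Vulkan backend on Linux)
  google-chrome-beta --enable-unsafe-webgpu --enable-features=Vulkan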
Also, instead of just telling people to "Update Chrome to v113", the domain owner could sign up for an origin trial: https://developer.chrome.com/origintrials/#/view_trial/11821...
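The token you get from registering is then served with the page, either as an Origin-Trial HTTP response header or as a meta tag, roughly like this (placeholder token, not a real one):

  <!-- WebGPU origin trial token for the registered origin -->
  <meta http-equiv="origin-trial" content="YOUR_TOKEN_HERE">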