I actually just tried this on my own domain, and although it enabled WebGPU, I hit another error:
"Find an error initializing the WebGPU device Error: Cannot initialize runtime because of requested maxBufferSize exceeds limit. requested=1024MB, limit=256MB. This error may be caused by an older version of the browser (e.g. Chrome 112). You can try to upgrade your browser to Chrome 113 or later."
I can do 60 tokens in 24s (about 2.5 tokens/sec) on my 2020 ASUS G14, Thorium 111 + Windows 10. nvidia-smi says my RTX 2060 is 24% loaded with ~1.1 GB of VRAM in use.
That's slower than Vicuna 7B (a LLaMA-7B derivative, roughly GPT-3-class) running natively on Linux on the same machine, where I get about 3.5 tokens/sec at 97% GPU usage. So... yeah, performance is not so great yet.
I don't have exact numbers, but having tried all three: on a local 4090 this seems mildly slower than GPT-4 and dramatically slower than GPT-3.5 (both served from cloud GPUs, of course). That said, all three are within the realm of usability speed-wise, though GPT-3.5 is really in a different class: it can be used nearly interactively, without noticeable delay.
It's really a shame that there is no 8-bit float support in the WebGPU spec. Even though few cards support it natively, it'd still massively benefit ML workloads.
Another annoying constraint, though specific to wgpu (the Rust implementation of WebGPU), is that it doesn't support f16 yet (which IS in the spec, as the optional shader-f16 feature) except through SPIR-V passthrough...
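On the browser side, f16 has to be opted into when the device is created. A minimal TypeScript sketch of the feature check (generic WebGPU API usage, not tied to this demo):

  // Detect and request the optional shader-f16 feature on the adapter.
  const adapter = await navigator.gpu.requestAdapter();
  if (!adapter) throw new Error("WebGPU is not available in this browser");
  const requiredFeatures: GPUFeatureName[] = adapter.features.has("shader-f16")
    ? ["shader-f16"]
    : [];
  const device = await adapter.requestDevice({ requiredFeatures });
  // WGSL compiled against this device may then start with `enable f16;`.

Without the feature, shaders that declare `enable f16;` simply fail to compile.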
I've got it running on Chrome v113 beta on Ubuntu with an older AMD RX 580. The feature flags don't seem to stick for me in the chrome://flags GUI, but if you start Chrome from the terminal with the flags passed as command-line switches, it works.
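Something along these lines (exact switches and binary name may differ by channel and Chrome version):

  # Launch the beta channel with WebGPU force-enabled (Vulkan backend on Linux)
  google-chrome-beta --enable-unsafe-webgpu --enable-features=Vulkan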
Also, instead of just telling people to "Update Chrome to v113", the domain owner could sign up for an origin trial: https://developer.chrome.com/origintrials/#/view_trial/11821...
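The token you get from registering is then served with the page, either as an Origin-Trial HTTP response header or as a meta tag, roughly like this (placeholder token, not a real one):

  <!-- WebGPU origin trial token for the registered origin -->
  <meta http-equiv="origin-trial" content="YOUR_TOKEN_HERE">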