Something is wrong with your numbers: gpt-oss-20b and gpt-oss-120b should be much much faster than what you are seeing. I would suggest you familiarize yourself with llama-bench instead of ollama.
Running gpt-oss-120b with an RTX 5090 and 2/3 of the experts offloaded to system RAM (less than half of the memory bandwidth of this thing), my machine gets ~4100 tps prefill and ~40 tps decode.
Your spreadsheet shows the Spark getting ~94 tps prefill and ~11 tps decode.
Now, it's expected that my machine should slaughter this thing in prefill, but decode should be very similar, or the Spark a touch faster.
Your system RAM is probably 1/20th the VRAM bandwidth of the 5090 (way way less than half) unless you're running a workstation board with quad or 8 channel RAM, then it's only about 1/10th or 1/5th respectively.
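To make the bandwidth argument concrete, here is a rough back-of-the-envelope sketch. The figures are assumptions, not measurements from this thread: DDR5-5600 with 64-bit channels, 1792 GB/s for the 5090, 273 GB/s for the Spark, about 5.1B active parameters per token for gpt-oss-120b, and about 4.25 bits per weight for MXFP4. The function names are made up for illustration. It ignores KV cache reads and non-MXFP4 layers, so treat the decode figure as a loose ceiling only.

```python
def peak_bandwidth_gbs(channels: int, mega_transfers_per_s: float, bus_bytes: int = 8) -> float:
    """Theoretical peak DRAM bandwidth in GB/s (channels x bytes per transfer x MT/s)."""
    return channels * bus_bytes * mega_transfers_per_s / 1000.0


def decode_tps_upper_bound(bandwidth_gbs: float, active_params_billions: float,
                           bits_per_weight: float) -> float:
    """Bandwidth-bound ceiling on decode speed: every active weight is read once per token."""
    gb_read_per_token = active_params_billions * bits_per_weight / 8.0
    return bandwidth_gbs / gb_read_per_token


if __name__ == "__main__":
    VRAM_5090_GBS = 1792.0  # assumed spec-sheet figure
    SPARK_GBS = 273.0       # assumed spec-sheet figure

    # Ratio of 5090 VRAM bandwidth to system RAM bandwidth (DDR5-5600 assumed).
    for channels in (2, 4, 8):
        ram = peak_bandwidth_gbs(channels, 5600)
        print(f"{channels}-channel: {ram:.1f} GB/s, 1/{VRAM_5090_GBS / ram:.0f} of the 5090")

    # Loose ceiling for gpt-oss-120b decode on the Spark (assumed 5.1B active, 4.25 bits).
    print(f"Spark decode ceiling: {decode_tps_upper_bound(SPARK_GBS, 5.1, 4.25):.0f} tps")
```

Under those assumptions, dual, quad and 8 channel DDR5-5600 come out at 1/20, 1/10 and 1/5 of the 5090's VRAM bandwidth, which matches the ratios quoted above.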
We actually profiled one of the models and saw that the last GEMM, which is completely memory bound, takes a lot of time, which reduces the token speed by a lot.
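A quick way to see why a GEMM like that is memory bound at decode time is to compute its arithmetic intensity (FLOPs per byte moved). This is a minimal sketch, assuming "the last GEMM" is the output projection from the hidden state to the vocabulary at batch size 1. The shapes in the example (hidden size 2880, vocabulary 201088, bf16 weights) are assumptions for illustration, not values taken from the profile mentioned above.

```python
def gemm_arithmetic_intensity(m: int, k: int, n: int, bytes_per_elem: float = 2.0) -> float:
    """FLOPs per byte for C[m,n] = A[m,k] @ B[k,n], counting one read of A and B and one write of C."""
    flops = 2.0 * m * k * n
    bytes_moved = (m * k + k * n + m * n) * bytes_per_elem
    return flops / bytes_moved


def min_time_ms(m: int, k: int, n: int, bandwidth_gbs: float, bytes_per_elem: float = 2.0) -> float:
    """Lower bound on runtime in milliseconds if the GEMM is limited purely by memory bandwidth."""
    bytes_moved = (m * k + k * n + m * n) * bytes_per_elem
    return bytes_moved / (bandwidth_gbs * 1e9) * 1e3


if __name__ == "__main__":
    hidden, vocab = 2880, 201088  # assumed shapes, for illustration only
    # Batch size 1 (decode): roughly 1 FLOP per byte, far below what a GPU needs to be compute bound.
    print(gemm_arithmetic_intensity(1, hidden, vocab))
    # Larger batches reuse the same weights, so intensity rises with m.
    print(gemm_arithmetic_intensity(512, hidden, vocab))
    # Bandwidth-bound floor per token at an assumed 273 GB/s.
    print(min_time_ms(1, hidden, vocab, 273.0))
```

The point of the sketch is the trend: at batch size 1 the weight matrix is read once per token and barely reused, so the runtime is set by memory bandwidth, not by compute.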
FYI you should have used llama.cpp to do the benchmarks. It performs almost 20x faster than ollama for the gpt-oss-120b model. Here are some sample results on my Spark:
Is this the full weight model or quantized version? The GGUFs distributed on Hugging Face labeled as MXFP4 quantization have layers that are quantized to int8 (q8_0) instead of bf16 as suggested by OpenAI.
For example, looking at blk.0.attn_k.weight, it's q8_0, as are other layers:
Under the hood, we use the Argon2i algorithm to derive the secret key from an arbitrarily long password string. We used the term "password" because that's what ordinary people will understand (zip, for example, uses the same term for its secret keys). In practice, people should choose a password that's long enough to prevent brute forcing, just like picking a password for your online accounts.
It's a good idea to use a public key system, but it really confuses new users who have never used PKI before. Nevertheless, for advanced users, we have a key exchange feature built into the app that allows two parties to negotiate a shared secret using X25519.
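For readers who haven't seen an X25519 exchange before, here is a minimal pure-Python sketch of the idea, following the ladder described in RFC 7748. It is not the app's code, the helper names (`generate_keypair`, `derive_key`) are hypothetical, and hashing the shared secret with SHA-256 is an assumed final step for illustration. It is also not constant-time, so use a vetted library for anything real.

```python
import hashlib
import secrets

P = 2**255 - 19          # field prime for Curve25519
A24 = 121665             # (486662 - 2) / 4, as in RFC 7748
BASE_POINT = (9).to_bytes(32, "little")


def _clamp(k: bytes) -> int:
    """Decode a 32-byte scalar and apply the RFC 7748 clamping."""
    b = bytearray(k)
    b[0] &= 248
    b[31] &= 127
    b[31] |= 64
    return int.from_bytes(b, "little")


def x25519(k: bytes, u: bytes) -> bytes:
    """Scalar multiplication on Curve25519 (Montgomery ladder). Illustration only, not constant-time."""
    scalar = _clamp(k)
    ub = bytearray(u)
    ub[31] &= 127  # mask the unused top bit of the u-coordinate
    x1 = int.from_bytes(ub, "little") % P
    x2, z2, x3, z3, swap = 1, 0, x1, 1, 0
    for t in reversed(range(255)):
        bit = (scalar >> t) & 1
        swap ^= bit
        if swap:
            x2, x3, z2, z3 = x3, x2, z3, z2
        swap = bit
        a, b = (x2 + z2) % P, (x2 - z2) % P
        aa, bb = a * a % P, b * b % P
        e = (aa - bb) % P
        c, d = (x3 + z3) % P, (x3 - z3) % P
        da, cb = d * a % P, c * b % P
        x3 = pow(da + cb, 2, P)
        z3 = x1 * pow(da - cb, 2, P) % P
        x2 = aa * bb % P
        z2 = e * (aa + A24 * e) % P
    if swap:
        x2, x3, z2, z3 = x3, x2, z3, z2
    return (x2 * pow(z2, P - 2, P) % P).to_bytes(32, "little")


def generate_keypair() -> tuple[bytes, bytes]:
    """Return (private, public). Hypothetical helper name."""
    private = secrets.token_bytes(32)
    return private, x25519(private, BASE_POINT)


def derive_key(own_private: bytes, peer_public: bytes) -> bytes:
    """Hash the raw shared secret into a 32-byte key (assumed step, for illustration)."""
    return hashlib.sha256(x25519(own_private, peer_public)).digest()


if __name__ == "__main__":
    alice_priv, alice_pub = generate_keypair()
    bob_priv, bob_pub = generate_keypair()
    # Each side combines its own private key with the other's public key.
    print(derive_key(alice_priv, bob_pub) == derive_key(bob_priv, alice_pub))
```

Only the two public keys travel over the channel. Both sides end up with the same key without ever sending it, which is why this can be layered on top of a password-based scheme for users who want it.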
Wow! This project seems to do exactly what ours does right now.. with an even better UI/UX.. but they don't seem to support any kind of nonce'ed and key'ed encryption?
For some reason it's no longer on the App Store anywhere.
Ordinary people can use browser extensions OK on desktop, but on mobile it's a mess. Chrome for Android doesn't support extensions, and no one uses the Android browsers that do. Installing an extension for Safari on iOS requires following many unintuitive steps. I hope mobile extensions become easier to install/use with time!
The original version was a browser extension. It was very painful to maintain support for all the different types of input fields. Most large social media sites do not use standard text areas.
Thanks for your advice! However, there are several problems with self hosted platforms in China.
1. People are unaware of their existence because those projects are very technical and hard to deploy/join. They also don't have a good client on mobile platforms. People will trade their privacy for all the convenience that, say, WeChat brings, because all of their contacts are already using WeChat. It's hard to convince people to switch to your Matrix server.
2. Cloud services are also monitored by the government. There are programs running in the background inside VPSes that monitor all processes on your server.
3. If you want to host a website, you have to register it with a state agency, so if there are any contents on your website that the government doesn't like, your website will be shut down and you'll be held responsible.
As for the walled garden Apple created, I heard that the EU has passed a law mandating that Apple allow third-party app stores. It'll be very interesting to see what happens in the future.
As for getting people to join: LEAVE THE APPLE WALLED GARDEN. After that it's as easy as sticking up QR codes or equivalent.
I'm not talking about working within the system. Buy crypto, and with it rent a self-hosted NON-CHINESE SERVER, not a website, and do your best to keep the box accessible through the popular, not-yet-banned VPNs used in China, for the day when the firewall gets you.
Again, with regard to getting people to join these services, if people aren't willing to accept the minor discomfort of not using the WeChat interface, they're hardly likely to stand next to you in a street protest.
Yes, if you want to start a large viral movement you have to dress it up a little, improve the Chinese locale or fix some UI issues, but this is massively easier than starting from a text editor or compiler on a remote box. But if you just want to go viral, use WeChat, get a knocked-off account or 10, and expect that knock on the door when they turn up, because you're protesting _WITHIN_ the system.
Again, VPNs are massively technical but hugely popular even in mainland China (I've known enough Chinese Apple users to know this is the case). People are capable of following "click here" instructions better than most people imagine, otherwise technophobes wouldn't have social media.