
Oddly, they are charging only slightly more for their hosted version:

open-mistral-7b: $0.25 per million tokens
open-mistral-nemo-2407: $0.30 per million tokens

https://mistral.ai/technology/#pricing

They specifically call out FP8-aware training, and TensorRT-LLM is really good (efficient) at FP8 inference on the H100 and other Hopper cards. It's possible that they run the 7B natively in fp16, since smaller models suffer more from even "modest" quantization like this.
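A rough illustration of why quantization hurts: an E4M3-style FP8 format keeps only 3 mantissa bits, so every weight picks up as much as ~6% rounding error. This is my own simplified sketch of that rounding grid (clamping to the E4M3 max of 448 and ignoring subnormals), not how TensorRT-LLM actually performs quantization:

```python
import numpy as np

def quantize_e4m3(x):
    """Round values to an FP8 E4M3-like grid: 3 mantissa bits
    (8 steps per power of two), max normal value 448.
    Simplified: subnormals and NaN handling are ignored."""
    x = np.clip(np.asarray(x, dtype=np.float64), -448.0, 448.0)
    absx = np.abs(x)
    e = np.floor(np.log2(np.where(absx > 0, absx, 1.0)))  # exponent of each value
    step = 2.0 ** (e - 3)                                 # mantissa spacing = 2^e / 8
    return np.where(absx > 0, np.round(x / step) * step, 0.0)

rng = np.random.default_rng(0)
w = rng.normal(0, 0.02, 10_000)   # weight-like values
wq = quantize_e4m3(w)
rel_err = np.abs(wq - w) / np.maximum(np.abs(w), 1e-12)
print(f"mean relative rounding error: {rel_err.mean():.3%}")
```

With only 3 mantissa bits the worst-case relative error is about 1/16 per weight regardless of scale, which is why FP8-aware training (learning around that error) matters more for a 7B model than for a larger one.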


Possibly an NVIDIA subsidy: you run NeMo models, you get cheaper GPUs.


