
I'm not seeing DaVinci enabling any scalable business models with its pricing ($0.02/1K tokens).
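
A quick back-of-the-envelope sketch of why; the per-request token count and usage level here are my own assumptions, not OpenAI figures:

    # Rough unit economics at $0.02/1K tokens. Request size and
    # per-user usage below are illustrative assumptions.
    price_per_1k_tokens = 0.02        # Davinci pricing, USD
    tokens_per_request = 2_000        # prompt + completion, assumed
    requests_per_user_day = 50        # assumed active user

    cost_per_request = price_per_1k_tokens * tokens_per_request / 1_000
    daily_cost_per_user = cost_per_request * requests_per_user_day
    print(f"${cost_per_request:.2f}/request, ${daily_cost_per_user:.2f}/user/day")
    # -> $0.04/request, $2.00/user/day, i.e. ~$60/user/month in API
    #    costs alone, which rules out most consumer pricing models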



Pretty sure it's just a matter of time until it goes the way of Stable Diffusion.


The biggest barrier to this is the hardware requirements. I saw an estimate on r/machinelearning that, based on the parameter count, GPT-3 needs around 350GB of VRAM. Maybe you could cut that in half, or even to one-eighth if someone figures out some crazy quantization scheme, but it's still firmly outside the realm of consumer hardware right now.
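
The rough math behind that estimate (a sketch; 175B is GPT-3's reported parameter count, and the bytes-per-parameter values are the standard ones for each precision):

    params = 175e9  # GPT-3's reported parameter count
    for fmt, bytes_per_param in {"fp16": 2, "int8": 1, "int4": 0.5}.items():
        gb = params * bytes_per_param / 1e9
        print(f"{fmt}: ~{gb:.0f} GB of weights")
    # fp16: ~350 GB, int8: ~175 GB, int4: ~88 GB -- and that's
    # weights only, before activations and KV cache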

Stuff like KoboldAI can let you run smaller models on your own hardware, though (https://github.com/KoboldAI/KoboldAI-Client).


There already exist comparable EleutherAI models, I believe. Not as good, but pretty good.


The biggest I've found is GPT-J (EleutherAI/gpt-j-6B), which has a model size comparable to GPT-3 Curie, but the outputs have been very weak compared to what I'm seeing people do with GPT-3 Da Vinci. The outputs feel like GPT-2 quality. I'm probably using it wrong, or maybe there are better BART models published that I don't know about?

> Write a brief post explaining how GPT-J is as capable as GPT-3 Curie and GPT-2, but not as good as GPT-3 Da Vinci.

Output: "GPT-J ia a new generation of GPT-3 Curie and GPT-2. It is a new generation of GPT-3 Curie and GPT-2. It is a new generation of GPT-3 Curie and GPT-2." [the sentence repeats from there]

Using temperature 1e-10, top_p 1.
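
For reference, a minimal sketch of roughly how one would run this with Hugging Face transformers. Note that a temperature this close to zero makes sampling effectively greedy decoding, which is exactly the regime where these repetition loops show up; the commented-out repetition_penalty is a common mitigation, not something used above:

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")
    model = AutoModelForCausalLM.from_pretrained(
        "EleutherAI/gpt-j-6B", torch_dtype=torch.float16
    ).to("cuda")

    prompt = ("Write a brief post explaining how GPT-J is as capable as "
              "GPT-3 Curie and GPT-2, but not as good as GPT-3 Da Vinci.")
    inputs = tokenizer(prompt, return_tensors="pt").to("cuda")

    out = model.generate(
        **inputs,
        max_new_tokens=100,
        do_sample=True,
        temperature=1e-10,   # effectively greedy; prone to loops
        top_p=1.0,
        # repetition_penalty=1.2,  # common mitigation, untested here
    )
    print(tokenizer.decode(out[0], skip_special_tokens=True))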


The existing models aren't fine-tuned for question answering, which is what makes GPT-3 usable. EleutherAI, or one of the other Stability-style collectives, is working on one.
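
To make the gap concrete, here's a hypothetical sketch of the kind of (question, answer) formatting such a fine-tune trains on; the template and field names are made up for illustration, not taken from any specific dataset:

    # A base LM like GPT-J is trained only to continue text;
    # instruction fine-tuning trains it on pairs like this instead.
    def format_example(question: str, answer: str) -> str:
        # Illustrative template, not from any specific dataset
        return f"Question: {question}\n\nAnswer: {answer}"

    print(format_example(
        "What is the capital of France?",
        "The capital of France is Paris.",
    ))
    # Fine-tuning on many such pairs is what teaches a model to treat
    # a prompt as a question to answer rather than text to continue.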


It's very sad that they had to nerf the model (AI Dungeon and the like). I don't think anything on a personal/consumer GPU could rival a really big model.



