https://openreview.net/pdf?id=yzkSU5zdwD

https://arxiv.org/pdf/2203.15556.pdf

There were also some informal comparisons of GPT models with various parameter counts.

Excellent info - I did find this bit in the conclusion of the arXiv article:

> While the desire to train these mega-models has led to substantial engineering innovation, we hypothesize that the race to train larger and larger models is resulting in models that are substantially underperforming compared to what could be achieved with the same compute budget.

This mirrors some of my experience. Training/tuning a 7B-parameter model feels like the Goldilocks zone right now. We are thinking about one specific domain with 3-4 highly targeted tasks. Do we need 175B+ parameters for that? I can't imagine it would make our lives easier at the moment. Iteration time and cost are really big factors, and being able to go 10x faster/cheaper makes it worth the effort to coax the smaller model(s) into fitting the use case.
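For a rough sense of the compute-budget argument in that quote, here is a minimal back-of-envelope sketch. It assumes the common C ≈ 6·N·D FLOPs approximation for training compute and the Chinchilla paper's rule of thumb of roughly 20 training tokens per parameter; the GPT-3-like 175B-parameter / 300B-token figures in the example are just illustrative, not the paper's own estimates.

    # Back-of-envelope: how a fixed training budget splits between model size
    # and data under the compute-optimal view. Assumes C ~= 6 * N * D FLOPs
    # and ~20 tokens per parameter (Chinchilla's approximate optimum).
    # Numbers below are illustrative, not taken from the paper's tables.
    import math

    FLOPS_PER_PARAM_TOKEN = 6   # C ~= 6 * N * D
    TOKENS_PER_PARAM = 20       # rough Chinchilla optimum

    def training_flops(n_params, n_tokens):
        # Approximate training compute for a given model/data size.
        return FLOPS_PER_PARAM_TOKEN * n_params * n_tokens

    def compute_optimal(budget_flops):
        # With D = 20 * N and C = 6 * N * D, N = sqrt(C / 120).
        n = math.sqrt(budget_flops / (FLOPS_PER_PARAM_TOKEN * TOKENS_PER_PARAM))
        return n, TOKENS_PER_PARAM * n

    # GPT-3-ish budget: 175B params trained on ~300B tokens.
    budget = training_flops(175e9, 300e9)
    params, tokens = compute_optimal(budget)
    print(f"~{budget:.2e} FLOPs -> ~{params/1e9:.0f}B params on ~{tokens/1e12:.1f}T tokens")
    # -> roughly a ~50B model on ~1T tokens for the same compute

Same compute, a much smaller model trained on much more data - which is the gap the quote is pointing at.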
