
A lot of previous transformer-based models did not have great depth simply because "shallow" transformers already took forever to train.

I am concerned that we are approaching an age where only massive companies and research groups with tons of GPU resources will be able to train next-gen models.




This study suggests that the number of parameters can be significantly reduced using deeper networks, which is a consistent finding in ML research. So these results could actually help to decrease the amount of computing resources required.
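
To see why depth can pay for itself: per-block parameter count grows roughly with the square of the width but only linearly with the number of layers, so a deeper, narrower stack can come in well under a shallower, wider one. A rough sketch (the configurations are my own illustration, not numbers from the paper):

    # Back-of-envelope parameter count for a plain transformer stack, assuming
    # the usual ~12 * d_model^2 parameters per block (4*d_model^2 for the
    # attention projections + 8*d_model^2 for the MLP), ignoring embeddings,
    # norms, and biases.
    def transformer_params(depth: int, d_model: int) -> int:
        return depth * 12 * d_model ** 2

    wide_shallow = transformer_params(depth=6, d_model=4096)   # ~1.21B
    deep_narrow = transformer_params(depth=48, d_model=1024)   # ~0.60B

    print(f"wide/shallow: {wide_shallow / 1e9:.2f}B parameters")
    print(f"deep/narrow:  {deep_narrow / 1e9:.2f}B parameters")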


I reached the same conclusion but was actually happy about it. When I first started working on neural networks, I had almost no competition. Then, starting around 2015, suddenly everybody was doing neural networks, but all of it was backprop. Right now almost nobody can work on backprop anymore, because improving the status quo requires a massive budget. So I can finally work in peace again on other, more obscure kinds of neural networks that are, in my opinion, way more fun. The work is more creative than just tweaking backprop a bit further.


> approaching

We were already there in 2020.


That's why open research groups like EleutherAI are critical. They've used elbow grease and enthusiasm to follow in the tracks of OpenAI and Microsoft, and some of their members have prolifically coded implementations of the latest and greatest ideas from papers.

Huggingface is also worthy of great praise, making the models accessible and lowering the barrier to actual commercial use.
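
As a rough illustration of how low that barrier is (the model name and prompt here are just my own example, and the small 125M checkpoint fits in a free Colab session), a pretrained EleutherAI model can be pulled down and queried in a few lines:

    # Minimal sketch: load a small pretrained EleutherAI checkpoint via the
    # transformers pipeline API and generate some text. Model and prompt are
    # illustrative only.
    from transformers import pipeline

    generator = pipeline("text-generation", model="EleutherAI/gpt-neo-125M")
    print(generator("Deep transformers are", max_length=30))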

Lastly, Google deserves props for Colab. You can run models that are 80% as good as the ones that initially cost millions of USD to produce. Even the free tier is excellent, and I've pointed various people to it as a resource for developing practical hands-on experience with machine learning.

This is a wonderful AI summer we're having. Appreciate it while you can!


There is still plenty of interesting and relevant ML research available to people who don't have access to CERN-sized compute clusters, just as there is still plenty of interesting and relevant physics research outside of CERN.


> only massive companies and research groups with tons of GPU resources will be able to train next-gen models

Oh, don't be concerned about "approaching." We are already at this stage.


I'm not too concerned. GPUs scale very well horizontally and aren't affected by Moore's Law as much as single-thread-bound algorithms. We just discovered a new class of algorithms, and hardware needs time to catch up, from big GPU clusters down to end-user hardware.


I'm not particularly concerned about this tbh. What I mean is that ML is highly relevant with fewer resources as well, and the fact that only a handful of companies or research groups have the financial power to push the state of the art isn't really a problem.



