
My long-standing observation has been that while nature may abhor a vacuum, she also really, really loves sigmoids.

That performance vs. training data is logarithmic rather than linear doesn't exactly come as a surprise.
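For illustration, a minimal sketch (the dataset sizes and scores here are made up) of fitting that relationship, score ≈ a + b·ln(n), with numpy:

    import numpy as np

    # Hypothetical benchmark scores at increasing training-set sizes.
    n = np.array([1e6, 1e7, 1e8, 1e9, 1e10])
    score = np.array([0.42, 0.51, 0.60, 0.69, 0.78])

    # A straight-line fit in log(n): each 10x increase in data
    # buys a roughly constant improvement in score.
    b, a = np.polyfit(np.log(n), score, deg=1)
    print(f"score ~= {a:.3f} + {b:.3f} * ln(n)")
    print(f"gain per 10x data: {b * np.log(10):.3f}")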




The unanswered question is whether the logarithmic performance improvement results from better sampling of the underlying distribution over time, or simply from doing more training with slight variations, which effectively regularizes the model so it generalizes better. If it's the former, that indicates we could achieve small models that are every bit as smart as large ones in limited domains, and if that's the case, it radically changes the landscape of what an optimal model architecture is.

I suspect, from the success of Phi-3, that it is in fact the former.


Every exponential is really an S-curve?


In physical systems, it’s very often the case!


Pretty much always. One exception might be the expansion of the universe.
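To make that concrete, here's a quick sketch (arbitrary parameters) comparing a pure exponential with a logistic curve that starts at the same value and grows at the same rate. The two track each other while far below the ceiling K; then the logistic saturates while the exponential blows past it.

    import numpy as np

    K, r, p0 = 1.0, 1.0, 0.01   # ceiling, growth rate, starting value
    t = np.linspace(0, 10, 6)

    exponential = p0 * np.exp(r * t)
    logistic = K / (1 + ((K - p0) / p0) * np.exp(-r * t))

    for ti, e, s in zip(t, exponential, logistic):
        print(f"t={ti:4.1f}  exp={e:8.3f}  logistic={s:6.3f}")

Early on the curves are nearly identical, which is why an early exponential and an early sigmoid are indistinguishable from the data alone.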


Symmetries in Nature strike again! It's just like in Noether's theorem.


Noether's theorem has nothing to do with S-curves.


The physical world has limits; that's why there are sigmoids everywhere.



