The unanswered question is this: is the logarithmic performance improvement the result of better sampling of the underlying distribution over time, or just of doing more training with slight variations, which effectively regularizes the model so it generalizes better? If it's the former, that suggests we could build small models that are every bit as smart as large ones in limited domains, and if that's the case, it radically changes what an optimal model architecture looks like.
I suspect from the success of Phi3 that it is in fact the former.
That performance vs. training data is logarithmic rather than linear doesn't exactly come as a surprise.
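To make the shape of that claim concrete, here's a minimal sketch of what "logarithmic" means in practice: fit score ≈ a + b·ln(tokens) to a few data points and see what a 10x increase in data buys you. All numbers below are made up for illustration, not measurements from any real model.

```python
# Hypothetical illustration: checking whether performance scales
# logarithmically with training-set size by fitting
# score ≈ a + b * ln(tokens) to a few observed points.
import numpy as np

tokens = np.array([1e9, 1e10, 1e11, 1e12])   # training tokens (made up)
score  = np.array([42.0, 51.0, 59.5, 68.0])  # benchmark score (made up)

# Least-squares fit of score = a + b * ln(tokens);
# polyfit returns coefficients highest-degree first.
b, a = np.polyfit(np.log(tokens), score, 1)
print(f"fit: score ~ {a:.1f} + {b:.2f} * ln(tokens)")

# Under a log fit, every 10x more data buys roughly b * ln(10) points,
# i.e. constant absolute gains for exponentially more data.
print(f"gain per 10x data ~ {b * np.log(10):.1f} points")
```

If that's roughly the curve, the interesting part isn't the shape itself but which of the two explanations above produces it.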