
Well, it's not a completely meaningless metric, as it immediately tells you roughly how much memory you need to load it, which is kind of important?



If you look at my suggestion, it's to state exactly that memory, rather than to estimate it from bits/parameter.
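For reference, the bits/parameter estimate being discussed is just parameter count times bits per parameter, divided by 8. A rough sketch of that arithmetic follows; the function names and the 7-billion-parameter model are made up for illustration, and it counts weights only (no activations, KV cache, or runtime overhead), which is part of why a stated memory figure may differ from the estimate.

```python
def estimate_weight_bytes(num_params: int, bits_per_param: float) -> float:
    """Rough bytes needed to hold the weights alone (hypothetical helper)."""
    return num_params * bits_per_param / 8


def format_bytes(num_bytes: float) -> str:
    """Format a byte count with decimal (power-of-1000) prefixes."""
    units = ["B", "KB", "MB", "GB", "TB", "PB"]
    value = float(num_bytes)
    for unit in units:
        # Stop at the first unit where the value fits, or at the largest unit.
        if value < 1000 or unit == units[-1]:
            return f"{value:.1f} {unit}"
        value /= 1000


# Hypothetical 7-billion-parameter model at two precisions.
params = 7_000_000_000
for bits in (16, 4):
    print(f"{bits} bits/param: {format_bytes(estimate_weight_bytes(params, bits))}")
```

The same parameter count gives different memory figures depending on the precision assumed, so the estimate is only as good as the bits/parameter guess behind it.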


Well then do explain a bit further, I still don't fully grasp what "100s PT in 0.5T" means exactly. 100 petatokens in half a trillion? Half a terabyte? 100 seconds?

Plus afaik base model training tokens don't have the same effect as fine-tuning tokens, so there would need to be a way to specify each of those separately.


FWIW I easily interpreted these as '100s of petabytes' and '0.5 terabytes' without having to give it too much thought. The original comment explicitly specified 'bytes' as the unit being suggested.


I edited it to say TB and PB. I was thinking of these as prefixes on bytes.



