If I understand the authors correctly, they trained all of the compared models on the same 100B tokens, drawn entirely from RedPajama, to keep the comparison apples-to-apples. That's sensible, and it also makes the results easier to replicate. Otherwise, I agree with you that more extensive testing, after more extensive pretraining and at larger model sizes, is still necessary.