
I'm also curious about the potential speed gains in automatic differentiation, since there are far fewer branches to 'go up' through. Or am I wrong here?
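For what it's worth, here's a toy sketch of what fewer branches buys you in a tracing AD system (JAX here; the piecewise function is my own made-up example, not from the article):

    import jax
    import jax.numpy as jnp

    def branchy(x):
        # Python-level branch: plain jax.grad still works (it traces with
        # a concrete value), but the recorded graph only contains the side
        # that was taken, and jax.jit(jax.grad(branchy)) fails on the
        # data-dependent `if`.
        if x > 0:
            return x * x
        return -x

    def branchless(x):
        # Same piecewise function as one expression: a single graph with
        # no control flow for the backward pass to walk back up through,
        # and it compiles under jit as-is.
        return jnp.where(x > 0, x * x, -x)

    print(jax.grad(branchy)(3.0))               # 6.0
    print(jax.jit(jax.grad(branchless))(-2.0))  # -1.0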



They actually use a ReLU to represent the model weights. But I'm not convinced that it can't be avoided; we train gradient-boosted decision trees without this trick.
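My guess is the ReLU trick is something like building the hard split indicator 1[x > t] out of ReLUs so the tree becomes differentiable almost everywhere. A made-up stump sketch (plain NumPy; the threshold, sharpness, and leaf values are illustrative, not the paper's actual scheme):

    import numpy as np

    def relu(z):
        return np.maximum(z, 0.0)

    def soft_indicator(x, t, a=50.0):
        # Clipped ramp built from two ReLUs: 0 for x <= t, 1 for
        # x >= t + 1/a, linear in between. As a -> infinity this
        # approaches the hard step 1[x > t].
        return relu(a * (x - t)) - relu(a * (x - t) - 1.0)

    def stump(x, t=0.5, leaf_left=-1.0, leaf_right=2.0):
        # Route x to one of two leaf values via the ReLU-built indicator.
        g = soft_indicator(x, t)
        return (1.0 - g) * leaf_left + g * leaf_right

    x = np.array([0.0, 0.49, 0.52, 1.0])
    print(stump(x))  # ~[-1, -1, 2, 2]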



