Hacker News
sva_
on Feb 28, 2024
| on:
The Era of 1-bit LLMs: ternary parameters for cost...
I'm also curious about the potential speed gains in automatic differentiation, since there are far fewer branches to 'go up'. Or am I wrong here?
lumost
on Feb 28, 2024
They actually use a ReLU to represent the model weights, but I'm not convinced that this can't be avoided. We train gradient-boosted decision trees without this trick.