Hacker News
new
|
past
|
comments
|
ask
|
show
|
jobs
|
submit
login
bob1029
4 months ago
|
parent
|
context
|
favorite
| on:
Deep Learning Is Not So Mysterious or Different
And, the attention mechanism scales quadratically with context length. This is where all of the insane memory bandwidth requirements come from.
Consider applying for YC's Fall 2025 batch! Applications are open till Aug 4
Guidelines
|
FAQ
|
Lists
|
API
|
Security
|
Legal
|
Apply to YC
|
Contact
Search: