My initial guess is that they have nothing to do with each other. It would be like explaining why the next idea pops into your head: you can construct a rational explanation, but there's no way to test it.



My thoughts too, based on a limited understanding of GPT. But the more pressure you apply toward compressing the neural network during training, the more circuitry these paths are likely to share. It would be interesting to see just how much, and which parts, could be folded together before you start to lose significant fidelity (though unfortunately the fidelity seems too low today to even try that).
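To make "pressure toward compressing the network" concrete, here is a minimal sketch of one common way that pressure is applied: an L1 penalty on the weights added to the training loss, which pushes weights toward zero so that surviving parameters get reused across behaviours. The toy model, the data, and the `lambda_l1` knob are all hypothetical illustrations, not anything described in the thread.

```python
import torch
import torch.nn as nn

# Toy stand-in for a network; the thread is speculating about GPT-scale models.
model = nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 64))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.MSELoss()

lambda_l1 = 1e-4  # hypothetical knob: how hard to push toward a sparser net

def training_step(x, y):
    optimizer.zero_grad()
    task_loss = criterion(model(x), y)
    # "Compression pressure": an L1 penalty drives weights toward zero,
    # encouraging different behaviours to share the remaining circuitry.
    l1_penalty = sum(p.abs().sum() for p in model.parameters())
    loss = task_loss + lambda_l1 * l1_penalty
    loss.backward()
    optimizer.step()
    return loss.item()

# Example usage with random data
x = torch.randn(32, 64)
y = torch.randn(32, 64)
print(training_step(x, y))
```

Raising `lambda_l1` (or swapping in pruning or quantization-aware training) would be one way to probe how much can be folded together before fidelity drops, in the spirit of the comment above.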



