Latency between GPUs kills performance
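Rough back-of-envelope (every number below is a made-up assumption, just to show the shape of the problem):

    # Compare time to exchange gradients over the network vs. compute time
    # per training step. All figures are invented placeholders.
    params = 100e6            # assume a 100M-parameter model
    grad_bytes = params * 4   # fp32 gradients

    bandwidth = 1e9 / 8       # assume a 1 Gbit/s link, in bytes/s
    latency = 50e-3           # assume a 50 ms round trip between machines

    sync_time = latency + grad_bytes / bandwidth  # naive full exchange
    compute_time = 0.1                            # assume 100 ms per step

    print(f"sync {sync_time:.2f}s vs compute {compute_time:.2f}s")
    # sync 3.25s vs compute 0.10s: the link, not the GPUs, sets the pace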



It depends on just how huge the model is. Some models take multiple seconds per forward/backward pass and need hundreds of gigabytes of memory, in which case spreading the work across machines could still be useful despite the latency.
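Quick sketch of where "hundreds of gigabytes" comes from, assuming fp32 training with Adam (the 10B parameter count is just an illustration):

    params  = 10e9                # hypothetical 10B-parameter model
    weights = params * 4          # fp32 weights
    grads   = params * 4          # one gradient per weight
    adam    = params * 8          # Adam's two extra fp32 buffers
    print((weights + grads + adam) / 1e9, "GB, before activations")
    # 160.0 GB, before activations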


Also seems like a problem that could be partially solved by tailoring the NN architecture. Does that make sense?
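One hand-wavy version of the idea, with invented sizes: keep the wide, expensive layers local to each device and route everything through a narrow bottleneck, so only a tiny activation crosses the slow link.

    import numpy as np

    rng = np.random.default_rng(0)
    W_a  = rng.standard_normal((4096, 4096))  # heavy block, stays on device A
    W_bn = rng.standard_normal((4096, 32))    # bottleneck layer
    W_b  = rng.standard_normal((32, 4096))    # heavy block, stays on device B

    x = rng.standard_normal(4096)
    h = np.tanh(W_a @ x)      # local to device A
    z = np.tanh(h @ W_bn)     # the only thing sent between devices
    y = np.tanh(z @ W_b)      # local to device B
    print(z.nbytes, "bytes per example across the link")  # 256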


Do you mean like Stochastic Gradient Descent does?
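i.e. trading exact computation for cheap approximate steps. For reference, a toy version in plain NumPy:

    import numpy as np

    # Plain SGD on least squares: each step uses the gradient from one
    # random sample instead of the full dataset.
    rng = np.random.default_rng(0)
    X = rng.standard_normal((1000, 3))
    true_w = np.array([2.0, -1.0, 0.5])
    y = X @ true_w + 0.01 * rng.standard_normal(1000)

    w = np.zeros(3)
    for step in range(5000):
        i = rng.integers(len(X))           # one random sample
        grad = (X[i] @ w - y[i]) * X[i]    # d/dw of 0.5 * (x.w - y)^2
        w -= 0.01 * grad
    print(w)  # approaches [2.0, -1.0, 0.5]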



