Latency between GPUs kills performance



It depends on just how huge the model is. Some models take multiple seconds per forward/backward pass and need hundreds of gigabytes of memory, in which case spreading them across GPUs could be useful despite the latency.
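A rough back-of-envelope sketch of that trade-off (all numbers below are assumed for illustration, not taken from the thread): the fixed latency cost per step matters less and less as the compute time per step grows.

    # Sketch: latency overhead amortizes when each training step is slow.
    # All numbers are hypothetical, chosen only for illustration.

    def overhead_fraction(compute_s: float, latency_s: float, round_trips: int = 1) -> float:
        """Fraction of a training step spent waiting on inter-GPU latency."""
        wait = round_trips * latency_s
        return wait / (compute_s + wait)

    for compute_s in (0.01, 0.1, 3.0):        # small vs. huge model step times
        frac = overhead_fraction(compute_s, latency_s=0.005, round_trips=4)
        print(f"step={compute_s:>5.2f}s  latency overhead={frac:.1%}")

With an assumed 5 ms link and 4 round trips per step, a 10 ms step loses about two thirds of its time to waiting, while a 3 s step loses under one percent.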


Also seems like a problem that could be partially solved by tailoring the NN architecture. Does that make sense?
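One possible reading of "tailoring the NN architecture" (my interpretation, not the commenter's): shape the network so that only a small tensor ever crosses the slow inter-GPU link. A minimal PyTorch sketch, assuming two devices named cuda:0 and cuda:1 and hypothetical layer sizes:

    import torch
    import torch.nn as nn

    class TwoDeviceNet(nn.Module):
        """Model split across two GPUs; only a narrow bottleneck crosses devices."""
        def __init__(self):
            super().__init__()
            # Wide blocks stay local to each GPU (sizes are assumed for illustration).
            self.stage0 = nn.Sequential(nn.Linear(4096, 4096), nn.ReLU(),
                                        nn.Linear(4096, 64)).to("cuda:0")   # 64-dim bottleneck
            self.stage1 = nn.Sequential(nn.Linear(64, 4096), nn.ReLU(),
                                        nn.Linear(4096, 10)).to("cuda:1")

        def forward(self, x):
            h = self.stage0(x.to("cuda:0"))
            # Only the small 64-dim activation travels over the slow link.
            return self.stage1(h.to("cuda:1"))

With a 64-dimensional bottleneck at the split point, only a few hundred bytes per sample cross the link each step, so latency and bandwidth between the GPUs matter far less than they would for a naive split.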


Do you mean like Stochastic Gradient Descent does?



