
This. Try any recent network with TF-TRT and you'll find that memory is constantly copied back and forth between the TF and TRT components of the system every time execution hits an operation that TRT doesn't support.

As such, I often got slower results with TF-TRT than with pure TF, and at best a marginal improvement. That's a shame, because what TRT does is conceptually awesome from a deployment standpoint; if it only supported all the operations in TF, it could be a several-fold speedup in many cases.

> even though what TRT does is conceptually awesome from a deployment standpoint

I thought the same until, earlier this week, I realized that if I convert a model to TensorRT, serialize it, and store it in a file, that file is specific to my device (i.e. my specific Jetson Nano), meaning my colleagues can't run it on their own Jetson Nanos. What the actual fuck.

Do you happen to have found a workaround for this? I really don't want to have to convert the model anew every single time I deploy it; there are just too many moving parts involved in the conversion process, dependency-wise.

