I believe TensorFlow is a top-tier deep learning framework, and it has had ROCm support since 2018.
> edge TPU's are absolutely top-notch for performance per $ and Watt right now
Do you mean "aren't"? The performance per $ and Watt wasn't awesome even when it was released. I was hoping for great toolchain support, but that didn't happen either.
It is true that it is not Google who distributes the binaries compiled with ROCm support through PyPI (tensorflow and tensorflow-gpu are uploaded by Google, but tensorflow-rocm is uploaded by AMD). Is this what you meant by "not officially supporting"?
What you describe sounds a lot like the PyTorch support before this announcement: you could download PyTorch from AMD's ROCm site or build it yourself for >= 2 years now, and this has worked very reliably. (Edit: The two years (Nov 2018 or so) are the ones I can attest to from personal use, but support probably started even earlier.)
The news here is that the PyTorch team and AMD are confident enough about the quality that they're putting it on the front page. This has been a long time in the making, and finally achieving official support is a great step for the team working on it.
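For what it's worth, the ROCm wheels reuse the regular torch.cuda API (HIP masquerades as CUDA), so checking that a build actually sees the AMD GPU only takes a couple of lines. A minimal sketch, assuming a ROCm build of PyTorch is installed:

    import torch

    # On a ROCm build, torch.version.hip is a version string; on CUDA builds it is None.
    print("HIP:", torch.version.hip)

    # The ROCm backend reuses the torch.cuda namespace, so the usual calls work unchanged.
    if torch.cuda.is_available():
        x = torch.randn(1024, 1024, device="cuda")  # this lands on the AMD GPU
        print((x @ x).sum().item())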
Oh, interesting. I do wonder if Google puts the same quality control and testing into the ROCm version, though. Otherwise it would really be a lower tier of official support.
Granted, I don't know anything about the quality of PyTorch's support either.
Of course it doesn't really help that Google refuses to release a more powerful TPU that can compete with e.g. a Xavier NX, a V100, or an RTX 3080, so for lots of applications there isn't much of a choice but to use NVIDIA.
Sorry, should have mentioned "if you have access to Shenzhen" in my post :)
What I have in mind is something like the RK3399Pro, which has a proprietary NPU at roughly 3 TOPS / 1.5 W (on paper), but its toolchain is rather hard to use. HiSilicon has similar offerings. There is also the Kendryte K210, which claims 1 TOPS @ 0.3 W, but I haven't gotten a chance to try it.
I was already playing with the RK3399Pro when the Edge TPU was announced; life is tough when you have to feed your model into a black-box "model converter" from the vendor. That's the part I hoped the Edge TPU would excel at. But months later I was greeted by... "to use the Edge TPU, you have to upload your TFLite model to our online model optimizer", which is worse!
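For context on what that flow involves: the part you can at least run locally is the full-integer quantization into TFLite, which the Edge TPU compiler (the online one back then) takes as input. Roughly, as a sketch (the SavedModel path and the calibration generator are placeholders for your own):

    import tensorflow as tf

    def representative_data():
        # Placeholder calibration data; in practice, yield a few hundred real input batches.
        for _ in range(100):
            yield [tf.random.uniform([1, 224, 224, 3])]

    converter = tf.lite.TFLiteConverter.from_saved_model("my_saved_model/")
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    converter.representative_dataset = representative_data
    # The Edge TPU needs full int8 quantization, including the input/output tensors.
    converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
    converter.inference_input_type = tf.uint8
    converter.inference_output_type = tf.uint8

    with open("model_int8.tflite", "wb") as f:
        f.write(converter.convert())

Only after all of that does the vendor-specific compiler step even begin, which is why a black-box converter on top is so painful.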
I'll also add a caveat that toolage for Jetson boards is extremely incomplete.
They supply you with a bunch of sorely outdated models for TensorRT like Inceptionv3 and SSD-MobileNetv2 and VGG-16. WTF, it's 2021. If you want to use anything remotely state-of-the-art like EfficientDet or HRNet or DeepLab or whatever, you're left in the dark.
Yes, you can run TensorFlow or PyTorch (thankfully they give you wheels for those now; before, you had to google "How to install TensorFlow on Jetson" and wade through hundreds of forum pages), but they're not as fast at inference.
> I'll also add a caveat that toolage for Jetson boards is extremely incomplete.
A hundred times this. I was about to write another rant here but I already did that[0] a while ago, so I'll save my breath this time. :)
Another fun fact regarding toolage: Today I discovered that many USB cameras work poorly on Jetsons (at least when using OpenCV), probably due to different drivers and/or the fact that OpenCV doesn't support ARM64 as well as it does x86_64. :(
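In case it saves someone a day of debugging: a common workaround is to stop OpenCV from guessing and force the V4L2 backend (and MJPG) explicitly. A sketch, assuming the camera shows up as /dev/video0:

    import cv2

    # Force the V4L2 backend instead of whatever OpenCV auto-detects.
    cap = cv2.VideoCapture(0, cv2.CAP_V4L2)
    # Many UVC cameras only deliver decent frame rates in MJPG, not raw YUYV.
    cap.set(cv2.CAP_PROP_FOURCC, cv2.VideoWriter_fourcc(*"MJPG"))
    cap.set(cv2.CAP_PROP_FRAME_WIDTH, 1280)
    cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 720)

    ok, frame = cap.read()
    print("frame grabbed:", ok, frame.shape if ok else None)
    cap.release()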
> They supply you with a bunch of sorely outdated models for TensorRT like Inceptionv3 and SSD-MobileNetv2 and VGG-16.
They supply you with such models? That's news to me. AFAIK converting something like SSD-MobileNetv2 from TensorFlow to TensorRT still requires substantial manual work and magic, as this code[1] attests to. There are countless (countless!) posts on the Nvidia forums by people complaining that they're not able to convert their models.
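For anyone wondering what "converting to TensorRT" even looks like without the magic: the generic route is model -> ONNX -> TRT engine, roughly like the sketch below on the TRT 7.x that JetPack ships (paths are placeholders; the SSD graphs need extra surgery and plugins on top of this, which is exactly the manual work I mean):

    import tensorrt as trt

    logger = trt.Logger(trt.Logger.WARNING)
    builder = trt.Builder(logger)
    network = builder.create_network(
        1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
    parser = trt.OnnxParser(network, logger)

    with open("model.onnx", "rb") as f:
        if not parser.parse(f.read()):
            # This is where unsupported ops show up and the "magic" begins.
            for i in range(parser.num_errors):
                print(parser.get_error(i))
            raise SystemExit("ONNX parse failed")

    config = builder.create_builder_config()
    config.max_workspace_size = 1 << 28             # 256 MB; renamed in newer TRT versions
    engine = builder.build_engine(network, config)  # deprecated in TRT 8, fine on JetPack's TRT 7

    with open("model.trt", "wb") as f:
        f.write(engine.serialize())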
Yeah, it works. I get 140 fps on a Xavier NX. It's super impressive for the wattage and size of the device. But they want you to train it using their horrid "DIGITS" interface, and it doesn't support any more recent networks.
I really wish Nvidia would stop trying to reinvent the wheel in training and focus on being able to properly parse all the operations in the latest state-of-the-art networks, which are almost always in PyTorch or TF 2.x.
I was aware of that repository but from taking a cursory look at it I had thought dusty was just converting models from PyTorch to TensorRT, like here[0, 1]. Am I missing something? (EDIT: Oh, never mind. You probably meant the model trained on COCO[2]. Now I remember that I ignored it way back when because I needed much better accuracy.)
TF-TRT doesn't work nearly as well as pure TRT. On my Jetson Nano a 300x300 SSD-MobileNetV2 with 2 object classes runs at 5 FPS using TF, <10 FPS using TF-TRT and 30 FPS using TensorRT.
This. Try any recent network with TF-TRT and you'll find that memory is constantly being copied back and forth between TF and TRT components of the system every time it stumbles upon an operation not supported in TRT.
As such, I often got slower results with TF-TRT than with just pure TF, and at most a marginal improvement, even though what TRT does is conceptually awesome from a deployment standpoint. If it only supported all the operations in TF, it could be a several-fold speed-up in many cases.
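To make that concrete: TF-TRT is just a graph rewrite where only the TRT-supported subgraphs get replaced and everything else stays as plain TF, so execution keeps hopping between the two. A rough sketch with the TF 2.x converter (the SavedModel paths are placeholders):

    from tensorflow.python.compiler.tensorrt import trt_convert as trt

    params = trt.DEFAULT_TRT_CONVERSION_PARAMS._replace(
        precision_mode=trt.TrtPrecisionMode.FP16)
    converter = trt.TrtGraphConverterV2(
        input_saved_model_dir="saved_model/",
        conversion_params=params)

    converter.convert()                 # TRT-supported segments become TRTEngineOp nodes
    converter.save("saved_model_trt/")  # anything TRT can't handle stays a regular TF op,
                                        # and that boundary is where the copying happens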
> even though what TRT does is conceptually awesome from a deployment standpoint
I thought the same until, earlier this week, I realized that if I convert a model to TensorRT, serialize it, and store it in a file, that file is specific to my device (i.e. my specific Jetson Nano), meaning my colleagues can't run that file on their Jetson Nanos. What the actual fuck.
Do you happen to have found a workaround for this? I really don't want to have to convert the model anew every single time I deploy it. There are just too many moving parts involved in the conversion process, dependency-wise.
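The best I've managed so far is to lean into it: ship the portable ONNX and have each device build and cache its own engine on first run, then just deserialize the cached file afterwards. A sketch (TRT 7-style API, file names are placeholders), though I'd still love something better:

    import os
    import tensorrt as trt

    logger = trt.Logger(trt.Logger.WARNING)
    ENGINE_CACHE = "model.engine"   # engine files are tied to the specific GPU + TRT version

    def load_or_build_engine(onnx_path="model.onnx"):
        runtime = trt.Runtime(logger)
        if os.path.exists(ENGINE_CACHE):
            # Fast path: this device already built its own engine, just deserialize it.
            with open(ENGINE_CACHE, "rb") as f:
                return runtime.deserialize_cuda_engine(f.read())

        # Slow path (first run on this device): rebuild from the portable ONNX.
        builder = trt.Builder(logger)
        network = builder.create_network(
            1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
        parser = trt.OnnxParser(network, logger)
        with open(onnx_path, "rb") as f:
            parser.parse(f.read())
        config = builder.create_builder_config()
        config.max_workspace_size = 1 << 28
        engine = builder.build_engine(network, config)
        with open(ENGINE_CACHE, "wb") as f:
            f.write(engine.serialize())
        return engine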
https://github.com/wang-xinyu/tensorrtx has a lot of models implemented for TensorRT.
They test on a GTX 1080, not a Jetson Nano, though, so some work is also needed.
TVM is another alternative for getting models to run inference fast on the Nano.
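If anyone wants to poke at it, the TVM route is roughly ONNX -> Relay -> compiled module for the GPU; a minimal sketch with the older Relay API, assuming you already have a model.onnx and know its input name/shape:

    import onnx
    import tvm
    from tvm import relay

    model = onnx.load("model.onnx")              # placeholder model
    shape_dict = {"input": (1, 3, 300, 300)}     # must match the ONNX graph's input name/shape
    mod, params = relay.frontend.from_onnx(model, shape_dict)

    # Compile for the GPU; auto-tuning on the target device is what gets the real speed.
    with tvm.transform.PassContext(opt_level=3):
        lib = relay.build(mod, target="cuda", params=params)

    lib.export_library("model_tvm.so")           # load later with tvm.runtime.load_module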