
I believe TensorFlow is a top-tier deep learning framework, and it has had ROCm support since 2018.

> edge TPU's are absolutely top-notch for performance per $ and Watt right now

Do you mean "aren't"? The performance per $ and Watt wasn't awesome even when it was released. I was hoping for great toolchain support, but that also didn't happen.




Tensorflow doesn't seem to officially support ROCm; only unofficial community projects do. This is official support from PyTorch.


Tensorflow does officially support ROCm. The project was started by AMD and later upstreamed.

https://github.com/tensorflow/tensorflow/tree/master/tensorf...

https://github.com/tensorflow/tensorflow/blob/master/tensorf...

It is true that it is not Google who distributes binaries compiled with ROCm support through PyPI (tensorflow and tensorflow-gpu are uploaded by Google, but tensorflow-rocm is uploaded by AMD). Is this what you meant by "not officially supporting"?
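
For what it's worth, once AMD's wheel is installed (pip install tensorflow-rocm; exact versions depend on your ROCm release), the usual device check should show the AMD GPU. A minimal sketch:

    import tensorflow as tf

    print(tf.__version__)
    # on a ROCm build, AMD GPUs show up as ordinary GPU devices
    print(tf.config.list_physical_devices("GPU"))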


What you describe sounds a lot like the PyTorch support before this announcement: you have been able to download PyTorch from AMD's ROCm site or build it yourself for >= 2 years now, and this worked very reliably. (Edit: The two years (Nov 2018 or so) are the ones I can attest to from using it personally, but it probably started even earlier.)

The news here is that the PyTorch team and AMD are confident enough about the quality that they're putting it on the front page. This has been a long time in the making, and finally achieving official support is a great step for the team working on it.
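
For anyone curious, the ROCm wheels reuse the torch.cuda API, so most existing CUDA code runs unchanged. A minimal sanity check, assuming a ROCm build of PyTorch is installed:

    import torch

    print(torch.version.hip)          # set on ROCm builds, None on CUDA builds
    print(torch.cuda.is_available())  # True if the AMD GPU is visible
    # "cuda" maps to the ROCm/HIP device on these builds
    x = torch.randn(1024, 1024, device="cuda")
    print((x @ x).sum().item())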


Oh, interesting. I do wonder if Google puts the same quality control and testing into the ROCm version, though. Otherwise it would really be a lower tier of official support.

Granted, I don't know anything about the quality of PyTorch's support either.


Jetson Nano: 1.4 TOPS/W, Coral TPU: 2 TOPS/W ?

Of course it doesn't really help that Google refuses to release a more powerful TPU that can compete with e.g. a Xavier NX or a V100 or RTX 3080, so for lots of applications there isn't much of a choice but to use NVIDIA.


Sorry, should have mentioned "if you have access to Shenzhen" in my post :)

What I have in mind is something like the RK3399Pro, which has a proprietary NPU at roughly 3 TOPS / 1.5 W (on paper), but its toolchain is rather hard to use. HiSilicon has similar offerings. There is also the Kendryte K210, which claims 1 TOPS @ 0.3 W, but I haven't had a chance to try it.

I was already playing with the RK3399Pro when the Edge TPU was announced. Life is tough when you have to feed your model into a black-box "model converter" from the vendor, and that's the part I hoped the Edge TPU would excel at. But months later I was greeted by... "to use the Edge TPU, you have to upload your TFLite model to our online model optimizer", which is worse!


There's now a black-box compiler that doesn't have to run on their service, but because it's still a black box it's basically the same as all the others now.
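
For reference, the offline flow looks roughly like this; it's a sketch, the file names are placeholders, and the model has to be fully int8-quantized before the compiler will accept it:

    import subprocess
    import tflite_runtime.interpreter as tflite

    # 1. compile the quantized TFLite model with the (closed-source) compiler
    subprocess.run(["edgetpu_compiler", "model_int8.tflite"], check=True)
    # -> writes model_int8_edgetpu.tflite to the working directory

    # 2. load it through the Edge TPU delegate and run as usual
    interpreter = tflite.Interpreter(
        model_path="model_int8_edgetpu.tflite",
        experimental_delegates=[tflite.load_delegate("libedgetpu.so.1")])
    interpreter.allocate_tensors()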


On Xavier, the dedicated AI inference block is open source hardware.

Available at http://nvdla.org/


Are there any <10W boards that have better performance/watt for object detection?

If it exists, I wanna buy it


I used the Intel Neural Compute Sticks [1] for my porn detection service [2] and they worked great. I could import and run models on a Pi with ease (see the sketch below).

[1] https://blog.haschek.at/2018/fight-child-pornography-with-ra... [2] https://nsfw-categorize.it/
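
For reference, with OpenVINO's (pre-2022) Python API the inference side looks roughly like this; "MYRIAD" is the device name for the compute stick, the IR file names are placeholders, and your exact setup may differ:

    import numpy as np
    from openvino.inference_engine import IECore

    ie = IECore()
    net = ie.read_network(model="model.xml", weights="model.bin")  # IR from the Model Optimizer
    exec_net = ie.load_network(network=net, device_name="MYRIAD")  # the NCS VPU

    input_name = next(iter(net.input_info))
    shape = net.input_info[input_name].input_data.shape
    result = exec_net.infer({input_name: np.zeros(shape, dtype=np.float32)})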


Very interesting project and article. You should consider submitting it to HN for its own post (if you have not done so already).


Jetson Xavier NX, but that comes with a high price tag. It's much more powerful, however.


I'll also add a caveat that toolage for Jetson boards is extremely incomplete.

They supply you with a bunch of sorely outdated models for TensorRT like Inceptionv3 and SSD-MobileNetv2 and VGG-16. WTF, it's 2021. If you want to use anything remotely state-of-the-art like EfficientDet or HRNet or Deeplab or whatever, you're left in the dark.

Yes, you can run TensorFlow or PyTorch (thankfully they give you wheels for those now; before, you had to google "How to install TensorFlow on Jetson" and wade through hundreds of forum pages), but they're not as fast at inference.


> I'll also add a caveat that toolage for Jetson boards is extremely incomplete.

A hundred times this. I was about to write another rant here but I already did that[0] a while ago, so I'll save my breath this time. :)

Another fun fact regarding toolage: Today I discovered that many USB cameras work poorly on Jetsons (at least when using OpenCV), probably due to different drivers and/or the fact that OpenCV doesn't support ARM64 as well as it does x86_64. :(

> They supply you with a bunch of sorely outdated models for TensorRT like Inceptionv3 and SSD-MobileNetv2 and VGG-16.

They supply you with such models? That's news to me. AFAIK converting something like SSD-MobileNetv2 from TensorFlow to TensorRT still requires substantial manual work and magic, as this code[1] attests to. There are countless (countless!) posts on the Nvidia forums by people complaining that they're not able to convert their models.

[0]: https://news.ycombinator.com/item?id=26004235

[1]: https://github.com/jkjung-avt/tensorrt_demos/blob/master/ssd... (In fact, this is the only piece of code I've found on the entire internet that managed to successfully convert my SSD-MobileNetV2.)


They provide some SSD-Mobilenet-v2 here:

https://github.com/dusty-nv/jetson-inference

Yeah, it works. I get 140 fps on a Xavier NX. It's super impressive for the wattage and size of the device. But they want you to train it using their horrid "DIGITS" interface, and it doesn't support any more recent networks.

I really wish Nvidia would stop trying to reinvent the wheel in training and instead focus on keeping up with properly parsing all the operations in the latest state-of-the-art networks, which are almost always in PyTorch or TF 2.x.
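
For reference, the runtime side of that repo is pleasantly short; roughly this (the camera and display URIs are just examples):

    import jetson.inference
    import jetson.utils

    net = jetson.inference.detectNet("ssd-mobilenet-v2", threshold=0.5)
    camera = jetson.utils.videoSource("/dev/video0")   # V4L2 camera
    display = jetson.utils.videoOutput("display://0")  # local window

    while display.IsStreaming():
        img = camera.Capture()
        detections = net.Detect(img)   # runs the TensorRT engine under the hood
        display.Render(img)
        display.SetStatus("{:.0f} FPS".format(net.GetNetworkFPS()))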


> They provide some SSD-Mobilenet-v2 here: https://github.com/dusty-nv/jetson-inference

I was aware of that repository but from taking a cursory look at it I had thought dusty was just converting models from PyTorch to TensorRT, like here[0, 1]. Am I missing something? (EDIT: Oh, never mind. You probably meant the model trained on COCO[2]. Now I remember that I ignored it way back when because I needed much better accuracy.)

> I get 140 fps on a Xavier NX

That really is impressive. Holy shit.

[0]: https://github.com/dusty-nv/jetson-inference/blob/master/doc...

[1]: https://github.com/dusty-nv/jetson-inference/issues/896#issu...

[2]: https://github.com/dusty-nv/jetson-inference/blob/master/doc...


You have https://github.com/NVIDIA-AI-IOT/torch2trt as an option, for example, to use your own models with TensorRT just fine.

And https://github.com/tensorflow/tensorrt for TF-TRT integration.
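
The happy path for torch2trt is short; roughly this, with resnet18 standing in for whatever model you actually care about:

    import torch
    from torch2trt import torch2trt
    from torchvision.models import resnet18

    model = resnet18(pretrained=True).eval().cuda()
    x = torch.ones((1, 3, 224, 224)).cuda()            # example input, fixes the engine's input shape
    model_trt = torch2trt(model, [x], fp16_mode=True)  # fp16_mode is optional
    y = model_trt(x)                                   # call it like the original module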


TF-TRT doesn't work nearly as well as pure TRT. On my Jetson Nano a 300x300 SSD-MobileNetV2 with 2 object classes runs at 5 FPS using TF, <10 FPS using TF-TRT and 30 FPS using TensorRT.


This. Try any recent network with TF-TRT and you'll find that memory is constantly being copied back and forth between TF and TRT components of the system every time it stumbles upon an operation not supported in TRT.

As such I often got slower results with TF-TRT than with just pure TF, and at best a marginal improvement. What TRT does is conceptually awesome from a deployment standpoint, though; if only it supported all the operations in TF, it could be a several-fold speed-up in many cases.
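
For anyone who hasn't tried it, the conversion itself is only a few lines (a sketch for TF 2.x; the SavedModel paths are placeholders), which makes the partial-support problem all the more frustrating:

    from tensorflow.python.compiler.tensorrt import trt_convert as trt

    params = trt.DEFAULT_TRT_CONVERSION_PARAMS._replace(
        precision_mode=trt.TrtPrecisionMode.FP16)
    converter = trt.TrtGraphConverterV2(
        input_saved_model_dir="saved_model",    # placeholder SavedModel path
        conversion_params=params)
    converter.convert()                # TRT-compatible subgraphs become TRTEngineOps
    converter.save("saved_model_trt")  # everything else stays as plain TensorFlow ops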


> even though what TRT does is conceptually awesome from a deployment standpoint

I thought the same until, earlier this week, I realized that if I convert a model to TensorRT, serialize it, and store it in a file, that file is specific to my device (i.e. my specific Jetson Nano), meaning that my colleagues can't run it on their Jetson Nano. What the actual fuck.
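
Concretely, the save/load flow is just this (Python API; the path is a placeholder), and the resulting file is only valid for the GPU architecture and TensorRT version it was built with:

    import tensorrt as trt

    logger = trt.Logger(trt.Logger.WARNING)

    # saving, right after the builder produced an ICudaEngine on *this* device:
    #   with open("model.engine", "wb") as f:
    #       f.write(engine.serialize())

    # loading it back later, which only works on a matching device/TRT version:
    with open("model.engine", "rb") as f, trt.Runtime(logger) as runtime:
        engine = runtime.deserialize_cuda_engine(f.read())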

Do you happen to have found a workaround for this? I really don't want to have to convert the model anew every single time I deploy it. There are just too many moving parts involved in the conversion process, dependency-wise.


https://github.com/wang-xinyu/tensorrtx has a lot of models implemented for TensorRT. They test on a GTX 1080 rather than a Jetson Nano, though, so some work is also needed.

TVM is another alternative for getting models to run inference fast on the Nano.
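
If you go the TVM route, the flow is roughly ONNX in, compiled module out. A rough sketch (API names from recent TVM releases; the model path, input name, and shapes are placeholders):

    import onnx
    import numpy as np
    import tvm
    from tvm import relay
    from tvm.contrib import graph_executor

    onnx_model = onnx.load("detector.onnx")      # placeholder model file
    shape_dict = {"input": (1, 3, 300, 300)}     # placeholder input name/shape
    mod, params = relay.frontend.from_onnx(onnx_model, shape_dict)

    with tvm.transform.PassContext(opt_level=3):
        lib = relay.build(mod, target="cuda", params=params)

    dev = tvm.cuda(0)
    module = graph_executor.GraphModule(lib["default"](dev))
    module.set_input("input", np.zeros((1, 3, 300, 300), dtype="float32"))
    module.run()
    out = module.get_output(0).numpy()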


How does TVM compare to TensorRT performance-wise?




