Show HN: SOTA semantic segmentation with MobileNetV3, in 3 lines of PyTorch code

p1esk · on Aug 12, 2020

Can you please explain what you did there, and how it compares with the official state of the art?

ekzhang · on Aug 12, 2020

Yeah, so this work was done at Nvidia as part of a larger project that required semantic segmentation. Although MobileNetV3 has state-of-the-art performance for a mobile-sized network on many image tasks, such as classification, detection, and segmentation, there are no public implementations of MobileNetV3 for semantic segmentation with good accuracy.

Looking through implementations on GitHub, we saw accuracies of 40-50% mIoU, which is frankly unacceptable given that the paper claims 72.6% mIoU. So over the past few months at Nvidia, I worked with some researchers from ADLR (https://nv-adlr.github.io/) to implement MobileNetV3 in PyTorch. After a bunch of hyperparameter tuning, we managed to train it to within 0.3% of the accuracy reported in the paper: https://arxiv.org/abs/1905.02244v5. See the "Metrics" section of the GitHub README for more detailed information.
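For readers unfamiliar with the metric: mIoU (mean intersection-over-union) computes, for each class, TP / (TP + FP + FN) over all labeled pixels, then averages across classes. Below is a minimal NumPy sketch, assuming predictions and labels are integer class-id arrays and that 255 marks unlabeled pixels (a common convention, assumed here). It is an illustration of the metric, not the evaluation code from the repository.

```python
import numpy as np

def confusion_matrix(pred, target, num_classes, ignore_index=255):
    """Count (ground truth, prediction) pairs into a num_classes x num_classes matrix.
    Rows are ground-truth classes, columns are predicted classes."""
    pred = np.asarray(pred).ravel()
    target = np.asarray(target).ravel()
    valid = target != ignore_index  # drop unlabeled pixels (ignore_index is an assumption)
    idx = num_classes * target[valid].astype(np.int64) + pred[valid].astype(np.int64)
    return np.bincount(idx, minlength=num_classes ** 2).reshape(num_classes, num_classes)

def mean_iou(conf):
    """Return (mIoU, per-class IoU); classes never seen or predicted are NaN and skipped."""
    tp = np.diag(conf).astype(np.float64)
    fp = conf.sum(axis=0) - tp  # predicted as class c, but ground truth is another class
    fn = conf.sum(axis=1) - tp  # ground truth is class c, but predicted as another class
    denom = tp + fp + fn
    iou = np.divide(tp, denom, out=np.full_like(tp, np.nan), where=denom > 0)
    return float(np.nanmean(iou)), iou

pred = [[0, 0, 1], [1, 1, 2]]
target = [[0, 1, 1], [1, 2, 2]]
miou, per_class = mean_iou(confusion_matrix(pred, target, num_classes=3))
print(miou)  # 0.5 (each of the three classes has IoU 0.5 here)
```

Benchmarks such as Cityscapes sum the confusion matrix over the whole evaluation set before computing IoU, rather than averaging per-image scores, so the two approaches can give different numbers.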

Also, unlike other code releases, this repository is meant to be _easy to use_, in that it works out of the box (just install with pip) and is extremely fast. My goal in open sourcing these models was to make it easier for others to do the same kind of work.

I'm looking forward to seeing what people do with these models. :)

p1esk · on Aug 12, 2020

Awesome, thanks! It would be really valuable if you described the tricks you used to reach that accuracy: which hyperparams turned out to be important, what was missing from the paper, and how you did the hyperparam tuning. Not here, but on the GitHub page. Your advice will probably outlast your results :)