I imagine that edge cases will be flagged and checked by a human operator, a bit like postal services do with machine-unreadable addresses. Deep learning isn't perfect - how would you suggest those cases be handled?
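To make the "flag edge cases for a human" idea concrete, here is a minimal sketch of confidence-based routing. The names `Prediction`, `route`, and `REVIEW_THRESHOLD`, and the 0.95 cutoff, are all assumptions for illustration, not anything Amazon has described.

```python
from dataclasses import dataclass

# Hypothetical cutoff; a real system would tune it against its own error costs.
REVIEW_THRESHOLD = 0.95


@dataclass
class Prediction:
    label: str
    confidence: float  # classifier's probability for its top label, in [0, 1]


def route(prediction: Prediction, threshold: float = REVIEW_THRESHOLD) -> str:
    """Return 'auto' to accept the prediction, or 'human' to queue it for review."""
    return "auto" if prediction.confidence >= threshold else "human"


# Made-up events: one confident pick and one ambiguous pick.
events = [Prediction("soap", 0.99), Prediction("shampoo", 0.62)]
decisions = [route(p) for p in events]
```

Everything below the threshold goes to the review queue, so the threshold directly trades automation rate against operator workload.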
I wouldn't make any claims like Amazon did unless it was over 95% automated, with humans involved only in cases of dispute, etc.
So, face recognition to track people + image recognition to track products (which is not that difficult, since it's a constrained data set, e.g. 40,000 products to train on, and they can also use information about position to narrow results -- e.g. aisle B, so it's one of the soaps, etc).
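As a rough sketch of the "aisle B, so it's one of the soaps" narrowing: restrict the classifier's candidates to products normally shelved in the aisle where the pick happened. The catalogue `HOME_AISLE`, the function `narrow_by_aisle`, and the scores are all made up for illustration.

```python
# Hypothetical catalogue: product -> aisle where it is normally shelved.
HOME_AISLE = {"bar soap": "B", "liquid soap": "B", "butter": "F", "cereal": "C"}


def narrow_by_aisle(scores: dict[str, float], aisle: str) -> str:
    """Pick the highest-scoring product among those shelved in the given aisle."""
    candidates = {p: s for p, s in scores.items() if HOME_AISLE.get(p) == aisle}
    if not candidates:
        raise ValueError(f"no known products in aisle {aisle!r}")
    return max(candidates, key=candidates.get)


# Made-up image-classifier scores for one pick event in aisle B.
scores = {"bar soap": 0.40, "butter": 0.45, "liquid soap": 0.10, "cereal": 0.05}
best = narrow_by_aisle(scores, "B")  # butter is excluded because it lives in aisle F
```

Note that this is a hard filter: it assumes every item in an aisle actually belongs there.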
"image recognition to track products (which is not that difficult, since it's a constrained data set e.g. 40.000 products to train, and they can also use information about position to narrow results -- eg. aisle B, so it's one of the soaps, etc)."
Wish it was that easy. People have a habit of picking stuff up and putting it back in the wrong place. Store staff spend forever putting stuff back in its correct place and binning stuff that should have been refrigerated but was left somewhere unrefrigerated.
"Well, the stuff will already have been identified in the picking stage."
Sure, but then you take some semi-random object out of the basket and put it on the wrong shelf. Do that a few times and things are going to get much harder.
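One way to picture the misplaced-item problem: if location is used as a soft prior instead of a hard filter, an item on the wrong shelf can still win when the image evidence is strong. This is only a sketch under assumed numbers; `HOME_AISLE`, `rerank`, and the `misplaced_weight` of 0.2 are hypothetical.

```python
# Hypothetical catalogue: product -> aisle where it is normally shelved.
HOME_AISLE = {"bar soap": "B", "liquid soap": "B", "butter": "F", "cereal": "C"}


def rerank(scores: dict[str, float], aisle: str,
           misplaced_weight: float = 0.2) -> dict[str, float]:
    """Down-weight products that do not belong in this aisle, then renormalise."""
    weighted = {
        p: s * (1.0 if HOME_AISLE.get(p) == aisle else misplaced_weight)
        for p, s in scores.items()
    }
    total = sum(weighted.values())
    if total == 0:
        raise ValueError("all scores are zero")
    return {p: w / total for p, w in weighted.items()}


# Strong visual evidence for butter, even though the pick happened in aisle B.
clear = rerank({"butter": 0.90, "bar soap": 0.05, "liquid soap": 0.03, "cereal": 0.02}, "B")
# Ambiguous evidence: here the aisle prior tips the result towards soap.
unclear = rerank({"butter": 0.45, "bar soap": 0.40, "liquid soap": 0.10, "cereal": 0.05}, "B")

clear_best = max(clear, key=clear.get)
unclear_best = max(unclear, key=unclear.get)
```

The weight encodes how often items are expected to be out of place; the more shoppers reshelve things at random, the less the position information helps.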
If we're honest, there will be wrong classifications regardless - if it weren't for laser scanners, humans would make more mistakes as well, and these systems are nowhere near human intelligence.