I agree that a self-driving car is a much harder problem, if only because it has to contend with other cars.
That said, a small drone like this is harder than it looks, for a couple of reasons.
First, there's not enough payload capacity for a good, accurate IMU. You're stuck using lightweight but awfully noisy MEMS gyros and accelerometers along with drifty barometers.
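To make that concrete, the usual workaround is to fuse the gyro and accelerometer so each covers the other's weakness; the gyro is smooth but drifts, the accelerometer is absolute but noisy. A minimal sketch of a complementary filter, with the sample rate, gain, and axis convention all made up for illustration:

    import math

    DT = 0.005        # assumed 200 Hz sample period
    ALPHA = 0.98      # trust the gyro short-term, the accelerometer long-term

    pitch = 0.0       # estimated pitch angle, radians

    def update(gyro_pitch_rate, accel_x, accel_z):
        """Fuse a drifting gyro rate with a noisy accelerometer tilt estimate."""
        global pitch
        # Integrating the gyro alone drifts; the accelerometer alone is noisy
        # and corrupted by the drone's own accelerations. Blend the two.
        gyro_estimate = pitch + gyro_pitch_rate * DT
        accel_estimate = math.atan2(-accel_x, accel_z)   # gravity-referenced tilt
        pitch = ALPHA * gyro_estimate + (1.0 - ALPHA) * accel_estimate
        return pitch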
Plus, there's not enough payload capacity for an off-the-shelf LIDAR or other long-range depth sensing. That means obstacle detection has to fall to 2D imagers plus CV algorithms, backed by some short-range depth sensing. IR time-of-flight (like the Kinect 2) or structured light (like the Kinect 1's projected dot grid) might work in a favorable environment, but I wouldn't trust either outdoors, even at close range.
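By short-range depth sensing I mean something as dumb as watching the middle of a range image for anything close. A toy sketch, assuming a generic HxW depth frame in meters rather than any particular sensor's API, with thresholds pulled out of thin air:

    import numpy as np

    MIN_RANGE_M = 1.5       # react if anything central is closer than this
    VALID_MIN_M = 0.2       # readings below this are treated as noise/invalid

    def obstacle_ahead(depth_m: np.ndarray) -> bool:
        """depth_m: HxW array of ranges in meters, 0 or NaN where there's no return."""
        h, w = depth_m.shape
        center = depth_m[h // 3 : 2 * h // 3, w // 3 : 2 * w // 3]
        valid = center[np.isfinite(center) & (center > VALID_MIN_M)]
        # Outdoors, IR depth returns get sparse and unreliable, so require a
        # bunch of pixels to agree before calling it an obstacle.
        return valid.size > 50 and np.percentile(valid, 10) < MIN_RANGE_M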
If I had to build a pizza-delivery drone, I'd use a bottom-facing camera like the AR.Drone does to try to provide a position reference independent of the bad MEMS gear, and I'd use CV algorithms combined with a last-ditch short-range obstacle sensor (like IR or ultrasound) to attempt to avoid obstacles. Once I got to the destination, I'd delegate landing to a pilot, since "finding the front door" is a surprisingly hard problem.
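The bottom-facing-camera trick is basically optical flow scaled by altitude: pixel motion on the ground, divided by focal length and multiplied by height, gives horizontal velocity. A rough sketch, assuming a known focal length and an altitude estimate from the barometer or sonar:

    import cv2
    import numpy as np

    FOCAL_PX = 300.0   # assumed focal length in pixels
    DT = 1 / 30.0      # assumed frame period

    def velocity_from_flow(prev_gray, gray, altitude_m):
        """Return (vx, vy) in m/s from dense optical flow between two frames."""
        flow = cv2.calcOpticalFlowFarneback(prev_gray, gray, None,
                                            0.5, 3, 15, 3, 5, 1.2, 0)
        dx = np.median(flow[..., 0])   # median is robust to moving shadows, etc.
        dy = np.median(flow[..., 1])
        # Pinhole model: ground displacement ~ pixel displacement * altitude / focal.
        vx = dx * altitude_m / FOCAL_PX / DT
        vy = dy * altitude_m / FOCAL_PX / DT
        return vx, vy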
Multicopters like the one in the video have no trouble reaching much higher altitudes, where obstacles would likely be fewer and easier to detect. Given a specific delivery area, a reasonable cruising altitude could be preconfigured and the flight could probably be navigated by GPS with no CV whatsoever. Then, like you say, a human pilot could handle the landing and subsequent take-off.
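The GPS-only cruise leg really is just "how far and which way to the drop point," and a flat-earth approximation is plenty over pizza-delivery distances. A sketch, with the waypoint assumed to come in as plain lat/lon degrees:

    import math

    EARTH_R = 6371000.0  # meters

    def distance_and_bearing(lat1, lon1, lat2, lon2):
        """Both points in degrees; returns (meters to go, bearing in degrees from north)."""
        phi1, phi2 = math.radians(lat1), math.radians(lat2)
        dlat = math.radians(lat2 - lat1)
        dlon = math.radians(lon2 - lon1)
        x = dlon * math.cos((phi1 + phi2) / 2)   # east-west, shrunk by latitude
        y = dlat                                  # north-south
        dist = math.hypot(x, y) * EARTH_R
        bearing = (math.degrees(math.atan2(x, y)) + 360.0) % 360.0
        return dist, bearing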
The location of the pizzeria could even be chosen such that take-offs and landings at that end of the trip could be entirely automated, by GPS or perhaps with some much more basic CV.
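The "much more basic CV" at the pizzeria end could be as simple as a big, brightly colored landing pad found by color thresholding. A sketch, with made-up HSV bounds you'd have to tune for the actual pad and lighting:

    import cv2
    import numpy as np

    PAD_LO = np.array([5, 120, 120])     # assumed orange-ish pad, lower HSV bound
    PAD_HI = np.array([20, 255, 255])    # upper HSV bound

    def find_pad(frame_bgr):
        """Return the pad's pixel centroid, or None if it isn't in view."""
        hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
        mask = cv2.inRange(hsv, PAD_LO, PAD_HI)
        mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, np.ones((5, 5), np.uint8))
        m = cv2.moments(mask)
        if m["m00"] < 1e3:                # too few matching pixels: no pad
            return None
        return m["m10"] / m["m00"], m["m01"] / m["m00"]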