Learning how to label camera images and learning how to respond to already-detected features are separate problems, and you can choose whether to apply unsupervised training to each one independently.
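To make that split concrete, here is a minimal sketch of a two-stage pipeline in which the perception stage and the response stage are independent, separately trainable components. All names and types here are my own, purely for illustration:

```python
from dataclasses import dataclass

@dataclass
class Detection:
    """A feature the perception stage found in a camera frame."""
    label: str          # e.g. "car", "road_sign", "bridge"
    box: tuple          # (x, y, width, height) in pixels
    confidence: float   # detector score in [0, 1]

def detect_objects(image) -> list[Detection]:
    """Perception stage: typically trained with supervised learning on
    manually labeled images. Stubbed here for illustration."""
    return [Detection(label="car", box=(120, 80, 64, 48), confidence=0.95)]

def plan_response(detections: list[Detection]) -> str:
    """Response stage: can be hand-written rules, or trained separately
    (supervised or unsupervised) from the perception stage."""
    if any(d.label == "car" and d.confidence > 0.9 for d in detections):
        return "brake"
    return "maintain_speed"
```

Because the two stages only communicate through the `Detection` interface, you can swap the training regime of either one without touching the other.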
This blog post from Tesla [1] explicitly claims that they use unsupervised learning as one of their strategies for determining the appropriate system response to specific detected objects at specific locations:
> This is where fleet learning comes in handy. Initially, the vehicle fleet will take no action except to note the position of road signs, bridges and other stationary objects, mapping the world according to radar. The car computer will then silently compare when it would have braked to the driver action and upload that to the Tesla database. If several cars drive safely past a given radar object, whether Autopilot is turned on or off, then that object is added to the geocoded whitelist.
> When the data shows that false braking events would be rare, the car will begin mild braking using radar, even if the camera doesn't notice the object ahead. As the system confidence level rises, the braking force will gradually increase to full strength when it is approximately 99.99% certain of a collision. This may not always prevent a collision entirely, but the impact speed will be dramatically reduced to the point where there are unlikely to be serious injuries to the vehicle occupants.
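Here is a toy sketch of the whitelisting logic the quote describes. Every name and threshold below is my own assumption, not Tesla's actual implementation: each stationary radar object is keyed by its geocoded position, drive-pasts without driver braking are counted as evidence the object is harmless, and radar-only braking is enabled only at very high collision confidence for objects not on the whitelist.

```python
from collections import defaultdict

# Assumed threshold: how many safe pass-bys before an object is whitelisted.
SAFE_PASSES_REQUIRED = 50

safe_passes = defaultdict(int)   # geocoded position -> count of safe pass-bys
whitelist = set()                # stationary radar objects judged harmless

def geocode(lat: float, lon: float) -> tuple:
    """Quantize a GPS fix so nearby reports of the same stationary object
    share a key. Rounding to 4 decimal places gives roughly 10 m cells."""
    return (round(lat, 4), round(lon, 4))

def record_pass(lat: float, lon: float, driver_braked: bool) -> None:
    """Fleet upload: a car drove past a stationary radar return.
    No driver braking is evidence the object is safe to ignore."""
    key = geocode(lat, lon)
    if not driver_braked:
        safe_passes[key] += 1
        if safe_passes[key] >= SAFE_PASSES_REQUIRED:
            whitelist.add(key)

def should_brake(lat: float, lon: float, collision_confidence: float) -> bool:
    """Radar-initiated braking is suppressed for whitelisted objects and
    otherwise fires only at very high confidence, mirroring the ~99.99%
    figure in the quote."""
    if geocode(lat, lon) in whitelist:
        return False
    return collision_confidence >= 0.9999
```

The point of the scheme is that the labels ("safe to drive past" vs. "brake-worthy") come for free from ordinary driver behavior rather than from human annotators, which is why the post frames it as fleet learning rather than conventional supervised training.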
No autonomous vehicle manufacturer uses end-to-end learning; the only company to have claimed to was Comma.ai, and we all know how that went.
All the autonomous car companies manually label the camera images: for example, given an image, a human annotator draws boxes around all the cars.
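For concreteness, one such manually labeled training example looks roughly like the following. This is a generic COCO-style bounding-box format, not any particular company's schema:

```python
# One supervised training example for an object detector: an image plus
# human-drawn boxes around every car. Field names follow the common
# COCO-style convention; no vendor's actual schema is implied.
labeled_example = {
    "image_file": "frame_000123.jpg",
    "annotations": [
        # Each box is (x, y, width, height) in pixels, plus a class label.
        {"bbox": (412, 230, 96, 72), "category": "car"},
        {"bbox": (150, 244, 60, 45), "category": "car"},
    ],
}

# A detector is then trained to reproduce these boxes from pixels alone:
#   loss = localization_error(predicted_boxes, labeled_boxes)
#        + classification_error(predicted_classes, labeled_classes)
```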