Nvidia on new self-driving system: “basically 5 years ahead and coming in 2017” (electrek.co)
177 points by stesch on Nov 12, 2016 | 168 comments



I'm still not happy with self-driving on vision alone, or vision augmented with radar. There are too many hard cases for vision. Everybody who has good self-driving right now - Google, Otto, Volvo, GM - uses LIDAR.

Self-driving is coming to the first end users in 2017, in Volvo's test of 100 vehicles. Volvo has multiple LIDARs, multiple radars, multiple cameras, redundant computers, and redundant actuators. They're being cautious. Yet they're getting there first.

With the new hardware, Tesla ought to be able to field smart cruise control that doesn't ram into stopped vehicles partially blocking a lane. They've rammed stopped vehicles at speed three times now. At least with the new hardware things should get better. Do they still have the radar blind spot at windshield height?


The reasoning I've heard goes like this:

People drive reasonably well using vision primarily and with imperfect visibility of their environment.

Computer learning networks can classify imagery at least as accurately as humans and sometimes more so.

A computer using imagery that is well classified from an array of visual sensors with near-perfect visibility should be able to drive as well as, or better than, a human driver.

The execution strategy appears to be to run classification and command prediction all the time, and while the human is in control consider it supervised learning.

The argument against LIDAR is just this in reverse: humans don't need LIDAR to drive, why should computers?

LIDAR is an engineering solution to the problem of creating a representation of the 3D space around the vehicle. It is a stand-in for the less well understood human ability to do the same just by looking around. As a result, if the "looking around" solution being proposed by NVidia and Tesla meets the engineering requirement, I don't see any reason that the car should have LIDAR.


Driving has almost nothing to do with image classification. *

Humans implicitly perform SLAM (simultaneous localization and mapping). What do I mean? Look around your room. Close your eyes. Visualize the room. As a human, you've built a rough 3D model of the room. And if you keep your eyes open and walk through the room, that map is pretty fine-grained/detailed too, and humans can keep track of where they are in the map.

The state of the art in visual SLAM (visual SLAM = SLAM from just images, nothing else) is not deep learning. It's actually still linear-algebra/geometric/keyframe-based traditional computer vision (including variants that incorporate GPS/accelerometer info). There are all sorts of limitations, but the biggest is that the current algorithms don't work when the environment is moving (!!!).
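For concreteness, the geometric (non-deep-learning) core of that keyframe approach looks roughly like the sketch below: match features between two frames, then recover the relative camera motion from the essential matrix. This is only a toy illustration with OpenCV; the file names and intrinsics are made-up placeholders, and the static-scene assumption is baked in, which is exactly the limitation mentioned above.

    import cv2
    import numpy as np

    # Hypothetical camera intrinsics and two consecutive grayscale frames.
    K = np.array([[700.0, 0.0, 320.0], [0.0, 700.0, 240.0], [0.0, 0.0, 1.0]])
    img1 = cv2.imread("frame0.png", cv2.IMREAD_GRAYSCALE)
    img2 = cv2.imread("frame1.png", cv2.IMREAD_GRAYSCALE)

    # Detect and match ORB features between the two frames.
    orb = cv2.ORB_create(2000)
    kp1, des1 = orb.detectAndCompute(img1, None)
    kp2, des2 = orb.detectAndCompute(img2, None)
    matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(des1, des2)
    pts1 = np.float32([kp1[m.queryIdx].pt for m in matches])
    pts2 = np.float32([kp2[m.trainIdx].pt for m in matches])

    # Essential matrix + cheirality check recover rotation R and a unit-scale translation t.
    E, inliers = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC, threshold=1.0)
    _, R, t, inliers = cv2.recoverPose(E, pts1, pts2, K, mask=inliers)
    print(R, t)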

SLAM from LIDAR is solved. That's why people use LIDAR.

You might argue that perfect SLAM is overkill for driving. And I agree. Humans rely on being able to do lots of things that are "theoretically overkill" for any given task - and maybe that's exactly why, so far, humans can drive and computers can't.

* It bears noting that even in domains like image segmentation, humans still do better than neural nets. (Group pixels in an image into different categories - this is still a caricature of "vision," but still far more representative of real vision than simply giving a global label to an image).


Let me make sure I get this:

When a Tesla (or other non-LIDAR) vehicle is driving, it is not continuously building a 3D model of its environment. Instead, it is matching patterns on the road, and "understanding" based on what it sees in an otherwise flat image.

Whereas LIDAR vehicles use the LIDAR technology to develop a map of the world around them, for additional understanding?


It's unclear if Tesla is building a 3D model or not. For the cameras, at least looking at Mobileye's demos, it appears not. But Tesla also is using radar - it's unclear whether they're using the radar for just collision detection, or for building a full 3D map.

LIDAR gives you a 3D map of the surroundings, yes.


No. They're both building a map and trying to place themselves in it. Using LIDAR makes this much easier to do.


It's not clear whether Tesla has a local map. They've never shown pictures of one, unlike Google. The Mobileye unit definitely does not; you can buy a Mobileye aftermarket unit and watch it put rectangles around things that look like cars and people. Tesla's radar probably just returns targets and range, like most first-generation automotive radars. Tesla's system may be entirely reactive.

Google builds maps; they do path planning, so they have to.


> People drive reasonably well using vision primarily

This is not accurate, however. Other important senses in use include proprioception, hearing, and tactile feedback from the wheels. In addition to vision and the superior dynamic range of the eyes, there is the important fact that human vision integrates a world model into expectations. Human vision also models time and motion, which helps manage where to focus attention. Humans can additionally predict other agents and other things about the world based on intuitive physics. This is why they can get by without the huge array of sensors and cars cannot. Humans make up for the lack of sensors by being able to use the poor-quality data more effectively.

To put this in perspective, 8.75 megabits per second is estimated to pass through the human retina, but only on the order of 100 bits is estimated to reach conscious attention.

> Computer learning networks can classify imagery at least as accurately as humans and sometimes more so.

This is true but only in a limited sense. For example, when I feed the image on the right (of a car in a swimming pool) from http://icml.cc/2015/invited/LeonBottouICML2015.pdf#page=58 (which you should read, and find the talk of if you can) into ResNet, I get these top results:

0.2947; screen, CRT screen

golfcart, golf cart

boathouse

amphibian, amphibious vehicle

For LeNet it's:

0.5422; amphibian, amphibious vehicle

jeep, landrover

wreck

speedboat
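(For anyone who wants to try this at home, reproducing the experiment is only a few lines; a pretrained ResNet-50 from torchvision is used below as a stand-in, since I don't know exactly which weights were used above, and the image path is a placeholder.)

    import torch
    from PIL import Image
    from torchvision import models, transforms

    # Standard ImageNet preprocessing for a pretrained classifier.
    preprocess = transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    ])

    model = models.resnet50(pretrained=True).eval()
    img = preprocess(Image.open("car_in_pool.jpg").convert("RGB")).unsqueeze(0)

    with torch.no_grad():
        probs = torch.softmax(model(img)[0], dim=0)

    # Print the top-5 probabilities and class indices (map the indices to
    # human-readable labels with the usual ImageNet class-index file).
    for p, idx in zip(*torch.topk(probs, 5)):
        print(f"{p.item():.4f}  class {idx.item()}")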

The key difference is learning in animals occurs by breaking things down in terms of modular concepts, so even when things are not recognized new things can be labeled as a composition of smaller nearby concepts. Machines cannot yet do this well at all and certainly not as flexibly. Things such as lighting and shading do not move animals as much in concept space.

> The execution strategy appears to be to run classification and command prediction all the time, and while the human is in control consider it supervised learning.

This strategy will not learn from accidents, because the signal there will usually be far from optimal.


I commend you. Very few people actually try feeding images into these things to show how bad they are.

For a real shock at how poor performance is, try feeding frames of video in. (Video is different because it generally doesn't have carefully framed, well-exposed, in-focus content.)

Edit: examples of ResNet applied to video by a colleague http://blog.piekniewski.info/2016/08/12/how-close-are-we-to-...


Very informative quote:

So where do the reports of superhuman abilities come from? Well since there are many breeds of dogs in the ImageNet, an average human (like me) will not be able to distinguish half of them (say Staffordshire bullterrier from Irish terrier or English foxhound - yes there are real categories in ImageNet, believe it or not). The network which was "trained to death" on this dataset will obviously be better at that aspect. In all practical aspects an average human (even a child) is orders of magnitude better at understanding/describing scenes than the best deep nets (as of late 2015) trained on ImageNet.


>>This is not accurate, however. Other important senses in use include proprioception, hearing, and tactile feedback from the wheels. In addition to vision and the superior dynamic range of the eyes, there is the important fact that human vision integrates a world model into expectations. Human vision also models time and motion, which helps manage where to focus attention. Humans can additionally predict other agents and other things about the world based on intuitive physics. This is why they can get by without the huge array of sensors and cars cannot. Humans make up for the lack of sensors by being able to use the poor-quality data more effectively.

Yes, but all those things you listed are very imperfect and can increase the risk of mistakes. For example, someone two lanes over honking their horn can cause a momentary distraction for you ("are they honking at me?"), causing you to not notice a cyclist cutting in front of you. And so can a dancing clown on the sidewalk.


Very much so and that's why we don't want our cars to perfectly replicate humans. More sensors plus a limited and narrow AI is better than just vision and a smarter AI.


> This is true but only in a limited sense.

Isn't what we need very limited anyway? What the cars need is recognition of obstacles and the type of the obstacle in a very limited range. Basically when something is on the road, it doesn't matter whether it's a moose or a deer - you slow down and avoid, or brake depending on the environment.

"what is in this picture" classifiers don't seem like a good algorithm to use in that case. Object detection / feature extraction seems to be much closer.


Eh. Is it clouds or a truck? A shadow on the road or a moose? A tree beside the road or a cyclist? Hard to tell without really good classifiers.

I guess that with a lidar or some kind of 3d perception, you can relax the demands on the classifier a bit and ignore really flat things and the sky...


Object detection is considered a more difficult problem than image classification in the vision community. So I'm not sure what the point is there.


> The key difference is learning in animals occurs by breaking things down in terms of modular concepts, so even when things are not recognized new things can be labeled as a composition of smaller nearby concepts. Machines cannot yet do this well at all and certainly not as flexibly.

Actually, that's pretty much what deep learning is doing. For instance: https://papers.nips.cc/paper/5027-zero-shot-learning-through...

That paper was from a few years ago, I think the state of the art is better now, but it's trying to do exactly what you're talking about. More broadly, what you're talking about falls under the umbrella of transfer learning (that is, a model's ability to learn helpful information about task Y by training on related task X, preferably by learning and sharing useful features.)


I'm talking about learning modularly. Children can categorize things they've never seen before by inventing labels on the spot; they're not limited to selecting from a preexisting set (e.g. a lion is a "big cat"). They can recognize novelty and ask questions if nothing they know quite fits.

As a human, you are able to learn the general concept of a leg and understand it, even in a context you have never seen before, or applied to an object you've never seen or never seen used in that way before. Everything you learn is also learned as part of a set of relations, each individual concept modified in a precise manner as you learn something new about any of them. A big part of human intelligence, from the simple naming of things to the highest levels of science, is taking parts of things you know and putting them together in novel ways.

In neural nets, this is in line with what I mean: https://arxiv.org/abs/1511.02799


Well, not with traditional feedforward networks (LeNet, etc.). You can't run the classifier and find tires, then wheels, and then a car; but you do get composition of features.


> not with traditional feedforward networks (LeNet, etc.)

I'd argue they are implicitly doing this.

> You can't run the classifier and find tires, then wheels, and then a car;

Why can't you run a classifier for tires, one for wheels, one for cars, then combine their outputs for a final classifier, maybe based on a decision tree? You can train all the networks at the same time and it will give you probability distributions for all four outputs (tires, wheels, cars, blended). What am I missing?
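A toy version of the wiring I mean, with random numbers standing in for the three part-detectors' outputs (so this illustrates the structure, not a working detector):

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    rng = np.random.default_rng(0)

    # Columns: P(tire), P(wheel), P(car body) from three hypothetical sub-classifiers.
    part_probs = rng.random((1000, 3))
    # Fake ground-truth labels just so the example runs end to end.
    is_car = (part_probs.mean(axis=1) > 0.5).astype(int)

    # The "blender": a small decision tree over the sub-classifiers' outputs.
    blender = DecisionTreeClassifier(max_depth=3).fit(part_probs, is_car)
    print(blender.predict_proba(part_probs[:5]))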


That would just be your opinion. It has not been shown. It's still an open research question over what neural nets are actually learning in their intermediate layers.

You're going to need large amounts of fine-grained labeled data for each category. You've also just manually determined some sort of (brittle) object ontology. What if there are only 3 tires? What if there are four tires on the road but no car? All sorts of edge cases, and all you've done is train a classifier for cars, not actually solved driving in any meaningful way.


Doesn't scale. You don't have N brains to compose every representation.


Agreed -- If you get away from traditional feedforward networks by adding recurrence throughout, then at least there is some chance of learning scale-free features and compositionality.


Why not shoot for better-than-human performance and "cheat" any way possible along the way? To paraphrase a quote I can't remember by whom: do we care if a submarine "swims"?

Besides, even with lidar, the problem is hard enough.


I agree, being better than human is a sales point that I expect to see in brochures. One of the ways I expect that to play out is self-driving transport cars for high-value targets like world leaders and drug lords. "This car will respond faster, and more accurately, to get you to safety before a human driver even knew there was a problem."

That said, John stated that without LIDAR you couldn't adequately meet the environmental challenges and achieve good self driving.

Specifically "I'm still not happy with self-driving on vision alone ... There are too many hard cases for vision." which boils down to a disbelief on the imaging processing side of the pipeline where NVidia has been attacking using GPU type architectures to extract image information rather than generate it.

One way to evaluate how far the image processing pipeline has come is to look at research on how well it can classify images. And in that space, in the research, it is doing better than humans [1]. As I've said elsewhere I think LIDAR was a crutch that worked well to cover for weaknesses in classifying images, but I recognize that the crutch may no longer be needed (certainly Tesla and Nvidia are trying to make that case).

[1] http://www.eetimes.com/document.asp?doc_id=1325712


There is only a little evidence that current image classification models outperform humans. The 5.1% number is just the number from one grad student who went through some ImageNet images himself - i.e. a single blog post (http://karpathy.github.io/2014/09/02/what-i-learned-from-com...). There really hasn't been a concerted effort to see how humans actually do on ImageNet.

I am willing to believe that a group of humans trained on ImageNet and without time constraints would be able to outperform the state of the art neural net models.


> One way to evaluate how far the image processing pipeline has come is to look at research on how well it can classify images. And in that space, in the research, it is doing better than humans.

Well, computer vision is doing better than humans on the ImageNet dataset; that does not mean it is better than humans on a driving image dataset.


> One way to evaluate how far the image processing pipeline has come is to look at research on how well it can classify images. And in that space, in the research, it is doing better than humans [1]. As I've said elsewhere I think LIDAR was a crutch that worked well to cover for weaknesses in classifying images, but I recognize that the crutch may no longer be needed (certainly Tesla and Nvidia are trying to make that case).

I have worked on systems for automatic target detection and classification. Systems have a long way to go before they reach human-level accuracy in classification. Even on "artificial" images like radar return maps humans are still slightly better, and many systems pass the underlying processed data to the human operator for final review. Tracking and classification in a dynamic environment is hard. It's even harder when trying to rely on passive sensors to do it.


ML seems pretty bad at classifying things it hasn't seen before though. There are quite a few examples where an input outside the training data resulted in misclassification.

Humans may not always see a white truck in a snowstorm, but is computer vision going to see it either? Or will it pattern match the few visible parts as something else entirely? Or dismiss the truck entirely as noise?


I don't disagree, both humans and ML are bad at classifying things they haven't seen before[1]. However that reasoning doesn't disqualify either vision only auto driving systems or machine learning.

Both statements are true:

"Computer driven cars may crash, even fatally, when they encounter a situation that they do not recognize." and

"People driving cars may crash, even fatally, when they encounter a situation that they do not recognize."

The success criterion for self-driving cars is that they can drive at least as well, in the common case, as the set of human drivers who are defined to be "good" drivers. And self-driving is not invalidated by a computer's mishandling of an event that a good driver would also mishandle.

I expect that self driving systems will be differentiated by how well they handle the unusual cases so a Mercedes system might do better in an unusual situation than a Chevy system. And all of this discussion is orthogonal to LIDAR :-).

[1] http://puzzlephotos.blogspot.com/


There is a difference though - humans understand the surrounding state, computer vision is not quite there. It can recognize things, and in NVIDIA's case directly generates steering commands without going through the intermediate step of building a model.

Humans build models of the world, and such models allow us to predict the future to a little extent, and explain the reasons behind a situation. Humans can intuit the intentions of other drivers and the behavior of other objects. AI can't do that quite as well.


Also, making eye contact and being waved through. Humans are excellent at reading cues such as this.


You've hit on a key insight. Predicting the future (even by a bit) turns out to be a very powerful learning signal for building models of the world.

It won't work on a traditional feedforward neural network but if you have feedback everywhere it appears to work.


> The success criterion for self-driving cars is that they can drive at least as well, in the common case, as the set of human drivers who are defined to be "good" drivers. And self-driving is not invalidated by a computer's mishandling of an event that a good driver would also mishandle.

This comes with one important caveat: these are the engineering criteria. The criteria of public perception, unfortunately, may not allow for a computer driver that makes the same mistakes that a human "would have", because people tend to mis-estimate what they or another human "would have" done.


But the computer vision systems can be endlessly improved and merge experience from millions of cars, while human drivers accumulate experience from a single driver, age, and are eventually replaced by younger, inexperienced drivers.

Soon enough these systems will have data from encounters with far more varied situations than any single human will ever be physically able to encounter in a lifetime.


Not sure I see how that would work when there is no 3G signal. If a computer on-board a vehicle sees something it does not recognize when it's not connected to the Tesla HQ, what should it do? And even if it is connected, uploading video over 3G is too slow for the real-time classification needs. Right?


The notion isn't that they phone home to immediately ask about unknown data, but that they periodically feed back any unknown data (or, in the worst case, it gets extracted from a blackbox if a car has an accident), and receive revised models to process the world with.

No, this won't necessarily save you if you see some impossible scenario like a boat cruising down the highway toward you, but it does mean that the list of conditions the model can't respond to reasonably will rapidly trend toward "fewer than most human drivers", in the ideal case.

This will, invariably, have some unfortunate bumps when people discover real-world conditions that, for whatever reason, the model doesn't remotely have responses for (I wonder if they've trained it on e.g. an enormous wall of water, or tornadoes?), but that's why you don't claim it's an always-on self-driving system (e.g. you have to be ready to take over at any point), and arguably the error rate is still going to be lower than most humans to start with.


You need humans to label those "millions of experiences." The bottleneck is not raw video; it's the labeling. Without labels, the data is useless.


No, you don't. You need to process enough of it to see how the majority of human drivers act in situations where the automated system currently would react substantially differently.


You've just described end-to-end-learning (e.g. the data is raw video/camera/radar/sensor data, the labels are steering angles, etc.)

No autonomous vehicle manufacturer uses end-to-end learning. The only one to claim to use it was Comma.ai, and we all know how that went.

All the autonomous car companies will manually label the camera images - e.g. given an image, draw boxes around where all the cars are.
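To make "end-to-end" concrete, the network maps a camera frame directly to a steering angle, roughly like the sketch below. Layer sizes loosely follow the NVIDIA paper but are simplified; this is not any manufacturer's actual network.

    import torch
    import torch.nn as nn

    class EndToEndSteering(nn.Module):
        def __init__(self):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(3, 24, 5, stride=2), nn.ReLU(),
                nn.Conv2d(24, 36, 5, stride=2), nn.ReLU(),
                nn.Conv2d(36, 48, 5, stride=2), nn.ReLU(),
                nn.Conv2d(48, 64, 3), nn.ReLU(),
                nn.Conv2d(64, 64, 3), nn.ReLU(),
            )
            self.head = nn.Sequential(
                nn.Flatten(), nn.LazyLinear(100), nn.ReLU(),
                nn.Linear(100, 50), nn.ReLU(),
                nn.Linear(50, 1),  # single regression output: steering angle
            )

        def forward(self, x):
            return self.head(self.features(x))

    frame = torch.randn(1, 3, 66, 200)  # roughly the input resolution the paper uses
    print(EndToEndSteering()(frame).shape)  # torch.Size([1, 1])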


Learning how to label camera images and how to respond to already detected features are separate issues. You can make the choice of whether to apply unsupervised training to either one separately.

This [1] blog post from Tesla explicitly claims that they are using unsupervised learning as one of their strategies to determine the appropriate system response to specific detected objects in specific locations:

> This is where fleet learning comes in handy. Initially, the vehicle fleet will take no action except to note the position of road signs, bridges and other stationary objects, mapping the world according to radar. The car computer will then silently compare when it would have braked to the driver action and upload that to the Tesla database. If several cars drive safely past a given radar object, whether Autopilot is turned on or off, then that object is added to the geocoded whitelist.

> When the data shows that false braking events would be rare, the car will begin mild braking using radar, even if the camera doesn't notice the object ahead. As the system confidence level rises, the braking force will gradually increase to full strength when it is approximately 99.99% certain of a collision. This may not always prevent a collision entirely, but the impact speed will be dramatically reduced to the point where there are unlikely to be serious injuries to the vehicle occupants.

[1] https://www.tesla.com/blog/upgrading-autopilot-seeing-world-...
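In rough Python, the whitelisting logic that post describes comes down to something like this (the thresholds and geohashing key are invented for illustration):

    # Hypothetical sketch of the "geocoded whitelist" described in the blog post.
    SAFE_PASS_THRESHOLD = 5        # safe fleet passes required before whitelisting
    BRAKE_CONFIDENCE = 0.9999      # "approximately 99.99% certain of a collision"

    safe_passes = {}               # (rounded lat, lon, radar signature) -> count

    def record_safe_pass(geokey):
        """A car drove past this radar return (Autopilot on or off) without incident."""
        safe_passes[geokey] = safe_passes.get(geokey, 0) + 1

    def should_brake(geokey, collision_confidence):
        # Stationary objects the fleet routinely passes (bridges, overhead signs)
        # end up whitelisted and are ignored; anything else triggers braking
        # once the confidence of a collision is high enough.
        if safe_passes.get(geokey, 0) >= SAFE_PASS_THRESHOLD:
            return False
        return collision_confidence >= BRAKE_CONFIDENCE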


Classification is not the right metric to use here. Lidar doesn't classify the objects it's looking at; it just tells you the direction and distance.

Cameras can also gauge distance pretty effectively from parallax. Either using multiple cameras, or from the motion of the vehicle itself, or both. From this it should be possible to gauge where obstacles are and drive safely.
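A minimal sketch of that parallax idea with a calibrated stereo pair, using OpenCV block matching (file names, focal length and baseline are placeholders, not anything a particular manufacturer uses):

    import cv2
    import numpy as np

    left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)
    right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)

    # Block matching returns disparity in 1/16-pixel fixed point.
    stereo = cv2.StereoBM_create(numDisparities=96, blockSize=15)
    disparity = stereo.compute(left, right).astype(np.float32) / 16.0

    # depth = focal_length * baseline / disparity, valid where disparity > 0.
    focal_px, baseline_m = 700.0, 0.5  # assumed camera parameters
    with np.errstate(divide="ignore", invalid="ignore"):
        depth_m = np.where(disparity > 0, focal_px * baseline_m / disparity, np.inf)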

But NNs give the possibility of gathering much more information from recognizing objects. Information that Lidar systems don't have.


> But NNs give the possibility of gathering much more information from recognizing objects. Information that Lidar systems don't have.

Wouldn't you feed both the depth map from the lidar and imagery from the cameras into the neural network? I imagine that a variety of different sensors as input would make it easier to do classification. As an analogy, someone who has lost their sense of smell might have a harder time telling the difference between a clean sock and a dirty sock than I would.

Please let me know if I'm wrong here, but I assume that the depth information that can be derived from parallax is not a superset of what you get from lidar (I'm thinking about low light, glare, objects with complicated geometries, similar-colored objects obscuring each other, etc).


You could do that. The advantage of using purely cameras is they are cheaper and simpler, and don't stop functioning during bad weather.


You can use a model that gives you its classification uncertainty, Bayesian SegNet for example [1]. We may also adapt legislation for how vehicles should look, as we did to make human driving easier (e.g. tail and side lights).

1: https://arxiv.org/abs/1511.02680
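The idea in [1] boils down to Monte Carlo dropout: keep dropout enabled at test time, run several stochastic forward passes, and read the spread of the predictions as per-pixel uncertainty. A rough sketch, assuming `model` is any segmentation network that contains dropout layers:

    import torch

    def predict_with_uncertainty(model, image, n_samples=20):
        model.train()  # keeps dropout active at inference (batchnorm caveats ignored here)
        with torch.no_grad():
            samples = torch.stack([
                torch.softmax(model(image), dim=1) for _ in range(n_samples)
            ])
        mean_probs = samples.mean(dim=0)             # averaged class probabilities
        uncertainty = samples.var(dim=0).sum(dim=1)  # per-pixel predictive variance
        return mean_probs.argmax(dim=1), uncertainty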


> Humans may not always see a white truck in a snowstorm, but is computer vision going to see it either?

So you put in your training and test dataset a bunch of such situations. At some point you've covered enough cases to extrapolate the rest.

Good testing is going to hunt for these blind spots and fix them. Fact is that it's already safer than humans, even with all its hidden imperfections.


What if that point is 20 years from now? What if every time Ford/GM/Toyota substantially changes the look of their cars, your classifier no longer recognizes them because all your data only has the old models in it. That's what people are driving at. Simply collecting more data is not enough to solve this problem.


At a certain point it's just about recognising an object which shares broad characteristics with a car rather than aesthetics. Eg it moves at the speed a car moves at, it's in the road, it's overtaking on the right hand lane. I would expect any autonomous car to be able to fail over to "this object is likely a vehicle I haven't seen before" given a strange car-like object being detected.


Great. Now the problem you've posed is no longer image classification. It's more like video classification or zero-shot classification! (neither of which are close to solved)


It doesn't seem like zero-shot classification to me. It still seems like image classification. You said:

> What if every time Ford/GM/Toyota substantially changes the look of their cars, your classifier no longer recognizes them

My answer was probably incomplete, but I took the above to mean that cosmetic changes to vehicles mean that classifiers no longer identify them as cars, and this detrimentally modifies the behaviour of the car.

Whilst it's trivial to envisage a scenario where your problem is solved systemically (sufficient training data for a new chassis released in advance or something), it seems like it would be possible to train based on "things we expect to see from any car".

As far as I know, that's how all of the existing methods operate. They seem to have a hierarchy for decision-making:

0. Is there an object around me which I need to consider? If not, continue to monitor for one whilst operating the vehicle within the parameters of road signs and conditions.

1. Is this an object which has a predictable path based on either its signals or the expected behaviour of a car on this part of the road / operating within the parameters set by the road signs I can see?

2. Is this an object which is operating safely despite not falling into category 1?

3. Is this an object which I need to take action to avoid?

Which is to say that it ought to be possible to "fool" a Tesla with a non-car object behaving in a similar fashion to a car. The Tesla sees an object, not a car.


"it moves at the speed a car moves at, it's in the road, it's overtaking on the right hand lane" is video classification, which is not solved. In fact, at least how you described it (you could probably change the problem statement to avoid this), this would involve an ML model that must learn a model of physics - also unsolved.

You've just specified a manually hardcoded set of decision rules. This is not machine learning, and is incredibly brittle.


I think we're talking across one another.

I had thought that in your original post you were agnostic about the methodology for identifying a car, but were remarking that, in a world where it's possible to do it using whatever form of classification, it would be possible to 'stump' any reliable model by modifying the appearance of a car. I'm observing that any model for classification almost certainly would not rely on aesthetics.

> You've just specified a manually hardcoded set of decision rules. This is not machine learning, and is incredibly brittle.

I'm pointing this out to illustrate that the technology already deployed to solve this problem does not get confused by aesthetics.


I was talking about deep learning. The comment I was replying to was making the specific problem seem as if it were easy. Certainly there may one day be a classification technique that does what you say it will do. But you may as well have said there will one day be a perfect classification technique that will just perfectly output steering angles, end thread. What use is there in conjecturing about perfect unknown classification techniques? Not to mention that there is no guarantee such a perfect method would not rely on aesthetics. Even if the training set has more than just aesthetics (e.g. video of cars in motion), maybe this perfect classifier would just cheat and rely on aesthetics; you don't know.

So I'm pointing out the methodology you suggested is not currently feasible, or is currently widely considered by the community to be the wrong practical approach. Because theoretical solutions will not solve self driving cars.


> ... humans don't need LIDAR to drive, why should computers?

That being the case, wouldn't we be limiting self-driving technology to the same traffic-related death rates as humans? Maybe 10, 20% better, but still fundamentally close.

For self-driving cars to be truly successful, the death rates will need to be an order of magnitude better. An incremental improvement won't convince governments and the public at large to trust their lives to an algorithm running inside a black box.

To be an order of magnitude better, you'll likely need to go well beyond simply processing pixels, including LIDAR and other sensors.


>Maybe 10, 20% better, but still fundamentally close.

You are asserting that human drivers are essentially perfect, because in 80-90% of their crashes, the information necessary to avoid the collision just isn't available visually.

That seems like an incredibly optimistic view of human drivers.

Collisions happen because a driver does not look at, see, understand, or act appropriately on available visual signals. Or they are going too fast / following too closely for their actions to be effective.


A huge number of traffic deaths are due to alcohol. An autonomous system that's as safe as a sober human would improve safety by a factor of 2 or 3. Many of the other deaths are due to distraction, inattention, or slow reaction times. Get rid of those and you can probably see an order of magnitude improvement with something that is nominally "no better than a human driver."


Would you (while sober) get into a car driven by an autonomous system that was demonstrably more likely to get into a crash than the average sober, awake, healthy driver, but less likely to get into a crash than the average driver?

Honest question.

I don't think I would.


But you can't predict what would happen. Even driving healthy and sober, something could happen. Assuming you're in a more developed version of the self-driving cars, machine learning has most likely come a long way since the beginning. The car/network of cars would have learned by then that the command "stay on the right side of the road" doesn't mean "stay on the right side of the road, but it's okay to hit a few cars or pedestrians." They would have learned, or have programmed in them, that hitting cars or people is not good. Machines don't have a moral sense, and hoping that they are not completely sentient, they don't have opinions either: no deciding you don't like this guy and being a little rude to him. And my last point is that the network of cars, all communicating at once, would learn how to be safest. Done.


Good question. It would depend on the exact numbers, I'd say. I do sometimes ride in (or drive) non-autonomous cars more dangerous than the best drivers, after all.

I don't know how relevant it will be, though. I suspect that the fact that computers are always attentive, can react instantly, and follow the rules consistently will make them much safer very quickly. But we shall see!


Interesting question.

As long as you still have the manual option, it doesn't really matter. You can just get in and drive if you want to.

As the option becomes more common, obviously impaired driving becomes less common.


It matters if, for example, the car is an Uber and you aren't allowed to drive it.


Will Uber still exist when self-driving cars become common?


LIDAR is just another form of seeing, just not one we're used to as people, but combined with cameras the two would complement each other. Relying on only one is a fool's gambit.

LIDAR won't go blind from white trucks on sunny days. LIDAR won't suffer snow blindness or an inability to track in conditions where humans don't see well, like heavy rain at night. You add in visual acquisition to fine-tune what you are detecting if necessary: perhaps to read signs and tell what color the traffic light is, maybe even to see brake lights. To know a floating bag is just that and not a solid object, to see that the road is washed out, and so on.


Actually, in snow, fog, rain, and the like, humans (and cameras) can do pretty reasonably. Things like brake lights and running lights can be pretty severely distorted and you still know the approximate distance to the car in front of you.

Lidar on the other hand is a point source of light (not from the environment) and any distortion makes it less likely for said light to return to the sensor. So with stereo vision (or radar) you can get a relatively accurate distance for a car in front of you. With lidar some fraction of the returns will be bouncing off fog/snow/rain between you and the distant object.

Because of this disadvantage, lidar-based systems might suggest a slower safe speed, and risk being rear-ended by humans.


The problem is that humans do primarily (not solely) use vision to drive, but they have mental models about the other driver. I remember once I was at a red light and when it turned green, I looked at the oncoming driver (far away) and thought, "that guy is too into his music" and didn't accelerate. Sure enough, he goes right through the red and slams his brakes halfway through the intersection.


One of the central tenets of autonomous vehicles should be that they are BETTER than any human driver could be. Relying on vision because, well, it works OK for humans doesn't cut it in my book.


> The argument against LIDAR is just this in reverse, humans don't need LIDAR to drive, why should computers?

That is a terrible argument: birds don't need ailerons either.


I think you meant to say that birds don't need a vertical stabilizer to change direction. Bird wings have pretty awesome ailerons built into them.


Unless self driving vehicles drive better than humans, and LIDAR or 360° vision seems a requirement for that, they will not succeed.


article from yesterday:

http://spectrum.ieee.org/cars-that-think/transportation/sens...

hopefully this will make LIDAR more economically practical


> There are too many hard cases for vision.

Even worse, most machine learning vision approaches make the vision problem much harder on themselves. They do not treat the visual world as the product of the dynamic, interacting physical processes that give rise to it.

A disadvantage of treating vision as static frames of pixels is that deep feedforward networks have to "memorize" all the physical dynamical effects of, e.g., shadows on textures.

Such systems cannot generalize well. A more promising approach is hierarchical systems with ubiquitous recurrent connectivity.


What's a hard case for vision? That's how our eyes work. LiDAR has problems with precipitation too.


Depth map extraction from vision in real time depends on accurate algorithmic merging of past frames, color gradients and motion vector extraction to come up with a 3D map of what's around the vehicle. Contrast this with LIDAR, which can present a depth map in real time by sending out an array of light pulses and timing how long they take to come back to the sensor.

Which method is more likely to have implementation errors?
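For reference, the LIDAR side of that comparison really is just a time-of-flight calculation:

    C = 299_792_458.0  # speed of light, m/s

    def lidar_range_m(round_trip_seconds):
        # The pulse travels out and back, so halve the round-trip time.
        return C * round_trip_seconds / 2.0

    print(lidar_range_m(200e-9))  # a 200 ns return is roughly 30 m away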


In a perfect world lidar is far superior. It's trivial to say "out of the 50M voxels returned, are any from where the car will be in 3 seconds". So lidar wins in the best case, maybe even the average case.

However in the real world where the car/sensors gets dirty/wet, and the air is filled with snowflakes, raindrops, or mist it gets much more complicated.

Even if it's statistically better, there's value in acting more like a human driver. After all, the roads are filled with human-controlled cars, and any deviation from human norms because of weather (like wind, snow, rain, fog, and blown sand) causes problems.


Humans don't use accurate depth maps to drive.

Machine learning is great at complex algorithms if done well.


Humans don't use accurate depth maps to drive.

Humans are very good at estimating distance from a combination of parallax, visual cues, and experience. We don't need to have seen a specific model of car before to judge how far away it is with a high level of accuracy.


Machine learning techniques are also being used to evaluate depth using parallax and other cues.

And humans aren't good at estimating distance. They're good at estimating relative distance, which is different. Some people claim we need LIDAR because cameras can't give us accurate depth information, but humans don't use accurate depth information.

Are you saying these systems need to have seen a specific model of car in the past before they can determine its distance? Certainly I've seen systems that do not require that information.


Reminds me how much I hate the average mirror. They force me to lose frontal depth perception by focusing on the sides. I tried driving while looking aside for a minute; it's impressive how meaningless your peripheral vision is then. Even at 5 mph you're never sure how far away the car in front of you is.


Humans can drive well with vision in only one eye, so it seems like even binocular vision isn't necessarily as important as some of those cues and experience.


We use non-binocular depth cues beyond ten meters or so.


Do the newer systems assume network connectivity to a backend processing facility?

Personally, I view each additional layer (network, someone else's data center, machine learning) to be something that can fail and put people in danger. Assurance via local brute force is much more reassuring for me.


I can't see a system where processing is done remotely as working very well, nor is it very necessary as the hardware requirements shouldn't be that high. Training could definitely occur in a data center, but that is done offline.


No. The current Tesla Autopilot v2 system has 8 cameras and must be able to react in milliseconds. They do the computation onboard with an Nvidia Drive PX2 computer.

https://www.tesla.com/presskit/autopilot#autopilot


So it's not just 2 cameras? Fun. Do you have an (entry-level) link?


Stereo adds just another layer of data to merge, yeah.

I should add a disclaimer: I'm waaaaay out of date. The last time I worked on this stuff was back in 2005 during the DARPA Grand Challenge. And while I'm fairly confident that the basics are still the same (the fact that we're still discussing monocular vs. stereo vs. LIDAR is testament to that...), I'm going to defer to someone more up-to-date to provide more recent research/documentation.


Yeah, me too; I was in the 2005 Grand Challenge. Vision processing has made enormous progress since then. Here's Mobileye's guy explaining what they do and what needs to be done.[1] Here's the NVidia guy.[2] Here's Chris Urmson from Google.[3] All three use their vision system to draw boxes around things that look like obstacles. Then the planner uses those boxes to construct a path. The vision system just gives you a rectangular box in image space; the LIDAR based systems turn point clouds into simple 3D solids.

These vision systems do OK on things that look like typical road obstacles - cars and people. Other stuff, not so much. Only Urmson from Google talks about that much. The videos from the vision systems don't seem to be doing much with road edge clutter. They're not marking every post, fireplug, and low-hanging branch. Train a deep learning system on cars, trucks, pedestrians, and bicycles, and your system sees only cars, trucks, pedestrians, and bicycles.

These vision systems do a lot less than many people think they do. Watch the videos.

[1] https://www.youtube.com/watch?v=GZa9SlMHhQc [2] https://www.youtube.com/watch?v=2NGnvGS0AtQ [3] https://www.youtube.com/watch?v=Uj-rK8V-rik


Aww yeah -- another Challenger! What team were you on?


I ran Team Overbot.[1] We were way overdesigned for off-road and underdesigned for going fast.

[1] http://www.overbot.com


Same for us -- speed was not a consideration at all. We figured that if we could keep going and finish, we'd be one of the top 3 teams.

As it turned out, our mechanical engineering was great. What got us was software: we failed to free memory for passed obstacles, ran into memory exhaustion issues as a result and crashed out at mile 9. When we fixed the leak and reran the course, the truck finished in just over 7 hours.

It was an object lesson that garbage collection won't save you from memory-related issues. In retrospect, it was also the first time I ever encountered a million-dollar bug. D'oh.


What was the second time?


I ran across quite a few of them while working tech at an investment bank. Never anything that resulted in direct losses, but implied missed revenue -- definitely.


They are using deep learning and also other sensors may contribute to the depth map (assuming they have one).


There are a number of interesting examples in computer vision and psychology literature which document difficult cases for visual perception, see for example slides 14-19 in [0]. Ultimately, vision is a holistic process with edge cases that require knowledge of physics, human psychology, and so-called "common-sense reasoning" in order to resolve. Depending on how one phrases the objective of what a computer vision system should do, the problem can go from a tractable subset of automated reasoning to an intractable general AI task, often with very subtle changes to the problem statement.

[0] http://www.cs.toronto.edu/~urtasun/courses/CV/lecture01.pdf


Tesla is going a different way with radar+cameras. Lidar today is too expensive for normally priced vehicles: Google's solution is very expensive, and Volvo seems to be targeting large trucks, which are less price sensitive. The price of the future Volvo passenger cars hasn't been announced, has it?

Society seems hypersensitive to different risks, even if they are lower than existing risks. Thus a Tesla fire is big news, even if the rate is less than that of normal gas car fires. Thus weaknesses of lidar (like, say, fog) could cause problems, even if it is safer than existing cars. This is complicated by the humans driving cars around. Imagine heavy fog on the highway, and that humans decide 45 mph is safe, and the Tesla (with a camera+radar system) decides on similar. Lidar might well decide the safe speed is less, and get rear-ended more.

Reference for the 3 rammed cars "at speed"? The one I saw, the car in front slowed, then accelerated, and merged right before a stopped car. The Tesla slowed, then accelerated, decided it wasn't safe to merge, and braked hard. It did hit the car, but not very fast. Not sure I would have done better myself.


Reference for the 3 rammed cars "at speed"?

Sideswipe of car stopped at inner edge of roadway in China: https://www.youtube.com/watch?v=rJ7vqAUJdbE

Rammed stopped or slow moving street sweeper at inner edge of roadway in China. Driver killed: https://www.youtube.com/watch?v=xoSNw_n1Xgk

Rammed stopped van at inner edge of roadway in Germany: https://www.youtube.com/watch?v=qQkx-4pFjus

Those are all the same design flaw - a big solid obstacle partly blocking the lane was hit.

These all have dashcam video on Youtube. One wonders how many more times this has happened without a dashcam.

This list doesn't include ramming the semitrailer in Florida, another fatal event. (Ref NTSB investigation HWY16FH018.)

(There's some denial from Tesla fans, and Musk, about this.)


I suspect Volvo started first as well


I agree; everyone is using LIDAR in the sensor mix but Tesla. Cameras simply don't work in certain weather conditions, like direct low-angle sunlight, e.g. in the evening.

Ford, Volvo and others are using two (or more) smaller, cheaper ($6k) LIDARs instead of the single big $70k one that everyone remembers from the Google cars. And smaller, cheaper LIDARs are around the corner. It seems Tesla isn't going the full self-driving route at the moment but offers a package that doesn't fully deliver and is a risk on the road if the driver uses it in conditions outside its limited designated highway-style roads, such as inner cities or country roads. I am surprised that Google doesn't deliver something, or Mercedes, who have been doing research since the 1980s; if the other companies take that long, Volvo and the other Chinese-owned car manufacturers will take the lead in the next few years.


Super cool to see NVIDIA releasing hardware specifically meant to be a platform for building self-driving systems[1]:

"NVIDIA DRIVE™ PX 2 is the open AI car computing platform that enables automakers and their tier 1 suppliers to accelerate production of automated and autonomous vehicles. It scales from a palm-sized, energy efficient module for AutoCruise capabilities, to a powerful AI supercomputer capable of autonomous driving."

I hadn't heard of this before and with their purported pivot to an AI company, I can't wait to see what other platforms they develop in a similar capacity.

[1] - http://www.nvidia.com/object/drive-px.html


NVIDIA has been trying to pivot into a "services" company for a while, NVIDIA GRID/Gaming Cloud, computing etc. "AI" or at least fused sensor automation seems like a good place for them since they already have both the hardware and the software expertise for this.

NVIDIA actually gave up on even attempting to do console graphics again since they didn't want their pipeline to be suffocated by these draconian single-customer contracts. What keeps AMD's graphics department alive these days is exactly what would have prevented NVIDIA from pushing their business forward.


Well, that and AMD has x86-64 & ARM licensing and experience sitting right there in house, ready for Microsoft, Sony, Nintendo, etc to say what IP Blocks they want, with a 3 month lag until that custom chip is ready.

Nvidia did make what could have been an x86-64 chip, but had to can that since they couldn't get the relevant licensing from AMD & Intel to allow its production and sale, hence why everything they sell is ARM.


The new Nintendo console/handheld has Nvidia graphics.


It's the NVIDIA Tegra SoC; this is different from having to fill your pipeline with custom graphics for a mainstream console.

Nintendo just licensed the SoC from NVIDIA, the PS3 had a custom chip from the green giant, and AMD does custom APUs for the PS4 and Xbone.


I thought the reason for Tesla switching away from Mobileye was that Mobileye and Tesla couldn't come to an agreement on price and data licensing?

https://electrek.co/2016/09/15/tesla-vision-mobileye-tesla-a...

... and because Mobileye wasn't comfortable with Tesla using their system for level 4 & 5 driving:

https://electrek.co/2016/09/16/mobileye-responds-to-tesla-ag...


Could someone who understands this space weigh in on how technically interesting this is? (Or isn't?) In particular, their research paper on "End to End Learning for Self-Driving Cars"[1] seems to yield a system that requires an unacceptable amount of manual intervention: in their test drive, they achieve autonomous driving only 98% of the time. But I have no real expertise in this space; perhaps this result is impressive because it was end-to-end or because of the relatively little training? Is such a system going to be sufficiently safe to be used in fully autonomous systems? Or is NVIDIA's PX 2 interesting but not at all for the way it was used in their demonstration system?

[1] http://images.nvidia.com/content/tegra/automotive/images/201...
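(For context on the 98% figure: as I read it, the paper computes autonomy by charging a flat six seconds of human time per intervention against the total drive time, so 98% over an hour-long drive corresponds to roughly a dozen interventions.)

    def autonomy_pct(interventions, elapsed_s, penalty_s=6.0):
        # Metric as I read it from the paper: each intervention costs penalty_s seconds.
        return 100.0 * (1.0 - interventions * penalty_s / elapsed_s)

    print(autonomy_pct(12, 3600))  # 98.0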


It's incredibly freaking amazing if they are using deep learning to drive, via mainly cameras, 98 percent of the time. No one else can do that. 98 percent is obviously a lot.


Thanks -- that answers the question! So fair to say that it's impressive because of the absence of LIDAR and/or other sensors -- and that by adding LIDAR to such a system one could presumably get towards 0% manual intervention?


The difference between 98% and 99.999% is very difficult to solve, and it's not going to happen in the next 5 years. LIDAR can't help, for example, with obeying a police officer gesture.


NVIDIA's Drive PX 2 has too high power consumption and too low perf/W for the moment. They're winning this space more because they are there than because they are the best possible solution.

And they may continue to win because successful execution of an 80% product is worth far more than a 90+% powerpoint processor cough TenSilica et al. cough, or because this is such a huge potential market, it might actually go to a successful competitor. 2018 and beyond will be very interesting.

For while it's really desirable to have the deep learning equivalent of x86 assembly language (CUDA) across a full stack from training to inference, in the end, IMO cost will be king. I'm not a big fan of $150K high-end servers filled with $5000 GPUs that can be bested with clever code on a $25K server filled with $1200 consumer GPUs. But I am a huge fan of charging what you can while you are unopposed. It's just that I think that state is temporary.


>I'm not a big fan of $150K high-end servers filled with $5000 GPUs that can be bested with clever code on a $25K server filled with $1200 consumer GPUs. But I am a huge fan of charging what you can while you are unopposed. It's just that I think that state is temporary.

There is virtually not a single "enterprise" grade product which can't be made at least 50% cheaper (or sometimes 10 times...) with off the shelf consumer grade hacked hardware....

Enterprise products always have a pretty steep markup, but what you lose with those $1200 GPUs is both features (e.g. virtualization, thin provisioning, DMA/CUDA Direct, etc.) and support. When you buy a $5000 CPU over a $500 one with the same performance, what you pay for is reliability and support. If you don't care about that, then fine, but when you need to launch a $100M service on top of that platform you won't really care about the price tag; it's all in the cost of doing business.


Virtualization? Don't care, in fact, virtualization is what disabled P2P copies and created craptastic upload/download perf on AWS until the P2 instance.

DMA/CUDA Direct? Say Hello to P2P and staged MPI transfers, faster, cheaper (and usually better). Know your PCIE tree FTW.

Support? As someone who has been playing with GPUs for over a decade: bugs get fixed in the next CUDA release whether Tesla or GeForce, if ever.

$100M service? Yep I'm with you. But I prefer a world without a huge barrier to entry to building that service, especially a barrier built 99% on marketecture. I want to build on commodity hardware and deploy in the datacenter.

Unfortunately, sales types seem to hate that outlook.


I don't understand your argument, so maybe this is off base, but if you are saying people in industry aren't replacing their supercomputers with commodity GPUs, you're wrong; both Apple and Google have massive purchase orders for commodity Nvidia GPUs because they aren't just cheaper, they are better at this application. And I imagine other companies are as well.

Edit: "replace" is probably not the right word, this is work that the old systems don't do well, but they aren't throwing out x86 racks for gpus of course. It's just instead of buying more of the same for machine learning applications.


They aren't buying consumer GPUs; they aren't buying the NVIDIA dedicated servers, but they aren't running GeForce chips either.

If nothing else, that's because you cannot virtualize GeForce-line GPUs; there is no CUDA Direct or NVLink support, etc.

If you are telling me that Google is buying GeForce GPUs and flashing them with a custom BIOS ripped off a Quadro card so they can do PCIe passthrough in a hypervisor and initialize the cards, then sorry, I'm not buying it.


While I agree that Google is not buying GeForce GPUs, their general use-case for GPUs does not require virtualization.

They use containers to isolate and throttle different tasks/jobs running on the same hardware.

At their scale, virtualization would be significantly wasteful in terms of manageability and overhead.


I think we have different meanings of virtualization when it comes to GPUs.

I'm not talking about running a virtual OS; I'm talking about things like rCUDA, GPU Direct and RDMA.

But still, even for their container solution they need support for GPU passthrough and vGPU; if not, they can't run containers.

NVIDIA doesn't allow you to run GeForce cards over a hypervisor.


Containers would imply there is no hypervisor involved, only a dri device exposed by the kernel and bind-mounted into the namespace. You would still need support for multiple contexts but that doesn't require multiple (virtual) PCI devices or an IOMMU.


> both apple and google have massive purchase orders for commodity nvidia gpus

source?


Rather than attempt to out these down-low GeForce deep learners, why don't you ask yourself why you can only buy the Titan X Pascal from NVIDIA itself.


Wait, who cares about the power consumption in a car?

perf/W is a useful number if you are trying to stack a data center full of these things and electricity (including cooling) is essentially your only cost. But if a car is using 300 Wh/mile or, in an ICE, generating 100 kW in excess heat, it is an entirely pointless metric.

(Just for clarification: no one is using the Drive PX2 in a data center. Just look at these connectors:

http://images.anandtech.com/doci/9903/NVIDIA_DRIVE-PX-2.jpg )


Informal anecdotal knowledge: GM wants 10-20 watts for the entire self-driving system, sensors and all.


Well, they are going to be disappointed. We all want a small black box we can stuff under the steering wheel and have it draw zero power so we don't need to cool it or pay for the extra thick cabling.

But in the meantime, where we haven't actually solved the problem even when the trunk is full of racks with high-end boxes, it seems silly to mandate that, in the classic GM "compromises safety" sense.


GM isn't exactly the shining beacon you should follow and listen to; they did get scammed out of $1B on Cruise, after all.


Is 10 watts the correct figure?

The powertrain uses 300+ Watts, so it seems like a manageable number if it is correct.


That's a big "+" on your figure - it's about 300 Watt-hours per mile, or 3600W constantly, if you're going 60 mph. For comparison, one horsepower is about 750 watts - a Tesla is a very low-drag car.

Anandtech reported [1] that the TDP of the whole board is around 250W. The Tegra SoCs are probably around 10W, but that doesn't get you much in the way of GPU horsepower.

1: http://www.anandtech.com/show/9903/nvidia-announces-drive-px...
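Back-of-the-envelope version of that comparison (cruise consumption and speed are round assumed numbers):

    consumption_wh_per_mile = 300  # assumed highway consumption
    speed_mph = 60
    powertrain_w = consumption_wh_per_mile * speed_mph  # ~18,000 W at cruise
    board_tdp_w = 250                                   # reported Drive PX 2 board TDP
    print(powertrain_w, board_tdp_w / powertrain_w)     # 18000, ~1.4% of cruise power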


Yeah, I screwed up the math.

Point stands though.


I don't think it does still stand when you were off by well over an order of magnitude.


Yeah, in the wrong direction, I underestimated the powertrain consumption, so the power cost of the computer is relatively even smaller.

My point missed the point, though: apparently it isn't the energy consumption they are concerned with; other people in the thread point to limiting/getting rid of the heat as the main concern.


So it seems, and in fact an A/C will eat a kilowatt or more, but...

The components of this system need to generate next to zero heat and they need to be placed in all sorts of inconvenient locations. That's causing automakers to desire extremely low-power dedicated circuitry over GPUs.

Consider the C7 Corvette as an example: despite its enormous blind spot, it still doesn't have blind-spot indicators because they can't find any way to cram existing sensors into the thing. There are more examples; that's just the one I'm most familiar with, because I balked at purchasing one over this despite an insanely great deal on it at the time.


I don't believe that. Today's cameras are tiny and the C7 isn't a particularly small car.


One thing I haven't seen discussed:

Many times I'm driving and have the police wave me through a traffic light.

Would a self-driving car realize what's happening?

What about a driver waving me to get ahead?


It's not a solved problem but it's something that people are working on.

https://youtube.com/watch?v=8aEWHdduPwc This is the Mercedes car that communicates to pedestrians about the vehicle's intent.

https://en.wikipedia.org/wiki/Vehicular_communication_system... Something like this might feature prominently as well.


The headline reads confusingly to me. It sort of sounds like it's saying the system is five years in the future, but what he meant is that Tesla itself is five years ahead of the competition in this area. (Off-the-cuff verbal speech is often a bit hard to follow when written down word-for-word.)


How well would a modern car do in the DARPA Grand Challenge? I'm curious how far we've come.


I doubt a self-driving car without LIDAR would make it on the same "test road" as in 2006, but it shouldn't be a problem for others like Google, Volvo, Ford, etc.

We need an independent review of self-driving cars in a few years. It will be quite interesting to see how good they really are in different driving situations, with different road and weather conditions. Say goodbye to camera-plus-radar-only cars on a snowy road with a bright, low winter sun, or in heavy rain on a dark, foggy night.


First, I'm all for AI-based transportation solutions, but why does it seem like there aren't nearly enough redundancies being considered? Are there just going to be competing proprietary solutions?

In the US, the NHTSA should get ahead of things and push for open standards, and potentially for some standard safety features, like a material or device that can be put on objects (other cars, fixed structures, etc.) to mark them in ways that transmit specific information about them.

I could make tens of millions selling stickers or paint additives that would mark a human-driven car's edges to help "protect" it from automated vehicles' AI.


Not only is NVIDIA getting those three things Huang mentioned, but they are also planning for, and maybe getting, a monopoly. Think about it: they pivoted from chip-making to AI before anybody else, and not originally for self-driving cars. Then, when Tesla decided to pursue self-driving, NVIDIA hopped in, and now Tesla is going to use their Drive PX2 supercomputer. This is a smart move by NVIDIA; after all, they're five years ahead!


The PX 2 is kind of a cool computer: 8 teraflops, 250 W, liquid-cooled. I imagine it would draw a good bit less than that most of the time. 8 teraflops is about 8% of Moravec's estimate of brain equivalence, so, assuming you use ~8% of your brain while driving, it may be about right.
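
For what it's worth, the arithmetic behind that 8%, assuming the commonly cited ~100-teraflop reading of Moravec's estimate (the estimate itself is very rough):

    # Rough arithmetic only; "brain equivalence" is a loose, contested figure.
    px2_tflops = 8
    moravec_brain_tflops = 100                   # commonly cited reading of Moravec's estimate
    print(px2_tflops / moravec_brain_tflops)     # 0.08 -> ~8%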


The brain doesn't do floating point operations. You're trying to compare apples to oranges.


Yeah, but Moravec's arguments were based on the rough processing power he found necessary to achieve equivalent performance in the simple robots he was building. So it's a "two apples equal one orange for making fruit salad" kind of estimate.


Noob question here: how do multiple LIDARs not interfere with each other?


Question: Does NVIDIA do remote jobs for engineering positions?


This article seems to relate mostly to the computer vision problems associated with self-driving systems, but what about weather conditions? Is this a solved problem?


Weather conditions are a sensor / computer vision / control problem. It is far from solved.


I would love to see a betting market give a prediction on whether a Tesla car sold today will have full self-driving capability by 2018.


Very cool: with a capable CUDA device in a car, one should be able to mine some crypto coins with it! With some luck, the car should be able to pay for itself!

Alternatively, having in mind Elon's creative approach to finances, Tesla could get some significant hashing power with its fleet!

Edit: looking at the downvote I suppose the joke wasn't obvious?


The idea of ubiquitous, powerful computers in everyone's possession contributing to some global processing while they're idle is actually fascinating.


It was, until most of those computers got power-saving modes; electricity is not free. Also, note that some things (like Bitcoin mining) are so specialized that they run poorly on general-purpose CPUs and GPUs.

Still, there is a thriving set of "do this computation at home" projects: Folding@Home, SETI@Home, ...


Well, billions of people now have phones, tablets, and laptops between them, so each of them could contribute some negligible amount of CPU while plugged in and sleeping, and it would still add up to a huge difference for various projects.

I think something like this should be built into iOS/Android. You choose a single project to contribute to while your devices are idle, and you get some standardized currency/points based on how much computing you've donated.
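
A toy sketch of what the accounting for that could look like (nothing like this exists in iOS/Android today; the function, the rate, and the numbers are all made up):

    # Hypothetical crediting scheme: points proportional to donated compute.
    def credit_points(device_gflops: float, idle_seconds: float,
                      points_per_tflop_second: float = 0.001) -> float:
        """Points earned for donating idle compute time."""
        tflop_seconds = (device_gflops / 1000.0) * idle_seconds
        return tflop_seconds * points_per_tflop_second

    # e.g. a phone GPU at ~300 GFLOPS donating six idle hours overnight
    print(credit_points(300, 6 * 3600))    # ~6.5 points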


In the 9 years I have been on HN, I have upvoted 3 really good jokes and downvoted thousands of so-so ones. Before you post a joke, ask yourself whether it's really that good. Jokes are like spam on message boards: they add clutter without aiding the discussion.


HN is a humor free zone and I for one appreciate it!


Wow, really? I just don't see how this is spam. Jokes definitely add some value, unlike some other stuff out there.


I thought it was kind of funny but to be honest I couldn't tell you were joking. It's hard not to assume people are just dumb.


Who thought making decent autonomous cars was going to take until 2020? For regulations etc., maybe, but not for the tech. Earlier this year I expected it to be done later this year.

Edited for clarity (I hope).


As someone not living in a place with huge, straight highways and perfect asphalt, I haven't seen anything making me believe it will be generally available soon.

Right now I'm in the mountains. Narrow, icy roads. Thick snowfall, which makes some of the sensors already on the market unreliable.

The Google approach is to map everything beforehand, and mostly in easy conditions. Tesla is somewhat more general, but still sticks to not-too-difficult roads. Volvo has been the only one I've seen with something that might tackle these issues.


Audi had a car that would drift up Pikes Peak years ago: http://www.autoblog.com/2015/02/16/stanford-audi-tts-thunder... Ford is making headway on snow and rain: http://www.dailymail.co.uk/sciencetech/article-3399315/The-d... Last I checked, this was mostly a sensor problem; the dynamics of driving in slippery conditions are pretty well-studied.
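
To give a feel for why the low-friction dynamics matter so much, here is a simple friction-limited stopping-distance estimate (illustrative friction coefficients, not figures from either of the linked projects):

    # Friction-limited stopping distance: d = v^2 / (2 * mu * g)
    g = 9.81                        # m/s^2
    v = 60 * 0.44704                # 60 mph in m/s (~26.8 m/s)

    for surface, mu in [("dry asphalt", 0.7), ("packed snow", 0.3), ("ice", 0.1)]:
        d = v ** 2 / (2 * mu * g)
        print(f"{surface}: ~{d:.0f} m to stop from 60 mph")
    # dry asphalt: ~52 m, packed snow: ~122 m, ice: ~367 m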


Many people. And I still don't expect to see general-purpose self-driving (secondary roads, cities, a range of weather conditions) for decades.


This exactly. I did robotics in the late '90s, and it's interesting to see how many fundamental problems still remain unsolved. The trajectory of tech advancement is positive for sure, but not _nearly_ as fast as pop-sci and marketing articles suggest.


It happens a lot.

Hard problems are rarely just one hard problem, but until you solve the first hard sub-problem you won't even know what the other hard sub-problems are, let alone what proportion of difficulty each contributes.

It's like climbing a mountain in the fog: you've found the way forward, and if it holds out you can extrapolate when you'll reach the top, only to dead-end at the face of a sheer cliff... it will be a while before you find the next way forward.

It's progress but the destination remains further than it seems.


What fundamental problems remain to be solved for self driving cars?


Vehicle detection: detect any vehicle, even from the side, even if its shape is rare, even in the dark, etc. This could be solved with data (see the sketch after this list).

Control: when to yield, without watching the face of the other driver, etc.

Edge cases: obeying a police officer, yielding to an ambulance, cooperating with other cars.
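
As a sketch of the "solve it with data" direction, here is roughly what running an off-the-shelf, COCO-pretrained detector on a single camera frame looks like (a generic torchvision model, not anything Tesla or NVIDIA ship; the class IDs and score threshold are approximate):

    import torch
    import torchvision

    # Load a COCO-pretrained detector (Faster R-CNN with a ResNet-50 FPN backbone).
    model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)
    model.eval()

    frame = torch.rand(3, 480, 640)          # stand-in for a real camera frame, CHW in [0, 1]
    with torch.no_grad():
        (pred,) = model([frame])             # one dict of boxes/labels/scores per image

    VEHICLE_IDS = {3, 6, 8}                  # COCO indices for car, bus, truck (roughly)
    for box, label, score in zip(pred["boxes"], pred["labels"], pred["scores"]):
        if label.item() in VEHICLE_IDS and score.item() > 0.5:
            print(label.item(), round(score.item(), 2), box.tolist())

The hard part is exactly what this glosses over: rare shapes, odd viewpoints, and darkness are where a detector trained on generic data falls apart, which is why "more data" is the usual answer.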


And the non-technical side.

Liability: who is liable when the car kills a pedestrian?

Driver engagement: how do you safely transition from automatic control to manual control?

Maintenance: Will the manufacturer be obligated to provide software updates for the life of the vehicle? Even if that vehicle is 20 years old? Even if newer software is 10x less likely to kill a pedestrian?

Cost: Are the costs one-time or will there be a maintenance fee?

Licensing: Does the manufacturer have the right to disable functionality after the purchase? Do they have a right to your data? Can they sell that data to your insurer?

Regulation: Who certifies systems? How do they test them? When is a system "good enough?"

All of these things sound trivial compared to the technical challenges, but it's these kinds of non-technical challenges that killed the small-airplane market in the US and are still unresolved 65 years later.


Nvidia mentioned identifying humans, dogs, parked cars, trucks, street signs, street lights, etc. better than a human.

On their to-do list: identifying kinds of cars (like a police car) and kinds of trucks (school bus, ambulance), and acting appropriately.

I've seen Google mention responding to hand gestures from bicyclists, but also being exceedingly polite and repeatedly stopping as a track-standing cyclist rocks slightly backwards and forwards.


Not to mention that, since you did robotics in the 90s, there might just be another point where the entire field of AI falls off a cliff, again.


On the one hand, I'm inclined to think not. Having said that, I do wonder if the rapid improvements (and sometimes impressive results) in certain narrow domains are misleading a lot of people into assuming that all that's needed is continued incremental improvement.

As a case in point, while voice recognition has indeed gotten quite good with the right kind of microphones (e.g. the Amazon Echo), the natural language processing and other "smarts" needed to turn that into a digital assistant that is even subpar by human-admin standards still seem reasonably far away. And that doesn't even require interfacing with the physical world.


Why would secondary cities be an issue?

They seem big enough to justify the mapping costs.


I was referring to secondary roads (not limited access) and cities generally. Sorry I was unclear. Major cities would actually seem to be the bigger challenge than smaller ones in general.

Detailed and current mapping will likely be a necessary part of fully-autonomous systems at least initially. But there's nothing especially difficult about doing that mapping. It's just a question of economics.


In general, people tend to overestimate technological advancement in the short term.


That's kind of the opposite of the point of the article.


One thing I hope the engineers are addressing: Google Maps recently directed me down a cobblestone street that required me to slow to a crawl. If an autonomous system attempted to maintain the speed limit on that road, I'm not sure the car would make it without anything breaking.


What "it"?



