> Similarly here, if people use data and inferences they can make with the data without any concern about error bars, about heterogeneity, about noisy data, about the sampling pattern, about all the kinds of things that you have to be serious about if you’re an engineer and a statistician—then you will make lots of predictions, and there’s a good chance that you will occasionally solve some real interesting problems. But you will occasionally have some disastrously bad decisions. And you won’t know the difference a priori. You will just produce these outputs and hope for the best.
He is not saying anything about the relative heft of machine learning and civil engineering. He is saying that if you don't check whether your big-data predictions are accurate, and whether you can know that a priori, you will still make predictions, but some of them will be wrong and you won't know which ones. The analogy with engineering is only incidental to his point, which is mainly about overfitting.
You can show in hindsight that a particular big-data prediction was correct by collecting data after it was used to make decisions, as Amazon might. But you would really like to know whether a decision is likely to be a good one before you make it. And he, as a scientist, is interested in knowing for sure whether his results are correct.