This is an intellectually confused post. It is so unclear and muddled that only a charitable reader would suppose the author understands the mathematical and modeling issues here. I’m surprised the author published it in this form - you don’t want to jump into a controversy like this with such an unclear discussion.
As just one example, the whole digression on a “decision boundary” is conceptually mistaken. Once you report a posterior probability, it’s up to the user to set a simple real-number threshold for placing a bet (if we’re talking about ways to put skin in the game). The posterior you just learned holds all the information; your only action is to threshold it. That’s the Neyman-Pearson lemma.
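A minimal sketch of that thresholding step, assuming an expected-value rule and made-up payoff odds (my illustration, not anything from the article):

```python
# Toy illustration: once you have a posterior probability p, a bet reduces
# to comparing p against a threshold implied by the payoff odds.
def should_bet(p, odds):
    """Bet if expected value is positive: p * odds - (1 - p) > 0."""
    return p * odds > (1 - p)

print(should_bet(0.28, 3.0))  # 28% chance at 3:1 payout -> True, EV is positive
print(should_bet(0.28, 2.0))  # same probability at 2:1  -> False
```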
If you allow generalizations to interval-valued probability, which might be sensible under these conditions, the situation gets more complicated. But the writer of the post did not mention this.
Another place this came out is his initial discussion of aleatory vs. epistemic uncertainty — this can be done clearly, but here we read:
> Aleatory uncertainty is concerned with the fundamental system (probability of rolling a six on a standard die). Epistemic uncertainty is concerned with the uncertainty of the system (how many sides does a die have? And what is the probability of rolling a six?).
This is just not helpful. He has said “probability of rolling a six” twice.
Agreed. The more I read the more I got embarrassed for the author, mainly because his points were so ridiculously flimsy. I thought this was one of the worst:
> Because FiveThirtyEight only predicts probabilities, they do not ever take an absolute stand on an outcome: No ‘skin in the game’ as Taleb would say. This is not, however, something their readers follow suit on. In the public eye, they (FiveThirtyEight) are judged on how many events with forecasted probabilities above and below 50% happened or didn’t respectively (in a binary setting).
Saying "this is not something their readers follow suit on" - only if you're a reader who doesn't understand probability. In fact, Silver is constantly trying to get across how likely < 50% probabilities are with his football analogies (e.g. "a team down by 3 at the half actually winning") to emphasize how likely something like "20%" actually is.
If anything, I love Silver's reality-based approach exactly because he does acknowledge that all he can do is offer a best estimate based on available evidence, instead of saying things with false certainty and then basking in self-proclaimed genius when luck happened to run in his favor.
> basking in self-proclaimed genius when luck happened to run in his favor.
I have read Black Swan, and I think Taleb's central thesis is that there are valid epistemic gaps in every model, and that this suggests there is at least one family of hedge strategies where the goal is to think as creatively as possible about what we don't know, but what could be, so as to reframe that lack of knowledge as a plan to go tromping around in a jungle looking for holes. But to do so requires wide-ranging knowledge across many domains, which is hard to develop.
This is a smart observation. There has been some theoretical work on the intersection of game theory and uncertainty quantification [1]. One part of it is a mathematical take on what you described - you have a system which is not pricing risk correctly, due to over-confidence in model predictions. If so, there should be money to be made in placing bets on that system.
"Saying "this is not something their readers follow suit on" - only if you're a reader who doesn't understand probability."
Unfortunately that is the more common case, and I agree with the author that most people take a binary stance on polls.
It should not be the case for regular readers of a website whose main topic is statistical analysis, although I imagine that just before a major election they would see a large number of non-regulars just to check the polls.
Fine, but Silver goes to great lengths, in my opinion, to explain what a probability means. During the 2016 presidential election he was emphasizing "Hey, it's less likely than the alternative, but there is a real possibility Trump will win. There is a very plausible path for him to wrap up the electoral votes." And afterwards, I think the results showed how accurate Silver was - he gave a much higher probability to Trump winning than anyone else IIRC, and the outcome was pretty much that: an unlikely outcome (many of the deciding states were well within any plausible margin of error) that actually happened.
I was following the website frequently at the time and I fully agree.
But I was just commenting that the author is spot-on at least in that many readers do not take these explanations into account; they see just a 28.6% chance of winning and automatically convert it to 0% in their mind.
And the explanation is that they don't understand math and probability (or don't want to), whether the author likes it or not.
I still don't get why the author thinks it's OK to assign blame to 538 for readers interpreting things incorrectly when Nate clearly spends a significant amount of time trying to educate the readers on how to interpret them.
It makes no sense to me to hold 538 accountable for readers' interpretations of the data. If I wander into some realm of science I'm unfamiliar with (say, quantum mechanics) and start making false inferences from it, does that mean the scientists were misleading me?
I also listen to their politics podcast, and they constantly talk about how to prevent people from rounding up an 80% chance to a 100% chance. One change they made this cycle was to say "4 in 5" instead of 80% (which is mathematically the same) both to convey the lack of precision and to make it more intuitive. I think it helps. Personally, when I talked to people about election outcomes, I phrased it as "Based on what I know, I will be surprised-but-not-shocked if x happens, and shocked if y happens" to differentiate between, say, a 20% chance and a 1% chance.
Yes, and he admitted to misleading with fake math:
> But it’s not how it worked for those skeptical forecasts about Trump’s chance of becoming the Republican nominee. Despite the lack of a model, we put his chances in percentage terms on a number of occasions.
So it would be more accurate to say that he was careful during the second half of the election, after making significant mistakes (both mathematical and ethical) during the first half.
Even more confidence-inspiring: before the election, they had discussed what a path to victory for Trump would look like, and their most likely path is what ended up happening: the midwest went for Trump, in a correlated manner, which is exactly what other forecasters had failed to account for properly.
If your model can help you predict how it's going to fail, I'm fairly impressed.
Do you have dates for those quotes? I suspect many (if not all of them) refer to pre-nomination model. Silver explicitly addressed how he messed up the primary model by not trusting the numbers in his 18 May 2016 blog post "How I Acted Like A Pundit And Screwed Up On Donald Trump: Trump’s nomination shows the need for a more rigorous approach."
Yes, they were pre-nomination articles. At that time Silver was acting in the way an earlier comment in this thread said he does not act:
> saying things with false certainty and then basking in self-proclaimed genius when luck happened to run in his favor
Something seems to have gone wrong at 538 after their stellar performance in 2012. In 2016 even their Congressional predictions were only 90% accurate. Or perhaps 2016 was simply different from past elections.
> "So in the House we have Democrats with about a 4 in 5 chance of winning," Silver told ABC's "This Week."
> "But no one should be surprised if they only win 19 seats and no one should be surprised if they win 51 seats," Silver added. "Those are both extremely possible, based on how accurate polls are in the real world."
Dinesh D'Souza jumps on him for hedging his bets:
> And just like that, an 80 percent chance for Democratic takeover of the House goes to 50-50.
Nassim Taleb also jumps on him in the same way:
> @DineshDSouza is right: you don't change a forecast from 80% to 50% under uncertainty. Second time klueless Nate Silver makes a mistake.
> When someone says an event and its opposite are extremely possible I infer either 1) 50/50 or 2) the predictor is shamelessly hedging, in other words, BS.
> 4- Things are actually worse: that @NateSilver538 doesn't get that if BOTH X & Non-X are "extremely possible = Probability converging to 50-50 is EXACTLY my problem w/his misunderstanding of probability in forecasting, & point of my paper.
> 5- FOR THE RECORD I am not translating @DineshDSouza's point that it was 50-50: I ALSO understood ~50-50 DIRECTLY from Silver's "extremely" in the linked "The Hill".
The Twitter fight happened on Nov 4, if you want to go look.
In this article, the author notes that 538 only presents its model to readers in terms of probabilities, but the public tends to round their predictions up and down.
> This is not, however, something their readers follow suit on. In the public eye, they (FiveThirtyEight) are judged on how many events with forecasted probabilities above and below 50% happened or didn’t respectively (in a binary setting)
This isn't the public's fault, says the author, it's 538's fault for not clearly presenting the uncertainty in the prediction.
Additionally, the author writes, the 538 polling-based model swung too wildly in the 2016 election. In doing so, it failed to present a single prediction on which it could be judged. It also failed to acknowledge that unforeseen events could change the polls.
Personally, I'd argue that the swinginess of the forecast conveyed the uncertainty of the polls. I spent the last month of 2016 citing Nate Silver to argue on Facebook and Reddit that people were overestimating the certainty of the election forecasts. Fifteen percent of the electorate was up for grabs with just two weeks left in the election! (about 7.5% undecided, and 7.5% saying they'd vote 3rd party)
> This is Taleb’s primary argument; FiveThirtyEight’s predictions do not behave like probabilities that incorporate all uncertainty and should not be passed off as them.
Nate Silver is being blamed both for trying to convey to the public that his predictions involve uncertainty and for the public not perceiving the uncertainties in his predictions.
Silver's point is that their model results in a probability distribution of outcomes and, at the time he was commenting, in 10% of the cases Dems won 19 or fewer seats, resulting in a GOP majority, and in 10% of cases Dems won 51 or more seats, resulting in an overwhelming Dem majority.
Pointing out that those extremes (the tails, in CDF terms) were approximately equally likely is willfully missing the point that the vast majority of the 80% of outcomes in the middle, between those relatively fat tails, involved Democrats winning, which is why the prediction was a 4 in 5 chance of Dems winning control.
By missing that rather obvious point, D'Souza unsurprisingly comes off as a moron and Taleb also fails to cover himself in glory.
Yeah, this is the key point. I was shocked when I saw the linked twitter posts from that exchange (having not followed it at all).
Just based on that snapshot my take is even more grim. Taleb surely knows how probability distributions work. The fact that he sneeringly dismissed basic reasoning like that in the political context he did suggests not cluelessness but malice.
One thing we do know is that Taleb is very smart and very good at making money. And right now there's a lot of money to be made pandering to the dumb, nihilistic, and pseudo-intellectual wing of the political right; indeed, the same people that D'Souza has been pandering to for decades.
And beyond that, the other reason I attribute malice to Taleb is that he is just such an asshole. D'Souza, of course, is a political hack, but I feel that Taleb could have easily made his points without so obviously and willfully misconstruing what Silver actually said, and without the Trump-esque name calling. If he's so smart he should let his ideas stand on their own without using the cheap "let's have a Twitter war for views" tactic, which was obviously intentional.
I normally like Taleb's insights but it was obvious when following this that he (and Dinesh D'Souza) were ridiculously misconstruing Nate Silver's comment. Two outcomes being "extremely possible" in no way means they have a 50/50 chance of happening. It merely means that neither outcome has a low enough chance that you should assume it won't happen.
If you say that X is extremely possible does that mean that X is unlikely to happen (probably won't), or that X is very likely to happen (probably will)?
I'd never come across anyone using this phrase until Nate Silver did. Is it a US English thing?
It doesn’t mean that a broken heart will definitely kill you, just that there is definitely a chance that it will kill you.
I could totally imagine Nate trying to emphasise the idea that 20% is a much bigger probability than, say, 1% by using language like that, but putting too much weight on the exact wording here leads to confusion.
> What on earth is a professional like Silver doing using the word "extremely" in that context?
Extemporizing while being interviewed on TV, presumably to stress the uncertainty in the prediction, is what he is doing, not writing an HN post, tweet or paper.
I suppose that whenever you are interviewed on national TV, you only say exactly what you mean, nothing more and nothing less?
As the blog post rightly said, 538 presents the forecast in a way that means they are never wrong. I can see the point. I also found the comparison of NFL vs. Senate forecasts meaningful. This kind of "forecasting" seems to be the current version of crystal balling and astrology. The models don't really know anything (no causation, and definitely nothing close to an actual model of the real world like in physics). It's a bit like high-performing fund managers and CEOs who fail to replicate their successes after they were crowned "person of the year". Depending on what exactly they forecast, they may do a bit better than that, because even without any model of actual causation, the world is pretty stable overall (compared to what the universe could throw at us), so even mere correlations may hold for a while.
As for the "50/50" here, I don't think this is meant as being exact numbers (after all, the whole point of the issue is that those exact numbers don't really tell you anything, if anything still is possible and any outcome can be justified later), but simply as the common usage of a phrase in the vernacular for "we don't know either way".
> As the blog post rightly said, 538 presents the forecast in a way that means they are never wrong.
The whole point of probabilistic modeling is to replace absolute decisions like "right or wrong" by continuous weights on the possibilities. If you absolutely need a definite decision, you can sample a prediction according to the probability assigned by the model. If the true outcome is x and the model assigned it probability p, then that procedure is going to be wrong (1-p) of the time. You could define that number as the "wrongness" of the probabilistic model, as a continuous analog of the definite case.
The advantage of probabilistic modeling is that you can also ask how wrong the model expects to be and get a meaningful answer. If there are many possible outcomes and none of them very likely, any choice is going to be wrong a lot. But you should expect a good model to have a small difference between its expected and actual wrongness. One might call that value "honesty".
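A small sketch of that "wrongness"/"honesty" idea for a categorical forecast (my own toy code; the terms are the parent comment's, not standard ones):

```python
def expected_wrongness(probs):
    """If the model samples its own prediction, it expects to be wrong
    with probability 1 - sum(p_i^2)."""
    return 1 - sum(p * p for p in probs)

def actual_wrongness(probs, outcome):
    """A sampled prediction disagrees with the observed outcome x
    with probability 1 - p_x."""
    return 1 - probs[outcome]

probs = [0.6, 0.3, 0.1]                    # model's forecast over three outcomes
print(expected_wrongness(probs))           # 0.54
print(actual_wrongness(probs, outcome=0))  # 0.4
# "honesty" would be how closely these two numbers track over many forecasts
```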
The whole point about the current topic as well as of my post: You missed by about a thousand miles. Please read it again. It's really pointless to argue about a strawman created by you. Your model is useless, that's the point! It makes no real(!) predictions - not usable for anything apart from blowing ever more hot air, and if it doesn't come to pass, you are never wrong because you left the door open by not actually saying anything in the first place.
Do you not understand that the guy/his company did nothing at all? And that giving some arbitrary probability was/is utterly devoid of any meaning (especially if you can't be wrong whatever the actual outcome)? They could have made any prediction at all, what difference would it have made? That is the value of that "work".
However, I realize there's people who like such meaningless drivel. It is a version of appearing to actually do something while not actually doing anything. You make it into the news but you can never be held accountable because whatever happens happens, you just helped create a few more entirely useless headlines (apart from helping with page views and ad impressions of course). It's actually quite ingenious to misuse actually useful tools like statistics.
So, and now go back and read what I wrote until you understand it. There is no point to this whole thing (apart from creating clicks for the attention industry). It has no impact (again apart from creating clicks). There is no outcome that hinges on anything. They can claim whatever, whether they are right or wrong matters for absolutely nothing. The creation of attention and clicks is completely independent - it happens before the "forecast" event. Great business where outcomes don't matter.
It's a link about how we can assess how good a job someone is doing at making probabilistic predictions. I had (maybe wrongly) assumed the relevance was obvious.
Using the phrase "extremely possible" as an expert is either a gross mistake or intentional misguidance.
"Possibility" is like "optimality" in the sense that they are binary attributes and thus don't admin grades. Something either is or isn't possible. Qualifying something as "more possible" is a mistake, and qualifying it as "extremely possible" is just nonsense.
A guy who makes his living from probabilistic analysis should know better.
I suspect the main reason he was attacked by Taleb was this misuse of language, which can be misinterpreted as meaning a 50-50 chance. Like others have said, he could have said "far from impossible" or "entirely possible" to convey that he meant the estimate had a large variance. Then again, I think Taleb should have realized this and not attacked him for simply failing to convey variance correctly. This seems like another example of Twitter bringing out the worst in people...
Apparently there's people who have hard feelings in this argument too.
I could understand people downvoting my comment (it may be interpreted as harsh even if that wasn't my intention) but... yours? Something weird is at play here.
Maybe it's just people being emotionally invested in Nate Silver being right (because they defended him publicly before for example), so they interpret any criticism of Nate Silver as a personal attack.
>In this article, the author notes that 538 only presents its model to readers in terms of probabilities, but the public tends to round their predictions up and down.
The only way you could get "50-50" from that is if you read "the Democrats get 51 seats" as "the Democrats win the House", because without that inference, he didn't say anything about the chance that they win the House. He compared a very lopsided win to a slight loss, and those were equally likely.
> He compared a very lopsided win to a slight loss, and those were equally likely.
I'm not arguing with you, since the one thing I know is that there's no surer way to making a mistake than jumping in confidently on probabilistic issues, especially issues of interpretation rather than of pure mathematics; but I think you must have misspoken, or I must have misunderstood. The quoted sentence says 'extremely probable' (an absolute condition), not 'equally likely' (a relative condition):
> "But no one should be surprised if they only win 19 seats and no one should be surprised if they win 51 seats," Silver added. "Those are both extremely possible, based on how accurate polls are in the real world."
Not "extremely probable" -- it says "extremely possible", which is a malapropism or a new phrase. It's easy to mistake because it has no clear meaning. The comparison, though, does not appear to be D's win vs D's lose, but D's win by more than X vs D's lose, and the latter clearly (imo) suggests D is favored to win.
> Not "extremely probable" -- it says "extremely possible", which is a malapropism or a new phrase. It's easy to mistake because it has no clear meaning.
Indeed, despite quoting it in my response, I made exactly that mistake. Thanks!
As said, he tries to help his readership understand how probability works, but in the end one may still ethically do business in one's chosen field without having to solve, on behalf of those uneducated in it, the problem of understanding that field's limits.
I think you may not be appreciating the subtlety of the repetition of the "probability of rolling a six". His claim is that aleatory probability starts with the assumption that you have a standard six-sided die with all sides weighted equally, but that epistemic uncertainty requires accounting for the uncertainty that you have a fair die, or even that it has six sides. So in both cases you are indeed trying to calculate the "probability of rolling a six", but the answers, and the process for creating them, are not the same. So while it might not have been the best phrasing for clarity, he's making a meaningful distinction.
Yes, the description is technically correct (the best kind of correct...), but as an explanation it is garbage because it hides the point to be explained behind subtlety.
Unlikely. Taleb seems to like how his argument has been summarised.
The crux of the issue Taleb has with Silver is that showing probabilities without making a decision/declaration is cowardly. When asked to make a prediction he says one thing, then follows it up with "but don't be surprised if it could be completely random." And that's the point - the fact that he covers his ass is the problem. Election forecasting is hard, so don't pretend you know something you don't if you won't stake something on your predictions. Oh, so it could be anything and you don't want to be held accountable? Then don't say a damn thing. Basically Silver never wants to be held accountable for his models but always has uncertainty covering his ass as a cop-out. So I'm with Taleb on this one. Either Silver makes a claim and sticks with it, or he shouldn't show probabilities and pretend he knows things.
If asked about the outcome of some uncertain future event (say, an election or some sports game), saying "team A has a 20% chance of winning" is making a prediction.
Making a decision/declaration in such a case where the outcome isn't (yet) certain does not mean "being held accountable", it means that you're either stupid, or a liar, or a stupid liar. If you want to call the results of a game before it's certain, then you shouldn't say a damn thing, because anything you say is a lie if you're falsely implying that the result is certain.
If reality is uncertain, then any certain statements/predictions are by definition wrong. Some predictions have more certainty than others, you can stake things on such predictions (sports betting is a great example - if you think that there's a 20% chance of winning but others think that it's 10% or 30%, then there's an opportunity), but it's ridiculous to require certainty where certainty shouldn't be expected.
This is not what the disagreement is about. It's got nothing to do with uncertainty in the predictions, and the debate is not about saying for certain what is going to happen. We know that distributions have means and variances, so one can't say what WILL happen. The point Taleb is making is that Nate presents likelihoods and then throws his hands up and says "whoa there, I'm not saying what is going to happen, just look at these likelihoods and draw your own conclusions." When the public naively looks at them and it looks as if the forecast was right, he gets praise for being a predictive genius, and when the less likely outcome occurs he covers his arse with "well, it was part of the 10% chance, so I'm still right". One is approaching stats from a real-world perspective where there are consequences for actions; the other wants the praise that comes with being seen as a predictive so-and-so but never wants to be accountable for his predictions. Put it another way: Nate's "predictions" are only worth how much money/reputation he stakes on them. So far that's zero, by his own admission.
I think this is a mischaracterization of Nate's own words. He certainly stakes his own reputation on the accuracy of his forecast. They spend a lot of time looking back and seeing how accurate the model was for the election as a whole.
There are two kinds of things Nate is saying here, and they are related but distinct:
1. Educating readers/listeners about how 20% chances happen all the time
2. Internal analysis and external reporting of how often his 20% predictions were right. If 538 predicts a group of 100 congressmen to all have a 20% chance of being elected and then 21 do get elected, the model did well and Nate's work is worth money to ABC and his reputation is improved (or maintained). If instead 33 of those congressmen get elected, the opposite result happens.
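Back-of-the-envelope version of that check (assuming, unrealistically, that the 100 races are independent):

```python
# 100 races each forecast at 20% should produce about 20 wins,
# give or take sqrt(n * p * (1 - p)) = 4.
from math import sqrt

n, p = 100, 0.20
mean, sd = n * p, sqrt(n * p * (1 - p))
print(mean, sd)          # 20.0 4.0
print((21 - mean) / sd)  # 0.25 sd above expectation -> consistent with the model
print((33 - mean) / sd)  # 3.25 sd above expectation -> strong evidence against it
```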
I disagree. Probabilities can be useful even if they're not 0% or 100%. For example weather forecasts often give the probability of rain. My plans will be different if the probability is 20% vs 80%. My plans don't tend to vary based on who wins the election, but I imagine that that information is very useful to some people.
And if your data is telling you to be 80% sure, I don't think you should fudge it and claim 100%. Report the uncertainty that you actually have.
With the weather man, I could keep track of when it rains and doesn't. If it rains significantly more or less than 80% of the time they say there's an 80% chance of rain, I can say that their model is bad.
I can't think of a similar way to evaluate Silver's election forecasting model. They very clearly aren't independent probabilities, and his model changes significantly from cycle to cycle. Was his model good in 2012 when every state went the way his model said was most likely? Was it bad when his model said Hillary had a 71.4% chance of winning?
They do not only predict top-line presidential results, but every race, in every state, for president, House, and Senate. Non-independence is accounted for in the model, so they have no qualms about you judging them by the calibration of their predictions, i.e. you want roughly 60% of their “60%” forecasts to be right, and 40% to be wrong. If all of the races they predict 60/40 go to the more likely candidate, they themselves consider this a failure: https://fivethirtyeight.com/features/how-fivethirtyeights-20...
> I can't think of a similar way to evaluate Silver's election forecasting model
You bucket every prediction, look at the outcome, and then confirm whether the favoured outcomes in the 8th decile actually occurred 70% to 80% of the time.
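In code that check is something like this (made-up data, and in practice you'd also want to handle correlated races):

```python
# Group (forecast, outcome) pairs by decile and compare the average forecast
# in each bucket against the empirical rate at which the favoured outcome occurred.
from collections import defaultdict

predictions = [(0.72, True), (0.75, False), (0.78, True), (0.71, True),
               (0.31, False), (0.35, True), (0.38, False), (0.33, False)]

buckets = defaultdict(list)
for prob, won in predictions:
    buckets[int(prob * 10)].append((prob, won))

for decile, items in sorted(buckets.items()):
    probs, outcomes = zip(*items)
    print(f"{decile * 10}-{decile * 10 + 9}%: forecast avg {sum(probs) / len(probs):.2f}, "
          f"empirical {sum(outcomes) / len(outcomes):.2f}")
```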
IMHO what Nate Silver does is not a prediction on the outcome of future events, such that when he doesn't get it right it is because he has failed, but rather an analysis of existing data in order to provide a detailed description of our own uncertainty. And by seeing how this uncertainty evolves as more data comes in or different events happen, we get a better understanding of the underlying process.
When reporting on the run-up to an election, taking great care to avoid having one's figures misconstrued as suggesting that the outcome is settled, when it is not, is the responsible thing to do. For Silver to not say a damn thing would leave all the interminable speculation in the mouths of those with even less knowledge of what might happen, and those with an agenda.
Absolutely disagree. People have a tendency to think forecasting is simply making a binary prediction: something will happen or it won't. That simply is not the case. It's more than that. It's quantifying uncertainty. In actuality, reducing a prediction to a simple "X will happen/won't happen" statement throws away information. I can predict the sun will rise tomorrow, and can also predict that the Denver Nuggets will beat the Dallas Mavericks tonight, but the probabilities associated with those predictions are vastly different.
The quandary that FiveThirtyEight and Silver are in is that the general public is stupendously bad with probabilities. Prior to the 2016 election, Silver was personally being attacked, in some cases by major media organizations like the Huffington Post, and accused of tipping the scales towards Trump (for some inexplicable reason), since their models were giving him approximately a 1 in 4 chance of winning. Then, in the months following the election, the narrative somehow switched, and the fact that Trump won despite only being given a 1 in 4 chance by FiveThirtyEight meant that Silver was now a hack and the model was wrong.
So now, fast forward to this year's election, and Silver is doing his best to drill into people's minds that, yes, their model showed a Democratic win in the House as the most likely outcome, but that absolutely does not mean their model says it is an absolute certainty, or that other outcomes would be unusual. It isn't "covering his ass", it's educating the public on how probabilistic statements work.
I kind of feel the same. I literally shouted out "what the fuck is this shit" mid-way reading the post. Then I looked up the author's credentials: Ph.D. Candidate at Stanford | Chief Data Scientist at MatrixDS ... I'm not sure what to make of this.
It seems less intellectually confused once one recognizes that it's mostly just a plug for the author's company: he links to his product at both beginning and end of the 'article.'
Peripherally, I like using nuclear reactor design analogies for Aleatory vs. Epistemic.
Aleatory is the inherent randomness of a system. You can characterize it but it's often hard to reduce. For example, how much boron impurities are mixed into your steel at the time of manufacture.
Epistemic is things that are knowable but that you don't know with much certainty (because they're hard to measure), like the probability that an incident neutron at 1 MeV will inelastically scatter off a Uranium-238 nucleus and emerge at 200 keV in some direction.
That might be really useful for you, but for those of us who don’t design nuclear reactors it is utterly meaningless - what are the consequences of each of the results and how much should I care about them? I would be blown away if more than 1 in 10,000 could understand what the consequences are of your statement.
If I'm understanding it correctly, aleatory uncertainty is uncertainty you understand, and epistemic uncertainty is uncertainty due to lack of knowledge.
Uncertainty about the outcome of an unbiased die roll is aleatory. You don't know what's going to happen, but that's because of inherent uncertainty, not because you don't know enough about the die. So while you can't be sure what the outcome will be, you can know that each outcome has a 1/6 probability, and you shouldn't expect to adjust that probability as you gain more knowledge.
But uncertainty about the outcome of an election is epistemic, because it depends on a lot of present unknowns. If you run more polls you could reach a different, more accurate probability. Part or all of the uncertainty is because of things you don't know.
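A toy way to see the difference (my own example, not the article's): epistemic uncertainty about whether the die is fair shrinks as rolls come in, while the aleatory uncertainty of the next roll never goes away.

```python
# Two hypotheses: a fair die (P(six) = 1/6) vs. a hypothetical loaded die
# with P(six) = 1/2. Rolls are summarised as "six" or "not six".
def posterior_fair(prior_fair, rolls, p_six_loaded=0.5):
    p_fair = prior_fair
    for r in rolls:
        like_fair = 1 / 6 if r == 6 else 5 / 6
        like_loaded = p_six_loaded if r == 6 else 1 - p_six_loaded
        p_fair = p_fair * like_fair / (p_fair * like_fair + (1 - p_fair) * like_loaded)
    return p_fair

p_fair = posterior_fair(prior_fair=0.5, rolls=[6, 6, 3, 6, 6])
p_next_six = p_fair * (1 / 6) + (1 - p_fair) * 0.5  # epistemic and aleatory combined
print(round(p_fair, 3), round(p_next_six, 3))       # ~0.02, ~0.49
```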
I can imagine a world where gambling sharks arrive at the roulette wheel in casinos, wearing hidden google-glass-ish devices, and place bets while the ball is in motion. Unbeknownst to everyone else, their device performed real time computation based on the position and velocity of the ball and wheel, and gave an output of a tight probability distribution (10% chance 25 red, 35% chance 29 black, 40% chance 12 red, 10% chance 8 black, 5% other). The sharks place bets based on these distributions, and make a bunch of money over the course of a few games.
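A quick expected-value sketch of that edge, using the made-up distribution above (single-number bets on a European wheel pay 35 to 1):

```python
device_forecast = {"25 red": 0.10, "29 black": 0.35, "12 red": 0.40,
                   "8 black": 0.10, "other": 0.05}
# Bet the single number the device considers most likely.
best_number, p = max(((k, v) for k, v in device_forecast.items() if k != "other"),
                     key=lambda kv: kv[1])
ev_shark = p * 35 - (1 - p)            # expected profit per unit staked
ev_public = (1 / 37) * 35 - (36 / 37)  # the usual house edge, for comparison
print(best_number, round(ev_shark, 2), round(ev_public, 3))  # 12 red 13.4 -0.027
```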
This raises the questions:
1. To the public (not the sharks), all of the possible outcomes of the game have equal probability. Would it be correct to say that all of their uncertainty regarding the outcome of the roulette wheel is aleatory?
2. To the card sharks, they DO have a model with knowledge of game outcomes. So should one say that the distribution output by the device is aleatory uncertainty, and the remaining uncertainty (the fact that there is a distribution instead of a single predicted outcome) in the system is epistemic?
3. #1 and #2 differ in presuming that such a device is present. In the absence of such a device, or in general, in the absence of a tested model, is it correct to say that all uncertainty is epistemic?
You don’t have to just imagine such a world. This actually happened in the 1970s (although it was a shoe computer and a team of well trained physicists, not Google Glass)!
>This is just not helpful. He has said “probability of rolling a six” twice.
For different purposes.
One is "probability of rolling a six on a standard die" -- e.g. a systemic property, where we know it's 1 in 6 but we have inherent randomness in how we roll the dice (alea in aleatory comes from the latin for dice btw, as in the famous J.Ceasar quote "alea iacta est" -- well, famous from Asterix at least).
The other is the probability of rolling a six based on what we don't know but in theory could (do we have a standard 6-sided dice? Are we asked to predict an event featuring some bizarro D&D dice we haven't seen? Is it really cubic? Have the edges been treated with a file? How about its balance?)
The author was expressing apparently advanced philosophy that is missing from normal discourse, so I can understand why it goes over most peoples' heads.
"Instead, epistemically uncertain events are ignored a priori and then FiveThirtyEight assumes wild fluctuations in a prediction from unforeseen events are a normal part of forecasting. Which should lead us to ask ‘If the model is ignoring some of the most consequential uncertainties, are we really getting a reliable probability?’"
We don't know what we don't know. Any probabilistic model is constructed with a snapshot of the perceived variables. If the model-maker is not able to conceive of variables, then they will not be in the model, which could have asymmetric effects on accuracy of the model.
The article seems like an oversimplification of the underlying dispute.
- It’s not necessary or even useful to define an arbitrary “decision boundary” unless you actually have to make a decision. From a Bayesian perspective, a probability stands for itself: 100% means the event is certain to occur, 50% means you have no idea, and numbers in between convey varying levels of certainty. In reality, 538’s predictions are not true Bayesian probabilities because they don’t take epistemic uncertainty into account, but that’s a totally different issue.
- “Wild fluctuations in a prediction” from new information are absolutely a “normal part of forecasting” - sometimes. If I’m planning to flip two coins, the probability of getting two heads is 25%; but once I flip the first coin, the probability changes to either 50% (if I get heads) or 0% (if I get tails). In the case of Comey reopening the investigation, even if the model had included a probability of that happening, it would have been low and thus wouldn’t affect the overall forecast much. But once that low probability became a certainty, you would expect a sudden swing. The real question is whether 538’s predictions are more swingy than they logically should be (particularly earlier on), but again, that’s a different issue.
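To put the coin-flip example in numbers: P(two heads) = 1/4 before any flip; after the first flip it becomes 1/2 (heads) or 0 (tails). The average of the updated forecasts, weighted by how likely each update is, is 1/2 · 1/2 + 1/2 · 0 = 1/4, i.e. exactly the original forecast. The forecast is allowed to swing, but on average it shouldn't drift, which is the property a well-formed probability forecast should have.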
Seems like, in addition to the question of whether they're more swingy than they should be, there's the question of whether they're presented in a way that gets interpreted as more certain or authoritative than they can possibly claim to be.
That is actually an issue they've acknowledged and tried to address. People also tended to see percentages and think they reflected vote percents.
To address this, for the 2018 election cycle 538 started displaying their odds as numerical ratios (5 in 9, for example) to try to reduce the aura of certainty.
538 started displaying their odds as numerical ratios
Did that actually make things clearer to people? Speaking only for myself, I find ratios quite hard to work with and reason about. I certainly couldn't tell you instantly, for example, if 5/9 is more or less than 4/7, or how big the difference between the two is, without first doing some mental arithmetic. Perhaps people in the US are more used to working with ratios since their measurement systems tend to be ratio based.
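(Working it out: 5/9 ≈ 0.556 and 4/7 ≈ 0.571, so 4/7 is the larger of the two, by about a point and a half.)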
That's the point though. It's meant to require a bit of thinking to digest the information. They didn't set out to make the information clearer and easier to understand; they set out to make the false positive of reading a percentage and treating it as the final result less frequent.
538 provides pretty in-depth discussions of its forecasts and of changes to them. Far more than you would expect - and I think that's because 538 is ultimately a statistics blog at heart, not a political blog.
The 2016 election was swingy and it was uncertain.
In 2008 and 2012, the political media was myopically focused on the twists and turns of the horse race and called it 50-50 dead even. Nate Silver's 538 modeled the election as basically stable with only small polling shifts, and the only change in the last month was a steady decline in the remaining time that McCain (or Romney) had to significantly shift the polls.
In 2016, the political media treated the election as if Hillary had a lock on it, and entreated us not to get swept up in the daily swings in the polls. Meanwhile, Nate Silver's model swung wildly with the polls, and Nate went on the media to emphasize the uncertainty of the election. Both candidates were disliked, and a large portion of the electorate was undecided right up until the last week of the election.
I remember the following exchange in one interview:
Interviewer: If you say that Donald Trump has a 33% chance to win, what's the chance that he actually wins?
Nate Silver: One in three. I'm predicting a one in three chance that Donald Trump wins in November.
As for the authoritativeness of their presentation, FiveThirtyEight has a really beautiful forecast that presents their 2018 prediction as a probability distribution:
The Real Clear Politics polling aggregator showed nearly a 10 point jump for Hillary Clinton after "grab em by the pussy" came out. That lead then eroded back to the mean over the course of the month.
That's what I mean by swingy. Nothing in 2008 or 2012 shifted the polls like that. The polls in those elections stayed within a 2 point band the whole way. That level of volatility was clear in the polls well in advance of the election.
(And, 15% of the electorate was effectively undecided with 2 weeks to go, which is much higher than it was in 2008 or 2012).
Such swings are also what Taleb and the author of this article are criticizing Nate Silver for.
>The 2016 election was swingy and it was uncertain.
I don't remember it that way, in fact I remember seeing a lot of very shocked faces in the Democratic camp when the results were beginning to take shape. I don't remember having seen similar confused reactions after any previous US presidential elections, not even after the Bush vs Gore one which was a lot closer in terms of electoral votes.
Back to the article, I think Nate Silver's failure only shows to the general public that electoral predictions are rubbish. Maybe "failure" is a strong word because he genuinely seems to be the best at what he's doing, it's just that the domain in which he's involved is turning out to be bogus. I'm sure that there was a crystal-ball viewer that was the best at what he/she was doing, it's just that crystal-ball viewing turned out to be bogus.
However, FiveThirtyEight gave Trump a 30% chance of winning, and in the days before the election was stressing the fact that a Trump win was very possible.
They didn't predict his win, but they did better than anyone else in the mainstream.
> They didn't predict his win, but they did better than anyone else in the mainstream.
Kind of proving my point, as it shows that Nate Silver was the best at crystal-balling the election result. Afaik 30% is still below the 50% (or 0.5/1) threshold generally needed to make a decision, as the article also mentions.
I agree. I think people's shock was a failure of imagination. They couldn't imagine a Trump win, so even though the polling data showed he could, they discounted it. That, I think, explains why Silver was criticized so heavily for saying what he did days before the election: they couldn't imagine a Trump win, therefore he must lose, which Silver must also think, therefore Silver is trolling.
Actually, 100% means the event is "almost certain". That's not an accidental choice of words; it's a technical term which conveys important meaning about how we reason about probability.
Good read. From that page, here's an example that nicely illustrates the concept:
Imagine throwing a dart at a unit square (i.e. a square with area 1) so that the dart always hits exactly one point of the square, and so that each point in the square is equally likely to be hit.
Now, notice that since the square has area 1, the probability that the dart will hit any particular subregion of the square equals the area of that subregion. For example, the probability that the dart will hit the right half of the square is 0.5, since the right half has area 0.5.
Next, consider the event that "the dart hits a diagonal of the unit square exactly". Since the areas of the diagonals of the square are zero, the probability that the dart lands exactly on a diagonal is zero. So, the dart will almost never land on a diagonal (i.e. it will almost surely not land on a diagonal). Nonetheless the set of points on the diagonals is not empty and a point on a diagonal is no less possible than any other point: the diagonal does contain valid outcomes of the experiment.
Interesting. Of course, to do the experiment as described you would have to determine the point the dart hit with infinite precision. If there is any limit to the precision with which you can determine the point of impact, then the probability that the dart hits close enough to the diagonal that you can't tell whether the diagonal was hit becomes nonzero.
To try and explain Taleb's argument as simply as possible:
If someone tells you today that an event is 100% certain, and tomorrow tells you that it is impossible, then you intuitively will not trust their forecasting. If it's certain, then it's certain not to be impossible tomorrow
Now if someone tells you something is a 90% chance today, and a 10% chance tomorrow, again you don't trust their predictions.
There is a probability associated with "changes in probability". The probability of going from 100 to 0 should intuitively be 0%. The probability of going from 90 to 10 intuitively must be something like 20% (it's like 80 of the 90s "didn't happen", with a probability of 20).
A key point is that you can add this up every day, if something goes from 90->10->90 then it's even more unlikely than going 90->10.
So if you extend that intuition, put some real maths behind it, you can tell whether something is "likely to be a real probability" by the rate at which it changes. And if you have lots and lots of repeated predictions, you can be confident that something isn't a good probability (every timeseries has like a <10% chance of doing exactly what it does, so their combined probability [of being good probabilities] is like 0).
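A rough simulation of that intuition (my own toy model with invented numbers, not anyone's actual methodology): treat the "true" margin as a random walk, define the forecast as the probability the final margin is positive, and then check both calibration and how often big swings happen.

```python
import random
import statistics
from math import erf, sqrt

def prob_final_positive(margin, days_left, daily_sd=0.01):
    """P(final margin > 0) if the margin drifts as a Gaussian random walk."""
    if days_left == 0:
        return 1.0 if margin > 0 else 0.0
    sd = daily_sd * sqrt(days_left)
    return 0.5 * (1 + erf(margin / (sd * sqrt(2))))

def simulate_path(days=100, start_margin=0.02, daily_sd=0.01):
    margin, forecasts = start_margin, []
    for d in range(days, 0, -1):
        forecasts.append(prob_final_positive(margin, d, daily_sd))
        margin += random.gauss(0, daily_sd)
    return forecasts, margin > 0

random.seed(0)
paths = [simulate_path() for _ in range(20_000)]

# Calibration: on days the forecast said ~90%, the event should happen ~90% of the time.
hits = [won for fs, won in paths for f in fs if 0.88 < f < 0.92]
print("empirical rate at ~90% forecasts:", round(statistics.mean(hits), 3))

# Big swings: how often does a path that reaches 90% later fall below 10%?
def swings_90_to_10(fs):
    peak = fs.index(max(fs))
    return max(fs) > 0.9 and min(fs[peak:]) < 0.1

print("paths swinging 90% -> 10%:",
      round(statistics.mean(swings_90_to_10(fs) for fs, _ in paths), 3))
```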
Now Nate Silver accepts this, but says that he's not actually putting a probability on the event in the future, but some non-existent "probability on that event in the future if the future was now". But that doesn't correspond to anything useful or measurable (it's untestable!!), and most people will assume it follows the normal meaning of probability, and it's honestly just silly.
I've always treated 538's forecasts as meaning nothing more or less than "We have a model, we fed the current data into it and ran N simulations of the event, and in X% of those simulations, this was the result". Which, as I understand it, is literally how they're generated.
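Something like this, in very rough outline (every number below is invented, and real models add pollster adjustments, fundamentals, turnout modeling, etc.):

```python
import random

# (state, polled Dem margin, electoral votes) -- purely illustrative
states = [("A", 0.04, 20), ("B", 0.01, 15), ("C", -0.02, 10), ("D", 0.06, 25)]
TOTAL_EV = sum(ev for _, _, ev in states)

def simulate_once(polling_error_sd=0.03, national_shift_sd=0.02):
    # one shared national error plus independent state-level errors,
    # a crude way to get correlated outcomes across states
    shift = random.gauss(0, national_shift_sd)
    dem_ev = sum(ev for _, margin, ev in states
                 if margin + shift + random.gauss(0, polling_error_sd) > 0)
    return dem_ev > TOTAL_EV / 2

random.seed(0)
N = 100_000
p = sum(simulate_once() for _ in range(N)) / N
print(f"Dem win probability in this toy model: {p:.1%}")
```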
Any discussion of the accuracy of the model (which should be the only thing that admits debate) has to be retrospective. Just saying "well the forecasts changed" doesn't inherently discredit the model, especially because things like elections turn out to be highly sensitive to tiny variations (even something as simple as a rainy election day in a few key precincts can completely flip an outcome).
Yes but that's not how the proles understand the blog. They take it as "x has a y % chance of happening".
Which is Taleb's beef with it. The commoner thinks its Silver putting his money on something, but when it doesn't happen he says "well I didn't say _that_".
And in this day and age of more robust machine learning models, the fact that the 538 models are so volatile is a pretty weak excuse.
Our options aren’t “high confidence converging predictions, or 538”.
The options in the current era for understanding of the electorate’s mood are “overly confident individual polls, poorly analyzed with completely inadequate models”, vs “538-style epistemically humble models accompanied by discussions of their confidence, which can be scored in aggregate after each election”.
> The options in the current era for understanding of the electorate’s mood are “overly confident individual polls, poorly analyzed with completely inadequate models”, vs “538-style epistemically humble models accompanied by discussions of their confidence, which can be scored in aggregate after each election”.
Also, “politically motivated actors selling narratives that reinforce their preferred outcome largely without data or with cherry-picked data.” Don't forget that option
Could you use the same reasoning to discount weather predictions?
September 10, 2018 - Hurricane Florence predicted to be category 4 hurricane on landfall, 80%
September 12, 2018 - Hurricane Florence predicted to be category 4 hurricane on landfall, 20%
September 14, 2018 - Hurricane Florence makes landfall as Category 1 hurricane
(the above numbers are demonstrative, loosely based on my memory of how events actually happened with Florence)
There is something to be said for predictions about the future based on today's environment, while still allowing for the reality that the environment could change. Predicting a single baseball game right before it happens is just a different kind of prediction than simulating a model that has noisy cross-interacting inputs.
Readers/listeners of 538 need to understand (and Nate spends a lot of time educating about this) exactly what the model is calculating and what it isn't. Nate calls out all the time that the model can only be as good as the polling that provides the inputs. And polls can swing for all sorts of reasons: there's not many of them for a district, only highly biased ones are available, people's actual voting intentions change from week to week.
Am I missing the point of what you're trying to say?
I think a more appropriate interpretation would be to use the same reasoning to acknowledge how freaking hard it is to predict the weather. Especially extreme weather.
> Now Nate Silver accepts this, but says that he's not actually putting a probability on the event in the future, but some non-existent "probability on that event in the future if the future was now".
Could you link to where Silver discusses this? I'm interested in seeing his description of exactly what his numbers mean.
What I'm referring to is the "now-cast", but his other two definitions both seem to shy away from saying "this is flat-out the probability we think of the election".
The point is, you can redefine or choose a definition of probability if you want, but if it's less useful than the normal definition (and confusing to people!) then people are free to criticize your work on that basis.
And there's a very useful, testable, mathematical definition of probability that allows us to equally assess everyone's predicting ability, and Nate Silver is dodging it.
If you're interested in this subject, there's a non-mathematical discussion somewhere in Tetlock's book Superforecasting which is interesting in general.
Oh okay. I think it's fair to just take his polls-plus model as his prediction and ignore the now-cast. But I wouldn't say that showing the now-cast is somehow being sneaky. He's just providing extra information.
I appreciate the stuff about two types of certainties, but where are they actually arguing anything remotely relevant like that? As far as I can tell, based on what I see bubble up to my twitter feed and what tweets are displayed in this post, the disagreement is based entirely on Taleb’s claim that Silver switched his prediction from 80% to 50%. This really has nothing to do with actually understanding probability or prediction models. It’s just Taleb sharing some other pundit’s deliberate misinterpretation of a single phrase deliberately taken out of context in an interview with Silver.
I don’t think their disagreement has much to do with science or mathematics. I think the two guys have big online followings, each leaning heavily toward two different political ideologies, and the two leaders just have to try to beat each other up on Twitter to decide which side is better.
He should not have accepted the honor if he didn’t call a winner in any of the states!
And he didn't, as far as I recall. I remember him saying something to the effect of "that isn't that impressive, even a simple method looking at polls would have gotten nearly all the states correct". Don't confuse what other people focus on for something he is boasting about.
It's not clear to me what the author is getting at. 538 is good at aggregating polls and showing some probability. The author complains that's not enough because a non zero probability of the less probable outcome gives 538 too much cover when the less probable outcome occurs.
Isn't that a shared characteristic of any other model dealing with probabilities?
Is the author trying to say Nate Silver gets too much spotlight considering he doesn't guarantee outcomes?
I didn't sell in 2008 and kept buying. I have the same strategy today. Not sure if that also makes me a guru if that's the bar (as the author applies it to Taleb).
If 538 is trying to do aleatory probabilities, then it is kind of like a broken clock - right sometimes. But as a tool for answering questions about future events of any complexity (i.e. the real world, where actions have consequences and errors can be grave), I'd avoid putting one's confidence in something that claims to have a simple/'linear' relationship to the real future.
It is also a cop out to claim "everybody who thinks probabilities based on initial poll responses are not representative of eventual votes" must clearly not understand probability. Maybe I am straw-manning or just don't understand the actual disagreement.
As a poll aggregator? I'm sure it's probably a fabulous tool for past, historical, observed data. The brand itself, probably even more valuable, as a symbol with some sort of social authority in certain circles, I guess.
Anybody who reads my comment history will see I have obviously become a Taleb stan, but his approach seems right to me: "show me the money". Which in his case means mathematical papers of proofs primarily, actual money, second.
I think he would say that the "nowcast" is so useless and deceptive as to be malpractice, since everyone will interpret it as a forecast, and even if they didn't, it's not in any way useful. Which I agree with (unsurprising since I've semi-invented it as an opinion for another person).
It's only useless and deceptive if you willfully do absolutely nothing to understand what it is, or how it differs from the other models 538 has, which apparently is the case for Taleb.
Silver has said before, at least in the podcast, that everyone at FiveThirtyEight hates the nowcast and its various incarnations, but I think there's some pressure from the business side of things (for views, likes, shares, etc.) to have something volatile for people to see day in and day out.
I mean, obviously it's hard to make predictions of complex things, as the data from every angle is difficult to get and sometimes incorrect. I used to love Taleb, but he is more of a broken clock; many say that if he had stopped writing after Black Swan he would have gone down as a great, but some of his other points and politics have undermined him.
I think Silver is actually pretty legit. He was over-hyped, then he was largely panned as a charlatan, but through and through he has been pretty consistent. He rose to prominence during the Obama elections, but he also did well in the most recent election: 538 was about the only news outlet saying that Trump had a real statistical chance of winning. He hovered around 20-33% at peak while the Huffington Post told him he was an idiot. Many major outlets didn't personally attack him, obviously, but had Trump in low single digits throughout the last weeks.
Taleb actually makes a great point here and published a pretty cool paper about it. The point is that if the probability of an event changes too much, you can arbitrage it. E.g. assume these are payoff odds and you can sell your position before the event. He then found some nice no-arbitrage conditions that "real" probabilities must meet and showed that Nate's predictive timeline allowed arbitrage. Unfortunately Nate never responded to this claim with anything other than "math is hard" and other nonsense.
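For anyone who hasn't read the paper, here is my understanding of the arbitrage intuition in miniature (my own toy illustration, not Taleb's construction): treat a quoted probability as the price of a contract paying 1 if the event happens.

```python
# If the quoted probability swings a lot before the event resolves, a trader
# who buys the contract cheap and sells it dear locks in a profit regardless
# of the eventual outcome. Numbers below are hypothetical.
buy_quote, sell_quote = 0.20, 0.80
locked_in = sell_quote - buy_quote
print(locked_in)  # 0.60 per contract, whatever actually happens on election day
```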
I don’t have the background knowledge to properly understand the math in that paper. But I do know that you can’t simply set bounds a priori on how much a probability can change after additional information has been gathered; no matter how pathological the swings, you can design a system, and a series of observations of that system, that would make all the forecasts (i.e. conditional probabilities) correct. Thus the paper must be making additional assumptions. As far as I can tell, those include at least that the electoral process can be modeled as Brownian motion, which is a martingale, but is not the only type of martingale; in particular, I’d expect random motion to be a good model of typical polling drift, but a poor model of sudden polling swings caused by news events (which in reality are a large change caused by a single random event, not the sum of a series of small changes caused by independent events that just happen to mostly point in the same direction). I am not sure whether the assumptions also include an estimated value of `s` or volatility; the paper doesn’t seem to explain how it’s calculated, but maybe it can be derived from the raw polling results plus the assumption of Brownian motion? In any case, the paper says nothing explicitly about what data was used to produce the “rigorous updating” graph. I’d love if someone could explain this to me…
Subjectively, it’s hard for me to believe that an unbiased forecast would truly be so utterly noncommittal until just before Election Day, or indeed that there’s enough data to answer that question, especially seemingly from just one election result. But my subjective impressions, of course, could be utterly wrong. I’m very curious whether or not this is the case.
I believe he's making a slightly more subtle point. It's not just that the forecasts are swinging too much, it's that they're swinging too much too early.
Consider an option on a stock (which is the analogy Taleb is making here). If you buy a 1 year call option on AAPL and tomorrow they announce that they beat earnings by 10%, that's not a huge deal for you. If on the other hand, you owned a 1-week expiration call, it is a big deal for you. That is, your prediction should be less sensitive to changes in environment the further out it is.
I believe he is somehow formalizing this statement, and then showing that Nate Silver's forecasts violate it, but I too don't fully understand his formalism.
I think that Taleb's problem is that he thinks all the uncertainty in Silver's model comes from the fact that people might change their minds about who to vote for.
This is why Taleb can't make a model that fits Silver's forecasts. Silver could only be confident early on if he knew that people weren't going to change their minds much. But if people don't change their minds much then Silver's forecast shouldn't fluctuate much as time passes. Alternatively, if the forecast fluctuates a lot, it must be because lots of people are changing their minds. But then Silver shouldn't have been so confident to begin with!
But in fact the uncertainty in Silver's model isn't (wholly) caused by the possibility that people change their minds. It's mostly caused by the possibility of polling error. As we approach the election people have less time to change their minds, but the possibility of polling error doesn't change. This is why Taleb can't create a model under which Silver's forecasts are rational; he's not taking into account polling error.
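To make that concrete, here's a toy sketch (assumed numbers, not Silver's model): if forecast uncertainty is the sum of opinion drift still to come plus a fixed polling-error term, it shrinks toward the polling error as election day approaches rather than going to zero.

```python
import math

def forecast_sd(days_left, drift_per_day=0.15, poll_error=2.5):
    # Variance from opinion change still to come, plus a fixed polling-error term
    # (both numbers made up for illustration, in percentage points).
    return math.sqrt((drift_per_day ** 2) * days_left + poll_error ** 2)

for days in (180, 90, 30, 1):
    print(f"{days:3d} days out: forecast sd ~ {forecast_sd(days):.2f} points")
```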
No, it's mostly caused by people changing their minds on who to vote for (or whether to vote). That's why the estimates move around a lot when there is news affecting the election, like the given example of Comey re-opening the Clinton investigation.
Those things may be true...but how does that relate to the issue Taleb is highlighting? That the forecast is overly volatile with respect to arbitrage pricing theory.
That would only be true if the thing you are forecasting becomes more sensitive to changes in the environment as time passes. It is true in your option example but not entirely true for elections.
I would say most people make up their minds more and more as information is released over time. Towards the end you know the candidates very well and it's hard to change your mind. How many Trump supporters will change their minds even now?
Sam Wang of the Princeton Election Consortium did commit to a prediction, in August 2016, that the 2016 election would have some of the lowest polling variability on record. (I think he declared something like 2.6% uncertainty.)
He seems to miss the point of what 538 is doing. Their nowcasts are attempts to say what would happen if the election was today, which throws away the time-dependent uncertainty.
Nate Silver built an aleatory model whereby low-probability, high-impact events (Comey reopening the investigation on the eve of the election) are discounted, because if you add enough of these potentially disruptive events the model becomes 50-50, and that has no value to anybody. Who wants to read an article that says "election forecast 50-50" every day?
If a 538 model says one outcome is 75% probable, it really should come with a giant caveat saying "at current trends assuming nothing out of the ordinary occurs". Taleb's beef is that if that is the case, the 75% does not mean in reality this result is 75% likely to happen.
A really interesting takeaway is that if the outcome you desire appears unlikely, your job is to introduce as much previously undefined or discounted uncertainty as you can. Ideally you engineer a black swan event, or at least do what you can to make one happen. This would be fun to model from a game-theory approach instead.
> A really interesting takeaway is that if the outcome you desire appears unlikely, your job is to introduce as much previously undefined or discounted uncertainty as you can. Ideally you engineer a black swan event, or at least do what you can to make one happen. This would be fun to model from a game-theory approach instead.
In chess, if you are behind, John Nunn's two recommended strategies are "grim defence" (if your opponent's advantage is not so large as to make it easy for them to force a win) and "create confusion": create complicated tactical situations and hope your opponent makes a mistake. The farther behind you are, the more appealing "create confusion" gets by comparison.
> whereby low-probability, high-impact events (Comey reopening the investigation on the eve of the election) are discounted, because if you add enough of these potentially disruptive events the model becomes 50-50, and that has no value to anybody.
I don't think you can say this. The challenge is that it's impossible to even enumerate all the possible surprises that could drastically swing an election, and incorporating some of them into a model would require pulling numbers out of your ass for how to weight those unpredictable possibilities, and even picking which potential surprises to factor in is a similarly arbitrary decision for which there is insufficient evidence to provide guidance. But none of that means that attempting to factor in such possibilities will drive your model's predictions toward 50%; your predictions could end up almost anywhere in the unit interval depending on the value of the priors you pulled out of your ass, and your final number ends up saying more about your biases than about the state of available predictive evidence.
I don't think this is a good description of Silver's model. It doesn't discount disruptive events at all. It implicitly takes them into account.
Silver knows that people's opinions can be changed by events that happen before the election. This is why polls taken early on are less informative about the final result than polls taken closer to the day. Based on historical records, Silver knows how the accuracy of polls depends on the date they are taken, and he weights them accordingly. This process automatically models the possibility that an event could suddenly change people's opinions just before the election. It's taken into account in terms of the accuracy of polls.
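A hypothetical sketch of what that weighting could look like (this is not 538's actual method; the error model and all the numbers are assumptions): weight each poll inversely to the variance implied by how far out it was taken.

```python
import math

polls = [  # (days_before_election, polled_margin_in_points) -- made-up numbers
    (120, 6.0),
    (60, 4.5),
    (10, 3.0),
]

def weight(days_before, base_error=2.0, drift_per_day=0.05):
    # Assumed error model: older polls are noisier because opinion can still
    # drift before election day; weight inversely to that variance.
    variance = base_error ** 2 + (drift_per_day * days_before) ** 2
    return 1.0 / variance

total_w = sum(weight(d) for d, _ in polls)
estimate = sum(weight(d) * m for d, m in polls) / total_w
print(f"weighted margin estimate: {estimate:.2f} points")
```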
I'm confused by the existence of this article. Comparing the accuracy of two probabilistic predictors is pretty much a solved problem. (At least in situations where the two predictors have both made predictions about a fairly large number of events in the same set, S.) Just add up the log scores for both predictors and see which does better. [1]
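For reference, a minimal sketch of that log-score comparison (toy numbers, not real 538 or rival forecasts):

```python
import math

# Each forecaster's stated probability that the event happens, and what happened.
probs_a  = [0.71, 0.30, 0.90, 0.55]
probs_b  = [0.60, 0.45, 0.80, 0.50]
outcomes = [1,    0,    1,    0]

def log_score(probs, outcomes):
    # Sum of log probabilities assigned to what actually happened;
    # higher (less negative) is better.
    return sum(math.log(p if y == 1 else 1.0 - p)
               for p, y in zip(probs, outcomes))

print("A:", round(log_score(probs_a, outcomes), 3))
print("B:", round(log_score(probs_b, outcomes), 3))
```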
Another thought: Although the author didn't lie about its meaning, I thought the graph "Stated Probabilities Compared with Average Portions" was visually misleading. At a casual glance, it appeared that 538 was all over the place with its senate predictions. But if you look closer, you can see that many of the red data points are at 0 or 0.5 or 1. This suggests that these data points represent the outcome of just 1 or 2 elections, which isn't really enough to say that 538 is doing particularly badly.
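A quick back-of-the-envelope illustration of why those points are weak evidence (the bucket probability here is made up): with only one or two elections in a probability bucket, the observed proportion can only land on values like 0, 0.5, or 1, and its sampling error dwarfs a 2-5% deviation.

```python
import math

p = 0.7  # hypothetical stated probability for a bucket of races
for n in (1, 2, 10, 100):
    se = math.sqrt(p * (1 - p) / n)  # binomial standard error of the observed proportion
    print(f"n = {n:3d}: standard error of observed proportion ~ {se:.2f}")
```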
I'm still waiting for somebody to talk about how tightly coupled Nate Silver's forecasts are to the political processes he attempts to forecast.
What if Nate Silver were to run around on the football field before each play whispering in the players' ears about how likely they are to win, according to his latest forecast. Imagine that they believed in him, or at least felt some degree of superstition when he came around.
I just think there's something innately flawed about public forecasting of an election. Mathematically and maybe even ethically. I'm still trying to figure out how to articulate why I feel that way.
It could even be argued that polling itself could influence opinions by introducing bias at a critical moment when someone is being asked to consider who they are voting for. What if that's the moment when they make up their mind?
> I just think there's something innately flawed about public forecasting of an election.
It used to be the case in France that you weren't allowed to publish polls two weeks before an election for just this reason. They eventually scrapped this with the argument that all the 'elites' had access to internal and private polls anyway, so all you were doing was denying the 'people' the same access to information that the 'elites' had.
> It could even be argued that polling itself could influence opinions by introducing bias at a critical moment when someone is being asked to consider who they are voting for.
This is a pretty well known effect, to the point that many political strategists try to avoid publishing or drawing attention to polls showing their candidate too far ahead, in case it makes voters complacent and they decide to stay home. The ideal situation is to make people convinced the polls show the candidates tied 50-50 going into the election and that 'your' vote could easily swing the whole election.
This is slightly wrong. It was initially one week, not two. It was changed because it was unconstitutional, as it limited freedom of expression too much, not because it was useless or anything to do with elite vs people. And it still exists, limited to the day of the election and the day before.
Source: https://www.conseil-constitutionnel.fr/nouveaux-cahiers-du-c...
The author does an informative critique of 538. I'm not sure at all that it has anything to do with the Silver vs Taleb twitter war, though. It's more like a premise for expressing the author's own thoughts.
However, I totally dig what 538 does and appreciate their analysis. I understand that there is a complex underlying model with many parameters chosen subjectively. The results of Monte Carlo runs of this model are very interesting to me. The fact that they describe a distribution of outcomes and not just a single number is an important property of a Monte Carlo model, and I would have an issue if they didn't provide that info. If I wanted to analyze a complex event with lots of moving parts and uncertainties, I would also build a model to see what kind of distribution I would get. The fact that somebody published the results of their model is quite useful.
The analogy would be European and American weather models - no one says that their results are exact, and everybody understands that uncertainty in the result comes from inherent uncertainty in the initial state as well as the shortcuts and approximations each model takes. No one says that the results of these models are useless because of that. And everybody finds it valuable when a weather forecast predicts rain tomorrow with 40% chance. So I don't see how a political prediction is only valuable (in the author's words) if it comes without probabilities attached to it.
> Practically what this means is that they do not predict a winner or looser but instead report a likelihood
Yes, this is good. It means that you can evaluate their accuracy to a greater degree than % of times correct.
> Further complicating the issue, these predictions are reported as point estimates
They've learned from this, and now show their distribution of results very clearly. However, there's nothing mathematically fraught about reporting a prediction as a point estimate.
> The problem is that models are not perfect replicas of the real world and are, as a matter of fact, always wrong in some way.
...yes? And so what? This is a fully general counterargument to all of science.
> Predictions have two types of uncertainty; aleatory and epistemic.
Don't say things like this with 100% certainty when your source itself says "the validity of this categorization is open to debate", and also when your source is Wikipedia.
> However, as you can see, there is still a noticable variation of 2–5% of actual proportion to predictions. This is a signal of un-addressed epistemic uncertainty. It also means you cannot take one of these forecast probabilities at face value.
Models aren't perfect. Also, 538 is very careful to address their epistemic uncertainty! They also try very hard to not change their models significantly after they publish them, in order to avoid letting their personal biases tinker with the results.
This post goes out of its way to avoid mentioning that there are pretty well established ways of measuring the accuracy of predictions, instead using things like "look, it's not quite a line" and "look, the line went up and down before the election".
If you're interested in actually reading about this from a mathematical viewpoint, read Taleb's paper, which has some actually interesting thoughts on why 538's algorithms are too eager, or read Madeka's paper, which includes comparisons of 538 to other people trying to make predictions, and finds that actually, they were better than most other news sources.
All of which increases the probability that I will ignore Taleb's tweets. I follow people who are smart enough to disagree in a civil manner, and smart enough to recognise that there may be things one doesn't know.
The way he talked to renowned classicist (and feminist, which seemed to be a problem for Taleb) Dame Mary Beard in another spat was disgusting. Trying to impose statistical certainty on ancient history ended up making him look pretty dumb to everyone but him.
The problem is that the smart arrogant jerk may have the right answer sometimes when it counts, while being a pain in the ass the rest of the time. Better just to stay open to all feeds. He does come off as an arrogant dick, wow.
I used to look at things this way, with a "just filter out the rudeness" mindset. However in practical terms I think listening to somebody's rudeness does have a really big cost (to estimate: how much would you need to be paid to make it worth your time to listen to rudeness for an hour?)
Yes, but when they settled the rules of the bet, Taleb's confusion between nowcast and forecast would have to go away, and then what would they base their bet on? Losing an argument is as bad as losing a bet. I don't think Taleb wants to acknowledge that he was picking a fight for nothing.
Note: I don't think Taleb has a wrong idea. It's just that his attempt to fight with Silver over it is based on confusion. Taleb has written important and interesting books and papers. He has always been very opinionated and aggressive. His feuds don't produce debates that are worth following.
Election Predictions as Martingales: An Arbitrage Approach, Nassim Nicholas Taleb, Quantitative Finance 18 (1): 1–5. doi:10.1080/14697688.2017.1395230
https://arxiv.org/abs/1703.06351
Nate calls the chances like 80:20 Clinton:Trump.
Taleb takes the side: 100,000 USD that Trump wins.
If Trump wins, Nate pays out 400,000 USD to Taleb (so Taleb's 100,000 stake comes back as 500,000 in total, matching the 4:1 odds implied by 80:20). If Clinton wins, Taleb pays 100,000 USD to Nate.
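Just the arithmetic behind those stakes (assuming both sides treat 80:20 as the fair probability):

```python
p_trump = 0.20
taleb_stake = 100_000
nate_stake = taleb_stake * (1 - p_trump) / p_trump   # 400,000 at fair 4:1 odds
ev_for_nate = (1 - p_trump) * taleb_stake - p_trump * nate_stake
print(nate_stake, ev_for_nate)                        # 400000.0  0.0 -> fair for both
```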
But that doesn't seem to be the nature of their disagreement. Taleb's main complaint seems to be about how the swings in the prediction probabilities far out from the event indicates a fundamental flaw in how the underlying model uses probability.
That being said, it would be a fun little exercise to see how much money would have been made if you had bet $1 on every race 538 predicted in 2016 and 2018 at the model's odds.
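Here's roughly what that exercise would look like as code (the races below are made-up placeholders; you'd substitute the actual 538 probabilities and results):

```python
# Back the model's favourite with $1 in each race at the model's own odds:
# a counterparty accepting probability p pays out (1 - p) / p if the favourite
# wins; you lose the $1 otherwise. A well-calibrated model should leave this
# bankroll near zero in expectation.
races = [
    (0.75, 1),   # (model prob of favourite, 1 if favourite actually won)
    (0.60, 0),
    (0.90, 1),
    (0.55, 1),
]

profit = 0.0
for p, favourite_won in races:
    profit += (1 - p) / p if favourite_won else -1.0

print(f"Net profit from $1 bets at model odds: ${profit:.2f}")
```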
Indeed after skimming through the paper[1] (I find it hard to grasp without a solid background) it seems that Taleb wants to play this game every day before the election and expects to come out ahead on election day when all the bets are settled.
I liked the first chapter of "Fooled by Randomness," but I stopped reading the book when each chapter regurgitated the same points, each more nihilistic than the last. Perhaps Taleb should concentrate on the accuracy of his own predictions:
I'd be more impressed with Taleb if he published the results for his Empirica fund for years other than 2008. Taleb's strategy was to buy options way out of the money. In boring years, the fund lost money. It did really well in 2008. Not enough numbers have been published to determine if his strategy was a net win over a business cycle. The win seems to have come from the fund's short lifetime including 2008.
You only need to win once. One of his big things is optionality and anti-fragility. If you buy something that barely hurts you in terms of cost, but can skyrocket massively if something big happens, you can capitalize on it, and all the other costs were meaningless. Even VCs try for that, although with numbers that aren't nearly as good.
What do you mean by a business cycle? If his fund just runs until it makes it big and then he stops, is that not a business cycle? Did he continue? I don't know. It just seems weird to ask about a business cycle for a strategy that is played very long, waiting for unpredictable results. The only thing that matters is whether you can play long enough to hit that result, if that is your game. It seems clear that it did win, or else he'd probably still be focusing on it until he did. It's not like a regular business with actual output or production that can be easily replicated and measured; it's a play-till-you-win-or-lose deal, and I'm sure there are others who did similar things and lost, just like many more companies are dead than alive and we only see the alive ones.
I understand wanting to see the numbers, but this is just one sample for a strategy.
He gets predictions more than most. You should go read that book. There are many chapters on the incorrect weight people give to the outcomes of correct predictions (or incorrect predictions, if that suits their agenda).
For me, it's less about the accuracy of his predictions and more about his intellectual smugness rubbing me the wrong way. He has no problem with the hypocrisy of attacking Nate Silver over some supposed weaknesses in his election prediction model, while not seeming to care about the inaccuracies of his own public predictions.
Nate Silver took political polls and hunches, which were far less accurate, and came up with something significantly better. Even in 2016, FiveThirtyEight's model showed the odds of Trump winning were not that remote, versus major media outlets "guessing" the race was all but over.
Election prediction models will always be inherently unstable in my mind. It's pretty much impossible to predict the when, if, how, who of an Anthony Weiner type scandal.
I'm infuriated that the author would allege that Taleb has a better way to do this but just links to a highly technical paper instead of addressing it.
I think the difference in opinion is caused by Taleb modelling the election as though all of the uncertainty is caused by the fact that people might change their minds, whereas Silver is also including the uncertainty from polling errors.
Silver's predictions fluctuate a lot as time passes. Taleb is assuming that all of these fluctuations are caused by the electorate changing who they're going to vote for. Then he says "if you knew people's opinions were as volatile as this, you shouldn't have been so confident earlier!", or words to that effect. If lots of people will change their minds, then early polling won't be very informative about the final result.
But the fluctuations in Silver's predictions are actually caused by his changing certainty about how accurate the polls are. Silver does take into account that people might change their minds, but this isn't the cause of most changes in his predictions. So people actually change their minds a lot less often than Taleb's model of Silver's model suggests. This is why Silver can make strong predictions early on.
The difference between epistemic and aleatory uncertainty is not well understood even in the machine learning world. Most applications of machine learning to domains where uncertainty needs to be quantified use aleatory uncertainty, when epistemic uncertainty is the right one. I wish we had better ways to explain these concepts, and perhaps better terms too. Alternatives like intrinsic/extrinsic or irreducible/reducible don't really work either, because they make it seem like it's a dichotomy.
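One common way to tease the two apart in ML is with an ensemble: aleatory uncertainty is roughly the average per-member entropy (noise the model can't reduce), epistemic uncertainty is the disagreement between members. A toy sketch with made-up binary predictions:

```python
import numpy as np

def entropy(p):
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return -(p * np.log(p) + (1 - p) * np.log(1 - p))

ensemble_probs = np.array([0.55, 0.60, 0.90])   # assumed predictions from 3 models
total     = entropy(ensemble_probs.mean())      # total predictive uncertainty
aleatory  = entropy(ensemble_probs).mean()      # expected data (irreducible) uncertainty
epistemic = total - aleatory                    # disagreement between members
print(round(total, 3), round(aleatory, 3), round(epistemic, 3))
```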
I'm profoundly confused by the deep respect Taleb commands in the eyes of many people here, apparently. The guy has a few good ideas that he repeats ad nauseam, but he punches way over his own intellectual weight too often. A few especially egregious examples are his deranged GMO hate "paper" which is profoundly ignorant of the involved biology [1], his support of Syrian government's slaughter of civilians [2][3], and his Putinversteher views ([4] reads as a parody for anyone knowing anything about Putin and modern Russia). Combined with his unlimited arrogance, I don't understand how people can respect him intellectually.
Re: the desire to choose a 50% (or other) decision threshold.
It doesn't have to be a binary choice. Knowing the distinction between 65% and 80% improves your betmaking capabilities by allowing you to hedge, in some cases guaranteeing a profit [1] regardless of the odds and how they change.
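For example (toy numbers, assuming you can bet at odds matching the quoted probabilities): if you back an outcome when it's quoted at 65% and the quote later moves to 80%, you can back the other side and lock in the same profit whichever way it resolves.

```python
p_then, p_now, stake = 0.65, 0.80, 1.0

odds_a     = 1 / p_then        # decimal odds originally offered on outcome A
odds_not_a = 1 / (1 - p_now)   # decimal odds now offered on not-A

# Choose the hedge stake so the profit is identical whichever outcome occurs.
hedge = stake * odds_a / odds_not_a
profit_if_a     = stake * (odds_a - 1) - hedge
profit_if_not_a = hedge * (odds_not_a - 1) - stake
print(round(profit_if_a, 3), round(profit_if_not_a, 3))  # equal, positive because p moved up
```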
Taleb's biggest thing is exposing holes in predictive models. Silver's biggest thing is making and using predictive models. So it's only natural for these two to quarrel.
Of course the US elections are not as accurately predictable as e.g. German ones, or sports events. Not simply due to different pollster sources and their data quality (which Nate Silver tries to account for).
The biggest and most unpredictable factor in US (and British) elections is their winner-takes-all system, with the second layer of the Electoral College in the US which does not account for popular votes either.
I would love to see the Nate Silver point by point response to this post. I think it would be humorous, but also add some insights into his proprietary black box.
I listen to the 538 podcast every episode, and I think I agree with cm2012 that this is a case of mixing up the "nowcast" with the actual forecast.
The author doesn't seem to understand how the 538 model works. The 538 model does take epistemic uncertainty into account; it just doesn't call it that. The model calls it "likelihood of polls moving by X% between now and election day". In 2016 they offered a "now-cast" which didn't include that, and answered the question "based on where the polls are right now, how likely is any given outcome if the election were held today?", while the "classic" model answered the question "based on where the polls are right now and the fact that unanticipated things will happen between now and the election, how likely is any given outcome?". For 2018, they dropped the "now-cast", because people didn't understand what it was saying.
True, the epistemic uncertainty is limited -- it's based on past elections, because hey, 538 works with actual data. One could argue that rather than saying "71.4% Clinton 28.6% Trump", a better model would have said "71.3% Clinton 28.4% Trump 0.2% Election is cancelled due to nuclear war / natural disaster / etc" but I don't think anyone sensible is interpreting the model as excluding such extreme outcomes.
I thought that the post was fairly unclear, but for me, it seems like the main argument against 538 is that it makes unfalsifiable claims about the probability of individual elections - picking a winner is a falsifiable claim, but assigning a probability always allows Silver to claim something like “even events with a 10% probability occur frequently” even if his model assigns a high likelihood of victory to the loser of an election.
My favorite Nate Bronze takedown remains Carl Diggler[1].
FiveThirtyEight is pretty consistent in saying their forecasts more than a few days out are unreliable. They're fairly accurate with their final forecasts, though.
Nate Silver always calls the nowcast (blue line) for what it is - a forecast for if the election was held that day. The forecast, made a few days before election day, is the official forecast and the one that beats most other estimates.
That's a really misleading graph, for a couple of reasons:
1. It conflates the forecast with the now-cast. The former is a prediction of what will happen on Election Day (whether that day is six months away or one day away). The latter is a prediction of what would happen if the election were held that day. In theory, those two would converge to the same value on the day of the election, though in practice, that wouldn't actually happen due to computational differences.
2. Silver never says that he "should be judged" by [just] the final result. In fact, he's gone out of his way to say that. The problem is, that's the only point on the graph where we can compare the model (predicted value) to the actual (observed value). There isn't an election on any of the other days, so even if both agreed to look at the now-cast and use that as grounds to evaluate the model, we would still only have one datapoint which is nonzero in both dimensions. In other words, we have a blue line, yes, but we only have a red dot. Taleb wants to extrapolate the red dot into a horizontal red line, and then use that to judge Silver's model, which is ridiculous.
Your second point is kind of confused. You don't need the prediction to coincide with the event, to judge the quality of the prediction. If I predict the elections for 2 years time then you can still judge whether I got it right or wrong. If I do that for multiple elections for multiple cycles, you can decide whether I'm a good forecaster.
If I then make another prediction that's always 1 year out, you can decide how good that prediction is and start to make a curve of how good my predictions are by how far out they are.
Taleb is saying that the blue line is so incorrect it's meaningless and deceptive, and that Nate should produce a red line which is a real probability that he can actually be judged on.
I've been following this spat since it reignited several weeks ago, but also since Taleb first attacked Nate on Twitter back before the 2016 elections, and it has been the final nail in the coffin as to any supposed credibility Taleb commands.
The quandary that FiveThirtyEight and Silver are in is that the general public is stupendously bad with probabilities. Prior to the 2016 election, Silver was personally being attacked, in some cases by major media organizations like the Huffington Post, and accused of tipping the scales towards Trump (for some inexplicable reason), since their model was giving him approximately a 1 in 4 chance of winning when everyone knew it was a certainty Clinton would win. Then, in the months following the election, the narrative somehow switched, and the fact that Trump won despite only being given a 1 in 4 chance by FiveThirtyEight meant that Silver was now a hack and the model was wrong. This same criticism has been leveled by many right-wing media personalities, who had been trotting out Silver's 2016 predictions leading up to the election, as well as by the New York Times, whose own models were giving Clinton nearly a 90% chance of victory.
So now, fast forward to this year's election: Silver is doing his best to drill into people's minds that, yes, their model showed a Democratic win in the House as the most likely outcome, but that absolutely does not mean their model says it is an absolute certainty, or that other outcomes would be unusual. It isn't covering his ass, it's educating the public on how probabilistic statements work.
This is something Taleb should know. But Taleb has proven himself to be a hack and an intellectual parasite. He hasn't actually produced any ideas worth discussing since "Fooled By Randomness" which was nearly 20 years ago. Basically all of his works since then have been rehashing the same idea, or, as with Skin in the Game, pseudo-intellectual ramblings containing little to no interesting ideas, and what ideas do exist have been rehashed a million times by others.
Since then, he has taken to making ad-hominem attacks and nonsensical critiques on others in the public view, often on Twitter. I hate bringing up Trump, as it happens far too often in discussions these days, but some of his methods of "debate" are strikingly similar: strange monikers ("klueless Nate"), schizophrenic jumps between topics/arguments, often indecipherable language, and a ridiculous volume of output. All of this combines to mask the fact that most of the time, he isn't actually saying anything, or at least anything sensible, but by shouting loud enough, long enough, and purposefully making his point difficult to identify, it almost becomes impossible to argue with him, and a non-insignificant number of people will assume he is right.
Interestingly, I've also found many of his supporters to be similar to some of Trump's or other figures such as Musk. There is a cult of personality built up there where they have established the person as a visionary first, and therefore anything and everything they do or say is correct. Many of them parrot back the same, shallow catchphrases of his, that rarely contain anything insightful but are general enough that they can be thrown out in any situation: critics or dissidents are "Intellectual yet Idiots" or don't have "Skin in the Game". Note that the arguments of these critics or dissidents are almost never addressed.
Unfortunately, the spat between Taleb and Silver (which actually started back prior to the 2016 elections), is exhibit A in why Taleb is an intellectual charlatan, and shouldn't be given the time of day. Silver isn't some perfect person either, but in general he has been a reliable voice in the realms of political prediction, fairly forthcoming on shortfalls, and his most damning criticisms all seem to be built upon critical failures in the understanding of probability, or a willful misrepresentation of his results.
The whole 2016 situation feels like many people are bordering on delusion, Nate included. The Comey thing certainly helped Trump, but even on the day of the election 538 was still forecasting a massive win for Hillary. How can the probabilities be that wrong?
Because the polls were wrong. Or, at least, incomplete. It doesn't matter how good your algorithms are if the data isn't there. And 2016 proved that 538, and really most media outlets, don't have a grip on America.
In other words, the existence of a black swan event in that particular system didn't really matter.
Silver is a glorified modern fortune teller, who seems to have deluded himself into thinking that simply having data means you can prescribe meaningful probabilities to the future. Assigning probabilities to a coin flip is easy. Figuring out the most likely winner of a game of baseball is a little harder; there's a whole movie about a guy who was basically doing that in 2002. Politics is a completely different field. You can't possibly assemble all of the data necessary to be remotely confident in the probability of outcomes. And let's not forget unpredictable black swan events.
But the readers love it. And, especially during the elections, the media would bring him out like a golden boy computer whiz, because he was saying the same things they were, but he's really smart and has the data and magic algorithms to back him up. And he was right that one time 8 years ago. He's about as pointless as Sean Hannity, with the one exception that, at least recently, Hannity was actually more correct about the future than Nate. It doesn't matter if you are right or wrong, if the reasons are wrong.
FiveThirtyEight prediction (national popular vote): 48.5% to 44.9%
Real outcome: 48.2% to 46.1%
I agree that the horserace coverage is silly (especially given how polling operations use sampling and such), but it's better to look at aggregate trends in polling than to ridiculously overcover outlier polls.
I also certainly don't think it's fair to say "28.6% chance he wins" is a "massive win" prediction. Silver's take was regarded as indefensibly right-leaning and attacked for it.
> but even on the day of the election 538 was still forecasting a massive win for Hillary
It was predicting a large electoral win as the median scenario, but only a 70% chance of Clinton winning at all, because discrepancies between actual outcomes and polling were likely to be correlated across states, and many states were very close. This is where Silver differed from the other forecasts that were saying Clinton 90%+, based on the (historically false) idea that differences between votes and polls were independent across states.
> You can't possibly assemble all of the data necessary to be remotely confident in the probability of outcomes.
You often can be more than remotely confident, but it's true that the national result of the 2016 Presidential election wasn't one of those times. OTOH, Silver’s forecast reflected that, since ~70% is extremely low confidence.