Reasonably politely pushing back on these has got my post not only downvoted to oblivion (after upvotes) but then flagged (!!), while of course the guy who was abusive in return had no problems.
I know you have written eloquently about HN in the past, but I think that site is gone. And I remember because I've been on it for over a decade. I joined an early-stage startup on it.
Sadly, they give no means for deleting your account. But thought I'd flag to you anyway.
Anyway, I will be abandoning this account, but it's maybe something for other HN old timers to think about.
I had similar experiences suggesting covid had natural origins. I'm not sure the culture that HN possesses is the same as the one it alleges it desires.
Anyway, last post from this account. I am sure this one will equally be nuked... :)
I am writing a book I intend to release commercially, and a good sized one at that (likely 1,500 pages+). I am absolutely stubbornly determined to release it as both print and DRM-free PDF.
It'll get pirated anyway (if anybody cares enough to do so), so why bother punishing paying customers?
Incredible work! I'm writing my own book (obligatory mailing list link [0] and description [1]) and I wondered how you tackled the mental side? It is a bit of a rollercoaster I am finding. That constant fear gnawing in the back of your head 'is this really any good at all?' :) I suppose it is the price of caring.
Prior to writing my first book, I already published quite a lot of articles, so I knew that there was some non-empty audience set out there :)
I started a crowdfunding project for the first book too, so that the printing and typesetting costs would get covered. They were covered fully, so I knew that I wouldn't dip into the red as a consequence. (This was a major worry of mine.)
I was still pretty nervous about acceptance, but it turned out OK. Whew.
This really chimes with me, I'm writing a book about the Linux memory management subsystem [0] and my primary motivation is to learn the subject more deeply and what Paul says here really aligns with my experience.
The effort to explain its machinery demands that I look very deeply - trying to answer questions like 'how does a page of memory get reclaimed?', or 'how does a userspace allocation propagate through the kernel?' - and following the white rabbit all the way down the burrow, forces a far deeper level of understanding than simply figuring something out for a patch.
I feel like this isn't publicised enough. The lab leakers are louder and people definitely seem to want it to be true more than natural origins and they are not being honest in their attempts to dismiss these papers.
The 'counter' is incredibly weak sauce and she's been rebutted several times. Again, as I said, if you just ignore rebuttals you can keep acting like you have a point.
The second post is a scurrilous and nasty smear job from a thoroughly unpleasant individual and I'm quite appalled you posted it. Shameful.
The "two lineages" argument has been grossly oversold. They're just two SNPs apart, making it near-impossible to distinguish whether they evolved in animals (implying two introductions into humans) or in humans (after one introduction). If they were more different, then we could exclude evolution in humans, since it's unlikely the virus could spread for that long without causing enough sickness and death for someone to have noticed earlier. With just two SNPs, that's much harder--SARS-CoV-2 picks up something around 1/3 of an SNP per transmission, so it's not even that unlikely that the lineages formed in a single human-to-human transmission (p ~ 1/9). It's also possible that an intermediate lineage existed but went extinct before it could be sampled, as most lineages do.
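For anyone who wants to check the arithmetic: the ~1/9 reads as (1/3)^2, i.e. two SNPs each arising with probability 1/3. The sketch below computes that alongside one alternative, assuming (my assumption, not something stated above) that the number of SNPs per transmission is Poisson-distributed with mean 1/3. The Poisson tail comes out somewhat smaller than 1/9, so the exact figure depends on the model you pick.

```python
import math

# Figure quoted in the comment: about 1/3 of an SNP per transmission.
MEAN_SNPS_PER_TRANSMISSION = 1 / 3


def poisson_tail(mean, k):
    """Return P(X >= k) for X ~ Poisson(mean)."""
    below = sum(math.exp(-mean) * mean**i / math.factorial(i) for i in range(k))
    return 1.0 - below


# Back-of-envelope heuristic: two independent SNPs, each with probability 1/3.
rough_estimate = MEAN_SNPS_PER_TRANSMISSION ** 2

# Alternative under the (assumed) Poisson model: at least two SNPs in one transmission.
poisson_estimate = poisson_tail(MEAN_SNPS_PER_TRANSMISSION, 2)

print(f"heuristic (1/3)^2: {rough_estimate:.4f}")
print(f"Poisson P(X >= 2): {poisson_estimate:.4f}")
```

Either way, a two-SNP jump in a single transmission is uncommon but not vanishingly rare under these assumptions.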
Pekar et al. do some complicated phylogenetic modeling that purports to show the MRCA in humans is too recent for a single introduction. That result is unintuitive, and I believe their model is highly suspect, per my comments and links at
I agree that "of course nobody knows", and so do Ebright and Chan; they are careful to assert only that further investigation is required, if necessary by subpoena (e.g. for any sequencing data potentially containing early genomes of SARS-CoV-2, whether as the deliberate target or from contamination like those Antarctic soil samples).
The author of the thread that you're praising does not though; she considers the question closed, and has viciously attacked those calling for such investigation, including Chan, whom she called "an intellectually dishonest, manipulative conspiracist". (Ebright gets rather unpleasant himself, so perhaps one could excuse her behavior to him as tit for tat; but Chan does not.) I find both those attacks and the overselling of Pekar's result to be deeply unfortunate. Don't you?
I agree with some of your statement, not with your sentiment. I find your comment slightly off topic. I asked the original commenter if they read the thread, because they answered with a link to a horribly weak rebuttal of the first paper and an even worse personal attack on the author of the tweets.
So I should care if she attacks somebody "viciously" yet not if she is attacked in an even worse manner? Interesting.
In your comment, you linked to Pekar's paper. You said that after reading it, you found a natural cause more plausible. I've also read that paper, and I'm much less convinced. That's what I wanted to discuss, and I don't see how it's off-topic. If anything I wrote came across as a defense of Ebright's tone or any aspect of that Thacker piece, then I expressed myself unclearly; for the avoidance of doubt, I think they're bad too.
Have you looked at Pekar's full model, as set out mostly in the supplementary materials? This isn't any standard molecular clock approach. It's a byzantine stack of plausible but somewhat arbitrary assumptions, ending in a simulated phylogenetic tree. The shape of that tree with one introduction doesn't match the shape of the actual tree constructed from the earliest real samples in Wuhan, so Pekar concludes there were two introductions. But I'm not aware that such an approach has ever made a successful prediction, and there's no circumstance in any field where I can imagine trusting a model of such complexity without validation. Their sensitivity analysis is meaningless, varying some irrelevant parameters but keeping what seems intuitively like the main determinant of that shape (the connectivity of their contact network) fixed.
You are correct that Alina Chan's thread doesn't address that aspect of Pekar's argument. Others have though, per the Twitter threads I linked. What do you think?
Ah I expected to get downvoted into oblivion, I know it's a cliche to say it but HN isn't what it once was.
A key point to take away is that the fact there are 2 lineages means that lab leak is super unlikely. It would require somebody from the lab to come to the same relatively small market to give lineage A, then somebody else who got infected through a totally different evolutionary route at the lab to happen to come to the market just after and both to not spread it anywhere else.
Of course lab leakers are working hard to try to deny this or claim the data is wrong or yada yada. It's an ongoing battle and they have several highly dedicated 'independent scientists' (lol) working on it seemingly 24/7.
> Of course lab leakers are working hard to try to deny this or claim the data is wrong or yada yada. It's an ongoing battle and they have several highly dedicated 'independent scientists' (lol) working on it seemingly 24/7.
I think the downvotes are because of this type of statement. People who have read the papers, thought about the conflicting evidence, and finally come to a conclusion that differs from your viewpoint dislike the implication that they are simply nutty conspiracy theorists tirelessly and obsessively working to undermine rational science.
You do realise there are countless people who have 'read the papers and thought about the conflicting evidence' about climate change right?
The reality is unfortunately that these people ARE nutty conspiracy theorists. They are not 'thinking about the papers' and providing a reasonable push back, they're reiterating long debunked points (did you read the thread I posted?).
Ebright already claimed the papers were scientific fraud until he thought better and deleted the tweet. This is the kind of person you're dealing with.
I'm not going to both-sides something which now has very clear evidence in one direction and not a shred of evidence in the other, I'm sorry I'm just not.
Hacker News of old would be more open to that. Again, this is why I no longer post here.
The two lineages are literally just two SNPs apart. SARS-CoV-2 averages ~1/3 of an SNP per human-to-human transmission, so ~1/9 of such transmissions generate "two lineages" at least that different. So intuitively, it seems easy to believe those lineages could have evolved in just a few weeks of early cryptic (unsampled) human spread.
Pekar et al. did some complicated modeling that purports to establish that the MRCA in humans is so recent that the two lineages must have arisen in animals, implying two introductions into humans. I believe that's highly suspect though, per my explanation and links at
They aren’t mutually exclusive though right? We can say the initial outbreak was from the wet market. Does that mean the virus itself is of 100% natural origin?
What I saw is that it started in SW Indonesia, in a jungle near the coast, in 2008. That was its first appearance:
covid-19 origins
i see this coronavirus first originated in South West India
And first jumped to human in coastal Bengkulu/West Sumatra in Indonesia
I'll try to get time on these events
end of 2008: origination in SW india near Mangalore
1Q2013: first jumps to humans in BGK/WSU indonesia after an animal outbreak on the coast in late 2012
first substantial human outbreak: Mid 2013 slightly inland from outbreak location.
but of course there's no political capital to be made by blaming these poor places ....
feel kind of sad to consider how much of the crazy blame game is actually driven by knowledge, desire for truth and competence, and how much of it is just driven by political competition....i guess the motivation doesn't matter...the result matters. the output is not about truth it's about politics and narrative ... so sad....
This is not evidence. It's hand waving 'how do you explain' stuff that has been addressed many times by experts in the field. No credible evidence whatsoever has been presented. If you ignore experts and just keep repeating the same old nonsense, of course you can claim to have evidence, in the same way 9/11 truthers do.
The problem with this is it is full of utterly debunked nonsense that Ebright knows full well has been addressed.
It's Brandolini's law [0] in action, and no matter how many times virologists rebut these positions the same old tripe gets wheeled out again, and again, and again. If you ignore rebuttals and just repeat yourself + get 20k likes on twitter how many people are going to see a rebuttal like the excellent one from Dr Rasmussen here - https://twitter.com/angie_rasmussen/status/15669764417361715...
The recent science papers have been closely peer reviewed, and none of these guys have been able to rebut them (but oh have they tried, since they are VERY inconvenient for lab leak); instead it's the same tired nit-picking and misunderstanding.
It's sad to me that somebody with a name and position and blue tick abuses that position to spread what amounts to misinformation, but that's where we are in society I guess.
If you want a test, see how these lab leak advocates have reacted to any paper or data that pushes back against their claim. It has been a mad scramble to attack, nit-pick, dismiss and smear (Ebright especially likes conflict-of-interest implications). These are not honest people.
So a paper on a lab-made coronavirus related to SARS that can infect human cells from Wuhan on topic about the origins of a coronavirus related to SARS that can infect human cells that caused the pandemic from Wuhan, is a separate personal desire?
>It's about a completely different study that didn't result in any pandemic.
Because after the article was published, Wuhan stopped working on this, right? Or is it expected that China (or any other country) publishes everything related to something as delicate as this?
Seriously, this "rebuttal" is anything but excellent. The ironic answers to good remarks make it even worse. It sounds more like trying to evade the ~~points made~~ info given rather than answering them.
As for the papers, the origins of the pandemic and the origins of the virus are distinct things. So, yes, we can all agree that the evidence shows that the pandemic started in the market, and as the virus can infect animals sold there, this points towards the virus having come from animals. But genetically, is there anything that can distinguish a natural from a lab-made virus?
What is often missed is that chimeric viruses are easy to detect. The viral genome will show clear evidence of manipulation from random base insertions and clear homology with all the ancestral viruses. Hiding the signs of manipulation would either require vast amounts of time and resources (the expense and man power would make it very difficult to hide) or straight up science fiction technology. The chimeric origin hypothesis is not a plausible explanation for the origin of sars-cov2, which means the nature link is not relevant.
The other lab leak hypothesis is that a specimen collected and cultured by scientists infected a lab employee, and this patient zero then transmitted the virus to others. This is a plausible option, and it is being researched. However, it is less plausible than wild transmission based on a simple numbers game. What is more likely: a breakout infection caused by a dozen scientists specifically trained and equipped against this possibility, or a transmission to one of the millions of other people who routinely interact with these bat populations? Both are possible, but one is much more likely. Before covid19, WIV had published research indicating that novel coronaviruses routinely jump from bats to humans in that part of the world. Most of these viruses don't last in human hosts, but it's clear that it was only a matter of time before something nasty got through. After all, it's already happened once before.
The real nail in the coffin is that research[0] has shown that there were at least two, independent transmissions of sars-cov2 to humans. For this to happen as part of a lab leak it would require WIV to have found and cultivated 2 different strains of sars-cov2, and then each of those strains would have to escape the lab.
Now the two distinct genomic lineages do indeed seem to present a challenge to the lab-leak hypothesis. It's explained in the original study[0] that the second lineage B came from A by intra-host evolution. Due to the molecular clock of the virus, the single-introduction origin of the pandemic from lineage A can be ruled out.
Have you looked at Pekar's full model, as described mostly in the supplementary materials? A typical molecular clock approach wouldn't give anywhere near the accuracy necessary to exclude evolution of lineage B (just two SNPs away) in humans. Pekar instead builds layer upon layer of complexity, with dozens of reasonable but somewhat arbitrary judgment calls, in the same general direction as econometrics. From the shape of the resulting modeled phylogenetic tree, he purports to exclude a single introduction into humans.
I'm not aware of any case where any similar model has been shown to have predictive power, and there's inherently no way to validate this one against any physical data. So I believe this result has been grossly oversold, per my comments and links at
> A typical molecular clock approach wouldn't give anywhere near the accuracy necessary to exclude evolution of lineage B (just two SNPs away) in humans
You're ignoring other data which is counter to the idea of B evolving from A in humans. Pekar's models are not the only evidence.
- Early cases were predominantly B
- A shows less genetic divergence than B; this is what Pekar is talking about with regard to the discontinuity in the early clock.
When we first started discussing this - I spoke up because I was annoyed by you trashing peer-reviewed papers when it was obvious you weren't even attempting to grok the phylogenetics involved. Still annoyed.
It's been genuinely interesting watching the scientific debate to root the SC2 tree over the past few years because of the involved paradoxes.
"Just a few SNPs" is just such a silly argument when stacked against peer-reviewed phylogenies in high-impact publications.
Have you looked at Pekar's full numerical stack yourself, as described in their supplemental materials? If yes, then why are you confident that their choice of the Barabasi-Albert algorithm to generate a fixed infection network correctly models the earliest spread of SARS-CoV-2 in humans? In particular, why choose to study robustness against doubling time (which seems intuitively like it wouldn't affect the shape of the tree much), but not robustness against that connectivity (which seems intuitively like it would)?
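For readers who haven't met it, Barabasi-Albert is a preferential-attachment construction: each new node links to existing nodes with probability proportional to their current degree, which produces a few heavily connected hubs. Below is a minimal plain-Python sketch of that idea. The values of `n` and `m`, the complete-graph seed, and the function name are all illustrative choices of mine, not parameters or code from the paper.

```python
import random


def barabasi_albert_degrees(n, m, seed=0):
    """Return node degrees of a preferential-attachment graph.

    Starts from a complete graph on m + 1 nodes (an illustrative choice),
    then attaches each new node to m distinct existing nodes, chosen with
    probability proportional to their current degree.
    """
    if m < 1 or n <= m + 1:
        raise ValueError("need m >= 1 and n > m + 1")
    rng = random.Random(seed)
    degree = [0] * n
    # Each node appears in this pool once per edge endpoint, so a uniform
    # draw from the pool is a degree-proportional draw over nodes.
    endpoint_pool = []
    for i in range(m + 1):
        for j in range(i + 1, m + 1):
            degree[i] += 1
            degree[j] += 1
            endpoint_pool += [i, j]
    for new in range(m + 1, n):
        chosen = set()
        while len(chosen) < m:
            chosen.add(rng.choice(endpoint_pool))
        for target in chosen:
            degree[new] += 1
            degree[target] += 1
            endpoint_pool += [new, target]
    return degree


degrees = barabasi_albert_degrees(n=2000, m=2)
mean_degree = sum(degrees) / len(degrees)
print(f"mean degree {mean_degree:.2f}, max degree {max(degrees)}")
```

The mean degree is pinned near 2m by construction, while the maximum degree is far larger. How heavy that tail is, relative to real early spread, is exactly the connectivity assumption I'm asking about.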
The rest of their arguments depend fundamentally on the polytomy thing, because nothing else excludes an earlier (even September) first introduction into humans. With an earlier introduction and thus more extensive unsampled spread, it's much harder to insist that A and B would be first sampled in the same order in which they evolved in humans, or make any similar early claims with confidence.
You are correct that I hadn't fully understood their polytomy argument before you brought it up, and I appreciate you bringing it to my attention. I still don't think it's very good, though. I later found Erik van Nimwegen's criticisms, which roughly followed my own; so I don't think I'm taking a fringe position here. Indeed, I've never seen anyone citing or defending Pekar engage in any way with the numerical complexity of that model. It seems like anyone who's looked inside the box becomes a critic, thus my hope that you'll do so.
High-impact publications have shown unfortunate willingness to publish low-quality work that would exclude research-related origin of SARS-CoV-2. For example, I assume you followed Nature's publication, editor's note, and ultimate extensive correction of their pangolin paper, and that you agree pangolins aren't the proximal host. This makes me less inclined to trust in their reviewers here, and more inclined to trust my own judgment (or that of the two Twitter threads I've linked elsewhere).
> In particular, why choose to study robustness against doubling time (which seems intuitively like it wouldn't affect the shape of the tree much)
As I understand it, the doubling times observed in the simulations were primarily the result of the ascertainment and transmission rate parameters.
Care to elaborate why you think the robustness of the model with respect to transmission rate should be assumed? I don't share your intuition here, and note that the authors observe, "that sensitivity analyses with longer doubling times increase the support for multiple introductions."
You really fault them for robustness analysis here?
To be clear I don't fault them for studying robustness against doubling time; I fault them for not studying robustness against connectivity of the infection network, since that seems like it would be more important than any of the parameters that they did study. My intuition is that when spread is highly deterministic (e.g. if R0 = 2 and each patient infects exactly two others), it's easy to make inferences about past spread from the present. For example, in that case it really would be near-impossible for a later lineage to outcompete an earlier one.
But we know the spread of SARS-CoV-2 is actually stochastic, with most lineages dying out but a few exploding due to super-spreader events. In that case it's much harder to judge whether a clade is big because it had more generations to grow, or just big because of a few (un)lucky founder effects. In Pekar's epi simulation, that stochasticity is modeled by their connectivity network. I expect that a more overdispersed network (i.e. greater variance in the number of edges at each vertex, keeping the same average) would make non-modal outcomes--like the real pandemic's phylogeny, if it arose from a single introduction--more likely.
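To make the dispersion point concrete, here is a toy sketch. It is not Pekar's model, just a branching process in which each case infects a negative-binomial number of others (drawn as a gamma-Poisson mixture). The mean `r`, the dispersion `k`, the size cap, and all function names are hypothetical illustration values; smaller `k` means more overdispersion at the same average.

```python
import random


def poisson(lam, rng):
    """Draw a Poisson(lam) count by counting exponential arrivals in unit time."""
    if lam <= 0:
        return 0
    count, t = 0, rng.expovariate(lam)
    while t <= 1.0:
        count += 1
        t += rng.expovariate(lam)
    return count


def offspring(r, k, rng):
    """Negative-binomial secondary cases: mean r, dispersion k (gamma-Poisson mixture)."""
    individual_rate = rng.gammavariate(k, r / k)  # shape k, scale r/k, so mean r
    return poisson(individual_rate, rng)


def dies_out(r, k, rng, cap=200, max_generations=50):
    """Run one chain from a single case; True if it goes extinct before reaching cap."""
    active = 1
    for _ in range(max_generations):
        if active == 0:
            return True
        if active >= cap:
            return False
        active = sum(offspring(r, k, rng) for _ in range(active))
    return active == 0


def extinction_fraction(r, k, runs=1000, seed=0):
    """Fraction of simulated introductions that fizzle out."""
    rng = random.Random(seed)
    return sum(dies_out(r, k, rng) for _ in range(runs)) / runs


print("k = 0.1 (overdispersed):", extinction_fraction(2.0, 0.1))
print("k = 10  (near-Poisson): ", extinction_fraction(2.0, 10.0))
```

With the same average of two secondary cases, the overdispersed setting has far more introductions dying out, and the survivors are driven by a handful of large events. That is the sense in which the assumed dispersion can change which tree shapes look "typical".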
The results of their simulations are stochastic. They discuss this in-depth, as it complicates their analysis.
I don't understand what you're trying to say. Everyone agrees that the spread is stochastic. Why are you starting with a hypothetical misinterpretation of an R value to make a deterministic strawman? You think that their simulations were too deterministic because of their connectivity network?
> -like the real pandemic's phylogeny, if it arose from a single introduction-
> You think that their simulations were too deterministic because of their connectivity network?
Yeah, pretty much; and it's what other critics, including well-credentialed mathematical biologists, are saying too. There's a continuum of dispersion, with my perfectly-deterministic strawman at the left extreme but extending to infinity. Their power-law network adds some dispersion, but how do we know it's enough? I believe they chose that distribution because it's been shown to fit some real data (including the spread of HIV) reasonably well; but how do we know it fits the early spread of SARS-CoV-2, in the earliest lineages of the virus with unknown biology, in an unknown group of people with unknown behaviors?
I don't know how to root the phylogeny, and I'm mistrustful of anyone who claims they can based on the limited information available. Anyone who's built and attempted to validate mathematical models knows that sometimes, there's simply not enough information to confidently reach any useful conclusions. Absent validation of the approaches used here (e.g. evidence that they've successfully made predictions in the past in similar situations), I believe that's our situation here.
> because nothing else excludes an earlier (even September) first introduction into humans. With an earlier introduction and thus more extensive unsampled spread, it's much harder to insist that A and B would be first sampled in the same order in which they evolved in humans
The tMRCA clearly excludes an earlier introduction. Because the tMRCA is based on genetic diversity, you cannot calculate a tMRCA based on all the known samples, get a date, and then say "oh, geez- well, there was also wide cryptic spread before that." It just doesn't make sense. Pekar addresses this point directly.
A race between the first A and the first B is a strawman. Rather, it's the predominance of lineage B over A in the early pandemic which is interesting. It would be unexpected for lineage B to dominate if A came first. Much of the modeling is to get a handle on how unlikely that situation would be. It shouldn't be surprising that the models don't support it as being likely. (But, that's not the only evidence.)
If you're willing to actually think about and engage on the phylogeny - stop with the "just a few SNPs" nonsense, and ask yourself what you really think the early origins looked like. If it really was a single introduction - Was lineage A ancestral? Was B ancestral? A C/C ancestor? A T/T ancestor? All these have interesting problems being supported by the data.
Finally, after reading some of your earlier comments, I'm realizing that you're conflating several techniques from Pekar's paper, eg:
> Have you looked at Pekar's full model, as set out mostly in the supplementary materials? This isn't any standard molecular clock approach. It's a byzantine stack of plausible but somewhat arbitrary assumptions, ending in a simulated phylogenetic tree.
His epi simulations are separate from the tree-building, with the possible exception of rooting, which he was using the output of the models to inform. Otherwise, the epi modeling which everyone is hand wringing over is really separate and doesn't end "in a simulated phylogenetic tree."
There /are/ novel methods used in the tree building (eg, non-reversibility of base substitutions), but that's a whole separate technique.
> Essentially Pekar's argument is a "two introductions of the gaps"--that if their model of a single introduction doesn't conform to reality, then it must have been two introductions.
BS. Again - understanding the paradoxes and debate involved in rooting the tree is basically required to understand the importance of this paper. The existing data is confounding and didn't conform to a logical understanding of viral evolution. A separate introduction elegantly explains the existing evidence.
If their modeling isn't strong enough evidence for you, fine. But that's different than throwing everything out because you don't understand how "just a couple SNPs" can still provide sufficient resolution to make phylogenetic inferences possible. If you think that "just a couple SNPs" /don't/ provide enough for experts in the field to inform their phylogenies, at least get to that argument directly instead of throwing ignorant shade at an unrelated portion of the paper.
Thanks for the links to those other threads. Nod's was interesting, but AFAICT, way off-base, starting around "Needless to say, early winter in Wuhan is not the Mardi Gras."
Here's Pekar's earlier thread which I recently reread and found helpful for understanding the significance of the phylogeny (#20 is where he gets into how lineage A breaks the clock):
I think you're talking about their model in "Inferring the MRCA of SARS-CoV-2", and I'm talking about their model in "Separate introductions of lineages A and B"? So you're saying they don't use the epi simulations to root and build the phylogenetic tree of real sampled genomes, which is true. I'm saying they do use the epi simulations to build a phylogenetic tree for each simulated pandemic, whose shape (polytomy structure) they then compare against the real tree:
> We simulated SARS-CoV-2–like epidemics (22, 23) with a doubling time of 3.47 days [95% highest density interval (HDI) across simulations, 1.35 to 5.44] (24–26) to account for the rapid spread of SARS-CoV-2 before it was identified as the etiological agent of COVID-19 (figs. S21 and S22, tables S3 and S4, and supplementary text). We then simulated coalescent processes and viral genome evolution across these epidemics to determine how frequently we recapitulated the observed SARS-CoV-2 phylogeny.
Coverage of this paper in the popular press usually said something like "study finds that SARS-CoV-2 arose from two introductions into humans", so I thought the latter was the more important result and started there. Like in your second link, Worobey says:
> [...] We then go on the explain, point by point, that it is not a two-mutation difference that is unexpected. It is a two mutation difference between two large clades like lineage A and lineage B, each displaying a MASSIVE polytomy at their root. This is something that [sic] DO NOT see in ~99.5% of simulations. That is the crux of the paper. Not the idea that two mutations can't happen in a single transmission event.
Are those "simulations" not the SIR-type epi simulations (followed by simulation of the mutations and sampling, then construction of the tree)? I believe his 99.5% is 100% minus the 0.5% from Figure 2C.
Their former model is of course independent of their SIR stuff, and indeed purports to independently establish tMRCA in humans too recent for significant cryptic spread. It carries a different set of plausible but arbitrary assumptions though, again about the stochasticity/overdispersion and sampling rate of early spread, just less directly.
Glad we're on the same page about the multiple techniques now. Statements you made like, "Pekar et al. do some complicated phylogenetic modeling that purports to show the MRCA in humans is too recent" and "This isn't any standard molecular clock approach. It's a byzantine stack of plausible but somewhat arbitrary assumptions" made it clear there was confusion before. Their tree is based on a couple of novel modifications to established techniques. Your characterizations were inaccurate and laughable.
> It carries a different set of plausible but arbitrary assumptions though, again about the stochasticity/overdispersion and sampling rate of early spread, just less directly.
So, you don't only have problems with the modeling of the authors, but their base phylogeny too? Do you reject their tMRCA? Good grief.
I'm still looking forward to discussing the molecular phylogenetics of this paper sometime.
On reflection, I believe the first of my statements that you've quoted was indeed incorrect, and that I was also incorrect when I just wrote:
> Their former model [...] purports to independently establish tMRCA in humans too recent for significant cryptic spread.
Even if SARS-CoV-2 really entered humans in December, with minimal cryptic spread, that's still enough time for the two lineages to evolve in humans, since they're (sorry) just two SNPs apart. I believe Worobey knows this, and that's the reason why he emphasizes the "Separate introductions" model, since their polytomy thing--and not any question of time for cryptic spread--is their best and only argument to exclude that. So I was wrong to mention the tMRCA at all, since even perfect knowledge of that wouldn't tell us confidently how the two lineages arose.
The second of my statements seems correct to me. Not only is their argument for two introductions not a standard molecular clock approach, but it's not a molecular clock approach at all, since "Inferring" provides no support. Their only support comes from the polytomy thing in "Separate". This makes the accuracy of their epidemiological simulation highly relevant, thus the "hand-wringing" over that.
I'd note that you yourself referred me to "Separate", back in:
So why did you switch to "Inferring"? I guess we could discuss that too, but per above I don't believe that could provide significant support for two introductions into humans, and thus not for natural vs. research-related origin. Do you believe otherwise? Or do you just mean the approach is of general interest, independently of that question of origin?
> Not only is their argument for two introductions not a standard molecular clock approach, but it's not a molecular clock approach at all, since "Inferring" provides no support
Okay, let's revisit this now that some of the terminology confusion is recognized.
"Inferring the MRCA of SARS-CoV-2" introduces their phylogenies. It was produced with BEAST as described in their methods. I believe this is the model you were referring to as "Inferring." Yes?
I don't understand what you're trying to say here. If you don't understand how their phylogeny helps support their theory of multiple introductions, I don't know what to tell you. Maybe just another clarification of what you're trying to say would help.
> I'd note that you yourself referred me to "Separate", back in ... So why did you switch to "Inferring"
Because we're discussing multiple things in the same paper?
> Even if SARS-CoV-2 really entered humans in December, with minimal cryptic spread, that's still enough time for the two lineages to evolve in humans, since they're (sorry) just two SNPs apart.
This isn't the evidence the authors present. The argument isn't "there isn't enough time to go from A -> B." IIRC, I've seen similar acknowledgements that even more rare mutations have been observed in a single transmission during the course of the pandemic. They're just highly improbable.
The most direct evidence (as I see it) for B not evolving from A in humans is the unexpected lack of genetic divergence in lineage A compared to B. Lineage B should show a younger molecular clock; it doesn't.
> I believe Worobey knows this, and that's the reason why he emphasizes the "Separate introductions" model, since their polytomy thing--and not any question of time for cryptic spread--is their best and only argument to exclude that. So I was wrong to mention the tMRCA at all, since even perfect knowledge of that wouldn't tell us confidently how the two lineages arose.
Nonsense. The tMRCA is key evidence in how the lineages arose. One of the reasons for the epi modeling was to figure out the plausible time between the primary case and index case. It shows there is at most a few dozen people infected before the genetic diversity was captured through sampling. (`Results: Minimal cryptic circulation of SARS`)
I don't think you understand their argument here, at all.
> Not only is their argument for two introductions not a standard molecular clock approach, but it's not a molecular clock approach at all, since "Inferring" provides no support
Please elaborate why you think their use of the molecular clock is novel. It's really not.
> Do you believe otherwise? Or do you just mean the approach is of general interest, independently of that question of origin?
As explained above, I think the authors provide compelling evidence of multiple introductions using solid phylogenetic inference and solid molecular epidemiology. Bottom line is that there simply isn't an alternate hypothesis which explains the available evidence, and they illustrate why.
Here's a video you might not have seen, with Pekar and Wertheim. I've cued up the portion with a great explanation of why the evidence in the MRCA and genomics is so important. If you're going to continue to try and tear down their arguments, you probably want to really get this part.
I think I understand what Worobey and Pekar write on Twitter, though I disagree with much of it. I don't understand what you're saying, so I'm afraid we're still talking past each other.
Do you agree that there are two mostly-independent models in the paper, one described in the section titled "Inferring the MRCA of SARS-CoV-2", and another in the section titled "Separate introductions of lineages A and B"? When I write "Inferring" and "Separate", I am referring to the models described in the sections with titles beginning with those respective words.
You wrote earlier:
> His epi simulations are separate from the tree-building, with the possible exception of rooting, which he was using the output of the models to inform. Otherwise, the epi modeling which everyone is hand wringing over is really separate and doesn't end "in a simulated phylogenetic tree."
As to "Separate", I believe that's incorrect. That model begins with an SIR-type simulation, and outputs the shape (polytomy structure) of the phylogenetic tree of that simulated pandemic, which they compare against the shape of the real pandemic's phylogenetic tree. Do you disagree? If so, what do you believe is the output of that "Separate" model?
I agree that the "Inferring" model does not depend on the epidemic simulation. I don't believe the "Inferring" model provides significant support for two introductions though. I believe that's the reason why most public debate has been about "Separate".
Yeah, I think we're basically on the same page with their methodology and models now.
I didn't realize you were nicknaming the models based on applying them to the result titles, so was quite confused, especially when we both used those words in the quoted sections, so it sounded like you were referring to portions of our conversation. So yeah, talking right past each other.
No, the two models don't correspond to the results cleanly, i.e., when the authors claim "Separate introductions of lineages A and B" in the results, they provide evidence from both. (They're presenting the results of the models in support of their phylogeny.) I agree that "Inferring the MRCA of SARS-CoV-2" is pretty much independent of the epi stuff.
> As to "Separate", I believe that's incorrect. That model begins with an SIR-type simulation, and outputs the shape (polytomy structure) of the phylogenetic tree of that simulated pandemic, which they compare against the shape of the real pandemic's phylogenetic tree. Do you disagree? If so, what do you believe is the output of that "Separate" model?
I thought we were over this. We both agree that one of the results of the epi simulations was sampled genetics and a resulting tree from the simulation. That doesn't mean that their phylogeny is the direct result of their epi simulations. Their simulations are in support of their phylogeny. Their theorized phylogeny essentially existed prior to the modeling, which is why I called them separate, i.e., independent.
The `Materials and methods summary` is quite clear, especially `Phylodynamic inference and epidemic simulations`.
edit: Our thread is too deep for HN, might not be able to reply? I'll try and keep an eye out for new replies if you want to fork off somewhere else.
But, where's your horse in this race? You speak a lot about what you think sucks and very little about what you actually believe here.
> I agree that the "Inferring" model does not depend on the epidemic simulation. I don't believe the "Inferring" model provides significant support for two introductions though. I believe that's the reason why most public debate has been about "Separate".
Funny. My theory is that most people don't have enough knowledge of molecular genetics to make heads or tails of the paper, and so are of course silent on those results. They didn't follow the debate over the past few years, and are showing up and trying to understand something without context or the requisite knowledge.
When you say "Public debate" you need to admit you're talking about a particular part of a particular website or two where a small number of people are picking at nits and can't even address the core of the findings the authors present here.
We're making some progress, at least. I believe this site rate-limits deep threads, but doesn't cut them off entirely.
So I guess we were also talking past each other on "Separate". By "simulated phylogenetic tree", I've always meant "phylogenetic tree for one of their simulated pandemics", not a tree for the real pandemic. We also agree that Pekar's argument isn't based on the time necessary for the two lineages to evolve in humans, since at least that much difference could arise even (with p ~ 10%) in a single human-to-human transmission.
So to exclude evolution of the two lineages in humans, they needed something else. Loosely, that's the observation that (stochasticity of spread aside) we'd expect the earlier lineage A to have more and more diverse descendants than the later lineage B. Their epi model in "Separate" is a formalization of that, and if they could correctly and confidently model that spread then I believe it would be sound.
It seems like we disagree as to what forms the paper's core result, though. I'm taking my own cue from Worobey's Twitter comments, because (a) he's an author, so he presumably should know better than most, and (b) while I disagree with his conclusion, I do see the flow of his argument. In the thread that you linked and I quoted, he describes the result of that "Separate" model--which fundamentally depends on the epi stuff--as the crux of the paper. That makes sense to me.
I believe you prefer to think in terms of construction of the phylogenetic tree for the real pandemic, like to frame the question of number of introductions in terms of the number of roots for the tree. That's in a certain sense equivalent, but it seems much less intuitive to me. The "Separate" approach makes the epidemiological assumptions explicit. Those assumptions are obviously always relevant though, so they're still relevant when you frame the problem in terms of the real tree; they're just much harder to express in the parameters (R0, serial interval, dispersion parameter k, etc.) typically used to model a pandemic.
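For readers less familiar with those parameters, here is a minimal sketch of how R0 and the dispersion parameter k enter a stochastic spread model. It is a toy branching process, not the paper's simulation; the function names, the parameter values (R0 = 2.5, k = 0.1 versus k = 10), and the generation and size caps are all assumptions chosen for illustration.

```python
import math
import random


def poisson(lam, rng):
    """Draw a Poisson variate with Knuth's multiplication method.

    Adequate for the modest rates that appear in this toy example.
    """
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def offspring(r0, k, rng):
    """Secondary cases caused by one infection.

    A gamma-Poisson mixture, i.e. a negative binomial with mean r0 and
    dispersion k (smaller k means more superspreading).
    """
    rate = rng.gammavariate(k, r0 / k)  # shape k, scale r0 / k, so mean r0
    return poisson(rate, rng)


def extinction_fraction(r0, k, trials=1000, generations=6, cap=200, seed=1):
    """Fraction of single introductions that die out within a few generations.

    All defaults are hypothetical illustration values, not fitted estimates.
    """
    rng = random.Random(seed)
    extinct = 0
    for _ in range(trials):
        active = 1
        for _ in range(generations):
            active = sum(offspring(r0, k, rng) for _ in range(active))
            if active == 0 or active > cap:
                break  # died out, or clearly established
        if active == 0:
            extinct += 1
    return extinct / trials


if __name__ == "__main__":
    for k in (0.1, 10.0):
        print(f"k={k}: extinct fraction = {extinction_fraction(2.5, k):.2f}")
```

With the same R0, a smaller k should make early extinction of a single introduction much more common, which is one reason the choice of k matters for any conclusion drawn from simulated outbreaks.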
When they built the real tree, they observed that any single root fits badly. (Per your other comment, I agree that's what they did in "Inferring" with BEAST.) More roots would fit better; but that's always true for any phylogeny unless there's a penalty for each additional root, since more roots improve all the other usual measures of fit. Without quantifying what that penalty per additional root should be, it's not possible to say whether the poor fit is because the tree really should have two roots, or for other reasons (unmodeled stochasticity of spread, imperfect sampling, etc.). It's not too easy to convert those pandemic parameters into that penalty. So it makes sense to me that they didn't try, and instead switched to the SIR-type simulations in "Separate", which they're treating as their most important result.
As I've noted earlier, I don't believe it's possible to reach any confident conclusion (as to research-related vs. natural origin, the number of introductions into humans, or most of the other topics of major contention) from the evidence currently available. I'd have little objection to this paper if it were framed as exploratory work, whose speculative conclusions should not be trusted without further verification. That's not how Worobey and others have portrayed it in the popular media, though, and also not how you've initially portrayed it here.
I think it might be productive to dive in on this part
> Loosely, that's the observation that (stochasticity of spread aside) we'd expect the earlier lineage A to have more and more diverse descendants than the later lineage B. Their epi model in "Separate" is a formalization of that, and if they could correctly and confidently model that spread then I believe it would be sound.
Yeah, that's the observation. However, you're invoking the epi model at the wrong time. If you read `Inferring the MRCA...`, all of this is already known and observed before the modeling is even run. The epi model doesn't contain these results. They constructed their SC2 tree, then brought it over to the epi model to play with it.
If you want a "formalization" of that observation, perhaps Table I will do.
The results are best read in order.
If you're trying to better understand the phylodynamic model, perhaps "Inference of Viral Evolutionary Rates from Molecular Sequences" by Drummond would be interesting.
I think you're failing to appreciate the reason why they built the "Separate" model. Their headline claim is that SARS-CoV-2 arose from two zoonotic introductions into humans. If you want to express that claim in terms of the real pandemic's tree, then the relevant tree is the tree in humans only, which would then have two roots.
The construction of such a tree inherently depends on our assumptions on the epi dynamics. For example, if you give me a hundred genomes and I propose a hundred roots, then that wouldn't usually be a very good tree; but if the disease in question were known to spread animal-to-human but not human-to-human, then that might be correct. Nothing in their "Inferring" model allows them to incorporate such obviously relevant information, so that seems like an obvious deficiency.
To put it another way, you write:
> If you read `Inferring the MRCA...`, all of this is already known and observed before the modeling is even run.
After "Inferring", I believe they know the real tree has structure that's obviously non-modal (i.e., not the most likely outcome) given any single introduction. I don't see how they'd know whether it's a p = 20% non-modal or p = 0.5% non-modal outcome without an epi model like "Separate", or some kind of ugly incorporation of the epi dynamics into BEAST that they wisely didn't attempt.
I believe that's why the authors built "Separate", and its basic form is good work. (If you don't, then why do you think they spent their time on that?) I just disagree with their parameter choices and excessive confidence in their result.
As to your other reply, I agree the 10% is a rough number, not considering mutation biases and such. That's just the probability in a single transmission though, and it's also possible (and more likely) that the two lineages formed in humans with intermediate lineages that went extinct before they could be sampled. I think we at least agree that timing alone is insufficient to exclude evolution of the two lineages in humans though, even assuming a December introduction? I'm just trying to confirm that none of the evidence you see for two introductions in "Inferring" comes from its tMRCA.
Sorry; maybe I'm too stupid or lazy, but I genuinely don't get your point. Is it just that when they construct the tree in "Inferring", it looks qualitatively surprising (non-modal) given any single introduction, assuming (as I do as well) that A predates B? But we've known that for literally years now. As I understand the paper, their novel contribution is to quantify how surprising that looks, whether it's p ~ 20% surprising (which wouldn't mean much) or their claimed p ~ 0.5%. That's what they do in "Separate", and it correctly and inherently depends on the epidemiological modeling that I don't trust.
Again, in the Twitter thread that you yourself linked, Worobey says:
> This [the real polytomy structure] is something that [we] DO NOT see in ~99.5% of simulations. That is the crux of the paper.
The simulations in question are the epidemiological simulations from "Separate". You've told me to disregard Worobey's comments here; but while it's possible that Worobey has misunderstood the significance of his own paper, it seems more likely to me that you have.
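To make the "how surprising" question concrete, here is a minimal sketch of turning a batch of simulations into a tail probability with an uncertainty range. The counts used (5 matching simulations out of 1000) are hypothetical, picked only because they give 0.5%; they are not the paper's actual simulation counts, and `tail_fraction` is a name invented for this example.

```python
import math


def tail_fraction(matches, total, z=1.96):
    """Fraction of simulations reproducing the observed structure.

    Returns (estimate, lower, upper), where the bounds are a Wilson score
    interval at the confidence level implied by z (1.96 is roughly 95%).
    """
    p = matches / total
    denom = 1 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return p, max(0.0, centre - half), min(1.0, centre + half)


if __name__ == "__main__":
    # Hypothetical counts: 5 of 1000 simulated epidemics match the real tree.
    estimate, lower, upper = tail_fraction(5, 1000)
    print(f"p = {estimate:.4f}, interval = ({lower:.4f}, {upper:.4f})")
```

Note that an interval like this only reflects Monte Carlo sampling noise. It says nothing about whether the epidemiological parameters fed into the simulations were right, which is the part under dispute here.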
> (with p ~ 10%) in a single human-to-human transmission.
That math is absolute garbage. One, the odds of a C/T -> T/C double mutation in a single transmission for the clade-defining markers aren't the same as T/C -> C/T, so at the very least you need to state an ancestral lineage to do any math like this. It also doesn't take into account the different priors for reversions, synonymous mutations, and the C-T transition bias in humans.
> When they built the real tree, they observed that any single root fits badly.
No. Go read the paper again. ("Our unconstrained rooting strongly favors a lineage B or C/C ancestral haplotype...") It's when you try and root in lineage A that things go sideways.
> I believe you prefer to think in terms of construction of the phylogenetic tree for the real pandemic, like to frame the question of number of introductions in terms of the number of roots for the tree.
> More roots would fit better; but that's always true for any phylogeny unless there's a penalty for each additional root,
No, it's not multiple roots, they just place the likely MRCA of SARS-CoV-2 in animals. ("If lineages A and B arose from separate introductions...") It's one tree. With one root. However, that root is in an animal instead of a human.
You can calculate the MRCA for any portion of the tree, including the descendants from the two+ hypothesized introductions. This MRCA is distinct from the SARS-CoV-2 MRCA. Is this what you mean by multiple roots?
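As a small illustration of "one tree, one root, but a separate MRCA for each portion", here is a sketch on a made-up tree. The node names (`R`, `A0`, `B0`, `a1`, ...) and the topology are hypothetical and are not taken from the paper's phylogeny.

```python
def ancestors(parent, node):
    """Path from a node up to the root, starting with the node itself."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def mrca(parent, tips):
    """Most recent common ancestor of the given tips in a rooted tree.

    `parent` maps each non-root node to its parent.
    """
    first_path = ancestors(parent, tips[0])
    shared = set(first_path)
    for tip in tips[1:]:
        shared &= set(ancestors(parent, tip))
    # Walking up from the first tip, the first shared node is the deepest one.
    for node in first_path:
        if node in shared:
            return node
    raise ValueError("tips do not share an ancestor")


# Hypothetical toy tree: a single root R with two sub-clades.
TOY_PARENT = {
    "A0": "R", "B0": "R",
    "a1": "A0", "a2": "A0",
    "b1": "B0", "b2": "B0", "b3": "B0",
}

if __name__ == "__main__":
    print(mrca(TOY_PARENT, ["a1", "a2"]))        # MRCA within one sub-clade
    print(mrca(TOY_PARENT, ["b1", "b2", "b3"]))  # MRCA within the other
    print(mrca(TOY_PARENT, ["a1", "b3"]))        # MRCA across both
```

The tree has exactly one root, yet each sub-clade has its own, more recent MRCA, and only a query spanning both sub-clades reaches back to the root.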
> It seems like we disagree as to what forms the paper's core result, though. I'm taking my own cue from Worobey's Twitter comments
If you're trying to understand the paper's core result, read the paper, not Twitter.
The first paragraph in `Discussion` frames the crux of their argument I was trying to get across. Notice that they cite the paradox I'm trying to get you to understand, as well as citing genomic diversity as core evidence, as opposed to any argument about the exact timing of A and B samples, or the unlikelihood of multiple mutations.
I've read the paper. What got my senses tingling is that:
- there has been no investigation of which strains were in the lab
- it's unclear whether the data can be trusted, given it's from China.
Till those questions have been answered I remain skeptical about the origin.
Thanks for sharing this.
For a non-expert in this field like me, peer-reviewed articles in a highly regarded publication like Science Magazine eliminated the lab theory. But I guess trust in science is unfortunately not as strong as it was. And some people are biased because they want a particular theory to be true.
Here are some interesting parts from the paper's concluding section.
> These findings suggest that infected animals were present at the Huanan market at the beginning of the COVID-19 pandemic; however, we do not have access to any live animal samples from relevant species. Additional information, including sequencing data and detailed sampling strategy, would be invaluable to test this hypothesis comprehensively.
> The sustained presence of a potential source of virus transmission into the human population in late 2019, plausibly from infected live mammals sold at the Huanan market, offers an explanation of our findings and the origins of SARS-CoV-2.
It doesn't eliminate or prove anything. It only makes a case that zoonosis is a good hypothesis.
Actually all of his claims are facts unless proven otherwise. Do you have proof that these facts (Wuhan playing with these viruses, funding etc) are not true or you would like them to not be true?
'Claims are facts until proven otherwise' is quite the take, I think Karl Popper would take issue with that.
I linked a thorough thread from a domain expert explaining in detail why the claims are false. These things have been addressed/debunked many, many times but still get repeated because the untruths get amplified and the rebuttals not.
This is sophistry. The OP is stating facts. You are being "vague" by saying that these have been "debunked", relying on the long-term "no conspiracy" theory, which really does not apply in this case. This is a very clear Bayesian probability problem.
You guys are missing the point. Either you believe some unknown variable has zero probability, or, even if it is small, you think it's not worth discovering, for an event that is the most important since WW2. Is there proof of zero probability? No.
The "all claims are facts unless proven otherwise" approach is the wrong way to think about it. You must prove a claim, not the other way around. What is perhaps better to say is that he makes no arguments but just lists observations (nitpick: therefore there can be no such thing as a rebuttal). As the commentary tweets don't call them wrong, we can assume they're indeed correct (thus nothing was debunked). That said, I'd have preferred it if he had given correct citations, such as the link to the Nature article.
I wonder if you take an "all claims are facts until proven otherwise" approach to religion as well, and if so, how do you reconcile the mutually-exclusive (but non-falsifiable) claims that various religions make counter to one another?
It’s a fact that I had coffee with someone who grew up a thousand miles away from me yesterday. If I were to infer from that fact that there was something unusual about that, I would be wrong.
Interesting this came out just as I rejigged and updated a visualisation tool I made a decade ago (!) which shows sorting algorithms in a really pretty rainbow colour form. I think it gives a good insight into how these work.
This reminds me somewhat of golang: ostensibly an open source project, but in reality all meaningful decisions are made by those at the company (including the shameful throwing away of community work). Perhaps we need a different term to differentiate between open source projects that truly have open governance vs. ones that are essentially bolting on some open source shaped stuff.