
This is the paper with the title: https://academic.oup.com/mbe/article/36/2/220/5229930

Almost every article I've seen that uses that naming trope fundamentally misunderstands Wigner's point in the original (https://www.dartmouth.edu/~matc/MathDrama/reading/Wigner.htm...). What Wigner was hinting at (he never really comes out and says it) is that he thinks math and science are two totally different domains, and that it's surprising that mathematical models of physical systems would be able to make generalized out-of-scope predictions. Ultimately, the best thing any theory can do is predict something we didn't expect from the previous models, and then have an experimentalist go and show the new theory's prediction is more consistent with natural observations. Both relativity and QM have done that repeatedly. Wigner was surprised at that, as he believed that math was an independent domain untethered to physics. Many today assume that the universe is effectively a physical embedding of a mathematical structure, and that our mathematical theories are simplified approximations of that structure, so it's not super surprising that a good math model would make good physical predictions. I think this article was him basically hinting at that new idea without coming out and saying it.

As for why CNNs are useful here... in the generalized genotype-to-phenotype problem, where you are trying to take a list of a person's mutations and predict their physical attributes, there are some phenotypes/traits which are absolutely and totally explained by a single mutation in one gene. In those cases you could train classifiers using simple binary features ("has_mutation_TtoAatPosition37OfChromosome1") and make pretty good predictions.
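A minimal sketch of that kind of encoding. The second mutation name is made up in the style of the one above, and the "classifier" is a toy one-feature rule rather than anything trained:

```python
# Hypothetical feature names; only the first one appears in the comment above.
KNOWN_MUTATIONS = [
    "has_mutation_TtoAatPosition37OfChromosome1",
    "has_mutation_GtoCatPosition512OfChromosome7",
]

def encode(person_mutations):
    """Turn a set of observed mutation names into a 0/1 feature vector."""
    return [1 if name in person_mutations else 0 for name in KNOWN_MUTATIONS]

def predict_trait(features, causal_index=0):
    """Toy single-gene rule: the trait is present iff the causal mutation is."""
    return features[causal_index] == 1

carrier = encode({"has_mutation_TtoAatPosition37OfChromosome1"})
non_carrier = encode(set())
```

For a trait fully explained by one mutation, a trained classifier would end up learning roughly this rule from the binary features.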

But most traits are only predictable by making complex non-linear models that take in more locations and the interactions between them. In some cases, it's 1-2 mutations in a single gene near each other, in other cases, it's 100 different mutations spread throughout the genome, and in other cases, many thousands (the variance of height in humans is a good example where a large number of effects combine non-linearly). CNNs are great for dealing with non-linear data with non-local interactions.

Sequence models also work well (I always find this funny, because you're doing ML sequence models on DNA sequences) because so much of the signal can be found in the neighboring bases. For example, in transcription factor binding, where a protein recognizes a short chunk of DNA, the recognized window is short (10-20 base pairs) and has significant internal predictability.
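To make the short-window point concrete, here is a rough sketch of what a single 1D convolution filter does on one-hot encoded DNA. The motif "TGAC" and the example sequence are made up for illustration, and the filter is hand-set rather than learned:

```python
BASES = "ACGT"

def one_hot(seq):
    """Encode each base as a 4-element 0/1 vector in A, C, G, T order."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq]

def scan(seq, kernel):
    """Slide the kernel along the sequence (a 1D convolution with no padding)
    and return one score per window."""
    x = one_hot(seq)
    width = len(kernel)
    scores = []
    for start in range(len(x) - width + 1):
        score = 0.0
        for offset in range(width):
            for channel in range(4):
                score += kernel[offset][channel] * x[start + offset][channel]
        scores.append(score)
    return scores

# Hypothetical filter that rewards the made-up motif "TGAC", one row per position.
motif_kernel = one_hot("TGAC")

scores = scan("AATGACGG", motif_kernel)
best_start = scores.index(max(scores))  # index of the best-matching window
```

A real model would learn many such filters with graded weights, but the sliding-window mechanics are the same.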




Interesting point about Wigner, but about the article...

This study does not examine phenotypes, but may be applied to them somehow. Instead:

>we use simulation to show that CNNs can leverage images of aligned sequences to accurately uncover regions experiencing gene flow between related populations/species, estimate recombination rates, detect selective sweeps, and make demographic inferences

I believe this works well because the sorting of the data (Fig. 2) introduces phylogenetic information into the image to be analyzed. This reminds me of neighbor joining [0], but has some differences. Without this ordering, their method does not work as well.

[0] https://en.wikipedia.org/wiki/Neighbor_joining
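As a rough illustration of what sorting rows by similarity does to the image, here is a greedy nearest-neighbor ordering by Hamming distance. This is a stand-in heuristic, not necessarily the procedure the paper uses, and the tiny alignment is made up:

```python
def hamming(a, b):
    """Number of positions at which two equal-length rows differ."""
    return sum(x != y for x, y in zip(a, b))

def sort_rows_by_similarity(rows):
    """Greedy ordering: start from the first row, then repeatedly append
    the remaining row closest to the last one placed."""
    if not rows:
        return []
    remaining = list(rows[1:])
    ordered = [rows[0]]
    while remaining:
        last = ordered[-1]
        nearest = min(remaining, key=lambda r: hamming(last, r))
        remaining.remove(nearest)
        ordered.append(nearest)
    return ordered

# Made-up alignment: each string is one haplotype, each character one site.
alignment = ["0011", "1100", "0001", "1110"]
ordered = sort_rows_by_similarity(alignment)
```

After sorting, similar rows sit next to each other, so a 2D kernel spanning a few rows sees closely related sequences instead of an arbitrary mix.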


> sorting of the data

Perhaps the main advantage is that it can filter out non-adaptive, non-functional (noisy) mutations this way by simply averaging similar genomes? In that case, the rows of the learned M x N kernels should be nearly identical and one could have simply averaged M data rows at a time and fed it to the 1D CNN.
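A sketch of the averaging alternative, assuming the rows are already sorted so that neighbors are similar. The function name and the data are hypothetical:

```python
def average_row_blocks(rows, m):
    """Average each consecutive block of m rows column-wise. A trailing
    block with fewer than m rows is averaged over the rows it has."""
    averaged = []
    for start in range(0, len(rows), m):
        block = rows[start:start + m]
        averaged.append([sum(col) / len(block) for col in zip(*block)])
    return averaged

# Made-up sorted alignment: 4 rows (genomes) by 4 columns (sites).
sorted_rows = [
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [1, 1, 0, 0],
    [1, 1, 1, 0],
]
profiles = average_row_blocks(sorted_rows, m=2)
```

Each averaged row could then go to a 1D CNN as its own input, which is equivalent to an M x N kernel whose rows are all identical up to scaling.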

What other phylogenetic information could possibly be inferred?

Edit: It could also be thought of as data augmentation, as it effectively creates novel inputs each time. IIRC there was also a technique for hardening against adversarial examples which simply fed the network averaged datapoints along with the original data.



