I meant, which *specifics* are you thinking of? > *Genetics is amenable because ...

whimsicalism · on Nov 30, 2020

> Not sure what you mean by that

I spoke loosely, my mind skipped ahead of my writing, and I didn't realize that we were parsing so closely. "Genetics (the field) is amenable because the object of its study (the genome) is a sequence" would have been more correct but I thought it was implied.

> without a specific purpose there’s no point in doing so

Well yes, prior to the success of transfer learning I could see why you would think that is the case, but if you've been following deep sequence research recently then you would know there are actually immense benefits to doing so because the embeddings learned can then be portably used on downstream tasks.

> it’s purely limited by data availability.

Yes, and transfer learning on models pre-trained on unsupervised sequence tasks provides a (so-far under-explored) path around labeled data availability problems.

I already linked to a paper showing a task that these sorts of approaches outperform, and that is without using the most recent techniques in sequence modeling.

Maybe read the paper in Nature that uses this exact LM technique to predict the effect of mutations before assuming that it doesn't work: https://sci-hub.do/10.1038/s41592-018-0138-4

I am not directly in the field, you are right - but I think you are also being overconfident if you think that these approaches are exactly the same as the HMM/Markov chain approaches that came before.

klmr · on Nov 30, 2020

Thanks for the paper, I’ll check it out; this isn’t my speciality so I’m definitely learning something. Just one minor clarification:

> Maybe read the paper … before assuming that it doesn't work

I don’t assume that. In fact, I know that using ML works on many problems in genetics. What I’m less convinced by is that we can expect a breakthrough due to ML any time soon, partly because conventional techniques (including ML) already have a handle on some current problems in genetics, and because there isn’t really a specific (or flashy) hard, algorithmic problem like there is in structural biology. Rather, there’s lots of stuff where I expect to see steady incremental improvement. In fact, in Wikipedia’s list of unsolved biological problems [1] there isn’t a single one that I’d characterise specifically as a question from the field of genetics (as a geneticist, that’s slightly depressing).

But my question was even more innocent than that: I’m not even that sceptical, I’m just not aware of anything and genuinely wanted an answer. And the paper you’ve posted might provide just that, so I’ll go and do my research now.

[1] https://en.wikipedia.org/wiki/List_of_unsolved_problems_in_b...

jcims · on Nov 30, 2020

Not being in the field, I would term what I see in this story as a ‘bottom up’ approach to understanding genetics/molecular biology. More akin to applied sciences than medicine or health. This, for example, seems to be very important but it still leaves us with a jello jigsaw puzzle with 200 million pieces and probably far removed from immediate utility in health outcomes.

Then there are the more clinically oriented approaches of looking at effects and trying to find the associated genes/mutations and whatever mechanisms exist in between to cause a desirable or undesirable outcome. I’d call that ‘top down’.

I’m sure the lines get blurred more every day, but is there a meaningful distinction into these and/or more categories that are working the problem from both ends? If so, are there associated terms of art for them?