I get the skepticism. There have been a lot of surprising revelations in biology and I don't think anyone would argue we have every angle nailed down. However, the idea that some DNA is genuinely “junk” is based on more than a hunch. It’s from looking at patterns across species. If a sequence really mattered, then changing it should cause a problem. That would put pressure on the sequence to stay the same, generation after generation. Yet we see big stretches of DNA mutating freely, at rates that exactly match what would be expected from accumulation of random copying errors. That suggests these sequences aren’t under selection for any important function.
This isn’t just “we don’t know what it does, so it must be junk.” It’s more like, “We can’t find any sign that it matters, and everything we know about evolution says if it mattered, we’d see fewer random changes there.” Down the road we might uncover small roles for some of these regions, but at this point, calling them junk is just an honest read of the evidence we have.
The absence of clear selection pressure on certain sequences doesn’t prove they lack function; many biological roles are subtle, context-dependent, or involve redundancy, making them difficult to detect with current methods. Freely mutating sequences could still influence genome architecture, gene regulation, or adaptation in ways not yet understood, as seen with elements like noncoding RNAs and transposable elements previously dismissed as “junk.” Additionally, these sequences may serve functions over long evolutionary or environmental time horizons, becoming critical under future conditions we cannot yet predict, underscoring the importance of not prematurely dismissing them.
I'm not suggesting that these sequences can "look forward" in time. However, consider that mutations are constantly occurring. These mutations shouldn't be dismissed as "junk" simply because they seem unnecessary now. In the future, they could become essential.
Over long evolutionary or environmental timeframes, these sequences may take on important functions, potentially becoming critical under conditions we can't currently foresee.
If such a mutation occurs, that sequence would no longer be junk. Until and unless it does happen, it's still junk. But it's silly to get hung up on the sequence, or on the word "junk", based on such a slim chance. What are you trying to prove here?
The weird thing is that some of these lncRNAs don’t seem to be under super strong selection pressure, at least at the level of individual nucleotides. Their promoter regions are conserved, which indicates that the cell really does need to produce them, but it doesn’t seem to care much about the actual sequence. Very strange.
Anyways there definitely are non-coding regions that just don’t do much and evolve neutrally. I’m hesitant to call them junk but only because that designation has burned biologists so many times.
I think we’re basically on the same page. As you note, a conserved promoter without strong sequence conservation elsewhere suggests functions that might be more structural or regulatory. Still, it’s also true that some (actually many) non-coding regions show no evidence of selection and appear to evolve neutrally.
To borrow an example: an onion likely doesn’t need 5x more DNA than a human, and a lungfish probably doesn’t need 30 times more than we do (and 350x more than a pufferfish). And yet, these enormous genomes exist. It’s very likely that portions of these sequences are what we’d call “junk,” i.e., DNA that doesn’t confer a meaningful functional advantage and can accumulate due to the relatively low cost of carrying it along.
If we want to avoid the term “junk,” we could say something like “areas of the genome for which we assign a very low prior probability of functional importance.” But “junk” is a concise shorthand to acknowledge that, while some non-coding sequences matter, there are also huge swaths of DNA in many eukaryotes that show no signs of being anything other than evolutionary baggage.
Great overview. Worth adding some population genetics: Multicellular organisms typically have small effective population sizes and reproduce slowly in comparison to bacteria. Selection has a hard time “getting a grip” on variants with very weak effects on fitness. Drift becomes much more important.
Bacteria have large population sizes. Selection can be quick and brutal. The low level of “code of unknown function” in bacteria is perhaps related to replicative efficiency. Fast DNA replication is highly advantageous in nutrient-rich environments. No space (or time) for junk DNA.
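To put rough numbers on "getting a grip", here is a minimal sketch using Kimura's diffusion approximation for the fixation probability of a new mutation. It assumes an idealized diploid Wright-Fisher population in which census size equals effective size, so it only shows the scaling and is not a model of any real organism (bacteria are haploid, for one). The function name and the two population sizes are illustrative choices, not measured values.

```python
import math

def fixation_probability(s, n_e):
    """Kimura's approximation for a new mutation starting at frequency 1/(2*n_e).

    s   : selection coefficient (negative = deleterious)
    n_e : effective population size (assumed equal to census size here)
    """
    if s == 0:
        return 1.0 / (2 * n_e)  # neutral case: pure drift
    exponent = -4.0 * n_e * s
    if exponent > 700:
        # Strongly deleterious relative to drift; avoid float overflow.
        return 0.0
    # (1 - e^{-2s}) / (1 - e^{-4 n_e s}), written with expm1 for accuracy
    return math.expm1(-2.0 * s) / math.expm1(exponent)

def relative_to_neutral(s, n_e):
    """Fixation probability divided by the neutral expectation 1/(2*n_e)."""
    return fixation_probability(s, n_e) * 2 * n_e

s = -1e-5  # hypothetical tiny cost, e.g. carrying a bit of extra DNA
for n_e in (10_000, 100_000_000):  # illustrative "small" and "large" populations
    print(n_e, relative_to_neutral(s, n_e))
```

With the small population the slightly deleterious variant fixes almost as often as a neutral one, while with the large population it essentially never does. That is the sense in which drift dominates once |s| is small compared with 1/(2Ne).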
Sometimes with lncRNAs the structure is more important than the sequence. You can have two lncRNAs with different sequences but the same k-mer structure. This makes sense: while proteins often bind to specific sequences, the reasons for that are ultimately structural. In proteins you can also have conservative missense mutations that are tolerated, since binding affinity may not change when one amino acid residue is swapped for another with the same charge or polarity.
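A toy illustration of the k-mer point, reading "k-mer structure" as the profile of short-word counts in a transcript (that reading is an assumption on my part). The two sequences below are made up: they contain the same motif blocks in a different order, so they would align poorly end to end but have nearly the same k-mer content. Cosine similarity is just one convenient way to compare profiles, not necessarily what published lncRNA tools use.

```python
from collections import Counter
import math

def kmer_profile(seq, k=3):
    """Count every overlapping k-mer in the sequence."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def cosine_similarity(a, b):
    """Cosine similarity between two k-mer count profiles (0 = disjoint, 1 = same proportions)."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical toy transcripts: same motif blocks, opposite order.
block_1 = "AUGGCUAUGGCU"
block_2 = "CCGAACCGAA"
seq_a = block_1 + block_2
seq_b = block_2 + block_1

similarity = cosine_similarity(kmer_profile(seq_a), kmer_profile(seq_b))
print(seq_a == seq_b, similarity)
```

The linear sequences differ, but the profiles are close to identical because only the k-mers spanning the block junction change.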
> Yet we see big stretches of DNA mutating freely, at rates that exactly match what would be expected from accumulation of random copying errors.
But the rate of mutation is itself subject to selection. There isn't a base rate, just a setting that's different for different parts of the genome. Some parts have more copy errors than other parts. Some are hung out in the sun more often.
So you can conclude from the mutation rate of a particular stretch that it would probably be bad if it started mutating more, and that it would probably be bad if it started mutating less, but not that nothing's influencing the mutation rate.