The label “junk DNA” was one of the biggest mistakes in the history of genetics. A lot of high school textbooks still reference this term and it’s worse than misleading.
In many ways, non-coding DNA is just as important as the parts of the genome that code for proteins. Non-coding DNA determines expression levels, genome conformation (shape), and replication efficiency, among other things.
The term junk DNA misleads students into thinking that these sections of DNA play little part in how a cell functions. Quite the opposite: the “junk DNA” is responsible for orchestrating the “non-junk” bits.
The term junk DNA triggers a lot of confused discussion (on HN and everywhere else), and I suspect a part of that is our getting defensive about the idea of our DNA containing "junk". That term is just more loaded than saying something more benign like "non-functional".
But another part is that the term is poorly defined. This article seems to use junk DNA to mean the until-recently unsequenced portions of our genome (and I think that's an unconventional usage), some comments here take it to mean non-protein-coding, and another common use is for the term to mean non-functional.
If it helps, a defensible recent accounting is probably something like 1% of our genome being protein coding, perhaps 10% being functional in some way but not protein coding (e.g. regulatory, or transcribed into functional RNA), and the remaining roughly 90% being without known function and likely non-functional.
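To make those rough percentages concrete, here is a small sketch that converts them into base pairs. It assumes a haploid human genome of about 3.1 billion base pairs (an assumption, not a figure from the comment) and treats "no known function" as whatever is left after the two functional categories, which comes to 89% rather than a round 90%.

```python
# Rough genome accounting from the comment above, converted to base pairs.
# ASSUMPTION: haploid human genome size of ~3.1 billion base pairs.
GENOME_BP = 3_100_000_000


def accounting(coding_frac: float, other_functional_frac: float) -> dict[str, int]:
    """Split the genome into coding, other functional, and remaining bases."""
    if coding_frac + other_functional_frac > 1:
        raise ValueError("fractions exceed the whole genome")
    remaining_frac = 1.0 - coding_frac - other_functional_frac
    return {
        "protein_coding": round(GENOME_BP * coding_frac),
        "other_functional": round(GENOME_BP * other_functional_frac),
        "no_known_function": round(GENOME_BP * remaining_frac),
    }


for label, bp in accounting(0.01, 0.10).items():
    print(f"{label:>18}: ~{bp / 1e6:,.0f} Mb")
```

Even the "mostly junk" reading leaves hundreds of megabases of functional non-coding sequence, roughly ten times the protein-coding portion.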
After many more years of painstaking work, we'll perhaps learn that a bit more is functional, though it may end up being, say, 11% functional vs 89% non-functional. And that's ok! I wouldn't worry about progress being stunted by assuming too much of the genome is non-functional. I'd worry more about the opposite: continuing to believe there is function where there is little evidence to warrant it.
disclaimer: not a geneticist, but sometimes write tools they might use.
I would only disagree with (1). I think a lot of the confusion comes from the fact that coding DNA is much easier to understand. Coding DNA essentially has one function, to code for proteins. Only the local sequence matters when considering coding sequences.
Non-coding DNA is much more difficult to understand in terms of function. It can act to regulate gene expression both through the local sequence (promoters, enhancers) and through long-distance effects. For example, very distant promoters can be brought together within the nucleus and hence interact with each other. These interactions are a result of both the local sequence and the sequence of the very distant interacting region. The interactions are also dependent on the concentrations of proteins/transcription factors that also interact with these sequences. We have no good way of modelling that kind of complexity.
In defense of #1, I've read that retrotransposons can be irrelevant to the function of the cell. They persist despite random mutations by making more copies of themselves, so selection will favor retrotransposons that A) increase their number of copies and B) don't harm the fitness of the host. They don't need to actually benefit the host to survive selection pressure; if they do, that's gravy.
I suppose they could also survive selection pressure by hurting the organism if they're not expressed, for instance by capturing a vital protein-coding gene from the host and arranging things so the transposon must stay active for the gene to be transcribed.
Yes, you are right. The genome is a battlefield between human reproduction and the integration and reproduction of viral sequences in our “shared” genomes. We think of it as a human genome, but it is a hodgepodge of what we call “our” genome and half a million virally derived sequences that also want to replicate every once in a while.
Eukaryotic genomes are the worst noodle code you have ever seen, but miraculously they help to make humans and other creatures, bugs, and plants with some consistency. That is a miracle to even the most hardened atheist!
Science has a long history of giving things arbitrary or whimsical names, and regretting it. Consider "real" and "imaginary" numbers. Or names that have different technical and popular definitions, such as "introversion."
Imaginary isn't exactly wrong though. Imaginary co-ordinate results in quantum mechanics correspond to non-physical observations - they "exist" but are unobserved until something squares them into reality.
Same with AC power transfer: the imaginary (reactive) power doesn't drive the load, but it's quite real, because the extra current it requires still produces I^2*R heating, i.e. real power lost in the resistance of the wires.
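A minimal sketch of the AC point, using Python's built-in complex numbers as RMS phasors. The supply voltage, wire resistance, and load impedances are made-up illustrative values, and the voltage drop across the wires is ignored to keep it short. Both loads consume the same real power, but the one with a reactive part draws more current, so the I^2*R loss in the wires goes up.

```python
# Hypothetical single-phase circuit; RMS phasors are Python complex numbers.
V = 230 + 0j   # supply voltage (V RMS), assumed
R_LINE = 0.5   # resistance of the wires feeding the load (ohms), assumed


def power_and_line_loss(z_load: complex) -> tuple[float, float, float]:
    """Return (real power W, reactive power var, wire loss W) for a load.

    Simplification: the voltage drop across the wires is ignored, so the
    load sees the full supply voltage V.
    """
    current = V / z_load                    # load current phasor
    s = V * current.conjugate()             # complex power S = P + jQ
    wire_loss = abs(current) ** 2 * R_LINE  # I^2 * R heating in the wires
    return s.real, s.imag, wire_loss


# Two loads that consume the same real power:
resistive = power_and_line_loss(complex(20, 0))    # unity power factor
inductive = power_and_line_loss(complex(10, 10))   # adds reactive power

for name, (p, q, loss) in [("resistive", resistive), ("inductive", inductive)]:
    print(f"{name:>9}: P={p:7.1f} W  Q={q:7.1f} var  wire loss={loss:6.2f} W")
```

With these numbers the inductive load draws about 1.41 times the current of the resistive one, so its wire loss is twice as large, even though the reactive power itself delivers no energy to the load on average.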
> Imaginary co-ordinate results in quantum mechanics correspond to non-physical observations - they "exist" but are unobserved until something squares them into reality.
This sounds wrong for three reasons:
1. You can multiply the state vector representing a quantum state by any nonzero complex number and it's the same state (once renormalized).
2. Measurement is always with respect to a basis. A result can be measured with 0% probability in one basis, 50% in another, 100% in another.
3. It doesn't take into account interference, so it's missing the whole point of how quantum differs from regular probabilities.
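The three points can be illustrated with a single qubit in plain Python. This is a toy sketch, not a simulator: a state is just a pair of complex amplitudes, "X basis" measurement is implemented as a Hadamard followed by a standard (Z) measurement, and the scaling factor and phase are arbitrary choices.

```python
import cmath
import math

# A single-qubit state: complex amplitudes (a, b) for the basis states |0>, |1>.
State = tuple[complex, complex]
SQRT2 = math.sqrt(2)


def hadamard(state: State) -> State:
    """Apply the Hadamard gate, which maps |0> -> |+> and |1> -> |->."""
    a, b = state
    return (a + b) / SQRT2, (a - b) / SQRT2


def probabilities(state: State, basis: str = "Z") -> tuple[float, float]:
    """Born-rule outcome probabilities, measuring in the Z or X basis."""
    if basis == "X":
        state = hadamard(state)  # an X measurement = Hadamard, then Z
    a, b = state
    norm = abs(a) ** 2 + abs(b) ** 2  # tolerates unnormalized vectors
    return abs(a) ** 2 / norm, abs(b) ** 2 / norm


zero: State = (1 + 0j, 0j)

# Point 1: rescaling by a nonzero complex number leaves every probability alone.
factor = (3 - 4j) * cmath.exp(0.7j)
scaled: State = (zero[0] * factor, zero[1] * factor)
print(probabilities(scaled, "X"), probabilities(zero, "X"))

# Point 2: the same state gives different statistics in different bases
# (outcome 0 is certain in Z, a coin flip in X).
print(probabilities(zero, "Z"), probabilities(zero, "X"))

# Point 3: interference. One Hadamard gives 50/50 odds, but two Hadamards
# return |0> with certainty: the two paths to |1> carry amplitudes of
# opposite sign and cancel, which adding probabilities could never do.
print(probabilities(hadamard(zero)), probabilities(hadamard(hadamard(zero))))
```

Note that nothing here is "squared into reality" component by component: the outcome statistics depend on how complex amplitudes add before the Born rule is applied.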
No, that's way too generous a definition: one that posits a “selected” function for every nucleotide. Human effective population size is much too low to effectively clear the genome of junk.
Read Michael Lynch: The Origins of Genome Architecture.
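The effective-population-size argument can be made concrete with Kimura's fixation probability, a standard population-genetics result (this sketch is not taken from Lynch's book). It assumes a constant diploid population with N = Ne, a single new copy of the allele, and additive selection where each copy changes fitness by s. The human Ne of 10,000 is an assumed, commonly quoted ballpark. Mutations with |4·Ne·s| well below 1 fix at close to the neutral rate, so selection cannot reliably purge them.

```python
import math


def fixation_probability(s: float, ne: int) -> float:
    """Kimura's diffusion approximation for a new mutation's fixation chance.

    Assumes a diploid population of constant size N = Ne, a single new copy
    (initial frequency 1 / (2 Ne)), and additive (genic) selection in which
    each copy of the allele changes fitness by s.
    """
    if s == 0:
        return 1 / (2 * ne)  # neutral case: fixation probability = frequency
    # (1 - e^{-2s}) / (1 - e^{-4 Ne s}), written with expm1 for accuracy.
    return math.expm1(-2 * s) / math.expm1(-4 * ne * s)


NE_HUMAN = 10_000  # ASSUMPTION: a commonly quoted long-term human Ne
neutral = fixation_probability(0.0, NE_HUMAN)

# Slightly deleterious mutations (s < 0), e.g. a mildly costly insertion.
for s in (-1e-6, -1e-5, -1e-4, -1e-3):
    ratio = fixation_probability(s, NE_HUMAN) / neutral
    print(f"s = {s:.0e}: 4*Ne*s = {4 * NE_HUMAN * s:+.2f}, "
          f"fixation relative to neutral = {ratio:.3g}")
```

In this model an insertion costing one part in a million still fixes at about 98% of the neutral rate; only costs around 1e-4 or larger are efficiently removed. A species with a much larger Ne would purge far smaller costs.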
Yes, and I do too, and you are wrong: it is not misleading at all, despite what ENCODE has claimed.
I suspect you have not had a recent course in population genetics.
Read Michael Lynch’s “The Origins of Genome Architecture” before you make dubious pronouncements. The human genome is a jungle of code and under rather weak selection due to small effective population size. There is a tremendous amount of cruft in all human genomes due to mobile element insertions and spread.
I can't follow your comment. Junk as a noun refers to rubbish/garbage and therefore means something worthless. The parent comment says that it isn't worthless. My questions are:
Are you saying this is wrong and so-called junk DNA is indeed worthless?
Why do you say to disregard ENCODE's claims, and are the functions mentioned in the parent comment even related to them?
Can you give a synopsis of what the book you're recommending covers?
Also note that the book predates the ENCODE project results, and being 15 years old, it doesn't consider any newer developments either.
Question from the peanut gallery: if you were to flip a single bit in this junk DNA, are the outcomes only slightly different or could they be wildly variable depending on which bit was flipped?
Wildly variable. That’s true for all DNA including protein coding DNA. It’s much more likely that changing a base in a protein coding region will result in an effect, but still nowhere near guaranteed.
Statistically, the average human has ~1 base variant (SNP) in a protein coding region somewhere in their genome. In almost all people the effect of that SNP is no more apparent than the effect of SNPs in “junk DNA”.
what do you mean by "~1 SNP"? one de novo amino acid substitution? or inherited? we have millions of variants.
the redundancy of codons - several codons code for most amino acids, and one-base-pair changes are often synonymous or will code for similarly-charged residues - provides some protection.
on the other hand, if you have a mutation in, say, a non-coding intronic splice site, that can really fuck a protein up. I got unlucky there.
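The codon-redundancy point can be checked directly against the standard genetic code (NCBI translation table 1). The sketch below classifies every possible single-base change from each of the 61 sense codons. It treats all changes as equally likely, which real mutation spectra do not (transitions are more common than transversions, and codon usage is uneven), and it does not try to judge whether a missense change is chemically conservative. So the result, roughly a quarter synonymous, describes the code itself, not any particular genome.

```python
from itertools import product

BASES = "TCAG"
# NCBI translation table 1 (the standard genetic code), codons in TCAG order
# (TTT, TTC, TTA, TTG, TCT, ...); "*" marks a stop codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def classify_point_mutations() -> dict[str, int]:
    """Tally every single-base change from every sense codon."""
    counts = {"synonymous": 0, "missense": 0, "nonsense": 0}
    for codon, aa in CODE.items():
        if aa == "*":
            continue  # start only from codons that encode an amino acid
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue
                mutant = codon[:pos] + base + codon[pos + 1:]
                new_aa = CODE[mutant]
                if new_aa == aa:
                    counts["synonymous"] += 1
                elif new_aa == "*":
                    counts["nonsense"] += 1  # premature stop codon
                else:
                    counts["missense"] += 1  # different amino acid
    return counts


counts = classify_point_mutations()
total = sum(counts.values())
print(counts, f"{counts['synonymous'] / total:.1%} synonymous")
```

Most synonymous changes sit at the third codon position; the only first-position synonyms involve leucine and arginine codons.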
So much charming and well intentioned misinformation on this thread; as much as if I were to weigh in on the pros and cons of MariaBD versus PostgreSL.
Coding sequence is under significantly higher levels of active “purifying” selection than regulatory DNA, and much higher than de novo mutations in repetitive mobile elements that only rarely perturb phenotypes.
Single point mutations in regulatory elements such as enhancers have been shown to change the expression levels of multiple genes. These elements are under just as much selective pressure as any coding region.