DNA seen through the eyes of a coder (berthub.eu)
301 points by dunefox on Dec 22, 2021 | 112 comments



I find the coding analogy for DNA or cell biology in general to be rather problematic. It's fine as a very rough and oversimplified analogy, but if you try to apply it to the details I think it hurts more than it helps.

An important aspect of biology is that it is really much more analog than digital in many areas. Many interactions are not entirely on/off, and they are influenced by a lot of other factors in the environment of the cell. This is a part that really doesn't match well to how people understand coding. The digital aspects are also inherently coupled to their physical representations. Those are not just four letters A, C, T, G in the DNA; each of them also has different chemical and physical properties.

Some analogies to coding concepts work quite well, but you really have to be aware of the limitations. And I often see computer scientists trying to use these analogies to understand certain parts of biology, but inevitably they will move out of the bounds of where the analogy works unless they actually know the underlying biology.


As a professional molecular biologist who now does programming for a living, I wonder if the metaphor could be turned on its head:

Imagine that every one of the people who designed computers, from the chip makers, to the ones making the instruction set, to the firmware coders etc. had no concept of "abstraction", adherence to standards or even a logical, coherent design. Would computers be more like cells then?

I could imagine a CPU which does not have a clock to keep everything synchronized - where the die is not separated into logical areas of decoders, ALUs, etc. but is just a jumbled mess of wires and logic gates that somehow works to do computation. There is no "assembly language" for this CPU, but give it the right binary inputs at the right time and it will probably do what you ask it to (depending on its complicated internal state).


>a jumbled mess of wires and logic gates that somehow works to do computation

You could create this situation explicitly using FPGAs.

>no concept of "abstraction", adherence to standards or even a logical, coherent design

There's no concept of abstraction, but there is remarkable consistency in the genetic code for all that. The fact that these tiny machines are understandable at all is remarkable. Our biosphere is effectively a single process with an uptime of ~4 billion years, including the ability to recover from meteor strikes that killed off 99% of species, multiple times (albeit with different species!)


None of the mass extinctions killed 99%. The worst one, the Permian-Triassic extinction event, killed 83% of genera.

https://en.wikipedia.org/wiki/Permian-Triassic_extinction_ev...


Wait until 2100. We’ll have 99% of species gone, and we don’t even have to change what we’re doing.


Can you provide a source for that assertion? It sounds unlikely to be true, but I'd like to know if it's true.


Source is my depression, sense of powerlessness about the future, and listening to too much Bernie talking about what the 1% is going to do to the 99%.

Logically, I know this can’t be worse than the K-T boundary, where a whole continent of forests simultaneously caught fire, and a hypercane formed over the Yucatán. On the first day. But it sure feels that bad. We live in a world where people flip out because one wolf was reintroduced to a broken ecosystem, yet no one gives a shit about the million people a year who die from fossil fuel air pollution. That one’s real: https://arstechnica.com/science/2021/12/fossil-fuel-combusti...


Humans today enjoy higher standards of living, increased access to food, water, medication, and shelter than any other time period in human history.

There can be high levels of inequality, yet this can still be true.


Indeed, asynchronous digital systems, such as asynchronous CPUs [0] or "wavefront arrays" [1], have been experimented with in the past. Looks like they never picked up steam, though.

[0] https://en.wikipedia.org/wiki/Asynchronous_circuit#Asynchron...

[1] https://en.wikipedia.org/wiki/Systolic_array


Asynchronous circuits are very hard to design and prove correct unless they are very small, though there are abstractions that work for specific cases. Debugging race conditions and metastability problems, or proving that they can't happen, is no fun. But there's massive support for synchronous logic design, it is a well-understood discipline.


Genetic algorithms tend to design circuits that have some elements in common with what you describe.


It's my annual opportunity to trot out one of my favorite articles ever:

https://www.damninteresting.com/on-the-origin-of-circuits/


Thank you for reposting this. It is still, after so many years, one of the most interesting results from computing that I am aware of. The way that something that looks entirely alien to my electronics-designer eye actually works is pure magic.


> Imagine that every one of people who designed computers, from the chip makers, to the ones making the instruction set, to the firmware coders etc. had no concept of "abstraction", adherence to standards or even a logical, coherent design. Would computers be more like cells then?

Honestly that sounds like basically what we got. The "design coherence" on the scale of the entire industry has been pretty low for decades.


Not sure what you mean here. What would high "design coherence" on the scale of the entire industry look like?

I am sure things could be more optimal somehow, but the industry generally operates on a principle of interfaces - and it's true, the higher level you go (all the way up to the people using the technology) the messier it gets, but it certainly doesn't feel like an organism with no concept of "abstraction" as the parent describes.


> Not sure what you mean here. What would high "design coherence" on the scale of the entire industry look like?

Like everyone uses NixOS/Nixpkgs and also it's more ergonomic.

> But the industry generally operates on a principle of interfaces

Aha! Actually I would say it does not, and therein lies the problem. Most interfaces only have a few implementations, and as much as I love FOSS it has made that problem worse (in the context of the economy we've got). The interfaces we do have tend to become monotonically more complex over time as there is no systematic culling, and the result is we increasingly drown in accidental complexity.

Basically, accumulated implementations are viewed as big scary problems where maintenance to simplify them is a cost no one wants to bear.

A lot of this is the same critique as http://langsec.org/, generalized to areas other than security.


The big difference between computers and living organisms is that computers are designed by humans and meant to be understood by humans. Those that design computers must understand them, and so do those that write code for them, those that write compilers, those that write operating systems, and so on. None of this applies to a living being or any of its parts.

I think your thought experiment makes the mistake that such an abstraction-less computer would still be designed and built by humans. That won't work, because no human can understand the whole of it all at once, hence the need for abstractions. A computer "designed" by an incredibly complex computer program could push that barrier but probably not break it. If you really want an analogy to actual life, such a computer would have to be the outcome of some evolution-like process. And no, not what people like to call "evolutionary process" or something like that today, but actual evolution that mirrors biology -- and quickly becomes too complex to understand by humans.


This horrifies me... This is why I'm not a biologist.


There is abstraction though. It's just fuzzy.

3 DNA base pairs abstract to an amino acid, which has chemical properties. It's just that there's some redundancy in which 3-base-pair codons get converted to which amino acid. And if there's a single mistake somewhere with just one amino acid, the structure of the thousand-amino-acid-long protein probably has some fault tolerance there.

Those amino acids are arranged in a way in the DNA code to make bigger motifs, alpha helices, beta sheets, active sites, coordinating complexes with inorganic ions.

It's all in the DNA code, though I suspect this train of thought is complicated by epigenetics. Parts of the DNA code are also responsible for regulating how much of the protein itself is expressed. That can still abstract to conditional and loop structures.

Yes, some of the base pairs bind together more strongly in the double helix, which can influence the stability of the DNA strand. The GC pairing has 3 hydrogen bonds where AT has 2, but I think this mostly influences the distribution of junk DNA, as you still need codons to make proteins, which is the central dogma of biology.
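
A minimal sketch of that redundancy in Python (just a hypothetical handful of entries; the real table maps 64 codons onto ~20 amino acids plus stop signals):

    # Tiny slice of the standard genetic code: several codons "spell" the same
    # amino acid, so many single-base changes are synonymous and leave the protein unchanged.
    CODON_TO_AA = {
        "GCU": "Ala", "GCC": "Ala", "GCA": "Ala", "GCG": "Ala",  # fourfold degenerate
        "GAA": "Glu", "GAG": "Glu",                              # twofold degenerate
        "UGG": "Trp",                                            # no redundancy at all
    }

    for codon in ("GCU", "GCC", "GCA", "GCG"):
        print(codon, "->", CODON_TO_AA[codon])   # all four print Ala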


There are lots of problems with this approach, even though of course it is also true on some rather important level.

The first level of problems is that what you're describing applies to only a tiny percentage of the DNA in a cell, and even for that DNA, it may overlap/exist in parallel with completely different "codings" that are more to do with structure, packing and replication machinery than protein coding.


> This is why I'm not a biologist.

I don't mean to distress you, but you are a biological computer.

Just rejoice that you needn't read the release notes.


Except we need the release notes! And there aren't any.


Not that most apps even give good release notes. If we did have them you could expect:

“Thanks for being human. This update changes your DNA and may contain bug fixes and performance improvements. Keep on humaning!”


Genesis 1 is about as detailed as many of the release notes I see. And about as accurate.


The good news is that humans are self-documenting.


The documentation for the human body is much better than anything Google ever published, that's for sure


The definition of "computer" in "you are a biological computer" has been the subject of intense philosophical and scientific discussion for more than 70 years, and at this point I would say the best summary of the state of that discussion is "well, maybe, but also maybe not".


If we get to the point where design is done by algorithm and no one needs to understand why it works in detail, then I would fully expect that to happen. We're not there and I think that has been good. We have "evolved" computer hardware much faster than I would expect otherwise and have seen several very evolutionarily-challenging discontinuous jumps in design.


// had no concept of "abstraction", adherence to standards or even a logical, coherent design.

I see your point. But I would like to make a counterargument.

If you consider the time scale with which biological systems have to work, you probably won't call their design illogical or incoherent. Computer programs have been around for not even a century. A billion years is a whole different ballgame. Can anyone even imagine writing a computer program with any guarantee that it would be readable by someone 1000 years from now? We have a tough time figuring out what our own code does a few months after writing it.

And wrt fault tolerance, resilience, I don't know, the design language of nature definitely seems far more evolved..


There are a lot of classical examples of the incoherence of biological systems -- a very common one is some nerves on the neck of giraffes[0]. On this analogy, the most important part is that you're never, ever, in the millions of generations of DNA, allowed to do a rewrite. Only small, incremental changes, each one of them giving the organism more fitness than before. No abstraction can survive this constraint -- a legacy codebase several decades old that's never seen a rewrite is already an unintelligible mess. I think the code analogy holds here.

[0] https://bioone.org/journals/acta-palaeontologica-polonica/vo...


I've seen a few codebases that are multiple decades old without rewrites. They are understandable and intelligible. Linux is 30 years old as an open source example.


Your retinas are mounted back to front, and as a result will fall off if they're knocked too hard.

Nature works on a "just good enough" principle, in general.


I think you go too far. I think it's easy for you to see where the similarities between human computing and biology fail because you're so steeped in the world of coding. But imagine if you had no concept of information theory - imagine if the best mental model you had was that life was a very complicated physical process. The fact that we can literally email viruses would be very surprising! T, A, G, and C are different molecules, but they're representations. And the mechanism of viruses would also be a total mystery - how can such a simple physical object make copies of itself?

Life is, at its core, an informational phenomenon. It's not a coincidence, or a "rough and oversimplified analogy", that we have computer viruses to mirror biological viruses - they are both consequences of the same underlying principles.


Plenty of "most of the time" on the way from DNA to proteins, and chances are that "code" developed through eons of diceroll trial+error will end up relying on ambiguities more than being harmed by them.

But doesn't the software mindset, again, have better mental models for things like that than most other disciplines? Think "relying on undefined behavior", or that old question of bug or feature. Sure, there are many pitfalls and possibilities for wrong conclusions to be drawn, but I'd dare to say that the "coding mind" is better equipped to become aware of them than others and not worse.

edit, an example: we can imagine a piece of code that works both when compiled as java and when interpreted as groovy, but has a different outcome depending on which language it is run as. We might even have written some for entertainment (reminds me of that ambiguous-on-Apple PNG that made the rounds a few days ago). And we could easily imagine a system that would collapse unless it occasionally gets both the "java" outcome and the "groovy" outcome. We certainly wouldn't deliberately build a system like that, but we could easily do so if there was, say, an obfuscated gradle contest. And "deliberate" is out of scope anyways. Enter eons of dicerolls, enter a system that mostly controls itself via balancing (as we know full well from high level pharmaceutical science). Yeah, it's unpredictably complex, and the coding mind should be the last to jump from "GATC is essentially digital" to underestimating that complexity and unpredictability.

edit2: I think the core of what I wanted to say (but never even tried) is that someone not used to code will likely try to wrap their head around the "ifs" and "comments" and be so occupied by that task that the result will be a mental model of predictability that won't map well to observed reality, whereas the coder, with years of "the daily wtf", won't for a moment see any contradiction between observed reality and "lots of kind of code-like things"


There are plenty of critical pieces of software that rely on the fact that they produce different behavior when interpreted as a different language.


I don't think "coding" is an analogy. It's pretty clear that DNA is a kind of code. It's also fairly obviously Turing complete. It's rather more than Turing complete in fact. It encodes for life, after all!

I think it does make sense to deal with DNA from an information science perspective. It might not be sufficient, there might be higher levels of emergence to be taken into account.

But I think that without having a bit of an understanding of information science, it's really really hard to understand anything about how DNA works at all.

That said, the article actually seems to be pretty good starting point for programmers who want to get an initial intro to molecular genetics, and the author doesn't take themself TOO seriously.


I used to be a network engineer but since becoming disabled with a familial genetic disorder 20 years ago (Autism, Bipolar Disorder, Hyperlipidemia) I decided to get my genetics and study its implication in my disorder.

I think the coding analogy is more quantum computing than anything we currently know, which is why the binary analogy seems to fail. It is computing, but on a whole 'nother level. Qubits instead of binary, or maybe something we are yet to understand.

The chief complaint I have about that article is that it does not even get into the fact that we actually have two genomes that are symbiotic: our Nuclear Genome and our Mitochondrial Genome. Glad he did touch on the epigenetics, however.

I do know that understanding your genetics at the level that I have can help you manage any unknown chronic condition. It turns out my mitochondrial genetics were much more important for me (I am pretty sure I have a Complex I deficiency, though really a trait) and that a sort of mismatch with my Nuclear Genome (mostly Calcium, Potassium, and Sodium signalling) makes me very sensitive to my environment through oxidative stress signalling.

If you can see this complex dance you will appreciate just how wonderful and complex a machine we are, one that we will never fully understand. In the end my cure is more art than science. More feeling than data.

Some examples that might bother you:

- Long Chain Omega 3 PUFAs are crucial to my health and I cannot eat short chain PUFAs for any extended period. This is due to my FADS1 and FADS2 genetics.

- I am Electro-Hypersensitive. Exposure to household low frequency EMFs from wiring increases oxidative stress in my body and causes me pain, insomnia, and mood disturbance. This is mainly from how EMFs trigger the Voltage Gated Ion Channels.

- Exposure to chemicals, which includes many medicines, triggers an immune reaction (drug induced Lupus) from the oxidative stress which is created through their metabolism by CYP450 enzymes.

I am able to remediate these issues through diet and environment changes and mostly through supplements: Ascorbic Acid, Manganese, Zinc/Copper and P5P.


> household low frequency EMFs from wiring increases oxidative stress in my body and causes me pain, insomnia, and mood disturbance. This is mainly from how EMFs trigger the Voltage Gated Ion Channels

Wow, I've never heard that before - do you have any good papers you can link to so I can read some more?


I am interested as well. They appear to be spouting off pseudoscience that sounds much better than the average person's, but I am open to seeing evidence otherwise.


Martin Pall is at the forefront of this but, you know, this is scary to a lot of people for some reason.

The most studied ion channels are the Voltage Gated Calcium Ion Channels

https://onlinelibrary.wiley.com/doi/full/10.1111/jcmm.12088

https://www.spandidos-publications.com/ijo/59/5/92

https://www.sciencedirect.com/science/article/abs/pii/S01434...

https://emmind.net/openpapers_repos/Applied_Fields-Hazads/Va...


I have linked some papers in a response to another comment, but here is a youtube video:

https://www.youtube.com/watch?v=0RIskTMLV40

I feel the research is lacking on this because there is a large genetic component. I do not feel LF-EMFs are a danger to everyone, but certain people, like me, with gene changes in CACNA1C, might be more affected by LF-EMFs. It is no coincidence that CACNA1C is also linked to pretty much all the mood disorders.

And I am in no way saying that LF-EMFs are the cause of all mood disorders.


What practical steps did you take to go about investigating your own genome like this?


It sure sounds like they spent a lot of time here: https://www.naturalnews.com/


No, actually not. I hate that website with all my being, and I hate all the naturopaths that spout nonsense about nutrigenomics.

I sat in on classes at UNC Chapel Hill and I only use peer-reviewed studies. If you want to challenge my knowledge please do so. I have spent 20 years doing this.

I have a lifelong disability and it runs in my family. No one knows the cause but I am pretty sure it is mitochondrial and is largely affecting oxidative stress.

https://pubmed.ncbi.nlm.nih.gov/20833242/


It is hard and it takes a long time. And I would not recommend it if you are not suffering from any chronic illnesses. Most people's disorders are due to diet/environment.

A good place to start is uploading any data you have into https://promethease.com/. I would ignore most of the "good" and "bad" results; I only used it to find rare SNPs that might be linked to my disorder.

The best thing you can do is get your whole genome run. You can do it now for about $200.


So how much can I find out with my 700k rows of SNPs? Any resources you recommend?


If you have your genome from 23andme; https://promethease.com/

That is just a start, it is useful but incomplete. I usually end up searching for SNPs in my text file based on research I read. I made a bash program which I can use to run a bunch of SNPs for a specific gene.
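
A rough Python equivalent of that kind of lookup, assuming a 23andMe-style raw file (tab-separated "rsid chromosome position genotype" lines, comments starting with '#'); the rsIDs here are placeholders, not ones from any particular study:

    # Pull the genotypes for a hand-picked set of SNPs out of a raw genotype file.
    WANTED = {"rs123", "rs456"}   # hypothetical rsIDs taken from a paper

    def lookup(path: str, wanted: set[str]) -> dict[str, str]:
        hits = {}
        with open(path) as fh:
            for line in fh:
                if line.startswith("#"):
                    continue
                rsid, chrom, pos, genotype = line.rstrip("\n").split("\t")[:4]
                if rsid in wanted:
                    hits[rsid] = genotype
        return hits

    print(lookup("genome_raw.txt", WANTED))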

Oh, also:

https://www.snp-nexus.org/v4/

https://www.uniprot.org/

https://www.ncbi.nlm.nih.gov/snp/

https://smpdb.ca/view/SMP0000017

http://slc.bioparadigms.org/


> Many interactions are not entirely on/off, and they are influenced by a lot of other factors in the environment of the cell. This is a part that really doesn't match well to how people understand coding.

Eh, I think you are correct re: being a rough and oversimplified analogy, but those same assumptions of how digital tech works break down when doing digital tech at large scale as well (FAANG).

A well-known example + cases I've seen repeat over the past few years include: IO latency variations because of noise (someone yelling at a disk array), bit flips on the network just right to evade TCP error checking, ICMP packets bit flipped just right so they crash network control plane software, ASICs with die defects such that they brick if you try to load a specific set of subnets into TCAM n times, CPUs that wear out such that specific instructions return erroneous results after n years of use, and so on.

Basically, with warehouse-scale computing you encounter the consequences of very unlikely events every day, and many unlikely events happen at the interface of digital to analog (which is defined by... different chemical and physical properties).

At some point you accept that the details will always be fuzzy, but if the model works well enough most of the time, it's good enough for most practitioners. You just need to socialize that there are exceptions and make sure people know where to turn when something doesn't behave as expected.
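
To make the "bit flips just right to evade error checking" point concrete, here is a small Python sketch using the RFC 1071 Internet checksum (the algorithm TCP and UDP headers use): two compensating single-bit flips in different 16-bit words leave the checksum unchanged.

    def internet_checksum(data: bytes) -> int:
        """RFC 1071 ones'-complement sum over 16-bit words."""
        if len(data) % 2:
            data = data + b"\x00"
        total = 0
        for i in range(0, len(data), 2):
            total += (data[i] << 8) | data[i + 1]
            total = (total & 0xFFFF) + (total >> 16)   # end-around carry
        return ~total & 0xFFFF

    payload = bytearray(b"ACGTACGTACGT")
    before = internet_checksum(bytes(payload))

    payload[1] ^= 0x01   # 'C' -> 'B': one 16-bit word decreases by 1
    payload[3] ^= 0x01   # 'T' -> 'U': another word increases by 1
    after = internet_checksum(bytes(payload))

    assert before == after   # corrupted payload, identical checksum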


Totally agree. If I had a dime for each time I saw people make bad oversimplifications based on "DNA as blueprint" or "DNA as code" I'd not be rich, but I would be able to buy myself a nice dinner, at the very least.


> The digital aspects are also inherently coupled to their physical representations. Those are not just four letters A, C, T, G in the DNA; each of them also has different chemical and physical properties.

Weak sectors on optical media and the classic https://en.wikipedia.org/wiki/Lace_card are similar analogies in computing.


Yes DNA is more like a lego assembly language.


We may see the code but we have no interpreter.

Also it seems part of the bootstrapping lies in the female womb hosting the foetus.


The biggest issue I have with these analogies is that cells and organisms proliferate. It would be like a computer copying both its hardware and software to create new computers. This behavior is central to understanding biology and absent in computer science.


>I find the coding analogy for DNA or cell biology in general to be rather problematic. It's fine as a very rough and oversimplified analogy, but if you try to apply it to the details I think it hurts more than it helps. [...] , but inevitably they will move out of the bounds of where the analogy works unless they actually know the underlying biology.

I think the author's "digital" explanation for DNA is more helpful than problematic because we can just understand where that knowledge can be appropriately applied. Focusing on the higher abstraction of DNA as "information" or "data" is important in bioinformatics: https://en.wikipedia.org/wiki/Bioinformatics

And analyzing DNA as "information" and not just analog chemistry is also important in CRISPR gene editing. Yes, the engineers creating the machinery used in CRISPR editing need to know more about the "analog" aspects of biology, but other scientists can focus on the "digital data" perspective.

I think a similar digital-vs-analog split happens in computers. Consider the following different physical (analog) realizations of computation devices:

- wood gates (https://www.youtube.com/watch?v=GcDshWmhF4A)

- vacuum tubes

- silicon wafers of transistors

- optical gates using light

Can we extract a useful "information data" perspective that transcends all of those vastly different devices? Yes, we call it "computer science". We're discovering that many "computer software" ideas also apply to DNA. Scientists can look at the human genome as 750 megabytes of code and try to "reverse engineer" it. It's like forensic hackers using IDAPro/Ghidra to reverse engineer a malware binary to understand how it works.
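
The 750 MB figure follows from the usual back-of-the-envelope arithmetic (roughly 3 billion base pairs, 2 bits per base):

    base_pairs = 3.1e9           # approximate length of the human genome
    bits_per_base = 2            # A, C, G, T fit in 2 bits
    size_mb = base_pairs * bits_per_base / 8 / 1e6
    print(f"~{size_mb:.0f} MB")  # ~775 MB, the same ballpark as "750 megabytes of code"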

But that doesn't mean the "analog" can be ignored. Compsci telling us about an algorithm being O(1) vs O(log n) doesn't reveal that vacuum tubes give off more heat and fail more often than silicon transistors. Or the thinness of a wire trace on a circuit board means that deploying a computer in space orbit makes it susceptible to cosmic rays flipping a random bit and causing weird errors. All those analog difficulties in the real world don't mean that looking at computers through the lens of Computer Science is problematic. It's just a different perspective that lets us make progress in that specific area. Every knowledge specialization has inherent limitations.

(I notice that my comment supplements dTal's comment: https://news.ycombinator.com/item?id=29649339)


> I think a similar digital-vs-analog split happens in computers.

Does it? In computers, digital computation is the ideal that we try to mimic via analog circuits (or vacuum tubes, relays, or whatever). We are actually glad that we can chain two CMOS inverters and the result is that anything close to Vcc becomes Vcc, and anything close to GND becomes GND.

As far as I understand, in biology "improving signal quality" like that actually disturbs the system because you are removing an "error" that actually has a useful function somewhere else.

Or for a software analogy: Consider a program that breaks memory safety, but you can't fix it because some other part relies on that "bug". Now consider a program that doesn't just contain lots of weird quirks like that, but has grown around them and makes them an integral part.

At least that is how I imagine chemistry inside a cell as a layman.


So as you might intuit, having everything work totally differently doesn't scale so well. So a lot of the chemistry inside cells is strongly standardized. Ok, so the nice thing about standards is that there's a bunch to choose from, and sure there's a lot of times when cells cheat just a smidge (or two, or three).

But... end of day, if you squint just a little, most everything is done with a standard set of amino acids that chain together in standard ways. In some AAs, the amino and acid groups used to daisy-chain them together dwarf the actual active part! And data is stored using a standard set of 4 (ok 5, ok... exceptions exist) bases.

(Just to be sure, eg. Hemoglobin does suddenly have an Iron atom in there, but it's still _mostly_ made of standard "lego blocks" == amino acids)


I think you're right that it doesn't fit perfectly, but damn, still: it's a self-reproducing structure interpreted by complex hardware to transform inputs into outputs.

If we can't use programming metaphors to understand how to patch it, at least maybe we can use them to explain what we fundamentally are. That's interesting, no?

We are software running on hardware (the environment, the interpretation precursors in the womb on startup, the rest of the machinery ALSO created by the software), are we not? I feel it's cool we can encode some useful messages as RNA for vaccines, with headers, commands, checksums, even if of course it's not exactly like electronics.


>DNA is not like C source but more like byte-compiled code for a virtual machine called ‘the nucleus’.

I think a comparison to source code would also have merit.

The DNA gets transcribed into mRNA in the nucleus; these are transported into the cytoplasm, where the Ribosomes then translate that information into polypeptide chains, which are then folded by Chaperones and the machinery of the Endoplasmic Reticulum. The end results are proteins, the biochemical "actors" and building blocks. These steps are comparable to compiling source code into opcode.
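
A minimal sketch of the first step of that pipeline, using the textbook base-pairing rules (this ignores strand directionality and everything else a real polymerase deals with):

    # Transcription: DNA template strand -> mRNA, the "byte-compiling" step.
    COMPLEMENT = {"A": "U", "T": "A", "G": "C", "C": "G"}

    def transcribe(template_dna: str) -> str:
        return "".join(COMPLEMENT[base] for base in template_dna)

    print(transcribe("TACCGGATT"))  # AUGGCCUAA: start codon, Ala, stop codon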

Also, the DNA itself is just half the story.

Which parts of the DNA can be transcribed, and how often, is encoded in the histone modifications (the histones are "packaging molecules" in the nucleus around which the DNA is partially "wound up") and the methylation modifications to it. This is known as the epigenetic code, and it's just as vital for the biochemical machinery to function as the DNA itself;

e.g. it would be very, very bad if, say, muscle cells suddenly started behaving like liver cells. Since both have the same DNA (in one organism), it's the epigenetic code that controls which parts of the DNA they "express", and thus what functional identity the cells have.
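
One way to picture that "same DNA, different expression" idea in code terms (the gene names are real, but the masks are of course a cartoon of what histone and methylation marks actually do):

    # Every cell type carries the whole genome; the epigenetic marks decide
    # which genes it is allowed to read.
    GENOME = {"ALB": "...", "MYH7": "...", "INS": "..."}   # gene -> coding sequence (elided)

    EXPRESSION_MASK = {
        "liver":  {"ALB"},    # albumin
        "muscle": {"MYH7"},   # a myosin heavy chain
    }

    def expressed(cell_type: str) -> dict[str, str]:
        return {g: seq for g, seq in GENOME.items() if g in EXPRESSION_MASK[cell_type]}

    print(expressed("liver"))   # only the liver genes, from the very same GENOME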


> It would be very very bad if, say, muscle cells suddenly started behaving like liver cells.

There are people that would like to sacrifice a few muscle cells for some liver cells (or any other organ). Recreating such specialized cells would be a big step towards organ regrowth, would it not?


This is the idea behind stem cell therapy. You can reset specialized cells back to their "possibilities are endless" stem cell state and then attempt to coerce them to form the type of cells you want. Search for "induced pluripotent stem cells" if this sort of thing interests you.


This is actually an area of research, "releasing" the locks of the epigenetic code and getting differentiated somatic cells to behave like pluripotent stem cells again, in order to use them to recreate damaged tissue.

However, while that would be fine in a test tube, it would be very bad if it happened inside the body's existing differentiated tissue.


Machine code also gets decoded and is "reprocessed" by microcode in a modern CPU which then decides how to translate the code into what actually runs in the CPU.


The word 'Ribosome' does not occur anywhere in the article, which I definitely would expect to be there if you're going to make a coding analogy.

https://en.wikipedia.org/wiki/Ribosome

For a computing analogy: a Ribosome serves as a molecular assembler (a processor) that depending on the instructions (the program) encoded in DNA (or rather RNA by the time the ribosome is working with it) pulls this or that molecule from the soup present in a cell to attach it to its working copy of a polypeptide chain (the output). Once complete (as signalled by a 'stop' codon, think of it as 'ETX') the thing detaches the product.

The Ribosome consists of a main engine and a 'cap' that can attach over the top of an RNA string so that it can start translation at other points ('STX') than just the ends.
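
A toy version of that "processor" loop in Python (a real ribosome works via tRNAs and a 64-entry codon table, of course; this just shows the STX/ETX flavour of it):

    STOP = {"UAA", "UAG", "UGA"}    # the 'ETX' codons
    START = "AUG"                   # methionine doubles as 'STX'

    # Tiny slice of the real codon table, just enough for the demo.
    CODON_TO_AA = {"AUG": "Met", "GGC": "Gly", "UUC": "Phe", "AAA": "Lys"}

    def ribosome(mrna: str) -> list[str]:
        """Scan for a start codon, then emit amino acids until a stop codon."""
        start = mrna.find(START)
        if start == -1:
            return []
        chain = []
        for i in range(start, len(mrna) - 2, 3):
            codon = mrna[i:i + 3]
            if codon in STOP:
                break
            chain.append(CODON_TO_AA.get(codon, "?"))
        return chain

    print(ribosome("GGAUGGGCUUCAAAUAGCC"))  # ['Met', 'Gly', 'Phe', 'Lys']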


https://en.wikipedia.org/wiki/Ribosomal_frameshift if you really want to bend your brain a little.


Yes, that reminds me of some of the assembly tricks we used to pull when space was really tight. Use this entry point and you have one set of instructions, use another and you get a completely different set. I've never managed to do this with more than a few instructions. But that gives quite a kick to get it to work and to know that there isn't a disassembler on the planet that will make sense of it :)
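
The same trick is easy to sketch: shift the entry point by one base and the entire downstream "instruction stream" changes.

    seq = "AUGGCAGAAUGG"

    def codons(s: str, offset: int) -> list[str]:
        """Chop a sequence into triplets starting at a given reading frame."""
        return [s[i:i + 3] for i in range(offset, len(s) - 2, 3)]

    print(codons(seq, 0))  # ['AUG', 'GCA', 'GAA', 'UGG']
    print(codons(seq, 1))  # ['UGG', 'CAG', 'AAU'] -- same bases, entirely different codons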


There is an analogy with data compression: if you remove a byte in, for example, a zip file and then try to expand it, it might work and give you a different result (though it likely won't, due to a checksum error). I think something similar was used in the SHA-1 collision attack (shattered.io) to generate different jpeg images that have the same checksum.


There's a very interesting hypothesis about the origin of life that puts a primitive ribosomal assembly of amino and nucleic acids at the center. The idea is that these amino chains and nucleic acid chains (protein and RNA) kept self-assembling until a version was created capable of self-replication. This posits abiological generation of nucleic and amino acids and linked chains as well.

Once these primitive ribosomes were capable of reproducing themselves (presumably with a very high initial mutation rate), biological evolution began. Such ribosomes would have two jobs: reproduction of nucleic acid chains, and translation of some fraction of those nucleic acid chains into protein chains. Not all the amino acids would have been used at this early stage.

This is perhaps a more manageable analogy to coding than looking at something as complex as the human genome, which I think is a real error in the article (though of course humans are more interested in their own genome). It'd make more sense if the author had used one of the smaller independent bacterial genomes as the model.


I've been curious for a while whether attempts have been made to simulate exactly this sort of evolutionary bootstrapping. It would make the theory much more plausible if under certain simulated conditions we could see life emerge. But maybe that's too computationally intense to do effectively.


Agreed, if you want to explain something you should do so from the simplest example, not from one of the most complex examples.


I think looking at DNA as "source code" that either gets "compiled" into proteins or "commented out" really misrepresents the way DNA actually works... making it a poor analogy.

DNA inside of cells has a 3 dimensional shape within the cell nucleus... The so called "Junk DNA" mentioned in the article can greatly modify the 3D geometry of the DNA and thus insertions or deletions of junk DNA can have large biological consequences (by influencing, for instance, the distance between an enhancer and promoter).

This video provides a nice visual representation of what DNA actually looks like: https://www.youtube.com/watch?v=Pl44JjA--2k


The article covers that extensively.


It...doesn't? Sure, it gets into junk DNA, but there's only a brief mention that the genome has structure, and it's not examined in any depth. Notably, they never discuss enhancers or insulators, which I feel would really improve the article, even though I can't think of a good analogue (which might be even more interesting for the intended audience).


This whole section relates to that...maybe you can suggest an addition to the author?

There are lots of possible explanations for the massive amount of non-coding DNA - one of the most appealing (to a coder) has to do with ‘folding propensity’. DNA needs to be stored in a highly coiled form, but not all DNA codes lend themselves well to this.

This may remind you of RLL or MFM coding. On a hard disk, a bit is encoded by a polarity transition or the lack thereof. A naive encoding would encode a 0 as ‘no transition’ and 1 as ‘a transition’.

Encoding 000000 is easy - just keep the magnetic phase unchanged for a few micrometers. However, when decoding, uncertainty creeps in - how many micrometers did we read? Does this correspond to 6 zeroes or 5? To prevent this problem, data is treated such that these long stretches of no transitions do not occur.

If we see ‘no transition, no transition, transition, transition’ on disk, we can be sure that this corresponds to ‘0011’ - it is exceedingly unlikely that our reading process is so imprecise that this might correspond to ‘00011’ or ‘00111’. So we need to insert spacers so as to prevent there being too few transitions. This is called ‘Run Length Limiting’ on magnetic media.

The thing to note is that sometimes, transitions need to be inserted to make sure that the data can be stored reliably. Introns may do much the same thing by making sure that the resulting code can be coiled properly.

However, this area of molecular biology is a minefield! Huge diatribes rage about variants with exciting names like ‘introns early’ or ‘introns late’, and massive words like ‘folding propensity’ and ‘stem-loop potential’. I think it best to let this discussion rage on a bit.

2013 Update: ten years on, the debate still hasn’t settled! It is very clear that ‘Junk DNA’ is a misnomer, but as to its immediate function, there is no consensus. Check out Fighting about ENCODE and junk for a discussion of where we stand.

2021 Update: eighteen years on, the debate is nowhere close to being settled. It is now somewhat consensual that ‘Junk DNA’ has important and diverse functions, but new discoveries are being made on a daily basis. https://www.advancedsciencenews.com/that-junk-dna-is-full-of...
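
For anyone who wants to play with the run-length-limiting idea from the quoted passage, here is a toy bit-stuffing sketch in Python. It is not the actual MFM/RLL scheme used on real disks (those are table-driven codes with stricter guarantees), just the "insert a spacer so runs can't get too long" principle:

    MAX_RUN = 3  # allow at most 3 'no transition' bits in a row (toy limit)

    def rll_encode(bits: str) -> str:
        """Insert a stuffed '1' (a forced transition) after every MAX_RUN zeros."""
        out, run = [], 0
        for b in bits:
            out.append(b)
            run = run + 1 if b == "0" else 0
            if run == MAX_RUN:
                out.append("1")   # spacer the decoder knows to drop
                run = 0
        return "".join(out)

    def rll_decode(bits: str) -> str:
        out, run, i = [], 0, 0
        while i < len(bits):
            out.append(bits[i])
            run = run + 1 if bits[i] == "0" else 0
            if run == MAX_RUN:
                i += 1            # skip the stuffed transition
                run = 0
            i += 1
        return "".join(out)

    data = "0000001011000000"
    assert rll_decode(rll_encode(data)) == data
    print(rll_encode(data))   # no run of zeros longer than MAX_RUN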


I think you're grossly underestimating how large of a hole this really is.

The author spends entire sections on things that are completely unimportant (e.g DNA error correction)... while leaving most of epigenetics and regulatory genomics completely out of the picture.


If you think spaghetti code is bad . . . DNA is basically spaghetti code, written in actual spaghetti.

(I'm a biologist; the more you study this, the more confusing it gets.)


I’ve always loved this description of how DNA and programming differ:

“ Do genes behave like lines of computer code? Our April puzzle discussed ways in which genes hold true to this analogy: They have control structures commonplace in computer programs, such as “if-then” logic, “do loops,” timing routines and cascading “subroutine calls.” We also listed some ways that DNA programs differ from ordinary computer programs: Genes program three-dimensional structures from scratch in a water-based medium, using massive parallelism and swarm programming while making use of, and being constrained by, the laws of physics and chemistry.”

https://www.quantamagazine.org/the-dna-computer-program-puzz...


Some amazing animations that show some of the primary operations related to DNA:

Transcription (copying sections of DNA into RNA) - https://www.youtube.com/watch?v=SMtWvDbfHLo

Translation (turning messenger RNA into proteins) - https://www.youtube.com/watch?v=TfYf_rPWUdY

RNA Splicing (removing non-coding sections of mRNA) - https://www.youtube.com/watch?v=aVgwr0QpYNE

DNA Wrapping & Replication (two-part video) - https://www.youtube.com/watch?v=OjPcT1uUZiE

One thing I would encourage folks to do as you listen to the narration of these is to consider any time words are used that imply agency and remember that these are basically blobs of magnets in a jostling goo.

You'll find the names Drew Berry and WEHI referenced in most of these. There are many more stunning and IMHO profoundly informative examples of biological machinery from them on youtube.


Life is just test-driven development, taken to an extreme.


Psh, biology tests in production. If it works well, others will fork it.


What are the tests?


>What are the tests?

Tests == adaptation to the environment

E.g. short 2 minute video showing how bacteria evolves with random mutations to produce resistance to antibiotics:

https://www.youtube.com/watch?v=plVk4NVIUh8

Likewise at a macro scale, if Earth's atmosphere becomes inhospitable because of global nuclear war or a big asteroid collision, then the new "test for life" is adapting to the radioactive dust cloud. Keep mutating across generations until survivors can thrive.
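
A minimal mutate-and-select loop makes the "test" concrete (this is a toy hill-climb toward a fixed target, not a model of the experiment in the video):

    import random

    TARGET = "THRIVE"   # stand-in for "fit enough for the current environment"
    ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

    def fitness(genome: str) -> int:
        """The 'test': how many positions match what the environment demands."""
        return sum(a == b for a, b in zip(genome, TARGET))

    def mutate(genome: str, rate: float = 0.2) -> str:
        return "".join(random.choice(ALPHABET) if random.random() < rate else c
                       for c in genome)

    population = ["".join(random.choice(ALPHABET) for _ in range(len(TARGET)))
                  for _ in range(50)]

    for generation in range(200):
        survivors = sorted(population, key=fitness, reverse=True)[:10]
        population = [mutate(random.choice(survivors)) for _ in range(50)]

    best = max(population, key=fitness)
    print(best, fitness(best))   # usually converges on TARGET well before 200 generations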


If you consider the intended purpose of the software to be "tests" then all development is test-driven development. But that is not the usual meaning.


Can it survive long enough to produce offspring?


Bad test ;). The offspring needs to be fertile.


This question, for example, or someone getting covid. All who happen to have a helpful error (mutation) may pass it on.


The question you just asked is itself a test.


You are, for example.


The author gave a talk at SHA2017 about DNA: https://media.ccc.de/v/SHA2017-31-dna_the_code_of_life


I would say DNA is more like a neural net than a neural net is like an actual brain.

The complex interdependencies, redundancies, nth order optimizations accrued over a long time, conditional relevance of different inputs, etc.

Code implies a possibility to understand the logic and write it out. The reality is that it's going to be something much closer to trying to unpack a black-box ML model with SHAP.


I do not like to be negative, but perhaps this book could be titled -- DNA seen through the eyes of a coder who does not know any biology.

The genome is not a computer program with lots of #ifdef's. It is a data store, much of which seems to be evolutionary left-overs, and a tiny fraction of which encodes the molecular machines that allow it to function. Depending on how you define decoding, perhaps less than 10% of the enzymes encoded by the genome (which are encoded by less than 0.2% of the genome) are involved in processing the genome data (DNA polymerase, RNA polymerase, ribosomal proteins, topoisomerase, etc). A larger fraction of enzymes are used to provide the energy and raw materials to keep the cell going, with an additional set of enzymes used for cell signaling. And much of the rest of the proteins in the cell (again, remembering that less than 2% of the human genome is protein coding) are used for structural purposes.

The genome has a large amount of potential information. In higher organisms and higher plants, a very small fraction (2-5%) of that data is evolutionarily conserved to the extent that we think we know what it does (many of us think the rest is literally evolutionary junk). The parts we understand encode molecular machines, but it takes a very broad definition of "code" to imagine that data is a program.


> It is a data store

problem is, it's so, so so much more than a data store. There's no reason at this point, as we slowly come to understand more and more about DNA packing, structure, transcription control etc., to conclude that the non-coding parts are "junk".


This. In addition, each new biological system that's discovered adds to the complexity of the whole picture. The classic mantra has been to explain away complexity with "evolution of the gaps", but this article shows how that isn't even close to sufficient anymore.



There's also "can a neuroscientist understand a microprocessor?"


Only tangentially related, but I’ve been reading and enjoying Immune by Philipp Dettmer (founder of the YouTube channel Kurzgesagt). Would recommend it to anyone that likes to read about the fascinatingly complex processes the human body uses to keep itself ticking.


> Perhaps it is as impossible to predict if a program will ever finish as it is to create a functional genome that cannot get cancer?

Quite interesting. Though the hard part about the Halting problem is its infinite time scope. With a life form’s genes you only need to compute ~100 years max.

> It appears as if H3 and H4 were authored very carefully as they do have a lot of ‘synonymous changes’, which through the clever techniques described above do not lead to changes in the output.

Or they’ve just been around long enough to have experienced every change imaginable, and only those ones that had no effect produced viable life?


Little constructors (ribosomes) that translate arbitrary character strings (DNA) into three dimensional structures (proteins) of base material, which, if they happen to bump into each other at the right time, in the right place and in the right order in the right environment, tend to repeat the game?

Well, reviewing some code bases, I understand where the analogy comes from. Surely both phenomena often do not imply an intelligent design...


https://en.wikipedia.org/wiki/Geometric_group_theory

and

https://en.wikipedia.org/wiki/Word_problem_for_groups

I'd claim these are involved, but they're outside the scope of the straight "DNA is protein machine language" analogy.


DNA is a lot more than the analogy of computer code. A cell is also a 3D printer that runs on DNA, it's a factory that manages its inventory with DNA instructions, and much more.

Sweeping complex bio systems under the rug as evolution is unfair and quite frankly ideological. The sheer amount of complexity that's unaccounted for and the fact that it's constantly increasing is truly phenomenal.


"... there is no source, the bytecode has multiple reentrent abstractions, is unstable and has a very low signal to noise ratio, the runtime is unbootstrappable, the execution is nondeterministic, it tries to randomly integrate and execute code from other computers... multiple reentrant and self-modifying abstractions. absolutely everything has subtle side effects."


Genes are functions. Read this book, it's factually deeper than it appears https://www.amazon.com/gp/product/0190234768/ref=ppx_yo_dt_b...


Another interesting article of his (that you may have already seen): https://berthub.eu/articles/posts/reverse-engineering-source...


The really cool thing about DNA is it encodes information and does compute.

DNA structures can be designed to solve problems like the traveling salesman problem.

So it's a self-computing stochastic machine. How cool is that! :)

Can't pull up sources atm, but google "DNA computing" & I was a biophysicist :)


Biological computers may be messy but they've got a First Mover Advantage of 5 billion years.


Correct me if I'm wrong, but the assumption that there are big, "commented-out" sections of DNA has recently been challenged? I read somewhere (likely here) that those sections were somehow involved in folding proteins or DNA strands.


This made me stop reading:

> In such a way, the competing interests of the father (‘large strong children’) and the mother (‘survive pregnancy’) are balanced.

Citation needed. This seems unnecessarily antagonistic. I don't think there is an evolutionary interest of the father in the mother not surviving pregnancy. Without nursing, babies probably had very low chances in the past, and they are still at high risk in many countries.

When thinking about this across multiple generations it makes even less sense, since all those guys who pass "large strong" genes to their children will have no children at all because the mothers die during pregnancy or birth - which is a high-risk scenario for the unborn children as well.

Additionally the mother also has an interest in large strong children. They are her offspring too.

There is more of a balance between the fetus taking everything and the mother trying not to decay too fast because of that, which is completely different from what the article describes.


I find it’s a lot less interesting to consider the mapping of DNA to code or blueprints, instead considering the process of evolution as a way of dealing with a legacy code base. This code is functional (just barely) but it’s fragile. Lots of cross-dependencies, much misuse of functions (with side-effects!), and no real way to do static analysis on the code base to figure out how it works.

The problem is, there are always new requirements coming in! So how would you design this software so that it is robust to changes and doesn't require a complete rewrite to add new functionality?

The superpower of this codebase is in composability of code, with loosely defined interfaces. In this way, all parts of the system can be incrementally improved without breaking existing functionality.


More of these articles, please. They don't spend time describing somebody's clothing, appearance or manners.

Yet it's "long form" writing at its finest.


Have seen the reverse: coders as seen through the eyes of an RNA scientist. It was not very flattering but unquestionably accurate.


To me, DNA is more like an esolang than any normal programming language.


Now it is a matter of time to implement Reed-Solomon on a daily basis … within Quantum Computing.


> DNA is not like C source but more like byte-compiled code for a virtual machine called ‘the nucleus’.

This "virtual machine" gives feedback in 40 years, perhaps 1000 years. I am all in for fast innovation. But we should not give a pass to jocks who just want to spread their seed around!



