The most interesting thing here is the open-source bio startup aspect. This is a bold step, and I hope it prompts more startups to go this direction.
They are clearly at an early stage of tech development. And the $200k reward for a polymerase that allows longer reads seems pretty low to me... unless you're a grad student and the university doesn't take your IP...
Honestly it's pretty hard to get lab access without somebody also getting rights to your IP, and labs are a bit more expensive than setting up an H100, so it might be tough going.
Even if no external party solves their polymerase problem I have a good feeling about this direction though. Godspeed.
$200k for a better polymerase was definitely way too low a decade ago, when PacBio had an enzymology group with a multi-million $ budget running crystallography at SLAC to build a better polymerase.
On the other hand, $200k might be enough today if someone finds a way to apply AlphaFold and friends to this problem.
On the other other hand, I'm not convinced a better polymerase will solve their problems. Photobleaching might be their biggest problem, and I don't know that the polymerase itself can do much to protect against that.
The most annoying part of the $200k polymerase prize is that they don't actually provide a protocol for validating that the polymerase works well. Some friends and I can definitely do the AI protein design + DNA synthesis + protein synthesis (and would want to), but no way am I gonna figure out idiosyncratic sequencer protocols.
The lack of a validation protocol is also an issue for another way of doing this — iteratively mutate the polymerase and let directed evolution do the work for you. But this only really works when you have an automated test to tell which variants are working better than others. You really want to be able to automate the whole loop so you can run many (many) generations and effectively explore the search space.
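To make the "automate the loop" point concrete, here's a minimal sketch of a mutate-screen-select cycle in Python. Everything in it is a placeholder: the toy fitness function stands in for the automated assay that's missing, and the pool and generation sizes are arbitrary.

```python
import random

BASES = "ACGT"

def mutate(gene, rate=0.01):
    """Introduce random point mutations at a per-base rate."""
    return "".join(random.choice(BASES) if random.random() < rate else b for b in gene)

def evolve(seed_gene, fitness, pop_size=96, keep=10, generations=25):
    """Generic directed-evolution loop: mutate, screen with `fitness`, keep the best.
    `fitness` is whatever automated readout you can actually run at scale."""
    population = [seed_gene] * pop_size
    for gen in range(generations):
        population = [mutate(g) for g in population]
        best = sorted(population, key=fitness, reverse=True)[:keep]
        population = [random.choice(best) for _ in range(pop_size)]
        print(f"gen {gen:>2}: best fitness {fitness(best[0]):.3f}")
    return max(population, key=fitness)

# Toy example: "fitness" is just similarity to an arbitrary target sequence.
target = "".join(random.choice(BASES) for _ in range(300))
toy_fitness = lambda g: sum(a == b for a, b in zip(g, target)) / len(target)
evolve(mutate(target, rate=0.2), toy_fitness)
```

The whole scheme lives or dies on that `fitness` callable being cheap and automated; without it you're hand-screening variants and never get enough generations to matter.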
But, the contest isn’t really about length — it’s about polymerase preference for fluorescently labeled dNTPs vs unlabeled.
> 454 Bio have opened a competition with a prize of $200,000.00 to evolve a DNA polymerase optimized to use Lightning Terminators™ while not incorporating their hydroxymethyl cleavage products with a minimum 1000:1 ratio.
This seems to be an issue with their chosen “One Pot” reaction mixture. When you can’t wash away byproducts after each cycle, you get a buildup of unlabeled molecules, and those are the ones the polymerase prefers. So of course they end up with single-digit read lengths… the polymerase is doing what it does best. It seems like this chemistry should have been fleshed out before trying to build the sequencer.
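A rough back-of-the-envelope model shows why this tanks read lengths so fast. The numbers below are completely made up (pool sizes, one "unit" of cleavage product released per cycle), but the shape of the result only depends on the selectivity ratio:

```python
def expected_read_length(selectivity, n_cycles=100, labeled_pool=100.0):
    """Toy model: each cycle, a read survives only if the polymerase picks a
    labeled terminator over the accumulated unlabeled cleavage products.
    `selectivity` is the preference for labeled over unlabeled at equal
    concentration (the contest asks for >= 1000:1)."""
    labeled, unlabeled = labeled_pool, 0.0
    p_survive, expected = 1.0, 0.0
    for _ in range(n_cycles):
        p_ok = selectivity * labeled / (selectivity * labeled + unlabeled)
        p_survive *= p_ok
        expected += p_survive
        labeled -= 1.0      # one unit of labeled terminator consumed...
        unlabeled += 1.0    # ...and its cleavage product stays in the pot
    return expected

for s in (1, 10, 100, 1000):
    print(f"selectivity {s:>4}:1 -> ~{expected_read_length(s):.0f} usable bases")
```

A polymerase that actually prefers the unlabeled product (selectivity below 1, which is the natural state of affairs) does even worse, which is consistent with the single-digit read lengths they report.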
So, I guess it turns out that there is a validation endpoint, but still not a very convenient way to measure it…
Yeah, now that you mention it, that seems like a major issue. Of course unlabeled nucleotides will outcompete labeled ones, they're smaller so they get less steric hindrance. And I don't understand what can be done about it at the polymerase binding site. Maybe some induced proximity chemistry?
Right? The unlabeled nucleotide will still fit in a binding pocket for the labeled version. And it should always have better binding.
But this is an obvious issue. With this type of photo-reaction, you’ll get a buildup of unlabeled nucleotides. There is still some potential in the technology, but it’s really hard for me to see how this works in one tube without some kind of washing (which also increases costs by a lot).
Maybe there is a way to add multiple moieties to the NTPs, such that removal of the fluorophore would cause a bigger hindrance? (Fluorophore + extra-moiety would bind, but extra-moiety alone would have more trouble binding?)
"And it should always have better binding."
This is a very strong statement. There are (at least theoretically) a myriad of ways to redesign the binding pocket to preferentially bind the larger labeled nucleotide. Consider that many proteases excel at binding the larger substrate peptide while having essentially no detectable binding to the resulting smaller cleaved peptide(s).
Strong, yes. It’s certainly possible to have binding sites that prefer larger molecules. However, I’m not sure you’d be able to do so while still being able to function as a polymerase. That’s the thing I’m thinking about. You have to have an enzyme that will preferentially bind a labeled nucleotide, while still being a good polymerase. And because the resulting enzyme should still be a good polymerase, it would likely also maintain good binding affinity for unlabeled nucleotides.
I’ll mention that your example of a protease is a good counter example — the function of the protein is to bind the larger peptide and cleave it. The evolutionary pressure was to bind the larger molecule and release the smaller (cleaved) ones. Here, the evolutionary pressure was to bind the smaller (unlabeled) nucleotide to build the DNA strand. This is a complex reaction that involves a lot of ligand binding and intermediates. Trying to add a function to preferentially bind a larger fluorophore labeled base seems like it would be overly disruptive or at least reduce the rate of polymerization.
I don’t know enough about the structure of DNA polymerase to say how feasible this really is — but the extra bulk of the fluorophore seems like it would cause issues.
But, is it possible? Sure! This is biology — there is always a way (or three)! I’d expect for there to be a way to generate a polymerase that does this via mutagenesis. But it would be very difficult and probably more expensive than the prize.
If they were able to keep the concentration of fluorescently labeled nucleotides higher, this wouldn’t be an issue. But due to the technology choices (one tube reactions, laser release), they are stuck trying to get a better polymerase.
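(To put the concentration point in textbook terms: when two substrates compete for the same active site, the relative incorporation rate is roughly (kcat/KM)_labeled × [labeled] versus (kcat/KM)_unlabeled × [unlabeled]. The 1000:1 target can therefore come from enzyme specificity, from keeping the labeled pool in large excess, or from a mix of both; in a one-pot reaction the concentration term drifts the wrong way every cycle, so the whole burden lands on the enzyme.)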
I certainly wish them luck, I’d love for this to work for more than 5-6 bp.
Any recommendations on learning how to do this? As well as the limitations to your approach? I've been interested in engineering site-specific DNA binding proteins for over a decade now, but the tools I've looked at are specific for protein to small-molecule or protein to protein interfaces.
It turns out the dissertation is not quite published yet, but this paper [1] will be a part of it. Sam and I were in the same lab and his work has evolved to focus even more on AI-guided protein design. Perhaps the quote from the abstract that made the connection for me to this post:
> We applied the protein G B1 domain (GB1) models to design a sequence that binds to immunoglobulin G with substantially higher affinity than wild-type GB1.
If you’re up for reading a dissertation I have one in mind I can dig up. I will check back here in 12 hours or so or you can contact me from my profile.
Likewise, I'd hoped for something designed to pound my ears, not my genes.
But, anyway... here's another amazing open-source sequencer, hard and soft, worthy of the attention of the sequencer-loving subdivision of the HN music-making aristocracy:
This is super interesting and a promising development for those of us dreaming of running a lab out of a garage. I've never worked with Rothberg directly, but I've read great stories of him cultivating entrepreneurial talent in molecular biology.
When PacBio was scaling their technology, they had lots of problems with photobleaching. It turns out that shining intense laser light at a polymerase-nucleotide-fluorophore complex and inducing lots of electron flux in the fluorophore eventually causes an electron to get overexcited, jump around, and damage the machinery. PacBio eventually invested a lot in building elaborate shields into the nucleotide-fluorophore link, able to absorb the stray electrons and save the polymerase.
Given that experience, I'd expect shining intense UV light into a TIRF volume to break the reversible terminator link will cause lots of photobleaching. It will be interesting to see if 454.bio can overcome this.
I was surprised by their choice of current-limiting resistors for the LEDs (instead of a constant-current driver), but I guess their goal was to keep the hardware As Simple As Possible.
Even commercial designs will cost-optimise. There's no need for extra complexity if it costs more and has lower reliability, unless they're going for planned obsolescence.
The issue is, if you use current limiting resistors, the color rendering index and the absolute brightness are highly variable. That seems like it would matter a bunch.
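For a sense of scale, a quick sketch (the supply voltage, resistor value, and Vf spread below are assumed numbers, not taken from their design):

```python
# Why a plain series resistor gives part-to-part brightness variation:
# LED current is (V_supply - Vf) / R, and Vf varies between parts and with
# temperature. All numbers here are illustrative assumptions only.
V_SUPPLY = 5.0   # assumed rail
R = 100.0        # assumed series resistor, ohms
for vf in (2.9, 3.1, 3.3):   # plausible Vf spread for a blue/UV LED
    i_ma = (V_SUPPLY - vf) / R * 1000
    print(f"Vf = {vf:.1f} V -> {i_ma:.0f} mA")
# ~17-21 mA, i.e. roughly a 20% swing in drive current that a
# constant-current driver would hold fixed.
```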
The problem of non-selective cleavage of all terminators, whether they have been incorporated or not, seems to be the most significant one. Is engineering a selective polymerase really the best solution here? If you don't top up the reaction with uncleaved terminators, won't the reaction become slower and slower as you go? The ideal solution would be terminators that only get cleaved after incorporation (maybe due to conformational changes), but I can see how that's probably even harder to engineer than a selective polymerase.
I can see that complex microfluidics and reagent cycling add a lot of complexity/cost. The continuous process is definitely more elegant, but it also seems really difficult to get right for longer read lengths, and I wonder if it is really worth it.
What would be the potential benefits of the continuous process? I guess cheaper reagents since it is a one-pot process and faster sequencing speed since you don't need to cycle reagents?
This really gets my blood flowing; it really is the future! The question is: can we make experimentation in biology, say for high schoolers and undergrads, as easy/cheap/interesting as computing? On the order of buying a good laptop, i.e. $1,500. What can be done on the order of buying a Raspberry Pi, a few sensors, and a breadboard, $50-$100?
Unfortunately, pretty much all biological equipment is expensive and relatively hard to use. Case in point: my son is looking to measure amylase activity in the presence of various inhibitors for diabetes research. The cheapest spectrophotometers are $1k-$2k. Surely there must be a way to lose some accuracy but bring the price down to $100-$200.
$100 or even $200 will be quite a hard target to hit, even considering just the BOM. A bare-bones optics setup will cost you maybe 100-130 USD, unless you find some spare parts or go the DVD-grating route. The best way is to search for Ocean Optics parts on eBay; there are multiple "benches", sensors, and optics sets available most of the time. Still, it will be more than $200.
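For a single-wavelength assay like that, a single LED plus a photodiode can stand in for a scanning spectrophotometer; the readout is just Beer-Lambert. A minimal sketch, assuming you can read raw ADC counts from the photodiode (the function name and all numbers here are hypothetical):

```python
import math

def absorbance(adc_blank, adc_sample, adc_dark=0):
    """Beer-Lambert: A = log10(I0 / I).
    adc_blank  - photodiode reading through the blank cuvette (defines I0)
    adc_sample - reading through the sample (I)
    adc_dark   - reading with the LED off, subtracted from both"""
    i0 = adc_blank - adc_dark
    i = adc_sample - adc_dark
    return math.log10(i0 / i)

# Made-up readings: blank 52000 counts, sample 31000, dark 400
print(f"A = {absorbance(52000, 31000, 400):.3f}")   # ~0.23
```

You lose wavelength selection and stray-light rejection compared to a real instrument, but for tracking relative enzyme activity over time that trade may be acceptable.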
I got my start in genomics working on an open source DNA sequencer. In the end, we built a powerful scanning fluorescent microscopy platform that saw a few beautiful experiments done on it. I always wondered when this might happen in a truly open way. With the right tech and a focused corporate backing it really could happen. Maybe this is the beginning of something very interesting.
Rather than sequencing, I think it would be better to develop a standardized platform for a wide range of microscopy, but nothing too ambitious. Each new feature you add increases the build challenge. For example, I've built a straightforward transmitted-light microscope with an automated XY stage and Z focus, with cheap electronics; changing from transmitted light to fluorescence would require a great deal of additional components, complexity, and precision and accuracy in the design and the build. Adding a "growth chamber" around the scope would as well. Any sort of super-resolution, autofocus, extremely fine movement, etc., is pretty much off the table for a simple build.
Ideally you could draft off of the work done for openbuilds, which is mainly focused on printers, lasers, and CNC machines.
(there are several projects that attempt this; openflexure is an example, but every time I work with their design, I go back to leadscrews with NEMA steppers and linear rail.)
There's a lot more to biology than just sequencing DNA. Transmitted-light microscopy of cells is remarkably useful if you can do it cheap.
A real catalyst would be to couple a sequencer that digitizes DNA with a matter compiler that outputs real DNA from the digital representation.
Why? Craft software to grow the organism in a biologically plausible manner in simulation from the digital DNA, then apply evolutionary pressure to this growth to evolve the DNA in simulation into a more value-added organism.
Real organism --> sequence its DNA into digital DNA --> evolve the DNA in simulation software --> output real DNA from the digital DNA --> insert this evolved real DNA into a real cell to clone an organism, now with enhanced attributes.
It's basically a little custom microscope with a few illuminators and a Z stage for focusing (not even an XY stage). Not surprising, as sequencing has been microscopy for a while now. It uses TIRF optics.
I'm sure you know this already, but for the benefit of the HN crowd...
Yeah, it's been unfortunate that the electronic sensors have not been able to keep up with the optical readouts. Ion Torrent and Genapsys are barely used due to accuracy problems, but Oxford Nanopore's considerable logistical benefits of tolerating a wide range of input DNA concentrations, quick time to data, low capital cost, and long reads have kept them alive. Maybe they will grow more with higher accuracy.
Illumina's patents ending recently has prompted a few new startups beyond 454.bio, including Element and Singular, but with the stiff headwinds of FDA regulatory uncertainty and the long capital life of existing sequencers, it's hard for them to make progress even when they are far cheaper than Illumina. And then we have Ultima and MGI. The me of 10 years ago would not believe both 1) how many strong entrants we have in DNA sequencing these days, and 2) how much Illumina continues to dominate in sequencing. At some point us biologists only have ourselves to blame for our purchasing decisions and their consequences.
Oxford Nanopore with the new R10 cells is good enough for most synthetic biology applications (i.e., plasmid sequencing and strain validation), which I'm super excited about. Combine low capital cost and long reads and you can do some really interesting things in the DNA assembly realm. It's pretty recent that they got there, though.
> At some point us biologists only have ourselves to blame for our purchasing decisions and their consequences
At the consensus level the accuracy is high. At the read level there's a lot of room for improvement. Currently running R10 routinely for surveillance purposes and seeing a per-read quality score average of about Q15.
> Ion Torrent and Genapsys are barely used due to accuracy problems,
Accuracy didn't have much to do with Ion Torrent's demise. As we're reminded in the OP, Illumina pushed the MiSeq platform, and there wasn't much reason to transition to Ion Torrent. Now, with 250 bp+ PE reads on Illumina, there really isn't much special about Ion Torrent besides its somewhat superior homopolymer performance.
> but Oxford Nanopore's considerable logistical benefits of tolerating a wide range of input DNA concentration, quick time to data, low capital cost, and long reads has kept them alive. Maybe they will grow more with higher accuracy.
This statement is reasonable in today's era, but ONT was nowhere close to being competitive with Ion Torrent and other platforms during the time where labs were considering the Ion Torrent devices vs continued spend on Illumina platforms.
The accuracy, tolerance to unusual DNA samples, and read length with clinically-relevant base qualities were unusable until 2018/2019.
It's been a while since I caught up with the field, and this sounds exciting. I'd like to learn more about what exactly has expired (or will expire?). Is it just the reversible terminator patents, or is there more? Is this the main patent in question? https://patents.google.com/patent/US7541444B2/en or are there others?
I was curious about the reappearance of that number in a biology context (to me, it evokes a famous car engine), and remembered this interesting video on the old and once-very-expensive Roche 454 GS (also a very automotive-sounding name) sequencer:
I'm not educated in molecular biology. Will this device enable DIY enthusiasts to sequence their entire genome, the whole ~700 MB of it, at home? Or is the device only able to sequence small chunks of DNA?
Is the device able to sequence DNA with minimal errors?
A home lab, or even being able to sequence your genome at home, privately and securely, without risking giving your genetic information to companies like 23andMe.
Forgive me, I'm just totally ignorant here. In order to sequence your own genome and do something like 23andme, I kinda just guessed that you'd need some kind of existing corpus and database to derive meaning from the results.
Is that true? And if so, is it all open source or free?
It's something that's kind of interesting to me. I really enjoyed biotech and chemistry in HS but ended up going in a different direction. If this is something that you can just get into now as a hobby maybe I'll jump down the rabbithole
Nearly everything in this space is OSS, from the aligners to the assemblers and the databases.
Last I checked, 23andMe didn't perform whole-genome sequencing but genotyped a panel of known point mutations. It depends on what information you want, e.g. general interest vs. actionable genomics data.
While it's true that most of the tools for genomics are OSS, there is some good software and in particular some databases that are commercial and quite pricey.
And the sequencing machines and chemicals also don't come cheap, as the market has long been dominated by one vendor.
They return long strings, but not a full chromosome. Additionally, there's always an error rate when reading, typically ~3% of the bases (this is Q15 on the Phred scale [0]). So you want to sequence each chunk many times, typically >30x, though with lower sequencing depth you can use population genetics and a database of known genomes to impute your sequence at the most common sites. Going to >30x will probably take 2-3 flow cells on the MinION.
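To unpack those numbers a bit: Phred Q = -10·log10(p_error), so Q15 is about a 3.2% per-base error rate. If errors were independent, a simple majority vote across reads would clean that up very quickly, which is the intuition behind deep coverage. Real errors are partly systematic, so in practice you need more depth than this naive estimate suggests. A rough sketch:

```python
from math import comb

def consensus_error(per_read_error, depth):
    """Chance a simple majority vote over `depth` independent reads is wrong
    at a given position (needs more than half the reads to be erroneous)."""
    need = depth // 2 + 1
    return sum(comb(depth, k) * per_read_error**k * (1 - per_read_error)**(depth - k)
               for k in range(need, depth + 1))

p = 10 ** (-15 / 10)   # Q15 ~= 0.032 per-base error
for depth in (5, 15, 30):
    print(f"{depth}x: naive per-base consensus error ~ {consensus_error(p, depth):.1e}")
```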
However, preparing a DNA sample for input into the machine is not trivial and requires training. As does the analysis to assemble all the chunks into your own genome.
A further challenge is even buying the equipment. Oxford Nanopore, and most biotech companies, keep very tight control of their customer lists, and if it looks like you are trying to order but don't have a proper molecular biology lab to perform sample preparation and to dispose of the waste products correctly, they almost certainly won't sell you anything. There are also strict legal agreements to return the flow cells when done, etc.
So to truly do this, add in the cost of LLC formation and signing up for a biotech startup lab to actually perform the experiments.