I'm a little disappointed that their linked preprint doesn't appear to include any molecular biology; i.e. they don't actually try to synthesize any of their predicted sequences and test function. It wouldn't be an outrageous synthesis task to make some of the CRISPR-Cas sequences they generated.
Also interesting that AlphaMissense is omitted from Figure 2B; it substantially outperforms the ESM-based ESM1b in our hands. But I guess the idea is that this is a general-purpose DNA language model whereas AlphaMissense is domain-specific for variant effect prediction?
Strong second for wishing they tried physically testing some model output. The importance of "model that makes outputs AlphaFold thinks look like Cas" is very different from "model that makes functional Cas variants".
For design tasks like in this paper, I think computational models have a big hill to climb in order to compete with physical high-throughput screening. Most of the time the goal is to get a small number of hits (<10) out of a pool of millions of candidates. At those levels, you need to work in the >99.9% precision regime to have any hope of finding significant hits after multiple-hypothesis correction. I don't think they showed anything near that accurate in the paper.
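To put rough numbers on the base-rate problem above, here is a minimal sketch. It assumes the accuracy figure behaves like a per-candidate specificity, and it uses a hypothetical pool of one million candidates containing 10 true hits. The function name and parameters are illustrative and are not taken from the paper.

```python
def expected_precision(pool_size, true_hits, sensitivity, specificity):
    """Expected fraction of predicted hits that are real (positive predictive value).

    All inputs are assumed values for illustration, not measurements.
    """
    true_positives = sensitivity * true_hits
    false_positives = (1.0 - specificity) * (pool_size - true_hits)
    return true_positives / (true_positives + false_positives)


# Hypothetical screen: 10 real hits hidden in a pool of 1,000,000 candidates,
# with a model that never misses a real hit (sensitivity = 1.0).
for specificity in (0.99, 0.999, 0.9999, 0.999999):
    ppv = expected_precision(1_000_000, 10, 1.0, specificity)
    print(f"specificity={specificity}: precision={ppv:.4f}")
```

With these assumed numbers, 99.9% specificity still leaves about 1,000 false positives against 10 real hits, so only about 1% of the predicted hits are real.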
Maybe we'll get there eventually, but the high-throughput techniques in molecular biology are also getting better at the same time.
You are correct that it is dangerous to rely on the results of one model as an oracle for another model, but extremely good models (say, F=ma) are used all the time.
This should really be a requirement for bio-related generative methods rather than a nice-to-have. A very high percentage of compounds generated by genAI-type methods have been shown not to work as intended. Anything without wet-lab validation should really be taken with a large grain of salt.
A tangent: does anyone have recommendations for a library for easy Swift-Rust interop? This is a cool tool, but I’d much rather make a GUI natively with e.g. SwiftUI and then call out to Rust for business logic. The previous times I’ve looked into this, both languages had to communicate through a C intermediate, and handling more complex types became a chore…
Mozilla's uniffi-rs is really good. You write a common IDL and the bindings are generated automatically. It supports Swift, Kotlin, Python, Ruby and JavaScript (not in the official repo).
All sorts of offtopic prompts are unsurprisingly generating nonsensical answers, but even prompting with "lecture notes on clathrin-mediated endocytosis" yielded:
"In the case of clathrin-mediated endocytosis, it is a process used by eukaryotic cells to take up extracellular material and molecules into the cell. It is a mechanism used by cells to take up specific molecules, and it is a mechanism used by cells to regulate the composition of the cell surface. It is a mechanism used by cells to regulate the composition of the cell surface, and it is a mechanism used by cells to take up specific molecules. It is a mechanism used by cells to take up specific molecules. It is a mechanism used by cells to take up specific molecules, and it is a mechanism used by cells to take up specific molecules."
Global models of gene expression for an entire cell are fairly distant at this point, but there is quite a bit of work on modeling transcriptional activity from sequence. If you're interested in reading more, a relevant technology to search for would be the "Massively Parallel Reporter Assay", or MPRA, which couples pools of 10⁴–10⁵+ synthetic DNA sequences with RNA sequencing to measure transcriptional output. Data from MPRA experiments is being used to train models, although these models are not anywhere near a point where you could model the gene expression of all regulatory elements in a cell; they are usually focused on a specific factor or regulatory sequence.
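For a concrete picture of what an MPRA readout looks like before any model is trained on it, here is a minimal sketch. It assumes the common simplification that a sequence's activity is the log2 ratio of its library-size-normalized RNA count to its DNA (input) count. The function name, the pseudocount, and the toy counts are all hypothetical; real pipelines also handle barcodes and replicates.

```python
import math


def mpra_activity(rna_counts, dna_counts, pseudocount=1.0):
    """Per-sequence activity as log2(normalized RNA / normalized DNA).

    rna_counts, dna_counts: dicts mapping a sequence ID to its read count.
    The pseudocount avoids log(0) for sequences with no RNA reads.
    """
    rna_total = sum(rna_counts.values())
    dna_total = sum(dna_counts.values())
    activity = {}
    for seq_id, dna in dna_counts.items():
        rna = rna_counts.get(seq_id, 0)
        # Scale each library to counts per million so sequencing depth cancels.
        rna_cpm = (rna + pseudocount) / rna_total * 1e6
        dna_cpm = (dna + pseudocount) / dna_total * 1e6
        activity[seq_id] = math.log2(rna_cpm / dna_cpm)
    return activity


# Toy counts (made up): "enhancer" is over-represented in RNA, "scrambled" is not.
dna = {"enhancer": 100, "scrambled": 300}
rna = {"enhancer": 300, "scrambled": 100}
print(mpra_activity(rna, dna))
```

Sequence-to-activity models are then typically trained to predict numbers like these from the sequence alone.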
The "train models" or ML portion is what I'm disappointed with, unfortunately. I make ML models to predict things from genetic information somewhat regularly, but we are all aware of the enormous issues with that. I am more interested in the ab initio methods, as I have seen them be spectacularly useful in other fields, like the Bethe-Salpeter equation in condensed matter physics.
I'm confused. Where does this say the vaccine is mRNA? The Valneva VLA15 website explicitly states "VLA15 is a multivalent recombinant protein vaccine" [1], and the linked press release calls it an "investigational multivalent protein subunit vaccine". Does nobody actually read these things?
It's not just the OP. I've heard from multiple sources that "Pfizer has an mRNA vaccine for Lyme in clinical trials." Somewhere along the line someone(s) saw Pfizer, vaccine, and put 1 and 1 together and got 3. And it spread. This is the first time I realized that's not true.
I've heard it described as such but, no, it seems to be a distinct if perhaps somewhat related type of vaccine? (UPDATED: It's not.) The company that actually created the vaccine is also unrelated to the company that developed the vaccine that Pfizer is currently distributing for COVID.
Not sure why a direct quote from the article is getting so heavily downvoted. Maybe there’s something here, but if the variants were in the published paper, it doesn’t really matter if the raw sequences were yanked off of the SRA.
Taum Sauk is unusual in that the dam fully surrounds the upper reservoir (i.e. the lake was made from scratch), although certainly all types of dam can fail.
Pumped hydroelectric has good capacity, but doesn't scale well. The right geography is the main limiting factor; accessibility is the second. The places with good hydroelectric potential tend to be very remote, in the mountains. It's extremely hard to do large earthmoving and concrete construction in remote places. The places that have simultaneously the right topography and relatively easy accessibility are limited.
Pumped thermal storage has no geographic limits. The idea is: compress argon, transfer the heat to thermal store, expand the argon back to the starting pressure to recover some work, then transfer the "cold" to another thermal store. For long term storage, use very cheap materials, like rocks, for the thermal stores. To discharge, reverse the flow, so argon is cooled, then compressed, then heated, then expanded. All this can be done with existing technologies and with a round trip efficiency of perhaps 60%.
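To give a feel for the temperatures involved in the charging cycle described above, here is a minimal sketch under idealized assumptions: lossless (isentropic) compression and expansion, argon treated as an ideal monatomic gas with heat capacity ratio 5/3, and the gas returning to ambient temperature after giving its heat to the hot store. The pressure ratio and ambient temperature are made-up inputs. The sketch does not model the losses that set the real round trip efficiency.

```python
ARGON_GAMMA = 5.0 / 3.0  # heat capacity ratio of a monatomic ideal gas


def charge_cycle_temperatures(ambient_kelvin, pressure_ratio, gamma=ARGON_GAMMA):
    """Ideal hot and cold store temperatures for one charging pass.

    Uses the isentropic relation T_out = T_in * r ** ((gamma - 1) / gamma).
    """
    temperature_ratio = pressure_ratio ** ((gamma - 1.0) / gamma)
    # Compression from ambient heats the gas; this heat goes to the hot store.
    hot_store = ambient_kelvin * temperature_ratio
    # Assumption: the gas leaves the hot store back at ambient temperature,
    # then expands to the starting pressure and chills the cold store.
    cold_store = ambient_kelvin / temperature_ratio
    return hot_store, cold_store


# Hypothetical operating point: 300 K ambient, pressure ratio of 10.
hot, cold = charge_cycle_temperatures(300.0, 10.0)
print(f"hot store ~{hot:.0f} K, cold store ~{cold:.0f} K")
```

At this assumed operating point the hot store sits around 750 K and the cold store around 120 K, which is why cheap, temperature-tolerant materials like rocks are attractive for the stores.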
Indeed. Maybe someone with more expertise can chime in — it's not clear to me if getting a vaccine with the chimp adenovirus backbone would preclude the possibility of getting another vaccine based on that backbone in the future (because of immunity to the backbone).