Show HN: An open-source implementation of AlphaFold3

lacker · 2024-09-04T18:36:20 1725474980

This seems really neat!

DeepMind and AlphaFold are clearly moving in a closed-source direction, since they created Isomorphic Labs as a division of Alphabet essentially focused on doing this stuff closed source. In theory it seems nice for academic tools to have an open source version, although I'm not familiar enough with this field to point to a specific benefit of it.

So what's your plan for the company itself, do you intend to continue working on this open source project as part of your business model, or was it more of a one-off? Your website seems very nonspecific about what exactly you intend to be selling.

EdHarris · 2024-09-04T18:43:23 1725475403

Our long term goal is to design enzymes for chemical manufacturing. We decided to build AlphaFold3 because we had seen how useful AlphaFold2 had been for the protein design field. No one else was building it fast enough for us, so we decided we should do it ourselves. We are committed to training and open-sourcing the full version with ligand and nucleic acid prediction capabilities as well since it is so useful for the biotech industry.

fngjdflmdflg · 2024-09-04T19:02:14 1725476534

Have you considered publishing your own paper about your implementation? It would make it easier to cite in the literature later on. Would major journals accept such a paper? I would assume they would if they really had questions about reproducibility.

EdHarris · 2024-09-04T19:15:03 1725477303

OpenFold, which was AlphaFold2's open-source implementation was published in Nature Methods. We will prepare a similar publication once the model is more mature and when we have a nice set of experiments showing the model's interesting properties.

dwayne_dibley · 2024-09-04T20:43:25 1725482605

Hi, how are predictions verified? Does one still do experimental techniques (X-ray crystallography, cryogenic-em etc.) one you have the prediction? Or are predictions so close to reality you can progress without experiment?

EdHarris · 2024-09-04T21:30:16 1725485416

The predictions can be verified by comparing the predicted structure to the experimentally solved structure, either crystal or cryoEM. The model is still training and improving, we will release the benchmarking results after it's complete.

boldlybold · 2024-09-04T21:49:15 1725486555

Thanks for releasing this, I've been looking forward to a truly open version I can use in a commercial setting. What a way to launch the company!

EdHarris · 2024-09-04T22:23:26 1725488606

Thanks!

dekhn · 2024-09-04T19:50:10 1725479410

You probably want to change the name of this implementation as it's not truly AlphaFold3. I wouldn't be surprised if you got a C&D from DM for using the name.

EdHarris · 2024-09-04T19:55:26 1725479726

Yes this is a good point. We are actively speaking with our counsel to check this. Thanks for flagging, though.

snolbert · 2024-09-04T18:39:21 1725475161

Who would've thought only releasing pseudo-code isn't good enough...glad to see the scientific immune system fighting back against closed-source science. Your move Google.

nolist_policy · 2024-09-04T19:07:04 1725476824

How dare they make money with something that is not advertising!

throwaway48476 · 2024-09-04T23:51:01 1725493861

There's nothing wrong with trade secrets, but that's business not science.

lofatdairy · 2024-09-04T19:28:01 1725478081

I mean it shouldn't be enough to publish in nature. The whole point of science is that it can be validated. It's totally fine that they're hosting their models for free on closed servers with limits, even though it's not exactly the most ergonomic.

dekhn · 2024-09-04T21:32:37 1725485557

It was already validated by winning CASP and the paper by Paul Adams (https://www.nature.com/articles/s41592-023-02087-4) which, although it reads like criticism is actually high praise. Everything the model can do, will be (or already has) replicated by the open community.

Also, for work of the highest art (of which AF3 is an example), publication in nature really is the fundamental unit of scientific currency because it ensures all their competitors will get hyped up and work extra-hard to disprove it.

natechols · 2024-09-05T00:12:46 1725495166

The paper by Paul Adams used an earlier version of AlphaFold that was publicly available, not AlphaFold 3 which is not.

dekhn · 2024-09-05T00:20:49 1725495649

My statement is correct; both AF papers were published in nature, and both won casp. AF3 is superior to AF2 which means if adams wrote another paper, it would be on increasingly less interesting fine details.

lofatdairy · 2024-09-05T22:31:36 1725575496

To be clear, I don't think anyone distrusts the benchmarking work nor even the reported architecture, but also no one should need to operate on faith when it comes to work that presents itself as groundbreaking. Probably the first thing everyone did when they tested the model was run a sequence w/ a known cryo structure, but that's insufficient for how deepmind knows researchers will use the model.

> Also, for work of the highest art (of which AF3 is an example), publication in nature really is the fundamental unit of scientific currency because it ensures all their competitors will get hyped up and work extra-hard to disprove it.

IDK about disproving it, again nobody is distrusting the work, but let's also not pretend that a prestige journal is necessary to promote AF3. They could publish in the Columbia Undergraduate Science Journal and get the same amount of press. And to be clear the controversy has largely center on Nature for allowing AF3 to get away with more than they would most other projects, and the wasted time and effort it's taking to reimplement the work so people can add to it. FWIW an author did state that they're attempting to release the code but that's not like a binding vow.

Finally, AF3 strictly speaking didn't win CASP (it almost certainly would) but again this isn't necessarily the point when people talk about validation. The diffusion process does seem to result in notable edge cases (most obviously in IDPs and IDRs but also non-existent self-interactions), it's not a straight improvement in that respect.

benreesman · 2024-09-04T20:38:06 1725482286

I did a very brief stint on computational proteomics. That stuff is absolutely next level.

EdHarris · 2024-09-04T21:01:24 1725483684

Amazing! What kind of things did you work on?

benreesman · 2024-09-04T22:00:49 1725487249

My job was mostly mundane machine learning: classification over very large categorical sets.

I never had anything more than a dim intuition of the serious chemistry going on before the bytes got to me.

MylesHollowed · 2024-09-05T01:55:30 1725501330

Where were you working? That sounds super interesting

benreesman · 2024-09-05T03:07:31 1725505651

I’m a big fan of what you folks are doing by the way.

Haskell (and Nix) people are fond of talking about “constraints as power”.

https://github.com/Ligo-Biosciences/AlphaFold3/blob/ebdf3b12...

benreesman · 2024-09-05T02:40:47 1725504047

I was a contractor for like a month so I’m not at liberty to talk about the details.

There are a number of companies doing innovative things around quantifying proteins and their concentrations in various samples.

I had the privilege to rub elbows with folks working on such cool stuff.

westurner · 2024-09-05T15:06:33 1725548793

Does this win the Folding@home competition, or is/was that a different goal than what AlphaFold3 and ligo-/AlphaFold3 already solve for?

Folding@Home https://en.wikipedia.org/wiki/Folding@home :

> making it the world's first exaflop computing system

flobosg · 2024-09-06T13:00:31 1725627631

Folding@home and protein structure prediction methods such as AlphaFold address related but different questions. The former intends to describe the process of a protein undergoing folding over time, while the latter tries to determine the most stable conformation of a protein (the end result of folding).

EdHarris · 2024-09-05T17:40:01 1725558001

Folding@home uses Rosetta, a physics-based approach that is outperformed by deep learning methods such as AlphaFold2/3.

flobosg · 2024-09-06T12:54:50 1725627290

Folding@home uses Rosetta only to generate initial conformations[1], but the actual simulation is based on Markov State Models. Note that there is another distributed computing project for Rosetta, Rosetta@home.

[1]: https://foldingathome.org/dig-deeper/#:~:text=employing%20Ro...

londons_explore · 2024-09-05T09:30:50 1725528650

If I'm understanding correctly, the model code itself is only a tiny proportion of the challenge. The training compute and training data are far bigger parts.

Google has access to training compute on a scale perhaps nobody else has.

littlestymaar · 2024-09-05T10:15:56 1725531356

Is that really the case though? Available compute sounds unlikely to be the limiting factor here, compared to data which is way scarcer than what's being used to train LLMs, and I suspect Google used mostly publicly available data for training unless they signed deals beforehand with biotechnology companies which have access to more data. That's possible of course, but that doesn't feel very google-y.

EdHarris · 2024-09-05T17:38:13 1725557893

Yes, all data Google used was public. We have enough compute from YC (thanks YC!) to do this. The main thing is the technical infrastructure - processing the data, efficient loading at training time, proper benchmarking, etc. We are building these now.

littlestymaar · 2024-09-06T12:45:56 1725626756

Thanks for the answer! It's much better to have the definitive answer rather than rely on gut feeling (even though it was right in this case).

Keep up the good work!

How much compute does YC give you access to btw? Is that just things like azure credit or do YC have actual hardware?

ck_one · 2024-09-04T18:32:17 1725474737

What's your next step? Why did you decide to focus on enzyme design?

EdHarris · 2024-09-04T18:47:09 1725475629

We think enzymes are super cool! You can build molecular assembly lines at the atomic scale with them. Many pharmaceuticals are already manufactured with enzymes such as the diabetes drug Januvia. Engineering them is a big bottleneck though - takes years and millions of dollars. We want to speed this up with AI-powered design. Next step is ligand-protein prediction capability of AlphaFold3, which is also super useful for modelling enzyme-substrate interactions.

ricopags · 2024-09-05T00:26:06 1725495966

Possibly because it dovetails with pharma mfg and [potentially] food mfg. Could see a case made for enzymatically brewed 'meat inks' [very sorry for this term ;p] for 3d printing the next gen of lab meats.

inciampati · 2024-09-05T05:42:59 1725514979

Are you familiar with ColabFold?

https://github.com/sokrypton/ColabFold

serial_dev · 2024-09-04T18:19:15 1725473955

What an unfortunate naming, I thought I'd see some gravitational waves (as I have no idea what alphafold is).