AlphaFold is an AI system developed by Google DeepMind that predicts a protein’s 3D structure from its amino acid sequence. It regularly achieves accuracy competitive with experiment.
Google DeepMind and EMBL’s European Bioinformatics Institute (EMBL-EBI) have partnered to create AlphaFold DB to make these predictions freely available to the scientific community. The latest database release contains over 200 million entries, providing broad coverage of UniProt (the standard repository of protein sequences and annotations). We provide individual downloads for the human proteome and for the proteomes of 47 other key organisms important in research and global health. We also provide a download for the manually curated subset of UniProt (Swiss-Prot).
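As a rough illustration of how a single prediction might be located in the database, the sketch below builds a download URL from a UniProt accession. The host, the `AF-<accession>-F<fragment>-model_v<version>` naming pattern, and the version number are assumptions for illustration and are not stated in the text above. The helper names `entry_id` and `model_url` are hypothetical. Check the AlphaFold DB documentation for the current layout before relying on it.

```python
# Minimal sketch: build an AlphaFold DB file URL from a UniProt accession.
# ASSUMPTIONS (not taken from the article): the base URL, the file naming
# pattern, and the model version below. Verify them against the AlphaFold DB
# documentation; the version in particular changes between releases.

BASE_URL = "https://alphafold.ebi.ac.uk/files"  # assumed host and path
MODEL_VERSION = 4                               # assumed release version


def entry_id(uniprot_accession: str, fragment: int = 1) -> str:
    """Return an entry identifier such as 'AF-P69905-F1' (hypothetical helper)."""
    accession = uniprot_accession.strip().upper()
    if not accession.isalnum():
        raise ValueError(f"unexpected characters in accession: {uniprot_accession!r}")
    if fragment < 1:
        raise ValueError("fragment numbers start at 1")
    return f"AF-{accession}-F{fragment}"


def model_url(
    uniprot_accession: str,
    fragment: int = 1,
    version: int = MODEL_VERSION,
    file_format: str = "pdb",
) -> str:
    """Return the assumed download URL for one predicted structure file."""
    if file_format not in {"pdb", "cif"}:
        raise ValueError("file_format must be 'pdb' or 'cif'")
    name = f"{entry_id(uniprot_accession, fragment)}-model_v{version}.{file_format}"
    return f"{BASE_URL}/{name}"


if __name__ == "__main__":
    # P69905 is the UniProt accession for human hemoglobin subunit alpha.
    print(model_url("P69905"))
    print(model_url("P69905", file_format="cif"))
```

The functions only assemble strings and make no network requests, so the resulting URL can be passed to whatever download tool a lab already uses.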
DeepMind, the AI company behind the revolutionary AlphaFold protein structure prediction tool, has released the next iteration: AlphaFold 3. This new tool goes beyond AlphaFold 2's capabilities by predicting not only protein structures but also the interactions between various molecules. This includes interactions with nucleic acids (like DNA), small-molecule ligands, and modified macromolecules (Nature, 2024).
One of the key limitations of previous AlphaFold versions was their inability to predict protein-DNA interactions. Researchers frequently requested this functionality, as highlighted by John Jumper, a director at DeepMind: "People would always ask, 'That's great, but what about DNA-binding proteins? Can you tell me how they bind DNA?'" AlphaFold 3, co-developed with Isomorphic Labs, addresses this crucial limitation.
AlphaFold 3 shares some capabilities with the recently published RoseTTAFold All-Atom model, but there are key differences. RoseTTAFold can design entirely new proteins, whereas AlphaFold 3 stands out for its ease of use. Through a simple web server interface, anyone with a Google account can submit protein or nucleic acid sequences and receive predicted structures, including potential complexes with other molecules.
The potential impact of AlphaFold 3 is significant. Julien Bergeron, a biologist at King's College London who participated in testing the tool, predicts widespread adoption: "Every structural biology and protein biochemistry research group in the world will likely adopt this right away."
However, some in the scientific community have raised concerns. In an open letter submitted to Nature's editors, a group of structural and computational biologists criticizes the lack of transparency regarding the underlying code and training data, which are crucial for independent review and potential improvements. The authors argue that this lack of openness goes against established scientific standards and Nature's own publication policies. The letter had gathered more than 670 signatories by May 14 (DOI: 10.5281/zenodo.11192369).
In response to these concerns, Max Jaderberg (Isomorphic Labs' chief AI officer) and Pushmeet Kohli (DeepMind's VP of research) announced plans to release the code for academic use within six months.
DeepMind previously released the code for AlphaFold 2, but it has not made the training data or specific training procedures public. This information is critical for understanding and improving machine learning models. To address this gap, researchers led by Mohammed AlQuraishi at Columbia University developed OpenFold, an open-source alternative to AlphaFold 2 that includes training data (Nat. Methods, 2024). Research labs can therefore train their own versions of OpenFold, potentially using proprietary data, and generate predictions similar to AlphaFold 2's without relying on Google's servers. Labs can also optimize the model for specific research problems. Dr. AlQuraishi confirmed via email to C&EN that his team is already working to replicate AlphaFold 3's functionality.