PROTACFold is a comprehensive toolkit for analyzing and predicting Proteolysis Targeting Chimera (PROTAC) structures using AlphaFold 3 and Boltz-1. PROTACs are heterobifunctional molecules that induce targeted protein degradation by forming ternary complexes between a protein of interest (POI) and an E3 ubiquitin ligase. This toolkit provides methods for accurate prediction, evaluation, and analysis of these complex structures and models to advance PROTAC drug discovery.
- Overview
- Website
- Features
- Installation
- Directory Structure
- Usage
- Key Metrics
- Predicted Structures
- Tools
- Data Sources
- License
- Acknowledgments
- Citation
To make PROTAC analysis more accessible, we launched protacfold.xyz, our web platform that automates PDB extraction, identifies PROTAC POI & E3 ligase components, and prepares input files for both AlphaFold3 and Boltz-1.
- AF3 & B1 Integration: Streamlined setup and usage of both AlphaFold 3 and Boltz-1 for comparative PROTAC ternary complex prediction.
- Multiple Ligand Representation Methods: Support for both Chemical Component Dictionary (CCD) and SMILES formats
- Comprehensive Structure Analysis: Calculate RMSD, DockQ scores, pTM, ipTM, and TM-scores for evaluating model quality
- Molecular Property Analysis: Calculate and analyze physicochemical properties of PROTACs using RDKit
- Advanced Visualization: Interactive plots and statistical analysis of prediction metrics
- Benchmark Capabilities: Compare predictions with experimental structures and other computational methods
- Python 3.11+
- CUDA-compatible GPU (for AlphaFold 3)
- Docker (recommended for AlphaFold 3 setup)
We use AlphaFold 3 inference code available from Google DeepMind.
Detailed instructions for setting up AlphaFold 3 with Docker can be found in our installation guide. The official AlphaFold 3 documentation is also a useful reference, but our guide provides step-by-step instructions tailored to PROTACFold users.
Install Boltz using pip:
pip install boltz -U
To run predictions with Boltz YAML input files, please refer to the detailed instructions in the official Boltz Prediction Guide.
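For orientation, a Boltz YAML input for a ternary complex lists two protein chains and one ligand, and the ligand can be given either as a SMILES string or as a CCD code. The sketch below builds such a file as text using only the Python standard library. The layout (`version`, `sequences`, `protein`/`ligand` entries with `id` plus `sequence`, `smiles`, or `ccd`) reflects our reading of the Boltz input schema and should be checked against the official guide. The helper name `boltz_yaml`, the chain IDs, the sequences, and the SMILES string are placeholders, not part of PROTACFold.

```python
def boltz_yaml(poi_seq, e3_seq, ligand, ligand_is_ccd=False):
    """Return Boltz-style YAML text for a POI, an E3 ligase, and one ligand.

    Hypothetical helper: the schema used here is an assumption to verify
    against the official Boltz Prediction Guide.
    """
    if ligand_is_ccd:
        ligand_line = f"      ccd: {ligand}"
    else:
        # Single-quote the SMILES so YAML does not interpret characters
        # such as '[', ']' or '#'; embedded single quotes are doubled.
        escaped = ligand.replace("'", "''")
        ligand_line = f"      smiles: '{escaped}'"
    lines = [
        "version: 1",
        "sequences:",
        "  - protein:",
        "      id: A",
        f"      sequence: {poi_seq}",
        "  - protein:",
        "      id: B",
        f"      sequence: {e3_seq}",
        "  - ligand:",
        "      id: C",
        ligand_line,
    ]
    return "\n".join(lines) + "\n"


# Placeholder sequences and SMILES, for illustration only.
yaml_text = boltz_yaml("MKTAYIAK", "GSHMEAGR", "CC(=O)Nc1ccccc1")
print(yaml_text)
```

Passing `ligand_is_ccd=True` switches the ligand entry from a SMILES string to a CCD code, which mirrors the two ligand representations supported by the toolkit.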
- Clone the repository:
git clone https://github.com/NilsDunlop/PROTACFold.git
cd PROTACFold
- Install Python dependencies:
pip install -r requirements.txt
- data/: Contains datasets and analysis results
  - af3_input/: Input files for AlphaFold 3 (SMILES and CCD formats)
  - af3_results/: Consolidated results from AlphaFold 3 predictions
  - boltz_results/: Consolidated results from Boltz-1 predictions
  - plots/: Generated visualizations
  - hal_04732948/: Data from Pereira et al., 2024 for comparison
- utils/: Utility scripts for structure analysis and property calculation
- src/:
  - plots/: Scripts for generating all figures and data for our research
  - website/: Local deployment of protacfold.xyz for private analysis using Ollama
- docs/: Documentation including installation guides and images
Proposed workflow for predicting PROTAC ternary complexes using AlphaFold 3 and Boltz-1:
- Determine the PDB structures to analyze and generate the JSON and YAML input files with protacfold.xyz.
- Run AlphaFold 3 and Boltz predictions.
- Analyze results using the provided utility scripts.
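To make the first step of this workflow more concrete, the sketch below assembles an AlphaFold 3 style JSON input for a ternary complex with the standard `json` module. The top-level fields (`name`, `modelSeeds`, `sequences`, `dialect`, `version`) and the ligand keys (`smiles`, `ccdCodes`) follow our understanding of the AlphaFold 3 input format and should be confirmed against the official documentation. The function name `af3_input`, the job name, the chain IDs, and all sequence and ligand values are hypothetical; files generated by protacfold.xyz may differ in detail.

```python
import json


def af3_input(name, poi_seq, e3_seq, ligand, ligand_is_ccd=False, seeds=(1,)):
    """Build an AlphaFold 3 style input dict (hypothetical helper)."""
    ligand_entry = {"id": "C"}
    if ligand_is_ccd:
        # CCD representation: a list of Chemical Component Dictionary codes.
        ligand_entry["ccdCodes"] = [ligand]
    else:
        # SMILES representation: a single SMILES string.
        ligand_entry["smiles"] = ligand
    return {
        "name": name,
        "modelSeeds": list(seeds),
        "sequences": [
            {"protein": {"id": "A", "sequence": poi_seq}},  # protein of interest
            {"protein": {"id": "B", "sequence": e3_seq}},   # E3 ligase
            {"ligand": ligand_entry},                       # PROTAC
        ],
        "dialect": "alphafold3",
        "version": 1,
    }


# Placeholder values, for illustration only.
job = af3_input("example_ternary", "MKTAYIAK", "GSHMEAGR", "CC(=O)Nc1ccccc1")
print(json.dumps(job, indent=2))
```

Writing the result with `json.dump` to a file would give one input per complex, with the SMILES and CCD variants differing only in the ligand entry.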
The utils/evaluation.py script automates the extraction of all quantitative metrics from our study (see Key Metrics). It uses the (PDBID)_analysis.txt files (generated by protacfold.xyz) to identify POI and E3 ligase chains, enabling fully automated, component-wise RMSD calculations with PyMOL.
Note: The script requires a local installation of PyMOL for structural alignments.
To run a complete analysis on a directory of PROTAC predictions:
# Analyze all AlphaFold 3 predictions in a given directory
python utils/evaluation.py --protac path/to/predictions --model_type AlphaFold3
# Analyze all Boltz-1 predictions in a given directory
python utils/evaluation.py --boltz path/to/predictions --model_type Boltz1
This will generate an evaluation_results.csv file in the data/af3_results/ directory.
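Once the CSV exists, it can be summarized with a few lines of standard-library Python. The sketch below averages one metric per model type. The column names `PDB_ID`, `MODEL_TYPE`, and `DOCKQ` are assumptions made for illustration, since the actual header of evaluation_results.csv is not documented here, and the sample rows are invented placeholders rather than results from the study.

```python
import csv
import io
from collections import defaultdict
from statistics import mean


def mean_by_group(rows, group_col, value_col):
    """Average value_col per distinct group_col, skipping empty cells."""
    groups = defaultdict(list)
    for row in rows:
        raw = row.get(value_col)
        if not raw:
            continue  # metric missing for this prediction
        groups[row[group_col]].append(float(raw))
    return {key: mean(values) for key, values in groups.items()}


# Invented sample data with hypothetical column names. For a real run, replace
# this with: open("data/af3_results/evaluation_results.csv", newline="")
sample = io.StringIO(
    "PDB_ID,MODEL_TYPE,DOCKQ\n"
    "EX01,AlphaFold3,0.50\n"
    "EX02,AlphaFold3,0.70\n"
    "EX01,Boltz1,0.40\n"
    "EX02,Boltz1,\n"
)
rows = list(csv.DictReader(sample))
summary = mean_by_group(rows, "MODEL_TYPE", "DOCKQ")
print(summary)
```

Skipping empty cells rather than treating them as zero keeps failed or incomplete evaluations from dragging down a model's average.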
The src/plots/ directory contains all the scripts used to generate the figures and perform the data analysis for our research. These scripts produce a variety of visualizations and can be run with:
python src/plots/main.py
PROTACFold evaluates predictions using multiple metrics:
- DockQ Score: Quality measure for protein-protein docking interfaces
- RMSD: Root Mean Square Deviation between predicted and experimental structures
- pTM/ipTM: AlphaFold confidence metrics for overall and interface quality
- Molecular Descriptors: Physicochemical properties of PROTAC molecules
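As a reference for how two of these metrics are defined, the sketch below implements the RMSD formula over matched coordinates and the DockQ combination of Fnat, interface RMSD, and ligand RMSD, as we understand the published DockQ definition (scaling constants 1.5 Å and 8.5 Å). It is not the code used by utils/evaluation.py: the toolkit relies on PyMOL for alignment and on the DockQ tool itself, whereas this RMSD assumes the structures are already superimposed and the atoms are paired in order. The function names are our own.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two equally long lists of (x, y, z) tuples.

    Assumes the coordinates are already superimposed and paired atom by atom.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


def dockq(fnat, irmsd, lrmsd, d_irmsd=1.5, d_lrmsd=8.5):
    """Combine Fnat, interface RMSD and ligand RMSD into a score in [0, 1]."""
    scaled_irmsd = 1.0 / (1.0 + (irmsd / d_irmsd) ** 2)
    scaled_lrmsd = 1.0 / (1.0 + (lrmsd / d_lrmsd) ** 2)
    return (fnat + scaled_irmsd + scaled_lrmsd) / 3.0


# Two atoms, each displaced by 1 Å along z.
shift = rmsd([(0, 0, 0), (1, 0, 0)], [(0, 0, 1), (1, 0, 1)])
# A perfect interface: all native contacts recovered, zero deviation.
perfect = dockq(fnat=1.0, irmsd=0.0, lrmsd=0.0)
print(shift, perfect)
```

Because each RMSD term is mapped through 1 / (1 + (x / d)^2), large deviations push the score toward zero smoothly instead of dominating it, which is why DockQ stays bounded between 0 and 1.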
All 124 predicted PROTAC structures, as well as two replicas of a 300 ns MD simulation of complex 9B9W, are available on Zenodo. As an example of a high-quality prediction, the structure for complex 7PI4 is shown below, with the experimental structure in gray, the AlphaFold 3 prediction in gold, and the Boltz-1 prediction in cyan.
- AlphaFold 3 - DeepMind's state-of-the-art protein structure prediction model
- Boltz-1 - Open-source biomolecular interaction model from MIT researchers
- DockQ - Quality measure for protein-protein docking models
This project integrates data from:
This project is licensed under the MIT License - see the LICENSE file for details.
- The AlphaFold team at Google DeepMind
- The Boltz researchers at MIT
- Developers of open-source tools used in this project (RDKit, DockQ)
- PyMOL for visualization
- Contributors to PROTAC databases and experimental data
If you use PROTACFold in your research, please cite the paper: Predicting PROTAC-Mediated Ternary Complexes with AlphaFold3 and Boltz-1