Dauparas, J. et al.

LigandMPNN

Protein Design
Step 1: Upload your data

Upload Backbone PDB File

  • Accepted format: .pdb
  • The Protein Data Bank (PDB) format is a widely used plain-text format for the atomic coordinates of biomolecules. A PDB file stores the coordinates of protein and nucleic acid chains, plus bound ligands and metals as HETATM records, together with header metadata about the structure. Experimental data such as crystallographic structure factors and NMR data are distributed by the PDB as separate files, not inside the coordinate file.
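Because LigandMPNN conditions on non-protein atoms, it helps to know how a PDB file separates them from the protein. Below is a minimal sketch, assuming the standard fixed-column PDB layout, that reads ATOM and HETATM records and splits protein backbone atoms from non-water HETATM "context" atoms. The function names are hypothetical, and treating every non-water HETATM record as ligand context is a simplifying assumption for illustration; LigandMPNN's own parser may select context atoms differently.

```python
from dataclasses import dataclass


@dataclass
class Atom:
    record: str    # "ATOM" or "HETATM"
    name: str      # atom name, e.g. "CA"
    res_name: str  # residue or ligand code, e.g. "ALA", "ZN"
    chain: str
    res_seq: int
    x: float
    y: float
    z: float
    element: str


def parse_pdb_atoms(text: str) -> list[Atom]:
    """Read ATOM/HETATM records using the fixed-column PDB layout (0-based slices)."""
    atoms = []
    for line in text.splitlines():
        record = line[0:6].strip()
        if record not in ("ATOM", "HETATM"):
            continue  # skip HEADER, REMARK, TER, END, ...
        atoms.append(Atom(
            record=record,
            name=line[12:16].strip(),
            res_name=line[17:20].strip(),
            chain=line[21:22].strip(),
            res_seq=int(line[22:26]),
            x=float(line[30:38]),
            y=float(line[38:46]),
            z=float(line[46:54]),
            element=line[76:78].strip(),
        ))
    return atoms


BACKBONE_NAMES = {"N", "CA", "C", "O"}


def split_backbone_and_context(atoms: list[Atom]) -> tuple[list[Atom], list[Atom]]:
    """Hypothetical split: protein backbone atoms vs. non-water HETATM context atoms."""
    backbone = [a for a in atoms if a.record == "ATOM" and a.name in BACKBONE_NAMES]
    context = [a for a in atoms if a.record == "HETATM" and a.res_name != "HOH"]
    return backbone, context
```

Usage: `backbone, context = split_backbone_and_context(parse_pdb_atoms(open("input.pdb").read()))`. Slicing by column rather than splitting on whitespace matters because adjacent PDB fields can run together without a separating space.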

Upload LigandMPNN Checkpoint File (optional)

  • Accepted format: .pt
  • A .pt file is a serialized PyTorch object; here it holds a trained LigandMPNN model checkpoint (the network weights), not structural data.
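Before uploading a checkpoint, a quick sanity check can catch a wrong or corrupted file. The sketch below assumes the checkpoint was written by `torch.save` in PyTorch's zip-based format (the default since PyTorch 1.6), which stores the pickled object as a `data.pkl` entry inside a zip archive. It only inspects the container using the standard library; it does not load weights or confirm that the file is a LigandMPNN checkpoint, and it would reject files saved in the older non-zip format. The helper name is hypothetical. Actually loading the weights requires PyTorch (`torch.load`), which unpickles data, so only load checkpoints from trusted sources.

```python
import zipfile


def looks_like_torch_zip_checkpoint(path: str) -> bool:
    """Heuristic check that a .pt file uses PyTorch's zip-based torch.save format.

    Only inspects the container; it does not load weights or confirm the file
    is a LigandMPNN checkpoint.
    """
    if not zipfile.is_zipfile(path):
        return False  # not a zip: legacy torch.save format or not a checkpoint at all
    with zipfile.ZipFile(path) as archive:
        names = archive.namelist()
    # torch.save stores the pickled object as data.pkl under a top-level folder
    return any(name == "data.pkl" or name.endswith("/data.pkl") for name in names)
```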
Step 2: Set Parameters

LigandMPNN is a deep learning-based protein sequence design method that explicitly models non-protein atomic context, including small molecules, nucleotides, and metals. In the original benchmarks it achieved higher native sequence recovery near these ligands than Rosetta and ProteinMPNN, and more accurate side-chain conformations than Rosetta (see Metrics below). You can run LigandMPNN on backbones generated by RFdiffusion All-Atom to design sequences for the ligand-binding site, with the aim of improving binding affinity and specificity.

Example use case:

Designing protein sequences that interact with specific small molecules, nucleotides, or metals to improve binding affinity and specificity for applications in drug discovery, biosensors, and enzyme engineering.

Technology:

A graph neural network (GNN) built on the ProteinMPNN architecture, with additional layers that encode interactions between protein residues and nearby ligand atoms.
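One way to picture the extra ligand encoding is the graph-construction step: give each residue an anchor point and attach its nearest ligand atoms as context nodes. The sketch below assumes a virtual Cβ anchor computed from the backbone N, CA, and C atoms with the ideal-geometry constants used in the ProteinMPNN code, and a caller-chosen number `k` of context atoms per residue. It is a loose illustration with hypothetical function names, not LigandMPNN's implementation, which additionally featurizes the selected atoms and runs learned message passing over them.

```python
import math
from typing import Sequence

Vec = tuple[float, float, float]


def _sub(p: Vec, q: Vec) -> Vec:
    return (p[0] - q[0], p[1] - q[1], p[2] - q[2])


def _cross(p: Vec, q: Vec) -> Vec:
    return (p[1] * q[2] - p[2] * q[1],
            p[2] * q[0] - p[0] * q[2],
            p[0] * q[1] - p[1] * q[0])


def virtual_cb(n: Vec, ca: Vec, c: Vec) -> Vec:
    """Idealized C-beta position from backbone N, CA, C (constants as in the ProteinMPNN code)."""
    b = _sub(ca, n)
    cv = _sub(c, ca)
    a = _cross(b, cv)
    return tuple(-0.58273431 * a[i] + 0.56802827 * b[i] - 0.54067466 * cv[i] + ca[i]
                 for i in range(3))


def nearest_context_atoms(anchor: Vec, context: Sequence[Vec], k: int) -> list[int]:
    """Indices of the k context atoms closest to the anchor point."""
    order = sorted(range(len(context)), key=lambda i: math.dist(anchor, context[i]))
    return order[:k]


def residue_context_graph(backbones: Sequence[tuple[Vec, Vec, Vec]],
                          context: Sequence[Vec], k: int) -> list[list[int]]:
    """For each residue's (N, CA, C) triple, list the indices of its k nearest context atoms."""
    return [nearest_context_atoms(virtual_cb(*bb), context, k) for bb in backbones]
```

Anchoring on a virtual Cβ rather than CA points the neighborhood toward where the side chain would sit, which is the region that actually contacts a ligand.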

Limitations:

  • Performance may be limited for compounds with rare or novel chemical elements not well-represented in the training data. Hybrid approaches with physics-based modeling may be needed for low-data regimes.
  • Some parameters are fixed at their default values in this interface; see the original LigandMPNN GitHub repository for the full list and their meanings.
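For readers who run the original code directly instead of through this interface, the sketch below assembles a command line for the repository's `run.py`. The flag names (`--model_type`, `--pdb_path`, `--out_folder`, `--seed`, `--temperature`, `--batch_size`, `--number_of_batches`, `--checkpoint_ligand_mpnn`) reflect our reading of the LigandMPNN README and should be checked against `python run.py --help` in the current repository. The helper name `build_ligandmpnn_command` and its default values are illustrative assumptions, not the settings this interface uses.

```python
import shlex
from typing import Optional


def build_ligandmpnn_command(
    pdb_path: str,
    out_folder: str,
    checkpoint: Optional[str] = None,
    seed: int = 111,
    temperature: float = 0.1,
    batch_size: int = 1,
    number_of_batches: int = 1,
) -> list[str]:
    """Assemble an argument list for run.py (flag names assumed from the repo README)."""
    cmd = [
        "python", "run.py",
        "--model_type", "ligand_mpnn",
        "--pdb_path", pdb_path,
        "--out_folder", out_folder,
        "--seed", str(seed),
        "--temperature", str(temperature),
        "--batch_size", str(batch_size),
        "--number_of_batches", str(number_of_batches),
    ]
    if checkpoint is not None:
        # Point the script at a specific checkpoint file
        cmd += ["--checkpoint_ligand_mpnn", checkpoint]
    return cmd


if __name__ == "__main__":
    print(shlex.join(build_ligandmpnn_command("./inputs/complex.pdb", "./outputs/design")))
```

Returning a list rather than a single string lets the result go straight to `subprocess.run` without shell quoting issues; `shlex.join` is only used to print a copy-pasteable version.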

Metrics:

  • Sequence recovery near small molecules: 63.3% (vs. 50.4% for Rosetta and 50.5% for ProteinMPNN)
  • Sequence recovery near nucleotides: 50.5% (vs. 35.2% for Rosetta and 34.0% for ProteinMPNN)
  • Sequence recovery near metals: 77.5% (vs. 36.0% for Rosetta and 40.6% for ProteinMPNN)
  • Side-chain chi1 angle recovery: 86.1% (vs. 76.0% for Rosetta)
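The metrics above are recovery rates. A minimal sketch of how such rates are computed: sequence recovery is the fraction of positions where the designed amino acid matches the native one, optionally restricted to a subset such as residues near a ligand (the paper's exact neighborhood definition is not reproduced here). Chi1 recovery is sketched as the fraction of residues whose predicted chi1 dihedral (N, CA, CB, and the gamma atom) lies within a tolerance of the native value; the 20° default tolerance is an assumption, not a value taken from the paper, and all function names are illustrative.

```python
import math
from typing import Iterable, Optional, Sequence


def sequence_recovery(native: str, designed: str,
                      positions: Optional[Iterable[int]] = None) -> float:
    """Fraction of scored positions where the designed amino acid equals the native one."""
    if len(native) != len(designed):
        raise ValueError("native and designed sequences must be aligned (same length)")
    idx = list(range(len(native)) if positions is None else positions)
    if not idx:
        raise ValueError("no positions to score")
    return sum(native[i] == designed[i] for i in idx) / len(idx)


def _sub(u, v):
    return [u[k] - v[k] for k in range(3)]


def _dot(u, v):
    return sum(u[k] * v[k] for k in range(3))


def _cross(u, v):
    return [u[1] * v[2] - u[2] * v[1], u[2] * v[0] - u[0] * v[2], u[0] * v[1] - u[1] * v[0]]


def dihedral(p0, p1, p2, p3) -> float:
    """Signed dihedral angle in degrees for four 3D points (e.g. N, CA, CB, CG for chi1)."""
    b0, b1, b2 = _sub(p0, p1), _sub(p2, p1), _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = [x / norm for x in b1]
    # Project the outer bonds onto the plane perpendicular to the central bond
    v = [b0[k] - _dot(b0, b1) * b1[k] for k in range(3)]
    w = [b2[k] - _dot(b2, b1) * b1[k] for k in range(3)]
    return math.degrees(math.atan2(_dot(_cross(b1, v), w), _dot(v, w)))


def angular_difference(a: float, b: float) -> float:
    """Smallest absolute difference between two angles in degrees (0 to 180)."""
    d = (a - b) % 360.0
    return min(d, 360.0 - d)


def chi_recovery(native: Sequence[float], predicted: Sequence[float],
                 tol_deg: float = 20.0) -> float:
    """Fraction of predicted angles within tol_deg of native (tol_deg is an assumed threshold)."""
    if len(native) != len(predicted) or not native:
        raise ValueError("need equal-length, non-empty angle lists")
    hits = sum(angular_difference(n, p) <= tol_deg for n, p in zip(native, predicted))
    return hits / len(native)
```

The wrap-around in `angular_difference` matters for dihedrals: 170° and -170° are only 20° apart, even though their raw difference is 340°.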
Citation:
Dauparas, J., Lee, G.R., Pecoraro, R., An, L., Anishchenko, I.V., Glasscock, C.J., & Baker, D. (2023). Atomic context-conditioned protein sequence design using LigandMPNN. bioRxiv.
Released:
Mar-10-2025