Protein Electrostatic Properties are Fine-Tuned Through Evolution

This paper is a preprint and has not been certified by peer review.

Authors

Shen, M.; Dayhoff, G. W.; Shen, J.

Abstract

Protein ionization states provide electrostatic forces that modulate protein structure, stability, solubility, and function. Until now, predicting ionization states and understanding protein electrostatics have relied on structural information. Here we demonstrate that primary sequence alone enables remarkably accurate pKa predictions through KaML-ESM, a model that leverages evolutionary representations from ultra-large protein language models (ESMs) and pretraining on a synthetic pKa dataset. The KaML-ESM model achieves RMSEs approaching the experimental precision limit of 0.5 pH units for Asp, Glu, His, and Lys residues, while reducing Cys prediction errors to 1.1 units, with further improvement expected as the training dataset expands. The state-of-the-art performance of KaML-ESM was further validated through external evaluations, including a proteome-wide analysis of protein pKa values. Our results support the notion that protein sequence encodes not only structure and function but also electrostatic properties, which may have been co-optimized through evolution. Lastly, we provide KaML, a sequence-based end-to-end ML platform that enables researchers to map protein electrostatic landscapes, facilitating applications ranging from drug design and protein engineering to molecular simulations.
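
To make the quantities in the abstract concrete, the following is a minimal sketch (not the KaML-ESM code, whose interface the abstract does not describe) of how a residue's pKa relates to its ionization state at a given pH through the standard Henderson-Hasselbalch relation, and of how an RMSE between predicted and reference pKa values is computed. The function names and all numeric values below are hypothetical, chosen only for illustration.

```python
import math


def deprotonated_fraction(pka: float, ph: float) -> float:
    """Fraction of a titratable site in its deprotonated form at a given pH.

    Uses the Henderson-Hasselbalch relation for a single, independent site;
    coupling between sites is ignored in this simplified picture.
    """
    return 1.0 / (1.0 + 10.0 ** (pka - ph))


def rmse(predicted: list[float], reference: list[float]) -> float:
    """Root-mean-square error between predicted and reference pKa values."""
    if not predicted or len(predicted) != len(reference):
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - r) ** 2 for p, r in zip(predicted, reference)]
    return math.sqrt(sum(squared) / len(squared))


if __name__ == "__main__":
    ph = 7.0
    # Hypothetical pKa values for one His residue: a reference value and a
    # prediction that is off by 0.5 units (the precision limit quoted above).
    reference_pka, predicted_pka = 6.5, 7.0
    print("reference fraction deprotonated:", deprotonated_fraction(reference_pka, ph))
    print("predicted fraction deprotonated:", deprotonated_fraction(predicted_pka, ph))

    # Hypothetical predicted and reference pKa values for three residues.
    print("RMSE:", rmse([3.9, 4.6, 10.1], [3.5, 4.2, 10.6]))
```

Because the deprotonated fraction depends on the difference between pKa and pH through a power of ten, an error of a few tenths of a pKa unit matters most for residues whose pKa lies near the pH of interest, which is why errors near 0.5 units are a meaningful target.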
