Simple baselines rival protein language models in mutation-dense design tasks

This paper is a preprint and has not been certified by peer review.


Authors

Talpir, I.; Fleishman, S. J.

Abstract

Computational protein design demands generally applicable models that reliably predict or generate unmeasured variants with superior functional properties. Recent studies have proposed protein language models (pLMs) for design tasks, including zero-shot scoring and transfer learning from limited experimental data, yet pLMs have generally not been assessed in benchmarks that explicitly test combinatorial extrapolation from lower- to higher-order variants. Here, we benchmark widely used pLMs against conventional baseline methods on recently described dense, experimentally validated multi-mutant landscapes. We find that, regardless of architecture and parameter count, pLMs perform statistically similarly to one another, and none consistently outperforms the conventional baselines. Furthermore, their ability to distinguish functional from non-functional variants in zero-shot prediction is comparable to that of conventional homology-based methods. We suggest that, to contribute to the design of protein function, pLMs may need to encode biophysical and structural priors or be combined with structure-based approaches.
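
For readers unfamiliar with the zero-shot scoring task the abstract refers to, the sketch below illustrates the common wild-type-marginal heuristic: score a substitution by the pLM's log-likelihood ratio between the mutant and wild-type residue at that position, and approximate a multi-mutant by summing per-site ratios. This is a minimal illustration, not the paper's benchmark protocol; the ESM-2 checkpoint, sequence, and mutations are assumptions chosen for the example.

```python
import torch
import esm  # pip install fair-esm

# Load a small ESM-2 checkpoint (illustrative choice; larger checkpoints
# plug in the same way).
model, alphabet = esm.pretrained.esm2_t6_8M_UR50D()
batch_converter = alphabet.get_batch_converter()
model.eval()

wt_seq = "MKTAYIAKQRQISFVKSHFSRQ"  # hypothetical wild-type sequence
_, _, tokens = batch_converter([("wt", wt_seq)])

with torch.no_grad():
    logits = model(tokens)["logits"]          # shape: (1, L + 2, vocab)
log_probs = torch.log_softmax(logits, dim=-1)

def score_mutation(pos: int, wt_aa: str, mut_aa: str) -> float:
    """Log-likelihood ratio of mutant vs. wild-type residue at `pos` (0-based)."""
    tok_idx = pos + 1  # offset for the BOS token ESM prepends
    return (log_probs[0, tok_idx, alphabet.get_idx(mut_aa)]
            - log_probs[0, tok_idx, alphabet.get_idx(wt_aa)]).item()

# Additive approximation for a multi-mutant: sum per-site ratios computed on
# the wild-type background (masked-marginal scoring is a common alternative).
variant = [(4, "Y", "F"), (12, "S", "A")]     # hypothetical double mutant
print(sum(score_mutation(p, w, m) for p, w, m in variant))
```

The additive, wild-type-background form is the cheapest variant of this scoring family; it ignores epistatic interactions between sites, which is precisely the regime that mutation-dense, higher-order benchmarks probe.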
