Simple baselines rival protein language models in mutation-dense design tasks

This paper is a preprint and has not been certified by peer review.


Authors

Talpir, I.; Fleishman, S. J.

Abstract

Computational protein design demands generally applicable models that reliably predict or generate unmeasured variants with superior functional properties. Recent studies have proposed protein language models (pLMs) for design tasks, including zero-shot scoring and transfer learning from limited experimental data, yet pLMs have generally not been assessed in benchmarks that explicitly test combinatorial extrapolation from lower- to higher-order variants. Here, we benchmark widely used pLMs against conventional baseline methods on recently described dense, experimentally validated multi-mutant landscapes. We find that, regardless of architecture and parameter count, pLMs perform statistically similarly to one another, and none consistently outperforms the conventional baselines. Furthermore, their ability to distinguish functional from non-functional variants in zero-shot prediction is comparable to that of conventional homology-based methods. We suggest that, to contribute to the design of protein function, pLMs may need to encode biophysical and structural priors or be combined with structure-based approaches.
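
For readers unfamiliar with the zero-shot scoring task the abstract refers to, the sketch below illustrates the common wild-type-marginal heuristic: score a substitution by the pLM's log-likelihood ratio between the mutant and wild-type residue at that position, and approximate a multi-mutant by summing per-site ratios. This is a minimal illustration, not the paper's benchmark protocol; the ESM-2 checkpoint, sequence, and mutations are assumptions chosen for the example.

```python
import torch
import esm  # pip install fair-esm

# Load a small ESM-2 checkpoint (illustrative choice; larger checkpoints
# plug in the same way).
model, alphabet = esm.pretrained.esm2_t6_8M_UR50D()
batch_converter = alphabet.get_batch_converter()
model.eval()

wt_seq = "MKTAYIAKQRQISFVKSHFSRQ"  # hypothetical wild-type sequence
_, _, tokens = batch_converter([("wt", wt_seq)])

with torch.no_grad():
    logits = model(tokens)["logits"]          # shape: (1, L + 2, vocab)
log_probs = torch.log_softmax(logits, dim=-1)

def score_mutation(pos: int, wt_aa: str, mut_aa: str) -> float:
    """Log-likelihood ratio of mutant vs. wild-type residue at `pos` (0-based)."""
    tok_idx = pos + 1  # offset for the BOS token ESM prepends
    return (log_probs[0, tok_idx, alphabet.get_idx(mut_aa)]
            - log_probs[0, tok_idx, alphabet.get_idx(wt_aa)]).item()

# Additive approximation for a multi-mutant: sum per-site ratios computed on
# the wild-type background (masked-marginal scoring is a common alternative).
variant = [(4, "Y", "F"), (12, "S", "A")]     # hypothetical double mutant
print(sum(score_mutation(p, w, m) for p, w, m in variant))
```

The additive, wild-type-background form is the cheapest variant of this scoring family; it ignores epistatic interactions between sites, which is precisely the regime that mutation-dense, higher-order benchmarks probe.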
