An LLM-driven pipeline for proteomics-based detection and structural modeling of post-translational modifications
George, A.; Mejia-Rodriguez, D.; Li, X.; Rigor, P.; Cheung, M. S.; Bilbao, A.
Abstract
Post-translational modifications (PTMs) dynamically regulate protein function and, in turn, cellular physiology. Significant advances have been made in their detection and modeling: mass spectrometry-based proteomics has become the cornerstone of PTM detection in complex samples, while emerging structure-prediction frameworks enable modeling of PTM-dependent conformational changes. However, the biological significance of many PTMs remains largely unexplored, in part because few integrated pipelines bridge PTM detection with structural modeling. We present a generative AI-driven pipeline that integrates PTM detection with structural modeling of PTM effects on protein dynamics and interactions. The pipeline comprises two complementary tools: PTMdiscoverer and PTM-Psi. First, PTMdiscoverer leverages large language models to identify, annotate, and interpret candidate PTMs from open-search proteomics results, addressing limitations of conventional proteomics tools. Next, PTM-Psi models the structural, functional, and dynamic consequences of these spatially aware modifications. Together, the two components bridge PTM discovery with mechanistic interpretation at the structural level. We demonstrate the pipeline on cyanobacterial proteomics data, studying potential molecular mechanisms of redox-regulated "dark complex" formation in carbon metabolism and advancing our ability to interpret PTM-mediated regulation in microbial systems.