PopGenAgent: Tool-Aware, Reproducible, Report-Oriented Workflows for Population Genomics

This paper is a preprint and has not been certified by peer review.

Authors

Su, H.; Long, W.; Feng, J.; Hou, Y.; Zhang, Y.

Abstract

Population-genetic inference routinely requires coordinating many specialized tools, managing brittle file formats, iterating through diagnostics, and converting intermediate results into interpretable figures and written summaries. Although workflow frameworks improve reproducibility, substantial last-mile effort remains for parameterization, troubleshooting, and report preparation. Here we present PopGenAgent, a turnkey, report-oriented delivery system that packages a curated library of population-genetics toolchains into validated execution and visualization templates with standardized I/O contracts and full provenance capture. PopGenAgent separates retrieval-grounded user assistance for interpretation and write-up from conservative, template-driven execution that emphasizes auditable commands, artefact integrity checks, and report-ready figure generation. To control operating cost, an economical language model is used for template selection, parameter instantiation, and minor repairs, while higher-capacity models can be invoked selectively for narrative report generation grounded in recorded artefacts. We evaluate PopGenAgent on a broad panel of routine and advanced tasks spanning preprocessing, population structure analysis, and allele-sharing statistics, and we further demonstrate end-to-end replication of standard analyses on 26 populations from the 1000 Genomes Project, reproducing canonical summaries including ROH/heterozygosity profiles, LD decay, PCA, ADMIXTURE structure, TreeMix diagnostics, and f-statistics. Together, these results indicate that a validated template library coupled with provenance-aware reporting can substantially reduce manual scripting and coordination overhead while preserving reproducibility and step-level inspectability for population-genomic studies.
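
The abstract does not show how a template, its I/O contract, or a provenance record is represented. The following is only a minimal sketch of the general idea, assuming a template is a command pattern with named input and output files and that integrity is checked with SHA-256 digests. `StepTemplate`, `provenance_record`, `changed_artefacts`, and the `hypothetical_tool` command line are illustrative names invented here, not PopGenAgent's actual interface.

```python
import hashlib
import shlex
from dataclasses import dataclass
from pathlib import Path
from string import Template


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


@dataclass(frozen=True)
class StepTemplate:
    """Hypothetical template: a command pattern plus a declared I/O contract."""
    name: str
    command: str    # string.Template pattern with $placeholders
    inputs: tuple   # parameter names that refer to input files
    outputs: tuple  # parameter names that refer to output files

    def render(self, params: dict) -> str:
        """Fill the pattern, refusing to run with an incomplete contract."""
        missing = [k for k in self.inputs + self.outputs if k not in params]
        if missing:
            raise ValueError(f"{self.name}: missing parameters {missing}")
        quoted = {k: shlex.quote(str(v)) for k, v in params.items()}
        return Template(self.command).substitute(quoted)


def provenance_record(template: StepTemplate, params: dict) -> dict:
    """Record the exact command and a digest of every declared artefact.

    Assumed to be called after the step has run, so output files exist.
    """
    artefacts = {
        key: {"path": str(params[key]), "sha256": sha256_of(Path(params[key]))}
        for key in template.inputs + template.outputs
    }
    return {
        "step": template.name,
        "command": template.render(params),
        "artefacts": artefacts,
    }


def changed_artefacts(record: dict) -> list:
    """Return the names of artefacts whose current digest differs from the record."""
    return [
        key
        for key, meta in record["artefacts"].items()
        if sha256_of(Path(meta["path"])) != meta["sha256"]
    ]
```

In this sketch, a step would be declared as, for example, `StepTemplate("pca", "hypothetical_tool --in $vcf --out $out", ("vcf",), ("out",))`. Storing the rendered command next to the digests is what would make a step auditable: a later call to `changed_artefacts` reports any file that no longer matches what the report was built from.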
