Clonal embeddings allow exploratory analysis of lineage-resolved single-cell data
Isaev, S.; Erickson, A. G.; Adameyko, I.; Kharchenko, P. V.
Abstract
Assays coupling high-throughput lineage tracing with single-cell transcriptomics are transforming studies of development and disease biology, revealing not only major differentiation routes but also continuous fate biases and their putative regulators. Yet analyzing such data at scale is challenging because clonal data are sparse and existing analyses depend on cell-type annotation. To address these challenges, we developed clone2vec, a machine learning approach that learns informative clone embeddings directly from the cellular expression manifold, bypassing discrete cell-type labels and remaining stable when clones are represented by few cells. This representation summarizes clonal variation as an interpretable geometry that supports exploration, statistical testing of clone-gene associations, and cross-dataset alignment. In prospective barcoding datasets spanning embryogenesis, tumorigenesis, and hematopoiesis, clone2vec recapitulates established clonal patterns and uncovers new axes of continuous variation that implicate regulatory programs and developmental pathways. In tumor microenvironments profiled with TCR sequencing, clone2vec robustly recovers distinct Treg lineages as well as conserved CD8+ T cell sublineages across cancer types, including several bystander-like clonal subsets. Overall, clone2vec provides a robust, general solution for the exploratory analysis of lineage-coupled scRNA-seq data.