A layered standards framework for integrating single-cell and spatial omics data into brain cell atlases
Ray, P. L.; Miller, J. A.; Jarecka, D.; Smith, K. A.; Baker, P. M.; Ng, L.; Martone, M. E.; Trivedi, P.; Abeysinghe, R.; Anderson, L.; Bandrowski, A. E.; Edyta, V.; Bhandiwad, A. A.; Chhetri, T. R.; Cui, L.; Giglio, M.; Goldy, J.; Hong, N.; Huang, H.; Huang, Y.; Hussain, Y.; Johansen, N.; Kenney, M.; Kruse, L.; Li, X.; Meldrim, J.; Mollenkopf, T.; Nadendla, S.; Osumi-Sutherland, D.; Sanchez, R.; Scheuermann, R. H.; Tao, S.; Vanderburg, C. R.; Yang, Y.; Ropelewski, A.; Mufti, S.; Lein, E.; Xu, H.; Zheng, W. J.; Ghosh, S. S.; White, O.; Hawrylycz, M.; Zhang, G.-Q.; Thompson, C. L.
Abstract
The BRAIN Initiative Cell Atlas Network (BICAN) is generating large-scale multimodal datasets to profile cell types in the human, non-human primate, and mouse brain. The diversity of single-cell and spatial transcriptomic and epigenomic assays, combined with varied experimental contexts, multiple data-generating laboratories and distributed infrastructure, poses substantial challenges for data integration and reuse in BICAN. To address this, we implemented a standards framework that enables layered integration of these data into knowledge-ready products for interoperable brain cell atlases. This framework organizes data based on three progressively structured layers. First, we introduced an assay-agnostic modeling layer that unifies the representation of single-cell and spatial omics data using a common set of biological entities and processes assessed by diverse experimental techniques. Second, we implemented harmonized metadata standards that capture key experimental features linked to biospecimen provenance across heterogeneous tissue sources, species, and preparations, supporting integration and validation while minimizing burden on data contributors. Third, we present an extensible representation for data-driven cell type taxonomies that integrates molecular data with annotations, ontology mappings, and evidence. Together, these contributions represent an end-to-end framework that transforms heterogeneous datasets into structured, interoperable resources that support broad community reuse via mapping algorithms, annotation systems, and visualization platforms. This approach links biospecimen provenance with cell-level outputs and embeds these in a standardized taxonomy format, enabling downstream applications such as cross-dataset integration, reference mapping, and knowledge-driven analysis. More broadly, our work demonstrates a generalizable strategy for enabling an efficient data-to-knowledge pipeline in a large-scale consortium setting.