Improving plant functional annotation from knowledge graphs using Graph Neural Networks
Ngo, T. G. B.; Liseron-Monfils, C.; Das, S.; Ubbens, J.; Ashe, P.; Konkin, D.
Abstract
Annotating genes is essential to crop development: understanding gene function informs crop improvement strategies such as marker-assisted breeding, genetic modification, or improved pest resistance. Through extensive experimental effort and computational annotation projection, tens of thousands of genes have been annotated across plant species, most of them in the well-studied species Arabidopsis thaliana. These annotated genes nevertheless represent only a small fraction of the hundreds of thousands of genes found across plant species. Phenotypes and their traits result from multiple processes and events involving multiscale information encoded at different omics levels, such as genomes, proteomes, and transcriptomes. This underscores the need for an efficient computational approach that captures and integrates information from biological networks and transfers that knowledge from well-studied species to poorly characterized ones, in order to annotate genes and discover functional relationships between phenotypes and genes. Despite recent progress, existing methods consider only one or a few omics levels when reasoning about relations between functional annotations and genes. The main objective of this study is to generate and explore a large-scale plant biological knowledge graph, DasDB, and to enrich the functional annotations linked to genes in different species using graph neural networks (GNNs). Integrating data sources from different omics levels has produced a comprehensive graph database that helps researchers gain an in-depth understanding of complex biological networks. In addition, applying GNNs to this large-scale knowledge graph shows promise for transferring information from well-studied plant species to less-characterized ones. This study benchmarks a new research direction: the discovery of new functional annotations in plant species with limited functional annotation.
We applied this pipeline to a specific research problem: the mechanism involved in nitrogen fixation in pea nodules. A systematic analysis of DasDB recovered known gene markers of this process, supporting the relevance of our approach. It also identified new potential targets for better understanding and improving this process.