A conditional generative model to disentangle morphological variation from batch effects in model organism imaging studies
Valdarrago, R. M.; Hu, H.; Li, R.; Uribe-Salazar, J. M.; Dennis, M. Y.; Quon, G.
Abstract
Identifying organism-level phenotypic variation that arises from genetic variation is a longstanding problem in genetics. Model organisms such as zebrafish are frequently used for genotype-to-phenotype studies because they can be genetically manipulated, grown, and phenotyped for morphological changes in a high-throughput manner using automated systems and imaging. However, individuals from model organisms are typically grown in groups: clutches for zebrafish and frogs, and litters for rodents. These clutches act as strong confounders during image analysis, because individuals of different genotypes grown in the same clutch tend to look more similar to each other than to individuals of the same genotype from other clutches. Existing approaches, such as conditional image classification models and domain adaptation methods, perform poorly at removing these technical batch effects. Here, we propose a conditional latent diffusion model (cLDM) that disentangles these technical batch effects from morphological features by explicitly conditioning on batch-specific variables during generation, enabling targeted separation of technical artifacts from biologically relevant variation. This approach enables accurate classification of genotypes from morphological images of individual zebrafish belonging to distinct mutant classes. Furthermore, it allows us to efficiently correct for batch effects, demonstrating the versatility of cLDM in tackling domain-specific problems. This work highlights the potential of cLDM to overcome batch effects and extract meaningful morphological features.
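The abstract does not specify how batch-specific variables enter the model. As a hedged illustration of what conditioning a latent diffusion model on such variables can look like, the sketch below assumes a standard DDPM-style forward process applied to an image latent, and a conditioning vector formed by concatenating a one-hot clutch (batch) label with a one-hot genotype label. All function names, the linear noise schedule, and the toy linear noise predictor `W` are hypothetical stand-ins rather than the authors' cLDM; a real model would encode images with a trained autoencoder and use a neural denoiser.

```python
import numpy as np


def linear_alpha_bar(num_steps: int, beta_start: float = 1e-4, beta_end: float = 0.02) -> np.ndarray:
    """Cumulative product of (1 - beta_t) for an assumed linear DDPM noise schedule."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)


def one_hot(index: int, size: int) -> np.ndarray:
    v = np.zeros(size)
    v[index] = 1.0
    return v


def make_condition(clutch_id: int, genotype_id: int, n_clutches: int, n_genotypes: int) -> np.ndarray:
    """Hypothetical conditioning vector: clutch (batch) one-hot followed by genotype one-hot."""
    return np.concatenate([one_hot(clutch_id, n_clutches), one_hot(genotype_id, n_genotypes)])


def noise_latent(z0: np.ndarray, t: int, alpha_bar: np.ndarray, eps: np.ndarray) -> np.ndarray:
    """Forward diffusion step: z_t = sqrt(abar_t) * z0 + sqrt(1 - abar_t) * eps."""
    a = alpha_bar[t]
    return np.sqrt(a) * z0 + np.sqrt(1.0 - a) * eps


def denoising_loss(W: np.ndarray, z0: np.ndarray, cond: np.ndarray, t: int,
                   alpha_bar: np.ndarray, rng: np.random.Generator) -> float:
    """MSE between the true noise and a toy linear predictor that sees z_t, the condition and t."""
    eps = rng.standard_normal(z0.shape)
    zt = noise_latent(z0, t, alpha_bar, eps)
    t_feat = np.array([t / len(alpha_bar)])          # crude scalar timestep feature
    features = np.concatenate([zt, cond, t_feat])    # condition is concatenated to the input
    eps_hat = W @ features
    return float(np.mean((eps_hat - eps) ** 2))


# Usage with made-up sizes: an 8-dim latent, 5 clutches, 3 genotypes.
rng = np.random.default_rng(0)
alpha_bar = linear_alpha_bar(1000)
latent_dim, n_clutches, n_genotypes = 8, 5, 3
z0 = rng.standard_normal(latent_dim)  # stand-in for an encoded zebrafish image
cond = make_condition(clutch_id=2, genotype_id=1, n_clutches=n_clutches, n_genotypes=n_genotypes)
W = rng.standard_normal((latent_dim, latent_dim + n_clutches + n_genotypes + 1)) * 0.01
print(denoising_loss(W, z0, cond, t=500, alpha_bar=alpha_bar, rng=rng))
```

Because the clutch label reaches the denoiser only through `cond`, a trained model of this form could in principle be sampled with a single fixed reference clutch so that generated morphology varies only with genotype. This is one plausible reading of how conditioning supports batch correction, not a description of the published method.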