From Noise to Models to Numbers: Evaluating Negative Binomial Models and Parameter Estimations in Single-Cell RNA-seq
Wang, Y.; Shu, Z.; Cao, Z.; Grima, R.
Abstract
The Negative Binomial (NB) distribution effectively approximates the transcript count distribution in many single-cell RNA sequencing (scRNA-seq) datasets, which has led to its widespread use in computational tools for scRNA-seq analysis. However, the underlying reasons for its ubiquity remain unclear. Here, we use a computationally efficient model selection technique to precisely map the relationship between the best-fit model, chosen among Beta-Poisson (Telegraph), NB and Poisson, and the kinetic parameters that control the stochasticity of gene expression. We find that the NB distribution is an excellent approximation to simulated data that account for both biological and technical noise, within an intermediate range of an effective parameter: the sum of the gene activation and inactivation rates normalized by the mRNA degradation rate. The size of this range increases with decreasing mean expression, increasing technical noise, and increasing sample size (number of cells). These findings have important implications: (i) excellent NB fits span diverse parameter regimes and are not exclusive indicators of transcriptional bursting; (ii) for small sample sizes, biological noise generally becomes the primary factor shaping the NB characteristics of the count distribution, even when technical noise is significant; (iii) under the assumption of steady-state conditions, gene-specific parameters (burst size and frequency) estimated in regions where the NB model fits well typically show large relative errors, even after corrections for technical noise; (iv) gene ranking by burst frequency remains accurate, indicating that burst parameter magnitudes are often informative only in a relative sense.
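The comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the abstract does not specify the model selection technique, so the sketch substitutes a BIC comparison with a moment-based NB fit. It assumes the standard steady-state result that telegraph-model counts are Beta-Poisson distributed, and it models technical noise as binomial capture with a fixed probability, which for a Poisson variable is equivalent to scaling its mean. All function names and parameter values (`k_on`, `k_off`, `rho`, `capture`) are hypothetical, with rates expressed in units of the mRNA degradation rate.

```python
import math
import random


def sample_poisson(rng, lam):
    """Draw one Poisson(lam) variate with Knuth's method (adequate for small lam)."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def simulate_counts(rng, n_cells, k_on, k_off, rho, capture):
    """Steady-state telegraph counts as a Beta-Poisson mixture.

    Assumption: binomial capture with probability `capture` is applied as a
    rescaling of the Poisson mean (thinning a Poisson gives a Poisson).
    """
    counts = []
    for _ in range(n_cells):
        x = rng.betavariate(k_on, k_off)  # fraction of time the gene is active
        counts.append(sample_poisson(rng, capture * rho * x))
    return counts


def poisson_loglik(counts):
    """Poisson log-likelihood at the maximum-likelihood mean."""
    lam = sum(counts) / len(counts)
    return sum(k * math.log(lam) - lam - math.lgamma(k + 1) for k in counts)


def nb_moment_loglik(counts):
    """NB log-likelihood at method-of-moments estimates (a stand-in for an MLE fit)."""
    n = len(counts)
    mean = sum(counts) / n
    var = sum((k - mean) ** 2 for k in counts) / (n - 1)
    if var <= mean:
        raise ValueError("no overdispersion: moment-based NB fit is undefined")
    r = mean ** 2 / (var - mean)  # NB size (dispersion) parameter
    p = r / (r + mean)            # NB success probability
    return sum(
        math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
        + r * math.log(p) + k * math.log(1.0 - p)
        for k in counts
    )


def bic(loglik, n_params, n_obs):
    """Bayesian information criterion; lower values indicate a preferred model."""
    return n_params * math.log(n_obs) - 2.0 * loglik


rng = random.Random(0)
k_on, k_off, rho, capture = 0.5, 2.0, 20.0, 0.3  # hypothetical values
effective_parameter = k_on + k_off  # (activation + inactivation) / degradation

counts = simulate_counts(rng, 2000, k_on, k_off, rho, capture)
bic_poisson = bic(poisson_loglik(counts), 1, len(counts))
bic_nb = bic(nb_moment_loglik(counts), 2, len(counts))

print("effective parameter:", effective_parameter)
print("BIC Poisson:", round(bic_poisson, 1), "BIC NB:", round(bic_nb, 1))
```

Repeating this over a grid of `k_on + k_off`, `capture`, and cell numbers, and adding a Beta-Poisson fit as a third candidate, would give a rough picture of where each model is preferred. Moment-based estimates are used here only to keep the sketch dependency-free.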