Beyond PAM50: Unsupervised Discovery of Anomalous Subgroups in Breast Cancer
gairola, t.
Abstract
Breast invasive carcinoma (BRCA) exhibits molecular heterogeneity that classifiers such as PAM50 do not fully capture. I applied an ensemble of four unsupervised anomaly detection algorithms (Isolation Forest, One-Class SVM, Local Outlier Factor, and an autoencoder) to expression profiles of ~13,400 genes from 1,218 TCGA-BRCA RNA-seq samples. This identified 41 High-Concordance Anomalies (HCAs), defined as samples flagged by three or more of the four methods. HCAs showed marked downregulation of ~1,750 genes, which were strongly enriched for immune-related pathways such as T-cell activation and cytokine signaling, indicating an immune-cold phenotype. In contrast, ~160 upregulated genes were associated with metal ion response, metabolism, and developmental programs. Over half of the HCAs were labeled PAM50_Unknown. Within the Basal-like subtype, a subset of HCAs (HCA-Basal, n=7) showed even stronger immune suppression, with 499 additional immune genes downregulated, defining an ultra-immune-cold variant. The genes upregulated in HCA-Basal lacked coherent pathway enrichment. HCA-Basal cases with survival data (n=5) showed a trend toward poorer prognosis, although this trend was not statistically significant. These findings reveal a distinct, immune-suppressed BRCA subgroup that current classifiers often miss, with potential relevance for risk assessment and treatment.
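The consensus step that defines an HCA (flagged by at least three of four detectors) can be illustrated with a minimal sketch in scikit-learn. The abstract does not report preprocessing, hyperparameters, contamination rates, or the autoencoder architecture. Everything below is therefore an assumption, including the z-scoring, `contamination=0.05`, `n_neighbors=20`, and the hypothetical helper name `high_concordance_anomalies`. The autoencoder is approximated by an `MLPRegressor` trained to reproduce its own input through a narrow hidden layer. The samples with the largest reconstruction error are then flagged. The demo uses random synthetic data, not TCGA expression values.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler
from sklearn.svm import OneClassSVM


def high_concordance_anomalies(X, contamination=0.05, min_votes=3, seed=0):
    """Hypothetical helper: flag rows of X (samples x genes) that at least
    `min_votes` of four detectors mark as outliers.

    `contamination` (assumed outlier fraction per method) and all other
    hyperparameters are illustrative assumptions, not the study's settings.
    """
    Xs = StandardScaler().fit_transform(X)  # z-score each gene (assumed preprocessing)
    n = Xs.shape[0]
    n_flag = max(1, int(round(contamination * n)))
    flags = []

    # 1. Isolation Forest: fit_predict returns -1 for outliers, +1 for inliers.
    iso = IsolationForest(contamination=contamination, random_state=seed)
    flags.append(iso.fit_predict(Xs) == -1)

    # 2. One-Class SVM: nu upper-bounds the fraction of training errors.
    ocsvm = OneClassSVM(kernel="rbf", gamma="scale", nu=contamination)
    flags.append(ocsvm.fit_predict(Xs) == -1)

    # 3. Local Outlier Factor: compares each sample's local density with its neighbors'.
    lof = LocalOutlierFactor(n_neighbors=20, contamination=contamination)
    flags.append(lof.fit_predict(Xs) == -1)

    # 4. Autoencoder stand-in (assumption): an MLP trained to reproduce its input
    #    through a 16-unit bottleneck; the n_flag worst-reconstructed samples are flagged.
    ae = MLPRegressor(hidden_layer_sizes=(64, 16, 64), max_iter=500, random_state=seed)
    ae.fit(Xs, Xs)
    recon_error = np.mean((ae.predict(Xs) - Xs) ** 2, axis=1)
    ae_flag = np.zeros(n, dtype=bool)
    ae_flag[np.argsort(recon_error)[-n_flag:]] = True
    flags.append(ae_flag)

    votes = np.sum(flags, axis=0)  # number of methods flagging each sample (0 to 4)
    return np.flatnonzero(votes >= min_votes), votes


# Synthetic demo (random data, not TCGA): 200 inliers plus 5 high-variance outliers.
rng = np.random.default_rng(42)
X_demo = np.vstack([rng.normal(0, 1, (200, 50)), rng.normal(0, 5, (5, 50))])
hca, votes = high_concordance_anomalies(X_demo)
print("HCA indices:", hca.tolist())
```

Requiring agreement from several detectors with different inductive biases trades some sensitivity for specificity. Tree-based isolation, a kernel boundary, local density, and reconstruction error each have their own false positives. A sample is therefore retained only when most of them agree.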