TweetyBERT: Automated parsing of birdsong through self-supervised machine learning.
Vengrovski, G.; Hulsey-Vincent, M. R.; Bemrose, M. A.; Gardner, T. J.
Abstract
Deep neural networks can be trained to parse animal vocalizations, identifying the units of communication and annotating sequences of vocalizations for subsequent statistical analysis. However, current methods rely on human-labeled data for training, and parsing animal vocalizations in a fully unsupervised manner remains an open problem. To address this challenge, we introduce TweetyBERT, a self-supervised transformer neural network developed for the analysis of birdsong. The model is trained to predict masked (hidden) fragments of audio and is never exposed to human supervision or labels. Applied to canary song, TweetyBERT autonomously learns the behavioral units of song, such as notes, syllables, and phrases, capturing intricate acoustic and temporal patterns. This approach of developing self-supervised models specifically tailored to animal communication will significantly accelerate the analysis of unlabeled vocal data.
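
The masked-prediction objective described in the abstract can be illustrated with a minimal sketch. This is not the TweetyBERT implementation: the spectrogram shape, the mask width, the number of masks, the zero fill value, and the mean-squared-error loss are all assumptions chosen for illustration, and the function names `mask_spectrogram` and `masked_mse` are hypothetical. The sketch only shows the general idea of hiding contiguous time spans of a spectrogram and scoring a reconstruction on the hidden frames alone, so that no labels are needed.

```python
import numpy as np


def mask_spectrogram(spec, mask_width, n_masks, rng):
    """Hide random contiguous time spans of a (freq, time) spectrogram.

    Returns the masked copy and a boolean vector marking hidden time frames.
    Mask width, count, and the zero fill value are illustrative assumptions.
    """
    n_freq, n_time = spec.shape
    mask = np.zeros(n_time, dtype=bool)
    for _ in range(n_masks):
        # Spans may overlap, so the total hidden length can be below n_masks * mask_width.
        start = rng.integers(0, n_time - mask_width + 1)
        mask[start:start + mask_width] = True
    masked = spec.copy()
    masked[:, mask] = 0.0
    return masked, mask


def masked_mse(pred, target, mask):
    """Mean squared error computed only over the hidden time frames."""
    diff = pred[:, mask] - target[:, mask]
    return float(np.mean(diff ** 2))


rng = np.random.default_rng(0)
spec = rng.random((64, 200))  # stand-in for a real song spectrogram (assumed shape)
masked, mask = mask_spectrogram(spec, mask_width=10, n_masks=3, rng=rng)

# A model would take `masked` as input and be trained to minimize this loss.
# Here the masked input itself stands in for an untrained prediction.
loss_untrained = masked_mse(masked, spec, mask)
loss_perfect = masked_mse(spec, spec, mask)
```

In a full training loop, a network would receive `masked` and its output would replace the stand-in prediction; restricting the loss to hidden frames is what forces the model to infer missing audio from its context.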