Bird-Species Audio Identification, Ensembling 1D + 2D Signals

EasyChair Preprint 6062
11 pages
Date: July 13, 2021

Abstract

This paper describes a method for recognizing bird species in audio recordings. We use two main models: (1) a binary classifier that predicts whether a bird call is present in the audio, and (2) a multiclass classifier that predicts which bird is present. Combining 1D and 2D signals gives strong results. We also experiment with ATDemucs, which extends Demucs by replacing the BiLSTM with self-attention. In the waveform domain, we first perform source separation of multiple birds together with noise separation, treated as universal source separation. We then classify each separated source using both a 1D waveform model (ReSEMulti with added self-attention) and a 2D spectrogram model. We also discuss a post-processing technique for handling the different decision thresholds of different models. Ensembling techniques such as voting and scaling gave a good boost to our results. Our combined architecture, using both 1D and 2D signals, achieves a micro-averaged F1 of 0.619 on a task that requires classifying 347 bird species.

Keyphrases: Attention Mechanism, Audio Source Detection, Bird Species Classification, Demucs, Efficient Net, Ensembling, Multi Domain Meta Training, Transfer Learning, deep learning, sound detection, spectrogram model
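The abstract mentions per-model thresholds, voting, and a binary bird-call detector, but does not give the exact procedure. The following is only a minimal sketch of how such pieces could fit together, not the authors' implementation. The function name `ensemble_vote`, its parameters, the threshold values, and the `min_votes` rule are all assumptions made for illustration.

```python
import numpy as np

def ensemble_vote(probs_per_model, thresholds, call_prob,
                  call_threshold=0.5, min_votes=2):
    """Hypothetical voting ensemble with one threshold per model.

    probs_per_model: list of (n_clips, n_species) probability arrays,
                     e.g. one from a 1D waveform model, one from a 2D
                     spectrogram model.
    thresholds:      one decision threshold per model (assumed values).
    call_prob:       (n_clips,) output of a binary bird-call detector.
    """
    # A model votes for a species when its probability clears
    # that model's own threshold.
    votes = sum((p >= t).astype(int)
                for p, t in zip(probs_per_model, thresholds))
    pred = votes >= min_votes
    # Gate: clips where the detector finds no bird call get no species.
    pred &= (np.asarray(call_prob) >= call_threshold)[:, None]
    return pred

# Toy example: 2 clips, 3 species, made-up probabilities.
probs_1d = np.array([[0.9, 0.2, 0.1], [0.8, 0.7, 0.1]])
probs_2d = np.array([[0.6, 0.4, 0.1], [0.7, 0.2, 0.1]])
pred = ensemble_vote([probs_1d, probs_2d], thresholds=[0.5, 0.3],
                     call_prob=[0.9, 0.2])
print(pred)
```

In this toy run the second clip is predicted empty because the assumed detector score (0.2) falls below the gate, even though both models agree on a species for it.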
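For readers unfamiliar with the reported metric, micro-averaged F1 pools true positives, false positives, and false negatives over all clips and species before computing a single score. The sketch below assumes binary multi-label decisions per clip; the abstract does not spell out the evaluation protocol, so the function `micro_f1` and the toy labels are illustrative only.

```python
import numpy as np

def micro_f1(y_true, y_pred):
    """Micro-averaged F1 over binary (n_clips, n_species) label matrices."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    # Counts are pooled across every clip and every species.
    tp = np.sum(y_true & y_pred)
    fp = np.sum(~y_true & y_pred)
    fn = np.sum(y_true & ~y_pred)
    denom = 2 * tp + fp + fn
    # Convention assumed here: score 0.0 when there is nothing to count.
    return float(2 * tp / denom) if denom else 0.0

# Toy labels: 2 clips, 3 species (tp=2, fp=1, fn=1).
y_true = [[1, 0, 0], [0, 1, 1]]
y_pred = [[1, 1, 0], [0, 1, 0]]
score = micro_f1(y_true, y_pred)
print(round(score, 3))
```

Because counts are pooled, frequent species influence the score more than rare ones, which is worth keeping in mind when reading a single figure over 347 classes.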