Towards the Automatic Classification of Avian Flight Calls for Bioacoustic Monitoring

View Article: PubMed Central - PubMed

ABSTRACT

Automatic classification of animal vocalizations has great potential to enhance the monitoring of species movements and behaviors. This is particularly true for monitoring nocturnal bird migration, where automated classification of migrants’ flight calls could yield new biological insights and conservation applications for birds that vocalize during migration. In this paper we investigate the automatic classification of bird species from flight calls, and in particular the relationship between two different problem formulations commonly found in the literature: classifying a short clip containing one of a fixed set of known species (N-class problem) and the continuous monitoring problem, the latter of which is relevant to migration monitoring. We implemented a state-of-the-art audio classification model based on unsupervised feature learning and evaluated it on three novel datasets, one for studying the N-class problem including over 5000 flight calls from 43 different species, and two realistic datasets for studying the monitoring scenario comprising hundreds of thousands of audio clips that were compiled by means of remote acoustic sensors deployed in the field during two migration seasons. We show that the model achieves high accuracy when classifying a clip to one of N known species, even for a large number of species. In contrast, the model does not perform as well in the continuous monitoring case. Through a detailed error analysis (that included full expert review of false positives and negatives) we show the model is confounded by varying background noise conditions and previously unseen vocalizations. We also show that the model needs to be parameterized and benchmarked differently for the continuous monitoring scenario. Finally, we show that despite the reduced performance, given the right conditions the model can still characterize the migration pattern of a specific species. The paper concludes with directions for future research.
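The distinction the abstract draws between the two problem formulations can be made concrete with a minimal sketch (an illustrative assumption, not the authors' implementation): a closed-set N-class classifier always emits one of the N known species, whereas continuous monitoring must also reject clips containing no target call, which is commonly handled with a score threshold.

```python
import numpy as np

def n_class_predict(scores):
    """Closed-set N-class case: always return the best of the N known species."""
    return int(np.argmax(scores))

def monitoring_predict(scores, threshold):
    """Continuous-monitoring case: return a species only if its score clears
    the threshold; otherwise report 'no call' (-1)."""
    best = int(np.argmax(scores))
    return best if scores[best] >= threshold else -1

# Weak scores, e.g. a clip of background noise with no flight call.
clip_scores = np.array([0.2, 0.1, 0.15])
print(n_class_predict(clip_scores))          # -> 0: forced to pick a species
print(monitoring_predict(clip_scores, 0.5))  # -> -1: correctly rejected
```

The threshold value and the `-1` "no call" label are illustrative; the point is that benchmarking the two formulations requires different parameterizations, as the paper argues.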

No MeSH data available.



pone.0166866.g004: Model sensitivity to hyper-parameter values for CLO-43SD. Each subplot displays the classification accuracy as a function of: (a) the duration of the TF-patches dpatch, (b) the size of the codebook k, (c) the set of summary statistics used in feature encoding fstat, and (d) the penalty parameter C used for training the Support Vector Machine classifier.

Mentions: Finally, we explored the sensitivity of the model to each hyper-parameter, displayed in Fig 4. The most influential parameter was dpatch (the duration of the TF-patch): using longer patches (and thus learning larger spectro-temporal structures) increased accuracy up to a patch duration of 46.4 ms. Beyond that point the patch spans most or all of the flight call, which proved detrimental to the model in this scenario. Interestingly, a small dictionary size of 128 was sufficient, and increasing k did not improve accuracy. This result stands in contrast to that observed for urban sounds in [34], possibly because flight calls of the same species vary less than more heterogeneous sounds such as sirens or jackhammers, which have more diverse sound production mechanisms and patterns. The model was also relatively robust to the choice of summary statistic and to the value of C (for C ≥ 10).
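The hyper-parameters discussed above (dpatch, k, fstat, C) fit together in an unsupervised feature-learning pipeline of the kind the paper builds on: learn a codebook from TF-patches, encode each clip against it, summarize with statistics, and train an SVM. The sketch below is a hedged toy illustration of that pipeline, not the authors' code; the synthetic "species", the tiny codebook size, and the helper names are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)

def extract_patches(spec, patch_frames):
    """Slide a window of patch_frames columns over a spectrogram and
    return each TF-patch flattened to a vector (patch_frames ~ dpatch)."""
    n_bins, n_frames = spec.shape
    return np.stack([spec[:, t:t + patch_frames].ravel()
                     for t in range(n_frames - patch_frames + 1)])

def encode(spec, codebook, patch_frames, stats=(np.mean, np.max)):
    """Encode a clip: score every TF-patch against each codeword, then
    pool the per-codeword activations with summary statistics (fstat)."""
    patches = extract_patches(spec, patch_frames)
    dists = ((patches[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    acts = -dists  # higher activation = closer to the codeword
    return np.concatenate([f(acts, axis=0) for f in stats])

# Toy data: two synthetic "species" with energy at different frequency bins.
def toy_clip(peak_bin):
    spec = rng.normal(0.0, 0.1, (20, 30))
    spec[peak_bin] += 1.0
    return spec

clips = [toy_clip(3) for _ in range(20)] + [toy_clip(15) for _ in range(20)]
labels = [0] * 20 + [1] * 20

# Learn the codebook from pooled patches (k = 128 in the paper; tiny here).
all_patches = np.vstack([extract_patches(c, 4) for c in clips])
codebook = KMeans(n_clusters=8, n_init=4,
                  random_state=0).fit(all_patches).cluster_centers_

# Encode every clip and train the SVM (C corresponds to the paper's C).
X = np.stack([encode(c, codebook, 4) for c in clips])
clf = LinearSVC(C=10.0).fit(X, labels)
print(clf.score(X, labels))
```

The pooling functions in `stats` play the role of fstat: swapping in or adding other statistics (e.g. `np.std`) changes the encoding dimensionality, which is why the paper reports sensitivity to that choice separately from k and C.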

