
MegaDetector-Acoustic

MegaDetector-Acoustic provides dataset preparation, training, and inference tools for audio classification in wildlife monitoring. The module is maintained at microsoft/MegaDetector-Acoustic and builds on the core APIs in PytorchWildlife.data.bioacoustics and PytorchWildlife.models.bioacoustics.

What's included

  • CLI scripts for dataset preparation (prepare_dataset.py), training (train.py), and inference (inference.py)
  • ResNetClassifier — PyTorch Lightning module for spectrogram classification (binary and multiclass)
  • Mel-spectrogram preprocessing with optional GPU acceleration
  • Annotation readers (COCO-like JSON), including support for the PteroSet / Raven Pro format
  • MD_AudioBirds_V1 — a pre-trained bird classifier distributed as ONNX for direct inference
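To illustrate what mel-spectrogram preprocessing computes, here is a minimal NumPy sketch. It is not the module's own code: the HTK mel formula, the parameter defaults (n_fft=1024, hop_length=256, n_mels=64) and the function names hz_to_mel, mel_to_hz, mel_filterbank and log_mel_spectrogram are assumptions made for this example, and it runs on the CPU only. The audio is framed with a Hann window, turned into a power spectrum, projected onto triangular mel filters, and log-compressed to decibels.

```python
import numpy as np

# Illustrative sketch only: not the MegaDetector-Acoustic preprocessing code.

def hz_to_mel(f):
    # HTK-style mel scale (an assumption; other mel formulas exist)
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels, fmin=0.0, fmax=None):
    fmax = sr / 2 if fmax is None else fmax
    # n_mels + 2 points evenly spaced on the mel scale define n_mels triangles
    hz_points = mel_to_hz(np.linspace(hz_to_mel(fmin), hz_to_mel(fmax), n_mels + 2))
    fft_freqs = np.linspace(0.0, sr / 2, n_fft // 2 + 1)
    fb = np.zeros((n_mels, fft_freqs.size))
    for i in range(n_mels):
        left, center, right = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))  # triangle peaking at 1
    return fb

def log_mel_spectrogram(y, sr, n_fft=1024, hop_length=256, n_mels=64, eps=1e-10):
    window = np.hanning(n_fft)
    # pad so that clips shorter than one window still yield one frame
    if y.size < n_fft:
        y = np.pad(y, (0, n_fft - y.size))
    n_frames = 1 + (y.size - n_fft) // hop_length
    frames = np.stack([y[t * hop_length: t * hop_length + n_fft] * window
                       for t in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2  # (n_frames, n_fft//2 + 1)
    mel = mel_filterbank(sr, n_fft, n_mels) @ power.T          # (n_mels, n_frames)
    return 10.0 * np.log10(mel + eps)                           # power in dB

# Usage: one second of a 3 kHz tone
sr = 22050
y = np.sin(2 * np.pi * 3000.0 * np.arange(sr) / sr).astype(np.float32)
S = log_mel_spectrogram(y, sr)
print(S.shape)  # (64, 83)
```

For real recordings, a tested implementation such as librosa.feature.melspectrogram (librosa is among the install dependencies) is preferable; librosa uses the Slaney mel scale by default rather than HTK, so its values differ slightly from this sketch.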

See the MegaDetector-Acoustic model zoo for the released models.

Demo

The end-to-end notebook at microsoft/MegaDetector-Acoustic walks through:

  1. Data exploration — annotation counts, species distribution
  2. Inference — run MD_AudioBirds_V1 on real recordings, visualise predictions vs. ground truth
  3. Training — build COCO-style annotations, binary classification (target vs. noise), multiclass classification

It uses recordings from the PteroSet dataset.
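The data-exploration and label-building steps can be sketched on a small COCO-like annotation file. The schema here (the audio, categories and annotations keys, start_time/end_time in seconds, and a category named noise), the helper names and all records are hypothetical placeholders, not the format produced by prepare_dataset.py. The helpers show one way to count annotations per species and to derive binary (target vs. noise) and multiclass labels.

```python
import json
from collections import Counter

# Hypothetical COCO-like annotation file: keys and records are illustrative only.
coco_json = """
{
  "audio": [{"id": 1, "file_name": "rec_001.wav"}, {"id": 2, "file_name": "rec_002.wav"}],
  "categories": [{"id": 0, "name": "noise"}, {"id": 1, "name": "species_a"},
                 {"id": 2, "name": "species_b"}],
  "annotations": [
    {"id": 10, "audio_id": 1, "category_id": 1, "start_time": 0.5, "end_time": 1.7},
    {"id": 11, "audio_id": 1, "category_id": 0, "start_time": 2.0, "end_time": 5.0},
    {"id": 12, "audio_id": 2, "category_id": 2, "start_time": 3.1, "end_time": 3.9},
    {"id": 13, "audio_id": 2, "category_id": 1, "start_time": 7.0, "end_time": 8.2}
  ]
}
"""

def species_counts(coco):
    # number of annotated segments per category name
    names = {c["id"]: c["name"] for c in coco["categories"]}
    return Counter(names[a["category_id"]] for a in coco["annotations"])

def binary_labels(coco, noise_name="noise"):
    # 1 = target (any non-noise category), 0 = noise; keyed by annotation id
    noise_ids = {c["id"] for c in coco["categories"] if c["name"] == noise_name}
    return {a["id"]: int(a["category_id"] not in noise_ids) for a in coco["annotations"]}

def multiclass_labels(coco, noise_name="noise"):
    # contiguous class indices for non-noise categories, ordered by category id;
    # noise segments are dropped from the multiclass task
    classes = sorted((c for c in coco["categories"] if c["name"] != noise_name),
                     key=lambda c: c["id"])
    index = {c["id"]: i for i, c in enumerate(classes)}
    labels = {a["id"]: index[a["category_id"]]
              for a in coco["annotations"] if a["category_id"] in index}
    return [c["name"] for c in classes], labels

coco = json.loads(coco_json)
print(species_counts(coco))     # Counter({'species_a': 2, 'noise': 1, 'species_b': 1})
print(binary_labels(coco))      # {10: 1, 11: 0, 12: 1, 13: 1}
print(multiclass_labels(coco))  # (['species_a', 'species_b'], {10: 0, 12: 1, 13: 0})
```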

Projects using this module

  • PteroSet — Machine-learning pipeline for detecting and classifying tropical bird vocalisations from passive acoustic monitoring, with leave-one-project-out cross-validation.
  • CookInlet_Belugas — Passive acoustic monitoring for endangered Cook Inlet beluga whales. A two-stage pipeline covering cetacean signal detection and multi-species classification (beluga, humpback, killer whale), plus an active-learning loop for domain adaptation.
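Leave-one-project-out cross-validation, as used in PteroSet, holds out every clip from one project per fold, so each model is evaluated on a project it never saw during training. The sketch below assumes each clip carries a project label; the function name leave_one_project_out and the example labels are hypothetical and not taken from the PteroSet code.

```python
from typing import Iterator, List, Sequence, Tuple

def leave_one_project_out(
    projects: Sequence[str],
) -> Iterator[Tuple[str, List[int], List[int]]]:
    """Yield (held_out_project, train_indices, test_indices), one fold per project."""
    for held_out in sorted(set(projects)):
        # every clip from the held-out project goes to test, all others to train
        train = [i for i, p in enumerate(projects) if p != held_out]
        test = [i for i, p in enumerate(projects) if p == held_out]
        yield held_out, train, test

# Usage: one (hypothetical) project label per clip
projects = ["proj_a", "proj_a", "proj_b", "proj_c", "proj_b"]
for held_out, train_idx, test_idx in leave_one_project_out(projects):
    print(held_out, train_idx, test_idx)
# proj_a [2, 3, 4] [0, 1]
# proj_b [0, 1, 3] [2, 4]
# proj_c [0, 1, 2, 4] [3]
```

scikit-learn's LeaveOneGroupOut produces the same splits when the project labels are passed as groups.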

Install

pip install PytorchWildlife
pip install librosa soundfile pyyaml torchmetrics

See the MegaDetector-Acoustic README for full configuration options, training arguments, and output formats.