This repository contains scripts for preparing training data for orca detection models.
Bootstrap-only generation scripts and archived CSV inputs now live under bootstrap/.
The ongoing sample CSVs are:
- output/csv/training_3s_samples.csv
- output/csv/testing_60s_samples.csv
Both files can be updated manually by editing rows directly, or via scripts (for example
add_samples.py, process_false_positives.py, and process_false_negatives.py).
The active scripts in src include:
- download_wavs.py: Uses output/csv/training_3s_samples.csv and output/csv/testing_60s_samples.csv to download wav files.
- make_spectrograms.py: Creates a png file for each wav file in a subdirectory of output/png.
- train_podsai_model.py: Trains a PODS-AI model on the generated training samples.
- compare_models.py: Evaluates models on output/csv/testing_60s_samples.csv.
flowchart TD;
podsaiModel[(HuggingFace davethaler/whale-call-detector)];
orcaHelloModel[(HuggingFace orcasound/orcahello-srkw-detector-v1)];
trainingSamples@{ shape: doc, label: "training_3s_samples.csv" };
testingSamples@{ shape: doc, label: "testing_60s_samples.csv" };
wav@{ shape: docs, label: "wav/*" };
testingWav@{ shape: docs, label: "testing-wav/*" };
concatenated@{ shape: docs, label: "concatenated.wav" };
png@{ shape: docs, label: "png/*" };
downloadWavs@{ shape: rect, label: "download_wavs.py" };
trainPodsaiModel@{ shape: rect, label: "train_podsai_model.py" };
compareModels@{ shape: rect, label: "compare_models.py" };
concatenateWavs@{ shape: rect, label: "concatenate_wavs.py" };
makeSpectrograms@{ shape: rect, label: "make_spectrograms.py" };
trainingSamples-->downloadWavs-->wav;
testingSamples-->downloadWavs-->testingWav;
wav-->trainPodsaiModel-->podsaiModel;
podsaiModel-->compareModels;
testingSamples-->compareModels;
testingWav-->compareModels;
orcaHelloModel-->compareModels;
wav-->concatenateWavs-->concatenated;
wav-->makeSpectrograms-->png;
Install dependencies:
pip install -r requirements.txt
Key dependencies:
- boto3: For accessing S3 audio files
- ffmpeg-python: For audio processing
- librosa>=0.10.0: For audio analysis
- m3u8: For HLS stream parsing
- pytz: For timezone handling
- fastai==1.0.61: For FastAI model support
- torch>=2.1.0: PyTorch deep learning framework
- torchvision>=0.16.0: Computer vision models and utilities
- torchaudio>=2.1.0: Audio processing for PyTorch
- soundfile: Audio file I/O
- fastai_audio: FastAI audio extensions (from GitHub)
- pandas, pydub: Data processing and audio manipulation
- spectrogram_visualizer.py: Adapted from aifororcas-livesystem
- model_inference.py: Provides model inference interface for scoring audio samples
- orcasite_feeds.py: Lightweight module providing the OrcasiteFeed dataclass and get_orcasite_feeds() helper. Depends only on requests, not azure-cosmos, so scripts that only need the feeds REST API (e.g. add_samples.py) can import it without pulling in the full make_csv dependency tree.
- add_samples.py: Splits a WAV file into 3-second segments (2-second hop), saves each segment to a new/ directory using the standard filename convention, and prints the predicted class for each segment. Useful for labelling new recordings and adding them to the training set. See add_samples.py below.
- concatenate_wavs.py: Concatenates WAV files in a directory into a single output file, adding a short beep between clips to make quick listen-through review easier. See concatenate_wavs.py below.
- process_false_positives.py: Re-checks rejected OrcaHello detections by downloading the 60-second WAV, re-running PODS-AI, and appending whale-class sub-segments with corrected classes to output/csv/training_3s_samples.csv. The corrected class is inferred from the human-authored portion of the moderation comments (auto-generated "AI: …" lines are ignored). Explicit negations in the comments are understood: "No humpback" suppresses the humpback match, and "No humpback nor vessel" resolves the corrected class to water. Supports --category CATEGORY to process only detections whose inferred actual category matches the provided value.
- process_false_negatives.py: Re-checks confirmed OrcaHello detections by downloading the 60-second WAV, re-running PODS-AI and OrcaHello segment inference, and appending segments where OrcaHello predicts resident but PODS-AI does not to output/csv/training_3s_samples.csv with corrected class resident. Supports --category CATEGORY to process only detections whose PODS-AI predicted category matches the provided value.
- run_inference.py: Runs a model on a wav file and prints the global prediction, confidence, and per-class probabilities.
- LiveInferenceOrchestrator.py: Runs live/date-range HLS inference with the multiclass PODS-AI model and can upload positive detections (resident/transient/humpback) to Azure Blob Storage and Cosmos DB.
- compare_models.py: Evaluates and compares fastai, orcahello, podsai (AST), and oldpodsai (Wav2Vec2) models on the test set loaded from output/csv/testing_60s_samples.csv (with WAV files downloaded by download_wavs.py). Reports correct identifications, false positives, false negatives, and average prediction time for each model.
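The negation handling described for process_false_positives.py can be illustrated with a small sketch. This is not the script's actual code: the keyword list, the regular expression, the function name `infer_corrected_class`, and the fallback to water when nothing else matches are all assumptions made for illustration.

```python
import re

# Hypothetical keyword list; the real script may recognise other terms.
CATEGORIES = ("resident", "transient", "humpback", "vessel", "human", "jingle")

# Matches phrases such as "no humpback" or "no humpback nor vessel".
NEGATION = re.compile(r"\bno\s+(\w+(?:\s+(?:nor|or)\s+\w+)*)", re.IGNORECASE)


def infer_corrected_class(comments: str, default: str = "water") -> str:
    """Guess a corrected class from moderator comments (illustrative only)."""
    # Ignore auto-generated lines that start with "AI:".
    human_lines = [
        line for line in comments.splitlines()
        if not line.strip().startswith("AI:")
    ]
    text = " ".join(human_lines).lower()

    # Collect every term that appears inside a negation phrase.
    negated = set()
    for match in NEGATION.finditer(text):
        negated.update(re.split(r"\s+(?:nor|or)\s+", match.group(1)))

    # Search only the text left after removing the negation phrases.
    remaining = NEGATION.sub(" ", text)
    for category in CATEGORIES:
        if category in negated:
            continue
        if re.search(rf"\b{category}\b", remaining):
            return category
    return default  # assumption: nothing positive left means water
```

With this sketch, `infer_corrected_class("No humpback nor vessel")` falls through to the default, while a comment such as "No humpback, just a vessel" yields vessel.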
Split a WAV recording into 3-second segments (with a 2-second hop — the same settings
used by run_inference.py), save each segment to a new/ directory using the standard
filename convention, and print the predicted class for each segment. The timestamp
encoded in each filename reflects the actual start time of that sample inside the
original recording.
Output files follow the same naming convention as output/wav/humpback/ etc.:
{node_name_with_hyphens}_{YYYY_MM_DD_HH_MM_SS_PST}.wav
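As a rough illustration of how segment start times map to filenames, the sketch below generates names for a recording of a given length. The function name `segment_filenames` is hypothetical, and the sketch assumes only complete 3-second windows are kept; the actual script may treat a trailing partial window differently.

```python
from datetime import datetime, timedelta

SEGMENT_SECONDS = 3  # window length described above
HOP_SECONDS = 2      # hop between consecutive window starts
TIMESTAMP_FORMAT = "%Y_%m_%d_%H_%M_%S_PST"


def segment_filenames(node_name, timestamp, duration_seconds):
    """Return one filename per full window (assumption: partial windows dropped)."""
    start = datetime.strptime(timestamp, TIMESTAMP_FORMAT)
    file_node = node_name.replace("_", "-")  # underscores become hyphens
    names = []
    offset = 0
    while offset + SEGMENT_SECONDS <= duration_seconds:
        segment_start = start + timedelta(seconds=offset)
        names.append(f"{file_node}_{segment_start.strftime(TIMESTAMP_FORMAT)}.wav")
        offset += HOP_SECONDS
    return names


if __name__ == "__main__":
    for name in segment_filenames("rpi_orcasound_lab", "2025_01_15_12_30_00_PST", 7):
        print(name)
```

For a 7-second recording this produces windows starting at 0, 2, and 4 seconds, so the encoded timestamps advance in 2-second steps.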
Inference always uses the PODS-AI (podsai) model type. The default model is
davethaler/whale-call-detector on HuggingFace Hub; override with --model-path.
If --node-name and --timestamp are omitted, the script infers them from the input
filename. The filename must follow the same convention:
{node_name_with_hyphens}_{YYYY_MM_DD_HH_MM_SS_PST}.wav
(e.g. rpi-orcasound-lab_2025_12_17_22_34_03_PST.wav → node rpi_orcasound_lab,
timestamp 2025_12_17_22_34_03_PST).
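The inference from filename to node name and timestamp could look roughly like the following. The regular expression and the helper name `parse_sample_filename` are assumptions for illustration and may not match how add_samples.py does it.

```python
import re
from pathlib import Path

# {node_name_with_hyphens}_{YYYY_MM_DD_HH_MM_SS_PST}.wav
FILENAME_PATTERN = re.compile(
    r"^(?P<node>.+)_(?P<timestamp>\d{4}_\d{2}_\d{2}_\d{2}_\d{2}_\d{2}_PST)\.wav$"
)


def parse_sample_filename(path):
    """Return (node_name, timestamp) parsed from a conventionally named file."""
    match = FILENAME_PATTERN.match(Path(path).name)
    if match is None:
        raise ValueError(f"Filename does not follow the convention: {path}")
    # Hyphens in the filename correspond to underscores in the node name.
    node_name = match.group("node").replace("-", "_")
    return node_name, match.group("timestamp")


if __name__ == "__main__":
    print(parse_sample_filename("rpi-orcasound-lab_2025_12_17_22_34_03_PST.wav"))
```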
After reviewing the predictions you can move the segments into the appropriate
output/wav/<category>/ directory to add them to the training set.
usage: python add_samples.py <wav_file> [--node-name NAME] [--timestamp TIMESTAMP]
[--output-dir DIR] [--model-path PATH] [--uri URI]
| Argument | Description |
|---|---|
| wav_file | Path to the input WAV file to segment |
| --node-name | Hydrophone node name (e.g. rpi_orcasound_lab). Underscores are replaced with hyphens in output filenames. Inferred from the input filename if omitted. |
| --timestamp | PST timestamp of the start of the recording (e.g. 2025_01_15_12_30_00_PST). Inferred from the input filename if omitted. |
| --output-dir | Directory to save segments (default: new) |
| --model-path | HuggingFace Hub model ID or path to a local podsai model directory (default: davethaler/whale-call-detector) |
| --uri | Optional custom URI to use for all segments. If provided, all output rows will use this URI instead of generating one per segment. Useful when all segments come from the same detection. |
Example — node name and timestamp inferred from filename
cd src
python add_samples.py rpi-orcasound-lab_2025_01_15_12_30_00_PST.wav
Example - explicit node name and timestamp with custom model
cd src
python add_samples.py /path/to/recording.wav \
--node-name rpi_orcasound_lab \
--timestamp 2025_01_15_12_30_00_PST \
--model-path /path/to/local-model
Example - use custom URI for all segments
When all segments come from the same detection event, you can specify a single URI to use for all output rows:
cd src
python add_samples.py /path/to/recording.wav \
--node-name rpi_orcasound_lab \
--timestamp 2025_01_15_12_30_00_PST \
--uri "https://live.orcasound.net/bouts/new/rpi_orcasound_lab?time=2025-01-15T20%3A30%3A00.000Z"
Output:
Saved: new/rpi-orcasound-lab_2025_01_15_12_30_00_PST.wav
Saved: new/rpi-orcasound-lab_2025_01_15_12_30_02_PST.wav
Saved: new/rpi-orcasound-lab_2025_01_15_12_30_04_PST.wav
...
Loading podsai model from /path/to/local-model...
Segment predictions:
rpi-orcasound-lab_2025_01_15_12_30_00_PST.wav: water
rpi-orcasound-lab_2025_01_15_12_30_02_PST.wav: resident
rpi-orcasound-lab_2025_01_15_12_30_04_PST.wav: resident
...
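The --uri example above pairs a PST timestamp of 12:30:00 with a UTC time parameter of 20:30:00. A URI of that shape could be derived from a filename timestamp as sketched below. The helper name `bout_uri` is hypothetical, and treating PST as a fixed UTC-8 offset is an assumption; the project may handle daylight saving time differently.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import quote

# Assumption: "PST" is treated as a fixed UTC-8 offset in this sketch.
PST = timezone(timedelta(hours=-8))


def bout_uri(node_name, pst_timestamp):
    """Build a live.orcasound.net bout URI from a PST filename timestamp."""
    local = datetime.strptime(pst_timestamp, "%Y_%m_%d_%H_%M_%S_PST").replace(tzinfo=PST)
    utc = local.astimezone(timezone.utc)
    # Percent-encode the colons in the ISO 8601 time string.
    time_param = quote(utc.strftime("%Y-%m-%dT%H:%M:%S.000Z"), safe="")
    return f"https://live.orcasound.net/bouts/new/{node_name}?time={time_param}"


if __name__ == "__main__":
    print(bout_uri("rpi_orcasound_lab", "2025_01_15_12_30_00_PST"))
```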
Concatenate all WAV files in a directory into a single WAV file with a short beep between clips.
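The idea can be sketched with only the Python standard library. This is not the implementation in concatenate_wavs.py: the beep length, frequency, and amplitude are illustrative guesses, and the sketch only handles 16-bit PCM clips that share one sample rate and channel count.

```python
import math
import struct
import wave
from pathlib import Path


def make_beep(framerate, channels, duration_s=0.2, freq_hz=1000.0, amplitude=0.3):
    """Return 16-bit PCM frames for a short sine-wave beep (values are guesses)."""
    frames = bytearray()
    for i in range(int(framerate * duration_s)):
        sample = int(amplitude * 32767 * math.sin(2 * math.pi * freq_hz * i / framerate))
        frames += struct.pack("<h", sample) * channels  # same sample on every channel
    return bytes(frames)


def concatenate_wavs(directory, output_path):
    """Write all *.wav files in directory to output_path, with a beep between clips."""
    output = Path(output_path).resolve()
    paths = sorted(p for p in Path(directory).glob("*.wav") if p.resolve() != output)
    if not paths:
        raise ValueError(f"No WAV files found in {directory}")
    with wave.open(str(paths[0]), "rb") as first:
        fmt = (first.getnchannels(), first.getsampwidth(), first.getframerate())
    if fmt[1] != 2:
        raise ValueError("This sketch only handles 16-bit PCM")
    beep = make_beep(framerate=fmt[2], channels=fmt[0])
    with wave.open(str(output), "wb") as out:
        out.setnchannels(fmt[0])
        out.setsampwidth(fmt[1])
        out.setframerate(fmt[2])
        for index, path in enumerate(paths):
            with wave.open(str(path), "rb") as clip:
                clip_fmt = (clip.getnchannels(), clip.getsampwidth(), clip.getframerate())
                if clip_fmt != fmt:
                    raise ValueError(f"{path} has a different audio format")
                if index > 0:
                    out.writeframes(beep)  # separator before every clip but the first
                out.writeframes(clip.readframes(clip.getnframes()))
    return len(paths)
```

Sorting the input paths keeps the listening order predictable, which matters when reviewing clips against their filenames.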
cd src
python concatenate_wavs.py <directory> [--output OUTPUT_FILENAME]
Example:
cd src
python concatenate_wavs.py ../output/wav/resident --output concatenated.wav
Run model inference on a wav file and display the global prediction, confidence score,
and per-class probabilities. For PODS-AI models the per-class probability is the
mean of all local_confidence values (from windows predicting that class) that exceed
the model's threshold, the same statistic used for global_confidence. For the FastAI
binary model, resident = global_confidence and other = 1 - global_confidence.
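The per-class statistic described above can be written out as a short sketch. The function names and the input shape (a list of predicted class and local_confidence pairs) are assumptions for illustration, as is reporting 0.0 for a class with no qualifying windows.

```python
from statistics import mean

# Class names as they appear in the run_inference.py example output.
PODSAI_CLASSES = ("humpback", "human", "jingle", "resident", "transient", "vessel", "water")


def per_class_probabilities(windows, threshold, classes=PODSAI_CLASSES):
    """Mean local_confidence per class over windows that exceed the threshold.

    windows: iterable of (predicted_class, local_confidence) pairs.
    """
    windows = list(windows)
    probabilities = {}
    for cls in classes:
        kept = [conf for predicted, conf in windows if predicted == cls and conf > threshold]
        # Assumption: a class with no qualifying windows is reported as 0.0.
        probabilities[cls] = mean(kept) if kept else 0.0
    return probabilities


def binary_probabilities(global_confidence):
    """FastAI binary case: resident = global_confidence, other = 1 - global_confidence."""
    return {"other": 1.0 - global_confidence, "resident": global_confidence}


if __name__ == "__main__":
    sample = [("resident", 0.8), ("resident", 0.6), ("resident", 0.3), ("water", 0.2)]
    for cls, prob in per_class_probabilities(sample, threshold=0.5).items():
        print(f"{cls}: {prob:.4f}")
```

In the sample, the resident window at 0.3 and the water window at 0.2 fall below the threshold, so only the two stronger resident windows contribute.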
usage: python run_inference.py [wav_file]
[--node-name NODE_NAME]
[--end-timestamp-str YYYY_MM_DD_HH_MM_SS_PST | --start-timestamp-utc YYYY-MM-DDTHH:MM:SSZ]
[--model {podsai,fastai,orcahello}] [--type {ast,wav2vec2}] [--model-path PATH]
| Argument | Description |
|---|---|
| wav_file | Path to the wav file to score |
| --node-name | Hydrophone feed node name (for download mode) |
| --end-timestamp-str | PST end timestamp used with --node-name (format: YYYY_MM_DD_HH_MM_SS_PST) |
| --start-timestamp-utc | UTC start timestamp used with --node-name (format: YYYY-MM-DDTHH:MM:SSZ) |
| --model | Model type: podsai (default), fastai, or orcahello |
| --type | PODS-AI model variant used with --model podsai: ast (default) or wav2vec2 (older model variant). These map to the currently pinned revisions in src/run_inference.py |
| --model-path | Path to model directory or HuggingFace Hub model ID. Defaults to davethaler/whale-call-detector for podsai, ./model for fastai, and orcasound/orcahello-srkw-detector-v1 for orcahello |
When using --node-name, provide exactly one timestamp argument:
--end-timestamp-str or --start-timestamp-utc.
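One way to enforce the rule above is an argparse mutually exclusive group plus an explicit check. The sketch below is illustrative and covers only the arguments involved; it is not taken from run_inference.py, and the error message wording is made up.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(prog="run_inference.py")
    parser.add_argument("wav_file", nargs="?")
    parser.add_argument("--node-name")
    # At most one of the two timestamp arguments may be given.
    timestamps = parser.add_mutually_exclusive_group()
    timestamps.add_argument("--end-timestamp-str")
    timestamps.add_argument("--start-timestamp-utc")
    return parser


def parse_args(argv=None):
    parser = build_parser()
    args = parser.parse_args(argv)
    has_timestamp = bool(args.end_timestamp_str or args.start_timestamp_utc)
    # Download mode needs a timestamp to go with the node name.
    if args.node_name and not has_timestamp:
        parser.error("--node-name requires --end-timestamp-str or --start-timestamp-utc")
    return args
```

Calling `parse_args(["--node-name", "rpi_orcasound_lab"])` exits with a usage error, as does passing both timestamp arguments together.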
Example — PODS-AI model
cd src
python run_inference.py sample.wav --model podsai
Output:
Model type: podsai
Global prediction: resident (confidence: 0.7000)
Prediction time: 1.23s
Per-class probabilities:
humpback: 0.0000
human: 0.0000
jingle: 0.0000
resident: 0.7000
transient: 0.0000
vessel: 0.0000
water: 0.0000
Example — FastAI model
cd src
python run_inference.py sample.wav --model fastai --model-path ../model
Output:
Model type: fastai
Global prediction: resident (confidence: 0.7500)
Prediction time: 0.85s
Per-class probabilities:
other: 0.2500
resident: 0.7500
Example — OrcaHello SRKW Detector
Uses the orcasound/orcahello-srkw-detector-v1
model from HuggingFace Hub. This is a binary SRKW (Southern Resident Killer Whale) detector
based on the new OrcaHello inference pipeline (ResNet50 + mel spectrograms, no fastai_audio dependency).
The model implementation is loaded from the orcasound/orcahello submodule. Initialize it first:
git submodule update --init external/orcahello
Then run inference:
cd src
python run_inference.py sample.wav --model orcahello
Output:
Model type: orcahello
Global prediction: resident (confidence: 0.8000)
Prediction time: 0.92s
Per-class probabilities:
other: 0.2000
resident: 0.8000
You can compare results between models by running each on the same file and comparing the output.
Evaluate and compare fastai, orcahello, podsai (AST), and oldpodsai (Wav2Vec2) models on the same test set of
60-second audio samples. Loads the test set directly from output/csv/testing_60s_samples.csv, then runs each enabled model on the
corresponding WAV files under output/testing-wav/
(downloaded by download_wavs.py), and reports a summary table
with correct identifications, whale-class F1, per-whale-class false positive/false negative rates,
and average prediction time.
Evaluation uses model-specific correctness plus per-whale-class error counts:
- Correct – for fastai and orcahello, model predicted "resident" (SRKW) when the label is resident, or anything other than resident when the label is not resident; for oldpodsai and podsai, the predicted category exactly matches the label.
- F1 – macro F1 over the whale classes humpback, resident, and transient that are present in the evaluated samples.
- R/T/H false positive – model predicted resident, transient, or humpback when the correct label was a different class.
- R/T/H false negative – the correct label was resident, transient, or humpback, but the model predicted a different class. Because fastai and orcahello are binary resident-vs-other models, their transient/humpback FP% values stay at 0.0% and their transient/humpback FN% values are 100.0% whenever those classes are present.

compare_models.py evaluates end-to-end 60-second WAV inference from output/testing-wav, so its results will differ from the training workflow's held-out evaluation metrics, which score the model directly on the trainer's test split.
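The definitions above can be checked with a small sketch that works on (label, prediction) pairs. The function names are hypothetical and this is not the code in compare_models.py; the sample data is the fastai confusion matrix from the example output in this README.

```python
WHALE_CLASSES = ("humpback", "resident", "transient")


def is_correct(label, prediction, binary):
    """Binary models are scored resident-vs-other; others need an exact match."""
    if binary:
        return (label == "resident") == (prediction == "resident")
    return label == prediction


def class_rates(pairs, cls):
    """Return (FP rate, FN rate) for cls, or None where no samples apply."""
    negatives = [pred for label, pred in pairs if label != cls]
    positives = [pred for label, pred in pairs if label == cls]
    fp = sum(p == cls for p in negatives) / len(negatives) if negatives else None
    fn = sum(p != cls for p in positives) / len(positives) if positives else None
    return fp, fn


def macro_f1(pairs, classes=WHALE_CLASSES):
    """Macro F1 over the whale classes that are present in the labels."""
    scores = []
    for cls in classes:
        support = sum(label == cls for label, _ in pairs)
        if support == 0:
            continue  # class absent from the evaluated samples
        tp = sum(label == cls and pred == cls for label, pred in pairs)
        predicted = sum(pred == cls for _, pred in pairs)
        precision = tp / predicted if predicted else 0.0
        recall = tp / support
        total = precision + recall
        scores.append(2 * precision * recall / total if total else 0.0)
    return sum(scores) / len(scores) if scores else 0.0


# fastai confusion matrix from this README: label -> (predicted other, predicted resident)
FASTAI_MATRIX = {
    "human": (6, 4), "humpback": (10, 8), "jingle": (7, 0), "resident": (29, 23),
    "transient": (4, 26), "vessel": (2, 5), "water": (0, 10),
}
FASTAI_PAIRS = [
    (label, pred)
    for label, (n_other, n_resident) in FASTAI_MATRIX.items()
    for pred, count in (("other", n_other), ("resident", n_resident))
    for _ in range(count)
]

if __name__ == "__main__":
    correct = sum(is_correct(l, p, binary=True) for l, p in FASTAI_PAIRS)
    rfp, rfn = class_rates(FASTAI_PAIRS, "resident")
    print(correct, f"{rfp:.1%}", f"{rfn:.1%}", f"{macro_f1(FASTAI_PAIRS):.3f}")
```

Run on that matrix, the sketch reproduces the fastai summary row: 52 correct, 64.6% resident FP, 55.8% resident FN, and a macro F1 of 0.120.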
usage: python compare_models.py [--testing-csv PATH] [--max-samples N]
[--wav-dir PATH] [--models MODEL_LIST]
[--fastai-model-path PATH]
[--orcahello-model-path PATH]
[--podsai-model-path PATH]
[--category CATEGORY]
| Argument | Description |
|---|---|
| --testing-csv | Path to testing_60s_samples.csv (default: output/csv/testing_60s_samples.csv) |
| --max-samples | Maximum number of test samples to process. If not specified, all samples are processed |
| --wav-dir | Root directory of testing WAV files (default: output/testing-wav) |
| --models | Comma-separated list of models to evaluate (default: fastai,orcahello,podsai,oldpodsai) |
| --fastai-model-path | Path to FastAI model directory. Defaults to model when not specified |
| --orcahello-model-path | HuggingFace Hub ID or path for OrcaHello model. Defaults to orcasound/orcahello-srkw-detector-v1 when not specified |
| --podsai-model-path | Path or Hub ID for PODS-AI model. Used by both podsai (AST) and oldpodsai (Wav2Vec2). Defaults to davethaler/whale-call-detector when not specified |
| --category | Only evaluate samples from this category (e.g. resident, humpback, water). If not specified, all categories are evaluated |
Example — compare all four models
python src/compare_models.py \
--models fastai,orcahello,podsai,oldpodsai \
--fastai-model-path model \
--podsai-model-path /path/to/podsai-model
Example output layout (actual metric values vary with the evaluated dataset):
Loaded 134 test samples from output\csv\testing_60s_samples.csv
WAV directory: output/testing-wav
Models to evaluate: fastai, orcahello, podsai, oldpodsai
...
================================================================================================================
Model Comparison Summary
================================================================================================================
Model Evaluated Correct Accuracy F1 RFP% RFN% TFP% TFN% HFP% HFN% Avg Time
----------------------------------------------------------------------------------------------------------------
fastai 134 52 38.8% 0.120 64.6% 55.8% 0.0% 100.0% 0.0% 100.0% 11.84s
orcahello 134 34 25.4% 0.125 95.1% 42.3% 0.0% 100.0% 0.0% 100.0% 4.59s
oldpodsai 134 72 53.7% 0.477 19.5% 46.2% 16.3% 63.3% 15.5% 38.9% 4.47s
podsai 134 66 49.3% 0.414 29.3% 38.5% 1.0% 70.0% 0.0% 88.9% 6.49s
================================================================================================================
Definitions:
Accuracy = Correct / Evaluated
Correct = fastai/orcahello: resident vs other; oldpodsai/podsai: exact category match
F1 = macro F1 over humpback, resident, and transient classes that are present
[R|T|H]FP% = among non-[R|T|H] samples, fraction predicted as that class
[R|T|H]FN% = among actual samples of that class, fraction predicted as another class
Avg Time = average time spent in model predict() per 60-second WAV file
Note = compares end-to-end 60-second inference on testing_60s_samples.csv
Confusion Matrix for fastai (rows=actual, cols=predicted):
other resident total
human 6 4 10
humpback 10 8 18
jingle 7 0 7
resident 29 23 52
transient 4 26 30
vessel 2 5 7
water 0 10 10
Confusion Matrix for orcahello (rows=actual, cols=predicted):
other resident total
human 0 10 10
humpback 4 14 18
jingle 0 7 7
resident 22 30 52
transient 0 30 30
vessel 0 7 7
water 0 10 10
Confusion Matrix for oldpodsai (rows=actual, cols=predicted):
human humpback resident transient vessel water total
human 7 1 1 1 0 0 10
humpback 1 11 4 2 0 0 18
jingle 0 7 0 0 0 0 7
resident 5 1 28 14 1 3 52
transient 1 9 9 11 0 0 30
vessel 0 0 2 0 5 0 7
water 0 0 0 0 0 10 10
Confusion Matrix for podsai (rows=actual, cols=predicted):
human humpback jingle resident transient vessel water total
human 8 0 0 0 1 1 0 10
humpback 0 2 0 5 0 10 1 18
jingle 0 0 6 0 0 1 0 7
resident 0 0 0 32 0 17 3 52
transient 0 0 0 19 9 2 0 30
vessel 0 0 0 0 0 7 0 7
water 0 0 0 0 0 8 2 10
Note: the potential of the podsai model is greater than the comparison above suggests. The same model version used for the podsai matrix above produced the following metrics on the trainer's held-out test split at training time:
============================================================
DETAILED EVALUATION METRICS
============================================================
Dataset: trainer test split from output/wav (80/20 split of training samples).
Class Distribution:
water - True: 8, Predicted: 7
resident - True: 23, Predicted: 23
transient - True: 12, Predicted: 12
humpback - True: 12, Predicted: 12
vessel - True: 11, Predicted: 13
jingle - True: 6, Predicted: 6
human - True: 9, Predicted: 8
Per-Class Performance:
Class Precision Recall F1
------------------------------------------------
water 0.857 0.750 0.800
resident 0.957 0.957 0.957
transient 0.917 0.917 0.917
humpback 0.917 0.917 0.917
vessel 0.692 0.818 0.750
jingle 0.833 0.833 0.833
human 1.000 0.889 0.941
Confusion Matrix (rows=true, cols=predicted):
water resident transien humpback vessel jingle human
water 6 0 0 0 2 0 0
resident 0 22 0 0 1 0 0
transient 0 1 11 0 0 0 0
humpback 0 0 1 11 0 0 0
vessel 0 0 0 1 9 1 0
jingle 1 0 0 0 0 5 0
human 0 0 0 0 1 0 8
============================================================
Example - compare only fastai and orcahello
python src/compare_models.py --models fastai,orcahello --fastai-model-path model
Example - limit to 10 test samples
python src/compare_models.py --max-samples 10 --fastai-model-path model
Example - evaluate only resident samples
python src/compare_models.py --category resident --fastai-model-path model
LiveInferenceSystem/ packages src/LiveInferenceOrchestrator.py as a Docker container for
production deployment to Azure Kubernetes Service (AKS), following the same pattern used by
OrcaHello's InferenceSystem.
The two containers can run side-by-side in the same Kubernetes cluster without conflicts.
Build the image from the repo root (requires the external/orcahello submodule):
git submodule update --init external/orcahello
docker build -f LiveInferenceSystem/Dockerfile -t pods-ai-live-inference-system .
macOS M-series: use docker buildx build --platform linux/amd64 in place of docker build.
Run locally by mounting an orchestrator config at /config/config.yml:
# Linux/Mac
docker run --rm -it --env-file .env \
-v $PWD/LiveInferenceSystem/tests/orch_configs/LiveHLS/LiveHLS_OrcasoundLab.yml:/config/config.yml \
pods-ai-live-inference-system \
--max_live_iterations 2
# Windows
docker run --rm -it --env-file .env ^
-v %cd%/LiveInferenceSystem/tests/orch_configs/LiveHLS/LiveHLS_OrcasoundLab.yml:/config/config.yml ^
pods-ai-live-inference-system ^
--max_live_iterations 2
The .env file should contain Azure credentials (see LiveInferenceOrchestrator.py for required
environment variables).
In production each hydrophone location runs as a separate deployment in its own Kubernetes
namespace. The LiveInferenceSystem/deploy/ directory contains the Kubernetes manifests:
- <location>.yaml: deployment spec
- <location>-configmap.yaml: hydrophone-specific orchestrator configuration
To release a new container image, push a tag of the form LiveInferenceSystem.v#.#.#.
This triggers the LiveInferenceSystem-deploy workflow, which builds the image and pushes it to
orcaconservancycr.azurecr.io/pods-ai-live-inference-system.
To deploy to a hydrophone location:
NAMESPACE=orcasound-lab # or andrews-bay, bush-point, etc.
kubectl apply -f LiveInferenceSystem/deploy/$NAMESPACE-configmap.yaml
# Scale to 0 first — required by the Recreate strategy on memory-constrained nodes
# so that the old pod is fully terminated before the new pod starts.
kubectl scale deployment pods-ai-inference-system -n $NAMESPACE --replicas=0
kubectl apply -f LiveInferenceSystem/deploy/$NAMESPACE.yaml
To add a new hydrophone location, create deploy/<namespace>-configmap.yaml and
deploy/<namespace>.yaml using an existing pair as a template, then create the namespace and
secret:
kubectl create namespace <namespace>
kubectl create secret generic pods-ai-inference-system -n <namespace> \
--from-literal=AZURE_COSMOSDB_PRIMARY_KEY='<key>' \
--from-literal=AZURE_STORAGE_CONNECTION_STRING='<string>' \
--from-literal=INFERENCESYSTEM_APPINSIGHTS_CONNECTION_STRING='<string>'
The timestamp correction implementation follows the architecture described in aifororcas-livesystem:
- Uses the DateRangeHLSStream approach to download audio from specific time ranges
- Downloads from Orcasound S3 buckets: s3-us-west-2.amazonaws.com/audio-orcasound-net/
- Processes HLS streams with m3u8 playlists
- Uses FFmpeg for audio format conversion
- Returns a local_confidences array with scores for each segment
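As a rough illustration of how a local_confidences array might be reduced to a single decision, the sketch below counts the segments that reach a per-segment threshold and compares that count with a global threshold. The function name, the default values, and the use of inclusive comparisons are assumptions, not confirmed behaviour of this repository or of aifororcas-livesystem.

```python
def global_prediction(local_confidences, local_threshold=0.5, global_threshold=3):
    """Return (is_positive, positive_segment_count) for one audio clip.

    Assumption: a segment counts as positive when its confidence reaches
    local_threshold, and the clip is positive when at least global_threshold
    segments are positive.
    """
    positive_segments = sum(conf >= local_threshold for conf in local_confidences)
    return positive_segments >= global_threshold, positive_segments


if __name__ == "__main__":
    print(global_prediction([0.9, 0.2, 0.7, 0.6]))
    print(global_prediction([0.9, 0.2]))
```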
Similar to aifororcas-livesystem config files:
model_type: "FastAI"
model_local_threshold: 0.5
model_global_threshold: 3
model_path: "./model"
model_name: "model.pkl"
The following repository secrets must be configured using information obtained from HuggingFace:
- HF_TOKEN — Get this from https://huggingface.co/settings/tokens after logging in as the account used to publish the model (e.g., "davethaler"). This is used by train_model.yml.
or from portal.azure.com:
- COSMOS_KEY — "aifororcasmetadatastore" CosmosDB account → "Keys" → "Read-only Keys" → primary key. This is used by bootstrap make_csv.py and train_model.yml.
- AZURE_COSMOSDB_PRIMARY_KEY — "aifororcasmetadatastore" CosmosDB account → "Keys" → "Read-write Keys" → primary key. This is used by LiveInferenceOrchestrator.py.
- AZURE_STORAGE_CONNECTION_STRING — "livemlaudiospecstorage" storage account. See the "Connection String" section in these instructions. This is used by LiveInferenceOrchestrator.py.
- INFERENCESYSTEM_APPINSIGHTS_CONNECTION_STRING — "InferenceSystemInsights" Application Insights → "Overview" → connection string. This is used by LiveInferenceOrchestrator.py.
- ACR_USERNAME — "orcaconservancycr" Container registry → "Access keys" → "Username". This is used by LiveInferenceSystem-deploy.yaml.
- ACR_PASSWORD — "orcaconservancycr" Container registry → "Access keys" → "password". This is used by LiveInferenceSystem-deploy.yaml.
- ACR_REGISTRY — "orcaconservancycr" Container registry → "Access keys" → "Registry name". This is used by LiveInferenceSystem-deploy.yaml.
- KUBE_CONFIG — This is used by LiveInferenceSystem-deploy-configmaps.yaml. To obtain the KUBE_CONFIG value, run the following:
az aks get-credentials --resource-group LiveSRKWNotificationSystem --name inference-system-AKS --admin --file kubeconfig
This produces a file named kubeconfig, the contents of which can be used as the KUBE_CONFIG value.