GitHub - saiakarsh193/augnito-speaker-diarization: The codebase, instructions and reports for the Augnito Speaker Diarization Assignment

To install the environment

conda create --prefix ./augnito python=3.9
conda activate ./augnito
./augnito/bin/python3 -m pip install -r requirements.txt

To install and prepare the dataset

git clone https://github.com/babylonhealth/primock57
cd primock57/scripts/
bash mix_audio.sh

Make the following changes in textgrid_to_transcript.py (inside primock57/scripts/):

# line 15, from this
return [f"{u['speaker']}: {strip_transcript_tags(u['text'])}"
            for u in combined_utterances]
# to
return [f"{u['from']}::{u['to']}::{u['speaker']}::{strip_transcript_tags(u['text'])}"
            for u in combined_utterances]

python3 textgrid_to_transcript.py --transcript_path ../transcripts --output_path ../output/joined_transcripts
cd ../../ # go to home dir
python3 transcript_to_timestamps.py # to convert the joined_transcripts into timestamps

NOTE: Doctor is SPEAKER_00 and patient is SPEAKER_01 in the timestamps
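As a rough illustration of the conversion step, the sketch below parses one line in the `from::to::speaker::text` format produced by the modified script and maps the speaker to the labels in the note above. It is not the code in transcript_to_timestamps.py: the function name `parse_transcript_line` is hypothetical, and the speaker strings `Doctor` and `Patient` are an assumption about what primock57 writes.

```python
# Assumed speaker names; the SPEAKER_00 / SPEAKER_01 labels come from the note above.
LABELS = {"doctor": "SPEAKER_00", "patient": "SPEAKER_01"}


def parse_transcript_line(line):
    """Turn 'from::to::speaker::text' into (start, end, label)."""
    # maxsplit=3 keeps any '::' inside the utterance text intact.
    start, end, speaker, _text = line.strip().split("::", 3)
    return float(start), float(end), LABELS[speaker.strip().lower()]


if __name__ == "__main__":
    print(parse_transcript_line("0.5::2.25::Doctor::Hello, how can I help?"))
```

A speaker name outside the assumed mapping raises `KeyError`, which makes unexpected labels visible early.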

To prepare pyannote dataset

We need to create a pyannote.database protocol from the above dataset to finetune the model.

python3 make_pyan_database.py # will create necessary files in pyan_db directory

Create a file named database.yml in the home dir and add the following:

Databases:
  Primock: primock57/output/mixed_audio/{uri}.wav

Protocols:
  Primock:
    SpeakerDiarization:
      full:
        train:
            uri: pyan_db/train_list.txt
            annotation: pyan_db/rttms/train_{uri}.rttm
            annotated: pyan_db/uems/train_{uri}.uem
        development:
            uri: pyan_db/dev_list.txt
            annotation: pyan_db/rttms/dev_{uri}.rttm
            annotated: pyan_db/uems/dev_{uri}.uem
        test:
            uri: pyan_db/test_list.txt
            annotation: pyan_db/rttms/test_{uri}.rttm
            annotated: pyan_db/uems/test_{uri}.uem
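The `annotation` and `annotated` entries above point at RTTM and UEM files. For reference, here is a minimal sketch of how one line of each format can be written, assuming the standard space-separated RTTM and UEM layouts with channel `1`. The helper names and the example URI are hypothetical, and make_pyan_database.py may format its output differently.

```python
def rttm_line(uri, start, end, speaker):
    """One speaker turn: RTTM stores the onset and the duration, not the end time."""
    return (
        f"SPEAKER {uri} 1 {start:.3f} {end - start:.3f} "
        f"<NA> <NA> {speaker} <NA> <NA>"
    )


def uem_line(uri, start, end):
    """One annotated region: UEM stores the start and end time."""
    return f"{uri} 1 {start:.3f} {end:.3f}"


if __name__ == "__main__":
    # "example_uri" is a placeholder, not a real file name from the dataset.
    print(rttm_line("example_uri", 0.5, 2.25, "SPEAKER_00"))
    print(uem_line("example_uri", 0.0, 600.0))
```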

To check that the pyannote.database protocol was created properly

python3 check_protocol.py
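Independently of pyannote, the file layout that database.yml expects can be sanity-checked with plain Python. The sketch below is not what check_protocol.py does; it only assumes that each list file holds one URI per line and that the paths follow the patterns in database.yml. The function name `missing_files` is hypothetical.

```python
from pathlib import Path


def missing_files(root, subset):
    """Return the audio, RTTM and UEM paths that are missing for one subset.

    subset is the file prefix used in database.yml: "train", "dev" or "test".
    """
    root = Path(root)
    uris = (root / "pyan_db" / f"{subset}_list.txt").read_text().split()
    missing = []
    for uri in uris:
        expected = (
            root / "primock57/output/mixed_audio" / f"{uri}.wav",
            root / "pyan_db/rttms" / f"{subset}_{uri}.rttm",
            root / "pyan_db/uems" / f"{subset}_{uri}.uem",
        )
        missing.extend(path for path in expected if not path.exists())
    return missing


if __name__ == "__main__":
    for subset in ("train", "dev", "test"):
        list_file = Path("pyan_db") / f"{subset}_list.txt"
        if list_file.exists():
            print(subset, missing_files(".", subset))
```

An empty list for every subset means all referenced files are in place.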

How to run

pyan_finetune.ipynb has the code to finetune the SpeakerDiarization model using pyannote/speaker-diarization-3.1 and pyannote/segmentation-3.0. It uses the protocol defined in database.yml to do this.

pyan_inf.ipynb will run inference on the dataset using both the pretrained and finetuned models and create the output directories timestamps_pyan_diar_pretrained and timestamps_pyan_diar_finetuned.

nemo_inf.ipynb will run inference on the dataset using the pretrained marblenet (VAD) and titanet_large (speaker embedder) models and create the output directory timestamps_nemo_pretrained.

der.ipynb has the code to calculate Diarization Error Rate (DER) for all the models used.
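For readers unfamiliar with the metric, DER is the sum of missed speech, false alarm and speaker confusion time divided by the total reference speech time, computed under the best one-to-one mapping between hypothesis and reference speakers. The sketch below is a simplified frame-based version with no collar, written only to illustrate the definition. It is not the code in der.ipynb, and the function names are hypothetical.

```python
from itertools import permutations


def frame_labels(segments, n_frames, step):
    """Return, for each frame, the set of speakers active in that frame."""
    frames = [set() for _ in range(n_frames)]
    for start, end, speaker in segments:
        first = int(round(start / step))
        last = min(int(round(end / step)), n_frames)
        for i in range(first, last):
            frames[i].add(speaker)
    return frames


def simple_der(reference, hypothesis, step=0.01):
    """Frame-based DER for lists of (start, end, speaker) tuples."""
    end_time = max(end for _, end, _ in reference + hypothesis)
    n_frames = int(round(end_time / step))
    ref = frame_labels(reference, n_frames, step)
    hyp = frame_labels(hypothesis, n_frames, step)
    total_speech = sum(len(r) for r in ref)

    ref_speakers = sorted({s for _, _, s in reference})
    hyp_speakers = sorted({s for _, _, s in hypothesis})
    # None stands for "not mapped to any reference speaker".
    extra = max(0, len(hyp_speakers) - len(ref_speakers))
    targets = ref_speakers + [None] * extra

    best_error = None
    # Brute-force search over speaker mappings; fine for two or three speakers.
    for perm in permutations(targets, len(hyp_speakers)):
        mapping = dict(zip(hyp_speakers, perm))
        error = 0
        for r, h in zip(ref, hyp):
            correct = len({mapping[s] for s in h} & r)
            # Covers missed speech, false alarm and confusion in one term.
            error += max(len(r), len(h)) - correct
        if best_error is None or error < best_error:
            best_error = error
    return best_error / total_speech


if __name__ == "__main__":
    reference = [(0.0, 5.0, "SPEAKER_00"), (5.0, 10.0, "SPEAKER_01")]
    hypothesis = [(0.0, 4.0, "A"), (4.0, 10.0, "B")]
    print(simple_der(reference, hypothesis))
```

In this example the hypothesis assigns one second of the doctor's ten seconds of reference speech to the wrong speaker, so the result is 0.1. The brute-force mapping search is only practical for a handful of speakers, which suits two-party consultations.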

The final report of the assignment can be found as Report.pdf.
