CLAP-IPA

The taste of IPA: Towards open-vocabulary keyword spotting and forced alignment in any language. NAACL 2024.

Usage

See kws_example.ipynb and forced_alignment_example.ipynb for complete, worked examples.

Install

git clone https://github.com/lingjzhu/clap-ipa
cd clap-ipa
pip install .

Inference

For CLAP-IPA

import torch
import torch.nn.functional as F
from transformers import DebertaV2Tokenizer, AutoProcessor

from clap.encoders import SpeechEncoder, PhoneEncoder

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load the paired speech and phone encoders and switch them to inference mode.
speech_encoder = SpeechEncoder.from_pretrained('anyspeech/clap-ipa-tiny-speech')
phone_encoder = PhoneEncoder.from_pretrained('anyspeech/clap-ipa-tiny-phone')
phone_encoder.eval().to(device)
speech_encoder.eval().to(device)

tokenizer = DebertaV2Tokenizer.from_pretrained('charsiu/IPATokenizer')
processor = AutoProcessor.from_pretrained('openai/whisper-tiny')

# some_audio and some_ipa_string are placeholders: a waveform (Whisper
# processors expect 16 kHz audio) and its IPA transcription. See
# kws_example.ipynb for the full preprocessing.
audio_input = processor(some_audio)
ipa_input = tokenizer(some_ipa_string)

with torch.no_grad():
    speech_embed = speech_encoder(audio_input)
    phone_embed = phone_encoder(ipa_input)

# Higher cosine similarity means the audio and the IPA string match better.
similarity = F.cosine_similarity(speech_embed, phone_embed, dim=-1)
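
For keyword spotting, the same cosine similarity can be computed between one speech embedding and several candidate phone embeddings, and the candidates ranked by score. The sketch below shows only that ranking step, in plain Python so that it runs without the models. The three-dimensional vectors, the keyword strings, and the helper names `cosine_similarity` and `rank_keywords` are made up for illustration and are not part of the `clap` package.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def rank_keywords(speech_embed, phone_embeds):
    """Return (keyword, score) pairs sorted from best to worst match."""
    scores = {kw: cosine_similarity(speech_embed, emb)
              for kw, emb in phone_embeds.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Made-up embeddings, for illustration only.
speech_embed = [0.9, 0.1, 0.0]
phone_embeds = {
    "kæt": [1.0, 0.0, 0.0],
    "dɔɡ": [0.0, 1.0, 0.0],
    "bɜːd": [0.0, 0.0, 1.0],
}

ranking = rank_keywords(speech_embed, phone_embeds)
best_keyword, best_score = ranking[0]
```

In practice a detection threshold on the score would probably be tuned on held-out data; no particular value is suggested here.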

For IPA-Aligner, the example usage is in forced_alignment_example.ipynb. The full forced-alignment evaluation code is in evaluate/eval_boundary.py.
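
To give a rough idea of what forced alignment over frame-level similarities involves, here is a generic monotonic-alignment sketch using dynamic programming. It is not the repository's implementation (see the notebook and evaluate/eval_boundary.py for that). The similarity matrix, the 0.02-second frame duration, and the function names `monotonic_align` and `to_segments` are all assumptions made for this example.

```python
def monotonic_align(scores):
    """Assign each frame to one phone so that phone indices never decrease.

    scores[t][p] is the similarity between frame t and phone p.
    The path starts at phone 0, ends at the last phone, and at each
    frame either stays on the current phone or advances to the next one.
    Returns a list with one phone index per frame.
    """
    num_frames, num_phones = len(scores), len(scores[0])
    if num_frames < num_phones:
        raise ValueError("need at least one frame per phone")
    neg_inf = float("-inf")
    # best[t][p]: best total score of a valid path ending at (t, p).
    best = [[neg_inf] * num_phones for _ in range(num_frames)]
    # came_from[t][p]: phone index at frame t - 1 on that best path.
    came_from = [[0] * num_phones for _ in range(num_frames)]
    best[0][0] = scores[0][0]
    for t in range(1, num_frames):
        for p in range(num_phones):
            stay = best[t - 1][p]
            advance = best[t - 1][p - 1] if p > 0 else neg_inf
            if stay == neg_inf and advance == neg_inf:
                continue  # (t, p) cannot be reached by any valid path.
            if advance > stay:
                best[t][p] = advance + scores[t][p]
                came_from[t][p] = p - 1
            else:
                best[t][p] = stay + scores[t][p]
                came_from[t][p] = p
    # Backtrack from the last frame, which must sit on the last phone.
    path = [num_phones - 1]
    for t in range(num_frames - 1, 0, -1):
        path.append(came_from[t][path[-1]])
    path.reverse()
    return path

def to_segments(path, frame_duration):
    """Convert a per-frame path into (phone_index, start_sec, end_sec) tuples."""
    segments = []
    start = 0
    for t in range(1, len(path) + 1):
        if t == len(path) or path[t] != path[start]:
            segments.append((path[start], start * frame_duration, t * frame_duration))
            start = t
    return segments

# Made-up similarities for 5 frames and 3 phones.
scores = [
    [0.9, 0.1, 0.0],
    [0.8, 0.3, 0.1],
    [0.2, 0.7, 0.1],
    [0.1, 0.6, 0.4],
    [0.0, 0.2, 0.9],
]
path = monotonic_align(scores)
segments = to_segments(path, frame_duration=0.02)  # assumed frame duration
```

Each tuple in `segments` gives a phone index with its start and end time in seconds, which is the kind of boundary information a forced aligner produces.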

Training

For training, download the data from the Hugging Face Hub. Sample train/val filelists are available in data/.

python train.py -c configs/clap_ipa/base.yaml

Evaluation

Evaluation code is available in evaluate. The evaluation scripts all share almost the same organization, so you can simply pass the .ckpt checkpoint saved after training to evaluate its performance. Please check the evaluation code for usage.

python evaluate_fieldwork.py --data ucla --checkpoint "last.ckpt"

Pretrained Models

Weights are released under the MIT License.

Model	Phone Encoder	Speech Encoder
CLAP-IPA-tiny	`anyspeech/clap-ipa-tiny-phone`	`anyspeech/clap-ipa-tiny-speech`
CLAP-IPA-base	`anyspeech/clap-ipa-base-phone`	`anyspeech/clap-ipa-base-speech`
CLAP-IPA-small	`anyspeech/clap-ipa-small-phone`	`anyspeech/clap-ipa-small-speech`
IPA-Aligner-tiny	`anyspeech/ipa-align-tiny-phone`	`anyspeech/ipa-align-tiny-speech`
IPA-Aligner-base	`anyspeech/ipa-align-base-phone`	`anyspeech/ipa-align-base-speech`
IPA-Aligner-small	`anyspeech/ipa-align-small-phone`	`anyspeech/ipa-align-small-speech`

IPA Pack

All datasets are distributed as WebDataset (wds) files on the Hugging Face Hub.

FLEURS-IPA: https://huggingface.co/datasets/anyspeech/fleurs_ipa
MSWC-IPA: https://huggingface.co/datasets/anyspeech/mswc_ipa
DORECO-IPA: https://huggingface.co/datasets/anyspeech/doreco_ipa

After this study, we found that these datasets still contain inconsistent Unicode encoding of IPA symbols.
A cleaner version has since been released after another round of data cleaning. The clean data (~1.8TB) and the models trained on clean data are available at: https://github.com/lingjzhu/zipa
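
As an illustration of what inconsistent Unicode encoding can look like, the same IPA symbol may appear as a precomposed code point or as a base letter plus a combining diacritic, and ASCII lookalikes may stand in for dedicated IPA characters. The sketch below uses the standard `unicodedata` module to show one way to make such strings comparable. It is not the cleaning procedure used for these datasets; the `LOOKALIKES` table and the `normalize_ipa` helper are hypothetical, and the choice of NFD is an assumption.

```python
import unicodedata

# Hypothetical lookalike mapping; a real cleanup would need a fuller table.
LOOKALIKES = {
    "g": "\u0261",  # ASCII g -> IPA script g (ɡ)
    ":": "\u02d0",  # ASCII colon -> IPA length mark (ː)
    "'": "\u02c8",  # ASCII apostrophe -> IPA primary stress (ˈ)
}

def normalize_ipa(text):
    """Decompose to NFD, then replace known lookalike characters."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(LOOKALIKES.get(ch, ch) for ch in decomposed)

precomposed = "\u00e3"  # nasalized a as a single code point
combining = "a\u0303"   # a followed by a combining tilde

differ_before = precomposed != combining
same_after = normalize_ipa(precomposed) == normalize_ipa(combining)

cleaned = normalize_ipa("go:")  # ASCII g and colon replaced by IPA characters
```

Unicode normalization alone handles the precomposed versus combining case; lookalike characters such as ASCII `g` are distinct code points under every normalization form, which is why the explicit mapping is needed.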

To download these datasets:

from huggingface_hub import snapshot_download

snapshot_download(repo_id="anyspeech/fleurs_ipa", repo_type="dataset", local_dir="your_own_folder",
                  local_dir_use_symlinks=False, resume_download=False, max_workers=4)

To load webdataset files:

import webdataset as wds  # Note the typical import shorthand
dataset = (
    wds.WebDataset("data-archives/shard-00{00..24}.tar")  # 25 shards; replace with your local path
    .decode()  # Automagically decode files
    .shuffle(size=1000)  # Shuffle on-the-fly in a buffer
    .batched(batchsize=10)  # Create batches
)
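
The shard pattern passed to `wds.WebDataset` uses bash-style brace notation, where a two-dot range such as `{00..24}` stands for a list of zero-padded numbers. The sketch below is a small stand-alone re-implementation of that expansion for a single numeric range, intended only to show which file names such a pattern denotes. `expand_shards` is a hypothetical helper, not a WebDataset function, and the path is a placeholder.

```python
import re

def expand_shards(pattern):
    """Expand one zero-padded numeric range such as {00..24} in a shard pattern."""
    match = re.search(r"\{(\d+)\.\.(\d+)\}", pattern)
    if match is None:
        return [pattern]  # No range present: the pattern names a single file.
    start, end = match.group(1), match.group(2)
    width = len(start)  # Keep the zero padding of the lower bound.
    return [
        pattern[:match.start()] + str(i).zfill(width) + pattern[match.end():]
        for i in range(int(start), int(end) + 1)
    ]

shards = expand_shards("data-archives/shard-00{00..24}.tar")
first_shard, last_shard = shards[0], shards[-1]
```

Here `shards` holds 25 paths, from `data-archives/shard-0000.tar` to `data-archives/shard-0024.tar`. A range written with three dots is not recognized by this helper and is returned unchanged.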

Citation

@inproceedings{zhu-etal-2024-taste,
    title = "The taste of {IPA}: Towards open-vocabulary keyword spotting and forced alignment in any language",
    author = "Zhu, Jian  and
      Yang, Changbing  and
      Samir, Farhan  and
      Islam, Jahurul",
    editor = "Duh, Kevin  and
      Gomez, Helena  and
      Bethard, Steven",
    booktitle = "Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)",
    month = jun,
    year = "2024",
    address = "Mexico City, Mexico",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.naacl-long.43/",
    doi = "10.18653/v1/2024.naacl-long.43",
    pages = "750--772"
}
