- MIT
- Cambridge, MA
- yuangongnd.github.io
-
llm_speech_emotion_challenge Public
-
ltu Public
Code, Dataset, and Pretrained Models for Audio and Speech Large Language Model "Listen, Think, and Understand".
-
awesome-whisper Public
Forked from sindresorhus/awesome-whisper
🔊 Awesome list for Whisper, an open-source AI-powered speech recognition system developed by OpenAI
-
cav-mae Public
Code and Pretrained Models for ICLR 2023 Paper "Contrastive Audio-Visual Masked Autoencoder".
-
whisper-at Public
Code and Pretrained Models for Interspeech 2023 Paper "Whisper-AT: Noise-Robust Automatic Speech Recognizers are Also Strong Audio Event Taggers"
-
psla Public
Code for the TASLP paper "PSLA: Improving Audio Tagging With Pretraining, Sampling, Labeling, and Aggregation".
-
Awesome-Multimodal-Large-Language-Models Public
Forked from BradyFU/Awesome-Multimodal-Large-Language-Models
Latest Papers and Datasets on Multimodal Large Language Models
Updated Jun 11, 2023
-
ast Public
Code for the Interspeech 2021 paper "AST: Audio Spectrogram Transformer".
-
uavm Public
Code for the IEEE Signal Processing Letters 2022 paper "UAVM: Towards Unifying Audio and Visual Models".
-
gopt Public
Code for the ICASSP 2022 paper "Transformer-Based Multi-Aspect Multi-Granularity Non-native English Speaker Pronunciation Assessment".
-
vocalsound Public
Dataset and baseline code for the VocalSound dataset (ICASSP 2022).
-
ssast Public
Code for the AAAI 2022 paper "SSAST: Self-Supervised Audio Spectrogram Transformer".
-
multichannel-antispoof Public
Code for SPL paper "Detecting Replay Attacks Using Multi-Channel Audio: A Neural Network-Based Method"
-
kaldi-io-for-python Public
Forked from KarelVesely84/kaldi-io-for-python
Python functions for reading Kaldi data formats. Useful for rapid prototyping with Python.
Python, Apache License 2.0, Updated May 17, 2022
-
ESC-50 Public
Forked from karolpiczak/ESC-50
ESC-50: Dataset for Environmental Sound Classification
-
espnet Public
Forked from espnet/espnet
End-to-End Speech Processing Toolkit
Python, Apache License 2.0, Updated Apr 15, 2021
-
tutorials Public
Forked from pytorch/tutorials
PyTorch tutorials.
Jupyter Notebook, BSD 3-Clause "New" or "Revised" License, Updated Oct 30, 2020
-
audioset_tagging_cnn Public
Forked from qiuqiangkong/audioset_tagging_cnn
Python, MIT License, Updated Sep 21, 2020
-
kaldi Public
Forked from kaldi-asr/kaldi
This is the official location of the Kaldi project.
Shell, Other, Updated Aug 21, 2020
-
skynet-ddp-slurm-example Public
Forked from erikwijmans/skynet-ddp-slurm-example
Example of using PyTorch DistributedDataParallel and SLURM on skynet
Python, Updated Aug 1, 2020
-
realtime-adversarial-attack Public
Code for IJCAI 2019 paper "Real-time Adversarial Attack".
-
ReMASC Public
ReMASC: Realistic Replay Attack Corpus for Voice Controlled Systems
-
docs Public
Forked from tensorflow/docs
TensorFlow documentation
Jupyter Notebook, Apache License 2.0, Updated Jun 30, 2020
-
python-compute-eer Public
Simple Python script to compute equal error rate (EER) for machine learning model evaluation.
-
pyroomacoustics Public
Forked from LCAV/pyroomacoustics
Pyroomacoustics is a package for audio signal processing for indoor applications. It was developed as a fast prototyping platform for beamforming algorithms in indoor scenarios.
Python, MIT License, Updated Feb 10, 2020
-
Autoregressive-Predictive-Coding Public
Forked from iamyuanchung/Autoregressive-Predictive-Coding
Autoregressive Predictive Coding: An unsupervised autoregressive model for speech representation learning
Python, Updated Jan 29, 2020
-
SincNet Public
Forked from mravanelli/SincNet
SincNet is a neural architecture for efficiently processing raw audio samples.