Computer of Science and Technology Beijing
RAE Public
Forked from bytetriper/RAE
Official PyTorch Implementation of "Diffusion Transformers with Representation Autoencoders"
Python MIT License Updated Oct 14, 2025
usm-tokenizer Public
Forked from Mddct/usm-tokenizer
Semantic tokenizer for speech and music
Python Updated Jun 27, 2025
sequence-vector-quantize Public
Forked from Mddct/sequence-vector-quantize
dh vq-q or vae exp
Python Updated May 19, 2025
wenet Public
Forked from wenet-e2e/wenet
Production First and Production Ready End-to-End Speech Recognition Toolkit
audio-pipeline Public
Forked from Mddct/audio-pipeline
Python Apache License 2.0 Updated Oct 17, 2024
blsp Public
Forked from cwang621/blsp
BLSP: Bootstrapping Language-Speech Pre-training via Behavior Alignment of Continuation Writing
Python Apache License 2.0 Updated Oct 15, 2024
SpeechTokenizer Public
Forked from ZhangXInFD/SpeechTokenizer
This is the code for the SpeechTokenizer presented in the SpeechTokenizer: Unified Speech Tokenizer for Speech Language Models. Samples are presented on
Python Apache License 2.0 Updated Aug 14, 2024
RepCodec Public
Forked from mct10/RepCodec
Models and code for RepCodec: A Speech Representation Codec for Speech Tokenization
Python Other Updated Jul 12, 2024
NeMo-text-processing Public
Forked from NVIDIA/NeMo-text-processing
NeMo text processing for ASR and TTS
Python Apache License 2.0 Updated Feb 29, 2024
latent-diffusion Public
Forked from CompVis/latent-diffusion
High-Resolution Image Synthesis with Latent Diffusion Models
Jupyter Notebook MIT License Updated Feb 29, 2024
MS-SNSD Public
Forked from microsoft/MS-SNSD
The Microsoft Scalable Noisy Speech Dataset (MS-SNSD) is a noisy speech dataset that can scale to arbitrary sizes depending on the number of speakers, noise types, and Speech to Noise Ratio (SNR) l…
Python MIT License Updated Jan 9, 2024
MaTe3D Public
Forked from HumanAIGC/MaTe3D
MaTe3D: Mask-guided Text-based 3D-aware Portrait Editing
Apache License 2.0 Updated Dec 13, 2023
Zth9730.github.io Public
GitHub Pages template for academic personal websites, forked from mmistakes/minimal-mistakes
JavaScript MIT License Updated Dec 12, 2023
awesome-source-free-test-time-adaptation Public
Forked from YuejiangLIU/awesome-source-free-test-time-adaptation
A curated list of papers in Test-time Adaptation, Test-time Training and Source-free Domain Adaptation
Updated Oct 2, 2023
fairseq2 Public
Forked from facebookresearch/fairseq2
FAIR Sequence Modeling Toolkit
Python MIT License Updated Aug 24, 2023
PromptingWhisper Public
Forked from jasonppy/PromptingWhisper
Prompting Whisper for Audio-Visual Speech Recognition, Code-Switched Speech Recognition, and Zero-Shot Speech Translation
Python Updated Aug 15, 2023
MyArxiv Public template
Forked from MLNLP-World/MyArxiv
CSS GNU General Public License v2.0 Updated Jul 31, 2023
RetNet Public
Forked from Jamie-Stirling/RetNet
An implementation of "Retentive Network: A Successor to Transformer for Large Language Models"
Python MIT License Updated Jul 21, 2023
Whisper-Finetune Public
Forked from yeyupiaoling/Whisper-Finetune
Fine-tune the Whisper speech recognition model and accelerate inference, with support for Web and Android deployment
C Apache License 2.0 Updated Jul 18, 2023
unilm Public
Forked from microsoft/unilm
Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
Python MIT License Updated Jul 18, 2023
Macaw-LLM Public
Forked from lyuchenyang/Macaw-LLM
Macaw-LLM: Multi-Modal Language Modeling with Image, Video, Audio, and Text Integration
Python Updated May 31, 2023
asteroid Public
Forked from asteroid-team/asteroid
The PyTorch-based audio source separation toolkit for researchers
Python MIT License Updated May 26, 2023
Pengi Public
Forked from microsoft/Pengi
An Audio Language model for Audio Tasks
MIT License Updated May 22, 2023
PaddleSpeech Public
Forked from PaddlePaddle/PaddleSpeech
Easy-to-use Speech Toolkit including SOTA/Streaming ASR with punctuation, influential TTS with text frontend, Speaker Verification System, End-to-End Speech Translation and Keyword Spotting. Won NA…
Python Apache License 2.0 Updated Apr 26, 2023
bark Public
Forked from suno-ai/bark
🔊 Text-prompted Generative Audio Model
Python Other Updated Apr 15, 2023