-
sam-audio Public
Forked from facebookresearch/sam-audio
Audio segmentation based on text, visual, and time-span cues: The repository provides code for running inference with the Meta Segment Anything Audio Model (SAM-Audio), links for downloading the trained model checkpoints, and example noteb…
Python Other Updated Dec 17, 2025 -
vlash Public
Forked from mit-han-lab/vlash
Vision-based action execution for robots: Real-Time VLAs via Future-state-aware Asynchronous Inference.
Python Apache License 2.0 Updated Dec 3, 2025 -
CarelessWhisper-Streaming Public
Forked from tomer9080/CarelessWhisper-Streaming
Causal streaming adaptation of OpenAI Whisper for real-time transcription on small audio chunks.
Python Other Updated Sep 18, 2025 -
Kimi-Audio Public
Forked from MoonshotAI/Kimi-Audio
Kimi-Audio, an open-source audio foundation model excelling in audio understanding, generation, and conversation
Python Updated Apr 28, 2025 -
Orpheus-TTS Public
Forked from canopyai/Orpheus-TTS
TTS Towards Human-Sounding Speech
Python Apache License 2.0 Updated Mar 23, 2025 -
silentcipher Public
Forked from SesameAILabs/silentcipher
Deep Audio Watermarking
Python MIT License Updated Mar 17, 2025 -
Spark-TTS Public
Forked from SparkAudio/Spark-TTS
Spark-TTS Inference Code
Python Apache License 2.0 Updated Mar 5, 2025 -
zipEnhancer Public
Forked from boreas-l/zipEnhancer
This project is derived from zipEnhancer, the speech denoising model open-sourced by Alibaba.
Python Updated Mar 4, 2025 -
async_cosyvoice Public
Forked from qi-hua/async_cosyvoice
Accelerates cosyvoice2 inference using vllm.
Jupyter Notebook Apache License 2.0 Updated Mar 2, 2025 -
TTS-LLaSA_training Public
Forked from zhenye234/LLaSA_training
LLaSA: Scaling Train-time and Inference-time Compute for LLaMA-based Speech Synthesis
Python Other Updated Feb 14, 2025 -
unsloth-LLM-finetuning Public
Forked from unslothai/unsloth
Finetune Llama 3.3, DeepSeek-R1 & Reasoning LLMs 2x faster with 70% less memory
Python Apache License 2.0 Updated Feb 10, 2025 -
Qwen-Agent Public
Forked from QwenLM/Qwen-Agent
Agent framework and applications built upon Qwen>=2.0, featuring Function Calling, Code Interpreter, RAG, and Chrome extension.
Python Other Updated Jan 24, 2025 -
MiniCPM-o Public
Forked from OpenBMB/MiniCPM-V
Multimodal speech large model: MiniCPM-o 2.6: A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming on Your Phone
Python Apache License 2.0 Updated Jan 17, 2025 -
google-research Public
Forked from google-research/google-research
Google Research
Jupyter Notebook Apache License 2.0 Updated Jan 9, 2025 -
data-Thorsten-Voice Public
Forked from thorstenMueller/Thorsten-Voice
Speech data: Thorsten-Voice: A free to use, offline working, high quality German TTS voice should be available for every project without any license struggling.
Python Creative Commons Zero v1.0 Universal Updated Jan 8, 2025 -
openai-cookbook Public
Forked from openai/openai-cookbook
Examples and guides for using the OpenAI API
MDX MIT License Updated Jan 8, 2025 -
vector-quantize-pytorch Public
Forked from lucidrains/vector-quantize-pytorch
Vector (and Scalar) Quantization, in Pytorch
Python MIT License Updated Jan 7, 2025 -
pycorrector Public
Forked from shibing624/pycorrector
pycorrector is a toolkit for text error correction. It implements Kenlm, T5, MacBERT, ChatGLM3, Qwen2.5, and other models for error-correction scenarios, ready to use out of the box.
Python Apache License 2.0 Updated Dec 26, 2024 -
versa Public
Forked from wavlab-speech/versa
Versatile Evaluation of Speech and Audio
Python Apache License 2.0 Updated Dec 25, 2024 -
WavChat Public
Forked from jishengpeng/WavChat
A Survey of Spoken Dialogue Models (60 pages)
Updated Nov 28, 2024 -
snac Public
Forked from hubertsiuzdak/snac
Audio codec: Multi-Scale Neural Audio Codec (SNAC) compresses audio into discrete codes at a low bitrate
Python MIT License Updated Nov 19, 2024 -
streaming-ChatTTS Public
Forked from pengzhendong/streaming-ChatTTS
Jupyter Notebook Apache License 2.0 Updated Oct 30, 2024 -
GLM-4-Voice Public
Forked from zai-org/GLM-4-Voice
GLM-4-Voice | End-to-end Chinese-English spoken dialogue model; the TTS quality is good
Python Apache License 2.0 Updated Oct 30, 2024 -
spiritlm Public
Forked from facebookresearch/spiritlm
Emotion-preserving audio LLM: Inference code for the paper "Spirit-LM Interleaved Spoken and Written Language Model".
Python Other Updated Oct 28, 2024 -
SNAC-Vocos Public
Forked from hertz-pj/SNAC-Vocos
A trainer for SNAC (Multi-Scale Neural Audio Codec) with the decoder replaced by Vocos.
Python Updated Oct 28, 2024 -
midi-fluidsynth Public
Forked from FluidSynth/fluidsynth
MIDI playback: Software synthesizer based on the SoundFont 2 specifications
C GNU Lesser General Public License v2.1 Updated Oct 20, 2024 -
amt-apc Public
Forked from misya11p/amt-apc
Music, automatic piano cover: AMT-APC: Automatic Piano Cover by Fine-Tuning an Automatic Music Transcription Model
Python MIT License Updated Oct 19, 2024 -
qa-mdt Public
Forked from ivcylc/OpenMusic
Text-to-music generation: 241010-SOTA Text-to-music (TTM) Generation (OpenMusic)
Python MIT License Updated Oct 9, 2024 -
ml-depth-pro Public
Forked from apple/ml-depth-pro
Apple depth map estimation: Depth Pro: Sharp Monocular Metric Depth in Less Than a Second.
-
tiktoken-openai Public
Forked from openai/tiktoken
tiktoken is a fast BPE tokeniser for use with OpenAI's models.
Python MIT License Updated Oct 3, 2024