Stars
Real-Time VLAs via Future-state-aware Asynchronous Inference.
PeRFlow: Piecewise Rectified Flow as Universal Plug-and-Play Accelerator (NeurIPS 2024)
Fine-tuning & Reinforcement Learning for LLMs. 🦥 Train OpenAI gpt-oss, DeepSeek-R1, Qwen3, Gemma 3, TTS 2x faster with 70% less VRAM.
Agent framework and applications built upon Qwen>=3.0, featuring Function Calling, MCP, Code Interpreter, RAG, Chrome extension, etc.
MiniCPM-V 4.5: A GPT-4o Level MLLM for Single Image, Multi Image and High-FPS Video Understanding on Your Phone
Command-line interface and Python library to transcribe pinyin to IPA. The tones are attached to the vowel of the syllable.
Zero-shot voice conversion and singing voice conversion, with real-time support
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal chat model with performance approaching GPT-4o.
Video editing with vim/spreadsheet/sed/Python. Methodology inspired by BBC Digital Paper Edit. "Excel-dit"
Build and share delightful machine learning apps, all in Python. 🌟 Star to support our work!
CPJKU / asap-dataset
Forked from fosfrancesco/asap-dataset
A dataset of 222 digital musical scores aligned with 1068 performances (more than 92 hours) of Western classical piano music.
Foundational Models for State-of-the-Art Speech and Text Translation
Official implementation of "Separate Anything You Describe"
Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
Book_7_《机器学习》 (Machine Learning) | Iris Book series (鸢尾花书): From Basic Arithmetic to Machine Learning; criticism and corrections welcome
🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
State-of-the-art deep learning based audio codec supporting both mono 24 kHz audio and stereo 48 kHz audio.
A book about Text-to-Speech (TTS) in Chinese.
Towards hot directions in industrial end-to-end speech recognition
PyTorch Implementation of Google's Parallel Tacotron 2: A Non-Autoregressive Neural TTS Model with Differentiable Duration Modeling
Mellotron: a multispeaker voice synthesis model based on Tacotron 2 GST that can make a voice emote and sing without emotive or singing training data
An open-source implementation of a sequence-to-sequence-based speech processing engine
Solving algorithm problems is all about patterns; labuladong is all you need! English version supported! Crack LeetCode: not only how, but also why.
Chinese speech recognition (Mandarin Automatic Speech Recognition)