Stars
NVIDIA Isaac GR00T N1.6 - A Foundation Model for Generalist Robots.
Kimi-Audio, an open-source audio foundation model excelling in audio understanding, generation, and conversation
Get up and running with OpenAI gpt-oss, DeepSeek-R1, Gemma 3 and other models.
Ongoing research training transformer models at scale
✨✨ Latest Advances on Multimodal Large Language Models
Fast and memory-efficient exact attention
Muzic: Music Understanding and Generation with Artificial Intelligence
A straightforward collection of Music Generation research resources.
A fully working PyTorch implementation of NaturalSpeech (Tan et al., 2022)
Implementation of NaturalSpeech 2, Zero-shot Speech and Singing Synthesizer, in PyTorch
Official repo for consistency models.
JARVIS, a system to connect LLMs with the ML community. Paper: https://arxiv.org/pdf/2303.17580.pdf
Making large AI models cheaper, faster and more accessible
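Firewall circumvention, free circumvention, free unrestricted internet access, free nodes, free proxies, free ss/v2ray/trojan nodes, Lantern, Google Play Store, circumvention proxies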
A wrapper around speech quality metrics MOSNet, BSSEval, STOI, PESQ, SRMR, SISDR
Code release for NeRF (Neural Radiance Fields)
A curated list of awesome neural radiance fields papers
Multi-Agent Resource Optimization (MARO) platform is an instance of Reinforcement Learning as a Service (RaaS) for real-world resource optimization problems.
A repository for generating stylized talking 3D and 3D face
A Python package to analyze and compare voices with deep learning
🚀 AI voice cloning: Clone a voice in 5 seconds to generate arbitrary speech in real-time
Microsoft Text-to-Speech API sample code in several languages, part of Cognitive Services.