Lists (27)
AcousticFrontend
AcousticModel
agent
ASR
ASR-pretrain
ASV
AudioQuality
AwesomeList
Paper list, awesome list and so on.
BandwidthExtension
Classification
Codec
Data
Develop
Evaluation
FrontEnd
FrontEnd for Text-to-Speech
How-to
LLM
Music
Performance
Quant
SingingVoiceSynthesis
SpeechEditing
SpeechSeperation
Tools
Universal Method
Vocoder
VoiceConversion
Starred repositories
A curated list of awesome C++ (or C) frameworks, libraries, resources, and shiny things. Inspired by awesome-... stuff.
A PyTorch native platform for training generative AI models
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
Scalene: a high-performance, high-precision CPU, GPU, and memory profiler for Python with AI-powered optimization proposals
The agent that grows with you
Label Studio is a multi-type data labeling and annotation tool with standardized output format
High-efficiency floating-point neural network inference operators for mobile, server, and Web
Apache Spark - A unified analytics engine for large-scale data processing
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
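Videodl: A lightweight video downloader written in pure Python. (Lightweight video downloader that prioritizes HD, watermark-free video; supports Douyin, Kuaishou, Xiaohongshu, Bilibili, TikTok, YouTube, FIFA+, Youku, Tencent, iQIYI, 1905 Movie Network, LeTV, Mango TV, Migu, PPTV, Sohu, Facebook, Twitter, Sina Weibo, Toutiao, NetEase Open Courses, WeSing, CCTV…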
The Triton Inference Server provides an optimized cloud and edge inferencing solution.
A fast, distributed, high performance gradient boosting (GBT, GBDT, GBRT, GBM or MART) framework based on decision tree algorithms, used for ranking, classification and many other machine learning …
Build and share delightful machine learning apps, all in Python. 🌟 Star to support our work!
🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.
👩🏿💻👨🏾💻👩🏼💻👨🏽💻👩🏻💻 A list of projects by independent developers in China: sharing what everyone is working on
Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audio codec.
verl/HybridFlow: A Flexible and Efficient RL Post-Training Framework
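Data collection and download tool for TikTok posts/likes/mixes/livestreams/videos/image sets/music, and for Douyin posts/likes/favorites/favorites folders/videos/image sets/live photos/livestreams/music/mixes/comments/accounts/search/trending lists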
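🧑🚀 Summary of the world's best LLM resources (multimodal generation, agents, coding assistance, AI paper review, data processing, model training, model inference, o1 models, MCP, small language models, vision-language models).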
Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or GBM) Library, for Python, R, Java, Scala, C++ and more. Runs on single machine, Hadoop, Spark, Dask, Flink and DataFlow
Google Research
Data manipulation and transformation for audio signal processing, powered by PyTorch
Interactive Data Visualization in the browser, from Python
Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 600+ LLMs (Qwen3.6, DeepSeek-R1, GLM-5.1, InternLM3, Llama4, ...) and 300+ MLLMs (Qwen3-VL, Qwen3-Omni, InternVL3.5, Ovis2.5, GLM4.5v, Gemma4, Llava, …
A Chinese-language guide to prompting ChatGPT. Usage guides for a variety of scenarios. Learn how to get it to do what you ask.
🚀 [Large Model] Train a 65M-parameter visual multimodal VLM from scratch in just 2 hours! 🌏